this post was submitted on 24 Jun 2023
21 points (100.0% liked)

datahoarder


Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 4 years ago

It would be a shame to lose the wealth of knowledge, with easy-ish search, that subreddits like datahoarder provide if the subreddit is taken down or stays locked forever. Sure, it is currently accessible, but will it stay that way?

I know it is being archived, but the accessibility part is the problem.

top 8 comments
[–] NightOwl@lemmy.one 5 points 1 year ago (1 children)

I have wondered: is there an easy way to search through the Wayback Machine for archived Reddit data?

And for comments that people back up to CSV with tools like Power Delete Suite, is there a nicer way to display them than Excel?
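
For the CSV question, one possible approach is a small script that renders the export as a static HTML page you can open in a browser. This is only a minimal sketch: the column names `date`, `subreddit`, and `body` are assumptions for illustration, so check the header row of your own export and adjust them.

```python
import csv
import html
import io


def csv_to_html(csv_text, columns=("date", "subreddit", "body")):
    """Render CSV rows as a simple HTML page, one block per comment.

    The column names are hypothetical; pass the ones your export uses.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    parts = ["<html><body>"]
    for row in reader:
        # Escape every field so comment text cannot break the markup.
        date, sub, body = (html.escape(row.get(c) or "") for c in columns)
        parts.append(
            f"<div><p><b>{sub}</b> <small>{date}</small></p><p>{body}</p></div>"
        )
    parts.append("</body></html>")
    return "\n".join(parts)


# Made-up sample row, only to show the shape of the input.
sample = 'date,subreddit,body\n2023-06-01,datahoarder,"Use <b>ZFS</b>, it works"\n'
page = csv_to_html(sample)
```

Writing `page` to a file such as `comments.html` and opening it in a browser gives a readable, scrollable view without a spreadsheet.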

[–] Bread@sh.itjust.works 5 points 1 year ago* (last edited 1 year ago) (1 children)

It could be done, but that really isn't the best possible solution in my opinion. What I was thinking was having a bot migrate all the comments and posts here (or to another instance). The bot would post everything under its own name (instead of trying to create new users on Lemmy) and put the old username in each comment, like "Bread commented" followed by their comment. That way we still know who said it.

If the bot maker had control of the instance, we might be able to put everything in chronological order by timestamp, so it would look like the comments were all made here originally. The only indicator that they weren't would be the bot name as the username. Search algorithms would then be able to search it just like Reddit.
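
The preparation step described above could look roughly like this. It is a sketch under stated assumptions: the `ArchivedComment` record and its fields (`author`, `created_utc`, `body`) are hypothetical stand-ins for whatever the archive dump provides, and nothing here talks to Lemmy. It only orders the comments by timestamp and prefixes each body with the original username.

```python
from dataclasses import dataclass


@dataclass
class ArchivedComment:
    # Hypothetical record shape; real archive dumps may use other field names.
    author: str
    created_utc: int  # Unix timestamp of the original comment
    body: str


def prepare_for_import(comments):
    """Return comment bodies in chronological order, with attribution added."""
    ordered = sorted(comments, key=lambda c: c.created_utc)
    # The bot posts under its own name, so the original author goes in the text.
    return [f"{c.author} commented:\n\n{c.body}" for c in ordered]


archive = [
    ArchivedComment("Bread", 1687600000, "Second comment"),
    ArchivedComment("NightOwl", 1687500000, "First comment"),
]
bodies = prepare_for_import(archive)
```

Posting in this order only preserves the sequence of comments. Keeping the original timestamps themselves would still need control of the instance, as noted above.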

I believe the best way to archive a forum-style website is on a forum where things have one-to-one equivalents.

As for moving Datahoarder to a new instance, that sure would make backups a lot nicer if a datahoarder ran it. I am surprised that it isn't on its own already considering the topic. Same thing with self-hosted.

[–] OutrageousUmpire@lemmy.world 1 points 1 year ago

I love this idea. It raises some issues to think about, too. Like, who “owns” that data? Would Reddit file a lawsuit against the Lemmy instance arguing that the data belongs to Reddit? Does the data belong to the users who posted? What TOS do we agree to when signing up for a Reddit account? Are we giving them ownership of all content we post?

[–] nxlemmy@lemmy.ml 4 points 1 year ago (1 children)

There’s actually already a Reddit to Lemmy importer that lets you bring over threads, including comments: https://github.com/rileynull/RedditLemmyImporter

[–] Bread@sh.itjust.works 2 points 1 year ago

Yeah, that appears to be what I had in mind. Good find!

[–] yakabuff@lemmy.world 3 points 1 year ago* (last edited 1 year ago) (1 children)

If you're just interested in searching:

http://redarc.basedbin.org/search

/r/datahoarder is indexed and searchable

[–] Bread@sh.itjust.works 3 points 1 year ago

It is good to know that exists, thanks! Although I still personally believe that having it in a forum like Lemmy is the best way to preserve it in its original format.

[–] venoft@lemmy.world 2 points 1 year ago

The-eye has a nice archive of Reddit: https://the-eye.eu/redarcs/
