It's A Digital Disease!


This is a sub that aims to bring data hoarders together to share their passion with like-minded people.

1
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Krimreaper1 on 2024-11-09 01:01:35+00:00.


Robots, planets, cities, etc. I have already used Unicron as an example.

2
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/TR1771N on 2024-11-08 23:39:48+00:00.

3
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/AshleyUncia on 2024-11-09 04:23:41+00:00.

4
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Special_Agent_Gibbs on 2024-11-08 13:22:19+00:00.


Does anyone have advice on how data from a website, primarily file-based data, can be downloaded and preserved in an automated way? The website I'm thinking of (data.gov) has thousands of CSV files (among other formats), and I'd like to see those files preserved before they are potentially deleted as early as next year.
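For context, catalog.data.gov exposes a standard CKAN API, so one automated approach is to page through its package_search endpoint and download every CSV resource. A minimal Python sketch along those lines; the query filter, folder name, and page size are illustrative assumptions, not a vetted harvester:

import requests
from pathlib import Path

# catalog.data.gov runs CKAN; package_search pages through the catalog.
API = "https://catalog.data.gov/api/3/action/package_search"
out = Path("datagov_csvs")  # hypothetical local folder
out.mkdir(exist_ok=True)

start = 0
while True:
    resp = requests.get(
        API,
        params={"fq": 'res_format:"CSV"', "rows": 100, "start": start},
        timeout=60,
    )
    resp.raise_for_status()
    datasets = resp.json()["result"]["results"]
    if not datasets:
        break  # past the last page
    for ds in datasets:
        for res in ds.get("resources", []):
            if (res.get("format") or "").upper() == "CSV" and res.get("url"):
                target = out / f'{ds["name"]}_{res["id"]}.csv'
                if target.exists():
                    continue  # makes reruns resumable
                try:
                    data = requests.get(res["url"], timeout=120)
                    data.raise_for_status()
                    target.write_bytes(data.content)
                except requests.RequestException:
                    pass  # dead links are common; log and move on in real use
    start += 100

At this scale you'd also want rate limiting and checksums, and it's worth checking whether a project like the End of Term Web Archive already has a crawl running before duplicating the effort.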

5
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/ShovvTime13 on 2024-11-08 03:00:58+00:00.

6
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/tilderain on 2024-11-08 00:40:20+00:00.


Cave Story is arguably the father of modern indie games, yet despite its influence, much of its history has been lost to time.

In November 2004, the creator Pixel released over a dozen versions for public testing before the official launch, and only a tiny number of people would have even played them. These had been lost for nearly two decades... until now.

So far, we have found three kind and charming individuals on Twitter who have somehow kept their hard drives from back then in working order and, thankfully, were willing to give us their old versions and other related tools from that time. We appreciated it a lot and picked through the builds with joy, but many versions of the game are still lost.

Hard drives fail over time, so I'd like to get the word out now to try to preserve the history of such a classic game; it would also be a nice 20th-anniversary gift for us starved Cave Story nerds to look through.

If you know any leads or could connect us to people who know more, please let us know. Thanks 😃

Feel free to join our discord server if interested:

TCRF page with some of the differences between versions:

Threads with more backstory:

7
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/jusumonkey on 2024-11-07 22:03:34+00:00.

8
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/L1011TriStar on 2024-11-08 03:05:42+00:00.


All,

Last week I posted about our massive project to digitize one of the largest VHS collections in the United States: 20,000+ tapes recorded from 1987 to 2014.

Since that post, we have received enough donations to acquire two additional VHS recorders and hard drives to preserve the tapes. Once that money is deposited, we will have a total of five recording decks running! This is major for speeding up the project!

THANK YOU SUPPORTERS! YOUR DONATIONS MEAN A LOT TO THIS PROJECT AND YOU WILL GET RECOGNITION FOR BEING A PART OF THIS!

THE MORE DONATIONS RECEIVED, THE MORE WE CAN RECORD!

It will take about a month to actually receive the funds; once that happens and we purchase the recorders, we will have five recording decks running by January! That shaves a few years off the estimated 20-year recording timeline (yes, we know it's wild to think of that time span), and the more we get, the more we can record!

All donations are used specifically for this project, and the more donations we get, the more we can record and provide, so please consider donating; every bit really does help!

Also a reminder that you can follow along and assist in labeling, viewing, identifying new things, etc. in our Discord

9
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/r4almF1re on 2024-11-07 21:11:24+00:00.


I didn't realize people were hoarding 100TB of stuff. Here's what I usually keep:

  1. Films and TV shows I loved and will usually rewatch or watch with someone special
  2. That sweet, sweet corn you just can't ever lose, because we've all been there, frantically searching the internet for that one specific video and never finding it
  3. Install files of apps, because who the fuck wants to download that stuff over and over (and obviously fun stuff like needing an internet connection to download Wi-Fi drivers), plus that one update that did exactly the thing you wanted
  4. My own personal files

In that order. Tell me your stuff so I can level up, guys.

10
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/shmittywerbenyaygrrr on 2024-11-07 04:22:23+00:00.


About two weeks ago I bought a couple of refurbished 12TB drives on sale. I immediately wrote a Python script to scrape all the games for every console up to the PS4/Xbox One. I set up OBS to mirror movies and TV shows, and even configured my VACs (virtual audio cables) properly so I can do whatever on my PC while a show or movie is recording.

I get it now; I don't think I understood the feeling you get when you have EVERYTHING until I did. This is power. I will always have Nintendogs and Pokémon. I even find it pleasing to have every Disney/Barbie game, even though I've never touched them in my life.

I need more. I must build a NAS and get even more storage. I need petabytes now. I have so many things I want to HAVE. I'm going to archive the world! evil laugh

Downside: my gaming backlog has now increased by about 7000 titles because of this.

11
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/ReadPixel on 2024-11-06 18:02:44+00:00.

12
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/untamedeuphoria on 2024-11-06 10:25:08+00:00.

13
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/miked999b on 2024-11-06 13:18:01+00:00.


Like most of us here, I just accumulate stuff because downloading and curating is fun and very addictive.

I have four internal drives and seven externals, about 100TB in total. It got to the point where every single drive was almost full; File Explorer in Windows was a sea of red. I'd juggled stuff from one drive to another as much as I possibly could, but there was nowhere left to go. I needed another drive.

Somehow I just couldn't stomach the thought of buying yet another drive. Wasting hundreds of pounds just to add more stuff I don't even use.

I have all these TV shows that I've never watched and almost certainly never will, but it's nice to have the choice, right? I've also often thought: why do I have all nine seasons of some extremely common, easy-to-obtain show that I've never watched a minute of? Same with films. I've got 1,300 of them. I don't watch films, at all. I've watched one film in 2024. But hey, I might one day.

I always thought it would make sense to just keep season 1 of shows and delete the rest, and download them if/when I need them. I have fast internet, usenet, public trackers, private trackers, real debrid. It's so easy to get stuff. But I could never bring myself to do it. I just couldn't. You know how it is.

But one show was taking up 0.7TB on its own, and I've never watched it. I had to do something, so I deleted season 2 onwards. Seeing the difference it made triggered something inside me: I'd broken through the mental barrier, and then I couldn't stop. I spent a whole afternoon deleting seasons 2 onwards of almost every easily obtainable show I had. It felt amazing seeing the free-space numbers go up and up.

When I was done I had roughly 17TB of free space. File explorer now a sea of blue. One of my drives had almost 6TB free, wtf? It felt amazing, like I'd freed myself from something. Two weeks on I don't regret it one bit and I haven't missed any of the stuff I deleted in any way.

Not sure if this is an advice post or a confessional at this point 😂 This post will probably go down like a lead balloon in here, but seriously - deleting stuff felt so incredibly freeing and now I have tons of space for things that are actually useful and that I might actually want and use!

14
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Kended on 2024-11-06 06:59:21+00:00.


Well, it finally happened to me. I was manually cleaning up an odd temporary file I found in one of the YouTube channel directories that I archive, and I carelessly used the "rm" command without thinking about what it would do.

The command ran for a couple of seconds, and there I sat thinking, "Oh no, why is it taking so long?" Then, after a quick ls, the panic set in: I realized I had deleted content that is now mostly missing from the web and unretrievable.

I froze for a minute and thought about the best way to recover (TestDisk? RAID tools?), then quickly remembered the mirror drives that my main data drives rsync to once a week. I spent about five minutes carefully double-checking every command I entered, and then BAM! rsync got to work restoring my deleted directory from my backup drives.

Those backups run once a week via rsync, and have been doing so for about a year now, never once being used or even thought about, until today when I finally REALLY needed them.

All this to say, always have a backup of any data you care about. I know most of us here don’t need another lesson on their importance, but I hope my short story can serve as a lesson on how critical they really are for us in this hobby. I know having a mirror backup saved my data this time, and hopefully it’ll continue to do so in the future.

EDIT: I did recover my data thanks to the backup drives. I was able to use Rsync to copy the deleted channel from a backup that the main drives back up to once a week.
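For anyone wanting to copy this setup, the whole flow reduces to two rsync invocations. A rough sketch, wrapped in Python only for annotation; the paths and directory layout are hypothetical:

import subprocess

SRC = "/mnt/data/youtube/"       # main data drive (hypothetical path)
MIRROR = "/mnt/backup/youtube/"  # weekly mirror (hypothetical path)

def weekly_mirror():
    # The weekly job: propagate changes, including deletions, to the mirror.
    subprocess.run(["rsync", "-a", "--delete", SRC, MIRROR], check=True)

def restore(subdir: str):
    # After an accidental rm: copy the lost directory back from the mirror.
    # Deliberately no --delete, so nothing on the main drive can be removed.
    subprocess.run(["rsync", "-a", f"{MIRROR}{subdir}/", f"{SRC}{subdir}/"], check=True)

The one-week mirror window is the usual trade-off: anything created after the last sync is gone, but anything older, as in this case, comes back intact.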

15
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/tuxon64 on 2024-11-05 22:58:09+00:00.


My external storage needs are probably around 6-8 TB. I initially thought of a NAS and liked the idea of running a Plex server on it. However, my home is not network-friendly; I tried those powerline Ethernet adapters, and transfer rates were pathetic.

I'm leaning towards a DAS, and I like the offerings from OWC, mainly because their enclosures have extra USB-A ports and an SD card reader (things Apple keeps eliminating), and they're about the same price as the ones from Terramaster and QNAP. I'm thinking of getting recertified EXOS drives from serverpartdeals for a little over $10/TB.

My use case is photos/videos, Time Machine backups, and movies streamed via Plex to an Apple TV. I'm currently using a 5TB USB 3.0 external drive, and I can edit 4K video off it; data transfer rates are around 100 MB/s. With a 7200 RPM drive and a Thunderbolt 3 connection, I'm thinking I'd get around 200 MB/s.

And if you've come this far, a total newbie question: I haven't decided whether to just start with JBOD or go RAID 1. Everyone says RAID is not a backup. Is that because of the chance the enclosure itself fails? I can't see both drives failing at the same time, and I understand that for a true backup you need offsite storage. But life comes with risk, and I'm willing to risk a fire or some rando robbing my house and taking my hard drives. I will periodically back up the really important stuff to my loose external drives.

16
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/edTechrocks on 2024-11-05 21:31:33+00:00.


I have an 8-bay Synology NAS. Each slot has a Seagate EXOS 18TB drive. About 10 days ago, two of the drives failed. It was strange that they failed the same weekend, but I guess that happens. Good thing I've got RAID 6 with two-drive redundancy.

I go to Seagate's website for an RMA. Their system keeps telling me there's an error at the last step of the order. I try multiple credit cards and addresses, each one giving an error (I'm in the USA). Then I get a fraud alert from one of my credit cards: Seagate charged me for overnight shipping something like nine times across all the cards. Wonderful...

I get on chat. No one can figure out what's going on. They escalate and escalate. Finally some guy says he wants me to try a different email address. I do that... and they charge me again with another failed order. Then another guy says he's going to do it on their side and that the next day I'll get an email with tracking. Next day... no tracking.

I get on chat again, and the guy tells me they're shipping it ground, not overnight. Wonderful... Then, four days later, I get an email from UPS saying my order is being shipped overnight. It took them five days to package drives they were shipping overnight.

Today I get the two drives. One of them doesn't start at all. I did all the troubleshooting. They just sent me a dead drive.

Seagate is not serious about warranty support. Their systems are broken, their staff is disempowered, and their testing is nonexistent.

17
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Howhaveubeenmypal on 2024-11-05 19:05:23+00:00.


Hello guys. It's a genuine emergency for anyone interested in anime and manga: this website is closing down on November 26th. What makes the site so important is that it hosts a plethora of manga that are either out of print or really obscure. As a result, it has served scanlation communities and hardcore otakus for nearly a decade, letting them experience non-mainstream titles that are nearly impossible to find even if somebody is ready to buy them. So you can imagine that those really cool titles, all the way from the 70s to the 2010s, will become lost media forever if the site goes down before someone backs them up. The reason the site is shutting down is explained here; apparently it's a financial settlement issue involving card companies. So I ask you to please save as much as possible from the site. I've informed other online communities as well, but I'm looking to r/DataHoarder because I know there are lots of fans here too. And yes, I'm doing my part, but I'm limited by my technical know-how and equipment. Hence, guys, save us all.

18
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/YanniRotten on 2024-11-05 18:55:02+00:00.

Original Title: After its website was crippled for nearly a month by a cyberattack, the Internet Archive announced on Monday that it had restored one of its most valuable services—the Save Page Now feature that allows users to add copies of webpages to the organization’s digital library.

19
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/throwmeaway2793 on 2024-11-05 17:16:56+00:00.


sometimes i feel like a legit hoarder, like those people whose houses are filled wall-to-wall with crap that has no practical value or use to them and that they'll never use or look at

for me it's 99% nsfw stuff

i have already downloaded far more than i'll ever look at, even if i spent all my free time going through it - and that's the thing, i almost never actually go back to what i've downloaded, but there's this fear of missing out that keeps pushing me to continue

at this point i'm hoarding just to hoard. is that bad? does anyone else ever feel this way about the data you hoard?

20
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/Zestyclose_Car1088 on 2024-11-05 13:41:00+00:00.


TLDR:

My Recommendations:

I did a quick evaluation of some of the most popular YouTube downloaders; here's the rundown:

Scheduled Downloaders Comparison Table

| Feature | PinchFlat | TubeArchivist | TubeSync | ChannelTube | YoutubeDL-Material | ytdl-sub-gui |
|---|---|---|---|---|---|---|
| Simple/Nice UI | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Lightweight and Quick | ✅ | ❌ | ❌ | ✅ | ✅ | ❌ |
| Self-contained Image | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
| Easy Setup | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ |
| Auto-Delete Old Files | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| Filter Text | ✅ | ❌ | ✅ | ✅ | ❌ | ✅ |
| Built-in Player | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ |
| Audio Only Option | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Single Download | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ |
| Highly Customizable | ✅ | ✅ | ✅ | ❌ | ❌ | ✅ |
| Defer Download | ❌ | ❌ | ✅ | ✅ | ❌ | ✅ |

Overview

...

Once-off Downloader Comparison Table

| Tool | GitHub Stars | Pulls | Size | Nice Mobile Experience | Nice Desktop Experience | Fast Performance | Easy to Select Storage Location | Flexible Usage |
|---|---|---|---|---|---|---|---|---|
| yt-dlp-web-ui | 800+ | 100k+ | 238.51 MB | ❌ | ❌ | ✅ | ❌ | ✅ |
| meTube | 6k+ | 5M+ | 292.14 MB | ✅ | ✅ | ❌ | ✅ | ✅ |
| YouTubeDL-Material | 2.6k+ | 80k+ | 1.2 GB | ✅ | ✅ | ✅ | ❌ | ✅ |
| TubeTube | 90+ | 6k+ | 271.61 MB | ✅ | ✅ | ✅ | ✅ | ❌ |
| JDownloader | 700+ | 50M+ | 304.08 MB | ❌ | ❌ | ✅ | ✅ | ✅ |

Overview of Each Tool

  1. yt-dlp-web-ui
    • Pros: Offers a variety of options for downloading.
    • Cons: The UI can be a bit clunky; somewhat involved setup to configure folders.
  2. meTube
    • Pros: User-friendly interface, ability to easily manage audio and video storage locations, and create custom folders directly from the UI.
    • Cons: The mobile UI can be a little cluttered; only supports one download at a time.
  3. YouTubeDL-Material
    • Pros: Built-in media player and subscription options.
    • Cons: Requires an external database; slightly cluttered UI.
  4. TubeTube
    • Pros: Simple interfaces for both mobile and desktop; can support parallel downloads.
    • Cons: Folder and format settings must be done via YAML before running (no setup options available in the UI). Less flexible.
  5. JDownloader
    • Pros: Over 50 million downloads, reliable for bulk downloading.
    • Cons: Limited testing due to UI challenges.
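All of these frontends are wrappers around yt-dlp, so for a sense of what they do under the hood, here is a minimal sketch using yt-dlp's Python API. The format string, output template, and channel URL are illustrative assumptions, not recommendations:

from yt_dlp import YoutubeDL  # pip install yt-dlp

# Illustrative options: best video+audio, files grouped by uploader,
# with a download archive so already-fetched videos are skipped on reruns.
opts = {
    "format": "bestvideo*+bestaudio/best",
    "outtmpl": "%(uploader)s/%(title)s [%(id)s].%(ext)s",
    "download_archive": "downloaded.txt",
}

with YoutubeDL(opts) as ydl:
    ydl.download(["https://www.youtube.com/@SomeChannel"])  # hypothetical channel

Point a scheduler (cron, systemd timer) at a script like this and you've rebuilt the core of the scheduled downloaders above; the differences in the table are mostly UI and housekeeping around this loop.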

Conclusion

There may be some errors in my observations (apologies), but this was my experience without digging too deep, so take it with a pinch of salt. Time for docker system prune!

A big thank you to all the developers behind these projects! Be sure to star and support them!

21
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/SingingCoyote13 on 2024-11-05 11:34:59+00:00.


Since the hacks last month, you'd be well advised to change your password(s). The login service is available again, so you can change your passwords now!

22
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/djmac81 on 2024-11-05 09:34:56+00:00.

23
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/DownVoteBecauseISaid on 2024-11-04 15:35:44+00:00.


Video game preservation has been in the news recently, with the US Copyright Office just striking down an effort to make it easier to preserve and play legacy video games. Many in the industry say that more needs to be done, but is it just talk? The ESA, along with game publishers, doesn't seem to be interested in preserving video games at all. This is why we, the community, need to take charge and do it ourselves. We discuss the complex landscape of video game preservation, piracy, and emulation in today's episode.

I think it is great that he puts so much effort towards video game preservation.

24
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/lonelyroom-eklaghor on 2024-11-04 14:45:49+00:00.


First things first: the literal link where you can request your Reddit data. If you have an alt account holding a lot of evidence relevant to a legal matter, then I HIGHLY advise you to request your own data. Unencrypted messages are a bane, but a boon too.

I don't know about all the acts involved, but I used the GDPR to access my data. Feel free to add any additional legal info in the comments if you know about it or about the other acts.

Importing the files into your device

What do you get?

A zip file containing a bunch of CSV files, which can be opened in any spreadsheet program you know.

How am I going to show it? (Many can skip this part if you prefer spreadsheet-like software.)

I will be using SQLite to show whatever is in there (SQLite is just the necessary parts of all the flavours of SQL, such as MySQL or Oracle SQL). If you want to follow my steps, you can download DB Browser for SQLite (not a web browser, lol) as well as the actual SQLite CLI (if you want, you can open the files in any SQL flavour you know). The following steps are specific to Windows PCs, though both programs are available for Windows, macOS, and Linux (idk about macOS users; I think they'll have to use DB Browser only).

After unzipping the folder, make a new database on the DB Browser (give it a name) and close the "Edit Table Definition" window that opens.

From there, go to File > Import > Table from CSV file. Open the folder and select all the files. Then, tick the checkboxes "Column names in First Line", "Trim Fields?", and "Separate Tables".

A screenshot of the Import CSV File window, of GiantJupiter45 (my old account)

After importing all that, save the file, then exit the whole thing, or if you want, you can type SQL queries there only.

After exiting the DB Browser, launch SQLite in the command prompt by entering sqlite3 <your database>.db (the file you just saved). Now, just do a small thing for clarity: .mode box. Then you can use ChatGPT to get SQL queries, or if you know SQL, type them out yourself.
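If you would rather script the import than click through DB Browser, a rough equivalent using only Python's standard library (the folder and database names here are hypothetical) looks like this:

import csv
import sqlite3
from pathlib import Path

con = sqlite3.connect("reddit_export.db")             # hypothetical database name

for csv_path in Path("reddit_export").glob("*.csv"):  # hypothetical unzipped folder
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)                   # "Column names in First Line"
        if not header:
            continue                                  # skip empty files
        table = csv_path.stem                         # one table per file ("Separate Tables")
        cols = ", ".join(f'"{c.strip()}" TEXT' for c in header)
        con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        con.executemany(
            f'INSERT INTO "{table}" VALUES ({",".join("?" * len(header))})',
            ([cell.strip() for cell in row] for row in reader),  # "Trim Fields?"
        )

con.commit()
con.close()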

The rest of the tutorial is for everyone, but we'll mention the SQLite-specific queries too as we move along.

Analyzing what files are present

We haven't yet looked at which files are actually in the export. Let's check just that.

If you are in SQLite, just enter .table or .tables. It will show you all the files that Reddit has shared as part of the respective data request policy (please comment if there is any legal detail you'd like to discuss regarding the California acts or the GDPR mentioned on the data request page). Under the GDPR, this is what I got:

A screenshot of all the files I got

account_gender, approved_submitter_subreddits, chat_history, checkfile, comment_headers, comment_votes, comments, drafts, friends, gilded_content, gold_received, hidden_posts, ip_logs, linked_identities, linked_phone_number, message_headers, messages, moderated_subreddits, multireddits, payouts, persona, poll_votes, post_headers, post_votes, posts, purchases, saved_comments, saved_posts, scheduled_posts, sensitive_ads_preferences, statistics, stripe, subscribed_subreddits, twitter, user_preferences.

That's all.

Check them out yourself. You may check out this answer from Reddit Support for more details.

The most concerning part is that Reddit stores your chat history and IP logs and can tell what you said in which room. Let me explain just this one; you'll get the rest of them.

Chat History

.schema shows how all the tables are structured, but .schema chat_history will show the table structure of only the table named chat_history.

CREATE TABLE IF NOT EXISTS "chat_history" (
        "message_id"    TEXT,
        "created_at"    TEXT,
        "updated_at"    TEXT,
        "username"      TEXT,
        "message"       TEXT,
        "thread_parent_message_id"      TEXT,
        "channel_url"   TEXT,
        "subreddit"     TEXT,
        "channel_name"  TEXT,
        "conversation_type"     TEXT
);

"Create table if not exists" is basically an SQL query, nothing to worry about.

So, message_id is unique, username just gives you the username of whoever sent the message, and message is... well, whatever they wrote.

thread_parent_message_id, as you might guess, is the ID of the parent message from which a thread in the chat started (those replies, basically).

About channel_url:

channel_url is the most important field here. It lets you group all the messages of a "room" (either a direct message to someone, a group, or a subreddit channel). How do you get all the messages you've exchanged in a room?

Simple. For each row, you will have a link in the channel_url column resembling https://chat.reddit.com/room/!<room id>:reddit.com, where <room id> is the room's ID.

Enter a query, something like this, with it:

SELECT * FROM chat_history WHERE channel_url LIKE "%<room id>%";

Here, the % symbols on both sides match zero or more characters in place of the symbol. You can also try something like this, since the URL stays the same (and this one's safer):

SELECT * FROM chat_history WHERE channel_url = (SELECT channel_url FROM chat_history WHERE username = "<recipient username>");

where <recipient username> is written without the "u/" prefix, and that person must have messaged at least once, otherwise you won't get anything back. Also, some people may show up under their original Reddit usernames instead of their changed usernames, so be careful with that.

The fields "subreddit" and "channel_name" are applicable for subreddit channels.

Lastly, conversation_type tells you which is which: what I was calling a subreddit channel is community, a group is private_group, and DMs are direct.
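To put the pieces together, here is a small Python sketch (using the same hypothetical database name as the import sketch above) that dumps every DM, oldest first, grouped by room:

import sqlite3

con = sqlite3.connect("reddit_export.db")  # hypothetical database name
con.row_factory = sqlite3.Row

rows = con.execute(
    """SELECT channel_url, created_at, username, message
       FROM chat_history
       WHERE conversation_type = 'direct'
       ORDER BY channel_url, created_at"""
).fetchall()

for r in rows:
    print(r["created_at"], r["username"] + ":", r["message"])

con.close()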

Conclusion

Regarding the chat history: if these DMs contain sensitive information essential to you, I highly advise importing them into a database before you try to deal with them, because these exports can be HUGE. Either use MS Access or some form of SQL for this.

If you want to learn SQL, here's a video for that:

I myself learnt from this amazing guy.

Also, I hope this guide gives you a little push towards analyzing your own Reddit data.

25
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/datahoarder by /u/TomReddito on 2024-11-03 21:10:57+00:00.


Hi! I want to start a few archiving projects (recording Polish radio, maybe television, plus LiveATC and Flightradar24, and eventually a bit more). I only have around 1 terabyte of free storage (a 2TB portable HDD currently being used as an Arch Linux installation). I have around 4 terabytes combined on my PCs (more like 2TB free), but I move a lot (twice a week) between two locations with one desktop and one laptop, so I can't really rely on that storage. My idea is to use one of my old phones, leave it at one location, run programs that automatically download stuff, and then at midnight upload the data to archive.org and delete it locally to make space for more. I can't find anything that prohibits this in IA's TOS, but I want to make sure: is this allowed? Thanks in advance.
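Assuming it is allowed, the upload-then-delete step is easy to automate with the internetarchive Python package (after a one-time "ia configure" for credentials). The item identifier, metadata, and file path below are hypothetical:

import os
from internetarchive import upload  # pip install internetarchive

item_id = "polish-radio-2024-11-03"  # hypothetical item identifier
metadata = {"mediatype": "audio", "title": "Polish radio recordings, 2024-11-03"}
path = "recordings/2024-11-03.opus"  # hypothetical local recording

responses = upload(item_id, files=[path], metadata=metadata)
if all(r.status_code == 200 for r in responses):
    os.remove(path)  # reclaim local space only after the upload succeeded

This sketch doesn't settle the TOS question, though; that one is best asked of IA directly.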
