this post was submitted on 01 Sep 2023

44 points (100.0% liked)

Is bit rot really a threat that I should worry about? (midwest.social)

submitted 1 year ago by mouse@midwest.social to c/selfhosted@lemmy.world

I have recently become interested in mini PCs, but one thing that is stopping me is a feeling that bit rot could cause me to lose data.

Is bit rot something to worry about when storing data for services such as Git or Samba? I have another PC right now that is set up with btrfs raid1 and backups both locally and to the cloud; however, I was thinking about downsizing for the benefit of size and power usage.

I know many people use mini PCs such as ThinkCentres, OptiPlexes, EliteDesks and others. I am curious whether I should be worried about losing data due to bit rot, or is bit rot a really rare occurrence?

Let's say I have backups with a year of retention. Wouldn't it be possible for the data to become corrupt without anyone noticing until more than a year later? For example, archived data that I don't look at often but might need in the future.

[–] dragontamer@lemmy.world 25 points 1 year ago* (last edited 1 year ago) (3 children)

Wait, what's wrong with issuing "ZFS Scan" every 3 to 6 months or so? If it detects bitrot, it immediately fixes it. As long as the bitrot wasn't too much, most of your data should be fixed. EDIT: I'm a dumb-dumb. The term was "ZFS scrub", not scan.

If you're playing with multiple computers, "choosing" one to be a NAS and being extremely careful with the data it's storing makes sense. Regularly scanning all files and attempting repairs (which is just a few clicks with most NAS software) is incredibly easy, and could probably be automated.
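
For anyone wondering what "automated" could look like, here is a minimal sketch in Python, not a tested production script. It only wraps the standard `zpool scrub` and `btrfs scrub start -B` commands; the pool name `tank` and mount point `/mnt/data` in the usage note are placeholders, and the function names are made up for this example.

```python
import subprocess


def scrub_command(filesystem: str, target: str) -> list[str]:
    """Return the CLI invocation that starts a scrub.

    target is a pool name for ZFS or a mount point for Btrfs.
    """
    if filesystem == "zfs":
        return ["zpool", "scrub", target]
    if filesystem == "btrfs":
        # -B keeps the scrub in the foreground so the exit status is meaningful.
        return ["btrfs", "scrub", "start", "-B", target]
    raise ValueError(f"unsupported filesystem: {filesystem}")


def run_scrub(filesystem: str, target: str) -> int:
    """Start the scrub (needs root) and return the command's exit code."""
    return subprocess.run(scrub_command(filesystem, target), check=False).returncode
```

Something like `run_scrub("zfs", "tank")` or `run_scrub("btrfs", "/mnt/data")` could then be called from a cron job or systemd timer every few months. Note that `zpool scrub` returns right away while the scrub keeps running in the background, so the results would be read later with `zpool status`.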

[–] mouse@midwest.social 5 points 1 year ago (2 children)

I guess my primary concern was what happens if I don't have the computer with ZFS (in my case btrfs, but it's a similar thing). Maybe it is for the best that I keep the RAID setup to scrub and make sure important data is safe, and use the smaller single-disk mini PC for services and data that aren't as important.

[–] dragontamer@lemmy.world 5 points 1 year ago* (last edited 1 year ago) (1 children)

If you have a NAS, then just put iSCSI disks on the NAS, and network-share those iSCSI fake-disks to your mini-PCs.

iSCSI is "pretend to be a hard drive over the network". iSCSI can sit "after" ZFS or BTRFS (the fake disk is stored on the NAS's filesystem), meaning your scrubs / scans will fix any issues. So your mini-PC can have a small local C: drive, and then be configured so that most of its data lives on a D: drive that is really an iSCSI disk on the NAS.

iSCSI is very low-level. Windows literally thinks it's dealing with a (slow) hard drive over the network. As such, it works even in complex situations like Steam installations, albeit at slower network speeds (it has to talk to the NAS before the data comes in) rather than the faster speeds of a directly connected hard drive (or SSD).

Bitrot is a solved problem. It is solved by using bitrot-resilient filesystems with regular scans / scrubs. You build everything on top of solved problems, so that you never have to worry about the problem ever again.

[–] mouse@midwest.social 1 points 1 year ago

Thanks for that information about iSCSI, I hadn't looked into it. I will probably just stick with my primary server for the moment, maybe rebuild it into a NAS, and then use mini PCs with it as the storage.

[–] iHUNTcriminals@lemm.ee 3 points 1 year ago

Does the SMART thing in OMV take care of this? Anyone know? Obviously I'm a novice haha.

[–] markstos@lemmy.world 3 points 1 year ago (1 children)

You don’t define bitrot. If you leave software alone with no updates for long enough, yes, there will be problems.

There will eventually be a security issue with no fix, or a new OS or hardware it doesn’t work on.

Backups can also fail over time if restores are not tested periodically.

This recently happened to me. A server wouldn’t boot anymore, so we restored from backup, but it still wouldn’t boot. The issue was that we’d introduced a change that caused a boot failure. To fix that by restoring from a backup, we’d need a backup from before that change. It turns out we had one, but didn’t realize what the issue was.

The other moral is to reboot frequently if only to confirm the system can still boot.

[–] dragontamer@lemmy.world 11 points 1 year ago* (last edited 1 year ago)

That's not what storage engineers mean when they say "bitrot".

"Bitrot", in the scope of ZFS and BTRFS, means the situation where a hard drive's "0" gets randomly flipped to "1" (or vice versa) during storage. It is a well-known problem and can happen within "months". Especially as a 20 TB drive these days is a collection of 160 trillion bits, there's a high chance that at least some of those bits malfunction over a period of ~double-digit months.
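
To make the arithmetic concrete, here is a small Python sketch. The bit count follows directly from the drive size; the flip rate, on the other hand, is a made-up number used purely for illustration (not a measured or vendor figure), and the independence (Poisson) model is a simplifying assumption.

```python
import math

TB = 10**12  # drive vendors quote decimal terabytes


def drive_bits(capacity_tb: int) -> int:
    """Number of bits on a drive of the given capacity."""
    return capacity_tb * TB * 8


def chance_of_any_flip(capacity_tb: int, flips_per_bit_year: float, years: float) -> float:
    """Probability of at least one flipped bit, treating flips as independent (Poisson)."""
    expected_flips = drive_bits(capacity_tb) * flips_per_bit_year * years
    return 1 - math.exp(-expected_flips)


print(f"{drive_bits(20):,} bits")  # 160,000,000,000,000 bits

ASSUMED_RATE = 1e-15  # hypothetical flips per bit per year, for illustration only
print(f"{chance_of_any_flip(20, ASSUMED_RATE, 1):.1%}")  # 14.8%
```

The point is only that with this many bits, even a tiny per-bit rate adds up to a noticeable chance per drive per year.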

Each problem has a solution. In this case, Bitrot is "solved" by the above procedure because:

Bitrot usually doesn't happen within single-digit months. So regular scrubs every ~6 months nearly guarantee that any bitrot problems you find will be limited in scope, just a few bits at the most.
Filesystems like ZFS or BTRFS are designed to handle many, many bits of bitrot safely.
Scrubbing is a process where you read, and if necessary restore, any files where bitrot has been detected.
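
The three points above can be sketched as a toy model in Python. This is not how ZFS or BTRFS are implemented internally (they checksum blocks inside their own metadata structures); it only illustrates the idea of "read everything, compare against a stored checksum, and repair a bad copy from a good one". All names here are invented for the example.

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def scrub(mirror_a: list, mirror_b: list, checksums: list) -> dict:
    """Verify every block on both mirrors; repair a bad copy from a good one."""
    report = {"repaired": 0, "unrecoverable": 0}
    for i, expected in enumerate(checksums):
        ok_a = checksum(mirror_a[i]) == expected
        ok_b = checksum(mirror_b[i]) == expected
        if ok_a and ok_b:
            continue
        if ok_a:
            mirror_b[i] = mirror_a[i]  # overwrite the bad copy on disk B
            report["repaired"] += 1
        elif ok_b:
            mirror_a[i] = mirror_b[i]  # overwrite the bad copy on disk A
            report["repaired"] += 1
        else:
            report["unrecoverable"] += 1  # both copies are bad: restore from backup
    return report


blocks = [b"family photo", b"tax return", b"git objects"]
checksums = [checksum(b) for b in blocks]
mirror_a = list(blocks)
mirror_b = list(blocks)
mirror_a[1] = b"tax retuqn"  # simulate silent corruption on one disk
result = scrub(mirror_a, mirror_b, checksums)
print(result)  # {'repaired': 1, 'unrecoverable': 0}
```

Note the repair step needs a second good copy. With only one copy of the data, the same check can still detect the damage but has nothing to repair it from.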

Of course, if hard drives are of noticeably worse quality than expected (ex: if you do have a large number of failures in a shorter time frame), or if you're not using the right filesystem, or if you go too long between your checks (ex: taking 25 months to scrub for bitrot instead of just 6 months), then you might lose data. But we can only plan for the "expected" kinds of bitrot. The kinds that happen within 25 months, or 50 months, or so.

If you've gotten screwed by a hard drive (or SSD) that bitrots away in like 5 days or something awful (maybe someone dropped the hard drive and the head scratched a ton of the data away), then there's nothing you can really do about that.

[–] synthsalad@mycelial.nexus 20 points 1 year ago* (last edited 1 year ago) (1 children)

Nightly automated runs of the chkbit script are the only thing that alerted me to the fact that either the SSD or storage controller in my Mac Mini had issues and was corrupting data. I was very thankful to have already had the automation in place for that exact scenario.

It theoretically shouldn’t be necessary for filesystems that have built-in checksumming.

[–] RoyalEngineering@lemmy.world 5 points 1 year ago (1 children)

Can you post your scripts?

[–] synthsalad@mycelial.nexus 3 points 1 year ago* (last edited 1 year ago)

This is what I use. It will work with any filesystem (it writes hashes in hidden/dot files) and on any OS as long as Python is available: https://pypi.org/project/chkbit/

It runs ahead of my nightly backup. If it fails, the backup won’t proceed.

Edit: Because the script relies on hashing files, it uses tons of both disk IO and CPU when it runs, but the tradeoff is worthwhile to me.
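
For readers who want to see the general idea, here is a simplified Python sketch of hash-based checking with a hidden manifest file. It is not chkbit's code or its on-disk format; the manifest name `.hashes.json` and the function names are made up. As an assumption of this sketch, a file counts as damaged when its content hash changed while its modification time did not.

```python
import hashlib
import json
import os

MANIFEST = ".hashes.json"  # hypothetical manifest name, one per directory


def file_hash(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def check_directory(directory: str) -> list[str]:
    """Return names of files whose content changed while their mtime did not."""
    manifest_path = os.path.join(directory, MANIFEST)
    try:
        with open(manifest_path) as f:
            known = json.load(f)
    except FileNotFoundError:
        known = {}
    damaged, current = [], {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name == MANIFEST or not os.path.isfile(path):
            continue
        entry = {"mtime": os.stat(path).st_mtime_ns, "sha256": file_hash(path)}
        old = known.get(name)
        if old and old["mtime"] == entry["mtime"] and old["sha256"] != entry["sha256"]:
            damaged.append(name)
            entry = old  # keep the known-good hash so the damage is reported again
        current[name] = entry
    with open(manifest_path, "w") as f:
        json.dump(current, f, indent=2)
    return damaged
```

A nightly job could call `check_directory` first and exit with a non-zero status if the returned list is not empty, so that the backup step never runs on top of damaged files.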

[–] NullPointerException@lemm.ee 19 points 1 year ago (2 children)

honestly I don't think it's really a significant issue but if you're worried just use a fs that can repair itself like zfs (not sure if btrfs can do that too but it might)

[–] hedgehog@ttrpg.network 8 points 1 year ago

And if you’re really concerned about data integrity then you should also ensure that your server has ECC RAM.

[–] vividspecter@lemm.ee 3 points 1 year ago

(not sure if btrfs can do that too but it might)

It can. And they'll both alert you of problems if you do regular scrubs, which might be enough even with non-raid installs, if you have secondary backups.

[–] SheeEttin@lemmy.world 11 points 1 year ago (2 children)

How life-or-death critical would it be if you lost one of those files?

Resilient filesystems/raid/multiple backup points should be more than enough.

[–] mustardman@discuss.tchncs.de 8 points 1 year ago* (last edited 1 year ago) (1 children)

Resilient filesystems/raid/multiple backup points should be more than enough.

A word of caution on relying on backups without the other types of error prevention you mention: if it takes you a while to notice that bitrot has ruined a file, then the damage may have already propagated through your backups. The only type of backup that would account for this is an archival backup, such as on tape or quality Blu-ray discs.
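
A rough way to reason about this in Python, under simplifying assumptions that are mine and not from any backup tool: one backup per day, and backups older than the retention window get deleted. The function name and the day numbers are hypothetical.

```python
def clean_backup_available(corrupted_day: int, detected_day: int, retention_days: int) -> bool:
    """True if a backup taken before the corruption still exists at detection time.

    Assumes one backup per day and that backups older than
    retention_days are deleted.
    """
    last_clean_backup_day = corrupted_day - 1
    oldest_kept_day = detected_day - retention_days
    return last_clean_backup_day >= oldest_kept_day


# Corrupted on day 100, noticed on day 280, one year of retention: still recoverable.
print(clean_backup_available(100, 280, 365))  # True
# Same corruption noticed on day 500: every remaining backup already holds the bad file.
print(clean_backup_available(100, 500, 365))  # False
```

In other words, the time between integrity checks needs to stay shorter than the backup retention for the backups to help.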

[–] mouse@midwest.social 1 points 1 year ago

Yeah, that's kind of what I expected and I am now thinking of keeping my setup how it is currently and getting a mini PC for less important data and services to tinker with.

[–] mouse@midwest.social 4 points 1 year ago (2 children)

That is a very good question; it makes me think of better organization for my data. Data such as task lists and daily notes aren't necessarily very important, while family photos and documents would be more important.

[–] thelittleblackbird@lemmy.world 3 points 1 year ago

Save yourself a headache and use btrfs/zfs with periodic checks, as suggested in another post.

Who cares if it is a problem or not when it has a simple and inexpensive solution?

[–] SeriousBug@infosec.pub 3 points 1 year ago (1 children)

For any family photos and documents you can't afford to lose, make sure you have backups of them. A RAID array does not mean you don't need backups: you want at least 3 copies, at least one offsite.

The copy in your RAID array is one copy. You can back that up to an external hard drive or something as a second copy. Then have an offsite backup on something like Backblaze as your third copy.
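
As a quick sanity check, the rule can be written down in a few lines of Python. The labels and the `"onsite"` / `"offsite"` values are just an illustrative convention, not part of any tool.

```python
def meets_backup_rule(copies: dict[str, str]) -> bool:
    """copies maps a label to 'onsite' or 'offsite'.

    The rule: at least 3 copies in total, at least one of them offsite.
    """
    return len(copies) >= 3 and "offsite" in copies.values()


setup = {
    "raid array": "onsite",
    "external drive": "onsite",
    "cloud provider": "offsite",
}
print(meets_backup_rule(setup))  # True
print(meets_backup_rule({"raid array": "onsite"}))  # False
```

A RAID array with two disks still counts as a single copy here, which is the point being made above.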

[–] mouse@midwest.social 2 points 1 year ago

Thanks for the reassurance. What I currently have is exactly that: RAID for the local data, and a spare drive that is mounted and unmounted when data is backed up, which is then rsynced offsite to a cloud provider. I figured that my current setup was really reliable, as I had slowly been researching and working on this over a few years.

I have a sort of itch to play with a mini PC. I guess it would be best not to put any of my important data at risk by downgrading the setup. However, this is a good time to really sort out what I need: what is important, and what isn't as important and can be reobtained if something fails on the mini PC.

[–] Decronym@lemmy.decronym.xyz 7 points 1 year ago* (last edited 1 year ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

Fewer Letters	More Letters
NAS	Network-Attached Storage
RAID	Redundant Array of Independent Disks for mass storage
SSD	Solid State Drive mass storage

3 acronyms in this thread; the most compressed thread commented on today has 11 acronyms.

[Thread #102 for this sub, first seen 1st Sep 2023, 18:15]