this post was submitted on 24 Sep 2023

datahoarder


Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 5 years ago

cross-posted from: https://l.antiope.link/post/43914

Hi all. I'm trying to choose a configuration for my home storage. Speed is not the top priority; I want a balance of stability and performance. I was thinking of making a RAID 6 array with an ext4 filesystem across 4 disks of 2 TB each. Asking for advice: would this configuration be a good choice?

Note - I am going to build the RAID array from external USB drives plugged into an Orange Pi.

Shdwdrgn@mander.xyz 8 points 1 year ago

Use ZFS instead of ext4... it has a lot built in to protect data integrity, and it's always been suitably fast for me. My current large arrays use raidz2 (equivalent to RAID 6, with two drives of redundancy). However, I started out with the insane configuration of five drives in a raid0, lost power and dropped one or more drives several times, and was able to bring the pool back up without any data loss. I would NOT recommend this to anyone, but it took me a while to learn that my drives were dropping due to a poor-quality power supply, and the recovery impressed me so much that I have used ZFS ever since.

There are a few optimizations you can make when you set up your array, such as configuring the stripe width and sector sizes, although I believe the defaults are pretty sensible now and may no longer require tweaking.

As for overall speed: my most recent array is built from eight 18 TB drives, formatted out to around 90 TB of usable space. While designing my external rack I knew that HDDs can't really saturate SATA3 speeds, but an array can outperform a single drive by accessing multiple drives at once. I built my assembly with cheap SATA2 backplanes and LSI SAS cards. Even right now, with the array in use by multiple servers, I am still getting speeds between 483 and 597 MB/s while copying 10 GB of random data, and nearly 900 MB/s copying from /dev/zero. You're obviously not going to see that kind of speed from USB-connected drives, but the point is that ZFS itself will not slow you down in any way.
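For reference, creating a raidz2 pool like the one described here is a one-liner. A hedged sketch (the pool name and device paths are made up; for USB drives you'd substitute their /dev/disk/by-id entries):

```shell
# Create a raidz2 pool ("tank" is an example name) with two-disk redundancy.
# ashift=12 forces 4 KiB sector alignment, right for most modern drives;
# lz4 compression is effectively free on ZFS.
zpool create -o ashift=12 -O compression=lz4 -O atime=off tank raidz2 \
    /dev/disk/by-id/usb-drive-a \
    /dev/disk/by-id/usb-drive-b \
    /dev/disk/by-id/usb-drive-c \
    /dev/disk/by-id/usb-drive-d

# Confirm the layout and redundancy state.
zpool status tank
```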

PigeonCatcher@l.antiope.link 2 points 1 year ago

Yeah, seems like ZFS is an option in my setup.

yote_zip@pawb.social 4 points 1 year ago

Are you buying the hardware for this setup, or do you already have it lying around? If you don't have the hardware yet, I'd recommend avoiding external USB drives if at all possible, as both speed and reliability will suffer.

If you already have the hardware and want to use it, I'm not super confident recommending anything given my inexperience with this sort of setup, but I would probably use ZFS to minimize any potential read/write issues with dodgy USB connections. ZFS checksums data several times in transit, and will automatically repair and maintain it even if a drive returns the wrong data. ZFS will probably be cranky when used with USB drives, but it should still be possible.

If you're already planning on a RAID6, you could use a RAIDZ2 for a roughly equivalent ZFS option, or a double mirror layout for increased speed and IOPS. A RAIDZ2 is probably more resistant to disk failures, since you can lose any 2 disks without pool failure, whereas with a double mirror the wrong 2 disks failing can cause a pool failure. The traditional gripe about RAIDZ's longer rebuild times being a vulnerable period is not relevant when your disks are only 2 TB.

Note that you'll likely want to limit ZFS's ARC size if you're pressed for memory on the Orange Pi, as by default it will try to use a lot of your memory to improve I/O efficiency. It should automatically release this memory if anything else needs it, but that's not always perfect.
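For the ARC limit mentioned above, a config sketch (the 1 GiB figure is an arbitrary example; size it to your board's RAM):

```shell
# /etc/modprobe.d/zfs.conf -- cap the ARC at 1 GiB (value is in bytes).
options zfs zfs_arc_max=1073741824

# Or apply at runtime without rebooting:
echo 1073741824 | sudo tee /sys/module/zfs/parameters/zfs_arc_max
```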

Another option you might consider is SnapRAID+MergerFS, which can be built in a pseudo-RAID5 or RAID6 fashion with 1 or 2 parity drives. Parity calculation is not real-time, however: you have to explicitly schedule parity syncs, so if a data disk fails, anything changed since your last sync is vulnerable. You can use any filesystem you want underneath this setup, so XFS/Ext4/BTRFS are all viable options. It doesn't have ZFS's licensing baggage and might be easier to set up on an Orange Pi, depending on which distro you're running. One small benefit is that you can pull the disks at any time and their files will be intact (there is no striping); if a catastrophic pool failure happens, the remaining disks will still have readable data for the files they are responsible for.
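A minimal SnapRAID+MergerFS setup along these lines might look like the following config fragments (all paths and disk names are illustrative, not from the original post):

```shell
# /etc/snapraid.conf -- one parity disk (pseudo-RAID5); add a "2-parity"
# line pointing at a second disk for RAID6-style two-disk protection.
parity /mnt/parity1/snapraid.parity
content /var/snapraid.content
content /mnt/disk1/.snapraid.content
content /mnt/disk2/.snapraid.content
data d1 /mnt/disk1/
data d2 /mnt/disk2/

# /etc/fstab -- pool the data disks (not the parity disk) into one mount.
# category.create=mfs places new files on the disk with the most free space.
/mnt/disk1:/mnt/disk2  /mnt/storage  fuse.mergerfs  defaults,category.create=mfs,moveonenospc=true  0  0
```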

In terms of performance: ZFS double mirror > ZFS RAIDZ2 > SnapRAID+MergerFS (only runs at the speed of the disk that has the file).

In terms of stability: ZFS RAIDZ2 >= ZFS double mirror > SnapRAID+MergerFS (lacks obsessive checksumming and parity is not realtime).
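A footnote on capacity: with four 2 TB disks, all three layouts above land on the same usable space, so the choice really is about performance and failure modes rather than capacity. A quick sketch of the arithmetic (ignoring filesystem overhead; the function and layout names are mine):

```python
def usable_tb(n_disks: int, disk_tb: float, layout: str) -> float:
    """Rough usable capacity in TB, ignoring metadata and slop overhead."""
    if layout == "raidz2":
        # Two disks' worth of parity, spread across the vdev.
        return (n_disks - 2) * disk_tb
    if layout == "mirror2":
        # Striped two-way mirrors: half the raw capacity.
        return (n_disks // 2) * disk_tb
    if layout == "snapraid2":
        # Two dedicated parity disks.
        return (n_disks - 2) * disk_tb
    raise ValueError(f"unknown layout: {layout}")

for layout in ("raidz2", "mirror2", "snapraid2"):
    print(layout, usable_tb(4, 2.0, layout))  # 4.0 TB usable in each case
```

The layouts only diverge in capacity at higher disk counts, where RAIDZ2's fixed two-disk parity cost beats mirrors.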

PigeonCatcher@l.antiope.link 2 points 1 year ago (edited)

Thank you! By the way, I've heard that ZFS has some issues with growing a RAID array. Is that true?

yote_zip@pawb.social 2 points 1 year ago

Yes, very much so. You'll need to figure out your strategy for array growth before you start building, because ZFS is very inflexible when growing. I normally just use mirrors, because you can add 2 disks at a time with no hassles or gotchas. If you use a RAIDZ variant, you will basically have to destroy and rebuild the pool/vdev to grow it (or buy a whole batch of new disks and start a separate RAIDZ vdev). The only other option is to replace every single disk in the RAIDZ with a larger one, which also works. RAIDZ expansion as a feature has been promised for many years (and the code is even already written!), but so far it has not been mainlined, and although the current plan is to expect it sometime in the next year or so, that has also been the plan for the past 3-5 years. You can count on this feature arriving by the time you want to grow if you like, but it's not something I would rely on until it's actually there.
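To make the growth paths above concrete, here is roughly what each looks like (pool and device names are illustrative):

```shell
# Mirror pools grow two disks at a time: add another mirror vdev.
zpool add tank mirror /dev/disk/by-id/new-disk-a /dev/disk/by-id/new-disk-b

# RAIDZ pools grow only by replacing every member with a larger disk,
# one resilver at a time; capacity expands after the last one finishes.
zpool set autoexpand=on tank
zpool replace tank old-disk-1 /dev/disk/by-id/bigger-disk-1
zpool status tank   # wait for the resilver before replacing the next disk
```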

constantokra@lemmy.one 1 point 1 year ago

If you want to be able to grow, check out MergerFS and SnapRAID. If you're going to use a Pi and USB drives, it's probably a better fit for what you want than ZFS and RAID arrays. It's what I'm using, and I've been really happy with it.

PigeonCatcher@l.antiope.link 1 point 1 year ago

Thank you! Gonna check it out.

constantokra@lemmy.one 2 points 1 year ago

I've been using Linux for a long time and I have a background in this kind of stuff, but it's not my career and I don't keep as current as I would if it were, so I'm going to give my point of view on this.

A ZFS array is probably the legit way to go, but there's a huge caveat. If you're not working with this technology all the time, it's really not more robust or reliable for you. If you have a failure several years from now, you don't want to be relying on the fact that you set it up correctly years ago, and you don't want to have to relearn it all just to recover your data.

MergerFS is basically just files on a bunch of disks. Each disk has the same directory structure, each file lives whole in one of those directories on a single disk, and your MergerFS volume shows you all the files on all disks merged together. There are finer points of administration, but the bottom line is that you don't need to know much, or interact with MergerFS at all, to move all those files somewhere else. Just copy from each disk to a new drive and you have everything.

SnapRAID is essentially snapshot-style parity: you can use it to recover your data if a drive fails. The commands are pretty simple, and relearning them several years down the road isn't going to be hard.
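The handful of SnapRAID commands being referred to are roughly these (the disk name `d1` is an example from a hypothetical config):

```shell
# After adding or changing files: recompute parity.
snapraid sync

# Periodically: verify a slice of the array against its checksums.
snapraid scrub

# After replacing a failed disk (configured as "d1") with a blank one,
# rebuild its contents from parity, logging to a file:
snapraid -d d1 -l fix.log fix
```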

The best way isn't always the best if you know you're not going to keep current with the technology.

greengnu 2 points 1 year ago

Traditional RAID stopped being optimal once btrfs and ZFS existed.

If you plan on using matching drives, ZFS is recommended.

If you expect mismatched disks, btrfs will work.

If you are most worried about stability, get a computer with ECC memory.

If you are most worried about performance, use SSDs.

If you want a bunch of storage for cheap, use spinning disks (unless you exceed the 100 TB capacity range).