Tyblog | When Disks Die: A ZFS Recovery Post-Mortem

In the future, give Ceph a try. I switched to it from a FreeNAS setup with RAIDZ2 and it’s been pretty great; it’s especially good at handling hardware failure. A Ceph cluster is basically a bunch of JBOD drives across one or more machines, and Ceph strives to keep a certain number of copies of any piece of data on that cluster. In the event of a drive failure, Ceph says “oh dear, not enough copies of objects X, Y, and Z, I’d better make some more!” and re-replicates whatever data was on the dead drive. Your storage stays accessible and usable, just with reduced total capacity. Drives can even keep failing, and that’s fine so long as you still have enough free space for the minimum number of copies you’ve specified (3 by default) and at least that many surviving drives to put them on.
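The re-replication behaviour described above can be sketched as a toy model. To be clear, this is just an illustration of the idea, not how Ceph or its CRUSH algorithm actually places data; the drive names (`osd0` etc.) and object names are made up:

```python
import random

REPLICAS = 3  # Ceph's default pool "size": keep 3 copies of every object

def place(objects, drives, replicas=REPLICAS):
    """Map each object to `replicas` distinct drives (toy stand-in for CRUSH)."""
    placement = {}
    for obj in objects:
        placement[obj] = set(random.sample(sorted(drives), replicas))
    return placement

def fail_drive(placement, drives, dead):
    """Drop a dead drive, then re-replicate any object left under-replicated."""
    drives.discard(dead)
    for obj, holders in placement.items():
        holders.discard(dead)
        while len(holders) < REPLICAS:
            candidates = drives - holders
            if not candidates:
                raise RuntimeError(f"not enough drives left for {REPLICAS} copies of {obj!r}")
            holders.add(random.choice(sorted(candidates)))

drives = {"osd0", "osd1", "osd2", "osd3", "osd4"}
placement = place(["a", "b", "c"], drives)

fail_drive(placement, drives, "osd2")
# Every object is back to 3 copies, all on surviving drives.
assert all(len(holders) == REPLICAS for holders in placement.values())
```

As long as at least three drives survive and have room, the loop always restores the copy count; once fewer than three remain, it gives up, which mirrors the "enough places to put data" caveat above.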

The major downsides are that it’s not easy to run, and that at the default 33% space efficiency (three full copies of everything) it’s hungry for drives.