Five Years of Btrfs
In 2015, I decided to use the Btrfs file system to store all my data. Its flexibility turned out to be more valuable than I expected.
What's Btrfs?
This article assumes you have some knowledge of file systems. If you're not so familiar with Btrfs, which is my preferred file system, it's worth reading a short introduction (the Btrfs documentation is a good starting point) before continuing.
ZFS vs. Btrfs: The Choice Comes Down to Strategy
Like so many others, once I decided I had enough data to warrant a proper storage solution, I had a decision to make: ZFS or Btrfs? Both are mature, modern file systems with features that keep data safe (e.g., copy on write, bit rot protection, RAID-like data profiles, etc.). You can find endless online debates over which is better, but I think the single most important thing to consider before getting too deep into the feature details is this: will you change your disks or data profile much? If the answer is "yes" or "I don't know," then this article is for you.
ZFS: Opinionated
When you create a pool in ZFS, you do it with virtual devices, or vdevs. Let's say you build a simple mirror vdev of two 8 TB drives. This is done easily with a few simple commands. All is well until the time comes to change things up.
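For the curious, a pool like that really does take only a command or two to build. Here's a minimal sketch, where the pool name (tank) and device paths are just placeholders:

```sh
# Create a pool named "tank" backed by a single mirror vdev of two disks.
zpool create tank mirror /dev/sdb /dev/sdc

# Check the layout and health of the pool.
zpool status tank
```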
If you want to grow the pool, you basically have two recommended options: add a new identical vdev, or replace both devices in the existing vdev with higher capacity devices. So you could buy two more 8 TB drives, create a second mirrored vdev and stripe it with the original to get 16 TB of storage. Or you could buy two 16 TB drives and replace the 8 TB drives one at a time to keep a two disk mirror. Whatever you choose, ZFS makes you take big steps. There aren't good small step options, e.g., let's say you had some money to burn and could afford a single 10 TB drive. There's no good way to add that single disk to your 2x8 TB mirror.
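In command form, the two recommended growth paths look roughly like this sketch (same hypothetical pool, placeholder device names):

```sh
# Option 1: grow by striping a second, identical mirror vdev onto the pool.
zpool add tank mirror /dev/sdd /dev/sde

# Option 2: grow the existing mirror by swapping in larger drives one at a
# time, letting the pool resilver after each replacement. With autoexpand
# enabled, the vdev grows once both members are larger.
zpool set autoexpand=on tank
zpool replace tank /dev/sdb /dev/sdf
zpool replace tank /dev/sdc /dev/sdg
```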
There's also a considerable gotcha even if you don't mind the big steps: imbalanced data. Let's say you had 7 TB written to your 2x8 TB mirror. When you add the second vdev, it stays empty, i.e., ZFS doesn't redistribute the data. So let's say you had no writes for a month and continual reads. Those two new disks would go 100% unused. Only when you started writing data would they start to see utilization, and only for the newly written files. It's likely that for the life of that pool, you'd always have a heavier load on your oldest vdevs. Not the end of the world, but it definitely kills some performance advantages of striping data.
So ok, ZFS has opinions about growth. But you can plan around them, right? Well, maybe. There are many "can't do it" scenarios. Want to break a pool into smaller pools? Can't do it. So let's say you built your 2x8 + 2x8 pool. Then a few years from now 40 TB disks are available and you want to go back to a simple two disk mirror. There's no way to shrink to just 2x40. You'd need to create a new pool, move the data, then destroy the old pool. Got a 4-disk raidz2 pool and want to add a disk? Can't do it. For most fundamental changes, the answer is simple: start over. To be fair, that's not always a terrible idea, but it does require some maintenance down time.
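The "start over" path typically means snapshotting the old pool, streaming it into a freshly built one, and destroying the original. Roughly like this sketch, where the pool names, snapshot name, and devices are all placeholders:

```sh
# Take a recursive snapshot of everything in the old pool.
zfs snapshot -r tank@migrate

# Build the new pool on the bigger disks (here, a simple two-disk mirror).
zpool create bigtank mirror /dev/sdf /dev/sdg

# Stream the full replication across, then retire the old pool.
zfs send -R tank@migrate | zfs receive -F bigtank
zpool destroy tank
```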
If you want to use ZFS, know that whatever disk setup you choose either needs to grow in those big, predefined steps, or needs to be destroyed and rebuilt when you want to fundamentally change things.
Btrfs: Like, Whatever Man
Btrfs is much more flexible when it comes to growing, shrinking, or generally changing things. It seems to be ok with whatever disk combination or data profile change you have in mind. It's like The Dude of file systems.
Consider that same 2x8 TB mirror. In Btrfs there's a data profile called RAID1. It ensures there are two copies of every chunk of data, each stored on a different disk, no matter how many disks are in the array. So with 2x8 TB, it behaves like a traditional RAID 1 mirror. Unlike traditional RAID, though, you can just keep adding whatever disks you like to the array. Sale on 10 TB drives? Get one: 8+8+10 for 13 TB of usable storage. Find a 14 TB a year later? 8+8+10+14 for 20 TB usable. One of the 8 TB drives fails a few years later? Replace it with an 18 TB drive: 8+10+14+18 for 25 TB usable. Add two 20 TB drives and soon after feel like you made a poor design choice? Prefer to have two RAID1 storage arrays instead? No problem. Remove the new drives from the array and run 8+10+14+18 for 25 TB alongside 20+20 for 20 TB. All of this can be done without ever taking the storage array offline.
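As a rough sketch of what that looks like in practice (the mount point and device paths below are placeholders), growing or shrinking the array is a single command per disk, run against the live, mounted file system:

```sh
# Grow the mounted RAID1 array with whatever drive is on sale.
btrfs device add /dev/sdd /mnt/storage

# Later, shrink it again; Btrfs migrates the data off the device before
# removing it from the array.
btrfs device remove /dev/sde /mnt/storage

# See how data is spread across the member disks.
btrfs filesystem usage /mnt/storage
```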
You get the idea. Btrfs will let you change everything and mix and match disks, and it will let you do it all without downtime. I can't stress enough how great this has been.
And remember how ZFS won't redistribute data after you add new vdevs? Btrfs will if you like. Any time you change the data profile of an array, or add or remove a disk from it, Btrfs can rebalance the data with a single command. The result is roughly equal utilization across all disks.
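That single command is btrfs balance. Here's a sketch, again with a placeholder mount point; the convert filters are only needed when you're also changing the data profile:

```sh
# Redistribute existing data and metadata across all current devices.
btrfs balance start /mnt/storage

# The same command can also convert profiles, e.g. to RAID1 for data and metadata.
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/storage

# Balances can take a long time on large arrays; check on progress.
btrfs balance status /mnt/storage
```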
To make it less hypothetical, consider the turbulent data voyage below that I've been on for the last seven years.
My Disk Setup Over the Years
All disk capacities, raw totals, and available (usable) totals are in TB. Columns A–F are the individual disks installed in the primary and backup servers (both local).

| Year | Primary File Sys | A | B | C | D | E | F | Raw | Avail | Backup File Sys | A | B | C | D | E | F | Raw | Avail |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | Ext4 HW RAID 5 | 1 | 1 | 1 | 1 | 1 | | 5 | 4 | | | | | | | | | |
| 2013 | Ext4 HW RAID 10 | 3 | 3 | 3 | 3 | | | 12 | 6 | | | | | | | | | |
| 2015 | Btrfs RAID 6 | 3 | 3 | 3 | 3 | 3 | 3 | 18 | 12 | | | | | | | | | |
| 2016 (early) | Btrfs RAID 6 | 3 | 3 | 3 | 3 | 3 | 3 | 18 | 12 | ZFS raidz | 6 | 6 | 6 | | | | 18 | 12 |
| 2016 (late) | Btrfs RAID 1 | 6 | 6 | 3 | 3 | 3 | 3 | 24 | 12 | ZFS raidz | 6 | 6 | 6 | | | | 18 | 12 |
| 2017 | Btrfs RAID 10 | 6 | 6 | 6 | 6 | 6 | | 30 | 15 | ZFS raidz | 8 | 8 | 8 | 8 | | | 32 | 24 |
| 2018 | Btrfs RAID 10 | 8 | 8 | 8 | 8 | 8 | | 40 | 20 | Btrfs RAID 10 | 6 | 6 | 6 | 6 | 6 | | 30 | 15 |
| 2019 | Btrfs RAID 10 | 8 | 8 | 8 | 8 | 8 | | 40 | 20 | Btrfs RAID 1 | 10 | 10 | 6 | 6 | 6 | | 38 | 19 |
| 2020 | Btrfs RAID 10 | 8 | 8 | 8 | 8 | 8 | | 40 | 20 | Btrfs RAID 1 | 14 | 10 | 10 | 6 | 6 | | 46 | 23 |
Notes (table rows after 2019 were added after the article was originally published):
- 2015 - Made the switch from hardware RAID to Btrfs
- 2016 - Btrfs RAID 6 was already considered experimental, but was called out as dangerous and likely to corrupt data in several scenarios. Switched to RAID 1. Began phasing out the 3 TB drives.
- 2017 - Dropped from 6 to 5 disks on the primary Btrfs array and replaced all the 3 TB drives with 6 TB drives. Was forced to rebuild the backup ZFS pool to switch to 8 TB drives.
- 2018 - Built a new server. Decided to give up on ZFS as I was going to have to destroy the array once again.
- 2019 - Bad luck strikes! Two disks from the same batch failed within four months. Replaced each with a 10 TB drive.
- 2020 - Bad luck continues. :( Another 6 TB drive failed. I guess it's true what they say about the likelihood of devices from the same batch failing together. Replaced it with a 14 TB drive (roughly how these swaps work is sketched just after these notes), which means the next 6 TB drive to die can simply be pulled and not replaced.
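For completeness, here's what those disk swaps look like in sketch form, with placeholder device paths and mount point, and assuming the replacement drive is at least as large as the one it replaces:

```sh
# Swap a failing 6 TB drive for the new 14 TB drive while the array stays online.
btrfs replace start /dev/sdc /dev/sdf /mnt/storage

# Check on the rebuild.
btrfs replace status /mnt/storage

# If the new drive is larger, grow that device to use the extra capacity.
# The "3" is a placeholder device ID; look it up with `btrfs filesystem show`.
btrfs filesystem resize 3:max /mnt/storage
```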
As you can see, I can't make up my mind, and Btrfs supports this beautifully. While ZFS's handling of data is excellent, it's not a great fit for my constantly changing environment. You do miss out on the raidz/z2/z3 capacity efficiencies that ZFS offers, but with disk sizes growing and costs dropping, a lot of people will tell you to stay away from parity-based RAID altogether and go for N-way mirrors instead. Mirrors perform much better and recover much faster when a disk fails.
Looking Forward
I've grown quite fond of Btrfs over the years and plan to stick with it for the foreseeable future. That said, I keep watch on the ZFS on Linux project, as well as the very new bcachefs project. I'm curious to see how the next five years evolve in the world of file systems.