Btrfs layout for data storage - questions & sanity check

Hello all!

I am currently running a raid10 array with mdadm for my data storage which I intend to switch over to a Btrfs raid 1 configuration on 2 larger new disks.
After spending the last two nights reading and trying a few things in a VM there are still open questions that I cannot seem to find the answers for. The more I research, the more confusing it all gets :upside_down_face:.

The goal:
On the new storage I’d want a large shared space for the following items (slightly simplified):

  • “Work” - we want snapshots for this
  • “Projects” - also want independent snapshots for this
  • “Media” - no snapshots needed

I tried creating my 3 subvolumes with each being a root volume like so:

sdb
 |
 |- work
 |- projects
 |- media

However, when I do this and mount the subvolumes, the drive listed under “devices” in Dolphin somehow links to the work subvolume, so something seems off.


Do I need to create a root subvolume and then create subvolumes for work, projects and media underneath it?

sdb
 | /
 |- /@work
 |- /@projects
 |- /@media

If this were the correct layout, should the root subvolume be mounted in addition to my work, projects and media subvolumes?
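For reference, here is roughly how I created the flat layout from the second diagram in my VM; the device names are just the ones from the example above, so adjust as needed:

```shell
# Create the raid1 filesystem across the two new disks (device names assumed)
mkfs.btrfs -d raid1 -m raid1 /dev/sdb1 /dev/sdc1

# Temporarily mount the btrfs root (subvolid=5) to create the subvolumes
mount /dev/sdb1 /mnt
btrfs subvolume create /mnt/@work
btrfs subvolume create /mnt/@projects
btrfs subvolume create /mnt/@media
umount /mnt
```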


Does it make sense to turn on btrfs compression on the media subvolume where we are already dealing with heavily compressed files?


According to the btrfs wiki:

btrfs filesystems can be created on:

  • partitions (example: /dev/sdb1)
  • raw disks, without partitioning (example: /dev/sdb)

What are the benefits/potential drawbacks of using partitions vs. raw disk?
When researching I am finding conflicting information: some people say that a partition should always be created first, while others claim to have been running directly on raw disks just fine for many years.
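For what it’s worth, partitioning first is only one extra command before mkfs (again using the device name from my example):

```shell
# Fresh GPT label with a single btrfs partition, 1 MiB aligned
parted --script /dev/sdb mklabel gpt mkpart primary btrfs 1MiB 100%
```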


Next, I would like to use Snapper for snapshotting using configurations for “work” and “projects”.
Am I correct that snapper will place the snapshots like so and that rollback will be possible, or does this require subvolumes for the snapshots themselves?

sdb
 | /
 |- /@work
 |   |- .snapshots
 |- /@projects
 |   |- .snapshots
 |- /@media

Currently I run daily system backups of my ext4 system with Timeshift rsync with the destination of the raid10 ext4 array.
Will this still work if the destination were a “sys-backups” btrfs subvolume?


After the data has been moved to btrfs, I plan on recycling 2 of the current raid10 disks to store extra copies of the work and projects data, which is about 250GB (offsite backup is already taken care of). What would be a clever and easy way of doing this? Btrfs send/receive? Rsync?

Thank you for your help!
Beer

I’ll chime in until someone more versed in BTRFS jumps into this thread.

Full disclaimer: I don’t use BTRFS, I use ZFS, and prefer it for multiple reasons. But that’s beyond the scope. However, some concepts do overlap (albeit the “terms” used to describe things are different.)


In ZFS, you can only have one true root dataset (i.e., “volume”), and it honestly serves more as a “placeholder” than anything useful. (You can use it to dictate inherited properties for the children datasets below.) I never knew BTRFS lets you have more than one. Unless I’m reading it incorrectly.

But wouldn’t each subvolume be denoted by an “at” symbol? Such as:

sdb
 |
 |- @work
 |- @projects
 |- @media

You only mount what you need. While volume/snapshotting/block-level BTRFS operations are concerned with BTRFS management itself, when it comes to mounting, the subvolumes are independent of each other. You don’t need to have a higher-level volume mounted in order to be able to mount a lower-level subvolume.
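The btrfs equivalent of that independence is the subvol= mount option; the mount points here are made up for illustration:

```shell
# Mount two subvolumes independently; the btrfs root (subvolid=5) stays unmounted
mount -o subvol=@work /dev/sdb1 /srv/work
mount -o subvol=@projects /dev/sdb1 /srv/projects
```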


Not sure how gracefully BTRFS handles it, but the default compression setting in ZFS uses the LZ4 compression algorithm, and does a quick calculation to decide whether or not to even try to compress the records. So not only is it fast if the record is compressed/decompressed, but it’s not even “in play” most of the time. Paradoxically, you get about the same or slightly better performance with LZ4 compression enabled versus no compression in ZFS.

As for metadata, ZFS always compresses metadata, even if the dataset has compression disabled. Not sure if this applies to BTRFS. Whether my dataset contains heavily compressed data or not, I always use the default LZ4 compression and have been very happy. With modern CPUs, it further hides any bottlenecks we might have seen over a decade ago.


I don’t think it matters. There might have been a legacy reason to only use partitions rather than the entire block device, maybe for compatibility or alignment issues? Either way, you might as well stick to partitions, since at most you’ll lose about 1 MiB to alignment, and that’s it. At least you have the assurance that if some future bug is unearthed for those using raw devices, you’ll be safe with the tried-and-true method. :sunglasses:


Snapshots are not “folders” or “directories”. They can be presented as such, so that the user intuitively knows how to navigate them or list them. Snapshots are directly tethered to each volume; since by all accounts a snapshot is literally that volume at a specific point in time.


You can still use rsync backups, regardless. However, in order to leverage snapshots, Timeshift provides an alternative (and much faster) method of BTRFS snapshots, which only works on the BTRFS volumes themselves.

I take it to mean you want to use Timeshift to “rsync” backups into a particular subvolume on your BTRFS filesystem? If so, it should technically be possible, though you wouldn’t be taking advantage of the speed and efficiency of snapshots.


I could be wrong on a lot of things, since I don’t use BTRFS (tried it, but greatly prefer ZFS, which has more active development and intuitive tools.) Though many of the concepts are similar.

Someone more experienced and familiar with BTRFS can likely answer your questions better.

You do create one btrfs volume. And as many subvolumes as you need for different snapshot strategies.

The btrfs-root is automatically created with the volume.

The btrfs-root is normally NOT mounted. (only for manual work while doing a rollback or something similar)

Yes

By convention these are the good names :wink:
But normally there is also a subvolume @ (which acts as the main subvolume for the Linux system) and is mounted as /

  • With raw disks you only have one volume, no GPT, no partitioning … (simple, clean, but only data)
  • With partitions you have some other possibilities … (you can boot, grub, EFI, more than one volume, … resize )

Better you read this in the manjaro-wiki :innocent:

( I do use snapper with a custom layout, so that it is not nested. This is easy to rollback manually :point_down: )

Yes :clap:

This is up to you to decide (and measure)

I have everything compressed with zstd:19 and get a ratio of about 2:1 overall.
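For anyone wanting to try this on their own pool, a sketch (note that the kernel documents the compress= mount option for zstd levels 1 to 15; compsize is a separate tool, and the paths are assumptions):

```shell
# Compress new writes with zstd (the mount option accepts levels 1-15)
mount -o compress=zstd:15 /dev/sdb1 /mnt/pool

# Inspect the actual compression ratio afterwards
compsize /mnt/pool
```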


You find good Information about Btrfs in the wiki

https://wiki.manjaro.org/index.php/Btrfs
additional info:
https://forum.manjaro.org/t/how-to-manual-rollback-with-btrfs/80230/5
https://forum.manjaro.org/t/how-to-rescue-data-from-a-damaged-btrfs-volume/79414/4


Perhaps, but it may be a dolphin-ism.

When you create the filesystem, it will automatically create the root. You don’t need to manually create it.

A couple of things. First, it is up to you if you want to use subvolume names that start with @. That is not a btrfs convention, it is convention used by some distros when using a “flat” subvolume layout.

While I usually use a “flat” layout, for a pure data disk laid out the way you are proposing, I am not sure that a “nested” layout wouldn’t be better.

In “flat” layout, you usually don’t mount the root(But you can). In a “nested” layout, you would only mount the root and not need to mount any of the subvols.

In your case, I think it would be simpler to use a “nested” layout, mount the root and be done.

I would definitely use a partition. Without that, you could never resize the volume if you wanted to later. Also, what would the advantage of putting a btrfs volume directly on the disk be?

Snapper does need a subvolume for .snapshots, however, it will create it for you. You don’t need to worry about it.
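Concretely, that is one create-config per subvolume; the config names match the ones planned above, while the mount points are assumptions:

```shell
snapper -c work create-config /srv/work
snapper -c projects create-config /srv/projects
# each call creates the .snapshots subvolume inside the target for you
snapper list-configs
```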

Yes.

You can use either but send incremental snapshots with btrfs send/receive would be what I would do.
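A sketch of that incremental cycle, with paths and snapshot names made up for illustration (a snapshot must be read-only to be sent; snapper’s read-only snapshots can also serve as the -p parent):

```shell
# First run: full send of a read-only snapshot to the backup disk
btrfs subvolume snapshot -r /mnt/pool/@work /mnt/pool/work-snap-1
btrfs send /mnt/pool/work-snap-1 | btrfs receive /mnt/backup

# Later runs: send only the delta against the previous snapshot
btrfs subvolume snapshot -r /mnt/pool/@work /mnt/pool/work-snap-2
btrfs send -p /mnt/pool/work-snap-1 /mnt/pool/work-snap-2 | btrfs receive /mnt/backup
```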


Thank you @winnie @andreas85 and @dalto for your elaborate responses. I really appreciate the help :smiley:

Main takeaways:

  • Partition disks before use.
  • Label subvolumes with “@” (this is not required, but a common practice).
  • Dolphin “devices” linking to the first subvolume seems to only happen if the root isn’t mounted.
  • zstd has compression levels that I assume can be set in fstab with “compress=zstd:N” (the kernel docs list levels 1 through 15 for the mount option).
  • Creating configurations for Snapper with commandline was easy and fun! Rollback, comparing snapshots and even single file contents seems rather useful (snapper-gui confused me more than it helped).
  • Using Timeshift to create rsync backups of the ext4 system on the btrfs storage works but raises some additional questions (see below).
  • Will use btrfs send/receive to create backups of the work/projects snapshots on the 2nd array (need to research and test).

Additional questions:

Backing up the ext4 system to the btrfs pool works. Using hourly snapshots for testing purposes only (will be daily).

>sudo timeshift --check --verbose
Mounted '/dev/vdb1' at '/run/timeshift/backup'
Hourly snapshots are enabled
Last hourly snapshot is 31 minutes old

If I mount /mnt/pool (the root or maybe rather volume of the btrfs filesystem), timeshift created a folder next to the subvolumes:

>sudo mount /dev/vdb1 /mnt/pool
>ls -a /mnt/pool                                                                                                                                                                                           
.  ..  @media  @projects  timeshift  @work

Does this pose any problem? There seems to be no way to specify a subvolume as a target for rsync (the Timeshift GUI only allows me to select partitions).


When restoring snapshots, in many articles there is a lot of information regarding mounting snapshots. Is there a reason why “undochange” does not find more common mention?

snapper -c work undochange 5..6

At first glance it appears to be much easier to roll back changes this way than doing it the “mount snapshot” way.
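The inspect-then-undo workflow in one place, for anyone finding this later:

```shell
snapper -c work status 5..6       # list files that changed between snapshots 5 and 6
snapper -c work diff 5..6         # show the content differences
snapper -c work undochange 5..6   # revert those changes in the live subvolume
```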


Lastly, when creating snapshots with the Snapper timeline, is there a way to do only daily snapshots?
Setting the following settings in my config does not seem to do the trick. I assume that setting TIMELINE_CREATE="no" would disable the whole process.

# create hourly snapshots
TIMELINE_CREATE="yes"

# cleanup hourly snapshots after some time
TIMELINE_CLEANUP="yes"

# limits for timeline cleanup
TIMELINE_MIN_AGE="1800"
TIMELINE_LIMIT_HOURLY="0"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="0"
TIMELINE_LIMIT_MONTHLY="1"
TIMELINE_LIMIT_YEARLY="0"

Thank you once again!
Beer

I do not know :man_shrugging:

But be aware, that hourly snapshots do cost nearly nothing!

  • A snapshot itself does not take any space when it is created.
  • Btrfs does NOT immediately re-use the space that gets freed by replacing a file’s content. This “garbage” may take a few days (or weeks) to be reclaimed.
  • By then your hourly snapshot is long gone, so it does not prevent the garbage from being cleaned up.

Snapshots kept from far back do use a lot of resources, but a deleted (hourly) snapshot no longer occupies anything.

No. That isn’t an issue.

You could also try btrfs-assistant. Although, I might be biased on that one. :slight_smile:

When you enable timeline snapshots, the snapshots will be taken hourly. The retention settings you set are applied in the cleanup. In other words, if you say you only want to retain daily snapshots, when the cleanup comes through it will delete the rest.


Gave it a look in my VM. It seems to be a much better GUI. Thanks!

Ok, seems there is no good way around the hourly snapshots then.

My train of thought here was that since the data in those 2 subvolumes does not change that frequently, hourly snapshots would be overkill. Especially since the machine is always on, there are times at night when the disks might spin down, and there would be no reason to spin them up for an unnecessary snapshot.

I have been documenting the setup I did in the VM. Perhaps after I have figured out the remaining pieces and have everything up and running on my live system, I could compile it into some sort of personal log or mini guide for the tutorial section of the forum, if you think this might be useful?


You could turn off the timeline snapshots and then create a manual timer/job that takes snapshots daily and designate them “timeline” snapshots so they would get cleaned up by the cleanup service/job.
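A sketch of such a timer pair (unit names are invented; snapper’s create subcommand takes --cleanup-algorithm timeline, so the cleanup job prunes these like any other timeline snapshot):

```ini
# /etc/systemd/system/snapper-daily@.service
[Unit]
Description=Daily snapper snapshot for config %i

[Service]
Type=oneshot
ExecStart=/usr/bin/snapper -c %i create --description daily --cleanup-algorithm timeline

# /etc/systemd/system/snapper-daily@.timer
[Unit]
Description=Daily snapper snapshot timer for config %i

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```

Enable with: systemctl enable --now snapper-daily@work.timer snapper-daily@projects.timer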

That being said, I agree with @andreas85 in that the cost to take those snapshots is close to nothing so why not have hourly snapshots. You never know when they will be useful.
