[How-To] Deduplicate files on BTRFS

Difficulty: ★★★★☆

BTRFS has no online, in-band deduplication feature like ZFS. It can still save space, however, by marking identical extents as shared and referencing them from several files.

An example

mkdir -pv /path/to/btrfs/mountpoint/testdir
cd /path/to/btrfs/mountpoint/testdir
# Create 5 zeroed images
for x in $(seq 1 5); do dd if=/dev/zero of=${x}G.img bs=${x}MB count=1024 status=progress; done
# Check the size
$ sudo compsize .
Processed 5 files, 116014 regular extents (116014 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL        3%      453M          14G          14G       
zstd         3%      453M          14G          14G   
# Remove duplicate extents
sudo duperemove -r -D .
# Check the size again
$ sudo compsize .
Processed 5 files, 1179 regular extents (117189 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL        3%      4.6M         147M          14G       
zstd         3%      4.6M         147M          14G  

Comparison:

State    Extents   Disk usage   References   Uncompressed   Compressed
before   116014    453 MB       116014       14 GB          453 MB
after    1179      4.6 MB       117189       147 MB         4.6 MB

There are now 1175 more references, but the number of extents and the disk usage are heavily reduced. This comes on top of the compression: before deduplication, the data already occupied only 3% of its uncompressed size on disk.
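
The numbers above can be double-checked with a little shell arithmetic. This is only a minimal sketch: `delta` and `percent` are hypothetical helper names, and the values are copied by hand from the compsize output shown earlier.

```shell
# Hypothetical helper: difference between two counters (after minus before).
delta() {
    echo $(( $2 - $1 ))
}

# Hypothetical helper: "part" as a percentage of "whole", two decimals.
percent() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.2f\n", part * 100 / whole }'
}

delta 116014 117189    # additional references after deduplication
percent 1179 116014    # extents left, in percent of the original count
percent 4.6 453        # disk usage left, in percent of the original usage
```

The first call prints 1175, and both percentages come out at roughly 1%, so deduplication shrank the already compressed data by about another factor of 100.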

du or any file manager will still report the full size (uncompressed and non-deduplicated):

$ du -hs .
15G	.

A closer look:

$ sudo btrfs filesystem du -s  . 
     Total   Exclusive  Set shared  Filename
  14.30GiB   146.81MiB   448.00KiB  .

As you can see, the Total column here corresponds to the Referenced size in compsize, and the Exclusive column corresponds to the Uncompressed (but deduplicated) size.
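
If you want to pick these two columns out in a script, something like the following sketch might do. It assumes the two-line summary layout shown above (one header line, one data row); `btrfs_du_summary` is a hypothetical helper name, not part of btrfs-progs.

```shell
# Hypothetical helper: reads `btrfs filesystem du -s` output on stdin and
# prints the Total and Exclusive columns of the first data row.
btrfs_du_summary() {
    awk 'NR == 2 { printf "total=%s exclusive=%s\n", $1, $2 }'
}

# Usage (needs root on a BTRFS mount, so it is left commented out here):
# sudo btrfs filesystem du -s . | btrfs_du_summary
```

The header line is skipped on purpose, because "Set shared" contains a space and would shift the fields.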

Short explanation of duperemove

duperemove is still in development and is beta software, but it can be considered stable enough for daily usage. In any case, use it at your own risk.

duperemove doesn’t perform the deduplication itself. What it actually does is gather information about the files, compute checksums, and pass the results to the BTRFS kernel module, which does the actual work.
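
To illustrate the "gather information and create checksums" step, here is a deliberately simplified sketch. It is not how duperemove works internally: duperemove hashes blocks and extents, while this example hashes whole files with sha256sum and only lists the candidates without deduplicating anything. `list_duplicate_files` is a hypothetical helper name, and file names containing whitespace are not handled.

```shell
# Hypothetical helper: prints one line per group of files with identical
# content below the given directory (whole-file SHA-256 comparison only).
list_duplicate_files() {
    find "$1" -type f -exec sha256sum {} + | sort | awk '
        {
            count[$1]++
            # Collect the file names belonging to each checksum.
            files[$1] = (files[$1] == "" ? $2 : files[$1] " " $2)
        }
        END {
            # Only checksums seen more than once are duplicate candidates.
            for (h in count) if (count[h] > 1) print files[h]
        }'
}

# Usage:
# list_duplicate_files /path/to/btrfs/mountpoint/testdir
```

Files that share only some of their blocks will not show up here, which is exactly the case where an extent-based tool like duperemove finds more.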

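:warning: Note that deduplication can free up space, but the downside is a higher fragmentation rate, so on HDDs it may be advisable to run a defragmentation afterwards. Keep in mind, though, that defragmenting on BTRFS can unshare deduplicated extents again and so undo part of the savings.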

Installation

There are 2 AUR packages which can be used: duperemove-git and duperemove-service.

Install them as needed:

pamac build duperemove-git duperemove-service

Alternatively, build it directly from the upstream source.

Usage
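
A typical run might look like the sketch below. It uses the documented duperemove options -r (recurse into directories), -d (deduplicate), -h (human-readable sizes) and --hashfile (keep checksums between runs). The wrapper name `run_dedupe`, the `DRY_RUN` variable and the default hashfile path are assumptions made up for this example, not something duperemove provides.

```shell
# Hypothetical wrapper around duperemove.
#   $1: directory to deduplicate (default: current directory)
#   $2: hashfile location (default path is an assumption, adjust as needed)
# With DRY_RUN=1 the command is only printed, not executed.
run_dedupe() {
    target="${1:-.}"
    hashfile="${2:-/var/tmp/duperemove.hash}"
    set -- duperemove -r -d -h --hashfile="$hashfile" "$target"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        sudo "$@"
    fi
}

# Usage:
# DRY_RUN=1 run_dedupe /path/to/btrfs/mountpoint/testdir
# run_dedupe /path/to/btrfs/mountpoint/testdir
```

Keeping a hashfile means later runs only need to rescan files that changed, which helps on large directories. Check the result with compsize as shown in the example at the top.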


:warning: This is a wiki article. You are free to copy, share, or edit the content without restrictions as long as it doesn’t miss the main topic.
