Freeze issues with BTRFS and Timeshift

gustavo · 6 September 2020 12:21

Does anyone know if there are confirmed problems with BTRFS and Timeshift snapshots causing the system to freeze up? I 'm having this problem where the desktop will freeze for long periods (several minutes), without any particular demand neither on the CPU nor on the disk. It will then unfreeze for a short period and then freeze back.

I once managed to fix the issue by running “timeshift --check”, which got rid of corrupted/incomplete snapshots and resumed my system to normal. But this time the situation was so bad that I had to turn-off using the power button in the middle of the checking/fixing procedure, which caused corruption so bad my system does not boot anymore.

I wonder if anyone else has experienced similar problems while using BTRFS and very frequent timeshift snapshots (I was running mine on an hourly basis).

Thanks a lot.

eugen-b · 6 September 2020 13:09

This tends to break btrfs (but also most of othe filesystems as well), you better have an external backup for such cases.

gustavo · 6 September 2020 13:15

Sure, I was quite concerned when I had to power down in the middle of a checking procedure, but the system was frozen…

I don’t know if this problem is being caused by btrfs or by timeshift, or some kind of conjunction of the two together.

Would anyone know if running snapshots on an hourly basis causes some kind of file system instability?

eugen-b · 6 September 2020 13:18

It is btrfs that causes issues. It sometimes gets corrupted in the extent tree and then you can only format and restore from an external backup.

gustavo · 6 September 2020 13:56

I ended up with a dirty filesystem, with a number of “extent holes” that do not get repaired even with the --repair option. Will try the “–init-extent-tree” from live distro later.

I wonder how btrfs has some major sponsors such as OpenSUSE and Synology, if it is not stable yet even on a simple notebook. I thought to myself: If Synology uses it on their commercial NAS systems, then it must be stable enough for me to run on a notebook…

Maybe they use some more stable version than we have here at Manjaro??

eugen-b · 6 September 2020 14:53

I warn you that it will take very long, several days. And it probably won’t repair the damage anyway. Do it only if you have a different machine to work on. You can run it only from a different OS (Manjaro live USB e.g.)

I think in a professional scenario the admin will set up proper backup, not just rely on btrfs snapshots. But the snapshots are nice (when they work).

gustavo · 6 September 2020 15:00

Thanks Eugen. Makes me think twice… The system is working now, despite de dirty filesystem. If I could know that the the existing “extent hole” errors will not propagate causing further problems, I think I would leave them as they are…

gustavo · 12 September 2020 12:53

Hi Eugen. Still having some freeze issues but they seem fewer and farther in between this week, following what seemed a partially successful “btrfsck --repair” from live distro last week. I now managed to get dmesg output right after the computer returned to normal and found this:

[  860.095932] baloo_file_extr D    0  1750   1653 0x00004080
[  860.095934] Call Trace:
[  860.095941]  __schedule+0x2a7/0x7e0
[  860.095944]  schedule+0x43/0xd0
[  860.095971]  btrfs_start_ordered_extent+0x12f/0x160 [btrfs]
[  860.095975]  ? wait_woken+0x80/0x80
[  860.095992]  btrfs_wait_ordered_range+0x92/0x110 [btrfs]
[  860.096007]  btrfs_sync_file+0x217/0x440 [btrfs]
[  860.096012]  __x64_sys_fdatasync+0x47/0x80
[  860.096015]  do_syscall_64+0x49/0x90
[  860.096017]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  860.096019] RIP: 0033:0x7fc429328e4b
[  860.096023] Code: Bad RIP value.
[  860.096024] RSP: 002b:00007ffe190c4960 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[  860.096025] RAX: ffffffffffffffda RBX: 000055f0e01aeee0 RCX: 00007fc429328e4b
[  860.096026] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 000000000000000c
[  860.096027] RBP: 000055f0e27c6788 R08: 0000000000000000 R09: 0000000000001a8e
[  860.096027] R10: 00000000c1a4e000 R11: 0000000000000293 R12: 00007ffe190c4a10
[  860.096028] R13: 000055f0e01b4b80 R14: 0000000000000000 R15: 0000000000000000
[  915.261260] systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
[  915.261611] systemd[1]: Finished Cleanup of Temporary Directories.
[  915.261791] audit: type=1130 audit(1599914155.225:103): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-tmpfiles-clean comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  915.261795] audit: type=1131 audit(1599914155.225:104): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-tmpfiles-clean comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  982.975759] INFO: task kworker/u16:0:7 blocked for more than 122 seconds.
[  982.975761]       Tainted: G           OE     5.4.64-1-MANJARO #1
[  982.975762] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  982.975763] kworker/u16:0   D    0     7      2 0x80004000
[  982.975797] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[  982.975798] Call Trace:
[  982.975804]  __schedule+0x2a7/0x7e0
[  982.975806]  schedule+0x43/0xd0
[  982.975822]  wait_current_trans+0xdf/0x120 [btrfs]
[  982.975826]  ? wait_woken+0x80/0x80
[  982.975840]  start_transaction+0x331/0x520 [btrfs]
[  982.975857]  btrfs_finish_ordered_io.isra.0+0x339/0x820 [btrfs]
[  982.975876]  btrfs_work_helper+0xde/0x400 [btrfs]
[  982.975880]  process_one_work+0x1e1/0x3d0
[  982.975882]  worker_thread+0x4d/0x3e0
[  982.975884]  ? rescuer_thread+0x3f0/0x3f0
[  982.975885]  kthread+0x13e/0x160
[  982.975887]  ? __kthread_bind_mask+0x60/0x60
[  982.975890]  ret_from_fork+0x1f/0x40
[  982.975905] INFO: task kworker/u16:2:217 blocked for more than 122 seconds.
[  982.975905]       Tainted: G           OE     5.4.64-1-MANJARO #1
[  982.975906] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  982.975907] kworker/u16:2   D    0   217      2 0x80004000
[  982.975927] Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
[  982.975928] Call Trace:
[  982.975930]  __schedule+0x2a7/0x7e0
[  982.975932]  schedule+0x43/0xd0
[  982.975946]  wait_current_trans+0xdf/0x120 [btrfs]
[  982.975948]  ? wait_woken+0x80/0x80
[  982.975962]  start_transaction+0x331/0x520 [btrfs]
[  982.975977]  cow_file_range_inline.constprop.0+0xec/0x670 [btrfs]
[  982.975980]  ? _raw_spin_lock_bh+0x13/0x30
[  982.975982]  ? __local_bh_enable_ip+0x33/0x70
[  982.976000]  ? zstd_put_workspace+0x6e/0x120 [btrfs]
[  982.976018]  ? btrfs_compress_pages+0x77/0xa0 [btrfs]
[  982.976033]  compress_file_range+0x414/0x810 [btrfs]
[  982.976048]  ? submit_compressed_extents+0x410/0x410 [btrfs]
[  982.976065]  async_cow_start+0x12/0x30 [btrfs]
[  982.976088]  btrfs_work_helper+0xde/0x400 [btrfs]
[  982.976091]  process_one_work+0x1e1/0x3d0
[  982.976093]  worker_thread+0x4d/0x3e0
[  982.976095]  ? rescuer_thread+0x3f0/0x3f0
[  982.976096]  kthread+0x13e/0x160
[  982.976098]  ? __kthread_bind_mask+0x60/0x60
[  982.976100]  ret_from_fork+0x1f/0x40
[  982.976107] INFO: task kworker/u16:6:267 blocked for more than 122 seconds.
[  982.976108]       Tainted: G           OE     5.4.64-1-MANJARO #1
[  982.976108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  982.976109] kworker/u16:6   D    0   267      2 0x80004000
[  982.976127] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[  982.976128] Call Trace:
[  982.976130]  __schedule+0x2a7/0x7e0
[  982.976132]  schedule+0x43/0xd0
[  982.976146]  wait_current_trans+0xdf/0x120 [btrfs]
[  982.976148]  ? wait_woken+0x80/0x80
[  982.976161]  start_transaction+0x331/0x520 [btrfs]
[  982.976176]  btrfs_finish_ordered_io.isra.0+0x339/0x820 [btrfs]
[  982.976193]  btrfs_work_helper+0xde/0x400 [btrfs]
[  982.976196]  process_one_work+0x1e1/0x3d0
[  982.976198]  worker_thread+0x4d/0x3e0
[  982.976200]  ? rescuer_thread+0x3f0/0x3f0
[  982.976201]  kthread+0x13e/0x160
[  982.976203]  ? __kthread_bind_mask+0x60/0x60
[  982.976205]  ret_from_fork+0x1f/0x40
[  982.976207] INFO: task kworker/u16:7:268 blocked for more than 122 seconds.
[  982.976208]       Tainted: G           OE     5.4.64-1-MANJARO #1
[  982.976209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  982.976209] kworker/u16:7   D    0   268      2 0x80004000
[  982.976227] Workqueue: btrfs-delalloc btrfs_work_helper [btrfs]
[  982.976228] Call Trace:
[  982.976230]  __schedule+0x2a7/0x7e0
[  982.976232]  schedule+0x43/0xd0
[  982.976245]  wait_current_trans+0xdf/0x120 [btrfs]
[  982.976247]  ? wait_woken+0x80/0x80
[  982.976260]  start_transaction+0x331/0x520 [btrfs]
[  982.976274]  cow_file_range_inline.constprop.0+0xec/0x670 [btrfs]
[  982.976276]  ? _raw_spin_lock_bh+0x13/0x30
[  982.976278]  ? __local_bh_enable_ip+0x33/0x70
[  982.976296]  ? zstd_put_workspace+0x6e/0x120 [btrfs]
[  982.976313]  ? btrfs_compress_pages+0x77/0xa0 [btrfs]
[  982.976327]  compress_file_range+0x414/0x810 [btrfs]
[  982.976342]  ? submit_compressed_extents+0x410/0x410 [btrfs]
[  982.976355]  async_cow_start+0x12/0x30 [btrfs]
[  982.976372]  btrfs_work_helper+0xde/0x400 [btrfs]
[  982.976375]  process_one_work+0x1e1/0x3d0
[  982.976377]  worker_thread+0x4d/0x3e0
[  982.976379]  ? rescuer_thread+0x3f0/0x3f0
[  982.976380]  kthread+0x13e/0x160
[  982.976381]  ? __kthread_bind_mask+0x60/0x60
[  982.976384]  ret_from_fork+0x1f/0x40
[  982.976386] INFO: task kworker/u16:8:269 blocked for more than 122 seconds.
[  982.976387]       Tainted: G           OE     5.4.64-1-MANJARO #1
[  982.976387] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  982.976388] kworker/u16:8   D    0   269      2 0x80004000
[  982.976405] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[  982.976406] Call Trace:
[  982.976408]  __schedule+0x2a7/0x7e0
[  982.976410]  schedule+0x43/0xd0
[  982.976423]  wait_current_trans+0xdf/0x120 [btrfs]
[  982.976425]  ? wait_woken+0x80/0x80
[  982.976438]  start_transaction+0x331/0x520 [btrfs]
[  982.976453]  btrfs_finish_ordered_io.isra.0+0x339/0x820 [btrfs]
[  982.976470]  btrfs_work_helper+0xde/0x400 [btrfs]
[  982.976473]  process_one_work+0x1e1/0x3d0
[  982.976475]  ? rescuer_thread+0x3f0/0x3f0
[  982.976476]  worker_thread+0x4d/0x3e0
[  982.976478]  ? rescuer_thread+0x3f0/0x3f0
[  982.976479]  kthread+0x13e/0x160
[  982.976480]  ? __kthread_bind_mask+0x60/0x60
[  982.976483]  ret_from_fork+0x1f/0x40
[  982.976485] INFO: task kworker/u16:9:288 blocked for more than 122 seconds.
[  982.976486]       Tainted: G           OE     5.4.64-1-MANJARO #1
[  982.976486] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  982.976487] kworker/u16:9   D    0   288      2 0x80004000
[  982.976504] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[  982.976505] Call Trace:
[  982.976507]  __schedule+0x2a7/0x7e0
[  982.976509]  schedule+0x43/0xd0
[  982.976522]  wait_current_trans+0xdf/0x120 [btrfs]
[  982.976524]  ? wait_woken+0x80/0x80
[  982.976537]  start_transaction+0x331/0x520 [btrfs]
[  982.976552]  btrfs_finish_ordered_io.isra.0+0x339/0x820 [btrfs]
[  982.976569]  btrfs_work_helper+0xde/0x400 [btrfs]
[  982.976571]  process_one_work+0x1e1/0x3d0
[  982.976573]  ? rescuer_thread+0x3f0/0x3f0
[  982.976574]  worker_thread+0x4d/0x3e0
[  982.976576]  ? rescuer_thread+0x3f0/0x3f0
[  982.976577]  kthread+0x13e/0x160
[  982.976579]  ? __kthread_bind_mask+0x60/0x60
[  982.976581]  ret_from_fork+0x1f/0x40
[  982.976583] INFO: task systemd-journal:298 blocked for more than 122 seconds.
[  982.976584]       Tainted: G           OE     5.4.64-1-MANJARO #1
[  982.976584] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  982.976585] systemd-journal D    0   298      1 0x00000324
[  982.976586] Call Trace:
[  982.976588]  __schedule+0x2a7/0x7e0
[  982.976590]  schedule+0x43/0xd0
[  982.976603]  wait_current_trans+0xdf/0x120 [btrfs]
[  982.976605]  ? wait_woken+0x80/0x80
[  982.976618]  start_transaction+0x480/0x520 [btrfs]
[  982.976632]  btrfs_setattr+0x14a/0x5e0 [btrfs]
[  982.976636]  notify_change+0x33b/0x4a0
[  982.976638]  ? cap_inode_need_killpriv+0x1a/0x30
[  982.976641]  do_truncate+0x88/0xe0
[  982.976644]  do_sys_ftruncate+0xfd/0x110
[  982.976647]  do_syscall_64+0x49/0x90
[  982.976650]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  982.976651] RIP: 0033:0x7fd279e9556b
[  982.976655] Code: Bad RIP value.
[  982.976656] RSP: 002b:00007ffc43026008 EFLAGS: 00000206 ORIG_RAX: 000000000000004d
[  982.976657] RAX: ffffffffffffffda RBX: 00007ffc43026040 RCX: 00007fd279e9556b
[  982.976658] RDX: 00005617ac94c310 RSI: 0000000008000000 RDI: 0000000000000017
[  982.976659] RBP: 00005617ac94c500 R08: 0000000000000001 R09: 00005617ac94c584
[  982.976660] R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffc43026038
[  982.976660] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
[  982.976664] INFO: task kworker/u16:10:420 blocked for more than 122 seconds.
[  982.976665]       Tainted: G           OE     5.4.64-1-MANJARO #1
[  982.976665] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  982.976666] kworker/u16:10  D    0   420      2 0x80004080
[  982.976684] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[  982.976684] Call Trace:
[  982.976687]  __schedule+0x2a7/0x7e0
[  982.976689]  schedule+0x43/0xd0
[  982.976702]  wait_current_trans+0xdf/0x120 [btrfs]
[  982.976704]  ? wait_woken+0x80/0x80
[  982.976717]  start_transaction+0x331/0x520 [btrfs]
[  982.976731]  btrfs_finish_ordered_io.isra.0+0x339/0x820 [btrfs]
[  982.976748]  btrfs_work_helper+0xde/0x400 [btrfs]
[  982.976751]  process_one_work+0x1e1/0x3d0
[  982.976753]  ? rescuer_thread+0x3f0/0x3f0
[  982.976754]  worker_thread+0x4d/0x3e0
[  982.976756]  ? rescuer_thread+0x3f0/0x3f0
[  982.976757]  kthread+0x13e/0x160
[  982.976759]  ? __kthread_bind_mask+0x60/0x60
[  982.976761]  ret_from_fork+0x1f/0x40
[  982.976784] INFO: task kwin_x11:1441 blocked for more than 122 seconds.
[  982.976784]       Tainted: G           OE     5.4.64-1-MANJARO #1
[  982.976785] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  982.976785] kwin_x11        D    0  1441      1 0x00000080
[  982.976787] Call Trace:
[  982.976789]  __schedule+0x2a7/0x7e0
[  982.976791]  schedule+0x43/0xd0
[  982.976804]  wait_current_trans+0xdf/0x120 [btrfs]
[  982.976806]  ? wait_woken+0x80/0x80
[  982.976819]  start_transaction+0x331/0x520 [btrfs]
[  982.976834]  btrfs_dirty_inode+0x44/0xd0 [btrfs]
[  982.976836]  file_update_time+0x105/0x150
[  982.976851]  btrfs_page_mkwrite+0xfe/0x4e0 [btrfs]
[  982.976855]  do_page_mkwrite+0x56/0xb0
[  982.976857]  do_wp_page+0x3ee/0x550
[  982.976859]  __handle_mm_fault+0xd7f/0x16c0
[  982.976862]  handle_mm_fault+0xca/0x1f0
[  982.976864]  do_user_addr_fault+0x1ef/0x470
[  982.976867]  ? syscall_slow_exit_work+0x134/0x160
[  982.976868]  page_fault+0x34/0x40
[  982.976870] RIP: 0033:0x7f667900ffe5
[  982.976872] Code: Bad RIP value.
[  982.976873] RSP: 002b:00007ffe9afba000 EFLAGS: 00010246
[  982.976874] RAX: 0000000000000000 RBX: 00000000000005a1 RCX: 0000000000000000
[  982.976874] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007f65b778e008
[  982.976875] RBP: 00007f65b778e008 R08: 000055a547b009f0 R09: 00007ffe9afba198
[  982.976876] R10: 00007ffe9afba130 R11: 00007ffe9afba120 R12: 00007ffe9afba070
[  982.976876] R13: 0000000000000000 R14: 00007f65b778e018 R15: 00007ffe9afba300
[  995.009335] audit: type=1701 audit(1599914234.975:105): auid=4294967295 uid=0 gid=0 ses=4294967295 subj==unconfined pid=298 comm="systemd-journal" exe="/usr/lib/systemd/systemd-journald" sig=6 res=1
[  995.053215] systemd[1]: systemd-journald.service: Main process exited, code=dumped, status=6/ABRT
[  995.053295] systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
[  995.053946] audit: type=1131 audit(1599914235.019:106): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[  995.054991] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
[  995.056582] systemd[1]: Stopping Flush Journal to Persistent Storage...
[  995.062197] systemd[1]: systemd-journal-flush.service: Succeeded.
[  995.062706] systemd[1]: Stopped Flush Journal to Persistent Storage.
[  995.062769] audit: type=1131 audit(1599914235.029:107): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-journal-flush comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  995.063151] systemd[1]: Stopped Journal Service.
[  995.063221] audit: type=1130 audit(1599914235.029:108): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  995.063224] audit: type=1131 audit(1599914235.029:109): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[  995.065505] systemd[1]: Starting Journal Service...
[  995.412503] audit: type=1305 audit(1599914235.379:110): op=set audit_enabled=1 old=1 auid=4294967295 ses=4294967295 subj==unconfined res=1
[ 1011.274805] systemd-coredump[4966]: Process 298 (systemd-journal) of user 0 dumped core.
[ 1011.274811] systemd-coredump[4966]: Coredump diverted to /var/lib/systemd/coredump/core.systemd-journal.0.64087648dba94698a6e063a6ed522277.298.1599914234000000.zst
[ 1011.274813] systemd-coredump[4966]: Stack trace of thread 298:
[ 1011.274815] systemd-coredump[4966]: #0  0x00007fd279e9556b ftruncate64 (libc.so.6 + 0xf956b)
[ 1011.274817] systemd-coredump[4966]: #1  0x00007fd27a14965e journal_file_post_change (libsystemd-shared-246.so + 0x1ae65e)
[ 1011.274819] systemd-coredump[4966]: #2  0x00007fd27a1496e2 n/a (libsystemd-shared-246.so + 0x1ae6e2)
[ 1011.274821] systemd-coredump[4966]: #3  0x00007fd27a14237e n/a (libsystemd-shared-246.so + 0x1a737e)
[ 1011.274823] systemd-coredump[4966]: #4  0x00007fd27a14271b sd_event_dispatch (libsystemd-shared-246.so + 0x1a771b)
[ 1011.274825] systemd-coredump[4966]: #5  0x00007fd27a143a61 sd_event_run (libsystemd-shared-246.so + 0x1a8a61)
[ 1011.274827] systemd-coredump[4966]: #6  0x00005617ac20c1f6 n/a (systemd-journald + 0x61f6)
[ 1011.274829] systemd-coredump[4966]: #7  0x00007fd279dc4152 __libc_start_main (libc.so.6 + 0x28152)
[ 1011.274831] systemd-coredump[4966]: #8  0x00005617ac20c91e n/a (systemd-journald + 0x691e)
[ 1033.343298] audit: type=1100 audit(1599914273.309:111): pid=5093 uid=1000 auid=1000 ses=2 subj==unconfined msg='op=PAM:authentication grantors=pam_faillock,pam_permit,pam_faillock acct="gustavo" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'
[ 1033.344593] audit: type=1101 audit(1599914273.309:112): pid=5093 uid=1000 auid=1000 ses=2 subj==unconfined msg='op=PAM:accounting grantors=pam_permit,pam_time acct="gustavo" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'
[ 1033.344775] audit: type=1110 audit(1599914273.309:113): pid=5093 uid=0 auid=1000 ses=2 subj==unconfined msg='op=PAM:setcred grantors=pam_faillock,pam_permit,pam_faillock acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'
[ 1033.347897] audit: type=1105 audit(1599914273.312:114): pid=5093 uid=0 auid=1000 ses=2 subj==unconfined msg='op=PAM:session_open grantors=pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'

Interestingly, while the computer was frozen, I noticed hardly no disk activity at all (this notebook has a LED that flashes on disk access), so it did not look as if btrfs was doing any house cleaning either. It just completely froze…

Do you understand what this log talks about? (pretty much all of it is marked red in the original output). Thank you so much.

gustavo · 14 September 2020 01:41

Another weekend toiling with this issue. It is now clear that the freezes relate to the snapshot creation process with Timeshift and btrfs

Hourly snapshots were bringing my computer to a total freeze. I reduced to only at boot, but even that causes the notebook to freeze for like 20 minutes, with a few short hiccups in the meantime, but mostly frozen all the way.

Disk activity remains very low during the freeze and CPU usage also seems to remain low. On some other forums (mostly SuSE) I found users having the exact same problem, but they tend to report high CPU usage with btrfs-transacti and btrfs-cleaner, none of which seem to be installed on Manjaro (I might be wrong).

This seems to be a relatively prevalent matter. I see numerous posts on google but none with a clear solution.

ciradrak · 22 February 2021 20:04

I may have a work around for this problem.

tldr:

turn off Timeshift’s “Enable BTRFS qgroups (recommended)”
It’s in Settings –> Users. Sorry, cannot recommend this feature.
Then turn off btrfs’ quotas:
‘sudo btrfs quota disable .’

why:

btrfs-transacti & btrfs-cleaner appear to be related to the kernel, they don’t seem to be executables. They certainly are part of manjaro. It’s also difficult to see them in the various monitoring tools, but good old ‘top’ shows them fine.

I found this little quote from the btrfs IRC channel:
Zygo: quota is not on my “btrfs features that work” list

You can check to see if btrfs quotas are turned on using:
‘btrfs qgroup show .’
It is not well documented in the man pages, at least not in my opinion as you need to know to check the btrfs-qgroup man page instead of the btrfs-quota man page.

Without turning off the qgroups feature in Timeshift first, it does little good to turn quotas off manually, as Timeshift will turn them back on.

My system has been suffering for weeks, after figuring this out, it’s been running fine for the last day. No more mysterious freezes. Hopefully it will continue to stick. I figured it was worth weighing in on what I’d discovered, and hope this helps everyone else.

Also, I tried using various kernels 5.10, 5.9, & 5.4 but the problem remained.

yuumei · 26 April 2021 12:49

I’m having the same problem, basically btrfs and timeshift with quota enabled I had extreme freezes. Eventually my data started to become corrupt, I managed to fix some of the corruption with “btrfs check --repair” but the problem came back. I found this suggestion to disable quota and it seems to have worked so far. No freezes.

Basically I did “btrfs quota disable” and made sure timeshift had it disabled too as above.

andreas85 · 22 September 2021 06:20

I did have the same problem a few years ago. I used snapper and had activated quotas. I did not find a solution then. But today i think the quotas (together with a lot of snapshots) may be the source of the problem.
When i do recall it right, the freezes did occur when the snapshots where to be deleted.

Spunkie · 23 September 2021 23:59

I’ve had the same issues and resolved the freezing by running sudo btrfs quota disable, the qgroups options had already been disabled in my timeshift settings since init.

So no more freezing, but now timeshift isn’t showing a size for any of my new snapshots.

Not 100% sure if the 0B size problem relates to this qgroups fix, something else, or maybe related to adding compress=zstd:1 to my fstab?

Anyone else run into this?

Edit: The above issue is directly related to disabling qgroups, they are required to calculate the snapshot diff sizes it seems. Re-enabling qgroups will cause the missing diff sizes to be displayed/filled in retroactively, but of course will bring back the system freezes. ¯\_(ツ)_/¯

Also there may be other side effects for snapshot deletion when disabling qgroups, check these github issues I found for more info.

github.com/teejee2008/timeshift

timeshift is re-enabling btrfs quota? btrfs snapshot delete causes processes doing IO to hang for 1-5 minutes

opened 08:22PM - 19 Dec 20 UTC

ptandler

I must admit, I'm not 100% sure, so this is somehow more a question than a bug r…eport: (Is there a support forum for timeshift where these questions would fit better?) As I haven't looked into the sources yet: - does timeshift (re-)enable btrfs quota? (maybe when checking for snapshots to delete) - when are snapshot deletes scheduled? Can this be tweaked? e.g. just delete them once a day if on btrfs? Background: I use timeshift on a btrfs partition since about 6 month. I sometime experience that some processes (My current hypothesis is: all that do some disc IO) hang from time to time for 1 to 5 minutes. That's annoying. I've been hunting this for a while now and found that btrfs-transaction seems to be causing it. The btrfs-transaction process has very high CPU for 1-5 minutes and all processes resume as soon as btrfs-transaction terminates. Today, I talked to one of the btrfs developers and he pointed me at the btrfs snapshop delete operations: "I do daily snapshots. I have machines where a **snapshot delete can take a full hour**" So today I changed the timeshift settings and disabled hourly snapshots. I'll see if this fixes this issue. And a quote about btrfs quota: "oh I forgot about quota. That's probably where all your CPU is going" and "(IIRC it has to rebuild the quota data from scratch in that case?) **quota is not on my "btrfs features that work"** list" To conclude: - if timeshift is enabling quota on btrfs I think this should be avoided until btrfs quota is more robust - timeshift should have an improved scheduling of snapshot deletes on btrfs

github.com/teejee2008/timeshift

Removing snapshots fails if BTRFS quota groups (qgroups) is disabled

opened 09:07PM - 02 Dec 20 UTC

twoktwok

If ... * the option "Enable BTRFS qgroups (recommended)" is _disabled_ in the G…UI settings -and- * btrfs quotas is disabled on the system (`sudo btrfs quota disable /`) ... Timeshift fails to completely remove snapshots on the first try. **e. g. (sample output)** ``` ------------- [...] E: ERROR: can't list qgroups: quotas not enabled E: btrfs returned an error: 256 E: Failed to query subvolume quota [...] ------------- Removing snapshot: 2020-12-02_20-08-29 Deleting subvolume: @ (Id:1513) Deleted subvolume: @ (Id:1513) Destroying qgroup: 0/1513 E: Failed to destroy qgroup: '0/1513' E: Failed to remove snapshot: 2020-12-02_20-08-29 ------------- ``` **Problem:** On this first try everything except the snapshots "`info.json`" file is actually deleted. This causes Timeshift to still list this snapshot! A second remove attempt then successfully removes the already deleted snapshot from the list. This behaviour seriously messes up scheduling because actually deleted snapshots are still counted. I assume the check and failure to "destroy qgroup" error causes the remove process to exit before info.json and the snapshot folder are deleted. --- **System:** - Arch Linux with kernel 5.9.11-arch2-1 - XFCE - Timeshift version 20.11.1 **Note:** I have experienced some serious system freezes on a low-spec machine caused by [btrfs-cleaner] and [btrfs-transaction] after removing snapshots through timeshift. This issue vanishes when btrfs quotas are disabled. That's when I noticed the above bug.

Zesko · 3 February 2022 11:13

I notice the freeze issue when deleting snapshopts in Timeshift in my PC using SSD.
I tested deleting snapshots in the same Manjaro KDE in VM that has no freeze, but it is pretty fast.
I do not know why.

zilexa · 3 February 2022 11:30

I have been using BTRFS on laptops, homePCs of my extended family and for my homeserver. I follow the best practices and have quotas disabled.

Never had issues for the past 3 years. In my opinion its the ideal filesystem for most home users as it allows the easiest snapshots/restore/backup, working without interrupting the user.

The only thing I had issues with (but this was on Ubuntu) was Snapper. In my opinion you should stay away from it. It used to install in obscure locations (they fixed that), very hard to remove/update and the folder structure it uses for snapshots makes absolutely zero sense and is just confusing. Lead often to filled up disks because snapshots were hidden in subsubsubfolders within the subvolume and never got deleted.

Instead, Timeshift works nicely, is easy to use. Manjaro has integrated snapshots nicely in the Grub boot menu. And for proper backups (a snapshot is only a backup if you move it to a different drive or different system), btrbk is the way to go in my opinion. It can be installed via AUR.

Zesko · 3 February 2022 11:33

Does it work with restoring/creating/deleting snapshots without any problem or break data?

zilexa · 3 February 2022 11:37

BTRFS quotas is optional functionality. It is not necessary for any of those actions at all. If something is optional, decide why you would need it. I believe Timeshift used to have it enabled by default just because it wants to get the filesize of snapshots for its GUI. That’s not really a valid reason to enable Quotas.
I believe for new installs (not updates) they disable it by default (or I just hope they do).

also read: https://www.reddit.com/r/btrfs/comments/odg32z/should_i_enable_qgroups_in_timeshift/

With quotas/qgroups enabled, deletion of snapshots can become very slow. If you yourself have no specific reason to use Quotas, disable it in Timeshift (if you disable it via btrfs command but its still enabled in Timeshift, Timeshift will enable it again). Timeshift doens’t need it.

Zesko · 3 February 2022 11:53

Okay.
I checked sudo btrfs qgroup show / in VM. But quota is still enabled by default after installation.
The deleting snapshots is pretty fast , no freeze.

Zesko · 8 February 2022 18:21

I moved many unnecessary heavy resources (e.g. Steam Games and many VMs) from the subvolume @home into a new subvolume @nosnapshot that you don’t need to create a new snapshot.
Make @home snapshots lighter weight and faster.