Avoid kernel > 5.17 if you're using f2fs for now

TL;DR: Stay on the 5.15 (LTS) kernel if you’re using the f2fs filesystem, especially for system partitions. There’s currently a nasty bug upstream.

Recently I updated all my Manjaro systems to the 6.0 kernel when it came out, and one of my machines turned out to be suffering from an f2fs-related bug (216050), which is still open upstream. Related Arch Linux task here (FS#74906).

The visible symptom is that the system runs noticeably hotter, with f2fs_gc constantly taking up 100% of one CPU core.

However, it also seems to impair the system’s functionality to some extent:

  • System shutdown takes forever (hours) to complete, so I usually have to power off or reset the machine manually with the power button.
  • Operations on f2fs may become unstable. When I tried running pacman -Syu yesterday, the process got stuck while updating the keys, and I couldn’t kill it through normal means such as CTRL-C. I ended up shutting down the system by force and retried the update on a 5.15 kernel. Fortunately there is no apparent major damage to the system so far: the only thing reported was a corrupted .zsh_history file, and pacman -Syu succeeded without reporting any issue.
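
The f2fs_gc symptom described above can be checked without watching top. This is only a rough sketch: it assumes the GC kernel threads show up in ps with a name starting with "f2fs_gc", and the function names and the 90% threshold are made up for illustration. Note that the %CPU figure from ps is averaged over the thread's lifetime, so top may show a different instantaneous value.

```python
import subprocess

def busy_gc_threads(ps_output, threshold=90.0):
    """Return (pid, cpu_percent, name) for f2fs_gc threads at or above threshold.

    Expects text in the format printed by `ps -eo pid,pcpu,comm`.
    """
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue
        try:
            pid, cpu = int(fields[0]), float(fields[1])
        except ValueError:
            continue  # header line or anything else that is not a process row
        name = fields[2].strip()
        if name.startswith("f2fs_gc") and cpu >= threshold:
            hits.append((pid, cpu, name))
    return hits

def check_live_system():
    """Run ps on the current machine and report busy f2fs_gc threads."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return busy_gc_threads(out)
```

Calling check_live_system() on an affected machine should return a non-empty list; an empty list only means no f2fs_gc thread is above the chosen threshold at the moment.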

So for now I have to use 5.15 on that particular machine, as 5.15 predates the f2fs bug and is working fine here. According to the bug report, the f2fs_gc issue was first seen on kernel 5.18.
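
To avoid booting an affected kernel by accident, the version boundary above can be turned into a quick check. This is a minimal sketch under the assumption that everything newer than the 5.17 series is suspect (no fixed version is known in this thread); the names LAST_GOOD, parse_release and possibly_affected are invented for the example.

```python
import platform
import re

# Last series reported as working in the upstream bug report (assumption
# taken from this thread; no fixed version is known here).
LAST_GOOD = (5, 17)

def parse_release(release):
    """Extract (major, minor) from a string like '6.0.11-1-MANJARO'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))

def possibly_affected(release):
    """True if the release is newer than the last series reported as working."""
    return parse_release(release) > LAST_GOOD

def running_kernel_possibly_affected():
    """Apply the check to the kernel that is currently running."""
    return possibly_affected(platform.release())
```

For example, possibly_affected("5.15.78-1-MANJARO") is False, while possibly_affected("6.0.11-1-MANJARO") is True.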

I’ll have to keep an eye on that issue to see if there are any signs of a fix. For now, be sure to keep the 5.15 kernel installed in case the issue starts affecting you at some point.

I’ve adjusted the topic title, since the upstream bug report you linked mentions kernel 5.17 as the last working kernel.

I’m the bug reporter on kernel bugzilla. And I’m very sad :(

Apparently an f2fs corruption issue was already mentioned by sobrus as early as June. It might be related, or it might be exactly the same issue you reported on bugzilla. The kernel version mentioned in that post was 5.18.3, and he had no issues up to 5.18.1.

Looks like f2fs has been broken upstream for quite a while, but given these are the only relevant posts I could find on this forum, this issue, though serious, may not have affected many people.

A bump on this. The issue persists for me as of kernel 6.0.11.

It happened because I didn’t check which kernel I was booting into after a full system upgrade, so the machine booted straight into the 6.0 series. Suddenly the system stopped responding to anything, and I noticed that one of the CPU cores was being taken up by f2fs_gc. I had no choice but to hard reset the system, as I couldn’t even log in from the terminal: all accesses to the root partition (on which the passwd file resides) hung indefinitely, so the login timed out waiting.

I haven’t tested 6.1-rc on that system yet, as it needs to stay operational. I noticed there has been some progress on the bugzilla report, so I wonder if there is anything I could try to work around the bug with later kernel versions.

Another bump. All other non-LTS 5.x versions have become EOL as of now.

If you’re affected by the issue you’ll have to go back to 5.15, or maintain the kernel version you’re using yourself if 5.15 does not work for you.
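
Whether a machine is exposed in the first place depends on whether f2fs is used for a system partition. As a hedged sketch, the filesystem type of the root partition can be read from /proc/mounts; the function names here are hypothetical, and the parser only assumes the usual "device mountpoint fstype options ..." column layout.

```python
def fs_type_of(mounts_text, mount_point="/"):
    """Return the filesystem type mounted at mount_point, or None if not found.

    mounts_text is the content of /proc/mounts. If the mount point appears
    more than once, the last entry is returned, since later mounts shadow
    earlier ones.
    """
    fs_type = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            fs_type = fields[2]
    return fs_type

def root_is_f2fs(mounts_path="/proc/mounts"):
    """Check the running system's root filesystem type."""
    with open(mounts_path) as f:
        return fs_type_of(f.read()) == "f2fs"
```

If root_is_f2fs() returns True, staying on 5.15 as described above is the cautious choice; other mount points can be checked by passing them to fs_type_of.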

Again, I haven’t tested much recently, as the system in question is too mission-critical to experiment on anymore, and I don’t really have another PC that uses f2fs for its root partition. I’ve removed all other kernels on that system and kept only 5.15, to avoid accidentally booting into a 6.x kernel after a kernel update like last time.

On the other hand, the upstream bug report suggests the issue is still being looked at…

It’s not Manjaro’s fault, but a bug in the mainline kernel. At the moment I use the 5.15 kernel, which works well on my PC, even better than the 6.x versions. But its EOL is in October 2023, so I hope the bug will be solved before then.

Indeed… this issue really needs to be addressed before 5.15 goes EOL, or it’ll complicate the update process from that point on for rolling-release distros like this one.

I noticed that you even reformatted your f2fs partition… which kernel version were you on when you did the format?

If you did the format on a current kernel version (6.x) and are still seeing the issue, then the kernel version on which the f2fs filesystem was created doesn’t matter for this issue. I’m just asking because, from what I’ve read, f2fs may change over time and require some “upgrade” across different kernel versions, which is handled by its corresponding fsck process.