Avoid kernel > 5.17 if you're using f2fs for now

TL;DR: Stay on the 5.15 (LTS) kernel if you’re using the f2fs filesystem, especially for system partitions. There’s currently a nasty bug upstream.

When kernel 6.0 came out I updated all my Manjaro systems to it, and one of my machines now appears to be suffering from an f2fs-related bug (216050) that is still open upstream. The related Arch Linux task is FS#74906.

On the surface, the system runs noticeably hotter, with f2fs_gc constantly taking up 100% of one CPU core.

Beyond that, it also seems to impair the system’s functionality to some extent:

  • System shutdown takes forever (hours) to complete, so I usually have to force a shutdown or reset with the power button.
  • Operations on F2FS can become unstable. When I tried running pacman -Syu yesterday, the process got stuck while updating the keys, and I couldn’t kill it through normal means such as Ctrl-C. I ended up forcing the system off and retried the update on a 5.15 kernel. Fortunately there is no apparent major damage to the system for the time being: the only problem reported was a corrupted .zsh_history file, and pacman -Syu succeeded without reporting any issue.

So for now I have to use 5.15 on that particular machine, as 5.15 predates the f2fs bug and is currently working fine here. According to the bug report, the f2fs_gc issue was first sighted on kernel 5.18.

I’ll keep an eye on that issue for any signs of a fix. For now, be sure to keep the 5.15 kernel installed in case the issue starts affecting you at some point.
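If you want to quickly check whether the kernel you’re running falls in the range this topic warns about, here is a minimal Python sketch using only the standard library. It assumes the boundary from the bug report (first sighted on 5.18) and compares only the major.minor series; the function names are hypothetical, and a positive result only means "5.18 or newer", not that your system is definitely affected.

```python
import platform
import re

# Assumption: the bug report says the f2fs_gc issue was first sighted on 5.18,
# so every series from 5.18 onward is treated as "in the warned range".
FIRST_REPORTED_SERIES = (5, 18)


def kernel_series(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '6.0.11-1-MANJARO'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_warned_range(release: str) -> bool:
    """True if the series is 5.18 or newer (tuple comparison: major first, then minor)."""
    return kernel_series(release) >= FIRST_REPORTED_SERIES


if __name__ == "__main__":
    release = platform.release()  # the running kernel, e.g. '5.15.131-1-MANJARO'
    if in_warned_range(release):
        print(f"{release}: 5.18 or newer, inside the range this topic warns about")
    else:
        print(f"{release}: older than 5.18")
```

Distro suffixes such as -MANJARO are ignored because only the leading major.minor numbers are parsed.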

I’ve adjusted the topic title, since the upstream bug report you linked mentions 5.17 as the last working kernel.

I’m the bug reporter on the kernel bugzilla, and I’m very sad about it.

Apparently an f2fs corruption issue was already mentioned by sobrus as early as June. It might be related, or it might be exactly the same issue you reported on bugzilla. The kernel version mentioned in that post was 5.18.3, and he had no issues up to 5.18.1.

Looks like f2fs has been broken upstream for quite a while, but given that these are the only relevant posts I could find on this forum, this issue, though serious, may not have affected many people.

Bumping this: the issue persists for me as of kernel 6.0.11.

It happened because I didn’t check which kernel I booted into after a full system upgrade, so the machine booted straight into the 6.0 series. Suddenly the system stopped responding to anything, and I noticed that one CPU core was being taken up by f2fs_gc. I had no choice but to hard reset the system, as I couldn’t even log in from a terminal: every access to the root partition (where the passwd file resides) hung indefinitely, so the login timed out.

I haven’t tested 6.1-rc on that system yet, as it needs to stay operational. I noticed some progress in the bugzilla, so I wonder whether there is anything I could try to work around the bug on later kernel versions.

Another bump. All other non-LTS 5.x versions have become EOL as of now.

If you’re affected by the issue, you’ll have to go back to 5.15, or maintain the kernel version you’re using yourself if 5.15 doesn’t work for you.

Again, I haven’t tested much recently, as the system in question is too mission-critical to experiment on anymore, and I don’t really have another PC that uses f2fs as its root partition. I’ve removed all other kernels from that system and kept only 5.15, to avoid accidentally booting into a 6.x kernel after a kernel update like last time.

On the other hand, the upstream bug report suggests the issue is still being looked at…

It’s not Manjaro’s fault, but a bug in the mainline kernel. At the moment I use the 5.15 kernel, which works well on my PC, even better than the 6.x versions. But it reaches EOL in October 2023, so I hope the bug will be solved before then.

Indeed… this issue really needs to be addressed before 5.15 goes EOL, or it’ll complicate the update process from that point on for rolling-release distros like this one.

I noticed that you even reformatted your f2fs partition… which kernel version were you on when you did the format?

If you formatted it on a current kernel version (6.x) and are still seeing the issue, then the kernel version on which the f2fs filesystem was created doesn’t matter here. I’m only asking because, from what I’ve read, f2fs may change over time and require some “upgrade” across kernel versions, which is handled by its corresponding fsck.

It’s been a year since I reported the issue. I’m still following it on the kernel bugzilla, but the report hasn’t been updated for 3 months.

I’m not sure whether the f2fs bug is still being investigated and tested. For now I’m keeping my affected systems on 5.15, as it’s still maintained on Manjaro to some extent, along with a few other older LTS kernels.

I’m thinking that if this bug with f2fs on newer kernels isn’t going to be addressed in the near future, users should be warned not to use f2fs, especially for system partitions, to avoid failures later on. Current ISOs have switched to at least the 6.1 kernel, which is affected by this bug, and the chance of encountering it is nonzero, however low it might be.

Difficult to have an opinion on an upstream kernel bugzilla report.

I have a system running 6.6 on f2fs - no apparent issues.

And I can say that bumping the issue here will achieve nothing, as Manjaro Linux does not fix upstream bugs; that is, if it is a bug and not a configuration issue.

I know… considering the potentially destructive nature of this bug, and that it could be non-trivial to set up a test environment that reliably reproduces the issue with little real-world risk, I can’t really expect anyone else to test and verify it.

The last comments on the bugzilla, dated August 20-21, were about kernel 6.4.6… there are no further comments on how things stand with the 6.5 and 6.6 kernels.

I’ve decided to do another experiment by running the system on the most recent 6.6 kernel (which is also LTS). I’ll back up important files from my root and home partitions (the only f2fs partitions in that system) to other places, so that even if I do manage to reproduce the bug, it won’t turn into a real disaster should it result in a catastrophic system failure.

Since my last post more than a month ago, I’ve been running that system on the 6.6 kernel. After all this time, with several updates in between, I haven’t reproduced the issue, though I did discover a couple of files that appear genuinely corrupted and in some cases impossible to delete (I get an I/O error when I try). Most likely they were corrupted during the period when I was able to reproduce the issue…

I’m not sure whether this issue is still being investigated and tested. For now I consider the 6.6 (LTS) kernel safe enough to use, though this is in no way conclusive.

EDIT: Since most kernels in between have gone EOL, the only kernel one needs to pay attention to is 6.1.

This topic was automatically closed 36 hours after the last reply. New replies are no longer allowed.