Kernel 4.12: Crashes when resuming from suspend

kde
kernel
laptop

#47

There is now a list of commits from upstream that should fix the issue:


You should move to a supported kernel, whether that’s 4.9 or 4.12.

This is very unlikely to be the same issue. The issue in this thread is most definitely with bfq-mq in kernel 4.12; older kernels don’t have bfq-mq.


#48

4.12.4 is on the testing branch now. Suspend to RAM works fine with it, without extra kernel parameters, because it has bfq-sq as the default scheduler instead of bfq-mq.
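To confirm which scheduler a disk is actually using after an update, something like the following rough sketch may help. It assumes the usual sysfs layout (`/sys/block/<device>/queue/scheduler`, with the active scheduler shown in square brackets); the device name `sda` and the function names are only placeholders for illustration.

```python
from pathlib import Path


def parse_scheduler_line(line):
    """Split a sysfs scheduler line such as 'noop deadline [cfq]'.

    Returns (active, available): the bracketed name (or None if no
    entry is bracketed) and the list of all names without brackets.
    """
    active = None
    available = []
    for name in line.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def active_scheduler(device="sda"):
    """Read the active I/O scheduler for a block device (assumed name)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return parse_scheduler_line(path.read_text())[0]
```

Calling `active_scheduler("sda")` on the affected machine should show whether bfq-mq or bfq-sq is in use, provided the device exists under that name.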


#49

Yup; that’s a workaround, not a fix.

See post 33 above and the post it links to


#50

My kernel is working fine, and I don’t have any reason to upgrade at the moment, especially when everyone here is having this issue with 4.12; that doesn’t give me any hope for a better experience. When 4.12.4 comes out of testing, as mentioned in the message after yours, the solution is to default to the I/O scheduler that causes me problems anyway… If I were to downgrade, the 4.9 LTS kernel isn’t any better; 4.8 doesn’t have the problem. I plan to upgrade to 4.13 when it’s available (assuming it will also include any of the 4.12 fixes discussed here).

I’m well aware of what this thread is about…

In the 4.12 kernel, BFQ was mainlined as a blk-mq scheduler; I’m guessing Manjaro uses this by default for the 4.12 kernel?

I was trying to point out the similar behaviour with suspend/resume and BFQ that I had. Hasn’t the answer here been that switching away from the BFQ blk-mq scheduler fixes the issue? So perhaps the BFQ bug that affects my system also exists in the blk-mq scheduler, in a similar form that users hit more easily?


#51

Kernel 4.10 is EOL and hasn’t been getting any security-related updates from the kernel developers. If you’re happy running a vulnerable system, fine. Otherwise, you should be looking at updating. 4.12.4-2 doesn’t exhibit the suspend issues.

If you’re reporting a bug in the BFQ scheduler in kernel 4.9 it’s going to be worth checking a newer kernel to confirm. There’s not a lot of upstream development for older kernels, so any fixes will be in newer ones (unless backported, as in this case).

Switching from bfq-mq to bfq-sq.

If there’s a bug in single-queue (i.e. normal) BFQ it’s not the same as this one or everyone else would have hit it too. Let’s move this to another thread.


#52

Vulnerable in what way? As a competent Linux desktop user, there is very little concern that I’ll be infected with anything or be at much risk beyond someone gaining physical access to my machine.

4.12.4-2 might not exhibit suspend issues for the majority of users in this thread, but BFQ may still not be in any better state for my similar issue. Perhaps the blk-mq equivalent of BFQ doesn’t share the same problem on my system. I’ll upgrade at a later date.

I reported the bug many months ago, including on the kernel bug tracker, which helped identify that it was due to BFQ and XFS on my system. I was on 4.9 for a while; I can’t recall why I updated to 4.10, something in the kernel features I guess (probably virtualization related). Maybe it’s been fixed by now; presently I’m fine with the noop scheduler.

In which case, if the bug persists for my system as it did prior to blk-mq last I checked, that solution/workaround isn’t one for me.

Depends. I assume the blk-mq BFQ scheduler shares similar code with the legacy BFQ one, but the implementations are bound to differ. Perhaps blk-mq exposes the bug to a wider audience through a new code path that runs into the same bug I do (BFQ on an XFS filesystem, possibly related to my hardware or something else; I don’t know how many others go with XFS over EXT4, which I think is the default FS for Manjaro?).

I definitely wouldn’t rule it out as related though. You should be able to further confirm with logrotate, which would cause the same problem as suspend/resume when it ran as a midnight task. I think the command the systemd timer executed just had to be run twice via the CLI to get the same results. Here is my forum thread; it dates back to March, and it had been bothering me for some time (4.9 was released in December). That thread and related ones could potentially help debug the problem…


#53

The issue still remains in 4.12.5 and 4.13-rc4.


#54

Would it be possible for Manjaro to move away from BFQ and use deadline or cfq until this bug is sorted out and fixed?


#55

@kouros17 are you saying there is a regression? 412-4 working fine here. Will hold back updating.


#56

I imagine that @Ste74’s changes to the default scheduler for 4.12.4-2 haven’t been carried over to 4.12.5-1.


#57

@lisa Please, note the regression mentioned above.


#58

It would be useful to report the regression in this topic:


#59

For me too, because @Ste74 disabled blk-mq in 4.12.4-2.
But blk-mq is one of the most important features of kernel 4.12 (and above) and turning it off is not a real solution but just a workaround.

The new kernel update (4.12.5) does not contain the “fix of Ste74”, and I don’t know if Ste74 wants to do the same in 4.12.5.

Anyway I’m waiting for the definitive fix by @philm


[Testing Update] 2017-08-11 - Kernels, Mesa, Firefox, Pamac
#60

When 4.12.5 and 4.13-rc4 arrive in the testing branch. :slight_smile:
But it is not a regression, because the issue has never been fixed.


#62

In my 4.12, I have enabled blk-mq but I didn’t include the Manjaro mq-bfq patches. Suspend works.
Wouldn’t it be a better solution to just remove the mq-bfq patches altogether until they’re either stable or part of mainline kernel?
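For anyone comparing setups, here is a small sketch for checking whether blk-mq is switched on for SCSI/SATA disks. It assumes the `scsi_mod` module exposes a `use_blk_mq` parameter under `/sys/module/scsi_mod/parameters/` on the kernel in question, and that it reads as `Y` or `N`; the function names are made up for illustration, and NVMe or device-mapper setups would need a different check.

```python
from pathlib import Path


def parse_bool_param(text):
    """Interpret a kernel module boolean parameter ('Y'/'N' or '1'/'0')."""
    value = text.strip()
    if value in ("Y", "1"):
        return True
    if value in ("N", "0"):
        return False
    raise ValueError(f"unexpected parameter value: {value!r}")


def scsi_blk_mq_enabled(path="/sys/module/scsi_mod/parameters/use_blk_mq"):
    """Return True/False for the SCSI blk-mq setting, or None if the
    parameter file does not exist (assumed path, see note above)."""
    param = Path(path)
    if not param.exists():
        return None
    return parse_bool_param(param.read_text())
```

A result of `None` only means the assumed path was not found on that kernel, not that blk-mq is off.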


#63

Did today’s kernel updates do anything to fix this issue?


#64

Today’s kernel update is only for the stable branch (kernel 4.12.4), which contains the workaround by @Ste74 but not a real solution.


#65

4.12.6 still has the same issue.

@philm?


#66

Philip just told me that we use the same block commits as the v4.13 kernel. The patches should already be included. The workaround by Stefano is also included in the v4.12.7 kernel, which will be released soon.


#67

I just successfully did a resume from hibernate with 4.12.6 on my X230, which had issues with that previously. I need to test resume after suspend-to-RAM.

Does this mean 4.13 is patched, but 4.12 is not? I thought there was a patch set in the Google Group thread Phil mentioned a while back? Will those not apply?