System crash due to i915 / BIOS

Recently I’ve been getting total system crashes rather frequently (a few times a week). It happens while I’m using the browser. It starts with the screen freezing, followed by the mouse cursor and the audio. Nothing responds and I can’t even switch to a TTY. The only option left is a hard restart.

After the system comes back up, journalctl shows the following log entry:

kernel: i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}

I’m currently on kernel 5.10.42-1 and I’ve also hit the exact same error on 5.11.22-2 (on the same hardware). This is a rather new problem and I haven’t switched kernels or anything (unless the kernel I was on updated itself).

I see people discussing a similar issue from about a year ago, but mine only started about a month ago, and at the beginning it was not that frequent. Now it happens more often, and I want to find out whether it is a pure software (driver) issue or related to my hardware health. Of course, it would be great if I could fix it somehow. Also, the old tickets talk about 5.4, 5.5, and 5.7, but my kernel is more recent and still has the issue.

First thought: is hardware acceleration for rendering or video decoding involved here?

I would rather suspect an issue with the Intel firmware (binary blob)… and the power-saving (idle) mode.

As a workaround, you could restrict the C-states with these kernel parameters:

processor.max_cstate=1 intel_idle.max_cstate=0 
processor.max_cstate=   [HW,ACPI]
                        Limit processor to maximum C-state
                        max_cstate=9 overrides any DMI blacklist limit.
intel_idle.max_cstate=  [KNL,HW,ACPI,X86]
                        0       disables intel_idle and fall back on acpi_idle.
                        1 to 9  specify maximum depth of C-state.

Thanks, @megavolt. I should have provided more info on the hardware, sorry about that.

This is a Dell XPS 13 7490 laptop, used for software development. But I do have an external monitor connected through a dock which is connected by a USB3 port.

After I posted my question here, I checked Dell’s website and found out that I’m behind on my BIOS updates. I’m not sure whether that could have caused this issue, but in any case I just updated my BIOS from 1.3.0 to 1.8.0 (I was way behind!). Now I’m going to give it some time and see if it helped. Meanwhile, I would like to understand your suggestion. I’m not that well versed in this topic. Could you please tell me how I should apply your changes? Also, what are the side effects of these changes? I mean, what am I actually changing?

One last thing, you mentioned power-saving and idle mode. I just wanted to add that the problem almost always happens (at least as far as I can remember) while I’m in a meeting (Google Meet) which could be pretty power demanding for my laptop (especially if I have my cam on). So, I’m not sure if the idle mode applies here (just a thought though).

And as a reminder, this is a new problem that I didn’t have before (with the same hardware and the same OS). And as far as I can tell, I didn’t change anything myself (at least intentionally).

Add them to the GRUB_CMDLINE_LINUX= line in the file /etc/default/grub, then run sudo update-grub.

https://wiki.archlinux.org/title/Kernel_parameters#GRUB
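If you would rather not edit the line by hand, here is a minimal Python sketch of the same change. It only transforms the text of the file and returns the result, so you can review it before writing anything. The function name add_kernel_params is made up for this example, and it assumes the GRUB_CMDLINE_LINUX value is quoted, which is the usual layout.

```python
import re

# The two parameters suggested earlier in this thread.
PARAMS = ["processor.max_cstate=1", "intel_idle.max_cstate=0"]


def add_kernel_params(grub_text, params=PARAMS, key="GRUB_CMDLINE_LINUX"):
    """Return grub_text with params appended to the quoted value of `key`.

    Hypothetical helper for illustration. Lines where the value is not
    quoted are left unchanged, and parameters already present are not
    added a second time.
    """
    pattern = re.compile(r"^(" + re.escape(key) + r"=)([\"'])(.*)\2\s*$")
    out = []
    for line in grub_text.splitlines():
        match = pattern.match(line)
        if match:
            current = match.group(3).split()
            for param in params:
                if param not in current:
                    current.append(param)
            joined = " ".join(current)
            quote = match.group(2)
            line = match.group(1) + quote + joined + quote
        out.append(line)
    return "\n".join(out) + "\n"


# Example use (prints the proposed file, changes nothing on disk):
#   with open("/etc/default/grub") as fh:
#       print(add_kernel_params(fh.read()), end="")
```

After reviewing the output, the file still has to be saved as root and sudo update-grub run, as described above.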

C-states are the processor power states defined by ACPI.

  • C0 is the operating state.
  • C1 (often known as Halt) is a state where the processor is not executing instructions, but can return to an executing state essentially instantaneously. All ACPI-conformant processors must support this power state. Some processors, such as the Pentium 4 and AMD Athlon, also support an Enhanced C1 state (C1E or Enhanced Halt State) for lower power consumption, however this proved to be buggy on some systems.[41][42]
  • C2 (often known as Stop-Clock) is a state where the processor maintains all software-visible state, but may take longer to wake up. This processor state is optional.
  • C3 (often known as Sleep) is a state where the processor does not need to keep its cache coherent, but maintains other state. Some processors have variations on the C3 state (Deep Sleep, Deeper Sleep, etc.) that differ in how long it takes to wake the processor. This processor state is optional.

https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface#Processor_states

There are certainly more states than these. Just search the web.
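To see which idle states your own machine actually exposes, the kernel lists them under sysfs. A small Python sketch, assuming the cpuidle directory for cpu0 exists at the usual location (read_idle_states is just a name picked for this example):

```python
from pathlib import Path


def read_idle_states(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (directory, name, exit latency in microseconds) per idle state.

    Returns an empty list if the directory does not exist, for example
    when no cpuidle driver is active.
    """
    base = Path(cpu_dir)
    if not base.is_dir():
        return []
    states = []
    # Sort numerically so that state10 comes after state9.
    for state in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        latency = int((state / "latency").read_text())
        states.append((state.name, name, latency))
    return states


for directory, name, latency in read_idle_states():
    print(f"{directory}: {name} (exit latency {latency} us)")
```

The names and the number of states depend on the CPU and on which idle driver is in use, so the output will differ between machines.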

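The max_cstate option just tells the kernel not to go deeper than the given C-state. So processor.max_cstate=1 stays at C0 and C1 and goes no deeper. intel_idle.max_cstate=0 disables the intel_idle driver, so the kernel falls back on acpi_idle (see the parameter description above), which then obeys the processor.max_cstate limit.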

The side effect is higher power consumption.

The reason I suggest it: I think that while the browser is running, the Intel CPU switches to another C-state where it has less power available, and it then freezes because the browser is demanding power. The instructions for when to switch between C-states are read from the Intel firmware (loaded by the kernel). So I assume the problem is there.
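After rebooting, you can check that the parameters really reached the kernel by looking at /proc/cmdline. A rough Python sketch (parse_cmdline and cstate_limits are names invented for this example, and the parser does not handle quoted values that contain spaces):

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (such as 'quiet') map to None.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def cstate_limits(cmdline):
    """Return the two C-state limits from this thread, None if not set."""
    params = parse_cmdline(cmdline)
    wanted = ("processor.max_cstate", "intel_idle.max_cstate")
    return {name: params.get(name) for name in wanted}


# Example use on the running system:
#   with open("/proc/cmdline") as fh:
#       print(cstate_limits(fh.read()))
```

If both values come back as None, the GRUB change was not applied to the kernel you booted.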

It’s been more than one month and I haven’t had any crashes since I upgraded my BIOS. Seems like that was the cause.

Well, that didn’t take long!

Just a few hours after I posted my last message, I’ve had another episode (not me, my laptop). And it’s the same exact error message:

i915 0000:00:02.0: [drm] *ERROR* rcs0 reset request timed out: {request: 00000001, RESET_CTL: 00000001}

The interesting part is that, based on the timestamp of the error message, I would say that the crash does not happen exactly at the same time as the error. It seems to me it takes a while after the error is logged before my system crashes.
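To put a number on that delay, one option is to dump the previous boot with journalctl -b -1 -o short-iso --no-pager and compare the timestamp of the first i915 error with the last entry that made it into the journal. A minimal Python sketch, assuming the last journal entry is a reasonable stand-in for the moment of the freeze (the function names are made up for this example):

```python
from datetime import datetime

# Text that identifies the error line quoted in this thread.
MARKER = "reset request timed out"


def parse_timestamp(line):
    """Return the leading short-iso timestamp of a journal line, or None."""
    first = line.split(" ", 1)[0]
    try:
        return datetime.strptime(first, "%Y-%m-%dT%H:%M:%S%z")
    except ValueError:
        return None


def error_to_last_entry_gap(lines, marker=MARKER):
    """Return the time between the first matching error and the last entry.

    Returns None if no line contains the marker. Lines without a
    parsable timestamp (such as the journal header) are skipped.
    """
    first_error = None
    last_seen = None
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue
        last_seen = stamp
        if first_error is None and marker in line:
            first_error = stamp
    if first_error is None:
        return None
    return last_seen - first_error


# Example use, with the journalctl output saved to a file first:
#   with open("previous-boot.log") as fh:
#       print(error_to_last_entry_gap(fh))
```

A gap of minutes rather than seconds would suggest that the GPU reset failure comes first and the full freeze follows later.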