After a bizarre half-crash half-reboot, Manjaro won't boot anymore

I’ll try to be as thorough as reasonably possible.

Yesterday, I started a game I hadn’t played in over two years (Terraria, if that matters). It worked just fine, but when I alt-tabbed out of the game to write something in a text file and went back into the game, my input devices were completely unresponsive. Running my hand across the entire keyboard did nothing. The game kept running (not frozen) until, a minute or two later, in the space of a single second, the screen suddenly turned black and displayed these three lines:

Starting systemd-udevd version 252.9-1-manjaro
/dev/sda1: recovering journal
/dev/sda1: clean, numbers of files and blocks

just as if the system had rebooted in the background somehow (with the game hiding the fact that it had rebooted), except it stops there with a frozen cursor after those three lines.

And now that’s all it does. Same three lines (slightly different files/blocks numbers, up or down), same frozen cursor. It happens no matter which of my four kernels (5.15, 5.10, 5.4 and 4.19) and their four associated fallbacks I select.

I was told about some things called TTYs, accessed via Ctrl+Alt+F-keys, but pressing those does nothing.

Trying to get in via a live environment, I made a bootable USB drive, set it to take priority when booting, and whether I choose open source or proprietary drivers, I get pretty much the same result:

Starting systemd-udevd version 252.5-1-manjaro

With a frozen cursor.

Given that I get the same result with all kernels and the USB drive, I’m starting to think that this problem is real bad, a lot worse than just my Manjaro install being damaged. Could it be a hardware problem? If so, what kind?

I am quite lost and don’t know where to even begin trying to identify the cause. Any information is appreciated.

Most people don’t know this, but those lines are actually leftovers from when the system booted. They are being output to the screen buffer of the boot console, and when the graphics subsystem exits, they become visible again.

So what you observed — albeit that this is obviously not the whole story in your case — is that the graphics subsystem exited and made the screen buffer of the boot console visible again.

But that’s when your system froze; the graphics subsystem exited due to some kind of error, and then this error propagated into the rest of the system — probably into the runtime kernel. And if/when the kernel freezes, it’s “game over” and the system won’t respond anymore.

The fact that you’re getting the same result when booting from the live USB does indeed suggest that it would be a hardware problem, and I am guessing — but don’t pin me down on this — that it might be your graphics adapter that’s malfunctioning and causing the kernel to hang.

It could be as simple as overheating, but then the problem should not reoccur after you’ve let the machine sufficiently cool down again. And if you’ve attempted multiple boots, even one in which you had to plug in the USB and boot from there, then the hardware would have had enough time to cool down again, so it’s probably not an overheating issue.

I’ve had strange freezes on a refurbished machine in the past, with similar (albeit slightly different) symptoms. On that particular machine they were caused by the onboard Radeon and also by a failing power supply, and the latter may well be the cause here too.

The power supply provides different voltage outputs to various components in your system, and if that voltage drops below a certain threshold on one or multiple channels, then the machine may indeed lock up. Both the boot process and gaming — which is very GPU-intensive — draw a lot of current from the power supply, and so it is possible that in those moments, your system doesn’t receive enough power anymore on certain connectors, or on the CPU, or in the RAM, or whatever, causing the raw data to get corrupted and hanging the machine.

Mind you, this is just an educated guess based upon my experience. I’m not an engineer, nor am I a hardware expert — my knowledge about hardware is already somewhat dated by now.

If your machine is a desktop computer, then I’d have the power supply checked at a local computer store, and if it fails the tests — some PSUs have bad capacitors in them that wear out prematurely due to chemical instability — have it replaced. Likewise, if you have a discrete graphics adapter, have it checked, or try temporarily putting in a similar graphics adapter from another machine to see if it exhibits the same problem.

:man_shrugging:

Oh, so that’s why it looked like it had rebooted. Good to know, thank you.

It’s definitely not an overheating issue. I left the computer off for over an hour while I ate and made the bootable USB drive. It should have been cool.

My GPU (GeForce GTX 1060) and PSU (650W) are both five years old, as is the whole desktop computer. If your theory is correct, the PSU not being able to supply enough power would explain why I had problems shortly after starting to run the game (I haven’t played any resource-intensive games in months, but maybe Terraria qualifies despite being 2D) and while booting.

I guess my next step is to have the PSU tested tomorrow, then. It’s been five years since I built the computer, and even then I didn’t know what I was doing (I just followed the manual very strictly), so I hope I can figure out how to take it out again. I’ll label all the cables to remember where they go, and of course I’ll unplug the power entirely before I start messing with it in any way.

Really wish I were better with these things, but I only know the very basics.

Thanks, I will try this and report back later.

Someone suggested that I try to stop the booting process at runlevel 3, before the presumably fatal graphics error comes into play.

I did and it gives me these lines:

ERROR: device ‘UUID=(bunch of characters)’ not found. Skipping fsck.
mount: /new_root: can’t find UUID=(same bunch of characters).
You are now being dropped into an emergency shell.
sh: can’t access tty; job control turned off
[rootfs ]#

I’m not sure if that’s normal or if it points to a specific problem.

But it does give me a command line, even if I’m not quite sure what to do with it. I was told that, if I felt adventurous, it is possible to go into /var/cache/pacman/pkg/ and reinstall previous packages and see if they work. But there’s no “cache” directory in “var”.

That’s not runlevel 3, that’s the emergency shell in the initramfs, and therefore, your root filesystem isn’t mounted, and there is also no content in /var.
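
If you end up in that emergency shell again, a rough way to check whether the disk is being detected at all is to compare the root= parameter on the kernel command line with what shows up under /dev/disk/by-uuid. This is only a sketch: the helper names root_param and uuid_visible are made up for illustration, and it assumes your initramfs runs udev (so that /dev/disk/by-uuid gets populated) and gives you a POSIX-style shell.

```shell
# Sketch only: root_param and uuid_visible are hypothetical helper names.

# Print the value of the root= parameter found in a kernel command line.
# The argument is deliberately left unquoted in the loop so that the shell
# splits it into words.
root_param() {
    for word in $1; do
        case "$word" in
            root=*) printf '%s\n' "${word#root=}" ;;
        esac
    done
}

# Succeed if an entry named after the given UUID exists in the by-uuid
# directory. The optional second argument overrides the directory
# (an assumption added here so the check can be tried against a test folder).
uuid_visible() {
    [ -e "${2:-/dev/disk/by-uuid}/$1" ]
}

# Typical use inside the emergency shell:
#   spec=$(root_param "$(cat /proc/cmdline)")   # e.g. UUID=xxxx-xxxx
#   uuid_visible "${spec#UUID=}" && echo "visible" || echo "not visible"
```

If the UUID is not listed there, the partition isn't visible to the kernel at that point in the boot, which would point more toward the drive not being detected (hardware, power, cabling, or a missing driver) than toward a damaged filesystem.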

Either way, the system’s inability to boot from the USB stick strongly suggests that it’s not a problem with the operating system, but rather with the hardware. And as long as that hardware problem exists, you’re not going to be able to boot. :man_shrugging:

Huh. I got there by adding “rw 3” to my boot line, and you’re telling me this isn’t runlevel 3, so… something went wrong, even before any graphical stuff was called upon, and I was kicked to the emergency shell.

Yeah, at this point it really, really looks like a hardware problem. But this error makes me think it’s not my GPU. Could still be the PSU. I really hope it’s not my SSD.

I don’t think so, because then you would have been able to boot off the USB stick. It could be the CPU, the RAM, the GPU, or the PSU. My money’s on the last one. :man_shrugging:

Oh right, the USB drive would have worked. Good thinking… and good news. The SSD would have been the biggest loss.

GPU would be bad, these things are monstrously expensive nowadays. But it’s high on the list of suspects, maybe just under the PSU.

CPU would be somewhat expensive to replace but not too big of a loss, I chose a bit of a weaker one.

I don’t think it’s the RAM, I just tested it yesterday after the problems occurred and got a PASS.

Yeah, I guess the PSU makes the most sense. They are surprisingly expensive but among the least problematic things to replace.

For now I’ll stick with the PSU test plan! Thanks again.

I like to report back in case someone encounters a similar problem in the future…

It was indeed the PSU, but you won’t believe why.

One of the cables has a weak connection and was loose enough that five years’ worth of random vibrations partially disconnected it. All that for a loose cable…

I am loath to touch anything inside my computer (I am not comfortable around hardware), but next time anything like this happens, I’ll know to ground myself and push on connectors/cards just a little to make sure everything’s in place. The tech said a surprisingly high percentage of hardware issues are due to something dumb like that.
