Cannot Connect to LAN over Ethernet on Reboot without Unplugging and Replugging Ethernet Cable

❯ uname -a
Linux $HOST 5.12.17-2-MANJARO-ARM #1 SMP PREEMPT Fri Jul 16 18:21:07 CDT 2021 aarch64 GNU/Linux

Display/Keyboard: I don’t have a monitor or keyboard plugged into this thing. It’s meant to be an SSH-only server.

Topology: Pi 4B is connected to the network via its built-in 1 Gbps Ethernet port. Same behavior with a direct connection to the router or to a switch.

Testing: After I noticed that none of the computers on my network that use Pi-Hole for their DNS could actually resolve anything, I tried:

  1. Accessing the web interface, and couldn’t get that to load.
  2. Logging into the server via SSH. Host unavailable.
  3. Ping. Host unreachable.
  4. Hard rebooting the Pi (with the power button on my Canakit PSU).
    1. First reboot: no effect (still can’t ping);
    2. Second reboot: pingable for a minute or so, then host not found.
  5. Moving the Pi from a 1Gbps port on my router to a 1Gbps port on a switch connected to the router.

Nothing worked. Moving it to the switch was helpful, though, as the switch UI displays an easy-to-use live graph of each connected device’s network usage. I was able to verify that there is NO network activity at all from the Pi when I can’t connect to it.

Solution (at least, to the problem of network connectivity):
Let the Pi boot up, then unplug and replug the Ethernet cable while the device is powered on. This restores network connectivity.

Well… this is new. I’m not sure when this started, but I know it wasn’t right after the latest stable update, as I rebooted without issue then.

Not really sure how to troubleshoot this. On one reboot, I got a unit failure notice on systemd-networkd-wait-online.service, but that didn’t happen on the next reboot, even though I still had to unplug and replug the cable.

Any ideas? Or any particular diagnostic info that would be helpful?

Thanks!

EDIT: I’ve confirmed that traffic itself doesn’t appear to contribute to the instability. I let a continuous ping run overnight from one of my other computers to the Pi, and it stayed connected with 0.0% packet loss.

Pretty sure that one lost packet is from me terminating the ping process.

44460 packets transmitted, 44459 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.115/0.289/3.145/0.114 ms
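One lost reply out of 44,460 is about 0.002%, which is why ping still reports 0.0%. A minimal Python sketch of that arithmetic, assuming the macOS/BSD summary-line format quoted above (the helper name `parse_ping_summary` is hypothetical):

```python
import re
from typing import Tuple

# Matches the macOS/BSD ping summary line, e.g.
# "44460 packets transmitted, 44459 packets received, 0.0% packet loss"
SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) packets received")


def parse_ping_summary(line: str) -> Tuple[int, int, float]:
    """Return (transmitted, received, loss in percent) from a ping summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a ping summary line: {line!r}")
    sent, received = int(match.group(1)), int(match.group(2))
    loss_pct = 100.0 * (sent - received) / sent if sent else 0.0
    return sent, received, loss_pct


if __name__ == "__main__":
    line = "44460 packets transmitted, 44459 packets received, 0.0% packet loss"
    sent, received, loss = parse_ping_summary(line)
    # One missing reply is roughly 0.00225%, which rounds to 0.0% at one decimal
    print(f"{sent - received} lost, {loss:.5f}% -> displayed as {loss:.1f}%")
```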

A common issue with Ethernet is the patch cable. Have you tried another cable?


I am running my Pi 4s on 5.12.17 without issue. I recently switched them to static IP addresses, but I doubt that matters, as this looks like a hardware Ethernet issue. It should be working fine for you. Since unplugging and replugging the cable gets you results, it is probably an issue with the pin crimping on the cable.


This is a CableMatters CAT 6a patch cable that I’ve been using for several months without issue, and without unplugging it, so I’d be … pretty aggravated if it decided to die on me. These were not exactly cheap to buy a bag of… :frowning:

I’ve got spares, so I’ll try to do some tests this weekend. For the moment, it seems fine as long as I don’t reboot the thing.

That said, I did need to unplug another cable from the Pi the other day, and it’s all kind of wedged into a small space. I could easily see accidentally bending the cable or something without realizing it.

As I write this, I’m also thinking that the inconsistent systemd error messages are more a sign of a hardware issue than anything in software. With luck, I only damaged the cable, and not the jack itself.

Fingers crossed…


I have a selection of Pi units, including a Pi 4, so I decided to test the issue; below is the unfiltered output from my test. I did not touch the power or the Ethernet cable, just rebooted a couple of times.

Worth noting is the kernel version, which on this card is 5.10, and the branch, which is unstable.

I am not able to draw any conclusions from the test - but it may be of some use for you with your troubleshooting.

➜  ~ ssh 172.30.30.178                       
ssh: connect to host 172.30.30.178 port 22: No route to host
➜  ~ ssh 172.30.30.178
^C
➜  ~ ssh 172.30.30.178
(fh@172.30.30.178) Password: 
Welcome to Manjaro-ARM
~~Website: https://manjaro.org
~~Forum:   https://forum.manjaro.org/c/arm
~~Matrix:  #manjaro-arm-public:matrix.org
Last login: Wed Aug 25 08:40:45 2021 from 172.30.30.20
[fh@pitest ~]$ uname -a
Linux pitest 5.10.60-1-MANJARO-ARM #1 SMP PREEMPT Sun Aug 22 00:35:52 UTC 2021 aarch64 GNU/Linux
[fh@pitest ~]$ pacman-mirrors
Pacman-mirrors version 4.21.5
Local mirror status for arm-unstable branch
 Mirror # 1 https://repos.nix.dk/manjaro/ does not exist
[fh@pitest ~]$ reboot
Failed to set wall message, ignoring: Interactive authentication required.
Failed to reboot system via logind: Interactive authentication required.
Failed to talk to init daemon.
[fh@pitest ~]$ sudo reboot
[sudo] password for fh: 
Connection to 172.30.30.178 closed by remote host.
Connection to 172.30.30.178 closed.
➜  ~ ssh 172.30.30.178
^C
➜  ~ ssh 172.30.30.178
^C
➜  ~ ssh 172.30.30.178
(fh@172.30.30.178) Password: 
Welcome to Manjaro-ARM
~~Website: https://manjaro.org
~~Forum:   https://forum.manjaro.org/c/arm
~~Matrix:  #manjaro-arm-public:matrix.org
Last login: Wed Aug 25 08:42:45 2021 from 172.30.30.20
[fh@pitest ~]$ sudo reboot
[sudo] password for fh: 
Connection to 172.30.30.178 closed by remote host.
Connection to 172.30.30.178 closed.
➜  ~ ssh 172.30.30.178
^C
➜  ~ ssh 172.30.30.178
(fh@172.30.30.178) Password: 
Welcome to Manjaro-ARM
~~Website: https://manjaro.org
~~Forum:   https://forum.manjaro.org/c/arm
~~Matrix:  #manjaro-arm-public:matrix.org
Last login: Thu Aug 26 09:03:26 2021 from 172.30.30.20
[fh@pitest ~]$ sudo pacman -Syu
[sudo] password for fh: 
:: Synchronizing package databases...
 core                  237.4 KiB   631 KiB/s 00:00 [######################] 100%
 extra                   2.4 MiB  3.99 MiB/s 00:01 [######################] 100%
 community               6.1 MiB  7.53 MiB/s 00:01 [######################] 100%
:: Starting full system upgrade...
resolving dependencies...
looking for conflicting packages...

Packages (2) libarchive-3.5.2-1  libcap-2.53-1

Total Download Size:    0.53 MiB
Total Installed Size:   1.31 MiB
Net Upgrade Size:       0.02 MiB

:: Proceed with installation? [Y/n] 
:: Retrieving packages...
 libcap-2.53-1-aa...    71.8 KiB  2.92 MiB/s 00:00 [######################] 100%
 libarchive-3.5.2...   467.9 KiB  6.35 MiB/s 00:00 [######################] 100%
 Total (2/2)           539.7 KiB  6.94 MiB/s 00:00 [######################] 100%
(2/2) checking keys in keyring                     [######################] 100%
(2/2) checking package integrity                   [######################] 100%
(2/2) loading package files                        [######################] 100%
(2/2) checking for file conflicts                  [######################] 100%
(2/2) checking available disk space                [######################] 100%
:: Processing package changes...
(1/2) upgrading libarchive                         [######################] 100%
(2/2) upgrading libcap                             [######################] 100%
:: Running post-transaction hooks...
(1/1) Arming ConditionNeedsUpdate...
[fh@pitest ~]$

I’m not an ARM expert, but I do know a thing or two about Ethernet, and if it’s not the cable but the jack, try setting the network speed manually:

  • 10 Mbps Half Duplex
  • Test (= Reboot and see if it still works)
  • If it does, go to 10 Mbps Full Duplex
  • Test
  • Continue raising it to 100 HD, 100 FD, 1000 HD, and so on; when it stops working, go back down to the last speed that still worked.
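The stepping procedure above is a linear search: raise the mode one notch at a time and keep the last one that still survives a reboot. A minimal Python sketch, where `survives_reboot` is a hypothetical callback standing in for the manual "set the mode, reboot, see if the Pi is reachable" test, and 1000 Mbps full duplex is assumed to be the final step after 1000 HD:

```python
from typing import Callable, Optional, Sequence, Tuple

Mode = Tuple[int, str]  # (speed in Mbps, duplex)

# Slowest, most conservative mode first; the last entry is an assumed final step
MODES: Sequence[Mode] = [
    (10, "half"), (10, "full"),
    (100, "half"), (100, "full"),
    (1000, "half"), (1000, "full"),
]


def fastest_working_mode(
    survives_reboot: Callable[[Mode], bool],
    modes: Sequence[Mode] = MODES,
) -> Optional[Mode]:
    """Step through the modes and return the last one that still worked.

    Stops at the first failure, as in the manual procedure, and returns None
    if even the slowest mode fails.
    """
    last_good: Optional[Mode] = None
    for mode in modes:
        if not survives_reboot(mode):
            break
        last_good = mode
    return last_good


if __name__ == "__main__":
    # Hypothetical test results: everything up to 100 Mbps full duplex works
    print(fastest_working_mode(lambda mode: mode[0] <= 100))
```

Stopping at the first failure mirrors the "go back down one notch" rule rather than testing every mode, which keeps the number of reboots small.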

:crossed_fingers:


Interesting. I’m on some version of the 5.13 kernel, and things are just strange now. I left it alone, and at the moment it responds to ping, but refuses to connect via SSH.

@linux-aarhus, what version of the EEPROM are you running?

NoMachine can’t find it, but nmap can see it:

Host is up (0.00024s latency).
Not shown: 997 closed ports
PORT     STATE SERVICE    VERSION
139/tcp  open  tcpwrapped
445/tcp  open  tcpwrapped
4000/tcp open  tcpwrapped

Host script results:
|_nbstat: NetBIOS name: TELETRAAN1, NetBIOS user: , NetBIOS MAC: (unknown)
|_smb2-time: Protocol negotiation failed (SMB2)

Service detection performed. Please report any incorrect results at Nmap OS/Service Fingerprint and Correction Submission Page .
Nmap done: 1 IP address (1 host up) scanned in 5.75 seconds

Services that I know should be up, like DNS (Pi-Hole), are not:

% nmap -A -T4 -p 53 10.0.4.130
Starting Nmap 7.91 ( https://nmap.org ) at 2021-08-26 18:24 CDT
Nmap scan report for $HOST
Host is up (0.00090s latency).

PORT   STATE  SERVICE VERSION
53/tcp closed domain
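The scan above only probed TCP port 53 (nmap scans TCP unless told otherwise), while most DNS lookups go over UDP, so sending one real query is a complementary check that Pi-hole is answering. A minimal Python sketch with hypothetical helper names `build_dns_query` and `dns_answers`; it builds a bare A-record query by hand and only checks that some DNS response comes back:

```python
import socket
import struct


def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query asking for the A record of `name`."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def dns_answers(server: str, name: str = "example.com", timeout: float = 2.0) -> bool:
    """Send one query to server:53 over UDP; True if a matching response arrives."""
    query = build_dns_query(name)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            # A connected UDP socket lets a refused port surface as an error
            # instead of only a timeout
            sock.connect((server, 53))
            sock.send(query)
            reply = sock.recv(512)
    except OSError:  # timeouts, refused port, unreachable network, ...
        return False
    # A genuine response echoes our ID and has the QR (response) bit set
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

Run from another machine on the LAN, `dns_answers("10.0.4.130")` should return True while Pi-hole is serving DNS and False in the state shown above.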

@Fabby, does this still look like a possible hardware issue to you? I haven’t had a chance to run the test you suggested yet. I’ll be able to do that later tonight.

But I would think a hardware failure wouldn’t discriminate based on port numbers/services…

Yes… Try the hardware solution first, then the kernel.

:crossed_fingers:


Will do.

How would you suggest testing it? I haven’t quite isolated the failure mode. I set my Mac to ping the Pi overnight, forever, and 8 hours and 40,000 pings later, it was still up.

Then it just randomly died while nothing was connected to it.

And I’m back.

My switch is fairly new (2020?), so I was limited on how slow I could make the connection. It stepped thusly: 100 Mbps HDX → 100 Mbps FDX → 1 Gbps FDX → 2.5 Gbps stuff…

Oddly enough, the Pi absolutely refused to negotiate at 100 Mbps anything. The switch didn’t acknowledge a connection at all.

Right now I’ve got the switch set to 1 Gbps FDX, with no auto-negotiation. (Before, it was just set to auto.) I’m SSH’d in without issue.

I’m going to log out of the SSH session and see if it stays up correctly.

The Pi device I used to run the test was set up when I wrote this topic.

The image used for the above topic is Manjaro-ARM-minimal-rpi4-21.07.img and the kernel is the linux-rpi4 package.

Looking at branch-compare, there are a couple of kernels, which correspond to the different supported devices.

Without anything to back it up: could it be that you have accidentally installed a kernel for another device?


It depends: GUI or CLI?

:thinking:

@linux-aarhus, thanks for this detailed info. I’ve been very careful about switching out kernels, and I’ve been on the one I’m using for several months. I’m … 95 percent … confident it’s the kernel for my device. I will double check, though … once I figure out how to do that. :slight_smile:

@Fabby, just because I enjoy learning things, how about both? :slight_smile: CLI is preferable. This Pi hates XFCE now. :stuck_out_tongue:

So, actually, it’s apparently stable again. No network issues since setting the switch to always do FDX 1 Gbps, with no auto-negotiation. This is bonkers for a couple of reasons:

  1. It wasn’t plugged into the switch when it began to misbehave.
  2. It never had any trouble auto-negotiating to 1Gbps FDX before.

All I can figure is that a config file got a bit corrupted somewhere on the Pi, and removing auto-negotiation from the picture fixed it. No idea.

Case … closed?

  1. Execute:

    nmcli connection show | grep ethernet
    
  2. The first column is the name of your Ethernet connection (Wired connection 1 in my case, so I’m going to use that as an example).

  3. Remove Ethernet cable

  4. Execute:

    nmcli connection edit "Wired connection 1"
    
  5. Type:

    set 802-3-ethernet.speed 10
    set 802-3-ethernet.duplex half
    save temporary
    quit
    
  6. Trust, but verify:

    nmcli connection show "Wired connection 1" | grep 802-3-ethernet
    
  7. Insert Ethernet cable

Repeat steps 3-7, raising the speed/duplex setting one notch each time, until it stops working. Then go back one notch, but save with persistent instead of temporary.
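If you would rather not drive the interactive editor, `nmcli connection modify` can set the same properties non-interactively. A Python sketch, assuming NetworkManager manages the wired interface (as the steps above also assume); the hypothetical helpers only parse `nmcli -t -f NAME,TYPE connection show` output and build an argument list, and nothing is executed:

```python
from typing import List


def split_terse(line: str) -> List[str]:
    """Split one line of `nmcli -t` output on unescaped ':' and undo backslash escapes."""
    fields: List[str] = []
    current: List[str] = []
    escaped = False
    for ch in line:
        if escaped:  # the character after a backslash is taken literally
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == ":":  # field separator
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def ethernet_connections(terse_output: str) -> List[str]:
    """Wired connection names from `nmcli -t -f NAME,TYPE connection show` output."""
    names = []
    for line in terse_output.splitlines():
        fields = split_terse(line)
        # Accept both spellings of the type, to be safe across nmcli versions
        if len(fields) >= 2 and fields[1] in ("802-3-ethernet", "ethernet"):
            names.append(fields[0])
    return names


def modify_args(name: str, speed: int, duplex: str, temporary: bool = True) -> List[str]:
    """Argument list for `nmcli connection modify` that forces one speed/duplex mode."""
    args = ["nmcli", "connection", "modify"]
    if temporary:
        args.append("--temporary")  # keep the change in memory only, like `save temporary`
    args += [
        "id", name,
        # With auto-negotiation off, NetworkManager forces the speed/duplex below
        "802-3-ethernet.auto-negotiate", "no",
        "802-3-ethernet.speed", str(speed),
        "802-3-ethernet.duplex", duplex,
    ]
    return args
```

For example, `subprocess.run(modify_args("Wired connection 1", 10, "half"), check=True)` would apply the step 5 setting; as with the interactive editor, the change takes effect when the connection is next activated, for instance after `nmcli connection up id "Wired connection 1"`.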

:crossed_fingers:
