Network connection oscillating on/off regularly

My wired network connection regularly starts switching on and off, about 20 times in a row.
This happens at least once a day. I’ve had this problem on my old system as well, but I thought it was a defective network chip, so I got a Realtek PCIe network card and that fixed it.
Now I have a new PC, an MSI board again, and it started doing that again. Now I assume it’s some issue with my Manjaro installation. The light at the LAN port on the back of my PC turns off and on with the notifications on my screen.

Pulling the Network Cable for 30 seconds fixes the oscillation for a while.

I had a slow Internet connection a while ago (~10 Mbit/s), but all the other devices in my network had full speed. A reboot fixed that.

I’ve already checked my network with other Windows PCs and they never lose the connection. Switching to a different port on my network switch did nothing either. I haven’t booted Windows on this new system yet, but I’d assume it would just work.

Now I would just put that Realtek card in my PC again and call it a day, but my huge graphics card blocks the other 2 PCIe slots, so I’d have to get a riser.

Ideas? I’d hate to reinstall, because I’ve customized and compiled a lot of software for my current installation.

Partial troubleshooting only leaves us with unanswered questions.

That is the definition of an assumption, I suppose. But it doesn’t actually satisfy a test.

While other systems working well on the network makes it less likely that the problem is with the network itself, that’s about all we can surmise.

To start we may begin with general system information:

inxi -Fazy

Also, here’s a general guide with tips on how to format code, etc.:

Yes of course. I will test Windows and another Linux live boot for a few hours tomorrow.

I will also try to capture such an event with

sudo journalctl -f -u NetworkManager

but for now it just gave me the last successful connection attempt.

Here is the general system information you asked for.

System:
  Kernel: 6.5.13-7-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
    clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-6.5-x86_64
    root=UUID=62e798e5-73f6-49ff-b844-d5187cb428de rw quiet splash
    udev.log_priority=3
  Desktop: KDE Plasma v: 5.27.10 tk: Qt v: 5.15.12 info: frameworks
    v: 5.113.0 wm: kwin_wayland vt: 1 dm: SDDM Distro: Manjaro Linux
    base: Arch Linux
Machine:
  Type: Desktop Mobo: Micro-Star model: PRO B650M-P (MS-7E27) v: 1.0
    serial: <superuser required> uuid: <superuser required> UEFI: American
    Megatrends LLC. v: 1.40 date: 11/23/2023
CPU:
  Info: model: AMD Ryzen 5 7600 bits: 64 type: MT MCP arch: Zen 4 gen: 5
    level: v4 note: check built: 2022+ process: TSMC n5 (5nm) family: 0x19 (25)
    model-id: 0x61 (97) stepping: 2 microcode: 0xA601206
  Topology: cpus: 1x cores: 6 tpc: 2 threads: 12 smt: enabled cache:
    L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB L2: 6 MiB desc: 6x1024 KiB
    L3: 32 MiB desc: 1x32 MiB
  Speed (MHz): avg: 2850 high: 4246 min/max: 400/5170 scaling:
    driver: amd-pstate-epp governor: powersave cores: 1: 4124 2: 400 3: 4246
    4: 4211 5: 4246 6: 3598 7: 400 8: 400 9: 4244 10: 400 11: 4191 12: 3740
    bogomips: 91235
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
  Vulnerabilities:
  Type: gather_data_sampling status: Not affected
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: mmio_stale_data status: Not affected
  Type: retbleed status: Not affected
  Type: spec_rstack_overflow mitigation: Safe RET
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
    prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: Enhanced / Automatic IBRS, IBPB: conditional,
    STIBP: always-on, RSB filling, PBRSB-eIBRS: Not affected
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: AMD Navi 32 [Radeon RX 7700 XT / 7800 XT] vendor: XFX
    driver: amdgpu v: kernel arch: RDNA-3 code: Navi-3x process: TSMC n5 (5nm)
    built: 2022+ pcie: gen: 4 speed: 16 GT/s lanes: 16 ports: active: DP-2
    empty: DP-1,DP-3,HDMI-A-1 bus-ID: 03:00.0 chip-ID: 1002:747e class-ID: 0300
  Device-2: AMD Raphael vendor: Micro-Star MSI driver: amdgpu v: kernel
    arch: RDNA-2 code: Navi-2x process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4
    speed: 16 GT/s lanes: 16 ports: active: none empty: DP-4, DP-5, DP-6,
    HDMI-A-2 bus-ID: 12:00.0 chip-ID: 1002:164e class-ID: 0300 temp: 54.0 C
  Display: wayland server: X.org v: 1.21.1.10 with: Xwayland v: 23.2.3
    compositor: kwin_wayland driver: X: loaded: amdgpu
    unloaded: modesetting,radeon,vesa alternate: fbdev dri: radeonsi
    gpu: amdgpu,amdgpu display-ID: 0
  Monitor-1: DP-2 res: 1920x1080 size: N/A modes: N/A
  API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
    device: 1 drv: radeonsi device: 2 drv: swrast gbm: drv: radeonsi surfaceless:
    drv: radeonsi wayland: drv: radeonsi x11: drv: radeonsi
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 23.3.3-manjaro1.1
    glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 7800 XT (radeonsi
    navi32 LLVM 16.0.6 DRM 3.54 6.5.13-7-MANJARO) device-ID: 1002:747e
    memory: 15.62 GiB unified: no display-ID: :1.0
  API: Vulkan v: 1.3.274 layers: 8 device: 0 type: discrete-gpu name: AMD
    Radeon RX 7800 XT (RADV NAVI32) driver: mesa radv v: 23.3.3-manjaro1.1
    device-ID: 1002:747e surfaces: xcb,xlib,wayland device: 1
    type: integrated-gpu name: AMD Radeon Graphics (RADV RAPHAEL_MENDOCINO)
    driver: mesa radv v: 23.3.3-manjaro1.1 device-ID: 1002:164e
    surfaces: xcb,xlib,wayland
Audio:
  Device-1: AMD Navi 31 HDMI/DP Audio driver: snd_hda_intel v: kernel pcie:
    gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 03:00.1 chip-ID: 1002:ab30
    class-ID: 0403
  Device-2: AMD Rembrandt Radeon High Definition Audio vendor: Micro-Star MSI
    driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
    bus-ID: 12:00.1 chip-ID: 1002:1640 class-ID: 0403
  Device-3: Apple USB-C to 3.5mm Headphone Jack Adapter
    driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 2.0 speed: 12 Mb/s
    lanes: 1 mode: 1.1 bus-ID: 3-1:2 chip-ID: 05ac:110a class-ID: 0300
    serial: <filter>
  Device-4: Apple USB-C to 3.5mm Headphone Jack Adapter
    driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 2.0 speed: 12 Mb/s
    lanes: 1 mode: 1.1 bus-ID: 3-2:3 chip-ID: 05ac:110a class-ID: 0300
    serial: <filter>
  API: ALSA v: k6.5.13-7-MANJARO status: kernel-api with: aoss
    type: oss-emulator tools: alsactl,alsamixer,amixer
  Server-1: sndiod v: N/A status: off tools: aucat,midicat,sndioctl
  Server-2: JACK v: 1.9.22 status: off tools: N/A
  Server-3: PipeWire v: 1.0.0 status: off with: wireplumber status: active
    tools: pw-cli,wpctl
  Server-4: PulseAudio v: 16.1 status: active with: pulseaudio-alsa
    type: plugin tools: pacat,pactl
Network:
  Device-1: Realtek RTL8125 2.5GbE vendor: Micro-Star MSI driver: r8169
    v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1 port: e000 bus-ID: 0e:00.0
    chip-ID: 10ec:8125 class-ID: 0200
  IF: enp14s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Bluetooth:
  Device-1: TP-Link UB500 Adapter driver: btusb v: 0.8 type: USB rev: 1.1
    speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-7:3 chip-ID: 2357:0604
    class-ID: e001 serial: <filter>
  Report: rfkill ID: hci0 rfk-id: 1 state: up address: see --recommends
Drives:
  Local Storage: total: 1.47 TiB used: 704.04 GiB (46.7%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Crucial model: CT1000P1SSD8
    size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
    lanes: 4 tech: SSD serial: <filter> fw-rev: P3CR013 temp: 37.9 C
    scheme: GPT
  ID-2: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 850 EVO 500GB
    size: 465.76 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    tech: SSD serial: <filter> fw-rev: 3B6Q scheme: MBR
  ID-3: /dev/sdb maj-min: 8:16 vendor: Mushkin model: MKNSSDCR120GB-7
    size: 111.79 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    tech: SSD serial: <filter> fw-rev: BBF0 scheme: GPT
Partition:
  ID-1: / raw-size: 931.22 GiB size: 915.53 GiB (98.32%)
    used: 704.04 GiB (76.9%) fs: ext4 dev: /dev/nvme0n1p2 maj-min: 259:2
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 288 KiB (0.1%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
Swap:
  Alert: No swap data was found.
Sensors:
  System Temperatures: cpu: 59.0 C mobo: N/A
  Fan Speeds (rpm): N/A
  GPU: device: amdgpu temp: 53.0 C mem: 61.0 C fan: 73 watts: 33.00
    device: amdgpu temp: 54.0 C
Info:
  Memory: total: 32 GiB note: est. available: 30.55 GiB used: 7.1 GiB (23.2%)
  Processes: 329 Power: uptime: 5h 51m states: freeze,mem,disk suspend: deep
    avail: s2idle wakeups: 1 hibernate: platform
    avail: shutdown,reboot,suspend,test_resume image: 12.21 GiB
    daemons: upowerd, org_kde_powerdevil, power-profiles-daemon Init: systemd
    v: 255 default: graphical tool: systemctl
  Packages: 1541 pm: pacman pkgs: 1515 libs: 474 tools: pamac pm: flatpak
    pkgs: 26 Compilers: clang: 16.0.6 gcc: 13.2.1 Shell: Zsh v: 5.9 default: Bash
    v: 5.2.21 running-in: konsole inxi: 3.3.32

If you need any more information let me know.
I appreciate the help.

This kernel is EOL. I suggest installing and booting into one of the LTS kernels (6.1, 6.6) and then removing 6.5. (see kernel.org)
For example:

sudo mhwd-kernel -i linux66

(reboot and select 66, check running kernel with uname -r)

sudo mhwd-kernel -r linux65
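A small guard can make the order of those steps harder to get wrong. This is only a sketch: the function names `kernel_series` and `safe_to_remove_65` are made up for illustration, and it assumes the release string has the `major.minor.patch-...` shape that `uname -r` prints on the system in this thread.

```shell
# Extract the "major.minor" series from a kernel release string,
# e.g. a string shaped like 6.5.13-7-MANJARO.
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeeds only when the given (or currently running) kernel is NOT a 6.5 one,
# i.e. when removing linux65 would not remove the kernel you are booted into.
safe_to_remove_65() {
    [ "$(kernel_series "${1:-$(uname -r)}")" != "6.5" ]
}

# Usage, after rebooting into the newly installed kernel:
#   safe_to_remove_65 && sudo mhwd-kernel -r linux65
```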

So this is the device we are talking about?
These do have a track record of being finicky.
Not that long ago there were competing modules of 8168 vs 8169.

Actually I think there is still an mhwd profile.
Let’s look at that too:

mhwd -li -l

Update your kernel like @cscs says. There were various problems with the RTL8125 in some of the 6.5 kernels which were later fixed in 6.6.

That is basically the case. The fix did not arrive in a timely manner, and since 6.5 is no longer supported, the fix will not be backported to it. 6.6 is good.

I’ve been using my PC for a few hours after updating to 6.6 now and so far it seems fixed. If it doesn’t reappear until evening I’ll mark this one as solved.
I think I’ll stick with LTS kernels from now on. Just installed 6.5 trying to fix some other issue a while back but it didn’t even help.

To be honest, an LTS kernel doesn’t protect against this kind of glitch. 6.5.11 is totally fine; one of the fixes in 6.5.12 caused the glitch. The kernel’s merge rules allow commits that the developers suspect might cause a glitch, so an LTS kernel can pick up this kind of commit too, perhaps in some minor version update.

Then, as I said, 6.7 initially didn’t fix this bug either. The fix came very late, which kept me on 6.5.11 for a very long time, and it only arrived once 6.5 was already out of support. I guess it was fixed in 6.7 and then backported to 6.6.

In fact, for a long time the only recent kernel that was usable was 6.5.11; neither the LTS nor the newer kernels were any use.

r8169: fix rare issue with broken rx after link-down on RTL8125

[ Upstream commit 621735f590643e3048ca2060c285b80551660601 ]

In very rare cases (I’ve seen two reports so far about different
RTL8125 chip versions) it seems the MAC locks up when link goes down
and requires a software reset to get revived.
Realtek doesn’t publish hw errata information, therefore the root cause
is unknown. Realtek vendor drivers do a full hw re-initialization on
each link-up event, the slimmed-down variant here was reported to fix
the issue for the reporting user.
It’s not fully clear which parts of the NIC are reset as part of the
software reset, therefore I can’t rule out side effects.

I had planned to revert that commit and compile my own 6.5 kernel, but due to procrastination and other things getting in the way, I never did. Later, once the issue was fixed in an update, I dropped the plan.

This indicates your environment outside of your PC

This (strongly) suggests hardware

Did you already try another cable ?

These are not good tests for detecting a defective (shaky) cable


@Dark_iaji @Rusticus
“Using an LTS” does not itself constitute a fix.
But continuing to use an unsupported and dead kernel has a high probability of causing issues.
Not to mention it will be ‘stuck’. And if hung onto long enough, could even create some dependency headaches.
My main point was to … get off of a dead kernel … plus, sure, try others.
An LTS is simply a reasonable suggestion, and one the user can continue to have for a few years.

I did some additional testing the first time this appeared with my old PC. I’ve had my laptop on this cable for a few hours and nothing indicated a problem. Even my PC never had this problem when it was just running Windows.
I’ll wait a little while longer to confirm the issue was solved with a kernel update. For now it seems so.
I don’t have a replacement cable long enough to test on hand sadly.

@cscs @Dark_iaji @andreas85 @MrLavender

I just managed to trigger the problem by deactivating my connection to get some work done and re-enabling it afterwards.
Here is the output of NetworkManager. It just loops forever.

Feb 19 10:46:38 nicholas-manjaro NetworkManager[592]: <info>  [1708335998.7210] manager: NetworkManager state is now CONNECTING
Feb 19 10:46:38 nicholas-manjaro NetworkManager[592]: <info>  [1708335998.7213] device (enp14s0): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Feb 19 10:46:38 nicholas-manjaro NetworkManager[592]: <info>  [1708335998.7298] device (enp14s0): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Feb 19 10:46:38 nicholas-manjaro NetworkManager[592]: <info>  [1708335998.7299] dhcp4 (enp14s0): activation: beginning transaction (timeout in 45 seconds)
Feb 19 10:46:44 nicholas-manjaro NetworkManager[592]: <info>  [1708336004.7305] device (enp14s0): state change: ip-config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 19 10:46:44 nicholas-manjaro NetworkManager[592]: <info>  [1708336004.7514] dhcp4 (enp14s0): canceled DHCP transaction
Feb 19 10:46:44 nicholas-manjaro NetworkManager[592]: <info>  [1708336004.7514] dhcp4 (enp14s0): state changed no lease
Feb 19 10:46:44 nicholas-manjaro NetworkManager[592]: <info>  [1708336004.7518] manager: NetworkManager state is now DISCONNECTED
Feb 19 10:46:50 nicholas-manjaro NetworkManager[592]: <info>  [1708336010.5964] device (enp14s0): carrier: link connected
Feb 19 10:46:50 nicholas-manjaro NetworkManager[592]: <info>  [1708336010.5965] device (enp14s0): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
Feb 19 10:46:50 nicholas-manjaro NetworkManager[592]: <info>  [1708336010.5968] policy: auto-activating connection 'Neue Verbindung 802-3-ethernet' (310a86a6-9944-4755-b208-e6dd8ded4827)
Feb 19 10:46:50 nicholas-manjaro NetworkManager[592]: <info>  [1708336010.5970] device (enp14s0): Activation: starting connection 'Neue Verbindung 802-3-ethernet' (310a86a6-9944-4755-b208-e6dd8ded4827)
Feb 19 10:46:50 nicholas-manjaro NetworkManager[592]: <info>  [1708336010.5970] device (enp14s0): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Feb 19 10:46:50 nicholas-manjaro NetworkManager[592]: <info>  [1708336010.5971] manager: NetworkManager state is now CONNECTING
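To put a number on a loop like the one in the log above, the link drops can be counted with a small filter. This is only a sketch: the function name `count_link_drops` is made up here, and the pattern assumes the exact wording of the `carrier-changed` line shown in this log.

```shell
# Count link drops in NetworkManager log lines read from stdin.
# A "drop" is taken to be a device state change to 'unavailable'
# with reason 'carrier-changed', as seen in the log above.
count_link_drops() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    grep -c -- "state change: .* -> unavailable (reason 'carrier-changed'" || true
}

# Usage against the journal for the last hour:
#   journalctl -u NetworkManager --since "1 hour ago" --no-pager | count_link_drops
```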

When I set link negotiation in the connection settings to “Ignore”, it reconnected. That was reproducible: switching to “Automatic” broke it again, “Ignore” fixed it.

It hasn’t reappeared unprovoked.
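For anyone without the GUI, the same setting can probably be made with nmcli. This is a sketch under assumptions: that the “Ignore” option corresponds to `802-3-ethernet.auto-negotiate no` with no speed or duplex set (NetworkManager’s documentation says link configuration is skipped in that case), and that the connection name is the one from the log above. The function name `negotiation_ignore` is made up for illustration.

```shell
# Disable auto-negotiation on a NetworkManager profile WITHOUT forcing a
# speed/duplex pair. Assumption: this matches the GUI's "Ignore" setting.
# $1 = connection name or UUID
negotiation_ignore() {
    nmcli connection modify "$1" \
        802-3-ethernet.auto-negotiate no \
        802-3-ethernet.speed 0 \
        802-3-ethernet.duplex ""
    # Re-activate the profile so the change takes effect.
    nmcli connection up "$1"
}

# Usage (profile name as it appears in the log in this thread):
#   negotiation_ignore "Neue Verbindung 802-3-ethernet"
```

Running `nmcli connection show` lists the profile names if yours differs.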

A flaky driver or hardware implementation may cause re-negotiation, which in turn makes the link drop and reconnect over and over again.

It appears to be a common issue with RTL8125 - sx.nix.dk

I suggest you try disabling IPv6

Adding ipv6.disable=1 to the kernel line disables the whole IPv6 stack, which is likely what you want if you are experiencing issues. See Kernel parameters for more information.
IPv6 - ArchWiki

Already had IPv6 disabled.

It appears to be a common issue with RTL8125 - sx.nix.dk

Are you suggesting manually installing 8125 driver and disabling 8169 module?

If it works for the next few days I will not poke it further. If it doesn’t, I will try switching the driver.

I have never used the RTL8125, so I have no opinion to offer, nor a solution for that matter. If I were you, I would definitely attempt to solve the issue by searching the AUR for something based on dkms.

My MSI MAG B550 TOMAHAWK has 2 LAN ports with both RTL8111 and RTL8125 controllers. I’ve never bothered even trying the 8125 port because I don’t have a 2.5Gbps router. When I have time I may try swapping my cable to that port and see if I get the same issue.

$ inxi -nz
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    driver: r8169
  IF: enp4s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  Device-2: Realtek RTL8125 2.5GbE driver: r8169
  IF: enp42s0 state: down mac: <filter>

But to be honest if turning off auto-negotiation solves your problem then that seems by far the simplest solution and preferable to messing with drivers.

I believe figuring out a proper solution to 8125 problems would be appreciated by many. The current solution feels hacky.

So, to summarize: I solved the issue by switching from a 6.5 kernel to 6.6 and disabling auto-negotiation in the network settings.

If anyone wants to fix this issue properly, they should start by installing the 8125 driver from the source @linux-aarhus mentioned and disabling the 8169 module.
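Before and after swapping drivers it helps to confirm which module is actually bound to the interface. A minimal sketch, reading the driver symlink from sysfs; the function name `bound_driver` is made up, and the blacklist line in the comments assumes the out-of-tree module is called r8125, which this thread does not confirm.

```shell
# Print the kernel driver bound to a network interface, read from sysfs.
# $1 = interface name, $2 = sysfs root (defaults to /sys; overridable for testing)
bound_driver() {
    local link="${2:-/sys}/class/net/$1/device/driver"
    basename "$(readlink -f "$link")"
}

# Usage (inxi above reports r8169 for this interface):
#   bound_driver enp14s0
#
# To keep r8169 from loading once a vendor module is installed, a modprobe.d
# entry like the following is the usual approach (file name is arbitrary):
#   echo "blacklist r8169" | sudo tee /etc/modprobe.d/blacklist-r8169.conf
```

Note that blacklisting r8169 would also affect any other Realtek port that relies on it, such as the RTL8111 mentioned earlier in the thread.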
