R8168 - Complete system freeze when doing large network transfers

network
#1

I'm running Manjaro on an AMD Athlon-based system that serves as a gateway/firewall/server, and it works just fine.
However, I have noticed some issues when I do network transfers:

  1. The network speed does not go above 100 Mbit/s despite both NICs being Gigabit capable (and detected as such)
  2. If the network transfer lasts long enough (for example, 2 GB to transfer), the system suddenly freezes completely: the monitor goes to sleep, the fans are still turning, the power button is still lit, and the PS/2 keyboard is unresponsive (the NumLock LED won't turn off)

For the first issue, I searched around and found reports involving Gigabyte motherboards and the r8168 driver, where the suggestion was to turn on IOMMU in the BIOS settings. But my motherboard is an ASRock 870 Extreme3 and I could not find such an option in the settings.

Here is an inxi dump:

System:    Host: server Kernel: 4.19.36-1-MANJARO x86_64 bits: 64 compiler: gcc v: 8.3.0 Console: tty 8
           Distro: Manjaro Linux
Machine:   Type: Desktop Mobo: ASRock model: 870 Extreme3 serial: <root required> BIOS: American Megatrends v: P1.60 date: 09/14/2010
CPU:       Topology: Quad Core model: AMD Athlon II X4 640 bits: 64 type: MCP arch: K10 rev: 3 L2 cache: 2048 KiB flags: lm nx pae sse sse2 sse3 sse4a svm bogomips: 24057
           Speed: 800 MHz min/max: 800/3000 MHz Core speeds (MHz): 1: 800 2: 800 3: 2300 4: 800
Graphics:  Device-1: NVIDIA GF119 [GeForce GT 610] driver: nvidia v: 390.116 bus ID: 05:00.0
           Display: tty server: X.org 1.20.4 driver: nvidia tty: 120x43
           Message: Advanced graphics data unavailable in console. Try -G --display
Audio:     Device-1: Advanced Micro Devices [AMD/ATI] SBx00 Azalia vendor: ASRock driver: snd_hda_intel v: kernel bus ID: 00:14.2
           Device-2: NVIDIA GF119 HDMI Audio driver: snd_hda_intel v: kernel bus ID: 05:00.1
           Sound Server: ALSA v: k4.19.36-1-MANJARO
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: ASRock driver: r8168 v: 8.045.08-NAPI port: a800 bus ID: 02:00.0
           IF: enp2s0 state: up speed: 1000 Mbps duplex: full mac: 00:25:22:8e:00:c7
           Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8168 v: 8.045.08-NAPI port: c800 bus ID: 04:00.0
           IF: enp4s0 state: up speed: 1000 Mbps duplex: full mac: 3e:8d:9d:d6:a3:44
Drives:    Local Storage: total: 2.96 TiB used: 1.20 TiB (40.6%)
           ID-1: /dev/sda vendor: Samsung model: SSD 850 EVO 250GB size: 232.89 GiB
           ID-2: /dev/sdb vendor: Seagate model: ST2000VN004-2E4164 size: 1.82 TiB
           ID-3: /dev/sdc vendor: Western Digital model: WD10EZEX-08WN4A0 size: 931.51 GiB
RAID:      Hardware-1: Silicon Image SiI 3124 PCI-X Serial ATA Controller driver: sata_sil24 v: kernel bus ID: 06:05.0
Partition: ID-1: / size: 219.57 GiB used: 13.46 GiB (6.1%) fs: ext4 dev: /dev/sda1
           ID-2: swap-1 size: 8.80 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/sda2
Sensors:   System Temperatures: cpu: 26.8 C mobo: N/A gpu: nvidia temp: 72 C
           Fan Speeds (RPM): N/A gpu: nvidia fan: 59%
Info:      Processes: 231 Uptime: 12h 06m Memory: 7.79 GiB used: 2.11 GiB (27.1%) Init: systemd Compilers: gcc: 8.3.0 Shell: bash v: 5.0.3 inxi: 3.0.33

The second interface (enp4s0) is based on a "no name" RTL8111 PCI-E card that does not have a burned-in MAC address, which gave me quite a few headaches at first. I'm about to replace it with a TP-Link TG-3468 card, but I suspect that it won't change anything related to the freeze, as the large transfers are done on enp2s0.

Do you have any suggestion as to what could be causing this?
What information can I give you to help with this situation?

Regards

R8168 driver causing issues
#2

First off I would suggest testing other kernels. Kernel 4.14 has not seen speed/freezing issues like some of the more recent kernels have. Try at least two or three alternate kernels. Install kernels through Manjaro Settings Manager, and always have at least two kernels installed at all times for safety.

If you are using the r8168 driver then I would suggest this:

The r8168 driver has been experiencing major problems lately.

The r8169 kernel module is now the preferred driver.

Follow the instructions below to get your LAN working properly.

Uninstall the linuxXXX-r8168 driver:

Open Manjaro Settings Manager -> Hardware configuration -> Network controller

Right click on the RTL8111/8168/8411 ethernet device and select “Remove”.

After the uninstall process has finished, restart.

After you restart the computer, the 8169 kernel module should now be automatically loaded.
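One way to confirm which driver is actually bound to an interface after the reboot is to read the driver symlink that the kernel exposes in sysfs. The sketch below is only illustrative: the function name nic_driver and its optional second argument are assumptions for the example, not part of any Manjaro tool.

```shell
# nic_driver IFACE [SYSFS_ROOT]
# Print the name of the kernel driver bound to a network interface.
# SYSFS_ROOT defaults to /sys; it is a parameter only so the function
# can be pointed at a different tree (hypothetical convenience).
nic_driver() {
    iface=$1
    sysfs=${2:-/sys}
    link="$sysfs/class/net/$iface/device/driver"

    # The "driver" entry is a symlink to the driver's directory;
    # if it is missing, no driver is bound to this interface.
    if [ ! -L "$link" ]; then
        echo "no driver bound to $iface" >&2
        return 1
    fi

    # The last path component of the link target is the driver name.
    basename "$(readlink "$link")"
}
```

Running `nic_driver enp2s0` should print r8169 once the r8168 package has been removed and the in-kernel module has taken over.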

If the r8169 kernel module is not loaded automatically when you reboot (after uninstalling r8168) then do this:

Check each file located in /etc/modprobe.d and ensure that none of them blacklists r8169.

Any file that contains the line:

blacklist r8169

Change to:

blacklist r8168

Save the edited conf file with root permissions, and then reboot.

Alternatively, you may delete the conf file entirely (if it only contains the entry "blacklist r8169").

Example:

If /etc/modprobe.d contains a file named r8169_blacklist.conf then you can delete it with this command:

sudo rm /etc/modprobe.d/r8169_blacklist.conf

Be very careful not to make any errors when using the "rm" command with sudo privileges.

Reboot after making any changes to files in /etc/modprobe.d.
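Before editing or deleting anything, it can help to list exactly which files blacklist a module. A minimal sketch, assuming a grep that supports -r (GNU and BSD grep both do); the function name find_blacklist is made up for this example:

```shell
# find_blacklist MODULE [DIR]
# List every line under DIR (default: /etc/modprobe.d) that blacklists
# MODULE. Commented-out lines are ignored. Returns non-zero if no
# matching line is found.
find_blacklist() {
    module=$1
    dir=${2:-/etc/modprobe.d}

    # -r: search the directory recursively
    # -n: show line numbers
    # -E: extended regex, so ( | ) works as grouping/alternation
    grep -rnE "^[[:space:]]*blacklist[[:space:]]+${module}([[:space:]]|\$)" "$dir"
}
```

Running `find_blacklist r8169` prints each match as file:line:text, which shows which conf file needs to be changed or removed.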

Check dmesg for any related errors.

dmesg | grep -E 'r816|eth|enp'
#3

Thanks for this, it helped a lot.
I first went with the r8168 removal. Following your instructions, the uninstall also removed the /etc/modprobe.d/r8169_blacklist.conf file that was present on my system.
I also replaced the "no name" card with the TP-Link one.

As a result, I get gigabit speeds when doing transfers on both interfaces, and I did not experience a system freeze either. This is why I decided to wait on the kernel changes until I have done further tests.

The network setup is like this:

Win10PC --- switch --- (enp4s0) Manjaro (enp2s0) --- NAS

What I have tested so far is this:

Manjaro download from NAS via FTP protocol, save on Manjaro local disk -> 50MB/s
Win10PC download from Manjaro via FTP protocol, save on Win10PC local disk -> 30MB/s
Win10PC download from NAS via FTP protocol, save on Manjaro SMB share -> 2MB/s

As you can see, there is a little oddity in the last test, but I'm not sure whether it's SMB related or not.
I'll test a Win10PC download from the NAS via FTP, saving to the Win10PC local disk, to see where the sub-gigabit speeds are coming from. It may well be that the switch in between is not happy doing complete full duplex.
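For comparison with the 1000 Mbit/s link rate, the measured rates can be converted from MB/s to Mbit/s by multiplying by 8. A small sketch (the function name mbyte_to_mbit is only illustrative, and it handles whole numbers only):

```shell
# mbyte_to_mbit RATE
# Convert a whole-number transfer rate from MB/s to Mbit/s (1 byte = 8 bits).
mbyte_to_mbit() {
    echo $(( $1 * 8 ))
}

# The three rates measured above.
for rate in 50 30 2; do
    echo "$rate MB/s = $(mbyte_to_mbit "$rate") Mbit/s"
done
```

This gives 400, 240 and 16 Mbit/s respectively, so the first two are well above the old 100 Mbit/s ceiling while all three are still below the nominal gigabit rate.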

#4

You're welcome, and I'm glad to hear your problem is now resolved.
