How / Where do I submit a bug report for a specific driver? (cxgb4)

I first posted this on the /r/manjaro subreddit looking for advice but was redirected here.

I can’t put links in my post, so just turn this into a link on your own:

hXXps://old.reddit.com/r/ManjaroLinux/comments/o92yx8/how_where_do_i_submit_a_bug_report_for_a_specific/?

I am having issues with a particular network driver, and I am not sure whether the issue is known, already fixed, or fixed but not yet merged into the kernel (it looks like 5.13 does not contain any of the very recent patches for the driver). It could also be that the issue is not known at all.

In a nutshell, that’s what I’m trying to establish: is the issue documented below something that needs to be escalated and, if yes, how?


My system:

   OS: Manjaro Linux x86_64 
   Kernel: 5.12.9-1-MANJARO 
   Uptime: 38 mins 
   Packages: 1024 (pacman), 5 (flatpak), 5 (snap) 
   Shell: zsh 5.8  
   DE: Plasma 5.21.5 
   WM: KWin 
   CPU: AMD Ryzen 9 5950X (32) @ 3.400GHz 
   GPU: AMD ATI 0a:00.0 Navi 22 
   Memory: 11085MiB / 64294MiB 

The device in question is a Chelsio T520-CR, which uses the cxgb4 driver:

$ ethtool -i enp11s0f4d1 
driver: cxgb4
version: 5.12.9-1-MANJARO
firmware-version: 1.25.4.0, TP 0.1.4.9
expansion-rom-version: 1.0.0.68
bus-info: 0000:0b:00.4
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

The Problem

When I wake my system from sleep, the NIC does not show any link. I can still interact with it via ethtool and ifconfig, but no packets flow. When I try ifconfig down/up, I get an interesting error:

$ sudo ifconfig enp11s0f4d1 down
$ echo $?
0

$ sudo ifconfig enp11s0f4d1 up
SIOCSIFFLAGS: Protocol error

It’s actually the SIOCSIFFLAGS: Protocol error message that prompted me to open this thread.

When I check dmesg:

# I believe this is right around the time I issued the ifconfig up command that errored
[ 1700.148846] cxgb4 0000:0b:00.3: Device not initialized
[ 1700.191310] cxgb4 0000:0b:00.2: Device not initialized
[ 1700.247567] cxgb4 0000:0b:00.1: Device not initialized
[ 1700.267269] cxgb4 0000:0b:00.0: Device not initialized
<...>
# And this is around the time that I re-inserted the module, I think
[ 1714.364826] cxgb4 0000:0b:00.4: Coming up as MASTER: Initializing adapter
[ 1715.567594] cxgb4 0000:0b:00.4: Successfully configured using Firmware Configuration File "/lib/firmware/cxgb4/t5-config.txt", version 0x1425001c, computed checksum 0xd8c8fbd6
[ 1715.730935] cxgb4 0000:0b:00.4: Hash filter supported only on T6
[ 1715.781356] cxgb4 0000:0b:00.4: max_ordird_qp 21 max_ird_adapter 387072
[ 1715.821211] cxgb4 0000:0b:00.4: Current filter mode/mask 0x632b:0x21
[ 1715.883843] cxgb4 0000:0b:00.4: 128 MSI-X vectors allocated, nic 32 eoqsets 34 per uld 8 mirrorqsets 2
[ 1715.883857] cxgb4 0000:0b:00.4: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 1715.912304] cxgb4 0000:0b:00.4 eth0: eth0: Chelsio T520-CR (0000:0b:00.4) 1G/10GBASE-SFP
[ 1715.912529] cxgb4 0000:0b:00.4 eth1: eth1: Chelsio T520-CR (0000:0b:00.4) 1G/10GBASE-SFP
[ 1715.913172] cxgb4 0000:0b:00.4 enp11s0f4: renamed from eth0
[ 1715.941722] cxgb4 0000:0b:00.4 enp11s0f4d1: renamed from eth1
[ 1715.950925] cxgb4 0000:0b:00.4: Chelsio T520-CR rev 0
[ 1715.950929] cxgb4 0000:0b:00.4: S/N: PT26140032, P/N: 110116050E0
[ 1715.950930] cxgb4 0000:0b:00.4: Firmware version: 1.25.4.0
[ 1715.950931] cxgb4 0000:0b:00.4: Bootstrap version: 1.1.0.0
[ 1715.950932] cxgb4 0000:0b:00.4: TP Microcode version: 0.1.4.9
[ 1715.950932] cxgb4 0000:0b:00.4: Expansion ROM version: 1.0.0.68
[ 1715.950933] cxgb4 0000:0b:00.4: Serial Configuration version: 0x1004000
[ 1715.950934] cxgb4 0000:0b:00.4: VPD version: 0x2
[ 1715.950935] cxgb4 0000:0b:00.4: Configuration: RNIC MSI-X, Offload capable
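
For anyone putting together a report from output like this, here is a minimal sketch of pulling only the driver's lines out of the kernel log. The function name `driver_log` is made up for illustration, and it assumes the driver name appears literally in each relevant line, as it does in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): keep only the kernel log
# lines that mention a given driver. Reads the log on stdin.
driver_log() {
    driver="${1:-cxgb4}"
    # -F treats the driver name as a fixed string, not a regex
    grep -F -- "$driver"
}

# Example usage (dmesg may need root, depending on kernel settings):
#   sudo dmesg | driver_log cxgb4 > cxgb4-dmesg.txt
```

Saving the filtered log to a file right after a failed resume keeps the "Device not initialized" lines together with their timestamps, which is usually easier to attach to a report than a full dmesg dump.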

What I’ve tried:

  • Updated my BIOS to the latest (2021-06-13)
  • Pored through the BIOS looking for every setting that relates to power management and devices on the PCIe bus. Toggled things on/off and tested. No change.
  • Googled for cxgb4 sleep issues and related things. I didn’t find much. The links that DO show up are for issues that are quite old in most cases. I did find one link that’s recent. More on that below…
  • Checked for any NIC firmware updates (not that I know how to apply them…). I found a recent (2021-05-21) release of the Chelsio drivers for Linux, 3.14.0.3, which contains firmware slightly newer than what appears to be running on the card right now: ChelsioUwire-3.14.0.3/src/network/firmware/t4fw-1.25.6.0.bin. I don’t know where the firmware changelog is, but I really don’t think the issue is caused by the delta between 1.25.4 and 1.25.6.

What I’ve figured out:

Despite the SIOCSIFFLAGS: Protocol error message, I can get my NIC back up and working if I just remove and re-insert the kernel module:

$ sudo rmmod cxgb4
*works* 

$ ethtool -i enp11s0f4d1
Cannot get driver information: No such device
(expected)

$ sudo modprobe cxgb4
$ ethtool -i enp11s0f4d1 
driver: cxgb4
<...>                                                                                                                                                          

I can do this every time I take the system out of sleep, but I’d prefer not to. Which brings me to my questions…

My questions:

  1. Is the SIOCSIFFLAGS: Protocol error something that should be reported to the driver maintainer for the card?
  2. If yes, who is that / where / how do I report it?
  3. I did find a few commits* that seem to be updates to the driver for this NIC, but I don’t fully understand what the commits are fixing/addressing. They sound related to my issue, but I can also totally understand if the patches in the link are for something else entirely.

*: I can’t put links in post, so: hXXps://www.spinics.net/lists/netdev/msg747745.htm

Thanks for your time / advice.

is part of the linux-firmware

this should help on how and where to report the bug:
https://kernelnewbies.org/FoundBug
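
As a rough sketch of finding out who to contact: the kernel source tree ships a `scripts/get_maintainer.pl` helper that lists maintainers and mailing lists for a given path. The wrapper below assumes you have a kernel source checkout somewhere; the function name `find_maintainers` is made up for illustration, and the driver path is where cxgb4 lives in the mainline tree.

```shell
#!/bin/sh
# Hypothetical wrapper (name is an assumption) around the kernel tree's
# own scripts/get_maintainer.pl. Takes the path to a kernel source checkout.
find_maintainers() {
    tree="${1:-.}"
    driver_path="drivers/net/ethernet/chelsio/cxgb4/"
    if [ ! -x "$tree/scripts/get_maintainer.pl" ]; then
        echo "no kernel source tree at: $tree" >&2
        return 1
    fi
    # -f asks for the maintainers of a file or directory rather than a patch
    (cd "$tree" && ./scripts/get_maintainer.pl -f "$driver_path")
}

# Example usage, assuming a checkout in ~/src/linux:
#   find_maintainers ~/src/linux
```

The output should include the people and mailing lists listed for the driver in the MAINTAINERS file, which are the addresses a report would normally go to.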

I’ve marked this answer as the solution to your question as it is by far the best answer you’ll get.

However, if you disagree with my choice, please feel free to take any other answer as the solution to your question or even remove the solution altogether: You are in control! (If you disagree with my choice, just send me a personal message and explain why I shouldn’t have done this or :heart: or :+1: if you agree)

:innocent:
P.S. In the future, please don’t forget to come back, click the 3 dots below the answer that helped you most, and mark it as the solution, so that the next person with the exact same problem will benefit from your post and your question will show the “solved” status.

Thanks for linking to the kernel newbies guide for a new bug. Unfortunately, some of the links on that page are dead :(.

Life and other projects got in the way and are demanding a lot of my time at the moment. While not ideal, a quick sudo rmmod cxgb4; sudo modprobe cxgb4 on resume from sleep is enough to keep me unblocked for now.
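
For reference, that manual reload could probably be automated with a systemd sleep hook. This is only a sketch under assumptions: the file name and install location (an executable under /usr/lib/systemd/system-sleep/) are my choice, and the `DRY_RUN` variable is a hypothetical switch added here so the logic can be checked without touching the module. systemd-sleep calls each hook with `pre` or `post` as the first argument and the sleep type (such as `suspend`) as the second.

```shell
#!/bin/sh
# Hypothetical hook, e.g. /usr/lib/systemd/system-sleep/cxgb4-reload
# (file name is an assumption). Must be executable; systemd runs it as root.
# $1 = pre|post, $2 = suspend|hibernate|hybrid-sleep|suspend-then-hibernate

MODULE="cxgb4"

reload_on_resume() {
    phase="${1:-}"
    if [ "$phase" = "post" ]; then
        # With DRY_RUN set (hypothetical switch), print the commands
        # instead of running them. modprobe only runs if rmmod succeeded.
        ${DRY_RUN:+echo} rmmod "$MODULE" &&
            ${DRY_RUN:+echo} modprobe "$MODULE"
    fi
}

reload_on_resume "$@"
```

This only papers over the problem, of course. It does nothing on the `pre` phase, so the driver still goes into sleep in whatever state it was in.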

I was able to confirm that the 5.13 kernel likely contains the few driver patches that I suspect address this problem. My plan now is to keep working as I have been and wait for the 5.13 kernel to make it to Manjaro in the next few days. If the sleep issue persists with the 5.13 kernel, I’ll dedicate some time to reaching out to the maintainers at Chelsio.
