LACP Bonding 2 Ethernet Ports on Raspberry Pi: No Gateway

I’ll start out by saying I’m just doing this to see if it can be done.

So, I’ve got a 2.5GbE switch that supports LACP/interface bonding, so I thought it would be interesting to see if I could bond together the Pi 4’s onboard 1GbE with a 1GbE USB dongle.

Good news: I managed to create the bond. It gets an IP from the router.

Bad news: It can’t route anywhere on the network. Not even to the router it’s getting an IP from. Somehow.

Bizarre news: NoMachine, which uses some sort of smart discovery protocol to find servers, can still connect to it just fine. I’m remoted into it right now, which is great since it’s a headless machine. So it can communicate with the outside world; it just can’t route via IPv4 or IPv6.

Diagnostics below. Any ideas?

$ ip addr show bond1
5: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether $MAC brd ff:ff:ff:ff:ff:ff
inet 10.0.4.58/24 brd 10.0.4.255 scope global dynamic noprefixroute bond1
valid_lft 85319sec preferred_lft 85319sec
inet6 $IPv6ADDR/64 scope global dynamic noprefixroute
valid_lft 603720sec preferred_lft 603720sec
inet6 fe80::6f08:13a6:7ca1:1889/64 scope link noprefixroute
valid_lft forever preferred_lft forever

Some things in the bond description seem a bit off (slow rate?), but I’m not sure how to fix them. All the documentation I find on the internet appears to be (1) outdated; (2) inapplicable to Manjaro ARM (there’s no “netctl” service running); or (3) assuming that just creating the bond in Network Manager’s GUI is enough to make it work.

$ cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driver: v5.10.9-1-MANJARO-ARM

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 1
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable

Slave Interface: enp1s0u1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: $MAC
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: monitoring
Partner Churn State: monitoring
Actor Churned Count: 0
Partner Churned Count: 0

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: $MAC
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: monitoring
Partner Churn State: monitoring
Actor Churned Count: 0
Partner Churned Count: 0
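
One detail in the output above that can be checked mechanically is the Aggregator ID of each slave. As far as I understand the bonding driver, slaves that negotiated LACP with the same partner end up in the same aggregator, so differing IDs would point at the switch side. A minimal sketch, assuming the field labels shown above; the function names `bond_aggregators` and `same_aggregator` are made up for illustration:

```shell
#!/bin/sh
# Print "<slave> <aggregator id>" for each slave in a bonding status file.
# Assumes the "Slave Interface:" and "Aggregator ID:" labels shown above.
bond_aggregators() {
    awk -F': ' '
        /^Slave Interface:/ { slave = $2 }
        /^Aggregator ID:/ && slave != "" { print slave, $2 }
    ' "$@"
}

# Succeed only if at least one slave is listed and all slaves share one ID.
same_aggregator() {
    bond_aggregators "$@" | awk '
        { ids[$2] = 1; n++ }
        END {
            c = 0
            for (i in ids) c++
            exit !(n > 0 && c == 1)
        }'
}

# Usage (bond1 is the bond name from this thread):
#   bond_aggregators /proc/net/bonding/bond1
#   same_aggregator /proc/net/bonding/bond1 && echo "slaves share one aggregator"
```

Both functions read standard input when no file is given, so they can also be fed a saved copy of the output.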

I am not a network expert, but I guess the first thing to check is the route to the router.

$ ip route

default via 10.0.2.4 dev wlan0 proto static metric 600  

The IP displayed on this line should be the inside IP of the router.

And if that checks out, what happens if you ping the router? Are all pings replied to, or are some dropped?
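
If it helps, here is a rough sketch of that check: it pulls the gateway and device out of each default route and pings the gateway through that device. The function names are hypothetical, and it assumes the usual `default via <gateway> dev <device>` layout of `ip route` output:

```shell
#!/bin/sh
# List "<device> <gateway>" for every default route in `ip route` output.
default_gateways() {
    awk '$1 == "default" {
        gw = ""; dev = ""
        for (i = 2; i < NF; i++) {
            if ($i == "via") gw = $(i + 1)
            if ($i == "dev") dev = $(i + 1)
        }
        if (gw != "" && dev != "") print dev, gw
    }' "$@"
}

# Ping each default gateway out of the interface its route uses.
# -I binds ping to that interface; -c 4 sends four probes.
ping_gateways() {
    ip route | default_gateways | while read -r dev gw; do
        echo "== $gw via $dev"
        ping -c 4 -I "$dev" "$gw"
    done
}

# Usage: ping_gateways
```

Dropped replies here, as opposed to none at all, would suggest a flapping link rather than a missing route.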

Edit: I just found this excellent link. It may not have the answer you need to resolve this issue, but since you are

it may encourage even more ideas to try. :slight_smile:

~]$ ip route
default via 10.0.4.1 dev bond1 proto dhcp metric 20300 
10.0.4.0/24 dev bond1 proto kernel scope link src 10.0.4.58 metric 300 

So…

10.0.4.1 is the router address, and it is presently not reachable by ping. No route/100 percent packet loss.

At the same time, the router sees the Pi as a valid DHCP client, with the assigned address of 10.0.4.58.

That link is great, and I think if I could follow those instructions I’d be set. I’ve been feeling like Network Manager’s GUI/TUI is not giving me enough fine control.

On the other hand, I can’t figure out how to translate those instructions into something I can use in Manjaro ARM.

I don’t actually have an /etc/sysconfig/network directory, and I can’t find where the actual text config files for the interfaces are. They’re not in any of the possible places I’m being told to look by the Manjaro and Arch wikis. All I can find is, like, the virtual network adapter text configs for docker stuff.

If I could find the config files, I think I could make this work, but at this point I’m very confused and spending all my Linux tinkering time with this, instead of the long list of actually fun projects I want to try.

I’d still like to make this work, but in the interests of time and of actually taking advantage of my 2.5GbE switch, I just ordered a USB3-to-2.5GbE adapter. :stuck_out_tongue:

Well, something had to work well enough to get a DHCP address. That could have been done via a single interface before, or in conjunction with, the construction of the bond; I’m not sure how this works. So it may not have been the bond itself that acquired the address, and the issue could be in the bond configuration.

Do you use a firewall? Make sure it is disabled while you are attempting this.

A bond is common enough, nmtui should offer sufficient options. The issue could be on the router too, hard to know without using something like tcpdump/wireshark.
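
For the tcpdump route, LACP frames are easy to isolate because they use the Slow Protocols EtherType 0x8809. A sketch, assuming tcpdump is installed; the function name `capture_lacp` is only for illustration:

```shell
#!/bin/sh
# Capture a few LACP frames on one physical slave (not on the bond itself).
# LACPDUs use the Slow Protocols EtherType 0x8809.
#   -e  print link-level headers (source and destination MAC)
#   -n  do not resolve names
#   -c  stop after N packets
capture_lacp() {
    iface=$1
    count=${2:-4}
    sudo tcpdump -i "$iface" -e -n -c "$count" ether proto 0x8809
}

# Usage: capture_lacp eth0
#        capture_lacp enp1s0u1
```

With the LACP rate set to slow, frames go out roughly every 30 seconds, so give it a minute or two. If only frames sent by the Pi show up and none from the switch, the switch port is probably not running LACP.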

Agreed. There are so many possible points of failure that it’s difficult to pin down exactly what’s going on. Firewall is turned off while I troubleshoot this.

nmtui will let me set up the bond, but it won’t let me do things like define the default gateway or adjust the advanced LACP settings. I need to edit the text files for that, which I can’t find.
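
For reference, the bond options that nmtui hides can be set from the command line through the `bond.options` property of the connection, without touching the files directly. A minimal sketch, assuming the connection id is "Ethernet 2Gbps Bond" as nmtui named it here; the `miimon` and `lacp_rate` values are illustrative, not a confirmed fix, and the `run` helper is hypothetical:

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

CON="Ethernet 2Gbps Bond"   # connection id from this thread
# Illustrative values only: 100 ms link polling and fast LACP rate.
BOND_OPTS="mode=802.3ad,miimon=100,lacp_rate=fast"

set_bond_options() {
    # Replace the bond's option set, then re-activate the connection.
    run nmcli connection modify "$CON" bond.options "$BOND_OPTS"
    run nmcli connection up "$CON"
}

# Usage: set_bond_options             (prints the commands)
#        DRY_RUN=0 set_bond_options   (applies them)
```

The dry-run default is deliberate: on a headless machine reached over the network, it is worth reading the commands before letting them re-activate the only connection.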

I’m far more concerned that I can’t actually find the underlying text config files.

At the same time, there are clearly some issues bringing the bond up. Here’s what I see via ip address show, as well as after boot in journalctl.

Two issues immediately jump out:

  1. The bond does not seem to have an actual valid MAC address.
  2. The two bonded interfaces go up and down repeatedly, so it wouldn’t surprise me if that’s getting them out of sync somehow.

2: eth0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond1 state UP group default qlen 1000
    link/ether $MACADDR brd ff:ff:ff:ff:ff:ff permaddr $REALMACADDR
4: enp1s0u1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master bond1 state UP group default qlen 1000
    link/ether $MACADDR brd ff:ff:ff:ff:ff:ff permaddr $REALMACADDR
5: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether $MACADDR brd ff:ff:ff:ff:ff:ff
    inet 10.0.4.58/24 brd 10.0.4.255 scope global dynamic noprefixroute bond1
       valid_lft 77764sec preferred_lft 77764sec
    inet6 $IPv6/64 scope global dynamic noprefixroute 
       valid_lft 596159sec preferred_lft 596159sec
    inet6 fe80::6f08:13a6:7ca1:1889/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
> ~]$ journalctl -k 
> -- Journal begins at Fri 2021-02-19 23:21:09 CST, ends at Sat 2021-02-20 03:57:35 CST. --
> Feb 20 01:41:32 teletraan1 kernel: bcmgenet fd580000.ethernet: configuring instance for external RGMII (RX delay)
> Feb 20 01:41:32 teletraan1 kernel: bcmgenet fd580000.ethernet eth0: Link is Down
> Feb 20 01:41:32 teletraan1 kernel: bond1: Invalid ad_actor_system MAC address.
> Feb 20 01:41:32 teletraan1 kernel: bond1: option ad_actor_system: invalid value (00:00:00:00:00:00)
> Feb 20 01:41:35 teletraan1 kernel: bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
> Feb 20 01:41:35 teletraan1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> Feb 20 01:41:35 teletraan1 kernel: bcmgenet fd580000.ethernet eth0: Link is Down
> Feb 20 01:41:35 teletraan1 kernel: bcmgenet fd580000.ethernet: configuring instance for external RGMII (RX delay)
> Feb 20 01:41:35 teletraan1 kernel: bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
> Feb 20 01:41:35 teletraan1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> Feb 20 01:41:35 teletraan1 kernel: bcmgenet fd580000.ethernet eth0: Link is Down
> Feb 20 01:41:35 teletraan1 kernel: bcmgenet fd580000.ethernet: configuring instance for external RGMII (RX delay)
> Feb 20 01:41:35 teletraan1 kernel: bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
> Feb 20 01:41:35 teletraan1 kernel: bond1: (slave eth0): Enslaving as a backup interface with an up link
> Feb 20 01:41:35 teletraan1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): bond1: link becomes ready
> Feb 20 01:41:36 teletraan1 kernel: ax88179_178a 2-1:1.0 enp1s0u1: ax88179 - Link status is: 1
> Feb 20 01:41:36 teletraan1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0u1: link becomes ready
> Feb 20 01:41:36 teletraan1 kernel: ax88179_178a 2-1:1.0 enp1s0u1: ax88179 - Link status is: 1
> Feb 20 01:41:36 teletraan1 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0u1: link becomes ready
> Feb 20 01:41:37 teletraan1 kernel: ax88179_178a 2-1:1.0 enp1s0u1: ax88179 - Link status is: 1
> Feb 20 01:41:37 teletraan1 kernel: bond1: (slave enp1s0u1): Enslaving as a backup interface with a down link
> Feb 20 01:41:39 teletraan1 kernel: ax88179_178a 2-1:1.0 enp1s0u1: ax88179 - Link status is: 1
> Feb 20 01:41:39 teletraan1 kernel: bond1: (slave enp1s0u1): link status definitely up, 1000 Mbps full duplex
> Feb 20 01:41:39 teletraan1 kernel: bond1: active interface up!
> Feb 20 01:41:40 teletraan1 kernel: ax88179_178a 2-1:1.0 enp1s0u1: ax88179 - Link status is: 0
> Feb 20 01:41:40 teletraan1 kernel: bond1: (slave enp1s0u1): link status definitely down, disabling slave
> Feb 20 01:41:43 teletraan1 kernel: ax88179_178a 2-1:1.0 enp1s0u1: ax88179 - Link status is: 1
> Feb 20 01:41:43 teletraan1 kernel: bond1: (slave enp1s0u1): link status definitely up, 1000 Mbps full duplex

Have you tested the USB adapter, just to make sure it is stable? It does not appear to be a particular problem. Something is not happy, possibly even in software, such as two network managers trying to manage the devices. But yes, the bad MAC indicates something is wrong in the bond configuration.

Maybe the configuration is located here?

/etc/NetworkManager/system-connections/

/etc/sysconfig/network-scripts is something from the RHEL world (including Fedora, CentOS, Oracle Linux, …). If you follow tutorials from the RHEL world, you need to switch distributions. It won’t work in Manjaro or Arch.

As I told you before, you need to choose the right tool for the job. NetworkManager is not the tool you can and should use if you want to do something more advanced in networking.

Well, since I can’t actually get into /etc/NetworkManager/system-connections/, I’m going to assume the config files are there. :stuck_out_tongue: Part of me is tempted to just pull the drive and plug it in another computer so I can look at that directory.

Observe:

NetworkManager]$ cd system-connections/
bash: cd: system-connections/: Permission denied
[panoptitom@teletraan1 NetworkManager]$ sudo cd system-connections/
sudo: cd: command not found

I have no idea why it works this way. The OS will gladly let me destroy the installation via sudo rm -rf /, but won’t let me see the text configuration files for the network. Magnificent. (I mean, I know there must be a reason, but I am boggling.)

@xabbu , with all respect, I don’t consider your reply that helpful. I do in fact know which distribution I’m running, and I’ve also been trying to follow the directions on both the Arch and Manjaro wikis without success–the config files aren’t where those wikis say they should be, either. At this point I’m reading anything I can about port trunking/LACP/bonding to make sure I understand the basics of how it’s supposed to work at the kernel level, and most of the write-ups for it are written using RHEL-type distros. I am very aware that those instructions don’t translate 1:1 and I need to do the work of figuring out what’s going on and how to do it in Manjaro-land.

And as noted elsewhere in this thread, port trunking/bonding should not be such a complex operation that I have to swap out the part of the system software that actually manages the network interfaces. More to the point, since this is something I do as a hobby for (supposed) fun, I’m not really interested in taking my install that far off the path of default packages and configurations.

Ideally, since I’m willing, I should just be able to edit some text files somewhere to change the bond configuration, but I can’t get access to those for … some reason.

I certainly don’t need LACP enough to rip out and replace parts of the OS–not when I can just buy a faster network adapter. I was going to use LACP to save $30, but I’ve already spent way more than $30 in time screwing with it, and now that I see that I can’t even look in the directory where the configuration files might be, I’m pretty close to done.

The limited nature of Network Manager vs. my desire to be able to readily implement more advanced networking configurations is just one reason I’m gravitating away from Manjaro in my consideration of what Linux distribution to use on my next server build.

cd is a command built into the bash shell, your environment. sudo is for executing binaries and scripts.

Personally, I switch to root via su, but that is up to you.
You could use sudo to change permissions so you could cd into it. Then change them back when you are done.
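
A small sketch of the difference, and of a way to look inside the directory without changing its permissions. The two function names are just for illustration:

```shell
#!/bin/sh
# cd is part of the shell itself, so there is no separate program for sudo to run.
type cd

# List the directory as root without changing into it.
list_nm_connections() {
    sudo ls -la /etc/NetworkManager/system-connections/
}

# Print one connection file as root. Quote the name if it contains spaces.
show_nm_connection() {
    sudo cat "/etc/NetworkManager/system-connections/$1"
}

# Usage: list_nm_connections
#        show_nm_connection "<file name>"
```

`sudo -i` is another option: it starts a root login shell, and inside that shell cd works as usual.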

You could try using Adaptive load balancing (balance-alb) as it doesn’t require any switch support, it’s transparent for the other network devices.

I’ve used it in Proxmox and it works great.

You can read about it here:

https://enterprisenetworkingplanet.com/netsysm/article.php/3697756/Tips-and-Tuning-for-Ethernet-Bonding-With-Linux.htm

Thanks for that. I need to read up on su, @0n0w1c . I’ve never used it before and I don’t really understand how it works. I think–and this is without wiki’ing anything–sudo lets me execute binaries/scripts, as you said, but doesn’t change what user is doing it, and su lets me in effect become the root user? Is that right?

I didn’t realize cd was an internal command for bash. I thought it was just another little program. Since I’m running bash the whole time as me, not root, it makes sense that it wouldn’t work with sudo.

If I manage to get into that file and get the contents of the bond1 config, I’ll let you know.

@Glock24 I might end up trying that. I’ve got a NAS connected to the same switch via LACP, so I know the switch supports it. Part of me wants to be very stubborn about using all the supported hardware features of the switch.

@0n0w1c , it worked! :slight_smile: Granted my stress level is a bit higher at the moment since I’m at a root prompt, but.

system-connections]# ls -la
total 24
drwx------ 2 root root 4096 Feb 20 04:13  .
drwxr-xr-x 7 root root 4096 Feb  4 16:45  ..
-rw------- 1 root root  260 Feb 20 04:13 'bond1 slave 1.nmconnection'
-rw------- 1 root root  264 Feb 20 04:13 'bond1 slave 2.nmconnection'
-rw------- 1 root root  336 Feb 20 04:13 'Ethernet 2Gbps Bond.nmconnection'
-rw------- 1 root root  359 Feb 15 01:30 'On-Board Ethernet.nmconnection'

What immediately jumps out at me is, even though there’s a file for the on-board ethernet settings, there’s not one for the USB module.

# cat 'On-Board Ethernet.nmconnection' 
[connection]
id=On-Board Ethernet
uuid=2e35c740-4618-382e-aba7-58691d19c935
type=ethernet
autoconnect-priority=-999
interface-name=eth0
permissions=
timestamp=1613373561

[ethernet]
duplex=full
mac-address-blacklist=
speed=1000

[ipv4]
dns-search=
may-fail=false
method=auto

[ipv6]
addr-gen-mode=stable-privacy
dns-search=
ip6-privacy=0
method=auto

[proxy]

# cat 'Ethernet 2Gbps Bond.nmconnection' 
[connection]
id=Ethernet 2Gbps Bond
uuid=55ee0fce-2600-4155-afc7-5c1255ffe642
type=bond
autoconnect-priority=100
interface-name=bond1
permissions=
timestamp=1613815892

[bond]
downdelay=0
miimon=1
mode=802.3ad
updelay=0

[ipv4]
dns-search=
method=auto

[ipv6]
addr-gen-mode=stable-privacy
dns-search=
ip6-privacy=0
method=auto

[proxy]

# cat 'bond1 slave 1.nmconnection' 
[connection]
id=bond1 slave 1
uuid=5f7bcdf6-408a-4ea2-9de5-b9b849784e2d
type=ethernet
interface-name=eth0
master=bond1
permissions=
slave-type=bond
timestamp=1613815892

[ethernet]
cloned-mac-address=stable
mac-address=$MAC1
mac-address-blacklist=

# cat 'bond1 slave 2.nmconnection' 
[connection]
id=bond1 slave 2
uuid=2245722b-d3c6-43df-bad6-f98a788298d1
type=ethernet
interface-name=enp1s0u1
master=bond1
permissions=
slave-type=bond
timestamp=1613815892

[ethernet]
cloned-mac-address=stable
mac-address=$MAC2
mac-address-blacklist=

Observations:

  1. eth0/On Board Ethernet’s autoconnect priority is -999.
  2. It seems like one of these interfaces should be listed as the master, and one of them should be listed as the slave. Both of them are listed as slaves, with bond1 being the master. Is that normal?
  3. Both slave1 and slave2 are using “stable” cloned MAC addresses. Should I designate eth0 as “permanent?” Apparently, if I do that it’s supposed to make the system use the MAC address of eth0 as the MAC address of the bonded connection, but when I tried that before it didn’t work.
  4. The actual bond.nmconnection does not list a MAC address at all, which concerns me since it was throwing an error.

I’m going to use the tutorial you sent me, @0n0w1c , to see if I can’t figure out what, if anything, is missing from these files.

I’m starting to suspect that the USB adapter just might not be up to the task. The lack of configuration file and the fact that it takes about 20-30 seconds longer to come up every time the machine is restarted make me wonder…

I suspect the “On Board Ethernet” file is from when you used eth0 as a normal ethernet port, not as part of this bond setup. Both interfaces being slaves seems reasonable to me. If you look at the nmcli commands, mybond0 is made first, then the slaves: the actual interfaces are added to master bond0. So up to this point, I think you are in good shape.

But I agree, no MAC address, seems wrong to me.
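
Translated to this setup, that order would look roughly like the sketch below. It assumes the interface names from this thread, uses illustrative bond options, and relies on a hypothetical `run` helper that only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

BOND=bond1

recreate_bond() {
    # 1. The bond (master) connection, which carries the IP configuration.
    run nmcli connection add type bond con-name "$BOND" ifname "$BOND" \
        bond.options "mode=802.3ad,miimon=100"
    # 2. One slave connection per physical interface, each pointing at the bond.
    for ifname in eth0 enp1s0u1; do
        run nmcli connection add type ethernet slave-type bond \
            con-name "$BOND-$ifname" ifname "$ifname" master "$BOND"
    done
    # 3. Activate the bond. Slaves normally follow if they autoconnect;
    #    otherwise bring them up by name.
    run nmcli connection up "$BOND"
}

# Usage: recreate_bond             (prints the commands)
#        DRY_RUN=0 recreate_bond   (applies them)
```

Any existing connections for the same interfaces would need to be removed first with `nmcli connection delete`, otherwise they may compete for the devices.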

@Glock24 , when you set this up, did you happen to look at the config file to see if the bond had a MAC address listed?

I wonder if it’s as simple as manually adding one by editing the text file…

I’ll have to experiment.

(Since I’ve already ordered a USB 3.0 2.5GbE adapter, I could stop, but I’d like to know I could make this work if I wanted to. :slight_smile: )

In the nmcli link above, there is a screenshot a page or two into the doc. The $MACADDR seen in one of your posts above is the location of the error; an actual MAC address should be seen there. How and why that is displayed will likely lead to the answer of what has gone wrong.

I do not think there is a technical reason why this would not work. Up to you, how far you wish to pursue it.

Hey, thanks again for that link. I just had a chance to really look at it for the first time a few minutes ago.

After reading it, I’m thinking my best bet is to blast out the existing bond I created in the GUI, and recreate the thing using the CLI. At least that way I can get better feedback as I go.

Since the current setup is completely not working, I can’t make it any worse. :slight_smile:

EDIT:

@0n0w1c This thing is so bizarre. One reason I keep wanting to throw in the towel is the failure modes don’t make any sense.

I’ve deleted the bond and just have two separate ethernet adapters that both communicate with the router… in whatever way the OS chooses to use them, I guess. They’re not bonded at all.

  1. I can ping google.com without issue.
  2. I can ping my NAS.
  3. I can’t ping the router.

My other computer can ping the router just fine, btw.

What even?
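
One thing that might narrow this down, with two interfaces sitting on the same subnet, is asking the kernel which interface and source address it would pick for each destination. A rough sketch, assuming the usual `ip route get` output layout; the function names are made up for illustration:

```shell
#!/bin/sh
# Extract "<dev> <src>" from the first line of `ip route get` output.
route_dev_src() {
    awk 'NR == 1 {
        dev = ""; src = ""
        for (i = 1; i < NF; i++) {
            if ($i == "dev") dev = $(i + 1)
            if ($i == "src") src = $(i + 1)
        }
        print dev, src
    }' "$@"
}

# Show the chosen egress interface and source address for each destination.
show_paths() {
    for dst in "$@"; do
        printf '%s -> ' "$dst"
        ip route get "$dst" | route_dev_src
    done
}

# Usage: show_paths 10.0.4.1 <NAS address>
#        ip neigh show 10.0.4.1
```

The `ip neigh` line shows whether the router's address has resolved to a MAC address at all, which would separate an ARP problem from a routing one.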

There’s no MAC address in the config file for the network, but then I’m not using NetworkManager in Proxmox (Debian) but rather the config files in /etc/network/interfaces.

After that, I restarted and it decided it was not going to connect to anything anymore. It wouldn’t even let me ping it, at either interface.

Pulling the USB interface and rebooting again got it back, but I did not appreciate that since it crapped out when I rebooted after a system update.

I do not like having to hard power cycle the Pi while the nvme is plugged in. I’m going to give up on this for now before I actually break something.

Thanks for all your help.

Sorry it did not work out, but you made a good attempt and probably learned something new along the way. So not all was lost. :slight_smile:

No need to apologize. You kept helping me even when I entered the “contemplate smashing the Pi with a hammer” phase of troubleshooting. :slight_smile:

In fact, based on what you shared with me, I’m pretty sure I could get this working, but I’m so slammed this week that I wouldn’t be able to start over again until this weekend, and by that point I’ll have the 2.5GbE adapter.

It should give me better performance, and I’m guessing that’s about as far as I can reasonably expect the CPU to go. I doubt, even if I wanted to buy two adapters, the CPU could handle two 2.5GbE ports bonded together.