
802.3ad bonding aggregator reselection

Message ID: CAKdSkDUfat0bM=WBv-Numnaa0MVKu2V6Rcx8eUeTZjcuDiYLpw@mail.gmail.com
State: RFC, archived
Delegated to: David Miller

Commit Message

Veli-Matti Lintu June 17, 2016, 10:40 a.m. UTC
Hello,

I have been trying to get the bonding driver working with multiple
aggregators across two switches in mode=802.3ad so that failing links
are handled properly. The goal is to always have the best possible
bonded link in use if one or more physical links fail.

The bonding documentation says that 802.3ad with ad_select=bandwidth
or ad_select=count should do this, but I wasn't able to get those (or
ad_select=stable) working without patching the kernel. As I'm not
really familiar with the codebase, I'm not sure whether this is really
a kernel problem or a configuration problem.

Documentation/networking/bonding.txt

ad_select
...
        The bandwidth and count selection policies permit failover of
        802.3ad aggregations when partial failure of the active aggregator
        occurs.  This keeps the aggregator with the highest availability
        (either in bandwidth or in number of ports) active at all times.

        This option was added in bonding version 3.4.0.


The hardware setup consists of two HP 2530-48G switches and servers
that have 6 ports in total, connected to both switches using 3x1Gbps
links each. The port groups are configured as LACP trunks on the
switches. The switches are connected to each other, but they do not
form a single aggregator, so all 6 links cannot be active at the same
time. The NICs use the ixgbe and igb drivers.

Here are the steps tested:

ad_select=stable

1. Enable all links on both switches and boot the server; 3 ports are up
2. Disable one link on the switch that hosts the active aggregator

expected: the link goes down and the port count in /proc/net/bonding/bond0 goes down
result: the link goes down, but the port count in /proc/net/bonding/bond0 does not change

3. Disable all links on the switch that hosts the active aggregator

expected: the links go down and the bond switches to an aggregator that has links up
result: the links go down, the port count in /proc/net/bonding/bond0
does not change, and the connection is lost as there are no links up
in the active aggregator

4. Enable a single link on the active aggregator that has all links down

expected: ?
result: the aggregator with the most links up is activated (in this
case the previously non-active switch that had 3 links up all the
time)

ad_select=bandwidth/count

1. Enable all links on both switches and boot the server; 3 ports are up
2. Disable one link on the switch that hosts the active aggregator

expected: the link goes down, aggregator reselection starts, and the
non-active aggregator with 3 links up becomes active
result: the link goes down, but the port count in
/proc/net/bonding/bond0 does not change and aggregator reselection
does not occur

3. Same as with ad_select=stable

4. Enable a single link on the active aggregator that has all links down

expected: the aggregator with the most links up is activated
result: the aggregator with the most links up is activated (in this
case the previously non-active switch that had 3 links up all the
time)


In all cases miimon does detect the link going down, and if I bring a
slave interface in the non-active aggregator down and back up
(ifconfig/ip), aggregator reselection is done. To me it looks like the
problem is that when a link goes down, nothing re-checks the remaining
state of the bond.
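
For context, here is a simplified sketch of the code path involved, as
I read the 4.x sources (abbreviated, not verbatim kernel code):

/* miimon reports the link change and bond_3ad_handle_link_change()
 * runs. On failure the port is disabled and its actor key refreshed,
 * but nothing flags the port for reselection.
 */
void bond_3ad_handle_link_change(struct slave *slave, char link)
{
        struct port *port = &(SLAVE_AD_INFO(slave)->port);

        if (link == BOND_LINK_UP) {
                port->is_enabled = true;
                ad_update_actor_keys(port, false);
        } else {
                /* link has failed */
                port->is_enabled = false;
                ad_update_actor_keys(port, true);
                /* the patch below adds this line, which forces the
                 * selection state machine to revisit the port:
                 */
                port->sm_vars &= ~AD_PORT_SELECTED;
        }
        /* ... */
}

/* The periodic 802.3ad state machine re-runs aggregator selection only
 * for ports that are not marked SELECTED. That is why clearing the
 * flag above - or bouncing a slave, which clears it indirectly - ends
 * up in ad_agg_selection_logic(), which picks the active aggregator.
 */
static void ad_port_selection_logic(struct port *port, bool *update_slave_arr)
{
        /* if the port is already Selected, do nothing */
        if (port->sm_vars & AD_PORT_SELECTED)
                return;

        /* ... otherwise detach the port, attach it to a suitable
         * aggregator, and call ad_agg_selection_logic() ...
         */
}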

I could get this to happen with the patch at the end of this message,
but I'm not sure what side effects it might cause. Most of the
examples that googling turned up seemed to refer to Cisco gear, so I'm
wondering if there's something hardware-specific here.

Here's /proc/net/bonding/bond0 on an unmodified 4.7-rc3 kernel after
disabling two ports on the switch that hosts the active aggregator.
The active aggregator info still shows 3 ports. The results are the
same on 4.4.x and 4.6.x kernels.

The following options were used:

options bonding mode=4 miimon=100 downdelay=200 updelay=200
xmit_hash_policy=layer3+4 ad_select=1 max_bonds=0 min_links=0
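
For reference, ad_select accepts either a policy name or its number,
so ad_select=1 above is the "bandwidth" policy shown in the /proc
output below. The mapping (paraphrased from include/net/bond_3ad.h)
is:

/* aggregator selection policies */
typedef enum {
        BOND_AD_STABLE = 0,     /* reselect only when the active aggregator loses all its ports */
        BOND_AD_BANDWIDTH = 1,  /* keep the aggregator with the highest total bandwidth active */
        BOND_AD_COUNT = 2,      /* keep the aggregator with the most ports active */
} agg_selection_t;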


Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 2000
Down Delay (ms): 2000

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): bandwidth
System priority: 65535
System MAC address: f2:07:89:4a:7c:9f
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 3
Actor Key: 9
Partner Key: 57
Partner Mac Address: 6c:3b:e5:df:7a:80

Slave Interface: enp5s0f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:34:c7:f1
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: f2:07:89:4a:7c:9f
    port key: 9
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 31360
    system mac address: 6c:3b:e5:df:7a:80
    oper key: 57
    port priority: 0
    port number: 23
    port state: 61

Slave Interface: enp5s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:34:c7:f0
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: f2:07:89:4a:7c:9f
    port key: 9
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 36992
    system mac address: 6c:3b:e5:e0:90:80
    oper key: 57
    port priority: 0
    port number: 23
    port state: 61

Slave Interface: ens6f1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: a0:36:9f:83:3c:41
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: f2:07:89:4a:7c:9f
    port key: 0
    port priority: 255
    port number: 3
    port state: 63
details partner lacp pdu:
    system priority: 31360
    system mac address: 6c:3b:e5:df:7a:80
    oper key: 57
    port priority: 0
    port number: 29
    port state: 61

Slave Interface: ens6f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a0:36:9f:83:3c:40
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: f2:07:89:4a:7c:9f
    port key: 9
    port priority: 255
    port number: 4
    port state: 7
details partner lacp pdu:
    system priority: 36992
    system mac address: 6c:3b:e5:e0:90:80
    oper key: 57
    port priority: 0
    port number: 29
    port state: 53

Slave Interface: ens5f1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: a0:36:9f:83:3d:1f
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: churned
Actor Churned Count: 0
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: f2:07:89:4a:7c:9f
    port key: 0
    port priority: 255
    port number: 5
    port state: 143
details partner lacp pdu:
    system priority: 31360
    system mac address: 6c:3b:e5:df:7a:80
    oper key: 57
    port priority: 0
    port number: 28
    port state: 55

Slave Interface: ens5f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a0:36:9f:83:3d:1e
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: f2:07:89:4a:7c:9f
    port key: 9
    port priority: 255
    port number: 6
    port state: 63
details partner lacp pdu:
    system priority: 36992
    system mac address: 6c:3b:e5:e0:90:80
    oper key: 57
    port priority: 0
    port number: 28
    port state: 61

Here are the results with the patch applied, after links were disabled
and the aggregator has been reselected:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 2000
Down Delay (ms): 2000

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): bandwidth
System priority: 65535
System MAC address: f2:07:89:4a:7c:9f
Active Aggregator Info:
Aggregator ID: 2
Number of ports: 2
Actor Key: 9
Partner Key: 57
Partner Mac Address: 6c:3b:e5:e0:90:80

Slave Interface: enp5s0f1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:34:c7:f1
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: f2:07:89:4a:7c:9f
    port key: 9
    port priority: 255
    port number: 1
    port state: 63
details partner lacp pdu:
    system priority: 31360
    system mac address: 6c:3b:e5:df:7a:80
    oper key: 57
    port priority: 0
    port number: 23
    port state: 61

Slave Interface: enp5s0f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:34:c7:f0
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: f2:07:89:4a:7c:9f
    port key: 9
    port priority: 255
    port number: 2
    port state: 63
details partner lacp pdu:
    system priority: 36992
    system mac address: 6c:3b:e5:e0:90:80
    oper key: 57
    port priority: 0
    port number: 23
    port state: 61

Slave Interface: ens6f1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: a0:36:9f:83:3c:41
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: f2:07:89:4a:7c:9f
    port key: 0
    port priority: 255
    port number: 3
    port state: 7
details partner lacp pdu:
    system priority: 31360
    system mac address: 6c:3b:e5:df:7a:80
    oper key: 57
    port priority: 0
    port number: 29
    port state: 61

Slave Interface: ens6f0
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: a0:36:9f:83:3c:40
Slave queue ID: 0
Aggregator ID: 4
Actor Churn State: monitoring
Partner Churn State: monitoring
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: f2:07:89:4a:7c:9f
    port key: 0
    port priority: 255
    port number: 4
    port state: 135
details partner lacp pdu:
    system priority: 36992
    system mac address: 6c:3b:e5:e0:90:80
    oper key: 57
    port priority: 0
    port number: 29
    port state: 55

Slave Interface: ens5f1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: a0:36:9f:83:3d:1f
Slave queue ID: 0
Aggregator ID: 5
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: f2:07:89:4a:7c:9f
    port key: 0
    port priority: 255
    port number: 5
    port state: 135
details partner lacp pdu:
    system priority: 31360
    system mac address: 6c:3b:e5:df:7a:80
    oper key: 57
    port priority: 0
    port number: 28
    port state: 55

Slave Interface: ens5f0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a0:36:9f:83:3d:1e
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: f2:07:89:4a:7c:9f
    port key: 9
    port priority: 255
    port number: 6
    port state: 63
details partner lacp pdu:
    system priority: 36992
    system mac address: 6c:3b:e5:e0:90:80
    oper key: 57
    port priority: 0
    port number: 28
    port state: 61


Happy hacking!

Veli-Matti

Comments

Veli-Matti Lintu June 21, 2016, 10:50 a.m. UTC | #1
2016-06-20 17:11 GMT+03:00 zhuyj <zyjzyj2000@gmail.com>:
> 5. Switch Configuration
> =======================
>
>         For this section, "switch" refers to whatever system the
> bonded devices are directly connected to (i.e., where the other end of
> the cable plugs into).  This may be an actual dedicated switch device,
> or it may be another regular system (e.g., another computer running
> Linux),
>
>         The active-backup, balance-tlb and balance-alb modes do not
> require any specific configuration of the switch.
>
>         The 802.3ad mode requires that the switch have the appropriate
> ports configured as an 802.3ad aggregation.  The precise method used
> to configure this varies from switch to switch, but, for example, a
> Cisco 3550 series switch requires that the appropriate ports first be
> grouped together in a single etherchannel instance, then that
> etherchannel is set to mode "lacp" to enable 802.3ad (instead of
> standard EtherChannel).

The ports are configured in the switch settings (HP ProCurve
2530-48G) in the same trunk group (TrkX), and the trunk group type is
set to LACP. /proc/net/bonding/bond0 also shows that the three ports
belong to the same aggregator, and bandwidth tests support this. To my
understanding, a ProCurve trunk group is pretty much the same as an
EtherChannel in Cisco's terminology. The bonded link always comes up
properly; the problem is the handling of links going down. Are there
known differences between vendors here?
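
For reference, one such port group on the ProCurve CLI is configured
roughly like this (the port numbers are only an example, not the ones
in this setup):

trunk 1-3 trk1 lacp
show trunks
show lacp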

Veli-Matti
Jay Vosburgh June 21, 2016, 3:46 p.m. UTC | #2
Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:

>2016-06-20 17:11 GMT+03:00 zhuyj <zyjzyj2000@gmail.com>:
>> 5. Switch Configuration
>> =======================
>>
>>         For this section, "switch" refers to whatever system the
>> bonded devices are directly connected to (i.e., where the other end of
>> the cable plugs into).  This may be an actual dedicated switch device,
>> or it may be another regular system (e.g., another computer running
>> Linux),
>>
>>         The active-backup, balance-tlb and balance-alb modes do not
>> require any specific configuration of the switch.
>>
>>         The 802.3ad mode requires that the switch have the appropriate
>> ports configured as an 802.3ad aggregation.  The precise method used
>> to configure this varies from switch to switch, but, for example, a
>> Cisco 3550 series switch requires that the appropriate ports first be
>> grouped together in a single etherchannel instance, then that
>> etherchannel is set to mode "lacp" to enable 802.3ad (instead of
>> standard EtherChannel).
>
>The ports are configured in the switch settings (HP ProCurve
>2530-48G) in the same trunk group (TrkX), and the trunk group type is
>set to LACP. /proc/net/bonding/bond0 also shows that the three ports
>belong to the same aggregator, and bandwidth tests support this. To my
>understanding, a ProCurve trunk group is pretty much the same as an
>EtherChannel in Cisco's terminology. The bonded link always comes up
>properly; the problem is the handling of links going down. Are there
>known differences between vendors here?

	I did the original LACP reselection testing on a Cisco switch,
but I have an HP 2530 now; I'll test it later today or tomorrow and see
if it behaves properly, and whether your proposed patch is needed.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
Veli-Matti Lintu June 21, 2016, 8:48 p.m. UTC | #3
2016-06-21 18:46 GMT+03:00 Jay Vosburgh <jay.vosburgh@canonical.com>:
> Veli-Matti Lintu <veli-matti.lintu@opinsys.fi> wrote:
>
>>2016-06-20 17:11 GMT+03:00 zhuyj <zyjzyj2000@gmail.com>:
>>> 5. Switch Configuration
>>> =======================
>>>
>>>         For this section, "switch" refers to whatever system the
>>> bonded devices are directly connected to (i.e., where the other end of
>>> the cable plugs into).  This may be an actual dedicated switch device,
>>> or it may be another regular system (e.g., another computer running
>>> Linux),
>>>
>>>         The active-backup, balance-tlb and balance-alb modes do not
>>> require any specific configuration of the switch.
>>>
>>>         The 802.3ad mode requires that the switch have the appropriate
>>> ports configured as an 802.3ad aggregation.  The precise method used
>>> to configure this varies from switch to switch, but, for example, a
>>> Cisco 3550 series switch requires that the appropriate ports first be
>>> grouped together in a single etherchannel instance, then that
>>> etherchannel is set to mode "lacp" to enable 802.3ad (instead of
>>> standard EtherChannel).
>>
>>The ports are configured in the switch settings (HP ProCurve
>>2530-48G) in the same trunk group (TrkX), and the trunk group type is
>>set to LACP. /proc/net/bonding/bond0 also shows that the three ports
>>belong to the same aggregator, and bandwidth tests support this. To my
>>understanding, a ProCurve trunk group is pretty much the same as an
>>EtherChannel in Cisco's terminology. The bonded link always comes up
>>properly; the problem is the handling of links going down. Are there
>>known differences between vendors here?
>
>         I did the original LACP reselection testing on a Cisco switch,
> but I have an HP 2530 now; I'll test it later today or tomorrow and see
> if it behaves properly, and whether your proposed patch is needed.

Thanks for taking a look at this. Here are some more details about
the setup, as Zhu Yanjun also requested.

The server in question has two internal 10Gbps ports (using ixgbe)
and two Intel I350-T2 dual-port 1Gbps PCIe cards (using igb). All
ports are running at 1Gbps.

05:00.0 Ethernet controller: Intel Corporation Ethernet Controller
10-Gigabit X540-AT2 (rev 01)
05:00.1 Ethernet controller: Intel Corporation Ethernet Controller
10-Gigabit X540-AT2 (rev 01)
81:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network
Connection (rev 01)
81:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network
Connection (rev 01)
82:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network
Connection (rev 01)
82:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network
Connection (rev 01)

In the test setup the bonds are set up as:

05:00.0 + 81:00.0 + 82:00.0 and
05:00.1 + 81:00.1 + 82:00.1

So each bond uses one port driven by ixgbe and two ports driven by igb.

When testing, I have disabled the port in the switch configuration,
which brings down the link; miimon also sees the link going down on
the server. This should be the same as unplugging the cable, so
nothing comes through the wire to the server.
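
For reference, the per-port disable on the ProCurve CLI goes roughly
like this (the port number is only an example):

interface 23 disable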

Veli-Matti

Patch

--- a/drivers/net/bonding/bond_3ad.c	2016-06-17 09:49:56.236636742 +0300
+++ b/drivers/net/bonding/bond_3ad.c	2016-06-17 10:04:34.309353452 +0300
@@ -2458,6 +2458,7 @@
                 /* link has failed */
                 port->is_enabled = false;
                 ad_update_actor_keys(port, true);
+                port->sm_vars &= ~AD_PORT_SELECTED;
         }
         netdev_dbg(slave->bond->dev, "Port %d changed link status to %s\n",
                    port->actor_port_number,