[v2] bonding: force enable lacp port after link state recovery for 802.3ad

Message ID 20190823034209.14596-1-zhangsha.zhang@huawei.com
State Rejected
Delegated to: David Miller
Series [v2] bonding: force enable lacp port after link state recovery for 802.3ad

Commit Message

zhangsha (A) Aug. 23, 2019, 3:42 a.m. UTC
From: Sha Zhang <zhangsha.zhang@huawei.com>

After commit 334031219a84 ("bonding/802.3ad: fix slave link
initialization transition states") was merged,
the slave's link status changes to BOND_LINK_FAIL
from BOND_LINK_DOWN in the following scenario:
- The driver reports loss of carrier and the
  bonding driver receives the NETDEV_DOWN notifier;
- The slave's duplex and speed are zeroed and
  its port->is_enabled is cleared to 'false';
- The driver reports link recovery and the
  bonding driver receives the NETDEV_UP notifier;
- If speed/duplex getting failed here, the link status
  will be changed to BOND_LINK_FAIL;
- The MII monitor later recovers the slave's speed/duplex
  and sets the link status to BOND_LINK_UP, but leaves
  'port->is_enabled' set to 'false'.

In this scenario, the lacp port will not be enabled even though its
speed and duplex are valid. The bond will not send LACPDUs, and its
state stays 'AD_STATE_DEFAULTED' forever. I think the simplest fix
is to call bond_3ad_handle_link_change() in bond_miimon_commit();
this function can enable lacp after the port slave speed check.
Once enabled, the lacp port can run its state machine normally
after link recovery.

Signed-off-by: Sha Zhang <zhangsha.zhang@huawei.com>
---
 drivers/net/bonding/bond_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Miller Aug. 27, 2019, 10:04 p.m. UTC | #1
From: <zhangsha.zhang@huawei.com>
Date: Fri, 23 Aug 2019 11:42:09 +0800

> - If speed/duplex getting failed here, the link status
>   will be changed to BOND_LINK_FAIL;

How does it fail at this step?  I suspect this is a driver specific
problem.

David Miller Aug. 28, 2019, 8:28 p.m. UTC | #2
You've had enough time to respond to my feedback question.

I'm tossing this patch.

zhangsha (A) Aug. 29, 2019, 11:33 a.m. UTC | #3
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: August 28, 2019 6:05
> To: zhangsha (A) <zhangsha.zhang@huawei.com>
> Cc: j.vosburgh@gmail.com; vfalico@gmail.com; andy@greyhouse.net;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; yuehaibing
> <yuehaibing@huawei.com>; hunongda <hunongda@huawei.com>;
> Chenzhendong (alex) <alex.chen@huawei.com>
> Subject: Re: [PATCH v2] bonding: force enable lacp port after link state
> recovery for 802.3ad
> 
> From: <zhangsha.zhang@huawei.com>
> Date: Fri, 23 Aug 2019 11:42:09 +0800
> 
> > - If speed/duplex getting failed here, the link status
> >   will be changed to BOND_LINK_FAIL;
> 
> How does it fail at this step?  I suspect this is a driver specific problem.

Hi, David,
I'm really sorry for the delayed reply, and I appreciate your feedback.

I was testing in kernel 4.19 with a Huawei hinic card when the problem occurred.
I checked the dmesg and got the logs in the following order:
1) link status definitely down for interface eth6, disabling it
2) link status up again after 0 ms for interface eth6
3) the partner's system MAC becomes "00:00:00:00:00:00".
From reading the code, I think that the link status of the slave should have changed
to BOND_LINK_FAIL from BOND_LINK_DOWN.

As this problem has occurred only once, I am not sure at the moment whether it is a
driver-specific problem. But I found commit 4d2c0cda, whose log says
"Some NIC drivers don't have correct speed/duplex settings at the
time they send NETDEV_UP notification ...", so I prefer to believe it is not.

For my problem, I think it is not enough for link monitoring (miimon) to only set
SPEED/DUPLEX correctly; the lacp port should be enabled at the same time too.
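
The sequence described in this thread can be illustrated with a minimal user-space sketch. This is not kernel code and does not call any bonding API: the model_* names, the struct, and the enable_port flag are all hypothetical and only model the reported behaviour, on the assumption that the link-change handler sets the port's enabled flag when the link comes up and that the speed/duplex-changed path does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the slave link states. */
enum model_link { MODEL_LINK_UP, MODEL_LINK_FAIL, MODEL_LINK_DOWN };

/* Hypothetical model of the slave fields discussed in the thread. */
struct model_slave {
	enum model_link link;
	int speed;         /* Mb/s, 0 means unknown */
	bool full_duplex;
	bool port_enabled; /* models port->is_enabled */
};

/* Carrier lost: speed/duplex are zeroed and the port is disabled. */
void model_carrier_down(struct model_slave *s)
{
	assert(s != NULL);
	s->link = MODEL_LINK_DOWN;
	s->speed = 0;
	s->full_duplex = false;
	s->port_enabled = false;
}

/* Carrier back, but the speed/duplex query failed: link goes to FAIL. */
void model_carrier_up_speed_unknown(struct model_slave *s)
{
	assert(s != NULL);
	s->link = MODEL_LINK_FAIL;
}

/*
 * Link monitor commit: speed/duplex are recovered and the link goes UP.
 * enable_port == false models only refreshing speed/duplex;
 * enable_port == true models also re-enabling the port (the proposed change).
 */
void model_miimon_commit(struct model_slave *s, int speed, bool full_duplex,
			 bool enable_port)
{
	assert(s != NULL);
	s->speed = speed;
	s->full_duplex = full_duplex;
	s->link = MODEL_LINK_UP;
	if (enable_port)
		s->port_enabled = true;
}

/* In this model the port can send LACPDUs only if it is up and enabled. */
bool model_can_send_lacpdu(const struct model_slave *s)
{
	assert(s != NULL);
	return s->link == MODEL_LINK_UP && s->port_enabled;
}

/* Replays the scenario and reports whether LACPDUs can be sent afterwards. */
bool model_run_scenario(bool enable_port)
{
	struct model_slave s = {
		.link = MODEL_LINK_UP,
		.speed = 10000,
		.full_duplex = true,
		.port_enabled = true,
	};

	model_carrier_down(&s);
	model_carrier_up_speed_unknown(&s);
	model_miimon_commit(&s, 10000, true, enable_port);
	return model_can_send_lacpdu(&s);
}
```

Under these assumptions, model_run_scenario(false) returns false (the port stays disabled although speed and duplex are valid again), while model_run_scenario(true) returns true.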
Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 931d9d9..ef4ec99 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2206,7 +2206,7 @@  static void bond_miimon_commit(struct bonding *bond)
 			 */
 			if (BOND_MODE(bond) == BOND_MODE_8023AD &&
 			    slave->link == BOND_LINK_UP)
-				bond_3ad_adapter_speed_duplex_changed(slave);
+				bond_3ad_handle_link_change(slave, BOND_LINK_UP);
 			continue;
 
 		case BOND_LINK_UP: