diff mbox

[1/1] bonding: restrict up state in 802.3ad mode

Message ID 1452147313-22886-1-git-send-email-zyjzyj2000@gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Zhu Yanjun Jan. 7, 2016, 6:15 a.m. UTC
From: Zhu Yanjun <yanjun.zhu@windriver.com>

In 802.3ad mode, the speed and duplex is needed. But in some NIC,
there is a time span between NIC up state and getting speed and duplex.
As such, sometimes a slave in 802.3ad mode is in up state without
speed and duplex. This will make bonding in 802.3ad mode can not
work well.
To make bonding driver be compatible with more NICs, it is
necessary to restrict the up state in 802.3ad mode.

Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com>
---
 drivers/net/bonding/bond_main.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Zhu Yanjun Jan. 7, 2016, 6:22 a.m. UTC | #1
Hi, Emil

Would you like to help me to make tests with this patch?
If the root cause is not the time span, I will make a new patch for this.

Thanks a lot.
Zhu Yanjun

On 01/07/2016 02:15 PM, zyjzyj2000@gmail.com wrote:
> From: Zhu Yanjun <yanjun.zhu@windriver.com>
>
> In 802.3ad mode, the speed and duplex is needed. But in some NIC,
> there is a time span between NIC up state and getting speed and duplex.
> As such, sometimes a slave in 802.3ad mode is in up state without
> speed and duplex. This will make bonding in 802.3ad mode can not
> work well.
> To make bonding driver be compatible with more NICs, it is
> necessary to restrict the up state in 802.3ad mode.
>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com>
> ---
>   drivers/net/bonding/bond_main.c |   11 +++++++++++
>   1 file changed, 11 insertions(+)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 09f8a48..7df8af5 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1991,6 +1991,17 @@ static int bond_miimon_inspect(struct bonding *bond)
>   
>   		link_state = bond_check_dev_link(bond, slave->dev, 0);
>   
> +		if ((BMSR_LSTATUS == link_state) &&
> +		    (BOND_MODE(bond) == BOND_MODE_8023AD)) {
> +			rtnl_lock();
> +			bond_update_speed_duplex(slave);
> +			rtnl_unlock();
> +			if ((slave->speed == SPEED_UNKNOWN) ||
> +			    (slave->duplex == DUPLEX_UNKNOWN)) {
> +				link_state = 0;
> +				netdev_info(bond->dev, "In 802.3ad mode, it is not enough to up without speed and duplex");
> +			}
> +		}
>   		switch (slave->link) {
>   		case BOND_LINK_UP:
>   			if (link_state)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jay Vosburgh Jan. 7, 2016, 6:33 a.m. UTC | #2
<zyjzyj2000@gmail.com> wrote:

>From: Zhu Yanjun <yanjun.zhu@windriver.com>
>
>In 802.3ad mode, the speed and duplex is needed. But in some NIC,
>there is a time span between NIC up state and getting speed and duplex.
>As such, sometimes a slave in 802.3ad mode is in up state without
>speed and duplex. This will make bonding in 802.3ad mode can not
>work well.

	From my reading of Emil's comments in the discussion, I'm not
sure the above is an accurate description of the problem.  If I'm
understanding correctly, the cause is due to link flaps racing with the
bonding monitor workqueue polling the state.  Is this correct?

>To make bonding driver be compatible with more NICs, it is
>necessary to restrict the up state in 802.3ad mode.
>
>Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com>
>---
> drivers/net/bonding/bond_main.c |   11 +++++++++++
> 1 file changed, 11 insertions(+)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 09f8a48..7df8af5 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1991,6 +1991,17 @@ static int bond_miimon_inspect(struct bonding *bond)
> 
> 		link_state = bond_check_dev_link(bond, slave->dev, 0);
> 
>+		if ((BMSR_LSTATUS == link_state) &&
>+		    (BOND_MODE(bond) == BOND_MODE_8023AD)) {
>+			rtnl_lock();
>+			bond_update_speed_duplex(slave);
>+			rtnl_unlock();

	This will add a round trip on the RTNL mutex for every miimon
interval when the slave is carrier up.  At common miimon rates (10 - 50
ms), this will hit RTNL between 20 and 100 times per second.  I do not
see how this is acceptable.

	I believe the proper solution here is to supplant the periodic
miimon polling from bonding with link state detection based on notifiers
(As Stephen suggested, not for the first time).

	My suggestion is to have bonding set slave link state based on
notifiers if miimon is set to zero, and poll as usual if it is not.
This would preserve any backwards compatibility with any device out
there that might possibly still be doing netif_carrier_on/off
incorrectly or not at all.  The only minor complication is synchronizing
notifier carrier state detection with the ARP monitor.

	This should have been done a long time ago; I'll work something
up tomorrow (it's late here right now) and post a patch for testing.

	-J

>+			if ((slave->speed == SPEED_UNKNOWN) ||
>+			    (slave->duplex == DUPLEX_UNKNOWN)) {
>+				link_state = 0;
>+				netdev_info(bond->dev, "In 802.3ad mode, it is not enough to up without speed and duplex");
>+			}
>+		}
> 		switch (slave->link) {
> 		case BOND_LINK_UP:
> 			if (link_state)
>-- 
>1.7.9.5
>

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kubecek Jan. 7, 2016, 6:53 a.m. UTC | #3
On Thu, Jan 07, 2016 at 02:15:13PM +0800, zyjzyj2000@gmail.com wrote:
> From: Zhu Yanjun <yanjun.zhu@windriver.com>
> 
> In 802.3ad mode, the speed and duplex is needed. But in some NIC,
> there is a time span between NIC up state and getting speed and duplex.
> As such, sometimes a slave in 802.3ad mode is in up state without
> speed and duplex. This will make bonding in 802.3ad mode can not
> work well.
> To make bonding driver be compatible with more NICs, it is
> necessary to restrict the up state in 802.3ad mode.
> 
> Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com>
> ---
>  drivers/net/bonding/bond_main.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 09f8a48..7df8af5 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1991,6 +1991,17 @@ static int bond_miimon_inspect(struct bonding *bond)
>  
>  		link_state = bond_check_dev_link(bond, slave->dev, 0);
>  
> +		if ((BMSR_LSTATUS == link_state) &&
> +		    (BOND_MODE(bond) == BOND_MODE_8023AD)) {
> +			rtnl_lock();
> +			bond_update_speed_duplex(slave);
> +			rtnl_unlock();
> +			if ((slave->speed == SPEED_UNKNOWN) ||
> +			    (slave->duplex == DUPLEX_UNKNOWN)) {
> +				link_state = 0;
> +				netdev_info(bond->dev, "In 802.3ad mode, it is not enough to up without speed and duplex");

If I read this right, whenever this state (link up but speed/duplex
unknown) is entered, you'll keep writing this message into kernel log
every miimon milliseconds until something changes. I'm not sure how long
a NIC can stay in such state but it might get quite annoying (even more
if something really goes wrong and NIC stays that way which can't be
completely ruled out, IMHO).


> +			}
> +		}
>  		switch (slave->link) {
>  		case BOND_LINK_UP:
>  			if (link_state)

BtW, you accidentally submitted this patch twice.

                                                          Michal Kubecek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhu Yanjun Jan. 7, 2016, 7:37 a.m. UTC | #4
On 01/07/2016 02:53 PM, Michal Kubecek wrote:
> On Thu, Jan 07, 2016 at 02:15:13PM +0800, zyjzyj2000@gmail.com wrote:
>> From: Zhu Yanjun <yanjun.zhu@windriver.com>
>>
>> In 802.3ad mode, the speed and duplex is needed. But in some NIC,
>> there is a time span between NIC up state and getting speed and duplex.
>> As such, sometimes a slave in 802.3ad mode is in up state without
>> speed and duplex. This will make bonding in 802.3ad mode can not
>> work well.
>> To make bonding driver be compatible with more NICs, it is
>> necessary to restrict the up state in 802.3ad mode.
>>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com>
>> ---
>>   drivers/net/bonding/bond_main.c |   11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 09f8a48..7df8af5 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1991,6 +1991,17 @@ static int bond_miimon_inspect(struct bonding *bond)
>>   
>>   		link_state = bond_check_dev_link(bond, slave->dev, 0);
>>   
>> +		if ((BMSR_LSTATUS == link_state) &&
>> +		    (BOND_MODE(bond) == BOND_MODE_8023AD)) {
>> +			rtnl_lock();
>> +			bond_update_speed_duplex(slave);
>> +			rtnl_unlock();
>> +			if ((slave->speed == SPEED_UNKNOWN) ||
>> +			    (slave->duplex == DUPLEX_UNKNOWN)) {
>> +				link_state = 0;
>> +				netdev_info(bond->dev, "In 802.3ad mode, it is not enough to up without speed and duplex");
> If I read this right, whenever this state (link up but speed/duplex
> unknown) is entered, you'll keep writing this message into kernel log
> every miimon milliseconds until something changes. I'm not sure how long
> a NIC can stay in such state but it might get quite annoying (even more
> if something really goes wrong and NIC stays that way which can't be
> completely ruled out, IMHO).

Sure, Thanks a lot. I want to confirm link_up without link_speed. It is 
not usual. So I think this only lasts for several seconds.
It is very important to us since it can help us to find the root cause.

Zhu Yanjun

>
>
>> +			}
>> +		}
>>   		switch (slave->link) {
>>   		case BOND_LINK_UP:
>>   			if (link_state)
> BtW, you accidentally submitted this patch twice.
>
>                                                            Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michal Kubecek Jan. 7, 2016, 7:59 a.m. UTC | #5
On Thu, Jan 07, 2016 at 03:37:26PM +0800, zhuyj wrote:
> >If I read this right, whenever this state (link up but speed/duplex
> >unknown) is entered, you'll keep writing this message into kernel log
> >every miimon milliseconds until something changes. I'm not sure how long
> >a NIC can stay in such state but it might get quite annoying (even more
> >if something really goes wrong and NIC stays that way which can't be
> >completely ruled out, IMHO).
> 
> Sure, Thanks a lot. I want to confirm link_up without link_speed. It
> is not usual. So I think this only lasts for several seconds.
> It is very important to us since it can help us to find the root cause.

For debugging purposes it's fine, of course. But this looked like an
officially submitted patch so I didn't like the idea of log spamming
(even one second could result in 10-100 messages and admins certainly
would hate that).

                                                       Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhu Yanjun Jan. 7, 2016, 8:35 a.m. UTC | #6
On 01/07/2016 03:59 PM, Michal Kubecek wrote:
> On Thu, Jan 07, 2016 at 03:37:26PM +0800, zhuyj wrote:
>>> If I read this right, whenever this state (link up but speed/duplex
>>> unknown) is entered, you'll keep writing this message into kernel log
>>> every miimon milliseconds until something changes. I'm not sure how long
>>> a NIC can stay in such state but it might get quite annoying (even more
>>> if something really goes wrong and NIC stays that way which can't be
>>> completely ruled out, IMHO).
>> Sure, Thanks a lot. I want to confirm link_up without link_speed. It
>> is not usual. So I think this only lasts for several seconds.
>> It is very important to us since it can help us to find the root cause.
> For debugging purposes it's fine, of course. But this looked like an
> officially submitted patch so I didn't like the idea of log spamming
> (even one second could result in 10-100 messages and admins certainly
> would hate that).
>
>                                                         Michal Kubecek
>
Thanks a lot.

Zhu Yanjun
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tantilov, Emil S Jan. 7, 2016, 3:27 p.m. UTC | #7
>-----Original Message-----
>From: Jay Vosburgh [mailto:jay.vosburgh@canonical.com]
>Sent: Wednesday, January 06, 2016 10:34 PM
>To: zyjzyj2000@gmail.com
>Cc: Tantilov, Emil S; mkubecek@suse.cz; vfalico@gmail.com;
>gospo@cumulusnetworks.com; netdev@vger.kernel.org; Shteinbock, Boris (Wind
>River)
>Subject: Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode
>
><zyjzyj2000@gmail.com> wrote:
>
>>From: Zhu Yanjun <yanjun.zhu@windriver.com>
>>
>>In 802.3ad mode, the speed and duplex is needed. But in some NIC,
>>there is a time span between NIC up state and getting speed and duplex.
>>As such, sometimes a slave in 802.3ad mode is in up state without
>>speed and duplex. This will make bonding in 802.3ad mode can not
>>work well.
>
>	From my reading of Emil's comments in the discussion, I'm not
>sure the above is an accurate description of the problem.  If I'm
>understanding correctly, the cause is due to link flaps racing with the
>bonding monitor workqueue polling the state.  Is this correct?

That is correct.

>>To make bonding driver be compatible with more NICs, it is
>>necessary to restrict the up state in 802.3ad mode.
>>
>>Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com>
>>---
>> drivers/net/bonding/bond_main.c |   11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>>diff --git a/drivers/net/bonding/bond_main.c
>b/drivers/net/bonding/bond_main.c
>>index 09f8a48..7df8af5 100644
>>--- a/drivers/net/bonding/bond_main.c
>>+++ b/drivers/net/bonding/bond_main.c
>>@@ -1991,6 +1991,17 @@ static int bond_miimon_inspect(struct bonding
>*bond)
>>
>> 		link_state = bond_check_dev_link(bond, slave->dev, 0);
>>
>>+		if ((BMSR_LSTATUS == link_state) &&
>>+		    (BOND_MODE(bond) == BOND_MODE_8023AD)) {
>>+			rtnl_lock();
>>+			bond_update_speed_duplex(slave);
>>+			rtnl_unlock();
>
>	This will add a round trip on the RTNL mutex for every miimon
>interval when the slave is carrier up.  At common miimon rates (10 - 50
>ms), this will hit RTNL between 20 and 100 times per second.  I do not
>see how this is acceptable.
>
>	I believe the proper solution here is to supplant the periodic
>miimon polling from bonding with link state detection based on notifiers
>(As Stephen suggested, not for the first time).
>
>	My suggestion is to have bonding set slave link state based on
>notifiers if miimon is set to zero, and poll as usual if it is not.
>This would preserve any backwards compatibility with any device out
>there that might possibly still be doing netif_carrier_on/off
>incorrectly or not at all.  The only minor complication is synchronizing
>notifier carrier state detection with the ARP monitor.
>
>	This should have been done a long time ago; I'll work something
>up tomorrow (it's late here right now) and post a patch for testing.

That would be awesome. Looking forward to it.

Thanks,
Emil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Zhu Yanjun Jan. 8, 2016, 2:29 a.m. UTC | #8
On 01/07/2016 02:33 PM, Jay Vosburgh wrote:
> <zyjzyj2000@gmail.com> wrote:
>
>> From: Zhu Yanjun <yanjun.zhu@windriver.com>
>>
>> In 802.3ad mode, the speed and duplex is needed. But in some NIC,
>> there is a time span between NIC up state and getting speed and duplex.
>> As such, sometimes a slave in 802.3ad mode is in up state without
>> speed and duplex. This will make bonding in 802.3ad mode can not
>> work well.
> 	From my reading of Emil's comments in the discussion, I'm not
> sure the above is an accurate description of the problem.  If I'm
> understanding correctly, the cause is due to link flaps racing with the
> bonding monitor workqueue polling the state.  Is this correct?
The following are from my user. What I have done is based on it.
"
Here's one theory that would seem to match my observations:

It seems that the x540T can sometimes report 'link up' a few moments 
before the speed and duplex are known. If the bond driver reads and 
stores the speed and duplex immediately after the link becomes 'up', it 
may occasionally miss the actual parameters by reading them before they 
are ready. If the parameters are only read when the link state changes, 
the 'unknown' status will stay until next state change happens.

I have attached a file 'test.log' that shows the kernel log and states 
of the relevant interfaces after the problem is hit. It shows that the 
final state has all the links up and speeds are known on individual 
interfaces, but bond0 shows only single interface speed. After 
successful negotiation bond0 shows the aggregate speed i.e. 20000Mb/s. 
In the end of the file there is bunch of ethtool runs that were taken in 
a tight loop during the negotiation. It shows in the end that the link 
becomes up some time before the speed and duplex are actually known. If 
the bond driver only reads the speed and duplex during this window, it 
will get it wrong and it won't be corrected when the real speed becomes 
known since the link state won't change at that time.
"

Zhu Yanjun
>
>> To make bonding driver be compatible with more NICs, it is
>> necessary to restrict the up state in 802.3ad mode.
>>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@windriver.com>
>> ---
>> drivers/net/bonding/bond_main.c |   11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index 09f8a48..7df8af5 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -1991,6 +1991,17 @@ static int bond_miimon_inspect(struct bonding *bond)
>>
>> 		link_state = bond_check_dev_link(bond, slave->dev, 0);
>>
>> +		if ((BMSR_LSTATUS == link_state) &&
>> +		    (BOND_MODE(bond) == BOND_MODE_8023AD)) {
>> +			rtnl_lock();
>> +			bond_update_speed_duplex(slave);
>> +			rtnl_unlock();
> 	This will add a round trip on the RTNL mutex for every miimon
> interval when the slave is carrier up.  At common miimon rates (10 - 50
> ms), this will hit RTNL between 20 and 100 times per second.  I do not
> see how this is acceptable.
>
> 	I believe the proper solution here is to supplant the periodic
> miimon polling from bonding with link state detection based on notifiers
> (As Stephen suggested, not for the first time).
>
> 	My suggestion is to have bonding set slave link state based on
> notifiers if miimon is set to zero, and poll as usual if it is not.
> This would preserve any backwards compatibility with any device out
> there that might possibly still be doing netif_carrier_on/off
> incorrectly or not at all.  The only minor complication is synchronizing
> notifier carrier state detection with the ARP monitor.
>
> 	This should have been done a long time ago; I'll work something
> up tomorrow (it's late here right now) and post a patch for testing.
>
> 	-J
>
>> +			if ((slave->speed == SPEED_UNKNOWN) ||
>> +			    (slave->duplex == DUPLEX_UNKNOWN)) {
>> +				link_state = 0;
>> +				netdev_info(bond->dev, "In 802.3ad mode, it is not enough to up without speed and duplex");
>> +			}
>> +		}
>> 		switch (slave->link) {
>> 		case BOND_LINK_UP:
>> 			if (link_state)
>> -- 
>> 1.7.9.5
>>
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com
diff mbox

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 09f8a48..7df8af5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1991,6 +1991,17 @@  static int bond_miimon_inspect(struct bonding *bond)
 
 		link_state = bond_check_dev_link(bond, slave->dev, 0);
 
+		if ((BMSR_LSTATUS == link_state) &&
+		    (BOND_MODE(bond) == BOND_MODE_8023AD)) {
+			rtnl_lock();
+			bond_update_speed_duplex(slave);
+			rtnl_unlock();
+			if ((slave->speed == SPEED_UNKNOWN) ||
+			    (slave->duplex == DUPLEX_UNKNOWN)) {
+				link_state = 0;
+				netdev_info(bond->dev, "In 802.3ad mode, it is not enough to up without speed and duplex");
+			}
+		}
 		switch (slave->link) {
 		case BOND_LINK_UP:
 			if (link_state)