
bonding: update speed/duplex for NETDEV_CHANGE

Message ID e00468a2cbb8a25d7a89028e876769449454309f.1320070684.git.wpan@redhat.com
State Superseded, archived
Delegated to: David Miller

Commit Message

Weiping Pan Oct. 31, 2011, 2:19 p.m. UTC
Zheng Liang (lzheng@redhat.com) found a bug: if bonding is configured with the
arp monitor, the bonding driver sometimes cannot get the speed and duplex from
its slaves and assumes them to be 100Mb/sec and Full, as can be seen in
/proc/net/bonding/bond0.
There is no such problem when miimon is used.

(Take igb for example.)
The reason is that after dev_open() in bond_enslave(),
bond_update_speed_duplex() calls igb_get_settings(), but that function
runs ethtool_cmd_speed_set(ecmd, -1); ecmd->duplex = -1;
because igb gets an error value from its status register.
So even though dev_open() has been called, the device is not yet ready to
report its settings.

Maybe it is only safe to call igb_get_settings() after this message shows up:
"igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX".

So I prefer to update the speed and duplex for a slave when a
NETDEV_CHANGE/NETDEV_UP event is received.
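
For reference, the fallback behaviour being described looks roughly like this
(a simplified paraphrase of bond_main.c, not a verbatim copy):

static int bond_update_speed_duplex(struct slave *slave)
{
	struct ethtool_cmd etool = { .cmd = ETHTOOL_GSET };
	u32 speed;

	/* Fallback: assume 100/Full until the driver reports otherwise. */
	slave->speed = SPEED_100;
	slave->duplex = DUPLEX_FULL;

	if (__ethtool_get_settings(slave->dev, &etool) < 0)
		return -1;

	speed = ethtool_cmd_speed(&etool);
	if (speed == 0 || speed == (u32)-1)
		return -1;	/* e.g. igb before the link is up */

	if (etool.duplex != DUPLEX_FULL && etool.duplex != DUPLEX_HALF)
		return -1;

	slave->speed = speed;
	slave->duplex = etool.duplex;
	return 0;
}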

Signed-off-by: Weiping Pan <wpan@redhat.com>
---
 drivers/net/bonding/bond_main.c |   19 ++++++++-----------
 1 files changed, 8 insertions(+), 11 deletions(-)

Comments

Ben Hutchings Oct. 31, 2011, 6:15 p.m. UTC | #1
On Mon, 2011-10-31 at 22:19 +0800, Weiping Pan wrote:
> Zheng Liang (lzheng@redhat.com) found a bug: if bonding is configured with the
> arp monitor, the bonding driver sometimes cannot get the speed and duplex from
> its slaves and assumes them to be 100Mb/sec and Full, as can be seen in
> /proc/net/bonding/bond0.
> There is no such problem when miimon is used.
> 
> (Take igb for example.)
> The reason is that after dev_open() in bond_enslave(),
> bond_update_speed_duplex() calls igb_get_settings(), but that function
> runs ethtool_cmd_speed_set(ecmd, -1); ecmd->duplex = -1;
> because igb gets an error value from its status register.
> So even though dev_open() has been called, the device is not yet ready to
> report its settings.
> 
> Maybe it is only safe to call igb_get_settings() after this message shows up:
> "igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX".
[...]

For any device with autonegotiation enabled, you generally cannot get
the speed and duplex settings until the link is up.  While the link is
down, you may see a value of 0, ~0, or the best mode currently
advertised.  So I think that the bonding driver should avoid updating
the slave speed and duplex values whenever autoneg is enabled and the
link is down.
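
Something like the following check would do it (illustrative only; the helper
name is made up, and the real test would live inside
bond_update_speed_duplex()):

/* Skip the speed/duplex update while autoneg is on and the link is
 * down: nothing meaningful has been negotiated yet. */
static bool slave_settings_usable(struct net_device *slave_dev)
{
	struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET };

	if (__ethtool_get_settings(slave_dev, &ecmd) < 0)
		return false;

	if (ecmd.autoneg == AUTONEG_ENABLE && !netif_carrier_ok(slave_dev))
		return false;

	return true;
}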

Ben.
Jay Vosburgh Oct. 31, 2011, 8:32 p.m. UTC | #2
Ben Hutchings <bhutchings@solarflare.com> wrote:

>On Mon, 2011-10-31 at 22:19 +0800, Weiping Pan wrote:
>> Zheng Liang (lzheng@redhat.com) found a bug: if bonding is configured with the
>> arp monitor, the bonding driver sometimes cannot get the speed and duplex from
>> its slaves and assumes them to be 100Mb/sec and Full, as can be seen in
>> /proc/net/bonding/bond0.
>> There is no such problem when miimon is used.
>> 
>> (Take igb for example.)
>> The reason is that after dev_open() in bond_enslave(),
>> bond_update_speed_duplex() calls igb_get_settings(), but that function
>> runs ethtool_cmd_speed_set(ecmd, -1); ecmd->duplex = -1;
>> because igb gets an error value from its status register.
>> So even though dev_open() has been called, the device is not yet ready to
>> report its settings.
>> 
>> Maybe it is only safe to call igb_get_settings() after this message shows up:
>> "igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX".
>[...]

	I'll first point out that this patch is somewhat cosmetic, and
really only affects what shows up in /proc/net/bonding/bond0 for speed
and duplex.  The reason is that the modes which actually need the speed
and duplex information require miimon for link state checking, and that
code path already does the right thing.

	This has probably been wrong all along, but relatively recently
code was added to show the speed and duplex in /proc/net/bonding/bond0,
so it now has a visible effect.

	So, the patch is ok as far as it goes, in that it will keep the
values displayed in the /proc file up to date.

	However, I'm not sure that faking the speed/duplex to 100/Full
is still the correct thing to do.  For the modes that use the
information, the ethtool state won't be queried if carrier is down (and
in those cases, if the speed/duplex query returns an error while carrier
is up, we should probably pay attention).  For the modes in which the
information is merely cosmetic, displaying "unknown" as ethtool does is
probably a more accurate representation.

	Can you additionally remove the "fake to 100/Full" logic?  This
involves changing bond_update_speed_duplex to not fake the speed and
duplex, changing bond_enslave to not issue that warning, and changing
bond_info_show_slave to handle "bad" speed and duplex values.

	Anybody see a problem with doing that?
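
	Roughly, the /proc side could then look something like this
(an illustrative sketch only, not tested; the helper below is hypothetical
and the real change would go into bond_info_show_slave):

/* Print "Unknown" instead of the faked 100/Full when the slave never
 * reported usable values. */
static void bond_proc_show_speed_duplex(struct seq_file *seq,
					const struct slave *slave)
{
	if (slave->speed == 0 || slave->speed == (u32)-1)
		seq_printf(seq, "Speed: Unknown\n");
	else
		seq_printf(seq, "Speed: %d Mbps\n", slave->speed);

	if (slave->duplex == DUPLEX_FULL)
		seq_printf(seq, "Duplex: full\n");
	else if (slave->duplex == DUPLEX_HALF)
		seq_printf(seq, "Duplex: half\n");
	else
		seq_printf(seq, "Duplex: Unknown\n");
}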

>For any device with autonegotiation enabled, you generally cannot get
>the speed and duplex settings until the link is up.  While the link is
>down, you may see a value of 0, ~0, or the best mode currently
>advertised.  So I think that the bonding driver should avoid updating
>the slave speed and duplex values whenever autoneg is enabled and the
>link is down.

	Well, it's a little more complicated than that.  Bonding already
generally avoids checking the speed and duplex if the slave isn't up (or
at least normally won't complain if it fails).

	This particular case arises only during enslavement.  The call
to bond_update_speed_duplex has failed, but the device is marked by
bonding as up.  Bonding then complains that, although the device isn't
down, it cannot get speed and duplex, and that it is therefore assuming
them to be 100/Full.

	The catch is that this happens only for the ARP monitor, because
it initially presumes a slave to be up regardless of actual carrier
state (for historical reasons related to very old 10 or 10/100 drivers,
prior to the introduction of netif_carrier_*).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

Ben Hutchings Oct. 31, 2011, 8:48 p.m. UTC | #3
On Mon, 2011-10-31 at 13:32 -0700, Jay Vosburgh wrote:
[...]
> 	This particular case arises only during enslavement.  The call
> to bond_update_speed_duplex has failed, but the device is marked by
> bonding as up.  Bonding then complains that, although the device isn't
> down, it cannot get speed and duplex, and that it is therefore assuming
> them to be 100/Full.
> 
> 	The catch is that this happens only for the ARP monitor, because
> it initially presumes a slave to be up regardless of actual carrier
> state (for historical reasons related to very old 10 or 10/100 drivers,
> prior to the introduction of netif_carrier_*).

Right, I gathered that.  Is there any reason to use the ARP monitor when
all slaves support link state notification?  Maybe the bonding
documentation should recommend miimon in section 7, not just in section
2.

Ben.
Jay Vosburgh Oct. 31, 2011, 9:23 p.m. UTC | #4
Ben Hutchings <bhutchings@solarflare.com> wrote:

>On Mon, 2011-10-31 at 13:32 -0700, Jay Vosburgh wrote:
>[...]
>> 	This particular case arises only during enslavement.  The call
>> to bond_update_speed_duplex has failed, but the device is marked by
>> bonding as up.  Bonding then complains that, although the device isn't
>> down, it cannot get speed and duplex, and that it is therefore assuming
>> them to be 100/Full.
>> 
>> 	The catch is that this happens only for the ARP monitor, because
>> it initially presumes a slave to be up regardless of actual carrier
>> state (for historical reasons related to very old 10 or 10/100 drivers,
>> prior to the introduction of netif_carrier_*).
>
>Right, I gathered that.  Is there any reason to use the ARP monitor when
>all slaves support link state notification?  Maybe the bonding
>documentation should recommend miimon in section 7, not just in section
>2.

	The ARP monitor can validate that traffic actually flows from
the slave to some destination in the switch domain (and back), so, for
example, it's useful in cases where multiple switch hops exist between
the host and the local router.  A link failure in the middle of the path
won't affect carrier on the local device, but still may cause a
communications break.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

Ben Hutchings Oct. 31, 2011, 10:41 p.m. UTC | #5
On Mon, 2011-10-31 at 14:23 -0700, Jay Vosburgh wrote:
> Ben Hutchings <bhutchings@solarflare.com> wrote:
> 
> >On Mon, 2011-10-31 at 13:32 -0700, Jay Vosburgh wrote:
> >[...]
> >> 	This particular case arises only during enslavement.  The call
> >> to bond_update_speed_duplex has failed, but the device is marked by
> >> bonding as up.  Bonding then complains that, although the device isn't
> >> down, it cannot get speed and duplex, and that it is therefore assuming
> >> them to be 100/Full.
> >> 
> >> 	The catch is that this happens only for the ARP monitor, because
> >> it initially presumes a slave to be up regardless of actual carrier
> >> state (for historical reasons related to very old 10 or 10/100 drivers,
> >> prior to the introduction of netif_carrier_*).
> >
> >Right, I gathered that.  Is there any reason to use the ARP monitor when
> >all slaves support link state notification?  Maybe the bonding
> >documentation should recommend miimon in section 7, not just in section
> >2.
> 
> 	The ARP monitor can validate that traffic actually flows from
> the slave to some destination in the switch domain (and back), so, for
> example, it's useful in cases that multiple switch hops exist between
> the host and the local router.  A link failure in the middle of the path
> won't affect carrier on the local device, but still may cause a
> communications break.

Then the ARP monitor should gracefully handle the case where a new slave
has link down, as proposed.

Ben.

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c34cc1e..f5458eb 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3220,6 +3220,7 @@  static int bond_slave_netdev_event(unsigned long event,
 {
 	struct net_device *bond_dev = slave_dev->master;
 	struct bonding *bond = netdev_priv(bond_dev);
+	struct slave *slave = NULL;
 
 	switch (event) {
 	case NETDEV_UNREGISTER:
@@ -3230,20 +3231,16 @@  static int bond_slave_netdev_event(unsigned long event,
 				bond_release(bond_dev, slave_dev);
 		}
 		break;
+	case NETDEV_UP:
 	case NETDEV_CHANGE:
-		if (bond->params.mode == BOND_MODE_8023AD || bond_is_lb(bond)) {
-			struct slave *slave;
-
-			slave = bond_get_slave_by_dev(bond, slave_dev);
-			if (slave) {
-				u32 old_speed = slave->speed;
-				u8  old_duplex = slave->duplex;
-
-				bond_update_speed_duplex(slave);
+		slave = bond_get_slave_by_dev(bond, slave_dev);
+		if (slave) {
+			u32 old_speed = slave->speed;
+			u8  old_duplex = slave->duplex;
 
-				if (bond_is_lb(bond))
-					break;
+			bond_update_speed_duplex(slave);
 
+			if (bond->params.mode == BOND_MODE_8023AD) {
 				if (old_speed != slave->speed)
 					bond_3ad_adapter_speed_changed(slave);
 				if (old_duplex != slave->duplex)