| Message ID | 27926.1454632634@famine |
|---|---|
| State | RFC, archived |
| Delegated to: | David Miller |
>-----Original Message-----
>From: Jay Vosburgh [mailto:jay.vosburgh@canonical.com]
>Sent: Thursday, February 04, 2016 4:37 PM
>To: Tantilov, Emil S
>Cc: netdev@vger.kernel.org; gospo@cumulusnetworks.com; zhuyj;
>jiri@mellanox.com
>Subject: Re: bonding reports interface up with 0 Mbps
>
>Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>[...]
>>	Thinking about the trace again... Emil: what happens in the
>>trace before this?  Is there ever a call to the ixgbe_get_settings?
>>Does a NETDEV_UP or NETDEV_CHANGE event ever hit the bond_netdev_event
>>function?
>
>	Emil kindly sent me the trace offline, and I think I see what's
>going on.  It looks like the sequence of events is:
>
>bond_enslave ->
>	bond_update_speed_duplex (device is down, thus DUPLEX/SPEED_UNKNOWN)
>	[ do rest of enslavement, start miimon periodic work ]
>
>	[ time passes, device goes carrier up ]
>
>ixgbe_service_task: eth1: NIC Link is Up 10 Gbps ->
>	netif_carrier_on (arranges for NETDEV_CHANGE notifier out of line)
>
>	[ a few microseconds later ]
>
>bond_mii_monitor ->
>	bond_check_dev_link (now is carrier up)
>	bond_miimon_commit -> (emits "0 Mbps full duplex" message)
>		bond_lower_state_changed ->
>			bond_netdev_event (NETDEV_CHANGELOWERSTATE, is ignored)
>		bond_3ad_handle_link_change (sees DUPLEX/SPEED_UNKNOWN)
>
>	[ a few microseconds later, in response to ixgbe's netif_carrier_on ]
>
>notifier_call_chain ->
>	bond_netdev_event NETDEV_CHANGE ->
>		bond_update_speed_duplex (sees correct SPEED_10000/FULL) ->
>		bond_3ad_adapter_speed_duplex_changed (updates 802.3ad)
>
>	Basically, the race is that the periodic bond_mii_monitor is
>squeezing in between the link going up and bonding's update of the speed
>and duplex in response to the NETDEV_CHANGE triggered by the driver's
>netif_carrier_on call.  bonding ends up using the stale duplex and speed
>information obtained at enslavement time.
>
>	I think that, nowadays, the initial speed and duplex will pretty
>much always be UNKNOWN, at least for real Ethernet devices, because it
>will take longer to autoneg than the time between the dev_open and
>bond_update_speed_duplex calls in bond_enslave.
>
>	Adding a case to bond_netdev_event for CHANGELOWERSTATE works
>because it's a synchronous call from bonding.  For purposes of fixing
>this, it's more or less equivalent to calling bond_update_speed_duplex
>from bond_miimon_commit (which is part of a test patch I posted earlier
>today).
>
>	If the above analysis is correct, then I would expect this patch
>to make the problem go away:
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 56b560558884..cabaeb61333d 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2127,6 +2127,7 @@ static void bond_miimon_commit(struct bonding *bond)
> 			continue;
>
> 		case BOND_LINK_UP:
>+			bond_update_speed_duplex(slave);
> 			bond_set_slave_link_state(slave, BOND_LINK_UP,
> 						  BOND_SLAVE_NOTIFY_NOW);
> 			slave->last_link_up = jiffies;
>
>
>	Emil, can you give just the above a test?

Sure I'll fire it up.

Thanks,
Emil
On 02/05/2016 08:37 AM, Jay Vosburgh wrote:
> Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
> [...]
>
> 	Emil, can you give just the above a test?
>
> 	I don't see in the trace that there's evidence that ixgbe's link
> is rapidly flapping, so I don't think it's necessary to do more than the
> above.

Sure. I agree with you. I expect this can solve this problem.

Thanks a lot.
Zhu Yanjun

> 	Now, separately, bonding really should obey the NETDEV_CHANGE /
> NETDEV_UP events instead of polling for carrier state, but if the above
> patch works it's a simple fix that is easily backported, which the
> CHANGELOWERSTATE method isn't, and the new way (notifier driven) can be
> net-next material.
>
> 	-J
>
> ---
> 	-Jay Vosburgh, jay.vosburgh@canonical.com
On 02/05/2016 08:43 AM, Tantilov, Emil S wrote:
>> -----Original Message-----
>> From: Jay Vosburgh [mailto:jay.vosburgh@canonical.com]
>> Subject: Re: bonding reports interface up with 0 Mbps
>> [...]
>>
>> 	Emil, can you give just the above a test?
> Sure I'll fire it up.

Let me know the test result.

Thanks a lot.
Zhu Yanjun

> Thanks,
> Emil
>-----Original Message-----
>From: Jay Vosburgh [mailto:jay.vosburgh@canonical.com]
>Sent: Thursday, February 04, 2016 4:37 PM
>Subject: Re: bonding reports interface up with 0 Mbps
>[...]
>
>	Emil, can you give just the above a test?

Test has been running all night and no failures so far. Looking at the
logs, the condition triggering the race occurred 5 times.

I will leave the test running over the weekend just in case and post a
final update on Monday.

Thanks,
Emil
>-----Original Message-----
>From: Jay Vosburgh [mailto:jay.vosburgh@canonical.com]
>Sent: Thursday, February 04, 2016 4:37 PM
>Subject: Re: bonding reports interface up with 0 Mbps
>[...]
>
> 		case BOND_LINK_UP:
>+			bond_update_speed_duplex(slave);
> 			bond_set_slave_link_state(slave, BOND_LINK_UP,
> 						  BOND_SLAVE_NOTIFY_NOW);

No issues seen over the weekend with this patch. The condition was hit
32 times.

You can add my "tested-by:" when you submit this patch.

Thanks Jay for all your help!

Emil
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 56b560558884..cabaeb61333d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2127,6 +2127,7 @@ static void bond_miimon_commit(struct bonding *bond)
 			continue;
 
 		case BOND_LINK_UP:
+			bond_update_speed_duplex(slave);
 			bond_set_slave_link_state(slave, BOND_LINK_UP,
 						  BOND_SLAVE_NOTIFY_NOW);
 			slave->last_link_up = jiffies;