diff mbox

[net-next] bonding: fix wrong port enabling in 802.3ad

Message ID 1318526483-16073-1-git-send-email-fbl@redhat.com
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Flavio Leitner Oct. 13, 2011, 5:21 p.m. UTC
The port shouldn't be enabled unless its current MUX
state is DISTRIBUTING which is correctly handled by
ad_mux_machine(), otherwise the packet sent can be
lost because the other end may not be ready.

The issue happens on every port initialization, but
as the ports are expected to move quickly to DISTRIBUTING,
it doesn't cause much problem.  However, it does cause
constant packet loss if the other peer has the port
configured to stay in STANDBY (i.e. SYNC set to OFF).

Signed-off-by: Flavio Leitner <fbl@redhat.com>
---

The comments there suggests it was a workaround for losses
of link events, but I couldn't track the changelog as it
seems to be pretty old.  Thus, as all the link notification
stuff has been improved a lot, maybe this is not an issue
anymore.  At least, I didn't find any problem while
unplugging/plugging cables here.

 drivers/net/bonding/bond_3ad.c |    7 -------
 1 files changed, 0 insertions(+), 7 deletions(-)

Comments

Jay Vosburgh Oct. 13, 2011, 5:46 p.m. UTC | #1
Flavio Leitner <fbl@redhat.com> wrote:

>The port shouldn't be enabled unless its current MUX
>state is DISTRIBUTING which is correctly handled by
>ad_mux_machine(), otherwise the packet sent can be
>lost because the other end may not be ready.
>
>The issue happens on every port initialization, but
>as the ports are expected to move quickly to DISTRIBUTING,
>it doesn't cause much problem.  However, it does cause
>constant packet loss if the other peer has the port
>configured to stay in STANDBY (i.e. SYNC set to OFF).

	This may explain another misbehavior I've been looking into: if
the bond's outgoing LACPDUs are lost (never received by the switch), but
the switch's incoming LACPDUs are received, bonding puts the port into
use, and packets to the switch are dropped by the switch.

>Signed-off-by: Flavio Leitner <fbl@redhat.com>
>---
>
>The comments there suggests it was a workaround for losses
>of link events, but I couldn't track the changelog as it
>seems to be pretty old.  Thus, as all the link notification
>stuff has been improved a lot, maybe this is not an issue
>anymore.  At least, I didn't find any problem while
>unplugging/plugging cables here.

	I believe this code fragment is original to the 802.3ad
submission, which would have been around 2003 or so.

	Did you check the standard for what it says should happen in
this case?  I'm guessing this is something not specified by the
standard, given the comment, but we should check to make sure.

	-J

> drivers/net/bonding/bond_3ad.c |    7 -------
> 1 files changed, 0 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 47b928e..b33c099 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -1135,13 +1135,6 @@ static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
> 			__record_pdu(lacpdu, port);
> 			port->sm_rx_timer_counter = __ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER, (u16)(port->actor_oper_port_state & AD_STATE_LACP_TIMEOUT));
> 			port->actor_oper_port_state &= ~AD_STATE_EXPIRED;
>-			// verify that if the aggregator is enabled, the port is enabled too.
>-			//(because if the link goes down for a short time, the 802.3ad will not
>-			// catch it, and the port will continue to be disabled)
>-			if (port->aggregator
>-			    && port->aggregator->is_active
>-			    && !__port_is_enabled(port))
>-				__enable_port(port);
> 			break;
> 		default:    //to silence the compiler
> 			break;
>-- 
>1.7.6
>

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Flavio Leitner Oct. 13, 2011, 6:20 p.m. UTC | #2
On Thu, 13 Oct 2011 10:46:31 -0700
Jay Vosburgh <fubar@us.ibm.com> wrote:

> Flavio Leitner <fbl@redhat.com> wrote:
> 
> >The port shouldn't be enabled unless its current MUX
> >state is DISTRIBUTING which is correctly handled by
> >ad_mux_machine(), otherwise the packet sent can be
> >lost because the other end may not be ready.
> >
> >The issue happens on every port initialization, but
> >as the ports are expected to move quickly to DISTRIBUTING,
> >it doesn't cause much problem.  However, it does cause
> >constant packet loss if the other peer has the port
> >configured to stay in STANDBY (i.e. SYNC set to OFF).
> 
> 	This may explain another misbehavior I've been looking into: if
> the bond's outgoing LACPDUs are lost (never received by the switch), but
> the switch's incoming LACPDUs are received, bonding puts the port into
> use, and packets to the switch are dropped by the switch.
>

Yeah, it could explain that as well because on my tests here,
all ports were enabled as soon as the aggregator is active.

 
> >Signed-off-by: Flavio Leitner <fbl@redhat.com>
> >---
> >
> >The comments there suggests it was a workaround for losses
> >of link events, but I couldn't track the changelog as it
> >seems to be pretty old.  Thus, as all the link notification
> >stuff has been improved a lot, maybe this is not an issue
> >anymore.  At least, I didn't find any problem while
> >unplugging/plugging cables here.
> 
> 	I believe this code fragment is original to the 802.3ad
> submission, which would have been around 2003 or so.
> 
> 	Did you check the standard for what it says should happen in
> this case?  I'm guessing this is something not specified by the
> standard, given the comment, but we should check to make sure.
> 

The standard says that the port should receive when its mux state
is COLLECTING and transmitting when its mux state is DISTRIBUTING.
So, that seems to be violationing the standard because it doesn't
consider the mux state at all.  It is harmless for almost all cases
though, because ports move quickly to DISTRIBUTING or no link at
all, which disables the port.

fbl
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 19, 2011, 7:53 a.m. UTC | #3
From: Flavio Leitner <fbl@redhat.com>
Date: Thu, 13 Oct 2011 14:21:23 -0300

> The port shouldn't be enabled unless its current MUX
> state is DISTRIBUTING which is correctly handled by
> ad_mux_machine(), otherwise the packet sent can be
> lost because the other end may not be ready.
> 
> The issue happens on every port initialization, but
> as the ports are expected to move quickly to DISTRIBUTING,
> it doesn't cause much problem.  However, it does cause
> constant packet loss if the other peer has the port
> configured to stay in STANDBY (i.e. SYNC set to OFF).
> 
> Signed-off-by: Flavio Leitner <fbl@redhat.com>

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 47b928e..b33c099 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -1135,13 +1135,6 @@  static void ad_rx_machine(struct lacpdu *lacpdu, struct port *port)
 			__record_pdu(lacpdu, port);
 			port->sm_rx_timer_counter = __ad_timer_to_ticks(AD_CURRENT_WHILE_TIMER, (u16)(port->actor_oper_port_state & AD_STATE_LACP_TIMEOUT));
 			port->actor_oper_port_state &= ~AD_STATE_EXPIRED;
-			// verify that if the aggregator is enabled, the port is enabled too.
-			//(because if the link goes down for a short time, the 802.3ad will not
-			// catch it, and the port will continue to be disabled)
-			if (port->aggregator
-			    && port->aggregator->is_active
-			    && !__port_is_enabled(port))
-				__enable_port(port);
 			break;
 		default:    //to silence the compiler
 			break;