diff mbox

[net,v2] bonding: Fix ARP monitor validation

Message ID 29855.1454448956@famine
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Jay Vosburgh Feb. 2, 2016, 9:35 p.m. UTC
The current logic in bond_arp_rcv will accept an incoming ARP for
validation if (a) the receiving slave is either "active" (which includes
the currently active slave, or the current ARP slave) or, (b) there is a
currently active slave, and it has received an ARP since it became active.
For case (b), the receiving slave isn't the currently active slave, and is
receiving the original broadcast ARP request, not an ARP reply from the
target.

	This logic can fail if there is no currently active slave.  In
this situation, the ARP probe logic cycles through all slaves, assigning
each in turn as the "current_arp_slave" for one arp_interval, then setting
that one as "active," and sending an ARP probe from that slave.  The
current logic expects the ARP reply to arrive on the sending
current_arp_slave, however, due to switch FDB updating delays, the reply
may be directed to another slave.

	This can arise if the bonding slaves and switch are working, but
the ARP target is not responding.  When the ARP target recovers, a
condition may result wherein the ARP target host replies faster than the
switch can update its forwarding table, causing each ARP reply to be sent
to the previous current_arp_slave.  This will never pass the logic in
bond_arp_rcv, as neither of the above conditions (a) or (b) are met.

	Some experimentation on a LAN shows ARP reply round trips in the
200 usec range, but my available switches never update their FDB in less
than 4000 usec.

	This patch changes the logic in bond_arp_rcv to additionally
accept an ARP reply for validation on any slave if there is a current ARP
slave and it sent an ARP probe during the previous arp_interval.

Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works")
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>

---
v2: more detail in log and comment; no code change.

 drivers/net/bonding/bond_main.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

Comments

Jarod Wilson Feb. 3, 2016, 7:40 p.m. UTC | #1
On Tue, Feb 02, 2016 at 01:35:56PM -0800, Jay Vosburgh wrote:
> 
> 	The current logic in bond_arp_rcv will accept an incoming ARP for
> validation if (a) the receiving slave is either "active" (which includes
> the currently active slave, or the current ARP slave) or, (b) there is a
> currently active slave, and it has received an ARP since it became active.
> For case (b), the receiving slave isn't the currently active slave, and is
> receiving the original broadcast ARP request, not an ARP reply from the
> target.
> 
> 	This logic can fail if there is no currently active slave.  In
> this situation, the ARP probe logic cycles through all slaves, assigning
> each in turn as the "current_arp_slave" for one arp_interval, then setting
> that one as "active," and sending an ARP probe from that slave.  The
> current logic expects the ARP reply to arrive on the sending
> current_arp_slave, however, due to switch FDB updating delays, the reply
> may be directed to another slave.
> 
> 	This can arise if the bonding slaves and switch are working, but
> the ARP target is not responding.  When the ARP target recovers, a
> condition may result wherein the ARP target host replies faster than the
> switch can update its forwarding table, causing each ARP reply to be sent
> to the previous current_arp_slave.  This will never pass the logic in
> bond_arp_rcv, as neither of the above conditions (a) or (b) are met.
> 
> 	Some experimentation on a LAN shows ARP reply round trips in the
> 200 usec range, but my available switches never update their FDB in less
> than 4000 usec.
> 
> 	This patch changes the logic in bond_arp_rcv to additionally
> accept an ARP reply for validation on any slave if there is a current ARP
> slave and it sent an ARP probe during the previous arp_interval.
> 
> Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works")
> Cc: Veaceslav Falico <vfalico@gmail.com>
> Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
> 
> ---
> v2: more detail in log and comment; no code change.

This sounds suspiciously like the same problem Uwe was encountering[*] and
attempting to solve. Uwe, can you give this patch a try?

[*] = http://marc.info/?l=linux-netdev&m=144416122705850&w=2
Jay Vosburgh Feb. 3, 2016, 8:23 p.m. UTC | #2
Jarod Wilson <jarod@redhat.com> wrote:
[...]
>This sounds suspiciously like the same problem Uwe was encountering[*] and
>attempting to solve. Uwe, can you give this patch a try?
>
>[*] = http://marc.info/?l=linux-netdev&m=144416122705850&w=2

	Agreed; my apologies for not Cc'ing you, Uwe.

	FWIW, the "Sometimes, the issue sorts itself out after a while"
Uwe reported is likely due to something causing a switch flood (TCN or
clearing of the MAC table); in that case, the ARP reply is sent to all
ports, so the "expected" curr_arp_slave receives the ARP reply.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
David Miller Feb. 7, 2016, 7:26 p.m. UTC | #3
From: Jay Vosburgh <jay.vosburgh@canonical.com>
Date: Tue, 02 Feb 2016 13:35:56 -0800

> 
> 	The current logic in bond_arp_rcv will accept an incoming ARP for
> validation if (a) the receiving slave is either "active" (which includes
> the currently active slave, or the current ARP slave) or, (b) there is a
> currently active slave, and it has received an ARP since it became active.
> For case (b), the receiving slave isn't the currently active slave, and is
> receiving the original broadcast ARP request, not an ARP reply from the
> target.
> 
> 	This logic can fail if there is no currently active slave.  In
> this situation, the ARP probe logic cycles through all slaves, assigning
> each in turn as the "current_arp_slave" for one arp_interval, then setting
> that one as "active," and sending an ARP probe from that slave.  The
> current logic expects the ARP reply to arrive on the sending
> current_arp_slave, however, due to switch FDB updating delays, the reply
> may be directed to another slave.
> 
> 	This can arise if the bonding slaves and switch are working, but
> the ARP target is not responding.  When the ARP target recovers, a
> condition may result wherein the ARP target host replies faster than the
> switch can update its forwarding table, causing each ARP reply to be sent
> to the previous current_arp_slave.  This will never pass the logic in
> bond_arp_rcv, as neither of the above conditions (a) or (b) are met.
> 
> 	Some experimentation on a LAN shows ARP reply round trips in the
> 200 usec range, but my available switches never update their FDB in less
> than 4000 usec.
> 
> 	This patch changes the logic in bond_arp_rcv to additionally
> accept an ARP reply for validation on any slave if there is a current ARP
> slave and it sent an ARP probe during the previous arp_interval.
> 
> Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works")
> Cc: Veaceslav Falico <vfalico@gmail.com>
> Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>

I'm going to wait until we get some feedback from Uwe if you don't mind
Jay, it would be nice to know if it solves their problem too.
Jay Vosburgh Feb. 8, 2016, 6:03 a.m. UTC | #4
David Miller <davem@davemloft.net> wrote:
[...]
>> 	This patch changes the logic in bond_arp_rcv to additionally
>> accept an ARP reply for validation on any slave if there is a current ARP
>> slave and it sent an ARP probe during the previous arp_interval.
>> 
>> Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works")
>> Cc: Veaceslav Falico <vfalico@gmail.com>
>> Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
>> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
>
>I'm going to wait until we get some feedback from Uwe if you don't mind
>Jay, it would be nice to know if it solves their problem too.

	Sounds good to me.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
David Miller Feb. 13, 2016, 10:55 a.m. UTC | #5
From: Jay Vosburgh <jay.vosburgh@canonical.com>
Date: Tue, 02 Feb 2016 13:35:56 -0800

> 
> 	The current logic in bond_arp_rcv will accept an incoming ARP for
> validation if (a) the receiving slave is either "active" (which includes
> the currently active slave, or the current ARP slave) or, (b) there is a
> currently active slave, and it has received an ARP since it became active.
> For case (b), the receiving slave isn't the currently active slave, and is
> receiving the original broadcast ARP request, not an ARP reply from the
> target.
> 
> 	This logic can fail if there is no currently active slave.  In
> this situation, the ARP probe logic cycles through all slaves, assigning
> each in turn as the "current_arp_slave" for one arp_interval, then setting
> that one as "active," and sending an ARP probe from that slave.  The
> current logic expects the ARP reply to arrive on the sending
> current_arp_slave, however, due to switch FDB updating delays, the reply
> may be directed to another slave.
> 
> 	This can arise if the bonding slaves and switch are working, but
> the ARP target is not responding.  When the ARP target recovers, a
> condition may result wherein the ARP target host replies faster than the
> switch can update its forwarding table, causing each ARP reply to be sent
> to the previous current_arp_slave.  This will never pass the logic in
> bond_arp_rcv, as neither of the above conditions (a) or (b) are met.
> 
> 	Some experimentation on a LAN shows ARP reply round trips in the
> 200 usec range, but my available switches never update their FDB in less
> than 4000 usec.
> 
> 	This patch changes the logic in bond_arp_rcv to additionally
> accept an ARP reply for validation on any slave if there is a current ARP
> slave and it sent an ARP probe during the previous arp_interval.
> 
> Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works")
> Cc: Veaceslav Falico <vfalico@gmail.com>
> Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
> Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>

Applied, thanks Jay.
diff mbox

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 56b560558884..65a4107749df 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -214,6 +214,8 @@  static void bond_uninit(struct net_device *bond_dev);
 static struct rtnl_link_stats64 *bond_get_stats(struct net_device *bond_dev,
 						struct rtnl_link_stats64 *stats);
 static void bond_slave_arr_handler(struct work_struct *work);
+static bool bond_time_in_interval(struct bonding *bond, unsigned long last_act,
+				  int mod);
 
 /*---------------------------- General routines -----------------------------*/
 
@@ -2459,7 +2461,7 @@  int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
 		 struct slave *slave)
 {
 	struct arphdr *arp = (struct arphdr *)skb->data;
-	struct slave *curr_active_slave;
+	struct slave *curr_active_slave, *curr_arp_slave;
 	unsigned char *arp_ptr;
 	__be32 sip, tip;
 	int alen, is_arp = skb->protocol == __cpu_to_be16(ETH_P_ARP);
@@ -2506,26 +2508,41 @@  int bond_arp_rcv(const struct sk_buff *skb, struct bonding *bond,
 		     &sip, &tip);
 
 	curr_active_slave = rcu_dereference(bond->curr_active_slave);
+	curr_arp_slave = rcu_dereference(bond->current_arp_slave);
 
-	/* Backup slaves won't see the ARP reply, but do come through
-	 * here for each ARP probe (so we swap the sip/tip to validate
-	 * the probe).  In a "redundant switch, common router" type of
-	 * configuration, the ARP probe will (hopefully) travel from
-	 * the active, through one switch, the router, then the other
-	 * switch before reaching the backup.
+	/* We 'trust' the received ARP enough to validate it if:
+	 *
+	 * (a) the slave receiving the ARP is active (which includes the
+	 * current ARP slave, if any), or
+	 *
+	 * (b) the receiving slave isn't active, but there is a currently
+	 * active slave and it received valid arp reply(s) after it became
+	 * the currently active slave, or
+	 *
+	 * (c) there is an ARP slave that sent an ARP during the prior ARP
+	 * interval, and we receive an ARP reply on any slave.  We accept
+	 * these because switch FDB update delays may deliver the ARP
+	 * reply to a slave other than the sender of the ARP request.
 	 *
-	 * We 'trust' the arp requests if there is an active slave and
-	 * it received valid arp reply(s) after it became active. This
-	 * is done to avoid endless looping when we can't reach the
+	 * Note: for (b), backup slaves are receiving the broadcast ARP
+	 * request, not a reply.  This request passes from the sending
+	 * slave through the L2 switch(es) to the receiving slave.  Since
+	 * this is checking the request, sip/tip are swapped for
+	 * validation.
+	 *
+	 * This is done to avoid endless looping when we can't reach the
 	 * arp_ip_target and fool ourselves with our own arp requests.
 	 */
-
 	if (bond_is_active_slave(slave))
 		bond_validate_arp(bond, slave, sip, tip);
 	else if (curr_active_slave &&
 		 time_after(slave_last_rx(bond, curr_active_slave),
 			    curr_active_slave->last_link_up))
 		bond_validate_arp(bond, slave, tip, sip);
+	else if (curr_arp_slave && (arp->ar_op == htons(ARPOP_REPLY)) &&
+		 bond_time_in_interval(bond,
+				       dev_trans_start(curr_arp_slave->dev), 1))
+		bond_validate_arp(bond, slave, sip, tip);
 
 out_unlock:
 	if (arp != (struct arphdr *)skb->data)