diff mbox

[net] bonding: fix arp monitoring with vlan slaves

Message ID 1375461665-4186-1-git-send-email-nikolay@redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Nikolay Aleksandrov Aug. 2, 2013, 4:41 p.m. UTC
From: Nikolay Aleksandrov <Nikolay Aleksandrov nikolay@redhat.com>

When arp monitoring is enabled the bonding relies on slaves'
dev_trans_start() value to check if the slave link is up or not, but for
8021q devices that value is either stale or 0, and can't be used. So use
the 8021q's underlying device value.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
---
 drivers/net/bonding/bond_main.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

Comments

David Miller Aug. 2, 2013, 10:32 p.m. UTC | #1
From: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Fri,  2 Aug 2013 18:41:05 +0200

> From: Nikolay Aleksandrov <Nikolay Aleksandrov nikolay@redhat.com>
> 
> When arp monitoring is enabled the bonding relies on slaves'
> dev_trans_start() value to check if the slave link is up or not, but for
> 8021q devices that value is either stale or 0, and can't be used. So use
> the 8021q's underlying device value.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>

Handling this specially in bonding isn't really ideal.

Please either hide this detail in dev_trans_start(), or (preferrably)
have vlan_dev_hard_start_xmit() set the trans_start timestamp
properly thus making this just work for everything.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Aleksandrov Aug. 2, 2013, 10:40 p.m. UTC | #2
On 08/03/2013 12:32 AM, David Miller wrote:
> From: Nikolay Aleksandrov <nikolay@redhat.com>
> Date: Fri,  2 Aug 2013 18:41:05 +0200
> 
>> From: Nikolay Aleksandrov <Nikolay Aleksandrov nikolay@redhat.com>
>>
>> When arp monitoring is enabled the bonding relies on slaves'
>> dev_trans_start() value to check if the slave link is up or not, but for
>> 8021q devices that value is either stale or 0, and can't be used. So use
>> the 8021q's underlying device value.
>>
>> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
> 
> Handling this specially in bonding isn't really ideal.
> 
Indeed, it's not really a bonding problem.

> Please either hide this detail in dev_trans_start(), or (preferrably)
> have vlan_dev_hard_start_xmit() set the trans_start timestamp
> properly thus making this just work for everything.
> 
> Thanks.
> 
Yep, I prefer option 2 as well. I will submit a patch tomorrow, thank you
for the suggestions.

Nik
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 2, 2013, 11:17 p.m. UTC | #3
On Fri, 2013-08-02 at 15:32 -0700, David Miller wrote:

> Handling this specially in bonding isn't really ideal.
> 
> Please either hide this detail in dev_trans_start(), or (preferrably)
> have vlan_dev_hard_start_xmit() set the trans_start timestamp
> properly thus making this just work for everything.

vlan is LLTX, so setting the timestamp would incur false sharing on
multiqueue.

But it's true we need a helper, because many callers do

if (dev->priv_flags & IFF_802_1Q_VLAN)
       dev = vlan_dev_real_dev(dev);

or the slighly better

if (is_vlan_dev(dev))
   dev = vlan_dev_real_dev(dev);



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Aleksandrov Aug. 2, 2013, 11:29 p.m. UTC | #4
On 08/03/2013 01:17 AM, Eric Dumazet wrote:
> On Fri, 2013-08-02 at 15:32 -0700, David Miller wrote:
> 
>> Handling this specially in bonding isn't really ideal.
>>
>> Please either hide this detail in dev_trans_start(), or (preferrably)
>> have vlan_dev_hard_start_xmit() set the trans_start timestamp
>> properly thus making this just work for everything.
> 
> vlan is LLTX, so setting the timestamp would incur false sharing on
> multiqueue.
> 
Yeah, this statement actually explains a lot for me, thanks :-)
I knew it was because of the LLTX, but I was wondering about the possible
reasons for the xmit_lock_owner check.
So basically the arp monitoring (or any dev_trans_start code) won't work
with LLTX devices because they don't get their trans_start updated (not the
txq trans_start nor the dev->trans_start), is this correct ?
But what if the txqs get bound to a particular CPU, then the txq
trans_start is okay to be updated I suppose.

Nik
> But it's true we need a helper, because many callers do
> 
> if (dev->priv_flags & IFF_802_1Q_VLAN)
>        dev = vlan_dev_real_dev(dev);
> 
> or the slighly better
> 
> if (is_vlan_dev(dev))
>    dev = vlan_dev_real_dev(dev);
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jay Vosburgh Aug. 2, 2013, 11:59 p.m. UTC | #5
Nikolay Aleksandrov <nikolay@redhat.com> wrote:

>On 08/03/2013 01:17 AM, Eric Dumazet wrote:
>> On Fri, 2013-08-02 at 15:32 -0700, David Miller wrote:
>> 
>>> Handling this specially in bonding isn't really ideal.
>>>
>>> Please either hide this detail in dev_trans_start(), or (preferrably)
>>> have vlan_dev_hard_start_xmit() set the trans_start timestamp
>>> properly thus making this just work for everything.
>> 
>> vlan is LLTX, so setting the timestamp would incur false sharing on
>> multiqueue.
>> 
>Yeah, this statement actually explains a lot for me, thanks :-)
>I knew it was because of the LLTX, but I was wondering about the possible
>reasons for the xmit_lock_owner check.
>So basically the arp monitoring (or any dev_trans_start code) won't work
>with LLTX devices because they don't get their trans_start updated (not the
>txq trans_start nor the dev->trans_start), is this correct ?

	I think what Eric means is that it'll be a performance problem,
because the cache line with the trans_start field will be thrashed
between CPUs as updates compete with each other or inspection from
dev_trans_start.  I suspect it will work from a functional point of view
(unless I'm missing something).

	As a practical matter, it looks like bonding is at present the
only caller of dev_trans_start that passes either (a) a VLAN device, or
(b) a device other than itself (most drivers seem to use it for transmit
timeout detection on their own device).

	Having software devices set trans_start seems unnecessary
(bonding doesn't set it, either), since they'll generally have a
hardware device underneath with a real trans_start value.  It might be
hard to hunt down for tun, though.

>But what if the txqs get bound to a particular CPU, then the txq
>trans_start is okay to be updated I suppose.
>
>Nik
>> But it's true we need a helper, because many callers do
>> 
>> if (dev->priv_flags & IFF_802_1Q_VLAN)
>>        dev = vlan_dev_real_dev(dev);
>> 
>> or the slighly better
>> 
>> if (is_vlan_dev(dev))
>>    dev = vlan_dev_real_dev(dev);

	For just the bonding dev_trans_start() for a VLAN case, one of
the above in dev_trans_start() ought to be sufficient.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Aug. 3, 2013, 12:03 a.m. UTC | #6
On Sat, 2013-08-03 at 01:29 +0200, Nikolay Aleksandrov wrote:

> I knew it was because of the LLTX, but I was wondering about the possible
> reasons for the xmit_lock_owner check.
> So basically the arp monitoring (or any dev_trans_start code) won't work
> with LLTX devices because they don't get their trans_start updated (not the
> txq trans_start nor the dev->trans_start), is this correct ?

Nope : LLTX devices are supposed to update their own dev->trans_start
I added for these case a special comment as in :

drivers/net/ethernet/atheros/atl1e/atl1e_main.c:1883:   netdev->trans_start = jiffies; /* NETIF_F_LLTX driver :( */

trans_start is a txq property, and vlan devices have a single txq.

Updating the vlandev->trans_start or the vlantxq->trans_start would be a
performance killer on MQ ethernet.

And frankly, as this trans_start is really seldom queried, it makes no
sense to set it on fast path on the vlan device, if its properly done on
the real device anyway.

> But what if the txqs get bound to a particular CPU, then the txq
> trans_start is okay to be updated I suppose.

Not sure what you are saying. vlan xmit can be called from any cpus.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Aleksandrov Aug. 3, 2013, 12:21 a.m. UTC | #7
On 08/03/2013 02:03 AM, Eric Dumazet wrote:
> On Sat, 2013-08-03 at 01:29 +0200, Nikolay Aleksandrov wrote:
> 
>> I knew it was because of the LLTX, but I was wondering about the possible
>> reasons for the xmit_lock_owner check.
>> So basically the arp monitoring (or any dev_trans_start code) won't work
>> with LLTX devices because they don't get their trans_start updated (not the
>> txq trans_start nor the dev->trans_start), is this correct ?
> 
> Nope : LLTX devices are supposed to update their own dev->trans_start
> I added for these case a special comment as in :
> 
> drivers/net/ethernet/atheros/atl1e/atl1e_main.c:1883:   netdev->trans_start = jiffies; /* NETIF_F_LLTX driver :( */
> 
Ah, didn't know about that, so there could be an LLTX device with proper
trans_start if it updates it itself. Fair enough.

> trans_start is a txq property, and vlan devices have a single txq.
> 
> Updating the vlandev->trans_start or the vlantxq->trans_start would be a
> performance killer on MQ ethernet.
> 
Yeah, that part I got, that's why I said it explains a lot for me earlier :-)

> And frankly, as this trans_start is really seldom queried, it makes no
> sense to set it on fast path on the vlan device, if its properly done on
> the real device anyway.
> 
>> But what if the txqs get bound to a particular CPU, then the txq
>> trans_start is okay to be updated I suppose.
> 
> Not sure what you are saying. vlan xmit can be called from any cpus.
> 
Never mind, I didn't notice that the 8021q dev has a single txq as you said.

Thanks for taking the time to explain all this.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 07f257d4..5f2bad3 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -665,6 +665,17 @@  static int bond_check_dev_link(struct bonding *bond,
 	return reporting ? -1 : BMSR_LSTATUS;
 }
 
+/* For 8021q devices dev_trans_start() returns 0 or a stale value, so use the
+ * 8021q's underlying device value
+ */
+static unsigned long bond_dev_trans_start(struct net_device *dev)
+{
+	while (dev->priv_flags & IFF_802_1Q_VLAN)
+		dev = vlan_dev_real_dev(dev);
+
+	return dev_trans_start(dev);
+}
+
 /*----------------------------- Multicast list ------------------------------*/
 
 /*
@@ -2750,7 +2761,7 @@  void bond_loadbalance_arp_mon(struct work_struct *work)
 	 *       so it can wait
 	 */
 	bond_for_each_slave(bond, slave, i) {
-		unsigned long trans_start = dev_trans_start(slave->dev);
+		unsigned long trans_start = bond_dev_trans_start(slave->dev);
 
 		if (slave->link != BOND_LINK_UP) {
 			if (time_in_range(jiffies,
@@ -2912,7 +2923,7 @@  static int bond_ab_arp_inspect(struct bonding *bond, int delta_in_ticks)
 		 * - (more than 2*delta since receive AND
 		 *    the bond has an IP address)
 		 */
-		trans_start = dev_trans_start(slave->dev);
+		trans_start = bond_dev_trans_start(slave->dev);
 		if (bond_is_active_slave(slave) &&
 		    (!time_in_range(jiffies,
 			trans_start - delta_in_ticks,
@@ -2947,7 +2958,7 @@  static void bond_ab_arp_commit(struct bonding *bond, int delta_in_ticks)
 			continue;
 
 		case BOND_LINK_UP:
-			trans_start = dev_trans_start(slave->dev);
+			trans_start = bond_dev_trans_start(slave->dev);
 			if ((!bond->curr_active_slave &&
 			     time_in_range(jiffies,
 					   trans_start - delta_in_ticks,