diff mbox

bonding: allow bond in mode balance-alb to work properly in bridge -try4

Message ID 20090326155205.GA28868@psychotron.englab.brq.redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Jiri Pirko March 26, 2009, 3:52 p.m. UTC
(resend, updated changelog, hook moved into skb_bond_should_drop,
skb_bond_should_drop ifdefed)

Hi all.

The problem is described in following bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=487763

Basically here's what's going on. In every mode, bonding interface uses the same
mac address for all enslaved devices (except fail_over_mac). Only balance-alb
will simultaneously use multiple MAC addresses across different slaves. When you
put this kind of bond device into a bridge it will only add one of mac adresses
into a hash list of mac addresses, say X. This mac address is marked as local.
But this bonding interface also has mac address Y. Now then packet arrives with
destination address Y, this address is not marked as local and the packed looks
like it needs to be forwarded. This packet is then lost which is wrong.

Notice that interfaces can be added and removed from bond while it is in bridge.

This patch solves the situation in the bonding without touching bridge code,
as Patrick suggested. For every incoming frame to bonding it searches the
destination address in slaves list and if any of slave addresses matches, it
rewrites the address in frame by the adress of bonding master. This ensures that
all frames comming thru the bonding in alb mode have the same address.

Jirka


Signed-off-by: Jiri Pirko <jpirko@redhat.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller March 27, 2009, 7:38 a.m. UTC | #1
From: Jiri Pirko <jpirko@redhat.com>
Date: Thu, 26 Mar 2009 16:52:06 +0100

> (resend, updated changelog, hook moved into skb_bond_should_drop,
> skb_bond_should_drop ifdefed)
> 
> Hi all.
> 
> The problem is described in following bugzilla:
> https://bugzilla.redhat.com/show_bug.cgi?id=487763
 ...
> This patch solves the situation in the bonding without touching bridge code,
> as Patrick suggested. For every incoming frame to bonding it searches the
> destination address in slaves list and if any of slave addresses matches, it
> rewrites the address in frame by the adress of bonding master. This ensures that
> all frames comming thru the bonding in alb mode have the same address.
> 
> Signed-off-by: Jiri Pirko <jpirko@redhat.com>


I don't like the hook, but if that's how it's best done....

Patrick, please review this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko March 27, 2009, 7:46 a.m. UTC | #2
Fri, Mar 27, 2009 at 08:38:19AM CET, davem@davemloft.net wrote:
>From: Jiri Pirko <jpirko@redhat.com>
>Date: Thu, 26 Mar 2009 16:52:06 +0100
>
>> (resend, updated changelog, hook moved into skb_bond_should_drop,
>> skb_bond_should_drop ifdefed)
>> 
>> Hi all.
>> 
>> The problem is described in following bugzilla:
>> https://bugzilla.redhat.com/show_bug.cgi?id=487763
> ...
>> This patch solves the situation in the bonding without touching bridge code,
>> as Patrick suggested. For every incoming frame to bonding it searches the
>> destination address in slaves list and if any of slave addresses matches, it
>> rewrites the address in frame by the adress of bonding master. This ensures that
>> all frames comming thru the bonding in alb mode have the same address.
>> 
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>
>
>I don't like the hook, but if that's how it's best done....

Yes I agree with you, but I thing that for now it's the best way to do this. I
picked this solution out of 3 that I had in mind and this is the lesser evil :)
If anyone have any other solution please speak up.

>
>Patrick, please review this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy March 27, 2009, 7:53 a.m. UTC | #3
David Miller wrote:
> From: Jiri Pirko <jpirko@redhat.com>
> Date: Thu, 26 Mar 2009 16:52:06 +0100
> 
>> (resend, updated changelog, hook moved into skb_bond_should_drop,
>> skb_bond_should_drop ifdefed)
>>
>> Hi all.
>>
>> The problem is described in following bugzilla:
>> https://bugzilla.redhat.com/show_bug.cgi?id=487763
>  ...
>> This patch solves the situation in the bonding without touching bridge code,
>> as Patrick suggested. For every incoming frame to bonding it searches the
>> destination address in slaves list and if any of slave addresses matches, it
>> rewrites the address in frame by the adress of bonding master. This ensures that
>> all frames comming thru the bonding in alb mode have the same address.
>>
>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
> 
> 
> I don't like the hook, but if that's how it's best done....
> 
> Patrick, please review this.

Me neither, but I don't think this approach can be done without the
hook. While I still find it questionable whether this mode really
needs to be supported for a bridge at all, an alternative approach
would be to have bonding add FDB entries for all secondary MACs to
make bridging treat them as local.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko March 27, 2009, 8:41 a.m. UTC | #4
Fri, Mar 27, 2009 at 08:53:13AM CET, kaber@trash.net wrote:
> David Miller wrote:
>> From: Jiri Pirko <jpirko@redhat.com>
>> Date: Thu, 26 Mar 2009 16:52:06 +0100
>>
>>> (resend, updated changelog, hook moved into skb_bond_should_drop,
>>> skb_bond_should_drop ifdefed)
>>>
>>> Hi all.
>>>
>>> The problem is described in following bugzilla:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=487763
>>  ...
>>> This patch solves the situation in the bonding without touching bridge code,
>>> as Patrick suggested. For every incoming frame to bonding it searches the
>>> destination address in slaves list and if any of slave addresses matches, it
>>> rewrites the address in frame by the adress of bonding master. This ensures that
>>> all frames comming thru the bonding in alb mode have the same address.
>>>
>>> Signed-off-by: Jiri Pirko <jpirko@redhat.com>
>>
>>
>> I don't like the hook, but if that's how it's best done....
>>
>> Patrick, please review this.
>
> Me neither, but I don't think this approach can be done without the
> hook. While I still find it questionable whether this mode really
> needs to be supported for a bridge at all

Well there is I think nothing unusual in this net scheme. And by for example
the increasing setups with kvm/bridging it will be needed more and more.

> , an alternative approach
> would be to have bonding add FDB entries for all secondary MACs to
> make bridging treat them as local.

Yes - that is the clear way. But there's not really straihtforward way to do
this. The clear approach would be to extend struct net_device for list of these
mac addresses and let the drivers (binding) fill it and bridge to process it.
But I don't know.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy March 27, 2009, 8:55 a.m. UTC | #5
Jiri Pirko wrote:
> Fri, Mar 27, 2009 at 08:53:13AM CET, kaber@trash.net wrote:
>> >
>> > Me neither, but I don't think this approach can be done without the
>> > hook. While I still find it questionable whether this mode really
>> > needs to be supported for a bridge at all
> 
> Well there is I think nothing unusual in this net scheme. And by for example
> the increasing setups with kvm/bridging it will be needed more and more.

Mangling ARP packets for load-balancing purposes seems quite unusual.

>> , an alternative approach
>> would be to have bonding add FDB entries for all secondary MACs to
>> make bridging treat them as local.
> 
> Yes - that is the clear way. But there's not really straihtforward way to do
> this. The clear approach would be to extend struct net_device for list of these
> mac addresses and let the drivers (binding) fill it and bridge to process it.
> But I don't know.

We have a list of secondary unicast addresses, but that might not
be suitable in this case since the addresses are (mostly) intended
not to be visible to the stack if I understood correctly.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko March 27, 2009, 9:47 a.m. UTC | #6
Fri, Mar 27, 2009 at 09:55:39AM CET, kaber@trash.net wrote:
> Jiri Pirko wrote:
>> Fri, Mar 27, 2009 at 08:53:13AM CET, kaber@trash.net wrote:
>>> >
>>> > Me neither, but I don't think this approach can be done without the
>>> > hook. While I still find it questionable whether this mode really
>>> > needs to be supported for a bridge at all
>>
>> Well there is I think nothing unusual in this net scheme. And by for example
>> the increasing setups with kvm/bridging it will be needed more and more.
>
> Mangling ARP packets for load-balancing purposes seems quite unusual.

Well, there are many unusual things, that do not imply that they should not be
supported...

>>> , an alternative approach
>>> would be to have bonding add FDB entries for all secondary MACs to
>>> make bridging treat them as local.
>>
>> Yes - that is the clear way. But there's not really straihtforward way to do
>> this. The clear approach would be to extend struct net_device for list of these
>> mac addresses and let the drivers (binding) fill it and bridge to process it.
>> But I don't know.
>
> We have a list of secondary unicast addresses, but that might not
> be suitable in this case since the addresses are (mostly) intended
> not to be visible to the stack if I understood correctly.

I agree this list is not suitable for this - it's used for different purpose and
I think it would be not wise to mix it with what we want...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller March 29, 2009, 8:53 p.m. UTC | #7
From: Patrick McHardy <kaber@trash.net>
Date: Fri, 27 Mar 2009 08:53:13 +0100

> David Miller wrote:
> > I don't like the hook, but if that's how it's best done....
> > Patrick, please review this.
> 
> Me neither, but I don't think this approach can be done without the
> hook. While I still find it questionable whether this mode really
> needs to be supported for a bridge at all, an alternative approach
> would be to have bonding add FDB entries for all secondary MACs to
> make bridging treat them as local.

Do you guys foresee any possibility of an alternative implementation
any time soon?

Otherwise we're just stalling by not putting something into the tree,
and as far as I can tell this patch here might as well be it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy March 30, 2009, 12:04 p.m. UTC | #8
David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Fri, 27 Mar 2009 08:53:13 +0100
> 
>> ... an alternative approach
>> would be to have bonding add FDB entries for all secondary MACs to
>> make bridging treat them as local.
> 
> Do you guys foresee any possibility of an alternative implementation
> any time soon?
> 
> Otherwise we're just stalling by not putting something into the tree,
> and as far as I can tell this patch here might as well be it.

Adding bridge FDB entries seems like the best fix. It might
need some minor ugliness to avoid new dependencies between
bonding and bridging, but it definitely beats having new hooks
in the core in my opinion.

But I have no idea whether Jiri is actually implementing this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko March 30, 2009, 12:40 p.m. UTC | #9
Mon, Mar 30, 2009 at 02:04:25PM CEST, kaber@trash.net wrote:
> David Miller wrote:
>> From: Patrick McHardy <kaber@trash.net>
>> Date: Fri, 27 Mar 2009 08:53:13 +0100
>>
>>> ... an alternative approach
>>> would be to have bonding add FDB entries for all secondary MACs to
>>> make bridging treat them as local.
>>
>> Do you guys foresee any possibility of an alternative implementation
>> any time soon?
>>
>> Otherwise we're just stalling by not putting something into the tree,
>> and as far as I can tell this patch here might as well be it.
>
> Adding bridge FDB entries seems like the best fix. It might
> need some minor ugliness to avoid new dependencies between
> bonding and bridging, but it definitely beats having new hooks
> in the core in my opinion.

Agree with this.
>
> But I have no idea whether Jiri is actually implementing this.

Currently I'm thinking the way. What I have on mind:
I would like to add a list into struct net_device to contain all mac addresses
of the device. I would also like to use similar interface to handle them as
currently is for uc_list and mc_list. However I do not like that these lists are
not using standard list_head but they are propriate lists only for this purpose.
I'm thinking about converting them to use list_head first. Or maybe ignore them
and do the new list for macs in parallel?

Then we can fill this list with macs in bonding driver and let bridge check it
and make fdb entries.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy March 30, 2009, 12:47 p.m. UTC | #10
Jiri Pirko wrote:
> Currently I'm thinking the way. What I have on mind:
> I would like to add a list into struct net_device to contain all mac addresses
> of the device. I would also like to use similar interface to handle them as
> currently is for uc_list and mc_list. However I do not like that these lists are
> not using standard list_head but they are propriate lists only for this purpose.
> I'm thinking about converting them to use list_head first. Or maybe ignore them
> and do the new list for macs in parallel?

Using list_heads in the address lists would require some pretty large
amount of work since you'd need to convert all the drivers. I'm all
in favour of doing this, but I wouldn't make the fix depend on that
work.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko March 30, 2009, 12:52 p.m. UTC | #11
Mon, Mar 30, 2009 at 02:47:59PM CEST, kaber@trash.net wrote:
> Jiri Pirko wrote:
>> Currently I'm thinking the way. What I have on mind:
>> I would like to add a list into struct net_device to contain all mac addresses
>> of the device. I would also like to use similar interface to handle them as
>> currently is for uc_list and mc_list. However I do not like that these lists are
>> not using standard list_head but they are propriate lists only for this purpose.
>> I'm thinking about converting them to use list_head first. Or maybe ignore them
>> and do the new list for macs in parallel?
>
> Using list_heads in the address lists would require some pretty large
> amount of work since you'd need to convert all the drivers. 

Yes, I'm aware of it...
> I'm all
> in favour of doing this, but I wouldn't make the fix depend on that
> work.

ok so you are suggesting to use the current list struct?
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy March 30, 2009, 12:58 p.m. UTC | #12
Jiri Pirko wrote:
> Mon, Mar 30, 2009 at 02:47:59PM CEST, kaber@trash.net wrote:
>> Jiri Pirko wrote:
>>> Currently I'm thinking the way. What I have on mind:
>>> I would like to add a list into struct net_device to contain all mac addresses
>>> of the device. I would also like to use similar interface to handle them as
>>> currently is for uc_list and mc_list. However I do not like that these lists are
>>> not using standard list_head but they are propriate lists only for this purpose.
>>> I'm thinking about converting them to use list_head first. Or maybe ignore them
>>> and do the new list for macs in parallel?
>> Using list_heads in the address lists would require some pretty large
>> amount of work since you'd need to convert all the drivers. 
> 
> Yes, I'm aware of it...
>> I'm all
>> in favour of doing this, but I wouldn't make the fix depend on that
>> work.
> 
> ok so you are suggesting to use the current list struct?

Whatever will make this easier :) You could of course already add the
new structure and use it for your new list and do the conversion of
the existing structures on top of that.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 27fb7f5..b973ede 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1762,6 +1762,25 @@  int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
 	return 0;
 }
 
+void bond_alb_change_dest(struct sk_buff *skb, struct net_device *bond_dev)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+	unsigned char *dest = eth_hdr(skb)->h_dest;
+	struct slave *slave;
+	int i;
+
+	if (!compare_ether_addr_64bits(dest, bond_dev->dev_addr))
+		return;
+	read_lock(&bond->lock);
+	bond_for_each_slave(bond, slave, i) {
+		if (!compare_ether_addr_64bits(slave->dev->dev_addr, dest)) {
+			memcpy(dest, bond_dev->dev_addr, ETH_ALEN);
+			break;
+		}
+	}
+	read_unlock(&bond->lock);
+}
+
 void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id)
 {
 	if (bond->alb_info.current_alb_vlan &&
diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h
index 50968f8..4924dd7 100644
--- a/drivers/net/bonding/bond_alb.h
+++ b/drivers/net/bonding/bond_alb.h
@@ -127,6 +127,7 @@  void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave
 int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev);
 void bond_alb_monitor(struct work_struct *);
 int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr);
+void bond_alb_change_dest(struct sk_buff *skb, struct net_device *bond_dev);
 void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id);
 #endif /* __BOND_ALB_H__ */
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3d76686..7c7cb81 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4294,6 +4294,18 @@  unwind:
 	return res;
 }
 
+/*
+ * Called via bond_change_dest_hook.
+ * note: already called with rcu_read_lock (preempt_disabled)
+ */
+void bond_change_dest(struct sk_buff *skb, struct net_device *bond_dev)
+{
+	struct bonding *bond = netdev_priv(bond_dev);
+
+	if (bond->params.mode == BOND_MODE_ALB)
+		bond_alb_change_dest(skb, bond_dev);
+}
+
 static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
@@ -5243,6 +5255,8 @@  static int __init bonding_init(void)
 	register_inetaddr_notifier(&bond_inetaddr_notifier);
 	bond_register_ipv6_notifier();
 
+	bond_change_dest_hook = bond_change_dest;
+
 	goto out;
 err:
 	list_for_each_entry(bond, &bond_dev_list, bond_list) {
@@ -5266,6 +5280,8 @@  static void __exit bonding_exit(void)
 	unregister_inetaddr_notifier(&bond_inetaddr_notifier);
 	bond_unregister_ipv6_notifier();
 
+	bond_change_dest_hook = NULL;
+
 	bond_destroy_sysfs();
 
 	rtnl_lock();
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index ca849d2..7159483 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -375,5 +375,8 @@  static inline void bond_unregister_ipv6_notifier(void)
 }
 #endif
 
+extern void (*bond_change_dest_hook)(struct sk_buff *skb,
+				     struct net_device *master);
+
 #endif /* _LINUX_BONDING_H */
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6593667..7af6857 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1860,6 +1860,10 @@  static inline void netif_set_gso_max_size(struct net_device *dev,
 	dev->gso_max_size = size;
 }
 
+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
+extern void (*bond_change_dest_hook)(struct sk_buff *skb,
+				     struct net_device *master);
+
 /* On bonding slaves other than the currently active slave, suppress
  * duplicates except for 802.3ad ETH_P_SLOW, alb non-mcast/bcast, and
  * ARP on active-backup slaves with arp_validate enabled.
@@ -1876,22 +1880,31 @@  static inline int skb_bond_should_drop(struct sk_buff *skb)
 		if (dev->priv_flags & IFF_SLAVE_INACTIVE) {
 			if ((dev->priv_flags & IFF_SLAVE_NEEDARP) &&
 			    skb->protocol == __constant_htons(ETH_P_ARP))
-				return 0;
+				goto dont_drop;
 
 			if (master->priv_flags & IFF_MASTER_ALB) {
 				if (skb->pkt_type != PACKET_BROADCAST &&
 				    skb->pkt_type != PACKET_MULTICAST)
-					return 0;
+					goto dont_drop;
 			}
 			if (master->priv_flags & IFF_MASTER_8023AD &&
 			    skb->protocol == __constant_htons(ETH_P_SLOW))
-				return 0;
+				goto dont_drop;
 
 			return 1;
 		}
+dont_drop:
+		bond_change_dest_hook(skb, master);
 	}
+
+	return 0;
+}
+#else
+static inline int skb_bond_should_drop(struct sk_buff *skb)
+{
 	return 0;
 }
+#endif
 
 extern struct pernet_operations __net_initdata loopback_net_ops;
 #endif /* __KERNEL__ */
diff --git a/net/core/dev.c b/net/core/dev.c
index e3fe5c7..d9b758b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2061,6 +2061,12 @@  static inline int deliver_skb(struct sk_buff *skb,
 	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
 }
 
+#if defined(CONFIG_BONDING) || defined(CONFIG_BONDING_MODULE)
+void (*bond_change_dest_hook)(struct sk_buff *skb,
+			      struct net_device *master) __read_mostly;
+EXPORT_SYMBOL(bond_change_dest_hook);
+#endif
+
 #if defined(CONFIG_BRIDGE) || defined (CONFIG_BRIDGE_MODULE)
 /* These hooks defined here for ATM */
 struct net_bridge;