bonding: send IPv6 neighbor advertisement on failover

Message ID 48EC091D.7080207@hp.com
State Changes Requested, archived
Delegated to: Jeff Garzik

Commit Message

Brian Haley Oct. 8, 2008, 1:13 a.m. UTC
This patch adds better IPv6 failover support for bonding devices, 
especially when in active-backup mode and there are only IPv6 addresses 
configured, as reported by Alex Sidorenko.

- Creates a new file, drivers/net/bonding/bond_ipv6.c, for the
   IPv6-specific routines.  Both regular bonds and VLANs over bonds
   are supported.

- Adds a new tunable, num_unsol_na, to limit the number of unsolicited
   IPv6 Neighbor Advertisements that are sent on a failover event.
   Default is 1.

- Creates two new IPv6 neighbor discovery functions:

   ndisc_build_skb()
   ndisc_send_skb()

   These were required to support VLANs: we have to be able to add the
   VLAN id to the skb, and ndisc_send_na() and friends shouldn't be
   asked to do this.  These two routines are basically __ndisc_send()
   split into two pieces, in a slightly different order.

- Updates Documentation/networking/bonding.txt and bumps the rev of bond
   support to 3.4.0.
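
As a rough illustration of why the tag has to go in between building and
sending, here is a minimal user-space sketch of what inserting an 802.1Q
tag into an already-built frame involves. It is not the kernel's
vlan_put_tag(); frame_put_vlan_tag() and its flat buffer layout are
hypothetical names made up for this example, and the tag is assumed to
carry priority 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDRS_LEN 12      /* destination MAC + source MAC */
#define VLAN_TAG_LEN  4       /* TPID (2 bytes) + TCI (2 bytes) */
#define VLAN_TPID     0x8100  /* IEEE 802.1Q tag protocol identifier */

/*
 * Insert an 802.1Q tag carrying vlan_id (priority 0) into the untagged
 * Ethernet frame held in buf.  len is the current frame length and cap
 * is the buffer capacity.  Returns the new frame length, or 0 if the
 * frame is too short, the buffer is too small, or vlan_id does not fit
 * in 12 bits.
 */
size_t frame_put_vlan_tag(uint8_t *buf, size_t len, size_t cap,
			  uint16_t vlan_id)
{
	if (len < MAC_ADDRS_LEN + 2 || len + VLAN_TAG_LEN > cap ||
	    vlan_id > 0x0fff)
		return 0;

	/* Shift the EtherType and payload up to make room for the tag. */
	memmove(buf + MAC_ADDRS_LEN + VLAN_TAG_LEN, buf + MAC_ADDRS_LEN,
		len - MAC_ADDRS_LEN);

	/* The tag sits right after the two MAC addresses. */
	buf[12] = (uint8_t)(VLAN_TPID >> 8);
	buf[13] = (uint8_t)(VLAN_TPID & 0xff);
	buf[14] = (uint8_t)(vlan_id >> 8);
	buf[15] = (uint8_t)(vlan_id & 0xff);

	return len + VLAN_TAG_LEN;
}
```

Because the tag changes the frame layout after the MAC addresses, it is
natural to add it once the packet is fully built but before it is handed
off for transmission.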

On failover, this new code will generate one packet:

- An unsolicited IPv6 Neighbor Advertisement, which helps the switch
   learn that the address has moved to the new slave.

Testing has shown that sending just the NA gives good behavior in
active-backup mode; for example, I saw no lost ping packets.

-Brian

Signed-off-by: Brian Haley <brian.haley@hp.com>
---

Comments

Simon Horman Oct. 8, 2008, 7:26 a.m. UTC | #1
On Tue, Oct 07, 2008 at 09:13:01PM -0400, Brian Haley wrote:
> This patch adds better IPv6 failover support for bonding devices,  
> especially when in active-backup mode and there are only IPv6 addresses  
> configured, as reported by Alex Sidorenko.
>
> - Creates a new file, drivers/net/bonding/bond_ipv6.c, for the
>   IPv6-specific routines.  Both regular bonds and VLANs over bonds
>   are supported.
>
> - Adds a new tunable, num_unsol_na, to limit the number of unsolicited
>   IPv6 Neighbor Advertisements that are sent on a failover event.
>   Default is 1.
>
> - Creates two new IPv6 neighbor discovery functions:
>
>   ndisc_build_skb()
>   ndisc_send_skb()
>
>   These were required to support VLANs since we have to be able to
>   add the VLAN id to the skb since ndisc_send_na() and friends
>   shouldn't be asked to do this.  These two routines are basically
>   __ndisc_send() split into two pieces, in a slightly different order.
>
> - Updates Documentation/networking/bonding.txt and bumps the rev of bond
>   support to 3.4.0.
>
> On failover, this new code will generate one packet:
>
> - An unsolicited IPv6 Neighbor Advertisement, which helps the switch
>   learn that the address has moved to the new slave.
>
> Testing has shown that sending just the NA results in pretty good
> behavior when in active-backup mode, I saw no lost ping packets for
> example.
>
> -Brian
>
> Signed-off-by: Brian Haley <brian.haley@hp.com>

The Kconfig / build portions of this look fine to me.
David Stevens Oct. 8, 2008, 5:40 p.m. UTC | #2
Brian,
        I'll make the same comment I did in the earlier version,
which is I don't think we need a new control for the count. Instead,
it should use the count for unsolicited NA's we send when adding
an address. It is the same case, really, and I, FWIW, would rather
we didn't have any more sysctl's than we need.

                                                +-DLS

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vlad Yasevich Oct. 8, 2008, 6:08 p.m. UTC | #3
David Stevens wrote:
> Brian,
>         I'll make the same comment I did in the earlier version,
> which is I don't think we need a new control for the count. Instead,
> it should use the count for unsolicited NA's we send when adding
> an address. It is the same case, really, and I, FWIW, would rather
> we didn't have any more sysctl's than we need.
> 
>                                                 +-DLS
> 

Are you referring to 'dad_transmits' because I don't see any sysctls
that control the number of unsolicited NAs.  In fact, in normal operations
we don't send any unsolicited NAs at all.

This functionality effectively duplicates the gratuitous arps controls on the
bond.

-vlad

Vlad Yasevich Oct. 8, 2008, 6:15 p.m. UTC | #4
Hi Brian

A few comments.

>
> diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
> index 688dfe1..4025565 100644
> --- a/Documentation/networking/bonding.txt
> +++ b/Documentation/networking/bonding.txt
> @@ -551,6 +551,16 @@ num_grat_arp
>  	affects only the active-backup mode.  This option was added for
>  	bonding version 3.3.0.
>
> +num_unsol_na
> +
> +	Specifies the number of unsolicited IPv6 Neighbor Advertisements
> +	to be issued after a failover event.  One unsolicited NA is issued
> +	immediately after the failover.
> +
> +	The valid range is 0 - 255; the default value is 1.  This option
> +	affects only the active-backup mode.  This option was added for
> +	bonding version 3.4.0.
> +
>  primary
>
>  	A string (eth0, eth2, etc) specifying which slave is the
> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
> index 2d6a060..37f55e1 100644
> --- a/drivers/net/Kconfig
> +++ b/drivers/net/Kconfig
> @@ -61,6 +61,7 @@ config DUMMY
>  config BONDING
>  	tristate "Bonding driver support"
>  	depends on INET
> +	depends on IPV6 || IPV6=n
>  	---help---
>  	  Say 'Y' or 'M' if you wish to be able to 'bond' multiple Ethernet
>  	  Channels together. This is called 'Etherchannel' by Cisco,
> diff --git a/drivers/net/bonding/Makefile b/drivers/net/bonding/Makefile
> index 5cdae2b..6f9c6fa 100644
> --- a/drivers/net/bonding/Makefile
> +++ b/drivers/net/bonding/Makefile
> @@ -6,3 +6,6 @@ obj-$(CONFIG_BONDING) += bonding.o
>
>  bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o
>
> +ipv6-$(subst m,y,$(CONFIG_IPV6)) += bond_ipv6.o
> +bonding-objs += $(ipv6-y)
> +
> diff --git a/drivers/net/bonding/bond_ipv6.c b/drivers/net/bonding/bond_ipv6.c
> new file mode 100644
> index 0000000..b6b0351
> --- /dev/null
> +++ b/drivers/net/bonding/bond_ipv6.c
> @@ -0,0 +1,208 @@
> +/*
> + * Copyright(c) 2008 Hewlett-Packard Development Company, L.P.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the
> + * Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> + * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> + * for more details.
> + *
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, write to the Free Software Foundation, Inc.,
> + * 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
> + *
> + * The full GNU General Public License is included in this distribution in the
> + * file called LICENSE.
> + *
> + */
> +
> +//#define BONDING_DEBUG 1
> +
> +#include <linux/types.h>
> +#include <linux/if_vlan.h>
> +#include <net/ipv6.h>
> +#include <net/ndisc.h>
> +#include <net/addrconf.h>
> +#include "bonding.h"
> +
> +/*
> + * Assign bond->master_ipv6 to the next IPv6 address in the list, or
> + * zero it out if there are none.
> + */
> +static void bond_glean_dev_ipv6(struct net_device *dev, struct in6_addr *addr)
> +{
> +	struct inet6_dev *idev;
> +	struct inet6_ifaddr *ifa;
> +
> +	if (!dev)
> +		return;
> +
> +	idev = in6_dev_get(dev);
> +	if (!idev)
> +		return;
> +
> +	ifa = idev->addr_list;
> +	if (ifa)
> +		ipv6_addr_copy(addr, &ifa->addr);
> +	else
> +		ipv6_addr_set(addr, 0, 0, 0, 0);
> +
> +	in6_dev_put(idev);
> +}
> +
> +static void bond_na_send(struct net_device *slave_dev,
> +			 struct in6_addr *daddr,
> +			 int router,
> +			 unsigned short vlan_id)
> +{
> +	struct in6_addr mcaddr;
> +	struct icmp6hdr icmp6h = {
> +		.icmp6_type = NDISC_NEIGHBOUR_ADVERTISEMENT,
> +	};
> +	struct sk_buff *skb;
> +
> +	icmp6h.icmp6_router = router;
> +	icmp6h.icmp6_solicited = 0;
> +	icmp6h.icmp6_override = 1;
> +
> +	addrconf_addr_solict_mult(daddr, &mcaddr);
> +
> +	dprintk("ipv6 na on slave %s: dest " NIP6_FMT ", src " NIP6_FMT "\n",
> +	       slave_dev->name, NIP6(mcaddr), NIP6(*daddr));
> +
> +	skb = ndisc_build_skb(slave_dev, &mcaddr, daddr, &icmp6h, daddr,
> +			      ND_OPT_TARGET_LL_ADDR);
> +
> +	if (!skb) {
> +		printk(KERN_ERR DRV_NAME ": NA packet allocation failed\n");
> +		return;
> +	}
> +
> +	if (vlan_id) {
> +		skb = vlan_put_tag(skb, vlan_id);
> +		if (!skb) {
> +			printk(KERN_ERR DRV_NAME ": failed to insert VLAN tag\n");
> +			return;
> +		}
> +	}
> +
> +	ndisc_send_skb(skb, slave_dev, NULL, &mcaddr, daddr, &icmp6h);
> +}
> +
> +/*
> + * Kick out an unsolicited Neighbor Advertisement for an IPv6 address on
> + * the bonding master.  This will help the switch learn our address
> + * if in active-backup mode.
> + *
> + * Caller must hold curr_slave_lock for read or better
> + */
> +void bond_send_unsolicited_na(struct bonding *bond)
> +{
> +	struct slave *slave = bond->curr_active_slave;
> +	struct vlan_entry *vlan;
> +	struct inet6_dev *idev;
> +	int is_router;
> +
> +	dprintk("bond_send_unsol_na: bond %s slave %s\n", bond->dev->name,
> +				slave ? slave->dev->name : "NULL");
> +
> +	if (!slave || !bond->send_unsol_na ||
> +	    test_bit(__LINK_STATE_LINKWATCH_PENDING, &slave->dev->state))
> +		return;
> +
> +	bond->send_unsol_na--;
> +
> +	idev = in6_dev_get(bond->dev);
> +	if (!idev)
> +		return;
> +
> +	is_router = !!idev->cnf.forwarding;
> +
> +	in6_dev_put(idev);
> +
> +	if (!ipv6_addr_any(&bond->master_ipv6))
> +		bond_na_send(slave->dev, &bond->master_ipv6, is_router, 0);
> +
> +	list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
> +		if (!ipv6_addr_any(&vlan->vlan_ipv6)) {
> +			bond_na_send(slave->dev, &vlan->vlan_ipv6, is_router,
> +				     vlan->vlan_id);
> +		}
> +	}
> +}
> +
> +/*
> + * bond_inet6addr_event: handle inet6addr notifier chain events.
> + *
> + * We keep track of device IPv6 addresses primarily to use as source
> + * addresses in NS probes.
> + *
> + * We track one IPv6 for the main device (if it has one).
> + */
> +static int bond_inet6addr_event(struct notifier_block *this,
> +				unsigned long event,
> +				void *ptr)
> +{
> +	struct inet6_ifaddr *ifa = ptr;
> +	struct net_device *vlan_dev, *event_dev = ifa->idev->dev;
> +	struct bonding *bond;
> +	struct vlan_entry *vlan;
> +
> +	if (dev_net(event_dev) != &init_net)
> +		return NOTIFY_DONE;
> +
> +	list_for_each_entry(bond, &bond_dev_list, bond_list) {
> +		if (bond->dev == event_dev) {
> +			switch (event) {
> +			case NETDEV_UP:
> +				ipv6_addr_copy(&bond->master_ipv6, &ifa->addr);
> +				return NOTIFY_OK;

I think you want to store the first address configured on the device (most
likely link-local), and not overwrite it every time  a new address is
configured.  Since new addresses can be configured rather often (think
temporary, new RAs, etc) we really want the most stable address we can have.
Also, since ND is a link protocol, link-local is sufficient.
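
A minimal user-space sketch of that policy, assuming the stored address
should only be replaced when nothing is stored yet or when a link-local
address can take the place of a non-link-local one.
remember_stable_addr() and its helpers are hypothetical and not part of
the patch; the link-local preference goes a little beyond "keep the
first address" and is an assumption.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* True if addr is the unspecified address (::), i.e. nothing stored. */
static bool addr_is_unset(const struct in6_addr *addr)
{
	struct in6_addr zero;

	memset(&zero, 0, sizeof(zero));
	return memcmp(addr, &zero, sizeof(zero)) == 0;
}

/* True if addr falls in the link-local prefix fe80::/10. */
static bool addr_is_link_local(const struct in6_addr *addr)
{
	return addr->s6_addr[0] == 0xfe && (addr->s6_addr[1] & 0xc0) == 0x80;
}

/*
 * Hypothetical handler for an "address added" event.  Keeps the stored
 * address stable: it is only overwritten if it is still unset, or if
 * the new address is link-local and the stored one is not.
 * Returns true if the stored address changed.
 */
bool remember_stable_addr(struct in6_addr *stored,
			  const struct in6_addr *added)
{
	if (addr_is_unset(stored) ||
	    (addr_is_link_local(added) && !addr_is_link_local(stored))) {
		*stored = *added;
		return true;
	}
	return false;
}
```

With this shape, temporary or RA-derived addresses that come and go
later do not disturb the stored address.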

> +			case NETDEV_DOWN:
> +				bond_glean_dev_ipv6(bond->dev,
> +						    &bond->master_ipv6);
> +				return NOTIFY_OK;

Here you may want to compare the address being removed with the address stored,
and glean the new address only if you are removing the stored address.
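
Something along these lines, as a user-space sketch only.
forget_removed_addr() is a hypothetical name, and the "remaining" array
is an assumed stand-in for walking the device's address list after the
removal:

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical handler for an "address removed" event.  If the removed
 * address is not the one we stored, the stored address is left alone.
 * Otherwise the first of the n remaining addresses is gleaned, or the
 * stored address is zeroed if none remain.
 * Returns true if the stored address changed.
 */
bool forget_removed_addr(struct in6_addr *stored,
			 const struct in6_addr *removed,
			 const struct in6_addr *remaining, size_t n)
{
	/* Some other address went away: nothing to do. */
	if (memcmp(stored, removed, sizeof(*stored)) != 0)
		return false;

	if (n > 0)
		*stored = remaining[0];
	else
		memset(stored, 0, sizeof(*stored));

	return true;
}
```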

> +			default:
> +				return NOTIFY_DONE;
> +			}
> +		}
> +
> +		list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
> +			vlan_dev = vlan_group_get_device(bond->vlgrp,
> +							 vlan->vlan_id);
> +			if (vlan_dev == event_dev) {
> +				switch (event) {
> +				case NETDEV_UP:
> +					ipv6_addr_copy(&vlan->vlan_ipv6,
> +						       &ifa->addr);
> +					return NOTIFY_OK;

Same as above.

> +				case NETDEV_DOWN:
> +					bond_glean_dev_ipv6(vlan_dev,
> +							    &vlan->vlan_ipv6);
> +					return NOTIFY_OK;

Same as above.

> +				default:
> +					return NOTIFY_DONE;
> +				}
> +			}
> +		}
> +	}
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block bond_inet6addr_notifier = {
> +	.notifier_call = bond_inet6addr_event,
> +};
> +
> +void bond_register_ipv6_notifier(void)
> +{
> +	register_inet6addr_notifier(&bond_inet6addr_notifier);
> +}
> +
> +void bond_unregister_ipv6_notifier(void)
> +{
> +	unregister_inet6addr_notifier(&bond_inet6addr_notifier);
> +}
> +

The rest looks good.  I still need to play with it to see what happens to the
MLD traffic.

-vlad
David Stevens Oct. 8, 2008, 6:19 p.m. UTC | #5
Well, actually, it looks like I'm suggesting that you re-use something
that doesn't exist. :-)

MLD (and IGMP) has such a thing where unsolicited advertisements are sent
multiple times, with delays in between, to account for lossy networks
possibly dropping the first one. There are configurable counts associated
with probes and retransmit intervals for solicits, but I don't see the
equivalent yet for unsolicited NA's.

So, instead, what I suggest is that you add (or find!) THAT knob, instead
of a bonding-specific one, because adding an address that wasn't there
before has identical issues with unsolicited NA's as bonding has with
activating a new address. The default should probably be 1, but if you
ever need to send multiple unsolicited NA's for bonding, you probably also
need it for adding a normal address on the same network. dad_transmits is
similar, but not really the same thing.

                                                        +-DLS


netdev-owner@vger.kernel.org wrote on 10/08/2008 10:40:13 AM:

> Brian,
>         I'll make the same comment I did in the earlier version,
> which is I don't think we need a new control for the count. Instead,
> it should use the count for unsolicited NA's we send when adding
> an address. It is the same case, really, and I, FWIW, would rather
> we didn't have any more sysctl's than we need.
> 
>                                                 +-DLS
> 
Jay Vosburgh Oct. 8, 2008, 6:34 p.m. UTC | #6
Vlad Yasevich <vladislav.yasevich@hp.com> wrote:

>> +
>> +	list_for_each_entry(bond, &bond_dev_list, bond_list) {
>> +		if (bond->dev == event_dev) {
>> +			switch (event) {
>> +			case NETDEV_UP:
>> +				ipv6_addr_copy(&bond->master_ipv6, &ifa->addr);
>> +				return NOTIFY_OK;
>
>I think you want to store the first address configured on the device (most
>likely link-local), and not overwrite it every time  a new address is
>configured.  Since new addresses can be configured rather often (think
>temporary, new RAs, etc) we really want the most stable address we can have.
>Also, since ND is a link protocol, link-local is sufficient.

	That depends upon how the IPv6 unsolicited NAs are handled by
the switch.  For IPv4, we issue a gratuitous ARP for one of the IP
addresses on the interface to update the switch's MAC table; for this
case, it doesn't matter which IP address is used.

	If IPv6-smart switches snoop the same way, then it again doesn't
matter which IPv6 address is used; this is just to update the MAC table.
I'll agree that it's logically sensible to use a link-local, though.
If, on the other hand, IPv6 needs an update for each configured address,
then storing just one IPv6 address is insufficient (as we'd need an NA
for each address).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
Brian Haley Oct. 8, 2008, 7:01 p.m. UTC | #7
David Stevens wrote:
> Well, actually, it looks like I'm suggesting you to re-use something that
> doesn't exist. :-)
> 
> MLD (and IGMP) has such a thing where unsolicited advertisements are sent
> multiple times, with delays in between, to account for lossy networks
> possibly dropping the first one. There are configurable counts associated
> with probes and retransmit intervals for solicits, but I don't see the
> equivalent yet for unsolicited NA's.

I don't see an equivalent either, since the only unsolicited NA the 
kernel sends is for DAD, which uses dad_transmits.

I left the MLD changes out of this patch so I could work on them
separately.  When I get to them, I'll make sure to look at the issues you
raised in your other email so that the code follows the RFC, or at least
the existing Linux behavior.

> So, instead, what I suggest is that you add (or find!) THAT knob, instead
> of a bonding-specific one. Because adding an address that wasn't there
> before has identical issues with unsolicited NA's as bonding has with
> activating a new address. The default should probably be 1, but if you
> ever need to send multiple unsolicited NA's for bonding, you probably also
> need it for adding a normal address on the same network. dad_transmits is
> similar, but not really the same thing.

The problem is that dad_transmits can be set to zero (although that is
not recommended), so if we used that value then bonding failover would be
just as broken.  I think having this new tunable stay in the bonding
code is useful since that's the code that's actually doing the transmit.

-Brian
Brian Haley Oct. 8, 2008, 7:05 p.m. UTC | #8
Jay Vosburgh wrote:
> Vlad Yasevich <vladislav.yasevich@hp.com> wrote:
> 
>>> +
>>> +	list_for_each_entry(bond, &bond_dev_list, bond_list) {
>>> +		if (bond->dev == event_dev) {
>>> +			switch (event) {
>>> +			case NETDEV_UP:
>>> +				ipv6_addr_copy(&bond->master_ipv6, &ifa->addr);
>>> +				return NOTIFY_OK;
>> I think you want to store the first address configured on the device (most
>> likely link-local), and not overwrite it every time  a new address is
>> configured.  Since new addresses can be configured rather often (think
>> temporary, new RAs, etc) we really want the most stable address we can have.
>> Also, since ND is a link protocol, link-local is sufficient.
> 
> 	That depends upon how the IPv6 unsolicited NAs are handled by
> the switch.  For IPv4, we issue a gratuitous ARP for one of the IP
> addresses on the interface to update the switch's MAC table; for this
> case, it doesn't matter which IP address is used.
> 
> 	If IPv6-smart switches snoop the same way, then it again doesn't
> matter which IPv6 address is used; this is just to update the MAC table.
> I'll agree that it's logically sensible to use a link-local, though.
> If, on the other hand, IPv6 needs an update for each configured address,
> then storing just one IPv6 address is insufficient (as we'd need an NA
> for each address).

My testing has shown that it doesn't matter which address I send the NA
for: I can ping both the link-local and global addresses simultaneously
without dropping a packet.  For what it's worth, I'm using a Procurve
5400 series switch, which has IPv6 support; I'm not sure whether an older
switch would behave any differently.

-Brian
Vlad Yasevich Oct. 8, 2008, 7:07 p.m. UTC | #9
Jay Vosburgh wrote:
> Vlad Yasevich <vladislav.yasevich@hp.com> wrote:
> 
>>> +
>>> +	list_for_each_entry(bond, &bond_dev_list, bond_list) {
>>> +		if (bond->dev == event_dev) {
>>> +			switch (event) {
>>> +			case NETDEV_UP:
>>> +				ipv6_addr_copy(&bond->master_ipv6, &ifa->addr);
>>> +				return NOTIFY_OK;
>> I think you want to store the first address configured on the device (most
>> likely link-local), and not overwrite it every time  a new address is
>> configured.  Since new addresses can be configured rather often (think
>> temporary, new RAs, etc) we really want the most stable address we can have.
>> Also, since ND is a link protocol, link-local is sufficient.
> 
> 	That depends upon how the IPv6 unsolicited NAs are handled by
> the switch.  For IPv4, we issue a gratuitous ARP for one of the IP
> addresses on the interface to update the switch's MAC table; for this
> case, it doesn't matter which IP address is used.
> 
> 	If IPv6-smart switches snoop the same way, then it again doesn't
> matter which IPv6 address is used; this is just to update the MAC table.
> I'll agree that it's logically sensible to use a link-local, though.
> If, on the other hand, IPv6 needs an update for each configured address,
> then storing just one IPv6 address is insufficient (as we'd need an NA
> for each address).
> 

Yes, but the unsolicited NA for the global address just looks rather strange
when the link-local one is provided.  Also, with temporaries that can come and
go, it's better to use a stable address.

We are simply using it to refresh the MAC tables and for a while I thought it
would be sufficient to do just one ARP or ND, but then I realized that in an
environment where 2 systems are connected back-to-back, you would potentially
need to do both.  Need to play with this config...

-vlad

> 	-J
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> 

Vlad Yasevich Oct. 8, 2008, 7:12 p.m. UTC | #10
Hi David

David Stevens wrote:
> Well, actually, it looks like I'm suggesting you to re-use something that
> doesn't exist. :-)
> 
> MLD (and IGMP) has such a thing where unsolicited advertisements are sent
> multiple times, with delays in between, to account for lossy networks
> possibly dropping the first one. There are configurable counts associated
> with probes and retransmit intervals for solicits, but I don't see the
> equivalent yet for unsolicited NA's.
> 
> So, instead, what I suggest is that you add (or find!) THAT knob, instead
> of a bonding-specific one. Because adding an address that wasn't there
> before has identical issues with unsolicited NA's as bonding has with
> activating a new address.

Adding a new address triggers a DAD probe which is enough to let the switch
learn the MAC address.  It's a different scenario for a link failover in
bonding.  Also, adding a new address will automatically trigger an MLD response
if a corresponding solicited node multicast address is added.
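
For reference, the solicited-node multicast address is derived from the
low 24 bits of the unicast address (ff02::1:ffXX:XXXX, per RFC 4291).
A minimal user-space sketch, where solicited_node_mcast() is a
hypothetical stand-in and not the kernel's addrconf_addr_solict_mult():

```c
#include <netinet/in.h>
#include <string.h>

/*
 * Compute the solicited-node multicast address for addr:
 * the fixed prefix ff02:0:0:0:0:1:ff00::/104 followed by the low
 * 24 bits of the unicast address.
 */
void solicited_node_mcast(const struct in6_addr *addr,
			  struct in6_addr *mcast)
{
	memset(mcast, 0, sizeof(*mcast));
	mcast->s6_addr[0]  = 0xff;
	mcast->s6_addr[1]  = 0x02;
	mcast->s6_addr[11] = 0x01;
	mcast->s6_addr[12] = 0xff;

	/* Copy the last three bytes of the unicast address. */
	memcpy(&mcast->s6_addr[13], &addr->s6_addr[13], 3);
}
```

For example, fe80::211:22ff:fe33:4455 maps to ff02::1:ff33:4455.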

So the bonding case is rather different from everything else we've had so far.

-vlad

> The default should probably be 1, but if you ever need to
> send multiple unsolicited NA's for bonding, you probably also need it for
> adding a normal address on the same network. dad_transmits is similar,
> but not really the same thing.
> 
>                                                         +-DLS

Jay Vosburgh Oct. 8, 2008, 7:36 p.m. UTC | #11
Vlad Yasevich <vladislav.yasevich@hp.com> wrote:

>Jay Vosburgh wrote:
>> Vlad Yasevich <vladislav.yasevich@hp.com> wrote:
>> 
>>>> +
>>>> +	list_for_each_entry(bond, &bond_dev_list, bond_list) {
>>>> +		if (bond->dev == event_dev) {
>>>> +			switch (event) {
>>>> +			case NETDEV_UP:
>>>> +				ipv6_addr_copy(&bond->master_ipv6, &ifa->addr);
>>>> +				return NOTIFY_OK;
>>> I think you want to store the first address configured on the device (most
>>> likely link-local), and not overwrite it every time  a new address is
>>> configured.  Since new addresses can be configured rather often (think
>>> temporary, new RAs, etc) we really want the most stable address we can have.
>>> Also, since ND is a link protocol, link-local is sufficient.
>> 
>> 	That depends upon how the IPv6 unsolicited NAs are handled by
>> the switch.  For IPv4, we issue a gratuitous ARP for one of the IP
>> addresses on the interface to update the switch's MAC table; for this
>> case, it doesn't matter which IP address is used.
>> 
>> 	If IPv6-smart switches snoop the same way, then it again doesn't
>> matter which IPv6 address is used; this is just to update the MAC table.
>> I'll agree that it's logically sensible to use a link-local, though.
>> If, on the other hand, IPv6 needs an update for each configured address,
>> then storing just one IPv6 address is insufficient (as we'd need an NA
>> for each address).
>> 
>
>Yes, but the unsolicited NA for the global address just looks rather strange
>when the link local one is provided.  Also, with temporaries that can come and
>go, it's better to use a stable address.

	As I said, I'll agree that it's logically sensible to use a
link-local address.  This appears to be just cosmetic, though, and
(apparently, from what Brian Haley says) doesn't affect the switch
response to the update.  But, wait, there's more...

>We are simply using it to refresh the MAC tables and for a while I thought it
>would be sufficient to do just one ARP or ND, but then I realized that in an
>environment where 2 systems are connected back-to-back, you would potentially
>need to do both.  Need to play with this config...

	Yah, I've been thinking about that in the background, too,
specifically for cases with devices that cannot change their MAC address
(bonding fail_over_mac enabled); in those cases, the MAC changes during
a failover, so the gratuitous update is particularly important.  The
fail_over_mac is used for Infiniband (fixed MAC) and a few ethernet
multiport devices that are confused by having more than one of their
ports set to the same MAC.

	If those devices (when run back to back without a switch) need a
gratuitous for each address, they'll need it for IPv4 and IPv6, I
suspect.  I've not heard of any problems of this sort with Infiniband,
but I'm not sure how common back to back is with Infiniband (not very, I
suspect).

	I think the non-fail_over_mac back to back connect case is ok,
at least for linux, because ARP already connects the MAC address to the
bonding device, not the underlying slave.

	As you say, something to play with (but not today, alas, as my
office space is being remodeled).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
David Stevens Oct. 8, 2008, 7:41 p.m. UTC | #12
Well, I think the reason to send multiple of them is identical.
If one is dropped due to network load, it won't happen; sending
multiple increases the odds of success.

DAD itself should update caches for neighboring nodes, so I
guess it makes sense that it isn't sending unsolicited NA's. But
that makes me think that the DAD retransmit counter is the one
you want. At least, the part of the DAD retransmit counter that is
for updating other nodes' caches. :-)

For MLD and IGMP, they were explicit SHOULD's-- I need to have
a look at the ND RFCs again to see what they say about it.

I don't think that alone is a reason to block the patch, but I also
don't think that updating neighbor caches with a new MAC address
is a unique requirement of bonding. Moving an address manually
ought to be identical in needs and behavior, as well as very-quick
reboots where the hardware changed. Thus, I don't think the knob
ought to be specific to bonding. I guess that leads to the suggestion
that you re-use the DAD counter for that.

References to MLD now and before are just me looking for an
analog to what ND should be doing. No new knob is definitely
required for them, since they already have this support for
unsolicited reports.

                                                +-DLS


Vlad Yasevich Oct. 8, 2008, 7:53 p.m. UTC | #13
David Stevens wrote:
> Well, I think the reason to send multiple of them is identical.
> If one is dropped due to network load, it won't happen; sending
> multiple increases the odds of success.
> 
> DAD itself should update caches for neighboring nodes, so I
> guess it makes sense that it isn't sending unsolicited NA's, But
> that makes me think that the DAD retransmit counter is the one
> you want. At least, the part of the DAD retransmit counter that is
> for updating other nodes' caches. :-)

Nope, DAD doesn't trigger a cache update.

> 
> For MLD and IGMP, they were explicit SHOULD's-- I need to have
> a look at the ND RFCs again to see what they say about it.
> 
> I don't think that alone is a reason to block the patch, but I also
> don't think that updating neighbor caches with a new MAC address
> is a unique requirement of bonding.

Well, the MAC address is not new since the same address is replicated
across all slaves.  Also, unsolicited NAs are not permitted to change
the neighbor cache entries other than state.  An unsolicited NA will
cause an existing entry to go from REACHABLE to STALE, and nothing else.
So its use in bonding is really the same as gratuitous ARP.

> Moving an address manually
> ought to be identical in needs and behavior, as well as very-quick
> reboots where the hardware changed. Thus, I don't think the knob
> ought to be specific to bonding. I guess that leads to the suggestion
> that you re-use the DAD counter for that.

Yes, a dad counter could be re-used for this, but in some scenarios
it's overkill.  Frankly, NA itself is an overkill.  There may be
some unintentional consequences to using it that I am looking at now.

> 
> References to MLD now and before are just me looking for an
> analog to what ND should be doing. No new knob is definitely
> required for them, since they already have this support for
> unsolicited reports.
> 

The problem is MLDs are only triggered when you are adding a new IPv6
multicast address.  However, in the bond failover case, we are simply
moving a hardware multicast address from one slave interface to
another while leaving the IPv6 multicast address on the master bond interface.
Thus there is no trigger to fire off an MLD report.

-vlad
Sridhar Samudrala Oct. 8, 2008, 10:22 p.m. UTC | #14
On Wed, 2008-10-08 at 15:01 -0400, Brian Haley wrote:
> David Stevens wrote:
> > Well, actually, it looks like I'm suggesting you to re-use something that
> > doesn't exist. :-)
> > 
> > MLD (and IGMP) has such a thing where unsolicited advertisements are sent
> > multiple times, with delays in between, to account for lossy networks
> > possibly dropping the first one. There are configurable counts associated
> > with probes and retransmit intervals for solicits, but I don't see the
> > equivalent yet for unsolicited NA's.
> 
> I don't see an equivalent either, since the only unsolicited NA the 
> kernel sends is for DAD, which uses dad_transmits.

Doesn't DAD use neighbor solicitation rather than unsolicited NA?
Can we use NS in the bonding failover scenario too?

Thanks
Sridhar




Brian Haley Oct. 9, 2008, 2:08 a.m. UTC | #15
Sridhar Samudrala wrote:
> On Wed, 2008-10-08 at 15:01 -0400, Brian Haley wrote:
>> David Stevens wrote:
>>> Well, actually, it looks like I'm suggesting you to re-use something
>>> that doesn't exist. :-)
>>>
>>> MLD (and IGMP) has such a thing where unsolicited advertisements are
>>> sent multiple times, with delays in between, to account for lossy
>>> networks possibly dropping the first one. There are configurable
>>> counts associated with probes and retransmit intervals for solicits,
>>> but I don't see the equivalent yet for unsolicited NA's.
>> I don't see an equivalent either, since the only unsolicited NA the 
>> kernel sends is for DAD, which uses dad_transmits.
> 
> Doesn't DAD use neighbor solicitation rather than unsolicited NA?

Yes.  There is one case in the NS code that will respond with an 
unsolicited NA if we get a NS doing DAD.  I guess I should have made it 
clearer that it's when we're defending our address during a DAD probe.

> Can we use NS in the bonding failover scenario too?

Both NS and NA seemed to update the switch, so either one can be 
sent on a failover event.  It seemed to be the consensus that the NA was 
more appropriate, especially since we can send it without the solicited 
bit set.
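
To make the "without the solicited bit set" point concrete, here is a minimal
user-space sketch of the flag octet of a Neighbor Advertisement as laid out in
RFC 4861, section 4.4 (R, S, O in the top three bits).  The helper names
na_flags() and failover_na_flags() are made up for illustration and are not
kernel APIs; the flag values mirror what bond_na_send() in the patch sets
(router from the forwarding setting, solicited = 0, override = 1).

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits in the first octet following the ICMPv6 checksum of a
 * Neighbor Advertisement (RFC 4861, section 4.4). */
#define NA_FLAG_ROUTER    0x80u	/* R: sender is a router */
#define NA_FLAG_SOLICITED 0x40u	/* S: sent in response to an NS */
#define NA_FLAG_OVERRIDE  0x20u	/* O: may override a cached lladdr */

/* Hypothetical helper: pack the three NA flags into one octet. */
static uint8_t na_flags(int router, int solicited, int override)
{
	uint8_t flags = 0;

	if (router)
		flags |= NA_FLAG_ROUTER;
	if (solicited)
		flags |= NA_FLAG_SOLICITED;
	if (override)
		flags |= NA_FLAG_OVERRIDE;
	return flags;
}

/* Hypothetical helper: flags for the NA sent on failover.  It is
 * unsolicited (S = 0) and carries the override bit (O = 1). */
static uint8_t failover_na_flags(int is_router)
{
	uint8_t flags = na_flags(is_router, 0, 1);

	/* An unsolicited NA must never carry the S bit. */
	assert(!(flags & NA_FLAG_SOLICITED));
	return flags;
}
```

With these definitions a host (forwarding off) would send a flag octet of
0x20 and a router 0xa0; a solicited reply from a host that does not set the
override bit would instead carry 0x40.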

-Brian

Patch

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 688dfe1..4025565 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -551,6 +551,16 @@  num_grat_arp
 	affects only the active-backup mode.  This option was added for
 	bonding version 3.3.0.
 
+num_unsol_na
+
+	Specifies the number of unsolicited IPv6 Neighbor Advertisements
+	to be issued after a failover event.  One unsolicited NA is issued
+	immediately after the failover.
+
+	The valid range is 0 - 255; the default value is 1.  This option
+	affects only the active-backup mode.  This option was added for
+	bonding version 3.4.0.
+
 primary
 
 	A string (eth0, eth2, etc) specifying which slave is the
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 2d6a060..37f55e1 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -61,6 +61,7 @@  config DUMMY
 config BONDING
 	tristate "Bonding driver support"
 	depends on INET
+	depends on IPV6 || IPV6=n
 	---help---
 	  Say 'Y' or 'M' if you wish to be able to 'bond' multiple Ethernet
 	  Channels together. This is called 'Etherchannel' by Cisco,
diff --git a/drivers/net/bonding/Makefile b/drivers/net/bonding/Makefile
index 5cdae2b..6f9c6fa 100644
--- a/drivers/net/bonding/Makefile
+++ b/drivers/net/bonding/Makefile
@@ -6,3 +6,6 @@  obj-$(CONFIG_BONDING) += bonding.o
 
 bonding-objs := bond_main.o bond_3ad.o bond_alb.o bond_sysfs.o
 
+ipv6-$(subst m,y,$(CONFIG_IPV6)) += bond_ipv6.o
+bonding-objs += $(ipv6-y)
+
diff --git a/drivers/net/bonding/bond_ipv6.c b/drivers/net/bonding/bond_ipv6.c
new file mode 100644
index 0000000..b6b0351
--- /dev/null
+++ b/drivers/net/bonding/bond_ipv6.c
@@ -0,0 +1,208 @@ 
+/*
+ * Copyright(c) 2008 Hewlett-Packard Development Company, L.P.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called LICENSE.
+ *
+ */
+
+//#define BONDING_DEBUG 1
+
+#include <linux/types.h>
+#include <linux/if_vlan.h>
+#include <net/ipv6.h>
+#include <net/ndisc.h>
+#include <net/addrconf.h>
+#include "bonding.h"
+
+/*
+ * Assign bond->master_ipv6 to the next IPv6 address in the list, or
+ * zero it out if there are none.
+ */
+static void bond_glean_dev_ipv6(struct net_device *dev, struct in6_addr *addr)
+{
+	struct inet6_dev *idev;
+	struct inet6_ifaddr *ifa;
+
+	if (!dev)
+		return;
+
+	idev = in6_dev_get(dev);
+	if (!idev)
+		return;
+
+	ifa = idev->addr_list;
+	if (ifa)
+		ipv6_addr_copy(addr, &ifa->addr);
+	else
+		ipv6_addr_set(addr, 0, 0, 0, 0);
+
+	in6_dev_put(idev);
+}
+
+static void bond_na_send(struct net_device *slave_dev,
+			 struct in6_addr *daddr,
+			 int router,
+			 unsigned short vlan_id)
+{
+	struct in6_addr mcaddr;
+	struct icmp6hdr icmp6h = {
+		.icmp6_type = NDISC_NEIGHBOUR_ADVERTISEMENT,
+	};
+	struct sk_buff *skb;
+
+	icmp6h.icmp6_router = router;
+	icmp6h.icmp6_solicited = 0;
+	icmp6h.icmp6_override = 1;
+
+	addrconf_addr_solict_mult(daddr, &mcaddr);
+
+	dprintk("ipv6 na on slave %s: dest " NIP6_FMT ", src " NIP6_FMT "\n",
+	       slave_dev->name, NIP6(mcaddr), NIP6(*daddr));
+
+	skb = ndisc_build_skb(slave_dev, &mcaddr, daddr, &icmp6h, daddr,
+			      ND_OPT_TARGET_LL_ADDR);
+
+	if (!skb) {
+		printk(KERN_ERR DRV_NAME ": NA packet allocation failed\n");
+		return;
+	}
+
+	if (vlan_id) {
+		skb = vlan_put_tag(skb, vlan_id);
+		if (!skb) {
+			printk(KERN_ERR DRV_NAME ": failed to insert VLAN tag\n");
+			return;
+		}
+	}
+
+	ndisc_send_skb(skb, slave_dev, NULL, &mcaddr, daddr, &icmp6h);
+}
+
+/*
+ * Kick out an unsolicited Neighbor Advertisement for an IPv6 address on
+ * the bonding master.  This will help the switch learn our address
+ * if in active-backup mode.
+ *
+ * Caller must hold curr_slave_lock for read or better
+ */
+void bond_send_unsolicited_na(struct bonding *bond)
+{
+	struct slave *slave = bond->curr_active_slave;
+	struct vlan_entry *vlan;
+	struct inet6_dev *idev;
+	int is_router;
+
+	dprintk("bond_send_unsol_na: bond %s slave %s\n", bond->dev->name,
+				slave ? slave->dev->name : "NULL");
+
+	if (!slave || !bond->send_unsol_na ||
+	    test_bit(__LINK_STATE_LINKWATCH_PENDING, &slave->dev->state))
+		return;
+
+	bond->send_unsol_na--;
+
+	idev = in6_dev_get(bond->dev);
+	if (!idev)
+		return;
+
+	is_router = !!idev->cnf.forwarding;
+
+	in6_dev_put(idev);
+
+	if (!ipv6_addr_any(&bond->master_ipv6))
+		bond_na_send(slave->dev, &bond->master_ipv6, is_router, 0);
+
+	list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
+		if (!ipv6_addr_any(&vlan->vlan_ipv6)) {
+			bond_na_send(slave->dev, &vlan->vlan_ipv6, is_router,
+				     vlan->vlan_id);
+		}
+	}
+}
+
+/*
+ * bond_inet6addr_event: handle inet6addr notifier chain events.
+ *
+ * We keep track of device IPv6 addresses primarily to use as source
+ * addresses in NS probes.
+ *
+ * We track one IPv6 for the main device (if it has one).
+ */
+static int bond_inet6addr_event(struct notifier_block *this,
+				unsigned long event,
+				void *ptr)
+{
+	struct inet6_ifaddr *ifa = ptr;
+	struct net_device *vlan_dev, *event_dev = ifa->idev->dev;
+	struct bonding *bond;
+	struct vlan_entry *vlan;
+
+	if (dev_net(event_dev) != &init_net)
+		return NOTIFY_DONE;
+
+	list_for_each_entry(bond, &bond_dev_list, bond_list) {
+		if (bond->dev == event_dev) {
+			switch (event) {
+			case NETDEV_UP:
+				ipv6_addr_copy(&bond->master_ipv6, &ifa->addr);
+				return NOTIFY_OK;
+			case NETDEV_DOWN:
+				bond_glean_dev_ipv6(bond->dev,
+						    &bond->master_ipv6);
+				return NOTIFY_OK;
+			default:
+				return NOTIFY_DONE;
+			}
+		}
+
+		list_for_each_entry(vlan, &bond->vlan_list, vlan_list) {
+			vlan_dev = vlan_group_get_device(bond->vlgrp,
+							 vlan->vlan_id);
+			if (vlan_dev == event_dev) {
+				switch (event) {
+				case NETDEV_UP:
+					ipv6_addr_copy(&vlan->vlan_ipv6,
+						       &ifa->addr);
+					return NOTIFY_OK;
+				case NETDEV_DOWN:
+					bond_glean_dev_ipv6(vlan_dev,
+							    &vlan->vlan_ipv6);
+					return NOTIFY_OK;
+				default:
+					return NOTIFY_DONE;
+				}
+			}
+		}
+	}
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block bond_inet6addr_notifier = {
+	.notifier_call = bond_inet6addr_event,
+};
+
+void bond_register_ipv6_notifier(void)
+{
+	register_inet6addr_notifier(&bond_inet6addr_notifier);
+}
+
+void bond_unregister_ipv6_notifier(void)
+{
+	unregister_inet6addr_notifier(&bond_inet6addr_notifier);
+}
+
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8e2be24..fba0f0c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -89,6 +89,7 @@ 
 
 static int max_bonds	= BOND_DEFAULT_MAX_BONDS;
 static int num_grat_arp = 1;
+static int num_unsol_na = 1;
 static int miimon	= BOND_LINK_MON_INTERV;
 static int updelay	= 0;
 static int downdelay	= 0;
@@ -107,6 +108,8 @@  module_param(max_bonds, int, 0);
 MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
 module_param(num_grat_arp, int, 0644);
 MODULE_PARM_DESC(num_grat_arp, "Number of gratuitous ARP packets to send on failover event");
+module_param(num_unsol_na, int, 0644);
+MODULE_PARM_DESC(num_unsol_na, "Number of unsolicited IPv6 Neighbor Advertisement packets to send on failover event");
 module_param(miimon, int, 0);
 MODULE_PARM_DESC(miimon, "Link check interval in milliseconds");
 module_param(updelay, int, 0);
@@ -242,14 +245,13 @@  static int bond_add_vlan(struct bonding *bond, unsigned short vlan_id)
 	dprintk("bond: %s, vlan id %d\n",
 		(bond ? bond->dev->name: "None"), vlan_id);
 
-	vlan = kmalloc(sizeof(struct vlan_entry), GFP_KERNEL);
+	vlan = kzalloc(sizeof(struct vlan_entry), GFP_KERNEL);
 	if (!vlan) {
 		return -ENOMEM;
 	}
 
 	INIT_LIST_HEAD(&vlan->vlan_list);
 	vlan->vlan_id = vlan_id;
-	vlan->vlan_ip = 0;
 
 	write_lock_bh(&bond->lock);
 
@@ -1208,6 +1210,9 @@  void bond_change_active_slave(struct bonding *bond, struct slave *new_active)
 			bond->send_grat_arp = bond->params.num_grat_arp;
 			bond_send_gratuitous_arp(bond);
 
+			bond->send_unsol_na = bond->params.num_unsol_na;
+			bond_send_unsolicited_na(bond);
+
 			write_unlock_bh(&bond->curr_slave_lock);
 			read_unlock(&bond->lock);
 
@@ -2441,6 +2446,12 @@  void bond_mii_monitor(struct work_struct *work)
 		read_unlock(&bond->curr_slave_lock);
 	}
 
+	if (bond->send_unsol_na) {
+		read_lock(&bond->curr_slave_lock);
+		bond_send_unsolicited_na(bond);
+		read_unlock(&bond->curr_slave_lock);
+	}
+
 	if (bond_miimon_inspect(bond)) {
 		read_unlock(&bond->lock);
 		rtnl_lock();
@@ -3138,6 +3149,12 @@  void bond_activebackup_arp_mon(struct work_struct *work)
 		read_unlock(&bond->curr_slave_lock);
 	}
 
+	if (bond->send_unsol_na) {
+		read_lock(&bond->curr_slave_lock);
+		bond_send_unsolicited_na(bond);
+		read_unlock(&bond->curr_slave_lock);
+	}
+
 	if (bond_ab_arp_inspect(bond, delta_in_ticks)) {
 		read_unlock(&bond->lock);
 		rtnl_lock();
@@ -3813,6 +3830,7 @@  static int bond_close(struct net_device *bond_dev)
 	write_lock_bh(&bond->lock);
 
 	bond->send_grat_arp = 0;
+	bond->send_unsol_na = 0;
 
 	/* signal timers not to re-arm */
 	bond->kill_timers = 1;
@@ -4528,6 +4546,7 @@  static int bond_init(struct net_device *bond_dev, struct bond_params *params)
 	bond->primary_slave = NULL;
 	bond->dev = bond_dev;
 	bond->send_grat_arp = 0;
+	bond->send_unsol_na = 0;
 	bond->setup_by_slave = 0;
 	INIT_LIST_HEAD(&bond->vlan_list);
 
@@ -4776,6 +4795,13 @@  static int bond_check_params(struct bond_params *params)
 		num_grat_arp = 1;
 	}
 
+	if (num_unsol_na < 0 || num_unsol_na > 255) {
+		printk(KERN_WARNING DRV_NAME
+		       ": Warning: num_unsol_na (%d) not in range 0-255 so it "
+		       "was reset to 1 \n", num_unsol_na);
+		num_unsol_na = 1;
+	}
+
 	/* reset values for 802.3ad */
 	if (bond_mode == BOND_MODE_8023AD) {
 		if (!miimon) {
@@ -4977,6 +5003,7 @@  static int bond_check_params(struct bond_params *params)
 	params->xmit_policy = xmit_hashtype;
 	params->miimon = miimon;
 	params->num_grat_arp = num_grat_arp;
+	params->num_unsol_na = num_unsol_na;
 	params->arp_interval = arp_interval;
 	params->arp_validate = arp_validate_value;
 	params->updelay = updelay;
@@ -5129,6 +5156,7 @@  static int __init bonding_init(void)
 
 	register_netdevice_notifier(&bond_netdev_notifier);
 	register_inetaddr_notifier(&bond_inetaddr_notifier);
+	bond_register_ipv6_notifier();
 
 	goto out;
 err:
@@ -5151,6 +5179,7 @@  static void __exit bonding_exit(void)
 {
 	unregister_netdevice_notifier(&bond_netdev_notifier);
 	unregister_inetaddr_notifier(&bond_inetaddr_notifier);
+	bond_unregister_ipv6_notifier();
 
 	bond_destroy_sysfs();
 
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 3bdb473..70ae376 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -981,6 +981,47 @@  out:
 	return ret;
 }
 static DEVICE_ATTR(num_grat_arp, S_IRUGO | S_IWUSR, bonding_show_n_grat_arp, bonding_store_n_grat_arp);
+
+/*
+ * Show and set the number of unsolicited NA's to send after a failover event.
+ */
+static ssize_t bonding_show_n_unsol_na(struct device *d,
+				       struct device_attribute *attr,
+				       char *buf)
+{
+	struct bonding *bond = to_bond(d);
+
+	return sprintf(buf, "%d\n", bond->params.num_unsol_na);
+}
+
+static ssize_t bonding_store_n_unsol_na(struct device *d,
+					struct device_attribute *attr,
+					const char *buf, size_t count)
+{
+	int new_value, ret = count;
+	struct bonding *bond = to_bond(d);
+
+	if (sscanf(buf, "%d", &new_value) != 1) {
+		printk(KERN_ERR DRV_NAME
+		       ": %s: no num_unsol_na value specified.\n",
+		       bond->dev->name);
+		ret = -EINVAL;
+		goto out;
+	}
+	if (new_value < 0 || new_value > 255) {
+		printk(KERN_ERR DRV_NAME
+		       ": %s: Invalid num_unsol_na value %d not in range 0-255; rejected.\n",
+		       bond->dev->name, new_value);
+		ret = -EINVAL;
+		goto out;
+	} else {
+		bond->params.num_unsol_na = new_value;
+	}
+out:
+	return ret;
+}
+static DEVICE_ATTR(num_unsol_na, S_IRUGO | S_IWUSR, bonding_show_n_unsol_na, bonding_store_n_unsol_na);
+
 /*
  * Show and set the MII monitor interval.  There are two tricky bits
  * here.  First, if MII monitoring is activated, then we must disable
@@ -1419,6 +1460,7 @@  static struct attribute *per_bond_attrs[] = {
 	&dev_attr_lacp_rate.attr,
 	&dev_attr_xmit_hash_policy.attr,
 	&dev_attr_num_grat_arp.attr,
+	&dev_attr_num_unsol_na.attr,
 	&dev_attr_miimon.attr,
 	&dev_attr_primary.attr,
 	&dev_attr_use_carrier.attr,
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index ffb668d..0491c7c 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -19,16 +19,19 @@ 
 #include <linux/proc_fs.h>
 #include <linux/if_bonding.h>
 #include <linux/kobject.h>
+#include <linux/in6.h>
 #include "bond_3ad.h"
 #include "bond_alb.h"
 
-#define DRV_VERSION	"3.3.0"
-#define DRV_RELDATE	"June 10, 2008"
+#define DRV_VERSION	"3.4.0"
+#define DRV_RELDATE	"October 7, 2008"
 #define DRV_NAME	"bonding"
 #define DRV_DESCRIPTION	"Ethernet Channel Bonding Driver"
 
 #define BOND_MAX_ARP_TARGETS	16
 
+extern struct list_head bond_dev_list;
+
 #ifdef BONDING_DEBUG
 #define dprintk(fmt, args...) \
 	printk(KERN_DEBUG     \
@@ -126,6 +129,7 @@  struct bond_params {
 	int xmit_policy;
 	int miimon;
 	int num_grat_arp;
+	int num_unsol_na;
 	int arp_interval;
 	int arp_validate;
 	int use_carrier;
@@ -148,6 +152,9 @@  struct vlan_entry {
 	struct list_head vlan_list;
 	__be32 vlan_ip;
 	unsigned short vlan_id;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	struct in6_addr vlan_ipv6;
+#endif
 };
 
 struct slave {
@@ -195,6 +202,7 @@  struct bonding {
 	rwlock_t curr_slave_lock;
 	s8       kill_timers;
 	s8	 send_grat_arp;
+	s8	 send_unsol_na;
 	s8	 setup_by_slave;
 	struct   net_device_stats stats;
 #ifdef CONFIG_PROC_FS
@@ -218,6 +226,9 @@  struct bonding {
 	struct   delayed_work arp_work;
 	struct   delayed_work alb_work;
 	struct   delayed_work ad_work;
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+	struct   in6_addr master_ipv6;
+#endif
 };
 
 /**
@@ -341,5 +352,24 @@  extern struct bond_parm_tbl xmit_hashtype_tbl[];
 extern struct bond_parm_tbl arp_validate_tbl[];
 extern struct bond_parm_tbl fail_over_mac_tbl[];
 
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+void bond_send_unsolicited_na(struct bonding *bond);
+void bond_register_ipv6_notifier(void);
+void bond_unregister_ipv6_notifier(void);
+#else
+static inline void bond_send_unsolicited_na(struct bonding *bond)
+{
+	return;
+}
+static inline void bond_register_ipv6_notifier(void)
+{
+	return;
+}
+static inline void bond_unregister_ipv6_notifier(void)
+{
+	return;
+}
+#endif
+
 #endif /* _LINUX_BONDING_H */
 
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index a01b7c4..1b0da3d 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -108,6 +108,20 @@  extern void			ndisc_send_redirect(struct sk_buff *skb,
 
 extern int			ndisc_mc_map(struct in6_addr *addr, char *buf, struct net_device *dev, int dir);
 
+extern struct sk_buff		*ndisc_build_skb(struct net_device *dev,
+						 const struct in6_addr *daddr,
+						 const struct in6_addr *saddr,
+						 struct icmp6hdr *icmp6h,
+						 const struct in6_addr *target,
+						 int llinfo);
+
+extern void			ndisc_send_skb(struct sk_buff *skb,
+					       struct net_device *dev,
+					       struct neighbour *neigh,
+					       const struct in6_addr *daddr,
+					       const struct in6_addr *saddr,
+					       struct icmp6hdr *icmp6h);
+
 
 
 /*
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index f1c62ba..c6f8ceb 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -437,38 +437,20 @@  static void pndisc_destructor(struct pneigh_entry *n)
 	ipv6_dev_mc_dec(dev, &maddr);
 }
 
-/*
- *	Send a Neighbour Advertisement
- */
-static void __ndisc_send(struct net_device *dev,
-			 struct neighbour *neigh,
-			 const struct in6_addr *daddr,
-			 const struct in6_addr *saddr,
-			 struct icmp6hdr *icmp6h, const struct in6_addr *target,
-			 int llinfo)
+struct sk_buff *ndisc_build_skb(struct net_device *dev,
+				const struct in6_addr *daddr,
+				const struct in6_addr *saddr,
+				struct icmp6hdr *icmp6h,
+				const struct in6_addr *target,
+				int llinfo)
 {
-	struct flowi fl;
-	struct dst_entry *dst;
 	struct net *net = dev_net(dev);
 	struct sock *sk = net->ipv6.ndisc_sk;
 	struct sk_buff *skb;
 	struct icmp6hdr *hdr;
-	struct inet6_dev *idev;
 	int len;
 	int err;
-	u8 *opt, type;
-
-	type = icmp6h->icmp6_type;
-
-	icmpv6_flow_init(sk, &fl, type, saddr, daddr, dev->ifindex);
-
-	dst = icmp6_dst_alloc(dev, neigh, daddr);
-	if (!dst)
-		return;
-
-	err = xfrm_lookup(&dst, &fl, NULL, 0);
-	if (err < 0)
-		return;
+	u8 *opt;
 
 	if (!dev->addr_len)
 		llinfo = 0;
@@ -485,8 +467,7 @@  static void __ndisc_send(struct net_device *dev,
 		ND_PRINTK0(KERN_ERR
 			   "ICMPv6 ND: %s() failed to allocate an skb.\n",
 			   __func__);
-		dst_release(dst);
-		return;
+		return NULL;
 	}
 
 	skb_reserve(skb, LL_RESERVED_SPACE(dev));
@@ -513,6 +494,42 @@  static void __ndisc_send(struct net_device *dev,
 					   csum_partial((__u8 *) hdr,
 							len, 0));
 
+	return skb;
+}
+
+EXPORT_SYMBOL(ndisc_build_skb);
+
+void ndisc_send_skb(struct sk_buff *skb,
+		    struct net_device *dev,
+		    struct neighbour *neigh,
+		    const struct in6_addr *daddr,
+		    const struct in6_addr *saddr,
+		    struct icmp6hdr *icmp6h)
+{
+	struct flowi fl;
+	struct dst_entry *dst;
+	struct net *net = dev_net(dev);
+	struct sock *sk = net->ipv6.ndisc_sk;
+	struct inet6_dev *idev;
+	int err;
+	u8 type;
+
+	type = icmp6h->icmp6_type;
+
+	icmpv6_flow_init(sk, &fl, type, saddr, daddr, dev->ifindex);
+
+	dst = icmp6_dst_alloc(dev, neigh, daddr);
+	if (!dst) {
+		kfree_skb(skb);
+		return;
+	}
+
+	err = xfrm_lookup(&dst, &fl, NULL, 0);
+	if (err < 0) {
+		kfree_skb(skb);
+		return;
+	}
+
 	skb->dst = dst;
 
 	idev = in6_dev_get(dst->dev);
@@ -529,6 +546,27 @@  static void __ndisc_send(struct net_device *dev,
 		in6_dev_put(idev);
 }
 
+EXPORT_SYMBOL(ndisc_send_skb);
+
+/*
+ *	Send a Neighbour Discovery packet
+ */
+static void __ndisc_send(struct net_device *dev,
+			 struct neighbour *neigh,
+			 const struct in6_addr *daddr,
+			 const struct in6_addr *saddr,
+			 struct icmp6hdr *icmp6h, const struct in6_addr *target,
+			 int llinfo)
+{
+	struct sk_buff *skb;
+
+	skb = ndisc_build_skb(dev, daddr, saddr, icmp6h, target, llinfo);
+	if (!skb)
+		return;
+
+	ndisc_send_skb(skb, dev, neigh, daddr, saddr, icmp6h);
+}
+
 static void ndisc_send_na(struct net_device *dev, struct neighbour *neigh,
 			  const struct in6_addr *daddr,
 			  const struct in6_addr *solicited_addr,