IPv6: DAD from bonding iface is treated as dup address from others

Message ID: 1317873550-1677-1-git-send-email-Yinglin.Sun@emc.com
State: Rejected, archived
Delegated to: David Miller

Commit Message

Yinglin Sun Oct. 6, 2011, 3:59 a.m. UTC
Steps to reproduce this issue:
1. Create bond0 over eth0 and eth1, and set the mode to balance-xor.
2. Add an IPv6 address to bond0.
3. The DAD probe is sent out through one slave and looped back in through
the other slave. It is therefore treated as a duplicate address, and the
address stays tentative afterwards:
   kern.info:
       Oct  5 11:50:18 testvm1 kernel: [  129.224353] bond0: IPv6 duplicate address 1234::1 detected!

Signed-off-by: Yinglin Sun <Yinglin.Sun@emc.com>
---
 net/ipv6/ndisc.c |   15 +++++++++++++--
 1 files changed, 13 insertions(+), 2 deletions(-)

Comments

Neil Horman Oct. 6, 2011, 11 a.m. UTC | #1
On Wed, Oct 05, 2011 at 08:59:10PM -0700, Yinglin Sun wrote:
> Steps to reproduce this issue:
> 1. create bond0 over eth0 and eth1, set the mode to balance-xor
> 2. add an IPv6 address to bond0
> 3. DAD packet is sent out from one slave and then is looped back from
> the other slave. Therefore, it is treated as a duplicate address and
> stays tentative afterwards:
>    kern.info:
>        Oct  5 11:50:18 testvm1 kernel: [  129.224353] bond0: IPv6 duplicate address 1234::1 detected!
> 
> Signed-off-by: Yinglin Sun <Yinglin.Sun@emc.com>
> ---
>  net/ipv6/ndisc.c |   15 +++++++++++++--
>  1 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index 9da6e02..c82f4c7 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -809,9 +809,10 @@ static void ndisc_recv_ns(struct sk_buff *skb)
>  
>  		if (ifp->flags & (IFA_F_TENTATIVE|IFA_F_OPTIMISTIC)) {
>  			if (dad) {
> +				const unsigned char *sadr;
> +				sadr = skb_mac_header(skb);
> +
>  				if (dev->type == ARPHRD_IEEE802_TR) {
> -					const unsigned char *sadr;
> -					sadr = skb_mac_header(skb);
>  					if (((sadr[8] ^ dev->dev_addr[0]) & 0x7f) == 0 &&
>  					    sadr[9] == dev->dev_addr[1] &&
>  					    sadr[10] == dev->dev_addr[2] &&
> @@ -821,6 +822,16 @@ static void ndisc_recv_ns(struct sk_buff *skb)
>  						/* looped-back to us */
>  						goto out;
>  					}
> +				} else if (dev->type == ARPHRD_ETHER) {
> +					if (sadr[6] == dev->dev_addr[0] &&
> +					    sadr[7] == dev->dev_addr[1] &&
> +					    sadr[8] == dev->dev_addr[2] &&
> +					    sadr[9] == dev->dev_addr[3] &&
> +					    sadr[10] == dev->dev_addr[4] &&
> +					    sadr[11] == dev->dev_addr[5]) {
> +						/* looped-back to us */
> +						goto out;
> +					}
>  				}
>  
>  				/*
> -- 
> 1.7.4.1
> 
Nack. This seems like it will just completely break DAD.  What if there's another
system out there with the same MAC address?  A response from that system would
get dropped by this filter, instead of causing the local system to stop using
the address.  What you really want to do is modify
bond_should_deliver_exact_match to detect this frame on the inactive slave or
some such, and drop the frame there.
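
To make the objection concrete, here is a minimal user-space sketch (not
kernel code) of the kind of source-MAC comparison the patch adds.  The names
looks_looped_back and build_eth_header and the sample addresses are made up
for illustration.  It shows that a DAD probe looped back from our own slave
and one sent by a different host with the same MAC address carry identical
Ethernet headers, so a filter of this kind would drop both.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN        6   /* length of an Ethernet MAC address */
#define ETH_HDR_LEN    14  /* destination + source + EtherType */
#define ETH_SRC_OFFSET 6   /* source MAC follows the 6-byte destination */

/*
 * Hypothetical stand-in for the check the patch adds: returns true when
 * the frame's source MAC equals the receiving device's own MAC.
 */
static bool looks_looped_back(const uint8_t *mac_header,
                              const uint8_t dev_addr[MAC_LEN])
{
    return memcmp(mac_header + ETH_SRC_OFFSET, dev_addr, MAC_LEN) == 0;
}

/* Build a minimal Ethernet header: destination, source, EtherType. */
static void build_eth_header(uint8_t hdr[ETH_HDR_LEN],
                             const uint8_t dst[MAC_LEN],
                             const uint8_t src[MAC_LEN])
{
    memcpy(hdr, dst, MAC_LEN);
    memcpy(hdr + ETH_SRC_OFFSET, src, MAC_LEN);
    hdr[12] = 0x86;  /* EtherType 0x86DD: IPv6 */
    hdr[13] = 0xdd;
}
```

Calling looks_looped_back() on a header built with our own MAC returns true
whether the frame came back around the loop or came from a second host with
a cloned MAC, and the second case is exactly the one where DAD needs to fire.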

Neil

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jay Vosburgh Oct. 6, 2011, 7:05 p.m. UTC | #2
Neil Horman <nhorman@tuxdriver.com> wrote:

>On Wed, Oct 05, 2011 at 08:59:10PM -0700, Yinglin Sun wrote:
>> Steps to reproduce this issue:
>> 1. create bond0 over eth0 and eth1, set the mode to balance-xor
>> 2. add an IPv6 address to bond0
>> 3. DAD packet is sent out from one slave and then is looped back from
>> the other slave. Therefore, it is treated as a duplicate address and
>> stays tentative afterwards:
>>    kern.info:
>>        Oct  5 11:50:18 testvm1 kernel: [  129.224353] bond0: IPv6 duplicate address 1234::1 detected!
>> 
>> Signed-off-by: Yinglin Sun <Yinglin.Sun@emc.com>
>> ---
>>  net/ipv6/ndisc.c |   15 +++++++++++++--
>>  1 files changed, 13 insertions(+), 2 deletions(-)
>> 
>> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
>> index 9da6e02..c82f4c7 100644
>> --- a/net/ipv6/ndisc.c
>> +++ b/net/ipv6/ndisc.c
>> @@ -809,9 +809,10 @@ static void ndisc_recv_ns(struct sk_buff *skb)
>>  
>>  		if (ifp->flags & (IFA_F_TENTATIVE|IFA_F_OPTIMISTIC)) {
>>  			if (dad) {
>> +				const unsigned char *sadr;
>> +				sadr = skb_mac_header(skb);
>> +
>>  				if (dev->type == ARPHRD_IEEE802_TR) {
>> -					const unsigned char *sadr;
>> -					sadr = skb_mac_header(skb);
>>  					if (((sadr[8] ^ dev->dev_addr[0]) & 0x7f) == 0 &&
>>  					    sadr[9] == dev->dev_addr[1] &&
>>  					    sadr[10] == dev->dev_addr[2] &&
>> @@ -821,6 +822,16 @@ static void ndisc_recv_ns(struct sk_buff *skb)
>>  						/* looped-back to us */
>>  						goto out;
>>  					}
>> +				} else if (dev->type == ARPHRD_ETHER) {
>> +					if (sadr[6] == dev->dev_addr[0] &&
>> +					    sadr[7] == dev->dev_addr[1] &&
>> +					    sadr[8] == dev->dev_addr[2] &&
>> +					    sadr[9] == dev->dev_addr[3] &&
>> +					    sadr[10] == dev->dev_addr[4] &&
>> +					    sadr[11] == dev->dev_addr[5]) {
>> +						/* looped-back to us */
>> +						goto out;
>> +					}
>>  				}
>>  
>>  				/*
>> -- 
>> 1.7.4.1
>> 
>Nack. This seems like it will just completely break DAD.  What if there's another
>system out there with the same MAC address?  A response from that system would
>get dropped by this filter, instead of causing the local system to stop using
>the address.  What you really want to do is modify
>bond_should_deliver_exact_match to detect this frame on the inactive slave or
>some such, and drop the frame there.

	Also NACK; and adding a bit of information.  The balance-xor
mode is nominally expecting to interact with a switch whose ports are
set for etherchannel ("static link aggregation"), in which case the
switch will not loop the packet back around.

	If your switch can do etherchannel, then enable it and the
problem should go away.  If your switch cannot do this, then you may
have other issues, because all of the multicast or broadcast packets
going out any bonding slave will loop around to another slave.  You
could also use 802.3ad / LACP if your switch supports that.

	For balance-xor (or balance-rr, for that matter) mode to a
non-etherchannel switch, it's going to be difficult, if not impossible,
to modify bond_should_deliver_exact_match, because there are no inactive
slaves.  In this mode, bonding is expecting the switch to balance
incoming traffic across the ports, and not deliver looped back packets
or duplicates.  There are no restrictions on what type of traffic
(mcast, bcast, ucast) may arrive on any given port.
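
As a rough illustration of why only one slave transmits a given DAD probe,
here is a simplified sketch of an XOR transmit hash.  It assumes the layer2
policy that the bonding documentation describes as (source MAC XOR
destination MAC) modulo slave count, and reduces it to the last byte of each
address.  The function name and the sample addresses are hypothetical, and
this is not the driver's actual code.

```c
#include <stdint.h>

#define MAC_LEN 6  /* length of an Ethernet MAC address */

/*
 * Hypothetical helper: pick the transmitting slave by XORing the last
 * byte of the source and destination MAC addresses, modulo slave count.
 * slave_count must be non-zero.
 */
static unsigned int xor_select_slave(const uint8_t src[MAC_LEN],
                                     const uint8_t dst[MAC_LEN],
                                     unsigned int slave_count)
{
    return (unsigned int)(src[MAC_LEN - 1] ^ dst[MAC_LEN - 1]) % slave_count;
}
```

The hash only governs transmit.  With a bond MAC ending in 0x55 and the
solicited-node multicast MAC 33:33:ff:00:00:01 for 1234::1, the probe always
leaves through the same slave; a switch that is not aggregating the ports
then floods that multicast frame to the port where the other slave is
listening.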

	I can't think of a way to make the non-etherchannel case work
for balance-xor (or balance-rr) without breaking the DAD functionality
in the case of an actual duplicate.  I'm not aware of a way to
distinguish a looped back DAD probe from an actual duplicate address
probe elsewhere on the network.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

Yinglin Sun Oct. 6, 2011, 10:17 p.m. UTC | #3
On Thu, Oct 6, 2011 at 12:05 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
>
> Neil Horman <nhorman@tuxdriver.com> wrote:
>
> >On Wed, Oct 05, 2011 at 08:59:10PM -0700, Yinglin Sun wrote:
> >> Steps to reproduce this issue:
> >> 1. create bond0 over eth0 and eth1, set the mode to balance-xor
> >> 2. add an IPv6 address to bond0
> >> 3. DAD packet is sent out from one slave and then is looped back from
> >> the other slave. Therefore, it is treated as a duplicate address and
> >> stays tentative afterwards:
> >>    kern.info:
> >>        Oct  5 11:50:18 testvm1 kernel: [  129.224353] bond0: IPv6 duplicate address 1234::1 detected!
> >>
> >> Signed-off-by: Yinglin Sun <Yinglin.Sun@emc.com>
> >> ---
> >>  net/ipv6/ndisc.c |   15 +++++++++++++--
> >>  1 files changed, 13 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> >> index 9da6e02..c82f4c7 100644
> >> --- a/net/ipv6/ndisc.c
> >> +++ b/net/ipv6/ndisc.c
> >> @@ -809,9 +809,10 @@ static void ndisc_recv_ns(struct sk_buff *skb)
> >>
> >>              if (ifp->flags & (IFA_F_TENTATIVE|IFA_F_OPTIMISTIC)) {
> >>                      if (dad) {
> >> +                            const unsigned char *sadr;
> >> +                            sadr = skb_mac_header(skb);
> >> +
> >>                              if (dev->type == ARPHRD_IEEE802_TR) {
> >> -                                    const unsigned char *sadr;
> >> -                                    sadr = skb_mac_header(skb);
> >>                                      if (((sadr[8] ^ dev->dev_addr[0]) & 0x7f) == 0 &&
> >>                                          sadr[9] == dev->dev_addr[1] &&
> >>                                          sadr[10] == dev->dev_addr[2] &&
> >> @@ -821,6 +822,16 @@ static void ndisc_recv_ns(struct sk_buff *skb)
> >>                                              /* looped-back to us */
> >>                                              goto out;
> >>                                      }
> >> +                            } else if (dev->type == ARPHRD_ETHER) {
> >> +                                    if (sadr[6] == dev->dev_addr[0] &&
> >> +                                        sadr[7] == dev->dev_addr[1] &&
> >> +                                        sadr[8] == dev->dev_addr[2] &&
> >> +                                        sadr[9] == dev->dev_addr[3] &&
> >> +                                        sadr[10] == dev->dev_addr[4] &&
> >> +                                        sadr[11] == dev->dev_addr[5]) {
> >> +                                            /* looped-back to us */
> >> +                                            goto out;
> >> +                                    }
> >>                              }
> >>
> >>                              /*
> >> --
> >> 1.7.4.1
> >>
> >Nack. This seems like it will just completely break DAD.  What if there's another
> >system out there with the same MAC address?  A response from that system would
> >get dropped by this filter, instead of causing the local system to stop using
> >the address.  What you really want to do is modify
> >bond_should_deliver_exact_match to detect this frame on the inactive slave or
> >some such, and drop the frame there.
>
>        Also NACK; and adding a bit of information.  The balance-xor
> mode is nominally expecting to interact with a switch whose ports are
> set for etherchannel ("static link aggregation"), in which case the
> switch will not loop the packet back around.
>
>        If your switch can do etherchannel, then enable it and the
> problem should go away.  If your switch cannot do this, then you may
> have other issues, because all of the multicast or broadcast packets
> going out any bonding slave will loop around to another slave.  You
> could also use 802.3ad / LACP if your switch supports that.
>
>        For balance-xor (or balance-rr, for that matter) mode to a
> non-etherchannel switch, it's going to be difficult, if not impossible,
> to modify bond_should_deliver_exact_match, because there are no inactive
> slaves.  In this mode, bonding is expecting the switch to balance
> incoming traffic across the ports, and not deliver looped back packets
> or duplicates.  There are no restrictions on what type of traffic
> (mcast, bcast, ucast) may arrive on any given port.
>
>        I can't think of a way to make the non-etherchannel case work
> for balance-xor (or balance-rr) without breaking the DAD functionality
> in the case of an actual duplicate.  I'm not aware of a way to
> distinguish a looped back DAD probe from an actual duplicate address
> probe elsewhere on the network.
>

Hi Neil & Jay,

Thanks a lot for the comments.

The use case is to add an IPv6 address on the bonding interface first,
and then set up the port channel on the switch. We hit this issue, and the
new address stays tentative and unusable even after the port channel is set
up on the switch. This patch is for this valid use case.

Except in failover mode, all slaves are active for receiving packets, so
we receive these looped-back DAD probes and the bonding driver cannot
ignore them. I cannot think of a way to distinguish a looped-back DAD
probe from one sent by someone else with the same MAC address. They look
the same to the host. If there is another machine with the same MAC
address, this code path gets executed if both are doing DAD at the
same time for the same IPv6 address. Maybe we should find out what the
specification defines for this case?

Thanks.

Yinglin
Yinglin Sun Oct. 7, 2011, 12:03 a.m. UTC | #4
On Thu, Oct 6, 2011 at 3:17 PM, Yinglin Sun <Yinglin.Sun@emc.com> wrote:
>
> On Thu, Oct 6, 2011 at 12:05 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
> >
> > Neil Horman <nhorman@tuxdriver.com> wrote:
> >
> > >On Wed, Oct 05, 2011 at 08:59:10PM -0700, Yinglin Sun wrote:
> > >> Steps to reproduce this issue:
> > >> 1. create bond0 over eth0 and eth1, set the mode to balance-xor
> > >> 2. add an IPv6 address to bond0
> > >> 3. DAD packet is sent out from one slave and then is looped back from
> > >> the other slave. Therefore, it is treated as a duplicate address and
> > >> stays tentative afterwards:
> > >>    kern.info:
> > >>        Oct  5 11:50:18 testvm1 kernel: [  129.224353] bond0: IPv6 duplicate address 1234::1 detected!
> > >>
> > >> Signed-off-by: Yinglin Sun <Yinglin.Sun@emc.com>
> > >> ---
> > >>  net/ipv6/ndisc.c |   15 +++++++++++++--
> > >>  1 files changed, 13 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> > >> index 9da6e02..c82f4c7 100644
> > >> --- a/net/ipv6/ndisc.c
> > >> +++ b/net/ipv6/ndisc.c
> > >> @@ -809,9 +809,10 @@ static void ndisc_recv_ns(struct sk_buff *skb)
> > >>
> > >>              if (ifp->flags & (IFA_F_TENTATIVE|IFA_F_OPTIMISTIC)) {
> > >>                      if (dad) {
> > >> +                            const unsigned char *sadr;
> > >> +                            sadr = skb_mac_header(skb);
> > >> +
> > >>                              if (dev->type == ARPHRD_IEEE802_TR) {
> > >> -                                    const unsigned char *sadr;
> > >> -                                    sadr = skb_mac_header(skb);
> > >>                                      if (((sadr[8] ^ dev->dev_addr[0]) & 0x7f) == 0 &&
> > >>                                          sadr[9] == dev->dev_addr[1] &&
> > >>                                          sadr[10] == dev->dev_addr[2] &&
> > >> @@ -821,6 +822,16 @@ static void ndisc_recv_ns(struct sk_buff *skb)
> > >>                                              /* looped-back to us */
> > >>                                              goto out;
> > >>                                      }
> > >> +                            } else if (dev->type == ARPHRD_ETHER) {
> > >> +                                    if (sadr[6] == dev->dev_addr[0] &&
> > >> +                                        sadr[7] == dev->dev_addr[1] &&
> > >> +                                        sadr[8] == dev->dev_addr[2] &&
> > >> +                                        sadr[9] == dev->dev_addr[3] &&
> > >> +                                        sadr[10] == dev->dev_addr[4] &&
> > >> +                                        sadr[11] == dev->dev_addr[5]) {
> > >> +                                            /* looped-back to us */
> > >> +                                            goto out;
> > >> +                                    }
> > >>                              }
> > >>
> > >>                              /*
> > >> --
> > >> 1.7.4.1
> > >>
> > >Nack. This seems like it will just completely break DAD.  What if there's another
> > >system out there with the same MAC address?  A response from that system would
> > >get dropped by this filter, instead of causing the local system to stop using
> > >the address.  What you really want to do is modify
> > >bond_should_deliver_exact_match to detect this frame on the inactive slave or
> > >some such, and drop the frame there.
> >
> >        Also NACK; and adding a bit of information.  The balance-xor
> > mode is nominally expecting to interact with a switch whose ports are
> > set for etherchannel ("static link aggregation"), in which case the
> > switch will not loop the packet back around.
> >
> >        If your switch can do etherchannel, then enable it and the
> > problem should go away.  If your switch cannot do this, then you may
> > have other issues, because all of the multicast or broadcast packets
> > going out any bonding slave will loop around to another slave.  You
> > could also use 802.3ad / LACP if your switch supports that.
> >
> >        For balance-xor (or balance-rr, for that matter) mode to a
> > non-etherchannel switch, it's going to be difficult, if not impossible,
> > to modify bond_should_deliver_exact_match, because there are no inactive
> > slaves.  In this mode, bonding is expecting the switch to balance
> > incoming traffic across the ports, and not deliver looped back packets
> > or duplicates.  There are no restrictions on what type of traffic
> > (mcast, bcast, ucast) may arrive on any given port.
> >
> >        I can't think of a way to make the non-etherchannel case work
> > for balance-xor (or balance-rr) without breaking the DAD functionality
> > in the case of an actual duplicate.  I'm not aware of a way to
> > distinguish a looped back DAD probe from an actual duplicate address
> > probe elsewhere on the network.
> >
>
> Hi Neil & Jay,
>
> Thanks a lot for the comments.
>
> The use case is to add IPv6 address on the bonding interface first,
> and then set up port channel on switch. We'll hit this issue and the
> new address will stay tentative and unusable after port channel is set
> up on switch. This patch is for this valid use case.
>
> Except failover mode, all slaves are active on receiving packets, so
> we are receiving such looped back DAD and the bonding driver cannot
> ignore them. I cannot think of a way to distinguish if a DAD is looped
> back or from someone else having the same mac address. They look the
> same to the host. If there is another machine having the same mac
> address, this code path gets executed if both are doing DAD at the
> same time for the same IPv6 address. Maybe we should find out what the
> specification defines for this case?
>

RFC 4862 has a discussion about this issue:
http://tools.ietf.org/html/rfc4862#appendix-A
A better solution could be to record the number of DAD probes sent out. If
we receive more DAD probes than we sent, there is someone else
on the network who has the same MAC address and sent DAD probes for the same
IPv6 address. However, this solution doesn't work with a bonding
interface, since every active slave other than the one sending the DAD probe
will receive the looped-back packet. There doesn't seem to be a simple
solution for this issue.
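
A small user-space model of the counting idea, to show where it goes wrong.
This is only a sketch under stated assumptions: the struct and function
names are hypothetical, and the model assumes a switch that is not
aggregating the ports floods each multicast probe to every slave other than
the sender.

```c
#include <stdbool.h>

/* Hypothetical per-address DAD bookkeeping; not a kernel structure. */
struct dad_counters {
    unsigned int probes_sent;
    unsigned int probes_received;
};

/* Counting heuristic: more probes seen than sent suggests another sender. */
static bool counting_suspects_duplicate(const struct dad_counters *c)
{
    return c->probes_received > c->probes_sent;
}

/*
 * Model one DAD probe sent from a bond with slave_count slaves
 * (slave_count >= 1) attached to a non-aggregating switch: every slave
 * except the sender receives a flooded copy of the probe.
 */
static void send_probe_unaggregated(struct dad_counters *c,
                                    unsigned int slave_count)
{
    c->probes_sent += 1;
    c->probes_received += slave_count - 1;
}
```

Under this model a two-slave bond sees one copy per probe sent, so the
counts stay equal, but with three or more slaves the received count exceeds
the sent count even though no other host is involved, and the heuristic
reports a duplicate that is not there.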

Yinglin
Jay Vosburgh Oct. 7, 2011, 12:59 a.m. UTC | #5
Yinglin Sun <Yinglin.Sun@emc.com> wrote:

>On Thu, Oct 6, 2011 at 3:17 PM, Yinglin Sun <Yinglin.Sun@emc.com> wrote:
>>
>> On Thu, Oct 6, 2011 at 12:05 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
>> >
>> > Neil Horman <nhorman@tuxdriver.com> wrote:
>> >
>> > >On Wed, Oct 05, 2011 at 08:59:10PM -0700, Yinglin Sun wrote:
>> > >> Steps to reproduce this issue:
>> > >> 1. create bond0 over eth0 and eth1, set the mode to balance-xor
>> > >> 2. add an IPv6 address to bond0
>> > >> 3. DAD packet is sent out from one slave and then is looped back from
>> > >> the other slave. Therefore, it is treated as a duplicate address and
>> > >> stays tentative afterwards:
>> > >>    kern.info:
>> > >>        Oct  5 11:50:18 testvm1 kernel: [  129.224353] bond0: IPv6 duplicate address 1234::1 detected!

[...]

>> > >Nack. This seems like it will just completely break DAD.  What if there's another
>> > >system out there with the same MAC address?  A response from that system would
>> > >get dropped by this filter, instead of causing the local system to stop using
>> > >the address.  What you really want to do is modify
>> > >bond_should_deliver_exact_match to detect this frame on the inactive slave or
>> > >some such, and drop the frame there.
>> >
>> >        Also NACK; and adding a bit of information.  The balance-xor
>> > mode is nominally expecting to interact with a switch whose ports are
>> > set for etherchannel ("static link aggregation"), in which case the
>> > switch will not loop the packet back around.
>> >
>> >        If your switch can do etherchannel, then enable it and the
>> > problem should go away.  If your switch cannot do this, then you may
>> > have other issues, because all of the multicast or broadcast packets
>> > going out any bonding slave will loop around to another slave.  You
>> > could also use 802.3ad / LACP if your switch supports that.
>> >
>> >        For balance-xor (or balance-rr, for that matter) mode to a
>> > non-etherchannel switch, it's going to be difficult, if not impossible,
>> > to modify bond_should_deliver_exact_match, because there are no inactive
>> > slaves.  In this mode, bonding is expecting the switch to balance
>> > incoming traffic across the ports, and not deliver looped back packets
>> > or duplicates.  There are no restrictions on what type of traffic
>> > (mcast, bcast, ucast) may arrive on any given port.
>> >
>> >        I can't think of a way to make the non-etherchannel case work
>> > for balance-xor (or balance-rr) without breaking the DAD functionality
>> > in the case of an actual duplicate.  I'm not aware of a way to
>> > distinguish a looped back DAD probe from an actual duplicate address
>> > probe elsewhere on the network.
>> >
>>
>> Hi Neil & Jay,
>>
>> Thanks a lot for the comments.
>>
>> The use case is to add IPv6 address on the bonding interface first,
>> and then set up port channel on switch. We'll hit this issue and the
>> new address will stay tentative and unusable after port channel is set
>> up on switch. This patch is for this valid use case.
>>
>> Except failover mode, all slaves are active on receiving packets, so
>> we are receiving such looped back DAD and the bonding driver cannot
>> ignore them. I cannot think of a way to distinguish if a DAD is looped
>> back or from someone else having the same mac address. They look the
>> same to the host. If there is another machine having the same mac
>> address, this code path gets executed if both are doing DAD at the
>> same time for the same IPv6 address. Maybe we should find out what the
>> specification defines for this case?
>>
>
>RFC4862 has a discussion about this issue:
>http://tools.ietf.org/html/rfc4862#appendix-A
>The better solution could be to record the number of DAD sent out. If
>we received more DAD packets than we sent out, there is someone else
>on the network who has the same mac address and sent DAD for the same
>IPv6 address. However, this solution doesn't work with bonding
>interface, since all other active slaves but the one sending out DAD
>will receive packet looped back. It doesn't seem there is a simple
>solution for this issue.

	Why are you setting up the port channel after configuring the
bond?

	As a possible workaround, if you have control over the setup
process (perhaps it's some sort of manual process), adding one slave to
the bond, leaving the other soon-to-be slaves down, then setting up the
switch, and finally adding the remaining slaves should work around the
issue, since if the bond has only one slave it won't see any looped
packets.

	Or you could bring the bond up as active-backup, then change the
mode to balance-xor once the switch is configured.

	Ultimately, though, the problem stems from the settings mismatch
between the switch and the bonding system; balance-xor is meant to
interoperate with etherchannel, and when the switch is not configured
properly, correct behavior is difficult to guarantee.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

Yinglin Sun Oct. 7, 2011, 1:24 a.m. UTC | #6
On Thu, Oct 6, 2011 at 5:59 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
> Yinglin Sun <Yinglin.Sun@emc.com> wrote:
>
>>On Thu, Oct 6, 2011 at 3:17 PM, Yinglin Sun <Yinglin.Sun@emc.com> wrote:
>>>
>>> On Thu, Oct 6, 2011 at 12:05 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
>>> >
>>> > Neil Horman <nhorman@tuxdriver.com> wrote:
>>> >
>>> > >On Wed, Oct 05, 2011 at 08:59:10PM -0700, Yinglin Sun wrote:
>>> > >> Steps to reproduce this issue:
>>> > >> 1. create bond0 over eth0 and eth1, set the mode to balance-xor
>>> > >> 2. add an IPv6 address to bond0
>>> > >> 3. DAD packet is sent out from one slave and then is looped back from
>>> > >> the other slave. Therefore, it is treated as a duplicate address and
>>> > >> stays tentative afterwards:
>>> > >>    kern.info:
>>> > >>        Oct  5 11:50:18 testvm1 kernel: [  129.224353] bond0: IPv6 duplicate address 1234::1 detected!
>
> [...]
>
>>> > >Nack. This seems like it will just completely break DAD.  What if there's another
>>> > >system out there with the same MAC address?  A response from that system would
>>> > >get dropped by this filter, instead of causing the local system to stop using
>>> > >the address.  What you really want to do is modify
>>> > >bond_should_deliver_exact_match to detect this frame on the inactive slave or
>>> > >some such, and drop the frame there.
>>> >
>>> >        Also NACK; and adding a bit of information.  The balance-xor
>>> > mode is nominally expecting to interact with a switch whose ports are
>>> > set for etherchannel ("static link aggregation"), in which case the
>>> > switch will not loop the packet back around.
>>> >
>>> >        If your switch can do etherchannel, then enable it and the
>>> > problem should go away.  If your switch cannot do this, then you may
>>> > have other issues, because all of the multicast or broadcast packets
>>> > going out any bonding slave will loop around to another slave.  You
>>> > could also use 802.3ad / LACP if your switch supports that.
>>> >
>>> >        For balance-xor (or balance-rr, for that matter) mode to a
>>> > non-etherchannel switch, it's going to be difficult, if not impossible,
>>> > to modify bond_should_deliver_exact_match, because there are no inactive
>>> > slaves.  In this mode, bonding is expecting the switch to balance
>>> > incoming traffic across the ports, and not deliver looped back packets
>>> > or duplicates.  There are no restrictions on what type of traffic
>>> > (mcast, bcast, ucast) may arrive on any given port.
>>> >
>>> >        I can't think of a way to make the non-etherchannel case work
>>> > for balance-xor (or balance-rr) without breaking the DAD functionality
>>> > in the case of an actual duplicate.  I'm not aware of a way to
>>> > distinguish a looped back DAD probe from an actual duplicate address
>>> > probe elsewhere on the network.
>>> >
>>>
>>> Hi Neil & Jay,
>>>
>>> Thanks a lot for the comments.
>>>
>>> The use case is to add IPv6 address on the bonding interface first,
>>> and then set up port channel on switch. We'll hit this issue and the
>>> new address will stay tentative and unusable after port channel is set
>>> up on switch. This patch is for this valid use case.
>>>
>>> Except failover mode, all slaves are active on receiving packets, so
>>> we are receiving such looped back DAD and the bonding driver cannot
>>> ignore them. I cannot think of a way to distinguish if a DAD is looped
>>> back or from someone else having the same mac address. They look the
>>> same to the host. If there is another machine having the same mac
>>> address, this code path gets executed if both are doing DAD at the
>>> same time for the same IPv6 address. Maybe we should find out what the
>>> specification defines for this case?
>>>
>>
>>RFC4862 has a discussion about this issue:
>>http://tools.ietf.org/html/rfc4862#appendix-A
>>The better solution could be to record the number of DAD sent out. If
>>we received more DAD packets than we sent out, there is someone else
>>on the network who has the same mac address and sent DAD for the same
>>IPv6 address. However, this solution doesn't work with bonding
>>interface, since all other active slaves but the one sending out DAD
>>will receive packet looped back. It doesn't seem there is a simple
>>solution for this issue.
>
>        Why are you setting up the port channel after configuring the
> bond?
>
>        As a possible workaround, if you have control over the setup
> process (perhaps it's some sort of manual process), adding one slave to
> the bond, leaving the other soon-to-be slaves down, then setting up the
> switch, and finally adding the remaining slaves should work around the
> issue, since if the bond has only one slave it won't see any looped
> packets.
>
>        Or you could bring the bond up as active-backup, then change the
> mode to balance-xor once the switch is configured.
>
>        Ultimately, though, the problem stems from the settings mismatch
> between the switch and the bonding system; balance-xor is meant to
> interoperate with etherchannel, and when the switch is not configured
> properly, correct behavior is difficult to guarantee.
>

Jay,

Thanks a lot for the suggestion.

It's mainly about usability. We would like to give customers the same
configuration procedure for IPv6 as for IPv4.  Such workarounds
could be confusing and generate customer calls.

Yinglin
Chuck Anderson Oct. 7, 2011, 6:13 a.m. UTC | #7
On Thu, Oct 06, 2011 at 06:24:36PM -0700, Yinglin Sun wrote:
> On Thu, Oct 6, 2011 at 5:59 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
> >        Why are you setting up the port channel after configuring the
> > bond?
> >
> >        As a possible workaround, if you have control over the setup
> > process (perhaps it's some sort of manual process), adding one slave to
> > the bond, leaving the other soon-to-be slaves down, then setting up the
> > switch, and finally adding the remaining slaves should work around the
> > issue, since if the bond has only one slave it won't see any looped
> > packets.
> >
> >        Or you could bring the bond up as active-backup, then change the
> > mode to balance-xor once the switch is configured.
> >
> >        Ultimately, though, the problem stems from the settings mismatch
> > between the switch and the bonding system; balance-xor is meant to
> > interoperate with etherchannel, and when the switch is not configured
> > properly, correct behavior is difficult to guarantee.
> >
> 
> Jay,
> 
> Thanks a lot for the suggestion.
> 
> It's mainly about usability. We would like to provide customers with
> consistent IPv6 configuration procedures as IPv4.  Such workarounds
> could be confusing and generate customer calls.

You've created/encouraged your customers to create a broken network
configuration by connecting two bonded links to a non-bonded,
non-etherchannel switch port pair.  This type of misconfiguration,
when applied to inter-switch trunks, can cause major network issues,
like looping and broadcast storms, taking down the entire network
unless something like Spanning Tree is enabled to protect against such
accidental loops.  It should be avoided at all costs.  Luckily, if the
Linux host in this case is not being used as a switch/bridge, the
impact of this might not be so bad--perhaps limited to the IPv6 DAD
issue you report.

If you want better usability and plug-n-play bonding, then require
LACP/802.3ad to be used.  Please don't encourage your customers to
connect misconfigured devices to the network, thanks.
Neil Horman Oct. 7, 2011, 11:10 a.m. UTC | #8
On Thu, Oct 06, 2011 at 06:24:36PM -0700, Yinglin Sun wrote:
> On Thu, Oct 6, 2011 at 5:59 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
> > Yinglin Sun <Yinglin.Sun@emc.com> wrote:
> >
> >>On Thu, Oct 6, 2011 at 3:17 PM, Yinglin Sun <Yinglin.Sun@emc.com> wrote:
> >>>
> >>> On Thu, Oct 6, 2011 at 12:05 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
> >>> >
> >>> > Neil Horman <nhorman@tuxdriver.com> wrote:
> >>> >
> >>> > >On Wed, Oct 05, 2011 at 08:59:10PM -0700, Yinglin Sun wrote:
> >>> > >> Steps to reproduce this issue:
> >>> > >> 1. create bond0 over eth0 and eth1, set the mode to balance-xor
> >>> > >> 2. add an IPv6 address to bond0
> >>> > >> 3. DAD packet is sent out from one slave and then is looped back from
> >>> > >> the other slave. Therefore, it is treated as a duplicate address and
> >>> > >> stays tentative afterwards:
> >>> > >>    kern.info:
> >>> > >>        Oct  5 11:50:18 testvm1 kernel: [  129.224353] bond0: IPv6 duplicate address 1234::1 detected!
> >
> > [...]
> >
> >>> > >Nack. This seems like it will just completely break DAD.  What if there's another
> >>> > >system out there with the same mac address?  A response from that system would
> >>> > >get dropped by this filter, instead of causing the local system to stop using
> >>> > >the address.  What you really want to do is modify
> >>> > >bond_should_deliver_exact_match to detect this frame on the inactive slave or
> >>> > >some such, and drop the frame there.
> >>> >
> >>> >        Also NACK; and adding a bit of information.  The balance-xor
> >>> > mode is nominally expecting to interact with a switch whose ports are
> >>> > set for etherchannel ("static link aggregation"), in which case the
> >>> > switch will not loop the packet back around.
> >>> >
> >>> >        If your switch can do etherchannel, then enable it and the
> >>> > problem should go away.  If your switch cannot do this, then you may
> >>> > have other issues, because all of the multicast or broadcast packets
> >>> > going out any bonding slave will loop around to another slave.  You
> >>> > could also use 802.3ad / LACP if your switch supports that.
> >>> >
> >>> >        For balance-xor (or balance-rr, for that matter) mode to a
> >>> > non-etherchannel switch, it's going to be difficult, if not impossible,
> >>> > to modify bond_should_deliver_exact_match, because there are no inactive
> >>> > slaves.  In this mode, bonding is expecting the switch to balance
> >>> > incoming traffic across the ports, and not deliver looped back packets
> >>> > or duplicates.  There are no restrictions on what type of traffic
> >>> > (mcast, bcast, ucast) may arrive on any given port.
> >>> >
> >>> >        I can't think of a way to make the non-etherchannel case work
> >>> > for balance-xor (or balance-rr) without breaking the DAD functionality
> >>> > in the case of an actual duplicate.  I'm not aware of a way to
> >>> > distinguish a looped back DAD probe from an actual duplicate address
> >>> > probe elsewhere on the network.
> >>> >
> >>>
> >>> Hi Neil & Jay,
> >>>
> >>> Thanks a lot for the comments.
> >>>
> >>> The use case is to add IPv6 address on the bonding interface first,
> >>> and then set up port channel on switch. We'll hit this issue and the
> >>> new address will stay tentative and unusable after port channel is set
> >>> up on switch. This patch is for this valid use case.
> >>>
> >>> Except failover mode, all slaves are active on receiving packets, so
> >>> we are receiving such looped back DAD and the bonding driver cannot
> >>> ignore them. I cannot think of a way to distinguish if a DAD is looped
> >>> back or from someone else having the same mac address. They look the
> >>> same to the host. If there is another machine having the same mac
> >>> address, this code path gets executed if both are doing DAD at the
> >>> same time for the same IPv6 address. Maybe we should find out what the
> >>> specification defines for this case?
> >>>
> >>
> >>RFC4862 has a discussion about this issue:
> >>http://tools.ietf.org/html/rfc4862#appendix-A
> >>The better solution could be to record the number of DAD sent out. If
> >>we received more DAD packets than we sent out, there is someone else
> >>on the network who has the same mac address and sent DAD for the same
> >>IPv6 address. However, this solution doesn't work with bonding
> >>interface, since all other active slaves but the one sending out DAD
> >>will receive packet looped back. It doesn't seem there is a simple
> >>solution for this issue.
> >
> >        Why are you setting up the port channel after configuring the
> > bond?
> >
> >        As a possible workaround, if you have control over the setup
> > process (perhaps it's some sort of manual process), adding one slave to
> > the bond, leaving the other soon-to-be slaves down, then setting up the
> > switch, and finally adding the remaining slaves should work around the
> > issue, since if the bond has only one slave it won't see any looped
> > packets.
> >
> >        Or you could bring the bond up as active-backup, then change the
> > mode to balance-xor once the switch is configured.
> >
> >        Ultimately, though, the problem stems from the settings mismatch
> > between the switch and the bonding system; balance-xor is meant to
> > interoperate with etherchannel, and when the switch is not configured
> > properly, correct behavior is difficult to guarantee.
> >
> 
> Jay,
> 
> Thanks a lot for the suggestion.
> 
> It's mainly about usability. We would like to provide customers with
> consistent IPv6 configuration procedures as IPv4.  Such workarounds
> could be confusing and generate customer calls.
> 
It's not a workaround, it's the way it has to be done.  You can't just drop DAD
packets because you can't tell the difference between those that are looped back
and those that are legitimately from other hosts, so you need to do something
like what Jay is suggesting.

> Yinglin
> 
Yinglin Sun Oct. 7, 2011, 4:59 p.m. UTC | #9
On Thu, Oct 6, 2011 at 11:13 PM, Chuck Anderson <cra@wpi.edu> wrote:
>
> On Thu, Oct 06, 2011 at 06:24:36PM -0700, Yinglin Sun wrote:
> > On Thu, Oct 6, 2011 at 5:59 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
> > >        Why are you setting up the port channel after configuring the
> > > bond?
> > >
> > >        As a possible workaround, if you have control over the setup
> > > process (perhaps it's some sort of manual process), adding one slave to
> > > the bond, leaving the other soon-to-be slaves down, then setting up the
> > > switch, and finally adding the remaining slaves should work around the
> > > issue, since if the bond has only one slave it won't see any looped
> > > packets.
> > >
> > >        Or you could bring the bond up as active-backup, then change the
> > > mode to balance-xor once the switch is configured.
> > >
> > >        Ultimately, though, the problem stems from the settings mismatch
> > > between the switch and the bonding system; balance-xor is meant to
> > > interoperate with etherchannel, and when the switch is not configured
> > > properly, correct behavior is difficult to guarantee.
> > >
> >
> > Jay,
> >
> > Thanks a lot for the suggestion.
> >
> > It's mainly about usability. We would like to provide customers with
> > consistent IPv6 configuration procedures as IPv4.  Such workarounds
> > could be confusing and generate customer calls.
>
> You've created/encouraged your customers to create a broken network
> configuration by connecting two bonded links to a non-bonded,
> non-etherchannel switch port pair.  This type of misconfiguration,
> when applied to inter-switch trunks, can cause major network issues,
> like looping and broadcast storms, taking down the entire network
> unless something like Spanning Tree is enabled to protect against such
> accidental loops.  It should be avoided at all costs.  Luckily, if the
> Linux host in this case is not being used as a switch/bridge, the
> impact of this might not be so bad--perhaps limited to the IPv6 DAD
> issue you report.
>
> If you want better usability and plug-n-play bonding, then require
> LACP/802.3ad to be used.  Please don't encourage your customers to
> connect misconfigured devices to the network, thanks.

You are right. LACP is a good choice. It should solve this issue,
since all LACP bonding slaves are down before the port channel is set
up on the switch. Thanks.

I'm not sure this counts as a broken network configuration. If
customers happen to add some IPv6 addresses to the bonding interface
before setting up the port channel on the switch, they have to
reconfigure all of them again. That is not a good user experience.
From my point of view, a nice product should be able to tolerate this.

Thanks.

Yinglin
Neil Horman Oct. 7, 2011, 5:29 p.m. UTC | #10
On Fri, Oct 07, 2011 at 09:59:06AM -0700, Yinglin Sun wrote:
> On Thu, Oct 6, 2011 at 11:13 PM, Chuck Anderson <cra@wpi.edu> wrote:
> >
> > On Thu, Oct 06, 2011 at 06:24:36PM -0700, Yinglin Sun wrote:
> > > On Thu, Oct 6, 2011 at 5:59 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
> > > >        Why are you setting up the port channel after configuring the
> > > > bond?
> > > >
> > > >        As a possible workaround, if you have control over the setup
> > > > process (perhaps it's some sort of manual process), adding one slave to
> > > > the bond, leaving the other soon-to-be slaves down, then setting up the
> > > > switch, and finally adding the remaining slaves should work around the
> > > > issue, since if the bond has only one slave it won't see any looped
> > > > packets.
> > > >
> > > >        Or you could bring the bond up as active-backup, then change the
> > > > mode to balance-xor once the switch is configured.
> > > >
> > > >        Ultimately, though, the problem stems from the settings mismatch
> > > > between the switch and the bonding system; balance-xor is meant to
> > > > interoperate with etherchannel, and when the switch is not configured
> > > > properly, correct behavior is difficult to guarantee.
> > > >
> > >
> > > Jay,
> > >
> > > Thanks a lot for the suggestion.
> > >
> > > It's mainly about usability. We would like to provide customers with
> > > consistent IPv6 configuration procedures as IPv4.  Such workarounds
> > > could be confusing and generate customer calls.
> >
> > You've created/encouraged your customers to create a broken network
> > configuration by connecting two bonded links to a non-bonded,
> > non-etherchannel switch port pair.  This type of misconfiguration,
> > when applied to inter-switch trunks, can cause major network issues,
> > like looping and broadcast storms, taking down the entire network
> > unless something like Spanning Tree is enabled to protect against such
> > accidental loops.  It should be avoided at all costs.  Luckily, if the
> > Linux host in this case is not being used as a switch/bridge, the
> > impact of this might not be so bad--perhaps limited to the IPv6 DAD
> > issue you report.
> >
> > If you want better usability and plug-n-play bonding, then require
> > LACP/802.3ad to be used.  Please don't encourage your customers to
> > connect misconfigured devices to the network, thanks.
> 
> You are right. LACP is the good choice. It should be able to solve
> this issue, since all LACP bonding slaves are down before port channel
> is set up on switch. Thanks.
> 
> I'm not sure this is kind of broken network configuration. If
> customers happen to add some IPv6 addresses on bonding interface
> before setting up port channel on switch, they have to reconfigure all

Bringing up an interface before it and its peer interfaces are configured
to use an agreed-upon mode is rather by definition broken :)

Yinglin Sun Oct. 7, 2011, 6:08 p.m. UTC | #11
On Fri, Oct 7, 2011 at 4:10 AM, Neil Horman <nhorman@tuxdriver.com> wrote:
> On Thu, Oct 06, 2011 at 06:24:36PM -0700, Yinglin Sun wrote:
>> On Thu, Oct 6, 2011 at 5:59 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
>> > Yinglin Sun <Yinglin.Sun@emc.com> wrote:
>> >
>> >>On Thu, Oct 6, 2011 at 3:17 PM, Yinglin Sun <Yinglin.Sun@emc.com> wrote:
>> >>>
>> >>> On Thu, Oct 6, 2011 at 12:05 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
>> >>> >
>> >>> > Neil Horman <nhorman@tuxdriver.com> wrote:
>> >>> >
>> >>> > >On Wed, Oct 05, 2011 at 08:59:10PM -0700, Yinglin Sun wrote:
>> >>> > >> Steps to reproduce this issue:
>> >>> > >> 1. create bond0 over eth0 and eth1, set the mode to balance-xor
>> >>> > >> 2. add an IPv6 address to bond0
>> >>> > >> 3. DAD packet is sent out from one slave and then is looped back from
>> >>> > >> the other slave. Therefore, it is treated as a duplicate address and
>> >>> > >> stays tentative afterwards:
>> >>> > >>    kern.info:
>> >>> > >>        Oct  5 11:50:18 testvm1 kernel: [  129.224353] bond0: IPv6 duplicate address 1234::1 detected!
>> >
>> > [...]
>> >
>> >>> > >Nack. This seems like it will just completely break DAD.  What if there's another
>> >>> > >system out there with the same mac address?  A response from that system would
>> >>> > >get dropped by this filter, instead of causing the local system to stop using
>> >>> > >the address.  What you really want to do is modify
>> >>> > >bond_should_deliver_exact_match to detect this frame on the inactive slave or
>> >>> > >some such, and drop the frame there.
>> >>> >
>> >>> >        Also NACK; and adding a bit of information.  The balance-xor
>> >>> > mode is nominally expecting to interact with a switch whose ports are
>> >>> > set for etherchannel ("static link aggregation"), in which case the
>> >>> > switch will not loop the packet back around.
>> >>> >
>> >>> >        If your switch can do etherchannel, then enable it and the
>> >>> > problem should go away.  If your switch cannot do this, then you may
>> >>> > have other issues, because all of the multicast or broadcast packets
>> >>> > going out any bonding slave will loop around to another slave.  You
>> >>> > could also use 802.3ad / LACP if your switch supports that.
>> >>> >
>> >>> >        For balance-xor (or balance-rr, for that matter) mode to a
>> >>> > non-etherchannel switch, it's going to be difficult, if not impossible,
>> >>> > to modify bond_should_deliver_exact_match, because there are no inactive
>> >>> > slaves.  In this mode, bonding is expecting the switch to balance
>> >>> > incoming traffic across the ports, and not deliver looped back packets
>> >>> > or duplicates.  There are no restrictions on what type of traffic
>> >>> > (mcast, bcast, ucast) may arrive on any given port.
>> >>> >
>> >>> >        I can't think of a way to make the non-etherchannel case work
>> >>> > for balance-xor (or balance-rr) without breaking the DAD functionality
>> >>> > in the case of an actual duplicate.  I'm not aware of a way to
>> >>> > distinguish a looped back DAD probe from an actual duplicate address
>> >>> > probe elsewhere on the network.
>> >>> >
>> >>>
>> >>> Hi Neil & Jay,
>> >>>
>> >>> Thanks a lot for the comments.
>> >>>
>> >>> The use case is to add IPv6 address on the bonding interface first,
>> >>> and then set up port channel on switch. We'll hit this issue and the
>> >>> new address will stay tentative and unusable after port channel is set
>> >>> up on switch. This patch is for this valid use case.
>> >>>
>> >>> Except failover mode, all slaves are active on receiving packets, so
>> >>> we are receiving such looped back DAD and the bonding driver cannot
>> >>> ignore them. I cannot think of a way to distinguish if a DAD is looped
>> >>> back or from someone else having the same mac address. They look the
>> >>> same to the host. If there is another machine having the same mac
>> >>> address, this code path gets executed if both are doing DAD at the
>> >>> same time for the same IPv6 address. Maybe we should find out what the
>> >>> specification defines for this case?
>> >>>
>> >>
>> >>RFC4862 has a discussion about this issue:
>> >>http://tools.ietf.org/html/rfc4862#appendix-A
>> >>The better solution could be to record the number of DAD sent out. If
>> >>we received more DAD packets than we sent out, there is someone else
>> >>on the network who has the same mac address and sent DAD for the same
>> >>IPv6 address. However, this solution doesn't work with bonding
>> >>interface, since all other active slaves but the one sending out DAD
>> >>will receive packet looped back. It doesn't seem there is a simple
>> >>solution for this issue.
>> >
>> >        Why are you setting up the port channel after configuring the
>> > bond?
>> >
>> >        As a possible workaround, if you have control over the setup
>> > process (perhaps it's some sort of manual process), adding one slave to
>> > the bond, leaving the other soon-to-be slaves down, then setting up the
>> > switch, and finally adding the remaining slaves should work around the
>> > issue, since if the bond has only one slave it won't see any looped
>> > packets.
>> >
>> >        Or you could bring the bond up as active-backup, then change the
>> > mode to balance-xor once the switch is configured.
>> >
>> >        Ultimately, though, the problem stems from the settings mismatch
>> > between the switch and the bonding system; balance-xor is meant to
>> > interoperate with etherchannel, and when the switch is not configured
>> > properly, correct behavior is difficult to guarantee.
>> >
>>
>> Jay,
>>
>> Thanks a lot for the suggestion.
>>
>> It's mainly about usability. We would like to provide customers with
>> consistent IPv6 configuration procedures as IPv4.  Such workarounds
>> could be confusing and generate customer calls.
>>
> It's not a workaround, it's the way it has to be done.  You can't just drop DAD
> packets because you can't tell the difference between those that are looped back
> and those that are legitimately from other hosts, so you need to do something
> like what Jay is suggesting.
>

Yes, you are right. In this case, we cannot tell the difference
between looped-back DADs and DADs for the same IPv6 address that come
from other machines on the network with the same mac address.

If this happens, the network is already broken due to duplicate mac
addresses, right? I know this is no excuse to drop DADs from others
having the same mac address, since those DADs are legitimate and
indicate a duplicate IPv6 address in the network. Just out of
curiosity, is there any case where such a network can still function
well with multiple machines having the same mac address, besides link
aggregation? Maybe we could drop those DADs since the network is
already in trouble?

Thanks.

Yinglin
Neil Horman Oct. 7, 2011, 7:09 p.m. UTC | #12
On Fri, Oct 07, 2011 at 11:08:49AM -0700, Yinglin Sun wrote:
> On Fri, Oct 7, 2011 at 4:10 AM, Neil Horman <nhorman@tuxdriver.com> wrote:
> > On Thu, Oct 06, 2011 at 06:24:36PM -0700, Yinglin Sun wrote:
> >> On Thu, Oct 6, 2011 at 5:59 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
> >> > Yinglin Sun <Yinglin.Sun@emc.com> wrote:
> >> >
> >> >>On Thu, Oct 6, 2011 at 3:17 PM, Yinglin Sun <Yinglin.Sun@emc.com> wrote:
> >> >>>
> >> >>> On Thu, Oct 6, 2011 at 12:05 PM, Jay Vosburgh <fubar@us.ibm.com> wrote:
> >> >>> >
> >> >>> > Neil Horman <nhorman@tuxdriver.com> wrote:
> >> >>> >
> >> >>> > >On Wed, Oct 05, 2011 at 08:59:10PM -0700, Yinglin Sun wrote:
> >> >>> > >> Steps to reproduce this issue:
> >> >>> > >> 1. create bond0 over eth0 and eth1, set the mode to balance-xor
> >> >>> > >> 2. add an IPv6 address to bond0
> >> >>> > >> 3. DAD packet is sent out from one slave and then is looped back from
> >> >>> > >> the other slave. Therefore, it is treated as a duplicate address and
> >> >>> > >> stays tentative afterwards:
> >> >>> > >>    kern.info:
> >> >>> > >>        Oct  5 11:50:18 testvm1 kernel: [  129.224353] bond0: IPv6 duplicate address 1234::1 detected!
> >> >
> >> > [...]
> >> >
> >> >>> > >Nack. This seems like it will just completely break DAD.  What if there's another
> >> >>> > >system out there with the same mac address?  A response from that system would
> >> >>> > >get dropped by this filter, instead of causing the local system to stop using
> >> >>> > >the address.  What you really want to do is modify
> >> >>> > >bond_should_deliver_exact_match to detect this frame on the inactive slave or
> >> >>> > >some such, and drop the frame there.
> >> >>> >
> >> >>> >        Also NACK; and adding a bit of information.  The balance-xor
> >> >>> > mode is nominally expecting to interact with a switch whose ports are
> >> >>> > set for etherchannel ("static link aggregation"), in which case the
> >> >>> > switch will not loop the packet back around.
> >> >>> >
> >> >>> >        If your switch can do etherchannel, then enable it and the
> >> >>> > problem should go away.  If your switch cannot do this, then you may
> >> >>> > have other issues, because all of the multicast or broadcast packets
> >> >>> > going out any bonding slave will loop around to another slave.  You
> >> >>> > could also use 802.3ad / LACP if your switch supports that.
> >> >>> >
> >> >>> >        For balance-xor (or balance-rr, for that matter) mode to a
> >> >>> > non-etherchannel switch, it's going to be difficult, if not impossible,
> >> >>> > to modify bond_should_deliver_exact_match, because there are no inactive
> >> >>> > slaves.  In this mode, bonding is expecting the switch to balance
> >> >>> > incoming traffic across the ports, and not deliver looped back packets
> >> >>> > or duplicates.  There are no restrictions on what type of traffic
> >> >>> > (mcast, bcast, ucast) may arrive on any given port.
> >> >>> >
> >> >>> >        I can't think of a way to make the non-etherchannel case work
> >> >>> > for balance-xor (or balance-rr) without breaking the DAD functionality
> >> >>> > in the case of an actual duplicate.  I'm not aware of a way to
> >> >>> > distinguish a looped back DAD probe from an actual duplicate address
> >> >>> > probe elsewhere on the network.
> >> >>> >
> >> >>>
> >> >>> Hi Neil & Jay,
> >> >>>
> >> >>> Thanks a lot for the comments.
> >> >>>
> >> >>> The use case is to add IPv6 address on the bonding interface first,
> >> >>> and then set up port channel on switch. We'll hit this issue and the
> >> >>> new address will stay tentative and unusable after port channel is set
> >> >>> up on switch. This patch is for this valid use case.
> >> >>>
> >> >>> Except failover mode, all slaves are active on receiving packets, so
> >> >>> we are receiving such looped back DAD and the bonding driver cannot
> >> >>> ignore them. I cannot think of a way to distinguish if a DAD is looped
> >> >>> back or from someone else having the same mac address. They look the
> >> >>> same to the host. If there is another machine having the same mac
> >> >>> address, this code path gets executed if both are doing DAD at the
> >> >>> same time for the same IPv6 address. Maybe we should find out what the
> >> >>> specification defines for this case?
> >> >>>
> >> >>
> >> >>RFC4862 has a discussion about this issue:
> >> >>http://tools.ietf.org/html/rfc4862#appendix-A
> >> >>The better solution could be to record the number of DAD sent out. If
> >> >>we received more DAD packets than we sent out, there is someone else
> >> >>on the network who has the same mac address and sent DAD for the same
> >> >>IPv6 address. However, this solution doesn't work with bonding
> >> >>interface, since all other active slaves but the one sending out DAD
> >> >>will receive packet looped back. It doesn't seem there is a simple
> >> >>solution for this issue.
> >> >
> >> >        Why are you setting up the port channel after configuring the
> >> > bond?
> >> >
> >> >        As a possible workaround, if you have control over the setup
> >> > process (perhaps it's some sort of manual process), adding one slave to
> >> > the bond, leaving the other soon-to-be slaves down, then setting up the
> >> > switch, and finally adding the remaining slaves should work around the
> >> > issue, since if the bond has only one slave it won't see any looped
> >> > packets.
> >> >
> >> >        Or you could bring the bond up as active-backup, then change the
> >> > mode to balance-xor once the switch is configured.
> >> >
> >> >        Ultimately, though, the problem stems from the settings mismatch
> >> > between the switch and the bonding system; balance-xor is meant to
> >> > interoperate with etherchannel, and when the switch is not configured
> >> > properly, correct behavior is difficult to guarantee.
> >> >
> >>
> >> Jay,
> >>
> >> Thanks a lot for the suggestion.
> >>
> >> It's mainly about usability. We would like to provide customers with
> >> consistent IPv6 configuration procedures as IPv4.  Such workarounds
> >> could be confusing and generate customer calls.
> >>
> > It's not a workaround, it's the way it has to be done.  You can't just drop DAD
> > packets because you can't tell the difference between those that are looped back
> > and those that are legitimately from other hosts, so you need to do something
> > like what Jay is suggesting.
> >
> 
> Yes, you are right. For this case, we cannot tell the difference
> between those DADs  looped back and DADs for the same IPv6 addresses,
> which are from other machines having the same mac address on the
> network.
> 
> If this happens, the network is already broken due to duplicate mac
> addresses, right? I know this is not the excuse to drop DAD from

Wrong. If you assume that the other systems on the network all implement DAD
correctly, then the network is working properly until your system joins it.
At that moment, your system is the one with the duplicate mac and ipv6
address, and DAD is there to bring the network back into a working state.

> others having the same mac address, since those DADs are legitimate
> and indicate duplicate IPv6 address in the network. Just out of
> curiosity, is there any case that such network can still function well
> with multiple machines having the same mac address besides link
> aggregation? Maybe we could drop those DADs since the network is
> already in trouble?
> 
Well, if you're just considering end systems (i.e. not routers), then having two
systems with duplicate macs and/or ipv6 addresses isn't catastrophic to the
network as a whole.  Those two systems will just fight for ownership of the ipv6
address in the switch arp tables and will both only be intermittently reachable.
The purpose of DAD is to ensure that both parties become aware of this
condition.  The other systems on the network will function as they normally do.

Neil

> Thanks.
> 
> Yinglin
> 

Patch

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 9da6e02..c82f4c7 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -809,9 +809,10 @@  static void ndisc_recv_ns(struct sk_buff *skb)
 
 		if (ifp->flags & (IFA_F_TENTATIVE|IFA_F_OPTIMISTIC)) {
 			if (dad) {
+				const unsigned char *sadr;
+				sadr = skb_mac_header(skb);
+
 				if (dev->type == ARPHRD_IEEE802_TR) {
-					const unsigned char *sadr;
-					sadr = skb_mac_header(skb);
 					if (((sadr[8] ^ dev->dev_addr[0]) & 0x7f) == 0 &&
 					    sadr[9] == dev->dev_addr[1] &&
 					    sadr[10] == dev->dev_addr[2] &&
@@ -821,6 +822,16 @@  static void ndisc_recv_ns(struct sk_buff *skb)
 						/* looped-back to us */
 						goto out;
 					}
+				} else if (dev->type == ARPHRD_ETHER) {
+					if (sadr[6] == dev->dev_addr[0] &&
+					    sadr[7] == dev->dev_addr[1] &&
+					    sadr[8] == dev->dev_addr[2] &&
+					    sadr[9] == dev->dev_addr[3] &&
+					    sadr[10] == dev->dev_addr[4] &&
+					    sadr[11] == dev->dev_addr[5]) {
+						/* looped-back to us */
+						goto out;
+					}
 				}
 
 				/*