diff mbox

[net-2.6] bonding: drop frames received with master's source MAC

Message ID 1298668408-14849-1-git-send-email-andy@greyhouse.net
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Andy Gospodarek Feb. 25, 2011, 9:13 p.m. UTC
I was looking at my system and wondering why I sometimes saw these
DAD messages in my logs:

bond0: IPv6 duplicate address fe80::21b:21ff:fe38:2ec4 detected!

I traced it back and realized the IPv6 Neighbor Solicitations I was
sending were also coming back into the stack on the slave(s) that did
not transmit the frames.  I could not think of a compelling reason to
notify the user that a NS we sent came back, so I set out to just drop
the frame silently in ndisc_recv_ns drop.

That seemed to work well, but when I thought about it I could not
compelling reason to save any of these frames.  Dropping them as soon as
we get them seems like a much better idea as it fixes other issues that
may exist for more than just IPv6 DAD.

I chose to check the incoming frame against the master's MAC address as
that should be the MAC address used anytime a broadcast frame is sent by
the bonding driver that had the chance to make its way back into one of
the other devices.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Cc: David Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jiri Pirko <jpirko@redhat.com>

---

I realize this patch comes right in the middle of Jiri Pirko's attempts
to move this functionality to the bonding driver, but I think this may
be important enough to include now (possibly in 2.6.38 and to -stable)
if others agree.

---
 net/core/dev.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

Comments

Nicolas de Pesloüan Feb. 25, 2011, 10:04 p.m. UTC | #1
Le 25/02/2011 22:13, Andy Gospodarek a écrit :
> I was looking at my system and wondering why I sometimes saw these
> DAD messages in my logs:
>
> bond0: IPv6 duplicate address fe80::21b:21ff:fe38:2ec4 detected!
>
> I traced it back and realized the IPv6 Neighbor Solicitations I was
> sending were also coming back into the stack on the slave(s) that did
> not transmit the frames.  I could not think of a compelling reason to
> notify the user that a NS we sent came back, so I set out to just drop
> the frame silently in ndisc_recv_ns drop.
>
> That seemed to work well, but when I thought about it I could not
> compelling reason to save any of these frames.  Dropping them as soon as
> we get them seems like a much better idea as it fixes other issues that
> may exist for more than just IPv6 DAD.
>
> I chose to check the incoming frame against the master's MAC address as
> that should be the MAC address used anytime a broadcast frame is sent by
> the bonding driver that had the chance to make its way back into one of
> the other devices.

I think this could break the ARP monitoring. ARP monitoring rely on a normal protocol handler, 
registered in bond_main.c.

void bond_register_arp(struct bonding *bond)
{
         struct packet_type *pt = &bond->arp_mon_pt;

         if (pt->type)
                 return;

         pt->type = htons(ETH_P_ARP);
         pt->dev = bond->dev;
         pt->func = bond_arp_rcv;
         dev_add_pack(pt);
}

For as far as I understand, some variants of arp_validate require the backup interfaces to receive 
ARP requests sent from the master, through the active interface, presumably with the master MAC as 
the source MAC.

As this protocol handler is registered at the master level, the exact match logic in 
__netif_receive_skb(), which apply at the slave level, shouldn't deliver this skb to bond_arp_rcv().

Can someone confirm ? Jay ?

	Nicolas.

> Signed-off-by: Andy Gospodarek<andy@greyhouse.net>
> Cc: David Miller<davem@davemloft.net>
> Cc: Herbert Xu<herbert@gondor.apana.org.au>
> Cc: Jay Vosburgh<fubar@us.ibm.com>
> Cc: Jiri Pirko<jpirko@redhat.com>
>
> ---
>
> I realize this patch comes right in the middle of Jiri Pirko's attempts
> to move this functionality to the bonding driver, but I think this may
> be important enough to include now (possibly in 2.6.38 and to -stable)
> if others agree.
>
> ---
>   net/core/dev.c |    5 +++++
>   1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8ae6631..4a76ccd 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2971,6 +2971,11 @@ static inline void skb_bond_set_mac_by_master(struct sk_buff *skb,
>   int __skb_bond_should_drop(struct sk_buff *skb, struct net_device *master)
>   {
>   	struct net_device *dev = skb->dev;
> +	struct ethhdr *eth = eth_hdr(skb);
> +
> +	/* Drop all frames with the bond master's source address. */
> +	if (unlikely(!compare_ether_addr(eth->h_source, master->dev_addr)))
> +		return 1;
>
>   	if (master->priv_flags&  IFF_MASTER_ARPMON)
>   		dev->last_rx = jiffies;

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Gospodarek Feb. 25, 2011, 10:24 p.m. UTC | #2
On Fri, Feb 25, 2011 at 11:04:27PM +0100, Nicolas de Pesloüan wrote:
> Le 25/02/2011 22:13, Andy Gospodarek a écrit :
>> I was looking at my system and wondering why I sometimes saw these
>> DAD messages in my logs:
>>
>> bond0: IPv6 duplicate address fe80::21b:21ff:fe38:2ec4 detected!
>>
>> I traced it back and realized the IPv6 Neighbor Solicitations I was
>> sending were also coming back into the stack on the slave(s) that did
>> not transmit the frames.  I could not think of a compelling reason to
>> notify the user that a NS we sent came back, so I set out to just drop
>> the frame silently in ndisc_recv_ns drop.
>>
>> That seemed to work well, but when I thought about it I could not
>> compelling reason to save any of these frames.  Dropping them as soon as
>> we get them seems like a much better idea as it fixes other issues that
>> may exist for more than just IPv6 DAD.
>>
>> I chose to check the incoming frame against the master's MAC address as
>> that should be the MAC address used anytime a broadcast frame is sent by
>> the bonding driver that had the chance to make its way back into one of
>> the other devices.
>
> I think this could break the ARP monitoring. ARP monitoring rely on a 
> normal protocol handler, registered in bond_main.c.
>
> void bond_register_arp(struct bonding *bond)
> {
>         struct packet_type *pt = &bond->arp_mon_pt;
>
>         if (pt->type)
>                 return;
>
>         pt->type = htons(ETH_P_ARP);
>         pt->dev = bond->dev;
>         pt->func = bond_arp_rcv;
>         dev_add_pack(pt);
> }
>
> For as far as I understand, some variants of arp_validate require the 
> backup interfaces to receive ARP requests sent from the master, through 
> the active interface, presumably with the master MAC as the source MAC.
>
> As this protocol handler is registered at the master level, the exact 
> match logic in __netif_receive_skb(), which apply at the slave level, 
> shouldn't deliver this skb to bond_arp_rcv().
>
> Can someone confirm ? Jay ?
>
> 	Nicolas.
>

I confirmed your suspicion, this breaks ARP monitoring.  I would still
welcome other opinions though as I think it would be nice to fix this as
low as possible.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jay Vosburgh Feb. 25, 2011, 10:28 p.m. UTC | #3
Nicolas de Pesloüan 	<nicolas.2p.debian@gmail.com> wrote:

>Le 25/02/2011 22:13, Andy Gospodarek a écrit :
>> I was looking at my system and wondering why I sometimes saw these
>> DAD messages in my logs:
>>
>> bond0: IPv6 duplicate address fe80::21b:21ff:fe38:2ec4 detected!
>>
>> I traced it back and realized the IPv6 Neighbor Solicitations I was
>> sending were also coming back into the stack on the slave(s) that did
>> not transmit the frames.  I could not think of a compelling reason to
>> notify the user that a NS we sent came back, so I set out to just drop
>> the frame silently in ndisc_recv_ns drop.
>>
>> That seemed to work well, but when I thought about it I could not
>> compelling reason to save any of these frames.  Dropping them as soon as
>> we get them seems like a much better idea as it fixes other issues that
>> may exist for more than just IPv6 DAD.
>>
>> I chose to check the incoming frame against the master's MAC address as
>> that should be the MAC address used anytime a broadcast frame is sent by
>> the bonding driver that had the chance to make its way back into one of
>> the other devices.
>
>I think this could break the ARP monitoring. ARP monitoring rely on a
>normal protocol handler, registered in bond_main.c.
>
>void bond_register_arp(struct bonding *bond)
>{
>        struct packet_type *pt = &bond->arp_mon_pt;
>
>        if (pt->type)
>                return;
>
>        pt->type = htons(ETH_P_ARP);
>        pt->dev = bond->dev;
>        pt->func = bond_arp_rcv;
>        dev_add_pack(pt);
>}
>
>For as far as I understand, some variants of arp_validate require the
>backup interfaces to receive ARP requests sent from the master, through
>the active interface, presumably with the master MAC as the source MAC.
>
>As this protocol handler is registered at the master level, the exact
>match logic in __netif_receive_skb(), which apply at the slave level,
>shouldn't deliver this skb to bond_arp_rcv().
>
>Can someone confirm ? Jay ?

	Yes, this is how the ARP monitor works for inactive slaves in
active-backup mode.  It expects to see the broadcast ARP requests loop
back around to the inactive slaves.  If arp_validate is on, the ARP
frames will be inspected to insure that it was sent by the appropriate
master.

	Still, though, if the NS packets are coming in on an inactive
slave, why aren't they already being dropped?  Even in alb mode, there
is a loose concept of "active" and "inactive" in the sense that only one
slave is used for things like broadcast or multicast.

	Andy, what configuration are you seeing this problem in?

	-J

>	Nicolas.
>
>> Signed-off-by: Andy Gospodarek<andy@greyhouse.net>
>> Cc: David Miller<davem@davemloft.net>
>> Cc: Herbert Xu<herbert@gondor.apana.org.au>
>> Cc: Jay Vosburgh<fubar@us.ibm.com>
>> Cc: Jiri Pirko<jpirko@redhat.com>
>>
>> ---
>>
>> I realize this patch comes right in the middle of Jiri Pirko's attempts
>> to move this functionality to the bonding driver, but I think this may
>> be important enough to include now (possibly in 2.6.38 and to -stable)
>> if others agree.
>>
>> ---
>>   net/core/dev.c |    5 +++++
>>   1 files changed, 5 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 8ae6631..4a76ccd 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -2971,6 +2971,11 @@ static inline void skb_bond_set_mac_by_master(struct sk_buff *skb,
>>   int __skb_bond_should_drop(struct sk_buff *skb, struct net_device *master)
>>   {
>>   	struct net_device *dev = skb->dev;
>> +	struct ethhdr *eth = eth_hdr(skb);
>> +
>> +	/* Drop all frames with the bond master's source address. */
>> +	if (unlikely(!compare_ether_addr(eth->h_source, master->dev_addr)))
>> +		return 1;
>>
>>   	if (master->priv_flags&  IFF_MASTER_ARPMON)
>>   		dev->last_rx = jiffies;

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas de Pesloüan Feb. 25, 2011, 11:08 p.m. UTC | #4
Le 25/02/2011 23:24, Andy Gospodarek a écrit :
> On Fri, Feb 25, 2011 at 11:04:27PM +0100, Nicolas de Pesloüan wrote:
>> Le 25/02/2011 22:13, Andy Gospodarek a écrit :
>>> I was looking at my system and wondering why I sometimes saw these
>>> DAD messages in my logs:
>>>
>>> bond0: IPv6 duplicate address fe80::21b:21ff:fe38:2ec4 detected!
>>>
>>> I traced it back and realized the IPv6 Neighbor Solicitations I was
>>> sending were also coming back into the stack on the slave(s) that did
>>> not transmit the frames.  I could not think of a compelling reason to
>>> notify the user that a NS we sent came back, so I set out to just drop
>>> the frame silently in ndisc_recv_ns drop.
>>>
>>> That seemed to work well, but when I thought about it I could not
>>> compelling reason to save any of these frames.  Dropping them as soon as
>>> we get them seems like a much better idea as it fixes other issues that
>>> may exist for more than just IPv6 DAD.
>>>
>>> I chose to check the incoming frame against the master's MAC address as
>>> that should be the MAC address used anytime a broadcast frame is sent by
>>> the bonding driver that had the chance to make its way back into one of
>>> the other devices.
>>
>> I think this could break the ARP monitoring. ARP monitoring rely on a
>> normal protocol handler, registered in bond_main.c.
>>
>> void bond_register_arp(struct bonding *bond)
>> {
>>          struct packet_type *pt =&bond->arp_mon_pt;
>>
>>          if (pt->type)
>>                  return;
>>
>>          pt->type = htons(ETH_P_ARP);
>>          pt->dev = bond->dev;
>>          pt->func = bond_arp_rcv;
>>          dev_add_pack(pt);
>> }
>>
>> For as far as I understand, some variants of arp_validate require the
>> backup interfaces to receive ARP requests sent from the master, through
>> the active interface, presumably with the master MAC as the source MAC.
>>
>> As this protocol handler is registered at the master level, the exact
>> match logic in __netif_receive_skb(), which apply at the slave level,
>> shouldn't deliver this skb to bond_arp_rcv().
>>
>> Can someone confirm ? Jay ?
>>
>> 	Nicolas.
>>
>
> I confirmed your suspicion, this breaks ARP monitoring.  I would still
> welcome other opinions though as I think it would be nice to fix this as
> low as possible.

Why do you want to fix it earlier that in ndisc_recv_ns drop? Your original idea of silently 
dropping the frame there seems perfect to me.

	Nicolas.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Gospodarek Feb. 28, 2011, 4:32 p.m. UTC | #5
On Sat, Feb 26, 2011 at 12:08:03AM +0100, Nicolas de Pesloüan wrote:
> Le 25/02/2011 23:24, Andy Gospodarek a écrit :
[...]
>>
>> I confirmed your suspicion, this breaks ARP monitoring.  I would still
>> welcome other opinions though as I think it would be nice to fix this as
>> low as possible.
>
> Why do you want to fix it earlier that in ndisc_recv_ns drop? Your 
> original idea of silently dropping the frame there seems perfect to me.
>

Maybe it's just me, but I cannot understand why we want a bunch of extra
packets floating up into the stack when they may only create issues for
the recipients of these duplicate frames.

Clearly my original patch needs to be refined so ARP monitoring still
works, but I would rather fix the issue there than in a higher layer.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas de Pesloüan Feb. 28, 2011, 9:45 p.m. UTC | #6
Le 28/02/2011 17:32, Andy Gospodarek a écrit :
> On Sat, Feb 26, 2011 at 12:08:03AM +0100, Nicolas de Pesloüan wrote:
>> Le 25/02/2011 23:24, Andy Gospodarek a écrit :
> [...]
>>>
>>> I confirmed your suspicion, this breaks ARP monitoring.  I would still
>>> welcome other opinions though as I think it would be nice to fix this as
>>> low as possible.
>>
>> Why do you want to fix it earlier that in ndisc_recv_ns drop? Your
>> original idea of silently dropping the frame there seems perfect to me.
>>
>
> Maybe it's just me, but I cannot understand why we want a bunch of extra
> packets floating up into the stack when they may only create issues for
> the recipients of these duplicate frames.
>
> Clearly my original patch needs to be refined so ARP monitoring still
> works, but I would rather fix the issue there than in a higher layer.

Jay explained that the current implementation should already trap those frames, on inactive slaves, 
in modes where inactive slaves exist. I agree with him.

What mode are you seeing this problem in? If the current "should drop" logic is leaking, then yes, 
we should fix it. But we currently don't see where it is leaking.

	Nicolas.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Gospodarek March 1, 2011, 2:35 a.m. UTC | #7
On Mon, Feb 28, 2011 at 10:45:08PM +0100, Nicolas de Pesloüan wrote:
> Le 28/02/2011 17:32, Andy Gospodarek a écrit :
>> On Sat, Feb 26, 2011 at 12:08:03AM +0100, Nicolas de Pesloüan wrote:
>>> Le 25/02/2011 23:24, Andy Gospodarek a écrit :
>> [...]
>>>>
>>>> I confirmed your suspicion, this breaks ARP monitoring.  I would still
>>>> welcome other opinions though as I think it would be nice to fix this as
>>>> low as possible.
>>>
>>> Why do you want to fix it earlier that in ndisc_recv_ns drop? Your
>>> original idea of silently dropping the frame there seems perfect to me.
>>>
>>
>> Maybe it's just me, but I cannot understand why we want a bunch of extra
>> packets floating up into the stack when they may only create issues for
>> the recipients of these duplicate frames.
>>
>> Clearly my original patch needs to be refined so ARP monitoring still
>> works, but I would rather fix the issue there than in a higher layer.
>
> Jay explained that the current implementation should already trap those 
> frames, on inactive slaves, in modes where inactive slaves exist. I agree 
> with him.
>
> What mode are you seeing this problem in? If the current "should drop" 
> logic is leaking, then yes, we should fix it. But we currently don't see 
> where it is leaking.
>

Use round-robin.  To reproduce just take the interface down and bring it
back up.  You should see the messages right away.

I've been doing some more testing on a new patch and should have
something ready tomorrow.  My latest patch actually replaces the final
'return 0' with a call to a function that will return true if the sender
was the device that received it.  This will hopefully prevent some of
the failures with the first patch I posted.  I'll know a bit more
tomorrow if this approach seems reasonable.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jay Vosburgh March 1, 2011, 5:46 a.m. UTC | #8
Andy Gospodarek <andy@greyhouse.net> wrote:

>On Mon, Feb 28, 2011 at 10:45:08PM +0100, Nicolas de Pesloüan wrote:
>> Le 28/02/2011 17:32, Andy Gospodarek a écrit :
>>> On Sat, Feb 26, 2011 at 12:08:03AM +0100, Nicolas de Pesloüan wrote:
>>>> Le 25/02/2011 23:24, Andy Gospodarek a écrit :
>>> [...]
>>>>>
>>>>> I confirmed your suspicion, this breaks ARP monitoring.  I would still
>>>>> welcome other opinions though as I think it would be nice to fix this as
>>>>> low as possible.
>>>>
>>>> Why do you want to fix it earlier that in ndisc_recv_ns drop? Your
>>>> original idea of silently dropping the frame there seems perfect to me.
>>>>
>>>
>>> Maybe it's just me, but I cannot understand why we want a bunch of extra
>>> packets floating up into the stack when they may only create issues for
>>> the recipients of these duplicate frames.
>>>
>>> Clearly my original patch needs to be refined so ARP monitoring still
>>> works, but I would rather fix the issue there than in a higher layer.
>>
>> Jay explained that the current implementation should already trap those 
>> frames, on inactive slaves, in modes where inactive slaves exist. I agree 
>> with him.
>>
>> What mode are you seeing this problem in? If the current "should drop" 
>> logic is leaking, then yes, we should fix it. But we currently don't see 
>> where it is leaking.
>>
>
>Use round-robin.  To reproduce just take the interface down and bring it
>back up.  You should see the messages right away.

	What is the bond connected to?  The -rr and -xor modes are meant
to interoperate with switch ports configured for Etherchannel (or an
equivalent).  In that case, I wouldn't expect the switch to turn the
broadcasts / multicasts around and send them back out to a member of the
channel group they originated from.

	If the switch isn't configured properly, then I'd expect it to
complain about MAC flapping or the like.  Unless it's an unmanaged
switch that doesn't do Etherchannel (etc), or you're setting up an
unusual topology.

>I've been doing some more testing on a new patch and should have
>something ready tomorrow.  My latest patch actually replaces the final
>'return 0' with a call to a function that will return true if the sender
>was the device that received it.  This will hopefully prevent some of
>the failures with the first patch I posted.  I'll know a bit more
>tomorrow if this approach seems reasonable.

	How do you figure that out?  In -rr mode, all of the slaves
should have the same MAC address, and the slaves shouldn't be running
IPv6 addrconf separately from the master anyway.  Something magic just
for NA/NS packets?

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andy Gospodarek March 1, 2011, 6:16 p.m. UTC | #9
On Mon, Feb 28, 2011 at 09:46:06PM -0800, Jay Vosburgh wrote:
> Andy Gospodarek <andy@greyhouse.net> wrote:
> 
> >On Mon, Feb 28, 2011 at 10:45:08PM +0100, Nicolas de Pesloüan wrote:
> >> Le 28/02/2011 17:32, Andy Gospodarek a écrit :
> >>> On Sat, Feb 26, 2011 at 12:08:03AM +0100, Nicolas de Pesloüan wrote:
> >>>> Le 25/02/2011 23:24, Andy Gospodarek a écrit :
> >>> [...]
> >>>>>
> >>>>> I confirmed your suspicion, this breaks ARP monitoring.  I would still
> >>>>> welcome other opinions though as I think it would be nice to fix this as
> >>>>> low as possible.
> >>>>
> >>>> Why do you want to fix it earlier that in ndisc_recv_ns drop? Your
> >>>> original idea of silently dropping the frame there seems perfect to me.
> >>>>
> >>>
> >>> Maybe it's just me, but I cannot understand why we want a bunch of extra
> >>> packets floating up into the stack when they may only create issues for
> >>> the recipients of these duplicate frames.
> >>>
> >>> Clearly my original patch needs to be refined so ARP monitoring still
> >>> works, but I would rather fix the issue there than in a higher layer.
> >>
> >> Jay explained that the current implementation should already trap those 
> >> frames, on inactive slaves, in modes where inactive slaves exist. I agree 
> >> with him.
> >>
> >> What mode are you seeing this problem in? If the current "should drop" 
> >> logic is leaking, then yes, we should fix it. But we currently don't see 
> >> where it is leaking.
> >>
> >
> >Use round-robin.  To reproduce just take the interface down and bring it
> >back up.  You should see the messages right away.
> 
> 	What is the bond connected to?  The -rr and -xor modes are meant
> to interoperate with switch ports configured for Etherchannel (or an
> equivalent).  In that case, I wouldn't expect the switch to turn the
> broadcasts / multicasts around and send them back out to a member of the
> channel group they originated from.
> 
> 	If the switch isn't configured properly, then I'd expect it to
> complain about MAC flapping or the like.  Unless it's an unmanaged
> switch that doesn't do Etherchannel (etc), or you're setting up an
> unusual topology.
> 

My system is connected to an unmanaged switch, but I guess I don't
generally consider that to be an unual topology.  While balance-rr does
work well when the switch is configured for etherchannel, it can also
work just fine without. 

> >I've been doing some more testing on a new patch and should have
> >something ready tomorrow.  My latest patch actually replaces the final
> >'return 0' with a call to a function that will return true if the sender
> >was the device that received it.  This will hopefully prevent some of
> >the failures with the first patch I posted.  I'll know a bit more
> >tomorrow if this approach seems reasonable.
> 
> 	How do you figure that out?  In -rr mode, all of the slaves
> should have the same MAC address, and the slaves shouldn't be running
> IPv6 addrconf separately from the master anyway.  Something magic just
> for NA/NS packets?
> 

Knowing that I'm using an unmanaged switch with balance-rr probably
helps understand how this is happening.  I'll clarify this however, so
we are all on the same page.

In my situation, eth2 and eth3 are in bond0.  When bond0 transmits the
NS, let's say it goes out eth3.  Since it is a multicast frame my switch
will broadcast this to all ports and eth2 will receive the frame with
the source MAC address being the same as bond0's MAC address.  This
frame is passed up the stack to the ipv6 layer and appears to be a
response to the NS from another host and is dropped.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas de Pesloüan March 1, 2011, 9:30 p.m. UTC | #10
Le 01/03/2011 19:16, Andy Gospodarek a écrit :

[snip]

> Knowing that I'm using an unmanaged switch with balance-rr probably
> helps understand how this is happening.  I'll clarify this however, so
> we are all on the same page.
>
> In my situation, eth2 and eth3 are in bond0.  When bond0 transmits the
> NS, let's say it goes out eth3.  Since it is a multicast frame my switch
> will broadcast this to all ports and eth2 will receive the frame with
> the source MAC address being the same as bond0's MAC address.  This
> frame is passed up the stack to the ipv6 layer and appears to be a
> response to the NS from another host and is dropped.

'sounds perfectly normal.

This problem is described in detail in chapter 5.4.3 and appendix A of RFC4862 "IPv6 Stateless 
Address Autoconfiguration".

As this is clearly IPv6 related, it sounds normal from my point of view to fix it at the 
ndisc_recv_ns() level.

Quoting the RFC:

   "In those cases where the hardware cannot suppress loopbacks, however,
    one possible software heuristic to filter out unwanted loopbacks is
    to discard any received packet whose link-layer source address is the
    same as the receiving interface's.  There is even a link-layer
    specification that requires that any such packets be discarded
    [IEEE802.11].  Unfortunately, use of that criteria also results in
    the discarding of all packets sent by another node using the same
    link-layer address.  Duplicate Address Detection will fail on
    interfaces that filter received packets in this manner:

    [snip]

    Thus, to perform Duplicate Address Detection correctly in the case
    where two interfaces are using the same link-layer address, an
    implementation must have a good understanding of the interface's
    multicast loopback semantics, and the interface cannot discard
    received packets simply because the source link-layer address is the
    same as the interface's."

So, simply dropping frames whose source MAC == local MAC is apparently not the right solution.

	Nicolas.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jay Vosburgh March 1, 2011, 10:25 p.m. UTC | #11
Nicolas de Pesloüan 	<nicolas.2p.debian@gmail.com> wrote:

>Le 01/03/2011 19:16, Andy Gospodarek a écrit :
>
>[snip]
>
>> Knowing that I'm using an unmanaged switch with balance-rr probably
>> helps understand how this is happening.  I'll clarify this however, so
>> we are all on the same page.
>>
>> In my situation, eth2 and eth3 are in bond0.  When bond0 transmits the
>> NS, let's say it goes out eth3.  Since it is a multicast frame my switch
>> will broadcast this to all ports and eth2 will receive the frame with
>> the source MAC address being the same as bond0's MAC address.  This
>> frame is passed up the stack to the ipv6 layer and appears to be a
>> response to the NS from another host and is dropped.
>
>'sounds perfectly normal.
>
>This problem is described in detail in chapter 5.4.3 and appendix A of
>RFC4862 "IPv6 Stateless Address Autoconfiguration".
>
>As this is clearly IPv6 related, it sounds normal from my point of view to
>fix it at the ndisc_recv_ns() level.

	Andy's immediate problem is IPv6 related, but the issue itself
is generic: how to deal with broadcast / multicasts arriving at a -rr or
-xor bond, because we do not and cannot know if the switch is going to
flood to the slaves or not.  There may be other instances wherein that
bonus copy of some packet confuses things.

	My view is that -rr and -xor are intended to interoperate with
Etherchannel.  Yes, they will often work tolerably well when connected
to a non-Etherchannel switch.  But, if the host and the switch are not
in agreement on the link aggregation status of the ports, some level of
misbehavior is expected.  If that misbehavior can be corrected without
adversely affecting a properly configured host and switch, then I don't
see much problem with fixing it.

	For the IPv6 case here, I think there's a problem with any fix,
and that is that there's no way for bonding to know if the switch ports
are configured properly or not.  I'm using "properly" to mean that the
switch ports corresponding to the bonding slaves are configured into an
Etherchannel-type channel group.

	If the switch ports are grouped, then if IPv6 sees one of these
messages coming in, it's actually a duplicate detection.  This because
the switch won't loop the broadcast / multicast back around to a member
of the channel group.

	If the switch ports are not grouped, then the switch will
happily send broadcasts and multicasts to all ports of the bond, because
it doesn't know about the aggregation.  In this case, I suspect there's
no way to reliably determine if the incoming packet is a switch artifact
or an actual duplicate detection.  Anybody know for sure if this is the
case?

	For the generic case, I'm not seeing a way to distinguish actual
repeated packets from switch artifact duplicate packets without adding
another knob to bonding to tell it if the switch does etherchannel or
not (which I'm not in favor of doing).

>Quoting the RFC:
>
>  "In those cases where the hardware cannot suppress loopbacks, however,
>   one possible software heuristic to filter out unwanted loopbacks is
>   to discard any received packet whose link-layer source address is the
>   same as the receiving interface's.  There is even a link-layer
>   specification that requires that any such packets be discarded
>   [IEEE802.11].  Unfortunately, use of that criteria also results in
>   the discarding of all packets sent by another node using the same
>   link-layer address.  Duplicate Address Detection will fail on
>   interfaces that filter received packets in this manner:
>
>   [snip]
>
>   Thus, to perform Duplicate Address Detection correctly in the case
>   where two interfaces are using the same link-layer address, an
>   implementation must have a good understanding of the interface's
>   multicast loopback semantics, and the interface cannot discard
>   received packets simply because the source link-layer address is the
>   same as the interface's."
>
>So, simply dropping frames whose source MAC == local MAC is apparently not the right solution.

	I tend to agree here, because this would break DAD for properly
configured (meaning etherchannel on the switch ports) installations.

	Is there a way to fix bonding and/or ndisc_recv_ns to work
correctly for both cases (have/don't have etherchannel on the switch)?

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas de Pesloüan March 1, 2011, 11:08 p.m. UTC | #12
Le 01/03/2011 23:25, Jay Vosburgh a écrit :
> Nicolas de Pesloüan 	<nicolas.2p.debian@gmail.com>  wrote:
>
>> Le 01/03/2011 19:16, Andy Gospodarek a écrit :
>>
>> [snip]
>>
>>> Knowing that I'm using an unmanaged switch with balance-rr probably
>>> helps understand how this is happening.  I'll clarify this however, so
>>> we are all on the same page.
>>>
>>> In my situation, eth2 and eth3 are in bond0.  When bond0 transmits the
>>> NS, let's say it goes out eth3.  Since it is a multicast frame my switch
>>> will broadcast this to all ports and eth2 will receive the frame with
>>> the source MAC address being the same as bond0's MAC address.  This
>>> frame is passed up the stack to the ipv6 layer and appears to be a
>>> response to the NS from another host and is dropped.
>>
>> 'sounds perfectly normal.
>>
>> This problem is described in detail in chapter 5.4.3 and appendix A of
>> RFC4862 "IPv6 Stateless Address Autoconfiguration".
>>
>> As this is clearly IPv6 related, it sounds normal from my point of view to
>> fix it at the ndisc_recv_ns() level.
>
> 	Andy's immediate problem is IPv6 related, but the issue itself
> is generic: how to deal with broadcast / multicasts arriving at a -rr or
> -xor bond, because we do not and cannot know if the switch is going to
> flood to the slaves or not.  There may be other instances wherein that
> bonus copy of some packet confuses things.

Agreed, even if the only known instances that currently expose the problem is IPv6.

Anyway, let's try and fix it at the bonding level...

> 	My view is that -rr and -xor are intended to interoperate with
> Etherchannel.  Yes, they will often work tolerably well when connected
> to a non-Etherchannel switch.  But, if the host and the switch are not
> in agreement on the link aggregation status of the ports, some level of
> misbehavior is expected.  If that misbehavior can be corrected without
> adversely affecting a properly configured host and switch, then I don't
> see much problem with fixing it.
>
> 	For the IPv6 case here, I think there's a problem with any fix,
> and that is that there's no way for bonding to know if the switch ports
> are configured properly or not.  I'm using "properly" to mean that the
> switch ports corresponding to the bonding slaves are configured into an
> Etherchannel-type channel group.
>
> 	If the switch ports are grouped, then if IPv6 sees one of these
> messages coming in, it's actually a duplicate detection.  This because
> the switch won't loop the broadcast / multicast back around to a member
> of the channel group.
>
> 	If the switch ports are not grouped, then the switch will
> happily send broadcasts and multicasts to all ports of the bond, because
> it doesn't know about the aggregation.  In this case, I suspect there's
> no way to reliably determine if the incoming packet is a switch artifact
> or an actual duplicate detection.  Anybody know for sure if this is the
> case?
>
> 	For the generic case, I'm not seeing a way to distinguish actual
> repeated packets from switch artifact duplicate packets without adding
> another knob to bonding to tell it if the switch does etherchannel or
> not (which I'm not in favor of doing).

I originally thought about such knob and agree with you that we should avoid adding one more...

>> Quoting the RFC:
>>
>>   "In those cases where the hardware cannot suppress loopbacks, however,
>>    one possible software heuristic to filter out unwanted loopbacks is
>>    to discard any received packet whose link-layer source address is the
>>    same as the receiving interface's.  There is even a link-layer
>>    specification that requires that any such packets be discarded
>>    [IEEE802.11].  Unfortunately, use of that criteria also results in
>>    the discarding of all packets sent by another node using the same
>>    link-layer address.  Duplicate Address Detection will fail on
>>    interfaces that filter received packets in this manner:
>>
>>    [snip]
>>
>>    Thus, to perform Duplicate Address Detection correctly in the case
>>    where two interfaces are using the same link-layer address, an
>>    implementation must have a good understanding of the interface's
>>    multicast loopback semantics, and the interface cannot discard
>>    received packets simply because the source link-layer address is the
>>    same as the interface's."
>>
>> So, simply dropping frames whose source MAC == local MAC is apparently not the right solution.
>
> 	I tend to agree here, because this would break DAD for properly
> configured (meaning etherchannel on the switch ports) installations.
>
> 	Is there a way to fix bonding and/or ndisc_recv_ns to work
> correctly for both cases (have/don't have etherchannel on the switch)?

Can we imagine that, at the time we change the bonding mode to -rr or -xor, we simply brodcast or 
multicast one or two frames with some random data and wait to see whether we receive the frame back? 
If we receive at least one frame with the same random data, in one of the slaves interface for this 
bonding, we know for sure the switch configuration is not "multicast loop safe". Bonding already 
send ARP requests/replies in many situations. Adding one broadcast/multicast frame at bond setup 
time is probably acceptable.

And to ensure consistent results, we need to send such broadcast/multicast every time the link goes 
up for an already enslaved slave. This is not perfect, as the switch topology may change in a way 
that won't be detected by bonding, but still cause a new multicast loop, but...

Knowing the switch configuration is not "multicast loop safe", we can, at a minimum, issue a 
warning, telling the user she should expect strange behaviors, like false duplicate address detection.

And we can probably use this information into the should-drop logic, for mode that lack "inactive" 
slaves.

	Nicolas.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu March 2, 2011, 12:30 p.m. UTC | #13
On Wed, Mar 02, 2011 at 11:10:07AM +0100, Nicolas de Pesloüan wrote:
>
> If one decide to configure two interfaces with the same MAC and connect them
> to the same LAN, then we get the exact same situation. Having eth0 and eth1
> share a single MAC and a single IP address, connected to a switch in
> Etherchannel mode is a perfectly valid setup, while suboptimal. And if the
> Etherchannel mode happens to be improperly configured, we end up with the
> same problem as reported by Andy.

Right.  There's also the case where you have other MAC addresses
sitting behind the bonding device, e.g., virtualisation.  So basing
it purely on the bonding device's MAC address is probably not worth
the trouble.

Cheers,
Nicolas de Pesloüan March 2, 2011, 8:26 p.m. UTC | #14
Le 02/03/2011 00:08, Nicolas de Pesloüan a écrit :
[snip]
> Can we imagine that, at the time we change the bonding mode to -rr or
> -xor, we simply brodcast or multicast one or two frames with some random
> data and wait to see whether we receive the frame back? If we receive at
> least one frame with the same random data, in one of the slaves
> interface for this bonding, we know for sure the switch configuration is
> not "multicast loop safe". Bonding already send ARP requests/replies in
> many situations. Adding one broadcast/multicast frame at bond setup time
> is probably acceptable.
>
> And to ensure consistent results, we need to send such
> broadcast/multicast every time the link goes up for an already enslaved
> slave. This is not perfect, as the switch topology may change in a way
> that won't be detected by bonding, but still cause a new multicast loop,
> but...
>
> Knowing the switch configuration is not "multicast loop safe", we can,
> at a minimum, issue a warning, telling the user she should expect
> strange behaviors, like false duplicate address detection.
>
> And we can probably use this information into the should-drop logic, for
> mode that lack "inactive" slaves.
>

Still thinking about it:

We should drop the frame if :

the bonding interface is in -rr or -xor mode,
and we know the switch topology in front of our slaves is not "multicast
loop safe" (see above).
and the source MAC is the MAC of the bonding interface
and the destination MAC is a multicast one.

That being said, I wonder if this is only bonding related.

If one decide to configure two interfaces with the same MAC and connect
them to the same LAN, then we get the exact same situation. Having eth0
and eth1 share a single MAC and a single IP address, connected to a
switch in Etherchannel mode is a perfectly valid setup, while
suboptimal. And if the Etherchannel mode happens to be improperly
configured, we end up with the same problem as reported by Andy.

          Nicolas.

[ Resent, because netdev ML didn't get it the first time ]
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas de Pesloüan March 2, 2011, 8:30 p.m. UTC | #15
Le 02/03/2011 13:30, Herbert Xu a écrit :
> On Wed, Mar 02, 2011 at 11:10:07AM +0100, Nicolas de Pesloüan wrote:
>>
>> If one decide to configure two interfaces with the same MAC and connect them
>> to the same LAN, then we get the exact same situation. Having eth0 and eth1
>> share a single MAC and a single IP address, connected to a switch in
>> Etherchannel mode is a perfectly valid setup, while suboptimal. And if the
>> Etherchannel mode happens to be improperly configured, we end up with the
>> same problem as reported by Andy.
>
> Right.  There's also the case where you have other MAC addresses
> sitting behind the bonding device, e.g., virtualisation.  So basing
> it purely on the bonding device's MAC address is probably not worth
> the trouble.
>
> Cheers,

I'm afraid we miss a general way to fix the general problem. We probably need to handle the problem 
at every places that can suffer from the multicast loop, until we find a general fix, if ever.

	Nicolas.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 8ae6631..4a76ccd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2971,6 +2971,11 @@  static inline void skb_bond_set_mac_by_master(struct sk_buff *skb,
 int __skb_bond_should_drop(struct sk_buff *skb, struct net_device *master)
 {
 	struct net_device *dev = skb->dev;
+	struct ethhdr *eth = eth_hdr(skb);
+
+	/* Drop all frames with the bond master's source address. */
+	if (unlikely(!compare_ether_addr(eth->h_source, master->dev_addr)))
+		return 1;
 
 	if (master->priv_flags & IFF_MASTER_ARPMON)
 		dev->last_rx = jiffies;