diff mbox

[net-next] net: allow vlan traffic to be received under bond

Message ID 20111010191641.2496.84845.stgit@jf-dev1-dcblab
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

John Fastabend Oct. 10, 2011, 7:16 p.m. UTC
The following configuration used to work as I expected. At least
we could use the fcoe interfaces to do MPIO and the bond0 iface
to do load balancing or failover.

       ---eth2.228-fcoe
       |
eth2 -----|
          |
          |---- bond0
          |
eth3 -----|
       |
       ---eth3.228-fcoe

This worked because of a change we added to allow inactive slaves
to rx 'exact' matches. This functionality was kept intact with the
rx_handler mechanism. However now the vlan interface attached to the
active slave never receives traffic because the bonding rx_handler
updates the skb->dev and goto's another_round. Previously, the
vlan_do_receive() logic was called before the bonding rx_handler.

Now by the time vlan_do_receive calls vlan_find_dev() the
skb->dev is set to bond0 and it is clear no vlan is attached
to this iface. The vlan lookup fails.

This patch moves the VLAN check above the rx_handler. A VLAN
tagged frame is now routed to the eth2.228-fcoe iface in the
above schematic. Untagged frames continue to the bond0 as
normal. This case also remains intact,

eth2 --> bond0 --> vlan.228

Here the skb is VLAN tagged but the vlan lookup fails on eth2
causing the bonding rx_handler to be called. On the second
pass the vlan lookup is on the bond0 iface and completes as
expected.

Putting a VLAN.228 on both the bond0 and eth2 device will
result in eth2.228 receiving the skb. I don't think this is
completely unexpected and was the result prior to the rx_handler
result.

Note, the same setup is also used for other storage traffic that
MPIO is used with eg. iSCSI and similar setups can be contrived
without storage protocols.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---

 net/core/dev.c |   22 +++++++++++-----------
 1 files changed, 11 insertions(+), 11 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jiri Pirko Oct. 10, 2011, 10:37 p.m. UTC | #1
Mon, Oct 10, 2011 at 09:16:41PM CEST, john.r.fastabend@intel.com wrote:
>The following configuration used to work as I expected. At least
>we could use the fcoe interfaces to do MPIO and the bond0 iface
>to do load balancing or failover.
>
>       ---eth2.228-fcoe
>       |
>eth2 -----|
>          |
>          |---- bond0
>          |
>eth3 -----|
>       |
>       ---eth3.228-fcoe
>
>This worked because of a change we added to allow inactive slaves
>to rx 'exact' matches. This functionality was kept intact with the
>rx_handler mechanism. However now the vlan interface attached to the
>active slave never receives traffic because the bonding rx_handler
>updates the skb->dev and goto's another_round. Previously, the
>vlan_do_receive() logic was called before the bonding rx_handler.
>
>Now by the time vlan_do_receive calls vlan_find_dev() the
>skb->dev is set to bond0 and it is clear no vlan is attached
>to this iface. The vlan lookup fails.
>
>This patch moves the VLAN check above the rx_handler. A VLAN
>tagged frame is now routed to the eth2.228-fcoe iface in the
>above schematic. Untagged frames continue to the bond0 as
>normal. This case also remains intact,
>
>eth2 --> bond0 --> vlan.228
>
>Here the skb is VLAN tagged but the vlan lookup fails on eth2
>causing the bonding rx_handler to be called. On the second
>pass the vlan lookup is on the bond0 iface and completes as
>expected.
>
>Putting a VLAN.228 on both the bond0 and eth2 device will
>result in eth2.228 receiving the skb. I don't think this is
>completely unexpected and was the result prior to the rx_handler
>result.
>
>Note, the same setup is also used for other storage traffic that
>MPIO is used with eg. iSCSI and similar setups can be contrived
>without storage protocols.
>
>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>---
>
> net/core/dev.c |   22 +++++++++++-----------
> 1 files changed, 11 insertions(+), 11 deletions(-)
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index 70ecb86..8b6118a 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -3231,6 +3231,17 @@ another_round:
> ncls:
> #endif
> 
>+	if (vlan_tx_tag_present(skb)) {
>+		if (pt_prev) {
>+			ret = deliver_skb(skb, pt_prev, orig_dev);
>+			pt_prev = NULL;
>+		}
>+		if (vlan_do_receive(&skb))
>+			goto another_round;
>+		else if (unlikely(!skb))
>+			goto out;
>+	}
>+
> 	rx_handler = rcu_dereference(skb->dev->rx_handler);
> 	if (rx_handler) {
> 		if (pt_prev) {
>@@ -3251,17 +3262,6 @@ ncls:
> 		}
> 	}
> 
>-	if (vlan_tx_tag_present(skb)) {
>-		if (pt_prev) {
>-			ret = deliver_skb(skb, pt_prev, orig_dev);
>-			pt_prev = NULL;
>-		}
>-		if (vlan_do_receive(&skb))
>-			goto another_round;
>-		else if (unlikely(!skb))
>-			goto out;
>-	}
>-
> 	/* deliver only exact match when indicated */
> 	null_or_dev = deliver_exact ? skb->dev : NULL;
> 
>

Hmm, I must look at this again tomorrow but I have strong feeling this
will break some some scenario including vlan-bridge-macvlan.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Fastabend Oct. 11, 2011, 2:07 a.m. UTC | #2
On 10/10/2011 3:37 PM, Jiri Pirko wrote:
> Mon, Oct 10, 2011 at 09:16:41PM CEST, john.r.fastabend@intel.com wrote:
>> The following configuration used to work as I expected. At least
>> we could use the fcoe interfaces to do MPIO and the bond0 iface
>> to do load balancing or failover.
>>
>>       ---eth2.228-fcoe
>>       |
>> eth2 -----|
>>          |
>>          |---- bond0
>>          |
>> eth3 -----|
>>       |
>>       ---eth3.228-fcoe
>>
>> This worked because of a change we added to allow inactive slaves
>> to rx 'exact' matches. This functionality was kept intact with the
>> rx_handler mechanism. However now the vlan interface attached to the
>> active slave never receives traffic because the bonding rx_handler
>> updates the skb->dev and goto's another_round. Previously, the
>> vlan_do_receive() logic was called before the bonding rx_handler.
>>
>> Now by the time vlan_do_receive calls vlan_find_dev() the
>> skb->dev is set to bond0 and it is clear no vlan is attached
>> to this iface. The vlan lookup fails.
>>
>> This patch moves the VLAN check above the rx_handler. A VLAN
>> tagged frame is now routed to the eth2.228-fcoe iface in the
>> above schematic. Untagged frames continue to the bond0 as
>> normal. This case also remains intact,
>>
>> eth2 --> bond0 --> vlan.228
>>
>> Here the skb is VLAN tagged but the vlan lookup fails on eth2
>> causing the bonding rx_handler to be called. On the second
>> pass the vlan lookup is on the bond0 iface and completes as
>> expected.
>>
>> Putting a VLAN.228 on both the bond0 and eth2 device will
>> result in eth2.228 receiving the skb. I don't think this is
>> completely unexpected and was the result prior to the rx_handler
>> result.
>>
>> Note, the same setup is also used for other storage traffic that
>> MPIO is used with eg. iSCSI and similar setups can be contrived
>> without storage protocols.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>
>> net/core/dev.c |   22 +++++++++++-----------
>> 1 files changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 70ecb86..8b6118a 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3231,6 +3231,17 @@ another_round:
>> ncls:
>> #endif
>>
>> +	if (vlan_tx_tag_present(skb)) {
>> +		if (pt_prev) {
>> +			ret = deliver_skb(skb, pt_prev, orig_dev);
>> +			pt_prev = NULL;
>> +		}
>> +		if (vlan_do_receive(&skb))
>> +			goto another_round;
>> +		else if (unlikely(!skb))
>> +			goto out;
>> +	}
>> +
>> 	rx_handler = rcu_dereference(skb->dev->rx_handler);
>> 	if (rx_handler) {
>> 		if (pt_prev) {
>> @@ -3251,17 +3262,6 @@ ncls:
>> 		}
>> 	}
>>
>> -	if (vlan_tx_tag_present(skb)) {
>> -		if (pt_prev) {
>> -			ret = deliver_skb(skb, pt_prev, orig_dev);
>> -			pt_prev = NULL;
>> -		}
>> -		if (vlan_do_receive(&skb))
>> -			goto another_round;
>> -		else if (unlikely(!skb))
>> -			goto out;
>> -	}
>> -
>> 	/* deliver only exact match when indicated */
>> 	null_or_dev = deliver_exact ? skb->dev : NULL;
>>
>>
> 
> Hmm, I must look at this again tomorrow but I have strong feeling this
> will break some some scenario including vlan-bridge-macvlan.

Yes please review... I tested cases with vlan, bridge, and macvlan
components and believe this works unless I missed something.

Maybe Jesse, can comment though on why this commit that moved (and
cleaned up) the vlan tag handling put the vlan_do_receive below the
rx_handler rather than above it. Was this intended to fix something?

commit 3701e51382a026cba10c60b03efabe534fba4ca4
Author: Jesse Gross <jesse@nicira.com>
Date:   Wed Oct 20 13:56:06 2010 +0000

    vlan: Centralize handling of hardware acceleration.


Thanks,
John.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Gross Oct. 11, 2011, 2:43 a.m. UTC | #3
On Mon, Oct 10, 2011 at 7:07 PM, John Fastabend
<john.r.fastabend@intel.com> wrote:
> On 10/10/2011 3:37 PM, Jiri Pirko wrote:
>> Mon, Oct 10, 2011 at 09:16:41PM CEST, john.r.fastabend@intel.com wrote:
>>> The following configuration used to work as I expected. At least
>>> we could use the fcoe interfaces to do MPIO and the bond0 iface
>>> to do load balancing or failover.
>>>
>>>       ---eth2.228-fcoe
>>>       |
>>> eth2 -----|
>>>          |
>>>          |---- bond0
>>>          |
>>> eth3 -----|
>>>       |
>>>       ---eth3.228-fcoe
>>>
>>> This worked because of a change we added to allow inactive slaves
>>> to rx 'exact' matches. This functionality was kept intact with the
>>> rx_handler mechanism. However now the vlan interface attached to the
>>> active slave never receives traffic because the bonding rx_handler
>>> updates the skb->dev and goto's another_round. Previously, the
>>> vlan_do_receive() logic was called before the bonding rx_handler.
>>>
>>> Now by the time vlan_do_receive calls vlan_find_dev() the
>>> skb->dev is set to bond0 and it is clear no vlan is attached
>>> to this iface. The vlan lookup fails.
>>>
>>> This patch moves the VLAN check above the rx_handler. A VLAN
>>> tagged frame is now routed to the eth2.228-fcoe iface in the
>>> above schematic. Untagged frames continue to the bond0 as
>>> normal. This case also remains intact,
>>>
>>> eth2 --> bond0 --> vlan.228
>>>
>>> Here the skb is VLAN tagged but the vlan lookup fails on eth2
>>> causing the bonding rx_handler to be called. On the second
>>> pass the vlan lookup is on the bond0 iface and completes as
>>> expected.
>>>
>>> Putting a VLAN.228 on both the bond0 and eth2 device will
>>> result in eth2.228 receiving the skb. I don't think this is
>>> completely unexpected and was the result prior to the rx_handler
>>> result.
>>>
>>> Note, the same setup is also used for other storage traffic that
>>> MPIO is used with eg. iSCSI and similar setups can be contrived
>>> without storage protocols.
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>> ---
>>>
>>> net/core/dev.c |   22 +++++++++++-----------
>>> 1 files changed, 11 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 70ecb86..8b6118a 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -3231,6 +3231,17 @@ another_round:
>>> ncls:
>>> #endif
>>>
>>> +    if (vlan_tx_tag_present(skb)) {
>>> +            if (pt_prev) {
>>> +                    ret = deliver_skb(skb, pt_prev, orig_dev);
>>> +                    pt_prev = NULL;
>>> +            }
>>> +            if (vlan_do_receive(&skb))
>>> +                    goto another_round;
>>> +            else if (unlikely(!skb))
>>> +                    goto out;
>>> +    }
>>> +
>>>      rx_handler = rcu_dereference(skb->dev->rx_handler);
>>>      if (rx_handler) {
>>>              if (pt_prev) {
>>> @@ -3251,17 +3262,6 @@ ncls:
>>>              }
>>>      }
>>>
>>> -    if (vlan_tx_tag_present(skb)) {
>>> -            if (pt_prev) {
>>> -                    ret = deliver_skb(skb, pt_prev, orig_dev);
>>> -                    pt_prev = NULL;
>>> -            }
>>> -            if (vlan_do_receive(&skb))
>>> -                    goto another_round;
>>> -            else if (unlikely(!skb))
>>> -                    goto out;
>>> -    }
>>> -
>>>      /* deliver only exact match when indicated */
>>>      null_or_dev = deliver_exact ? skb->dev : NULL;
>>>
>>>
>>
>> Hmm, I must look at this again tomorrow but I have strong feeling this
>> will break some some scenario including vlan-bridge-macvlan.
>
> Yes please review... I tested cases with vlan, bridge, and macvlan
> components and believe this works unless I missed something.
>
> Maybe Jesse, can comment though on why this commit that moved (and
> cleaned up) the vlan tag handling put the vlan_do_receive below the
> rx_handler rather than above it. Was this intended to fix something?

The original reason was to ensure packets received from NICs that do
stripping behaved the same as those that don't.  At the time, the
packets with inline vlan tags were handled by the same code that
handled upper layer protocols so it was important that code for vlan
stripped tags be immediately before that.  Otherwise, packets might be
taken either by the bridge hook or vlan code depending the the type of
device.

However, that's no longer an issue because we now emulate vlan
acceleration by untagging packets at the beginning of
__netif_receive_skb(), so the code path will always be the same.
Furthermore, based on feedback received since that patch it seems
pretty clear that people prefer the behavior where vlan devices take
traffic first, so now that we can have both that and consistent
behavior it seems to be the way to go.

This looks correct to me:
Acked-by: Jesse Gross <jesse@nicira.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Pirko Oct. 11, 2011, 10:57 a.m. UTC | #4
Tue, Oct 11, 2011 at 12:37:53AM CEST, jpirko@redhat.com wrote:
>Mon, Oct 10, 2011 at 09:16:41PM CEST, john.r.fastabend@intel.com wrote:
>>The following configuration used to work as I expected. At least
>>we could use the fcoe interfaces to do MPIO and the bond0 iface
>>to do load balancing or failover.
>>
>>       ---eth2.228-fcoe
>>       |
>>eth2 -----|
>>          |
>>          |---- bond0
>>          |
>>eth3 -----|
>>       |
>>       ---eth3.228-fcoe
>>
>>This worked because of a change we added to allow inactive slaves
>>to rx 'exact' matches. This functionality was kept intact with the
>>rx_handler mechanism. However now the vlan interface attached to the
>>active slave never receives traffic because the bonding rx_handler
>>updates the skb->dev and goto's another_round. Previously, the
>>vlan_do_receive() logic was called before the bonding rx_handler.
>>
>>Now by the time vlan_do_receive calls vlan_find_dev() the
>>skb->dev is set to bond0 and it is clear no vlan is attached
>>to this iface. The vlan lookup fails.
>>
>>This patch moves the VLAN check above the rx_handler. A VLAN
>>tagged frame is now routed to the eth2.228-fcoe iface in the
>>above schematic. Untagged frames continue to the bond0 as
>>normal. This case also remains intact,
>>
>>eth2 --> bond0 --> vlan.228
>>
>>Here the skb is VLAN tagged but the vlan lookup fails on eth2
>>causing the bonding rx_handler to be called. On the second
>>pass the vlan lookup is on the bond0 iface and completes as
>>expected.
>>
>>Putting a VLAN.228 on both the bond0 and eth2 device will
>>result in eth2.228 receiving the skb. I don't think this is
>>completely unexpected and was the result prior to the rx_handler
>>result.
>>
>>Note, the same setup is also used for other storage traffic that
>>MPIO is used with eg. iSCSI and similar setups can be contrived
>>without storage protocols.
>>
>>Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>---
>>
>> net/core/dev.c |   22 +++++++++++-----------
>> 1 files changed, 11 insertions(+), 11 deletions(-)
>>
>>diff --git a/net/core/dev.c b/net/core/dev.c
>>index 70ecb86..8b6118a 100644
>>--- a/net/core/dev.c
>>+++ b/net/core/dev.c
>>@@ -3231,6 +3231,17 @@ another_round:
>> ncls:
>> #endif
>> 
>>+	if (vlan_tx_tag_present(skb)) {
>>+		if (pt_prev) {
>>+			ret = deliver_skb(skb, pt_prev, orig_dev);
>>+			pt_prev = NULL;
>>+		}
>>+		if (vlan_do_receive(&skb))
>>+			goto another_round;
>>+		else if (unlikely(!skb))
>>+			goto out;
>>+	}
>>+
>> 	rx_handler = rcu_dereference(skb->dev->rx_handler);
>> 	if (rx_handler) {
>> 		if (pt_prev) {
>>@@ -3251,17 +3262,6 @@ ncls:
>> 		}
>> 	}
>> 
>>-	if (vlan_tx_tag_present(skb)) {
>>-		if (pt_prev) {
>>-			ret = deliver_skb(skb, pt_prev, orig_dev);
>>-			pt_prev = NULL;
>>-		}
>>-		if (vlan_do_receive(&skb))
>>-			goto another_round;
>>-		else if (unlikely(!skb))
>>-			goto out;
>>-	}
>>-
>> 	/* deliver only exact match when indicated */
>> 	null_or_dev = deliver_exact ? skb->dev : NULL;
>> 
>>
>
>Hmm, I must look at this again tomorrow but I have strong feeling this
>will break some some scenario including vlan-bridge-macvlan.

I didn't find out anything.

Reviewed-by: Jiri Pirko <jpirko@redhat.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hans Schillstrom Oct. 11, 2011, 11:08 a.m. UTC | #5
Hello
On Tuesday 11 October 2011 04:43:03 Jesse Gross wrote:
> On Mon, Oct 10, 2011 at 7:07 PM, John Fastabend
> <john.r.fastabend@intel.com> wrote:
> > On 10/10/2011 3:37 PM, Jiri Pirko wrote:
> >> Mon, Oct 10, 2011 at 09:16:41PM CEST, john.r.fastabend@intel.com wrote:
> >>> The following configuration used to work as I expected. At least
> >>> we could use the fcoe interfaces to do MPIO and the bond0 iface
> >>> to do load balancing or failover.
> >>>
> >>>       ---eth2.228-fcoe
> >>>       |
> >>> eth2 -----|
> >>>          |
> >>>          |---- bond0
> >>>          |
> >>> eth3 -----|
> >>>       |
> >>>       ---eth3.228-fcoe
> >>>
> >>> This worked because of a change we added to allow inactive slaves
> >>> to rx 'exact' matches. This functionality was kept intact with the
> >>> rx_handler mechanism. However now the vlan interface attached to the
> >>> active slave never receives traffic because the bonding rx_handler
> >>> updates the skb->dev and goto's another_round. Previously, the
> >>> vlan_do_receive() logic was called before the bonding rx_handler.
> >>>
> >>> Now by the time vlan_do_receive calls vlan_find_dev() the
> >>> skb->dev is set to bond0 and it is clear no vlan is attached
> >>> to this iface. The vlan lookup fails.
> >>>
> >>> This patch moves the VLAN check above the rx_handler. A VLAN
> >>> tagged frame is now routed to the eth2.228-fcoe iface in the
> >>> above schematic. Untagged frames continue to the bond0 as
> >>> normal. This case also remains intact,
> >>>
> >>> eth2 --> bond0 --> vlan.228
> >>>
> >>> Here the skb is VLAN tagged but the vlan lookup fails on eth2
> >>> causing the bonding rx_handler to be called. On the second
> >>> pass the vlan lookup is on the bond0 iface and completes as
> >>> expected.
> >>>
> >>> Putting a VLAN.228 on both the bond0 and eth2 device will
> >>> result in eth2.228 receiving the skb. I don't think this is
> >>> completely unexpected and was the result prior to the rx_handler
> >>> result.

I think this OK, but I do have a question
if bond0 is in Active/Backup mode, eth2 and eth3 got the same MAC.addr,
what about the VLAN:s ?
(or is just one of thme working ??)

> >>>
> >>> Note, the same setup is also used for other storage traffic that
> >>> MPIO is used with eg. iSCSI and similar setups can be contrived
> >>> without storage protocols.
> >>>
> >>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >>> ---
> >>>
> >>> net/core/dev.c |   22 +++++++++++-----------
> >>> 1 files changed, 11 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/net/core/dev.c b/net/core/dev.c
> >>> index 70ecb86..8b6118a 100644
> >>> --- a/net/core/dev.c
> >>> +++ b/net/core/dev.c
> >>> @@ -3231,6 +3231,17 @@ another_round:
> >>> ncls:
> >>> #endif
> >>>
> >>> +    if (vlan_tx_tag_present(skb)) {
> >>> +            if (pt_prev) {
> >>> +                    ret = deliver_skb(skb, pt_prev, orig_dev);
> >>> +                    pt_prev = NULL;
> >>> +            }
> >>> +            if (vlan_do_receive(&skb))
> >>> +                    goto another_round;
> >>> +            else if (unlikely(!skb))
> >>> +                    goto out;
> >>> +    }
> >>> +
> >>>      rx_handler = rcu_dereference(skb->dev->rx_handler);
> >>>      if (rx_handler) {
> >>>              if (pt_prev) {
> >>> @@ -3251,17 +3262,6 @@ ncls:
> >>>              }
> >>>      }
> >>>
> >>> -    if (vlan_tx_tag_present(skb)) {
> >>> -            if (pt_prev) {
> >>> -                    ret = deliver_skb(skb, pt_prev, orig_dev);
> >>> -                    pt_prev = NULL;
> >>> -            }
> >>> -            if (vlan_do_receive(&skb))
> >>> -                    goto another_round;
> >>> -            else if (unlikely(!skb))
> >>> -                    goto out;
> >>> -    }
> >>> -
> >>>      /* deliver only exact match when indicated */
> >>>      null_or_dev = deliver_exact ? skb->dev : NULL;
> >>>
> >>>
> >>
> >> Hmm, I must look at this again tomorrow but I have strong feeling this
> >> will break some some scenario including vlan-bridge-macvlan.
> >
> > Yes please review... I tested cases with vlan, bridge, and macvlan
> > components and believe this works unless I missed something.
> >
> > Maybe Jesse, can comment though on why this commit that moved (and
> > cleaned up) the vlan tag handling put the vlan_do_receive below the
> > rx_handler rather than above it. Was this intended to fix something?
> 
> The original reason was to ensure packets received from NICs that do
> stripping behaved the same as those that don't.  At the time, the
> packets with inline vlan tags were handled by the same code that
> handled upper layer protocols so it was important that code for vlan
> stripped tags be immediately before that.  Otherwise, packets might be
> taken either by the bridge hook or vlan code depending the the type of
> device.
> 
> However, that's no longer an issue because we now emulate vlan
> acceleration by untagging packets at the beginning of
> __netif_receive_skb(), so the code path will always be the same.
> Furthermore, based on feedback received since that patch it seems
> pretty clear that people prefer the behavior where vlan devices take
> traffic first, so now that we can have both that and consistent
> behavior it seems to be the way to go.
> 
> This looks correct to me:
> Acked-by: Jesse Gross <jesse@nicira.com>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
John Fastabend Oct. 11, 2011, 1:13 p.m. UTC | #6
On 10/11/2011 4:08 AM, Hans Schillstrom wrote:
> Hello
> On Tuesday 11 October 2011 04:43:03 Jesse Gross wrote:
>> On Mon, Oct 10, 2011 at 7:07 PM, John Fastabend
>> <john.r.fastabend@intel.com> wrote:
>>> On 10/10/2011 3:37 PM, Jiri Pirko wrote:
>>>> Mon, Oct 10, 2011 at 09:16:41PM CEST, john.r.fastabend@intel.com wrote:
>>>>> The following configuration used to work as I expected. At least
>>>>> we could use the fcoe interfaces to do MPIO and the bond0 iface
>>>>> to do load balancing or failover.
>>>>>
>>>>>       ---eth2.228-fcoe
>>>>>       |
>>>>> eth2 -----|
>>>>>          |
>>>>>          |---- bond0
>>>>>          |
>>>>> eth3 -----|
>>>>>       |
>>>>>       ---eth3.228-fcoe
>>>>>
>>>>> This worked because of a change we added to allow inactive slaves
>>>>> to rx 'exact' matches. This functionality was kept intact with the
>>>>> rx_handler mechanism. However now the vlan interface attached to the
>>>>> active slave never receives traffic because the bonding rx_handler
>>>>> updates the skb->dev and goto's another_round. Previously, the
>>>>> vlan_do_receive() logic was called before the bonding rx_handler.
>>>>>
>>>>> Now by the time vlan_do_receive calls vlan_find_dev() the
>>>>> skb->dev is set to bond0 and it is clear no vlan is attached
>>>>> to this iface. The vlan lookup fails.
>>>>>
>>>>> This patch moves the VLAN check above the rx_handler. A VLAN
>>>>> tagged frame is now routed to the eth2.228-fcoe iface in the
>>>>> above schematic. Untagged frames continue to the bond0 as
>>>>> normal. This case also remains intact,
>>>>>
>>>>> eth2 --> bond0 --> vlan.228
>>>>>
>>>>> Here the skb is VLAN tagged but the vlan lookup fails on eth2
>>>>> causing the bonding rx_handler to be called. On the second
>>>>> pass the vlan lookup is on the bond0 iface and completes as
>>>>> expected.
>>>>>
>>>>> Putting a VLAN.228 on both the bond0 and eth2 device will
>>>>> result in eth2.228 receiving the skb. I don't think this is
>>>>> completely unexpected and was the result prior to the rx_handler
>>>>> result.
> 
> I think this OK, but I do have a question
> if bond0 is in Active/Backup mode, eth2 and eth3 got the same MAC.addr,
> what about the VLAN:s ?
> (or is just one of thme working ??)
> 

The VLAN MAC address will not be managed by the bond. In the
storage case a SAN mac may be used (NETDEV_HW_ADDR_T_SAN).
Otherwise the MAC can be managed normally.

Both VLANs will receive frames but in some modes only to packet
handlers that have exact matches. See bond_should_deliver_exact_match().

.John.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Fastabend Oct. 11, 2011, 1:16 p.m. UTC | #7
On 10/10/2011 7:43 PM, Jesse Gross wrote:
> On Mon, Oct 10, 2011 at 7:07 PM, John Fastabend
> <john.r.fastabend@intel.com> wrote:
>> On 10/10/2011 3:37 PM, Jiri Pirko wrote:
>>> Mon, Oct 10, 2011 at 09:16:41PM CEST, john.r.fastabend@intel.com wrote:
>>>> The following configuration used to work as I expected. At least
>>>> we could use the fcoe interfaces to do MPIO and the bond0 iface
>>>> to do load balancing or failover.
>>>>
>>>>       ---eth2.228-fcoe
>>>>       |
>>>> eth2 -----|
>>>>          |
>>>>          |---- bond0
>>>>          |
>>>> eth3 -----|
>>>>       |
>>>>       ---eth3.228-fcoe
>>>>
>>>> This worked because of a change we added to allow inactive slaves
>>>> to rx 'exact' matches. This functionality was kept intact with the
>>>> rx_handler mechanism. However now the vlan interface attached to the
>>>> active slave never receives traffic because the bonding rx_handler
>>>> updates the skb->dev and goto's another_round. Previously, the
>>>> vlan_do_receive() logic was called before the bonding rx_handler.
>>>>
>>>> Now by the time vlan_do_receive calls vlan_find_dev() the
>>>> skb->dev is set to bond0 and it is clear no vlan is attached
>>>> to this iface. The vlan lookup fails.
>>>>
>>>> This patch moves the VLAN check above the rx_handler. A VLAN
>>>> tagged frame is now routed to the eth2.228-fcoe iface in the
>>>> above schematic. Untagged frames continue to the bond0 as
>>>> normal. This case also remains intact,
>>>>
>>>> eth2 --> bond0 --> vlan.228
>>>>
>>>> Here the skb is VLAN tagged but the vlan lookup fails on eth2
>>>> causing the bonding rx_handler to be called. On the second
>>>> pass the vlan lookup is on the bond0 iface and completes as
>>>> expected.
>>>>
>>>> Putting a VLAN.228 on both the bond0 and eth2 device will
>>>> result in eth2.228 receiving the skb. I don't think this is
>>>> completely unexpected and was the result prior to the rx_handler
>>>> result.
>>>>
>>>> Note, the same setup is also used for other storage traffic that
>>>> MPIO is used with eg. iSCSI and similar setups can be contrived
>>>> without storage protocols.
>>>>
>>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>>> ---

[...]

>> Maybe Jesse, can comment though on why this commit that moved (and
>> cleaned up) the vlan tag handling put the vlan_do_receive below the
>> rx_handler rather than above it. Was this intended to fix something?
> 
> The original reason was to ensure packets received from NICs that do
> stripping behaved the same as those that don't.  At the time, the
> packets with inline vlan tags were handled by the same code that
> handled upper layer protocols so it was important that code for vlan
> stripped tags be immediately before that.  Otherwise, packets might be
> taken either by the bridge hook or vlan code depending the the type of
> device.
> 
> However, that's no longer an issue because we now emulate vlan
> acceleration by untagging packets at the beginning of
> __netif_receive_skb(), so the code path will always be the same.
> Furthermore, based on feedback received since that patch it seems
> pretty clear that people prefer the behavior where vlan devices take
> traffic first, so now that we can have both that and consistent
> behavior it seems to be the way to go.
> 
> This looks correct to me:
> Acked-by: Jesse Gross <jesse@nicira.com>

Thanks for the nice summary Jesse.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hans Schillstrom Oct. 13, 2011, 1:09 p.m. UTC | #8
>On 10/11/2011 4:08 AM, Hans Schillstrom wrote:
>> Hello
>> On Tuesday 11 October 2011 04:43:03 Jesse Gross wrote:
>>> On Mon, Oct 10, 2011 at 7:07 PM, John Fastabend
>>> <john.r.fastabend@intel.com> wrote:
>>>> On 10/10/2011 3:37 PM, Jiri Pirko wrote:
>>>>> Mon, Oct 10, 2011 at 09:16:41PM CEST, john.r.fastabend@intel.com wrote:
>>>>>> The following configuration used to work as I expected. At least
>>>>>> we could use the fcoe interfaces to do MPIO and the bond0 iface
>>>>>> to do load balancing or failover.
>>>>>>
>>>>>>       ---eth2.228-fcoe
>>>>>>       |
>>>>>> eth2 -----|
>>>>>>          |
>>>>>>          |---- bond0
>>>>>>          |
>>>>>> eth3 -----|
>>>>>>       |
>>>>>>       ---eth3.228-fcoe
>>>>>>
>>>>>> This worked because of a change we added to allow inactive slaves
>>>>>> to rx 'exact' matches. This functionality was kept intact with the
>>>>>> rx_handler mechanism. However now the vlan interface attached to the
>>>>>> active slave never receives traffic because the bonding rx_handler
>>>>>> updates the skb->dev and goto's another_round. Previously, the
>>>>>> vlan_do_receive() logic was called before the bonding rx_handler.
>>>>>>
>>>>>> Now by the time vlan_do_receive calls vlan_find_dev() the
>>>>>> skb->dev is set to bond0 and it is clear no vlan is attached
>>>>>> to this iface. The vlan lookup fails.
>>>>>>
>>>>>> This patch moves the VLAN check above the rx_handler. A VLAN
>>>>>> tagged frame is now routed to the eth2.228-fcoe iface in the
>>>>>> above schematic. Untagged frames continue to the bond0 as
>>>>>> normal. This case also remains intact,
>>>>>>
>>>>>> eth2 --> bond0 --> vlan.228
>>>>>>
>>>>>> Here the skb is VLAN tagged but the vlan lookup fails on eth2
>>>>>> causing the bonding rx_handler to be called. On the second
>>>>>> pass the vlan lookup is on the bond0 iface and completes as
>>>>>> expected.
>>>>>>
>>>>>> Putting a VLAN.228 on both the bond0 and eth2 device will
>>>>>> result in eth2.228 receiving the skb. I don't think this is
>>>>>> completely unexpected and was the result prior to the rx_handler
>>>>>> result.
>>
>> I think this OK, but I do have a question
>> if bond0 is in Active/Backup mode, eth2 and eth3 got the same MAC.addr,
>> what about the VLAN:s ?
>> (or is just one of thme working ??)
>>
>
>The VLAN MAC address will not be managed by the bond. In the
>storage case a SAN mac may be used (NETDEV_HW_ADDR_T_SAN).
>Otherwise the MAC can be managed normally.
>
>Both VLANs will receive frames but in some modes only to packet
>handlers that have exact matches. See bond_should_deliver_exact_match().
>
>.John.

Have made some test now,  this patch solves a big issue that we had with VLANs 
i.e. as a work-a-round we put macvlans in between the phys. interface and the bond.
I have tested the scenario below, where tipc is running on VLAN below the bonding interface.
With the patch it works fine now.
If you want you can add a
Tested-by: Hans Schillstrom <hams.schillstrom@ericsson.com>

                      +---------+        +---------+
                    +---------+ |      +---------+ |
                  +---------+ |-+    +---------+ |-+
                  | macvlan |-+      | macvlan |-+
                  +---------+        +---------+
                     | | |              | | |
                     | | |           +---------+
                     | | |       ----|  vlan8  |
                     | | |      /    +---------+
                     | | |     /
                  +----+----+ /
        +---------|  bond0  |=------------+
        |         +---------+             |
        |                                 |
   +----+----+  +---------+          +----+----+  +---------+
   |   eth1  |--|  vlan20 |          |   eth2  |--|  vlan21 |
   +----+----+  +---------+          +----+----+  +---------+
        |                                 |
        |                                 |
  +-----+-----+                     +-----+-----+
  | Switch-0  |_____________________|   Sw1     |
  |           |    ISL TRUNK        |           |
  +-+---+---+-+                     +-+---+---+-+
    |   |   |                         |   |   |
  vlan1 | vlan20                    vlan1 | vlan21
      vlan8                             vlan8



Thanks 
Hans
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maxime Bizon Oct. 13, 2011, 3:04 p.m. UTC | #9
On Tue, 2011-10-11 at 00:37 +0200, Jiri Pirko wrote:

> Hmm, I must look at this again tomorrow but I have strong feeling this
> will break some some scenario including vlan-bridge-macvlan. 

unless I'm mistaken, today's behaviour:

# vconfig add eth0 100
# brctl addbr br0
# brctl addif br0 eth0

=> eth0.100 gets no more packets, br0.100 is to be used

after the patch won't we get the opposite ?
Jiri Pirko Oct. 13, 2011, 3:38 p.m. UTC | #10
Thu, Oct 13, 2011 at 05:04:34PM CEST, mbizon@freebox.fr wrote:
>
>On Tue, 2011-10-11 at 00:37 +0200, Jiri Pirko wrote:
>
>> Hmm, I must look at this again tomorrow but I have strong feeling this
>> will break some some scenario including vlan-bridge-macvlan. 
>
>unless I'm mistaken, today's behaviour:
>
># vconfig add eth0 100
># brctl addbr br0
># brctl addif br0 eth0
>
>=> eth0.100 gets no more packets, br0.100 is to be used
>
>after the patch won't we get the opposite ?

Looks like it. The question is what is the correct behaviour...

>
>-- 
>Maxime
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maxime Bizon Oct. 13, 2011, 3:48 p.m. UTC | #11
On Thu, 2011-10-13 at 17:38 +0200, Jiri Pirko wrote:

> Looks like it. The question is what is the correct behaviour... 

since we don't want to break existing setup I would say the current one

but I must admit I'm not a big fan of it.
Hans Schillstrom Oct. 13, 2011, 3:59 p.m. UTC | #12
Thu, Oct 13, 2011 at 05:04:34PM CEST, mbizon@freebox.fr wrote:
>>
>>On Tue, 2011-10-11 at 00:37 +0200, Jiri Pirko wrote:
>>
>>> Hmm, I must look at this again tomorrow but I have strong feeling this
>>> will break some some scenario including vlan-bridge-macvlan.
>>
>>unless I'm mistaken, today's behaviour:
>>
>># vconfig add eth0 100
>># brctl addbr br0
>># brctl addif br0 eth0
>>
>>=> eth0.100 gets no more packets, br0.100 is to be used
>>
>>after the patch won't we get the opposite ?
>
>Looks like it. The question is what is the correct behaviour...

I think this it become correct now, you should not destroy lover level if possible.
I.e. as John wrote "it's not an unexpected behaviour"

Consider adding a bridge to a vlan like this

vconfig add eth0 100
brctl addbr br1
brctl addif br1 eth0.100

If you later add a bridge (or bond) should the previous added bridge still work ? 
Yes I think so, for me it's the expected behaviour.

brctl addbr br0
brctl addif br0 eth0

Regards
Hans


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Fastabend Oct. 13, 2011, 5:42 p.m. UTC | #13
On 10/13/2011 8:59 AM, Hans Schillström wrote:
> Thu, Oct 13, 2011 at 05:04:34PM CEST, mbizon@freebox.fr wrote:
>>>
>>> On Tue, 2011-10-11 at 00:37 +0200, Jiri Pirko wrote:
>>>
>>>> Hmm, I must look at this again tomorrow but I have strong feeling this
>>>> will break some some scenario including vlan-bridge-macvlan.
>>>
>>> unless I'm mistaken, today's behaviour:
>>>
>>> # vconfig add eth0 100
>>> # brctl addbr br0
>>> # brctl addif br0 eth0
>>>
>>> => eth0.100 gets no more packets, br0.100 is to be used
>>>
>>> after the patch won't we get the opposite ?
>>
>> Looks like it. The question is what is the correct behaviour...
> 
> I think this it become correct now, you should not destroy lover level if possible.
> I.e. as John wrote "it's not an unexpected behaviour"
> 
> Consider adding a bridge to a vlan like this
> 
> vconfig add eth0 100
> brctl addbr br1
> brctl addif br1 eth0.100
> 
> If you later add a bridge (or bond) should the previous added bridge still work ? 
> Yes I think so, for me it's the expected behaviour.
> 
> brctl addbr br0
> brctl addif br0 eth0
> 
> Regards
> Hans
> 
> 

Sorry I'm not entirely sure I followed the above two posts. Note
this patch restores behavior that has existed for most of the
2.6.x kernels. In the 3.x kernels VLAN100 is dropped in the
schematic below (assuming eth0 is active), I think this is
incorrect.

       ---> eth0.100
       |
       |
eth0 -----> br0

With this patch VLAN100 is delivered to eth0.100 as I expect. Now
adding a VLAN100 to br0.


       ---> eth0.100
       |
       |
eth0 -----> br0---> br0.100

With this patch in the above case VLAN100 is delivered to the
first matching vlan. Here eth0.100 will receive the packet If
you really want br0.100 to receive the packet remove eth0.100.
Without this patch the packet will be delivered to br0.100,
but no configuration will allow a packet to be delivered to
eth0.100 with a bond.

The first schematic is used for doing bonding on LAN traffic
and SAN (storage) traffic with MPIO. So I think my expectations
are correct and have a real use case.

Thanks,
John
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hans Schillstrom Oct. 13, 2011, 6:23 p.m. UTC | #14
>On 10/13/2011 8:59 AM, Hans Schillström wrote:
>> Thu, Oct 13, 2011 at 05:04:34PM CEST, mbizon@freebox.fr wrote:
>>>>
>>>> On Tue, 2011-10-11 at 00:37 +0200, Jiri Pirko wrote:
>>>>
>>>>> Hmm, I must look at this again tomorrow but I have strong feeling this
>>>>> will break some some scenario including vlan-bridge-macvlan.
>>>>
>>>> unless I'm mistaken, today's behaviour:
>>>>
>>>> # vconfig add eth0 100
>>>> # brctl addbr br0
>>>> # brctl addif br0 eth0
>>>>
>>>> => eth0.100 gets no more packets, br0.100 is to be used
>>>>
>>>> after the patch won't we get the opposite ?
>>>
>>> Looks like it. The question is what is the correct behaviour...
>>
>> I think this it become correct now, you should not destroy lover level if possible.
>> I.e. as John wrote "it's not an unexpected behaviour"
>>
>> Consider adding a bridge to a vlan like this
>>
>> vconfig add eth0 100
>> brctl addbr br1
>> brctl addif br1 eth0.100
>>
>> If you later add a bridge (or bond) should the previous added bridge still work ?
>> Yes I think so, for me it's the expected behaviour.
>>
>> brctl addbr br0
>> brctl addif br0 eth0
>>
>
>Sorry I'm not entirely sure I followed the above two posts. Note
>this patch restores behavior that has existed for most of the
>2.6.x kernels. In the 3.x kernels VLAN100 is dropped in the
>schematic below (assuming eth0 is active), I think this is
>incorrect.
>
>      ---> eth0.100
>       |
>       |
>eth0 -----> br0
>
>With this patch VLAN100 is delivered to eth0.100 as I expect. Now
>adding a VLAN100 to br0.
>
>
>       ---> eth0.100
>       |
>       |
>eth0 -----> br0---> br0.100
>
>With this patch in the above case VLAN100 is delivered to the
>first matching vlan. Here eth0.100 will receive the packet If
>you really want br0.100 to receive the packet remove eth0.100.
>Without this patch the packet will be delivered to br0.100,
>but no configuration will allow a packet to be delivered to
>eth0.100 with a bond.
>
>The first schematic is used for doing bonding on LAN traffic
>and SAN (storage) traffic with MPIO. So I think my expectations
>are correct and have a real use case.
>
>Thanks,
>John

Sorry if I caused confusion,  I do agree with you to 100%.

Regards
Hans


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Gross Oct. 14, 2011, 12:22 a.m. UTC | #15
2011/10/13 John Fastabend <john.r.fastabend@intel.com>:
> On 10/13/2011 8:59 AM, Hans Schillström wrote:
>> Thu, Oct 13, 2011 at 05:04:34PM CEST, mbizon@freebox.fr wrote:
>>>>
>>>> On Tue, 2011-10-11 at 00:37 +0200, Jiri Pirko wrote:
>>>>
>>>>> Hmm, I must look at this again tomorrow but I have strong feeling this
>>>>> will break some some scenario including vlan-bridge-macvlan.
>>>>
>>>> unless I'm mistaken, today's behaviour:
>>>>
>>>> # vconfig add eth0 100
>>>> # brctl addbr br0
>>>> # brctl addif br0 eth0
>>>>
>>>> => eth0.100 gets no more packets, br0.100 is to be used
>>>>
>>>> after the patch won't we get the opposite ?
>>>
>>> Looks like it. The question is what is the correct behaviour...
>>
>> I think this it become correct now, you should not destroy lover level if possible.
>> I.e. as John wrote "it's not an unexpected behaviour"
>>
>> Consider adding a bridge to a vlan like this
>>
>> vconfig add eth0 100
>> brctl addbr br1
>> brctl addif br1 eth0.100
>>
>> If you later add a bridge (or bond) should the previous added bridge still work ?
>> Yes I think so, for me it's the expected behaviour.
>>
>> brctl addbr br0
>> brctl addif br0 eth0
>>
>> Regards
>> Hans
>>
>>
>
> Sorry I'm not entirely sure I followed the above two posts. Note
> this patch restores behavior that has existed for most of the
> 2.6.x kernels. In the 3.x kernels VLAN100 is dropped in the
> schematic below (assuming eth0 is active), I think this is
> incorrect.

Actually, for most of 2.6.x the behavior was somewhat
non-deterministic since it depended on kernel version and the NIC.  As
a result, I think we can safely say that this wasn't a particularly
firm interface that we have to be wedded to.  Based on overwhelming
feedback, I think the interface in this patch is the preferred one and
what we should stabilize on.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 19, 2011, 3:47 a.m. UTC | #16
From: Jesse Gross <jesse@nicira.com>
Date: Thu, 13 Oct 2011 17:22:02 -0700

> Actually, for most of 2.6.x the behavior was somewhat
> non-deterministic since it depended on kernel version and the NIC.  As
> a result, I think we can safely say that this wasn't a particularly
> firm interface that we have to be wedded to.  Based on overwhelming
> feedback, I think the interface in this patch is the preferred one and
> what we should stabilize on.

Agreed, and I've applied this patch to net-next, thanks everyone!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 28, 2011, 10 a.m. UTC | #17
Le mardi 18 octobre 2011 à 23:47 -0400, David Miller a écrit :
> From: Jesse Gross <jesse@nicira.com>
> Date: Thu, 13 Oct 2011 17:22:02 -0700
> 
> > Actually, for most of 2.6.x the behavior was somewhat
> > non-deterministic since it depended on kernel version and the NIC.  As
> > a result, I think we can safely say that this wasn't a particularly
> > firm interface that we have to be wedded to.  Based on overwhelming
> > feedback, I think the interface in this patch is the preferred one and
> > what we should stabilize on.
> 
> Agreed, and I've applied this patch to net-next, thanks everyone!

Hmm.

Oh well, this broke my setup, a very basic one.

eth1 and eth2 on a bonding device, bond0, active-backup

some vlans on top of bond0, say vlan.103

$ ip link show dev vlan.103
8: vlan.103@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
pfifo_fast state UP qlen 100
    link/ether 00:1e:0b:ec:d3:d2 brd ff:ff:ff:ff:ff:ff


arp_rcv() now gets packets with skb->type PACKET_OTHERHOST and drops
such packets.

     [000] 52870.115435: skb_gro_reset_offset <-napi_gro_receive
     [000] 52870.115435: dev_gro_receive <-napi_gro_receive
     [000] 52870.115435: napi_skb_finish <-napi_gro_receive
     [000] 52870.115435: netif_receive_skb <-napi_skb_finish
     [000] 52870.115435: get_rps_cpu <-netif_receive_skb
     [000] 52870.115435: __netif_receive_skb <-netif_receive_skb
     [000] 52870.115436: vlan_do_receive <-__netif_receive_skb
     [000] 52870.115436: bond_handle_frame <-__netif_receive_skb
     [000] 52870.115436: vlan_do_receive <-__netif_receive_skb
     [000] 52870.115436: arp_rcv <-__netif_receive_skb
     [000] 52870.115436: kfree_skb <-arp_rcv
     [000] 52870.115437: __kfree_skb <-kfree_skb
     [000] 52870.115437: skb_release_head_state <-__kfree_skb
     [000] 52870.115437: skb_release_data <-__kfree_skb
     [000] 52870.115437: kfree <-skb_release_data
     [000] 52870.115437: kmem_cache_free <-__kfree_skb


By the way, we have no SNMP counter here so I spent some time to track
this. I'll send a patch for this.

If this host initiates the trafic, all is well.

Please guys, can we get back ARP or revert this patch ?

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 70ecb86..8b6118a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3231,6 +3231,17 @@  another_round:
 ncls:
 #endif
 
+	if (vlan_tx_tag_present(skb)) {
+		if (pt_prev) {
+			ret = deliver_skb(skb, pt_prev, orig_dev);
+			pt_prev = NULL;
+		}
+		if (vlan_do_receive(&skb))
+			goto another_round;
+		else if (unlikely(!skb))
+			goto out;
+	}
+
 	rx_handler = rcu_dereference(skb->dev->rx_handler);
 	if (rx_handler) {
 		if (pt_prev) {
@@ -3251,17 +3262,6 @@  ncls:
 		}
 	}
 
-	if (vlan_tx_tag_present(skb)) {
-		if (pt_prev) {
-			ret = deliver_skb(skb, pt_prev, orig_dev);
-			pt_prev = NULL;
-		}
-		if (vlan_do_receive(&skb))
-			goto another_round;
-		else if (unlikely(!skb))
-			goto out;
-	}
-
 	/* deliver only exact match when indicated */
 	null_or_dev = deliver_exact ? skb->dev : NULL;