diff mbox

[net] core: Untag packets after rx_handler has run.

Message ID 1409856043-21840-1-git-send-email-vyasevic@redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Vladislav Yasevich Sept. 4, 2014, 6:40 p.m. UTC
Currently, we attempt to remove the vlan informaion from the packet
before passing it to the rx_handler.  In most situations this works
just fine since the rx_handlers are usually installed for the
lower device and thus vlan device isn't found.  However, macvtap
device is a bit different as it installs an rx_handler on top
of a macvlan device.  As a result, if someone was define a vlan
device on top of a macvap (for the purposes of enabling a VM
to use vlans with macvtap), then the current code will result
in passing an untagged packet to the macvtap rx_handler and the
VM will not receive tagged traffic.

This patch moves the untaggin code to after the rx_handler for
the current device has been called.  This still works for the
existing rx_handlers (like bonding/teaming/bridging/etc) and
also makes vlans on top of macvtap work as before.

Fixes: 6acf54f1cf (macvtap: Add support of packet capture on macvtap device)
Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
---
 net/core/dev.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

Comments

Jiri Pirko Sept. 4, 2014, 7:05 p.m. UTC | #1
Thu, Sep 04, 2014 at 08:40:43PM CEST, vyasevich@gmail.com wrote:
>Currently, we attempt to remove the vlan informaion from the packet
>before passing it to the rx_handler.  In most situations this works
>just fine since the rx_handlers are usually installed for the
>lower device and thus vlan device isn't found.  However, macvtap
>device is a bit different as it installs an rx_handler on top
>of a macvlan device.  As a result, if someone was define a vlan
>device on top of a macvap (for the purposes of enabling a VM
>to use vlans with macvtap), then the current code will result
>in passing an untagged packet to the macvtap rx_handler and the
>VM will not receive tagged traffic.

skb->vlan_tci is set. macvlan should work with that to pass the frame
correctly. This should be handled in macvtap code.

btw can you give me an example of setup where rx_handler is set for
macvlan device? I wonder what is could be.

>
>This patch moves the untaggin code to after the rx_handler for
>the current device has been called.  This still works for the
>existing rx_handlers (like bonding/teaming/bridging/etc) and
>also makes vlans on top of macvtap work as before.
>
>Fixes: 6acf54f1cf (macvtap: Add support of packet capture on macvtap device)
>Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
>Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>---
> net/core/dev.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
>diff --git a/net/core/dev.c b/net/core/dev.c
>index ab9a165..54691d1 100644
>--- a/net/core/dev.c
>+++ b/net/core/dev.c
>@@ -3642,17 +3642,6 @@ ncls:
> 	if (pfmemalloc && !skb_pfmemalloc_protocol(skb))
> 		goto drop;
> 
>-	if (vlan_tx_tag_present(skb)) {
>-		if (pt_prev) {
>-			ret = deliver_skb(skb, pt_prev, orig_dev);
>-			pt_prev = NULL;
>-		}
>-		if (vlan_do_receive(&skb))
>-			goto another_round;
>-		else if (unlikely(!skb))
>-			goto unlock;
>-	}
>-
> 	rx_handler = rcu_dereference(skb->dev->rx_handler);
> 	if (rx_handler) {
> 		if (pt_prev) {
>@@ -3674,6 +3663,17 @@ ncls:
> 		}
> 	}
> 
>+	if (vlan_tx_tag_present(skb)) {
>+		if (pt_prev) {
>+			ret = deliver_skb(skb, pt_prev, orig_dev);
>+			pt_prev = NULL;
>+		}
>+		if (vlan_do_receive(&skb))
>+			goto another_round;
>+		else if (unlikely(!skb))
>+			goto unlock;
>+	}
>+

nack. This will definitelly break several stacked setups.


> 	if (unlikely(vlan_tx_tag_present(skb))) {
> 		if (vlan_tx_tag_get_id(skb))
> 			skb->pkt_type = PACKET_OTHERHOST;
>-- 
>1.9.3
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vladislav Yasevich Sept. 4, 2014, 7:29 p.m. UTC | #2
On 09/04/2014 03:05 PM, Jiri Pirko wrote:
> Thu, Sep 04, 2014 at 08:40:43PM CEST, vyasevich@gmail.com wrote:
>> Currently, we attempt to remove the vlan informaion from the packet
>> before passing it to the rx_handler.  In most situations this works
>> just fine since the rx_handlers are usually installed for the
>> lower device and thus vlan device isn't found.  However, macvtap
>> device is a bit different as it installs an rx_handler on top
>> of a macvlan device.  As a result, if someone was define a vlan
>> device on top of a macvap (for the purposes of enabling a VM
>> to use vlans with macvtap), then the current code will result
>> in passing an untagged packet to the macvtap rx_handler and the
>> VM will not receive tagged traffic.
> 
> skb->vlan_tci is set. macvlan should work with that to pass the frame
> correctly. This should be handled in macvtap code.

OK.  Consider a configuration.

               vlan10
vlan10           |
  |         VM1:eth0
  v          |
macvtap0 <---+
  |
  V
eth0

On the host, vlan10 can't really send and receive traffic.  It's only
there to add a vlan filter to eth0 so that packets marked with vlan10
can be received.

When traffic is processed by __netif_receive_skb_core(), skb->dev is eth0 and we
have the tci, so that when vlan_do_receive() is called, we can't find the vlan
device and call the rx_handler macvlan_handle_frame().
That handler calls netif_rx() with skb->dev set to macvtap0.

This time through the receive path, vlan_tci is still set and we
do find the vlan device which is on top of macvtap0, so we set the tci to 0
and then pass it to the rx_handler macvtap_handle_frame().

As a result, we pass an untagged frame to the VM.

> 
> btw can you give me an example of setup where rx_handler is set for
> macvlan device? I wonder what is could be.
> 
>>
>> This patch moves the untaggin code to after the rx_handler for
>> the current device has been called.  This still works for the
>> existing rx_handlers (like bonding/teaming/bridging/etc) and
>> also makes vlans on top of macvtap work as before.
>>
>> Fixes: 6acf54f1cf (macvtap: Add support of packet capture on macvtap device)
>> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>> ---
>> net/core/dev.c | 22 +++++++++++-----------
>> 1 file changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index ab9a165..54691d1 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3642,17 +3642,6 @@ ncls:
>> 	if (pfmemalloc && !skb_pfmemalloc_protocol(skb))
>> 		goto drop;
>>
>> -	if (vlan_tx_tag_present(skb)) {
>> -		if (pt_prev) {
>> -			ret = deliver_skb(skb, pt_prev, orig_dev);
>> -			pt_prev = NULL;
>> -		}
>> -		if (vlan_do_receive(&skb))
>> -			goto another_round;
>> -		else if (unlikely(!skb))
>> -			goto unlock;
>> -	}
>> -
>> 	rx_handler = rcu_dereference(skb->dev->rx_handler);
>> 	if (rx_handler) {
>> 		if (pt_prev) {
>> @@ -3674,6 +3663,17 @@ ncls:
>> 		}
>> 	}
>>
>> +	if (vlan_tx_tag_present(skb)) {
>> +		if (pt_prev) {
>> +			ret = deliver_skb(skb, pt_prev, orig_dev);
>> +			pt_prev = NULL;
>> +		}
>> +		if (vlan_do_receive(&skb))
>> +			goto another_round;
>> +		else if (unlikely(!skb))
>> +			goto unlock;
>> +	}
>> +
> 
> nack. This will definitelly break several stacked setups.

Which ones?  The only thing I can see that would behave differently
is something like:

    vlan0      bridge0
     |           |
     +-------- eth0

In this case, the old code would give an untagged packet to the bridge
and the new code would give a tagged packet.

This set-up is a bit ambiguous.  Remove the vlan, and bridge gets a tagged
traffic even though the vlan has no relationship to the bridge.

I've tested a couple of different stacked setups and they all seem to work.

Thanks
-vlad

> 
> 
>> 	if (unlikely(vlan_tx_tag_present(skb))) {
>> 		if (vlan_tx_tag_get_id(skb))
>> 			skb->pkt_type = PACKET_OTHERHOST;
>> -- 
>> 1.9.3
>>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov Sept. 4, 2014, 8:43 p.m. UTC | #3
On Thu, Sep 04, 2014 at 03:29:00PM -0400, Vlad Yasevich wrote:
> > nack. This will definitelly break several stacked setups.
> 
> Which ones?  The only thing I can see that would behave differently
> is something like:
> 
>     vlan0      bridge0
>      |           |
>      +-------- eth0
> 
> In this case, the old code would give an untagged packet to the bridge
> and the new code would give a tagged packet.
> 
> This set-up is a bit ambiguous.  Remove the vlan, and bridge gets a tagged
> traffic even though the vlan has no relationship to the bridge.
> 
> I've tested a couple of different stacked setups and they all seem to work.

2nd nack.
It will break user space, including our setup that has:
 vlanX     OVS
   |        |
   +------ eth0

vlan device has IP assigned and all tagged traffic goes through the stack
and into control plane process. ovs datapath keeps managing eth0 with
all other vlans.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vlad Yasevich Sept. 4, 2014, 9:01 p.m. UTC | #4
On 09/04/2014 04:43 PM, Alexei Starovoitov wrote:
> On Thu, Sep 04, 2014 at 03:29:00PM -0400, Vlad Yasevich wrote:
>>> nack. This will definitelly break several stacked setups.
>>
>> Which ones?  The only thing I can see that would behave differently
>> is something like:
>>
>>     vlan0      bridge0
>>      |           |
>>      +-------- eth0
>>
>> In this case, the old code would give an untagged packet to the bridge
>> and the new code would give a tagged packet.
>>
>> This set-up is a bit ambiguous.  Remove the vlan, and bridge gets a tagged
>> traffic even though the vlan has no relationship to the bridge.
>>
>> I've tested a couple of different stacked setups and they all seem to work.
> 
> 2nd nack.
> It will break user space, including our setup that has:
>  vlanX     OVS
>    |        |
>    +------ eth0
> 
> vlan device has IP assigned and all tagged traffic goes through the stack
> and into control plane process. ovs datapath keeps managing eth0 with
> all other vlans.
> 

Did you specially configure OVS to pass the traffic up the stack?  I see
OVS will only pass LOOPBACK packets.  All others it seems to consume.

Can the same be accomplished with a tagged internal port?

The reason I am asking, is I am trying to figure out if this is
a valid config.  It seems very hard to get right and seems to work almost
by accident at times.  For example, in the bridge scenario I described.
vlan and bridge have to share a mac address for that work.

-vlad
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexei Starovoitov Sept. 4, 2014, 9:54 p.m. UTC | #5
On Thu, Sep 4, 2014 at 2:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:
> On 09/04/2014 04:43 PM, Alexei Starovoitov wrote:
>> On Thu, Sep 04, 2014 at 03:29:00PM -0400, Vlad Yasevich wrote:
>>>> nack. This will definitelly break several stacked setups.
>>>
>>> Which ones?  The only thing I can see that would behave differently
>>> is something like:
>>>
>>>     vlan0      bridge0
>>>      |           |
>>>      +-------- eth0
>>>
>>> In this case, the old code would give an untagged packet to the bridge
>>> and the new code would give a tagged packet.
>>>
>>> This set-up is a bit ambiguous.  Remove the vlan, and bridge gets a tagged
>>> traffic even though the vlan has no relationship to the bridge.
>>>
>>> I've tested a couple of different stacked setups and they all seem to work.
>>
>> 2nd nack.
>> It will break user space, including our setup that has:
>>  vlanX     OVS
>>    |        |
>>    +------ eth0
>>
>> vlan device has IP assigned and all tagged traffic goes through the stack
>> and into control plane process. ovs datapath keeps managing eth0 with
>> all other vlans.
>>
>
> Did you specially configure OVS to pass the traffic up the stack?  I see
> OVS will only pass LOOPBACK packets.  All others it seems to consume.
>
> Can the same be accomplished with a tagged internal port?

our ovs config is not using internal port. vlan device is used as
control interface and should be independent of ovs datapath.
Theoretically it may be possible to use ovs for both, but very dangerous,
when control and data are going through the same datapath.
Any ovs programming mistake will kill control plane and whole
hypervisor will become inaccessible.

> The reason I am asking, is I am trying to figure out if this is
> a valid config.  It seems very hard to get right and seems to work almost
> by accident at times.  For example, in the bridge scenario I described.
> vlan and bridge have to share a mac address for that work.

I think it's not valid vs invalid config.
this was the behavior of vlan devices for long time. vlan was parsed
and send to vlan_dev _before_ rx_handler. I suspect there is more
than one user app that is relying on that.
I can change our stuff to do something different, but I think we
should not be breaking vlan behavior for others.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vlad Yasevich Sept. 4, 2014, 11:48 p.m. UTC | #6
On 09/04/2014 05:54 PM, Alexei Starovoitov wrote:
> On Thu, Sep 4, 2014 at 2:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote:
>> On 09/04/2014 04:43 PM, Alexei Starovoitov wrote:
>>> On Thu, Sep 04, 2014 at 03:29:00PM -0400, Vlad Yasevich wrote:
>>>>> nack. This will definitelly break several stacked setups.
>>>>
>>>> Which ones?  The only thing I can see that would behave differently
>>>> is something like:
>>>>
>>>>     vlan0      bridge0
>>>>      |           |
>>>>      +-------- eth0
>>>>
>>>> In this case, the old code would give an untagged packet to the bridge
>>>> and the new code would give a tagged packet.
>>>>
>>>> This set-up is a bit ambiguous.  Remove the vlan, and bridge gets a tagged
>>>> traffic even though the vlan has no relationship to the bridge.
>>>>
>>>> I've tested a couple of different stacked setups and they all seem to work.
>>>
>>> 2nd nack.
>>> It will break user space, including our setup that has:
>>>  vlanX     OVS
>>>    |        |
>>>    +------ eth0
>>>
>>> vlan device has IP assigned and all tagged traffic goes through the stack
>>> and into control plane process. ovs datapath keeps managing eth0 with
>>> all other vlans.
>>>
>>
>> Did you specially configure OVS to pass the traffic up the stack?  I see
>> OVS will only pass LOOPBACK packets.  All others it seems to consume.
>>
>> Can the same be accomplished with a tagged internal port?
> 
> our ovs config is not using internal port. vlan device is used as
> control interface and should be independent of ovs datapath.
> Theoretically it may be possible to use ovs for both, but very dangerous,
> when control and data are going through the same datapath.
> Any ovs programming mistake will kill control plane and whole
> hypervisor will become inaccessible.
> 
>> The reason I am asking, is I am trying to figure out if this is
>> a valid config.  It seems very hard to get right and seems to work almost
>> by accident at times.  For example, in the bridge scenario I described.
>> vlan and bridge have to share a mac address for that work.
> 
> I think it's not valid vs invalid config.
> this was the behavior of vlan devices for long time. vlan was parsed
> and send to vlan_dev _before_ rx_handler. I suspect there is more
> than one user app that is relying on that.
> I can change our stuff to do something different, but I think we
> should not be breaking vlan behavior for others.
> 

I see.  So vlan device always appears to take precedence over the rx_handler
if they are at the same level and we can't break this.

OK, this means that to solve this we have to expose the vlan filtering
API on macvtap devices as well.

Thanks
-vlad
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index ab9a165..54691d1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3642,17 +3642,6 @@  ncls:
 	if (pfmemalloc && !skb_pfmemalloc_protocol(skb))
 		goto drop;
 
-	if (vlan_tx_tag_present(skb)) {
-		if (pt_prev) {
-			ret = deliver_skb(skb, pt_prev, orig_dev);
-			pt_prev = NULL;
-		}
-		if (vlan_do_receive(&skb))
-			goto another_round;
-		else if (unlikely(!skb))
-			goto unlock;
-	}
-
 	rx_handler = rcu_dereference(skb->dev->rx_handler);
 	if (rx_handler) {
 		if (pt_prev) {
@@ -3674,6 +3663,17 @@  ncls:
 		}
 	}
 
+	if (vlan_tx_tag_present(skb)) {
+		if (pt_prev) {
+			ret = deliver_skb(skb, pt_prev, orig_dev);
+			pt_prev = NULL;
+		}
+		if (vlan_do_receive(&skb))
+			goto another_round;
+		else if (unlikely(!skb))
+			goto unlock;
+	}
+
 	if (unlikely(vlan_tx_tag_present(skb))) {
 		if (vlan_tx_tag_get_id(skb))
 			skb->pkt_type = PACKET_OTHERHOST;