Message ID | 1409856043-21840-1-git-send-email-vyasevic@redhat.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Thu, Sep 04, 2014 at 08:40:43PM CEST, vyasevich@gmail.com wrote: >Currently, we attempt to remove the vlan informaion from the packet >before passing it to the rx_handler. In most situations this works >just fine since the rx_handlers are usually installed for the >lower device and thus vlan device isn't found. However, macvtap >device is a bit different as it installs an rx_handler on top >of a macvlan device. As a result, if someone was define a vlan >device on top of a macvap (for the purposes of enabling a VM >to use vlans with macvtap), then the current code will result >in passing an untagged packet to the macvtap rx_handler and the >VM will not receive tagged traffic. skb->vlan_tci is set. macvlan should work with that to pass the frame correctly. This should be handled in macvtap code. btw can you give me an example of setup where rx_handler is set for macvlan device? I wonder what is could be. > >This patch moves the untaggin code to after the rx_handler for >the current device has been called. This still works for the >existing rx_handlers (like bonding/teaming/bridging/etc) and >also makes vlans on top of macvtap work as before. > >Fixes: 6acf54f1cf (macvtap: Add support of packet capture on macvtap device) >Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> >Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> >--- > net/core/dev.c | 22 +++++++++++----------- > 1 file changed, 11 insertions(+), 11 deletions(-) > >diff --git a/net/core/dev.c b/net/core/dev.c >index ab9a165..54691d1 100644 >--- a/net/core/dev.c >+++ b/net/core/dev.c >@@ -3642,17 +3642,6 @@ ncls: > if (pfmemalloc && !skb_pfmemalloc_protocol(skb)) > goto drop; > >- if (vlan_tx_tag_present(skb)) { >- if (pt_prev) { >- ret = deliver_skb(skb, pt_prev, orig_dev); >- pt_prev = NULL; >- } >- if (vlan_do_receive(&skb)) >- goto another_round; >- else if (unlikely(!skb)) >- goto unlock; >- } >- > rx_handler = rcu_dereference(skb->dev->rx_handler); > if (rx_handler) { > if (pt_prev) { >@@ -3674,6 +3663,17 @@ ncls: > } > } > >+ if (vlan_tx_tag_present(skb)) { >+ if (pt_prev) { >+ ret = deliver_skb(skb, pt_prev, orig_dev); >+ pt_prev = NULL; >+ } >+ if (vlan_do_receive(&skb)) >+ goto another_round; >+ else if (unlikely(!skb)) >+ goto unlock; >+ } >+ nack. This will definitelly break several stacked setups. > if (unlikely(vlan_tx_tag_present(skb))) { > if (vlan_tx_tag_get_id(skb)) > skb->pkt_type = PACKET_OTHERHOST; >-- >1.9.3 > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 09/04/2014 03:05 PM, Jiri Pirko wrote: > Thu, Sep 04, 2014 at 08:40:43PM CEST, vyasevich@gmail.com wrote: >> Currently, we attempt to remove the vlan informaion from the packet >> before passing it to the rx_handler. In most situations this works >> just fine since the rx_handlers are usually installed for the >> lower device and thus vlan device isn't found. However, macvtap >> device is a bit different as it installs an rx_handler on top >> of a macvlan device. As a result, if someone was define a vlan >> device on top of a macvap (for the purposes of enabling a VM >> to use vlans with macvtap), then the current code will result >> in passing an untagged packet to the macvtap rx_handler and the >> VM will not receive tagged traffic. > > skb->vlan_tci is set. macvlan should work with that to pass the frame > correctly. This should be handled in macvtap code. OK. Consider a configuration. vlan10 vlan10 | | VM1:eth0 v | macvtap0 <---+ | V eth0 On the host, vlan10 can't really send and receive traffic. It's only there to add a vlan filter to eth0 so that packets marked with vlan10 can be received. When traffic is processed by __netif_receive_skb_core(), skb->dev is eth0 and we have the tci, so that when vlan_do_receive() is called, we can't find the vlan device and call the rx_handler macvlan_handle_frame(). That handler calls netif_rx() with skb->dev set to macvtap0. This time through the receive path, vlan_tci is still set and we do find the vlan device which is on top of macvtap0, so we set the tci to 0 and then pass it to the rx_handler macvtap_handle_frame(). As a result, we pass an untagged frame to the VM. > > btw can you give me an example of setup where rx_handler is set for > macvlan device? I wonder what is could be. > >> >> This patch moves the untaggin code to after the rx_handler for >> the current device has been called. This still works for the >> existing rx_handlers (like bonding/teaming/bridging/etc) and >> also makes vlans on top of macvtap work as before. >> >> Fixes: 6acf54f1cf (macvtap: Add support of packet capture on macvtap device) >> Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> >> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> >> --- >> net/core/dev.c | 22 +++++++++++----------- >> 1 file changed, 11 insertions(+), 11 deletions(-) >> >> diff --git a/net/core/dev.c b/net/core/dev.c >> index ab9a165..54691d1 100644 >> --- a/net/core/dev.c >> +++ b/net/core/dev.c >> @@ -3642,17 +3642,6 @@ ncls: >> if (pfmemalloc && !skb_pfmemalloc_protocol(skb)) >> goto drop; >> >> - if (vlan_tx_tag_present(skb)) { >> - if (pt_prev) { >> - ret = deliver_skb(skb, pt_prev, orig_dev); >> - pt_prev = NULL; >> - } >> - if (vlan_do_receive(&skb)) >> - goto another_round; >> - else if (unlikely(!skb)) >> - goto unlock; >> - } >> - >> rx_handler = rcu_dereference(skb->dev->rx_handler); >> if (rx_handler) { >> if (pt_prev) { >> @@ -3674,6 +3663,17 @@ ncls: >> } >> } >> >> + if (vlan_tx_tag_present(skb)) { >> + if (pt_prev) { >> + ret = deliver_skb(skb, pt_prev, orig_dev); >> + pt_prev = NULL; >> + } >> + if (vlan_do_receive(&skb)) >> + goto another_round; >> + else if (unlikely(!skb)) >> + goto unlock; >> + } >> + > > nack. This will definitelly break several stacked setups. Which ones? The only thing I can see that would behave differently is something like: vlan0 bridge0 | | +-------- eth0 In this case, the old code would give an untagged packet to the bridge and the new code would give a tagged packet. This set-up is a bit ambiguous. Remove the vlan, and bridge gets a tagged traffic even though the vlan has no relationship to the bridge. I've tested a couple of different stacked setups and they all seem to work. Thanks -vlad > > >> if (unlikely(vlan_tx_tag_present(skb))) { >> if (vlan_tx_tag_get_id(skb)) >> skb->pkt_type = PACKET_OTHERHOST; >> -- >> 1.9.3 >> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 04, 2014 at 03:29:00PM -0400, Vlad Yasevich wrote: > > nack. This will definitelly break several stacked setups. > > Which ones? The only thing I can see that would behave differently > is something like: > > vlan0 bridge0 > | | > +-------- eth0 > > In this case, the old code would give an untagged packet to the bridge > and the new code would give a tagged packet. > > This set-up is a bit ambiguous. Remove the vlan, and bridge gets a tagged > traffic even though the vlan has no relationship to the bridge. > > I've tested a couple of different stacked setups and they all seem to work. 2nd nack. It will break user space, including our setup that has: vlanX OVS | | +------ eth0 vlan device has IP assigned and all tagged traffic goes through the stack and into control plane process. ovs datapath keeps managing eth0 with all other vlans. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 09/04/2014 04:43 PM, Alexei Starovoitov wrote: > On Thu, Sep 04, 2014 at 03:29:00PM -0400, Vlad Yasevich wrote: >>> nack. This will definitelly break several stacked setups. >> >> Which ones? The only thing I can see that would behave differently >> is something like: >> >> vlan0 bridge0 >> | | >> +-------- eth0 >> >> In this case, the old code would give an untagged packet to the bridge >> and the new code would give a tagged packet. >> >> This set-up is a bit ambiguous. Remove the vlan, and bridge gets a tagged >> traffic even though the vlan has no relationship to the bridge. >> >> I've tested a couple of different stacked setups and they all seem to work. > > 2nd nack. > It will break user space, including our setup that has: > vlanX OVS > | | > +------ eth0 > > vlan device has IP assigned and all tagged traffic goes through the stack > and into control plane process. ovs datapath keeps managing eth0 with > all other vlans. > Did you specially configure OVS to pass the traffic up the stack? I see OVS will only pass LOOPBACK packets. All others it seems to consume. Can the same be accomplished with a tagged internal port? The reason I am asking, is I am trying to figure out if this is a valid config. It seems very hard to get right and seems to work almost by accident at times. For example, in the bridge scenario I described. vlan and bridge have to share a mac address for that work. -vlad -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Sep 4, 2014 at 2:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote: > On 09/04/2014 04:43 PM, Alexei Starovoitov wrote: >> On Thu, Sep 04, 2014 at 03:29:00PM -0400, Vlad Yasevich wrote: >>>> nack. This will definitelly break several stacked setups. >>> >>> Which ones? The only thing I can see that would behave differently >>> is something like: >>> >>> vlan0 bridge0 >>> | | >>> +-------- eth0 >>> >>> In this case, the old code would give an untagged packet to the bridge >>> and the new code would give a tagged packet. >>> >>> This set-up is a bit ambiguous. Remove the vlan, and bridge gets a tagged >>> traffic even though the vlan has no relationship to the bridge. >>> >>> I've tested a couple of different stacked setups and they all seem to work. >> >> 2nd nack. >> It will break user space, including our setup that has: >> vlanX OVS >> | | >> +------ eth0 >> >> vlan device has IP assigned and all tagged traffic goes through the stack >> and into control plane process. ovs datapath keeps managing eth0 with >> all other vlans. >> > > Did you specially configure OVS to pass the traffic up the stack? I see > OVS will only pass LOOPBACK packets. All others it seems to consume. > > Can the same be accomplished with a tagged internal port? our ovs config is not using internal port. vlan device is used as control interface and should be independent of ovs datapath. Theoretically it may be possible to use ovs for both, but very dangerous, when control and data are going through the same datapath. Any ovs programming mistake will kill control plane and whole hypervisor will become inaccessible. > The reason I am asking, is I am trying to figure out if this is > a valid config. It seems very hard to get right and seems to work almost > by accident at times. For example, in the bridge scenario I described. > vlan and bridge have to share a mac address for that work. I think it's not valid vs invalid config. this was the behavior of vlan devices for long time. vlan was parsed and send to vlan_dev _before_ rx_handler. I suspect there is more than one user app that is relying on that. I can change our stuff to do something different, but I think we should not be breaking vlan behavior for others. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 09/04/2014 05:54 PM, Alexei Starovoitov wrote: > On Thu, Sep 4, 2014 at 2:01 PM, Vlad Yasevich <vyasevic@redhat.com> wrote: >> On 09/04/2014 04:43 PM, Alexei Starovoitov wrote: >>> On Thu, Sep 04, 2014 at 03:29:00PM -0400, Vlad Yasevich wrote: >>>>> nack. This will definitelly break several stacked setups. >>>> >>>> Which ones? The only thing I can see that would behave differently >>>> is something like: >>>> >>>> vlan0 bridge0 >>>> | | >>>> +-------- eth0 >>>> >>>> In this case, the old code would give an untagged packet to the bridge >>>> and the new code would give a tagged packet. >>>> >>>> This set-up is a bit ambiguous. Remove the vlan, and bridge gets a tagged >>>> traffic even though the vlan has no relationship to the bridge. >>>> >>>> I've tested a couple of different stacked setups and they all seem to work. >>> >>> 2nd nack. >>> It will break user space, including our setup that has: >>> vlanX OVS >>> | | >>> +------ eth0 >>> >>> vlan device has IP assigned and all tagged traffic goes through the stack >>> and into control plane process. ovs datapath keeps managing eth0 with >>> all other vlans. >>> >> >> Did you specially configure OVS to pass the traffic up the stack? I see >> OVS will only pass LOOPBACK packets. All others it seems to consume. >> >> Can the same be accomplished with a tagged internal port? > > our ovs config is not using internal port. vlan device is used as > control interface and should be independent of ovs datapath. > Theoretically it may be possible to use ovs for both, but very dangerous, > when control and data are going through the same datapath. > Any ovs programming mistake will kill control plane and whole > hypervisor will become inaccessible. > >> The reason I am asking, is I am trying to figure out if this is >> a valid config. It seems very hard to get right and seems to work almost >> by accident at times. For example, in the bridge scenario I described. >> vlan and bridge have to share a mac address for that work. > > I think it's not valid vs invalid config. > this was the behavior of vlan devices for long time. vlan was parsed > and send to vlan_dev _before_ rx_handler. I suspect there is more > than one user app that is relying on that. > I can change our stuff to do something different, but I think we > should not be breaking vlan behavior for others. > I see. So vlan device always appears to take precedence over the rx_handler if they are at the same level and we can't break this. OK, this means that to solve this we have to expose the vlan filtering API on macvtap devices as well. Thanks -vlad -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/dev.c b/net/core/dev.c index ab9a165..54691d1 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3642,17 +3642,6 @@ ncls: if (pfmemalloc && !skb_pfmemalloc_protocol(skb)) goto drop; - if (vlan_tx_tag_present(skb)) { - if (pt_prev) { - ret = deliver_skb(skb, pt_prev, orig_dev); - pt_prev = NULL; - } - if (vlan_do_receive(&skb)) - goto another_round; - else if (unlikely(!skb)) - goto unlock; - } - rx_handler = rcu_dereference(skb->dev->rx_handler); if (rx_handler) { if (pt_prev) { @@ -3674,6 +3663,17 @@ ncls: } } + if (vlan_tx_tag_present(skb)) { + if (pt_prev) { + ret = deliver_skb(skb, pt_prev, orig_dev); + pt_prev = NULL; + } + if (vlan_do_receive(&skb)) + goto another_round; + else if (unlikely(!skb)) + goto unlock; + } + if (unlikely(vlan_tx_tag_present(skb))) { if (vlan_tx_tag_get_id(skb)) skb->pkt_type = PACKET_OTHERHOST;
Currently, we attempt to remove the vlan informaion from the packet before passing it to the rx_handler. In most situations this works just fine since the rx_handlers are usually installed for the lower device and thus vlan device isn't found. However, macvtap device is a bit different as it installs an rx_handler on top of a macvlan device. As a result, if someone was define a vlan device on top of a macvap (for the purposes of enabling a VM to use vlans with macvtap), then the current code will result in passing an untagged packet to the macvtap rx_handler and the VM will not receive tagged traffic. This patch moves the untaggin code to after the rx_handler for the current device has been called. This still works for the existing rx_handlers (like bonding/teaming/bridging/etc) and also makes vlans on top of macvtap work as before. Fixes: 6acf54f1cf (macvtap: Add support of packet capture on macvtap device) Reported-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> --- net/core/dev.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)