Message ID | 200911241002.20904.arnd@arndb.de |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Arnd Bergmann wrote:
> On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
>>> + skb_dst_drop(skb);
>>> + skb->tstamp.tv64 = 0;
>>> + skb->pkt_type = PACKET_HOST;
>>> + skb->protocol = eth_type_trans(skb, dev);
>>> + skb->mark = 0;
>> skb->mark clearing should stay private to veth since it's usually
>> supposed to stay intact. The only exception is packets crossing
>> namespaces, where they should appear like freshly received skbs.
>
> But isn't that what we want in macvlan as well when we're
> forwarding from one downstream interface to another?

In the TX direction you can use the mark for TC classification
on the underlying device.

> I did all my testing with macvlan interfaces in separate namespaces
> communicating with each other, so I'd assume that we should always
> clear skb->mark and skb->dst in this function.

Good point, in that case we probably should clear it as well. But
in the non-namespace case the TC classification currently works and
this is consistent with any other virtual device driver, so it
should continue to work.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tuesday 24 November 2009 10:17:11 Patrick McHardy wrote:
> Arnd Bergmann wrote:
> > On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
> >>> + skb_dst_drop(skb);
> >>> + skb->tstamp.tv64 = 0;
> >>> + skb->pkt_type = PACKET_HOST;
> >>> + skb->protocol = eth_type_trans(skb, dev);
> >>> + skb->mark = 0;
> >> skb->mark clearing should stay private to veth since it's usually
> >> supposed to stay intact. The only exception is packets crossing
> >> namespaces, where they should appear like freshly received skbs.
> >
> > But isn't that what we want in macvlan as well when we're
> > forwarding from one downstream interface to another?
>
> In the TX direction you can use the mark for TC classification
> on the underlying device.

I don't use dev_forward_skb for the case where the data is sent
to the underlying device, so the TC classification should stay
intact.

> > I did all my testing with macvlan interfaces in separate namespaces
> > communicating with each other, so I'd assume that we should always
> > clear skb->mark and skb->dst in this function.
>
> Good point, in that case we probably should clear it as well. But
> in the non-namespace case the TC classification currently works and
> this is consistent with any other virtual device driver, so it
> should continue to work.

Do you think we should be able to use TC to direct traffic between
macvlans on the same underlying device in bridge mode? It does sound
useful, but I'm not sure how to implement that or if you'd expect
it to work with the current code. If we support that, it should probably
also work with namespaces, by consuming the mark in the macvlan
and veth drivers.

	Arnd <><
Arnd Bergmann wrote:
> On Tuesday 24 November 2009 10:17:11 Patrick McHardy wrote:
>> Arnd Bergmann wrote:
>>> On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
>>>>> + skb_dst_drop(skb);
>>>>> + skb->tstamp.tv64 = 0;
>>>>> + skb->pkt_type = PACKET_HOST;
>>>>> + skb->protocol = eth_type_trans(skb, dev);
>>>>> + skb->mark = 0;
>>>> skb->mark clearing should stay private to veth since it's usually
>>>> supposed to stay intact. The only exception is packets crossing
>>>> namespaces, where they should appear like freshly received skbs.
>>> But isn't that what we want in macvlan as well when we're
>>> forwarding from one downstream interface to another?
>> In the TX direction you can use the mark for TC classification
>> on the underlying device.
>
> I don't use dev_forward_skb for the case where the data is sent
> to the underlying device, so the TC classification should stay
> intact.

Right, I see. This looks fine.

>>> I did all my testing with macvlan interfaces in separate namespaces
>>> communicating with each other, so I'd assume that we should always
>>> clear skb->mark and skb->dst in this function.
>> Good point, in that case we probably should clear it as well. But
>> in the non-namespace case the TC classification currently works and
>> this is consistent with any other virtual device driver, so it
>> should continue to work.
>
> Do you think we should be able to use TC to direct traffic between
> macvlans on the same underlying device in bridge mode? It does sound
> useful, but I'm not sure how to implement that or if you'd expect
> it to work with the current code. If we support that, it should probably
> also work with namespaces, by consuming the mark in the macvlan
> and veth drivers.

I don't think it's necessary, we bypass outgoing queuing anyways.
But if you'd want to add it, just keeping the skb->mark clearing
in veth should work from what I can tell.
On Tuesday 24 November 2009, Patrick McHardy wrote:
> I don't think it's necessary, we bypass outgoing queuing anyways.
> But if you'd want to add it, just keeping the skb->mark clearing
> in veth should work from what I can tell.

Ok, I won't bother with it for now then.

	Arnd <><
Patrick McHardy <kaber@trash.net> writes:

>>>> I did all my testing with macvlan interfaces in separate namespaces
>>>> communicating with each other, so I'd assume that we should always
>>>> clear skb->mark and skb->dst in this function.
>>> Good point, in that case we probably should clear it as well. But
>>> in the non-namespace case the TC classification currently works and
>>> this is consistent with any other virtual device driver, so it
>>> should continue to work.
>>
>> Do you think we should be able to use TC to direct traffic between
>> macvlans on the same underlying device in bridge mode? It does sound
>> useful, but I'm not sure how to implement that or if you'd expect
>> it to work with the current code. If we support that, it should probably
>> also work with namespaces, by consuming the mark in the macvlan
>> and veth drivers.
>
> I don't think it's necessary, we bypass outgoing queuing anyways.
> But if you'd want to add it, just keeping the skb->mark clearing
> in veth should work from what I can tell.

veth doesn't have an outgoing queue. The reason we clear skb->mark
in veth is because when reentering the networking stack the packet
needs to be reclassified. At the point of loopback we are talking
about a packet that has at least logically gone out of the machine
on a wire and come back into the machine on another physical interface.

So it seems to me we should have consistent handling for macvlans,
veth, for the cases where we are looping packets back around. In
practice I expect all of those cases are going to be cross namespace
as otherwise we would have intercepted the packet before going
out a physical interface.

Eric
Eric W. Biederman wrote:
> Patrick McHardy <kaber@trash.net> writes:
>
>>>>> I did all my testing with macvlan interfaces in separate namespaces
>>>>> communicating with each other, so I'd assume that we should always
>>>>> clear skb->mark and skb->dst in this function.
>>>> Good point, in that case we probably should clear it as well. But
>>>> in the non-namespace case the TC classification currently works and
>>>> this is consistent with any other virtual device driver, so it
>>>> should continue to work.
>>> Do you think we should be able to use TC to direct traffic between
>>> macvlans on the same underlying device in bridge mode? It does sound
>>> useful, but I'm not sure how to implement that or if you'd expect
>>> it to work with the current code. If we support that, it should probably
>>> also work with namespaces, by consuming the mark in the macvlan
>>> and veth drivers.
>> I don't think it's necessary, we bypass outgoing queuing anyways.
>> But if you'd want to add it, just keeping the skb->mark clearing
>> in veth should work from what I can tell.
>
> veth doesn't have an outgoing queue. The reason we clear skb->mark
> in veth is because when reentering the networking stack the packet
> needs to be reclassified. At the point of loopback we are talking
> about a packet that has at least logically gone out of the machine
> on a wire and come back into the machine on another physical interface.
>
> So it seems to me we should have consistent handling for macvlans,
> veth, for the cases where we are looping packets back around. In
> practice I expect all of those cases are going to be cross namespace
> as otherwise we would have intercepted the packet before going
> out a physical interface.

Agreed on the looping case, that's what we're doing now.

In the layered case (macvlan -> eth0) it's common behaviour to
keep the mark however. But in case of different namespaces,
I think macvlan should also clear the mark on the dev_queue_xmit()
path since this is just a shortcut to looping the packets
through veth. In fact probably both of them should also clear
skb->priority so other namespaces don't accidentally misclassify
packets.
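The rule proposed in this message (keep the mark in the layered, same-namespace case; clear skb->mark and skb->priority when the packet crosses into another namespace) can be sketched in userspace C. This is only an illustration under stated assumptions: struct mock_net, struct mock_dev, struct mock_skb and mock_scrub_if_crossing() are hypothetical stand-ins invented for the example, not kernel types or functions, and nothing in the thread says the final patch is structured this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a network namespace, a net_device and an skb.
 * Only the fields discussed in the thread are modelled. */
struct mock_net { int id; };
struct mock_dev { const struct mock_net *net; };
struct mock_skb {
	uint32_t mark;     /* used for TC classification on the lower device */
	uint32_t priority; /* also influences classification */
};

/*
 * Clear classification state only when sender and receiver live in
 * different namespaces. Returns true if the fields were cleared.
 */
bool mock_scrub_if_crossing(struct mock_skb *skb,
			    const struct mock_dev *from,
			    const struct mock_dev *to)
{
	assert(skb != NULL && from != NULL && to != NULL);

	if (from->net == to->net)
		return false; /* layered case: keep the mark intact */

	skb->mark = 0;
	skb->priority = 0;
	return true;
}
```

With two devices in the same mock namespace the mark and priority survive; with devices in different mock namespaces both fields come back as zero, so the receiving side has to classify the packet afresh.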
Patrick McHardy <kaber@trash.net> writes:

> Eric W. Biederman wrote:
>> Patrick McHardy <kaber@trash.net> writes:
>>
>>>>>> I did all my testing with macvlan interfaces in separate namespaces
>>>>>> communicating with each other, so I'd assume that we should always
>>>>>> clear skb->mark and skb->dst in this function.
>>>>> Good point, in that case we probably should clear it as well. But
>>>>> in the non-namespace case the TC classification currently works and
>>>>> this is consistent with any other virtual device driver, so it
>>>>> should continue to work.
>>>> Do you think we should be able to use TC to direct traffic between
>>>> macvlans on the same underlying device in bridge mode? It does sound
>>>> useful, but I'm not sure how to implement that or if you'd expect
>>>> it to work with the current code. If we support that, it should probably
>>>> also work with namespaces, by consuming the mark in the macvlan
>>>> and veth drivers.
>>> I don't think it's necessary, we bypass outgoing queuing anyways.
>>> But if you'd want to add it, just keeping the skb->mark clearing
>>> in veth should work from what I can tell.
>>
>> veth doesn't have an outgoing queue. The reason we clear skb->mark
>> in veth is because when reentering the networking stack the packet
>> needs to be reclassified. At the point of loopback we are talking
>> about a packet that has at least logically gone out of the machine
>> on a wire and come back into the machine on another physical interface.
>>
>> So it seems to me we should have consistent handling for macvlans,
>> veth, for the cases where we are looping packets back around. In
>> practice I expect all of those cases are going to be cross namespace
>> as otherwise we would have intercepted the packet before going
>> out a physical interface.
>
> Agreed on the looping case, that's what we're doing now.
>
> In the layered case (macvlan -> eth0) it's common behaviour to
> keep the mark however. But in case of different namespaces,
> I think macvlan should also clear the mark on the dev_queue_xmit()
> path since this is just a shortcut to looping the packets
> through veth. In fact probably both of them should also clear
> skb->priority so other namespaces don't accidentally misclassify
> packets.

That is why I pushed for what is becoming dev_forward_skb. So that
we have one place where we can make all of those tweaks. It seems
like in every review we find another field that should be
cleared/handled specially.

I don't quite follow what you intend with dev_queue_xmit when the macvlan
is in one namespace and the real physical device is in another. Are
you mentioning that the packet classifier runs in the namespace where
the primary device lives with packets from a different namespace?

Eric
On Tuesday 24 November 2009, Eric W. Biederman wrote:
> I don't quite follow what you intend with dev_queue_xmit when the macvlan
> is in one namespace and the real physical device is in another. Are
> you mentioning that the packet classifier runs in the namespace where
> the primary device lives with packets from a different namespace?

I treat internal and external delivery very differently, the three cases
are:

1. skb from real device to macvlan (macvlan_handle_frame):
   basically unchanged from before, except avoiding duplicate broadcasts.
   All skbs end up in netif_rx(vlan->dev) without clearing any data.
   We catch the frame in netif_receive_skb before it interacts with
   the namespace of the real device.

2. skb to external device (macvlan_start_xmit):
   if the destination is external, we just end up in dev_queue_xmit,
   with skb->dev set to the external device but no other changes.
   The data is already on the way out at this stage, so the namespace
   should not matter any more.

3. internal delivery:
   an skb from one macvlan to another always gets sent through
   dev_forward_skb, which is supposed to clear anything that must
   not leave the namespace.

Does this make sense?

	Arnd <><
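The three delivery cases listed above can be summarised as a small decision function. The following is a rough userspace sketch, not the macvlan driver code: enum mock_path, struct mock_frame, mock_pick_path() and mock_path_clears_state() are hypothetical names made up for illustration, and the two boolean inputs are a simplification of whatever checks the real driver performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three delivery paths named in the message. */
enum mock_path {
	MOCK_NETIF_RX,        /* case 1: real device -> macvlan */
	MOCK_DEV_QUEUE_XMIT,  /* case 2: macvlan -> external destination */
	MOCK_DEV_FORWARD_SKB  /* case 3: macvlan -> sibling macvlan */
};

/* Simplified description of a frame seen by the (mock) driver. */
struct mock_frame {
	bool from_real_dev;   /* arrived on the underlying device */
	bool dest_is_macvlan; /* destination is another macvlan on that device */
};

/* Pick the delivery path for a frame, following the three cases above. */
enum mock_path mock_pick_path(const struct mock_frame *f)
{
	assert(f != NULL);

	if (f->from_real_dev)
		return MOCK_NETIF_RX;
	if (f->dest_is_macvlan)
		return MOCK_DEV_FORWARD_SKB;
	return MOCK_DEV_QUEUE_XMIT;
}

/* Only internal delivery is described as clearing per-namespace state. */
bool mock_path_clears_state(enum mock_path p)
{
	return p == MOCK_DEV_FORWARD_SKB;
}
```

The point of the sketch is that only the internal path goes through the function that resets skb state; the receive and external transmit paths leave the skb untouched, which is why TC classification on the underlying device keeps working.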
Eric W. Biederman wrote:
> Patrick McHardy <kaber@trash.net> writes:
>
>> In the layered case (macvlan -> eth0) it's common behaviour to
>> keep the mark however. But in case of different namespaces,
>> I think macvlan should also clear the mark on the dev_queue_xmit()
>> path since this is just a shortcut to looping the packets
>> through veth. In fact probably both of them should also clear
>> skb->priority so other namespaces don't accidentally misclassify
>> packets.
>
> That is why I pushed for what is becoming dev_forward_skb. So that
> we have one place where we can make all of those tweaks. It seems
> like in every review we find another field that should be
> cleared/handled specially.
>
> I don't quite follow what you intend with dev_queue_xmit when the macvlan
> is in one namespace and the real physical device is in another. Are
> you mentioning that the packet classifier runs in the namespace where
> the primary device lives with packets from a different namespace?

Exactly. And I think we should make sure that the namespace of
the macvlan device can't (deliberately or accidentally) cause
misclassification.
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1433,6 +1433,10 @@ static inline void net_timestamp(struct sk_buff *skb)
  * dev_forward_skb can be used for injecting an skb from the
  * start_xmit function of one device into the receive queue
  * of another device.
+ *
+ * The receiving device may be in another namespace, so
+ * we have to clear all information in the skb that could
+ * impact namespace isolation.
  */
 int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 {
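To make the new comment concrete, here is a userspace sketch of the kind of reset the hunk quoted earlier in the thread performs (drop the dst, zero the timestamp, set the packet type to host, clear the mark). It is an assumption-laden model, not the kernel function: struct fwd_skb, the FWD_PACKET_* constants and fwd_reset() are hypothetical, the dst is modelled as a plain pointer rather than a reference-counted entry, and the eth_type_trans() step that sets skb->protocol is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet-type values for the mock; not taken from kernel headers. */
enum { FWD_PACKET_HOST = 0, FWD_PACKET_OTHERHOST = 3 };

/* Hypothetical stand-in for the skb fields touched by the quoted hunk. */
struct fwd_skb {
	const void *dst;  /* cached routing decision from the sending side */
	int64_t tstamp;   /* receive timestamp */
	int pkt_type;     /* how the receiver should treat the frame */
	uint32_t mark;    /* classification mark set by the sending side */
};

/*
 * Reset sender-side state so the frame looks freshly received.
 * The real code would also recompute the protocol from the Ethernet
 * header; that step is omitted here.
 */
void fwd_reset(struct fwd_skb *skb)
{
	assert(skb != NULL);

	skb->dst = NULL;                 /* models skb_dst_drop() */
	skb->tstamp = 0;                 /* models skb->tstamp.tv64 = 0 */
	skb->pkt_type = FWD_PACKET_HOST; /* models skb->pkt_type = PACKET_HOST */
	skb->mark = 0;                   /* models skb->mark = 0 */
}
```

Keeping the resets in one helper mirrors the argument made in the thread for dev_forward_skb: when review turns up another field that should not cross a namespace boundary (skb->priority was suggested), there is a single place to add it.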