Message ID | 20160713073152.GC29661@penelope.isobedori.kobe.vergenet.net |
---|---|
State | Not Applicable |
Headers | show |
On Wed, Jul 13, 2016 at 12:31 AM, Simon Horman <simon.horman@netronome.com> wrote: > Hi Pravin, > > On Thu, Jul 07, 2016 at 01:54:15PM -0700, pravin shelar wrote: >> On Wed, Jul 6, 2016 at 10:59 AM, Simon Horman >> <simon.horman@netronome.com> wrote: > > ... > >> > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c >> > index 0ea128eeeab2..86f2cfb19de3 100644 >> > --- a/net/openvswitch/flow.c >> > +++ b/net/openvswitch/flow.c >> ... >> >> > @@ -723,9 +729,17 @@ int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info, >> > key->phy.skb_mark = skb->mark; >> > ovs_ct_fill_key(skb, key); >> > key->ovs_flow_hash = 0; >> > + key->phy.is_layer3 = skb->mac_len == 0; >> >> I do not think mac_len can be used. mac_header needs to be checked. >> ... > > Yes, indeed. The update to use skb_mac_header_was_set() here accidently > slipped into the following patch, sorry about that. > > With that change in place I believe that this patch is internally > consistent because mac_header and mac_len are set correctly by the > call to key_extract() which is called by ovs_flow_key_extract() just > after where the excerpt above ends. > > That said, I do think that it is possible to rely on skb_mac_header_was_set > throughout the datapath, including action processing etc... I have provided > an incremental patch - which I created on top of this entire series - at > the end of this email. If you prefer that approach I am happy to take it, > though I do feel that using mac_len leads to slightly cleaner code. Let me > know what you think. > I am not sure if you can use only mac_len to detect L3 packet. This does not work with MPLS packets, mac_len is used to account MPLS headers pushed on skb. Therefore in case of a MPLS header on L3 packet, mac_len would be non zero and we have to look at either mac_header or some other metadata like is_layer3 flag from key to check for L3 packet. >> > diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c >> > index 4e3972344aa6..733e7914f6bd 100644 >> > --- a/net/openvswitch/vport-netdev.c >> > +++ b/net/openvswitch/vport-netdev.c >> > @@ -57,8 +57,10 @@ static void netdev_port_receive(struct sk_buff *skb) >> > if (unlikely(!skb)) >> > return; >> > >> > - skb_push(skb, ETH_HLEN); >> > - skb_postpush_rcsum(skb, skb->data, ETH_HLEN); >> > + if (vport->dev->type == ARPHRD_ETHER) { >> > + skb_push(skb, ETH_HLEN); >> > + skb_postpush_rcsum(skb, skb->data, ETH_HLEN); >> > + } >> This is still required for tunnel device of ARPHRD_NONE which can >> handle l2 packets. > > That is not necessary given the current implementation (of ipgre) as it > supplies an skb with the mac header in place if the inner packet was an > Ethernet packet. This scheme could of course be adjusted. > > ... > I think we should send L2 header with l2 header pushed on skb. This is what OVS expect. The skb-push should be done for all l2 packets rather than for particular type of device. > > > Update to use skb_mac_header_was_set() more as mentioned above. > Please let me know what you think about this approach. > > include/net/mpls.h | 4 ++- > net/openvswitch/actions.c | 42 ++++++++++++++++++++--------------- > net/openvswitch/flow.c | 23 +++++++++++-------- > net/openvswitch/vport-internal_dev.c | 2 - > net/openvswitch/vport-netdev.c | 4 +-- > 5 files changed, 44 insertions(+), 31 deletions(-) > > diff --git a/include/net/mpls.h b/include/net/mpls.h > index 5b3b5addfb08..296b68661be0 100644 > --- a/include/net/mpls.h > +++ b/include/net/mpls.h > @@ -34,6 +34,8 @@ static inline bool eth_p_mpls(__be16 eth_type) > */ > static inline unsigned char *skb_mpls_header(struct sk_buff *skb) > { > - return skb_mac_header(skb) + skb->mac_len; > + return skb_mac_header_was_set(skb) ? > + skb_mac_header(skb) + skb->mac_len : > + skb->data; > } This function is also called from GSO layer. issue is in GSO layer, it does reset mac header and mac length and then calls mpls-gso-handler. So all subsequent check for L3 packet fails. So far we have explored three different ways to detect L3 packet but each has its own issue. 1. skb mac header : GSO can reset mac header. 2. skb mac length : MPLS uses mac_len to account for MPLS header length along with L2 header 3. skb protocol: ETH_P_TEB is not set for all L2 frames, networking stack is not ready to handle this type for given skb. So none of them works consistently. I think the only option to detect L3 packet reliably (and without adding field to skb) is to use skb-protocol along with ARPHRD_NONE device type. If ARPHRD_NONE type device generates L2 packet it needs to set protocol to ETH_P_TEB. Some networking stack function also needs to be fixed to handle this protocol type, e.g. vlan_get_protocol(), br_dev_queue_push_xmit(), etc.
Reviving a very old thread, sorry. Simon handed this over to me, I'm preparing v12. On Fri, 15 Jul 2016 14:07:37 -0700, pravin shelar wrote: > I am not sure if you can use only mac_len to detect L3 packet. This > does not work with MPLS packets, mac_len is used to account MPLS > headers pushed on skb. Therefore in case of a MPLS header on L3 > packet, mac_len would be non zero and we have to look at either > mac_header or some other metadata like is_layer3 flag from key to > check for L3 packet. I went through the relevant code paths and I don't see any problem in using mac_len for that. MPLS GSO seems to work correctly. The kernel MPLS code expects mac_len to be just the L2 header len, excluding MPLS. The same is the case for openvswitch (you're not correct that "mac_len is used to account MPLS headers pushed on skb", at least not with the current code). In no place I see any problem with mac_len being 0, the calculations just nicely work. What was your concern with that, Pravin? In another mail in this thread you mentioned skb_mpls_header. That one works correctly with mac_len == 0 if mac_header points to the beginning of the packet. You also wrote: > I was thinking in overall networking stack rather than just ovs > datapath. I think we should have consistent method of detecting L3 > packet. As commented in previous mail it could be achieved using > skb-protocol and device type. Again, mac_len == 0 works correctly and consistently, provided that both mac_header and network_header point to the same place. In case of a MPLS packet it would be the beginning of MPLS headers. > > --- a/include/net/mpls.h > > +++ b/include/net/mpls.h > > @@ -34,6 +34,8 @@ static inline bool eth_p_mpls(__be16 eth_type) > > */ > > static inline unsigned char *skb_mpls_header(struct sk_buff *skb) > > { > > - return skb_mac_header(skb) + skb->mac_len; > > + return skb_mac_header_was_set(skb) ? > > + skb_mac_header(skb) + skb->mac_len : > > + skb->data; > > } > > This function is also called from GSO layer. I don't see it used anywhere outside of openvswitch. Not even when grepping git history. I may be missing something, though. > issue is in GSO layer, it > does reset mac header and mac length and then calls mpls-gso-handler. > So all subsequent check for L3 packet fails. > So far we have explored three different ways to detect L3 packet but > each has its own issue. > 1. skb mac header : GSO can reset mac header. > 2. skb mac length : MPLS uses mac_len to account for MPLS header > length along with L2 header It does not appear to be the case. Or at least not anymore. > 3. skb protocol: ETH_P_TEB is not set for all L2 frames, networking > stack is not ready to handle this type for given skb. > > So none of them works consistently. I think the only option to detect > L3 packet reliably (and without adding field to skb) is to use > skb-protocol along with ARPHRD_NONE device type. If ARPHRD_NONE type > device generates L2 packet it needs to set protocol to ETH_P_TEB. Some > networking stack function also needs to be fixed to handle this > protocol type, e.g. vlan_get_protocol(), br_dev_queue_push_xmit(), > etc. All of this said, I'm not opposed to using the skb_eth_header_present helper and checking the device type, it works. I just want to understand whether I missed some problem with mac_len. Seems to make some things simpler if we could use mac_len. Thanks, Jiri
On Mon, Sep 26, 2016 at 9:53 AM, Jiri Benc <jbenc@redhat.com> wrote: > Reviving a very old thread, sorry. Simon handed this over to me, I'm > preparing v12. > > On Fri, 15 Jul 2016 14:07:37 -0700, pravin shelar wrote: >> I am not sure if you can use only mac_len to detect L3 packet. This >> does not work with MPLS packets, mac_len is used to account MPLS >> headers pushed on skb. Therefore in case of a MPLS header on L3 >> packet, mac_len would be non zero and we have to look at either >> mac_header or some other metadata like is_layer3 flag from key to >> check for L3 packet. > > I went through the relevant code paths and I don't see any problem in > using mac_len for that. MPLS GSO seems to work correctly. The kernel > MPLS code expects mac_len to be just the L2 header len, excluding MPLS. > The same is the case for openvswitch (you're not correct that "mac_len > is used to account MPLS headers pushed on skb", at least not with the > current code). In no place I see any problem with mac_len being 0, the > calculations just nicely work. > > What was your concern with that, Pravin? > > In another mail in this thread you mentioned skb_mpls_header. That one > works correctly with mac_len == 0 if mac_header points to the beginning > of the packet. > > You also wrote: > >> I was thinking in overall networking stack rather than just ovs >> datapath. I think we should have consistent method of detecting L3 >> packet. As commented in previous mail it could be achieved using >> skb-protocol and device type. > > Again, mac_len == 0 works correctly and consistently, provided that > both mac_header and network_header point to the same place. In case of > a MPLS packet it would be the beginning of MPLS headers. > >> > --- a/include/net/mpls.h >> > +++ b/include/net/mpls.h >> > @@ -34,6 +34,8 @@ static inline bool eth_p_mpls(__be16 eth_type) >> > */ >> > static inline unsigned char *skb_mpls_header(struct sk_buff *skb) >> > { >> > - return skb_mac_header(skb) + skb->mac_len; >> > + return skb_mac_header_was_set(skb) ? >> > + skb_mac_header(skb) + skb->mac_len : >> > + skb->data; >> > } >> >> This function is also called from GSO layer. > > I don't see it used anywhere outside of openvswitch. Not even when > grepping git history. I may be missing something, though. > >> issue is in GSO layer, it >> does reset mac header and mac length and then calls mpls-gso-handler. >> So all subsequent check for L3 packet fails. >> So far we have explored three different ways to detect L3 packet but >> each has its own issue. >> 1. skb mac header : GSO can reset mac header. >> 2. skb mac length : MPLS uses mac_len to account for MPLS header >> length along with L2 header > > It does not appear to be the case. Or at least not anymore. > >> 3. skb protocol: ETH_P_TEB is not set for all L2 frames, networking >> stack is not ready to handle this type for given skb. >> >> So none of them works consistently. I think the only option to detect >> L3 packet reliably (and without adding field to skb) is to use >> skb-protocol along with ARPHRD_NONE device type. If ARPHRD_NONE type >> device generates L2 packet it needs to set protocol to ETH_P_TEB. Some >> networking stack function also needs to be fixed to handle this >> protocol type, e.g. vlan_get_protocol(), br_dev_queue_push_xmit(), >> etc. > > All of this said, I'm not opposed to using the skb_eth_header_present > helper and checking the device type, it works. I just want to understand > whether I missed some problem with mac_len. Seems to make some things > simpler if we could use mac_len. > After commit 48d2ab609b6bb ("net: mpls: Fixups for GSO") MPLS does not need to use skb mac-len to track the header, so using mac-len test for L3 packet detection would result in better and cleaner solution.
diff --git a/include/net/mpls.h b/include/net/mpls.h index 5b3b5addfb08..296b68661be0 100644 --- a/include/net/mpls.h +++ b/include/net/mpls.h @@ -34,6 +34,8 @@ static inline bool eth_p_mpls(__be16 eth_type) */ static inline unsigned char *skb_mpls_header(struct sk_buff *skb) { - return skb_mac_header(skb) + skb->mac_len; + return skb_mac_header_was_set(skb) ? + skb_mac_header(skb) + skb->mac_len : + skb->data; } #endif diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c index 0001f651c934..a18feccb2baa 100644 --- a/net/openvswitch/actions.c +++ b/net/openvswitch/actions.c @@ -163,18 +163,20 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key, return -ENOMEM; skb_push(skb, MPLS_HLEN); - skb_reset_mac_header(skb); new_mpls_lse = (__be32 *)skb_mpls_header(skb); *new_mpls_lse = mpls->mpls_lse; skb_postpush_rcsum(skb, new_mpls_lse, MPLS_HLEN); - if (skb->mac_len) { + if (skb_mac_header_was_set(skb)) { + skb_reset_mac_header(skb); + update_ethertype(skb, eth_hdr(skb), mpls->mpls_ethertype); memmove(skb_mac_header(skb) - MPLS_HLEN, skb_mac_header(skb), skb->mac_len); } + if (!skb->inner_protocol) skb_set_inner_protocol(skb, skb->protocol); skb->protocol = mpls->mpls_ethertype; @@ -186,22 +188,18 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key, static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key, const __be16 ethertype) { - int err; - - err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN); - if (unlikely(err)) - return err; - - skb_postpull_rcsum(skb, skb_mpls_header(skb), MPLS_HLEN); + if (skb_mac_header_was_set(skb)) { + struct ethhdr *hdr; + int err; - memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb), - skb->mac_len); + skb_postpull_rcsum(skb, skb_mpls_header(skb), MPLS_HLEN); - __skb_pull(skb, MPLS_HLEN); - skb_reset_mac_header(skb); + err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN); + if (unlikely(err)) + return err; - if (skb->mac_len) { - struct ethhdr *hdr; + memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb), + skb->mac_len); /* skb_mpls_header() is used to locate the ethertype * field correctly in the presence of VLAN tags. @@ -210,6 +208,11 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key, update_ethertype(skb, hdr, ethertype); } + __skb_pull(skb, MPLS_HLEN); + + if (skb_mac_header_was_set(skb)) + skb_reset_mac_header(skb); + if (eth_p_mpls(skb->protocol)) skb->protocol = ethertype; @@ -220,11 +223,14 @@ static int pop_mpls(struct sk_buff *skb, struct sw_flow_key *key, static int set_mpls(struct sk_buff *skb, struct sw_flow_key *flow_key, const __be32 *mpls_lse, const __be32 *mask) { + __u16 mac_len; __be32 *stack; __be32 lse; int err; - err = skb_ensure_writable(skb, skb->mac_len + MPLS_HLEN); + mac_len = skb_mac_header_was_set(skb) ? skb->mac_len : 0; + + err = skb_ensure_writable(skb, mac_len + MPLS_HLEN); if (unlikely(err)) return err; @@ -307,7 +313,7 @@ static int set_eth_addr(struct sk_buff *skb, struct sw_flow_key *flow_key, static int pop_eth(struct sk_buff *skb, struct sw_flow_key *key) { skb_pull_rcsum(skb, ETH_HLEN); - skb_reset_mac_header(skb); + skb_unset_mac_header(skb); skb->mac_len -= ETH_HLEN; invalidate_flow_key(key); @@ -325,7 +331,7 @@ static int push_eth(struct sk_buff *skb, struct sw_flow_key *key, skb_push(skb, ETH_HLEN); skb_reset_mac_header(skb); - skb->mac_len += ETH_HLEN; + skb->mac_len = ETH_HLEN; hdr = eth_hdr(skb); ether_addr_copy(hdr->h_source, ethh->addresses.eth_src); diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index 42587d5bf894..837ea4f9a71d 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -465,17 +465,18 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key) /* Flags are always used as part of stats */ key->tp.flags = 0; - skb_reset_mac_header(skb); - /* Link layer. */ key->eth.tci = 0; if (key->phy.is_layer3) { if (skb_vlan_tag_present(skb)) key->eth.tci = htons(skb->vlan_tci); key->eth.type = skb->protocol; + skb_reset_network_header(skb); } else { struct ethhdr *eth = eth_hdr(skb); + skb_reset_mac_header(skb); + ether_addr_copy(key->eth.src, eth->h_source); ether_addr_copy(key->eth.dst, eth->h_dest); @@ -493,11 +494,11 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key) key->eth.type = parse_ethertype(skb); if (unlikely(key->eth.type == htons(0))) return -ENOMEM; - } - skb_reset_network_header(skb); - skb_reset_mac_len(skb); - __skb_push(skb, skb->data - skb_mac_header(skb)); + skb_reset_network_header(skb); + skb_reset_mac_len(skb); + __skb_push(skb, skb->data - skb_mac_header(skb)); + } /* Network layer. */ if (key->eth.type == htons(ETH_P_IP)) { @@ -608,12 +609,16 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key) * header and the beginning of the L3 header differ. * * Advance network_header to the beginning of the L3 - * header. mac_len corresponds to the end of the L2 header. + * header. For packets with an L2 header mac_len corresponds + * to the end of the L2 header. */ while (1) { + __u16 mac_len; __be32 lse; - error = check_header(skb, skb->mac_len + stack_len); + mac_len = key->phy.is_layer3 ? 0 : skb->mac_len; + + error = check_header(skb, mac_len + stack_len); if (unlikely(error)) return 0; @@ -622,7 +627,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key) if (stack_len == MPLS_HLEN) memcpy(&key->mpls.top_lse, &lse, MPLS_HLEN); - skb_set_network_header(skb, skb->mac_len + stack_len); + skb_set_network_header(skb, mac_len + stack_len); if (lse & htonl(MPLS_LS_S_MASK)) break; diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c index 5ad184bd5802..a5ea0bcd310c 100644 --- a/net/openvswitch/vport-internal_dev.c +++ b/net/openvswitch/vport-internal_dev.c @@ -255,7 +255,7 @@ static netdev_tx_t internal_dev_recv(struct sk_buff *skb) struct pcpu_sw_netstats *stats; /* Only send/receive L2 packets */ - if (!skb->mac_len) { + if (!skb_mac_header_was_set(skb)) { kfree_skb(skb); return -EINVAL; } diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c index 7d54414b35eb..05985209f611 100644 --- a/net/openvswitch/vport-netdev.c +++ b/net/openvswitch/vport-netdev.c @@ -201,9 +201,9 @@ int ovs_netdev_send(struct sk_buff *skb) { struct net_device *dev = skb->dev; - if (dev->type != ARPHRD_ETHER && skb->mac_len) { + if (dev->type != ARPHRD_ETHER && skb_mac_header_was_set(skb)) { skb->protocol = htons(ETH_P_TEB); - } else if (dev->type == ARPHRD_ETHER && !skb->mac_len) { + } else if (dev->type == ARPHRD_ETHER && !skb_mac_header_was_set(skb)) { kfree_skb(skb); return -EINVAL; }