Message ID | 1328545627.2220.72.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 06 Feb 2012 17:27:07 +0100 > Shlomo Pongratz reported GRO L2 header check was suited for Ethernet > only, and failed on IB/ipoib traffic. > > He provided a patch faking a zeroed header to let GRO aggregates frames. > > Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header > check to be more generic. > > This patch introduces a new netdevice field, gro_mac_header_len, giving > L2 header length, default to ETH_HLEN (14 bytes) > > A device setup function can override this default value. > > __napi_gro_receive() has special handling for the common case (Ethernet) > to avoid a memcmp() call and use an inline optimized function instead. > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > Reported-by: Shlomo Pongratz <shlomop@mellanox.com> We really need an explanation, probably both in the commit message and the comments next to this new struct member, explaining why in the world we can't use ->hard_header_len for this. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2/6/2012 6:31 PM, David Miller wrote: > We really need an explanation, probably both in the commit message and > the comments next to this new struct member, explaining why in the > world we can't use ->hard_header_len for this. Dave, As I wrote earlier on this thread, the reason is that in order to be able to xmit skb's for which the stack doesn't provide a neighbour, IPoIB makes sure to have extra INFINIBAND_ALEN bytes headroom on skbs. Such that the ipoib hard_header functions writes a "ipoib_pseudoheader" (= hw address) on the skb and the xmit function grabs it from there and use that pseudoheader to search for the actual IB L2 address handle kept for that dest. Example such skbs are unicast ARP replies and multicast. see below some relevant code sections from ipoib_main.c Or. > static void ipoib_setup(struct net_device *dev) > [...] > /* > * We add in INFINIBAND_ALEN to allow for the destination > * address "pseudoheader" for skbs without neighbour struct. > */ > dev->hard_header_len = IPOIB_ENCAP_LEN + INFINIBAND_ALEN; > dev->addr_len = INFINIBAND_ALEN; > dev->type = ARPHRD_INFINIBAND; > static int ipoib_hard_header(struct sk_buff *skb, > struct net_device *dev, > unsigned short type, > const void *daddr, const void *saddr, > unsigned len) > [...] > /* > * If we don't have a neighbour structure, stuff the > * destination address onto the front of the skb so we can > * figure out where to send the packet later. > */ > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Or Gerlitz <ogerlitz@mellanox.com> Date: Mon, 6 Feb 2012 18:44:03 +0200 > As I wrote earlier on this thread, the reason is that in order to be > able to xmit skb's for which the stack doesn't provide a neighbour, > IPoIB makes sure to have extra INFINIBAND_ALEN bytes headroom on > skbs. Such that the ipoib hard_header functions writes a > "ipoib_pseudoheader" (= hw address) on the skb and the xmit function > grabs it from there and use that pseudoheader to search for the actual > IB L2 address handle kept for that dest. Example such skbs are unicast > ARP replies and multicast. see below some relevant code sections from > ipoib_main.c Then IPoIB is broken and must be fixed, see my other reply. We're not adding a bogus member to the netdevice just because IPoIB tries to do path resolution and neighbour validation behind the neighbour cache's back. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 0eac07c..d17192b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1095,6 +1095,7 @@ struct net_device { unsigned int mtu; /* interface MTU value */ unsigned short type; /* interface hardware type */ unsigned short hard_header_len; /* hardware hdr length */ + unsigned int gro_mac_header_len; /* extra head- and tailroom the hardware may need, but not in all cases * can this be guaranteed, especially tailroom. Some cases also use diff --git a/net/core/dev.c b/net/core/dev.c index f124947..0b43939 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3491,14 +3491,20 @@ static inline gro_result_t __napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb) { struct sk_buff *p; + unsigned int maclen = skb->dev->gro_mac_header_len; for (p = napi->gro_list; p; p = p->next) { unsigned long diffs; diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev; diffs |= p->vlan_tci ^ skb->vlan_tci; - diffs |= compare_ether_header(skb_mac_header(p), - skb_gro_mac_header(skb)); + if (maclen == ETH_HLEN) + diffs |= compare_ether_header(skb_mac_header(p), + skb_gro_mac_header(skb)); + else if (!diffs) + diffs = memcmp(skb_mac_header(p), + skb_gro_mac_header(skb), + maclen); NAPI_GRO_CB(p)->same_flow = !diffs; NAPI_GRO_CB(p)->flush = 0; } @@ -5962,6 +5968,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, INIT_LIST_HEAD(&dev->unreg_list); INIT_LIST_HEAD(&dev->link_watch_list); dev->priv_flags = IFF_XMIT_DST_RELEASE; + dev->gro_mac_header_len = ETH_HLEN; setup(dev); dev->num_tx_queues = txqs;
Shlomo Pongratz reported GRO L2 header check was suited for Ethernet only, and failed on IB/ipoib traffic. He provided a patch faking a zeroed header to let GRO aggregates frames. Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header check to be more generic. This patch introduces a new netdevice field, gro_mac_header_len, giving L2 header length, default to ETH_HLEN (14 bytes) A device setup function can override this default value. __napi_gro_receive() has special handling for the common case (Ethernet) to avoid a memcmp() call and use an inline optimized function instead. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Shlomo Pongratz <shlomop@mellanox.com> Cc: Roland Dreier <roland@kernel.org> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> --- include/linux/netdevice.h | 1 + net/core/dev.c | 11 +++++++++-- 2 files changed, 10 insertions(+), 2 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html