diff mbox

[net-next] gro: introduce gro_mac_header_len

Message ID 1328545627.2220.72.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Feb. 6, 2012, 4:27 p.m. UTC
Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
only, and failed on IB/ipoib traffic.

He provided a patch faking a zeroed header to let GRO aggregates frames.

Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header
check to be more generic.

This patch introduces a new netdevice field, gro_mac_header_len, giving
L2 header length, default to ETH_HLEN (14 bytes)

A device setup function can override this default value.

__napi_gro_receive() has special handling for the common case (Ethernet)
to avoid a memcmp() call and use an inline optimized function instead.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Shlomo Pongratz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
 include/linux/netdevice.h |    1 +
 net/core/dev.c            |   11 +++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Feb. 6, 2012, 4:31 p.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 06 Feb 2012 17:27:07 +0100

> Shlomo Pongratz reported GRO L2 header check was suited for Ethernet
> only, and failed on IB/ipoib traffic.
> 
> He provided a patch faking a zeroed header to let GRO aggregates frames.
> 
> Roland Dreier, Herbert Xu, and others suggested we change GRO L2 header
> check to be more generic.
> 
> This patch introduces a new netdevice field, gro_mac_header_len, giving
> L2 header length, default to ETH_HLEN (14 bytes)
> 
> A device setup function can override this default value.
> 
> __napi_gro_receive() has special handling for the common case (Ethernet)
> to avoid a memcmp() call and use an inline optimized function instead.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Reported-by: Shlomo Pongratz <shlomop@mellanox.com>

We really need an explanation, probably both in the commit message and
the comments next to this new struct member, explaining why in the world
we can't use ->hard_header_len for this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz Feb. 6, 2012, 4:44 p.m. UTC | #2
On 2/6/2012 6:31 PM, David Miller wrote:
> We really need an explanation, probably both in the commit message and 
> the comments next to this new struct member, explaining why in the 
> world we can't use ->hard_header_len for this. 

Dave,

As I wrote earlier on this thread, the reason is that in order to be 
able to xmit skb's for which the stack doesn't provide a neighbour, 
IPoIB makes sure to have extra INFINIBAND_ALEN bytes headroom on skbs. 
Such that the ipoib hard_header functions writes a "ipoib_pseudoheader" 
(= hw address) on the skb and the xmit function grabs it from there and 
use that pseudoheader to search for the actual IB L2 address handle kept 
for that dest. Example such skbs are unicast ARP replies and multicast. 
see below some relevant code sections from ipoib_main.c

Or.

> static void ipoib_setup(struct net_device *dev)
> [...]
>         /*
>          * We add in INFINIBAND_ALEN to allow for the destination
>          * address "pseudoheader" for skbs without neighbour struct.
>          */
>         dev->hard_header_len     = IPOIB_ENCAP_LEN + INFINIBAND_ALEN;
>         dev->addr_len            = INFINIBAND_ALEN;
>         dev->type                = ARPHRD_INFINIBAND;


> static int ipoib_hard_header(struct sk_buff *skb,
>                              struct net_device *dev,
>                              unsigned short type,
>                              const void *daddr, const void *saddr, 
> unsigned len)
>         [...]
>         /*
>          * If we don't have a neighbour structure, stuff the
>          * destination address onto the front of the skb so we can
>          * figure out where to send the packet later.
>          */
>


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 6, 2012, 5 p.m. UTC | #3
From: Or Gerlitz <ogerlitz@mellanox.com>
Date: Mon, 6 Feb 2012 18:44:03 +0200

> As I wrote earlier on this thread, the reason is that in order to be
> able to xmit skb's for which the stack doesn't provide a neighbour,
> IPoIB makes sure to have extra INFINIBAND_ALEN bytes headroom on
> skbs. Such that the ipoib hard_header functions writes a
> "ipoib_pseudoheader" (= hw address) on the skb and the xmit function
> grabs it from there and use that pseudoheader to search for the actual
> IB L2 address handle kept for that dest. Example such skbs are unicast
> ARP replies and multicast. see below some relevant code sections from
> ipoib_main.c

Then IPoIB is broken and must be fixed, see my other reply.

We're not adding a bogus member to the netdevice just because IPoIB
tries to do path resolution and neighbour validation behind the
neighbour cache's back.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0eac07c..d17192b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1095,6 +1095,7 @@  struct net_device {
 	unsigned int		mtu;	/* interface MTU value		*/
 	unsigned short		type;	/* interface hardware type	*/
 	unsigned short		hard_header_len;	/* hardware hdr length	*/
+	unsigned int		gro_mac_header_len;
 
 	/* extra head- and tailroom the hardware may need, but not in all cases
 	 * can this be guaranteed, especially tailroom. Some cases also use
diff --git a/net/core/dev.c b/net/core/dev.c
index f124947..0b43939 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3491,14 +3491,20 @@  static inline gro_result_t
 __napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
 	struct sk_buff *p;
+	unsigned int maclen = skb->dev->gro_mac_header_len;
 
 	for (p = napi->gro_list; p; p = p->next) {
 		unsigned long diffs;
 
 		diffs = (unsigned long)p->dev ^ (unsigned long)skb->dev;
 		diffs |= p->vlan_tci ^ skb->vlan_tci;
-		diffs |= compare_ether_header(skb_mac_header(p),
-					      skb_gro_mac_header(skb));
+		if (maclen == ETH_HLEN)
+			diffs |= compare_ether_header(skb_mac_header(p),
+						      skb_gro_mac_header(skb));
+		else if (!diffs)
+			diffs = memcmp(skb_mac_header(p),
+				       skb_gro_mac_header(skb),
+				       maclen);
 		NAPI_GRO_CB(p)->same_flow = !diffs;
 		NAPI_GRO_CB(p)->flush = 0;
 	}
@@ -5962,6 +5968,7 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	INIT_LIST_HEAD(&dev->unreg_list);
 	INIT_LIST_HEAD(&dev->link_watch_list);
 	dev->priv_flags = IFF_XMIT_DST_RELEASE;
+	dev->gro_mac_header_len = ETH_HLEN;
 	setup(dev);
 
 	dev->num_tx_queues = txqs;