diff mbox

[net-next] gro: should aggregate frames without DF

Message ID 1370019752.5109.108.camel@edumazet-glaptop
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet May 31, 2013, 5:02 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

GRO on IPv4 doesn't aggregate frames if they don't have DF bit set.

Some servers use IP_MTU_DISCOVER/IP_PMTUDISC_PROBE, so linux receivers
are unable to aggregate this kind of traffic.

The right thing to do is to allow aggregation as long as the DF bit has
same value on all segments.

bnx2x LRO does this correctly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
---
 net/ipv4/af_inet.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jerry Chu May 31, 2013, 6:14 p.m. UTC | #1
On Fri, May 31, 2013 at 10:02 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> GRO on IPv4 doesn't aggregate frames if they don't have DF bit set.
>
> Some servers use IP_MTU_DISCOVER/IP_PMTUDISC_PROBE, so linux receivers
> are unable to aggregate this kind of traffic.
>
> The right thing to do is to allow aggregation as long as the DF bit has
> same value on all segments.
>
> bnx2x LRO does this correctly.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jerry Chu <hkchu@google.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> ---
>  net/ipv4/af_inet.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index b05ae96..328cc62 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1384,9 +1384,9 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
>         if (unlikely(ip_fast_csum((u8 *)iph, 5)))
>                 goto out_unlock;
>
> -       id = ntohl(*(__be32 *)&iph->id);
> -       flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id ^ IP_DF));
> -       id >>= 16;
> +       flush = ntohs(iph->tot_len) ^ skb_gro_len(skb);
> +
> +       id = ntohs(iph->id);
>
>         for (p = *head; p; p = p->next) {
>                 struct iphdr *iph2;
> @@ -1407,6 +1407,7 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
>                 NAPI_GRO_CB(p)->flush |=
>                         (iph->ttl ^ iph2->ttl) |
>                         (iph->tos ^ iph2->tos) |
> +                       ((iph->frag_off ^ iph2->frag_off) & htons(IP_DF)) |
>                         ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
>
>                 NAPI_GRO_CB(p)->flush |= flush;
>
>

Acked-by: Jerry Chu <hkchu@google.com>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ben Hutchings May 31, 2013, 8:09 p.m. UTC | #2
On Fri, 2013-05-31 at 10:02 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> GRO on IPv4 doesn't aggregate frames if they don't have DF bit set.
> 
> Some servers use IP_MTU_DISCOVER/IP_PMTUDISC_PROBE, so linux receivers
> are unable to aggregate this kind of traffic.
> 
> The right thing to do is to allow aggregation as long as the DF bit has
> same value on all segments.
> 
> bnx2x LRO does this correctly.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jerry Chu <hkchu@google.com>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> ---
>  net/ipv4/af_inet.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index b05ae96..328cc62 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1384,9 +1384,9 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
>  	if (unlikely(ip_fast_csum((u8 *)iph, 5)))
>  		goto out_unlock;
>  
> -	id = ntohl(*(__be32 *)&iph->id);
> -	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id ^ IP_DF));
> -	id >>= 16;
> +	flush = ntohs(iph->tot_len) ^ skb_gro_len(skb);
> +
> +	id = ntohs(iph->id);
>  
>  	for (p = *head; p; p = p->next) {
>  		struct iphdr *iph2;
> @@ -1407,6 +1407,7 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head,
>  		NAPI_GRO_CB(p)->flush |=
>  			(iph->ttl ^ iph2->ttl) |
>  			(iph->tos ^ iph2->tos) |
> +			((iph->frag_off ^ iph2->frag_off) & htons(IP_DF)) |

But this results in ignoring the actual offset bits of frag_off!
We should allow merging only if all packets have frag_off == IP_DF or
all have frag_off == 0.  The first assignment of flush therefore still
needs to check the combined id/frag_off word, but using (id & ~IP_DF)
instead of (id ^ IP_DF).

Ben.

>  			((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
>  
>  		NAPI_GRO_CB(p)->flush |= flush;
Eric Dumazet May 31, 2013, 9:12 p.m. UTC | #3
On Fri, 2013-05-31 at 21:09 +0100, Ben Hutchings wrote:

> But this results in ignoring the actual offset bits of frag_off!
> We should allow merging only if all packets have frag_off == IP_DF or
> all have frag_off == 0.  The first assignment of flush therefore still
> needs to check the combined id/frag_off word, but using (id & ~IP_DF)
> instead of (id ^ IP_DF).

You're right, I guess I wanted too much getting rid of black magic in
this code ;)

I'll send a v2


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index b05ae96..328cc62 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1384,9 +1384,9 @@  static struct sk_buff **inet_gro_receive(struct sk_buff **head,
 	if (unlikely(ip_fast_csum((u8 *)iph, 5)))
 		goto out_unlock;
 
-	id = ntohl(*(__be32 *)&iph->id);
-	flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id ^ IP_DF));
-	id >>= 16;
+	flush = ntohs(iph->tot_len) ^ skb_gro_len(skb);
+
+	id = ntohs(iph->id);
 
 	for (p = *head; p; p = p->next) {
 		struct iphdr *iph2;
@@ -1407,6 +1407,7 @@  static struct sk_buff **inet_gro_receive(struct sk_buff **head,
 		NAPI_GRO_CB(p)->flush |=
 			(iph->ttl ^ iph2->ttl) |
 			(iph->tos ^ iph2->tos) |
+			((iph->frag_off ^ iph2->frag_off) & htons(IP_DF)) |
 			((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id);
 
 		NAPI_GRO_CB(p)->flush |= flush;