Message ID | 1370019752.5109.108.camel@edumazet-glaptop |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Headers | show |
On Fri, May 31, 2013 at 10:02 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > From: Eric Dumazet <edumazet@google.com> > > GRO on IPv4 doesn't aggregate frames if they don't have DF bit set. > > Some servers use IP_MTU_DISCOVER/IP_PMTUDISC_PROBE, so linux receivers > are unable to aggregate this kind of traffic. > > The right thing to do is to allow aggregation as long as the DF bit has > same value on all segments. > > bnx2x LRO does this correctly. > > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Jerry Chu <hkchu@google.com> > Cc: Herbert Xu <herbert@gondor.apana.org.au> > --- > net/ipv4/af_inet.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index b05ae96..328cc62 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -1384,9 +1384,9 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head, > if (unlikely(ip_fast_csum((u8 *)iph, 5))) > goto out_unlock; > > - id = ntohl(*(__be32 *)&iph->id); > - flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id ^ IP_DF)); > - id >>= 16; > + flush = ntohs(iph->tot_len) ^ skb_gro_len(skb); > + > + id = ntohs(iph->id); > > for (p = *head; p; p = p->next) { > struct iphdr *iph2; > @@ -1407,6 +1407,7 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head, > NAPI_GRO_CB(p)->flush |= > (iph->ttl ^ iph2->ttl) | > (iph->tos ^ iph2->tos) | > + ((iph->frag_off ^ iph2->frag_off) & htons(IP_DF)) | > ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id); > > NAPI_GRO_CB(p)->flush |= flush; > > Acked-by: Jerry Chu <hkchu@google.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2013-05-31 at 10:02 -0700, Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > GRO on IPv4 doesn't aggregate frames if they don't have DF bit set. > > Some servers use IP_MTU_DISCOVER/IP_PMTUDISC_PROBE, so linux receivers > are unable to aggregate this kind of traffic. > > The right thing to do is to allow aggregation as long as the DF bit has > same value on all segments. > > bnx2x LRO does this correctly. > > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Jerry Chu <hkchu@google.com> > Cc: Herbert Xu <herbert@gondor.apana.org.au> > --- > net/ipv4/af_inet.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index b05ae96..328cc62 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -1384,9 +1384,9 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head, > if (unlikely(ip_fast_csum((u8 *)iph, 5))) > goto out_unlock; > > - id = ntohl(*(__be32 *)&iph->id); > - flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id ^ IP_DF)); > - id >>= 16; > + flush = ntohs(iph->tot_len) ^ skb_gro_len(skb); > + > + id = ntohs(iph->id); > > for (p = *head; p; p = p->next) { > struct iphdr *iph2; > @@ -1407,6 +1407,7 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head, > NAPI_GRO_CB(p)->flush |= > (iph->ttl ^ iph2->ttl) | > (iph->tos ^ iph2->tos) | > + ((iph->frag_off ^ iph2->frag_off) & htons(IP_DF)) | But this results in ignoring the actual offset bits of frag_off! We should allow merging only if all packets have frag_off == IP_DF or all have frag_off == 0. The first assignment of flush therefore still needs to check the combined id/frag_off word, but using (id & ~IP_DF) instead of (id ^ IP_DF). Ben. > ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id); > > NAPI_GRO_CB(p)->flush |= flush;
On Fri, 2013-05-31 at 21:09 +0100, Ben Hutchings wrote: > But this results in ignoring the actual offset bits of frag_off! > We should allow merging only if all packets have frag_off == IP_DF or > all have frag_off == 0. The first assignment of flush therefore still > needs to check the combined id/frag_off word, but using (id & ~IP_DF) > instead of (id ^ IP_DF). You're right, I guess I wanted too much getting rid of black magic in this code ;) I'll send a v2 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index b05ae96..328cc62 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1384,9 +1384,9 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head, if (unlikely(ip_fast_csum((u8 *)iph, 5))) goto out_unlock; - id = ntohl(*(__be32 *)&iph->id); - flush = (u16)((ntohl(*(__be32 *)iph) ^ skb_gro_len(skb)) | (id ^ IP_DF)); - id >>= 16; + flush = ntohs(iph->tot_len) ^ skb_gro_len(skb); + + id = ntohs(iph->id); for (p = *head; p; p = p->next) { struct iphdr *iph2; @@ -1407,6 +1407,7 @@ static struct sk_buff **inet_gro_receive(struct sk_buff **head, NAPI_GRO_CB(p)->flush |= (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | + ((iph->frag_off ^ iph2->frag_off) & htons(IP_DF)) | ((u16)(ntohs(iph2->id) + NAPI_GRO_CB(p)->count) ^ id); NAPI_GRO_CB(p)->flush |= flush;