diff mbox

[v3] ip: fix truesize mismatch in ip fragmentation

Message ID 1285094865.2452.2.camel@edumazet-laptop
State Accepted, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Sept. 21, 2010, 6:47 p.m. UTC
> Looks better and better to me, except, checkpatch complains about the
> (existing) indentation fault here (and later), but I guess you've seen
> that?
> 

Indeed, there is checkpatch somewhere ;)

Thanks !

[PATCH v4] ip: fix truesize mismatch in ip fragmentation

Special care should be taken when slow path is hit in ip_fragment() :

When walking through frags, we transfert truesize ownership from skb to
frags. Then if we hit a slow_path condition, we must undo this or risk
uncharging frags->truesize twice, and in the end, having negative socket
sk_wmem_alloc counter, or even freeing socket sooner than expected.

Many thanks to Nick Bowler, who provided a very clean bug report and
test program.

Thanks to Jarek for reviewing my first patch and providing a V2

While Nick bisection pointed to commit 2b85a34e911 (net: No more
expensive sock_hold()/sock_put() on each tx), underlying bug is older
(2.6.12-rc5)

A side effect is to extend work done in commit b2722b1c3a893e
(ip_fragment: also adjust skb->truesize for packets not owned by a
socket) to ipv6 as well.

Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
Tested-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
---
 net/ipv4/ip_output.c  |   19 +++++++++++++------
 net/ipv6/ip6_output.c |   18 +++++++++++++-----
 2 files changed, 26 insertions(+), 11 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jarek Poplawski Sept. 21, 2010, 7:21 p.m. UTC | #1
On Tue, Sep 21, 2010 at 08:47:45PM +0200, Eric Dumazet wrote:
> 
> > Looks better and better to me, except, checkpatch complains about the
> > (existing) indentation fault here (and later), but I guess you've seen
> > that?
> > 
> 
> Indeed, there is checkpatch somewhere ;)
> 
> Thanks !
> 
> [PATCH v4] ip: fix truesize mismatch in ip fragmentation
> 
> Special care should be taken when slow path is hit in ip_fragment() :
> 
> When walking through frags, we transfert truesize ownership from skb to
> frags. Then if we hit a slow_path condition, we must undo this or risk
> uncharging frags->truesize twice, and in the end, having negative socket
> sk_wmem_alloc counter, or even freeing socket sooner than expected.
> 
> Many thanks to Nick Bowler, who provided a very clean bug report and
> test program.
> 
> Thanks to Jarek for reviewing my first patch and providing a V2
> 
> While Nick bisection pointed to commit 2b85a34e911 (net: No more
> expensive sock_hold()/sock_put() on each tx), underlying bug is older
> (2.6.12-rc5)
> 
> A side effect is to extend work done in commit b2722b1c3a893e
> (ip_fragment: also adjust skb->truesize for packets not owned by a
> socket) to ipv6 as well.
> 
> Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
> Tested-by: Nick Bowler <nbowler@elliptictech.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Jarek Poplawski <jarkao2@gmail.com>
> CC: Patrick McHardy <kaber@trash.net>

Looks perfect to me.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Sept. 21, 2010, 10:15 p.m. UTC | #2
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 21 Sep 2010 21:21:54 +0200

> On Tue, Sep 21, 2010 at 08:47:45PM +0200, Eric Dumazet wrote:
>> [PATCH v4] ip: fix truesize mismatch in ip fragmentation
>> 
>> Special care should be taken when slow path is hit in ip_fragment() :
>> 
>> When walking through frags, we transfert truesize ownership from skb to
>> frags. Then if we hit a slow_path condition, we must undo this or risk
>> uncharging frags->truesize twice, and in the end, having negative socket
>> sk_wmem_alloc counter, or even freeing socket sooner than expected.
>> 
>> Many thanks to Nick Bowler, who provided a very clean bug report and
>> test program.
>> 
>> Thanks to Jarek for reviewing my first patch and providing a V2
>> 
>> While Nick bisection pointed to commit 2b85a34e911 (net: No more
>> expensive sock_hold()/sock_put() on each tx), underlying bug is older
>> (2.6.12-rc5)
>> 
>> A side effect is to extend work done in commit b2722b1c3a893e
>> (ip_fragment: also adjust skb->truesize for packets not owned by a
>> socket) to ipv6 as well.
>> 
>> Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
>> Tested-by: Nick Bowler <nbowler@elliptictech.com>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> CC: Jarek Poplawski <jarkao2@gmail.com>
>> CC: Patrick McHardy <kaber@trash.net>
> 
> Looks perfect to me.

Great work everyone!

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 04b6989..7649d77 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -488,9 +488,8 @@  int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 	 * we can switch to copy when see the first bad fragment.
 	 */
 	if (skb_has_frags(skb)) {
-		struct sk_buff *frag;
+		struct sk_buff *frag, *frag2;
 		int first_len = skb_pagelen(skb);
-		int truesizes = 0;
 
 		if (first_len - hlen > mtu ||
 		    ((first_len - hlen) & 7) ||
@@ -503,18 +502,18 @@  int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 			if (frag->len > mtu ||
 			    ((frag->len & 7) && frag->next) ||
 			    skb_headroom(frag) < hlen)
-			    goto slow_path;
+				goto slow_path_clean;
 
 			/* Partially cloned skb? */
 			if (skb_shared(frag))
-				goto slow_path;
+				goto slow_path_clean;
 
 			BUG_ON(frag->sk);
 			if (skb->sk) {
 				frag->sk = skb->sk;
 				frag->destructor = sock_wfree;
 			}
-			truesizes += frag->truesize;
+			skb->truesize -= frag->truesize;
 		}
 
 		/* Everything is OK. Generate! */
@@ -524,7 +523,6 @@  int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 		frag = skb_shinfo(skb)->frag_list;
 		skb_frag_list_init(skb);
 		skb->data_len = first_len - skb_headlen(skb);
-		skb->truesize -= truesizes;
 		skb->len = first_len;
 		iph->tot_len = htons(first_len);
 		iph->frag_off = htons(IP_MF);
@@ -576,6 +574,15 @@  int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 		}
 		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
 		return err;
+
+slow_path_clean:
+		skb_walk_frags(skb, frag2) {
+			if (frag2 == frag)
+				break;
+			frag2->sk = NULL;
+			frag2->destructor = NULL;
+			skb->truesize += frag2->truesize;
+		}
 	}
 
 slow_path:
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index d40b330..980912e 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -639,7 +639,7 @@  static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 
 	if (skb_has_frags(skb)) {
 		int first_len = skb_pagelen(skb);
-		int truesizes = 0;
+		struct sk_buff *frag2;
 
 		if (first_len - hlen > mtu ||
 		    ((first_len - hlen) & 7) ||
@@ -651,18 +651,18 @@  static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 			if (frag->len > mtu ||
 			    ((frag->len & 7) && frag->next) ||
 			    skb_headroom(frag) < hlen)
-			    goto slow_path;
+				goto slow_path_clean;
 
 			/* Partially cloned skb? */
 			if (skb_shared(frag))
-				goto slow_path;
+				goto slow_path_clean;
 
 			BUG_ON(frag->sk);
 			if (skb->sk) {
 				frag->sk = skb->sk;
 				frag->destructor = sock_wfree;
-				truesizes += frag->truesize;
 			}
+			skb->truesize -= frag->truesize;
 		}
 
 		err = 0;
@@ -693,7 +693,6 @@  static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 
 		first_len = skb_pagelen(skb);
 		skb->data_len = first_len - skb_headlen(skb);
-		skb->truesize -= truesizes;
 		skb->len = first_len;
 		ipv6_hdr(skb)->payload_len = htons(first_len -
 						   sizeof(struct ipv6hdr));
@@ -756,6 +755,15 @@  static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 			      IPSTATS_MIB_FRAGFAILS);
 		dst_release(&rt->dst);
 		return err;
+
+slow_path_clean:
+		skb_walk_frags(skb, frag2) {
+			if (frag2 == frag)
+				break;
+			frag2->sk = NULL;
+			frag2->destructor = NULL;
+			skb->truesize += frag2->truesize;
+		}
 	}
 
 slow_path: