diff mbox

[v3] ip: fix truesize mismatch in ip fragmentation

Message ID 1285084705.2617.636.camel@edumazet-laptop
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Sept. 21, 2010, 3:58 p.m. UTC
Le mardi 21 septembre 2010 à 16:16 +0200, Eric Dumazet a écrit :
> Le mardi 21 septembre 2010 à 10:05 -0400, Nick Bowler a écrit :
> 
> > This hunk introduces some whitespace damage.
> > 
> > Anyway, I tried this with ESP on both IPv4 and IPv6 and it appears to
> > correct the issue.  Thanks!
> > 
> 
> Indeed good catch.
> 
> Here is an updated patch, I added your Tested-by
> 
> Thanks for testing !
> 
> [PATCH] ip : fix truesize mismatch in ip fragmentation
> 
> We should not set frag->destructor to sock_wkfree() until we are sure we
> dont hit slow path in ip_fragment(). Or we risk uncharging
> frag->truesize twice, and in the end, having negative socket
> sk_wmem_alloc counter, or even freeing socket sooner than expected.
> 
> Many thanks to Nick Bowler, who provided a very clean bug report and
> test program.
> 
> While Nick bisection pointed to commit 2b85a34e911bf483 (net: No more
> expensive sock_hold()/sock_put() on each tx), underlying bug is older.
> 


Hmm, while looking at git history, I found commit from Patrick
b2722b1c3a893e (ip_fragment: also adjust skb->truesize for packets not
owned by a socket)
As my patch reverts it, we probably want a more polished patch.

(Also port Patrick work to ipv6)

So here is a V3

[PATCH v3] ip: fix truesize mismatch in ip fragmentation

Special care should be taken when slow path is hit in ip_fragment() :

When walking through frags, we transfert truesize ownership from skb to
frags. Then if we hit a slow_path condition, we must undo this or risk
uncharging frags->truesize twice, and in the end, having negative socket
sk_wmem_alloc counter, or even freeing socket sooner than expected.

Many thanks to Nick Bowler, who provided a very clean bug report and
test program.

Thanks to Jarek for reviewing my first patch and providing a V2

While Nick bisection pointed to commit 2b85a34e911 (net: No more
expensive sock_hold()/sock_put() on each tx), underlying bug is older
(2.6.12-rc5)

A side effect is to extend work done in commit b2722b1c3a893e
(ip_fragment: also adjust skb->truesize for packets not owned by a
socket) to ipv6 as well.

Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
Tested-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
---
 net/ipv4/ip_output.c  |   19 +++++++++++++------
 net/ipv6/ip6_output.c |   18 +++++++++++++-----
 2 files changed, 26 insertions(+), 11 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Henrique de Moraes Holschuh Sept. 21, 2010, 4:26 p.m. UTC | #1
Should this be a candidate for -stable?
Eric Dumazet Sept. 21, 2010, 4:31 p.m. UTC | #2
Le mardi 21 septembre 2010 à 13:26 -0300, Henrique de Moraes Holschuh a
écrit :
> Should this be a candidate for -stable?
> 

Yes, of course, but David wants to handle stable submissions himself.

I am not sure we want to bug stable team with dozens of mails while
polishing patches ?

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski Sept. 21, 2010, 5:50 p.m. UTC | #3
On Tue, Sep 21, 2010 at 05:58:25PM +0200, Eric Dumazet wrote:
> Hmm, while looking at git history, I found commit from Patrick
> b2722b1c3a893e (ip_fragment: also adjust skb->truesize for packets not
> owned by a socket)
> As my patch reverts it, we probably want a more polished patch.
> 
> (Also port Patrick work to ipv6)
> 
> So here is a V3
> 
> [PATCH v3] ip: fix truesize mismatch in ip fragmentation
> 
> Special care should be taken when slow path is hit in ip_fragment() :
> 
> When walking through frags, we transfert truesize ownership from skb to
> frags. Then if we hit a slow_path condition, we must undo this or risk
> uncharging frags->truesize twice, and in the end, having negative socket
> sk_wmem_alloc counter, or even freeing socket sooner than expected.
> 
> Many thanks to Nick Bowler, who provided a very clean bug report and
> test program.
> 
> Thanks to Jarek for reviewing my first patch and providing a V2
> 
> While Nick bisection pointed to commit 2b85a34e911 (net: No more
> expensive sock_hold()/sock_put() on each tx), underlying bug is older
> (2.6.12-rc5)
> 
> A side effect is to extend work done in commit b2722b1c3a893e
> (ip_fragment: also adjust skb->truesize for packets not owned by a
> socket) to ipv6 as well.
> 
> Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com>
> Tested-by: Nick Bowler <nbowler@elliptictech.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Jarek Poplawski <jarkao2@gmail.com>
> CC: Patrick McHardy <kaber@trash.net>
> ---
>  net/ipv4/ip_output.c  |   19 +++++++++++++------
>  net/ipv6/ip6_output.c |   18 +++++++++++++-----
>  2 files changed, 26 insertions(+), 11 deletions(-)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 04b6989..a643f7a 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -488,9 +488,8 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
>  	 * we can switch to copy when see the first bad fragment.
>  	 */
>  	if (skb_has_frags(skb)) {
> -		struct sk_buff *frag;
> +		struct sk_buff *frag, *frag2;
>  		int first_len = skb_pagelen(skb);
> -		int truesizes = 0;
>  
>  		if (first_len - hlen > mtu ||
>  		    ((first_len - hlen) & 7) ||
> @@ -503,18 +502,18 @@ int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
>  			if (frag->len > mtu ||
>  			    ((frag->len & 7) && frag->next) ||
>  			    skb_headroom(frag) < hlen)
> -			    goto slow_path;
> +			    goto slow_path_undo;

Looks better and better to me, except, checkpatch complains about the
(existing) indentation fault here (and later), but I guess you've seen
that?

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Henrique de Moraes Holschuh Sept. 21, 2010, 6:09 p.m. UTC | #4
On Tue, 21 Sep 2010, Eric Dumazet wrote:
> Le mardi 21 septembre 2010 à 13:26 -0300, Henrique de Moraes Holschuh a
> écrit :
> > Should this be a candidate for -stable?
> > 
> 
> Yes, of course, but David wants to handle stable submissions himself.
> 
> I am not sure we want to bug stable team with dozens of mails while
> polishing patches ?

We don't.  But one often marks commits that should go to -stable using a Cc:
pseudo-header, and also includes relevant information (e.g. to which stable
kernels it should be applied to) to the commit message.

Since there wasn't one, and I didn't readly find any post in this thread
that mentioned it should also go to stable, AND it looked at first glance
like something that should go to stable, I asked about it.
David Miller Sept. 21, 2010, 7:24 p.m. UTC | #5
From: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Date: Tue, 21 Sep 2010 15:09:40 -0300

> On Tue, 21 Sep 2010, Eric Dumazet wrote:
>> Le mardi 21 septembre 2010 à 13:26 -0300, Henrique de Moraes Holschuh a
>> écrit :
>> > Should this be a candidate for -stable?
>> > 
>> 
>> Yes, of course, but David wants to handle stable submissions himself.
>> 
>> I am not sure we want to bug stable team with dozens of mails while
>> polishing patches ?
> 
> We don't.  But one often marks commits that should go to -stable using a Cc:
> pseudo-header, and also includes relevant information (e.g. to which stable
> kernels it should be applied to) to the commit message.
> 
> Since there wasn't one, and I didn't readly find any post in this thread
> that mentioned it should also go to stable, AND it looked at first glance
> like something that should go to stable, I asked about it.

Sorry, that is not how we typically handle things in the networking.

I queue up all appropriate -stable patches automatically.

I do this mainly because:

1) It isn't the submitter who gets to decide all by himself that
   something is -stable material, that's partly my job too.

   So if the submitter puts the CC: stable thing in the commit
   message, that takes me out of the decision making process.

2) I do not want -stable submissions to go in just because a patch
   made it into Linus's tree.

   I want fixes to sit and cook in Linus's tree for a while before
   they go to -stable unless it's an _incredibly_ obvious fix.
   This allows any bugs in the fix to be shaken out first.

   Again, the CC: stable tag subverts that.

I really think the "CC: stable" tag is only appropriate for a very
limited scope of bug fixes.  The incredibly obvious ones that need
almost no testing and time exposure in Linus's tree.

All of the rest should be carefully queued up for -stable and
submitted there after a week or two of the patch sitting in Linus's
tree.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Henrique de Moraes Holschuh Sept. 21, 2010, 11:06 p.m. UTC | #6
On Tue, 21 Sep 2010, David Miller wrote:
> > Since there wasn't one, and I didn't readly find any post in this thread
> > that mentioned it should also go to stable, AND it looked at first glance
> > like something that should go to stable, I asked about it.
> 
> Sorry, that is not how we typically handle things in the networking.
> 
> I queue up all appropriate -stable patches automatically.

I see.

>    I want fixes to sit and cook in Linus's tree for a while before
>    they go to -stable unless it's an _incredibly_ obvious fix.
>    This allows any bugs in the fix to be shaken out first.

...

> All of the rest should be carefully queued up for -stable and
> submitted there after a week or two of the patch sitting in Linus's
> tree.

That is a very good way of doing things.  Thanks for the hard work!
diff mbox

Patch

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 04b6989..a643f7a 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -488,9 +488,8 @@  int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 	 * we can switch to copy when see the first bad fragment.
 	 */
 	if (skb_has_frags(skb)) {
-		struct sk_buff *frag;
+		struct sk_buff *frag, *frag2;
 		int first_len = skb_pagelen(skb);
-		int truesizes = 0;
 
 		if (first_len - hlen > mtu ||
 		    ((first_len - hlen) & 7) ||
@@ -503,18 +502,18 @@  int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 			if (frag->len > mtu ||
 			    ((frag->len & 7) && frag->next) ||
 			    skb_headroom(frag) < hlen)
-			    goto slow_path;
+			    goto slow_path_undo;
 
 			/* Partially cloned skb? */
 			if (skb_shared(frag))
-				goto slow_path;
+				goto slow_path_undo;
 
 			BUG_ON(frag->sk);
 			if (skb->sk) {
 				frag->sk = skb->sk;
 				frag->destructor = sock_wfree;
 			}
-			truesizes += frag->truesize;
+			skb->truesize -= frag->truesize;
 		}
 
 		/* Everything is OK. Generate! */
@@ -524,7 +523,6 @@  int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 		frag = skb_shinfo(skb)->frag_list;
 		skb_frag_list_init(skb);
 		skb->data_len = first_len - skb_headlen(skb);
-		skb->truesize -= truesizes;
 		skb->len = first_len;
 		iph->tot_len = htons(first_len);
 		iph->frag_off = htons(IP_MF);
@@ -576,6 +574,15 @@  int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 		}
 		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
 		return err;
+
+slow_path_undo:
+		skb_walk_frags(skb, frag2) {
+			if (frag2 == frag)
+				break;
+			frag2->sk = NULL;
+			frag2->destructor = NULL;
+			skb->truesize += frag2->truesize;
+		}
 	}
 
 slow_path:
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index d40b330..ca7ba44 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -639,7 +639,7 @@  static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 
 	if (skb_has_frags(skb)) {
 		int first_len = skb_pagelen(skb);
-		int truesizes = 0;
+		struct sk_buff *frag2;
 
 		if (first_len - hlen > mtu ||
 		    ((first_len - hlen) & 7) ||
@@ -651,18 +651,18 @@  static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 			if (frag->len > mtu ||
 			    ((frag->len & 7) && frag->next) ||
 			    skb_headroom(frag) < hlen)
-			    goto slow_path;
+			    goto slow_path_undo;
 
 			/* Partially cloned skb? */
 			if (skb_shared(frag))
-				goto slow_path;
+				goto slow_path_undo;
 
 			BUG_ON(frag->sk);
 			if (skb->sk) {
 				frag->sk = skb->sk;
 				frag->destructor = sock_wfree;
-				truesizes += frag->truesize;
 			}
+			skb->truesize -= frag->truesize;
 		}
 
 		err = 0;
@@ -693,7 +693,6 @@  static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 
 		first_len = skb_pagelen(skb);
 		skb->data_len = first_len - skb_headlen(skb);
-		skb->truesize -= truesizes;
 		skb->len = first_len;
 		ipv6_hdr(skb)->payload_len = htons(first_len -
 						   sizeof(struct ipv6hdr));
@@ -756,6 +755,15 @@  static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
 			      IPSTATS_MIB_FRAGFAILS);
 		dst_release(&rt->dst);
 		return err;
+
+slow_path_undo:
+		skb_walk_frags(skb, frag2) {
+			if (frag2 == frag)
+				break;
+			frag2->sk = NULL;
+			frag2->destructor = NULL;
+			skb->truesize += frag2->truesize;
+		}
 	}
 
 slow_path: