diff mbox

2.6.29-rc5-git3 Leak r=1 3

Message ID Pine.LNX.4.64.0902211324460.20284@wrl-59.cs.helsinki.fi
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Ilpo Järvinen Feb. 21, 2009, 11:44 a.m. UTC
On Fri, 20 Feb 2009, Denys Fedoryschenko wrote:

> On Friday 20 February 2009 18:09:54 Ilpo Järvinen wrote:
> > > [45853.641072] TCP: Peer 172.16.193.87:1259/8080 unexpectedly shrunk
> > > window 1905466446:1905480750 (repaired)
> >
> > Hmm, we might have sent past our allowed window due to some bug. ...Or
> > combined some segments in the new shifting code incorrectly past the
> > right edge of the window though I quickly checked and we do check for
> > tcp_send_head there so it should be right.
> 
> 
> This message (about window), i have tons of them, and had before a lot. 
> 
> I think it is not related, just i paste whole dmesg part where i see 
> more "Leak" messages. I'm sure i have more of them now.

Indeed, there's a possiblity that tcp_sacktag_one didn't behave
as expected wrt. retrans_out. You shouldn't have had any Leak in or
before 2.6.28 (except perhaps in some ancient dev kernels).

I also noticed that end_seq usage in tcp_sacktag_one is a bit
suspicious too but found out that it mostly works. It would be
safer to make partial end_seq available there though it's not
too easy to find a sane scenario where the wrong end_seq makes
any difference though. No need to do that in mainline for sure
considering how small the worst case effect is.

...What a can of worms that tcp_sacktag_one is...

Comments

Denys Fedoryshchenko Feb. 22, 2009, 2:32 a.m. UTC | #1
> Indeed, there's a possiblity that tcp_sacktag_one didn't behave
> as expected wrt. retrans_out. You shouldn't have had any Leak in or
> before 2.6.28 (except perhaps in some ancient dev kernels).
Well, not so ancient. I dig more, looks similar:

Linux corporategw 2.6.18-gentoo-r3 #3 SMP Tue Nov 28 02:05:14 Local time zone 
must be set--see zic  i686 Intel(R) Xeon(TM) CPU 3.00GHz GenuineIntel 
GNU/Linux
TCP: Treason uncloaked! Peer 80.83.30.237:50676/8080 shrinks window 
445059938:445066258. Repaired.
TCP: Treason uncloaked! Peer 80.83.30.237:50676/8080 shrinks window 
445059938:445066258. Repaired.
TCP: Treason uncloaked! Peer 80.83.30.237:50676/8080 shrinks window 
445059938:445066258. Repaired.
TCP: Treason uncloaked! Peer 80.83.25.222:49488/8080 shrinks window 
3868565231:3868566407. Repaired.
TCP: Treason uncloaked! Peer 80.83.25.222:49488/8080 shrinks window 
3868565231:3868566407. Repaired.
TCP: Treason uncloaked! Peer 80.83.25.222:49488/8080 shrinks window 
3868565231:3868566407. Repaired.

NEWNET_PROXY ~ # dmesg
TCP: Treason uncloaked! Peer 77.222.40.36:80/60350 shrinks window 
3300731285:3300731381. Repaired.
NEWNET_PROXY ~ # uname -a
Linux NEWNET_PROXY 2.6.19-gentoo-r5 #5 SMP Fri Apr 6 21:13:31 EEST 2007 i686 
Intel(R) Xeon(TM) CPU 2.40GHz GenuineIntel GNU/Linux



proxy2 ~ # uname -a
Linux proxy2 2.6.27-build-0036 #5 SMP Sun Oct 12 14:22:11 Local time zone must 
be set--see zic  i686 unknown

[8159479.417657] TCP: Treason uncloaked! Peer 172.16.213.82:3248/8080 shrinks 
window 3854928708:3854943141. Repaired.
[8159496.302532] TCP: Treason uncloaked! Peer 172.16.213.82:3248/8080 shrinks 
window 3854928708:3854943141. Repaired.
[8218620.535525] UDP: short packet: From 217.151.224.29:21004 34620/1480 to 
213.187.247.9:57711
[8241928.195260] UDP: short packet: From 217.151.224.29:21219 58294/1480 to 
213.187.247.9:2490
[8337257.470320] TCP: Treason uncloaked! Peer 172.16.24.103:49556/8080 shrinks 
window 4233674068:4233674858. Repaired.
[8337261.370289] TCP: Treason uncloaked! Peer 172.16.24.103:49556/8080 shrinks 
window 4233674068:4233674858. Repaired.                                                                                    
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilpo Järvinen Feb. 22, 2009, 6:06 a.m. UTC | #2
On Sun, 22 Feb 2009, Denys Fedoryschenko wrote:

> > Indeed, there's a possiblity that tcp_sacktag_one didn't behave
> > as expected wrt. retrans_out. You shouldn't have had any Leak in or
> > before 2.6.28 (except perhaps in some ancient dev kernels).
>
> Well, not so ancient. I dig more, looks similar:
> 
> Linux corporategw 2.6.18-gentoo-r3 #3 SMP Tue Nov 28 02:05:14 Local time zone 
> must be set--see zic  i686 Intel(R) Xeon(TM) CPU 3.00GHz GenuineIntel 
> GNU/Linux
> TCP: Treason uncloaked! Peer 80.83.30.237:50676/8080 shrinks window 
> 445059938:445066258. Repaired.
> TCP: Treason uncloaked! Peer 80.83.30.237:50676/8080 shrinks window 
> 445059938:445066258. Repaired.
> TCP: Treason uncloaked! Peer 80.83.30.237:50676/8080 shrinks window 
> 445059938:445066258. Repaired.
> TCP: Treason uncloaked! Peer 80.83.25.222:49488/8080 shrinks window 
> 3868565231:3868566407. Repaired.
> TCP: Treason uncloaked! Peer 80.83.25.222:49488/8080 shrinks window 
> 3868565231:3868566407. Repaired.
> TCP: Treason uncloaked! Peer 80.83.25.222:49488/8080 shrinks window 
> 3868565231:3868566407. Repaired.
> 
> NEWNET_PROXY ~ # dmesg
> TCP: Treason uncloaked! Peer 77.222.40.36:80/60350 shrinks window 
> 3300731285:3300731381. Repaired.
> NEWNET_PROXY ~ # uname -a
> Linux NEWNET_PROXY 2.6.19-gentoo-r5 #5 SMP Fri Apr 6 21:13:31 EEST 2007 i686 
> Intel(R) Xeon(TM) CPU 2.40GHz GenuineIntel GNU/Linux
> 
> 
> 
> proxy2 ~ # uname -a
> Linux proxy2 2.6.27-build-0036 #5 SMP Sun Oct 12 14:22:11 Local time zone must 
> be set--see zic  i686 unknown
> 
> [8159479.417657] TCP: Treason uncloaked! Peer 172.16.213.82:3248/8080 shrinks 
> window 3854928708:3854943141. Repaired.
> [8159496.302532] TCP: Treason uncloaked! Peer 172.16.213.82:3248/8080 shrinks 
> window 3854928708:3854943141. Repaired.
> [8218620.535525] UDP: short packet: From 217.151.224.29:21004 34620/1480 to 
> 213.187.247.9:57711
> [8241928.195260] UDP: short packet: From 217.151.224.29:21219 58294/1480 to 
> 213.187.247.9:2490
> [8337257.470320] TCP: Treason uncloaked! Peer 172.16.24.103:49556/8080 shrinks 
> window 4233674068:4233674858. Repaired.
> [8337261.370289] TCP: Treason uncloaked! Peer 172.16.24.103:49556/8080 shrinks 
> window 4233674068:4233674858. Repaired.                                                                                    

No Leaks afaict? Maybe there's some language related confusion (now) 
between us :-). I agree that treasons can happen since they can be 
triggered by the actions of the remote end (though this is not the only 
explanation, in some kernels our local bugs have also caused those 
messages in which we then put blame on the peer :-D). However, the 
authorities recently decided that treasons, literally, must no longer 
happen and put a end to all future treasons ;-) (ie., the message was
slightly modified :-)).

...What I said above is that "Leak" printouts in 2.6.28 and before 
shouldn't be there (or at least I only remember it happening in some 
ancient dev kernels, when it was last time seen). And the patch in 
the previous mail is supposed to deal with the ones that got introduced 
into 2.6.29-rc1.
David Miller Feb. 22, 2009, 7:57 a.m. UTC | #3
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Sat, 21 Feb 2009 13:44:27 +0200 (EET)

> [PATCH] tcp: fix retrans_out leaks in shifting code
> 
> There's conflicting assumptions in shifting, the caller assumes
> that dupsack results in S'ed skbs (or a part of it) for sure but
> never gave a hint to tcp_sacktag_one when dsack is actually in
> use. Thus DSACK retrans_out -= pcount was not taken and the
> counter became out of sync. Remove obstacle from that information
> flow to get DSACKs accounted in tcp_sacktag_one as expected.
> 
> Compile tested only.
> 
> In general the call to tcp_sacktag_one with the later full shift
> from already SACKed skb should not be there, it's just doing
> duplicating work already done earlier so it just wastes cycles and
> also seems to duplicate small amount of code unnecessarily :-/.
> However, moving tcp_sacktag_one call one level up is not what I want
> to do in mainline -rc5, the counting & seqno state manipulation that
> is going on there is just too hairy to get right in a hurry (order
> is very significant there as the state transition is in progress
> making normal invariants are not to be trusted). Luckily
> tcp_sacktag_one is effectively a no-op atm with dup_sack set to 0
> for already SACKed skb.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Can you do some testing on this before I toss it into
net-next-2.6? :-)  It looks fine to be, but... so did
Herbert's URG change :)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Denys Fedoryshchenko Feb. 27, 2009, 3:04 p.m. UTC | #4
On Sunday 22 February 2009 09:57:24 David Miller wrote:
> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Sat, 21 Feb 2009 13:44:27 +0200 (EET)
>
> > [PATCH] tcp: fix retrans_out leaks in shifting code
> >
> > There's conflicting assumptions in shifting, the caller assumes
> > that dupsack results in S'ed skbs (or a part of it) for sure but
> > never gave a hint to tcp_sacktag_one when dsack is actually in
> > use. Thus DSACK retrans_out -= pcount was not taken and the
> > counter became out of sync. Remove obstacle from that information
> > flow to get DSACKs accounted in tcp_sacktag_one as expected.
> >
> > Compile tested only.
> >
> > In general the call to tcp_sacktag_one with the later full shift
> > from already SACKed skb should not be there, it's just doing
> > duplicating work already done earlier so it just wastes cycles and
> > also seems to duplicate small amount of code unnecessarily :-/.
> > However, moving tcp_sacktag_one call one level up is not what I want
> > to do in mainline -rc5, the counting & seqno state manipulation that
> > is going on there is just too hairy to get right in a hurry (order
> > is very significant there as the state transition is in progress
> > making normal invariants are not to be trusted). Luckily
> > tcp_sacktag_one is effectively a no-op atm with dup_sack set to 0
> > for already SACKed skb.
> >
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
>
> Can you do some testing on this before I toss it into
> net-next-2.6? :-)  It looks fine to be, but... so did
> Herbert's URG change :)
Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>

No crashes for sure, running for 5 hours, no Leak messages appearing too.
Test will continue more, after 20 hours more i can tell for sure, if it helps 
or not.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a6961d7..c28976a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1374,7 +1374,8 @@  static u8 tcp_sacktag_one(struct sk_buff *skb, struct sock *sk,
 
 static int tcp_shifted_skb(struct sock *sk, struct sk_buff *skb,
 			   struct tcp_sacktag_state *state,
-			   unsigned int pcount, int shifted, int mss)
+			   unsigned int pcount, int shifted, int mss,
+			   int dup_sack)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *prev = tcp_write_queue_prev(sk, skb);
@@ -1410,7 +1411,7 @@  static int tcp_shifted_skb(struct sock *sk, struct sk_buff *skb,
 	}
 
 	/* We discard results */
-	tcp_sacktag_one(skb, sk, state, 0, pcount);
+	tcp_sacktag_one(skb, sk, state, dup_sack, pcount);
 
 	/* Difference in this won't matter, both ACKed by the same cumul. ACK */
 	TCP_SKB_CB(prev)->sacked |= (TCP_SKB_CB(skb)->sacked & TCPCB_EVER_RETRANS);
@@ -1561,7 +1562,7 @@  static struct sk_buff *tcp_shift_skb_data(struct sock *sk, struct sk_buff *skb,
 
 	if (!skb_shift(prev, skb, len))
 		goto fallback;
-	if (!tcp_shifted_skb(sk, skb, state, pcount, len, mss))
+	if (!tcp_shifted_skb(sk, skb, state, pcount, len, mss, dup_sack))
 		goto out;
 
 	/* Hole filled allows collapsing with the next as well, this is very
@@ -1580,7 +1581,7 @@  static struct sk_buff *tcp_shift_skb_data(struct sock *sk, struct sk_buff *skb,
 	len = skb->len;
 	if (skb_shift(prev, skb, len)) {
 		pcount += tcp_skb_pcount(skb);
-		tcp_shifted_skb(sk, skb, state, tcp_skb_pcount(skb), len, mss);
+		tcp_shifted_skb(sk, skb, state, tcp_skb_pcount(skb), len, mss, 0);
 	}
 
 out: