Patchwork two other cases Re: [Bug 11721] after upgrade to 2.6.27 i cannot navigate

login
register
mail settings
Submitter Ilpo Järvinen
Date Oct. 21, 2008, 9:36 a.m.
Message ID <Pine.LNX.4.64.0810211200170.7072@wrl-59.cs.helsinki.fi>
Download mbox | patch
Permalink /patch/5207/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

Ilpo Järvinen - Oct. 21, 2008, 9:36 a.m.
Ah, I forgot to add bugzilla back last time, so added it there now.

On Tue, 21 Oct 2008, Aldo Maggi wrote:

> just as matter of information, two other cases similar to mine were
> reported in ubuntu bug pages:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/272896
> https://bugs.launchpad.net/ubuntu/+bug/285430
> i originated the first one and gave Jarek's first solution:
> tcp_sack=0
> which worked for the two other users.
> 
> maybe they could be contacted in order to perform further tests in a
> different environment.

It's hardly surprising that I couldn't reproduce this, the non-compliance 
here is most probably in isp's device or in some end-user sold embedded 
device.

Can you try this another debug patch below (on 2.6.27.2 is fine). It moves 
the mss to the last position but should keep timestamps in place by making 
wscale as first option. It is well possible that you won't get it working 
at all except with all ts,sack and wscale set to 0 (the most likely 
result). Please try with all wscale,sack,ts combinations (no need to 
provide dumps, just working/not working per case)... This should 
tell us for quite high certaintity what is the actual option which is 
causing this (would it not be the mss-at-beginning which is the most 
likely cause), however, your finding may well be specific to your network 
while the other people might a bit different results.

In order to provide maximal compatibility, I think we just restore the 
previous ordering of the fields (basically the first patch you tested).
It has no additional cost, so it won't hurt any, but it's quite ridiculous 
still that some devices care so little about basic tcp spec which has 
devastating effect on interoperatibility here, they should really fix
the devices instead but knowing how little most of the isp & etc. care
(or even understand) I'm not expecting too much to happen on that
front, and those who care probably run some semi-sane stuff already
anyway... :-). ...Sadly, it's much easier and cheaper to blame the 
end-user's equipment or Linux (if/once it becomes known that it's in use) 
and do nothing in case one is fool enough to complain to them.
Jarek Poplawski - Oct. 21, 2008, 10:09 a.m.
On Tue, Oct 21, 2008 at 12:36:33PM +0300, Ilpo Järvinen wrote:
...
> Can you try this another debug patch below (on 2.6.27.2 is fine). It moves 
> the mss to the last position but should keep timestamps in place by making 
> wscale as first option. It is well possible that you won't get it working 
> at all except with all ts,sack and wscale set to 0 (the most likely 
> result). Please try with all wscale,sack,ts combinations (no need to 
> provide dumps, just working/not working per case)... This should 
> tell us for quite high certaintity what is the actual option which is 
> causing this (would it not be the mss-at-beginning which is the most 
> likely cause), [...]

I'd like to remind it seems to work with only sack off, so mss after
ts. If so, my suspicion would be around rfc's/options' dating?

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilpo Järvinen - Oct. 21, 2008, 10:51 a.m.
On Tue, 21 Oct 2008, Jarek Poplawski wrote:

> On Tue, Oct 21, 2008 at 12:36:33PM +0300, Ilpo Järvinen wrote:
> ...
> > Can you try this another debug patch below (on 2.6.27.2 is fine). It moves 
> > the mss to the last position but should keep timestamps in place by making 
> > wscale as first option. It is well possible that you won't get it working 
> > at all except with all ts,sack and wscale set to 0 (the most likely 
> > result). Please try with all wscale,sack,ts combinations (no need to 
> > provide dumps, just working/not working per case)... This should 
> > tell us for quite high certaintity what is the actual option which is 
> > causing this (would it not be the mss-at-beginning which is the most 
> > likely cause), [...]
> 
> I'd like to remind it seems to work with only sack off, so mss after
> ts. If so, my suspicion would be around rfc's/options' dating?

Thanks, I missed that case... there are so many cases already... :-)
...I'm not sure we'll find some very obvious reason for this, rfc dating 
probably has very little to do here.

I still think that putting an alternative option in front is useful test 
(though my initial mss-at-beginning thought wasn't bullet-proof) but 
maybe we could skip all those alternative settings of wscale,sack,ts and 
have them just all enabled.
Jarek Poplawski - Oct. 21, 2008, 11:12 a.m.
On Tue, Oct 21, 2008 at 01:51:10PM +0300, Ilpo Järvinen wrote:
> On Tue, 21 Oct 2008, Jarek Poplawski wrote:
> 
> > On Tue, Oct 21, 2008 at 12:36:33PM +0300, Ilpo Järvinen wrote:
> > ...
> > > Can you try this another debug patch below (on 2.6.27.2 is fine). It moves 
> > > the mss to the last position but should keep timestamps in place by making 
> > > wscale as first option. It is well possible that you won't get it working 
> > > at all except with all ts,sack and wscale set to 0 (the most likely 
> > > result). Please try with all wscale,sack,ts combinations (no need to 
> > > provide dumps, just working/not working per case)... This should 
> > > tell us for quite high certaintity what is the actual option which is 
> > > causing this (would it not be the mss-at-beginning which is the most 
> > > likely cause), [...]
> > 
> > I'd like to remind it seems to work with only sack off, so mss after
> > ts. If so, my suspicion would be around rfc's/options' dating?
> 
> Thanks, I missed that case... there are so many cases already... :-)
> ...I'm not sure we'll find some very obvious reason for this, rfc dating 
> probably has very little to do here.
> 
> I still think that putting an alternative option in front is useful test 
> (though my initial mss-at-beginning thought wasn't bullet-proof) but 
> maybe we could skip all those alternative settings of wscale,sack,ts and 
> have them just all enabled.

I agree with you, but when this test with all options enabled fails
(?!), IMHO disabling sack only could be enough to get it working
again.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilpo Järvinen - Oct. 21, 2008, 12:18 p.m.
On Tue, 21 Oct 2008, Jarek Poplawski wrote:

> On Tue, Oct 21, 2008 at 01:51:10PM +0300, Ilpo Järvinen wrote:
> > On Tue, 21 Oct 2008, Jarek Poplawski wrote:
> > 
> > > On Tue, Oct 21, 2008 at 12:36:33PM +0300, Ilpo Järvinen wrote:
> > > ...
> > > > Can you try this another debug patch below (on 2.6.27.2 is fine). It moves 
> > > > the mss to the last position but should keep timestamps in place by making 
> > > > wscale as first option. It is well possible that you won't get it working 
> > > > at all except with all ts,sack and wscale set to 0 (the most likely 
> > > > result). Please try with all wscale,sack,ts combinations (no need to 
> > > > provide dumps, just working/not working per case)... This should 
> > > > tell us for quite high certaintity what is the actual option which is 
> > > > causing this (would it not be the mss-at-beginning which is the most 
> > > > likely cause), [...]
> > > 
> > > I'd like to remind it seems to work with only sack off, so mss after
> > > ts. If so, my suspicion would be around rfc's/options' dating?
> > 
> > Thanks, I missed that case... there are so many cases already... :-)
> > ...I'm not sure we'll find some very obvious reason for this, rfc dating 
> > probably has very little to do here.
> > 
> > I still think that putting an alternative option in front is useful test 
> > (though my initial mss-at-beginning thought wasn't bullet-proof) but 
> > maybe we could skip all those alternative settings of wscale,sack,ts and 
> > have them just all enabled.
> 
> I agree with you, but when this test with all options enabled fails
> (?!), IMHO disabling sack only could be enough to get it working
> again.

...Yes it should, but we could be all wrong and it just works with them 
all too :-).
Jarek Poplawski - Oct. 21, 2008, 12:34 p.m.
On Tue, Oct 21, 2008 at 03:18:57PM +0300, Ilpo Järvinen wrote:
...
> ...Yes it should, but we could be all wrong and it just works with them 
> all too :-).

All execept you! I guess you want it tested because you assume it's
possible? (Anyway it would be really funny :-)

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilpo Järvinen - Oct. 21, 2008, 2:16 p.m.
On Tue, 21 Oct 2008, Jarek Poplawski wrote:

> On Tue, Oct 21, 2008 at 03:18:57PM +0300, Ilpo Järvinen wrote:
> ...
> > ...Yes it should, but we could be all wrong and it just works with them 
> > all too :-).
> 
> All execept you! I guess you want it tested because you assume it's
> possible? (Anyway it would be really funny :-)

We're dealing with a case with in itself is really funny, yes, I assume 
it's very much possible though not too likely :-). I'd have very much 
wanted to test it myself too but sadly my network does not exhibit this 
corner-case behavior (ie., bug). Whether I'm right or wrong is not that 
significant for me :-), just understanding what the black-box does (or
is fine with) instead of just knowing how to workaround that. Or did I 
miss another case where the significance of mss specifically being there 
was decided? :-)

But whatever, if Aldo feels it's waste of his time he can just ignore 
me... :-)
Aldo Maggi - Oct. 21, 2008, 2:47 p.m.
Il giorno Tue, 21 Oct 2008 17:16:43 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> ha scritto:

[...]

> 
> But whatever, if Aldo feels it's waste of his time he can just ignore 
> me... :-)
> 
> 

ARE YOU JOKIIIIING? :-)
i'm only too honoured to cooperate from my very low level with you!

you do not imagine how much i value your commitment for linux!
my only dream is to see bloody ms in the dust ... and linux as a winner!

many thanks for your and your collegues work!
aldo
 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski - Oct. 21, 2008, 4:29 p.m.
On Tue, Oct 21, 2008 at 05:16:43PM +0300, Ilpo Järvinen wrote:
...
> We're dealing with a case with in itself is really funny, yes, I assume 
> it's very much possible though not too likely :-). I'd have very much 
> wanted to test it myself too but sadly my network does not exhibit this 
> corner-case behavior (ie., bug). Whether I'm right or wrong is not that 
> significant for me :-), just understanding what the black-box does (or
> is fine with) instead of just knowing how to workaround that. Or did I 
> miss another case where the significance of mss specifically being there 
> was decided? :-)
> 
> But whatever, if Aldo feels it's waste of his time he can just ignore 
> me... :-)

Hmm... If by any chance my previous messages caused you think like
this, I'm really sorry. I completely agree with the way you deal with
this, and I don't pretend I know the result. I still agree there seems
to be not much logic in this, so all results are probable, and worth
checking - especially when we have such helpful assistant as Aldo.

Thanks,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aldo Maggi - Oct. 21, 2008, 6:18 p.m.
Il giorno Tue, 21 Oct 2008 12:36:33 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> ha scritto:

[...]

> 
> Can you try this another debug patch below (on 2.6.27.2 is fine). It
> moves the mss to the last position but should keep timestamps in
> place by making wscale as first option. It is well possible that you
> won't get it working at all except with all ts,sack and wscale set to
> 0 (the most likely result). Please try with all wscale,sack,ts
> combinations (no need to provide dumps, just working/not working per
> case)... This should tell us for quite high certaintity what is the
> actual option which is causing this (would it not be the
> mss-at-beginning which is the most likely cause), however, your
> finding may well be specific to your network while the other people
> might a bit different results.

i've compiled 2.6.27.2 source after having patched it with your today's
patch.

it works! i.e. i can navigate (w3m kernel.org) and update (apt-get
update)

i send you anyway the usual tcpdump as in previous msgs, should it be
of any help.

do you want me to provide some more commands output?

waiting...

aldo
Jarek Poplawski - Oct. 21, 2008, 6:45 p.m.
On Tue, Oct 21, 2008 at 08:18:57PM +0200, Aldo Maggi wrote:
...
> i've compiled 2.6.27.2 source after having patched it with your today's
> patch.
> 
> it works! i.e. i can navigate (w3m kernel.org) and update (apt-get
> update)

Ilpo, I should say you're incredible! ...But, since I've promised to
myself not to disturb you anymore (again), I can't do this, sorry :-(

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Oct. 21, 2008, 11:32 p.m.
From: Jarek Poplawski <jarkao2@gmail.com>
Date: Tue, 21 Oct 2008 20:45:39 +0200

> On Tue, Oct 21, 2008 at 08:18:57PM +0200, Aldo Maggi wrote:
> ...
> > i've compiled 2.6.27.2 source after having patched it with your today's
> > patch.
> > 
> > it works! i.e. i can navigate (w3m kernel.org) and update (apt-get
> > update)
> 
> Ilpo, I should say you're incredible! ...But, since I've promised to
> myself not to disturb you anymore (again), I can't do this, sorry :-(

Indeed, excellent work Ilpo.

Ilpo, now that we know this fixes things for sure, could you submit
this formally with a proper signoff?

I'll queue it up for -stable too.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index de54f02..63b0a3f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -376,6 +376,13 @@  static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
 		*md5_hash = NULL;
 	}
 
+	if (unlikely(opts->ws)) {
+		*ptr++ = htonl((TCPOPT_NOP << 24) |
+			       (TCPOPT_WINDOW << 16) |
+			       (TCPOLEN_WINDOW << 8) |
+			       opts->ws);
+	}
+
 	if (likely(OPTION_TS & opts->options)) {
 		if (unlikely(OPTION_SACK_ADVERTISE & opts->options)) {
 			*ptr++ = htonl((TCPOPT_SACK_PERM << 24) |
@@ -392,12 +399,6 @@  static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
 		*ptr++ = htonl(opts->tsecr);
 	}
 
-	if (unlikely(opts->mss)) {
-		*ptr++ = htonl((TCPOPT_MSS << 24) |
-			       (TCPOLEN_MSS << 16) |
-			       opts->mss);
-	}
-
 	if (unlikely(OPTION_SACK_ADVERTISE & opts->options &&
 		     !(OPTION_TS & opts->options))) {
 		*ptr++ = htonl((TCPOPT_NOP << 24) |
@@ -406,11 +407,10 @@  static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
 			       TCPOLEN_SACK_PERM);
 	}
 
-	if (unlikely(opts->ws)) {
-		*ptr++ = htonl((TCPOPT_NOP << 24) |
-			       (TCPOPT_WINDOW << 16) |
-			       (TCPOLEN_WINDOW << 8) |
-			       opts->ws);
+	if (unlikely(opts->mss)) {
+		*ptr++ = htonl((TCPOPT_MSS << 24) |
+			       (TCPOLEN_MSS << 16) |
+			       opts->mss);
 	}
 
 	if (unlikely(opts->num_sack_blocks)) {