diff mbox

[v2] tcp: fix MSG_PEEK race check

Message ID Pine.LNX.4.64.0905180958150.3381@wrl-59.cs.helsinki.fi
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Ilpo Järvinen May 18, 2009, 7:24 a.m. UTC
On Sun, 17 May 2009, David Miller wrote:

> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Mon, 11 May 2009 09:32:34 +0300 (EEST)
> 
> > [PATCH v2] tcp: fix MSG_PEEK race check
> > 
> > Commit 518a09ef11 (tcp: Fix recvmsg MSG_PEEK influence of
> > blocking behavior) lets the loop run longer than the race check
> > did previously expect, so we need to be more careful with this
> > check and consider the work we have been doing.
> > 
> > I tried my best to deal with urg hole madness too which happens
> > here:
> > 	if (!sock_flag(sk, SOCK_URGINLINE)) {
> > 		++*seq;
> > 		...
> > by using additional offset by one but I certainly have very
> > little interest in testing that part.
> > 
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> > Tested-by: Frans Pop <elendil@planet.nl>
> 
> Ok, now that I've looked at this, the urg_hole part of this change has
> to be removed.

Thanks for taking a look... :-) The first patch btw was with RFC and 
without urg bits btw but you put that into some discarded like(?) state in 
patchwork... :-/

> That case being accounted for with urg_hole is exactly what the
> debugging message is trying to catch, where we are doing MSG_PEEK and
> tcp_check_urg() advances ->copied_seq on us during one of those
> "release_sock();/lock_sock();" sequences (which thus invoke TCP input
> processing).

Sorry, I'm not sure you thought this fully through here. What my patch 
with urg_hole does is that it keeps peek_seq non-advancing using the 
offsets. Now without the urg offsetting, the peek_seq side of the race 
check advances, which means that we didn't catch the race when it happened 
for real as copied_seq advanced too?!?

From another perspective when the race didn't happen but potential for it 
existed, the current check should catch that since peek_seq advanced and 
copied_seq stayed where it was. But that certainly doesn't match to your 
description above.

I wonder why all this fuzz about this particular race as we will do our 
best anyway to skip the hole on MSG_PEEK side (if I read the code right)? 
Either we never see the hole in MSG_PEEK side, or we're happily past it 
already, does it make any difference anymore in the latter case? <insert 
some "not that I fully understand all of that multipage function" 
disclaimer here though, I may think too simple scenarios here>. Not that 
I'm too interested in "improving" urg or so anywa, I'm just curious... :-)

> Could you please respin this patch with the URG bits removed?

Below.

Comments

Matthias Andree May 18, 2009, 3:34 p.m. UTC | #1
Am 18.05.2009, 09:24 Uhr, schrieb Ilpo Järvinen  
<ilpo.jarvinen@helsinki.fi>:

> On Sun, 17 May 2009, David Miller wrote:
>
>> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
>> Date: Mon, 11 May 2009 09:32:34 +0300 (EEST)
>>
>> > [PATCH v2] tcp: fix MSG_PEEK race check
>> >
>> > Commit 518a09ef11 (tcp: Fix recvmsg MSG_PEEK influence of
>> > blocking behavior) lets the loop run longer than the race check
>> > did previously expect, so we need to be more careful with this
>> > check and consider the work we have been doing.
>> >
>> > I tried my best to deal with urg hole madness too which happens
>> > here:
>> > 	if (!sock_flag(sk, SOCK_URGINLINE)) {
>> > 		++*seq;
>> > 		...
>> > by using additional offset by one but I certainly have very
>> > little interest in testing that part.
>> >
>> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
>> > Tested-by: Frans Pop <elendil@planet.nl>
>>
>> Ok, now that I've looked at this, the urg_hole part of this change has
>> to be removed.
>
> Thanks for taking a look... :-) The first patch btw was with RFC and
> without urg bits btw but you put that into some discarded like(?) state  
> in patchwork... :-/

WRT the earlier patch, we have one more success report from one of the  
users who reported the problem, namely Ian Zimmermann:

<http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=513695#155>
David Miller May 18, 2009, 10:04 p.m. UTC | #2
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Mon, 18 May 2009 10:24:23 +0300 (EEST)

> Sorry, I'm not sure you thought this fully through here. What my patch 
> with urg_hole does is that it keeps peek_seq non-advancing using the 
> offsets. Now without the urg offsetting, the peek_seq side of the race 
> check advances, which means that we didn't catch the race when it happened 
> for real as copied_seq advanced too?!?

I see now what the situation is, thanks for explaining.

You're account for the "*seq" advance for URG that happens in
tcp_recvmsg() rather than the one that happens in tcp_check_urg().

> From another perspective when the race didn't happen but potential for it 
> existed, the current check should catch that since peek_seq advanced and 
> copied_seq stayed where it was. But that certainly doesn't match to your 
> description above.

Right.

> I wonder why all this fuzz about this particular race as we will do our 
> best anyway to skip the hole on MSG_PEEK side (if I read the code right)? 
> Either we never see the hole in MSG_PEEK side, or we're happily past it 
> already, does it make any difference anymore in the latter case? <insert 
> some "not that I fully understand all of that multipage function" 
> disclaimer here though, I may think too simple scenarios here>. Not that 
> I'm too interested in "improving" urg or so anywa, I'm just curious... :-)

A long long time ago we had an assertion here checking whether
->copied_seq moved without our knowledge.  Alexey and I found this
could trigger and investigation of that is what helped us
find the tcp_check_urg()+MSG_PEEK case.  That's when we added this
test and log message.

When the MSG_PEEK test existing in the !copied if() branch of
tcp_recvmsg(), so many of these code paths we are dealing with in this
fix could simply not be reached when MSG_PEEK.  That ++*seq could
never happen, because if "copied" was non-zero and MSG_PEEK was true
we would leave the loop and return from tcp_recvmsg() immediately.

Now, we have to accomodate those adjustments.

Since I now understand your urg_hole code I'm going to apply your v2
patch which takes care of the URG stuff.  I also buy the argument that
perhaps we should get rid of the log message, but look at how it
helped us find this kernel regression :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilpo Järvinen May 19, 2009, 4:33 a.m. UTC | #3
On Mon, 18 May 2009, David Miller wrote:

> From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
> Date: Mon, 18 May 2009 10:24:23 +0300 (EEST)
> 
> > Sorry, I'm not sure you thought this fully through here. What my patch 
> > with urg_hole does is that it keeps peek_seq non-advancing using the 
> > offsets. Now without the urg offsetting, the peek_seq side of the race 
> > check advances, which means that we didn't catch the race when it happened 
> > for real as copied_seq advanced too?!?
> 
> I see now what the situation is, thanks for explaining.
> 
> You're account for the "*seq" advance for URG that happens in
> tcp_recvmsg() rather than the one that happens in tcp_check_urg().

Right.

[I've moved the next fragment here from below...]

> When the MSG_PEEK test existing in the !copied if() branch of
> tcp_recvmsg(), so many of these code paths we are dealing with in this
> fix could simply not be reached when MSG_PEEK.  That ++*seq could
> never happen, because if "copied" was non-zero and MSG_PEEK was true
> we would leave the loop and return from tcp_recvmsg() immediately.
>
> Now, we have to accomodate those adjustments.

...I thought I wrote something along those lines in the log message 
(though I failed to make it so dead obvious as you do it here :-)).
 
> > From another perspective when the race didn't happen but potential for it 
> > existed, the current check should catch that since peek_seq advanced and 
> > copied_seq stayed where it was. But that certainly doesn't match to your 
> > description above.
> 
> Right.

[...more moving of fragments here...]

> Since I now understand your urg_hole code I'm going to apply your v2
> patch which takes care of the URG stuff.

I wonder if you realized, that after my change the check _no longer_ 
catches this case where the potential exists but race didn't happen for 
real? (bleh, I realized while writing the previous mail that this might be 
misinterpreted but I was lazy enough to not fix that :-()

If we want to "forbid" urgs during msg_peek (ie., unconditionally warn), 
we could just change the check to do || urg_hole and that would catch both 
cases, or even better, just clone it into if (MSG_PEEK) under urg hole 
branch. The latter would even allow us to distinguish between the cases => 
more intelligent message can then be given in the urg case but that of 
course needs first the decision that it's something we must check for
at all.

> > I wonder why all this fuzz about this particular race as we will do our 
> > best anyway to skip the hole on MSG_PEEK side (if I read the code right)? 
> > Either we never see the hole in MSG_PEEK side, or we're happily past it 
> > already, does it make any difference anymore in the latter case? <insert 
> > some "not that I fully understand all of that multipage function" 
> > disclaimer here though, I may think too simple scenarios here>. Not that 
> > I'm too interested in "improving" urg or so anywa, I'm just curious... :-)
> 
> A long long time ago we had an assertion here checking whether
> ->copied_seq moved without our knowledge.  Alexey and I found this
> could trigger and investigation of that is what helped us
> find the tcp_check_urg()+MSG_PEEK case.  That's when we added this
> test and log message.

Sure, the copied_seq is moving, but what I'm after is that does it really 
make any difference from tcp_recvmsg point of view? It certainly triggers 
the message but that won't work as a proof of the evilness for me.

...The above paragraph is assuming recvmsg is able to deal with that and 
doesn't choke because of the changing copied_seq. I'd have to audit it 
once again to verify that it's really ok but I don't see any particular 
reason why it couldn't be possible to make recvmsg to not care on the 
tcp_check_urg side copied_seq changes but I'd like to hear a clear 
confirmation on that from you too.

> I also buy the argument that perhaps we should get rid of the log message, 

I'm not against the log message here. It's still useful for detecting real 
userspace races but the urg vs. MSG_PEEK case is what I'm not so sure is 
as evil in itself as claimed. I'm not sure if you meant the latter?

> but look at how it helped us find this kernel regression :-)

Heh, the regression was quite devastating yeah... ...limited to that check
itself :-). However, I had to audit rest of the loop too to check for 
other similar problems, so it certainly had more potential that just 
spurious printout (luckily I didn't find any other problem like this).
David Miller May 19, 2009, 4:40 a.m. UTC | #4
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Tue, 19 May 2009 07:33:02 +0300 (EEST)

> Sure, the copied_seq is moving, but what I'm after is that does it really 
> make any difference from tcp_recvmsg point of view? It certainly triggers 
> the message but that won't work as a proof of the evilness for me.
> 
> ...The above paragraph is assuming recvmsg is able to deal with that and 
> doesn't choke because of the changing copied_seq. I'd have to audit it 
> once again to verify that it's really ok but I don't see any particular 
> reason why it couldn't be possible to make recvmsg to not care on the 
> tcp_check_urg side copied_seq changes but I'd like to hear a clear 
> confirmation on that from you too.

recvmsg can deal with it fine, because we reset the peek_seq when we
print out that message.

There is no way that an application writer has any clue about this
interaction, where peeked bytes disappear and then reappear in
the out-of-band URG byte.  That's why the message is there.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matthias Andree May 19, 2009, 9:05 a.m. UTC | #5
David Miller schrieb:

> Since I now understand your urg_hole code I'm going to apply your v2
> patch which takes care of the URG stuff.  I also buy the argument that
> perhaps we should get rid of the log message, but look at how it
> helped us find this kernel regression :-)

Hi,

...and this discovery is a reason to leave it in, and perhaps make sure
that the check code is properly linked (through comments for lack of better
means) to the actual data transfer functions.

Cheers
MA
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1d7f49c..ccbd69b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1532,7 +1532,7 @@  do_prequeue:
 				}
 			}
 		}
-		if ((flags & MSG_PEEK) && peek_seq != tp->copied_seq) {
+		if ((flags & MSG_PEEK) && (peek_seq - copied != tp->copied_seq)) {
 			if (net_ratelimit())
 				printk(KERN_DEBUG "TCP(%s:%d): Application bug, race in MSG_PEEK.\n",
 				       current->comm, task_pid_nr(current));