diff mbox

Strange Application bug, race in MSG_PEEK complaints (was: Bug#513695: fetchmail: race in MSG_PEEK)

Message ID Pine.LNX.4.64.0905070931590.12068@wrl-59.cs.helsinki.fi
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Ilpo Järvinen May 7, 2009, 6:48 a.m. UTC
On Thu, 7 May 2009, Matthias Andree wrote:

> (Please Cc: Frans and myself on replies, we're not subscribed to netdev@.)
> (Resent from mutt, after Opera trashed my headers...)
> (DaveM, this was Cc:d to you earlier today under a different subject. The
> failing wget is new though.)
> 
> Dear Kernel Hackers,
> 
> I am getting rather frequent "TCP(...): Application bug, race in MSG_PEEK."
> complaints for fetchmail these days.
> The strange part is that fetchmail is a single-threaded application that does
> not deal with URG data.
> I cannot see what fetchmail would race against in this situation.
> 
> Frans Pop has found interesting patterns probably related to segment or
> packet/payload sizes (1460 bytes!) - see his message quoted below.
> 
> His most recent findings are:
> : As further info, I've also got very infrequent MSGPEEK errors on my
> : laptop, which runs nothing strange (... KDE ...)
> : Interesting may be that both systems are dual core, running x86_64.
> :
> : Here are the errors from my laptop (also with my extra debug info):
> : Mar 14 09:23:18 aragorn kernel: TCP(kio_imap4:5646): Application bug, race
> : in MSG_PEEK: 0, 23.
> : Apr  8 12:21:28 aragorn kernel: TCP(kio_imap4:24173): Application bug, race
> : in MSG_PEEK: 38dd5ab8, 8.
> : May  3 23:45:29 aragorn kernel: TCP(wget:4137): Application bug, race in
> : MSG_PEEK: 77093ab8, 3e.
> : Note the last one. A simple wget?!
> 
> Application use: Fetchmail does use MSG_PEEK in several places to look ahead.
> The same process that peeks will later read the data. It can happen that we
> peek at 1 byte, then more bytes, or that we peek at 512 bytes and read only
> smaller amounts (say, 30 - 70) of them before we peek again.
> 
> As to the application source code: Look for instance for fm_peek in the
> function SockRead() in
> <http://mknod.org/svn/fetchmail/branches/BRANCH_6-3/socket.c> and its callers.
> 
> Can we be sure that
> 
> (a) the kernel correctly handles peek_seq and tp->copied_seq and their
> comparison (see Frans's message below - is the != in the if statement actually
> the right thing or should this be a > or <?), and
> 
> (b) that the message is printed only if there is a real app bug (that I fail
> to see), and
> 
> (c) that the kernel handles package boundaries, wraparounds, and buffers
> correctly?
> 
> Might the reception of further data in the socket's receive buffer trigger the
> message? As in: between peek and read, or between read and peek, further data
> arrives, and triggers the message?
> 
> Might the kernel's TCP code be confused about absolute vs. relative
> sequence/position indexes/pointers/counters inside the TCP code?
> 
> Thanks a lot in advance. (Frans's earlier message included below.)
> 
> Best regards
> Matthias
> 
> 
> ------- Weitergeleitete Nachricht -------
> Von: "Frans Pop" <elendil@planet.nl>
> An: "Matthias Andree" <matthias.andree@gmx.de>
> Kopie:
> Betreff: Re: Bug#513695: fetchmail: race in MSG_PEEK - trace data
> Datum: Sat, 14 Mar 2009 19:00:13 +0100
> 
> On Monday 09 March 2009, Matthias Andree wrote:
> >Unfortunately, the strace seems to be incomplete - likely strace macros
> >wrong: I see lots of recvfrom with replies, but not the send or write
> >that actually shows the other half of the conversation. Could I have
> >coordinated full strace logging and kernel messages?
> 
> See attachment. The syslog is complete for the period the tcpdump was
> running, but the tcpdump itself is filtered for only the relevant
> conversation. Again account name and password removed from the traces.
> 
> It took a bit longer because I wanted to get some history from a kernel
> patch I added. I've now automated the creation of the traces, so if you
> want more data, even from the kernel, please ask.
> 
> I've applied the following patch to net/ipv4/tcp.c for the kernel running
> on my server (2.6.29-rc8):
> @@ -1499,8 +1499,9 @@ do_prequeue:
>    		}
>    		if ((flags & MSG_PEEK) && peek_seq != tp->copied_seq) {
>    			if (net_ratelimit())
> -				printk(KERN_DEBUG "TCP(%s:%d): Application
> bug, race in MSG_PEEK.\n",
> -				       current->comm, task_pid_nr(current));
> +				printk(KERN_DEBUG "TCP(%s:%d): Application
> bug, race in MSG_PEEK: %x,
> %x.\n",
> +				       current->comm, task_pid_nr(current)),
> +				       peek_seq, tp->copied_seq;

I cannot resist myself from noting that this certainly wasn't the patch 
one got those printks below... It might happily compile though :-).

>    			peek_seq = tp->copied_seq;
>    		}
>    		continue;
> 
> So, the values you see at the end of the warning are peek_seq and
> tp->copied_seq. This gives messages like:
> kernel: TCP(fetchmail:9311): Application bug, race in MSG_PEEK: 156233,
> 16a.
> kernel: TCP(fetchmail:9311): Application bug, race in MSG_PEEK: 71259ac8,
> 5b4.
> kernel: TCP(fetchmail:31216): Application bug, race in MSG_PEEK: 833fe5,
> c0.
> 
> The distribution for tp->copied_seq is interesting:
>         1 24
>         1 c0
>         1 132
>         1 16a
>         3 248
>         1 2a7
>         1 400
>       166 5b4
> 
> So, 0x5b4 (1460) seems to be special (is it a maximum size maybe?); if

1460 is size of the maximum wire segment when not using timestamps (1500 
MTU - (20+20 headers), would be 1448 with timestamps). So basically the 
skbs in the receiver queue will be of that length (holds until the driver 
of your nic is converted to gro).

> I grep the kernel for that I get:
> net/ipv4/tcp_input.c:           if (tp->mss_cache > 1460)
> net/ipv4/tcp_output.c:          if (mss > 1460 * 3)
> net/ipv4/tcp_output.c:          else if (mss > 1460)
> And:
> net/ipv[46]/syncookies.c:
> static __u16 const msstab[] = {
>           64 - 1,
>           256 - 1,
>           512 - 1,
>           536 - 1,
>           1024 - 1,
>           1440 - 1,
>           1460 - 1,
>           4312 - 1,
>           (__u16)-1
> };
> 
> Not sure if all that means anything, but I thought I'd mention it.

What would you think about the following, untested patch... I suppose it 
is enough to capture the racy situations except with that crazy urg hole, 
grr (I suppose that will need just another variable to do the offset
of one).

Comments

Frans Pop May 7, 2009, 5:16 p.m. UTC | #1
On Thursday 07 May 2009, Ilpo Järvinen wrote:
> On Thu, 7 May 2009, Matthias Andree wrote:
> > I've applied the following patch to net/ipv4/tcp.c for the kernel
> > running on my server (2.6.29-rc8):
> > @@ -1499,8 +1499,9 @@ do_prequeue:
> >    		}
> >    		if ((flags & MSG_PEEK) && peek_seq != tp->copied_seq) {
> >    			if (net_ratelimit())
> > -				printk(KERN_DEBUG "TCP(%s:%d): Application bug, race in MSG_PEEK.\n",
> > -				       current->comm, task_pid_nr(current));
> > +				printk(KERN_DEBUG "TCP(%s:%d): Application bug, race in MSG_PEEK: %x, %x.\n",
> > +				       current->comm, task_pid_nr(current)),
> > +				       peek_seq, tp->copied_seq;
>
> I cannot resist myself from noting that this certainly wasn't the patch
> one got those printks below... It might happily compile though :-).

Can you please elaborate why you think that? It may be horribly broken
(I've never claimed to be a C coder, and probably never will), but it
also really is the patch that generates the printks...

> >    			peek_seq = tp->copied_seq;
> >    		}
> >    		continue;
> >
> > So, the values you see at the end of the warning are peek_seq and
> > tp->copied_seq. This gives messages like:
> > kernel: TCP(fetchmail:9311): Application bug, race in MSG_PEEK: 156233, 16a.
> > kernel: TCP(fetchmail:9311): Application bug, race in MSG_PEEK: 71259ac8, 5b4.
> > kernel: TCP(fetchmail:31216): Application bug, race in MSG_PEEK: 833fe5, c0.

[...]

> What would you think about the following, untested patch... I suppose
> it is enough to capture the racy situations except with that crazy urg
> hole, grr (I suppose that will need just another variable to do the
> offset of one).

I'll give your patch a try and report back.

Thanks,
FJP
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilpo Järvinen May 7, 2009, 6:48 p.m. UTC | #2
On Thu, 7 May 2009, Frans Pop wrote:

> On Thursday 07 May 2009, Ilpo Järvinen wrote:
> > On Thu, 7 May 2009, Matthias Andree wrote:
> > > I've applied the following patch to net/ipv4/tcp.c for the kernel
> > > running on my server (2.6.29-rc8):
> > > @@ -1499,8 +1499,9 @@ do_prequeue:
> > >    		}
> > >    		if ((flags & MSG_PEEK) && peek_seq != tp->copied_seq) {
> > >    			if (net_ratelimit())
> > > -				printk(KERN_DEBUG "TCP(%s:%d): Application bug, race in MSG_PEEK.\n",
> > > -				       current->comm, task_pid_nr(current));
> > > +				printk(KERN_DEBUG "TCP(%s:%d): Application bug, race in MSG_PEEK: %x, %x.\n",
> > > +				       current->comm, task_pid_nr(current)),
> > > +				       peek_seq, tp->copied_seq;
> >
> > I cannot resist myself from noting that this certainly wasn't the patch
> > one got those printks below... It might happily compile though :-).
> 
> Can you please elaborate why you think that? It may be horribly broken
> (I've never claimed to be a C coder, and probably never will), but it
> also really is the patch that generates the printks...

...This was mainly meant to be a joke... :-)

The parenthesis won't match how a printk with string and 4 args should 
be called, so with this version you have in the mail peek_seq and 
tp->copied_seq are not put into stack or you were just super lucky.

> > >    			peek_seq = tp->copied_seq;
> > >    		}
> > >    		continue;
> > >
> > > So, the values you see at the end of the warning are peek_seq and
> > > tp->copied_seq. This gives messages like:
> > > kernel: TCP(fetchmail:9311): Application bug, race in MSG_PEEK: 156233, 16a.
> > > kernel: TCP(fetchmail:9311): Application bug, race in MSG_PEEK: 71259ac8, 5b4.
> > > kernel: TCP(fetchmail:31216): Application bug, race in MSG_PEEK: 833fe5, c0.
> 
> [...]
> 
> > What would you think about the following, untested patch... I suppose
> > it is enough to capture the racy situations except with that crazy urg
> > hole, grr (I suppose that will need just another variable to do the
> > offset of one).
> 
> I'll give your patch a try and report back.

Thanks. If it works I can add that urg hole madness handling too there.
Frans Pop May 7, 2009, 8:43 p.m. UTC | #3
On Thursday 07 May 2009, Ilpo Järvinen wrote:
> > Can you please elaborate why you think that? It may be horribly
> > broken (I've never claimed to be a C coder, and probably never will),
> > but it also really is the patch that generates the printks...
>
> ...This was mainly meant to be a joke... :-)
>
> The parenthesis won't match how a printk with string and 4 args should
> be called, so with this version you have in the mail peek_seq and
> tp->copied_seq are not put into stack or you were just super lucky.

Ah, yes. You're absolutely right. Duh.
gcc does warn about it, but I must have missed that.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Frans Pop May 9, 2009, 6:14 p.m. UTC | #4
On Thursday 07 May 2009, Ilpo Järvinen wrote:
> [RFC PATCH] tcp: fix MSG_PEEK race check
>
> Commit 518a09ef11 (tcp: Fix recvmsg MSG_PEEK influence of
> blocking behavior) lets the loop run longer than this check
> did previously expect, so we need to be more careful with
> this check and consider the work we have been doing.
>
> I'm a bit unsure if this improved check can still fail as
>         if (!sock_flag(sk, SOCK_URGINLINE)) {
>                 ++*seq;
>                 ...
> does not increment copied.
>
> Compile tested.
>
> Signed-off-by: Ilpo J?rvinen <ilpo.jarvinen@helsinki.fi>

I've been running with the patch for 2 days now and have not seen any more 
MSG_PEEK errors, so as far as I'm concerned the patch does fix the issue 
(needed the time in order to be confident of that).

So:
Tested-by: Frans Pop <elendil@planet.nl>

Suggest to also add a CC for stable.

Cheers,
FJP
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1d7f49c..ccbd69b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1532,7 +1532,7 @@  do_prequeue:
 				}
 			}
 		}
-		if ((flags & MSG_PEEK) && peek_seq != tp->copied_seq) {
+		if ((flags & MSG_PEEK) && (peek_seq - copied != tp->copied_seq)) {
 			if (net_ratelimit())
 				printk(KERN_DEBUG "TCP(%s:%d): Application bug, race in MSG_PEEK.\n",
 				       current->comm, task_pid_nr(current));