diff mbox

ppoll() stuck on POLLIN while TCP peer is sending

Message ID 1357220469.21409.24574.camel@edumazet-glaptop
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Jan. 3, 2013, 1:41 p.m. UTC
On Wed, 2013-01-02 at 20:47 +0000, Eric Wong wrote:
> Eric Wong <normalperson@yhbt.net> wrote:
> > [1] my full setup is very strange.
> > 
> >     Other than the FUSE component I forgot to mention, little depends on
> >     the kernel.  With all this, the standalone toosleepy can get stuck.
> >     I'll try to reproduce it with less...
> 
> I just confirmed my toosleepy processes will get stuck while just
> doing "rsync -a" between local disks.  So this does not depend on
> sendfile or FUSE to reproduce.
> --

How do you tell your 'toosleepy' is stuck ?

If reading its output, you should change its logic, there is no
guarantee the recv() will deliver exactly 16384 bytes each round.

With the following patch, I cant reproduce the 'apparent stuck'






--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Wong Jan. 3, 2013, 6:32 p.m. UTC | #1
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2013-01-02 at 20:47 +0000, Eric Wong wrote:
> > Eric Wong <normalperson@yhbt.net> wrote:
> > > [1] my full setup is very strange.
> > > 
> > >     Other than the FUSE component I forgot to mention, little depends on
> > >     the kernel.  With all this, the standalone toosleepy can get stuck.
> > >     I'll try to reproduce it with less...
> > 
> > I just confirmed my toosleepy processes will get stuck while just
> > doing "rsync -a" between local disks.  So this does not depend on
> > sendfile or FUSE to reproduce.
> > --
> 
> How do you tell your 'toosleepy' is stuck ?

My original post showed it stuck with strace (in ppoll + send).
I only strace after seeing it's not using any CPU in top.

http://mid.gmane.org/20121228014503.GA5017@dcvr.yhbt.net
(lsof also confirmed the ppoll/send sockets were peers)

> If reading its output, you should change its logic, there is no
> guarantee the recv() will deliver exactly 16384 bytes each round.
> 
> With the following patch, I cant reproduce the 'apparent stuck'

Right, the output is just an approximation and the logic there
was bogus.

Thanks for looking at this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Wong Jan. 3, 2013, 11:45 p.m. UTC | #2
Eric Wong <normalperson@yhbt.net> wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > With the following patch, I cant reproduce the 'apparent stuck'
> 
> Right, the output is just an approximation and the logic there
> was bogus.
> 
> Thanks for looking at this.

I'm still able to reproduce the issue under v3.8-rc2 with your patch
for toosleepy.

(As expected when blocked,) TCP send() will eventually return
ETIMEOUT when I forget to check (and toosleepy will abort from it)

I think this requires frequent dirtying/cycling of pages to reproduce.
(from copying large files around) to interact with compaction.
I'll see if I can reproduce the issue with read-only FS activity.

With 3.7.1 and compaction/THP disabled, I was able to run ~21 hours
and copy a few TB around without anything getting stuck.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Wong Jan. 4, 2013, 12:26 a.m. UTC | #3
Eric Wong <normalperson@yhbt.net> wrote:
> I think this requires frequent dirtying/cycling of pages to reproduce.
> (from copying large files around) to interact with compaction.
> I'll see if I can reproduce the issue with read-only FS activity.

Still successfully running the read-only test on my main machine, will
provide another update in a few hours or so if it's still successful
(it usually takes <1 hour to hit).

I also fired up a VM on my laptop (still running v3.7) and was able to
get stuck with only 2 cores and 512M on the VM (x86_64).  On the small
VM with little disk space, it doesn't need much dirty data to trigger.
I just did this:

    find $45G_NFS_MOUNT -type f -print0 | \
       xargs -0 -n1 -P4 sh -c 'cat "$1" >> tmp; > tmp' --

...while running two instances of toosleepy (one got stuck and aborted).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Wong Jan. 4, 2013, 3:52 a.m. UTC | #4
Eric Wong <normalperson@yhbt.net> wrote:
> Eric Wong <normalperson@yhbt.net> wrote:
> > I think this requires frequent dirtying/cycling of pages to reproduce.
> > (from copying large files around) to interact with compaction.
> > I'll see if I can reproduce the issue with read-only FS activity.
> 
> Still successfully running the read-only test on my main machine, will
> provide another update in a few hours or so if it's still successful
> (it usually takes <1 hour to hit).

The read-only test is still going on my main machine.
I think writes/dirty data is required to reproduce the issue...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/toosleepy.c b/toosleepy.c
index e64b7cd..df3610f 100644
--- a/toosleepy.c
+++ b/toosleepy.c
@@ -15,6 +15,7 @@ 
 #include <fcntl.h>
 #include <assert.h>
 #include <limits.h>
+#include <time.h>
 
 struct receiver {
 	int rfd;
@@ -53,6 +54,7 @@  static void * recv_loop(void *p)
 	ssize_t r, s;
 	size_t received = 0;
 	size_t sent = 0;
+	time_t t0 = time(NULL), t1;
 
 	for (;;) {
 		r = recv(rcvr->rfd, buf, sizeof(buf), 0);
@@ -80,9 +82,12 @@  static void * recv_loop(void *p)
 				write(-1, buf, sizeof(buf));
 			}
 		}
-		if ((received % (sizeof(buf) * sizeof(buf) * 16) == 0))
+		t1 = time(NULL);
+		if (t1 != t0) {
 			dprintf(2, " %d progress: %zu\n",
 			        rcvr->rfd, received);
+			t0 = t1;
+		}
 	}
 	dprintf(2, "%d got: %zu\n", rcvr->rfd, received);
 	if (rcvr->sfd >= 0) {