Message ID | 1357220469.21409.24574.camel@edumazet-glaptop |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2013-01-02 at 20:47 +0000, Eric Wong wrote:
> > Eric Wong <normalperson@yhbt.net> wrote:
> > > [1] my full setup is very strange.
> > >
> > > Other than the FUSE component I forgot to mention, little depends on
> > > the kernel. With all this, the standalone toosleepy can get stuck.
> > > I'll try to reproduce it with less...
> >
> > I just confirmed my toosleepy processes will get stuck while just
> > doing "rsync -a" between local disks. So this does not depend on
> > sendfile or FUSE to reproduce.
> > --
>
> How do you tell your 'toosleepy' is stuck ?

My original post showed it stuck with strace (in ppoll + send).
I only strace after seeing it's not using any CPU in top.

http://mid.gmane.org/20121228014503.GA5017@dcvr.yhbt.net

(lsof also confirmed the ppoll/send sockets were peers)

> If reading its output, you should change its logic, there is no
> guarantee the recv() will deliver exactly 16384 bytes each round.
>
> With the following patch, I cant reproduce the 'apparent stuck'

Right, the output is just an approximation and the logic there
was bogus.

Thanks for looking at this.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

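The point about recv() not delivering exactly 16384 bytes each round can be
illustrated with a small simulation. This is only a sketch, not code from
toosleepy.c: modulo_reports(), BUF_SIZE and the fixed `chunk` size are
hypothetical names and assumptions made for illustration, and the interval
is passed in rather than taken from the original program.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_SIZE 16384 /* the per-round size mentioned in the thread */

/*
 * Count how many times a "received % interval == 0" progress check
 * fires while `rounds` reads of `chunk` bytes each are accumulated.
 * `chunk` stands in for the return value of recv(); it is fixed here
 * for simplicity, whereas a real recv() may return any positive value
 * up to the buffer size.
 */
static size_t modulo_reports(size_t chunk, size_t rounds, size_t interval)
{
	size_t received = 0;
	size_t reports = 0;
	size_t i;

	assert(chunk > 0 && chunk <= BUF_SIZE && interval > 0);
	for (i = 0; i < rounds; i++) {
		received += chunk;
		if (received % interval == 0)
			reports++;
	}
	return reports;
}
```

With full reads, modulo_reports(16384, 64, 16 * 16384) returns 4. If every
read comes back one byte short, modulo_reports(16383, 64, 16 * 16384)
returns 0: the running total never lands on a multiple of the interval, so
a receiver that is making progress looks silent.
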
Eric Wong <normalperson@yhbt.net> wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > With the following patch, I cant reproduce the 'apparent stuck'
>
> Right, the output is just an approximation and the logic there
> was bogus.
>
> Thanks for looking at this.

I'm still able to reproduce the issue under v3.8-rc2 with your patch
for toosleepy.

(As expected when blocked,) TCP send() will eventually return
ETIMEDOUT when I forget to check (and toosleepy will abort from it)

I think this requires frequent dirtying/cycling of pages to reproduce.
(from copying large files around) to interact with compaction.
I'll see if I can reproduce the issue with read-only FS activity.

With 3.7.1 and compaction/THP disabled, I was able to run ~21 hours
and copy a few TB around without anything getting stuck.

Eric Wong <normalperson@yhbt.net> wrote:
> I think this requires frequent dirtying/cycling of pages to reproduce.
> (from copying large files around) to interact with compaction.
> I'll see if I can reproduce the issue with read-only FS activity.

Still successfully running the read-only test on my main machine, will
provide another update in a few hours or so if it's still successful
(it usually takes <1 hour to hit).

I also fired up a VM on my laptop (still running v3.7) and was able to
get stuck with only 2 cores and 512M on the VM (x86_64).

On the small VM with little disk space, it doesn't need much dirty
data to trigger. I just did this:

  find $45G_NFS_MOUNT -type f -print0 | \
     xargs -0 -n1 -P4 sh -c 'cat "$1" >> tmp; > tmp' --

...while running two instances of toosleepy (one got stuck and aborted).

Eric Wong <normalperson@yhbt.net> wrote:
> Eric Wong <normalperson@yhbt.net> wrote:
> > I think this requires frequent dirtying/cycling of pages to reproduce.
> > (from copying large files around) to interact with compaction.
> > I'll see if I can reproduce the issue with read-only FS activity.
>
> Still successfully running the read-only test on my main machine, will
> provide another update in a few hours or so if it's still successful
> (it usually takes <1 hour to hit).

The read-only test is still going on my main machine.
I think writes/dirty data is required to reproduce the issue...

diff --git a/toosleepy.c b/toosleepy.c
index e64b7cd..df3610f 100644
--- a/toosleepy.c
+++ b/toosleepy.c
@@ -15,6 +15,7 @@
 #include <fcntl.h>
 #include <assert.h>
 #include <limits.h>
+#include <time.h>
 
 struct receiver {
 	int rfd;
@@ -53,6 +54,7 @@ static void * recv_loop(void *p)
 	ssize_t r, s;
 	size_t received = 0;
 	size_t sent = 0;
+	time_t t0 = time(NULL), t1;
 
 	for (;;) {
 		r = recv(rcvr->rfd, buf, sizeof(buf), 0);
@@ -80,9 +82,12 @@ static void * recv_loop(void *p)
 				write(-1, buf, sizeof(buf));
 			}
 		}
-		if ((received % (sizeof(buf) * sizeof(buf) * 16) == 0))
+		t1 = time(NULL);
+		if (t1 != t0) {
 			dprintf(2, " %d progress: %zu\n", rcvr->rfd, received);
+			t0 = t1;
+		}
 	}
 	dprintf(2, "%d got: %zu\n", rcvr->rfd, received);
 	if (rcvr->sfd >= 0) {
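
The time-based check in the patch can also be pulled out into a small helper
so it is easy to exercise on its own. This is a sketch under assumptions, not
part of toosleepy.c: progress_due() is a hypothetical name, and the current
time is passed in as a parameter (instead of calling time(NULL) inside) so
the behaviour can be checked with fixed values.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Return true when `now` differs from *last (the timestamp of the
 * previous report) and remember `now` for the next call; return false
 * otherwise. This mirrors the "t1 != t0" test in the patch, which
 * prints at most once per distinct time() value regardless of how
 * many bytes each recv() returned.
 */
static bool progress_due(time_t *last, time_t now)
{
	if (now == *last)
		return false;
	*last = now;
	return true;
}
```

Inside a receive loop this would be called as progress_due(&t0, time(NULL))
after each recv(), printing the progress line only when it returns true.
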