Message ID | 1406960889.3178.60.camel@edumazet-glaptop2.roam.corp.google.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
On 08/02/2014 12:28 AM, Eric Dumazet wrote:
> On Fri, 2014-08-01 at 21:51 -0600, Chris Friesen wrote:
>
>> I've got an app that tries to connect() to both of them in turn. The connect()
>> to the first socket fails with EAGAIN, the second one succeeds, and all
>> subsequent retries on the first fail. Here's an strace() of the sequence:
>>
>> socket(PF_FILE, SOCK_STREAM, 0) = 6
>> fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
>> fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
>
> Non blocking socket : If listener queue is full, -EAGAIN is expected

That doesn't make any sense though, there is only one process that ever
attempts to connect() to this socket, and I only ran it one instance at
a time. That implies that the first time I got EAGAIN the queue would
have been empty when the connection request came in.

>> With the app not running, netstat seems to show that something is trying to
>> connect to the socket in question:
>>
>> root@compute-0:~# netstat -ap unix |grep messaging
>> unix 2 [ ACC ] STREAM LISTENING 1109818 17379/qemu-system-x /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>> unix 2 [ ACC ] STREAM LISTENING 1110051 17425/qemu-system-x /var/lib/libvirt/qemu/cgcs.messaging.instance-00000008.sock
>> unix 2 [ ] STREAM CONNECTING 0 - /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>> unix 2 [ ] STREAM CONNECTING 0 - /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>> unix 2 [ ] STREAM CONNECTED 1109848 17379/qemu-system-x /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>>
>>
>> Here's /proc/net/unix for completeness:
>>
>> root@compute-0:~/host-guest-comm# grep -a messaging /proc/net/unix
>> ffff880045c35540: 00000002 00000000 00010000 0001 01 1109818 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>> ffff8800576b8a80: 00000002 00000000 00010000 0001 01 1110051 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000008.sock
>> ffff880045e2f040: 00000002 00000000 00000000 0001 02 0 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>> ffff88004bc5ea80: 00000002 00000000 00000000 0001 02 0 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>> ffff880045e2f540: 00000002 00000000 00000000 0001 03 1109848 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>>
>>
>>
>> The crazy thing is that I can't figure out what could be causing the
>> CONNECTED/CONNECTING sockets. There are no background processes of the
>> connecting app running, no zombie processes, no forked children, etc.
>>
>> Just to make things more interesting, I successfully ran this application
>> several times (connecting to both sockets) before this behaviour started
>> happening. I was running it under strace and just killed it with ctrl-C.
>>
>> Anyone got any ideas? Please CC me since I'm not subscribed to the list.
>
> The application might use a too small listen() backlog ?

Looking at the qemu code I think it's calling listen(sock,1) which makes
sense since I think it's only designed to allow a single connection up
into the guest at a time.

Not sure how that could be the problem though, since there is only one
process that tries to connect() to the application, and I only ran it
one instance at a time.

I'll give the patch a try, but how would that explain the sockets that
are in a CONNECTING state when as far as I can tell they don't belong to
any process?

Am I correct to think that the CONNECTED socket may be due to the two
CONNECTING ones somehow?

Chris
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 2014-08-02 at 08:11 -0600, Chris Friesen wrote:
> On 08/02/2014 12:28 AM, Eric Dumazet wrote:
> > On Fri, 2014-08-01 at 21:51 -0600, Chris Friesen wrote:
>
> >> I've got an app that tries to connect() to both of them in turn. The connect()
> >> to the first socket fails with EAGAIN, the second one succeeds, and all
> >> subsequent retries on the first fail. Here's an strace() of the sequence:
> >>
> >> socket(PF_FILE, SOCK_STREAM, 0) = 6
> >> fcntl(6, F_GETFL) = 0x2 (flags O_RDWR)
> >> fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
> >
> > Non blocking socket : If listener queue is full, -EAGAIN is expected
>
>
> That doesn't make any sense though, there is only one process that ever
> attempts to connect() to this socket, and I only ran it one instance at
> a time. That implies that the first time I got EAGAIN the queue would
> have been empty when the connection request came in.

This looks like an application bug: it misses a POLLIN event and always
calls accept() too late.

> Looking at the qemu code I think it's calling listen(sock,1) which makes
> sense since I think it's only designed to allow a single connection up
> into the guest at a time.
>
> Not sure how that could be the problem though, since there is only one
> process that tries to connect() to the application, and I only ran it
> one instance at a time.

Well, change the listen() backlog to 10, and maybe it will hide the
application bug.

>
> I'll give the patch a try, but how would that explain the sockets that
> are in a CONNECTING state when as far as I can tell they don't belong to
> any process?

The accept() call comes too late.

You have the CONNECTING state as long as accept() has not yet been called.
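To make the backlog argument concrete, here is a minimal sketch (not taken from qemu or from the reporter's application) of a listener that never calls accept() while a client keeps issuing non-blocking connect() calls. The helper name `connects_before_failure` and the socket path passed to it are hypothetical. On Linux the first failure is expected to be EAGAIN once the listener's queue is full; the exact number of successful connects depends on how the kernel compares the queue length with the backlog, so the sketch reports it rather than assuming it.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define MAX_CLIENTS 64

/*
 * Hypothetical helper: listen on `path` with the given backlog, never
 * call accept(), and count how many non-blocking connect() calls succeed
 * before one fails. The errno of the failing connect() is stored in *err
 * (0 if no attempt failed). Returns -1 if the listener cannot be set up.
 */
static int connects_before_failure(const char *path, int backlog,
                                   int max_tries, int *err)
{
    struct sockaddr_un addr;
    int clients[MAX_CLIENTS];
    int count = 0;

    *err = 0;
    if (max_tries > MAX_CLIENTS)
        max_tries = MAX_CLIENTS;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    unlink(path);                       /* remove a stale socket file */
    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, backlog) < 0) {
        close(lfd);
        return -1;
    }

    while (count < max_tries) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            break;
        /* Same flag sequence as in the strace: O_RDWR|O_NONBLOCK */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            *err = errno;               /* EAGAIN expected when queue is full */
            close(fd);
            break;
        }
        clients[count++] = fd;          /* queued, waiting for accept() */
    }

    for (int i = 0; i < count; i++)
        close(clients[i]);
    close(lfd);
    unlink(path);
    return count;
}
```

Calling it with a backlog of 1, as qemu appears to use, and then with 10 shows how a larger backlog only postpones the failure when the listener is slow to accept().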
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e96884380732..78b7a7cf3071 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2380,6 +2380,8 @@ static int unix_seq_show(struct seq_file *seq, void *v)
 			for ( ; i < len; i++)
 				seq_putc(seq, u->addr->name->sun_path[i]);
 		}
+		seq_printf(seq, " %u/%u", skb_queue_len(&s->sk_receive_queue),
+			   s->sk_max_ack_backlog);
 		unix_state_unlock(s);
 		seq_putc(seq, '\n');
 	}
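With this debugging patch applied, each /proc/net/unix line would end in a `queued/backlog` pair. The following is a rough sketch of how such a line could be read back in user space. The struct `unix_entry` and the function `parse_unix_line` are hypothetical names, the sketch only handles sockets bound to a path without spaces, and it assumes the column layout shown in the /proc/net/unix output quoted earlier in the thread.

```c
#include <stdio.h>

/* Hypothetical container for the fields of one /proc/net/unix line. */
struct unix_entry {
    unsigned int flags;     /* 0x10000 appears on the listening sockets above */
    unsigned int state;     /* St column as printed (01, 02, 03 above) */
    unsigned long inode;
    char path[108];         /* size of sun_path on Linux */
    unsigned int queued;    /* skb_queue_len(&s->sk_receive_queue) */
    unsigned int backlog;   /* s->sk_max_ack_backlog */
};

/*
 * Parse one line in the patched format. Returns 1 when all fields,
 * including the trailing "queued/backlog" pair, were found, and 0
 * otherwise (for example on an unpatched kernel).
 */
static int parse_unix_line(const char *line, struct unix_entry *e)
{
    unsigned int refcnt, proto, type;

    int n = sscanf(line, "%*[^:]: %x %x %x %x %x %lu %107s %u/%u",
                   &refcnt, &proto, &e->flags, &type, &e->state,
                   &e->inode, e->path, &e->queued, &e->backlog);
    return n == 9;
}
```

A listening socket whose `queued` value exceeds `backlog` would be one where further non-blocking connect() calls are expected to return EAGAIN until the application calls accept().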