
weird behaviour, getting EAGAIN on a connect() call on a unix stream socket

Message ID 1406960889.3178.60.camel@edumazet-glaptop2.roam.corp.google.com
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Aug. 2, 2014, 6:28 a.m. UTC
On Fri, 2014-08-01 at 21:51 -0600, Chris Friesen wrote:
> Hi,
> 
> I'm trying to figure out what would cause a connect() call on a unix stream
> socket to return EAGAIN.  (On a 3.4 kernel, if it matters.)
> 
> I've got two unix stream sockets on the system, created by two qemu instances
> as virtio-serial channels.
> 
> I've got an app that tries to connect() to both of them in turn.  The connect()
> to the first socket fails with EAGAIN, the second one succeeds, and all
> subsequent retries on the first fail.  Here's an strace() of the sequence:
> 
> socket(PF_FILE, SOCK_STREAM, 0)         = 6
> fcntl(6, F_GETFL)                       = 0x2 (flags O_RDWR)
> fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK)    = 0

Non-blocking socket: if the listener's queue is full, -EAGAIN is expected.

> connect(6, {sa_family=AF_FILE, sun_path="/var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock"}, 61) = -1 EAGAIN (Resource temporarily unavailable)
> clock_gettime(CLOCK_MONOTONIC, {158877, 262941763}) = 0
> socket(PF_FILE, SOCK_STREAM, 0)         = 7
> fcntl(7, F_GETFL)                       = 0x2 (flags O_RDWR)
> fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
> connect(7, {sa_family=AF_FILE, sun_path="/var/lib/libvirt/qemu/cgcs.messaging.instance-00000008.sock"}, 61) = 0
> getdents(5, /* 0 entries */, 32768)     = 0
> close(5)                                = 0
> clock_gettime(CLOCK_MONOTONIC, {158877, 265359109}) = 0
> poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=7, events=POLLIN}], 3, 997) = 0 (Timeout)
> clock_gettime(CLOCK_MONOTONIC, {158878, 265914614}) = 0
> connect(6, {sa_family=AF_FILE, sun_path="/var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock"}, 61) = -1 EAGAIN (Resource temporarily unavailable)
> 
> 
> With the app not running, netstat seems to show that something is trying to
> connect to the socket in question:
> 
> root@compute-0:~# netstat -ap unix |grep messaging
> unix  2      [ ACC ]     STREAM     LISTENING     1109818  17379/qemu-system-x /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
> unix  2      [ ACC ]     STREAM     LISTENING     1110051  17425/qemu-system-x /var/lib/libvirt/qemu/cgcs.messaging.instance-00000008.sock
> unix  2      [ ]         STREAM     CONNECTING    0        -                   /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
> unix  2      [ ]         STREAM     CONNECTING    0        -                   /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
> unix  2      [ ]         STREAM     CONNECTED     1109848  17379/qemu-system-x /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
> 
> 
> Here's /proc/net/unix for completeness:
> 
> root@compute-0:~/host-guest-comm# grep -a messaging /proc/net/unix
> ffff880045c35540: 00000002 00000000 00010000 0001 01 1109818 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
> ffff8800576b8a80: 00000002 00000000 00010000 0001 01 1110051 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000008.sock
> ffff880045e2f040: 00000002 00000000 00000000 0001 02     0 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
> ffff88004bc5ea80: 00000002 00000000 00000000 0001 02     0 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
> ffff880045e2f540: 00000002 00000000 00000000 0001 03 1109848 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
> 
> 
> 
> The crazy thing is that I can't figure out what could be causing the
> CONNECTED/CONNECTING sockets.  There are no background processes of the
> connecting app running, no zombie processes, no forked children, etc.
> 
> Just to make things more interesting, I successfully ran this application
> several times (connecting to both sockets) before this behaviour started
> happening.  I was running it under strace and just killed it with ctrl-C.
> 
> Anyone got any ideas?   Please CC me since I'm not subscribed to the list.

Could the application be using too small a listen() backlog?
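A minimal sketch of the mechanism described above, assuming Linux and a throwaway abstract-namespace socket. The name "eagain-demo-<pid>-<backlog>" and both helper functions are hypothetical and are not taken from qemu or from the application in the thread. The listener never calls accept(), and the client repeats the strace's F_GETFL / F_SETFL O_NONBLOCK sequence before each connect(). Once the listener's queue is full, connect() fails immediately with EAGAIN instead of blocking. In the af_unix code of that era the full-queue test is "queue length > backlog", so with listen(fd, 1) two connections can be queued before EAGAIN. That would be consistent with the two CONNECTING entries in the /proc/net/unix output above. Treat the exact count as kernel-dependent.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: build a Linux abstract-namespace address
 * (sun_path[0] == '\0'), so no socket file is left on disk. */
static socklen_t abstract_addr(struct sockaddr_un *a, const char *name)
{
    size_t n = strlen(name);

    memset(a, 0, sizeof(*a));
    a->sun_family = AF_UNIX;
    if (n > sizeof(a->sun_path) - 1)
        n = sizeof(a->sun_path) - 1;
    memcpy(a->sun_path + 1, name, n);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + n);
}

/* Listen with the given backlog but never accept(), then issue
 * non-blocking connect()s until one fails.  Returns how many connect()s
 * succeeded before EAGAIN, or -1 on any other error or if no EAGAIN
 * was seen within 'limit' attempts (capped at 64). */
static int connects_before_eagain(int backlog, int limit)
{
    struct sockaddr_un a;
    char name[64];
    int fds[64];
    int ok = 0, result = -1;

    snprintf(name, sizeof(name), "eagain-demo-%d-%d", (int)getpid(), backlog);
    socklen_t len = abstract_addr(&a, name);

    int lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&a, len) < 0 || listen(lfd, backlog) < 0) {
        close(lfd);
        return -1;
    }

    for (int i = 0; i < limit && i < 64; i++) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            break;
        /* Same sequence as the strace: read the flags, add O_NONBLOCK. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
        if (connect(fd, (struct sockaddr *)&a, len) == 0) {
            fds[ok++] = fd;  /* keep it open: it stays queued on lfd */
            continue;
        }
        int err = errno;
        close(fd);
        result = (err == EAGAIN) ? ok : -1;
        break;
    }

    for (int i = 0; i < ok; i++)
        close(fds[i]);
    close(lfd);
    return result;
}
```

With this sketch, connects_before_eagain(10, 64) should report more successful connect() calls than connects_before_eagain(1, 32). This is why a larger backlog can mask a listener that is slow to call accept() without fixing it.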

Try this debugging patch (note: it might break some applications that
parse /proc/net/unix):



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Chris Friesen Aug. 2, 2014, 2:11 p.m. UTC | #1
On 08/02/2014 12:28 AM, Eric Dumazet wrote:
> On Fri, 2014-08-01 at 21:51 -0600, Chris Friesen wrote:

>> I've got an app that tries to connect() to both of them in turn.  The connect()
>> to the first socket fails with EAGAIN, the second one succeeds, and all
>> subsequent retries on the first fail.  Here's an strace() of the sequence:
>>
>> socket(PF_FILE, SOCK_STREAM, 0)         = 6
>> fcntl(6, F_GETFL)                       = 0x2 (flags O_RDWR)
>> fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
>
> Non blocking socket : If listener queue is full, -EAGAIN is expected


That doesn't make any sense, though: there is only one process that ever
attempts to connect() to this socket, and I only ever ran one instance of
it at a time.  That implies the queue would have been empty the first
time I got EAGAIN.


>> With the app not running, netstat seems to show that something is trying to
>> connect to the socket in question:
>>
>> root@compute-0:~# netstat -ap unix |grep messaging
>> unix  2      [ ACC ]     STREAM     LISTENING     1109818  17379/qemu-system-x /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>> unix  2      [ ACC ]     STREAM     LISTENING     1110051  17425/qemu-system-x /var/lib/libvirt/qemu/cgcs.messaging.instance-00000008.sock
>> unix  2      [ ]         STREAM     CONNECTING    0        -                   /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>> unix  2      [ ]         STREAM     CONNECTING    0        -                   /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>> unix  2      [ ]         STREAM     CONNECTED     1109848  17379/qemu-system-x /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>>
>>
>> Here's /proc/net/unix for completeness:
>>
>> root@compute-0:~/host-guest-comm# grep -a messaging /proc/net/unix
>> ffff880045c35540: 00000002 00000000 00010000 0001 01 1109818 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>> ffff8800576b8a80: 00000002 00000000 00010000 0001 01 1110051 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000008.sock
>> ffff880045e2f040: 00000002 00000000 00000000 0001 02     0 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>> ffff88004bc5ea80: 00000002 00000000 00000000 0001 02     0 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>> ffff880045e2f540: 00000002 00000000 00000000 0001 03 1109848 /var/lib/libvirt/qemu/cgcs.messaging.instance-00000007.sock
>>
>>
>>
>> The crazy thing is that I can't figure out what could be causing the
>> CONNECTED/CONNECTING sockets.  There are no background processes of the
>> connecting app running, no zombie processes, no forked children, etc.
>>
>> Just to make things more interesting, I successfully ran this application
>> several times (connecting to both sockets) before this behaviour started
>> happening.  I was running it under strace and just killed it with ctrl-C.
>>
>> Anyone got any ideas?   Please CC me since I'm not subscribed to the list.
>
> The application might use a too small listen() backlog ?

Looking at the qemu code, I think it's calling listen(sock, 1), which
makes sense since I think it's only designed to allow a single connection
up into the guest at a time.

Not sure how that could be the problem, though, since there is only one
process that tries to connect() to the socket, and I only ever ran one
instance of it at a time.

I'll give the patch a try, but how would that explain the sockets that
are in a CONNECTING state when, as far as I can tell, they don't belong
to any process?

Am I correct to think that the CONNECTED socket may be due to the two 
CONNECTING ones somehow?

Chris

Eric Dumazet Aug. 3, 2014, 6:14 a.m. UTC | #2
On Sat, 2014-08-02 at 08:11 -0600, Chris Friesen wrote:
> On 08/02/2014 12:28 AM, Eric Dumazet wrote:
> > On Fri, 2014-08-01 at 21:51 -0600, Chris Friesen wrote:
> 
> >> I've got an app that tries to connect() to both of them in turn.  The connect()
> >> to the first socket fails with EAGAIN, the second one succeeds, and all
> >> subsequent retries on the first fail.  Here's an strace() of the sequence:
> >>
> >> socket(PF_FILE, SOCK_STREAM, 0)         = 6
> >> fcntl(6, F_GETFL)                       = 0x2 (flags O_RDWR)
> >> fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
> >
> > Non blocking socket : If listener queue is full, -EAGAIN is expected
> 
> 
> That doesn't make any sense though, there is only one process that ever 
> attempts to connect() to this socket, and I only ran it one instance at 
> a time.  That implies that the first time I got EAGAIN the queue would 
> have been empty when the connection request came in.

This looks like an application bug: it misses a POLLIN event, so it always
calls accept() too late.
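For contrast, here is a minimal sketch of a listener loop that does not fall behind. It assumes the listening socket is non-blocking. The loop waits for POLLIN, then calls accept() repeatedly until it returns EAGAIN, so nothing is left waiting in the queue between poll() wakeups. This is not qemu's code, and the helper names and the abstract-namespace address are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: Linux abstract-namespace address (no file on disk). */
static socklen_t abstract_addr(struct sockaddr_un *a, const char *name)
{
    size_t n = strlen(name);

    memset(a, 0, sizeof(*a));
    a->sun_family = AF_UNIX;
    if (n > sizeof(a->sun_path) - 1)
        n = sizeof(a->sun_path) - 1;
    memcpy(a->sun_path + 1, name, n);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + n);
}

/* Bind and listen on an abstract name, then make the listener non-blocking. */
static int make_nonblocking_listener(const char *name, int backlog)
{
    struct sockaddr_un a;
    socklen_t len = abstract_addr(&a, name);
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&a, len) < 0 || listen(fd, backlog) < 0 ||
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Accept everything currently queued on a non-blocking listener.
 * Stops when accept() fails with anything but EINTR (EAGAIN/EWOULDBLOCK
 * means the queue is drained) or once 'max' fds are stored.
 * Returns the number of fds stored in out[]. */
static int drain_accept_queue(int lfd, int *out, int max)
{
    int n = 0;

    while (n < max) {
        int cfd = accept(lfd, NULL, NULL);
        if (cfd >= 0)
            out[n++] = cfd;
        else if (errno != EINTR)
            break;
    }
    return n;
}

/* One event-loop step: wait up to timeout_ms for POLLIN on the listener,
 * then drain the whole queue instead of accepting a single connection.
 * Returns the number accepted, 0 on timeout, -1 if poll() failed. */
static int wait_and_accept(int lfd, int timeout_ms, int *out, int max)
{
    struct pollfd p = { .fd = lfd, .events = POLLIN };
    int r = poll(&p, 1, timeout_ms);

    if (r <= 0)
        return r;
    return (p.revents & POLLIN) ? drain_accept_queue(lfd, out, max) : 0;
}
```

Draining the queue on every wakeup, rather than accepting one connection per POLLIN, means a burst of connect() calls cannot accumulate behind a small backlog.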

> Looking at the qemu code I think it's calling listen(sock,1) which makes 
> sense since I think it's only designed to allow a single connection up 
> into the guest at a time.



> 
> Not sure how that could be the problem though, since there is only one 
> process that tries to connect() to the application, and I only ran it 
> one instance at a time.

Well, change the listen() backlog to 10; maybe that will hide the
application bug.

> 
> I'll give the patch a try, but how would that explain the sockets that 
> are in a CONNECTING state when as far as I can tell they don't belong to 
> any process?

The accept() call comes too late.

A connection shows up as CONNECTING for as long as accept() has not yet
been called for it.
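As far as I can tell from unix_seq_show() in kernels of that era, an entry whose St column is 02 (SS_CONNECTING) and whose Inode is 0 is the server-side half of a connection. It is queued on the listener but has no struct socket yet. It gets an inode, and therefore an owning process in netstat, only once accept() runs. The following is a minimal sketch that parses /proc/net/unix lines in the unpatched format and flags such entries. The struct, the function names and the "awaiting accept" heuristic are illustrative assumptions, not a kernel API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line: Num RefCount Protocol Flags Type St Inode [Path]. */
struct unix_entry {
    unsigned refcnt, protocol, flags, type, state;
    unsigned long inode;
    char path[128];
};

/* Parse one line; returns 1 on success, 0 for the header line or junk. */
static int parse_proc_net_unix(const char *line, struct unix_entry *e)
{
    int used = 0;

    memset(e, 0, sizeof(*e));
    if (sscanf(line, "%*[0-9a-fA-F]: %x %x %x %x %x %lu%n",
               &e->refcnt, &e->protocol, &e->flags, &e->type,
               &e->state, &e->inode, &used) != 6)
        return 0;

    const char *p = line + used;
    if (*p == ' ') {                 /* optional path after one space */
        p++;
        size_t n = strcspn(p, "\n");
        if (n >= sizeof(e->path))
            n = sizeof(e->path) - 1;
        memcpy(e->path, p, n);
        e->path[n] = '\0';
    }
    return 1;
}

/* St holds a socket_state value from include/linux/net.h. */
static const char *state_name(unsigned st)
{
    switch (st) {
    case 1: return "UNCONNECTED";
    case 2: return "CONNECTING";
    case 3: return "CONNECTED";
    case 4: return "DISCONNECTING";
    default: return "?";
    }
}

/* Heuristic: queued on a listener, not yet accept()ed (no inode yet). */
static int is_awaiting_accept(const struct unix_entry *e)
{
    return e->state == 2 && e->inode == 0;
}
```

Feeding each line of /proc/net/unix (read with fgets()) through parse_proc_net_unix() and counting is_awaiting_accept() entries per path shows how many connections a listener has not yet accepted.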



Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e96884380732..78b7a7cf3071 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2380,6 +2380,8 @@  static int unix_seq_show(struct seq_file *seq, void *v)
 			for ( ; i < len; i++)
 				seq_putc(seq, u->addr->name->sun_path[i]);
 		}
+		seq_printf(seq, " %u/%u", skb_queue_len(&s->sk_receive_queue),
+			   s->sk_max_ack_backlog);
 		unix_state_unlock(s);
 		seq_putc(seq, '\n');
 	}
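With the debugging patch applied, every /proc/net/unix line gains a trailing "<queued>/<max>" field: skb_queue_len() of the receive queue and sk_max_ack_backlog. For the LISTENING entries, that pair is what connect() checks. Below is a minimal sketch for reading it back, assuming the patched format. The function names are hypothetical. backlog_full() mirrors the "queued > max" test in unix_recvq_full() in af_unix.c of that era; later kernels may differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the "<queued>/<max>" field that the debugging patch appends as
 * the last space-separated token of a /proc/net/unix line.
 * Returns 1 on success, 0 if the last token is not of that form
 * (e.g. on an unpatched kernel, where it is the path or the inode). */
static int parse_backlog_field(const char *line, unsigned *queued, unsigned *max)
{
    size_t end = strcspn(line, "\n");
    size_t start = end;

    while (start > 0 && line[start - 1] != ' ')
        start--;                      /* back up to the last token */
    if (start == end)
        return 0;
    return sscanf(line + start, "%u/%u", queued, max) == 2;
}

/* Queue-full test as in unix_recvq_full() of that era: once queued
 * exceeds the backlog, a non-blocking connect() gets EAGAIN and a
 * blocking one waits for the listener to accept(). */
static int backlog_full(unsigned queued, unsigned max)
{
    return queued > max;
}
```

For example, take a listener created with listen(sock, 1) that has two unaccepted connections. Its line would end in " 2/1", and backlog_full(2, 1) returns 1, so a non-blocking connect() to it gets EAGAIN.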