diff mbox

[V2] Handle EAGAIN in wpa_supplicant_ctrl_iface_send

Message ID 1379924225-6699-1-git-send-email-pontus.fuchs@gmail.com
State Changes Requested
Headers show

Commit Message

Pontus Fuchs Sept. 23, 2013, 8:17 a.m. UTC
If a burst of events are sent, the socket queue can overflow and
sendmsg fails with EAGAIN. When this happens the monitor is silently
detached.

Signed-hostap: Pontus Fuchs <pontus.fuchs@gmail.com>
---

V2: Update commit msg.

 wpa_supplicant/ctrl_iface_unix.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

Comments

Jouni Malinen Sept. 25, 2013, 1:32 p.m. UTC | #1
On Mon, Sep 23, 2013 at 10:17:05AM +0200, Pontus Fuchs wrote:
> If a burst of events are sent, the socket queue can overflow and
> sendmsg fails with EAGAIN. When this happens the monitor is silently
> detached.

> diff --git a/wpa_supplicant/ctrl_iface_unix.c b/wpa_supplicant/ctrl_iface_unix.c
> @@ -623,14 +623,21 @@ static void wpa_supplicant_ctrl_iface_send(const char *ifname, int sock,
> -				if (dst->errors > 1000 ||
> -				    (_errno != ENOBUFS && dst->errors > 10) ||
> -				    _errno == ENOENT) {
> +				if (dst->errors > 1000 || _errno == ENOENT)
> +					detach = 1;
> +				if (!(_errno == ENOBUFS || _errno == EAGAIN ||
> +				    _errno == EWOULDBLOCK) && dst->errors > 10)
> +					detach = 1;

I'm seeing quite bad issues with the control interface sockets reaching
the maximum send buffer limit and getting stuck in the previous design.
This change could make this happen even more easily, so I don't think I
can apply this now. I pushed a workaround patch to recover from this
state. It is not really nice design, but well, no easy solution for the
real issue has shown up yet and it may be necessary to do a much larger
design change to address this cleanly using connection oriented sockets
rather than a single unconnected socket.

I would also welcome a patch to figure out more reliably if a client of
the ctrl_iface socket is not available to reduce the likelihood of
messages being left in the send buffer (or a timeout to get rid of those
pending messages somehow if that is possible with a single socket). I
did not see any obvious way of doing this without using multiple sockets
and potentially changes to the control interface that would not
necessarily be backwards compatible (i.e., without requiring all clients
to be rebuilt in addition to modifying wpa_supplicant).
Dan Williams Sept. 25, 2013, 2:30 p.m. UTC | #2
On Wed, 2013-09-25 at 16:32 +0300, Jouni Malinen wrote:
> On Mon, Sep 23, 2013 at 10:17:05AM +0200, Pontus Fuchs wrote:
> > If a burst of events are sent, the socket queue can overflow and
> > sendmsg fails with EAGAIN. When this happens the monitor is silently
> > detached.
> 
> > diff --git a/wpa_supplicant/ctrl_iface_unix.c b/wpa_supplicant/ctrl_iface_unix.c
> > @@ -623,14 +623,21 @@ static void wpa_supplicant_ctrl_iface_send(const char *ifname, int sock,
> > -				if (dst->errors > 1000 ||
> > -				    (_errno != ENOBUFS && dst->errors > 10) ||
> > -				    _errno == ENOENT) {
> > +				if (dst->errors > 1000 || _errno == ENOENT)
> > +					detach = 1;
> > +				if (!(_errno == ENOBUFS || _errno == EAGAIN ||
> > +				    _errno == EWOULDBLOCK) && dst->errors > 10)
> > +					detach = 1;
> 
> I'm seeing quite bad issues with the control interface sockets reaching
> the maximum send buffer limit and getting stuck in the previous design.
> This change could make this happen even more easily, so I don't think I
> can apply this now. I pushed a workaround patch to recover from this
> state. It is not really nice design, but well, no easy solution for the
> real issue has shown up yet and it may be necessary to do a much larger
> design change to address this cleanly using connection oriented sockets
> rather than a single unconnected socket.

D-Bus! :) </rim shot>

You don't even need a dbus-daemon process running to use the D-Bus
protocol as long as you enforce authorization by checking the socket
peer credentials.  All you need is libdbus.  If there's interest, that
could certainly be implemented in the supplicant; typically it's just a
private socket somewhere in /var.

Dan
Ben Greear Sept. 25, 2013, 5:58 p.m. UTC | #3
On 09/25/2013 06:32 AM, Jouni Malinen wrote:
> On Mon, Sep 23, 2013 at 10:17:05AM +0200, Pontus Fuchs wrote:
>> If a burst of events are sent, the socket queue can overflow and
>> sendmsg fails with EAGAIN. When this happens the monitor is silently
>> detached.
>
>> diff --git a/wpa_supplicant/ctrl_iface_unix.c b/wpa_supplicant/ctrl_iface_unix.c
>> @@ -623,14 +623,21 @@ static void wpa_supplicant_ctrl_iface_send(const char *ifname, int sock,
>> -				if (dst->errors > 1000 ||
>> -				    (_errno != ENOBUFS && dst->errors > 10) ||
>> -				    _errno == ENOENT) {
>> +				if (dst->errors > 1000 || _errno == ENOENT)
>> +					detach = 1;
>> +				if (!(_errno == ENOBUFS || _errno == EAGAIN ||
>> +				    _errno == EWOULDBLOCK) && dst->errors > 10)
>> +					detach = 1;
>
> I'm seeing quite bad issues with the control interface sockets reaching
> the maximum send buffer limit and getting stuck in the previous design.
> This change could make this happen even more easily, so I don't think I
> can apply this now. I pushed a workaround patch to recover from this
> state. It is not really nice design, but well, no easy solution for the
> real issue has shown up yet and it may be necessary to do a much larger
> design change to address this cleanly using connection oriented sockets
> rather than a single unconnected socket.

Have you tried just increasing the socket buffer sizes?  That would probably
be easier and more efficient than keeping a pending queue in supplicant.

Thanks,
Ben
Jouni Malinen Sept. 25, 2013, 8:21 p.m. UTC | #4
On Wed, Sep 25, 2013 at 10:58:08AM -0700, Ben Greear wrote:
> Have you tried just increasing the socket buffer sizes?  That would probably
> be easier and more efficient than keeping a pending queue in supplicant.

I haven't, but I don't think it would help either. The messages sent by
wpa_supplicant for clients that never receive them seem to be stuck
indefinitely and making the limit larger will just push out the
inevitable issue a bit further. If there were a way to make kernel drop
these eventually on some timeout that would obviously be helpful, but
I'm not sure what that would be for unconnected AF_UNIX datagram socket.
Jouni Malinen Sept. 25, 2013, 8:25 p.m. UTC | #5
On Wed, Sep 25, 2013 at 09:30:30AM -0500, Dan Williams wrote:
> D-Bus! :) </rim shot>
> 
> You don't even need a dbus-daemon process running to use the D-Bus
> protocol as long as you enforce authorization by checking the socket
> peer credentials.  All you need is libdbus.  If there's interest, that
> could certainly be implemented in the supplicant; typically it's just a
> private socket somewhere in /var.

I'm not sure this would be appropriate replacement for the control
interface.. I want to be able to live without dependency on D-Bus here
and we already have a separate D-Bus interface. I was thinking more of
using SOCK_STREAM with AF_UNIX socket or SOCK_SEQPACKET for that matter.
Anyway, since these would result in having to modify all client
implementations, these are not really easy steps to take and if taken,
this could as well move to any other mechanism as long as it does not
come with unacceptable dependencies on external components.
Ben Greear Sept. 25, 2013, 9:28 p.m. UTC | #6
On 09/25/2013 01:21 PM, Jouni Malinen wrote:
> On Wed, Sep 25, 2013 at 10:58:08AM -0700, Ben Greear wrote:
>> Have you tried just increasing the socket buffer sizes?  That would probably
>> be easier and more efficient than keeping a pending queue in supplicant.
>
> I haven't, but I don't think it would help either. The messages sent by
> wpa_supplicant for clients that never receive them seem to be stuck
> indefinitely and making the limit larger will just push out the
> inevitable issue a bit further. If there were a way to make kernel drop
> these eventually on some timeout that would obviously be helpful, but
> I'm not sure what that would be for unconnected AF_UNIX datagram socket.

I have never used PF_UNIX much...but this piece of code in the kernel
looks interesting:

/* When dgram socket disconnects (or changes its peer), we clear its receive
  * queue of packets arrived from previous peer. First, it allows to do
  * flow control based only on wmem_alloc; second, sk connected to peer
  * may receive messages only from that peer. */
static void unix_dgram_disconnected(struct sock *sk, struct sock *other)
{
	if (!skb_queue_empty(&sk->sk_receive_queue)) {
		skb_queue_purge(&sk->sk_receive_queue);
		wake_up_interruptible_all(&unix_sk(sk)->peer_wait);

		/* If one link of bidirectional dgram pipe is disconnected,
		 * we signal error. Messages are lost. Do not make this,
		 * when peer was not connected to us.
		 */
		if (!sock_flag(other, SOCK_DEAD) && unix_peer(other) == sk) {
			other->sk_err = ECONNRESET;
			other->sk_error_report(other);
		}
	}
}


It seems to me that it should go ahead and signal the other even if there
is nothing in the local rcv queue?  Sort of seems like someone would
have noticed this if it were a real problem though...

Ben
diff mbox

Patch

diff --git a/wpa_supplicant/ctrl_iface_unix.c b/wpa_supplicant/ctrl_iface_unix.c
index 49489d6..2c67b9c 100644
--- a/wpa_supplicant/ctrl_iface_unix.c
+++ b/wpa_supplicant/ctrl_iface_unix.c
@@ -623,14 +623,21 @@  static void wpa_supplicant_ctrl_iface_send(const char *ifname, int sock,
 			msg.msg_name = (void *) &dst->addr;
 			msg.msg_namelen = dst->addrlen;
 			if (sendmsg(sock, &msg, MSG_DONTWAIT) < 0) {
-				int _errno = errno;
+				int _errno = errno, detach = 0;
 				wpa_printf(MSG_INFO, "CTRL_IFACE monitor[%d]: "
 					   "%d - %s",
 					   idx, errno, strerror(errno));
 				dst->errors++;
-				if (dst->errors > 1000 ||
-				    (_errno != ENOBUFS && dst->errors > 10) ||
-				    _errno == ENOENT) {
+				if (dst->errors > 1000 || _errno == ENOENT)
+					detach = 1;
+				if (!(_errno == ENOBUFS || _errno == EAGAIN ||
+				    _errno == EWOULDBLOCK) && dst->errors > 10)
+					detach = 1;
+				if (detach) {
+					wpa_printf(MSG_ERROR, "CTRL_IFACE "
+						   "monitor[%d]: Too many "
+						   "errors. Detaching. ",
+						   idx);
 					wpa_supplicant_ctrl_iface_detach(
 						ctrl_dst, &dst->addr,
 						dst->addrlen);