Message ID | 1379924225-6699-1-git-send-email-pontus.fuchs@gmail.com |
---|---|
State | Changes Requested |
Headers | show |
On Mon, Sep 23, 2013 at 10:17:05AM +0200, Pontus Fuchs wrote: > If a burst of events are sent, the socket queue can overflow and > sendmsg fails with EAGAIN. When this happens the monitor is silently > detached. > diff --git a/wpa_supplicant/ctrl_iface_unix.c b/wpa_supplicant/ctrl_iface_unix.c > @@ -623,14 +623,21 @@ static void wpa_supplicant_ctrl_iface_send(const char *ifname, int sock, > - if (dst->errors > 1000 || > - (_errno != ENOBUFS && dst->errors > 10) || > - _errno == ENOENT) { > + if (dst->errors > 1000 || _errno == ENOENT) > + detach = 1; > + if (!(_errno == ENOBUFS || _errno == EAGAIN || > + _errno == EWOULDBLOCK) && dst->errors > 10) > + detach = 1; I'm seeing quite bad issues with the control interface sockets reaching the maximum send buffer limit and getting stuck in the previous design. This change could make this happen even more easily, so I don't think I can apply this now. I pushed a workaround patch to recover from this state. It is not really nice design, but well, no easy solution for the real issue has shown up yet and it may be necessary to do a much larger design change to address this cleanly using connection oriented sockets rather than a single unconnected socket. I would also welcome a patch to figure out more reliably if a client of the ctrl_iface socket is not available to reduce the likelihood of messages being left in the send buffer (or a timeout to get rid of those pending messages somehow if that is possible with a single socket). I did not see any obvious way of doing this without using multiple sockets and potentially changes to the control interface that would not necessarily be backwards compatible (i.e., without requiring all clients to be rebuilt in addition to modifying wpa_supplicant).
On Wed, 2013-09-25 at 16:32 +0300, Jouni Malinen wrote: > On Mon, Sep 23, 2013 at 10:17:05AM +0200, Pontus Fuchs wrote: > > If a burst of events are sent, the socket queue can overflow and > > sendmsg fails with EAGAIN. When this happens the monitor is silently > > detached. > > > diff --git a/wpa_supplicant/ctrl_iface_unix.c b/wpa_supplicant/ctrl_iface_unix.c > > @@ -623,14 +623,21 @@ static void wpa_supplicant_ctrl_iface_send(const char *ifname, int sock, > > - if (dst->errors > 1000 || > > - (_errno != ENOBUFS && dst->errors > 10) || > > - _errno == ENOENT) { > > + if (dst->errors > 1000 || _errno == ENOENT) > > + detach = 1; > > + if (!(_errno == ENOBUFS || _errno == EAGAIN || > > + _errno == EWOULDBLOCK) && dst->errors > 10) > > + detach = 1; > > I'm seeing quite bad issues with the control interface sockets reaching > the maximum send buffer limit and getting stuck in the previous design. > This change could make this happen even more easily, so I don't think I > can apply this now. I pushed a workaround patch to recover from this > state. It is not really nice design, but well, no easy solution for the > real issue has shown up yet and it may be necessary to do a much larger > design change to address this cleanly using connection oriented sockets > rather than a single unconnected socket. D-Bus! :) </rim shot> You don't even need a dbus-daemon process running to use the D-Bus protocol as long as you enforce authorization by checking the socket peer credentials. All you need is libdbus. If there's interest, that could certainly be implemented in the supplicant; typically it's just a private socket somewhere in /var. Dan
On 09/25/2013 06:32 AM, Jouni Malinen wrote: > On Mon, Sep 23, 2013 at 10:17:05AM +0200, Pontus Fuchs wrote: >> If a burst of events are sent, the socket queue can overflow and >> sendmsg fails with EAGAIN. When this happens the monitor is silently >> detached. > >> diff --git a/wpa_supplicant/ctrl_iface_unix.c b/wpa_supplicant/ctrl_iface_unix.c >> @@ -623,14 +623,21 @@ static void wpa_supplicant_ctrl_iface_send(const char *ifname, int sock, >> - if (dst->errors > 1000 || >> - (_errno != ENOBUFS && dst->errors > 10) || >> - _errno == ENOENT) { >> + if (dst->errors > 1000 || _errno == ENOENT) >> + detach = 1; >> + if (!(_errno == ENOBUFS || _errno == EAGAIN || >> + _errno == EWOULDBLOCK) && dst->errors > 10) >> + detach = 1; > > I'm seeing quite bad issues with the control interface sockets reaching > the maximum send buffer limit and getting stuck in the previous design. > This change could make this happen even more easily, so I don't think I > can apply this now. I pushed a workaround patch to recover from this > state. It is not really nice design, but well, no easy solution for the > real issue has shown up yet and it may be necessary to do a much larger > design change to address this cleanly using connection oriented sockets > rather than a single unconnected socket. Have you tried just increasing the socket buffer sizes? That would probably be easier and more efficient than keeping a pending queue in supplicant. Thanks, Ben
On Wed, Sep 25, 2013 at 10:58:08AM -0700, Ben Greear wrote: > Have you tried just increasing the socket buffer sizes? That would probably > be easier and more efficient than keeping a pending queue in supplicant. I haven't, but I don't think it would help either. The messages sent by wpa_supplicant for clients that never receive them seem to be stuck indefinitely and making the limit larger will just push out the inevitable issue a bit further. If there were a way to make kernel drop these eventually on some timeout that would obviously be helpful, but I'm not sure what that would be for unconnected AF_UNIX datagram socket.
On Wed, Sep 25, 2013 at 09:30:30AM -0500, Dan Williams wrote: > D-Bus! :) </rim shot> > > You don't even need a dbus-daemon process running to use the D-Bus > protocol as long as you enforce authorization by checking the socket > peer credentials. All you need is libdbus. If there's interest, that > could certainly be implemented in the supplicant; typically it's just a > private socket somewhere in /var. I'm not sure this would be appropriate replacement for the control interface.. I want to be able to live without dependency on D-Bus here and we already have a separate D-Bus interface. I was thinking more of using SOCK_STREAM with AF_UNIX socket or SOCK_SEQPACKET for that matter. Anyway, since these would result in having to modify all client implementations, these are not really easy steps to take and if taken, this could as well move to any other mechanism as long as it does not come with unacceptable dependencies on external components.
On 09/25/2013 01:21 PM, Jouni Malinen wrote: > On Wed, Sep 25, 2013 at 10:58:08AM -0700, Ben Greear wrote: >> Have you tried just increasing the socket buffer sizes? That would probably >> be easier and more efficient than keeping a pending queue in supplicant. > > I haven't, but I don't think it would help either. The messages sent by > wpa_supplicant for clients that never receive them seem to be stuck > indefinitely and making the limit larger will just push out the > inevitable issue a bit further. If there were a way to make kernel drop > these eventually on some timeout that would obviously be helpful, but > I'm not sure what that would be for unconnected AF_UNIX datagram socket. I have never used PF_UNIX much...but this piece of code in the kernel looks interesting: /* When dgram socket disconnects (or changes its peer), we clear its receive * queue of packets arrived from previous peer. First, it allows to do * flow control based only on wmem_alloc; second, sk connected to peer * may receive messages only from that peer. */ static void unix_dgram_disconnected(struct sock *sk, struct sock *other) { if (!skb_queue_empty(&sk->sk_receive_queue)) { skb_queue_purge(&sk->sk_receive_queue); wake_up_interruptible_all(&unix_sk(sk)->peer_wait); /* If one link of bidirectional dgram pipe is disconnected, * we signal error. Messages are lost. Do not make this, * when peer was not connected to us. */ if (!sock_flag(other, SOCK_DEAD) && unix_peer(other) == sk) { other->sk_err = ECONNRESET; other->sk_error_report(other); } } } It seems to me that it should go ahead and signal the other even if there is nothing in the local rcv queue? Sort of seems like someone would have noticed this if it were a real problem though... Ben
diff --git a/wpa_supplicant/ctrl_iface_unix.c b/wpa_supplicant/ctrl_iface_unix.c index 49489d6..2c67b9c 100644 --- a/wpa_supplicant/ctrl_iface_unix.c +++ b/wpa_supplicant/ctrl_iface_unix.c @@ -623,14 +623,21 @@ static void wpa_supplicant_ctrl_iface_send(const char *ifname, int sock, msg.msg_name = (void *) &dst->addr; msg.msg_namelen = dst->addrlen; if (sendmsg(sock, &msg, MSG_DONTWAIT) < 0) { - int _errno = errno; + int _errno = errno, detach = 0; wpa_printf(MSG_INFO, "CTRL_IFACE monitor[%d]: " "%d - %s", idx, errno, strerror(errno)); dst->errors++; - if (dst->errors > 1000 || - (_errno != ENOBUFS && dst->errors > 10) || - _errno == ENOENT) { + if (dst->errors > 1000 || _errno == ENOENT) + detach = 1; + if (!(_errno == ENOBUFS || _errno == EAGAIN || + _errno == EWOULDBLOCK) && dst->errors > 10) + detach = 1; + if (detach) { + wpa_printf(MSG_ERROR, "CTRL_IFACE " + "monitor[%d]: Too many " + "errors. Detaching. ", + idx); wpa_supplicant_ctrl_iface_detach( ctrl_dst, &dst->addr, dst->addrlen);