
unix: avoid use-after-free in ep_remove_wait_queue

Message ID 877flp34fl.fsf@doppelsaurus.mobileactivedefense.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Rainer Weikusat Nov. 10, 2015, 9:55 p.m. UTC
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep even though none of the messages presently enqueued on the
server receive queue were sent by them. In order to ensure that these
will be woken up once space becomes available again, the present
unix_dgram_poll routine does a second sock_poll_wait call with the
peer_wait wait queue of the server socket as queue argument
(unix_dgram_recvmsg does a wake up on this queue after a datagram was
received). This is inherently problematic because the server socket is
only guaranteed to remain alive for as long as the client still holds a
reference to it. In case the connection is dissolved via connect or by
the dead peer detection logic in unix_dgram_sendmsg, the server socket
may be freed even though "the polling mechanism" (in particular, epoll)
still holds a pointer to the corresponding peer_wait queue. There's no
way to forcibly deregister a wait queue with epoll.
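
To illustrate the failure mode, here is a minimal userspace sketch (the
socket path, the missing error handling and the exact trigger sequence
are illustrative assumptions, not part of this patch); on an affected
kernel, the final epoll_ctl walks the already-freed peer_wait queue,
which slab poisoning or KASAN will flag:

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/epoll.h>

int main(void)
{
	struct sockaddr_un sun = { .sun_family = AF_UNIX };
	struct epoll_event ev = { .events = EPOLLOUT };
	int srv, cli, ep;

	strcpy(sun.sun_path, "/tmp/peer_wait_uaf");	/* assumed scratch path */
	unlink(sun.sun_path);

	srv = socket(AF_UNIX, SOCK_DGRAM, 0);
	bind(srv, (struct sockaddr *)&sun, sizeof(sun));

	cli = socket(AF_UNIX, SOCK_DGRAM, 0);
	connect(cli, (struct sockaddr *)&sun, sizeof(sun));

	/* unix_dgram_poll registers the epoll entry on the server's
	 * peer_wait queue via the second sock_poll_wait call
	 */
	ep = epoll_create1(0);
	epoll_ctl(ep, EPOLL_CTL_ADD, cli, &ev);

	/* closing the server marks it dead; the next send runs into the
	 * dead peer detection in unix_dgram_sendmsg, which drops the
	 * client's reference, allowing the server socket to be freed ...
	 */
	close(srv);
	send(cli, "x", 1, 0);

	/* ... while epoll still points at its peer_wait queue; removal
	 * then walks freed memory (ep_remove_wait_queue)
	 */
	epoll_ctl(ep, EPOLL_CTL_DEL, cli, &ev);
	return 0;
}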

Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer's receive-queue-full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via the wake function. The connection to the peer wait queue is
dissolved again when a wake up is about to be relayed, when the client
socket reconnects, when a dead peer is detected, or when the client
socket is itself closed. This enables removing the second sock_poll_wait
from unix_dgram_poll, thus avoiding the use-after-free, while still
ensuring that no blocked writer sleeps forever.

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
---

	- use wait_queue_t passed as argument to _relay

	- fix possible deadlock and logic error in _dgram_sendmsg by
	  straightening the control flow ("spaghetti code considered
	  confusing")


Comments

Hannes Frederic Sowa Nov. 11, 2015, 12:28 p.m. UTC | #1
Hello,

On Tue, Nov 10, 2015, at 22:55, Rainer Weikusat wrote:
> An AF_UNIX datagram socket being the client in an n:1 association with
> some server socket is only allowed to send messages to the server if the
> receive queue of this socket contains at most sk_max_ack_backlog
> datagrams.

[...]

This whole patch seems pretty complicated to me.

Can't we just remove the unix_recvq_full checks altogether and unify
unix_dgram_poll with unix_poll?

If we want to be cautious we could simply make unix_max_dgram_qlen limit
the number of skbs which are in flight from a sending socket. The skb
destructor can then decrement this. This seems much simpler.

Would this work?
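
Rough sketch of what I mean (untested; the member and destructor names
are invented, and "unix_max_dgram_qlen" stands in for the per-namespace
net->unx.sysctl_max_dgram_qlen):

/* hypothetical new member in struct unix_sock: atomic_t inflight; */

static void unix_dgram_skb_destructor(struct sk_buff *skb)
{
	struct unix_sock *u = unix_sk(skb->sk);

	/* one datagram consumed or dropped; a send slot opens up for
	 * this sender, so wake its writers
	 */
	if (atomic_dec_return(&u->inflight) < unix_max_dgram_qlen)
		wake_up_interruptible_poll(sk_sleep(&u->sk),
					   POLLOUT | POLLWRNORM |
					   POLLWRBAND);
	sock_wfree(skb);	/* keep the usual wmem accounting */
}

/* in unix_dgram_sendmsg, before queueing the skb: */
	if (atomic_inc_return(&unix_sk(sk)->inflight) > unix_max_dgram_qlen) {
		atomic_dec(&unix_sk(sk)->inflight);
		err = -EAGAIN;
		goto out_unlock;
	}
	skb->destructor = unix_dgram_skb_destructor;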

Thanks,
Hannes
Rainer Weikusat Nov. 11, 2015, 4:12 p.m. UTC | #2
Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
> On Tue, Nov 10, 2015, at 22:55, Rainer Weikusat wrote:
>> An AF_UNIX datagram socket being the client in an n:1 association with
>> some server socket is only allowed to send messages to the server if the
>> receive queue of this socket contains at most sk_max_ack_backlog
>> datagrams.

[...]

> This whole patch seems pretty complicated to me.
>
> Can't we just remove the unix_recvq_full checks altogether and unify
> unix_dgram_poll with unix_poll?
>
> If we want to be cautious we could simply make unix_max_dgram_qlen limit
> the number of skbs which are in flight from a sending socket. The skb
> destructor can then decrement this. This seems much simpler.
>
> Would this work?

In the way this is intended to work, cf.

http://marc.info/?t=115627606000002&r=1&w=2

only if the limit would also apply to sockets which didn't send anything
so far. Which means it'll end up in the exact same situation as before:
sending something using a certain socket may not be possible because of
data sent by other sockets, so either code trying to send using this
socket ends up busy-waiting for "space available again" despite trying
to use select/poll/epoll/$whatnot to get notified of this condition and
sleep until then, or this notification needs to be propagated to
sleeping threads which didn't get to send anything yet.
Jason Baron Nov. 11, 2015, 5:35 p.m. UTC | #3
Hi Rainer,

> +
> +/* Needs sk unix state lock. After recv_ready indicated not ready,
> + * establish peer_wait connection if still needed.
> + */
> +static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
> +{
> +	int connected;
> +
> +	connected = unix_dgram_peer_wake_connect(sk, other);
> +
> +	if (unix_recvq_full(other))
> +		return 1;
> +
> +	if (connected)
> +		unix_dgram_peer_wake_disconnect(sk, other);
> +
> +	return 0;
> +}
> +

So the comment above this function says 'needs unix state lock';
however, the usage in unix_dgram_sendmsg() takes the 'other' lock, while
the usage in unix_dgram_poll() takes the 'sk' lock. So this looks racy.

Also, another tweak on this scheme: instead of calling
'__remove_wait_queue()' in unix_dgram_peer_wake_relay(), we could simply
mark each item in the queue as 'WQ_FLAG_EXCLUSIVE'. Then, since
'unix_dgram_recvmsg()' does an exclusive wakeup, the queue has
effectively been disabled (minus the first exclusive item in the list,
which can just return if it's marked exclusive). This means that in
dgram_poll(), we add to the list if we have not yet been added, and if
we are already on the list, we do a remove and then an add, clearing the
exclusive flag. Thus, all the waiters that need a wakeup are at the
beginning of the queue, and the disabled ones are at the end with the
'WQ_FLAG_EXCLUSIVE' flag set.

This does make the list potentially long, but if we only walk it to the
point where we are doing the wakeup, it has no impact. I like the fact
that in this scheme the wakeup doesn't have to call remove against a
long list of waiters - it's just setting the exclusive flag.
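
Roughly, as a sketch (untested; the nr_exclusive bookkeeping and the
queue ordering would need care, and the names follow the patch below):

static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode,
				      int flags, void *key)
{
	struct unix_sock *u = container_of(q, struct unix_sock,
					   peer_wake);
	wait_queue_head_t *u_sleep;

	/* already relayed once; consume the exclusive wake up */
	if (q->flags & WQ_FLAG_EXCLUSIVE)
		return 1;

	/* disable the entry instead of unlinking it */
	q->flags |= WQ_FLAG_EXCLUSIVE;

	u_sleep = sk_sleep(&u->sk);
	if (u_sleep)
		wake_up_interruptible_poll(u_sleep, key);

	return 0;
}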

Thanks,

-Jason
Hannes Frederic Sowa Nov. 11, 2015, 6:52 p.m. UTC | #4
Hi,

On Wed, Nov 11, 2015, at 17:12, Rainer Weikusat wrote:
> Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
> > On Tue, Nov 10, 2015, at 22:55, Rainer Weikusat wrote:
> >> An AF_UNIX datagram socket being the client in an n:1 association with
> >> some server socket is only allowed to send messages to the server if the
> >> receive queue of this socket contains at most sk_max_ack_backlog
> >> datagrams.
> 
> [...]
> 
> > This whole patch seems pretty complicated to me.
> >
> > Can't we just remove the unix_recvq_full checks altogether and unify
> > unix_dgram_poll with unix_poll?
> >
> > If we want to be cautious we could simply make unix_max_dgram_qlen limit
> > the number of skbs which are in flight from a sending socket. The skb
> > destructor can then decrement this. This seems much simpler.
> >
> > Would this work?
> 
> In the way this is intended to work, cf
> 
> http://marc.info/?t=115627606000002&r=1&w=2

Oh, I see, we don't limit closed but still referenced sockets. This
actually makes sense given how fd handling is implemented, just as a
range check.

Have you checked whether we can somehow deregister the socket in the
poll event framework? You wrote that it does not provide such a
function, but maybe it would be easy to add?

Thanks,
Hannes
Rainer Weikusat Nov. 12, 2015, 7:11 p.m. UTC | #5
Jason Baron <jbaron@akamai.com> writes:
>> +
>> +/* Needs sk unix state lock. After recv_ready indicated not ready,
>> + * establish peer_wait connection if still needed.
>> + */
>> +static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
>> +{
>> +	int connected;
>> +
>> +	connected = unix_dgram_peer_wake_connect(sk, other);
>> +
>> +	if (unix_recvq_full(other))
>> +		return 1;
>> +
>> +	if (connected)
>> +		unix_dgram_peer_wake_disconnect(sk, other);
>> +
>> +	return 0;
>> +}
>> +
>
> So the comment above this function says 'needs unix state lock', however
> the usage in unix_dgram_sendmsg() has the 'other' lock, while the usage
> in unix_dgram_poll() has the 'sk' lock. So this looks racy.

That's one thing which is broken with this patch. Judging from a 'quick'
look at the _dgram_sendmsg code, the unix_state_lock(other) will need to
be turned into a unix_state_double_lock(sk, other) and the remaining
code changed accordingly (since all of the checks must be done without
unlocking other). 
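
For reference, the existing unix_state_double_lock() helper (shape as in
contemporary trees; quoted from memory, so treat it as a sketch) avoids
an ABBA deadlock by taking the two locks in address order:

static void unix_state_double_lock(struct sock *sk1, struct sock *sk2)
{
	if (unlikely(sk1 == sk2) || !sk2) {
		unix_state_lock(sk1);
		return;
	}

	/* always take the lower-addressed lock first */
	if (sk1 < sk2) {
		unix_state_lock(sk1);
		unix_state_lock_nested(sk2);
	} else {
		unix_state_lock(sk2);
		unix_state_lock_nested(sk1);
	}
}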

There's also something else seriously wrong with the present patch: Some
code in unix_dgram_connect presently (with this change) looks like this:

        /*
         * If it was connected, reconnect.
         */
        if (unix_peer(sk)) {
                struct sock *old_peer = unix_peer(sk);
                unix_peer(sk) = other;

                if (unix_dgram_peer_wake_disconnect(sk, other))
                        wake_up_interruptible_poll(sk_sleep(sk),
                                                   POLLOUT |
                                                   POLLWRNORM |
                                                   POLLWRBAND);

                unix_state_double_unlock(sk, other);

                if (other != old_peer)
                        unix_dgram_disconnected(sk, old_peer);
                sock_put(old_peer);

and trying to disconnect from a peer the socket is just being
connected to is - of course - "flowering tomfoolery" (literal
translation of the German "bluehender Bloedsinn") --- it needs to
disconnect from old_peer instead.

I'll address the suggestion and send an updated patch "later today" (may
become "early tomorrow"). I have some code addressing both issues, but
that's part of a release of 'our' kernel fork (3.2.54-based) I'll need
to do 'soon'.
Rainer Weikusat Nov. 13, 2015, 7:06 p.m. UTC | #6
Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
> On Wed, Nov 11, 2015, at 17:12, Rainer Weikusat wrote:
>> Hannes Frederic Sowa <hannes@stressinduktion.org> writes:
>> > On Tue, Nov 10, 2015, at 22:55, Rainer Weikusat wrote:
>> >> An AF_UNIX datagram socket being the client in an n:1 association with
>> >> some server socket is only allowed to send messages to the server if the
>> >> receive queue of this socket contains at most sk_max_ack_backlog
>> >> datagrams.
>> 
>> [...]
>> 
>> > This whole patch seems pretty complicated to me.
>> >
>> > Can't we just remove the unix_recvq_full checks altogether and unify
>> > unix_dgram_poll with unix_poll?
>> >
>> > If we want to be cautious we could simply make unix_max_dgram_qlen limit
>> > the number of skbs which are in flight from a sending socket. The skb
>> > destructor can then decrement this. This seems much simpler.
>> >
>> > Would this work?
>> 
>> In the way this is intended to work, cf
>> 
>> http://marc.info/?t=115627606000002&r=1&w=2
>
> Oh, I see, we don't limit closed but still referenced sockets. This
> actually makes sense on how fd handling is implemented, just as a range
> check.
>
> Have you checked if we can somehow deregister the socket in the poll
> event framework? You wrote that it does not provide such a function but
> maybe it would be easy to add?

I thought about this, but it would amount to adding a general interface
for the sole purpose of enabling the af_unix code to talk to the
eventpoll code, and I don't really like this idea: IMHO, there should be
at least two users (preferably three) before creating any kind of
'abstract interface'. An even more ideal, "castle in the air"
(hypothetical) solution would be to change the eventpoll.c code such
that it won't be affected if a wait queue just goes away. That's at
least theoretically possible (although it might not be in practice).

I wouldn't mind doing that (assuming it was possible) if it was just
for the kernels my employer uses, because I'm aware of the uses these
will be put to and am in control of the corresponding userland code. But
for "general Linux code", changing epoll in order to help the af_unix
code is more potential trouble than it's worth: exchanging a relatively
unimportant bug in some module for a much more visibly damaging bug in a
central facility would be a bad tradeoff.

Patch

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index b36d837..2a91a05 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -62,6 +62,7 @@ struct unix_sock {
 #define UNIX_GC_CANDIDATE	0
 #define UNIX_GC_MAYBE_CYCLE	1
 	struct socket_wq	peer_wq;
+	wait_queue_t		peer_wake;
 };
 
 static inline struct unix_sock *unix_sk(const struct sock *sk)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 94f6582..4297d8e 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -326,6 +326,122 @@ found:
 	return s;
 }
 
+/* Support code for asymmetrically connected dgram sockets
+ *
+ * If a datagram socket is connected to a socket not itself connected
+ * to the first socket (eg, /dev/log), clients may only enqueue more
+ * messages if the present receive queue of the server socket is not
+ * "too large". This means there's a second writeability condition
+ * poll and sendmsg need to test. The dgram recv code will do a wake
+ * up on the peer_wait wait queue of a socket upon reception of a
+ * datagram which needs to be propagated to sleeping would-be writers
+ * since these might not have sent anything so far. This can't be
+ * accomplished via poll_wait because the lifetime of the server
+ * socket might be less than that of its clients if these break their
+ * association with it or if the server socket is closed while clients
+ * are still connected to it and there's no way to inform "a polling
+ * implementation" that it should let go of a certain wait queue
+ *
+ * In order to propagate a wake up, a wait_queue_t of the client
+ * socket is enqueued on the peer_wait queue of the server socket
+ * whose wake function does a wake_up on the ordinary client socket
+ * wait queue. This connection is established whenever a write (or
+ * poll for write) hit the flow control condition and broken when the
+ * association to the server socket is dissolved or after a wake up
+ * was relayed.
+ */
+
+static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
+				      void *key)
+{
+	struct unix_sock *u;
+	wait_queue_head_t *u_sleep;
+
+	u = container_of(q, struct unix_sock, peer_wake);
+
+	__remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
+			    q);
+	u->peer_wake.private = NULL;
+
+	/* relaying can only happen while the wq still exists */
+	u_sleep = sk_sleep(&u->sk);
+	if (u_sleep)
+		wake_up_interruptible_poll(u_sleep, key);
+
+	return 0;
+}
+
+static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)
+{
+	struct unix_sock *u, *u_other;
+	int rc;
+
+	u = unix_sk(sk);
+	u_other = unix_sk(other);
+	rc = 0;
+
+	spin_lock(&u_other->peer_wait.lock);
+
+	if (!u->peer_wake.private) {
+		u->peer_wake.private = other;
+		__add_wait_queue(&u_other->peer_wait, &u->peer_wake);
+
+		rc = 1;
+	}
+
+	spin_unlock(&u_other->peer_wait.lock);
+	return rc;
+}
+
+static int unix_dgram_peer_wake_disconnect(struct sock *sk, struct sock *other)
+{
+	struct unix_sock *u, *u_other;
+	int rc;
+
+	u = unix_sk(sk);
+	u_other = unix_sk(other);
+	rc = 0;
+
+	spin_lock(&u_other->peer_wait.lock);
+
+	if (u->peer_wake.private == other) {
+		__remove_wait_queue(&u_other->peer_wait, &u->peer_wake);
+		u->peer_wake.private = NULL;
+
+		rc = 1;
+	}
+
+	spin_unlock(&u_other->peer_wait.lock);
+	return rc;
+}
+
+/* Needs sk unix state lock. Otherwise lockless check if data can (likely)
+ * be sent.
+ */
+static int unix_dgram_peer_recv_ready(struct sock *sk,
+				      struct sock *other)
+{
+	return unix_peer(other) == sk || !unix_recvq_full(other);
+}
+
+/* Needs sk unix state lock. After recv_ready indicated not ready,
+ * establish peer_wait connection if still needed.
+ */
+static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
+{
+	int connected;
+
+	connected = unix_dgram_peer_wake_connect(sk, other);
+
+	if (unix_recvq_full(other))
+		return 1;
+
+	if (connected)
+		unix_dgram_peer_wake_disconnect(sk, other);
+
+	return 0;
+}
+
 static inline int unix_writable(struct sock *sk)
 {
 	return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
@@ -430,6 +546,8 @@ static void unix_release_sock(struct sock *sk, int embrion)
 			skpair->sk_state_change(skpair);
 			sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
 		}
+
+		unix_dgram_peer_wake_disconnect(sk, skpair);
 		sock_put(skpair); /* It may now die */
 		unix_peer(sk) = NULL;
 	}
@@ -664,6 +782,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
 	INIT_LIST_HEAD(&u->link);
 	mutex_init(&u->readlock); /* single task reading lock */
 	init_waitqueue_head(&u->peer_wait);
+	init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
 	unix_insert_socket(unix_sockets_unbound(sk), sk);
 out:
 	if (sk == NULL)
@@ -1031,6 +1150,13 @@ restart:
 	if (unix_peer(sk)) {
 		struct sock *old_peer = unix_peer(sk);
 		unix_peer(sk) = other;
+
+		if (unix_dgram_peer_wake_disconnect(sk, other))
+			wake_up_interruptible_poll(sk_sleep(sk),
+						   POLLOUT |
+						   POLLWRNORM |
+						   POLLWRBAND);
+
 		unix_state_double_unlock(sk, other);
 
 		if (other != old_peer)
@@ -1565,6 +1691,13 @@ restart:
 		unix_state_lock(sk);
 		if (unix_peer(sk) == other) {
 			unix_peer(sk) = NULL;
+
+			if (unix_dgram_peer_wake_disconnect(sk, other))
+				wake_up_interruptible_poll(sk_sleep(sk),
+							   POLLOUT |
+							   POLLWRNORM |
+							   POLLWRBAND);
+
 			unix_state_unlock(sk);
 
 			unix_dgram_disconnected(sk, other);
@@ -1590,19 +1723,21 @@ restart:
 			goto out_unlock;
 	}
 
-	if (unix_peer(other) != sk && unix_recvq_full(other)) {
-		if (!timeo) {
-			err = -EAGAIN;
-			goto out_unlock;
-		}
+	if (!unix_dgram_peer_recv_ready(sk, other)) {
+		if (timeo) {
+			timeo = unix_wait_for_peer(other, timeo);
 
-		timeo = unix_wait_for_peer(other, timeo);
+			err = sock_intr_errno(timeo);
+			if (signal_pending(current))
+				goto out_free;
 
-		err = sock_intr_errno(timeo);
-		if (signal_pending(current))
-			goto out_free;
+			goto restart;
+		}
 
-		goto restart;
+		if (unix_dgram_peer_wake_me(sk, other)) {
+			err = -EAGAIN;
+			goto out_unlock;
+		}
 	}
 
 	if (sock_flag(other, SOCK_RCVTSTAMP))
@@ -2453,14 +2588,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 		return mask;
 
 	writable = unix_writable(sk);
-	other = unix_peer_get(sk);
-	if (other) {
-		if (unix_peer(other) != sk) {
-			sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
-			if (unix_recvq_full(other))
-				writable = 0;
-		}
-		sock_put(other);
+	if (writable) {
+		unix_state_lock(sk);
+
+		other = unix_peer(sk);
+		if (other &&
+		    !unix_dgram_peer_recv_ready(sk, other) &&
+		    unix_dgram_peer_wake_me(sk, other))
+			writable = 0;
+
+		unix_state_unlock(sk);
 	}
 
 	if (writable)