diff mbox

unix: fix use-after-free with unix_dgram_poll()

Message ID 20151002191039.9124420ED@prod-mail-relay10.akamai.com
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Jason Baron Oct. 2, 2015, 7:10 p.m. UTC
From: Jason Baron <jbaron@akamai.com>

The unix_dgram_poll() routine calls sock_poll_wait() not only for the wait
queue associated with the socket s that we've called poll() on, but it also
calls sock_poll_wait() for a remote peer socket's wait queue, if it's connected.
Thus, if we call poll()/select()/epoll() for the socket s, there are then
a couple of code paths in which the remote peer socket s2 and its associated
peer_wait queue can be freed before poll()/select()/epoll() have a chance
to remove themselves from this remote peer socket s2's wait queue.

The remote peer's socket and associated wait queues can be freed via:

1. If s calls connect() to connect to a new socket other than s2, it will
drop its reference on s2, and thus a close() on s2 will free it.

2. If we call close() on s2, then a subsequent sendmsg() from s, will drop
the final reference to s2, allowing it to be freed.

Address this issue, by reverting unix_dgram_poll() to only register with
the wait queue associated with s and simply drop the second sock_poll_wait()
registration for the remote peer socket wait queue. This then presents the
expected semantics to poll()/select()/epoll().

This works because we will continue to get POLLOUT wakeups from
unix_write_space(), which is called via sock_wfree(). In fact, we avoid having
two wakeup calls here for every buffer we read, since unix_dgram_recvmsg()
unconditionally calls wake_up_interruptible_sync_poll() on its 'peer_wait' queue
and we will no longer be in poll against that queue. So I think this should be
more performant than the current code. And we avoid the second poll() call here
as well during registration.

unix_write_space() should probably be enhanced such that it checks for the
unix_recvq_full() condition as well. In fact, it should probably look for
some fraction of that buffer being free, as is done in unix_writable(). But I'm
considering that a separate enhancement from fixing this issue.

I've tested this by specifically reproducing cases #1 and #2 above as well as
by running the test code here: https://lkml.org/lkml/2015/9/13/195

Signed-off-by: Jason Baron <jbaron@akamai.com>
---
 net/unix/af_unix.c | 1 -
 1 file changed, 1 deletion(-)

Comments

Rainer Weikusat Oct. 2, 2015, 7:30 p.m. UTC | #1
Jason Baron <jbaron@akamai.com> writes:
> From: Jason Baron <jbaron@akamai.com>
>
> The unix_dgram_poll() routine calls sock_poll_wait() not only for the wait
> queue associated with the socket s that we've called poll() on, but it also
> calls sock_poll_wait() for a remote peer socket's wait queue, if it's connected.
> Thus, if we call poll()/select()/epoll() for the socket s, there are then
> a couple of code paths in which the remote peer socket s2 and its associated
> peer_wait queue can be freed before poll()/select()/epoll() have a chance
> to remove themselves from this remote peer socket s2's wait queue.

[...]

> This works because we will continue to get POLLOUT wakeups from
> unix_write_space(), which is called via sock_wfree().

As pointed out in my original comment, this doesn't work (as far as I
can/ could tell) because it will only wake up sockets which had a chance
to enqueue datagrams to the queue of the receiving socket as only
skbuffs enqueued there will be consumed. A socket which is really
waiting for space in the receiving queue won't ever be woken up in this
way.

Further, considering that you're demonstrably not interested in
debugging and fixing this issue (as you haven't even bothered to post
one of the test programs you claim to have), I'm beginning to wonder why
this tripe is being sent to me at all --- it's not "git on autopilot"
this time as someone took the time to dig up my current e-mail address
as the one in the original commit is not valid anymore. Could you please
refrain from such exercises in future unless a discussion is actually
intended?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rainer Weikusat Oct. 2, 2015, 7:49 p.m. UTC | #2
Rainer Weikusat <rw@doppelsaurus.mobileactivedefense.com> writes:
> Jason Baron <jbaron@akamai.com> writes:
>> From: Jason Baron <jbaron@akamai.com>
>>
>> The unix_dgram_poll() routine calls sock_poll_wait() not only for the wait
>> queue associated with the socket s that we've called poll() on, but it also
>> calls sock_poll_wait() for a remote peer socket's wait queue, if it's connected.
>> Thus, if we call poll()/select()/epoll() for the socket s, there are then
>> a couple of code paths in which the remote peer socket s2 and its associated
>> peer_wait queue can be freed before poll()/select()/epoll() have a chance
>> to remove themselves from this remote peer socket s2's wait queue.
>
> [...]
>
>> This works because we will continue to get POLLOUT wakeups from
>> unix_write_space(), which is called via sock_wfree().
>
> As pointed out in my original comment, this doesn't work (as far as I
> can/ could tell) because it will only wake up sockets which had a chance
> to enqueue datagrams to the queue of the receiving socket as only
> skbuffs enqueued there will be consumed. A socket which is really
> waiting for space in the receiving queue won't ever be woken up in this
> way.

Program which shows that (on 3.2.54 + "local modification", with the 2nd
sock_poll_wait commented out):

---------------
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/poll.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    struct sockaddr_un sun;
    struct pollfd pfd;
    int tg, sk0, sk1, rc;
    char buf[16];

    sun.sun_family = AF_UNIX;
    
    tg = socket(AF_UNIX, SOCK_DGRAM, 0);
    strncpy(sun.sun_path, "/tmp/tg", sizeof(sun.sun_path));
    unlink(sun.sun_path);
    bind(tg, (struct sockaddr *)&sun, sizeof(sun));
    
    sk0 = socket(AF_UNIX, SOCK_DGRAM, 0);
    connect(sk0, (struct sockaddr *)&sun, sizeof(sun));
    
    sk1 = socket(AF_UNIX, SOCK_DGRAM, 0);
    connect(sk1, (struct sockaddr *)&sun, sizeof(sun));

    fcntl(sk0, F_SETFL, fcntl(sk0, F_GETFL) | O_NONBLOCK);
    fcntl(sk1, F_SETFL, fcntl(sk1, F_GETFL) | O_NONBLOCK);
    
    while (write(sk0, "bla", 3) != -1);

    if (fork() == 0) {
	pfd.fd = sk1;
	pfd.events = POLLOUT;
	rc = poll(&pfd, 1, -1);

	_exit(0);
    }
    
    sleep(3);
    read(tg, buf, sizeof(buf));
    wait(&rc);

    return 0;
}
------------

For me, this blocks forever while it should terminate as soon as the
datagram was read. Something else may have changed this behaviour in the
meantime, though.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Baron Oct. 2, 2015, 7:50 p.m. UTC | #3
On 10/02/2015 03:30 PM, Rainer Weikusat wrote:
> Jason Baron <jbaron@akamai.com> writes:
>> From: Jason Baron <jbaron@akamai.com>
>>
>> The unix_dgram_poll() routine calls sock_poll_wait() not only for the wait
>> queue associated with the socket s that we've called poll() on, but it also
>> calls sock_poll_wait() for a remote peer socket's wait queue, if it's connected.
>> Thus, if we call poll()/select()/epoll() for the socket s, there are then
>> a couple of code paths in which the remote peer socket s2 and its associated
>> peer_wait queue can be freed before poll()/select()/epoll() have a chance
>> to remove themselves from this remote peer socket s2's wait queue.
> 
> [...]
> 
>> This works because we will continue to get POLLOUT wakeups from
>> unix_write_space(), which is called via sock_wfree().
> 
> As pointed out in my original comment, this doesn't work (as far as I
> can/ could tell) because it will only wake up sockets which had a chance
> to enqueue datagrams to the queue of the receiving socket as only
> skbuffs enqueued there will be consumed. A socket which is really
> waiting for space in the receiving queue won't ever be woken up in this
> way.

Ok, good point. I was hoping to avoid a more complex approach here. I think
then that the patch I posted in the previous thread on this would address
this concern. I will post it for review.

> 
> Further, considering that you're demonstrably not interested in
> debugging and fixing this issue (as you haven't even bothered to post
> one of the test programs you claim to have), I'm beginning to wonder why
> this tripe is being sent to me at all --- it's not "git on autopilot"
> this time as someone took the time to dig up my current e-mail address
> as the one in the original commit is not valid anymore. Could you please
> refrain from such exercises in future unless a discussion is actually
> intended?
> 
> 

Just trying to help fix this.

Thanks,

-Jason

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rainer Weikusat Oct. 2, 2015, 8:11 p.m. UTC | #4
Jason Baron <jbaron@akamai.com> writes:
> On 10/02/2015 03:30 PM, Rainer Weikusat wrote:
>> Jason Baron <jbaron@akamai.com> writes:
>>> From: Jason Baron <jbaron@akamai.com>
>>>
>>> The unix_dgram_poll() routine calls sock_poll_wait() not only for the wait
>>> queue associated with the socket s that we've called poll() on, but it also
>>> calls sock_poll_wait() for a remote peer socket's wait queue, if it's connected.
>>> Thus, if we call poll()/select()/epoll() for the socket s, there are then
>>> a couple of code paths in which the remote peer socket s2 and its associated
>>> peer_wait queue can be freed before poll()/select()/epoll() have a chance
>>> to remove themselves from this remote peer socket s2's wait queue.
>> 
>> [...]
>> 
>>> This works because we will continue to get POLLOUT wakeups from
>>> unix_write_space(), which is called via sock_wfree().
>> 
>> As pointed out in my original comment, this doesn't work (as far as I
>> can/ could tell) because it will only wake up sockets which had a chance
>> to enqueue datagrams to the queue of the receiving socket as only
>> skbuffs enqueued there will be consumed. A socket which is really
>> waiting for space in the receiving queue won't ever be woken up in this
>> way.
>
> Ok, good point. I was hoping to avoid a more complex approach here. I think
> then that the patch I posted in the previous thread on this would address
> this concern. I will post it for review.

Some comments on that: From what I remember, this introduced another
wait queue solely for "peer events" in the connecting socket and
enqueued it there on connect. I think this should use the peer_wait
queue because that's what its purpose seems to be and it should also
only be put onto this wait queue if it's actually interested, similar to
the way this is handled in unix_dgram_sendmsg (via
unix_wait_for_peer). But this (likely) implies it would be necessary to
get rid of the second registration in unix_dgram_disconnected (which
gets called if a datagram socket disconnects from another) which may not
be feasible.

Insofar this stays an issue, I'll put more work into this but right now,
my "work" (as in "stuff I'm supposed to do for the people who pay me")
priorities are something rather different.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 03ee4d3..c1ae595 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2441,7 +2441,6 @@  static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 	other = unix_peer_get(sk);
 	if (other) {
 		if (unix_peer(other) != sk) {
-			sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
 			if (unix_recvq_full(other))
 				writable = 0;
 		}