diff mbox

Crash with SO_REUSEPORT and ef456144da8ef507c8cf504284b6042e9201a05c

Message ID CAB9dFdugvFvdQLGAU83CQHYbwaWCMHrAm5Xk746iwMG5g94PqQ@mail.gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Marc Dionne Jan. 19, 2016, 3:57 p.m. UTC
I shared this one with Craig but I thought I'd put it out to a wider audience.

Trying to run the current kernel mainline on a test system I found
that any attempt to run many of our executables would crash the
system.  The networking code in all of these opens and listens on
multiple UDP sockets set with SO_REUSEPORT.  We also like to bind the
first socket before setting SO_REUSEPORT so we can catch some cases
where the port is actually in use by someone else (for instance a
previous incarnation of the same service).

This is easily reproduced with this sequence:
- create 2 sockets A and B
- bind socket A to an address
- set SO_REUSEPORT on socket A
- set SO_REUSEPORT on socket B
- bind socket B to the same address as A

The sk_reuseport_cb structure is only allocated at bind time if
SO_REUSEPORT is already set, so A doesn't have one.  When we bind B, A
is found as a match that has SO_REUSEPORT and reuseport_add_sock will
try to use the NULL sk_reuseport_cb structure from A, causing a crash.

Not sure what the best fix is, but seems like the structure could be
either allocated (if not already done) when setting SO_REUSEPORT, or
when we find it to be NULL in reuseport_add_sock (but locking may be
an issue there).  I was able to test that allocating sk_reuseport_cb
when setting SO_REUSEPORT makes things behave normally again; see
attached patch.  That's surely not a correct/complete fix as B (in the
scenario above) will have an unnecessary sk_reuseport_cb which will
trigger a warning and should be dealt with.

Thanks,
Marc

Comments

Marc Dionne Jan. 19, 2016, 4:04 p.m. UTC | #1
Resent with correct address for Eric.

On Tue, Jan 19, 2016 at 11:57 AM, Marc Dionne <marc.c.dionne@gmail.com> wrote:
> I shared this one with Craig but I thought I'd put it out to a wider audience.
>
> Trying to run the current kernel mainline on a test system I found
> that any attempt to run many of our executables would crash the
> system.  The networking code in all of these opens and listens on
> multiple UDP sockets set with SO_REUSEPORT.  We also like to bind the
> first socket before setting SO_REUSEPORT so we can catch some cases
> where the port is actually in use by someone else (for instance a
> previous incarnation of the same service).
>
> This is easily reproduced with this sequence:
> - create 2 sockets A and B
> - bind socket A to an address
> - set SO_REUSEPORT on socket A
> - set SO_REUSEPORT on socket B
> - bind socket B to the same address as A
>
> The sk_reuseport_cb structure is only allocated at bind time if
> SO_REUSEPORT is already set, so A doesn't have one.  When we bind B, A
> is found as a match that has SO_REUSEPORT and reuseport_add_sock will
> try to use the NULL sk_reuseport_cb structure from A, causing a crash.
>
> Not sure what the best fix is, but seems like the structure could be
> either allocated (if not already done) when setting SO_REUSEPORT, or
> when we find it to be NULL in reuseport_add_sock (but locking may be
> an issue there).  I was able to test that allocating sk_reuseport_cb
> when setting SO_REUSEPORT makes things behave normally again; see
> attached patch.  That's surely not a correct/complete fix as B (in the
> scenario above) will have an unnecessary sk_reuseport_cb which will
> trigger a warning and should be dealt with.
>
> Thanks,
> Marc
Eric Dumazet Jan. 19, 2016, 4:14 p.m. UTC | #2
On Tue, 2016-01-19 at 11:57 -0400, Marc Dionne wrote:
> I shared this one with Craig but I thought I'd put it out to a wider audience.
> 
> Trying to run the current kernel mainline on a test system I found
> that any attempt to run many of our executables would crash the
> system.  The networking code in all of these opens and listens on
> multiple UDP sockets set with SO_REUSEPORT.  We also like to bind the
> first socket before setting SO_REUSEPORT so we can catch some cases
> where the port is actually in use by someone else (for instance a
> previous incarnation of the same service).
> 
> This is easily reproduced with this sequence:
> - create 2 sockets A and B
> - bind socket A to an address
> - set SO_REUSEPORT on socket A
> - set SO_REUSEPORT on socket B
> - bind socket B to the same address as A
> 
> The sk_reuseport_cb structure is only allocated at bind time if
> SO_REUSEPORT is already set, so A doesn't have one.  When we bind B, A
> is found as a match that has SO_REUSEPORT and reuseport_add_sock will
> try to use the NULL sk_reuseport_cb structure from A, causing a crash.
> 
> Not sure what the best fix is, but seems like the structure could be
> either allocated (if not already done) when setting SO_REUSEPORT, or
> when we find it to be NULL in reuseport_add_sock (but locking may be
> an issue there).  I was able to test that allocating sk_reuseport_cb
> when setting SO_REUSEPORT makes things behave normally again; see
> attached patch.  That's surely not a correct/complete fix as B (in the
> scenario above) will have an unnecessary sk_reuseport_cb which will
> trigger a warning and should be dealt with.

Hi Marc

Your patch looks fine to me, please add a "Fixes:" tag in it ?

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")

Thanks.
Craig Gallek Jan. 19, 2016, 4:31 p.m. UTC | #3
On Tue, Jan 19, 2016 at 11:04 AM, Marc Dionne <marc.c.dionne@gmail.com> wrote:
> Resent with correct address for Eric.
>
> On Tue, Jan 19, 2016 at 11:57 AM, Marc Dionne <marc.c.dionne@gmail.com> wrote:
>> I shared this one with Craig but I thought I'd put it out to a wider audience.
>>
>> Trying to run the current kernel mainline on a test system I found
>> that any attempt to run many of our executables would crash the
>> system.  The networking code in all of these opens and listens on
>> multiple UDP sockets set with SO_REUSEPORT.  We also like to bind the
>> first socket before setting SO_REUSEPORT so we can catch some cases
>> where the port is actually in use by someone else (for instance a
>> previous incarnation of the same service).
>>
>> This is easily reproduced with this sequence:
>> - create 2 sockets A and B
>> - bind socket A to an address
>> - set SO_REUSEPORT on socket A
>> - set SO_REUSEPORT on socket B
>> - bind socket B to the same address as A
>>
>> The sk_reuseport_cb structure is only allocated at bind time if
>> SO_REUSEPORT is already set, so A doesn't have one.  When we bind B, A
>> is found as a match that has SO_REUSEPORT and reuseport_add_sock will
>> try to use the NULL sk_reuseport_cb structure from A, causing a crash.
>>
>> Not sure what the best fix is, but seems like the structure could be
>> either allocated (if not already done) when setting SO_REUSEPORT, or
>> when we find it to be NULL in reuseport_add_sock (but locking may be
>> an issue there).  I was able to test that allocating sk_reuseport_cb
>> when setting SO_REUSEPORT makes things behave normally again; see
>> attached patch.  That's surely not a correct/complete fix as B (in the
>> scenario above) will have an unnecessary sk_reuseport_cb which will
>> trigger a warning and should be dealt with.
>>
>> Thanks,
>> Marc

There are really two issues here: The change in behavior of when you
can set SO_REUSEPORT on a bound socket and the NULL pointer
dereference race.

It's not completely safe to allocate the reuseport struct from the
setsockopt function.  Acquisition of the spin lock in reuseport_alloc
requires prior acquisition of the hlist lock for the socket (which
doesn't exist before bind).

Further, allocating this structure before bind means that it's
possible to associate two different BPF filters with two different
sockets and then try to bind them both to the same address.  This
means that the reuseport_add_sock function would need to decide to
pick between one of these two structures and get rid of the other.
There's currently a WARN_ON in that function which would complain
about this.  (the test program in
tools/testing/selftests/net/reuseport_bpf.c will trigger this warning
with your change).

I need to think about how to handle setsockopt-after-bind condition a
bit more, but the NULL pointer dereference is obviously wrong.  Do you
have a way to easily reproduce this?  I've only managed to get it to
happen once so far...
Marc Dionne Jan. 19, 2016, 5:08 p.m. UTC | #4
On Tue, Jan 19, 2016 at 12:31 PM, Craig Gallek <kraig@google.com> wrote:
>
> I need to think about how to handle setsockopt-after-bind condition a
> bit more, but the NULL pointer dereference is obviously wrong.  Do you
> have a way to easily reproduce this?  I've only managed to get it to
> happen once so far...

The attached code reliably triggers the crash for me.
diff mbox

Patch

commit 518120f41aec30bdde39caf58818e1e5d2d6a4a4
Author: Marc Dionne <marc.dionne@auristor.com>
Date:   Fri Jan 15 16:02:32 2016 -0400

    soreuseport: Allow bind before setting SO_REUSEPORT
    
    If a socket is bound before the SO_REUSEPORT option is
    set, it will have no sk_reuseport_cb structure allocated
    and attached to it.  If there is then an attempt to bind
    a second socket to the same port, the first socket will
    be found as matching and reuseport_add_sock will attempt
    to use the NULL sk_reuseport_cb pointer, resulting in an
    oops.
    
    Allocate sk_reuseport_cb in setsockopt if not already set.
    Also, don't preclude reusing a port because the sk has
    sk_reuseport_cb set.
    
    Signed-off-by: Marc Dionne <marc.dionne@auristor.com>

diff --git a/net/core/sock.c b/net/core/sock.c
index 5127023..587de5b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -724,6 +724,8 @@  int sock_setsockopt(struct socket *sock, int level, int optname,
 		break;
 	case SO_REUSEPORT:
 		sk->sk_reuseport = valbool;
+		if (!rcu_access_pointer(sk->sk_reuseport_cb))
+		    reuseport_alloc(sk);
 		break;
 	case SO_TYPE:
 	case SO_PROTOCOL:
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index dc45b53..13884b6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -154,7 +154,6 @@  static int udp_lib_lport_inuse(struct net *net, __u16 num,
 		    (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if ||
 		     sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
 		    (!sk2->sk_reuseport || !sk->sk_reuseport ||
-		     rcu_access_pointer(sk->sk_reuseport_cb) ||
 		     !uid_eq(uid, sock_i_uid(sk2))) &&
 		    saddr_comp(sk, sk2, true)) {
 			if (!bitmap)
@@ -190,7 +189,6 @@  static int udp_lib_lport_inuse2(struct net *net, __u16 num,
 		    (!sk2->sk_bound_dev_if || !sk->sk_bound_dev_if ||
 		     sk2->sk_bound_dev_if == sk->sk_bound_dev_if) &&
 		    (!sk2->sk_reuseport || !sk->sk_reuseport ||
-		     rcu_access_pointer(sk->sk_reuseport_cb) ||
 		     !uid_eq(uid, sock_i_uid(sk2))) &&
 		    saddr_comp(sk, sk2, true)) {
 			res = 1;