
NULL pointer deref, selinux_socket_unix_may_send+0x34/0x90

Message ID 2355680.noQDWa4NlY@sifl
State RFC, archived
Delegated to: David Miller

Commit Message

Paul Moore March 22, 2013, 3:24 p.m. UTC
On Thursday, March 21, 2013 11:19:22 PM Ján Stanček wrote:
> Hi,
> 
> I'm occasionally seeing a panic early after system booted and while
> systemd is starting other services.
> 
> I made a reproducer which is quite reliable on my system (32 CPU Intel)
> and can usually trigger this issue within a minute or two. I can reproduce
> this issue with 3.9.0-rc3 as root or unprivileged user (see call trace
> below).
> 
> I'm attaching my reproducer and (experimental) patch, which fixes the
> issue for me.

Hi Jan,

I've heard some similar reports over the past few years, but I've never been 
able to reproduce the problem and the reporters have never shown enough 
interest to help me diagnose it.  Your information about the size of the 
machine and the reproducer may help, thank you!

I'll try your reproducer, but since I don't happen to have a machine handy 
that is the same size as yours, would you mind trying the attached patch 
(also pasted inline below for others to comment on)?  I can't promise it will 
solve your problem, but it was the best idea I could come up with a few years 
ago when I first became aware of the problem.  I think you are right that 
there is a race condition somewhere in the way AF_UNIX sockets shut down; I'm 
just not yet certain where it is ...

Thanks again.

net: fix some potential race issues in the AF_UNIX code

From: Paul Moore <pmoore@redhat.com>

At least one user had reported some odd behavior with UNIX sockets which
could be attributed to some _possible_ race conditions in
unix_release_sock().

Reported-by: Konstantin Boyandin <konstantin@boyandin.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
---
 net/unix/af_unix.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 51be64f..886b8da 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -408,8 +408,10 @@ static int unix_release_sock(struct sock *sk, int embrion)
 	skpair = unix_peer(sk);
 
 	if (skpair != NULL) {
-		if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) {
-			unix_state_lock(skpair);
+		unix_state_lock(skpair);
+		if (unix_our_peer(sk, skpair) &&
+		    (sk->sk_type == SOCK_STREAM ||
+		     sk->sk_type == SOCK_SEQPACKET)) {
 			/* No more writes */
 			skpair->sk_shutdown = SHUTDOWN_MASK;
 			if (!skb_queue_empty(&sk->sk_receive_queue) || embrion)
@@ -417,9 +419,10 @@ static int unix_release_sock(struct sock *sk, int embrion)
 			unix_state_unlock(skpair);
 			skpair->sk_state_change(skpair);
 			sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
-		}
-		sock_put(skpair); /* It may now die */
+		} else
+			unix_state_unlock(skpair);
 		unix_peer(sk) = NULL;
+		sock_put(skpair); /* It may now die */
 	}
 
 	/* Try to flush out this socket. Throw out buffers at least */

Comments

Ján Stanček March 22, 2013, 3:48 p.m. UTC | #1
On Fri, Mar 22, 2013 at 4:24 PM, Paul Moore <paul@paul-moore.com> wrote:
> On Thursday, March 21, 2013 11:19:22 PM Ján Stanček wrote:
>> Hi,
>>
>> I'm occasionally seeing a panic early after system booted and while
>> systemd is starting other services.
>>
>> I made a reproducer which is quite reliable on my system (32 CPU Intel)
>> and can usually trigger this issue within a minute or two. I can reproduce
>> this issue with 3.9.0-rc3 as root or unprivileged user (see call trace
>> below).
>>
>> I'm attaching my reproducer and (experimental) patch, which fixes the
>> issue for me.
>
> Hi Jan,
>
> I've heard some similar reports over the past few years, but I've never been
> able to reproduce the problem and the reporters have never shown enough
> interest to help me diagnose it.  Your information about the size of the
> machine and the reproducer may help, thank you!
>
> I'll try your reproducer, but since I don't happen to have a machine handy that
> is the same size as yours, would you mind trying the attached patch (also
> pasted inline below for others to comment on)?  I can't promise it will solve
> your problem, but it was the best idea I could come up with a few years ago
> when I first became aware of the problem.  I think you are right that there is
> a race condition somewhere in the way AF_UNIX sockets shut down; I'm just not
> yet certain where it is ...

Hi Paul,

thanks for the reply, I'll try your patch and let you know.

I'm not certain about the cause either, but the patch I sent in my last email
makes it go away, so maybe that can help in some way.

I made a v2 of the reproducer (attached), which triggers the issue a lot faster
on the 2 systems I tried (32 CPU and 4 CPU) - just in a couple of seconds.

Regards,
Jan
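
[The reproducer itself is an attachment and is not included in this archive.
As a rough, hypothetical sketch only - not Jan's actual program - a stress
loop of the following general shape exercises the same send-vs-release path
that shows up in the trace later in the thread: one task keeps sending
AF_UNIX datagrams while the peer end of the socketpair is being torn down.]

/*
 * Illustrative sketch only -- NOT the attached reproducer.
 * A child hammers one end of an AF_UNIX datagram socketpair while the
 * parent releases the peer end, over and over.
 */
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int fds[2];
	char buf[8] = "x";

	for (;;) {
		if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds))
			return 1;

		pid_t pid = fork();
		if (pid < 0)
			return 1;
		if (pid == 0) {
			/* Child: send datagrams as fast as possible;
			 * errors once the peer is gone are expected. */
			close(fds[1]);
			for (int i = 0; i < 1000; i++)
				send(fds[0], buf, sizeof(buf), MSG_DONTWAIT);
			_exit(0);
		}

		/* Parent: release the receiving end while the child may
		 * still be inside sendmsg(), then clean up and loop. */
		close(fds[1]);
		close(fds[0]);
		waitpid(pid, NULL, 0);
	}
}
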
Paul Moore March 22, 2013, 4:24 p.m. UTC | #2
On Friday, March 22, 2013 04:48:32 PM Ján Stanček wrote:
> Hi Paul,
> 
> thanks for the reply, I'll try your patch and let you know.

Great, thanks.
 
> I'm not certain about the cause either, but the patch I sent in my last email
> makes it go away, so maybe that can help in some way.

At the very least you've found a way to reproduce the problem, and your patch 
furthers my belief that we've got a race condition somewhere - all very 
helpful!  It may also turn out that your patch is the "right" solution; I'd 
just like to better understand why we are seeing the race in the first place.

> I made a v2 of the reproducer (attached), which triggers the issue a lot
> faster on the 2 systems I tried (32 CPU and 4 CPU) - just in a couple of
> seconds.

Excellent.  While I don't have a 32 CPU system handy, I do have a 4 CPU system 
that I can play with.  Thanks again.

-Paul
Ján Stanček March 22, 2013, 4:52 p.m. UTC | #3
Paul,

I applied your patch on top of 3.9-rc3 and ran v2 of the reproducer. It hit
the issue almost instantly:

[  249.316283] BUG: unable to handle kernel NULL pointer dereference at 0000000000000250
[  249.325044] IP: [<ffffffff812a2d04>] selinux_socket_unix_may_send+0x34/0x90
[  249.332829] PGD 80a8e5067 PUD 803048067 PMD 0
[  249.337821] Oops: 0000 [#1] SMP
[  249.453113] CPU 22
[  249.455262] Pid: 6928, comm: a.out Tainted: GF            3.9.0-rc3+ #1 Intel Corporation W2600CR/W2600CR
[  249.466132] RIP: 0010:[<ffffffff812a2d04>]  [<ffffffff812a2d04>] selinux_socket_unix_may_send+0x34/0x90
[  249.476632] RSP: 0018:ffff880826569ab8  EFLAGS: 00010246
[  249.482551] RAX: ffff880417ee4100 RBX: ffff880826569be8 RCX: 0000000000000007
[  249.490511] RDX: 0000000000000000 RSI: ffff880828f77d00 RDI: ffff880826569ad8
[  249.498472] RBP: ffff880826569b18 R08: ffff880424ada080 R09: 0000000000000000
[  249.506434] R10: ffff880826569a38 R11: 000000000000000f R12: ffff880417ee4100
[  249.514395] R13: 7fffffffffffffff R14: ffff880424ada080 R15: ffff880424ada370
[  249.522355] FS:  00007f6d44abf740(0000) GS:ffff88042f7c0000(0000) knlGS:0000000000000000
[  249.531383] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  249.537792] CR2: 0000000000000250 CR3: 0000000800ffa000 CR4: 00000000000407e0
[  249.545755] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  249.553716] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  249.561678] Process a.out (pid: 6928, threadinfo ffff880826568000, task ffff880825d999a0)
[  249.570802] Stack:
[  249.573046]  ffff880424ada002 ffff880427c5bc00 ffff880417ee4100 ffff880424ada080
[  249.581339]  ffff880424ada080 000000000000000a ffff880826569af8 ffffffff8129ef36
[  249.589637]  ffff880826569b28 ffffffff81529747 ffff880826569be8 0000000011f46a34
[  249.597933] Call Trace:
[  249.600666]  [<ffffffff8129ef36>] ? security_sock_rcv_skb+0x16/0x20
[  249.607661]  [<ffffffff81529747>] ? sk_filter+0x37/0xd0
[  249.613491]  [<ffffffff8129ef16>] security_unix_may_send+0x16/0x20
[  249.620390]  [<ffffffff815b697d>] unix_dgram_sendmsg+0x48d/0x640
[  249.627094]  [<ffffffff814fd9c0>] sock_sendmsg+0xb0/0xe0
[  249.633024]  [<ffffffff812adee7>] ? ebitmap_cpy+0x47/0xd0
[  249.639048]  [<ffffffff814ff55c>] __sys_sendmsg+0x3ac/0x3c0
[  249.645267]  [<ffffffff811a3357>] ? do_sync_write+0xa7/0xe0
[  249.651487]  [<ffffffff811e31fb>] ? fsnotify+0x24b/0x340
[  249.657416]  [<ffffffff815013c9>] sys_sendmsg+0x49/0x90
[  249.663249]  [<ffffffff81630bd9>] system_call_fastpath+0x16/0x1b
[  249.669949] Code: 00 00 45 31 c9 48 89 e5 48 83 ec 60 48 8b 56 20 65 48 8b 04 25 28 00 00 00 48 89 45 f8 31 c0 48 8b 47 20 48 8d 7d c0 c6 45 a0 02 <48> 8b b2 50 02 00 00 4c 8b 80 50 02 00 00 31 c0 f3 48 ab 48 89

Regards,
Jan

On Fri, Mar 22, 2013 at 5:24 PM, Paul Moore <paul@paul-moore.com> wrote:
> On Friday, March 22, 2013 04:48:32 PM Ján Stanček wrote:
>> Hi Paul,
>>
>> thanks for the reply, I'll try your patch and let you know.
>
> Great, thanks.
>
>> I'm not certain about the cause either, but the patch I sent in my last email
>> makes it go away, so maybe that can help in some way.
>
> At the very least you've found a way to reproduce the problem, and your patch
> furthers my belief that we've got a race condition somewhere - all very
> helpful!  It may also turn out that your patch is the "right" solution; I'd
> just like to better understand why we are seeing the race in the first place.
>
>> I made a v2 of the reproducer (attached), which triggers the issue a lot
>> faster on the 2 systems I tried (32 CPU and 4 CPU) - just in a couple of
>> seconds.
>
> Excellent.  While I don't have a 32 CPU system handy, I do have a 4 CPU system
> that I can play with.  Thanks again.
>
> -Paul
>
> --
> paul moore
> www.paul-moore.com
>
Paul Moore March 22, 2013, 6:24 p.m. UTC | #4
On Friday, March 22, 2013 05:52:37 PM Ján Stanček wrote:
> Paul,
> 
> I applied your patch on top of 3.9-rc3 and ran v2 of the reproducer. It
> hit the issue almost instantly:

Okay, thanks for trying it ... let me see what I can do.

Patch

net: fix some potential race issues in the AF_UNIX code

From: Paul Moore <pmoore@redhat.com>

At least one user had reported some odd behavior with UNIX sockets which
could be attributed to some _possible_ race conditions in
unix_release_sock().

Reported-by: Konstantin Boyandin <konstantin@boyandin.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
---
 net/unix/af_unix.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 51be64f..886b8da 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -408,8 +408,10 @@  static int unix_release_sock(struct sock *sk, int embrion)
 	skpair = unix_peer(sk);
 
 	if (skpair != NULL) {
-		if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) {
-			unix_state_lock(skpair);
+		unix_state_lock(skpair);
+		if (unix_our_peer(sk, skpair) &&
+		    (sk->sk_type == SOCK_STREAM ||
+		     sk->sk_type == SOCK_SEQPACKET)) {
 			/* No more writes */
 			skpair->sk_shutdown = SHUTDOWN_MASK;
 			if (!skb_queue_empty(&sk->sk_receive_queue) || embrion)
@@ -417,9 +419,10 @@  static int unix_release_sock(struct sock *sk, int embrion)
 			unix_state_unlock(skpair);
 			skpair->sk_state_change(skpair);
 			sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
-		}
-		sock_put(skpair); /* It may now die */
+		} else
+			unix_state_unlock(skpair);
 		unix_peer(sk) = NULL;
+		sock_put(skpair); /* It may now die */
 	}
 
 	/* Try to flush out this socket. Throw out buffers at least */
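
[For readers skimming the diff: the change takes unix_state_lock(skpair)
unconditionally, re-checks under the lock - via unix_our_peer() - that skpair
is still this socket's peer before touching it, and drops the reference with
sock_put() only after unix_peer(sk) has been cleared.  A rough userspace
analogy of that lock / re-check / act pattern follows; the struct and function
names are invented for illustration and this is not kernel code.]

/*
 * Purely illustrative analogy of the locking pattern in the patch.
 * Idea: take the peer's lock unconditionally, confirm under the lock
 * that it still points back at us before touching it, and only then
 * detach and (in the real code) drop the reference.
 */
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

struct endpoint {
	pthread_mutex_t lock;
	struct endpoint *peer;	/* the other side may change this */
	int shut_down;
};

static void release_endpoint(struct endpoint *self)
{
	struct endpoint *peer = self->peer;

	if (peer != NULL) {
		pthread_mutex_lock(&peer->lock);
		if (peer->peer == self)		/* still paired with us? */
			peer->shut_down = 1;	/* safe to modify it */
		pthread_mutex_unlock(&peer->lock);
		self->peer = NULL;		/* detach first ... */
		/* ... the reference drop (sock_put() in af_unix.c)
		 * would happen only after this point. */
	}
}

int main(void)
{
	struct endpoint a, b;

	pthread_mutex_init(&a.lock, NULL);
	pthread_mutex_init(&b.lock, NULL);
	a.peer = &b;
	b.peer = &a;
	a.shut_down = b.shut_down = 0;

	release_endpoint(&a);
	printf("b shut down: %d\n", b.shut_down);
	return 0;
}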