diff mbox series

[net] tcp: fix a race in inet_diag_dump_icsk()

Message ID 20181220232856.1496-1-edumazet@google.com
State Accepted, archived
Delegated to: David Miller
Headers show
Series [net] tcp: fix a race in inet_diag_dump_icsk() | expand

Commit Message

Eric Dumazet Dec. 20, 2018, 11:28 p.m. UTC
Alexei reported use after frees in inet_diag_dump_icsk() [1]

Because we use refcount_set() when various sockets are setup and
inserted into ehash, we also need to make sure inet_diag_dump_icsk()
wont race with the refcount_set() operations.

Jonathan Lemon sent a patch changing net_twsk_hashdance() but
other spots would need risky changes.

Instead, fix inet_diag_dump_icsk() as this bug came with
linux-4.10 only.

[1] Quoting Alexei :

First something iterating over sockets finds already freed tw socket:

refcount_t: increment on 0; use-after-free.
WARNING: CPU: 2 PID: 2738 at lib/refcount.c:153 refcount_inc+0x26/0x30
RIP: 0010:refcount_inc+0x26/0x30
RSP: 0018:ffffc90004c8fbc0 EFLAGS: 00010282
RAX: 000000000000002b RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88085ee9d680 RSI: ffff88085ee954c8 RDI: ffff88085ee954c8
RBP: ffff88010ecbd2c0 R08: 0000000000000000 R09: 000000000000174c
R10: ffffffff81e7c5a0 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8806ba9bf210 R14: ffffffff82304600 R15: ffff88010ecbd328
FS:  00007f81f5a7d700(0000) GS:ffff88085ee80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f81e2a95000 CR3: 000000069b2eb006 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 inet_diag_dump_icsk+0x2b3/0x4e0 [inet_diag]  // sock_hold(sk); in net/ipv4/inet_diag.c:1002
 ? kmalloc_large_node+0x37/0x70
 ? __kmalloc_node_track_caller+0x1cb/0x260
 ? __alloc_skb+0x72/0x1b0
 ? __kmalloc_reserve.isra.40+0x2e/0x80
 __inet_diag_dump+0x3b/0x80 [inet_diag]
 netlink_dump+0x116/0x2a0
 netlink_recvmsg+0x205/0x3c0
 sock_read_iter+0x89/0xd0
 __vfs_read+0xf7/0x140
 vfs_read+0x8a/0x140
 SyS_read+0x3f/0xa0
 do_syscall_64+0x5a/0x100

then a minute later twsk timer fires and hits two bad refcnts
for this freed socket:

refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 31 PID: 0 at lib/refcount.c:228 refcount_dec+0x2e/0x40
Modules linked in:
RIP: 0010:refcount_dec+0x2e/0x40
RSP: 0018:ffff88085f5c3ea8 EFLAGS: 00010296
RAX: 000000000000002c RBX: ffff88010ecbd2c0 RCX: 000000000000083f
RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
RBP: ffffc90003c77280 R08: 0000000000000000 R09: 00000000000017d3
R10: ffffffff81e7c5a0 R11: 0000000000000000 R12: ffffffff82ad2d80
R13: ffffffff8182de00 R14: ffff88085f5c3ef8 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88085f5c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbe42685250 CR3: 0000000002209001 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>
 inet_twsk_kill+0x9d/0xc0  // inet_twsk_bind_unhash(tw, hashinfo);
 call_timer_fn+0x29/0x110
 run_timer_softirq+0x36b/0x3a0

refcount_t: underflow; use-after-free.
WARNING: CPU: 31 PID: 0 at lib/refcount.c:187 refcount_sub_and_test+0x46/0x50
RIP: 0010:refcount_sub_and_test+0x46/0x50
RSP: 0018:ffff88085f5c3eb8 EFLAGS: 00010296
RAX: 0000000000000026 RBX: ffff88010ecbd2c0 RCX: 000000000000083f
RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
RBP: ffff88010ecbd358 R08: 0000000000000000 R09: 000000000000185b
R10: ffffffff81e7c5a0 R11: 0000000000000000 R12: ffff88010ecbd358
R13: ffffffff8182de00 R14: ffff88085f5c3ef8 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88085f5c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbe42685250 CR3: 0000000002209001 CR4: 00000000003606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>
 inet_twsk_put+0x12/0x20  // inet_twsk_put(tw);
 call_timer_fn+0x29/0x110
 run_timer_softirq+0x36b/0x3a0

Fixes: 67db3e4bfbc9 ("tcp: no longer hold ehash lock while calling tcp_get_info()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <ast@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 net/ipv4/inet_diag.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Jonathan Lemon Dec. 21, 2018, 12:42 a.m. UTC | #1
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>

On 20 Dec 2018, at 15:28, Eric Dumazet wrote:

> Alexei reported use after frees in inet_diag_dump_icsk() [1]
>
> Because we use refcount_set() when various sockets are setup and
> inserted into ehash, we also need to make sure inet_diag_dump_icsk()
> wont race with the refcount_set() operations.
>
> Jonathan Lemon sent a patch changing net_twsk_hashdance() but
> other spots would need risky changes.
>
> Instead, fix inet_diag_dump_icsk() as this bug came with
> linux-4.10 only.
>
> [1] Quoting Alexei :
>
> First something iterating over sockets finds already freed tw socket:
>
> refcount_t: increment on 0; use-after-free.
> WARNING: CPU: 2 PID: 2738 at lib/refcount.c:153 refcount_inc+0x26/0x30
> RIP: 0010:refcount_inc+0x26/0x30
> RSP: 0018:ffffc90004c8fbc0 EFLAGS: 00010282
> RAX: 000000000000002b RBX: 0000000000000000 RCX: 0000000000000000
> RDX: ffff88085ee9d680 RSI: ffff88085ee954c8 RDI: ffff88085ee954c8
> RBP: ffff88010ecbd2c0 R08: 0000000000000000 R09: 000000000000174c
> R10: ffffffff81e7c5a0 R11: 0000000000000000 R12: 0000000000000000
> R13: ffff8806ba9bf210 R14: ffffffff82304600 R15: ffff88010ecbd328
> FS:  00007f81f5a7d700(0000) GS:ffff88085ee80000(0000) 
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f81e2a95000 CR3: 000000069b2eb006 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  inet_diag_dump_icsk+0x2b3/0x4e0 [inet_diag]  // sock_hold(sk); in 
> net/ipv4/inet_diag.c:1002
>  ? kmalloc_large_node+0x37/0x70
>  ? __kmalloc_node_track_caller+0x1cb/0x260
>  ? __alloc_skb+0x72/0x1b0
>  ? __kmalloc_reserve.isra.40+0x2e/0x80
>  __inet_diag_dump+0x3b/0x80 [inet_diag]
>  netlink_dump+0x116/0x2a0
>  netlink_recvmsg+0x205/0x3c0
>  sock_read_iter+0x89/0xd0
>  __vfs_read+0xf7/0x140
>  vfs_read+0x8a/0x140
>  SyS_read+0x3f/0xa0
>  do_syscall_64+0x5a/0x100
>
> then a minute later twsk timer fires and hits two bad refcnts
> for this freed socket:
>
> refcount_t: decrement hit 0; leaking memory.
> WARNING: CPU: 31 PID: 0 at lib/refcount.c:228 refcount_dec+0x2e/0x40
> Modules linked in:
> RIP: 0010:refcount_dec+0x2e/0x40
> RSP: 0018:ffff88085f5c3ea8 EFLAGS: 00010296
> RAX: 000000000000002c RBX: ffff88010ecbd2c0 RCX: 000000000000083f
> RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
> RBP: ffffc90003c77280 R08: 0000000000000000 R09: 00000000000017d3
> R10: ffffffff81e7c5a0 R11: 0000000000000000 R12: ffffffff82ad2d80
> R13: ffffffff8182de00 R14: ffff88085f5c3ef8 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88085f5c0000(0000) 
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fbe42685250 CR3: 0000000002209001 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <IRQ>
>  inet_twsk_kill+0x9d/0xc0  // inet_twsk_bind_unhash(tw, hashinfo);
>  call_timer_fn+0x29/0x110
>  run_timer_softirq+0x36b/0x3a0
>
> refcount_t: underflow; use-after-free.
> WARNING: CPU: 31 PID: 0 at lib/refcount.c:187 
> refcount_sub_and_test+0x46/0x50
> RIP: 0010:refcount_sub_and_test+0x46/0x50
> RSP: 0018:ffff88085f5c3eb8 EFLAGS: 00010296
> RAX: 0000000000000026 RBX: ffff88010ecbd2c0 RCX: 000000000000083f
> RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
> RBP: ffff88010ecbd358 R08: 0000000000000000 R09: 000000000000185b
> R10: ffffffff81e7c5a0 R11: 0000000000000000 R12: ffff88010ecbd358
> R13: ffffffff8182de00 R14: ffff88085f5c3ef8 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88085f5c0000(0000) 
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fbe42685250 CR3: 0000000002209001 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <IRQ>
>  inet_twsk_put+0x12/0x20  // inet_twsk_put(tw);
>  call_timer_fn+0x29/0x110
>  run_timer_softirq+0x36b/0x3a0
>
> Fixes: 67db3e4bfbc9 ("tcp: no longer hold ehash lock while calling 
> tcp_get_info()")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <ast@kernel.org>
> Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>  net/ipv4/inet_diag.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index 
> 4e5bc4b2f14e6786ceb7d63e5902f8fc17819dfa..1a4e9ff02762ed757545da13de1ee352f38c867b 
> 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -998,7 +998,9 @@ void inet_diag_dump_icsk(struct inet_hashinfo 
> *hashinfo, struct sk_buff *skb,
>  			if (!inet_diag_bc_sk(bc, sk))
>  				goto next_normal;
>
> -			sock_hold(sk);
> +			if (!refcount_inc_not_zero(&sk->sk_refcnt))
> +				goto next_normal;
> +
>  			num_arr[accum] = num;
>  			sk_arr[accum] = sk;
>  			if (++accum == SKARR_SZ)
> -- 
> 2.20.1.415.g653613c723-goog
David Miller Dec. 21, 2018, 3:24 a.m. UTC | #2
From: Eric Dumazet <edumazet@google.com>
Date: Thu, 20 Dec 2018 15:28:56 -0800

> Alexei reported use after frees in inet_diag_dump_icsk() [1]
> 
> Because we use refcount_set() when various sockets are setup and
> inserted into ehash, we also need to make sure inet_diag_dump_icsk()
> wont race with the refcount_set() operations.
> 
> Jonathan Lemon sent a patch changing net_twsk_hashdance() but
> other spots would need risky changes.
> 
> Instead, fix inet_diag_dump_icsk() as this bug came with
> linux-4.10 only.
> 
> [1] Quoting Alexei :
 ...
> Fixes: 67db3e4bfbc9 ("tcp: no longer hold ehash lock while calling tcp_get_info()")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <ast@kernel.org>
> Cc: Jonathan Lemon <jonathan.lemon@gmail.com>

Applied and queued up for -stable, thanks Eric.
diff mbox series

Patch

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 4e5bc4b2f14e6786ceb7d63e5902f8fc17819dfa..1a4e9ff02762ed757545da13de1ee352f38c867b 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -998,7 +998,9 @@  void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
 			if (!inet_diag_bc_sk(bc, sk))
 				goto next_normal;
 
-			sock_hold(sk);
+			if (!refcount_inc_not_zero(&sk->sk_refcnt))
+				goto next_normal;
+
 			num_arr[accum] = num;
 			sk_arr[accum] = sk;
 			if (++accum == SKARR_SZ)