Message ID | CAM_iQpXDwx3=tF4Atu7O+STswdTxLFzzT5-K2bgqtcWcF4aghA@mail.gmail.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
On Thu, 2017-02-09 at 17:24 -0800, Cong Wang wrote:
> On Thu, Feb 9, 2017 at 5:14 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> > Hello,
> >
> > I've got the following use-after-free report in packet_rcv_fanout
> > while running syzkaller fuzzer on linux-next
> > e3e6c5f3544c5d05c6b3b309a34f4f2c3537e993. So far it happened once and
> > is not reproducible, but maybe the stacks will allow you to figure out
> > what happens.
> >
> > BUG: KASAN: use-after-free in __lock_acquire+0x3212/0x3430
> > kernel/locking/lockdep.c:3224 at addr ffff8801d903d538
> > Read of size 8 by task syz-executor1/10596
> > CPU: 1 PID: 10596 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170208 #1
> > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > BIOS Google 01/01/2011
> >
> > Call Trace:
> >  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
> >  __lock_acquire+0x3212/0x3430 kernel/locking/lockdep.c:3224
> >  lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
> >  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
> >  _raw_spin_lock_bh+0x3a/0x50 kernel/locking/spinlock.c:175
> >  spin_lock_bh include/linux/spinlock.h:304 [inline]
> >  packet_rcv_has_room+0x25/0xb0 net/packet/af_packet.c:1308
> >  fanout_demux_rollover+0x3bb/0x6b0 net/packet/af_packet.c:1388
> >  packet_rcv_fanout+0x674/0x800 net/packet/af_packet.c:1490
> >  dev_queue_xmit_nit+0x73a/0xa90 net/core/dev.c:1898
> >  xmit_one net/core/dev.c:2870 [inline]
> >  dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2890
> >  __dev_queue_xmit+0x16d1/0x1e60 net/core/dev.c:3355
> >  dev_queue_xmit+0x17/0x20 net/core/dev.c:3388
> >  neigh_hh_output include/net/neighbour.h:468 [inline]
> >  dst_neigh_output include/net/dst.h:452 [inline]
> >  ip6_finish_output2+0x1461/0x2380 net/ipv6/ip6_output.c:123
> >  ip6_finish_output+0x2f9/0x950 net/ipv6/ip6_output.c:149
> >  NF_HOOK_COND include/linux/netfilter.h:246 [inline]
> >  ip6_output+0x1cb/0x8c0 net/ipv6/ip6_output.c:163
> >  ip6_xmit+0xc2f/0x1e80 include/net/dst.h:498
> >  inet6_csk_xmit+0x320/0x5d0 net/ipv6/inet6_connection_sock.c:139
> >  tcp_transmit_skb+0x1ab4/0x3460 net/ipv4/tcp_output.c:1054
> >  tcp_send_syn_data net/ipv4/tcp_output.c:3343 [inline]
> >  tcp_connect+0x11a7/0x2f50 net/ipv4/tcp_output.c:3375
> >  tcp_v6_connect+0x1a6e/0x1f70 net/ipv6/tcp_ipv6.c:295
> >  __inet_stream_connect+0x2d1/0xf80 net/ipv4/af_inet.c:618
> >  tcp_sendmsg_fastopen net/ipv4/tcp.c:1110 [inline]
> >  tcp_sendmsg+0x23ac/0x3bd0 net/ipv4/tcp.c:1133
> >  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
> >  sock_sendmsg_nosec net/socket.c:633 [inline]
> >  sock_sendmsg+0xca/0x110 net/socket.c:643
> >  SYSC_sendto+0x660/0x810 net/socket.c:1685
> >  SyS_sendto+0x40/0x50 net/socket.c:1653
> >  entry_SYSCALL_64_fastpath+0x1f/0xc2
>
> It seems on-flying packets could still refer the struct sock pointer
> via f->arr[i], if so we need a sync before unlinking it:
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index d56ee46..8724a98 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2924,6 +2924,8 @@ static int packet_release(struct socket *sock)
>  	sock_prot_inuse_add(net, sk->sk_prot, -1);
>  	preempt_enable();
>
> +	synchronize_net();
> +
>  	spin_lock(&po->bind_lock);
>  	unregister_prot_hook(sk, false);
>  	packet_cached_dev_reset(po);

More likely the bug is in fanout_add(), with a buggy sequence in error
case, and not correct locking.

	kfree(po->rollover);
	po->rollover = NULL;

Two cpus entering fanout_add() (using the same af_packet socket,
syzkaller courtesy...) might both see po->fanout being NULL.

Then they grab the mutex. Too late...
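
The check-then-lock problem described in the message above can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the kernel's fanout_add() and not a proposed fix: `struct demo_sock`, `demo_fanout_add()`, `demo_fanout_mutex`, and the 64-byte allocation are hypothetical names and values chosen for illustration, and a pthread mutex stands in for the kernel mutex. It only shows the general idea of testing "already joined?" while the lock is held, so that two callers using the same socket cannot both pass the test.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-socket state discussed in the thread. */
struct demo_sock {
	void *fanout;    /* non-NULL once the socket has joined a group */
	void *rollover;  /* per-socket state allocated while joining */
};

/* Hypothetical global lock; stands in for the kernel-side mutex. */
static pthread_mutex_t demo_fanout_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Join a group. The "already joined?" test runs with the mutex held,
 * so a second caller on the same socket sees the first caller's result
 * instead of racing with it.
 * Returns 0 on success, -EALREADY or -ENOMEM on failure.
 */
static int demo_fanout_add(struct demo_sock *po, void *group)
{
	void *rollover;
	int err = 0;

	assert(po != NULL && group != NULL);

	pthread_mutex_lock(&demo_fanout_mutex);
	if (po->fanout) {
		/* Nothing was allocated yet, so there is nothing to free. */
		err = -EALREADY;
		goto out;
	}

	rollover = calloc(1, 64);  /* size is arbitrary for this sketch */
	if (!rollover) {
		err = -ENOMEM;
		goto out;
	}

	/* Publish both fields together, still under the lock. */
	po->rollover = rollover;
	po->fanout = group;
out:
	pthread_mutex_unlock(&demo_fanout_mutex);
	return err;
}
```

Because the error paths in this sketch never touch `po->rollover`, a failed second call cannot free state that a successful first call installed, which is the shape of the double-entry hazard the message points at.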
On (02/09/17 19:19), Eric Dumazet wrote:
>
> More likely the bug is in fanout_add(), with a buggy sequence in error
> case, and not correct locking.
>
> 	kfree(po->rollover);
> 	po->rollover = NULL;
>
> Two cpus entering fanout_add() (using the same af_packet socket,
> syzkaller courtesy...) might both see po->fanout being NULL.
>
> Then they grab the mutex. Too late...

I'm not sure I follow- aiui the panic was in accessing the
sk_receive_queue.lock in a socket that had been closed earlier. I think
the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
packet delivery can be done safely, and the synchronize_net in
packet_release() makes sure that the Tx paths are quiesced before freeing
the socket. What is the race-hole here? Does it have to do with the
_bh and softirq context, somehow?

--Sowmini
On Thu, Feb 9, 2017 at 7:33 PM, Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote:
> On (02/09/17 19:19), Eric Dumazet wrote:
>>
>> More likely the bug is in fanout_add(), with a buggy sequence in error
>> case, and not correct locking.
>>
>> 	kfree(po->rollover);
>> 	po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex. Too late...
>
> I'm not sure I follow- aiui the panic was in accessing the
> sk_receive_queue.lock in a socket that had been closed earlier. I think
> the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
> rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
> packet delivery can be done safely, and the synchronize_net in
> packet_release() makes sure that the Tx paths are quiesced before freeing
> the socket. What is the race-hole here? Does it have to do with the
> _bh and softirq context, somehow?
>

We have probably a dozen of bugs to fix in af_packet.c

The race in fanout_add() is one of them.

I do not believe Anoob Soman sent his fixes btw ...

( Look for this thread : http://marc.info/?l=linux-netdev&m=148588680525648&w=2 )
On Thu, Feb 9, 2017 at 7:33 PM, Sowmini Varadhan <sowmini.varadhan@oracle.com> wrote:
> On (02/09/17 19:19), Eric Dumazet wrote:
>>
>> More likely the bug is in fanout_add(), with a buggy sequence in error
>> case, and not correct locking.
>>
>> 	kfree(po->rollover);
>> 	po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex. Too late...
>
> I'm not sure I follow- aiui the panic was in accessing the
> sk_receive_queue.lock in a socket that had been closed earlier. I think
> the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
> rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
> packet delivery can be done safely, and the synchronize_net in
> packet_release() makes sure that the Tx paths are quiesced before freeing
> the socket. What is the race-hole here? Does it have to do with the
> _bh and softirq context, somehow?

My understanding about the race here is packet_release() doesn't
wait for flying packets correctly, which leads to a flying packet still
refers to the struct sock which is being released.

This could happen because struct packet_fanout is refcounted, it is
still there when this is not the last sock referring it, therefore,
the callback packet_rcv_fanout() is not removed yet. When packet_release()
tries to remove the pointer to struct sock from f->arr[i] in
__fanout_unlink(), a flying packet could race with f->arr[i]:

	po = pkt_sk(f->arr[idx]);

Of course, the fix may not be as easy as just adding a synchronize_net(),
perhaps we need the spinlock too in fanout_demux_rollover().

At least I believe this explains the crash Dmitry reported.
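
The unlink-versus-reader race described in the message above can be modeled in userspace. This is a simplified sketch under stated assumptions, not the kernel's __fanout_unlink() or fanout_demux_rollover(): `struct demo_member`, `struct demo_group`, `demo_unlink()`, and `demo_pick_id()` are hypothetical names, and a pthread mutex stands in for the spinlock mentioned as a possible addition on the receive side. The kernel receive path relies on RCU rather than a lock, so this shows only the stricter variant in which a reader dereferences the array slot while holding the same lock that unlinking takes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_MEMBERS 8

/* Hypothetical stand-in for a socket that belongs to a fanout group. */
struct demo_member {
	int id;
};

/* Hypothetical stand-in for the group: a lock plus a packed array. */
struct demo_group {
	pthread_mutex_t lock;
	unsigned int num_members;
	struct demo_member *arr[DEMO_MAX_MEMBERS];
};

/*
 * Remove m from the group by moving the last slot into its place.
 * Returns 0 on success, -1 if m is not a member.
 * Once this returns 0, no reader that goes through demo_pick_id() can
 * still be using m, so the caller may free it.
 */
static int demo_unlink(struct demo_group *f, struct demo_member *m)
{
	unsigned int i;
	int ret = -1;

	pthread_mutex_lock(&f->lock);
	for (i = 0; i < f->num_members; i++) {
		if (f->arr[i] == m) {
			f->arr[i] = f->arr[f->num_members - 1];
			f->arr[f->num_members - 1] = NULL;
			f->num_members--;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&f->lock);
	return ret;
}

/*
 * Reader side: choose a slot and dereference it under the same lock.
 * Returns the chosen member's id, or -1 if the group is empty.
 */
static int demo_pick_id(struct demo_group *f, unsigned int hash)
{
	int id = -1;

	pthread_mutex_lock(&f->lock);
	if (f->num_members)
		id = f->arr[hash % f->num_members]->id;
	pthread_mutex_unlock(&f->lock);
	return id;
}
```

Taking a lock on every received packet has an obvious cost, which may be one reason the thread treats it as only a possibility; the alternative discussed is to keep the reader lock-free and have the releasing side wait for in-flight readers before the socket is freed.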
On (02/10/17 10:00), Cong Wang wrote:
> My understanding about the race here is packet_release() doesn't
> wait for flying packets correctly, which leads to a flying packet still
> refers to the struct sock which is being released.
>
> This could happen because struct packet_fanout is refcounted, it is
:
> At least I believe this explains the crash Dmitry reported.

hmm, the proof of the pudding is in the eating- would be good
to be able to reliably reproduce this somewhere (thus proving that
root-cause analysis is rock-solid), maybe by introducing artificial
delays to slow down paths..

I'm travelling at the moment but may be able to give this
(try to reproduce it reliably) next week.

--Sowmini
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d56ee46..8724a98 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2924,6 +2924,8 @@ static int packet_release(struct socket *sock)
 	sock_prot_inuse_add(net, sk->sk_prot, -1);
 	preempt_enable();

+	synchronize_net();
+
 	spin_lock(&po->bind_lock);
 	unregister_prot_hook(sk, false);
 	packet_cached_dev_reset(po);