net/packet: use-after-free in packet_rcv_fanout

Message ID CAM_iQpXDwx3=tF4Atu7O+STswdTxLFzzT5-K2bgqtcWcF4aghA@mail.gmail.com
State RFC, archived
Delegated to: David Miller

Commit Message

Cong Wang Feb. 10, 2017, 1:24 a.m. UTC
On Thu, Feb 9, 2017 at 5:14 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> I've got the following use-after-free report in packet_rcv_fanout
> while running syzkaller fuzzer on linux-next
> e3e6c5f3544c5d05c6b3b309a34f4f2c3537e993. So far it happened once and
> is not reproducible, but maybe the stacks will allow you to figure out
> what happens.
>
> BUG: KASAN: use-after-free in __lock_acquire+0x3212/0x3430
> kernel/locking/lockdep.c:3224 at addr ffff8801d903d538
> Read of size 8 by task syz-executor1/10596
> CPU: 1 PID: 10596 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170208 #1
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
>
> Call Trace:
>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
>  __lock_acquire+0x3212/0x3430 kernel/locking/lockdep.c:3224
>  lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
>  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
>  _raw_spin_lock_bh+0x3a/0x50 kernel/locking/spinlock.c:175
>  spin_lock_bh include/linux/spinlock.h:304 [inline]
>  packet_rcv_has_room+0x25/0xb0 net/packet/af_packet.c:1308
>  fanout_demux_rollover+0x3bb/0x6b0 net/packet/af_packet.c:1388
>  packet_rcv_fanout+0x674/0x800 net/packet/af_packet.c:1490
>  dev_queue_xmit_nit+0x73a/0xa90 net/core/dev.c:1898
>  xmit_one net/core/dev.c:2870 [inline]
>  dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2890
>  __dev_queue_xmit+0x16d1/0x1e60 net/core/dev.c:3355
>  dev_queue_xmit+0x17/0x20 net/core/dev.c:3388
>  neigh_hh_output include/net/neighbour.h:468 [inline]
>  dst_neigh_output include/net/dst.h:452 [inline]
>  ip6_finish_output2+0x1461/0x2380 net/ipv6/ip6_output.c:123
>  ip6_finish_output+0x2f9/0x950 net/ipv6/ip6_output.c:149
>  NF_HOOK_COND include/linux/netfilter.h:246 [inline]
>  ip6_output+0x1cb/0x8c0 net/ipv6/ip6_output.c:163
>  ip6_xmit+0xc2f/0x1e80 include/net/dst.h:498
>  inet6_csk_xmit+0x320/0x5d0 net/ipv6/inet6_connection_sock.c:139
>  tcp_transmit_skb+0x1ab4/0x3460 net/ipv4/tcp_output.c:1054
>  tcp_send_syn_data net/ipv4/tcp_output.c:3343 [inline]
>  tcp_connect+0x11a7/0x2f50 net/ipv4/tcp_output.c:3375
>  tcp_v6_connect+0x1a6e/0x1f70 net/ipv6/tcp_ipv6.c:295
>  __inet_stream_connect+0x2d1/0xf80 net/ipv4/af_inet.c:618
>  tcp_sendmsg_fastopen net/ipv4/tcp.c:1110 [inline]
>  tcp_sendmsg+0x23ac/0x3bd0 net/ipv4/tcp.c:1133
>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
>  sock_sendmsg_nosec net/socket.c:633 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:643
>  SYSC_sendto+0x660/0x810 net/socket.c:1685
>  SyS_sendto+0x40/0x50 net/socket.c:1653
>  entry_SYSCALL_64_fastpath+0x1f/0xc2

It seems in-flight packets could still reference the struct sock pointer
via f->arr[i]; if so, we need a sync before unlinking it:

Comments

Eric Dumazet Feb. 10, 2017, 3:19 a.m. UTC | #1
On Thu, 2017-02-09 at 17:24 -0800, Cong Wang wrote:
> On Thu, Feb 9, 2017 at 5:14 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> > Hello,
> >
> > I've got the following use-after-free report in packet_rcv_fanout
> > while running syzkaller fuzzer on linux-next
> > e3e6c5f3544c5d05c6b3b309a34f4f2c3537e993. So far it happened once and
> > is not reproducible, but maybe the stacks will allow you to figure out
> > what happens.
> >
> > BUG: KASAN: use-after-free in __lock_acquire+0x3212/0x3430
> > kernel/locking/lockdep.c:3224 at addr ffff8801d903d538
> > Read of size 8 by task syz-executor1/10596
> > CPU: 1 PID: 10596 Comm: syz-executor1 Not tainted 4.10.0-rc7-next-20170208 #1
> > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > BIOS Google 01/01/2011
> >
> > Call Trace:
> >  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
> >  __lock_acquire+0x3212/0x3430 kernel/locking/lockdep.c:3224
> >  lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
> >  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
> >  _raw_spin_lock_bh+0x3a/0x50 kernel/locking/spinlock.c:175
> >  spin_lock_bh include/linux/spinlock.h:304 [inline]
> >  packet_rcv_has_room+0x25/0xb0 net/packet/af_packet.c:1308
> >  fanout_demux_rollover+0x3bb/0x6b0 net/packet/af_packet.c:1388
> >  packet_rcv_fanout+0x674/0x800 net/packet/af_packet.c:1490
> >  dev_queue_xmit_nit+0x73a/0xa90 net/core/dev.c:1898
> >  xmit_one net/core/dev.c:2870 [inline]
> >  dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2890
> >  __dev_queue_xmit+0x16d1/0x1e60 net/core/dev.c:3355
> >  dev_queue_xmit+0x17/0x20 net/core/dev.c:3388
> >  neigh_hh_output include/net/neighbour.h:468 [inline]
> >  dst_neigh_output include/net/dst.h:452 [inline]
> >  ip6_finish_output2+0x1461/0x2380 net/ipv6/ip6_output.c:123
> >  ip6_finish_output+0x2f9/0x950 net/ipv6/ip6_output.c:149
> >  NF_HOOK_COND include/linux/netfilter.h:246 [inline]
> >  ip6_output+0x1cb/0x8c0 net/ipv6/ip6_output.c:163
> >  ip6_xmit+0xc2f/0x1e80 include/net/dst.h:498
> >  inet6_csk_xmit+0x320/0x5d0 net/ipv6/inet6_connection_sock.c:139
> >  tcp_transmit_skb+0x1ab4/0x3460 net/ipv4/tcp_output.c:1054
> >  tcp_send_syn_data net/ipv4/tcp_output.c:3343 [inline]
> >  tcp_connect+0x11a7/0x2f50 net/ipv4/tcp_output.c:3375
> >  tcp_v6_connect+0x1a6e/0x1f70 net/ipv6/tcp_ipv6.c:295
> >  __inet_stream_connect+0x2d1/0xf80 net/ipv4/af_inet.c:618
> >  tcp_sendmsg_fastopen net/ipv4/tcp.c:1110 [inline]
> >  tcp_sendmsg+0x23ac/0x3bd0 net/ipv4/tcp.c:1133
> >  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
> >  sock_sendmsg_nosec net/socket.c:633 [inline]
> >  sock_sendmsg+0xca/0x110 net/socket.c:643
> >  SYSC_sendto+0x660/0x810 net/socket.c:1685
> >  SyS_sendto+0x40/0x50 net/socket.c:1653
> >  entry_SYSCALL_64_fastpath+0x1f/0xc2
> 
> It seems in-flight packets could still reference the struct sock pointer
> via f->arr[i]; if so, we need a sync before unlinking it:
> 
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index d56ee46..8724a98 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2924,6 +2924,8 @@ static int packet_release(struct socket *sock)
>         sock_prot_inuse_add(net, sk->sk_prot, -1);
>         preempt_enable();
> 
> +       synchronize_net();
> +
>         spin_lock(&po->bind_lock);
>         unregister_prot_hook(sk, false);
>         packet_cached_dev_reset(po);

More likely the bug is in fanout_add(), with a buggy sequence in the
error case and incorrect locking.

kfree(po->rollover);
po->rollover = NULL;

Two cpus entering fanout_add() (using the same af_packet socket,
syzkaller courtesy...) might both see po->fanout being NULL.

Then they grab the mutex.  Too late...
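
The pattern described above is an unlocked check of po->fanout followed
by taking the mutex, so two callers can both pass the check. Below is a
minimal userspace sketch of one way to close such a window, assuming the
check and the publish both move under the mutex and the rollover
allocation stays in a local variable until it is published. All demo_*
names are hypothetical stand-ins built on pthreads; this is not the
af_packet code and not necessarily the fix that is needed here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct packet_sock; not the kernel layout. */
struct demo_sock {
	void *fanout;   /* plays the role of po->fanout */
	void *rollover; /* plays the role of po->rollover */
};

static pthread_mutex_t demo_fanout_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0 on success, -1 if the socket already has a fanout or on OOM. */
static int demo_fanout_add(struct demo_sock *po, int want_rollover)
{
	void *rollover = NULL;
	int err = -1;

	assert(po != NULL);

	if (want_rollover) {
		/* Allocate into a local; nothing is visible to others yet. */
		rollover = calloc(1, 64); /* arbitrary size for the demo */
		if (!rollover)
			return -1;
	}

	pthread_mutex_lock(&demo_fanout_mutex);
	if (po->fanout)         /* the check now happens under the mutex */
		goto out;
	po->fanout = po;        /* any non-NULL marker will do here */
	po->rollover = rollover;
	rollover = NULL;        /* ownership moved to the socket */
	err = 0;
out:
	pthread_mutex_unlock(&demo_fanout_mutex);
	free(rollover);         /* frees only an unpublished local copy */
	return err;
}
```

With this shape a second caller on the same socket gets -1 and releases
only its own unpublished allocation, so the error path never frees a
rollover pointer that another caller has already published.
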
Sowmini Varadhan Feb. 10, 2017, 3:33 a.m. UTC | #2
On (02/09/17 19:19), Eric Dumazet wrote:
> 
> More likely the bug is in fanout_add(), with a buggy sequence in the
> error case and incorrect locking.
> 
> kfree(po->rollover);
> po->rollover = NULL;
> 
> Two cpus entering fanout_add() (using the same af_packet socket,
> syzkaller courtesy...) might both see po->fanout being NULL.
> 
> Then they grab the mutex.  Too late...

I'm not sure I follow - aiui the panic was in accessing the
sk_receive_queue.lock in a socket that had been closed earlier. I think
the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
packet delivery can be done safely, and the synchronize_net in
packet_release() makes sure that the Tx paths are quiesced before freeing
the socket.  What is the race-hole here? Does it have to do with the
_bh and softirq context, somehow?

--Sowmini
Eric Dumazet Feb. 10, 2017, 4:18 a.m. UTC | #3
On Thu, Feb 9, 2017 at 7:33 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (02/09/17 19:19), Eric Dumazet wrote:
>>
>> More likely the bug is in fanout_add(), with a buggy sequence in the
>> error case and incorrect locking.
>>
>> kfree(po->rollover);
>> po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex.  Too late...
>
> I'm not sure I follow - aiui the panic was in accessing the
> sk_receive_queue.lock in a socket that had been closed earlier. I think
> the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
> rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
> packet delivery can be done safely, and the synchronize_net in
> packet_release() makes sure that the Tx paths are quiesced before freeing
> the socket.  What is the race-hole here? Does it have to do with the
> _bh and softirq context, somehow?
>

We probably have a dozen bugs to fix in af_packet.c.

The race in fanout_add() is one of them.

I do not believe Anoob Soman sent his fixes btw ...

(Look for this thread: http://marc.info/?l=linux-netdev&m=148588680525648&w=2)
Cong Wang Feb. 10, 2017, 6 p.m. UTC | #4
On Thu, Feb 9, 2017 at 7:33 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (02/09/17 19:19), Eric Dumazet wrote:
>>
>> More likely the bug is in fanout_add(), with a buggy sequence in the
>> error case and incorrect locking.
>>
>> kfree(po->rollover);
>> po->rollover = NULL;
>>
>> Two cpus entering fanout_add() (using the same af_packet socket,
>> syzkaller courtesy...) might both see po->fanout being NULL.
>>
>> Then they grab the mutex.  Too late...
>
> I'm not sure I follow - aiui the panic was in accessing the
> sk_receive_queue.lock in a socket that had been closed earlier. I think
> the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and
> rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit
> packet delivery can be done safely, and the synchronize_net in
> packet_release() makes sure that the Tx paths are quiesced before freeing
> the socket.  What is the race-hole here? Does it have to do with the
> _bh and softirq context, somehow?

My understanding of the race here is that packet_release() doesn't
wait for in-flight packets correctly, so an in-flight packet can still
refer to the struct sock that is being released.

This could happen because struct packet_fanout is refcounted: it is
still there when this is not the last sock referring to it, and therefore
the callback packet_rcv_fanout() is not removed yet. When packet_release()
tries to remove the pointer to struct sock from f->arr[i] in
__fanout_unlink(), an in-flight packet could race with f->arr[i]:

po = pkt_sk(f->arr[idx]);

Of course, the fix may not be as easy as just adding a synchronize_net();
perhaps we need the spinlock too in fanout_demux_rollover().

At least I believe this explains the crash Dmitry reported.
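
The ordering at stake is: unpublish the pointer from f->arr[i], wait
for every reader that may already have loaded it, and only then free
the object. A rough userspace analogy follows, assuming a pthread
rwlock takes the place of the RCU read side and of synchronize_net():
the release path cannot acquire the write lock while any delivery is
still inside its read-side section. All demo_* names are hypothetical,
and this is a sketch of the ordering only, not of the kernel's fanout
code or of a proposed patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define DEMO_MAX 4

/* Hypothetical stand-in for a socket that is a fanout group member. */
struct demo_member {
	atomic_int queued; /* packets delivered to this member */
};

/* Hypothetical stand-in for struct packet_fanout. */
struct demo_fanout {
	pthread_rwlock_t lock;
	struct demo_member *arr[DEMO_MAX];
	unsigned int num_members;
};

static void demo_fanout_init(struct demo_fanout *f)
{
	unsigned int i;

	pthread_rwlock_init(&f->lock, NULL);
	f->num_members = 0;
	for (i = 0; i < DEMO_MAX; i++)
		f->arr[i] = NULL;
}

/* Returns 0 on success, -1 if the group is full. */
static int demo_fanout_link(struct demo_fanout *f, struct demo_member *m)
{
	int err = -1;

	pthread_rwlock_wrlock(&f->lock);
	if (f->num_members < DEMO_MAX) {
		f->arr[f->num_members++] = m;
		err = 0;
	}
	pthread_rwlock_unlock(&f->lock);
	return err;
}

/* Receive path: returns 1 if a member got the packet, 0 if none exist. */
static int demo_fanout_deliver(struct demo_fanout *f, unsigned int hash)
{
	int delivered = 0;

	pthread_rwlock_rdlock(&f->lock);
	if (f->num_members) {
		/* The member cannot be freed while the read lock is held. */
		struct demo_member *m = f->arr[hash % f->num_members];

		atomic_fetch_add(&m->queued, 1);
		delivered = 1;
	}
	pthread_rwlock_unlock(&f->lock);
	return delivered;
}

/* Release path: unlink first, free only after readers have drained. */
static void demo_fanout_release(struct demo_fanout *f, struct demo_member *m)
{
	unsigned int i;

	pthread_rwlock_wrlock(&f->lock); /* blocks until readers leave */
	for (i = 0; i < f->num_members; i++) {
		if (f->arr[i] == m) {
			/* Move the last entry into the hole, then shrink. */
			f->arr[i] = f->arr[f->num_members - 1];
			f->num_members--;
			f->arr[f->num_members] = NULL;
			break;
		}
	}
	pthread_rwlock_unlock(&f->lock);
	free(m); /* no reader can still hold a pointer to m here */
}
```

The design point is that the wait sits between the unlink and the free.
A wait placed before the unlink leaves a window in which a new reader
can still pick the pointer up from the array.
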
Sowmini Varadhan Feb. 10, 2017, 7:16 p.m. UTC | #5
On (02/10/17 10:00), Cong Wang wrote:
> My understanding of the race here is that packet_release() doesn't
> wait for in-flight packets correctly, so an in-flight packet can still
> refer to the struct sock that is being released.
> 
> This could happen because struct packet_fanout is refcounted: it is
   :
> At least I believe this explains the crash Dmitry reported.

Hmm, the proof of the pudding is in the eating. It would be good to
be able to reliably reproduce this somewhere (thus proving that the
root-cause analysis is rock-solid), maybe by introducing artificial
delays to slow down paths.
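
One way to picture an artificial delay is to park each racing thread
between an unlocked check and the lock that follows it, so the window
is hit every time instead of once per fuzzing run. The userspace sketch
below assumes a pthread barrier as the delay point and models the
check-then-lock pattern raised earlier in the thread for fanout_add().
All demo_* names are hypothetical; a kernel experiment would need its
own delay mechanism at the suspected spot.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared state for the demo. */
struct demo_race {
	pthread_barrier_t delay; /* artificial delay between check and lock */
	pthread_mutex_t mutex;
	void *fanout;            /* plays the role of po->fanout */
	atomic_int passed_check; /* threads that saw fanout == NULL */
};

static void *demo_racy_add(void *arg)
{
	struct demo_race *r = arg;

	if (r->fanout == NULL) {                 /* unlocked check */
		atomic_fetch_add(&r->passed_check, 1);
		pthread_barrier_wait(&r->delay); /* hold until both passed */
		pthread_mutex_lock(&r->mutex);   /* "Too late..." */
		r->fanout = r;
		pthread_mutex_unlock(&r->mutex);
	}
	return NULL;
}

/* Runs two racing threads; returns how many passed the unlocked check. */
static int demo_count_racers(void)
{
	struct demo_race r;
	pthread_t t[2];
	int i, ret;

	r.fanout = NULL;
	atomic_init(&r.passed_check, 0);
	ret = pthread_mutex_init(&r.mutex, NULL);
	assert(ret == 0);
	ret = pthread_barrier_init(&r.delay, NULL, 2);
	assert(ret == 0);

	for (i = 0; i < 2; i++) {
		ret = pthread_create(&t[i], NULL, demo_racy_add, &r);
		assert(ret == 0);
	}
	for (i = 0; i < 2; i++)
		pthread_join(t[i], NULL);

	pthread_barrier_destroy(&r.delay);
	pthread_mutex_destroy(&r.mutex);
	return atomic_load(&r.passed_check);
}
```

Because no thread writes fanout until both have reached the barrier,
demo_count_racers() reports that both threads passed the check on every
run, which is the kind of reliable reproduction asked for above.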

I'm travelling at the moment but may be able to give this a try
(reproducing it reliably) next week.

--Sowmini

Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index d56ee46..8724a98 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2924,6 +2924,8 @@  static int packet_release(struct socket *sock)
        sock_prot_inuse_add(net, sk->sk_prot, -1);
        preempt_enable();

+       synchronize_net();
+
        spin_lock(&po->bind_lock);
        unregister_prot_hook(sk, false);
        packet_cached_dev_reset(po);