diff mbox series

[net] net/packet: fix a race in packet_bind() and packet_notifier()

Message ID 1511841652.16595.11.camel@gmail.com
State Changes Requested, archived
Delegated to: David Miller
Headers show
Series [net] net/packet: fix a race in packet_bind() and packet_notifier() | expand

Commit Message

Eric Dumazet Nov. 28, 2017, 4 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

syzbot reported crashes [1] and provided a C repro easing bug hunting.

When/if packet_do_bind() calls __unregister_prot_hook() and releases
po->bind_lock, another thread can run packet_notifier() and process an
NETDEV_UP event.

This calls register_prot_hook() and hook again the socket right before
first thread was able to grab again po->bind_lock.

Fixes this issue by adding po->frozen bit :

It is set and cleared by __unregister_prot_hook() if po->bind_lock
needs to be released temporarily.

It is tested in register_prot_hook() to prevent the race condition.

[1]
dev_remove_pack: ffff8801bf16fa80 not found
------------[ cut here ]------------
kernel BUG at net/core/dev.c:7945!  ( BUG_ON(!list_empty(&dev->ptype_all)); )
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
device syz0 entered promiscuous mode
CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801cc57a500 task.stack: ffff8801cc588000
RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945
RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293
RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2
RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810
device syz0 entered promiscuous mode
RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8
R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0
FS:  0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106
 tun_detach drivers/net/tun.c:670 [inline]
 tun_chr_close+0x49/0x60 drivers/net/tun.c:2845
 __fput+0x333/0x7f0 fs/file_table.c:210
 ____fput+0x15/0x20 fs/file_table.c:244
 task_work_run+0x199/0x270 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x9bb/0x1ae0 kernel/exit.c:865
 do_group_exit+0x149/0x400 kernel/exit.c:968
 SYSC_exit_group kernel/exit.c:979 [inline]
 SyS_exit_group+0x1d/0x20 kernel/exit.c:977
 entry_SYSCALL_64_fastpath+0x1f/0x96
RIP: 0033:0x44ad19


Fixes: 30f7ea1c2b5f ("packet: race condition in packet_bind")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Francesco Ruggeri <fruggeri@aristanetworks.com>
---
 net/packet/af_packet.c |    5 ++++-
 net/packet/internal.h  |    1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

Comments

Francesco Ruggeri Nov. 28, 2017, 10:23 a.m. UTC | #1
On Mon, Nov 27, 2017 at 8:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> syzbot reported crashes [1] and provided a C repro easing bug hunting.
>
> When/if packet_do_bind() calls __unregister_prot_hook() and releases
> po->bind_lock, another thread can run packet_notifier() and process an
> NETDEV_UP event.
>
> This calls register_prot_hook() and hook again the socket right before
> first thread was able to grab again po->bind_lock.
>
> Fixes this issue by adding po->frozen bit :
>
> It is set and cleared by __unregister_prot_hook() if po->bind_lock
> needs to be released temporarily.
>
> It is tested in register_prot_hook() to prevent the race condition.
>
> [1]
> dev_remove_pack: ffff8801bf16fa80 not found
> ------------[ cut here ]------------
> kernel BUG at net/core/dev.c:7945!  ( BUG_ON(!list_empty(&dev->ptype_all)); )
> invalid opcode: 0000 [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> device syz0 entered promiscuous mode
> CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> task: ffff8801cc57a500 task.stack: ffff8801cc588000
> RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945
> RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293
> RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2
> RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810
> device syz0 entered promiscuous mode
> RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8
> R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0
> FS:  0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106
>  tun_detach drivers/net/tun.c:670 [inline]
>  tun_chr_close+0x49/0x60 drivers/net/tun.c:2845
>  __fput+0x333/0x7f0 fs/file_table.c:210
>  ____fput+0x15/0x20 fs/file_table.c:244
>  task_work_run+0x199/0x270 kernel/task_work.c:113
>  exit_task_work include/linux/task_work.h:22 [inline]
>  do_exit+0x9bb/0x1ae0 kernel/exit.c:865
>  do_group_exit+0x149/0x400 kernel/exit.c:968
>  SYSC_exit_group kernel/exit.c:979 [inline]
>  SyS_exit_group+0x1d/0x20 kernel/exit.c:977
>  entry_SYSCALL_64_fastpath+0x1f/0x96
> RIP: 0033:0x44ad19
>
>
> Fixes: 30f7ea1c2b5f ("packet: race condition in packet_bind")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Cc: Francesco Ruggeri <fruggeri@aristanetworks.com>
> ---
>  net/packet/af_packet.c |    5 ++++-
>  net/packet/internal.h  |    1 +
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index 737092ca9b4eed464b6c0907d85b679ae4da6046..64382200ab9b0701a510f16a098257ccd7ac5ff5 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -336,7 +336,7 @@ static void register_prot_hook(struct sock *sk)
>  {
>         struct packet_sock *po = pkt_sk(sk);
>
> -       if (!po->running) {
> +       if (!po->running && !po->frozen) {

Would it make sense to move the check for po->frozen to
packet_notifier(NETDEV_UP)?
As far as I can tell that is the only case today that can cause this
race condition, and if new cases come up in the future an error code
may be required rather than silently turning register_prot_hook() into
a noop.
Otherwise it looks fine to me.

Thanks,

Acked-by: Francesco Ruggeri <fruggeri@arista.com>

>                 if (po->fanout)
>                         __fanout_link(sk, po);
>                 else
> @@ -368,9 +368,11 @@ static void __unregister_prot_hook(struct sock *sk, bool sync)
>         __sock_put(sk);
>
>         if (sync) {
> +               po->frozen = 1;
>                 spin_unlock(&po->bind_lock);
>                 synchronize_net();
>                 spin_lock(&po->bind_lock);
> +               po->frozen = 0;
>         }
>  }
>
> @@ -3105,6 +3107,7 @@ static int packet_do_bind(struct sock *sk, const char *name, int ifindex,
>                                                                  dev->ifindex);
>                 }
>
> +               BUG_ON(po->running);
>                 po->num = proto;
>                 po->prot_hook.type = proto;
>
> diff --git a/net/packet/internal.h b/net/packet/internal.h
> index 562fbc155006374862e5bfdd78b65a7f46210bea..e039af6e71d650e3beb2936af6cbdd167313499e 100644
> --- a/net/packet/internal.h
> +++ b/net/packet/internal.h
> @@ -114,6 +114,7 @@ struct packet_sock {
>         spinlock_t              bind_lock;
>         struct mutex            pg_vec_lock;
>         unsigned int            running:1,      /* prot_hook is attached*/
> +                               frozen:1,
>                                 auxdata:1,
>                                 origdev:1,
>                                 has_vnet_hdr:1;
David Miller Nov. 28, 2017, 2:48 p.m. UTC | #2
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 27 Nov 2017 20:00:52 -0800

> @@ -368,9 +368,11 @@ static void __unregister_prot_hook(struct sock *sk, bool sync)
>  	__sock_put(sk);
>  
>  	if (sync) {
> +		po->frozen = 1;
>  		spin_unlock(&po->bind_lock);
>  		synchronize_net();
>  		spin_lock(&po->bind_lock);
> +		po->frozen = 0;
>  	}
>  }
>  

Ugh.

Maybe you can just set po->num to zero in the bind code path which causes
this problem.  That will prevent this situation entirely.
Eric Dumazet Nov. 28, 2017, 3:13 p.m. UTC | #3
On Tue, 2017-11-28 at 02:23 -0800, Francesco Ruggeri wrote:
> On Mon, Nov 27, 2017 at 8:00 PM, Eric Dumazet <eric.dumazet@gmail.com
> > wrote:
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > 
...
> > +++ b/net/packet/af_packet.c
> > @@ -336,7 +336,7 @@ static void register_prot_hook(struct sock *sk)
> >  {
> >         struct packet_sock *po = pkt_sk(sk);
> > 
> > -       if (!po->running) {
> > +       if (!po->running && !po->frozen) {
> 
> Would it make sense to move the check for po->frozen to
> packet_notifier(NETDEV_UP)?
> As far as I can tell that is the only case today that can cause this
> race condition, and if new cases come up in the future an error code
> may be required rather than silently turning register_prot_hook()
> into
> a noop.
> Otherwise it looks fine to me.
> 

Whatever works for me is fine, I have no strong opinion on this.

Note that frozen is only set in the case we know that
register_prot_hook() is going to be called by us.
Eric Dumazet Nov. 28, 2017, 3:14 p.m. UTC | #4
On Tue, 2017-11-28 at 09:48 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 27 Nov 2017 20:00:52 -0800
> 
> > @@ -368,9 +368,11 @@ static void __unregister_prot_hook(struct sock
> *sk, bool sync)
> >       __sock_put(sk);
> >  
> >       if (sync) {
> > +             po->frozen = 1;
> >               spin_unlock(&po->bind_lock);
> >               synchronize_net();
> >               spin_lock(&po->bind_lock);
> > +             po->frozen = 0;
> >       }
> >  }
> >  
> 
> Ugh.
> 
> Maybe you can just set po->num to zero in the bind code path which
> causes
> this problem.  That will prevent this situation entirely.

Yes, I can submit a V2 with this idea implemented, thanks.
diff mbox series

Patch

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 737092ca9b4eed464b6c0907d85b679ae4da6046..64382200ab9b0701a510f16a098257ccd7ac5ff5 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -336,7 +336,7 @@  static void register_prot_hook(struct sock *sk)
 {
 	struct packet_sock *po = pkt_sk(sk);
 
-	if (!po->running) {
+	if (!po->running && !po->frozen) {
 		if (po->fanout)
 			__fanout_link(sk, po);
 		else
@@ -368,9 +368,11 @@  static void __unregister_prot_hook(struct sock *sk, bool sync)
 	__sock_put(sk);
 
 	if (sync) {
+		po->frozen = 1;
 		spin_unlock(&po->bind_lock);
 		synchronize_net();
 		spin_lock(&po->bind_lock);
+		po->frozen = 0;
 	}
 }
 
@@ -3105,6 +3107,7 @@  static int packet_do_bind(struct sock *sk, const char *name, int ifindex,
 								 dev->ifindex);
 		}
 
+		BUG_ON(po->running);
 		po->num = proto;
 		po->prot_hook.type = proto;
 
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 562fbc155006374862e5bfdd78b65a7f46210bea..e039af6e71d650e3beb2936af6cbdd167313499e 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -114,6 +114,7 @@  struct packet_sock {
 	spinlock_t		bind_lock;
 	struct mutex		pg_vec_lock;
 	unsigned int		running:1,	/* prot_hook is attached*/
+				frozen:1,
 				auxdata:1,
 				origdev:1,
 				has_vnet_hdr:1;