
Re: Crashers on netnext with apache-benchmark

Message ID 38933857c0499172c20b863084e122437413e2d0.camel@redhat.com
State RFC, archived

Commit Message

Paolo Abeni June 1, 2020, 10:55 a.m. UTC
On Fri, 2020-05-29 at 17:41 -0700, Christoph Paasch wrote:
> On 05/29/20 - 13:19, Christoph Paasch wrote:
> > On 05/29/20 - 13:11, Christoph Paasch wrote:
> > > On 05/29/20 - 22:04, Paolo Abeni wrote:
> > > > On Tue, 2020-05-26 at 17:28 -0700, Christoph Paasch wrote:
> > > > > And another one:
> > > > > 
> > > > > [   62.586401] ==================================================================
> > > > > [   62.588813] BUG: KASAN: use-after-free in inet_twsk_bind_unhash+0x5f/0xe0
> > > > > [   62.589975] Write of size 8 at addr ffff88810f155a20 by task ksoftirqd/2/21
> > > > > [   62.591194]
> > > > > [   62.591485] CPU: 2 PID: 21 Comm: ksoftirqd/2 Kdump: loaded Not tainted 5.7.0-rc6.mptcp #36
> > > > > [   62.593067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> > > > > [   62.595268] Call Trace:
> > > > > [   62.595775]  dump_stack+0x76/0xa0
> > > > > [   62.596448]  print_address_description.constprop.0+0x3a/0x60
> > > > > [   62.600581]  __kasan_report.cold+0x20/0x3b
> > > > > [   62.602968]  kasan_report+0x38/0x50
> > > > > [   62.603561]  inet_twsk_bind_unhash+0x5f/0xe0
> > > > > [   62.604282]  inet_twsk_kill+0x195/0x200
> > > > > [   62.604945]  inet_twsk_deschedule_put+0x25/0x30
> > > > > [   62.605731]  tcp_v4_rcv+0xa79/0x15e0
> > > > > [   62.607139]  ip_protocol_deliver_rcu+0x37/0x270
> > > > > [   62.607980]  ip_local_deliver_finish+0xb0/0xd0
> > > > > [   62.608758]  ip_local_deliver+0x1c9/0x1e0
> > > > > [   62.611162]  ip_sublist_rcv_finish+0x84/0xa0
> > > > > [   62.611894]  ip_sublist_rcv+0x22c/0x320
> > > > > [   62.616143]  ip_list_rcv+0x1e4/0x225
> > > > > [   62.619427]  __netif_receive_skb_list_core+0x439/0x460
> > > > > [   62.622771]  netif_receive_skb_list_internal+0x3ea/0x570
> > > > > [   62.625320]  gro_normal_list.part.0+0x14/0x50
> > > > > [   62.626088]  napi_gro_receive+0x6a/0xb0
> > > > > [   62.626787]  receive_buf+0x371/0x1d50
> > > > > [   62.632092]  virtnet_poll+0x2be/0x5b0
> > > > > [   62.634099]  net_rx_action+0x1ec/0x4c0
> > > > > [   62.636132]  __do_softirq+0xfc/0x29c
> > > > > [   62.638180]  run_ksoftirqd+0x15/0x30
> > > > > [   62.638787]  smpboot_thread_fn+0x1fc/0x380
> > > > > [   62.642009]  kthread+0x1f1/0x210
> > > > > [   62.643478]  ret_from_fork+0x35/0x40
> > > > > [   62.644094]
> > > > > [   62.644371] Allocated by task 1355:
> > > > > [   62.644980]  save_stack+0x1b/0x40
> > > > > [   62.645539]  __kasan_kmalloc.constprop.0+0xc2/0xd0
> > > > > [   62.646347]  kmem_cache_alloc+0xb8/0x190
> > > > > [   62.647006]  getname_flags+0x6b/0x2b0
> > > > > [   62.647627]  user_path_at_empty+0x1b/0x40
> > > > > [   62.648306]  vfs_statx+0xba/0x140
> > > > > [   62.648875]  __do_sys_newstat+0x8c/0xf0
> > > > > [   62.649518]  do_syscall_64+0xbc/0x790
> > > > > [   62.650199]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > [   62.651091]
> > > > > [   62.651360] Freed by task 1355:
> > > > > [   62.651903]  save_stack+0x1b/0x40
> > > > > [   62.652460]  __kasan_slab_free+0x12f/0x180
> > > > > [   62.653147]  kmem_cache_free+0x87/0x240
> > > > > [   62.653795]  filename_lookup+0x183/0x250
> > > > > [   62.654447]  vfs_statx+0xba/0x140
> > > > > [   62.655001]  __do_sys_newstat+0x8c/0xf0
> > > > > [   62.655640]  do_syscall_64+0xbc/0x790
> > > > > [   62.656246]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > [   62.657089]
> > > > > [   62.657351] The buggy address belongs to the object at ffff88810f155500
> > > > >                 which belongs to the cache names_cache of size 4096
> > > > > [   62.659420] The buggy address is located 1312 bytes inside of
> > > > >                 4096-byte region [ffff88810f155500, ffff88810f156500)
> > > > > [   62.661358] The buggy address belongs to the page:
> > > > > [   62.662175] page:ffffea00043c5400 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 head:ffffea00043c5400 order:3 compound_mapcount:0 compound_pincount:0
> > > > > [   62.664523] flags: 0x8000000000010200(slab|head)
> > > > > [   62.665342] raw: 8000000000010200 0000000000000000 0000000400000001 ffff88811ac772c0
> > > > > [   62.666713] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
> > > > > [   62.667984] page dumped because: kasan: bad access detected
> > > > > [   62.668904]
> > > > > [   62.669171] Memory state around the buggy address:
> > > > > [   62.669975]  ffff88810f155900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > [   62.671163]  ffff88810f155980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > [   62.672363] >ffff88810f155a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > [   62.673559]                                ^
> > > > > [   62.674349]  ffff88810f155a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > [   62.675531]  ffff88810f155b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > > > > [   62.676723] ==================================================================
> > > > 
> > > > Could you please try booting with "slab_nomerge" on the kernel command
> > > > line?
> > > > 
> > > > Possibly kasan could track better the really relevant UAF cycle that
> > > > way.
> > > > 
> > > > Thanks!
> > > 
> > > Sure, will try that!
> > > 
> > > Some more info from my crash-analysis: the next-pointer in the bind-hash-table
> > > is pointing to free'd memory. That free'd memory seems to be a tcp_sock in state CLOSED.
> > > 
> > > 
> > > So, I guess the bind_node is not unlinked when a socket is being free'd. I'm
> > > adding debugging code to see if that is true.
> 
> Here is where I got today:
> 
> As mentioned above, the next-pointer in the timewait-sock's bind-hash-node
> points to a struct sock (to the sk_bind_node-element) that has been free'd.
> 
> I found that __sk_destruct() ends up freeing a socket that is still in
> the bind-hash-table.
> 
> The socket in question here is the one that goes through the error path
> in subflow_syn_recv_sock() when mptcp_finish_join() fails. So, somehow the
> cleanup for that socket is not working correctly.
> 
> 
> Now, from here on I'm less sure about what is happening:
> 
> I see that on the subflow at the end of tcp_done() (called from
> subflow_syn_recv_sock()), icsk_bind_hash is NULL, but the sk_bind_node still
> is in the list (I changed __sk_del_bind_node to use hlist_del_init to be
> able to really check that).
> 
> That does not make a whole lot of sense to me...
> 
> Anyway, that's how far I got. I'm calling it a day ;-)
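
For reference, the debugging change Christoph describes above would look
roughly like the sketch below. This is an illustration of the idea, not the
exact patch he used: __sk_del_bind_node() normally uses __hlist_del(), which
leaves the node's pointers dangling, so hlist_unhashed() cannot tell whether
the socket is still linked.

/* In include/net/sock.h -- switching to hlist_del_init() re-initializes
 * the node on removal, so hlist_unhashed() becomes a reliable
 * "still linked?" test afterwards.
 */
static inline void __sk_del_bind_node(struct sock *sk)
{
	hlist_del_init(&sk->sk_bind_node);	/* was __hlist_del() */
}

/* In net/core/sock.c -- with the change above in place, the destructor
 * can flag sockets that are freed while still on a bind-hash chain.
 */
static void __sk_destruct(struct rcu_head *head)
{
	struct sock *sk = container_of(head, struct sock, sk_rcu);

	WARN_ON_ONCE(!hlist_unhashed(&sk->sk_bind_node));
	/* ... existing destruction code ... */
}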

Thank you for the great effort! We still have a race in child socket
disposal:

subflow_syn_recv_sock() calls inet_csk_prepare_for_destroy_sock(), which
clears 'inet_num'. Later we have the following call chain: tcp_done() ->
inet_csk_destroy_sock() -> tcp_v4_destroy_sock() -> inet_put_port() ->
__inet_put_port().
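
In rough pseudo-code, the ordering on that error path is as follows. The
'dispose_child' label and the surrounding details are illustrative; only the
sequence of the two calls matters:

	/* subflow_syn_recv_sock(), error handling for a failed join */
	if (!mptcp_finish_join(child))
		goto dispose_child;
	/* ... */
dispose_child:
	inet_csk_prepare_for_destroy_sock(child);	/* inet_num = 0 */
	tcp_done(child);	/* -> ... -> __inet_put_port(), which now
				 * reads inet_num == 0 */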

__inet_put_port() uses 'inet_num' to select the bind-hash bucket and,
with it, the bucket lock to acquire. Since 'inet_num' has already been
cleared to 0 at this point, it selects bucket 0 and takes that bucket's
lock; for a socket hashed into any other bucket, the unlink then runs
under the wrong lock, so the list operation is unsafe and may corrupt
the chain.
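
For reference, __inet_put_port() (net/ipv4/inet_hashtables.c) looks roughly
like this -- lightly simplified from memory of the tree at the time, so
check the actual source:

static void __inet_put_port(struct sock *sk)
{
	struct inet_hashinfo *hashinfo = sk->sk_prot->h.hashinfo;
	const int bhash = inet_bhashfn(sock_net(sk), inet_sk(sk)->inet_num,
				       hashinfo->bhash_size);
	struct inet_bind_hashbucket *head = &hashinfo->bhash[bhash];
	struct inet_bind_bucket *tb;

	/* With inet_num == 0 this computes bucket 0, while the socket
	 * still sits on the chain selected at bind() time: wrong lock,
	 * unsafe unlink.
	 */
	spin_lock(&head->lock);
	tb = inet_csk(sk)->icsk_bind_hash;
	__sk_del_bind_node(sk);
	inet_csk(sk)->icsk_bind_hash = NULL;
	inet_sk(sk)->inet_num = 0;
	inet_bind_bucket_destroy(hashinfo->bind_bucket_cachep, tb);
	spin_unlock(&head->lock);
}

Note that __inet_put_port() already clears 'inet_num' itself, under the
proper bucket lock, so clearing it earlier is not needed for correctness.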

I still can't reproduce the issue here, so could you please test the
following patch? The idea is to keep 'inet_num' valid until
__inet_put_port() has selected the bucket, while preserving the explicit
clearing in the forced-close path:

---

Patch

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 2f1f8c3efb26..e5b388f5fa20 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -292,7 +292,6 @@ static inline void inet_csk_prepare_for_destroy_sock(struct sock *sk)
 	/* The below has to be done to allow calling inet_csk_destroy_sock */
 	sock_set_flag(sk, SOCK_DEAD);
 	percpu_counter_inc(sk->sk_prot->orphan_count);
-	inet_sk(sk)->inet_num = 0;
 }
 
 void inet_csk_destroy_sock(struct sock *sk);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index f40b1b72f979..afaf582a5aa9 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -902,6 +902,7 @@ void inet_csk_prepare_forced_close(struct sock *sk)
 	bh_unlock_sock(sk);
 	sock_put(sk);
 	inet_csk_prepare_for_destroy_sock(sk);
+	inet_sk(sk)->inet_num = 0;
 }
 EXPORT_SYMBOL(inet_csk_prepare_forced_close);