Message ID | 1527382936-4850-1-git-send-email-liuhangbin@gmail.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Series | [net] VSOCK: check sk state before receive |
Hmm... Although I can't reproduce this bug with my reproducer after
applying my patch, I can still get a similar issue with the syzkaller
sock vnet test.
It looks like this patch is not complete. Here is the KASAN call trace
with my patch. I can also reproduce it without my patch.
==================================================================
BUG: KASAN: use-after-free in vmci_transport_allow_dgram.part.7+0x155/0x1a0 [vmw_vsock_vmci_transport]
Read of size 4 at addr ffff880026a3a914 by task kworker/0:2/96
CPU: 0 PID: 96 Comm: kworker/0:2 Not tainted 4.17.0-rc6.vsock+ #28
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: events dg_delayed_dispatch [vmw_vmci]
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xdd/0x18e lib/dump_stack.c:113
print_address_description+0x7a/0x3e0 mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report+0x1dd/0x460 mm/kasan/report.c:412
vmci_transport_allow_dgram.part.7+0x155/0x1a0 [vmw_vsock_vmci_transport]
vmci_transport_recv_dgram_cb+0x5d/0x200 [vmw_vsock_vmci_transport]
dg_delayed_dispatch+0x99/0x1b0 [vmw_vmci]
process_one_work+0xa4e/0x1720 kernel/workqueue.c:2145
worker_thread+0x1df/0x1400 kernel/workqueue.c:2279
kthread+0x343/0x4b0 kernel/kthread.c:240
ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412
Allocated by task 2684:
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
slab_post_alloc_hook mm/slab.h:444 [inline]
slab_alloc_node mm/slub.c:2741 [inline]
slab_alloc mm/slub.c:2749 [inline]
kmem_cache_alloc+0x105/0x330 mm/slub.c:2754
sk_prot_alloc+0x6a/0x2c0 net/core/sock.c:1468
sk_alloc+0xc9/0xbb0 net/core/sock.c:1528
__vsock_create+0xc8/0x9b0 [vsock]
vsock_create+0xfd/0x1a0 [vsock]
__sock_create+0x310/0x690 net/socket.c:1285
sock_create net/socket.c:1325 [inline]
__sys_socket+0x101/0x240 net/socket.c:1355
__do_sys_socket net/socket.c:1364 [inline]
__se_sys_socket net/socket.c:1362 [inline]
__x64_sys_socket+0x7d/0xd0 net/socket.c:1362
do_syscall_64+0x175/0x630 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 2684:
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x130/0x180 mm/kasan/kasan.c:521
slab_free_hook mm/slub.c:1388 [inline]
slab_free_freelist_hook mm/slub.c:1415 [inline]
slab_free mm/slub.c:2988 [inline]
kmem_cache_free+0xce/0x410 mm/slub.c:3004
sk_prot_free net/core/sock.c:1509 [inline]
__sk_destruct+0x629/0x940 net/core/sock.c:1593
sk_destruct+0x4e/0x90 net/core/sock.c:1601
__sk_free+0xd3/0x320 net/core/sock.c:1612
sk_free+0x2a/0x30 net/core/sock.c:1623
__vsock_release+0x431/0x610 [vsock]
vsock_release+0x3c/0xc0 [vsock]
sock_release+0x91/0x200 net/socket.c:594
sock_close+0x17/0x20 net/socket.c:1149
__fput+0x368/0xa20 fs/file_table.c:209
task_work_run+0x1c5/0x2a0 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x1876/0x26c0 kernel/exit.c:865
do_group_exit+0x159/0x3e0 kernel/exit.c:968
get_signal+0x65a/0x1780 kernel/signal.c:2482
do_signal+0xa4/0x1fe0 arch/x86/kernel/signal.c:810
exit_to_usermode_loop+0x1b8/0x260 arch/x86/entry/common.c:162
prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
do_syscall_64+0x505/0x630 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff880026a3a600
which belongs to the cache AF_VSOCK of size 1056
The buggy address is located 788 bytes inside of
1056-byte region [ffff880026a3a600, ffff880026a3aa20)
The buggy address belongs to the page:
page:ffffea00009a8e00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
flags: 0xfffffc0008100(slab|head)
raw: 000fffffc0008100 0000000000000000 0000000000000000 00000001000d000d
raw: dead000000000100 dead000000000200 ffff880034471a40 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff880026a3a800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff880026a3a880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff880026a3a900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff880026a3a980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff880026a3aa00: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
On Sun, May 27, 2018 at 11:29:45PM +0800, Hangbin Liu wrote:
> Hmm... Although I can't reproduce this bug with my reproducer after
> applying my patch, I can still get a similar issue with the syzkaller
> sock vnet test.
>
> It looks like this patch is not complete. Here is the KASAN call trace
> with my patch. I can also reproduce it without my patch.

Seems like a race between vmci_datagram_destroy_handle() and the
delayed callback, vmci_transport_recv_dgram_cb().

I don't know the VMCI transport well, so I'll leave this to Jorgen.

> [KASAN call trace snipped; identical to the trace above]
> On May 30, 2018, at 11:17 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Sun, May 27, 2018 at 11:29:45PM +0800, Hangbin Liu wrote:
>> Hmm... Although I can't reproduce this bug with my reproducer after
>> applying my patch, I can still get a similar issue with the syzkaller
>> sock vnet test.
>>
>> It looks like this patch is not complete. Here is the KASAN call trace
>> with my patch. I can also reproduce it without my patch.
>
> Seems like a race between vmci_datagram_destroy_handle() and the
> delayed callback, vmci_transport_recv_dgram_cb().
>
> I don't know the VMCI transport well, so I'll leave this to Jorgen.

Yes, it looks like we are calling the delayed callback after we return
from vmci_datagram_destroy_handle(). I'll take a closer look at the VMCI
side here - the refcounting of VMCI datagram endpoints should guard
against this, since the delayed callback does a get on the datagram
resource. So this could be a VMCI driver issue, and not a problem in the
VMCI transport for AF_VSOCK.

>> [KASAN call trace snipped; identical to the trace above]
On Mon, Jun 04, 2018 at 04:02:39PM +0000, Jorgen S. Hansen wrote:
> Yes, it looks like we are calling the delayed callback after we return
> from vmci_datagram_destroy_handle(). I'll take a closer look at the VMCI
> side here - the refcounting of VMCI datagram endpoints should guard
> against this, since the delayed callback does a get on the datagram
> resource. So this could be a VMCI driver issue, and not a problem in the
> VMCI transport for AF_VSOCK.

Hi Jorgen,

Thanks for helping look at this. I'm happy to run tests for your patch.

Thanks
Hangbin
Hi Hangbin,

I finally got to the bottom of this - the issue was indeed in the VMCI
driver. The patch is posted here:

https://lkml.org/lkml/2018/9/21/326

I used your reproduce.log to test the fix. Thanks for discovering this
issue.

Thanks,
Jørgen
On Fri, Sep 21, 2018 at 07:48:25AM +0000, Jorgen S. Hansen wrote:
> Hi Hangbin,
>
> I finally got to the bottom of this - the issue was indeed in the VMCI
> driver. The patch is posted here:
>
> https://lkml.org/lkml/2018/9/21/326
>
> I used your reproduce.log to test the fix. Thanks for discovering this
> issue.

Hi Jorgen,

Thanks for your patch. I built a test kernel with your fix and ran my
reproducer and the syzkaller socket vnet test for a while. There is no
such error, so I think your patch fixed this issue.

BTW, with FAULT_INJECTION enabled, I got another call trace:

[ 251.166377] FAULT_INJECTION: forcing a failure.
[ 251.178736] CPU: 15 PID: 10448 Comm: syz-executor7 Not tainted 4.19.0-rc4.syz.vnet+ #3
[ 251.187577] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.5 04/11/2016
[ 251.187578] Call Trace:
[ 251.187586]  dump_stack+0x8c/0xce
[ 251.187594]  should_fail+0x5dd/0x6b0
[ 251.199932]  ? fault_create_debugfs_attr+0x1d0/0x1d0
[ 251.199937]  __should_failslab+0xe8/0x120
[ 251.199945]  should_failslab+0xa/0x20
[ 251.228430]  kmem_cache_alloc_trace+0x43/0x1f0
[ 251.233392]  ? vhost_dev_set_owner+0x366/0x790 [vhost]
[ 251.239129]  vhost_dev_set_owner+0x366/0x790 [vhost]
[ 251.244672]  ? vhost_poll_wakeup+0xa0/0xa0 [vhost]
[ 251.250018]  ? kasan_unpoison_shadow+0x30/0x40
[ 251.254978]  ? vhost_worker+0x370/0x370 [vhost]
[ 251.260035]  ? kasan_kmalloc_large+0x71/0xe0
[ 251.264799]  ? kmalloc_order+0x54/0x60
[ 251.268985]  vhost_net_ioctl+0xc2e/0x14c0 [vhost_net]
[ 251.274635]  ? avc_ss_reset+0x150/0x150
[ 251.278915]  ? kstrtouint_from_user+0xe5/0x140
[ 251.283876]  ? handle_tx_kick+0x40/0x40 [vhost_net]
[ 251.289320]  ? save_stack+0x89/0xb0
[ 251.293213]  ? __kasan_slab_free+0x12e/0x180
[ 251.297979]  ? kmem_cache_free+0x7a/0x210
[ 251.302452]  ? putname+0xe2/0x120
[ 251.306151]  ? get_pid_task+0x6e/0x90
[ 251.310238]  ? proc_fail_nth_write+0x91/0x1c0
[ 251.315100]  ? map_files_get_link+0x3c0/0x3c0
[ 251.319963]  ? exit_robust_list+0x1c0/0x1c0
[ 251.324633]  ? __vfs_write+0xf7/0x6a0
[ 251.328711]  ? handle_tx_kick+0x40/0x40 [vhost_net]
[ 251.334154]  do_vfs_ioctl+0x1a5/0xfb0
[ 251.338241]  ? ioctl_preallocate+0x1c0/0x1c0
[ 251.343009]  ? selinux_file_ioctl+0x382/0x560
[ 251.347872]  ? selinux_capable+0x40/0x40
[ 251.352250]  ? __fget+0x211/0x2e0
[ 251.355949]  ? iterate_fd+0x1c0/0x1c0
[ 251.360038]  ? syscall_trace_enter+0x285/0xaa0
[ 251.365011]  ? security_file_ioctl+0x5d/0xb0
[ 251.369776]  ? selinux_capable+0x40/0x40
[ 251.374153]  ksys_ioctl+0x89/0xa0
[ 251.377853]  __x64_sys_ioctl+0x74/0xb0
[ 251.382036]  do_syscall_64+0xc3/0x390
[ 251.386123]  ? syscall_return_slowpath+0x14c/0x230
[ 251.391473]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 251.397111] RIP: 0033:0x451b89
[ 251.400519] Code: fc ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 0b 67 fc ff c3 66 2e 0f 1f 84 00 00 00 00
[ 251.421476] RSP: 002b:00007fc0d9673c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 251.429927] RAX: ffffffffffffffda RBX: 00007fc0d96746b4 RCX: 0000000000451b89
[ 251.437889] RDX: 0000000000000000 RSI: 000000000000af01 RDI: 0000000000000003
[ 251.445852] RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
[ 251.453815] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[ 251.461778] R13: 0000000000006450 R14: 00000000004d3090 R15: 00007fc0d9674700

Thanks
Hangbin
On Sep 22, 2018, at 8:27 AM, Hangbin Liu <liuhangbin@gmail.com> wrote:
> Thanks for your patch. I built a test kernel with your fix and ran my
> reproducer and the syzkaller socket vnet test for a while. There is no
> such error, so I think your patch fixed this issue.

Great. Thanks a lot for trying out the patch.

> BTW, with FAULT_INJECTION enabled, I got another call trace:

The vhost_* stuff is for Virtio. Stefan would know better what is going
on there.

> [FAULT_INJECTION call trace snipped; identical to the trace above]

Thanks,
Jorgen
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index a7a73ff..0d26040 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -612,6 +612,13 @@ static int vmci_transport_recv_dgram_cb(void *data, struct vmci_datagram *dg)
 	if (!vmci_transport_allow_dgram(vsk, dg->src.context))
 		return VMCI_ERROR_NO_ACCESS;
 
+	bh_lock_sock(sk);
+	if (sk->sk_state == TCP_CLOSE) {
+		bh_unlock_sock(sk);
+		return VMCI_ERROR_DATAGRAM_FAILED;
+	}
+	bh_unlock_sock(sk);
+
 	size = VMCI_DG_SIZE(dg);
 
 	/* Attach the packet to the socket's receive queue as an sk_buff. */
Since vmci_transport_recv_dgram_cb() is a callback function and we access
the socket struct without holding the lock here, there is a possibility
that sk has been released and we use it again. This may cause a NULL
pointer dereference later, while receiving. Here is the call trace:

[ 389.486319] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 389.494148] PGD 0 P4D 0
[ 389.496687] Oops: 0000 [#1] SMP PTI
[ 389.500170] Modules linked in: vhost_net vmw_vsock_vmci_transport tun vsock vhost vmw_vmci tap iptable_security iptable_raw iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_s
[ 389.510984] Failed to add new resource (handle=0x2:0x2711), error: -22
[ 389.543309] Failed to add new resource (handle=0x2:0x2711), error: -22
[ 389.570936]  ttm drm crc32c_intel mptsas scsi_transport_sas serio_raw ata_piix mptscsih libata i2c_core mptbase bnx2 dm_mirror dm_region_hash dm_log dm_mod
[ 389.597899] CPU: 3 PID: 113 Comm: kworker/3:2 Tainted: G          I 4.17.0-rc6.latest+ #25
[ 389.606673] Hardware name: Dell Inc. PowerEdge R710/0XDX06, BIOS 6.1.0 10/18/2011
[ 389.614158] Workqueue: events dg_delayed_dispatch [vmw_vmci]
[ 389.619820] RIP: 0010:selinux_socket_sock_rcv_skb+0x46/0x270
[ 389.625475] RSP: 0018:ffffbcb5416b7ce0 EFLAGS: 00010293
[ 389.630698] RAX: 0000000000000000 RBX: 0000000000000028 RCX: 0000000000000007
[ 389.637825] RDX: 0000000000000000 RSI: ffff94a29feec500 RDI: ffffbcb5416b7d18
[ 389.644953] RBP: ffff94a29bd9a640 R08: 0000000000000001 R09: ffff94a187c03080
[ 389.652080] R10: ffffbcb5416b7d80 R11: 0000000000000000 R12: ffffbcb5416b7d18
[ 389.659206] R13: ffff94a29feec500 R14: ffff94a2afda5e00 R15: 0ffff94a2afda5e0
[ 389.666336] FS:  0000000000000000(0000) GS:ffff94a2afd80000(0000) knlGS:0000000000000000
[ 389.674419] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 389.680160] CR2: 0000000000000010 CR3: 000000004320a003 CR4: 00000000000206e0
[ 389.687283] Call Trace:
[ 389.689738]  ? __alloc_skb+0xa0/0x230
[ 389.693407]  security_sock_rcv_skb+0x32/0x60
[ 389.697679]  ? __alloc_skb+0xa0/0x230
[ 389.701343]  sk_filter_trim_cap+0x4e/0x1f0
[ 389.705442]  __sk_receive_skb+0x32/0x290
[ 389.709372]  vmci_transport_recv_dgram_cb+0xa7/0xd0 [vmw_vsock_vmci_transport]
[ 389.716593]  dg_delayed_dispatch+0x22/0x50 [vmw_vmci]
[ 389.721648]  process_one_work+0x1f2/0x4a0
[ 389.725662]  worker_thread+0x38/0x4c0
[ 389.729329]  ? process_one_work+0x4a0/0x4a0
[ 389.733512]  kthread+0x12f/0x150
[ 389.736743]  ? kthread_create_worker_on_cpu+0x90/0x90
[ 389.741796]  ret_from_fork+0x35/0x40
[ 389.745370] Code: 8b 04 25 28 00 00 00 48 89 44 24 70 31 c0 e8 42 15 db ff 0f b7 5d 10 48 8b 85 70 02 00 00 4c 8d 64 24 38 b9 07 00 00 00 4c 89 e7 <44> 8b 70 10 31 c0 41 89 df 41 83 e7 f7
[ 389.764342] RIP: selinux_socket_sock_rcv_skb+0x46/0x270 RSP: ffffbcb5416b7ce0
[ 389.771467] CR2: 0000000000000010
[ 389.774784] ---[ end trace e83d65291a15ae6a ]---

Fix it by checking the sk state before using it.

Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 net/vmw_vsock/vmci_transport.c | 7 +++++++
 1 file changed, 7 insertions(+)