diff mbox series

[1/1] accel/kvm/kvm-all: fix vm crash when set dirty ring and memorybacking

Message ID 20230323131914.35583-1-zhoujiajing.vergil@bytedance.com
State New
Headers show
Series [1/1] accel/kvm/kvm-all: fix vm crash when set dirty ring and memorybacking | expand

Commit Message

Jiajing Zhou March 23, 2023, 1:19 p.m. UTC
From: "zhoujiajing.vergil" <zhoujiajing.vergil@bytedance.com>

It is possible enter this function when the cpu not finished creating but
is already in the cpu list. The value of dirty_gfns is null, causing vm
crash here.

When both dirty-ring and memorybacking are set, creating a vm will assert
on kvm_dirty_ring_reap_one. Part of the xml as follows:


<domain type='kvm' id='9'>
  ...
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' memAccess='shared'/>
    </hugepages>
  </memoryBacking>
  ...
  <features>
    <acpi/>
    <kvm>
      <dirty-ring state='on' size='4096'/>
    </kvm>
  </features>
  ...
<domain/>

The kvm-reaper thread was created before vcpu thread, and the value of
cpu->kvm_dirty_gfns is assigned at cpu thread. In the x86_cpu_realizefn
function, the cpu is inserted into the cpu list first, and then the cpu
thread is created for initialization. The entry functions are
cpu_exec_realizefn and qemu_init_vcpu. In the existing logic, the
kvm-reaper thread traverses the cpu list every second and finally call
kvm_dirty_ring_reap_one for each cpu in the list. If cpu has been inserted
into cpu list but has not been initialized so that the value of dirty_gfns
is null, kvm-reaper thread call kvm_dirty_ring_reap_one will cause vm crash.

The call stack is as follows:
  kvm_dirty_ring_reaper_thread
    -> kvm_dirty_ring_reap
       ->kvm_dirty_ring_reap_locked
         ->kvm_dirty_ring_reap_one


Signed-off-by: zhoujiajing.vergil <zhoujiajing.vergil@bytedance.com>
---
 accel/kvm/kvm-all.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Peter Xu March 23, 2023, 1:43 p.m. UTC | #1
On Thu, Mar 23, 2023 at 09:19:15PM +0800, Jiajing Zhou wrote:
> From: "zhoujiajing.vergil" <zhoujiajing.vergil@bytedance.com>
> 
> It is possible enter this function when the cpu not finished creating but
> is already in the cpu list. The value of dirty_gfns is null, causing vm
> crash here.
> 
> When both dirty-ring and memorybacking are set, creating a vm will assert
> on kvm_dirty_ring_reap_one. Part of the xml as follows:
> 
> 
> <domain type='kvm' id='9'>
>   ...
>   <memoryBacking>
>     <hugepages>
>       <page size='2048' unit='KiB' memAccess='shared'/>
>     </hugepages>
>   </memoryBacking>
>   ...
>   <features>
>     <acpi/>
>     <kvm>
>       <dirty-ring state='on' size='4096'/>
>     </kvm>
>   </features>
>   ...
> <domain/>
> 
> The kvm-reaper thread was created before vcpu thread, and the value of
> cpu->kvm_dirty_gfns is assigned at cpu thread. In the x86_cpu_realizefn
> function, the cpu is inserted into the cpu list first, and then the cpu
> thread is created for initialization. The entry functions are
> cpu_exec_realizefn and qemu_init_vcpu. In the existing logic, the
> kvm-reaper thread traverses the cpu list every second and finally call
> kvm_dirty_ring_reap_one for each cpu in the list. If cpu has been inserted
> into cpu list but has not been initialized so that the value of dirty_gfns
> is null, kvm-reaper thread call kvm_dirty_ring_reap_one will cause vm crash.
> 
> The call stack is as follows:
>   kvm_dirty_ring_reaper_thread
>     -> kvm_dirty_ring_reap
>        ->kvm_dirty_ring_reap_locked
>          ->kvm_dirty_ring_reap_one
> 
> 
> Signed-off-by: zhoujiajing.vergil <zhoujiajing.vergil@bytedance.com>

Acked-by: Peter Xu <peterx@redhat.com>

And there's a prior fix last year:

https://lore.kernel.org/r/20220927154653.77296-1-peterx@redhat.com

The most recent post that I'm aware of is by Yong:

https://lore.kernel.org/r/1d14deb6684bcb7de1c9633c5bd21113988cc698.1676563222.git.huangy81@chinatelecom.cn

A bunch of people hit this already.

Paolo, ping yet again - would you consider merging any of the versions?
For this one I'd think it'll be good to have even for 8.0.

Thanks!
diff mbox series

Patch

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index f2a6ea6a68..698a59162a 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -685,6 +685,11 @@  static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
     uint32_t ring_size = s->kvm_dirty_ring_size;
     uint32_t count = 0, fetch = cpu->kvm_fetch_index;
 
+    /* return 0 when cpu not finished creating */
+    if (!cpu->created) {
+        return 0;
+    }
+
     assert(dirty_gfns && ring_size);
     trace_kvm_dirty_ring_reap_vcpu(cpu->cpu_index);