[00/14] KVM: Halt-polling fixes, cleanups and a new stat

Message ID 20210925005528.1145584-1-seanjc@google.com

Message

Sean Christopherson Sept. 25, 2021, 12:55 a.m. UTC
The main purpose of this series is to differentiate between "halt" and a more
generic "block", where "halt" aligns with x86's HLT instruction, the
halt-polling mechanisms, and associated stats, and "block" means any guest
action that causes the vCPU to block/wait.

This series arose out of a discussion over adding a stat to track if a
vCPU is blocked/halted[*].  The TL;DR of the discussion is that x86 has
several non-halt "wait" states, and arguably those states should not
participate in halt-polling.  In practice, it really doesn't matter from
a functionality perspective because there are typically so few occurrences
of the non-halt waits that they're in the noise compared to the number of
actual HLTs, especially for a long-running VM.  So, my justification for
the rename is that because it doesn't truly affect functionality, KVM
might as well be technically correct and only use halt-polling for HLT.

The other annoyance this series addresses is that KVM mixes "halt" and
"block", e.g. the existing function is kvm_vcpu_block(), but all the stats
and the tracepoint use "halt".  Ideally, KVM would probably avoid "block"
altogether as people often think of "blocked" as meaning the vCPU is
blocked due to _host_ activity.  But I don't have a better alternative,
e.g. "halt" is obviously taken, "wait" is equivalent to "halt" on arm64,
"stop" has specific meaning on s390, etc...  I tried to address the host
vs. guest issue by naming the new stat "blocking" instead of "blocked",
e.g. to convey that the vCPU is "actively blocking" instead of "being
blocked".

Patch 01 fixes a theoretical, benign s390 bug, and sets the stage for
additional cleanups.

Patches 02-04 reconcile discrepancies in when KVM considers halt-polling
to be "successful".  Some stats consider it a success so long as KVM
doesn't schedule() away; others consider it a success if and only if a
wake event is detected in the halt-polling loop.
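
For reference, a rough sketch of the polling loop in question (simplified,
not the literal code):

	do {
		if (kvm_vcpu_check_block(vcpu) < 0) {
			/* Wake event seen while polling: the strict
			 * definition of a "successful" poll. */
			++vcpu->stat.generic.halt_successful_poll;
			goto out;
		}
		cur = ktime_get();
	} while (single_task_running() && ktime_before(cur, stop));
	/* The looser definition also credits polling when the wake event
	 * lands after the loop exits but before the vCPU schedules out. */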

Patches 05-06 are prep cleanup to split out the core "block" routine.

Patch 07 is more prep, and should also be a small perf optimization for
halt-polling on arm64.

Patch 08 is x86 cleanup to free up the name kvm_vcpu_halt().

Patches 09-10 rename the existing kvm_vcpu_block() to kvm_vcpu_halt(), and
split out the core "block" routine to a new helper (roughly sketched below).

Patches 11-12 are minor cleanups to avoid unnecessary ktime_get().

Patches 13-14 convert non-HLT x86 flows to use kvm_vcpu_block().

[*] https://lkml.kernel.org/r/20210817230508.142907-1-jingzhangos@google.com
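
For reference, the resulting layering, roughly (a sketch of the intended
end state, not the literal code; poll_for_wake_event() is a placeholder
for the polling loop):

	void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
	{
		bool woken_by_poll = false;

		/* Halt-polling and the halt-polling stats live only here. */
		if (!kvm_arch_no_poll(vcpu) && vcpu->halt_poll_ns)
			woken_by_poll = poll_for_wake_event(vcpu);

		if (!woken_by_poll)
			kvm_vcpu_block(vcpu);	/* generic "wait for wake event" */

		/* Grow/shrink vcpu->halt_poll_ns based on how long we halted. */
	}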

Jing Zhang (1):
  KVM: stats: Add stat to detect if vcpu is currently blocking

Sean Christopherson (13):
  KVM: s390: Ensure kvm_arch_no_poll() is read once when blocking vCPU
  KVM: Update halt-polling stats if and only if halt-polling was
    attempted
  KVM: Refactor and document halt-polling stats update helper
  KVM: Reconcile discrepancies in halt-polling stats
  KVM: s390: Clear valid_wakeup in kvm_s390_handle_wait(), not in arch
    hook
  KVM: Drop obsolete kvm_arch_vcpu_block_finish()
  KVM: Don't block+unblock when halt-polling is successful
  KVM: x86: Tweak halt emulation helper names to free up kvm_vcpu_halt()
  KVM: Rename kvm_vcpu_block() => kvm_vcpu_halt()
  KVM: Split out a kvm_vcpu_block() helper from kvm_vcpu_halt()
  KVM: Don't redo ktime_get() when calculating halt-polling
    stop/deadline
  KVM: x86: Directly block (instead of "halting") UNINITIALIZED vCPUs
  KVM: x86: Invoke kvm_vcpu_block() directly for non-HALTED wait states

 arch/arm64/include/asm/kvm_host.h   |   1 -
 arch/arm64/kvm/arch_timer.c         |   2 +-
 arch/arm64/kvm/handle_exit.c        |   4 +-
 arch/arm64/kvm/psci.c               |   2 +-
 arch/mips/include/asm/kvm_host.h    |   1 -
 arch/mips/kvm/emulate.c             |   2 +-
 arch/powerpc/include/asm/kvm_host.h |   1 -
 arch/powerpc/kvm/book3s_pr.c        |   2 +-
 arch/powerpc/kvm/book3s_pr_papr.c   |   2 +-
 arch/powerpc/kvm/booke.c            |   2 +-
 arch/powerpc/kvm/powerpc.c          |   2 +-
 arch/s390/include/asm/kvm_host.h    |   2 -
 arch/s390/kvm/interrupt.c           |   3 +-
 arch/s390/kvm/kvm-s390.c            |   7 +-
 arch/x86/include/asm/kvm_host.h     |   4 +-
 arch/x86/kvm/vmx/nested.c           |   2 +-
 arch/x86/kvm/vmx/vmx.c              |   4 +-
 arch/x86/kvm/x86.c                  |  25 ++++--
 include/linux/kvm_host.h            |   6 +-
 include/linux/kvm_types.h           |   1 +
 virt/kvm/kvm_main.c                 | 131 +++++++++++++++++-----------
 21 files changed, 118 insertions(+), 88 deletions(-)

Comments

Christian Borntraeger Sept. 27, 2021, 7:22 a.m. UTC | #1
While looking into this series,

I realized that David's patch

commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
Author:     David Matlack <dmatlack@google.com>
AuthorDate: Fri Apr 17 15:14:46 2020 -0700
Commit:     Paolo Bonzini <pbonzini@redhat.com>
CommitDate: Fri Apr 24 12:53:17 2020 -0400

     kvm: add capability for halt polling

broke the ability for an admin to disable halt polling for already-running KVM guests.
Previously, doing

echo 0 > /sys/module/kvm/parameters/halt_poll_ns

stopped polling system-wide.
Now all KVM guests use the halt_poll_ns value that was active during startup - even those that do not use KVM_CAP_HALT_POLL.
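
(For reference, the snapshot happens at VM creation, in kvm_create_vm():

	kvm->max_halt_poll_ns = halt_poll_ns;

so writes to the module parameter after that point are never observed by
already-created VMs.)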

I guess this was not intended?
Sean Christopherson Sept. 27, 2021, 2:59 p.m. UTC | #2
On Mon, Sep 27, 2021, Christian Borntraeger wrote:
> While looking into this series,
> 
> I realized that Davids patch
> 
> commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
> Author:     David Matlack <dmatlack@google.com>
> AuthorDate: Fri Apr 17 15:14:46 2020 -0700
> Commit:     Paolo Bonzini <pbonzini@redhat.com>
> CommitDate: Fri Apr 24 12:53:17 2020 -0400
> 
>     kvm: add capability for halt polling
> 
> broke the possibility for an admin to disable halt polling for already running KVM guests.
> In past times doing
> echo 0 > /sys/module/kvm/parameters/halt_poll_ns
> 
> stopped polling system wide.
> Now all KVM guests will use the halt_poll_ns value that was active during
> startup - even those that do not use KVM_CAP_HALT_POLL.
> 
> I guess this was not intended?

Ouch.  I would go so far as to say that halt_poll_ns should be a hard limit on
the capability.  What about having the per-VM variable track only the capability,
and then use the module param to cap the max when doing adjustments?  E.g. add
a variant of this early in the series?

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 80f78daa6b8d..f50e4e31a0cf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1078,8 +1078,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
                        goto out_err_no_arch_destroy_vm;
        }

-       kvm->max_halt_poll_ns = halt_poll_ns;
-
        r = kvm_arch_init_vm(kvm, type);
        if (r)
                goto out_err_no_arch_destroy_vm;
@@ -3136,7 +3134,8 @@ void kvm_sigset_deactivate(struct kvm_vcpu *vcpu)
        sigemptyset(&current->real_blocked);
 }

-static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
+static void grow_halt_poll_ns(struct kvm_vcpu *vcpu,
+                             unsigned int max_halt_poll_ns)
 {
        unsigned int old, val, grow, grow_start;

@@ -3150,8 +3149,8 @@ static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
        if (val < grow_start)
                val = grow_start;

-       if (val > vcpu->kvm->max_halt_poll_ns)
-               val = vcpu->kvm->max_halt_poll_ns;
+       if (val > max_halt_poll_ns)
+               val = max_halt_poll_ns;

        vcpu->halt_poll_ns = val;
 out:
@@ -3261,6 +3260,7 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 {
        bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
        bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
+       unsigned int max_halt_poll_ns;
        ktime_t start, cur, poll_end;
        bool waited = false;
        u64 halt_ns;
@@ -3304,19 +3304,25 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
                update_halt_poll_stats(vcpu, start, poll_end, !waited);

        if (halt_poll_allowed) {
+               max_halt_poll_ns = vcpu->kvm->max_halt_poll_ns;
+               if (max_halt_poll_ns)
+                       max_halt_poll_ns = min(max_halt_poll_ns, halt_poll_ns);
+               else
+                       max_halt_poll_ns = halt_poll_ns;
+
                if (!vcpu_valid_wakeup(vcpu)) {
                        shrink_halt_poll_ns(vcpu);
-               } else if (vcpu->kvm->max_halt_poll_ns) {
+               } else if (max_halt_poll_ns) {
                        if (halt_ns <= vcpu->halt_poll_ns)
                                ;
                        /* we had a long block, shrink polling */
                        else if (vcpu->halt_poll_ns &&
-                                halt_ns > vcpu->kvm->max_halt_poll_ns)
+                                halt_ns > max_halt_poll_ns)
                                shrink_halt_poll_ns(vcpu);
                        /* we had a short halt and our poll time is too small */
-                       else if (vcpu->halt_poll_ns < vcpu->kvm->max_halt_poll_ns &&
-                                halt_ns < vcpu->kvm->max_halt_poll_ns)
-                               grow_halt_poll_ns(vcpu);
+                       else if (vcpu->halt_poll_ns < max_halt_poll_ns &&
+                                halt_ns < max_halt_poll_ns)
+                               grow_halt_poll_ns(vcpu, max_halt_poll_ns);
                } else {
                        vcpu->halt_poll_ns = 0;
                }
Paolo Bonzini Sept. 27, 2021, 3:03 p.m. UTC | #3
On 27/09/21 16:59, Sean Christopherson wrote:
>> commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
>> Author:     David Matlack<dmatlack@google.com>
>> AuthorDate: Fri Apr 17 15:14:46 2020 -0700
>> Commit:     Paolo Bonzini<pbonzini@redhat.com>
>> CommitDate: Fri Apr 24 12:53:17 2020 -0400
>> 
>>      kvm: add capability for halt polling
>> 
>> broke the possibility for an admin to disable halt polling for already running KVM guests.
>> In past times doing
>> echo 0 > /sys/module/kvm/parameters/halt_poll_ns
>> 
>> stopped polling system wide.
>> Now all KVM guests will use the halt_poll_ns value that was active during
>> startup - even those that do not use KVM_CAP_HALT_POLL.
>> 
>> I guess this was not intended?

No, but...

> I would go so far as to say that halt_poll_ns should be a hard limit on
> the capability

... this would not be a good idea I think.  Anything that wants to do a 
lot of polling can just do "for (;;)".

So I think there are two possibilities that make sense:

* track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns 
follow that

* just make halt_poll_ns read-only.
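
Presumably read-only would just mean dropping the write bit on the param,
e.g. (a sketch, assuming the current uint declaration in virt/kvm/kvm_main.c):

	module_param(halt_poll_ns, uint, 0444);	/* was 0644 */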

Paolo
Sean Christopherson Sept. 27, 2021, 3:15 p.m. UTC | #4
On Mon, Sep 27, 2021, Paolo Bonzini wrote:
> On 27/09/21 16:59, Sean Christopherson wrote:
> > > commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
> > > Author:     David Matlack<dmatlack@google.com>
> > > AuthorDate: Fri Apr 17 15:14:46 2020 -0700
> > > Commit:     Paolo Bonzini<pbonzini@redhat.com>
> > > CommitDate: Fri Apr 24 12:53:17 2020 -0400
> > > 
> > >      kvm: add capability for halt polling
> > > 
> > > broke the possibility for an admin to disable halt polling for already running KVM guests.
> > > In past times doing
> > > echo 0 > /sys/module/kvm/parameters/halt_poll_ns
> > > 
> > > stopped polling system wide.
> > > Now all KVM guests will use the halt_poll_ns value that was active during
> > > startup - even those that do not use KVM_CAP_HALT_POLL.
> > > 
> > > I guess this was not intended?
> 
> No, but...
> 
> > I would go so far as to say that halt_poll_ns should be a hard limit on
> > the capability
> 
> ... this would not be a good idea I think.  Anything that wants to do a lot
> of polling can just do "for (;;)".

Hmm, true, there is no danger to the system in having the capability override the
module param.

> So I think there are two possibilities that makes sense:
> 
> * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns
> follow that

I think this option makes more sense; making halt_poll_ns read-only is basically
forcing users to switch to KVM_CAP_HALT_POLL.

> * just make halt_poll_ns read-only.
> 
> Paolo
>
Christian Borntraeger Sept. 27, 2021, 3:16 p.m. UTC | #5
Am 27.09.21 um 17:03 schrieb Paolo Bonzini:
> On 27/09/21 16:59, Sean Christopherson wrote:
>>> commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
>>> Author:     David Matlack<dmatlack@google.com>
>>> AuthorDate: Fri Apr 17 15:14:46 2020 -0700
>>> Commit:     Paolo Bonzini<pbonzini@redhat.com>
>>> CommitDate: Fri Apr 24 12:53:17 2020 -0400
>>>
>>>      kvm: add capability for halt polling
>>>
>>> broke the possibility for an admin to disable halt polling for already running KVM guests.
>>> In past times doing
>>> echo 0 > /sys/module/kvm/parameters/halt_poll_ns
>>>
>>> stopped polling system wide.
>>> Now all KVM guests will use the halt_poll_ns value that was active during
>>> startup - even those that do not use KVM_CAP_HALT_POLL.
>>>
>>> I guess this was not intended?
> 
> No, but...
> 
>> I would go so far as to say that halt_poll_ns should be a hard limit on
>> the capability
> 
> ... this would not be a good idea I think.  Anything that wants to do a lot of polling can just do "for (;;)".
> 
> So I think there are two possibilities that makes sense:
> 
> * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that

What about using halt_poll_ns for those VMs that did not use KVM_CAP_HALT_POLL, and the private number for those that did?
> 
> * just make halt_poll_ns read-only.
> 
> Paolo
>
David Matlack Sept. 27, 2021, 4:58 p.m. UTC | #6
On Mon, Sep 27, 2021 at 8:17 AM Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
>
>
>
> Am 27.09.21 um 17:03 schrieb Paolo Bonzini:
> > On 27/09/21 16:59, Sean Christopherson wrote:
> >>> commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
> >>> Author:     David Matlack<dmatlack@google.com>
> >>> AuthorDate: Fri Apr 17 15:14:46 2020 -0700
> >>> Commit:     Paolo Bonzini<pbonzini@redhat.com>
> >>> CommitDate: Fri Apr 24 12:53:17 2020 -0400
> >>>
> >>>      kvm: add capability for halt polling
> >>>
> >>> broke the possibility for an admin to disable halt polling for already running KVM guests.
> >>> In past times doing
> >>> echo 0 > /sys/module/kvm/parameters/halt_poll_ns
> >>>
> >>> stopped polling system wide.
> >>> Now all KVM guests will use the halt_poll_ns value that was active during
> >>> startup - even those that do not use KVM_CAP_HALT_POLL.
> >>>
> >>> I guess this was not intended?
> >
> > No, but...
> >
> >> I would go so far as to say that halt_poll_ns should be a hard limit on
> >> the capability
> >
> > ... this would not be a good idea I think.  Anything that wants to do a lot of polling can just do "for (;;)".

I agree. It would also be a maintenance burden and subtle "gotcha" to
have to increase halt_poll_ns anytime one wants to increase
KVM_CAP_HALT_POLL.

> >
> > So I think there are two possibilities that makes sense:
> >
> > * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that
>
> what about using halt_poll_ns for those VMs that did not uses KVM_CAP_HALT_POLL and the private number for those that did.

None of these options would cover Christian's original use case, though:
writing to the module param to disable halt polling system-wide.

What about adding a writable "enable_halt_polling" module parameter
that affects all VMs? Once that is in place we could also consider
getting rid of halt_poll_ns entirely.
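
A rough sketch of that idea (hypothetical parameter name, untested):

	static bool enable_halt_polling = true;
	module_param(enable_halt_polling, bool, 0644);

	/* ... and in kvm_vcpu_halt(): */
	bool halt_poll_allowed = READ_ONCE(enable_halt_polling) &&
				 !kvm_arch_no_poll(vcpu);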

> >
> > * just make halt_poll_ns read-only.
> >
> > Paolo
> >
Paolo Bonzini Sept. 27, 2021, 5:24 p.m. UTC | #7
On Mon, Sep 27, 2021 at 5:17 PM Christian Borntraeger
<borntraeger@de.ibm.com> wrote:
> > So I think there are two possibilities that makes sense:
> >
> > * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that
>
> what about using halt_poll_ns for those VMs that did not uses KVM_CAP_HALT_POLL and the private number for those that did.

Yes, that's what I meant.  David pointed out that doesn't allow you to
disable halt polling altogether, but for that you can always ask each
VM's userspace one by one, or just not use KVM_CAP_HALT_POLL. (Also, I
don't know about Google's use case, but mine was actually more about
using KVM_CAP_HALT_POLL to *disable* halt polling on some VMs!).
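
(For reference, disabling via the capability is just the VM-scoped
enable-cap ioctl with a max poll time of zero, roughly:

	struct kvm_enable_cap cap = {
		.cap = KVM_CAP_HALT_POLL,
		.args = { 0 },	/* args[0] = max poll time in ns; 0 disables */
	};
	ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

where vm_fd is an already-created VM file descriptor.)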

Paolo
Sean Christopherson Sept. 27, 2021, 5:33 p.m. UTC | #8
On Mon, Sep 27, 2021, Paolo Bonzini wrote:
> On Mon, Sep 27, 2021 at 5:17 PM Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
> > > So I think there are two possibilities that makes sense:
> > >
> > > * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that
> >
> > what about using halt_poll_ns for those VMs that did not uses KVM_CAP_HALT_POLL and the private number for those that did.
> 
> Yes, that's what I meant.  David pointed out that doesn't allow you to
> disable halt polling altogether, but for that you can always ask each
> VM's userspace one by one, or just not use KVM_CAP_HALT_POLL. (Also, I
> don't know about Google's usecase, but mine was actually more about
> using KVM_CAP_HALT_POLL to *disable* halt polling on some VMs!).

I kinda like the idea of special-casing halt_poll_ns=0, e.g. for testing or
in-the-field mitigation if halt-polling is broken.  It'd be trivial to support, e.g.

@@ -3304,19 +3304,23 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
                update_halt_poll_stats(vcpu, start, poll_end, !waited);

        if (halt_poll_allowed) {
+               max_halt_poll_ns = vcpu->kvm->max_halt_poll_ns;
+               if (!max_halt_poll_ns || !halt_poll_ns)  <------ squish the max if halt_poll_ns==0
+                       max_halt_poll_ns = halt_poll_ns;
+
                if (!vcpu_valid_wakeup(vcpu)) {
                        shrink_halt_poll_ns(vcpu);
-               } else if (vcpu->kvm->max_halt_poll_ns) {
+               } else if (max_halt_poll_ns) {
                        if (halt_ns <= vcpu->halt_poll_ns)
                                ;
                        /* we had a long block, shrink polling */
                        else if (vcpu->halt_poll_ns &&
-                                halt_ns > vcpu->kvm->max_halt_poll_ns)
+                                halt_ns > max_halt_poll_ns)
                                shrink_halt_poll_ns(vcpu);
                        /* we had a short halt and our poll time is too small */
-                       else if (vcpu->halt_poll_ns < vcpu->kvm->max_halt_poll_ns &&
-                                halt_ns < vcpu->kvm->max_halt_poll_ns)
-                               grow_halt_poll_ns(vcpu);
+                       else if (vcpu->halt_poll_ns < max_halt_poll_ns &&
+                                halt_ns < max_halt_poll_ns)
+                               grow_halt_poll_ns(vcpu, max_halt_poll_ns);
                } else {
                        vcpu->halt_poll_ns = 0;
                }
Christian Borntraeger Sept. 29, 2021, 6:56 a.m. UTC | #9
Am 27.09.21 um 18:58 schrieb David Matlack:
> On Mon, Sep 27, 2021 at 8:17 AM Christian Borntraeger
> <borntraeger@de.ibm.com> wrote:
>>
>>
>>
>> Am 27.09.21 um 17:03 schrieb Paolo Bonzini:
>>> On 27/09/21 16:59, Sean Christopherson wrote:
>>>>> commit acd05785e48c01edb2c4f4d014d28478b5f19fb5
>>>>> Author:     David Matlack<dmatlack@google.com>
>>>>> AuthorDate: Fri Apr 17 15:14:46 2020 -0700
>>>>> Commit:     Paolo Bonzini<pbonzini@redhat.com>
>>>>> CommitDate: Fri Apr 24 12:53:17 2020 -0400
>>>>>
>>>>>       kvm: add capability for halt polling
>>>>>
>>>>> broke the possibility for an admin to disable halt polling for already running KVM guests.
>>>>> In past times doing
>>>>> echo 0 > /sys/module/kvm/parameters/halt_poll_ns
>>>>>
>>>>> stopped polling system wide.
>>>>> Now all KVM guests will use the halt_poll_ns value that was active during
>>>>> startup - even those that do not use KVM_CAP_HALT_POLL.
>>>>>
>>>>> I guess this was not intended?
>>>
>>> No, but...
>>>
>>>> I would go so far as to say that halt_poll_ns should be a hard limit on
>>>> the capability
>>>
>>> ... this would not be a good idea I think.  Anything that wants to do a lot of polling can just do "for (;;)".
> 
> I agree. It would also be a maintenance burden and subtle "gotcha" to
> have to increase halt_poll_ns anytime one wants to increase
> KVM_CAP_HALT_POLL.

I think the idea of the upper bound is not about preventing wasted CPUs,
but about reconfiguring existing poll intervals at a global level. So I
think a hard limit is a bad idea in itself, especially as the admin might
not have access to the monitor of users' QEMUs.

>>>
>>> So I think there are two possibilities that makes sense:
>>>
>>> * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that
>>
>> what about using halt_poll_ns for those VMs that did not uses KVM_CAP_HALT_POLL and the private number for those that did.
> 
> None of these options would cover Christian's original use-case
> though. (Write to module to disable halt-polling system-wide.)
> 
> What about adding a writable "enable_halt_polling" module parameter

that would then affect both classes with and without KVM_CAP_HALT_POLL.

> that affects all VMs? Once that is in place we could also consider
> getting rid of halt_poll_ns entirely.

As far as I can tell QEMU does not yet use KVM_CAP_HALT_POLL.
So having a system-wide halt_poll_ns makes sense. And I think for all
processes not using KVM_CAP_HALT_POLL we should really follow what
halt_poll_ns is NOW and not what it used to be.
wangyanan (Y) Nov. 15, 2022, 3:28 a.m. UTC | #10
Hi Sean, Paolo,

I also recently noticed the behavior change of the halt_poll_ns param.
It has lost the ability to:
1) dynamically disable halt polling for all the running VMs
by `echo 0 > /sys`
2) dynamically adjust the halt polling interval for all the
running VMs by `echo * > /sys`

In our use cases, we usually rely on the above two abilities, and
KVM_CAP_HALT_POLL is not used yet.

On 2021/9/28 1:33, Sean Christopherson wrote:
> On Mon, Sep 27, 2021, Paolo Bonzini wrote:
>> On Mon, Sep 27, 2021 at 5:17 PM Christian Borntraeger
>> <borntraeger@de.ibm.com> wrote:
>>>> So I think there are two possibilities that makes sense:
>>>>
>>>> * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that
>>> what about using halt_poll_ns for those VMs that did not uses KVM_CAP_HALT_POLL and the private number for those that did.
>> Yes, that's what I meant.  David pointed out that doesn't allow you to
>> disable halt polling altogether, but for that you can always ask each
>> VM's userspace one by one, or just not use KVM_CAP_HALT_POLL. (Also, I
>> don't know about Google's usecase, but mine was actually more about
>> using KVM_CAP_HALT_POLL to *disable* halt polling on some VMs!).
> I kinda like the idea if special-casing halt_poll_ns=0, e.g. for testing or
> in-the-field mitigation if halt-polling is broken.  It'd be trivial to support, e.g.
Do we have any plan to repost the diff as a fix?
It would be very nice if this issue could be solved.

Besides, I think we may need some documentation for users describing
how halt_poll_ns interacts with KVM_CAP_HALT_POLL, e.g. in
"Documentation/virt/guest-halt-polling.rst".
> @@ -3304,19 +3304,23 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>                  update_halt_poll_stats(vcpu, start, poll_end, !waited);
>
>          if (halt_poll_allowed) {
> +               max_halt_poll_ns = vcpu->kvm->max_halt_poll_ns;
> +               if (!max_halt_poll_ns || !halt_poll_ns)  <------ squish the max if halt_poll_ns==0
> +                       max_halt_poll_ns = halt_poll_ns;
> +
Does this mean that KVM_CAP_HALT_POLL will not be able to
disable halt polling for a VM individually when halt_poll_ns != 0?
>                  if (!vcpu_valid_wakeup(vcpu)) {
>                          shrink_halt_poll_ns(vcpu);
> -               } else if (vcpu->kvm->max_halt_poll_ns) {
> +               } else if (max_halt_poll_ns) {
>                          if (halt_ns <= vcpu->halt_poll_ns)
>                                  ;
>                          /* we had a long block, shrink polling */
>                          else if (vcpu->halt_poll_ns &&
> -                                halt_ns > vcpu->kvm->max_halt_poll_ns)
> +                                halt_ns > max_halt_poll_ns)
>                                  shrink_halt_poll_ns(vcpu);
>                          /* we had a short halt and our poll time is too small */
> -                       else if (vcpu->halt_poll_ns < vcpu->kvm->max_halt_poll_ns &&
> -                                halt_ns < vcpu->kvm->max_halt_poll_ns)
> -                               grow_halt_poll_ns(vcpu);
> +                       else if (vcpu->halt_poll_ns < max_halt_poll_ns &&
> +                                halt_ns < max_halt_poll_ns)
> +                               grow_halt_poll_ns(vcpu, max_halt_poll_ns);
>                  } else {
>                          vcpu->halt_poll_ns = 0;
>                  }
Thanks,
Yanan
David Matlack Nov. 16, 2022, 5:19 p.m. UTC | #11
On Tue, Nov 15, 2022 at 11:28:56AM +0800, wangyanan (Y) wrote:
> Hi Sean, Paolo,
> 
> I recently also notice the behavior change of param halt_poll_ns.
> Now it loses the ability to:
> 1) dynamically disable halt polling for all the running VMs
> by `echo 0 > /sys`
> 2) dynamically adjust the halt polling interval for all the
> running VMs by `echo * > /sys`
> 
> While in our cases, we usually use above two abilities, and
> KVM_CAP_HALT_POLL is not used yet.

I think the right path forward is to make KVM_CAP_HALT_POLL a pure
override of halt_poll_ns, and restore the pre-existing behavior of
halt_poll_ns whenever KVM_CAP_HALT_POLL is not used, e.g. see the patch
below.

That will fix issues (1) and (2) above for any VM not using
KVM_CAP_HALT_POLL. If a VM is using KVM_CAP_HALT_POLL, it will ignore
all changes to halt_poll_ns. If we truly need a mechanism for admins to
disable halt-polling on VMs using KVM_CAP_HALT_POLL, we can introduce a
separate module parameter for that. But IMO, any setup that is
sophisticated enough to use KVM_CAP_HALT_POLL should also be able to use
KVM_CAP_HALT_POLL to disable halt polling.

If everyone is happy with this approach I can test and send a real patch
to the mailing list.

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e6e66c5e56f2..253ad055b6ad 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -788,6 +788,7 @@ struct kvm {
 	struct srcu_struct srcu;
 	struct srcu_struct irq_srcu;
 	pid_t userspace_pid;
+	bool override_halt_poll_ns;
 	unsigned int max_halt_poll_ns;
 	u32 dirty_ring_size;
 	bool vm_bugged;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 43bbe4fde078..479d0d0da0b5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1198,8 +1198,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
 			goto out_err_no_arch_destroy_vm;
 	}
 
-	kvm->max_halt_poll_ns = halt_poll_ns;
-
 	r = kvm_arch_init_vm(kvm, type);
 	if (r)
 		goto out_err_no_arch_destroy_vm;
@@ -3371,7 +3369,7 @@ void kvm_sigset_deactivate(struct kvm_vcpu *vcpu)
 	sigemptyset(&current->real_blocked);
 }
 
-static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
+static void grow_halt_poll_ns(struct kvm_vcpu *vcpu, unsigned int max)
 {
 	unsigned int old, val, grow, grow_start;
 
@@ -3385,8 +3383,8 @@ static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
 	if (val < grow_start)
 		val = grow_start;
 
-	if (val > vcpu->kvm->max_halt_poll_ns)
-		val = vcpu->kvm->max_halt_poll_ns;
+	if (val > max)
+		val = max;
 
 	vcpu->halt_poll_ns = val;
 out:
@@ -3501,10 +3499,17 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 {
 	bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
 	bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
+	unsigned int max_halt_poll_ns;
 	ktime_t start, cur, poll_end;
+	struct kvm *kvm = vcpu->kvm;
 	bool waited = false;
 	u64 halt_ns;
 
+	if (kvm->override_halt_poll_ns)
+		max_halt_poll_ns = kvm->max_halt_poll_ns;
+	else
+		max_halt_poll_ns = READ_ONCE(halt_poll_ns);
+
 	start = cur = poll_end = ktime_get();
 	if (do_halt_poll) {
 		ktime_t stop = ktime_add_ns(start, vcpu->halt_poll_ns);
@@ -3545,17 +3550,16 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 	if (halt_poll_allowed) {
 		if (!vcpu_valid_wakeup(vcpu)) {
 			shrink_halt_poll_ns(vcpu);
-		} else if (vcpu->kvm->max_halt_poll_ns) {
+		} else if (max_halt_poll_ns) {
 			if (halt_ns <= vcpu->halt_poll_ns)
 				;
 			/* we had a long block, shrink polling */
-			else if (vcpu->halt_poll_ns &&
-				 halt_ns > vcpu->kvm->max_halt_poll_ns)
+			else if (vcpu->halt_poll_ns && halt_ns > max_halt_poll_ns)
 				shrink_halt_poll_ns(vcpu);
 			/* we had a short halt and our poll time is too small */
-			else if (vcpu->halt_poll_ns < vcpu->kvm->max_halt_poll_ns &&
-				 halt_ns < vcpu->kvm->max_halt_poll_ns)
-				grow_halt_poll_ns(vcpu);
+			else if (vcpu->halt_poll_ns < max_halt_poll_ns &&
+				 halt_ns < max_halt_poll_ns)
+				grow_halt_poll_ns(vcpu, max_halt_poll_ns);
 		} else {
 			vcpu->halt_poll_ns = 0;
 		}
@@ -4588,6 +4592,7 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
 		if (cap->flags || cap->args[0] != (unsigned int)cap->args[0])
 			return -EINVAL;
 
+		kvm->override_halt_poll_ns = true;
 		kvm->max_halt_poll_ns = cap->args[0];
 		return 0;
 	}
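
With the above, a VM that never enables KVM_CAP_HALT_POLL re-reads the
module parameter on every halt, so e.g.

	echo 0 > /sys/module/kvm/parameters/halt_poll_ns

once again takes effect immediately, system-wide, for all such VMs.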

> 
> On 2021/9/28 1:33, Sean Christopherson wrote:
> > On Mon, Sep 27, 2021, Paolo Bonzini wrote:
> > > On Mon, Sep 27, 2021 at 5:17 PM Christian Borntraeger
> > > <borntraeger@de.ibm.com> wrote:
> > > > > So I think there are two possibilities that makes sense:
> > > > > 
> > > > > * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that
> > > > what about using halt_poll_ns for those VMs that did not uses KVM_CAP_HALT_POLL and the private number for those that did.
> > > Yes, that's what I meant.  David pointed out that doesn't allow you to
> > > disable halt polling altogether, but for that you can always ask each
> > > VM's userspace one by one, or just not use KVM_CAP_HALT_POLL. (Also, I
> > > don't know about Google's usecase, but mine was actually more about
> > > using KVM_CAP_HALT_POLL to *disable* halt polling on some VMs!).
> > I kinda like the idea if special-casing halt_poll_ns=0, e.g. for testing or
> > in-the-field mitigation if halt-polling is broken.  It'd be trivial to support, e.g.
> Do we have any plan to repost the diff as a fix?
> I would be very nice that this issue can be solved.
> 
> Besides, I think we may need some Doc for users to describe
> how halt_poll_ns works with KVM_CAP_HALT_POLL, like
> "Documentation/virt/guest-halt-polling.rst".
> > @@ -3304,19 +3304,23 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> >                  update_halt_poll_stats(vcpu, start, poll_end, !waited);
> > 
> >          if (halt_poll_allowed) {
> > +               max_halt_poll_ns = vcpu->kvm->max_halt_poll_ns;
> > +               if (!max_halt_poll_ns || !halt_poll_ns)  <------ squish the max if halt_poll_ns==0
> > +                       max_halt_poll_ns = halt_poll_ns;
> > +
> Does this mean that KVM_CAP_HALT_POLL will not be able to
> disable halt polling for a VM individually when halt_poll_ns !=0?
> >                  if (!vcpu_valid_wakeup(vcpu)) {
> >                          shrink_halt_poll_ns(vcpu);
> > -               } else if (vcpu->kvm->max_halt_poll_ns) {
> > +               } else if (max_halt_poll_ns) {
> >                          if (halt_ns <= vcpu->halt_poll_ns)
> >                                  ;
> >                          /* we had a long block, shrink polling */
> >                          else if (vcpu->halt_poll_ns &&
> > -                                halt_ns > vcpu->kvm->max_halt_poll_ns)
> > +                                halt_ns > max_halt_poll_ns)
> >                                  shrink_halt_poll_ns(vcpu);
> >                          /* we had a short halt and our poll time is too small */
> > -                       else if (vcpu->halt_poll_ns < vcpu->kvm->max_halt_poll_ns &&
> > -                                halt_ns < vcpu->kvm->max_halt_poll_ns)
> > -                               grow_halt_poll_ns(vcpu);
> > +                       else if (vcpu->halt_poll_ns < max_halt_poll_ns &&
> > +                                halt_ns < max_halt_poll_ns)
> > +                               grow_halt_poll_ns(vcpu, max_halt_poll_ns);
> >                  } else {
> >                          vcpu->halt_poll_ns = 0;
> >                  }
> Thanks,
> Yanan
wangyanan (Y) Nov. 18, 2022, 2:29 a.m. UTC | #12
On 2022/11/17 1:19, David Matlack wrote:
> On Tue, Nov 15, 2022 at 11:28:56AM +0800, wangyanan (Y) wrote:
>> Hi Sean, Paolo,
>>
>> I recently also notice the behavior change of param halt_poll_ns.
>> Now it loses the ability to:
>> 1) dynamically disable halt polling for all the running VMs
>> by `echo 0 > /sys`
>> 2) dynamically adjust the halt polling interval for all the
>> running VMs by `echo * > /sys`
>>
>> While in our cases, we usually use above two abilities, and
>> KVM_CAP_HALT_POLL is not used yet.
> I think the right path forward is to make KVM_CAP_HALT_POLL a pure
> override of halt_poll_ns, and restore the pre-existing behavior of
> halt_poll_ns whenever KVM_CAP_HALT_POLL is not used. e.g. see the patch
> below.
Agree with this.
kvm.halt_poll_ns serves as a legacy method to control halt polling
globally. Once KVM_CAP_HALT_POLL is used for a VM, it should hold
full responsibility for controlling that VM, including disabling
polling. This strategy helps keep the two mechanisms decoupled.
> That will fix issues (1) and (2) above for any VM not using
> KVM_CAP_HALT_POLL. If a VM is using KVM_CAP_HALT_POLL, it will ignore
> all changes to halt_poll_ns. If we truly need a mechanism for admins to
> disable halt-polling on VMs using KVM_CAP_HALT_POLL, we can introduce a
> separate module parameter for that. But IMO, any setup that is
> sophisticated enough to use KVM_CAP_HALT_POLL should also be able to use
> KVM_CAP_HALT_POLL to disable halt polling.
>
> If everyone is happy with this approach I can test and send a real patch
> to the mailing list.
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e6e66c5e56f2..253ad055b6ad 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -788,6 +788,7 @@ struct kvm {
>   	struct srcu_struct srcu;
>   	struct srcu_struct irq_srcu;
>   	pid_t userspace_pid;
> +	bool override_halt_poll_ns;
>   	unsigned int max_halt_poll_ns;
>   	u32 dirty_ring_size;
>   	bool vm_bugged;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 43bbe4fde078..479d0d0da0b5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1198,8 +1198,6 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
>   			goto out_err_no_arch_destroy_vm;
>   	}
>   
> -	kvm->max_halt_poll_ns = halt_poll_ns;
> -
>   	r = kvm_arch_init_vm(kvm, type);
>   	if (r)
>   		goto out_err_no_arch_destroy_vm;
> @@ -3371,7 +3369,7 @@ void kvm_sigset_deactivate(struct kvm_vcpu *vcpu)
>   	sigemptyset(&current->real_blocked);
>   }
>   
> -static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
> +static void grow_halt_poll_ns(struct kvm_vcpu *vcpu, unsigned int max)
>   {
>   	unsigned int old, val, grow, grow_start;
>   
> @@ -3385,8 +3383,8 @@ static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
>   	if (val < grow_start)
>   		val = grow_start;
>   
> -	if (val > vcpu->kvm->max_halt_poll_ns)
> -		val = vcpu->kvm->max_halt_poll_ns;
> +	if (val > max)
> +		val = max;
>   
>   	vcpu->halt_poll_ns = val;
>   out:
> @@ -3501,10 +3499,17 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>   {
>   	bool halt_poll_allowed = !kvm_arch_no_poll(vcpu);
>   	bool do_halt_poll = halt_poll_allowed && vcpu->halt_poll_ns;
> +	unsigned int max_halt_poll_ns;
>   	ktime_t start, cur, poll_end;
> +	struct kvm *kvm = vcpu->kvm;
>   	bool waited = false;
>   	u64 halt_ns;
>   
> +	if (kvm->override_halt_poll_ns)
> +		max_halt_poll_ns = kvm->max_halt_poll_ns;
> +	else
> +		max_halt_poll_ns = READ_ONCE(halt_poll_ns);
> +
>   	start = cur = poll_end = ktime_get();
>   	if (do_halt_poll) {
>   		ktime_t stop = ktime_add_ns(start, vcpu->halt_poll_ns);
> @@ -3545,17 +3550,16 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>   	if (halt_poll_allowed) {
>   		if (!vcpu_valid_wakeup(vcpu)) {
>   			shrink_halt_poll_ns(vcpu);
> -		} else if (vcpu->kvm->max_halt_poll_ns) {
> +		} else if (max_halt_poll_ns) {
>   			if (halt_ns <= vcpu->halt_poll_ns)
>   				;
>   			/* we had a long block, shrink polling */
> -			else if (vcpu->halt_poll_ns &&
> -				 halt_ns > vcpu->kvm->max_halt_poll_ns)
> +			else if (vcpu->halt_poll_ns && halt_ns > max_halt_poll_ns)
>   				shrink_halt_poll_ns(vcpu);
>   			/* we had a short halt and our poll time is too small */
> -			else if (vcpu->halt_poll_ns < vcpu->kvm->max_halt_poll_ns &&
> -				 halt_ns < vcpu->kvm->max_halt_poll_ns)
> -				grow_halt_poll_ns(vcpu);
> +			else if (vcpu->halt_poll_ns < max_halt_poll_ns &&
> +				 halt_ns < max_halt_poll_ns)
> +				grow_halt_poll_ns(vcpu, max_halt_poll_ns);
>   		} else {
>   			vcpu->halt_poll_ns = 0;
>   		}
> @@ -4588,6 +4592,7 @@ static int kvm_vm_ioctl_enable_cap_generic(struct kvm *kvm,
>   		if (cap->flags || cap->args[0] != (unsigned int)cap->args[0])
>   			return -EINVAL;
>   
> +		kvm->override_halt_poll_ns = true;
>   		kvm->max_halt_poll_ns = cap->args[0];
>   		return 0;
>   	}
Looks sensible to me overall.
I will look at the RFC series, thanks for your quick response.

Yanan
>> On 2021/9/28 1:33, Sean Christopherson wrote:
>>> On Mon, Sep 27, 2021, Paolo Bonzini wrote:
>>>> On Mon, Sep 27, 2021 at 5:17 PM Christian Borntraeger
>>>> <borntraeger@de.ibm.com> wrote:
>>>>>> So I think there are two possibilities that makes sense:
>>>>>>
>>>>>> * track what is using KVM_CAP_HALT_POLL, and make writes to halt_poll_ns follow that
>>>>> what about using halt_poll_ns for those VMs that did not uses KVM_CAP_HALT_POLL and the private number for those that did.
>>>> Yes, that's what I meant.  David pointed out that doesn't allow you to
>>>> disable halt polling altogether, but for that you can always ask each
>>>> VM's userspace one by one, or just not use KVM_CAP_HALT_POLL. (Also, I
>>>> don't know about Google's usecase, but mine was actually more about
>>>> using KVM_CAP_HALT_POLL to *disable* halt polling on some VMs!).
>>> I kinda like the idea if special-casing halt_poll_ns=0, e.g. for testing or
>>> in-the-field mitigation if halt-polling is broken.  It'd be trivial to support, e.g.
>> Do we have any plan to repost the diff as a fix?
>> I would be very nice that this issue can be solved.
>>
>> Besides, I think we may need some Doc for users to describe
>> how halt_poll_ns works with KVM_CAP_HALT_POLL, like
>> "Documentation/virt/guest-halt-polling.rst".
>>> @@ -3304,19 +3304,23 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>>>                   update_halt_poll_stats(vcpu, start, poll_end, !waited);
>>>
>>>           if (halt_poll_allowed) {
>>> +               max_halt_poll_ns = vcpu->kvm->max_halt_poll_ns;
>>> +               if (!max_halt_poll_ns || !halt_poll_ns)  <------ squish the max if halt_poll_ns==0
>>> +                       max_halt_poll_ns = halt_poll_ns;
>>> +
>> Does this mean that KVM_CAP_HALT_POLL will not be able to
>> disable halt polling for a VM individually when halt_poll_ns !=0?
>>>                   if (!vcpu_valid_wakeup(vcpu)) {
>>>                           shrink_halt_poll_ns(vcpu);
>>> -               } else if (vcpu->kvm->max_halt_poll_ns) {
>>> +               } else if (max_halt_poll_ns) {
>>>                           if (halt_ns <= vcpu->halt_poll_ns)
>>>                                   ;
>>>                           /* we had a long block, shrink polling */
>>>                           else if (vcpu->halt_poll_ns &&
>>> -                                halt_ns > vcpu->kvm->max_halt_poll_ns)
>>> +                                halt_ns > max_halt_poll_ns)
>>>                                   shrink_halt_poll_ns(vcpu);
>>>                           /* we had a short halt and our poll time is too small */
>>> -                       else if (vcpu->halt_poll_ns < vcpu->kvm->max_halt_poll_ns &&
>>> -                                halt_ns < vcpu->kvm->max_halt_poll_ns)
>>> -                               grow_halt_poll_ns(vcpu);
>>> +                       else if (vcpu->halt_poll_ns < max_halt_poll_ns &&
>>> +                                halt_ns < max_halt_poll_ns)
>>> +                               grow_halt_poll_ns(vcpu, max_halt_poll_ns);
>>>                   } else {
>>>                           vcpu->halt_poll_ns = 0;
>>>                   }
>> Thanks,
>> Yanan