[qemu-web] Add a blog post on "Micro-Optimizing KVM VM-Exits"

Message ID 20191108092247.16207-1-kchamart@redhat.com
State New
Series [qemu-web] Add a blog post on "Micro-Optimizing KVM VM-Exits"

Commit Message

Kashyap Chamarthy Nov. 8, 2019, 9:22 a.m. UTC
This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1],
given by Andrea Arcangeli at the recently concluded KVM Forum 2019.

[1] https://kvmforum2019.sched.com/event/Tmwr/micro-optimizing-kvm-vm-exits-andrea-arcangeli-red-hat-inc

Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
---
 ...019-11-06-micro-optimizing-kvm-vmexits.txt | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 _posts/2019-11-06-micro-optimizing-kvm-vmexits.txt

Comments

Kashyap Chamarthy Nov. 12, 2019, 9:42 a.m. UTC | #1
[Cc: Rich Jones, addressing his feedback on IRC, below.]

On Fri, Nov 08, 2019 at 10:22:47AM +0100, Kashyap Chamarthy wrote:
> This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1],
> given by Andrea Arcangeli at the recently concluded KVM Forum 2019.
> 
> [1] https://kvmforum2019.sched.com/event/Tmwr/micro-optimizing-kvm-vm-exits-andrea-arcangeli-red-hat-inc
> 
> Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
> ---

[...]

> +The microbechmark: CPUID in a one million loop
> +----------------------------------------------
> +
> +The synthetic microbenchmark (meaning, focus on measuring the
> +performance of a specific area of code) Andrea used was to run the CPUID
> +instruction one million times, without any GCC optimizations or caching.
> +This was done to test the latency of VM-Exits.

I can send a v2 (but will wait for any other feedback); alternatively,
whoever applies this could please replace the above paragraph with the
following:

    "Andrea constructed a synthetic microbenchmark program (without any
    GCC optimizations or caching) which runs the CPUID instruction one
    million times in a loop.  This microbenchmark is meant to focus on
    measuring the performance of a specific area of the code -- in this
    case, to test the latency of VM-Exits."

(Rich, hope that reads better.  Thanks for the review.)
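
For the curious, here is a rough sketch of what such a CPUID-in-a-loop
microbenchmark could look like (purely illustrative -- this is *not*
Andrea's actual test program; build with -O0 so the loop isn't
optimized away, and run it inside a KVM guest, where every CPUID causes
a VM-Exit):

    /* cpuid-loop.c: time one million CPUID executions.
     * Build: gcc -O0 -o cpuid-loop cpuid-loop.c */
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static inline void cpuid(uint32_t leaf)
    {
        uint32_t eax, ebx, ecx, edx;
        __asm__ volatile("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "a"(leaf), "c"(0));
    }

    int main(void)
    {
        const long iterations = 1000000;
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < iterations; i++)
            cpuid(0);
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9 +
                    (end.tv_nsec - start.tv_nsec);
        printf("%.1f ns per CPUID on average\n", ns / iterations);
        return 0;
    }

On bare metal the per-iteration cost is tiny; inside a guest it is
dominated by the VM-Exit/VM-Enter round trip, which is exactly what the
talk measures.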

> +While stressing that the results of these microbenchmarks do not
> +represent real-world workloads, he had two goals in mind with it: (a)
> +explain how the software mitigation works; and (b) to justify to the
> +broader community the value of the software optimizations he's working
> +on in KVM.
> +
> +Andrea then reasoned through several interesting graphs that show how
> +CPU computation time gets impacted when you disable or enable the
> +various kernel-space mitigations for Spectre v2, L1TF, MDS, et al.

[...]
Stefan Hajnoczi Nov. 12, 2019, 10:37 a.m. UTC | #2
On Fri, Nov 08, 2019 at 10:22:47AM +0100, Kashyap Chamarthy wrote:
> +The proposal: "KVM Monolithic"
> +------------------------------
> +
> +Based on his investigation, Andrea proposed a patch series, ["KVM
> +monolithc"](https://lwn.net/Articles/800870/), to get rid of the KVM

s/monolithc/monolithic/

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Thomas Huth Nov. 15, 2019, 12:08 p.m. UTC | #3
On 08/11/2019 10.22, Kashyap Chamarthy wrote:
> This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1],
> given by Andrea Arcangeli at the recently concluded KVM Forum 2019.
> 

 Hi Kashyap,

first thanks for writing up this article! It's a really nice summary of
the presentation, I think.

But before we include it, let me ask a meta-question: Is an article
about the KVM *kernel* code suitable for the *QEMU* blog? Or is there
maybe a better place for this, like an article on www.linux-kvm.org ?

Opinions? Ideas?

 Thomas


[...]
Paolo Bonzini Nov. 15, 2019, 12:18 p.m. UTC | #4
On 15/11/19 13:08, Thomas Huth wrote:
> On 08/11/2019 10.22, Kashyap Chamarthy wrote:
>> This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1],
>> given by Andrea Arcangeli at the recently concluded KVM Forum 2019.
>>
> 
>  Hi Kashyap,
> 
> first thanks for writing up this article! It's a really nice summary of
> the presentation, I think.
> 
> But before we include it, let me ask a meta-question: Is an article
> about the KVM *kernel* code suitable for the *QEMU* blog? Or is there
> maybe a better place for this, like an article on www.linux-kvm.org ?

I'm not sure there is such a thing as articles on www.linux-kvm.org. :)

I have the same doubt, actually.  Unfortunately I cannot think of
another place that would host KVM-specific articles.

Paolo

[...]
Alex Bennée Nov. 15, 2019, 12:25 p.m. UTC | #5
Thomas Huth <thuth@redhat.com> writes:

> On 08/11/2019 10.22, Kashyap Chamarthy wrote:
>> This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1],
>> given by Andrea Arcangeli at the recently concluded KVM Forum 2019.
>>
>
>  Hi Kashyap,
>
> first thanks for writing up this article! It's a really nice summary of
> the presentation, I think.
>
> But before we include it, let me ask a meta-question: Is an article
> about the KVM *kernel* code suitable for the *QEMU* blog? Or is there
> maybe a better place for this, like an article on www.linux-kvm.org ?
>
> Opinions? Ideas?

I don't think it is a particular problem to host it on the QEMU blog,
given the closeness of the two projects. It would get syndicated to
planet.libvirt as well ;-)

[...]


--
Alex Bennée
Daniel P. Berrangé Nov. 15, 2019, 12:33 p.m. UTC | #6
On Fri, Nov 15, 2019 at 01:08:53PM +0100, Thomas Huth wrote:
> On 08/11/2019 10.22, Kashyap Chamarthy wrote:
> > This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1],
> > given by Andrea Arcangeli at the recently concluded KVM Forum 2019.
> > 
> 
>  Hi Kashyap,
> 
> first thanks for writing up this article! It's a really nice summary of
> the presentation, I think.
> 
> But before we include it, let me ask a meta-question: Is an article
> about the KVM *kernel* code suitable for the *QEMU* blog? Or is there
> maybe a better place for this, like an article on www.linux-kvm.org ?
> 
> Opinions? Ideas?

I don't see a problem with this.  KVM and QEMU developers work
very closely together, and many users of QEMU care about the whole
stack, so KVM is on-topic IMHO.


Regards,
Daniel
Kashyap Chamarthy Nov. 15, 2019, 12:37 p.m. UTC | #7
On Fri, Nov 15, 2019 at 01:08:53PM +0100, Thomas Huth wrote:
> On 08/11/2019 10.22, Kashyap Chamarthy wrote:
> > This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1],
> > given by Andrea Arcangeli at the recently concluded KVM Forum 2019.
> > 
> 
>  Hi Kashyap,
> 
> first thanks for writing up this article! It's a really nice summary of
> the presentation, I think.

Hi Thomas,

Thanks!

> But before we include it, let me ask a meta-question: Is an article
> about the KVM *kernel* code suitable for the *QEMU* blog? 

I had the same thought, and expressed it to Stefan as such, when he
suggested qemu.org :-).  I too found it odd to have a kernel-heavy
article on qemu.org.

> Or is there
> maybe a better place for this, like an article on www.linux-kvm.org ?

I thought about it, but I've never seen anyone write an "article"
there, as it's a wiki.  And, like Paolo, I couldn't think of a better
place either.

FWIW, the qemu.org blog is indexed by a few blog "planet" aggregators,
while linux-kvm.org is largely a static site that is only occasionally
updated, when people happen to notice something (especially if it's
egregiously wrong).

> Opinions? Ideas?

Another _potential_ venue: Given the topic is kernel space-related, it
is likely to fit in with the LWN audience.  LWN itself says they
generally look for kernel-related articles.  I'm aware, though, that
there are already a few LWN articles being written on KVM Forum-based
talks.  (Perhaps once the "KVM Monolithic" patch series merges, this can
be reworked into a standalone LWN kernel article — assuming LWN is
amenable to it; need to check with LWN.)

[...]
Paolo Bonzini Nov. 15, 2019, 12:41 p.m. UTC | #8
On 15/11/19 13:37, Kashyap Chamarthy wrote:
>> Opinions? Ideas?
> Another _potential_ venue: Given the topic is kernel space-related, it
> is likely to fit in with the LWN audience.  LWN itself says they
> generally look for kernel-related articles.  I'm aware, though, that
> there are already a few LWN articles being written on KVM Forum-based
> talks.  (Perhaps once the "KVM Monolithic" patch series merges, this can
> be reworked into a standalone LWN kernel article — assuming LWN is
> amenable to it; need to check with LWN.)

Yeah, perhaps later.  For now I guess qemu.org is the best.

Paolo
Laszlo Ersek Nov. 15, 2019, 12:45 p.m. UTC | #9
On 11/08/19 10:22, Kashyap Chamarthy wrote:
> This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1],
> given by Andrea Arcangeli at the recently concluded KVM Forum 2019.
> 
> [1] https://kvmforum2019.sched.com/event/Tmwr/micro-optimizing-kvm-vm-exits-andrea-arcangeli-red-hat-inc
> 
> Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
> ---
>  ...019-11-06-micro-optimizing-kvm-vmexits.txt | 115 ++++++++++++++++++
>  1 file changed, 115 insertions(+)
>  create mode 100644 _posts/2019-11-06-micro-optimizing-kvm-vmexits.txt
> 
> diff --git a/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt
> new file mode 100644
> index 0000000000000000000000000000000000000000..f4a28d58ddb40103dd599fdfd861eeb4c41ed976
> --- /dev/null
> +++ b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt
> @@ -0,0 +1,115 @@
> +---
> +layout: post
> +title: "Micro-Optimizing KVM VM-Exits"
> +date:   2019-11-08
> +categories: [kvm, optimization]
> +---
> +
> +Background on VM-Exits
> +----------------------
> +
> +KVM (Kernel-based Virtual Machine) is the Linux kernel module that
> +allows a host to run virtualized guests (Linux, Windows, etc).  The KVM
> +"guest execution loop", with QEMU (the open source emulator and
> +virtualizer) as its user space, is roughly as follows: QEMU issues the
> +ioctl(), KVM_RUN, to tell KVM to prepare to enter the CPU's "Guest Mode"
> +-- a special processor mode which allows guest code to safely run
> +directly on the physical CPU.  The guest code, which is inside a "jail"
> +and thus cannot interfere with the rest of the system, keeps running on
> +the hardware until it encounters a request it cannot handle.  Then the
> +processor gives the control back (referred to as "VM-Exit") either to
> +kernel space, or to the user space to handle the request.  Once the
> +request is handled, native execution of guest code on the processor
> +resumes again.  And the loop goes on.
> +
> +There are dozens of reasons for VM-Exits (Intel's Software Developer
> +Manual outlines 64 "Basic Exit Reasons").  For example, when a guest
> +needs to emulate the CPUID instruction, it causes a "light-weight exit"
> +to kernel space, because CPUID (among a few others) is emulated in the
> +kernel itself, for performance reasons.  But when the kernel _cannot_
> +handle a request, e.g. to emulate certain hardware, it results in a
> +"heavy-weight exit" to QEMU, to perform the emulation.  These VM-Exits
> +and subsequent re-entries ("VM-Enters"), even the light-weight ones, can
> +be expensive.  What can be done about it?
> +
> +Guest workloads that are hard to virtualize
> +-------------------------------------------
> +
> +At the 2019 edition of the KVM Forum in Lyon, kernel developer, Andrea
> +Arcangeli, attempted to address the kernel part of minimizing VM-Exits.

I'd suggest "addressed", not "attempted to address".

> +
> +His talk touched on the cost of VM-Exits into the kernel, especially for
> +guest workloads (e.g. enterprise databases) that are sensitive to their
> +performance penalty.  However, these workloads cannot avoid triggering
> +VM-Exits with a high frequency.  Andrea then outlined some of the
> +optimizations he's been working on to improve the VM-Exit performance in
> +the KVM code path -- especially in light of applying mitigations for
> +speculative execution flaws (Spectre v2, MDS, L1TF).
> +
> +Andrea gave a brief recap of the different kinds of speculative
> +execution attacks (retpolines, IBPB, PTI, SSBD, etc).  Followed by that
> +he outlined the performance impact of Spectre-v2 mitigations in context
> +of KVM.
> +
> +The microbechmark: CPUID in a one million loop
> +----------------------------------------------
> +
> +The synthetic microbenchmark (meaning, focus on measuring the
> +performance of a specific area of code) Andrea used was to run the CPUID
> +instruction one million times, without any GCC optimizations or caching.
> +This was done to test the latency of VM-Exits.
> +
> +While stressing that the results of these microbenchmarks do not
> +represent real-world workloads, he had two goals in mind with it: (a)
> +explain how the software mitigation works; and (b) to justify to the
> +broader community the value of the software optimizations he's working
> +on in KVM.
> +
> +Andrea then reasoned through several interesting graphs that show how
> +CPU computation time gets impacted when you disable or enable the
> +various kernel-space mitigations for Spectre v2, L1TF, MDS, et al.
> +
> +The proposal: "KVM Monolithic"
> +------------------------------
> +
> +Based on his investigation, Andrea proposed a patch series, ["KVM
> +monolithc"](https://lwn.net/Articles/800870/), to get rid of the KVM
> +common module, 'kvm.ko'.  Instead the KVM common code gets linked twice
> +into each of the vendor-specific KVM modules, 'kvm-intel.ko' and
> +'kvm-amd.ko'.
> +
> +The reason for doing this is that the 'kvm.ko' module indirectly calls
> +(via the "retpoline" technique) the vendor-specific KVM modules at every
> +VM-Exit, several times.  These indirect calls were not optimal before,
> +but the "retpoline" mitigation (which isolates indirect branches, that
> +allow a CPU to execute code from arbitrary locations, from speculative
> +execution) for Spectre v2 compounds the problem, as it degrades
> +performance.
> +
> +This approach will result in a few MiB of increased disk space for
> +'kvm-intel.ko' and 'kvm-amd.ko', but the upside in saved indirect calls,
> +and the elimination of "retpoline" overhead at run-time more than
> +compensate for it.
> +
> +With the "KVM Monolithic" patch series applied, Andrea's microbenchmarks
> +show a double-digit improvement in performance with default mitigations
> +(for Spectre v2, et al) enabled on both Intel 'VMX' and AMD 'SVM'.  And
> +with 'spectre_v2=off' or for CPUs with IBRS_ALL in ARCH_CAPABILITIES
> +"KVM monolithic" still improve[s] performance, albiet it's on the order
> +of 1%.
> +
> +Conclusion
> +----------
> +
> +Removal of the common KVM module has a non-negligible positive
> +performance impact.  And the "KVM Monolitic" patch series is still
> +actively being reviewed, modulo some pending clean-ups.  Based on the
> +upstream review discussion, KVM Maintainer, Paolo Bonzini, and other
> +reviewers seemed amenable to merge the series.
> +
> +Although, we still have to deal with mitigations for 'indirect branch
> +prediction' for a long time, reducing the VM-Exit latency is important
> +in general; and more specifically, for guest workloads that happen to
> +trigger frequent VM-Exits, without having to disable Spectre v2
> +mitigations on the host, as Andrea stated in the cover letter of his
> +patch series.
> 

This article refers to "indirect calls" and "indirect branches" quite a
few times.

I suggest mentioning "function pointers" at least once...

(AIUI, the core of the issue is that kvm.ko calls kvm-intel.ko and
kvm-amd.ko through function pointers. Such calls are the target of
malicious branch predictor mis-training, and therefore, as a
counter-measure, they are compiled into retpolines, rather than the
directly corresponding indirect call assembly instructions. But
retpolines run slowly, in comparison. Calling the functions in question
by name, in the C source code, rather than via function pointers,
eliminates the indirect call assembly instructions, and obviates the
need for retpolines. The resultant C source code is less abstract and
less dynamic at runtime, but the original indirection isn't inherently
necessary at runtime.)

I couldn't attend Andrea's presentation, nor have I seen the slides, or
a recording thereof, or the patchset; so I could easily be off. My point
is, *if* the expression "function pointers" applies in this context,
please do mention it; otherwise "indirect calls" just hangs in the air,
IMHO.

It might be as simple as replacing

  These indirect calls were not optimal before,

with

  These indirect calls -- via function pointers in the C source code --
  were not optimal before,

Thanks!
Laszlo
Kashyap Chamarthy Nov. 15, 2019, 3:19 p.m. UTC | #10
On Fri, Nov 15, 2019 at 01:41:01PM +0100, Paolo Bonzini wrote:
> On 15/11/19 13:37, Kashyap Chamarthy wrote:
> >> Opinions? Ideas?
> > Another _potential_ venue: Given the topic is kernel space-related, it
> > is likely to fit in with the LWN audience.  LWN itself says they
> > generally look for kernel-related articles.  Although, I'm aware that
> > there's already a few LWN articles being written on KVM Forum-based
> > talks.  (Perhaps once the "KVM Monolithic" patch series merges, this can
> > be reworked into a standalone LWN kernel article — assuming LWN is
> > amenable to it; need to check with LWN.)
> 
> Yeah, perhaps later.  For now I guess qemu.org is the best.

Sure; others also seem to agree it's okay to be on qemu.org.
Kashyap Chamarthy Nov. 15, 2019, 3:27 p.m. UTC | #11
On Fri, Nov 15, 2019 at 01:45:51PM +0100, Laszlo Ersek wrote:
> On 11/08/19 10:22, Kashyap Chamarthy wrote:

[...]

> > +Guest workloads that are hard to virtualize
> > +-------------------------------------------
> > +
> > +At the 2019 edition of the KVM Forum in Lyon, kernel developer, Andrea
> > +Arcangeli, attempted to address the kernel part of minimizing VM-Exits.
> 
> I'd suggest "addressed", not "attempted to address".

Will fix in next iteration.

[...]

> > +Conclusion
> > +----------

[...]

> > +Although, we still have to deal with mitigations for 'indirect branch
> > +prediction' for a long time, reducing the VM-Exit latency is important
> > +in general; and more specifically, for guest workloads that happen to
> > +trigger frequent VM-Exits, without having to disable Spectre v2
> > +mitigations on the host, as Andrea stated in the cover letter of his
> > +patch series.
> > 
> 
> This article refers to "indirect calls" and "indirect branches" quite a
> few times.
> 
> I suggest mentioning "function pointers" at least once...
> 
> (AIUI, the core of the issue is that kvm.ko calls kvm-intel.ko and
> kvm-amd.ko through function pointers. Such calls are the target of
> malicious branch predictor mis-training, and therefore, as a
> counter-measure, they are compiled into retpolines, rather than the
> directly corresponding indirect call assembly instructions. But
> retpolines run slowly, in comparison. Calling the functions in question
> by name, in the C source code, rather than via function pointers,
> eliminates the indirect call assembly instructions, and obviates the
> need for retpolines. The resultant C source code is less abstract and
> less dynamic at runtime, but the original indirection isn't inherently
> necessary at runtime.)
> 
> I couldn't attend Andrea's presentation, nor have I seen the slides, or
> a recording thereof, or the patchset; so I could easily be off. 

I think your explanation above is indeed correct (I couldn't have
articulated it so well; thanks!), based on my own understanding and on
reading Andrea's patch[*] and its commit message:

    "This [patch] replaces all kvm_x86_ops pointer to functions with
    regular external functions that don't require indirect calls.

    "[...] The pointer to function virtual template model cannot provide
    any runtime benefit because kvm-intel and kvm-amd can't be loaded at
    the same time. [...]"


[*] https://lkml.org/lkml/2019/9/20/932 --  [PATCH 02/17] KVM:
    monolithic: x86: convert the kvm_x86_ops methods to external functions
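
To make the "indirect calls via function pointers" point a bit more
concrete, here is a toy sketch of the before/after shape of the code
(hypothetical names and types -- not the actual kvm_x86_ops layout):

    #include <stdio.h>

    /* Stand-in for a hook implemented by kvm-intel.ko / kvm-amd.ko. */
    static void vendor_handle_exit(void)
    {
        puts("VM-Exit handled");
    }

    /* Before: the common code dispatches through a table of function
     * pointers, so every call site is an indirect branch -- which the
     * Spectre v2 mitigations turn into a (slow) retpoline. */
    struct vendor_ops {
        void (*handle_exit)(void);
    };

    static const struct vendor_ops vmx_ops = { .handle_exit = vendor_handle_exit };
    static const struct vendor_ops *ops = &vmx_ops;  /* chosen at module load */

    static void handle_exit_indirect(void)
    {
        ops->handle_exit();        /* indirect call -> retpoline */
    }

    /* After ("KVM monolithic"): the common code is linked into the
     * vendor module, so it can simply call the function by name. */
    static void handle_exit_direct(void)
    {
        vendor_handle_exit();      /* direct call, no retpoline needed */
    }

    int main(void)
    {
        handle_exit_indirect();
        handle_exit_direct();
        return 0;
    }

The real series obviously touches many more hooks than this, but the
effect at each call site is the same: the indirect branch (and with it
the retpoline) goes away.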

> My point is, *if* the expression "function pointers" applies in this
> context, please do mention it; otherwise "indirect calls" just hangs
> in the air, IMHO.
> 
> It might be as simple as replacing
> 
>   These indirect calls were not optimal before,
> 
> with
> 
>   These indirect calls -- via function pointers in the C source code
>   -- were not optimal before,

Will fix; thanks for the thorough review.

If you want to read Andrea's slides, here they are:

    https://static.sched.com/hosted_files/kvmforum2019/3b/kvm-monolithic.pdf

Thanks for the review!

Patch

diff --git a/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f4a28d58ddb40103dd599fdfd861eeb4c41ed976
--- /dev/null
+++ b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt
@@ -0,0 +1,115 @@ 
+---
+layout: post
+title: "Micro-Optimizing KVM VM-Exits"
+date:   2019-11-08
+categories: [kvm, optimization]
+---
+
+Background on VM-Exits
+----------------------
+
+KVM (Kernel-based Virtual Machine) is the Linux kernel module that
+allows a host to run virtualized guests (Linux, Windows, etc).  The KVM
+"guest execution loop", with QEMU (the open source emulator and
+virtualizer) as its user space, is roughly as follows: QEMU issues the
+ioctl(), KVM_RUN, to tell KVM to prepare to enter the CPU's "Guest Mode"
+-- a special processor mode which allows guest code to safely run
+directly on the physical CPU.  The guest code, which is inside a "jail"
+and thus cannot interfere with the rest of the system, keeps running on
+the hardware until it encounters a request it cannot handle.  Then the
+processor gives the control back (referred to as "VM-Exit") either to
+kernel space, or to the user space to handle the request.  Once the
+request is handled, native execution of guest code on the processor
+resumes again.  And the loop goes on.
+
+There are dozens of reasons for VM-Exits (Intel's Software Developer
+Manual outlines 64 "Basic Exit Reasons").  For example, when a guest
+needs to emulate the CPUID instruction, it causes a "light-weight exit"
+to kernel space, because CPUID (among a few others) is emulated in the
+kernel itself, for performance reasons.  But when the kernel _cannot_
+handle a request, e.g. to emulate certain hardware, it results in a
+"heavy-weight exit" to QEMU, to perform the emulation.  These VM-Exits
+and subsequent re-entries ("VM-Enters"), even the light-weight ones, can
+be expensive.  What can be done about it?
+
+Guest workloads that are hard to virtualize
+-------------------------------------------
+
+At the 2019 edition of the KVM Forum in Lyon, kernel developer, Andrea
+Arcangeli, attempted to address the kernel part of minimizing VM-Exits.
+
+His talk touched on the cost of VM-Exits into the kernel, especially for
+guest workloads (e.g. enterprise databases) that are sensitive to their
+performance penalty.  However, these workloads cannot avoid triggering
+VM-Exits with a high frequency.  Andrea then outlined some of the
+optimizations he's been working on to improve the VM-Exit performance in
+the KVM code path -- especially in light of applying mitigations for
+speculative execution flaws (Spectre v2, MDS, L1TF).
+
+Andrea gave a brief recap of the different kinds of speculative
+execution attacks (retpolines, IBPB, PTI, SSBD, etc).  Followed by that
+he outlined the performance impact of Spectre-v2 mitigations in context
+of KVM.
+
+The microbechmark: CPUID in a one million loop
+----------------------------------------------
+
+The synthetic microbenchmark (meaning, focus on measuring the
+performance of a specific area of code) Andrea used was to run the CPUID
+instruction one million times, without any GCC optimizations or caching.
+This was done to test the latency of VM-Exits.
+
+While stressing that the results of these microbenchmarks do not
+represent real-world workloads, he had two goals in mind with it: (a)
+explain how the software mitigation works; and (b) to justify to the
+broader community the value of the software optimizations he's working
+on in KVM.
+
+Andrea then reasoned through several interesting graphs that show how
+CPU computation time gets impacted when you disable or enable the
+various kernel-space mitigations for Spectre v2, L1TF, MDS, et al.
+
+The proposal: "KVM Monolithic"
+------------------------------
+
+Based on his investigation, Andrea proposed a patch series, ["KVM
+monolithc"](https://lwn.net/Articles/800870/), to get rid of the KVM
+common module, 'kvm.ko'.  Instead the KVM common code gets linked twice
+into each of the vendor-specific KVM modules, 'kvm-intel.ko' and
+'kvm-amd.ko'.
+
+The reason for doing this is that the 'kvm.ko' module indirectly calls
+(via the "retpoline" technique) the vendor-specific KVM modules at every
+VM-Exit, several times.  These indirect calls were not optimal before,
+but the "retpoline" mitigation (which isolates indirect branches, that
+allow a CPU to execute code from arbitrary locations, from speculative
+execution) for Spectre v2 compounds the problem, as it degrades
+performance.
+
+This approach will result in a few MiB of increased disk space for
+'kvm-intel.ko' and 'kvm-amd.ko', but the upside in saved indirect calls,
+and the elimination of "retpoline" overhead at run-time more than
+compensate for it.
+
+With the "KVM Monolithic" patch series applied, Andrea's microbenchmarks
+show a double-digit improvement in performance with default mitigations
+(for Spectre v2, et al) enabled on both Intel 'VMX' and AMD 'SVM'.  And
+with 'spectre_v2=off' or for CPUs with IBRS_ALL in ARCH_CAPABILITIES
+"KVM monolithic" still improve[s] performance, albiet it's on the order
+of 1%.
+
+Conclusion
+----------
+
+Removal of the common KVM module has a non-negligible positive
+performance impact.  And the "KVM Monolitic" patch series is still
+actively being reviewed, modulo some pending clean-ups.  Based on the
+upstream review discussion, KVM Maintainer, Paolo Bonzini, and other
+reviewers seemed amenable to merge the series.
+
+Although, we still have to deal with mitigations for 'indirect branch
+prediction' for a long time, reducing the VM-Exit latency is important
+in general; and more specifically, for guest workloads that happen to
+trigger frequent VM-Exits, without having to disable Spectre v2
+mitigations on the host, as Andrea stated in the cover letter of his
+patch series.
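
For readers who would like to see the "guest execution loop" described
in the post above in code form, here is a heavily simplified sketch of
the userspace side of the KVM API (Linux/x86 only; all error handling
omitted; purely illustrative -- this is not how QEMU is structured):

    #include <fcntl.h>
    #include <linux/kvm.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        int vm  = ioctl(kvm, KVM_CREATE_VM, 0UL);

        /* 4 KiB of guest memory holding two real-mode instructions:
         * "out 0x10, al" (forces a heavy-weight exit to userspace)
         * followed by "hlt". */
        const uint8_t code[] = { 0xe6, 0x10, 0xf4 };
        void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        memcpy(mem, code, sizeof(code));

        struct kvm_userspace_memory_region region = {
            .slot = 0,
            .guest_phys_addr = 0x1000,
            .memory_size = 0x1000,
            .userspace_addr = (uintptr_t)mem,
        };
        ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0UL);
        size_t run_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL);
        struct kvm_run *run = mmap(NULL, run_size, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, vcpu, 0);

        struct kvm_sregs sregs;
        ioctl(vcpu, KVM_GET_SREGS, &sregs);
        sregs.cs.base = 0;
        sregs.cs.selector = 0;
        ioctl(vcpu, KVM_SET_SREGS, &sregs);

        struct kvm_regs regs = { .rip = 0x1000, .rflags = 0x2 };
        ioctl(vcpu, KVM_SET_REGS, &regs);

        /* The guest execution loop: enter guest mode, let the guest run
         * until a VM-Exit that needs userspace, handle it, re-enter. */
        for (;;) {
            ioctl(vcpu, KVM_RUN, 0);
            switch (run->exit_reason) {
            case KVM_EXIT_IO:   /* heavy-weight exit: emulate the port write */
                printf("guest wrote to port 0x%x\n", run->io.port);
                break;
            case KVM_EXIT_HLT:  /* guest executed HLT: we are done */
                puts("guest halted");
                return 0;
            default:
                printf("unhandled exit reason %u\n", run->exit_reason);
                return 1;
            }
        }
    }

Note that the KVM_RUN ioctl only returns to userspace for the
"heavy-weight" exits; the light-weight ones (such as CPUID) are handled
inside the kernel without ever leaving KVM_RUN.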