Patchwork [19/23] KVM: PPC: Book3S: Select PR vs HV separately for each guest

login
register
mail settings
Submitter Paul Mackerras
Date Aug. 6, 2013, 4:26 a.m.
Message ID <20130806042628.GY19254@iris.ozlabs.ibm.com>
Download mbox | patch
Permalink /patch/264863/
State New
Headers show

Comments

Paul Mackerras - Aug. 6, 2013, 4:26 a.m.
This makes it possible to have both PR and HV guests running
concurrently on one machine, by deferring the decision about which type
of KVM to use for each guest until it either enables the PAPR capability
or runs a vcpu.  (Of course, this is only possible if both
CONFIG_KVM_BOOK3S_PR and CONFIG_KVM_BOOK3S_64_HV are enabled.)

Guests start out essentially as PR guests but with kvm->arch.kvm_mode
set to KVM_MODE_UNKNOWN.  If the guest then enables the KVM_CAP_PPC_PAPR
capability, and the machine is capable of running HV guests (i.e. it
has suitable CPUs and has a usable hypervisor mode available), the
guest gets converted to an HV guest at that point.  If userspace runs
a vcpu without having enabled the KVM_CAP_PPC_PAPR capability, the
guest is confirmed as a PR guest at that point.

This also moves the preloading of the FPU for PR guests from
kvmppc_set_msr_pr() into kvmppc_handle_exit_pr(), because
kvmppc_set_msr_pr() can be called before any vcpu has been run, and
it may be that the guest will end up as a HV guest, and in this case
the preloading is not appropriate.  Instead it is now done after we
have emulated a privileged or illegal instruction, if the guest MSR
now has FP set.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s.h |  3 ++
 arch/powerpc/include/asm/kvm_booke.h  |  2 +
 arch/powerpc/include/asm/kvm_host.h   |  1 +
 arch/powerpc/kvm/book3s.c             | 76 ++++++++++++++++++++++++++++------
 arch/powerpc/kvm/book3s_hv.c          | 77 +++++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_pr.c          | 24 +++++++++--
 arch/powerpc/kvm/powerpc.c            |  1 +
 7 files changed, 167 insertions(+), 17 deletions(-)
Alexander Graf - Sept. 12, 2013, 10:56 p.m.
On 05.08.2013, at 23:26, Paul Mackerras wrote:

> This makes it possible to have both PR and HV guests running
> concurrently on one machine, by deferring the decision about which type
> of KVM to use for each guest until it either enables the PAPR capability
> or runs a vcpu.  (Of course, this is only possible if both
> CONFIG_KVM_BOOK3S_PR and CONFIG_KVM_BOOK3S_64_HV are enabled.)
> 
> Guests start out essentially as PR guests but with kvm->arch.kvm_mode
> set to KVM_MODE_UNKNOWN.  If the guest then enables the KVM_CAP_PPC_PAPR
> capability, and the machine is capable of running HV guests (i.e. it
> has suitable CPUs and has a usable hypervisor mode available), the
> guest gets converted to an HV guest at that point.  If userspace runs
> a vcpu without having enabled the KVM_CAP_PPC_PAPR capability, the
> guest is confirmed as a PR guest at that point.
> 
> This also moves the preloading of the FPU for PR guests from
> kvmppc_set_msr_pr() into kvmppc_handle_exit_pr(), because
> kvmppc_set_msr_pr() can be called before any vcpu has been run, and
> it may be that the guest will end up as a HV guest, and in this case
> the preloading is not appropriate.  Instead it is now done after we
> have emulated a privileged or illegal instruction, if the guest MSR
> now has FP set.
> 
> Signed-off-by: Paul Mackerras <paulus@samba.org>

We need to have a way to force set the mode to either HV, PR or "try HV, fall back to PR if it wouldn't work" (the one you implemented). That way management software can choose to not default to fallback mode if it wants to guarantee consistent performance.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Mackerras - Sept. 13, 2013, 12:17 a.m.
On Thu, Sep 12, 2013 at 05:56:11PM -0500, Alexander Graf wrote:
> 
> On 05.08.2013, at 23:26, Paul Mackerras wrote:
> 
> > This makes it possible to have both PR and HV guests running
> > concurrently on one machine, by deferring the decision about which type
> > of KVM to use for each guest until it either enables the PAPR capability
> > or runs a vcpu.  (Of course, this is only possible if both
> > CONFIG_KVM_BOOK3S_PR and CONFIG_KVM_BOOK3S_64_HV are enabled.)
> > 
> > Guests start out essentially as PR guests but with kvm->arch.kvm_mode
> > set to KVM_MODE_UNKNOWN.  If the guest then enables the KVM_CAP_PPC_PAPR
> > capability, and the machine is capable of running HV guests (i.e. it
> > has suitable CPUs and has a usable hypervisor mode available), the
> > guest gets converted to an HV guest at that point.  If userspace runs
> > a vcpu without having enabled the KVM_CAP_PPC_PAPR capability, the
> > guest is confirmed as a PR guest at that point.
> > 
> > This also moves the preloading of the FPU for PR guests from
> > kvmppc_set_msr_pr() into kvmppc_handle_exit_pr(), because
> > kvmppc_set_msr_pr() can be called before any vcpu has been run, and
> > it may be that the guest will end up as a HV guest, and in this case
> > the preloading is not appropriate.  Instead it is now done after we
> > have emulated a privileged or illegal instruction, if the guest MSR
> > now has FP set.
> > 
> > Signed-off-by: Paul Mackerras <paulus@samba.org>
> 
> We need to have a way to force set the mode to either HV, PR or "try HV, fall back to PR if it wouldn't work" (the one you implemented). That way management software can choose to not default to fallback mode if it wants to guarantee consistent performance.

Yes, Anthony Liguori mentioned a similar concern to me.

Aneesh and I are currently investigating an alternative approach,
which is much more like the x86 way of doing things.  We are looking
at splitting the code into three modules: a kvm_pr.ko module with the
PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
core kvm.ko module with the generic bits (basically book3s.c,
powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
use).  Basically the core module would have a pointer to a struct
full of function pointers for the various ops that book3s_pr.c and
book3s_hv.c both provide.  You would only be able to have one of
kvm_pr and kvm_hv loaded at any one time.  If they were built in, you
could have them both built in but only one could register its function
pointer struct with the core.  Obviously the kvm_hv module would only
load and register its struct on a machine that had hypervisor mode
available.  If they were both built in I would think we would give HV
the first chance to register itself, and let PR register if we can't
do HV.

How does that sound?

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt - Sept. 13, 2013, 1:31 a.m.
On Fri, 2013-09-13 at 10:17 +1000, Paul Mackerras wrote:

> Aneesh and I are currently investigating an alternative approach,
> which is much more like the x86 way of doing things.  We are looking
> at splitting the code into three modules: a kvm_pr.ko module with the
> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
> core kvm.ko module with the generic bits (basically book3s.c,
> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
> use).  Basically the core module would have a pointer to a struct
> full of function pointers for the various ops that book3s_pr.c and
> book3s_hv.c both provide.  You would only be able to have one of
> kvm_pr and kvm_hv loaded at any one time.  If they were built in, you
> could have them both built in but only one could register its function
> pointer struct with the core.  Obviously the kvm_hv module would only
> load and register its struct on a machine that had hypervisor mode
> available.  If they were both built in I would think we would give HV
> the first chance to register itself, and let PR register if we can't
> do HV.
> 
> How does that sound?

As long as we can force-load the PR one on a machine that normally runs
HV for the sake of testing ...

Also, all those KVM modules ... they don't auto-load do they ?

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Sept. 13, 2013, 4:17 a.m.
Am 12.09.2013 um 19:17 schrieb Paul Mackerras <paulus@samba.org>:

> On Thu, Sep 12, 2013 at 05:56:11PM -0500, Alexander Graf wrote:
>> 
>> On 05.08.2013, at 23:26, Paul Mackerras wrote:
>> 
>>> This makes it possible to have both PR and HV guests running
>>> concurrently on one machine, by deferring the decision about which type
>>> of KVM to use for each guest until it either enables the PAPR capability
>>> or runs a vcpu.  (Of course, this is only possible if both
>>> CONFIG_KVM_BOOK3S_PR and CONFIG_KVM_BOOK3S_64_HV are enabled.)
>>> 
>>> Guests start out essentially as PR guests but with kvm->arch.kvm_mode
>>> set to KVM_MODE_UNKNOWN.  If the guest then enables the KVM_CAP_PPC_PAPR
>>> capability, and the machine is capable of running HV guests (i.e. it
>>> has suitable CPUs and has a usable hypervisor mode available), the
>>> guest gets converted to an HV guest at that point.  If userspace runs
>>> a vcpu without having enabled the KVM_CAP_PPC_PAPR capability, the
>>> guest is confirmed as a PR guest at that point.
>>> 
>>> This also moves the preloading of the FPU for PR guests from
>>> kvmppc_set_msr_pr() into kvmppc_handle_exit_pr(), because
>>> kvmppc_set_msr_pr() can be called before any vcpu has been run, and
>>> it may be that the guest will end up as a HV guest, and in this case
>>> the preloading is not appropriate.  Instead it is now done after we
>>> have emulated a privileged or illegal instruction, if the guest MSR
>>> now has FP set.
>>> 
>>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>> 
>> We need to have a way to force set the mode to either HV, PR or "try HV, fall back to PR if it wouldn't work" (the one you implemented). That way management software can choose to not default to fallback mode if it wants to guarantee consistent performance.
> 
> Yes, Anthony Liguori mentioned a similar concern to me.
> 
> Aneesh and I are currently investigating an alternative approach,
> which is much more like the x86 way of doing things.  We are looking
> at splitting the code into three modules: a kvm_pr.ko module with the
> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
> core kvm.ko module with the generic bits (basically book3s.c,
> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
> use).  Basically the core module would have a pointer to a struct
> full of function pointers for the various ops that book3s_pr.c and
> book3s_hv.c both provide.  You would only be able to have one of
> kvm_pr and kvm_hv loaded at any one time.  If they were built in, you
> could have them both built in but only one could register its function
> pointer struct with the core.  Obviously the kvm_hv module would only
> load and register its struct on a machine that had hypervisor mode
> available.  If they were both built in I would think we would give HV
> the first chance to register itself, and let PR register if we can't
> do HV.
> 
> How does that sound?

It means you can only choose between HV and PR machine wide, while with this patch set you give the user the flexibility to have HV and PR guests run in parallel.

I know that Anthony doesn't believe it's a valid use case, but I like the flexible solution better. It does however male sense to enable a sysadmin to remove any PR functionality from the system by blocking that module.

Can't we have both?

Alex

> 
> Paul.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Sept. 13, 2013, 4:18 a.m.
Am 12.09.2013 um 20:31 schrieb Benjamin Herrenschmidt <benh@kernel.crashing.org>:

> On Fri, 2013-09-13 at 10:17 +1000, Paul Mackerras wrote:
> 
>> Aneesh and I are currently investigating an alternative approach,
>> which is much more like the x86 way of doing things.  We are looking
>> at splitting the code into three modules: a kvm_pr.ko module with the
>> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
>> core kvm.ko module with the generic bits (basically book3s.c,
>> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
>> use).  Basically the core module would have a pointer to a struct
>> full of function pointers for the various ops that book3s_pr.c and
>> book3s_hv.c both provide.  You would only be able to have one of
>> kvm_pr and kvm_hv loaded at any one time.  If they were built in, you
>> could have them both built in but only one could register its function
>> pointer struct with the core.  Obviously the kvm_hv module would only
>> load and register its struct on a machine that had hypervisor mode
>> available.  If they were both built in I would think we would give HV
>> the first chance to register itself, and let PR register if we can't
>> do HV.
>> 
>> How does that sound?
> 
> As long as we can force-load the PR one on a machine that normally runs
> HV for the sake of testing ...
> 
> Also, all those KVM modules ... they don't auto-load do they ?

They don't today, but they should.

Alex

> 
> Cheers,
> Ben.
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V - Sept. 14, 2013, 6:33 p.m.
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Fri, 2013-09-13 at 10:17 +1000, Paul Mackerras wrote:
>
>> Aneesh and I are currently investigating an alternative approach,
>> which is much more like the x86 way of doing things.  We are looking
>> at splitting the code into three modules: a kvm_pr.ko module with the
>> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
>> core kvm.ko module with the generic bits (basically book3s.c,
>> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
>> use).  Basically the core module would have a pointer to a struct
>> full of function pointers for the various ops that book3s_pr.c and
>> book3s_hv.c both provide.  You would only be able to have one of
>> kvm_pr and kvm_hv loaded at any one time.  If they were built in, you
>> could have them both built in but only one could register its function
>> pointer struct with the core.  Obviously the kvm_hv module would only
>> load and register its struct on a machine that had hypervisor mode
>> available.  If they were both built in I would think we would give HV
>> the first chance to register itself, and let PR register if we can't
>> do HV.
>> 
>> How does that sound?
>
> As long as we can force-load the PR one on a machine that normally runs
> HV for the sake of testing ...

This is what I currently have

[root@llmp24l02 kvm]# insmod ./kvm-hv.ko 
[root@llmp24l02 kvm]# insmod ./kvm-pr.ko 
insmod: ERROR: could not insert module ./kvm-pr.ko: File exists
[root@llmp24l02 kvm]# rmmod kvm-hv
[root@llmp24l02 kvm]# insmod ./kvm-pr.ko 
[root@llmp24l02 kvm]# 

So if by force load you mean rmmod kvm-hv and then modprobe kvm-pr that
works. But loading kvm-pr along side kvm-hv is not supported. My
understanding was we didn't want to allow that because that can confuse users
when they are not sure whether it is hv or pr kvm they are using.

-aneesh

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Sept. 14, 2013, 8:22 p.m.
Am 14.09.2013 um 13:33 schrieb "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>:

> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
>> On Fri, 2013-09-13 at 10:17 +1000, Paul Mackerras wrote:
>> 
>>> Aneesh and I are currently investigating an alternative approach,
>>> which is much more like the x86 way of doing things.  We are looking
>>> at splitting the code into three modules: a kvm_pr.ko module with the
>>> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
>>> core kvm.ko module with the generic bits (basically book3s.c,
>>> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
>>> use).  Basically the core module would have a pointer to a struct
>>> full of function pointers for the various ops that book3s_pr.c and
>>> book3s_hv.c both provide.  You would only be able to have one of
>>> kvm_pr and kvm_hv loaded at any one time.  If they were built in, you
>>> could have them both built in but only one could register its function
>>> pointer struct with the core.  Obviously the kvm_hv module would only
>>> load and register its struct on a machine that had hypervisor mode
>>> available.  If they were both built in I would think we would give HV
>>> the first chance to register itself, and let PR register if we can't
>>> do HV.
>>> 
>>> How does that sound?
>> 
>> As long as we can force-load the PR one on a machine that normally runs
>> HV for the sake of testing ...
> 
> This is what I currently have
> 
> [root@llmp24l02 kvm]# insmod ./kvm-hv.ko 
> [root@llmp24l02 kvm]# insmod ./kvm-pr.ko 
> insmod: ERROR: could not insert module ./kvm-pr.ko: File exists

The reason this model makes sense for x86 is that you never have SVM and VMX in the cpu at the same time. Either it is an AMD chip or an Intel chip.

PR and HV however are not mutually exclusive in hardware. What you really want is

1) distro can force HV/PR
2) admin can force HV/PR
3) user can force HV/PR
4) by default things "just work"

1 can be done through kernel config options.
2 can be done through modules that get loaded or not
3 can be done through a vm ioctl
4 only works if you allow hv and pr to be available at the same time

I can assume who you talked to about this to make these design decisions, but it definitely was not me.


Alex


> [root@llmp24l02 kvm]# rmmod kvm-hv
> [root@llmp24l02 kvm]# insmod ./kvm-pr.ko 
> [root@llmp24l02 kvm]# 
> 
> So if by force load you mean rmmod kvm-hv and then modprobe kvm-pr that
> works. But loading kvm-pr along side kvm-hv is not supported. My
> understanding was we didn't want to allow that because that can confuse users
> when they are not sure whether it is hv or pr kvm they are using.
> 
> -aneesh
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V - Sept. 15, 2013, 9:16 a.m.
Alexander Graf <agraf@suse.de> writes:

> Am 14.09.2013 um 13:33 schrieb "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>:
>
>> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>> 
>>> On Fri, 2013-09-13 at 10:17 +1000, Paul Mackerras wrote:
>>> 
>>>> Aneesh and I are currently investigating an alternative approach,
>>>> which is much more like the x86 way of doing things.  We are looking
>>>> at splitting the code into three modules: a kvm_pr.ko module with the
>>>> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
>>>> core kvm.ko module with the generic bits (basically book3s.c,
>>>> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
>>>> use).  Basically the core module would have a pointer to a struct
>>>> full of function pointers for the various ops that book3s_pr.c and
>>>> book3s_hv.c both provide.  You would only be able to have one of
>>>> kvm_pr and kvm_hv loaded at any one time.  If they were built in, you
>>>> could have them both built in but only one could register its function
>>>> pointer struct with the core.  Obviously the kvm_hv module would only
>>>> load and register its struct on a machine that had hypervisor mode
>>>> available.  If they were both built in I would think we would give HV
>>>> the first chance to register itself, and let PR register if we can't
>>>> do HV.
>>>> 
>>>> How does that sound?
>>> 
>>> As long as we can force-load the PR one on a machine that normally runs
>>> HV for the sake of testing ...
>> 
>> This is what I currently have
>> 
>> [root@llmp24l02 kvm]# insmod ./kvm-hv.ko 
>> [root@llmp24l02 kvm]# insmod ./kvm-pr.ko 
>> insmod: ERROR: could not insert module ./kvm-pr.ko: File exists
>
> The reason this model makes sense for x86 is that you never have SVM and VMX in the cpu at the same time. Either it is an AMD chip or an Intel chip.
>
> PR and HV however are not mutually exclusive in hardware. What you really want is
>
> 1) distro can force HV/PR
> 2) admin can force HV/PR
> 3) user can force HV/PR
> 4) by default things "just work"
>
> 1 can be done through kernel config options.
> 2 can be done through modules that get loaded or not
> 3 can be done through a vm ioctl
> 4 only works if you allow hv and pr to be available at the same time
>
> I can assume who you talked to about this to make these design decisions, but it definitely was not me.

I didn't had much discussion around the design with anybody yet. What
you saw above was me changing/moving code around madly to get
something working in a day. I was hoping to get something that I can post as RFC
early and let others to comment. Good to get the feedback early.

-aneesh

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Sept. 15, 2013, 11:55 a.m.
Am 15.09.2013 um 04:16 schrieb "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>:

> Alexander Graf <agraf@suse.de> writes:
> 
>> Am 14.09.2013 um 13:33 schrieb "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>:
>> 
>>> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>>> 
>>>> On Fri, 2013-09-13 at 10:17 +1000, Paul Mackerras wrote:
>>>> 
>>>>> Aneesh and I are currently investigating an alternative approach,
>>>>> which is much more like the x86 way of doing things.  We are looking
>>>>> at splitting the code into three modules: a kvm_pr.ko module with the
>>>>> PR-specific bits, a kvm_hv.ko module with the HV-specific bits, and a
>>>>> core kvm.ko module with the generic bits (basically book3s.c,
>>>>> powerpc.c, stuff from virt/kvm/, plus the stuff that both PR and HV
>>>>> use).  Basically the core module would have a pointer to a struct
>>>>> full of function pointers for the various ops that book3s_pr.c and
>>>>> book3s_hv.c both provide.  You would only be able to have one of
>>>>> kvm_pr and kvm_hv loaded at any one time.  If they were built in, you
>>>>> could have them both built in but only one could register its function
>>>>> pointer struct with the core.  Obviously the kvm_hv module would only
>>>>> load and register its struct on a machine that had hypervisor mode
>>>>> available.  If they were both built in I would think we would give HV
>>>>> the first chance to register itself, and let PR register if we can't
>>>>> do HV.
>>>>> 
>>>>> How does that sound?
>>>> 
>>>> As long as we can force-load the PR one on a machine that normally runs
>>>> HV for the sake of testing ...
>>> 
>>> This is what I currently have
>>> 
>>> [root@llmp24l02 kvm]# insmod ./kvm-hv.ko 
>>> [root@llmp24l02 kvm]# insmod ./kvm-pr.ko 
>>> insmod: ERROR: could not insert module ./kvm-pr.ko: File exists
>> 
>> The reason this model makes sense for x86 is that you never have SVM and VMX in the cpu at the same time. Either it is an AMD chip or an Intel chip.
>> 
>> PR and HV however are not mutually exclusive in hardware. What you really want is
>> 
>> 1) distro can force HV/PR
>> 2) admin can force HV/PR
>> 3) user can force HV/PR
>> 4) by default things "just work"
>> 
>> 1 can be done through kernel config options.
>> 2 can be done through modules that get loaded or not
>> 3 can be done through a vm ioctl
>> 4 only works if you allow hv and pr to be available at the same time
>> 
>> I can assume who you talked to about this to make these design decisions, but it definitely was not me.
> 
> I didn't had much discussion around the design with anybody yet. What
> you saw above was me changing/moving code around madly to get
> something working in a day. I was hoping to get something that I can post as RFC
> early and let others to comment. Good to get the feedback early.

Heh ok :). I think we want to be flexible here unless complexity grows too much of a maintenance burden and/or slows things down.

Alex

> 
> -aneesh
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Mackerras - Sept. 18, 2013, 12:05 p.m.
On Thu, Sep 12, 2013 at 11:17:11PM -0500, Alexander Graf wrote:
> 
> It means you can only choose between HV and PR machine wide, while with this patch set you give the user the flexibility to have HV and PR guests run in parallel.
> 
> I know that Anthony doesn't believe it's a valid use case, but I like the flexible solution better. It does however male sense to enable a sysadmin to remove any PR functionality from the system by blocking that module.
> 
> Can't we have both?

So, one suggestion (from Aneesh) is to use the 'type' argument to
kvm_arch_init_vm() to indicate whether we want a specific type of KVM
(PR or HV), or just the default.  Zero would mean default (fastest
available) whereas other values would indicate a specific choice of PR
or HV.  Then, if we build separate kvm_pr and kvm_hv modules when KVM
is configured to be a module, the sysadmin can control the default
choice by loading and unloading modules.

How does that sound?  Or would you prefer to stick with a single
module and have a module option to control the default choice?

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Sept. 19, 2013, 7:31 a.m.
Am 18.09.2013 um 07:05 schrieb Paul Mackerras <paulus@samba.org>:

> On Thu, Sep 12, 2013 at 11:17:11PM -0500, Alexander Graf wrote:
>> 
>> It means you can only choose between HV and PR machine wide, while with this patch set you give the user the flexibility to have HV and PR guests run in parallel.
>> 
>> I know that Anthony doesn't believe it's a valid use case, but I like the flexible solution better. It does however male sense to enable a sysadmin to remove any PR functionality from the system by blocking that module.
>> 
>> Can't we have both?
> 
> So, one suggestion (from Aneesh) is to use the 'type' argument to
> kvm_arch_init_vm() to indicate whether we want a specific type of KVM
> (PR or HV), or just the default.  Zero would mean default (fastest
> available) whereas other values would indicate a specific choice of PR
> or HV.  Then, if we build separate kvm_pr and kvm_hv modules when KVM
> is configured to be a module, the sysadmin can control the default
> choice by loading and unloading modules.
> 
> How does that sound?  Or would you prefer to stick with a single
> module and have a module option to control the default choice?

I think keeping 2 modules makes a lot of sense, but I'm not sure a parameter to init_vm works well with the way we model machines in QEMU. IIRC we only know that we force anything in the machine model initialization which happens way past the vm init.

Alex

> 
> Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index f6af43f..e0bc83b 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -189,6 +189,9 @@  extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst);
 extern ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst);
 extern int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd);
 
+extern void kvmppc_core_enable_papr(struct kvm_vcpu *vcpu);
+extern int kvmppc_convert_to_hv(struct kvm *kvm);
+extern void kvmppc_release_vcpu_pr(struct kvm_vcpu *vcpu);
 /* Functions that have implementations in both PR and HV KVM */
 extern struct kvm_vcpu *kvmppc_core_vcpu_create_pr(struct kvm *kvm,
 						   unsigned int id);
diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h
index d3c1eb3..450bd71 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -23,6 +23,8 @@ 
 #include <linux/types.h>
 #include <linux/kvm_host.h>
 
+static inline void kvmppc_core_enable_papr(struct kvm_vcpu *vcpu) {}
+
 /* LPIDs we support with this build -- runtime limit may be lower */
 #define KVMPPC_NR_LPIDS                        64
 
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 647e064..138e781 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -280,6 +280,7 @@  struct kvm_arch {
 };
 
 /* Values for kvm_mode */
+#define KVM_MODE_UNKNOWN	0
 #define KVM_MODE_PR		1
 #define KVM_MODE_HV		2
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index f22b3af..bddbfaa 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -77,9 +77,11 @@  int kvm_is_book3s_hv(struct kvm *kvm)
 
 #ifdef CONFIG_KVM_BOOK3S_PR
 #ifdef CONFIG_KVM_BOOK3S_64_HV
-/* Do x if the VM mode is PR */
+/* Do x if the VM mode is known to be PR */
 #define DO_IF_PR(kvm, x)	if ((kvm)->arch.kvm_mode == KVM_MODE_PR) { x; }
-/* Do x if the VM mode is HV */
+/* Do x if the VM mode is unknown or is known to be PR */
+#define DO_IF_PR_U(kvm, x)	if ((kvm)->arch.kvm_mode != KVM_MODE_HV) { x; }
+/* Do x if the VM mode is known to be HV */
 #define DO_IF_HV(kvm, x)	if ((kvm)->arch.kvm_mode == KVM_MODE_HV) { x; }
 
 /* Do x for PR vcpus */
@@ -89,6 +91,7 @@  int kvm_is_book3s_hv(struct kvm *kvm)
 
 #else
 #define DO_IF_PR(kvm, x)	x
+#define DO_IF_PR_U(kvm, x)	x
 #define DO_IF_HV(kvm, x)	
 #define VCPU_DO_PR(vcpu, x)	x
 #define VCPU_DO_HV(vcpu, x)
@@ -97,12 +100,14 @@  int kvm_is_book3s_hv(struct kvm *kvm)
 #else
 #ifdef CONFIG_KVM_BOOK3S_64_HV
 #define DO_IF_PR(kvm, x)
+#define DO_IF_PR_U(kvm, x)
 #define DO_IF_HV(kvm, x)	x
 #define VCPU_DO_PR(vcpu, x)
 #define VCPU_DO_HV(vcpu, x)	x
 
 #else
 #define DO_IF_PR(kvm, x)
+#define DO_IF_PR_U(kvm, x)
 #define DO_IF_HV(kvm, x)
 #define VCPU_DO_PR(vcpu, x)
 #define VCPU_DO_HV(vcpu, x)
@@ -712,11 +717,47 @@  void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
 
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 {
+	struct kvm *kvm = vcpu->kvm;
+
+	/*
+	 * If HV mode hasn't been selected by now, make it PR mode
+	 * from now on.
+	 */
+	if (kvm->arch.kvm_mode == KVM_MODE_UNKNOWN) {
+		mutex_lock(&kvm->lock);
+		if (kvm->arch.kvm_mode == KVM_MODE_UNKNOWN)
+			kvm->arch.kvm_mode = KVM_MODE_PR;
+		mutex_unlock(&kvm->lock);
+	}
+
 	VCPU_DO_PR(vcpu, return kvmppc_vcpu_run_pr(kvm_run, vcpu));
 	VCPU_DO_HV(vcpu, return kvmppc_vcpu_run_hv(kvm_run, vcpu));
 	return -EINVAL;
 }
 
+/*
+ * If we can do either PR or HV, switch to HV if possible.
+ */
+void kvmppc_core_enable_papr(struct kvm_vcpu *vcpu)
+{
+#if defined(CONFIG_KVM_BOOK3S_PR) && defined(CONFIG_KVM_BOOK3S_64_HV)
+	struct kvm *kvm = vcpu->kvm;
+	int err;
+
+	mutex_lock(&kvm->lock);
+	if (kvm->arch.kvm_mode == KVM_MODE_UNKNOWN) {
+		if (kvm_book3s_hv_possible()) {
+			/* should check PVRs */
+			err = kvmppc_convert_to_hv(kvm);
+			if (!err)
+				pr_debug("KVM: Using HV mode for PAPR guest\n");
+		} else
+			kvm->arch.kvm_mode = KVM_MODE_PR;
+	}
+	mutex_unlock(&kvm->lock);
+#endif
+}
+
 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
                                   struct kvm_translation *tr)
 {
@@ -739,7 +780,7 @@  void kvmppc_decrementer_func(unsigned long data)
 
 struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 {
-	DO_IF_PR(kvm, return kvmppc_core_vcpu_create_pr(kvm, id));
+	DO_IF_PR_U(kvm, return kvmppc_core_vcpu_create_pr(kvm, id));
 	DO_IF_HV(kvm, return kvmppc_core_vcpu_create_hv(kvm, id));
 	return NULL;
 }
@@ -758,7 +799,7 @@  int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 {
-	DO_IF_PR(kvm, return kvm_vm_ioctl_get_dirty_log_pr(kvm, log));
+	DO_IF_PR_U(kvm, return kvm_vm_ioctl_get_dirty_log_pr(kvm, log));
 	DO_IF_HV(kvm, return kvm_vm_ioctl_get_dirty_log_hv(kvm, log));
 	return -ENOTTY;
 }
@@ -902,24 +943,33 @@  int kvmppc_core_init_vm(struct kvm *kvm)
 	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
 #endif
 
-#ifdef CONFIG_KVM_BOOK3S_64_HV
-	if (hv_ok) {
-		err = kvmppc_core_init_vm_hv(kvm);
-		kvm->arch.kvm_mode = KVM_MODE_HV;
-		return err;
-	}
-#endif
+	/*
+	 * If both PR and HV are enabled, then new VMs start out as
+	 * PR and get converted to HV when userspace enables the
+	 * KVM_CAP_PPC_PAPR capability, assuming the system supports
+	 * HV-mode KVM (i.e. has suitable CPUs and has hypervisor
+	 * mode available).
+	 */
 #ifdef CONFIG_KVM_BOOK3S_PR
 	err = kvmppc_core_init_vm_pr(kvm);
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+	kvm->arch.kvm_mode = KVM_MODE_UNKNOWN;
+#else
 	kvm->arch.kvm_mode = KVM_MODE_PR;
 #endif
-
+#else
+#if defined(CONFIG_KVM_BOOK3S_64_HV)
+	err = kvmppc_core_init_vm_hv(kvm);
+	kvm->arch.kvm_mode = KVM_MODE_HV;
+#endif
+#endif
 	return err;
+
 }
 
 void kvmppc_core_destroy_vm(struct kvm *kvm)
 {
-	DO_IF_PR(kvm, kvmppc_core_destroy_vm_pr(kvm));
+	DO_IF_PR_U(kvm, kvmppc_core_destroy_vm_pr(kvm));
 	DO_IF_HV(kvm, kvmppc_core_destroy_vm_hv(kvm));
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 956318b..fee28e5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -984,6 +984,83 @@  out:
 	return ERR_PTR(err);
 }
 
+#ifdef CONFIG_KVM_BOOK3S_PR
+/*
+ * Adapt to different storage conventions for registers between PR and HV.
+ */
+static void kvmppc_convert_regs(struct kvm_vcpu *vcpu)
+{
+	int i, j;
+
+	/* Pack SLB entries down to low indexes and add index field */
+	j = 0;
+	for (i = 0; i < vcpu->arch.slb_nr; ++i) {
+		if (vcpu->arch.slb[i].valid) {
+			vcpu->arch.slb[j].orige = vcpu->arch.slb[i].orige | i;
+			vcpu->arch.slb[j].origv = vcpu->arch.slb[i].origv;
+			++j;
+		}
+	}
+
+	memcpy(&vcpu->arch.shregs, vcpu->arch.shared,
+	       sizeof(vcpu->arch.shregs));
+}
+
+/* Caller must hold kvm->lock */
+int kvmppc_convert_to_hv(struct kvm *kvm)
+{
+	int err;
+	long int i;
+	struct kvm_memory_slot *memslot;
+	struct kvm_memslots *slots;
+	struct kvm_vcpu *vcpu;
+
+	/* First do all the necessary memory allocations */
+	err = kvmppc_core_init_vm_hv(kvm);
+	if (err)
+		goto out;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		err = kvmppc_alloc_vcore(vcpu, vcpu->vcpu_id);
+		if (err)
+			goto free_vm;
+	}
+
+	mutex_lock(&kvm->slots_lock);
+	slots = kvm->memslots;
+	err = 0;
+	kvm_for_each_memslot(memslot, slots) {
+		err = kvmppc_core_prepare_memory_region_hv(kvm, memslot, NULL);
+		if (err)
+			goto free_slots_locked;
+	}
+	mutex_unlock(&kvm->slots_lock);
+
+	/*
+	 * Now that memory is allocated, switch the VM over to HV mode.
+	 */
+	kvm->arch.kvm_mode = KVM_MODE_HV;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		kvmppc_convert_regs(vcpu);
+		kvmppc_release_vcpu_pr(vcpu);
+		vcpu->arch.use_hv = true;
+		kvmppc_setup_hv_vcpu(vcpu);
+	}
+
+	return 0;
+
+ free_slots_locked:
+	kvm_for_each_memslot(memslot, slots)
+		kvmppc_core_free_memslot_hv(memslot, NULL);
+	mutex_unlock(&kvm->slots_lock);
+ free_vm:
+	kvmppc_core_destroy_vm_hv(kvm);
+ out:
+	return err;
+}
+#endif /* CONFIG_KVM_BOOK3S_PR */
+
 static void unpin_vpa(struct kvm *kvm, struct kvmppc_vpa *vpa)
 {
 	if (vpa->pinned_addr)
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index f583e10..f35425e 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -264,10 +264,6 @@  void kvmppc_set_msr_pr(struct kvm_vcpu *vcpu, u64 msr)
 		kvmppc_mmu_pte_flush(vcpu, (uint32_t)vcpu->arch.magic_page_pa,
 				     ~0xFFFUL);
 	}
-
-	/* Preload FPU if it's enabled */
-	if (vcpu->arch.shared->msr & MSR_FP)
-		kvmppc_handle_ext(vcpu, BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
 }
 
 void kvmppc_set_pvr_pr(struct kvm_vcpu *vcpu, u32 pvr)
@@ -840,6 +836,11 @@  program_interrupt:
 		switch (er) {
 		case EMULATE_DONE:
 			r = RESUME_GUEST_NV;
+			/* Preload FPU if it's enabled */
+			if (vcpu->arch.shared->msr & MSR_FP &
+			    ~vcpu->arch.guest_owned_ext)
+				kvmppc_handle_ext(vcpu,
+					BOOK3S_INTERRUPT_FP_UNAVAIL, MSR_FP);
 			break;
 		case EMULATE_AGAIN:
 			r = RESUME_GUEST;
@@ -1184,6 +1185,21 @@  out:
 	return ERR_PTR(err);
 }
 
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+/*
+ * Release PR-specific resources allocated for this vcpu.
+ */
+void kvmppc_release_vcpu_pr(struct kvm_vcpu *vcpu)
+{
+	kvmppc_mmu_destroy_pr(vcpu);
+	free_page((unsigned long)vcpu->arch.shared & PAGE_MASK);
+	kfree(vcpu->arch.shadow_vcpu);
+	vcpu->arch.shadow_vcpu = NULL;
+	vfree(vcpu->arch.book3s);
+	vcpu->arch.book3s = NULL;
+}
+#endif /* CONFIG_KVM_BOOK3S_64_HV */
+
 void kvmppc_core_vcpu_free_pr(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_vcpu_book3s *vcpu_book3s = to_book3s(vcpu);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 49bbc9e..a8caeec 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -800,6 +800,7 @@  static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 	case KVM_CAP_PPC_PAPR:
 		r = 0;
 		vcpu->arch.papr_enabled = true;
+		kvmppc_core_enable_papr(vcpu);
 		break;
 	case KVM_CAP_PPC_EPR:
 		r = 0;