Patchwork [2/3] KVM: PPC: Add SPR emulation exits

login
register
mail settings
Submitter Alexander Graf
Date Oct. 6, 2012, 11:41 p.m.
Message ID <1349566882-10948-3-git-send-email-agraf@suse.de>
Download mbox | patch
Permalink /patch/189763/
State New
Headers show

Comments

Alexander Graf - Oct. 6, 2012, 11:41 p.m.
SPRs on PowerPC are the equivalent to MSRs on x86. They usually
control behavior inside a core, so the best place to emulate them
traditionally has been the kernel side of kvm.

However, some SPRs should be emulated by user space. For example
the DBCR0 register which is used for machine reset. Or the interrupt
acknowledge register on e500 which is tightly integrated with the
interrupt controller that lives in user space.

So let's expose "unknown" SPR reads and writes to user space, so that
it can handle them if it knows what's going on.

As a nice side effect, we also get a lot better error reporting up
to user space, since now we actually know when an SPR read/write failed.

Signed-off-by: Alexander Graf <agraf@suse.de>
---
 Documentation/virtual/kvm/api.txt   |   37 ++++++++++++++++++++++++++++++++++
 arch/powerpc/include/asm/kvm_host.h |    3 ++
 arch/powerpc/include/asm/kvm_ppc.h  |    1 +
 arch/powerpc/kvm/book3s_pr.c        |    4 +++
 arch/powerpc/kvm/booke.c            |    4 +++
 arch/powerpc/kvm/emulate.c          |   38 ++++++++++++++++++++++++++++++----
 arch/powerpc/kvm/powerpc.c          |   20 ++++++++++++++++++
 include/linux/kvm.h                 |   12 +++++++++++
 8 files changed, 114 insertions(+), 5 deletions(-)
Avi Kivity - Oct. 7, 2012, 1:13 p.m.
On 10/07/2012 01:41 AM, Alexander Graf wrote:
> SPRs on PowerPC are the equivalent to MSRs on x86. They usually
> control behavior inside a core, so the best place to emulate them
> traditionally has been the kernel side of kvm.
> 
> However, some SPRs should be emulated by user space. For example
> the DBCR0 register which is used for machine reset. Or the interrupt
> acknowledge register on e500 which is tightly integrated with the
> interrupt controller that lives in user space.
> 
> So let's expose "unknown" SPR reads and writes to user space, so that
> it can handle them if it knows what's going on.
> 
> As a nice side effect, we also get a lot better error reporting up
> to user space, since now we actually know when an SPR read/write failed.
> 

We have a similar problem with x86 MSRs.

> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index e726d76..7a35c64 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2253,6 +2253,32 @@ The possible hypercalls are defined in the Power Architecture Platform
>  Requirements (PAPR) document available from www.power.org (free
>  developer registration required to access it).
>  
> +		/* KVM_EXIT_SPR */
> +		struct {
> +			__u64 sprn;
> +			__u64 data;
> +			__u64 msr;
> +			__u8  is_write;
> +#define SPR_STATUS_OK   0
> +#define SPR_STATUS_FAIL 1
> +			__u8  status;
> +		} spr;
> +
> +This is used on PowerPC for Special Purpose Register emulation that
> +the kernel can not deal with.
> +
> +It occurs when the guest triggers an mtspr or mfspr instruction on
> +an SPR that is not handled by kvm's SPR emulation code. In these
> +cases, 'sprn' contains the SPR ID. That ID is target CPU specific.
> +'data' contains the value to write to the SPR when 'is_write'==1 (mtspr)
> +or is used as result buffer for 'is_write'==0 (mfspr). Status is used
> +to tell the kernel that an SPR read/write was successful. It is set to
> +SPR_STATUS_OK by default. If user space fails to emulate an SPR access,
> +it should set it to SPR_STATUS_FAIL, so that the kernel can inject
> +an exception into the guest context. The field 'msr' contains the MSR
> +register state at the point of time the SPR read/write occured. It can
> +be used by user space for permission checks.
> +

Since this happens in the middle of instruction emulation, the same
rules should apply.  Userspace must reenter kvm with the response to the
instruction, and can force an immediate exit by queueing an unmasked signal.


What happens when a future kvm starts emulating an SPR that was
previously emulated in userspace?
Alexander Graf - Oct. 7, 2012, 1:19 p.m.
On 07.10.2012, at 15:13, Avi Kivity wrote:

> On 10/07/2012 01:41 AM, Alexander Graf wrote:
>> SPRs on PowerPC are the equivalent to MSRs on x86. They usually
>> control behavior inside a core, so the best place to emulate them
>> traditionally has been the kernel side of kvm.
>> 
>> However, some SPRs should be emulated by user space. For example
>> the DBCR0 register which is used for machine reset. Or the interrupt
>> acknowledge register on e500 which is tightly integrated with the
>> interrupt controller that lives in user space.
>> 
>> So let's expose "unknown" SPR reads and writes to user space, so that
>> it can handle them if it knows what's going on.
>> 
>> As a nice side effect, we also get a lot better error reporting up
>> to user space, since now we actually know when an SPR read/write failed.
>> 
> 
> We have a similar problem with x86 MSRs.

Yup. The new APIC MSR registers would also have the same problem, right?

> 
>> 
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index e726d76..7a35c64 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2253,6 +2253,32 @@ The possible hypercalls are defined in the Power Architecture Platform
>> Requirements (PAPR) document available from www.power.org (free
>> developer registration required to access it).
>> 
>> +		/* KVM_EXIT_SPR */
>> +		struct {
>> +			__u64 sprn;
>> +			__u64 data;
>> +			__u64 msr;
>> +			__u8  is_write;
>> +#define SPR_STATUS_OK   0
>> +#define SPR_STATUS_FAIL 1
>> +			__u8  status;
>> +		} spr;
>> +
>> +This is used on PowerPC for Special Purpose Register emulation that
>> +the kernel can not deal with.
>> +
>> +It occurs when the guest triggers an mtspr or mfspr instruction on
>> +an SPR that is not handled by kvm's SPR emulation code. In these
>> +cases, 'sprn' contains the SPR ID. That ID is target CPU specific.
>> +'data' contains the value to write to the SPR when 'is_write'==1 (mtspr)
>> +or is used as result buffer for 'is_write'==0 (mfspr). Status is used
>> +to tell the kernel that an SPR read/write was successful. It is set to
>> +SPR_STATUS_OK by default. If user space fails to emulate an SPR access,
>> +it should set it to SPR_STATUS_FAIL, so that the kernel can inject
>> +an exception into the guest context. The field 'msr' contains the MSR
>> +register state at the point of time the SPR read/write occured. It can
>> +be used by user space for permission checks.
>> +
> 
> Since this happens in the middle of instruction emulation, the same
> rules should apply.  Userspace must reenter kvm with the response to the
> instruction, and can force an immediate exit by queueing an unmasked signal.

Ah, yes. I forgot to add this exit to that section of the spec.

> What happens when a future kvm starts emulating an SPR that was
> previously emulated in userspace?

That depends on a case-by-case basis.

  a) SPR is emulated in kernel space because the device is emulated in kernel space.

  This is the typical in-kernel irqchip case. We just only enable SPR traps for those SPRs in the kernel if the in-kernel irqchip is in use.

  b) Some bits need to be emulated by kernel space instead because they change vcpu behavior

  That one is more tricky. I would assume you could handle the few bits that change vcpu behavior in kernel space, then fail the instruction emulation and pass the rest on to user space as you did before for the rest of the bits.

Is there any other case? I can't think of any OTOH :).


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity - Oct. 7, 2012, 1:26 p.m.
On 10/07/2012 03:19 PM, Alexander Graf wrote:
> 
> On 07.10.2012, at 15:13, Avi Kivity wrote:
> 
>> On 10/07/2012 01:41 AM, Alexander Graf wrote:
>>> SPRs on PowerPC are the equivalent to MSRs on x86. They usually
>>> control behavior inside a core, so the best place to emulate them
>>> traditionally has been the kernel side of kvm.
>>> 
>>> However, some SPRs should be emulated by user space. For example
>>> the DBCR0 register which is used for machine reset. Or the interrupt
>>> acknowledge register on e500 which is tightly integrated with the
>>> interrupt controller that lives in user space.
>>> 
>>> So let's expose "unknown" SPR reads and writes to user space, so that
>>> it can handle them if it knows what's going on.
>>> 
>>> As a nice side effect, we also get a lot better error reporting up
>>> to user space, since now we actually know when an SPR read/write failed.
>>> 
>> 
>> We have a similar problem with x86 MSRs.
> 
> Yup. The new APIC MSR registers would also have the same problem, right?

Which new APIC MSR registers?

> 
>> 
>>> 
>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>> index e726d76..7a35c64 100644
>>> --- a/Documentation/virtual/kvm/api.txt
>>> +++ b/Documentation/virtual/kvm/api.txt
>>> @@ -2253,6 +2253,32 @@ The possible hypercalls are defined in the Power Architecture Platform
>>> Requirements (PAPR) document available from www.power.org (free
>>> developer registration required to access it).
>>> 
>>> +		/* KVM_EXIT_SPR */
>>> +		struct {
>>> +			__u64 sprn;
>>> +			__u64 data;
>>> +			__u64 msr;
>>> +			__u8  is_write;
>>> +#define SPR_STATUS_OK   0
>>> +#define SPR_STATUS_FAIL 1
>>> +			__u8  status;
>>> +		} spr;
>>> +
>>> +This is used on PowerPC for Special Purpose Register emulation that
>>> +the kernel can not deal with.
>>> +
>>> +It occurs when the guest triggers an mtspr or mfspr instruction on
>>> +an SPR that is not handled by kvm's SPR emulation code. In these
>>> +cases, 'sprn' contains the SPR ID. That ID is target CPU specific.
>>> +'data' contains the value to write to the SPR when 'is_write'==1 (mtspr)
>>> +or is used as result buffer for 'is_write'==0 (mfspr). Status is used
>>> +to tell the kernel that an SPR read/write was successful. It is set to
>>> +SPR_STATUS_OK by default. If user space fails to emulate an SPR access,
>>> +it should set it to SPR_STATUS_FAIL, so that the kernel can inject
>>> +an exception into the guest context. The field 'msr' contains the MSR
>>> +register state at the point of time the SPR read/write occured. It can
>>> +be used by user space for permission checks.
>>> +
>> 
>> Since this happens in the middle of instruction emulation, the same
>> rules should apply.  Userspace must reenter kvm with the response to the
>> instruction, and can force an immediate exit by queueing an unmasked signal.
> 
> Ah, yes. I forgot to add this exit to that section of the spec.
> 
>> What happens when a future kvm starts emulating an SPR that was
>> previously emulated in userspace?
> 
> That depends on a case-by-case basis.
> 
>   a) SPR is emulated in kernel space because the device is emulated in kernel space.
> 
>   This is the typical in-kernel irqchip case. We just only enable SPR traps for those SPRs in the kernel if the in-kernel irqchip is in use.
> 
>   b) Some bits need to be emulated by kernel space instead because they change vcpu behavior
> 
>   That one is more tricky. I would assume you could handle the few bits that change vcpu behavior in kernel space, then fail the instruction emulation and pass the rest on to user space as you did before for the rest of the bits.
> 
> Is there any other case? I can't think of any OTOH :).
> 

An SPR becomes heavily used by a guest, and there is therefore pressure
to emulate it in the kernel in order to improve performance.

The downside of this generic approach is that it prepares suprises down
the road.  The alternative approach, of adding a new KVM_EXIT_RESET,
avoids this minefield, but requires ABI changes every time we want to
emulate something in userspace.  Can you provide a critique of this
alternate approach?
Alexander Graf - Oct. 7, 2012, 1:26 p.m.
On 07.10.2012, at 15:19, Alexander Graf wrote:

> 
> On 07.10.2012, at 15:13, Avi Kivity wrote:
> 
>> On 10/07/2012 01:41 AM, Alexander Graf wrote:
>>> SPRs on PowerPC are the equivalent to MSRs on x86. They usually
>>> control behavior inside a core, so the best place to emulate them
>>> traditionally has been the kernel side of kvm.
>>> 
>>> However, some SPRs should be emulated by user space. For example
>>> the DBCR0 register which is used for machine reset. Or the interrupt
>>> acknowledge register on e500 which is tightly integrated with the
>>> interrupt controller that lives in user space.
>>> 
>>> So let's expose "unknown" SPR reads and writes to user space, so that
>>> it can handle them if it knows what's going on.
>>> 
>>> As a nice side effect, we also get a lot better error reporting up
>>> to user space, since now we actually know when an SPR read/write failed.
>>> 
>> 
>> We have a similar problem with x86 MSRs.
> 
> Yup. The new APIC MSR registers would also have the same problem, right?
> 
>> 
>>> 
>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>> index e726d76..7a35c64 100644
>>> --- a/Documentation/virtual/kvm/api.txt
>>> +++ b/Documentation/virtual/kvm/api.txt
>>> @@ -2253,6 +2253,32 @@ The possible hypercalls are defined in the Power Architecture Platform
>>> Requirements (PAPR) document available from www.power.org (free
>>> developer registration required to access it).
>>> 
>>> +		/* KVM_EXIT_SPR */
>>> +		struct {
>>> +			__u64 sprn;
>>> +			__u64 data;
>>> +			__u64 msr;
>>> +			__u8  is_write;
>>> +#define SPR_STATUS_OK   0
>>> +#define SPR_STATUS_FAIL 1
>>> +			__u8  status;
>>> +		} spr;
>>> +
>>> +This is used on PowerPC for Special Purpose Register emulation that
>>> +the kernel can not deal with.
>>> +
>>> +It occurs when the guest triggers an mtspr or mfspr instruction on
>>> +an SPR that is not handled by kvm's SPR emulation code. In these
>>> +cases, 'sprn' contains the SPR ID. That ID is target CPU specific.
>>> +'data' contains the value to write to the SPR when 'is_write'==1 (mtspr)
>>> +or is used as result buffer for 'is_write'==0 (mfspr). Status is used
>>> +to tell the kernel that an SPR read/write was successful. It is set to
>>> +SPR_STATUS_OK by default. If user space fails to emulate an SPR access,
>>> +it should set it to SPR_STATUS_FAIL, so that the kernel can inject
>>> +an exception into the guest context. The field 'msr' contains the MSR
>>> +register state at the point of time the SPR read/write occured. It can
>>> +be used by user space for permission checks.
>>> +
>> 
>> Since this happens in the middle of instruction emulation, the same
>> rules should apply.  Userspace must reenter kvm with the response to the
>> instruction, and can force an immediate exit by queueing an unmasked signal.
> 
> Ah, yes. I forgot to add this exit to that section of the spec.

Patch submitted :).

>> What happens when a future kvm starts emulating an SPR that was
>> previously emulated in userspace?

Also, for state migration, that SPR would end up in the "SPRs to be saved" list that kernel space passes on to user space (we don't have that yet, but it's the same idea as what x86's MSR code or ARM's cr15 code do). Kernel saved registers always take precedence over user space saved registers.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - Oct. 7, 2012, 1:30 p.m.
On 07.10.2012, at 15:26, Avi Kivity wrote:

> On 10/07/2012 03:19 PM, Alexander Graf wrote:
>> 
>> On 07.10.2012, at 15:13, Avi Kivity wrote:
>> 
>>> On 10/07/2012 01:41 AM, Alexander Graf wrote:
>>>> SPRs on PowerPC are the equivalent to MSRs on x86. They usually
>>>> control behavior inside a core, so the best place to emulate them
>>>> traditionally has been the kernel side of kvm.
>>>> 
>>>> However, some SPRs should be emulated by user space. For example
>>>> the DBCR0 register which is used for machine reset. Or the interrupt
>>>> acknowledge register on e500 which is tightly integrated with the
>>>> interrupt controller that lives in user space.
>>>> 
>>>> So let's expose "unknown" SPR reads and writes to user space, so that
>>>> it can handle them if it knows what's going on.
>>>> 
>>>> As a nice side effect, we also get a lot better error reporting up
>>>> to user space, since now we actually know when an SPR read/write failed.
>>>> 
>>> 
>>> We have a similar problem with x86 MSRs.
>> 
>> Yup. The new APIC MSR registers would also have the same problem, right?
> 
> Which new APIC MSR registers?

I thought x2apic can be accessed through MSRs? If you want to emulate that in user space, you need something similar.

> 
>> 
>>> 
>>>> 
>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>> index e726d76..7a35c64 100644
>>>> --- a/Documentation/virtual/kvm/api.txt
>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>> @@ -2253,6 +2253,32 @@ The possible hypercalls are defined in the Power Architecture Platform
>>>> Requirements (PAPR) document available from www.power.org (free
>>>> developer registration required to access it).
>>>> 
>>>> +		/* KVM_EXIT_SPR */
>>>> +		struct {
>>>> +			__u64 sprn;
>>>> +			__u64 data;
>>>> +			__u64 msr;
>>>> +			__u8  is_write;
>>>> +#define SPR_STATUS_OK   0
>>>> +#define SPR_STATUS_FAIL 1
>>>> +			__u8  status;
>>>> +		} spr;
>>>> +
>>>> +This is used on PowerPC for Special Purpose Register emulation that
>>>> +the kernel can not deal with.
>>>> +
>>>> +It occurs when the guest triggers an mtspr or mfspr instruction on
>>>> +an SPR that is not handled by kvm's SPR emulation code. In these
>>>> +cases, 'sprn' contains the SPR ID. That ID is target CPU specific.
>>>> +'data' contains the value to write to the SPR when 'is_write'==1 (mtspr)
>>>> +or is used as result buffer for 'is_write'==0 (mfspr). Status is used
>>>> +to tell the kernel that an SPR read/write was successful. It is set to
>>>> +SPR_STATUS_OK by default. If user space fails to emulate an SPR access,
>>>> +it should set it to SPR_STATUS_FAIL, so that the kernel can inject
>>>> +an exception into the guest context. The field 'msr' contains the MSR
>>>> +register state at the point of time the SPR read/write occured. It can
>>>> +be used by user space for permission checks.
>>>> +
>>> 
>>> Since this happens in the middle of instruction emulation, the same
>>> rules should apply.  Userspace must reenter kvm with the response to the
>>> instruction, and can force an immediate exit by queueing an unmasked signal.
>> 
>> Ah, yes. I forgot to add this exit to that section of the spec.
>> 
>>> What happens when a future kvm starts emulating an SPR that was
>>> previously emulated in userspace?
>> 
>> That depends on a case-by-case basis.
>> 
>>  a) SPR is emulated in kernel space because the device is emulated in kernel space.
>> 
>>  This is the typical in-kernel irqchip case. We just only enable SPR traps for those SPRs in the kernel if the in-kernel irqchip is in use.
>> 
>>  b) Some bits need to be emulated by kernel space instead because they change vcpu behavior
>> 
>>  That one is more tricky. I would assume you could handle the few bits that change vcpu behavior in kernel space, then fail the instruction emulation and pass the rest on to user space as you did before for the rest of the bits.
>> 
>> Is there any other case? I can't think of any OTOH :).
>> 
> 
> An SPR becomes heavily used by a guest, and there is therefore pressure
> to emulate it in the kernel in order to improve performance.

Then you enable a CAP to have it enabled in kernel space and thus user and kernel space know about it.

> The downside of this generic approach is that it prepares suprises down
> the road.  The alternative approach, of adding a new KVM_EXIT_RESET,
> avoids this minefield, but requires ABI changes every time we want to
> emulate something in userspace.  Can you provide a critique of this
> alternate approach?

Yeah, it doesn't scale as well. The SPR read/write give us all information we need to emulate other registers too, like the magical "read this SPR and automatically get the interrupt vector from the MPIC and ack the interrupt along the way" register we have on e500. We'd have to add a new exit for that one as well. And for the next, and the next.

Plus, today we don't get good error messages when we fail an SPR read/write. With this approach, you do. And you can potentially configure whether you want to ignore unknown SPRs on a per-VM basis.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity - Oct. 7, 2012, 1:30 p.m.
On 10/07/2012 03:26 PM, Alexander Graf wrote:
>> 
>> Ah, yes. I forgot to add this exit to that section of the spec.
> 
> Patch submitted :).

Ugh, the whole point of finding problems in patches is that I don't have
to think about them for a while.  Posting a new patch in a few minutes
negates this completely.
Alexander Graf - Oct. 7, 2012, 1:33 p.m.
On 07.10.2012, at 15:30, Avi Kivity wrote:

> On 10/07/2012 03:26 PM, Alexander Graf wrote:
>>> 
>>> Ah, yes. I forgot to add this exit to that section of the spec.
>> 
>> Patch submitted :).
> 
> Ugh, the whole point of finding problems in patches is that I don't have
> to think about them for a while.  Posting a new patch in a few minutes
> negates this completely.

Haha :). It's a real trivial one. And it fixes the documentation to reflect a few more missing exit types that need reentry to be consistent. Apparently we forgot to add them to the documentation when we added the exits.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity - Oct. 7, 2012, 1:34 p.m.
On 10/07/2012 03:30 PM, Alexander Graf wrote:
>>> 
>>> Yup. The new APIC MSR registers would also have the same problem, right?
>> 
>> Which new APIC MSR registers?
> 
> I thought x2apic can be accessed through MSRs? If you want to emulate that in user space, you need something similar.

It's emulated in the kernel.  But yes, for -no-kvm-irqchip -cpu +x2apic
we have to do this kind of forwarding.

>> 
>> An SPR becomes heavily used by a guest, and there is therefore pressure
>> to emulate it in the kernel in order to improve performance.
> 
> Then you enable a CAP to have it enabled in kernel space and thus user and kernel space know about it.

Ok.

> 
>> The downside of this generic approach is that it prepares suprises down
>> the road.  The alternative approach, of adding a new KVM_EXIT_RESET,
>> avoids this minefield, but requires ABI changes every time we want to
>> emulate something in userspace.  Can you provide a critique of this
>> alternate approach?
> 
> Yeah, it doesn't scale as well. The SPR read/write give us all information we need to emulate other registers too, like the magical "read this SPR and automatically get the interrupt vector from the MPIC and ack the interrupt along the way" register we have on e500. We'd have to add a new exit for that one as well. And for the next, and the next.

Unless we emulate the MPIC in the kernel (of course that shouldn't be
the trigger).

> 
> Plus, today we don't get good error messages when we fail an SPR read/write. With this approach, you do. And you can potentially configure whether you want to ignore unknown SPRs on a per-VM basis.

Ok, it makes sense.  Maybe we should do the same for x86, though I'm
worried about fragmenting the implementation - the cpu is now
implemented in both userspace and the kernel, and we need to document
precisely which MSRs are emulated in the kernel.
Alexander Graf - Oct. 7, 2012, 1:37 p.m.
On 07.10.2012, at 15:34, Avi Kivity wrote:

> On 10/07/2012 03:30 PM, Alexander Graf wrote:
>>>> 
>>>> Yup. The new APIC MSR registers would also have the same problem, right?
>>> 
>>> Which new APIC MSR registers?
>> 
>> I thought x2apic can be accessed through MSRs? If you want to emulate that in user space, you need something similar.
> 
> It's emulated in the kernel.  But yes, for -no-kvm-irqchip -cpu +x2apic
> we have to do this kind of forwarding.
> 
>>> 
>>> An SPR becomes heavily used by a guest, and there is therefore pressure
>>> to emulate it in the kernel in order to improve performance.
>> 
>> Then you enable a CAP to have it enabled in kernel space and thus user and kernel space know about it.
> 
> Ok.
> 
>> 
>>> The downside of this generic approach is that it prepares suprises down
>>> the road.  The alternative approach, of adding a new KVM_EXIT_RESET,
>>> avoids this minefield, but requires ABI changes every time we want to
>>> emulate something in userspace.  Can you provide a critique of this
>>> alternate approach?
>> 
>> Yeah, it doesn't scale as well. The SPR read/write give us all information we need to emulate other registers too, like the magical "read this SPR and automatically get the interrupt vector from the MPIC and ack the interrupt along the way" register we have on e500. We'd have to add a new exit for that one as well. And for the next, and the next.
> 
> Unless we emulate the MPIC in the kernel (of course that shouldn't be
> the trigger).

Yeah, and we have a patch pending that emulates the MPIC in kernel space. But I want to keep 100% compatibility with user space emulated MPICs anyway, least of all to make sure that we have an easier time to narrow down bugs.

> 
>> 
>> Plus, today we don't get good error messages when we fail an SPR read/write. With this approach, you do. And you can potentially configure whether you want to ignore unknown SPRs on a per-VM basis.
> 
> Ok, it makes sense.  Maybe we should do the same for x86, though I'm
> worried about fragmenting the implementation - the cpu is now
> implemented in both userspace and the kernel, and we need to document
> precisely which MSRs are emulated in the kernel.

Why not reverse the logic? Just have a list of MSRs you require user space to emulate. Put all the others that arrive at user space into the "unknown MSR" bucket and leave it to user space to decide what it wants to do with them.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e726d76..7a35c64 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2253,6 +2253,32 @@  The possible hypercalls are defined in the Power Architecture Platform
 Requirements (PAPR) document available from www.power.org (free
 developer registration required to access it).
 
+		/* KVM_EXIT_SPR */
+		struct {
+			__u64 sprn;
+			__u64 data;
+			__u64 msr;
+			__u8  is_write;
+#define SPR_STATUS_OK   0
+#define SPR_STATUS_FAIL 1
+			__u8  status;
+		} spr;
+
+This is used on PowerPC for Special Purpose Register emulation that
+the kernel can not deal with.
+
+It occurs when the guest triggers an mtspr or mfspr instruction on
+an SPR that is not handled by kvm's SPR emulation code. In these
+cases, 'sprn' contains the SPR ID. That ID is target CPU specific.
+'data' contains the value to write to the SPR when 'is_write'==1 (mtspr)
+or is used as result buffer for 'is_write'==0 (mfspr). Status is used
+to tell the kernel that an SPR read/write was successful. It is set to
+SPR_STATUS_OK by default. If user space fails to emulate an SPR access,
+it should set it to SPR_STATUS_FAIL, so that the kernel can inject
+an exception into the guest context. The field 'msr' contains the MSR
+register state at the point of time the SPR read/write occured. It can
+be used by user space for permission checks.
+
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -2374,3 +2400,14 @@  For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
    where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value.
  - The tsize field of mas1 shall be set to 4K on TLB0, even though the
    hardware ignores this value for TLB0.
+
+6.4 KVM_CAP_PPC_SPR_EXIT
+
+Architectures: ppc
+Parameters: none
+Returns: 0 on success; -1 on error
+
+With this CAP enabled, KVM returns KVM_EXIT_SPR exits to have user space emulate
+SPRs that it can not handle itself.
+
+When this capability is enabled, KVM_EXIT_SPR can occur.
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 68f5a30..d72ea96 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -510,6 +510,9 @@  struct kvm_vcpu_arch {
 	u8 osi_enabled;
 	u8 papr_enabled;
 	u8 watchdog_enabled;
+	u8 spr_needed;
+	u8 spr_is_write;
+	u8 spr_exit_enabled;
 	u8 sane;
 	u8 cpu_type;
 	u8 hcall_needed;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 609cca3..f9cdd42 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -42,6 +42,7 @@  enum emulation_result {
 	EMULATE_DONE,         /* no further processing */
 	EMULATE_DO_MMIO,      /* kvm_run filled with MMIO request */
 	EMULATE_DO_DCR,       /* kvm_run filled with DCR request */
+	EMULATE_DO_SPR,       /* kvm_run filled with SPR request */
 	EMULATE_FAIL,         /* can't emulate this instruction */
 	EMULATE_AGAIN,        /* something went wrong. go again */
 };
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index b853696..8d4dbc8 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -748,6 +748,10 @@  program_interrupt:
 			run->exit_reason = KVM_EXIT_MMIO;
 			r = RESUME_HOST_NV;
 			break;
+		case EMULATE_DO_SPR:
+			run->exit_reason = KVM_EXIT_SPR;
+			r = RESUME_HOST_NV;
+			break;
 		default:
 			BUG();
 		}
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 3d1f35d..07fff68 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -694,6 +694,10 @@  static int emulation_exit(struct kvm_run *run, struct kvm_vcpu *vcpu)
 		run->exit_reason = KVM_EXIT_DCR;
 		return RESUME_HOST;
 
+	case EMULATE_DO_SPR:
+		run->exit_reason = KVM_EXIT_SPR;
+		return RESUME_HOST;
+
 	case EMULATE_FAIL:
 		printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n",
 		       __func__, vcpu->arch.pc, vcpu->arch.last_inst);
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index b0855e5..c35cec5 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -172,9 +172,23 @@  static int kvmppc_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs)
 	default:
 		emulated = kvmppc_core_emulate_mtspr(vcpu, sprn,
 						     spr_val);
-		if (emulated == EMULATE_FAIL)
-			printk(KERN_INFO "mtspr: unknown spr "
-				"0x%x\n", sprn);
+		if (unlikely(emulated == EMULATE_FAIL)) {
+			if (vcpu->arch.spr_exit_enabled) {
+				/* In-kernel code can't handle this SPR.
+				   Forward it on to user space */
+				vcpu->run->spr.sprn = sprn;
+				vcpu->run->spr.data =  spr_val;
+				vcpu->run->spr.is_write = 1;
+				vcpu->run->spr.status = SPR_STATUS_OK;
+				vcpu->run->spr.msr = vcpu->arch.shared->msr;
+				vcpu->arch.spr_is_write = 1;
+				vcpu->arch.spr_needed = 1;
+				emulated = EMULATE_DO_SPR;
+			} else {
+				printk(KERN_INFO "mfspr: unknown spr "
+					"0x%x\n", sprn);
+			}
+		}
 		break;
 	}
 
@@ -237,8 +251,22 @@  static int kvmppc_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
 		emulated = kvmppc_core_emulate_mfspr(vcpu, sprn,
 						     &spr_val);
 		if (unlikely(emulated == EMULATE_FAIL)) {
-			printk(KERN_INFO "mfspr: unknown spr "
-				"0x%x\n", sprn);
+			if (vcpu->arch.spr_exit_enabled) {
+				/* In-kernel code can't handle this SPR.
+				   Forward it on to user space */
+				vcpu->run->spr.sprn = sprn;
+				vcpu->run->spr.data = 0;
+				vcpu->run->spr.is_write = 0;
+				vcpu->run->spr.status = SPR_STATUS_OK;
+				vcpu->run->spr.msr = vcpu->arch.shared->msr;
+				vcpu->arch.spr_is_write = 0;
+				vcpu->arch.io_gpr = rt;
+				vcpu->arch.spr_needed = 1;
+				emulated = EMULATE_DO_SPR;
+			} else {
+				printk(KERN_INFO "mfspr: unknown spr "
+					"0x%x\n", sprn);
+			}
 		}
 		break;
 	}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index deb0d59..7d120dc 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -530,6 +530,12 @@  static void kvmppc_complete_dcr_load(struct kvm_vcpu *vcpu,
 	kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, run->dcr.data);
 }
 
+static void kvmppc_complete_spr_load(struct kvm_vcpu *vcpu,
+                                     struct kvm_run *run)
+{
+	kvmppc_set_gpr(vcpu, vcpu->arch.io_gpr, run->spr.data);
+}
+
 static void kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu,
                                       struct kvm_run *run)
 {
@@ -680,6 +686,16 @@  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (!vcpu->arch.dcr_is_write)
 			kvmppc_complete_dcr_load(vcpu, run);
 		vcpu->arch.dcr_needed = 0;
+	} else if (vcpu->arch.spr_needed) {
+		int flags = 0;
+		if (!vcpu->arch.spr_is_write)
+			kvmppc_complete_spr_load(vcpu, run);
+#if !defined(CONFIG_PPC_BOOK3S)
+		/* Indicate an illegal instruction for BookE */
+		flags = ESR_PIL;
+#endif
+		kvmppc_core_queue_program(vcpu, flags);
+		vcpu->arch.spr_needed = 0;
 	} else if (vcpu->arch.osi_needed) {
 		u64 *gprs = run->osi.gprs;
 		int i;
@@ -735,6 +751,10 @@  static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 		r = 0;
 		vcpu->arch.papr_enabled = true;
 		break;
+	case KVM_CAP_PPC_SPR_EXIT:
+		r = 0;
+		vcpu->arch.spr_exit_enabled = true;
+		break;
 #ifdef CONFIG_BOOKE
 	case KVM_CAP_PPC_BOOKE_WATCHDOG:
 		r = 0;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 65ad5c6..2e8f536 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -168,6 +168,7 @@  struct kvm_pit_config {
 #define KVM_EXIT_PAPR_HCALL	  19
 #define KVM_EXIT_S390_UCONTROL	  20
 #define KVM_EXIT_WATCHDOG         21
+#define KVM_EXIT_SPR              22
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 #define KVM_INTERNAL_ERROR_EMULATION 1
@@ -281,6 +282,16 @@  struct kvm_run {
 			__u64 ret;
 			__u64 args[9];
 		} papr_hcall;
+		/* KVM_EXIT_SPR */
+		struct {
+			__u64 sprn;
+			__u64 data;
+			__u64 msr;
+			__u8  is_write;
+#define SPR_STATUS_OK   0
+#define SPR_STATUS_FAIL 1
+			__u8  status;
+		} spr;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -630,6 +641,7 @@  struct kvm_ppc_smmu_info {
 #endif
 #define KVM_CAP_IRQFD_RESAMPLE 82
 #define KVM_CAP_PPC_BOOKE_WATCHDOG 83
+#define KVM_CAP_PPC_SPR_EXIT 84
 
 #ifdef KVM_CAP_IRQ_ROUTING