Patchwork [1/2] pseries: Synchronize qemu and KVM state on hypercalls

login
register
mail settings
Submitter David Gibson
Date Sept. 20, 2012, 7:08 a.m.
Message ID <1348124922-24263-2-git-send-email-david@gibson.dropbear.id.au>
Download mbox | patch
Permalink /patch/185348/
State New
Headers show

Comments

David Gibson - Sept. 20, 2012, 7:08 a.m.
Currently the KVM exit path for PAPR hypercalls does not synchronize the
qemu cpu state with the KVM state.  Mostly this works, because the actual
hypercall arguments and return values are explicitly passed through the
kvm_run structure.  However, the hypercall path includes a privilege check,
to ensure that only the guest kernel can invoke hypercalls, not the guest
userspace.  Because of the lack of sync, this privilege check will use an
out of date copy of the MSR, which could lead either to guest userspace
being able to invoke hypercalls (a security hole for the guest) or to the
guest kernel being incorrectly refused privilege leading to various other
failures.

This patch fixes the bug by forcing a synchronization on the hypercall exit
path.  This does mean we have a potentially quite expensive get and set of
the state, however performance critical hypercalls are generally already
implemented inside KVM so this probably won't matter.  If it is a
performance problem we can optimize it later by having the kernel perform
the privilege check.  That will need a new capability, however, since qemu
will still need the privilege check for older kernels.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 target-ppc/kvm.c |    1 +
 1 file changed, 1 insertion(+)
Alexander Graf - Sept. 20, 2012, 7:38 a.m.
On 20.09.2012, at 09:08, David Gibson wrote:

> Currently the KVM exit path for PAPR hypercalls does not synchronize the
> qemu cpu state with the KVM state.  Mostly this works, because the actual
> hypercall arguments and return values are explicitly passed through the
> kvm_run structure.  However, the hypercall path includes a privilege check,
> to ensure that only the guest kernel can invoke hypercalls, not the guest
> userspace.  Because of the lack of sync, this privilege check will use an
> out of date copy of the MSR, which could lead either to guest userspace
> being able to invoke hypercalls (a security hole for the guest) or to the
> guest kernel being incorrectly refused privilege leading to various other
> failures.
> 
> This patch fixes the bug by forcing a synchronization on the hypercall exit
> path.  This does mean we have a potentially quite expensive get and set of
> the state, however performance critical hypercalls are generally already
> implemented inside KVM so this probably won't matter.  If it is a
> performance problem we can optimize it later by having the kernel perform
> the privilege check.  That will need a new capability, however, since qemu
> will still need the privilege check for older kernels.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

I would actually prefer to see that one fixed in kernel space. If we send it through -stable there, we should be a lot better off. Also, PR KVM already checks for !MSR_PR in kvm.


Alex
Jan Kiszka - Sept. 20, 2012, 7:39 a.m.
On 2012-09-20 09:08, David Gibson wrote:
> Currently the KVM exit path for PAPR hypercalls does not synchronize the
> qemu cpu state with the KVM state.  Mostly this works, because the actual
> hypercall arguments and return values are explicitly passed through the
> kvm_run structure.  However, the hypercall path includes a privilege check,
> to ensure that only the guest kernel can invoke hypercalls, not the guest
> userspace.  Because of the lack of sync, this privilege check will use an
> out of date copy of the MSR, which could lead either to guest userspace
> being able to invoke hypercalls (a security hole for the guest) or to the
> guest kernel being incorrectly refused privilege leading to various other
> failures.
> 
> This patch fixes the bug by forcing a synchronization on the hypercall exit
> path.  This does mean we have a potentially quite expensive get and set of
> the state, however performance critical hypercalls are generally already
> implemented inside KVM so this probably won't matter.  If it is a
> performance problem we can optimize it later by having the kernel perform
> the privilege check.  That will need a new capability, however, since qemu
> will still need the privilege check for older kernels.

If it's just about reading a small subset of the state (a single MSR?)
to allow the privilege check, you can also open-code only that in HCALL
exit. If it really matters.

Jan

> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  target-ppc/kvm.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 546c116..78a47fb 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -813,6 +813,7 @@ int kvm_arch_handle_exit(CPUPPCState *env, struct kvm_run *run)
>  #ifdef CONFIG_PSERIES
>      case KVM_EXIT_PAPR_HCALL:
>          dprintf("handle PAPR hypercall\n");
> +        cpu_synchronize_state(env);
>          run->papr_hcall.ret = spapr_hypercall(env, run->papr_hcall.nr,
>                                                run->papr_hcall.args);
>          ret = 0;
>
David Gibson - Sept. 20, 2012, 11:53 a.m.
On Thu, Sep 20, 2012 at 09:38:58AM +0200, Alexander Graf wrote:
> 
> On 20.09.2012, at 09:08, David Gibson wrote:
> 
> > Currently the KVM exit path for PAPR hypercalls does not synchronize the
> > qemu cpu state with the KVM state.  Mostly this works, because the actual
> > hypercall arguments and return values are explicitly passed through the
> > kvm_run structure.  However, the hypercall path includes a privilege check,
> > to ensure that only the guest kernel can invoke hypercalls, not the guest
> > userspace.  Because of the lack of sync, this privilege check will use an
> > out of date copy of the MSR, which could lead either to guest userspace
> > being able to invoke hypercalls (a security hole for the guest) or to the
> > guest kernel being incorrectly refused privilege leading to various other
> > failures.
> > 
> > This patch fixes the bug by forcing a synchronization on the hypercall exit
> > path.  This does mean we have a potentially quite expensive get and set of
> > the state, however performance critical hypercalls are generally already
> > implemented inside KVM so this probably won't matter.  If it is a
> > performance problem we can optimize it later by having the kernel perform
> > the privilege check.  That will need a new capability, however, since qemu
> > will still need the privilege check for older kernels.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> 
> I would actually prefer to see that one fixed in kernel space.

That's a better fix, but we can't fix it purely in the kernel, because
there are existing released kernels that don't do the privilege check.

> If we
> send it through -stable there, we should be a lot better off. Also,
> PR KVM already checks for !MSR_PR in kvm.
David Gibson - Sept. 20, 2012, 11:54 a.m.
On Thu, Sep 20, 2012 at 09:39:56AM +0200, Jan Kiszka wrote:
> On 2012-09-20 09:08, David Gibson wrote:
> > Currently the KVM exit path for PAPR hypercalls does not synchronize the
> > qemu cpu state with the KVM state.  Mostly this works, because the actual
> > hypercall arguments and return values are explicitly passed through the
> > kvm_run structure.  However, the hypercall path includes a privilege check,
> > to ensure that only the guest kernel can invoke hypercalls, not the guest
> > userspace.  Because of the lack of sync, this privilege check will use an
> > out of date copy of the MSR, which could lead either to guest userspace
> > being able to invoke hypercalls (a security hole for the guest) or to the
> > guest kernel being incorrectly refused privilege leading to various other
> > failures.
> > 
> > This patch fixes the bug by forcing a synchronization on the hypercall exit
> > path.  This does mean we have a potentially quite expensive get and set of
> > the state, however performance critical hypercalls are generally already
> > implemented inside KVM so this probably won't matter.  If it is a
> > performance problem we can optimize it later by having the kernel perform
> > the privilege check.  That will need a new capability, however, since qemu
> > will still need the privilege check for older kernels.
> 
> If it's just about reading a small subset of the state (a single MSR?)
> to allow the privilege check, you can also open-code only that in HCALL
> exit. If it really matters.

We could, don't think it's worth the trouble.
Alexander Graf - Sept. 20, 2012, 12:44 p.m.
On 20.09.2012, at 13:53, David Gibson wrote:

> On Thu, Sep 20, 2012 at 09:38:58AM +0200, Alexander Graf wrote:
>> 
>> On 20.09.2012, at 09:08, David Gibson wrote:
>> 
>>> Currently the KVM exit path for PAPR hypercalls does not synchronize the
>>> qemu cpu state with the KVM state.  Mostly this works, because the actual
>>> hypercall arguments and return values are explicitly passed through the
>>> kvm_run structure.  However, the hypercall path includes a privilege check,
>>> to ensure that only the guest kernel can invoke hypercalls, not the guest
>>> userspace.  Because of the lack of sync, this privilege check will use an
>>> out of date copy of the MSR, which could lead either to guest userspace
>>> being able to invoke hypercalls (a security hole for the guest) or to the
>>> guest kernel being incorrectly refused privilege leading to various other
>>> failures.
>>> 
>>> This patch fixes the bug by forcing a synchronization on the hypercall exit
>>> path.  This does mean we have a potentially quite expensive get and set of
>>> the state, however performance critical hypercalls are generally already
>>> implemented inside KVM so this probably won't matter.  If it is a
>>> performance problem we can optimize it later by having the kernel perform
>>> the privilege check.  That will need a new capability, however, since qemu
>>> will still need the privilege check for older kernels.
>>> 
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> 
>> I would actually prefer to see that one fixed in kernel space.
> 
> That's a better fix, but we can't fix it purely in the kernel, because
> there are existing released kernels that don't do the privilege check.

There are security flaws fixed through -stable updates in the kernel any day, why should this one be handled differently?


Alex
David Gibson - Sept. 21, 2012, 12:22 a.m.
On Thu, Sep 20, 2012 at 02:44:26PM +0200, Alexander Graf wrote:
> 
> On 20.09.2012, at 13:53, David Gibson wrote:
> 
> > On Thu, Sep 20, 2012 at 09:38:58AM +0200, Alexander Graf wrote:
> >> 
> >> On 20.09.2012, at 09:08, David Gibson wrote:
> >> 
> >>> Currently the KVM exit path for PAPR hypercalls does not synchronize the
> >>> qemu cpu state with the KVM state.  Mostly this works, because the actual
> >>> hypercall arguments and return values are explicitly passed through the
> >>> kvm_run structure.  However, the hypercall path includes a privilege check,
> >>> to ensure that only the guest kernel can invoke hypercalls, not the guest
> >>> userspace.  Because of the lack of sync, this privilege check will use an
> >>> out of date copy of the MSR, which could lead either to guest userspace
> >>> being able to invoke hypercalls (a security hole for the guest) or to the
> >>> guest kernel being incorrectly refused privilege leading to various other
> >>> failures.
> >>> 
> >>> This patch fixes the bug by forcing a synchronization on the hypercall exit
> >>> path.  This does mean we have a potentially quite expensive get and set of
> >>> the state, however performance critical hypercalls are generally already
> >>> implemented inside KVM so this probably won't matter.  If it is a
> >>> performance problem we can optimize it later by having the kernel perform
> >>> the privilege check.  That will need a new capability, however, since qemu
> >>> will still need the privilege check for older kernels.
> >>> 
> >>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >> 
> >> I would actually prefer to see that one fixed in kernel space.
> > 
> > That's a better fix, but we can't fix it purely in the kernel, because
> > there are existing released kernels that don't do the privilege check.
> 
> There are security flaws fixed through -stable updates in the kernel
> any day, why should this one be handled differently?

From the kernel's point of view, this is not obviously a security bug
- it passes a hypercall it doesn't know how to handle to qemu, qemu
handles it incorrectly.

And in any case, even if you do consider it a kernel security bug,
there's no reason that qemu should just allow that bug to appear when
it's capable of working around buggy kernels in a way that closes the
security hole.
Alexander Graf - Sept. 24, 2012, 2:27 p.m.
On 21.09.2012, at 02:22, David Gibson wrote:

> On Thu, Sep 20, 2012 at 02:44:26PM +0200, Alexander Graf wrote:
>> 
>> On 20.09.2012, at 13:53, David Gibson wrote:
>> 
>>> On Thu, Sep 20, 2012 at 09:38:58AM +0200, Alexander Graf wrote:
>>>> 
>>>> On 20.09.2012, at 09:08, David Gibson wrote:
>>>> 
>>>>> Currently the KVM exit path for PAPR hypercalls does not synchronize the
>>>>> qemu cpu state with the KVM state.  Mostly this works, because the actual
>>>>> hypercall arguments and return values are explicitly passed through the
>>>>> kvm_run structure.  However, the hypercall path includes a privilege check,
>>>>> to ensure that only the guest kernel can invoke hypercalls, not the guest
>>>>> userspace.  Because of the lack of sync, this privilege check will use an
>>>>> out of date copy of the MSR, which could lead either to guest userspace
>>>>> being able to invoke hypercalls (a security hole for the guest) or to the
>>>>> guest kernel being incorrectly refused privilege leading to various other
>>>>> failures.
>>>>> 
>>>>> This patch fixes the bug by forcing a synchronization on the hypercall exit
>>>>> path.  This does mean we have a potentially quite expensive get and set of
>>>>> the state, however performance critical hypercalls are generally already
>>>>> implemented inside KVM so this probably won't matter.  If it is a
>>>>> performance problem we can optimize it later by having the kernel perform
>>>>> the privilege check.  That will need a new capability, however, since qemu
>>>>> will still need the privilege check for older kernels.
>>>>> 
>>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>> 
>>>> I would actually prefer to see that one fixed in kernel space.
>>> 
>>> That's a better fix, but we can't fix it purely in the kernel, because
>>> there are existing released kernels that don't do the privilege check.
>> 
>> There are security flaws fixed through -stable updates in the kernel
>> any day, why should this one be handled differently?
> 
> From the kernel's point of view, this is not obviously a security bug
> - it passes a hypercall it doesn't know how to handle to qemu, qemu
> handles it incorrectly.
> 
> And in any case, even if you do consider it a kernel security bug,
> there's no reason that qemu should just allow that bug to appear when
> it's capable of working around buggy kernels in a way that closes the
> security hole.

This is the code in the HV kernel side:

        case BOOK3S_INTERRUPT_SYSCALL:
        {
                /* hcall - punt to userspace */
                int i;

                if (vcpu->arch.shregs.msr & MSR_PR) {
                        /* sc 1 from userspace - reflect to guest syscall */
                        kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_SYSCALL);
                        r = RESUME_GUEST;
                        break;
                }
                run->papr_hcall.nr = kvmppc_get_gpr(vcpu, 3);
                for (i = 0; i < 9; ++i)
                        run->papr_hcall.args[i] = kvmppc_get_gpr(vcpu, 4 + i);
                run->exit_reason = KVM_EXIT_PAPR_HCALL;
                vcpu->arch.hcall_needed = 1;
                r = RESUME_HOST;
                break;
        }

So it already handles hypercalls in user space and deflects them back. Everyone's happy :).

The only outstanding bug is that QEMU shouldn't interpret env->msr when handling hypercalls from KVM, since these are already guaranteed to be checked and MSR in QEMU does not reflect the current MSR in the vcpu, so we might end up rejecting hypercalls by accident.


Alex
David Gibson - Sept. 25, 2012, 7:05 a.m.
On Mon, Sep 24, 2012 at 04:27:20PM +0200, Alexander Graf wrote:
> 
> On 21.09.2012, at 02:22, David Gibson wrote:
> 
> > On Thu, Sep 20, 2012 at 02:44:26PM +0200, Alexander Graf wrote:
> >> 
> >> On 20.09.2012, at 13:53, David Gibson wrote:
> >> 
> >>> On Thu, Sep 20, 2012 at 09:38:58AM +0200, Alexander Graf wrote:
> >>>> 
> >>>> On 20.09.2012, at 09:08, David Gibson wrote:
> >>>> 
> >>>>> Currently the KVM exit path for PAPR hypercalls does not synchronize the
> >>>>> qemu cpu state with the KVM state.  Mostly this works, because the actual
> >>>>> hypercall arguments and return values are explicitly passed through the
> >>>>> kvm_run structure.  However, the hypercall path includes a privilege check,
> >>>>> to ensure that only the guest kernel can invoke hypercalls, not the guest
> >>>>> userspace.  Because of the lack of sync, this privilege check will use an
> >>>>> out of date copy of the MSR, which could lead either to guest userspace
> >>>>> being able to invoke hypercalls (a security hole for the guest) or to the
> >>>>> guest kernel being incorrectly refused privilege leading to various other
> >>>>> failures.
> >>>>> 
> >>>>> This patch fixes the bug by forcing a synchronization on the hypercall exit
> >>>>> path.  This does mean we have a potentially quite expensive get and set of
> >>>>> the state, however performance critical hypercalls are generally already
> >>>>> implemented inside KVM so this probably won't matter.  If it is a
> >>>>> performance problem we can optimize it later by having the kernel perform
> >>>>> the privilege check.  That will need a new capability, however, since qemu
> >>>>> will still need the privilege check for older kernels.
> >>>>> 
> >>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >>>> 
> >>>> I would actually prefer to see that one fixed in kernel space.
> >>> 
> >>> That's a better fix, but we can't fix it purely in the kernel, because
> >>> there are existing released kernels that don't do the privilege check.
> >> 
> >> There are security flaws fixed through -stable updates in the kernel
> >> any day, why should this one be handled differently?
> > 
> > From the kernel's point of view, this is not obviously a security bug
> > - it passes a hypercall it doesn't know how to handle to qemu, qemu
> > handles it incorrectly.
> > 
> > And in any case, even if you do consider it a kernel security bug,
> > there's no reason that qemu should just allow that bug to appear when
> > it's capable of working around buggy kernels in a way that closes the
> > security hole.
> 
> This is the code in the HV kernel side:
> 
>         case BOOK3S_INTERRUPT_SYSCALL:
>         {
>                 /* hcall - punt to userspace */
>                 int i;
> 
>                 if (vcpu->arch.shregs.msr & MSR_PR) {
>                         /* sc 1 from userspace - reflect to guest syscall */
>                         kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_SYSCALL);
>                         r = RESUME_GUEST;
>                         break;
>                 }
>                 run->papr_hcall.nr = kvmppc_get_gpr(vcpu, 3);
>                 for (i = 0; i < 9; ++i)
>                         run->papr_hcall.args[i] = kvmppc_get_gpr(vcpu, 4 + i);
>                 run->exit_reason = KVM_EXIT_PAPR_HCALL;
>                 vcpu->arch.hcall_needed = 1;
>                 r = RESUME_HOST;
>                 break;
>         }
> 
> So it already handles hypercalls in user space and deflects them
> back. Everyone's happy :).

Ah, so it does.  I was mistaken.

> The only outstanding bug is that QEMU shouldn't interpret env->msr
> when handling hypercalls from KVM, since these are already
> guaranteed to be checked and MSR in QEMU does not reflect the
> current MSR in the vcpu, so we might end up rejecting hypercalls by
> accident.

I've written a suitable patch, just needs a little more testing and
I'll send it out.

Patch

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 546c116..78a47fb 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -813,6 +813,7 @@  int kvm_arch_handle_exit(CPUPPCState *env, struct kvm_run *run)
 #ifdef CONFIG_PSERIES
     case KVM_EXIT_PAPR_HCALL:
         dprintf("handle PAPR hypercall\n");
+        cpu_synchronize_state(env);
         run->papr_hcall.ret = spapr_hypercall(env, run->papr_hcall.nr,
                                               run->papr_hcall.args);
         ret = 0;