KVM: PPC: Book3S PR: only call slbmte for valid SLB entries

Message ID 150607286967.26027.12529646475118424696.stgit@bahia.lan
State Superseded
Series
  • KVM: PPC: Book3S PR: only call slbmte for valid SLB entries

Commit Message

Greg Kurz Sept. 22, 2017, 9:34 a.m.
Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS,
some of which are valid (i.e., SLB_ESID_V is set) and the rest are
likely all zeroes (with QEMU at least).

Each of them is then passed to kvmppc_mmu_book3s_64_slbmte(), which
expects to find the SLB index in the 3 lower bits of its rb argument.
When passed zeroed arguments, it happily overwrites the 0th SLB entry
with zeroes. This is exactly what happens during live migration with
QEMU, when the destination pushes the incoming SLB descriptors to
KVM PR. When reloading the SLBs at the next synchronization, QEMU first
clears its SLB array and only restores the valid ones, but the 0th one
is now gone and we cannot access the corresponding memory anymore:

(qemu) x/x $pc
c0000000000b742c: Cannot access memory

To avoid this, let's filter out non-valid SLB entries, like we
already do for Book3S HV.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 arch/powerpc/kvm/book3s_pr.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Gibson Sept. 26, 2017, 3:56 a.m. | #1
On Fri, Sep 22, 2017 at 11:34:29AM +0200, Greg Kurz wrote:
> Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS,
> some of which are valid (ie, SLB_ESID_V is set) and the rest are
> likely all-zeroes (with QEMU at least).
> 
> Each of them is then passed to kvmppc_mmu_book3s_64_slbmte(), which
> assumes to find the SLB index in the 3 lower bits of its rb argument.
> When passed zeroed arguments, it happily overwrites the 0th SLB entry
> with zeroes. This is exactly what happens while doing live migration
> with QEMU when the destination pushes the incoming SLB descriptors to
> KVM PR. When reloading the SLBs at the next synchronization, QEMU first
> clears its SLB array and only restore valid ones, but the 0th one is
> now gone and we cannot access the corresponding memory anymore:
> 
> (qemu) x/x $pc
> c0000000000b742c: Cannot access memory
> 
> To avoid this, let's filter out non-valid SLB entries, like we
> already do for Book3S HV.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>

This seems like a good idea, but to make it fully correct, don't we
also need to fully flush the SLB before inserting the new entries?
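
The concern about flushing can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel implementation: `struct slb_entry`, `install_valid()` and `stale_entry_survives()` are hypothetical names, the index mask is an assumption of the model, and the ESID/VSID values are made up. It shows that when only valid descriptors are installed, a valid entry left over from the previous state stays in place unless the array is cleared first.

```c
#include <stdint.h>
#include <string.h>

#define SLB_ESID_V 0x0000000008000000ULL /* valid bit */
#define NUM_SLB    64
#define IDX_MASK   0xfffULL /* assumption: index field width in this model */

/* Hypothetical model of one SLB entry (not the kernel's struct). */
struct slb_entry {
    uint64_t esid;
    uint64_t vsid;
};

/* Install only valid descriptors, optionally clearing the model SLB first. */
void install_valid(struct slb_entry *slb, const struct slb_entry *in,
                   int flush_first)
{
    if (flush_first)
        memset(slb, 0, NUM_SLB * sizeof(*slb));

    for (int i = 0; i < NUM_SLB; i++) {
        uint64_t idx = in[i].esid & IDX_MASK;

        if (!(in[i].esid & SLB_ESID_V))
            continue; /* skip invalid descriptors */
        if (idx < NUM_SLB)
            slb[idx] = in[i];
    }
}

/* Returns 1 if a stale valid entry at index 5 is still valid afterwards. */
int stale_entry_survives(int flush_first)
{
    struct slb_entry slb[NUM_SLB], in[NUM_SLB];

    memset(slb, 0, sizeof(slb));
    memset(in, 0, sizeof(in));

    /* Stale state from before KVM_SET_SREGS (made-up values). */
    slb[5].esid = 0xd000000000000000ULL | SLB_ESID_V | 5;
    slb[5].vsid = 0x5678000ULL;

    /* Incoming state contains a single valid descriptor for index 0. */
    in[0].esid = 0xc000000000000000ULL | SLB_ESID_V | 0;
    in[0].vsid = 0x1234000ULL;

    install_valid(slb, in, flush_first);
    return (slb[5].esid & SLB_ESID_V) != 0;
}
```

In this model, `stale_entry_survives(0)` reports that the old entry is still there, while `stale_entry_survives(1)` reports that it is gone.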

> ---
>  arch/powerpc/kvm/book3s_pr.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
> index 3beb4ff469d1..cb6894e55f97 100644
> --- a/arch/powerpc/kvm/book3s_pr.c
> +++ b/arch/powerpc/kvm/book3s_pr.c
> @@ -1328,8 +1328,10 @@ static int kvm_arch_vcpu_ioctl_set_sregs_pr(struct kvm_vcpu *vcpu,
>  	vcpu3s->sdr1 = sregs->u.s.sdr1;
>  	if (vcpu->arch.hflags & BOOK3S_HFLAG_SLB) {
>  		for (i = 0; i < 64; i++) {
> -			vcpu->arch.mmu.slbmte(vcpu, sregs->u.s.ppc64.slb[i].slbv,
> -						    sregs->u.s.ppc64.slb[i].slbe);
> +			u64 rb = sregs->u.s.ppc64.slb[i].slbe;
> +			u64 rs = sregs->u.s.ppc64.slb[i].slbv;
> +			if (rb & SLB_ESID_V)
> +				vcpu->arch.mmu.slbmte(vcpu, rs, rb);
>  		}
>  	} else {
>  		for (i = 0; i < 16; i++) {
>
Michael Ellerman Sept. 26, 2017, 5:24 a.m. | #2
David Gibson <david@gibson.dropbear.id.au> writes:

> On Fri, Sep 22, 2017 at 11:34:29AM +0200, Greg Kurz wrote:
>> Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS,
>> some of which are valid (ie, SLB_ESID_V is set) and the rest are
>> likely all-zeroes (with QEMU at least).
>> 
>> Each of them is then passed to kvmppc_mmu_book3s_64_slbmte(), which
>> assumes to find the SLB index in the 3 lower bits of its rb argument.
>> When passed zeroed arguments, it happily overwrites the 0th SLB entry
>> with zeroes. This is exactly what happens while doing live migration
>> with QEMU when the destination pushes the incoming SLB descriptors to
>> KVM PR. When reloading the SLBs at the next synchronization, QEMU first
>> clears its SLB array and only restore valid ones, but the 0th one is
>> now gone and we cannot access the corresponding memory anymore:
>> 
>> (qemu) x/x $pc
>> c0000000000b742c: Cannot access memory
>> 
>> To avoid this, let's filter out non-valid SLB entries, like we
>> already do for Book3S HV.
>> 
>> Signed-off-by: Greg Kurz <groug@kaod.org>
>
> This seems like a good idea, but to make it fully correct, don't we
> also need to fully flush the SLB before inserting the new entries.

We would need to do that yeah.

But I don't think I like this patch: it would mean userspace has no way
of programming an invalid SLB entry. It's true that in general that
isn't something we care about doing, but the API should allow it.

For example, the kernel could leave invalid entries in place and flip the
valid bit when it wanted to make them valid, and this patch would
prevent that state being successfully migrated IIUC.

cheers
Paul Mackerras Sept. 26, 2017, 11:34 a.m. | #3
On Tue, Sep 26, 2017 at 03:24:05PM +1000, Michael Ellerman wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Fri, Sep 22, 2017 at 11:34:29AM +0200, Greg Kurz wrote:
> >> Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS,
> >> some of which are valid (ie, SLB_ESID_V is set) and the rest are
> >> likely all-zeroes (with QEMU at least).
> >> 
> >> Each of them is then passed to kvmppc_mmu_book3s_64_slbmte(), which
> >> assumes to find the SLB index in the 3 lower bits of its rb argument.
> >> When passed zeroed arguments, it happily overwrites the 0th SLB entry
> >> with zeroes. This is exactly what happens while doing live migration
> >> with QEMU when the destination pushes the incoming SLB descriptors to
> >> KVM PR. When reloading the SLBs at the next synchronization, QEMU first
> >> clears its SLB array and only restore valid ones, but the 0th one is
> >> now gone and we cannot access the corresponding memory anymore:
> >> 
> >> (qemu) x/x $pc
> >> c0000000000b742c: Cannot access memory
> >> 
> >> To avoid this, let's filter out non-valid SLB entries, like we
> >> already do for Book3S HV.
> >> 
> >> Signed-off-by: Greg Kurz <groug@kaod.org>
> >
> > This seems like a good idea, but to make it fully correct, don't we
> > also need to fully flush the SLB before inserting the new entries.
> 
> We would need to do that yeah.
> 
> But I don't think I like this patch, it would mean userspace has no way
> of programming an invalid SLB entry. It's true that in general that
> isn't something we care about doing, but the API should allow it.
> 
> For example the kernel could leave invalid entries in place and flip the
> valid bit when it wanted to make them valid, and this patch would
> prevent that state being successfully migrated IIUIC.

If I remember correctly, the architecture says that slbmfee/slbmfev
return all zeroes for an invalid entry, so there would be no way for
the guest kernel to do what you suggest.

Paul.
Michael Ellerman Sept. 27, 2017, 3:25 a.m. | #4
Paul Mackerras <paulus@ozlabs.org> writes:

> On Tue, Sep 26, 2017 at 03:24:05PM +1000, Michael Ellerman wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>> 
>> > On Fri, Sep 22, 2017 at 11:34:29AM +0200, Greg Kurz wrote:
>> >> Userland passes an array of 64 SLB descriptors to KVM_SET_SREGS,
>> >> some of which are valid (ie, SLB_ESID_V is set) and the rest are
>> >> likely all-zeroes (with QEMU at least).
>> >> 
>> >> Each of them is then passed to kvmppc_mmu_book3s_64_slbmte(), which
>> >> assumes to find the SLB index in the 3 lower bits of its rb argument.
>> >> When passed zeroed arguments, it happily overwrites the 0th SLB entry
>> >> with zeroes. This is exactly what happens while doing live migration
>> >> with QEMU when the destination pushes the incoming SLB descriptors to
>> >> KVM PR. When reloading the SLBs at the next synchronization, QEMU first
>> >> clears its SLB array and only restore valid ones, but the 0th one is
>> >> now gone and we cannot access the corresponding memory anymore:
>> >> 
>> >> (qemu) x/x $pc
>> >> c0000000000b742c: Cannot access memory
>> >> 
>> >> To avoid this, let's filter out non-valid SLB entries, like we
>> >> already do for Book3S HV.
>> >> 
>> >> Signed-off-by: Greg Kurz <groug@kaod.org>
>> >
>> > This seems like a good idea, but to make it fully correct, don't we
>> > also need to fully flush the SLB before inserting the new entries.
>> 
>> We would need to do that yeah.
>> 
>> But I don't think I like this patch, it would mean userspace has no way
>> of programming an invalid SLB entry. It's true that in general that
>> isn't something we care about doing, but the API should allow it.
>> 
>> For example the kernel could leave invalid entries in place and flip the
>> valid bit when it wanted to make them valid, and this patch would
>> prevent that state being successfully migrated IIUIC.
>
> If I remember correctly, the architecture says that slbmfee/slbmfev
> return all zeroes for an invalid entry, so there would be no way for
> the guest kernel to do what you suggest.

You're right it does.

We have code in xmon that reads entries and then checks for SLB_ESID_V,
but I guess that's just overly pessimistic.

cheers

Patch

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 3beb4ff469d1..cb6894e55f97 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1328,8 +1328,10 @@  static int kvm_arch_vcpu_ioctl_set_sregs_pr(struct kvm_vcpu *vcpu,
 	vcpu3s->sdr1 = sregs->u.s.sdr1;
 	if (vcpu->arch.hflags & BOOK3S_HFLAG_SLB) {
 		for (i = 0; i < 64; i++) {
-			vcpu->arch.mmu.slbmte(vcpu, sregs->u.s.ppc64.slb[i].slbv,
-						    sregs->u.s.ppc64.slb[i].slbe);
+			u64 rb = sregs->u.s.ppc64.slb[i].slbe;
+			u64 rs = sregs->u.s.ppc64.slb[i].slbv;
+			if (rb & SLB_ESID_V)
+				vcpu->arch.mmu.slbmte(vcpu, rs, rb);
 		}
 	} else {
 		for (i = 0; i < 16; i++) {
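
The failure mode described in the commit message can be reproduced with a small user-space model. This is a sketch under stated assumptions rather than the kernel code: `model_slbmte()`, `set_sregs()` and `entry0_valid_after()` are hypothetical names, the index mask is an assumption of the model (a zeroed rb selects index 0 whatever the field width), and the ESID/VSID values are made up. It pushes one valid descriptor followed by 63 zeroed ones, with and without the SLB_ESID_V filter.

```c
#include <stdint.h>
#include <string.h>

#define SLB_ESID_V 0x0000000008000000ULL /* valid bit */
#define NUM_SLB    64
#define IDX_MASK   0xfffULL /* assumption: index field width in this model */

/* Hypothetical model of one SLB entry (not the kernel's struct). */
struct slb_entry {
    uint64_t esid;
    uint64_t vsid;
};

static struct slb_entry slb[NUM_SLB];

/* Model of slbmte: the target entry is selected by the index held in rb. */
static void model_slbmte(uint64_t rs, uint64_t rb)
{
    uint64_t idx = rb & IDX_MASK;

    if (idx >= NUM_SLB)
        return;
    slb[idx].esid = rb;
    slb[idx].vsid = rs;
}

/* Model of the KVM_SET_SREGS loop, with or without the validity filter. */
static void set_sregs(const struct slb_entry *in, int filter)
{
    for (int i = 0; i < NUM_SLB; i++) {
        uint64_t rb = in[i].esid;
        uint64_t rs = in[i].vsid;

        if (filter && !(rb & SLB_ESID_V))
            continue; /* patched behaviour: skip invalid descriptors */
        model_slbmte(rs, rb);
    }
}

/* Returns 1 if SLB entry 0 is still valid after pushing the descriptors. */
int entry0_valid_after(int filter)
{
    struct slb_entry in[NUM_SLB];

    memset(in, 0, sizeof(in));
    memset(slb, 0, sizeof(slb));

    /* One valid descriptor for index 0; the other 63 stay all zeroes. */
    in[0].esid = 0xc000000000000000ULL | SLB_ESID_V | 0;
    in[0].vsid = 0x1234000ULL;

    set_sregs(in, filter);
    return (slb[0].esid & SLB_ESID_V) != 0;
}
```

Without the filter, every zeroed descriptor decodes to index 0 and clobbers the entry written in the first iteration, so `entry0_valid_after(0)` reports the entry as lost. With the filter, `entry0_valid_after(1)` reports it as preserved.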