[4/5] KVM: PPC: Book3S HV: Don't give the guest RW access to RO pages

Message ID 20121122092841.GE31117@bloggs.ozlabs.ibm.com
State New, archived
Commit Message

Paul Mackerras Nov. 22, 2012, 9:28 a.m. UTC
Currently, if the guest does an H_PROTECT hcall requesting that the
permissions on a HPT entry be changed to allow writing, we make the
requested change even if the page is marked read-only in the host
Linux page tables.  This is a problem since it would for instance
allow a guest to modify a page that KSM has decided can be shared
between multiple guests.

To fix this, if the new permissions for the page allow writing, we need
to look up the memslot for the page, work out the host virtual address,
and look up the Linux page tables to get the PTE for the page.  If that
PTE is read-only, we reduce the HPTE permissions to read-only.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |   22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

Comments

Alexander Graf Nov. 23, 2012, 3:47 p.m. UTC | #1
On 22.11.2012, at 10:28, Paul Mackerras wrote:

> Currently, if the guest does an H_PROTECT hcall requesting that the
> permissions on a HPT entry be changed to allow writing, we make the
> requested change even if the page is marked read-only in the host
> Linux page tables.  This is a problem since it would for instance
> allow a guest to modify a page that KSM has decided can be shared
> between multiple guests.
> 
> To fix this, if the new permissions for the page allow writing, we need
> to look up the memslot for the page, work out the host virtual address,
> and look up the Linux page tables to get the PTE for the page.  If that
> PTE is read-only, we reduce the HPTE permissions to read-only.

How does KSM handle this usually? If you reduce the permissions to R/O, how do you ever get a R/W page from a deduplicated one?


Alex


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Mackerras Nov. 23, 2012, 10:13 p.m. UTC | #2
On Fri, Nov 23, 2012 at 04:47:45PM +0100, Alexander Graf wrote:
> 
> On 22.11.2012, at 10:28, Paul Mackerras wrote:
> 
> > Currently, if the guest does an H_PROTECT hcall requesting that the
> > permissions on a HPT entry be changed to allow writing, we make the
> > requested change even if the page is marked read-only in the host
> > Linux page tables.  This is a problem since it would for instance
> > allow a guest to modify a page that KSM has decided can be shared
> > between multiple guests.
> > 
> > To fix this, if the new permissions for the page allow writing, we need
> > to look up the memslot for the page, work out the host virtual address,
> > and look up the Linux page tables to get the PTE for the page.  If that
> > PTE is read-only, we reduce the HPTE permissions to read-only.
> 
> How does KSM handle this usually? If you reduce the permissions to R/O, how do you ever get a R/W page from a deduplicated one?

The scenario goes something like this:

1. Guest creates an HPTE with RO permissions.
2. KSM decides the page is identical to another page and changes the
   HPTE to point to a shared copy.  Permissions are still RO.
3. Guest decides it wants write access to the page and does an
   H_PROTECT hcall to change the permissions on the HPTE to RW.

The bug is that we actually make the requested change in step 3.
Instead we should leave it at RO, then when the guest tries to write
to the page, we take a hypervisor page fault, copy the page and give
the guest write access to its own copy of the page.

So what this patch does is add code to H_PROTECT so that if the guest
is requesting RW access, we check the Linux PTE to see if the
underlying guest page is RO, and if so reduce the permissions in the
HPTE to RO.

Paul.
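
As a rough illustration of the three-step scenario above, here is a toy C model of the decision the patched H_PROTECT makes. It is only a sketch: `toy_perm` and `toy_h_protect_perm` are hypothetical names, and the real code works on HPTE permission bits and Linux PTEs rather than booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy permission levels; the real HPTE encodes these in its PP bits. */
enum toy_perm { TOY_PERM_RO, TOY_PERM_RW };

/*
 * Hypothetical model of the check added to H_PROTECT: the permission
 * placed in the hardware HPTE is the requested one, except that a
 * request for RW is reduced to RO when the host PTE is present but
 * read-only (for example because KSM has merged the page).
 */
static enum toy_perm toy_h_protect_perm(enum toy_perm requested,
					bool host_pte_present,
					bool host_pte_writable)
{
	assert(requested == TOY_PERM_RO || requested == TOY_PERM_RW);

	if (requested == TOY_PERM_RW && host_pte_present && !host_pte_writable)
		return TOY_PERM_RO;
	return requested;
}
```

Note that, as in the patch, a host PTE that is not present does not cause a reduction; only a present, read-only PTE does.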
Alexander Graf Nov. 24, 2012, 9:05 a.m. UTC | #3
On 23.11.2012, at 23:13, Paul Mackerras <paulus@samba.org> wrote:

> On Fri, Nov 23, 2012 at 04:47:45PM +0100, Alexander Graf wrote:
>> 
>> On 22.11.2012, at 10:28, Paul Mackerras wrote:
>> 
>>> Currently, if the guest does an H_PROTECT hcall requesting that the
>>> permissions on a HPT entry be changed to allow writing, we make the
>>> requested change even if the page is marked read-only in the host
>>> Linux page tables.  This is a problem since it would for instance
>>> allow a guest to modify a page that KSM has decided can be shared
>>> between multiple guests.
>>> 
>>> To fix this, if the new permissions for the page allow writing, we need
>>> to look up the memslot for the page, work out the host virtual address,
>>> and look up the Linux page tables to get the PTE for the page.  If that
>>> PTE is read-only, we reduce the HPTE permissions to read-only.
>> 
>> How does KSM handle this usually? If you reduce the permissions to R/O, how do you ever get a R/W page from a deduplicated one?
> 
> The scenario goes something like this:
> 
> 1. Guest creates an HPTE with RO permissions.
> 2. KSM decides the page is identical to another page and changes the
>   HPTE to point to a shared copy.  Permissions are still RO.
> 3. Guest decides it wants write access to the page and does an
>   H_PROTECT hcall to change the permissions on the HPTE to RW.
> 
> The bug is that we actually make the requested change in step 3.
> Instead we should leave it at RO, then when the guest tries to write
> to the page, we take a hypervisor page fault, copy the page and give
> the guest write access to its own copy of the page.
> 
> So what this patch does is add code to H_PROTECT so that if the guest
> is requesting RW access, we check the Linux PTE to see if the
> underlying guest page is RO, and if so reduce the permissions in the
> HPTE to RO.

But this will be guest visible, because now H_PROTECT doesn't actually mark the page R/W in the HTAB, right?

So the flow with this patch is:

  - guest page permission fault
  - guest does H_PROTECT to mark page r/w
  - H_PROTECT doesn't do anything
  - guest returns from permission handler, triggers write fault


2 questions here:

How does the host know that the page is actually r/w?

How does this work on 970? I thought page faults always go straight to the guest there.

Alex

Paul Mackerras Nov. 24, 2012, 9:32 a.m. UTC | #4
On Sat, Nov 24, 2012 at 10:05:37AM +0100, Alexander Graf wrote:
> 
> 
> On 23.11.2012, at 23:13, Paul Mackerras <paulus@samba.org> wrote:
> 
> > On Fri, Nov 23, 2012 at 04:47:45PM +0100, Alexander Graf wrote:
> >> 
> >> On 22.11.2012, at 10:28, Paul Mackerras wrote:
> >> 
> >>> Currently, if the guest does an H_PROTECT hcall requesting that the
> >>> permissions on a HPT entry be changed to allow writing, we make the
> >>> requested change even if the page is marked read-only in the host
> >>> Linux page tables.  This is a problem since it would for instance
> >>> allow a guest to modify a page that KSM has decided can be shared
> >>> between multiple guests.
> >>> 
> >>> To fix this, if the new permissions for the page allow writing, we need
> >>> to look up the memslot for the page, work out the host virtual address,
> >>> and look up the Linux page tables to get the PTE for the page.  If that
> >>> PTE is read-only, we reduce the HPTE permissions to read-only.
> >> 
> >> How does KSM handle this usually? If you reduce the permissions to R/O, how do you ever get a R/W page from a deduplicated one?
> > 
> > The scenario goes something like this:
> > 
> > 1. Guest creates an HPTE with RO permissions.
> > 2. KSM decides the page is identical to another page and changes the
> >   HPTE to point to a shared copy.  Permissions are still RO.
> > 3. Guest decides it wants write access to the page and does an
> >   H_PROTECT hcall to change the permissions on the HPTE to RW.
> > 
> > The bug is that we actually make the requested change in step 3.
> > Instead we should leave it at RO, then when the guest tries to write
> > to the page, we take a hypervisor page fault, copy the page and give
> > the guest write access to its own copy of the page.
> > 
> > So what this patch does is add code to H_PROTECT so that if the guest
> > is requesting RW access, we check the Linux PTE to see if the
> > underlying guest page is RO, and if so reduce the permissions in the
> > HPTE to RO.
> 
> But this will be guest visible, because now H_PROTECT doesn't actually mark the page R/W in the HTAB, right?

No - the guest view of the HPTE has R/W permissions.  The guest view
of the HPTE is made up of doubleword 0 from the real HPT plus
rev->guest_rpte for doubleword 1 (where rev is the entry in the revmap
array, kvm->arch.revmap, for the HPTE).  The guest view can be
different from the host/hardware view, which is in the real HPT.  For
instance, the guest view of a HPTE might be valid but the host view
might be invalid because the underlying real page has been paged out -
in that case we use a software bit which we call HPTE_V_ABSENT to
remind ourselves that there is something valid there from the guest's
point of view.  Or the guest view can be R/W but the host view is RO,
as in the case where KSM has merged the page.
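
To make the guest-view/host-view split concrete, here is a minimal sketch in C. The struct names and bit values are assumptions chosen for illustration, not copied from the kernel headers, and turning the "absent" marker back into a valid bit is one plausible way to present the entry to the guest rather than a description of the actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; not taken from kernel headers. */
#define TOY_V_VALID   0x01ULL
#define TOY_V_ABSENT  0x20ULL

/* One HPT entry: doubleword 0 (v) and doubleword 1 (r). */
struct toy_hpte { uint64_t v; uint64_t r; };

/* Per-HPTE side record holding the guest's idea of doubleword 1. */
struct toy_rev { uint64_t guest_rpte; };

/*
 * Build the guest view: doubleword 0 comes from the real HPT,
 * doubleword 1 comes from the side record. If the host has marked
 * the entry absent (page paged out), the guest still sees it as valid.
 */
static struct toy_hpte toy_guest_view(const struct toy_hpte *real,
				      const struct toy_rev *rev)
{
	struct toy_hpte g;

	assert(real && rev);
	g.v = real->v;
	if (g.v & TOY_V_ABSENT) {
		g.v &= ~TOY_V_ABSENT;
		g.v |= TOY_V_VALID;
	}
	g.r = rev->guest_rpte;
	return g;
}
```

With this layout the hardware entry can carry read-only permissions in `real->r` while `rev->guest_rpte` still records the read/write permissions the guest asked for.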

> So the flow with this patch is:
> 
>   - guest page permission fault

This comes through the host (kvmppc_hpte_hv_fault()) which looks at
the guest view of the HPTE, sees that it has RO permissions, and sends
the page fault to the guest.

>   - guest does H_PROTECT to mark page r/w
>   - H_PROTECT doesn't do anything
>   - guest returns from permission handler, triggers write fault

This comes once again to kvmppc_hpte_hv_fault(), which sees that the
guest view of the HPTE has R/W permissions now, and sends the page
fault to kvmppc_book3s_hv_page_fault(), which requests write access to
the page, possibly triggering copy-on-write or whatever, and updates
the real HPTE to have R/W permissions and possibly point to a new page
of memory.
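
The two faults described above can be modelled with a small state machine. This is a sketch under simplifying assumptions: `toy_page`, `toy_guest_write` and the outcome names are hypothetical, and the "host fix-up" stands in for whatever kvmppc_book3s_hv_page_fault() really does (copy-on-write, updating the real HPTE).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state for the sketch. */
struct toy_page {
	bool guest_view_rw;  /* permissions the guest believes it has */
	bool hw_rw;          /* permissions in the real (hardware) HPTE */
	bool shared;         /* page is merged, host PTE is read-only */
};

enum toy_outcome {
	TOY_WRITE_OK,            /* hardware allows the store */
	TOY_REFLECTED_TO_GUEST,  /* guest view is RO: guest gets the fault */
	TOY_HOST_FIXED_UP        /* guest view is RW: host resolves it */
};

/* Model of what happens when the guest stores to the page. */
static enum toy_outcome toy_guest_write(struct toy_page *p)
{
	assert(p);
	if (p->hw_rw)
		return TOY_WRITE_OK;
	if (!p->guest_view_rw)
		return TOY_REFLECTED_TO_GUEST;
	/* Host path: break sharing, then upgrade the real HPTE. */
	p->shared = false;
	p->hw_rw = true;
	return TOY_HOST_FIXED_UP;
}
```

In this model the guest's H_PROTECT call only flips `guest_view_rw`; `hw_rw` stays false while the page is shared, so the next store takes the host path instead of silently writing to the shared page.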

> 
> 2 questions here:
> 
> How does the host know that the page is actually r/w?

I assume you mean RO?  It looks up the memslot for the guest physical
address (which it gets from rev->guest_rpte), uses that to work out
the host virtual address (i.e. the address in qemu's address space),
looks up the Linux PTE in qemu's Linux page tables, and looks at the
_PAGE_RW bit there.
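
The lookup chain just described (guest frame number, then memslot, then host virtual address, then PTE) might be sketched as follows. Everything prefixed `toy_` or `TOY_` is an assumption made for the example; the field names mirror the usual memslot idea of a base frame, a page count and a userspace address, and the PTE bits are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT    12       /* assumes 4 KiB host pages */
#define TOY_PTE_PRESENT   0x1ULL   /* invented bit values */
#define TOY_PTE_RW        0x2ULL

/* A contiguous range of guest frames backed by userspace memory. */
struct toy_memslot {
	uint64_t base_gfn;
	uint64_t npages;
	uint64_t userspace_addr;
};

/* Find the slot containing gfn, or NULL if no slot covers it. */
static const struct toy_memslot *
toy_gfn_to_memslot(const struct toy_memslot *slots, size_t n, uint64_t gfn)
{
	assert(slots || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (gfn >= slots[i].base_gfn &&
		    gfn - slots[i].base_gfn < slots[i].npages)
			return &slots[i];
	}
	return NULL;
}

/* Host virtual address of gfn within its slot. */
static uint64_t toy_gfn_to_hva(const struct toy_memslot *slot, uint64_t gfn)
{
	return slot->userspace_addr +
	       ((gfn - slot->base_gfn) << TOY_PAGE_SHIFT);
}

/* True when the toy PTE is present but lacks write permission. */
static bool toy_pte_is_readonly(uint64_t pte)
{
	return (pte & TOY_PTE_PRESENT) && !(pte & TOY_PTE_RW);
}
```

The page-table walk itself is left out; in the sketch the caller would obtain the PTE for the computed host virtual address by some other means and pass it to `toy_pte_is_readonly`.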

> How does this work on 970? I thought page faults always go straight to the guest there.

They do, which is why PPC970 can't do any of this.  On PPC970 we have
kvm->arch.using_mmu_notifiers == 0, and that makes the code pin every
page of guest memory that is mapped by a guest HPTE (with a Linux
guest, that means every page, because of the linear mapping).  On
POWER7 we have kvm->arch.using_mmu_notifiers == 1, which enables
host paging and deduplication of guest memory.

Paul.
Alexander Graf Nov. 26, 2012, 1:09 p.m. UTC | #5
On 24.11.2012, at 10:32, Paul Mackerras wrote:

> On Sat, Nov 24, 2012 at 10:05:37AM +0100, Alexander Graf wrote:
>> 
>> 
>> On 23.11.2012, at 23:13, Paul Mackerras <paulus@samba.org> wrote:
>> 
>>> On Fri, Nov 23, 2012 at 04:47:45PM +0100, Alexander Graf wrote:
>>>> 
>>>> On 22.11.2012, at 10:28, Paul Mackerras wrote:
>>>> 
>>>>> Currently, if the guest does an H_PROTECT hcall requesting that the
>>>>> permissions on a HPT entry be changed to allow writing, we make the
>>>>> requested change even if the page is marked read-only in the host
>>>>> Linux page tables.  This is a problem since it would for instance
>>>>> allow a guest to modify a page that KSM has decided can be shared
>>>>> between multiple guests.
>>>>> 
>>>>> To fix this, if the new permissions for the page allow writing, we need
>>>>> to look up the memslot for the page, work out the host virtual address,
>>>>> and look up the Linux page tables to get the PTE for the page.  If that
>>>>> PTE is read-only, we reduce the HPTE permissions to read-only.
>>>> 
>>>> How does KSM handle this usually? If you reduce the permissions to R/O, how do you ever get a R/W page from a deduplicated one?
>>> 
>>> The scenario goes something like this:
>>> 
>>> 1. Guest creates an HPTE with RO permissions.
>>> 2. KSM decides the page is identical to another page and changes the
>>>  HPTE to point to a shared copy.  Permissions are still RO.
>>> 3. Guest decides it wants write access to the page and does an
>>>  H_PROTECT hcall to change the permissions on the HPTE to RW.
>>> 
>>> The bug is that we actually make the requested change in step 3.
>>> Instead we should leave it at RO, then when the guest tries to write
>>> to the page, we take a hypervisor page fault, copy the page and give
>>> the guest write access to its own copy of the page.
>>> 
>>> So what this patch does is add code to H_PROTECT so that if the guest
>>> is requesting RW access, we check the Linux PTE to see if the
>>> underlying guest page is RO, and if so reduce the permissions in the
>>> HPTE to RO.
>> 
>> But this will be guest visible, because now H_PROTECT doesn't actually mark the page R/W in the HTAB, right?
> 
> No - the guest view of the HPTE has R/W permissions.  The guest view
> of the HPTE is made up of doubleword 0 from the real HPT plus
> rev->guest_rpte for doubleword 1 (where rev is the entry in the revmap
> array, kvm->arch.revmap, for the HPTE).  The guest view can be
> different from the host/hardware view, which is in the real HPT.  For
> instance, the guest view of a HPTE might be valid but the host view
> might be invalid because the underlying real page has been paged out -
> in that case we use a software bit which we call HPTE_V_ABSENT to
> remind ourselves that there is something valid there from the guest's
> point of view.  Or the guest view can be R/W but the host view is RO,
> as in the case where KSM has merged the page.
> 
>> So the flow with this patch is:
>> 
>>  - guest page permission fault
> 
> This comes through the host (kvmppc_hpte_hv_fault()) which looks at
> the guest view of the HPTE, sees that it has RO permissions, and sends
> the page fault to the guest.
> 
>>  - guest does H_PROTECT to mark page r/w
>>  - H_PROTECT doesn't do anything
>>  - guest returns from permission handler, triggers write fault
> 
> This comes once again to kvmppc_hpte_hv_fault(), which sees that the
> guest view of the HPTE has R/W permissions now, and sends the page
> fault to kvmppc_book3s_hv_page_fault(), which requests write access to
> the page, possibly triggering copy-on-write or whatever, and updates
> the real HPTE to have R/W permissions and possibly point to a new page
> of memory.
> 
>> 
>> 2 questions here:
>> 
>> How does the host know that the page is actually r/w?
> 
> I assume you mean RO?  It looks up the memslot for the guest physical
> address (which it gets from rev->guest_rpte), uses that to work out
> the host virtual address (i.e. the address in qemu's address space),
> looks up the Linux PTE in qemu's Linux page tables, and looks at the
> _PAGE_RW bit there.
> 
>> How does this work on 970? I thought page faults always go straight to the guest there.
> 
> They do, which is why PPC970 can't do any of this.  On PPC970 we have
> kvm->arch.using_mmu_notifiers == 0, and that makes the code pin every
> page of guest memory that is mapped by a guest HPTE (with a Linux
> guest, that means every page, because of the linear mapping).  On
> POWER7 we have kvm->arch.using_mmu_notifiers == 1, which enables
> host paging and deduplication of guest memory.

Thanks a lot for the detailed explanation! Maybe you guys should just release an HV capable p7 system publicly, so we can deprecate 970 support. That would make a few things quite a bit easier ;)

Thanks, applied to kvm-ppc-next.

Alex

Patch

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 7e1f7e2..19c93ba 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -629,6 +629,28 @@  long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 			asm volatile("tlbiel %0" : : "r" (rb));
 			asm volatile("ptesync" : : : "memory");
 		}
+		/*
+		 * If the host has this page as readonly but the guest
+		 * wants to make it read/write, reduce the permissions.
+		 * Checking the host permissions involves finding the
+		 * memslot and then the Linux PTE for the page.
+		 */
+		if (hpte_is_writable(r) && kvm->arch.using_mmu_notifiers) {
+			unsigned long psize, gfn, hva;
+			struct kvm_memory_slot *memslot;
+			pgd_t *pgdir = vcpu->arch.pgdir;
+			pte_t pte;
+
+			psize = hpte_page_size(v, r);
+			gfn = ((r & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
+			memslot = __gfn_to_memslot(kvm_memslots(kvm), gfn);
+			if (memslot) {
+				hva = __gfn_to_hva_memslot(memslot, gfn);
+				pte = lookup_linux_pte(pgdir, hva, 1, &psize);
+				if (pte_present(pte) && !pte_write(pte))
+					r = hpte_make_readonly(r);
+			}
+		}
 	}
 	hpte[1] = r;
 	eieio();
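
The gfn computation in the patch, `((r & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT`, can be tried out in isolation. The following is a standalone sketch: the mask value and the 4 KiB page shift are assumptions made for the example, and `toy_hpte_gfn` is a hypothetical helper, not a kernel function. The extra `~(psize - 1)` mask presumably clears low-order bits of the RPN field that do not belong to the page address when the HPTE maps a large page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for the sketch; check the kernel headers for real values. */
#define TOY_HPTE_R_RPN  0x0ffffffffffff000ULL
#define TOY_PAGE_SHIFT  12

/*
 * Extract the guest frame number from HPTE doubleword 1 (r) for a
 * mapping of psize bytes. psize must be a power of two.
 */
static uint64_t toy_hpte_gfn(uint64_t r, uint64_t psize)
{
	assert(psize != 0 && (psize & (psize - 1)) == 0);
	return ((r & TOY_HPTE_R_RPN) & ~(psize - 1)) >> TOY_PAGE_SHIFT;
}
```

For example, with a 4 KiB page, `toy_hpte_gfn(0x8000000012345196, 4096)` yields `0x12345`: the top bit and the low 12 flag bits are masked off. With a 16 MiB page, `toy_hpte_gfn(0x13001000, 0x1000000)` yields `0x13000`, because the bits below the 16 MiB boundary are cleared before shifting.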