[v10,3/8] KVM: PPC: Shared pages support for secure guests
diff mbox series

Message ID 20191104041800.24527-4-bharata@linux.ibm.com
State Changes Requested
Headers show
Series
  • KVM: PPC: Driver to manage pages of secure guest
Related show

Commit Message

Bharata B Rao Nov. 4, 2019, 4:17 a.m. UTC
A secure guest will share some of its pages with hypervisor (Eg. virtio
bounce buffers etc). Support sharing of pages between hypervisor and
ultravisor.

Shared page is reachable via both HV and UV side page tables. Once a
secure page is converted to shared page, the device page that represents
the secure page is unmapped from the HV side page tables.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h  |  3 ++
 arch/powerpc/kvm/book3s_hv_uvmem.c | 85 ++++++++++++++++++++++++++++--
 2 files changed, 84 insertions(+), 4 deletions(-)

Comments

Paul Mackerras Nov. 6, 2019, 4:52 a.m. UTC | #1
On Mon, Nov 04, 2019 at 09:47:55AM +0530, Bharata B Rao wrote:
> A secure guest will share some of its pages with hypervisor (Eg. virtio
> bounce buffers etc). Support sharing of pages between hypervisor and
> ultravisor.
> 
> Shared page is reachable via both HV and UV side page tables. Once a
> secure page is converted to shared page, the device page that represents
> the secure page is unmapped from the HV side page tables.

I'd like to understand a little better what's going on - see below...

> +/*
> + * Shares the page with HV, thus making it a normal page.
> + *
> + * - If the page is already secure, then provision a new page and share
> + * - If the page is a normal page, share the existing page
> + *
> + * In the former case, uses dev_pagemap_ops.migrate_to_ram handler
> + * to unmap the device page from QEMU's page tables.
> + */
> +static unsigned long
> +kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
> +{
> +
> +	int ret = H_PARAMETER;
> +	struct page *uvmem_page;
> +	struct kvmppc_uvmem_page_pvt *pvt;
> +	unsigned long pfn;
> +	unsigned long gfn = gpa >> page_shift;
> +	int srcu_idx;
> +	unsigned long uvmem_pfn;
> +
> +	srcu_idx = srcu_read_lock(&kvm->srcu);
> +	mutex_lock(&kvm->arch.uvmem_lock);
> +	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
> +		uvmem_page = pfn_to_page(uvmem_pfn);
> +		pvt = uvmem_page->zone_device_data;
> +		pvt->skip_page_out = true;
> +	}
> +
> +retry:
> +	mutex_unlock(&kvm->arch.uvmem_lock);
> +	pfn = gfn_to_pfn(kvm, gfn);

At this point, pfn is the value obtained from the page table for
userspace (e.g. QEMU), right?  I would think it should be equal to
uvmem_pfn in most cases, shouldn't it?  If not, what is it going to
be?

> +	if (is_error_noslot_pfn(pfn))
> +		goto out;
> +
> +	mutex_lock(&kvm->arch.uvmem_lock);
> +	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
> +		uvmem_page = pfn_to_page(uvmem_pfn);
> +		pvt = uvmem_page->zone_device_data;
> +		pvt->skip_page_out = true;
> +		kvm_release_pfn_clean(pfn);

This is going to do a put_page(), unless pfn is a reserved pfn.  If it
does a put_page(), where did we do the corresponding get_page()?
However, since kvmppc_gfn_is_uvmem_pfn() returned true, doesn't that
mean that pfn here should be a device pfn, and in fact should be the
same as uvmem_pfn (possibly with some extra bit(s) set)?  What does
kvm_is_reserved_pfn() return for a device pfn?

Paul.
Bharata B Rao Nov. 6, 2019, 8:22 a.m. UTC | #2
On Wed, Nov 06, 2019 at 03:52:38PM +1100, Paul Mackerras wrote:
> On Mon, Nov 04, 2019 at 09:47:55AM +0530, Bharata B Rao wrote:
> > A secure guest will share some of its pages with hypervisor (Eg. virtio
> > bounce buffers etc). Support sharing of pages between hypervisor and
> > ultravisor.
> > 
> > Shared page is reachable via both HV and UV side page tables. Once a
> > secure page is converted to shared page, the device page that represents
> > the secure page is unmapped from the HV side page tables.
> 
> I'd like to understand a little better what's going on - see below...
> 
> > +/*
> > + * Shares the page with HV, thus making it a normal page.
> > + *
> > + * - If the page is already secure, then provision a new page and share
> > + * - If the page is a normal page, share the existing page
> > + *
> > + * In the former case, uses dev_pagemap_ops.migrate_to_ram handler
> > + * to unmap the device page from QEMU's page tables.
> > + */
> > +static unsigned long
> > +kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
> > +{
> > +
> > +	int ret = H_PARAMETER;
> > +	struct page *uvmem_page;
> > +	struct kvmppc_uvmem_page_pvt *pvt;
> > +	unsigned long pfn;
> > +	unsigned long gfn = gpa >> page_shift;
> > +	int srcu_idx;
> > +	unsigned long uvmem_pfn;
> > +
> > +	srcu_idx = srcu_read_lock(&kvm->srcu);
> > +	mutex_lock(&kvm->arch.uvmem_lock);
> > +	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
> > +		uvmem_page = pfn_to_page(uvmem_pfn);
> > +		pvt = uvmem_page->zone_device_data;
> > +		pvt->skip_page_out = true;
> > +	}
> > +
> > +retry:
> > +	mutex_unlock(&kvm->arch.uvmem_lock);
> > +	pfn = gfn_to_pfn(kvm, gfn);
> 
> At this point, pfn is the value obtained from the page table for
> userspace (e.g. QEMU), right?

Yes.

> I would think it should be equal to
> uvmem_pfn in most cases, shouldn't it?

Yes, in most cases (Common case is to share a page that is already secure)

> If not, what is it going to
> be?

It can be a regular pfn if non-secure page is being shared again.

> 
> > +	if (is_error_noslot_pfn(pfn))
> > +		goto out;
> > +
> > +	mutex_lock(&kvm->arch.uvmem_lock);
> > +	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
> > +		uvmem_page = pfn_to_page(uvmem_pfn);
> > +		pvt = uvmem_page->zone_device_data;
> > +		pvt->skip_page_out = true;
> > +		kvm_release_pfn_clean(pfn);
> 
> This is going to do a put_page(), unless pfn is a reserved pfn.  If it
> does a put_page(), where did we do the corresponding get_page()?

gfn_to_pfn() will come with a reference held.

> However, since kvmppc_gfn_is_uvmem_pfn() returned true, doesn't that
> mean that pfn here should be a device pfn, and in fact should be the
> same as uvmem_pfn (possibly with some extra bit(s) set)?

If secure page is being converted to share, pfn will be uvmem_pfn (device pfn).
If not, it will be regular pfn.

>  What does
> kvm_is_reserved_pfn() return for a device pfn?

From this code patch, we will never call kvm_release_pfn_clean() on a device
pfn. The prior call to gfn_to_pfn() would fault, result in page-out thus
converting the device pfn to regular pfn (page share request for secure page
case).

Regards,
Bharata.
Bharata B Rao Nov. 6, 2019, 8:29 a.m. UTC | #3
On Wed, Nov 06, 2019 at 01:52:39PM +0530, Bharata B Rao wrote:
> > However, since kvmppc_gfn_is_uvmem_pfn() returned true, doesn't that
> > mean that pfn here should be a device pfn, and in fact should be the
> > same as uvmem_pfn (possibly with some extra bit(s) set)?
> 
> If secure page is being converted to share, pfn will be uvmem_pfn (device pfn).

Also, kvmppc_gfn_is_uvmem_pfn() needn't always return true. It returns
true only for secure pages, while this routine handles sharing of both
secure and normal pages.

Regards,
Bharata.

Patch
diff mbox series

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index 4150732c81a0..13bd870609c3 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -342,6 +342,9 @@ 
 #define H_TLB_INVALIDATE	0xF808
 #define H_COPY_TOFROM_GUEST	0xF80C
 
+/* Flags for H_SVM_PAGE_IN */
+#define H_PAGE_IN_SHARED        0x1
+
 /* Platform-specific hcalls used by the Ultravisor */
 #define H_SVM_PAGE_IN		0xEF00
 #define H_SVM_PAGE_OUT		0xEF04
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index fe456fd07c74..d9395a23b10d 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -19,7 +19,10 @@ 
  * available in the platform for running secure guests is hotplugged.
  * Whenever a page belonging to the guest becomes secure, a page from this
  * private device memory is used to represent and track that secure page
- * on the HV side.
+ * on the HV side. Some pages (like virtio buffers, VPA pages etc) are
+ * shared between UV and HV. However such pages aren't represented by
+ * device private memory and mappings to shared memory exist in both
+ * UV and HV page tables.
  */
 
 /*
@@ -64,6 +67,9 @@ 
  * UV splits and remaps the 2MB page if necessary and copies out the
  * required 64K page contents.
  *
+ * Shared pages: Whenever guest shares a secure page, UV will split and
+ * remap the 2MB page if required and issue H_SVM_PAGE_IN with 64K page size.
+ *
  * In summary, the current secure pages handling code in HV assumes
  * 64K page size and in fact fails any page-in/page-out requests of
  * non-64K size upfront. If and when UV starts supporting multiple
@@ -93,6 +99,7 @@  struct kvmppc_uvmem_slot {
 struct kvmppc_uvmem_page_pvt {
 	struct kvm *kvm;
 	unsigned long gpa;
+	bool skip_page_out;
 };
 
 int kvmppc_uvmem_slot_init(struct kvm *kvm, const struct kvm_memory_slot *slot)
@@ -329,8 +336,64 @@  kvmppc_svm_page_in(struct vm_area_struct *vma, unsigned long start,
 	return ret;
 }
 
+/*
+ * Shares the page with HV, thus making it a normal page.
+ *
+ * - If the page is already secure, then provision a new page and share
+ * - If the page is a normal page, share the existing page
+ *
+ * In the former case, uses dev_pagemap_ops.migrate_to_ram handler
+ * to unmap the device page from QEMU's page tables.
+ */
+static unsigned long
+kvmppc_share_page(struct kvm *kvm, unsigned long gpa, unsigned long page_shift)
+{
+
+	int ret = H_PARAMETER;
+	struct page *uvmem_page;
+	struct kvmppc_uvmem_page_pvt *pvt;
+	unsigned long pfn;
+	unsigned long gfn = gpa >> page_shift;
+	int srcu_idx;
+	unsigned long uvmem_pfn;
+
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	mutex_lock(&kvm->arch.uvmem_lock);
+	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+		uvmem_page = pfn_to_page(uvmem_pfn);
+		pvt = uvmem_page->zone_device_data;
+		pvt->skip_page_out = true;
+	}
+
+retry:
+	mutex_unlock(&kvm->arch.uvmem_lock);
+	pfn = gfn_to_pfn(kvm, gfn);
+	if (is_error_noslot_pfn(pfn))
+		goto out;
+
+	mutex_lock(&kvm->arch.uvmem_lock);
+	if (kvmppc_gfn_is_uvmem_pfn(gfn, kvm, &uvmem_pfn)) {
+		uvmem_page = pfn_to_page(uvmem_pfn);
+		pvt = uvmem_page->zone_device_data;
+		pvt->skip_page_out = true;
+		kvm_release_pfn_clean(pfn);
+		goto retry;
+	}
+
+	if (!uv_page_in(kvm->arch.lpid, pfn << page_shift, gpa, 0, page_shift))
+		ret = H_SUCCESS;
+	kvm_release_pfn_clean(pfn);
+	mutex_unlock(&kvm->arch.uvmem_lock);
+out:
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	return ret;
+}
+
 /*
  * H_SVM_PAGE_IN: Move page from normal memory to secure memory.
+ *
+ * H_PAGE_IN_SHARED flag makes the page shared which means that the same
+ * memory in is visible from both UV and HV.
  */
 unsigned long
 kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
@@ -345,9 +408,12 @@  kvmppc_h_svm_page_in(struct kvm *kvm, unsigned long gpa,
 	if (page_shift != PAGE_SHIFT)
 		return H_P3;
 
-	if (flags)
+	if (flags & ~H_PAGE_IN_SHARED)
 		return H_P2;
 
+	if (flags & H_PAGE_IN_SHARED)
+		return kvmppc_share_page(kvm, gpa, page_shift);
+
 	ret = H_PARAMETER;
 	srcu_idx = srcu_read_lock(&kvm->srcu);
 	down_read(&kvm->mm->mmap_sem);
@@ -388,6 +454,7 @@  kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
 	unsigned long src_pfn, dst_pfn = 0;
 	struct migrate_vma mig;
 	struct page *dpage, *spage;
+	struct kvmppc_uvmem_page_pvt *pvt;
 	unsigned long pfn;
 	int ret = U_SUCCESS;
 
@@ -421,10 +488,20 @@  kvmppc_svm_page_out(struct vm_area_struct *vma, unsigned long start,
 	}
 
 	lock_page(dpage);
+	pvt = spage->zone_device_data;
 	pfn = page_to_pfn(dpage);
 
-	ret = uv_page_out(kvm->arch.lpid, pfn << page_shift,
-			  gpa, 0, page_shift);
+	/*
+	 * This function is used in two cases:
+	 * - When HV touches a secure page, for which we do UV_PAGE_OUT
+	 * - When a secure page is converted to shared page, we *get*
+	 *   the page to essentially unmap the device page. In this
+	 *   case we skip page-out.
+	 */
+	if (!pvt->skip_page_out)
+		ret = uv_page_out(kvm->arch.lpid, pfn << page_shift,
+				  gpa, 0, page_shift);
+
 	if (ret == U_SUCCESS)
 		*mig.dst = migrate_pfn(pfn) | MIGRATE_PFN_LOCKED;
 	else {