
[4/5] KVM: PPC: Book3S HV: Take the SRCU read lock before looking up memslots

Message ID 20120806100655.GE8980@bloggs.ozlabs.ibm.com

Commit Message

Paul Mackerras Aug. 6, 2012, 10:06 a.m. UTC
The generic KVM code uses SRCU (sleeping RCU) to protect accesses
to the memslots data structures against updates due to userspace
adding, modifying or removing memory slots.  We need to do that too,
both to avoid accessing stale copies of the memslots and to avoid
lockdep warnings.  This therefore adds srcu_read_lock/unlock pairs
around code that accesses and uses memslots.
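
The read-side pattern is short; a minimal sketch (the real call
sites are in the diff below):

	int srcu_idx;
	struct kvm_memory_slot *memslot;

	srcu_idx = srcu_read_lock(&kvm->srcu);
	memslot = gfn_to_memslot(kvm, gfn);
	/* ... check and use the memslot under the read lock ... */
	srcu_read_unlock(&kvm->srcu, srcu_idx);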

Since the real-mode handlers for H_ENTER, H_REMOVE and H_BULK_REMOVE
need to access the memslots, and we don't want to call the SRCU code
in real mode (since we have no assurance that it would only access
the linear mapping), we hold the SRCU read lock for the VM while
in the guest.  This does mean that adding or removing memory slots
while some vcpus are executing in the guest will block for up to
two jiffies.  This tradeoff is acceptable since adding/removing
memory slots only happens rarely, while H_ENTER/H_REMOVE/H_BULK_REMOVE
are performance-critical hot paths.
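
Concretely, kvmppc_run_core() now brackets the guest entry with the
read lock, as in the book3s_hv.c hunk below:

	srcu_idx = srcu_read_lock(&vcpu0->kvm->srcu);
	__kvmppc_vcore_entry(NULL, vcpu0);
	/* ... wait for all threads in the core to exit the guest ... */
	srcu_read_unlock(&vcpu0->kvm->srcu, srcu_idx);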

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c |   17 +++++++++++++----
 arch/powerpc/kvm/book3s_hv.c        |   27 +++++++++++++++++++++++----
 2 files changed, 36 insertions(+), 8 deletions(-)

Comments

Marcelo Tosatti Aug. 9, 2012, 6:22 p.m. UTC | #1
On Mon, Aug 06, 2012 at 08:06:55PM +1000, Paul Mackerras wrote:
> The generic KVM code uses SRCU (sleeping RCU) to protect accesses
> to the memslots data structures against updates due to userspace
> adding, modifying or removing memory slots.  We need to do that too,
> both to avoid accessing stale copies of the memslots and to avoid
> lockdep warnings.  This therefore adds srcu_read_lock/unlock pairs
> around code that accesses and uses memslots.
> 
> Since the real-mode handlers for H_ENTER, H_REMOVE and H_BULK_REMOVE
> need to access the memslots, and we don't want to call the SRCU code
> in real mode (since we have no assurance that it would only access
> the linear mapping), we hold the SRCU read lock for the VM while
> in the guest.  This does mean that adding or removing memory slots
> while some vcpus are executing in the guest will block for up to
> two jiffies.  This tradeoff is acceptable since adding/removing
> memory slots only happens rarely, while H_ENTER/H_REMOVE/H_BULK_REMOVE
> are performance-critical hot paths.

I would avoid doing this. What prevents the guest entry loop in the kernel
from simply reentering without ever unlocking SRCU?

Paul Mackerras Aug. 10, 2012, 12:45 a.m. UTC | #2
On Thu, Aug 09, 2012 at 03:22:38PM -0300, Marcelo Tosatti wrote:
> On Mon, Aug 06, 2012 at 08:06:55PM +1000, Paul Mackerras wrote:
> > The generic KVM code uses SRCU (sleeping RCU) to protect accesses
> > to the memslots data structures against updates due to userspace
> > adding, modifying or removing memory slots.  We need to do that too,
> > both to avoid accessing stale copies of the memslots and to avoid
> > lockdep warnings.  This therefore adds srcu_read_lock/unlock pairs
> > around code that accesses and uses memslots.
> > 
> > Since the real-mode handlers for H_ENTER, H_REMOVE and H_BULK_REMOVE
> > need to access the memslots, and we don't want to call the SRCU code
> > in real mode (since we have no assurance that it would only access
> > the linear mapping), we hold the SRCU read lock for the VM while
> > in the guest.  This does mean that adding or removing memory slots
> > while some vcpus are executing in the guest will block for up to
> > two jiffies.  This tradeoff is acceptable since adding/removing
> > memory slots only happens rarely, while H_ENTER/H_REMOVE/H_BULK_REMOVE
> > are performance-critical hot paths.
> 
> I would avoid doing this. What prevents the guest entry loop in the kernel
> from simply reentering without ever unlocking SRCU?

I take and release the SRCU lock inside the guest entry loop, so in
fact we will release the SRCU lock every time we pop out of the guest.
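
Schematically, each pass around the loop does this (a simplified
sketch, not the literal kvmppc_run_core() code):

	int srcu_idx;
	bool run_again = true;

	while (run_again) {
		srcu_idx = srcu_read_lock(&vcpu0->kvm->srcu);
		__kvmppc_vcore_entry(NULL, vcpu0);	/* run the guest */
		srcu_read_unlock(&vcpu0->kvm->srcu, srcu_idx);
		/*
		 * Exit handling happens here, outside the SRCU read
		 * section, so a pending memslot update can make
		 * progress before the next guest entry.
		 */
		run_again = false;	/* or loop, per the exit handler */
	}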

As for holding the SRCU read lock while we're in the guest - that is
what enables us to do H_ENTER etc. (the hypercalls that the guest uses
to update the hardware page table) in real mode, i.e. without
switching the MMU over to the kernel context.  In real mode we can
access the linear mapping but not vmalloc, ioremap or user space, so I
am nervous about doing srcu_read_lock/unlock in real mode.  I really
don't want to switch to kernel mode for those hypercalls because that
switch is relatively slow and requires pulling all of the SMT threads
in the core out of the guest (because of hardware restrictions).

So, holding the SRCU read lock while in the guest is the least ugly
approach AFAICS.  Yes, it makes memslot addition/removal a bit slower,
but only for this VM, and it doesn't delay grace periods for other
forms of RCU, so its impact is pretty limited.

Paul.

Patch

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 87735a7..6e17e97 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -24,6 +24,7 @@ 
 #include <linux/slab.h>
 #include <linux/hugetlb.h>
 #include <linux/vmalloc.h>
+#include <linux/srcu.h>
 
 #include <asm/tlbflush.h>
 #include <asm/kvm_ppc.h>
@@ -1080,20 +1081,22 @@  void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long gpa,
 	unsigned long hva, psize, offset;
 	unsigned long pa;
 	unsigned long *physp;
+	int srcu_idx;
 
+	srcu_idx = srcu_read_lock(&kvm->srcu);
 	memslot = gfn_to_memslot(kvm, gfn);
 	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
-		return NULL;
+		goto err;
 	if (!kvm->arch.using_mmu_notifiers) {
 		physp = kvm->arch.slot_phys[memslot->id];
 		if (!physp)
-			return NULL;
+			goto err;
 		physp += gfn - memslot->base_gfn;
 		pa = *physp;
 		if (!pa) {
 			if (kvmppc_get_guest_page(kvm, gfn, memslot,
 						  PAGE_SIZE) < 0)
-				return NULL;
+				goto err;
 			pa = *physp;
 		}
 		page = pfn_to_page(pa >> PAGE_SHIFT);
@@ -1102,9 +1105,11 @@  void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long gpa,
 		hva = gfn_to_hva_memslot(memslot, gfn);
 		npages = get_user_pages_fast(hva, 1, 1, pages);
 		if (npages < 1)
-			return NULL;
+			goto err;
 		page = pages[0];
 	}
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+
 	psize = PAGE_SIZE;
 	if (PageHuge(page)) {
 		page = compound_head(page);
@@ -1114,6 +1119,10 @@  void *kvmppc_pin_guest_page(struct kvm *kvm, unsigned long gpa,
 	if (nb_ret)
 		*nb_ret = psize - offset;
 	return page_address(page) + offset;
+
+ err:
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	return NULL;
 }
 
 void kvmppc_unpin_guest_page(struct kvm *kvm, void *va)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index aad20ca0..067d0b6 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -30,6 +30,7 @@ 
 #include <linux/cpumask.h>
 #include <linux/spinlock.h>
 #include <linux/page-flags.h>
+#include <linux/srcu.h>
 
 #include <asm/reg.h>
 #include <asm/cputable.h>
@@ -366,13 +367,16 @@  int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 	unsigned long req = kvmppc_get_gpr(vcpu, 3);
 	unsigned long target, ret = H_SUCCESS;
 	struct kvm_vcpu *tvcpu;
+	int idx;
 
 	switch (req) {
 	case H_ENTER:
+		idx = srcu_read_lock(&vcpu->kvm->srcu);
 		ret = kvmppc_virtmode_h_enter(vcpu, kvmppc_get_gpr(vcpu, 4),
 					      kvmppc_get_gpr(vcpu, 5),
 					      kvmppc_get_gpr(vcpu, 6),
 					      kvmppc_get_gpr(vcpu, 7));
+		srcu_read_unlock(&vcpu->kvm->srcu, idx);
 		break;
 	case H_CEDE:
 		break;
@@ -411,6 +415,7 @@  static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			      struct task_struct *tsk)
 {
 	int r = RESUME_HOST;
+	int srcu_idx;
 
 	vcpu->stat.sum_exits++;
 
@@ -470,12 +475,16 @@  static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	 * have been handled already.
 	 */
 	case BOOK3S_INTERRUPT_H_DATA_STORAGE:
+		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		r = kvmppc_book3s_hv_page_fault(run, vcpu,
 				vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
+		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
 		break;
 	case BOOK3S_INTERRUPT_H_INST_STORAGE:
+		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		r = kvmppc_book3s_hv_page_fault(run, vcpu,
 				kvmppc_get_pc(vcpu), 0);
+		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
 		break;
 	/*
 	 * This occurs if the guest executes an illegal instruction.
@@ -820,6 +829,7 @@  static int kvmppc_run_core(struct kvmppc_vcore *vc)
 	long ret;
 	u64 now;
 	int ptid, i, need_vpa_update;
+	int srcu_idx;
 
 	/* don't start if any threads have a signal pending */
 	need_vpa_update = 0;
@@ -898,6 +908,9 @@  static int kvmppc_run_core(struct kvmppc_vcore *vc)
 	spin_unlock(&vc->lock);
 
 	kvm_guest_enter();
+
+	srcu_idx = srcu_read_lock(&vcpu0->kvm->srcu);
+
 	__kvmppc_vcore_entry(NULL, vcpu0);
 	for (i = 0; i < threads_per_core; ++i)
 		kvmppc_release_hwthread(vc->pcpu + i);
@@ -913,6 +926,8 @@  static int kvmppc_run_core(struct kvmppc_vcore *vc)
 	vc->vcore_state = VCORE_EXITING;
 	spin_unlock(&vc->lock);
 
+	srcu_read_unlock(&vcpu0->kvm->srcu, srcu_idx);
+
 	/* make sure updates to secondary vcpu structs are visible now */
 	smp_mb();
 	kvm_guest_exit();
@@ -1383,6 +1398,7 @@  static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 	unsigned long rmls;
 	unsigned long *physp;
 	unsigned long i, npages;
+	int srcu_idx;
 
 	mutex_lock(&kvm->lock);
 	if (kvm->arch.rma_setup_done)
@@ -1398,12 +1414,13 @@  static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 	}
 
 	/* Look up the memslot for guest physical address 0 */
+	srcu_idx = srcu_read_lock(&kvm->srcu);
 	memslot = gfn_to_memslot(kvm, 0);
 
 	/* We must have some memory at 0 by now */
 	err = -EINVAL;
 	if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
-		goto out;
+		goto out_srcu;
 
 	/* Look up the VMA for the start of this memory slot */
 	hva = memslot->userspace_addr;
@@ -1427,14 +1444,14 @@  static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 		err = -EPERM;
 		if (cpu_has_feature(CPU_FTR_ARCH_201)) {
 			pr_err("KVM: CPU requires an RMO\n");
-			goto out;
+			goto out_srcu;
 		}
 
 		/* We can handle 4k, 64k or 16M pages in the VRMA */
 		err = -EINVAL;
 		if (!(psize == 0x1000 || psize == 0x10000 ||
 		      psize == 0x1000000))
-			goto out;
+			goto out_srcu;
 
 		/* Update VRMASD field in the LPCR */
 		senc = slb_pgsize_encoding(psize);
@@ -1457,7 +1474,7 @@  static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 		err = -EINVAL;
 		if (rmls < 0) {
 			pr_err("KVM: Can't use RMA of 0x%lx bytes\n", rma_size);
-			goto out;
+			goto out_srcu;
 		}
 		atomic_inc(&ri->use_count);
 		kvm->arch.rma = ri;
@@ -1497,6 +1514,8 @@  static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 	smp_wmb();
 	kvm->arch.rma_setup_done = 1;
 	err = 0;
+ out_srcu:
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
  out:
 	mutex_unlock(&kvm->lock);
 	return err;