Patchwork [v2,1/2] powerpc/kvm: fix rare but potential deadlock scene

Submitter Liu Ping Fan
Date Nov. 7, 2013, 6:22 a.m.
Message ID <1383805375-14766-1-git-send-email-pingfank@linux.vnet.ibm.com>
Permalink /patch/289198/
State Not Applicable
Headers show

Comments

Liu Ping Fan - Nov. 7, 2013, 6:22 a.m.
Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
realmode, it can trigger a deadlock.

Suppose the following scenario:

Two physical cpus, cpuM and cpuN, and two VM instances, A and B; each VM has a group of vcpus.

If vcpu_A_1 on cpuM holds bitlock X (HPTE_V_HVLOCK) and is then switched out,
while vcpu_A_2 on cpuN tries to take X in realmode, then cpuN will be stuck in
realmode for a long time.

Things get even worse if the following happens:
  On cpuM, bitlock X is held; on cpuN, Y is held.
  vcpu_B_2 tries to take Y on cpuM in realmode.
  vcpu_A_2 tries to take X on cpuN in realmode.

Oops! Deadlock.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
Alexander Graf - Nov. 7, 2013, 9:54 a.m.
On 07.11.2013, at 07:22, Liu Ping Fan <kernelfans@gmail.com> wrote:

> Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
> realmode, it can trigger a deadlock.
> 
> Suppose the following scenario:
> 
> Two physical cpus, cpuM and cpuN, and two VM instances, A and B; each VM has a group of vcpus.
> 
> If vcpu_A_1 on cpuM holds bitlock X (HPTE_V_HVLOCK) and is then switched out,
> while vcpu_A_2 on cpuN tries to take X in realmode, then cpuN will be stuck in
> realmode for a long time.
> 
> Things get even worse if the following happens:
>  On cpuM, bitlock X is held; on cpuN, Y is held.
>  vcpu_B_2 tries to take Y on cpuM in realmode.
>  vcpu_A_2 tries to take X on cpuN in realmode.
> 
> Oops! Deadlock.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>

Very nice catch :).

I think it makes a lot of sense to document, in a comment above the function, the fact that kvmppc_hv_find_lock_hpte() should be called with preemption disabled, so that the next time someone calls it, they know they need to put preempt_disable() around it.
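
Such a comment might read along the following lines; the wording is a sketch, not text from the patch:

```c
/*
 * Must be called with preemption disabled: this returns with an HPTE
 * locked via HPTE_V_HVLOCK, and that bitlock is also spun on from real
 * mode on other CPUs.  If the holder is preempted while the lock is
 * held, a real-mode waiter can spin indefinitely, and two such waiters
 * holding opposite locks will deadlock.
 */
```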

Thanks a lot for finding this pretty subtle issue. May I ask how you got there? Did you actually see systems deadlock because of this?


Alex

> ---
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 043eec8..dbc1478 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -474,10 +474,13 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
> 	}
> 
> 	/* Find the HPTE in the hash table */
> +	preempt_disable();
> 	index = kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
> 					 HPTE_V_VALID | HPTE_V_ABSENT);
> -	if (index < 0)
> +	if (index < 0) {
> +		preempt_enable();
> 		return -ENOENT;
> +	}
> 	hptep = (unsigned long *)(kvm->arch.hpt_virt + (index << 4));
> 	v = hptep[0] & ~HPTE_V_HVLOCK;
> 	gr = kvm->arch.revmap[index].guest_rpte;
> @@ -485,6 +488,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
> 	/* Unlock the HPTE */
> 	asm volatile("lwsync" : : : "memory");
> 	hptep[0] = v;
> +	preempt_enable();
> 
> 	gpte->eaddr = eaddr;
> 	gpte->vpage = ((v & HPTE_V_AVPN) << 4) | ((eaddr >> 12) & 0xfff);
> -- 
> 1.8.1.4
>
Liu Ping Fan - Nov. 8, 2013, 2:07 a.m.
On Thu, Nov 7, 2013 at 5:54 PM, Alexander Graf <agraf@suse.de> wrote:
>
> On 07.11.2013, at 07:22, Liu Ping Fan <kernelfans@gmail.com> wrote:
>
>> Since kvmppc_hv_find_lock_hpte() is called from both virtmode and
>> realmode, it can trigger a deadlock.
>>
>> Suppose the following scenario:
>>
>> Two physical cpus, cpuM and cpuN, and two VM instances, A and B; each VM has a group of vcpus.
>>
>> If vcpu_A_1 on cpuM holds bitlock X (HPTE_V_HVLOCK) and is then switched out,
>> while vcpu_A_2 on cpuN tries to take X in realmode, then cpuN will be stuck in
>> realmode for a long time.
>>
>> Things get even worse if the following happens:
>>  On cpuM, bitlock X is held; on cpuN, Y is held.
>>  vcpu_B_2 tries to take Y on cpuM in realmode.
>>  vcpu_A_2 tries to take X on cpuN in realmode.
>>
>> Oops! Deadlock.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>
> Very nice catch :).
>
> I think it makes a lot of sense to document, in a comment above the function, the fact that kvmppc_hv_find_lock_hpte() should be called with preemption disabled, so that the next time someone calls it, they know they need to put preempt_disable() around it.
>
Ok, I will document that in v3.

> Thanks a lot for finding this pretty subtle issue. May I ask how you got there? Did you actually see systems deadlock because of this?
>
Intuition :). Then I began trying to model a scenario that would cause the
deadlock, and fortunately I found a case that verified my suspicion.

Regards,
Pingfan
>
> Alex
>
>> ---
>> arch/powerpc/kvm/book3s_64_mmu_hv.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> index 043eec8..dbc1478 100644
>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>> @@ -474,10 +474,13 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
>>       }
>>
>>       /* Find the HPTE in the hash table */
>> +     preempt_disable();
>>       index = kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
>>                                        HPTE_V_VALID | HPTE_V_ABSENT);
>> -     if (index < 0)
>> +     if (index < 0) {
>> +             preempt_enable();
>>               return -ENOENT;
>> +     }
>>       hptep = (unsigned long *)(kvm->arch.hpt_virt + (index << 4));
>>       v = hptep[0] & ~HPTE_V_HVLOCK;
>>       gr = kvm->arch.revmap[index].guest_rpte;
>> @@ -485,6 +488,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
>>       /* Unlock the HPTE */
>>       asm volatile("lwsync" : : : "memory");
>>       hptep[0] = v;
>> +     preempt_enable();
>>
>>       gpte->eaddr = eaddr;
>>       gpte->vpage = ((v & HPTE_V_AVPN) << 4) | ((eaddr >> 12) & 0xfff);
>> --
>> 1.8.1.4
>>
>

Patch

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 043eec8..dbc1478 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -474,10 +474,13 @@  static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	}
 
 	/* Find the HPTE in the hash table */
+	preempt_disable();
 	index = kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v,
 					 HPTE_V_VALID | HPTE_V_ABSENT);
-	if (index < 0)
+	if (index < 0) {
+		preempt_enable();
 		return -ENOENT;
+	}
 	hptep = (unsigned long *)(kvm->arch.hpt_virt + (index << 4));
 	v = hptep[0] & ~HPTE_V_HVLOCK;
 	gr = kvm->arch.revmap[index].guest_rpte;
@@ -485,6 +488,7 @@  static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	/* Unlock the HPTE */
 	asm volatile("lwsync" : : : "memory");
 	hptep[0] = v;
+	preempt_enable();
 
 	gpte->eaddr = eaddr;
 	gpte->vpage = ((v & HPTE_V_AVPN) << 4) | ((eaddr >> 12) & 0xfff);