diff mbox

[Trusty] UBUNTU: SAUCE: Fix kvm hangs on i386

Message ID 1396429188-6447-1-git-send-email-stefan.bader@canonical.com
State New
Headers show

Commit Message

Stefan Bader April 2, 2014, 8:59 a.m. UTC
Unfortunately completely failed to proceed with this one (me and
upstream). This hack is curing at least the big issue with kvm.
As noted in the patch, there is a series of fixes after 3.13 which
I had been picking up while looking into this. These would be stable
material (minus maybe the idle changes though they might be useful for
other reasons and I did not want to make guesses about the things that
patch #3 changes there).

1. x86, acpi, idle: Restructure the mwait idle routines
2. x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
3. sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
4. sched/preempt/x86: Fix voluntary preempt for x86

This just for documentation and in case anything else looks odd in
the scheduler and this might be a reason. Its now too late to pick those
and the hack works without.

-Stefan

---

From 951bca51ab8456d2a2cbca33f440680ae675b5e4 Mon Sep 17 00:00:00 2001
From: Stefan Bader <stefan.bader@canonical.com>
Date: Tue, 1 Apr 2014 20:26:00 +0200
Subject: [PATCH] UBUNTU: SAUCE: kvm: Use schedule instead of cond_reschedule
 on i386

Since commit c2daa3bed53a81171cf8c1a36db798e82b91afe8
  Author: Peter Zijlstra <peterz@infradead.org>
  sched, x86: Provide a per-cpu preempt_count implementation

there is a problem on 32bit x86 which causes kvm to run wildly in a
tight loop. The cause is that since the preempt_count change it seems
possible on 32bit to have the thread info flag TIF_NEED_RESCHED set
but the IPI that in theory should update that flag into the per-cpu
counter does not see it. This leaves us in a bad place as this
counts as the process should reschule but is not allowed to. And
kvm unfortunately relies on those things to be in sync.

There is actually a list of fixes to this in general which was not yet
pushed to stable (since we never found a resolution to this issue). So
i386 may behave unfair in general. This here just allows to run kvm on
i386 at all.

Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
 arch/x86/kvm/x86.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Andy Whitcroft April 2, 2014, 9:31 a.m. UTC | #1
On Wed, Apr 02, 2014 at 10:59:48AM +0200, Stefan Bader wrote:
> Unfortunately completely failed to proceed with this one (me and
> upstream). This hack is curing at least the big issue with kvm.
> As noted in the patch, there is a series of fixes after 3.13 which
> I had been picking up while looking into this. These would be stable
> material (minus maybe the idle changes though they might be useful for
> other reasons and I did not want to make guesses about the things that
> patch #3 changes there).
> 
> 1. x86, acpi, idle: Restructure the mwait idle routines
> 2. x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
> 3. sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
> 4. sched/preempt/x86: Fix voluntary preempt for x86
> 
> This just for documentation and in case anything else looks odd in
> the scheduler and this might be a reason. Its now too late to pick those
> and the hack works without.
> 
> -Stefan
> 
> ---
> 
> From 951bca51ab8456d2a2cbca33f440680ae675b5e4 Mon Sep 17 00:00:00 2001
> From: Stefan Bader <stefan.bader@canonical.com>
> Date: Tue, 1 Apr 2014 20:26:00 +0200
> Subject: [PATCH] UBUNTU: SAUCE: kvm: Use schedule instead of cond_reschedule
>  on i386
> 
> Since commit c2daa3bed53a81171cf8c1a36db798e82b91afe8
>   Author: Peter Zijlstra <peterz@infradead.org>
>   sched, x86: Provide a per-cpu preempt_count implementation
> 
> there is a problem on 32bit x86 which causes kvm to run wildly in a
> tight loop. The cause is that since the preempt_count change it seems
> possible on 32bit to have the thread info flag TIF_NEED_RESCHED set
> but the IPI that in theory should update that flag into the per-cpu
> counter does not see it. This leaves us in a bad place as this
> counts as the process should reschule but is not allowed to. And
> kvm unfortunately relies on those things to be in sync.
> 
> There is actually a list of fixes to this in general which was not yet
> pushed to stable (since we never found a resolution to this issue). So
> i386 may behave unfair in general. This here just allows to run kvm on
> i386 at all.
> 
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> ---
>  arch/x86/kvm/x86.c |   14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6dc4904..512b1ce 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6092,7 +6092,21 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>  		}
>  		if (need_resched()) {
>  			srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
> +#ifdef CONFIG_X86_32
> +			/*
> +			 * This is a hack around a current bug on i386
> +			 * that causes TIF_NEED_RESCHED not being propagated
> +			 * into the per-cpu counter. This causes cond_resched()
> +			 * which is what kvm_resched() calls to exit without
> +			 * actually rescheduling. We continue immediately with
> +			 * the loop but vcpu_enter_guest checks for the process
> +			 * flag and immediately exits and this loops busily
> +			 * runs forever.
> +			 */
> +			schedule();
> +#else
>  			kvm_resched(vcpu);
> +#endif
>  			vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>  		}
>  	}

I am slightly worried that switching to schedule() here is triggering
more work that kvm_resched() would have done.  As the stated bug is that
TIF_NEED_RESCHED does not get copied over correctly "sometimes".  Could
we not make this patch add like:

	if (test_tsk_need_resched(current))
		set_preempt_need_resched();

Just before kvm_resched() in the 32bit case?

-apw
Stefan Bader April 2, 2014, 9:41 a.m. UTC | #2
On 02.04.2014 11:31, Andy Whitcroft wrote:
> On Wed, Apr 02, 2014 at 10:59:48AM +0200, Stefan Bader wrote:
>> Unfortunately completely failed to proceed with this one (me and
>> upstream). This hack is curing at least the big issue with kvm.
>> As noted in the patch, there is a series of fixes after 3.13 which
>> I had been picking up while looking into this. These would be stable
>> material (minus maybe the idle changes though they might be useful for
>> other reasons and I did not want to make guesses about the things that
>> patch #3 changes there).
>>
>> 1. x86, acpi, idle: Restructure the mwait idle routines
>> 2. x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
>> 3. sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
>> 4. sched/preempt/x86: Fix voluntary preempt for x86
>>
>> This just for documentation and in case anything else looks odd in
>> the scheduler and this might be a reason. Its now too late to pick those
>> and the hack works without.
>>
>> -Stefan
>>
>> ---
>>
>> From 951bca51ab8456d2a2cbca33f440680ae675b5e4 Mon Sep 17 00:00:00 2001
>> From: Stefan Bader <stefan.bader@canonical.com>
>> Date: Tue, 1 Apr 2014 20:26:00 +0200
>> Subject: [PATCH] UBUNTU: SAUCE: kvm: Use schedule instead of cond_reschedule
>>  on i386
>>
>> Since commit c2daa3bed53a81171cf8c1a36db798e82b91afe8
>>   Author: Peter Zijlstra <peterz@infradead.org>
>>   sched, x86: Provide a per-cpu preempt_count implementation
>>
>> there is a problem on 32bit x86 which causes kvm to run wildly in a
>> tight loop. The cause is that since the preempt_count change it seems
>> possible on 32bit to have the thread info flag TIF_NEED_RESCHED set
>> but the IPI that in theory should update that flag into the per-cpu
>> counter does not see it. This leaves us in a bad place as this
>> counts as the process should reschule but is not allowed to. And
>> kvm unfortunately relies on those things to be in sync.
>>
>> There is actually a list of fixes to this in general which was not yet
>> pushed to stable (since we never found a resolution to this issue). So
>> i386 may behave unfair in general. This here just allows to run kvm on
>> i386 at all.
>>
>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>> ---
>>  arch/x86/kvm/x86.c |   14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 6dc4904..512b1ce 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -6092,7 +6092,21 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
>>  		}
>>  		if (need_resched()) {
>>  			srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
>> +#ifdef CONFIG_X86_32
>> +			/*
>> +			 * This is a hack around a current bug on i386
>> +			 * that causes TIF_NEED_RESCHED not being propagated
>> +			 * into the per-cpu counter. This causes cond_resched()
>> +			 * which is what kvm_resched() calls to exit without
>> +			 * actually rescheduling. We continue immediately with
>> +			 * the loop but vcpu_enter_guest checks for the process
>> +			 * flag and immediately exits and this loops busily
>> +			 * runs forever.
>> +			 */
>> +			schedule();
>> +#else
>>  			kvm_resched(vcpu);
>> +#endif
>>  			vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
>>  		}
>>  	}
> 
> I am slightly worried that switching to schedule() here is triggering
> more work that kvm_resched() would have done.  As the stated bug is that
> TIF_NEED_RESCHED does not get copied over correctly "sometimes".  Could
> we not make this patch add like:
> 
> 	if (test_tsk_need_resched(current))
> 		set_preempt_need_resched();
> 
> Just before kvm_resched() in the 32bit case?
> 
> -apw
> 

I am working on something equivalent. Will update here when I got it compiled
and tested.
diff mbox

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6dc4904..512b1ce 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6092,7 +6092,21 @@  static int __vcpu_run(struct kvm_vcpu *vcpu)
 		}
 		if (need_resched()) {
 			srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
+#ifdef CONFIG_X86_32
+			/*
+			 * This is a hack around a current bug on i386
+			 * that causes TIF_NEED_RESCHED not being propagated
+			 * into the per-cpu counter. This causes cond_resched()
+			 * which is what kvm_resched() calls to exit without
+			 * actually rescheduling. We continue immediately with
+			 * the loop but vcpu_enter_guest checks for the process
+			 * flag and immediately exits and this loops busily
+			 * runs forever.
+			 */
+			schedule();
+#else
 			kvm_resched(vcpu);
+#endif
 			vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
 		}
 	}