[Maverick] SRU: Reboot of linux-virtual hangs on EC2

Submitter: Stefan Bader
Date: March 2, 2011, 2:57 p.m.
Message ID: <1299077834-27946-1-git-send-email-stefan.bader@canonical.com>
Permalink: /patch/85090/
State: Accepted
Delegated to: Tim Gardner

Comments

Stefan Bader - March 2, 2011, 2:57 p.m.
SRU Justification:

Impact: On reboot or shutdown, the current Xen code tries to stop the other
CPUs. However, IPI (inter-processor interrupt) communication has already
been disabled at that point, so issuing a reboot or shutdown from within an
instance with multiple VCPUs hangs.

Fix: Cherry-pick of an upstream patch (added around 2.6.37) that removes the
attempt to stop the other CPUs.

Testcase: From within a multi-VCPU EC2 instance, run "sudo reboot". Without
the patch, the instance never comes back up; with the patch applied, it does.

-Stefan

---

From bf8e8b6da4821b59ea162eec16e05f3a6d8a0f9a Mon Sep 17 00:00:00 2001
From: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Mon, 29 Nov 2010 14:16:53 -0800
Subject: [PATCH] xen: don't bother to stop other cpus on shutdown/reboot

Xen will shoot all the VCPUs when we do a shutdown hypercall, so there's
no need to do it manually.

In any case it will fail because all the IPI irqs have been pulled
down by this point, so the cross-CPU calls will simply hang forever.

Until change 76fac077db6b34e2c6383a7b4f3f4f7b7d06d8ce the function calls
were not synchronously waited for, so this wasn't apparent.  However after
that change the calls became synchronous leading to a hang on shutdown
on multi-VCPU guests.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Stable Kernel <stable@kernel.org>
Cc: Alok Kataria <akataria@vmware.com>

BugLink: http://bugs.launchpad.net/bugs/727814

(cherry-picked from commit 31e323cca9d5c8afd372976c35a5d46192f540d1 upstream)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
 arch/x86/xen/enlighten.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)
Stefan Bader - March 2, 2011, 3:13 p.m.
On 03/02/2011 03:57 PM, Stefan Bader wrote:

Forgot to mention: this fix is also in the 2.6.35-longterm tree.
Tim Gardner - March 2, 2011, 3:59 p.m.
On 03/02/2011 08:13 AM, Stefan Bader wrote:

Acked-by: Tim Gardner <tim.gardner@canonical.com>
Brad Figg - March 2, 2011, 6:33 p.m.
On 03/02/2011 06:57 AM, Stefan Bader wrote:

Acked-by: Brad Figg <brad.figg@canonical.com>
Tim Gardner - March 2, 2011, 9:09 p.m.
On 03/02/2011 07:57 AM, Stefan Bader wrote:

applied

Patch

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 6f6cd65..42c2d5f 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1022,10 +1022,6 @@ static void xen_reboot(int reason)
 {
 	struct sched_shutdown r = { .reason = reason };
 
-#ifdef CONFIG_SMP
-	stop_other_cpus();
-#endif
-
 	if (HYPERVISOR_sched_op(SCHEDOP_shutdown, &r))
 		BUG();
 }