Patchwork PPC: Don't sync timebase when inside VM

login
register
mail settings
Submitter Alexander Graf
Date March 2, 2012, 2:12 p.m.
Message ID <1330697553-27156-1-git-send-email-agraf@suse.de>
Download mbox | patch
Permalink /patch/144267/
State New
Headers show

Comments

Alexander Graf - March 2, 2012, 2:12 p.m.
When running inside a virtual machine, we can not modify timebase, so
let's just not call the functions for it then.

This resolves hangs when booting e500 SMP guests on overcommitted hosts.

Reported-by: Stuart Yoder <B08248@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
---
 arch/powerpc/platforms/85xx/smp.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)
Scott Wood - March 2, 2012, 4:20 p.m.
On Fri, Mar 02, 2012 at 03:12:33PM +0100, Alexander Graf wrote:
> When running inside a virtual machine, we can not modify timebase, so
> let's just not call the functions for it then.
> 
> This resolves hangs when booting e500 SMP guests on overcommitted hosts.
> 
> Reported-by: Stuart Yoder <B08248@freescale.com>
> Signed-off-by: Alexander Graf <agraf@suse.de>
> ---
>  arch/powerpc/platforms/85xx/smp.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
> index ff42490..d4b6c1f 100644
> --- a/arch/powerpc/platforms/85xx/smp.c
> +++ b/arch/powerpc/platforms/85xx/smp.c
> @@ -249,6 +249,13 @@ void __init mpc85xx_smp_init(void)
>  		smp_85xx_ops.cause_ipi = doorbell_cause_ipi;
>  	}
>  
> +	/* When running under a hypervisor, we can not modify tb */
> +	np = of_find_node_by_path("/hypervisor");
> +	if (np) {
> +		smp_85xx_ops.give_timebase = NULL;
> +		smp_85xx_ops.take_timebase = NULL;
> +	}
> +
>  	smp_ops = &smp_85xx_ops;

Again, for 85xx we should *never* sync the timebase in the kernel,
hypervisor or no.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Graf - March 2, 2012, 4:30 p.m.
On 02.03.2012, at 17:20, Scott Wood wrote:

> On Fri, Mar 02, 2012 at 03:12:33PM +0100, Alexander Graf wrote:
>> When running inside a virtual machine, we can not modify timebase, so
>> let's just not call the functions for it then.
>> 
>> This resolves hangs when booting e500 SMP guests on overcommitted hosts.
>> 
>> Reported-by: Stuart Yoder <B08248@freescale.com>
>> Signed-off-by: Alexander Graf <agraf@suse.de>
>> ---
>> arch/powerpc/platforms/85xx/smp.c |    7 +++++++
>> 1 files changed, 7 insertions(+), 0 deletions(-)
>> 
>> diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
>> index ff42490..d4b6c1f 100644
>> --- a/arch/powerpc/platforms/85xx/smp.c
>> +++ b/arch/powerpc/platforms/85xx/smp.c
>> @@ -249,6 +249,13 @@ void __init mpc85xx_smp_init(void)
>> 		smp_85xx_ops.cause_ipi = doorbell_cause_ipi;
>> 	}
>> 
>> +	/* When running under a hypervisor, we can not modify tb */
>> +	np = of_find_node_by_path("/hypervisor");
>> +	if (np) {
>> +		smp_85xx_ops.give_timebase = NULL;
>> +		smp_85xx_ops.take_timebase = NULL;
>> +	}
>> +
>> 	smp_ops = &smp_85xx_ops;
> 
> Again, for 85xx we should *never* sync the timebase in the kernel,
> hypervisor or no.

The code says "if the kexec config option is enabled, do the sync". I'm fairly sure it's there for a reason.

Git blame points to the following commit:

commit f933a41e419a954ef90605224e02c3ded78f3372
Author: Matthew McClintock <msm@freescale.com>
Date:   Wed Jul 21 16:14:53 2010 -0500

    powerpc/85xx: kexec for SMP 85xx BookE systems
    
    Adds support for kexec on 85xx machines for the BookE platform.
    Including support for SMP machines
    
    Based off work from Maxim Uvarov <muvarov@mvista.com>
    Signed-off-by: Matthew McClintock <msm@freescale.com>
    Signed-off-by: Kumar Gala <galak@kernel.crashing.org>


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - March 2, 2012, 5:17 p.m.
On 03/02/2012 10:30 AM, Alexander Graf wrote:
> 
> On 02.03.2012, at 17:20, Scott Wood wrote:
> 
>> On Fri, Mar 02, 2012 at 03:12:33PM +0100, Alexander Graf wrote:
>>> When running inside a virtual machine, we can not modify timebase, so
>>> let's just not call the functions for it then.
>>>
>>> This resolves hangs when booting e500 SMP guests on overcommitted hosts.
>>>
>>> Reported-by: Stuart Yoder <B08248@freescale.com>
>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>> ---
>>> arch/powerpc/platforms/85xx/smp.c |    7 +++++++
>>> 1 files changed, 7 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
>>> index ff42490..d4b6c1f 100644
>>> --- a/arch/powerpc/platforms/85xx/smp.c
>>> +++ b/arch/powerpc/platforms/85xx/smp.c
>>> @@ -249,6 +249,13 @@ void __init mpc85xx_smp_init(void)
>>> 		smp_85xx_ops.cause_ipi = doorbell_cause_ipi;
>>> 	}
>>>
>>> +	/* When running under a hypervisor, we can not modify tb */
>>> +	np = of_find_node_by_path("/hypervisor");
>>> +	if (np) {
>>> +		smp_85xx_ops.give_timebase = NULL;
>>> +		smp_85xx_ops.take_timebase = NULL;
>>> +	}
>>> +
>>> 	smp_ops = &smp_85xx_ops;
>>
>> Again, for 85xx we should *never* sync the timebase in the kernel,
>> hypervisor or no.
> 
> The code says "if the kexec config option is enabled, do the sync". I'm fairly sure it's there for a reason.

Sigh.  I forgot about that.  It's because instead of doing kexec the
simple way, we actually physically reset the core.  We really shouldn't
do that.  And we *really* shouldn't do it just because CONFIG_KEXEC is
defined, regardless of whether we're actually booting from kexec.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
McClintock Matthew-B29882 - March 8, 2012, 5:31 p.m.
On Fri, Mar 2, 2012 at 11:17 AM, Scott Wood <scottwood@freescale.com> wrote:
> On 03/02/2012 10:30 AM, Alexander Graf wrote:
>>
>> On 02.03.2012, at 17:20, Scott Wood wrote:
>>
>>> On Fri, Mar 02, 2012 at 03:12:33PM +0100, Alexander Graf wrote:
>>>> When running inside a virtual machine, we can not modify timebase, so
>>>> let's just not call the functions for it then.
>>>>
>>>> This resolves hangs when booting e500 SMP guests on overcommitted hosts.
>>>>
>>>> Reported-by: Stuart Yoder <B08248@freescale.com>
>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>> ---
>>>> arch/powerpc/platforms/85xx/smp.c |    7 +++++++
>>>> 1 files changed, 7 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
>>>> index ff42490..d4b6c1f 100644
>>>> --- a/arch/powerpc/platforms/85xx/smp.c
>>>> +++ b/arch/powerpc/platforms/85xx/smp.c
>>>> @@ -249,6 +249,13 @@ void __init mpc85xx_smp_init(void)
>>>>             smp_85xx_ops.cause_ipi = doorbell_cause_ipi;
>>>>     }
>>>>
>>>> +   /* When running under a hypervisor, we can not modify tb */
>>>> +   np = of_find_node_by_path("/hypervisor");
>>>> +   if (np) {
>>>> +           smp_85xx_ops.give_timebase = NULL;
>>>> +           smp_85xx_ops.take_timebase = NULL;
>>>> +   }
>>>> +
>>>>     smp_ops = &smp_85xx_ops;
>>>
>>> Again, for 85xx we should *never* sync the timebase in the kernel,
>>> hypervisor or no.
>>
>> The code says "if the kexec config option is enabled, do the sync". I'm fairly sure it's there for a reason.
>
> Sigh.  I forgot about that.  It's because instead of doing kexec the
> simple way, we actually physically reset the core.  We really shouldn't
> do that.  And we *really* shouldn't do it just because CONFIG_KEXEC is
> defined, regardless of whether we're actually booting from kexec.

How would one rocver a core that's off in the weeds? We need to at
least stop it somehow before we restart into the kdump kernel. kexec
seems feasible since we have all the cores in a known state...

-M
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - March 8, 2012, 6:20 p.m.
On 03/08/2012 11:31 AM, McClintock Matthew-B29882 wrote:
> On Fri, Mar 2, 2012 at 11:17 AM, Scott Wood <scottwood@freescale.com> wrote:
>> On 03/02/2012 10:30 AM, Alexander Graf wrote:
>>>
>>> On 02.03.2012, at 17:20, Scott Wood wrote:
>>>> Again, for 85xx we should *never* sync the timebase in the kernel,
>>>> hypervisor or no.
>>>
>>> The code says "if the kexec config option is enabled, do the sync". I'm fairly sure it's there for a reason.
>>
>> Sigh.  I forgot about that.  It's because instead of doing kexec the
>> simple way, we actually physically reset the core.  We really shouldn't
>> do that.  And we *really* shouldn't do it just because CONFIG_KEXEC is
>> defined, regardless of whether we're actually booting from kexec.
> 
> How would one rocver a core that's off in the weeds? 

System reset?

I thought kexec was for upgrading your kernel, not recovering from crashes.

> We need to at least stop it somehow before we restart into the kdump kernel. kexec
> seems feasible since we have all the cores in a known state...

Stop, yes.  Reset, no.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
McClintock Matthew-B29882 - March 8, 2012, 6:24 p.m.
On Thu, Mar 8, 2012 at 12:20 PM, Scott Wood <scottwood@freescale.com> wrote:
> On 03/08/2012 11:31 AM, McClintock Matthew-B29882 wrote:
>> On Fri, Mar 2, 2012 at 11:17 AM, Scott Wood <scottwood@freescale.com> wrote:
>>> On 03/02/2012 10:30 AM, Alexander Graf wrote:
>>>>
>>>> On 02.03.2012, at 17:20, Scott Wood wrote:
>>>>> Again, for 85xx we should *never* sync the timebase in the kernel,
>>>>> hypervisor or no.
>>>>
>>>> The code says "if the kexec config option is enabled, do the sync". I'm fairly sure it's there for a reason.
>>>
>>> Sigh.  I forgot about that.  It's because instead of doing kexec the
>>> simple way, we actually physically reset the core.  We really shouldn't
>>> do that.  And we *really* shouldn't do it just because CONFIG_KEXEC is
>>> defined, regardless of whether we're actually booting from kexec.
>>
>> How would one rocver a core that's off in the weeds?
>
> System reset?

kdump restarts a kernel using a reserved memory region to inspect the
memory of the crashed kernel. Wouldn't system reset cause issues here?

> I thought kexec was for upgrading your kernel, not recovering from crashes.
>
>> We need to at least stop it somehow before we restart into the kdump kernel. kexec
>> seems feasible since we have all the cores in a known state...
>
> Stop, yes.  Reset, no.

How would we stop the remaining cores assuming they could be in any state?

-M
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - March 8, 2012, 6:43 p.m.
On 03/08/2012 12:24 PM, McClintock Matthew-B29882 wrote:
> On Thu, Mar 8, 2012 at 12:20 PM, Scott Wood <scottwood@freescale.com> wrote:
>> On 03/08/2012 11:31 AM, McClintock Matthew-B29882 wrote:
>>> On Fri, Mar 2, 2012 at 11:17 AM, Scott Wood <scottwood@freescale.com> wrote:
>>>> On 03/02/2012 10:30 AM, Alexander Graf wrote:
>>>>>
>>>>> On 02.03.2012, at 17:20, Scott Wood wrote:
>>>>>> Again, for 85xx we should *never* sync the timebase in the kernel,
>>>>>> hypervisor or no.
>>>>>
>>>>> The code says "if the kexec config option is enabled, do the sync". I'm fairly sure it's there for a reason.
>>>>
>>>> Sigh.  I forgot about that.  It's because instead of doing kexec the
>>>> simple way, we actually physically reset the core.  We really shouldn't
>>>> do that.  And we *really* shouldn't do it just because CONFIG_KEXEC is
>>>> defined, regardless of whether we're actually booting from kexec.
>>>
>>> How would one rocver a core that's off in the weeds?
>>
>> System reset?
> 
> kdump restarts a kernel using a reserved memory region to inspect the
> memory of the crashed kernel. Wouldn't system reset cause issues here?

Oh, kdump.

Maybe in that case, go ahead and reset the other cores (or halt them via
some other means), but don't do anything with them.  Just boot a single
core in the dump kernel, do the dumping, then reset the system?

Or if you must sync the timebase in Linux, do it the way U-Boot does
(and skip it if you don't have access to the relevant CCSR bits).

Are I/O devices a problem with kdump and not resetting the system,
especially on some of our chips where certain I/O devices are difficult
to reset?

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
McClintock Matthew-B29882 - March 8, 2012, 6:46 p.m.
On Thu, Mar 8, 2012 at 12:43 PM, Scott Wood <scottwood@freescale.com> wrote:
> On 03/08/2012 12:24 PM, McClintock Matthew-B29882 wrote:
>> On Thu, Mar 8, 2012 at 12:20 PM, Scott Wood <scottwood@freescale.com> wrote:
>>> On 03/08/2012 11:31 AM, McClintock Matthew-B29882 wrote:
>>>> On Fri, Mar 2, 2012 at 11:17 AM, Scott Wood <scottwood@freescale.com> wrote:
>>>>> On 03/02/2012 10:30 AM, Alexander Graf wrote:
>>>>>>
>>>>>> On 02.03.2012, at 17:20, Scott Wood wrote:
>>>>>>> Again, for 85xx we should *never* sync the timebase in the kernel,
>>>>>>> hypervisor or no.
>>>>>>
>>>>>> The code says "if the kexec config option is enabled, do the sync". I'm fairly sure it's there for a reason.
>>>>>
>>>>> Sigh.  I forgot about that.  It's because instead of doing kexec the
>>>>> simple way, we actually physically reset the core.  We really shouldn't
>>>>> do that.  And we *really* shouldn't do it just because CONFIG_KEXEC is
>>>>> defined, regardless of whether we're actually booting from kexec.
>>>>
>>>> How would one rocver a core that's off in the weeds?
>>>
>>> System reset?
>>
>> kdump restarts a kernel using a reserved memory region to inspect the
>> memory of the crashed kernel. Wouldn't system reset cause issues here?
>
> Oh, kdump.
>
> Maybe in that case, go ahead and reset the other cores (or halt them via
> some other means), but don't do anything with them.  Just boot a single
> core in the dump kernel, do the dumping, then reset the system?

Yes, we could do one core here... but is it worthwhile to do kexec and
kdump differently?

> Or if you must sync the timebase in Linux, do it the way U-Boot does
> (and skip it if you don't have access to the relevant CCSR bits).

Ok - I've never even looked at how the timebase sync was done in Linux
or u-boot. Just enabled the timebase sync

> Are I/O devices a problem with kdump and not resetting the system,
> especially on some of our chips where certain I/O devices are difficult
> to reset?

Anything still occurring will be problematic and *possibly* prevent us
from determining what caused the actual crash.

-M
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Scott Wood - July 22, 2013, 10:19 p.m.
On Fri, Mar 02, 2012 at 03:12:33PM +0100, Alexander Graf wrote:
> When running inside a virtual machine, we can not modify timebase, so
> let's just not call the functions for it then.
> 
> This resolves hangs when booting e500 SMP guests on overcommitted hosts.
> 
> Reported-by: Stuart Yoder <B08248@freescale.com>
> Signed-off-by: Alexander Graf <agraf@suse.de>
> 
> ---
> arch/powerpc/platforms/85xx/smp.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
> index ff42490..d4b6c1f 100644
> --- a/arch/powerpc/platforms/85xx/smp.c
> +++ b/arch/powerpc/platforms/85xx/smp.c
> @@ -249,6 +249,13 @@ void __init mpc85xx_smp_init(void)
>  		smp_85xx_ops.cause_ipi = doorbell_cause_ipi;
>  	}
>  
> +	/* When running under a hypervisor, we can not modify tb */
> +	np = of_find_node_by_path("/hypervisor");
> +	if (np) {
> +		smp_85xx_ops.give_timebase = NULL;
> +		smp_85xx_ops.take_timebase = NULL;
> +	}

I'm marking this superseded as we now only set give/take_timebase if a
guts node is present that corresponds to an SMP SoC.  QEMU currently
advertises an mpc8544 guts (which is not SMP) and will eventually move to
a paravirt device with no guts at all.

-Scott

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
index ff42490..d4b6c1f 100644
--- a/arch/powerpc/platforms/85xx/smp.c
+++ b/arch/powerpc/platforms/85xx/smp.c
@@ -249,6 +249,13 @@  void __init mpc85xx_smp_init(void)
 		smp_85xx_ops.cause_ipi = doorbell_cause_ipi;
 	}
 
+	/* When running under a hypervisor, we can not modify tb */
+	np = of_find_node_by_path("/hypervisor");
+	if (np) {
+		smp_85xx_ops.give_timebase = NULL;
+		smp_85xx_ops.take_timebase = NULL;
+	}
+
 	smp_ops = &smp_85xx_ops;
 
 #ifdef CONFIG_KEXEC