[1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot

Message ID 20171006061005.29891-2-npiggin@gmail.com
State New
Series
  • some boot/shutdown improvements

Commit Message

Nicholas Piggin Oct. 6, 2017, 6:10 a.m.
OPAL boot does not insert secondaries at 0x60 to wait at the secondary
hold spinloop. Instead it keeps them held in firmware until the
opal_start_cpu call is made, which directs them where the caller
specifies. Linux inserts them into generic_secondary_smp_init(), which
is after the secondary hold spinloop (they go on to spin at the per-CPU
paca loops, but that is another step).

So avoid waiting on this spinloop when booting with OPAL firmware.
It always just times out.

This saves 100ms boot time on bare metal, and 10s of seconds when
booting the simulator in SMP.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/head_64.S  |  4 +++-
 arch/powerpc/kernel/setup_64.c | 14 ++++++++++++--
 2 files changed, 15 insertions(+), 3 deletions(-)
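To see why the wait costs a fixed amount of time when nothing arrives, here is a simplified user-space model of the boot CPU waiting at the secondary hold spinloop. It is a sketch, not the kernel code: `wait_for_secondaries()` is a hypothetical name, and the poll budget (`SPIN_WAIT_POLLS`, 100,000 polls of 1 microsecond each) is an assumption chosen to line up with the 100ms figure in the commit message. A secondary that never reaches the spinloop, as on OPAL where firmware holds it, is marked with -1.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed budget: 100,000 polls with a 1 microsecond delay each (~100ms). */
#define SPIN_WAIT_POLLS 100000
#define POLL_DELAY_US   1

/*
 * Simplified model of the boot CPU waiting for secondaries at the hold
 * spinloop. arrive_at[i] is the poll on which secondary i checks in, or
 * -1 if it never reaches the spinloop (e.g. it is held in firmware).
 * Returns the simulated number of microseconds spent waiting.
 */
static long wait_for_secondaries(int expected, const int *arrive_at)
{
    int spinning = expected;   /* secondaries not yet checked in */
    long waited_us = 0;

    assert(expected >= 0);
    assert(expected == 0 || arrive_at != NULL);

    for (int poll = 0; poll < SPIN_WAIT_POLLS; poll++) {
        for (int i = 0; i < expected; i++)
            if (arrive_at[i] == poll)
                spinning--;
        if (spinning == 0)
            break;             /* everyone arrived: stop waiting */
        waited_us += POLL_DELAY_US;
    }
    return waited_us;          /* full budget if any secondary never came */
}
```

If even one expected secondary never checks in, the model burns the whole budget; if none are expected, it returns immediately, which is the short-circuit the patch is after.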

Comments

Michael Ellerman Oct. 10, 2017, 11:11 a.m. | #1
Nicholas Piggin <npiggin@gmail.com> writes:

> OPAL boot does not insert secondaries at 0x60 to wait at the secondary
> hold spinloop. Instead it keeps them held in firmware until the
> opal_start_cpu call is made, which directs them where the caller
> specifies. Linux inserts them into generic_secondary_smp_init(), which
> is after the secondary hold spinloop (they go on to spin at the per-CPU
> paca loops, but that is another step).
>
> So avoid waiting on this spinloop when booting with OPAL firmware.
> It always just times out.
>
> This saves 100ms boot time on bare metal, and 10s of seconds when
> booting the simulator in SMP.

Oh nice, that's real facepalm territory.

It'd be neater if we just inserted them at 0x60, but the sequence is
wrong.

Can we fix it just by making spinning_secondaries zero on OPAL?

cheers
Nicholas Piggin Oct. 10, 2017, 11:44 a.m. | #2
On Tue, 10 Oct 2017 22:11:46 +1100
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Can we fix it just by making spinning_secondaries zero on OPAL?

I had a look at that, but generic_secondary_smp_init() still
decrements it, so it would underflow which I thought was
uglier.
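The underflow concern can be shown with a small model. This is a hypothetical sketch: `struct hold_state`, `secondary_checks_in()` and `counter_after()` are illustrative names, and it assumes, as described above, that every secondary entering generic_secondary_smp_init() decrements the shared counter without checking it first.

```c
#include <assert.h>

/* Hypothetical stand-in for the shared counter the boot CPU waits on. */
struct hold_state {
    int spinning_secondaries;
};

/* Assumed behaviour: each secondary decrements unconditionally on entry. */
static void secondary_checks_in(struct hold_state *s)
{
    s->spinning_secondaries--;
}

/* Counter value after `entering` secondaries check in, starting at `seeded`. */
static int counter_after(int seeded, int entering)
{
    struct hold_state s = { .spinning_secondaries = seeded };

    assert(entering >= 0);
    for (int i = 0; i < entering; i++)
        secondary_checks_in(&s);
    return s.spinning_secondaries;
}
```

Seeding with the real number of secondaries ends at zero; seeding with zero on OPAL leaves the counter at minus the number of secondaries, which is the result Nick considered uglier.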

I actually have to look a bit further, because by the looks of it KVM
guests are hitting the loop timeout too.

Thanks,
Nick
Nicholas Piggin Oct. 10, 2017, 3:58 p.m. | #3
On Tue, 10 Oct 2017 21:44:15 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> I actually have to look a bit further, because by the looks of it KVM
> guests are hitting the loop timeout too.

Ahh okay, pseries is using the start-cpu RTAS call to enter at
generic_secondary_smp_init() as well. So we can take it out for
pseries as well.

Thanks,
Nick
Nicholas Piggin Oct. 10, 2017, 6:52 p.m. | #4
On Wed, 11 Oct 2017 01:58:28 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:


> Ahh okay, pseries is using the start-cpu RTAS call to enter at
> generic_secondary_smp_init() as well. So we can take it out for
> pseries as well.

This patch seems to do the trick for pseries guests too:

powerpc/64s: Avoid waiting for secondary hold spinloop if it is not used

OPAL boot and some RTAS boots do not insert secondaries at 0x60 to wait at
the secondary hold spinloop. Instead they are started later, at
generic_secondary_smp_init(), which is after the secondary hold
spinloop.

Avoid waiting on this spinloop when booting with OPAL firmware, or
when the RTAS boot does not use this loop. This wait always times
out in those cases.

This saves 100ms boot time on bare metal (10s of seconds of real time
when booting on the simulator in SMP), and 100ms on modern pseries
guests.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/head_64.S  | 16 +++++++++++-----
 arch/powerpc/kernel/setup_64.c | 12 +++++++++++-
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index c9e760ec7530..0deef350004f 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -55,12 +55,18 @@
  *
  *  For pSeries or server processors:
  *   1. The MMU is off & open firmware is running in real mode.
- *   2. The kernel is entered at __start
+ *   2. The primary CPU enters at __start.
+ *   3. If RTAS supports "query-cpu-stopped-state", then secondary
+ *      CPUs will enter as directed by the "start-cpu" RTAS call, which is
+ *      generic_secondary_smp_init, with PIR in r3.
+ *   4. Else the secondary CPUs will enter at __secondary_hold (0x60) as
+ *      directed by the "start-cpu" RTAS call, with PIR in r3.
  * -or- For OPAL entry:
- *   1. The MMU is off, processor in HV mode, primary CPU enters at 0
- *      with device-tree in gpr3. We also get OPAL base in r8 and
- *	entry in r9 for debugging purposes
- *   2. Secondary processors enter at 0x60 with PIR in gpr3
+ *   1. The MMU is off, processor in HV mode.
+ *   2. The primary CPU enters at 0 with device-tree in r3, OPAL base
+ *      in r8, and entry in r9 for debugging purposes.
+ *   3. Secondary CPUs enter as directed by the OPAL_START_CPU call, which
+ *      is at generic_secondary_smp_init, with PIR in r3.
  *
  *  For Book3E processors:
  *   1. The MMU is on running in AS0 in a state defined in ePAPR
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 3f2453858f60..afa79e8d56a6 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -363,8 +363,18 @@ void early_setup_secondary(void)
 #if defined(CONFIG_SMP) || defined(CONFIG_KEXEC_CORE)
 static bool use_spinloop(void)
 {
-	if (!IS_ENABLED(CONFIG_PPC_BOOK3E))
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S)) {
+		/*
+		 * See comments in head_64.S -- not all platforms insert
+		 * secondaries at __secondary_hold and wait at the spin
+		 * loop.
+		 */
+		if (firmware_has_feature(FW_FEATURE_OPAL))
+			return false;
+		if (rtas_token("query-cpu-stopped-state") != RTAS_UNKNOWN_SERVICE)
+			return false;
 		return true;
+	}
 
 	/*
 	 * When book3e boots from kexec, the ePAPR spin table does
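The entry rules in the revised head_64.S comment and the checks in the new use_spinloop() can be summarised as one pure function. This is a user-space sketch, not kernel code: `struct boot_env`, its fields, `enum secondary_entry` and `use_spinloop_model()` are hypothetical stand-ins for firmware_has_feature(FW_FEATURE_OPAL) and the rtas_token("query-cpu-stopped-state") lookup, and Book3E is left out because its kexec case is handled separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of how a Book3S kernel was booted. */
struct boot_env {
    bool has_opal;               /* stands in for FW_FEATURE_OPAL */
    bool rtas_query_cpu_stopped; /* RTAS has "query-cpu-stopped-state" */
};

enum secondary_entry {
    ENTER_SECONDARY_HOLD,        /* __secondary_hold at 0x60, then spin */
    ENTER_GENERIC_SMP_INIT,      /* started later by OPAL/RTAS start-cpu */
};

/* Where secondaries enter, following the revised head_64.S comment. */
static enum secondary_entry secondary_entry_point(const struct boot_env *env)
{
    assert(env != NULL);
    if (env->has_opal || env->rtas_query_cpu_stopped)
        return ENTER_GENERIC_SMP_INIT;
    return ENTER_SECONDARY_HOLD;
}

/* Only wait at the hold spinloop if secondaries were actually put there. */
static bool use_spinloop_model(const struct boot_env *env)
{
    return secondary_entry_point(env) == ENTER_SECONDARY_HOLD;
}
```

Deriving the wait decision from the entry point keeps the two rules in one place; in the patch they live in separate files (a comment in head_64.S and the checks in setup_64.c), so they can drift apart.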
Michael Ellerman Oct. 11, 2017, 11:27 a.m. | #5
Nicholas Piggin <npiggin@gmail.com> writes:

> On Wed, 11 Oct 2017 01:58:28 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
>
>
>> Ahh okay, pseries is using the start-cpu RTAS call to enter at
>> generic_secondary_smp_init() as well. So we can take it out for
>> pseries as well.
>
> This patch seems to do the trick for pseries guests too:
>
> powerpc/64s: Avoid waiting for secondary hold spinloop if it is not used
>
> OPAL boot and some RTAS boots do not insert secondaries at 0x60 to wait at
> the secondary hold spinloop. Instead they are started later, at
> generic_secondary_smp_init(), which is after the secondary hold
> spinloop.
>
> Avoid waiting on this spinloop when booting with OPAL firmware, or
> when the RTAS boot does not use this loop. This wait always times
> out in those cases.
>
> This saves 100ms boot time on bare metal (10s of seconds of real time
> when booting on the simulator in SMP), and 100ms on modern pseries
> guests.

My instinct was to say "huh, that's not how it works on pseries!".

But then I see this was all changed in:

  dbe78b401186 ("powerpc/pseries: Do not start secondaries in Open Firmware") (Sep 2013)


So that is where my confusion comes from. Most of the code and comments
still talk about secondaries coming in at 0x60, but that's really only
on "legacy" machines.

I guess I can merge this, but really this code needs a proper cleanup. I
dislike all this platform specific knowledge ending up in setup_64.c.

If we had an smp_ops->spinning_secondaries() that tells the spin
loop how many secondaries to wait for, it could all go in platform code
I think.

The check for use_spinloop() would just become a short-circuit check of
spinning_secondaries == 0.
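The suggestion above could look roughly like this. It is a hypothetical sketch: `struct smp_ops_sketch`, its `spinning_secondaries` callback, the two platform functions (including the assumed count of three for a legacy platform) and `should_wait()` are illustrative names, not the kernel's smp_ops API. Each platform reports how many secondaries it parked at the hold spinloop, and generic code waits only when that count is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-platform hook, modelled on the smp_ops idea above. */
struct smp_ops_sketch {
    /* Number of secondaries the platform left at the hold spinloop. */
    int (*spinning_secondaries)(void);
};

/* OPAL: secondaries stay in firmware until opal_start_cpu(). */
static int opal_spinning_secondaries(void) { return 0; }

/* A legacy platform that parked three secondaries at 0x60 (assumed count). */
static int legacy_spinning_secondaries(void) { return 3; }

/* Generic code: the use_spinloop() check becomes a short-circuit on zero. */
static bool should_wait(const struct smp_ops_sketch *ops)
{
    int n;

    if (ops == NULL || ops->spinning_secondaries == NULL)
        return true;           /* no hook: fall back to the old behaviour */
    n = ops->spinning_secondaries();
    assert(n >= 0);
    return n != 0;
}
```

Defaulting to "wait" when a platform provides no hook is a conservative choice for the sketch: platforms that have not been converted keep today's behaviour.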

The other issue is kexec. IIRC when we kexec on pseries we don't return
the CPUs to RTAS, so then they *are* spinning at 0x60. But maybe that's
changed since I last looked at it too :)

cheers

Nicholas Piggin Oct. 11, 2017, 2 p.m. | #6
On Wed, 11 Oct 2017 22:27:23 +1100
Michael Ellerman <mpe@ellerman.id.au> wrote:

> My instinct was to say "huh, that's not how it works on pseries!".
> 
> But then I see this was all changed in:
> 
>   dbe78b401186 ("powerpc/pseries: Do not start secondaries in Open Firmware") (Sep 2013)
> 
> 
> So that is where my confusion comes from. Most of the code and comments
> still talk about secondaries coming in at 0x60, but that's really only
> on "legacy" machines.
> 
> I guess I can merge this, but really this code needs a proper cleanup. I
> dislike all this platform specific knowledge ending up in setup_64.c.
> 
> If we had an smp_ops->spinning_secondaries() that tells the spin
> loop how many secondaries to wait for, it could all go in platform code
> I think.

Yeah, not sure how best to do it. What I wanted to do was just increment
spinning_secondaries in prom_init as we inserted them at 0x60 (or at
0x100 for pmac or whatever). But prom_init doesn't like referencing outside
variables so there goes that.
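The counting idea can be modelled in isolation, ignoring the prom_init restriction. This is a hypothetical sketch: `struct park_record`, `park_secondary()`, `start_cpu_directly()` and `secondary_leaves_hold()` are illustrative names, with `start_cpu_directly()` standing in for the OPAL and RTAS start-cpu paths. It also assumes that only secondaries which came through the hold loop decrement the count, which is not what generic_secondary_smp_init() does today according to earlier messages in this thread.

```c
#include <assert.h>

/* Hypothetical record the boot CPU would consult before waiting. */
struct park_record {
    int spinning_secondaries;  /* secondaries actually placed at the hold loop */
};

/* Placing a secondary at the hold spinloop bumps the count... */
static void park_secondary(struct park_record *r)
{
    r->spinning_secondaries++;
}

/* ...while starting it directly in generic_secondary_smp_init() does not. */
static void start_cpu_directly(struct park_record *r)
{
    (void)r;                   /* nothing to count: no spinloop involved */
}

/* Matching decrement when a parked secondary leaves the hold loop. */
static void secondary_leaves_hold(struct park_record *r)
{
    r->spinning_secondaries--;
    assert(r->spinning_secondaries >= 0);  /* only parked CPUs decrement */
}
```

With this scheme an OPAL or modern pseries boot ends with a count of zero, so the boot CPU has nothing to wait for and no platform check is needed in setup_64.c.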

> 
> The check for use_spinloop() would just become a short-circuit check of
> spinning_secondaries == 0.

Yeah, maybe that would be enough. I wonder whether half of that setup_arch
code could be moved into per-platform code, including smp_release_cpus().

> 
> The other issue is kexec. IIRC when we kexec on pseries we don't return
> the CPUs to RTAS, so then they *are* spinning at 0x60. But maybe that's
> changed since I last looked at it too :)

Oh, I might have forgotten to test that on pseries, so I'll try that.

Thanks,
Nick

Patch

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index c9e760ec7530..1ebfb3f2cbbb 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -60,7 +60,9 @@ 
  *   1. The MMU is off, processor in HV mode, primary CPU enters at 0
  *      with device-tree in gpr3. We also get OPAL base in r8 and
  *	entry in r9 for debugging purposes
- *   2. Secondary processors enter at 0x60 with PIR in gpr3
+ *   2. Secondary processors enter as directed by opal_start_cpu(), which
+ *      is generic_secondary_smp_init, with PIR in gpr3. The secondary spin
+ *      code is not used.
  *
  *  For Book3E processors:
  *   1. The MMU is on running in AS0 in a state defined in ePAPR
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 3f2453858f60..eada0a7b73f8 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -363,8 +363,18 @@  void early_setup_secondary(void)
 #if defined(CONFIG_SMP) || defined(CONFIG_KEXEC_CORE)
 static bool use_spinloop(void)
 {
-	if (!IS_ENABLED(CONFIG_PPC_BOOK3E))
-		return true;
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S)) {
+		/*
+		 * With OPAL, secondaries do not use the secondary hold
+		 * spinloop, rather they are held in firmware until
+		 * opal_start_cpu() sends them to generic_secondary_smp_init
+		 * directly.
+		 */
+		if (firmware_has_feature(FW_FEATURE_OPAL))
+			return false;
+		else
+			return true;
+	}
 
 	/*
 	 * When book3e boots from kexec, the ePAPR spin table does