core/cpu: Initialize all cpu thread areas to avoid invalid memory access.

Message ID 153582356338.1352.1498937538112757025.stgit@jupiter.in.ibm.com
State New
Headers show
Series
  • core/cpu: Initialize all cpu thread areas to avoid invalid memory access.
Related show

Checks

Context Check Description
snowpatch_ozlabs/make_check success Test make_check on branch master
snowpatch_ozlabs/apply_patch success master/apply_patch Successfully applied

Commit Message

Mahesh J Salgaonkar Sept. 1, 2018, 5:40 p.m.
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Currently we initialize cpu stack memory only for cpu pir that is found on
the device-tree. For the rest, the cpu thread contents are uninitialized.
This sometime causes for_each_cpu* macros to return cpu thread for pir/cpu
which isn't present on the system. The for_each_cpu* macros iterate over
cpu stacks using pir as index and returns cpu thread pointer if
state != cpu_state_no_cpu. For cpus that are not found on device-tree the
state may hold junk value leading opal to access invalid cpu thread area.
This further leads to accessing pointers with junk values causing machine
check (MCE) during OPAL init code. Fix this by Initializing all the cpu
thread areas upto cpu_max_pir.

[  119.306746450,3] Fatal MCE at 000000003002949c   .init_trace_buffers+0x20c
[  119.306756674,3] CFAR : 0000000030029310
[  119.306758217,3] SRR0 : 000000003002949c SRR1 : 9000000000201000
[  119.306760486,3] HSRR0: 000000003000280c HSRR1: 9000000000001000
[  119.306762704,3] DSISR: 00000040         DAR  : a603087c2000807e
[  119.306764761,3] LR   : 00000000300294a8 CTR  : 0000000000000000
[  119.306766796,3] CR   : 40004204         XER  : 00000000
[  119.306768646,3] GPR00: 00000000300293e0 GPR16: 0000000000000000
[  119.306770934,3] GPR01: 0000000035d03b90 GPR17: 0000000000000000
[  119.306773164,3] GPR02: 0000000030123c00 GPR18: 0000000000000000
[  119.306775430,3] GPR03: 0000000031ce0000 GPR19: 0000000000000000
[  119.306777662,3] GPR04: 00000000001000a7 GPR20: 0000000000000000
[  119.306779953,3] GPR05: 0000000000000000 GPR21: 0000000000000000
[  119.306782111,3] GPR06: 0000000000000000 GPR22: 0000000000000000
[  119.306784268,3] GPR07: 0000000000000000 GPR23: 00000000001000a7
[  119.306786455,3] GPR08: 000000000000085f GPR24: 0000000000000077
[  119.306788681,3] GPR09: a603087c2000804e GPR25: 00000000000fffff
[  119.306791146,3] GPR10: 00000000a1d9fe4b GPR26: 00000000300c4a4b
[  119.306793413,3] GPR11: 0000000030099d7c GPR27: 00000000300c4a3a
[  119.306795599,3] GPR12: 0000000040004202 GPR28: 0000000000101000
[  119.306797869,3] GPR13: 0000000035d00000 GPR29: 0000200001123000
[  119.306800179,3] GPR14: 00000000300026b0 GPR30: 0000200001123000
[  119.306802421,3] GPR15: 0000000000000000 GPR31: 0000000000000000

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 core/cpu.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

Comments

Oliver Sept. 3, 2018, 5:50 a.m. | #1
On Sun, Sep 2, 2018 at 3:40 AM, Mahesh J Salgaonkar
<mahesh@linux.vnet.ibm.com> wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> Currently we initialize cpu stack memory only for cpu pir that is found on
> the device-tree. For the rest, the cpu thread contents are uninitialized.
> This sometime causes for_each_cpu* macros to return cpu thread for pir/cpu
> which isn't present on the system. The for_each_cpu* macros iterate over
> cpu stacks using pir as index and returns cpu thread pointer if
> state != cpu_state_no_cpu. For cpus that are not found on device-tree the
> state may hold junk value leading opal to access invalid cpu thread area.
> This further leads to accessing pointers with junk values causing machine
> check (MCE) during OPAL init code. Fix this by Initializing all the cpu
> thread areas upto cpu_max_pir.

Is this from a cold IPL or MPIPL? Memory should be zeroed before we
enter OPAL so if there is junk in the stack area we might have a data
corruption problem,

Reviewed-by: Oliver O'Halloran <oohall@gmail.com>

> [  119.306746450,3] Fatal MCE at 000000003002949c   .init_trace_buffers+0x20c
> [  119.306756674,3] CFAR : 0000000030029310
> [  119.306758217,3] SRR0 : 000000003002949c SRR1 : 9000000000201000
> [  119.306760486,3] HSRR0: 000000003000280c HSRR1: 9000000000001000
> [  119.306762704,3] DSISR: 00000040         DAR  : a603087c2000807e
> [  119.306764761,3] LR   : 00000000300294a8 CTR  : 0000000000000000
> [  119.306766796,3] CR   : 40004204         XER  : 00000000
> [  119.306768646,3] GPR00: 00000000300293e0 GPR16: 0000000000000000
> [  119.306770934,3] GPR01: 0000000035d03b90 GPR17: 0000000000000000
> [  119.306773164,3] GPR02: 0000000030123c00 GPR18: 0000000000000000
> [  119.306775430,3] GPR03: 0000000031ce0000 GPR19: 0000000000000000
> [  119.306777662,3] GPR04: 00000000001000a7 GPR20: 0000000000000000
> [  119.306779953,3] GPR05: 0000000000000000 GPR21: 0000000000000000
> [  119.306782111,3] GPR06: 0000000000000000 GPR22: 0000000000000000
> [  119.306784268,3] GPR07: 0000000000000000 GPR23: 00000000001000a7
> [  119.306786455,3] GPR08: 000000000000085f GPR24: 0000000000000077
> [  119.306788681,3] GPR09: a603087c2000804e GPR25: 00000000000fffff
> [  119.306791146,3] GPR10: 00000000a1d9fe4b GPR26: 00000000300c4a4b
> [  119.306793413,3] GPR11: 0000000030099d7c GPR27: 00000000300c4a3a
> [  119.306795599,3] GPR12: 0000000040004202 GPR28: 0000000000101000
> [  119.306797869,3] GPR13: 0000000035d00000 GPR29: 0000200001123000
> [  119.306800179,3] GPR14: 00000000300026b0 GPR30: 0000200001123000
> [  119.306802421,3] GPR15: 0000000000000000 GPR31: 0000000000000000
>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  core/cpu.c |   17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/core/cpu.c b/core/cpu.c
> index 88477f821..8b3e5d995 100644
> --- a/core/cpu.c
> +++ b/core/cpu.c
> @@ -1121,6 +1121,18 @@ void init_cpu_max_pir(void)
>         prlog(PR_DEBUG, "CPU: New max PIR set to 0x%x\n", cpu_max_pir);
>  }
>
> +static void init_all_cpu_threads(void)
> +{
> +       unsigned int pir;
> +       struct cpu_thread *t;
> +
> +       for (pir = 0; pir <= cpu_max_pir; pir++) {
> +               t = &cpu_stacks[pir].cpu;
> +               if (t != boot_cpu)
> +                       init_cpu_thread(t, cpu_state_no_cpu, pir);
> +       }
> +}
> +
>  void init_all_cpus(void)
>  {
>         struct dt_node *cpus, *cpu;
> @@ -1132,6 +1144,7 @@ void init_all_cpus(void)
>
>         init_tm_suspend_mode_property();
>
> +       init_all_cpu_threads();
>         /* Iterate all CPUs in the device-tree */
>         dt_for_each_child(cpus, cpu) {
>                 unsigned int pir, server_no, chip_id;
> @@ -1166,7 +1179,7 @@ void init_all_cpus(void)
>                 assert(pir <= cpu_max_pir);
>                 t = pt = &cpu_stacks[pir].cpu;
>                 if (t != boot_cpu) {
> -                       init_cpu_thread(t, state, pir);
> +                       t->state = state;
>                         /* Each cpu gets its own later in init_trace_buffers */
>                         t->trace = boot_cpu->trace;
>                 }
> @@ -1195,7 +1208,7 @@ void init_all_cpus(void)
>                         prlog(PR_TRACE, "CPU:   secondary thread %d found\n",
>                               thread);
>                         t = &cpu_stacks[pir + thread].cpu;
> -                       init_cpu_thread(t, state, pir + thread);
> +                       t->state = state;
>                         t->trace = boot_cpu->trace;
>                         t->server_no = ((const u32 *)p->prop)[thread];
>                         t->is_secondary = true;
>
> _______________________________________________
> Skiboot mailing list
> Skiboot@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot
Mahesh J Salgaonkar Sept. 3, 2018, 6:07 a.m. | #2
On 09/03/2018 11:20 AM, Oliver wrote:
> On Sun, Sep 2, 2018 at 3:40 AM, Mahesh J Salgaonkar
> <mahesh@linux.vnet.ibm.com> wrote:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> Currently we initialize cpu stack memory only for cpu pir that is found on
>> the device-tree. For the rest, the cpu thread contents are uninitialized.
>> This sometime causes for_each_cpu* macros to return cpu thread for pir/cpu
>> which isn't present on the system. The for_each_cpu* macros iterate over
>> cpu stacks using pir as index and returns cpu thread pointer if
>> state != cpu_state_no_cpu. For cpus that are not found on device-tree the
>> state may hold junk value leading opal to access invalid cpu thread area.
>> This further leads to accessing pointers with junk values causing machine
>> check (MCE) during OPAL init code. Fix this by Initializing all the cpu
>> thread areas upto cpu_max_pir.
> 
> Is this from a cold IPL or MPIPL? Memory should be zeroed before we
> enter OPAL so if there is junk in the stack area we might have a data
> corruption problem,

This is happening during re-IPL (OS reboot). On FSP, per Dean Sanner, OS
reboot requests get transformed into an MPIPL. On cold IPL the memory is
all zeroed and we don't see this issue during first IPL.

> 
> Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Oliver Sept. 3, 2018, 6:55 a.m. | #3
On Mon, Sep 3, 2018 at 4:07 PM, Mahesh Jagannath Salgaonkar
<mahesh@linux.vnet.ibm.com> wrote:
> On 09/03/2018 11:20 AM, Oliver wrote:
>> On Sun, Sep 2, 2018 at 3:40 AM, Mahesh J Salgaonkar
>> <mahesh@linux.vnet.ibm.com> wrote:
>>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>>
>>> Currently we initialize cpu stack memory only for cpu pir that is found on
>>> the device-tree. For the rest, the cpu thread contents are uninitialized.
>>> This sometime causes for_each_cpu* macros to return cpu thread for pir/cpu
>>> which isn't present on the system. The for_each_cpu* macros iterate over
>>> cpu stacks using pir as index and returns cpu thread pointer if
>>> state != cpu_state_no_cpu. For cpus that are not found on device-tree the
>>> state may hold junk value leading opal to access invalid cpu thread area.
>>> This further leads to accessing pointers with junk values causing machine
>>> check (MCE) during OPAL init code. Fix this by Initializing all the cpu
>>> thread areas upto cpu_max_pir.
>>
>> Is this from a cold IPL or MPIPL? Memory should be zeroed before we
>> enter OPAL so if there is junk in the stack area we might have a data
>> corruption problem,
>
> This is happening during re-IPL (OS reboot). On FSP, per Dean Sanner, OS
> reboot requests get transformed into an MPIPL. On cold IPL the memory is
> all zeroed and we don't see this issue during first IPL.

Right, you should probably mention this is MPIPL specific in the commit message.

I'm still a little concerned that skiboot is putting junk data in the
unused cpu_thread structures at all. If it is then I'd expect
cpu_for_each*() to be broken at broken at runtime even if it boots
successfully from a cold IPL. I'll take a look at it sometime.
Vasant Hegde Sept. 6, 2018, 5:22 a.m. | #4
On 09/01/2018 11:10 PM, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Currently we initialize cpu stack memory only for cpu pir that is found on
> the device-tree. For the rest, the cpu thread contents are uninitialized.
> This sometime causes for_each_cpu* macros to return cpu thread for pir/cpu
> which isn't present on the system. The for_each_cpu* macros iterate over
> cpu stacks using pir as index and returns cpu thread pointer if
> state != cpu_state_no_cpu. For cpus that are not found on device-tree the
> state may hold junk value leading opal to access invalid cpu thread area.
> This further leads to accessing pointers with junk values causing machine
> check (MCE) during OPAL init code. Fix this by Initializing all the cpu
> thread areas upto cpu_max_pir.
> 
> [  119.306746450,3] Fatal MCE at 000000003002949c   .init_trace_buffers+0x20c
> [  119.306756674,3] CFAR : 0000000030029310
> [  119.306758217,3] SRR0 : 000000003002949c SRR1 : 9000000000201000
> [  119.306760486,3] HSRR0: 000000003000280c HSRR1: 9000000000001000
> [  119.306762704,3] DSISR: 00000040         DAR  : a603087c2000807e
> [  119.306764761,3] LR   : 00000000300294a8 CTR  : 0000000000000000
> [  119.306766796,3] CR   : 40004204         XER  : 00000000
> [  119.306768646,3] GPR00: 00000000300293e0 GPR16: 0000000000000000
> [  119.306770934,3] GPR01: 0000000035d03b90 GPR17: 0000000000000000
> [  119.306773164,3] GPR02: 0000000030123c00 GPR18: 0000000000000000
> [  119.306775430,3] GPR03: 0000000031ce0000 GPR19: 0000000000000000
> [  119.306777662,3] GPR04: 00000000001000a7 GPR20: 0000000000000000
> [  119.306779953,3] GPR05: 0000000000000000 GPR21: 0000000000000000
> [  119.306782111,3] GPR06: 0000000000000000 GPR22: 0000000000000000
> [  119.306784268,3] GPR07: 0000000000000000 GPR23: 00000000001000a7
> [  119.306786455,3] GPR08: 000000000000085f GPR24: 0000000000000077
> [  119.306788681,3] GPR09: a603087c2000804e GPR25: 00000000000fffff
> [  119.306791146,3] GPR10: 00000000a1d9fe4b GPR26: 00000000300c4a4b
> [  119.306793413,3] GPR11: 0000000030099d7c GPR27: 00000000300c4a3a
> [  119.306795599,3] GPR12: 0000000040004202 GPR28: 0000000000101000
> [  119.306797869,3] GPR13: 0000000035d00000 GPR29: 0000200001123000
> [  119.306800179,3] GPR14: 00000000300026b0 GPR30: 0000200001123000
> [  119.306802421,3] GPR15: 0000000000000000 GPR31: 0000000000000000
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>   core/cpu.c |   17 +++++++++++++++--
>   1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/core/cpu.c b/core/cpu.c
> index 88477f821..8b3e5d995 100644
> --- a/core/cpu.c
> +++ b/core/cpu.c
> @@ -1121,6 +1121,18 @@ void init_cpu_max_pir(void)
>   	prlog(PR_DEBUG, "CPU: New max PIR set to 0x%x\n", cpu_max_pir);
>   }
> 
> +static void init_all_cpu_threads(void)
> +{
> +	unsigned int pir;
> +	struct cpu_thread *t;
> +
> +	for (pir = 0; pir <= cpu_max_pir; pir++) {
> +		t = &cpu_stacks[pir].cpu;
> +		if (t != boot_cpu)
> +			init_cpu_thread(t, cpu_state_no_cpu, pir);

Why should we init no existing CPU? May be we should just clear stack?
(move memset from init_cpu_thread() to here) and then initialize actual cpu stack.

-Vasant
Vasant Hegde Sept. 6, 2018, 5:25 a.m. | #5
On 09/03/2018 12:25 PM, Oliver wrote:
> On Mon, Sep 3, 2018 at 4:07 PM, Mahesh Jagannath Salgaonkar
> <mahesh@linux.vnet.ibm.com> wrote:
>> On 09/03/2018 11:20 AM, Oliver wrote:
>>> On Sun, Sep 2, 2018 at 3:40 AM, Mahesh J Salgaonkar
>>> <mahesh@linux.vnet.ibm.com> wrote:
>>>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

.../...

>>>
>>> Is this from a cold IPL or MPIPL? Memory should be zeroed before we
>>> enter OPAL so if there is junk in the stack area we might have a data
>>> corruption problem,
>>
>> This is happening during re-IPL (OS reboot). On FSP, per Dean Sanner, OS
>> reboot requests get transformed into an MPIPL. On cold IPL the memory is
>> all zeroed and we don't see this issue during first IPL.
> 
> Right, you should probably mention this is MPIPL specific in the commit message.
> 
> I'm still a little concerned that skiboot is putting junk data in the
> unused cpu_thread structures at all. If it is then I'd expect
> cpu_for_each*() to be broken at broken at runtime even if it boots
> successfully from a cold IPL. I'll take a look at it sometime.

Yesterday we spent sometime to track down the real issue. It turned out that 
Hostboot
copied 16MB HDAT data to OPAL memory instead of 8MB.. That corrupted our TCE space
and CPU stack.

Hostboot upstream has required commit... that needs to be backported to relevant 
FSP builds.


FYI : HB commit :

commit 3d3d39d62a94da9dc9bc2da73474c9c3400762c4
Author: Mike Baiocchi <mbaiocch@us.ibm.com>
Date:   Thu May 3 09:02:02 2018 -0500

     Get Final HDAT Size from PAYLOAD's SPIRA section


-Vasant
Mahesh J Salgaonkar Sept. 6, 2018, 6:43 a.m. | #6
On 09/06/2018 10:55 AM, Vasant Hegde wrote:
>>
>> I'm still a little concerned that skiboot is putting junk data in the
>> unused cpu_thread structures at all. If it is then I'd expect
>> cpu_for_each*() to be broken at broken at runtime even if it boots
>> successfully from a cold IPL. I'll take a look at it sometime.
> 
> Yesterday we spent sometime to track down the real issue. It turned out
> that Hostboot
> copied 16MB HDAT data to OPAL memory instead of 8MB.. That corrupted our
> TCE space
> and CPU stack.

Thanks Vasant for tracking this down.

> 
> Hostboot upstream has required commit... that needs to be backported to
> relevant FSP builds.

So with the hostboot fix mentioned below we should not have any
corruption in CPU stack. With that I am wondering whether this patch is
required at all.

> 
> 
> FYI : HB commit :
> 
> commit 3d3d39d62a94da9dc9bc2da73474c9c3400762c4
> Author: Mike Baiocchi <mbaiocch@us.ibm.com>
> Date:   Thu May 3 09:02:02 2018 -0500

Patch

diff --git a/core/cpu.c b/core/cpu.c
index 88477f821..8b3e5d995 100644
--- a/core/cpu.c
+++ b/core/cpu.c
@@ -1121,6 +1121,18 @@  void init_cpu_max_pir(void)
 	prlog(PR_DEBUG, "CPU: New max PIR set to 0x%x\n", cpu_max_pir);
 }
 
+static void init_all_cpu_threads(void)
+{
+	unsigned int pir;
+	struct cpu_thread *t;
+
+	for (pir = 0; pir <= cpu_max_pir; pir++) {
+		t = &cpu_stacks[pir].cpu;
+		if (t != boot_cpu)
+			init_cpu_thread(t, cpu_state_no_cpu, pir);
+	}
+}
+
 void init_all_cpus(void)
 {
 	struct dt_node *cpus, *cpu;
@@ -1132,6 +1144,7 @@  void init_all_cpus(void)
 
 	init_tm_suspend_mode_property();
 
+	init_all_cpu_threads();
 	/* Iterate all CPUs in the device-tree */
 	dt_for_each_child(cpus, cpu) {
 		unsigned int pir, server_no, chip_id;
@@ -1166,7 +1179,7 @@  void init_all_cpus(void)
 		assert(pir <= cpu_max_pir);
 		t = pt = &cpu_stacks[pir].cpu;
 		if (t != boot_cpu) {
-			init_cpu_thread(t, state, pir);
+			t->state = state;
 			/* Each cpu gets its own later in init_trace_buffers */
 			t->trace = boot_cpu->trace;
 		}
@@ -1195,7 +1208,7 @@  void init_all_cpus(void)
 			prlog(PR_TRACE, "CPU:   secondary thread %d found\n",
 			      thread);
 			t = &cpu_stacks[pir + thread].cpu;
-			init_cpu_thread(t, state, pir + thread);
+			t->state = state;
 			t->trace = boot_cpu->trace;
 			t->server_no = ((const u32 *)p->prop)[thread];
 			t->is_secondary = true;