
powerpc: Fix the corrupt r3 error during MCE handling.

Message ID 20130710130155.4993.61577.stgit@mars
State Accepted, archived
Commit ee1dd1e3dc774cf257012215d996e8e7e370c162

Commit Message

Mahesh J Salgaonkar July 10, 2013, 1:02 p.m. UTC
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

During a machine check interrupt on the pseries platform, r3 generally points
to a memory region inside the RTAS (FWNMI) area. The kernel reports r3 as
corrupt because, when RTAS delivers the machine check exception, it passes the
address inside the FWNMI area with the topmost bit set. This patch fixes the
issue by masking the top two bits of r3 in the machine check exception handler.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/ras.c |    3 +++
 1 file changed, 3 insertions(+)
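As a minimal user-space sketch (not kernel code), the masking described in the commit message can be expressed with a fixed-width type. Here `uint64_t` stands in for the 64-bit `unsigned long` that holds `regs->gpr[3]` on ppc64, and the helper name `mask_top_two_bits` is hypothetical.

```c
#include <stdint.h>

/* Bits 63 and 62 of a 64-bit value, i.e. 0xc000000000000000. */
#define TOP_TWO_BITS (UINT64_C(0x3) << 62)

/* Clear bits 63 and 62, as the patch does to regs->gpr[3]; all lower bits are kept. */
static inline uint64_t mask_top_two_bits(uint64_t r3)
{
	return r3 & ~TOP_TWO_BITS;
}
```

For example, a made-up r3 value of 0x8000000000007000 (topmost bit set) comes back as 0x7000, while a value with neither top bit set passes through unchanged.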

Comments

Aneesh Kumar K.V July 10, 2013, 2:11 p.m. UTC | #1
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:

> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> During Machine Check interrupt on pseries platform, R3 generally points to
> memory region inside RTAS (FWNMI) area. We see r3 corruption because when RTAS
> delivers the machine check exception it passes the address inside FWNMI area
> with the top most bit set. This patch fixes this issue by masking top two bit
> in machine check exception handler.

I always got that error and used to wonder why FWNMI reported r3 as
corrupt. Is this an RTAS bug, or is it documented in PAPR?

-aneesh
Mahesh J Salgaonkar July 11, 2013, 4:34 a.m. UTC | #2
On 07/10/2013 07:41 PM, Aneesh Kumar K.V wrote:
> Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
> 
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> During Machine Check interrupt on pseries platform, R3 generally points to
>> memory region inside RTAS (FWNMI) area. We see r3 corruption because when RTAS
>> delivers the machine check exception it passes the address inside FWNMI area
>> with the top most bit set. This patch fixes this issue by masking top two bit
>> in machine check exception handler.
> 
> I always got that error and used to wonder why I find FWNMI
> corrupt. IS this a rtas bug or is it documented in papr ?

Nope. There is no mention of it in PAPR. It looks like a bug in RTAS.

> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
Benjamin Herrenschmidt July 11, 2013, 4:42 a.m. UTC | #3
On Thu, 2013-07-11 at 10:04 +0530, Mahesh Jagannath Salgaonkar wrote:
> > I always got that error and used to wonder why I find FWNMI
> > corrupt. IS this a rtas bug or is it documented in papr ?
> 
> Nope. There is no mention of it in PAPR. It looks like a bug in RTAS.

Typically, the top bit in real mode means "ignore the HRMOR"... I think
it is indeed some old bug in RTAS.

Cheers,
Ben.
Anshuman Khandual July 11, 2013, 5:54 a.m. UTC | #4
On 07/10/2013 06:32 PM, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> During Machine Check interrupt on pseries platform, R3 generally points to
> memory region inside RTAS (FWNMI) area. We see r3 corruption because when RTAS
> delivers the machine check exception it passes the address inside FWNMI area
> with the top most bit set. This patch fixes this issue by masking top two bit
> in machine check exception handler.
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/pseries/ras.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 7b3cbde..721c058 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -287,6 +287,9 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
>  	unsigned long *savep;
>  	struct rtas_error_log *h, *errhdr = NULL;
> 
> +	/* Mask top two bits */
> +	regs->gpr[3] &= ~(0x3UL << 62);

We should replace this "62" with a shift macro that documents the
significance of these top two address bits in real mode.
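One way to follow this suggestion is sketched below. The macro and helper names (`FWNMI_R3_FLAG_SHIFT`, `FWNMI_R3_FLAG_MASK`, `fwnmi_strip_r3_flags`) are hypothetical and do not come from the kernel tree; the comment only restates what this thread says about the RTAS behaviour, and `uint64_t` stands in for the kernel's 64-bit `unsigned long`.

```c
#include <stdint.h>

/*
 * Hypothetical names, not from the kernel tree.
 * RTAS has been seen passing the FWNMI buffer address to the machine
 * check handler in r3 with the topmost bit set. PAPR does not document
 * this, so it is treated as an RTAS bug: both top bits are cleared
 * before r3 is validated.
 */
#define FWNMI_R3_FLAG_SHIFT	62
#define FWNMI_R3_FLAG_MASK	(UINT64_C(0x3) << FWNMI_R3_FLAG_SHIFT)

/* Compile-time check that the mask covers exactly bits 63 and 62. */
_Static_assert(FWNMI_R3_FLAG_MASK == UINT64_C(0xc000000000000000),
	       "mask must cover bits 63 and 62");

static inline uint64_t fwnmi_strip_r3_flags(uint64_t r3)
{
	return r3 & ~FWNMI_R3_FLAG_MASK;
}
```

The named shift makes the intent visible at the call site, although, as the reply below points out, the constant itself is easy to read without it.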
Aneesh Kumar K.V July 15, 2013, 6:06 a.m. UTC | #5
Anshuman Khandual <khandual@linux.vnet.ibm.com> writes:

> On 07/10/2013 06:32 PM, Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> 
>> During Machine Check interrupt on pseries platform, R3 generally points to
>> memory region inside RTAS (FWNMI) area. We see r3 corruption because when RTAS
>> delivers the machine check exception it passes the address inside FWNMI area
>> with the top most bit set. This patch fixes this issue by masking top two bit
>> in machine check exception handler.
>> 
>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/ras.c |    3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index 7b3cbde..721c058 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -287,6 +287,9 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
>>  	unsigned long *savep;
>>  	struct rtas_error_log *h, *errhdr = NULL;
>> 
>> +	/* Mask top two bits */
>> +	regs->gpr[3] &= ~(0x3UL << 62);
>
> We need to replace this "62" with a shift macro specifying the significance
> of these top two address bits in the real mode.

huh??

(gdb) p/t 0x3ull << 62
$4 = 1100000000000000000000000000000000000000000000000000000000000000

Why do you need an extra comment to explain 62? But yes, we could add
a note that this is an RTAS bug.

-aneesh
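The gdb output above can be reproduced in C with a small formatter. `to_binary` is a hypothetical helper written for illustration. Like gdb's `p/t`, it omits leading zeros, so `0x3ull << 62` prints as "11" followed by 62 zeros (64 digits in total).

```c
#include <stdint.h>

/*
 * Write the binary representation of v into buf (at least 65 bytes),
 * without leading zeros, similar to gdb's "p/t". Returns buf.
 */
char *to_binary(uint64_t v, char *buf)
{
	char tmp[64];
	int n = 0;

	/* Emit digits least significant first; v == 0 still yields "0". */
	do {
		tmp[n++] = (char)('0' + (v & 1));
		v >>= 1;
	} while (v != 0);

	/* Reverse into buf so the most significant digit comes first. */
	for (int i = 0; i < n; i++)
		buf[i] = tmp[n - 1 - i];
	buf[n] = '\0';
	return buf;
}
```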
Anshuman Khandual July 15, 2013, 6:18 a.m. UTC | #6
On 07/15/2013 11:36 AM, Aneesh Kumar K.V wrote:
> Anshuman Khandual <khandual@linux.vnet.ibm.com> writes:
> 
>> On 07/10/2013 06:32 PM, Mahesh J Salgaonkar wrote:
>>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>>
>>> During Machine Check interrupt on pseries platform, R3 generally points to
>>> memory region inside RTAS (FWNMI) area. We see r3 corruption because when RTAS
>>> delivers the machine check exception it passes the address inside FWNMI area
>>> with the top most bit set. This patch fixes this issue by masking top two bit
>>> in machine check exception handler.
>>>
>>> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/platforms/pseries/ras.c |    3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>>> index 7b3cbde..721c058 100644
>>> --- a/arch/powerpc/platforms/pseries/ras.c
>>> +++ b/arch/powerpc/platforms/pseries/ras.c
>>> @@ -287,6 +287,9 @@ static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
>>>  	unsigned long *savep;
>>>  	struct rtas_error_log *h, *errhdr = NULL;
>>>
>>> +	/* Mask top two bits */
>>> +	regs->gpr[3] &= ~(0x3UL << 62);
>>
>> We need to replace this "62" with a shift macro specifying the significance
>> of these top two address bits in the real mode.
> 
> huh??
> 
> (gdb) p/t 0x3ull << 62
> $4 = 1100000000000000000000000000000000000000000000000000000000000000
> 
> Why you need an extra comment to explain 62. But yes, we can possibly
> write that this is an RTAS bug

The "62" was just meant to point at the top two address bits in real mode.
The extra comment I asked for should describe the RTAS behaviour (or bug) with
respect to those top two bits and how this fix deals with them.

Patch

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 7b3cbde..721c058 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -287,6 +287,9 @@  static struct rtas_error_log *fwnmi_get_errinfo(struct pt_regs *regs)
 	unsigned long *savep;
 	struct rtas_error_log *h, *errhdr = NULL;
 
+	/* Mask top two bits */
+	regs->gpr[3] &= ~(0x3UL << 62);
+
 	if (!VALID_FWNMI_BUFFER(regs->gpr[3])) {
 		printk(KERN_ERR "FWNMI: corrupt r3 0x%016lx\n", regs->gpr[3]);
 		return NULL;
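The order of operations in the patched `fwnmi_get_errinfo()` (strip the top two bits first, then validate) can be simulated in user space as below. The real `VALID_FWNMI_BUFFER()` bounds are not shown in this patch, so `EXAMPLE_FWNMI_LOW`/`EXAMPLE_FWNMI_HIGH` and both helper functions are hypothetical stand-ins chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's VALID_FWNMI_BUFFER() check;
 * these bounds are illustrative assumptions, not the real ones.
 */
#define EXAMPLE_FWNMI_LOW	UINT64_C(0x7000)
#define EXAMPLE_FWNMI_HIGH	UINT64_C(0x7fff0)

bool example_valid_fwnmi_buffer(uint64_t a)
{
	return a >= EXAMPLE_FWNMI_LOW && a < EXAMPLE_FWNMI_HIGH;
}

/*
 * Mirror the patched flow: clear bits 63 and 62 of r3, store the
 * cleaned address in *out, then run the validity check on it.
 */
bool example_check_r3(uint64_t r3, uint64_t *out)
{
	r3 &= ~(UINT64_C(0x3) << 62);
	*out = r3;
	return example_valid_fwnmi_buffer(r3);
}
```

With these assumed bounds, an address such as 0x8000000000007000 fails the range check as-is (the "corrupt r3" path) but passes once the top bit is cleared. An address outside the range is still rejected after masking.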