Message ID | 20200322160525.7624-1-ganeshgr@linux.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Series | [v3] powerpc/pseries: Handle UE event for memcpy_mcsafe |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch powerpc/merge (a87b93bdf800a4d7a42d95683624a4516e516b4f) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 26 lines checked |
snowpatch_ozlabs/needsstable | success | Patch has no Fixes tags |
Ganesh Goudar <ganeshgr@linux.ibm.com> writes:
> If we hit UE at an instruction with a fixup entry, flag to
> ignore the event and set nip to continue execution at the
> fixup entry.

You don't explain why we would want to do that. Or what the consequences
are if we *don't* do it.

As such it's unclear if this is an important fix or just a nice-to-have.

> For powernv these changes are already made by
> commit 895e3dceeb97 ("powerpc/mce: Handle UE event for memcpy_mcsafe")

We have masses of code that supposedly abstracts the MCE logic. How did
we end up in the situation where we're having to write the same fix
twice for different platforms?

cheers

> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> Reviewed-by: Santosh S <santosh@fossix.org>
> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> ---
> V2: Fixes a trivial checkpatch error in commit msg.
> V3: Use proper subject prefix.
> ---
>  arch/powerpc/platforms/pseries/ras.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index 43710b69e09e..58e2483fbb1a 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -10,6 +10,7 @@
>  #include <linux/fs.h>
>  #include <linux/reboot.h>
>  #include <linux/irq_work.h>
> +#include <linux/extable.h>
>
>  #include <asm/machdep.h>
>  #include <asm/rtas.h>
> @@ -505,6 +506,7 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>  	int initiator = rtas_error_initiator(errp);
>  	int severity = rtas_error_severity(errp);
>  	u8 error_type, err_sub_type;
> +	const struct exception_table_entry *entry;
>
>  	if (initiator == RTAS_INITIATOR_UNKNOWN)
>  		mce_err.initiator = MCE_INITIATOR_UNKNOWN;
> @@ -558,6 +560,12 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>  	switch (mce_log->error_type) {
>  	case MC_ERROR_TYPE_UE:
>  		mce_err.error_type = MCE_ERROR_TYPE_UE;
> +		entry = search_kernel_exception_table(regs->nip);
> +		if (entry) {
> +			mce_err.ignore_event = true;
> +			regs->nip = extable_fixup(entry);
> +			disposition = RTAS_DISP_FULLY_RECOVERED;
> +		}
>  		switch (err_sub_type) {
>  		case MC_ERROR_UE_IFETCH:
>  			mce_err.u.ue_error_type = MCE_UE_ERROR_IFETCH;
> --
> 2.17.2
On 3/24/20 10:57 AM, Michael Ellerman wrote:
> Ganesh Goudar <ganeshgr@linux.ibm.com> writes:
>> If we hit UE at an instruction with a fixup entry, flag to
>> ignore the event and set nip to continue execution at the
>> fixup entry.
> You don't explain why we would want to do that. Or what the consequences
> are if we *don't* do it.
>
> As such it's unclear if this is an important fix or just a nice-to-have.

We want to avoid a panic if we hit an MCE during memcpy from pmem devices,
because the system is still recoverable and the copy should just result in
-EIO. So we flag it here to ignore the UE event. I will respin with a
better commit message.

>> For powernv these changes are already made by
>> commit 895e3dceeb97 ("powerpc/mce: Handle UE event for memcpy_mcsafe")
> We have masses of code that supposedly abstracts the MCE logic. How did
> we end up in the situation where we're having to write the same fix
> twice for different platforms?

What is common between pseries and powernv now is saving the MCE event for
deferred handling, and the deferred handling itself. In my view it becomes
a bit messy to return the disposition (UE recovered) from common code. So
what we can have is a common function which searches for the exception
table entry and updates nip with the fixup address, and call it from
different places for pseries and powernv. If you are OK with that, I'll
spin the next version.

>
> cheers
>
>> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>> Reviewed-by: Santosh S <santosh@fossix.org>
>> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
>> ---
>> V2: Fixes a trivial checkpatch error in commit msg.
>> V3: Use proper subject prefix.
>> ---
>>  arch/powerpc/platforms/pseries/ras.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index 43710b69e09e..58e2483fbb1a 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -10,6 +10,7 @@
>>  #include <linux/fs.h>
>>  #include <linux/reboot.h>
>>  #include <linux/irq_work.h>
>> +#include <linux/extable.h>
>>
>>  #include <asm/machdep.h>
>>  #include <asm/rtas.h>
>> @@ -505,6 +506,7 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>>  	int initiator = rtas_error_initiator(errp);
>>  	int severity = rtas_error_severity(errp);
>>  	u8 error_type, err_sub_type;
>> +	const struct exception_table_entry *entry;
>>
>>  	if (initiator == RTAS_INITIATOR_UNKNOWN)
>>  		mce_err.initiator = MCE_INITIATOR_UNKNOWN;
>> @@ -558,6 +560,12 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
>>  	switch (mce_log->error_type) {
>>  	case MC_ERROR_TYPE_UE:
>>  		mce_err.error_type = MCE_ERROR_TYPE_UE;
>> +		entry = search_kernel_exception_table(regs->nip);
>> +		if (entry) {
>> +			mce_err.ignore_event = true;
>> +			regs->nip = extable_fixup(entry);
>> +			disposition = RTAS_DISP_FULLY_RECOVERED;
>> +		}
>>  		switch (err_sub_type) {
>>  		case MC_ERROR_UE_IFETCH:
>>  			mce_err.u.ue_error_type = MCE_UE_ERROR_IFETCH;
>> --
>> 2.17.2
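The common function proposed in the reply above could look roughly like the sketch below. It is a userspace mock, not kernel code: the struct definitions are cut-down stand-ins, `mock_extable` and `mock_search_extable()` are hypothetical replacements for the kernel's exception table and `search_kernel_exception_table()`, and the helper name `mce_ue_try_fixup()` is an assumption made for illustration only. The mock stores absolute addresses and reads the fixup field directly instead of calling `extable_fixup()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-ins for the kernel types touched by the patch. */
struct pt_regs { uintptr_t nip; };
struct mce_error_info { bool ignore_event; };
struct exception_table_entry { uintptr_t insn; uintptr_t fixup; };

/* Hypothetical table: faulting instruction address -> fixup address. */
static const struct exception_table_entry mock_extable[] = {
	{ 0x1000, 0x9000 },
	{ 0x1040, 0x9010 },
};

/* Mock lookup standing in for search_kernel_exception_table(). */
static const struct exception_table_entry *mock_search_extable(uintptr_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(mock_extable) / sizeof(mock_extable[0]); i++)
		if (mock_extable[i].insn == addr)
			return &mock_extable[i];
	return NULL;
}

/*
 * Hypothetical common helper: if the faulting instruction has a fixup
 * entry, flag the event to be ignored and redirect nip to the fixup.
 * Returns true when a fixup was applied, so each platform can decide
 * its own disposition.
 */
static bool mce_ue_try_fixup(struct pt_regs *regs,
			     struct mce_error_info *mce_err)
{
	const struct exception_table_entry *entry;

	entry = mock_search_extable(regs->nip);
	if (!entry)
		return false;

	mce_err->ignore_event = true;
	regs->nip = entry->fixup;
	return true;
}
```

Under this shape, the pseries caller would set `disposition = RTAS_DISP_FULLY_RECOVERED` only when the helper returns true, which keeps the platform-specific disposition out of common code, as the reply suggests.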
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 43710b69e09e..58e2483fbb1a 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -10,6 +10,7 @@
 #include <linux/fs.h>
 #include <linux/reboot.h>
 #include <linux/irq_work.h>
+#include <linux/extable.h>
 
 #include <asm/machdep.h>
 #include <asm/rtas.h>
@@ -505,6 +506,7 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
 	int initiator = rtas_error_initiator(errp);
 	int severity = rtas_error_severity(errp);
 	u8 error_type, err_sub_type;
+	const struct exception_table_entry *entry;
 
 	if (initiator == RTAS_INITIATOR_UNKNOWN)
 		mce_err.initiator = MCE_INITIATOR_UNKNOWN;
@@ -558,6 +560,12 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
 	switch (mce_log->error_type) {
 	case MC_ERROR_TYPE_UE:
 		mce_err.error_type = MCE_ERROR_TYPE_UE;
+		entry = search_kernel_exception_table(regs->nip);
+		if (entry) {
+			mce_err.ignore_event = true;
+			regs->nip = extable_fixup(entry);
+			disposition = RTAS_DISP_FULLY_RECOVERED;
+		}
 		switch (err_sub_type) {
 		case MC_ERROR_UE_IFETCH:
 			mce_err.u.ue_error_type = MCE_UE_ERROR_IFETCH;