mbox series

[0/7] powerpc: Modernize unhandled signals message

Message ID 20180724192720.32417-1-muriloo@linux.ibm.com (mailing list archive)
Headers show
Series powerpc: Modernize unhandled signals message | expand

Message

Murilo Opsfelder Araújo July 24, 2018, 7:27 p.m. UTC
Hi, everyone.

This series was inspired by the need to modernize and display more
informative messages about unhandled signals.

The "unhandled signal NN" is not very informative.  We thought it would
be helpful adding a human-readable message describing what the signal
number means, printing the VMA address, and dumping the instructions.

We can add more informative messages, like informing what each code of a
SIGSEGV signal means.  We are open to suggestions.

I have collected some early feedback from Michael Ellerman about this
series and would love to hear more feedback from you all.

Before this series:

    Jul 24 13:01:07 localhost kernel: pandafault[5989]: unhandled signal 11 at 00000000100007d0 nip 000000001000061c lr 00003fff85a75100 code 2

After this series:

    Jul 24 13:08:01 localhost kernel: pandafault[10758]: segfault (11) at 00000000100007d0 nip 000000001000061c lr 00007fffabc85100 code 2 in pandafault[10000000+10000]
    Jul 24 13:08:01 localhost kernel: Instruction dump:
    Jul 24 13:08:01 localhost kernel: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
    Jul 24 13:08:01 localhost kernel: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040

Cheers
Murilo

Murilo Opsfelder Araujo (7):
  powerpc/traps: Print unhandled signals in a separate function
  powerpc/traps: Return early in show_signal_msg()
  powerpc/reg: Add REG_FMT definition
  powerpc/traps: Use REG_FMT in show_signal_msg()
  powerpc/traps: Print VMA for unhandled signals
  powerpc/traps: Print signal name for unhandled signals
  powerpc/traps: Show instructions on exceptions

 arch/powerpc/include/asm/reg.h        |  6 +++
 arch/powerpc/include/asm/stacktrace.h |  7 +++
 arch/powerpc/kernel/process.c         | 28 +++++-----
 arch/powerpc/kernel/traps.c           | 73 +++++++++++++++++++++++----
 4 files changed, 89 insertions(+), 25 deletions(-)
 create mode 100644 arch/powerpc/include/asm/stacktrace.h

Comments

Michael Neuling July 25, 2018, 7 a.m. UTC | #1
On Tue, 2018-07-24 at 16:27 -0300, Murilo Opsfelder Araujo wrote:
> Hi, everyone.
> 
> This series was inspired by the need to modernize and display more
> informative messages about unhandled signals.
> 
> The "unhandled signal NN" is not very informative.  We thought it would
> be helpful adding a human-readable message describing what the signal
> number means, printing the VMA address, and dumping the instructions.
> 
> We can add more informative messages, like informing what each code of a
> SIGSEGV signal means.  We are open to suggestions.
> 
> I have collected some early feedback from Michael Ellerman about this
> series and would love to hear more feedback from you all.

Nice.. the instruction dump would have been very handy when debugging the PCR
init issue I had a month or so back.

> Before this series:
> 
>     Jul 24 13:01:07 localhost kernel: pandafault[5989]: unhandled signal 11 at 00000000100007d0 nip 000000001000061c lr 00003fff85a75100 code 2
> 
> After this series:
> 
>     Jul 24 13:08:01 localhost kernel: pandafault[10758]: segfault (11) at 00000000100007d0 nip 000000001000061c lr 00007fffabc85100 code 2 in pandafault[10000000+10000]
>     Jul 24 13:08:01 localhost kernel: Instruction dump:
>     Jul 24 13:08:01 localhost kernel: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
>     Jul 24 13:08:01 localhost kernel: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040

What happens if we get a sudden flood of these from different processes that
overlap their output? Are we going to be able to match up the process with
instruction dump?

Should we prefix every line with the PID to avoid this?

Mikey
Murilo Opsfelder Araújo July 25, 2018, 7:36 p.m. UTC | #2
Hi, Mikey.

On Wed, Jul 25, 2018 at 05:00:21PM +1000, Michael Neuling wrote:
>  On Tue, 2018-07-24 at 16:27 -0300, Murilo Opsfelder Araujo wrote:
> > Hi, everyone.
> > 
> > This series was inspired by the need to modernize and display more
> > informative messages about unhandled signals.
> > 
> > The "unhandled signal NN" is not very informative.  We thought it would
> > be helpful adding a human-readable message describing what the signal
> > number means, printing the VMA address, and dumping the instructions.
> > 
> > We can add more informative messages, like informing what each code of a
> > SIGSEGV signal means.  We are open to suggestions.
> > 
> > I have collected some early feedback from Michael Ellerman about this
> > series and would love to hear more feedback from you all.
> 
> Nice.. the instruction dump would have been very handy when debugging the PCR
> init issue I had a month or so back.
> 
> > Before this series:
> > 
> >     Jul 24 13:01:07 localhost kernel: pandafault[5989]: unhandled signal 11 at 00000000100007d0 nip 000000001000061c lr 00003fff85a75100 code 2
> > 
> > After this series:
> > 
> >     Jul 24 13:08:01 localhost kernel: pandafault[10758]: segfault (11) at 00000000100007d0 nip 000000001000061c lr 00007fffabc85100 code 2 in pandafault[10000000+10000]
> >     Jul 24 13:08:01 localhost kernel: Instruction dump:
> >     Jul 24 13:08:01 localhost kernel: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
> >     Jul 24 13:08:01 localhost kernel: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040
> 
> What happens if we get a sudden flood of these from different processes that
> overlap their output? Are we going to be able to match up the process with
> instruction dump?

As to the flood of messages, ___ratelimit() makes me think that we'll
likely see some warn messages informing how many show_signal_msg()
callbacks were suppressed, instead of interleaved messages and
instruction dumps.

As to matching process with instruction dump, I believe we'd need more
information to glue them together.

What if we modify show_instructions() to accept a string prefix to be
printed along with each line?

> Should we prefix every line with the PID to avoid this?

That's possible.  An alternative would be prefixing each line with the
process name and its PID, as in the first line.  For example:

    pandafault[10758]: segfault (11) at 00000000100007d0 nip 000000001000061c lr 00007fffabc85100 code 2 in pandafault[10000000+10000]
    pandafault[10758]: Instruction dump:
    pandafault[10758]: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
    pandafault[10758]: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040

The above can be interleaved with other messages and we'll still be able
to match process and its corresponding instruction dump.

Cheers
Murilo
Michael Neuling July 26, 2018, 2:11 a.m. UTC | #3
> > Should we prefix every line with the PID to avoid this?
> 
> That's possible.  An alternative would be prefixing each line with the
> process name and its PID, as in the first line.  For example:
> 
>     pandafault[10758]: segfault (11) at 00000000100007d0 nip 000000001000061c lr 00007fffabc85100 code 2 in pandafault[10000000+10000]
>     pandafault[10758]: Instruction dump:
>     pandafault[10758]: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
>     pandafault[10758]: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040
> 
> The above can be interleaved with other messages and we'll still be able
> to match process and its corresponding instruction dump.

LGTM.

Thanks!
Mikey
Michael Ellerman July 26, 2018, 2:20 a.m. UTC | #4
Murilo Opsfelder Araujo <muriloo@linux.ibm.com> writes:
> On Wed, Jul 25, 2018 at 05:00:21PM +1000, Michael Neuling wrote:
>>  On Tue, 2018-07-24 at 16:27 -0300, Murilo Opsfelder Araujo wrote:
>> > This series was inspired by the need to modernize and display more
>> > informative messages about unhandled signals.
...
>> 
>> Nice.. the instruction dump would have been very handy when debugging the PCR
>> init issue I had a month or so back.

These things may be related :)

>> What happens if we get a sudden flood of these from different processes that
>> overlap their output? Are we going to be able to match up the process with
>> instruction dump?
>
> As to the flood of messages, ___ratelimit() makes me think that we'll
> likely see some warn messages informing how many show_signal_msg()
> callbacks were suppressed, instead of interleaved messages and
> instruction dumps.
>
> As to matching process with instruction dump, I believe we'd need more
> information to glue them together.
>
> What if we modify show_instructions() to accept a string prefix to be
> printed along with each line?
>
>> Should we prefix every line with the PID to avoid this?
>
> That's possible.  An alternative would be prefixing each line with the
> process name and its PID, as in the first line.  For example:
>
>     pandafault[10758]: segfault (11) at 00000000100007d0 nip 000000001000061c lr 00007fffabc85100 code 2 in pandafault[10000000+10000]
>     pandafault[10758]: Instruction dump:
>     pandafault[10758]: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
>     pandafault[10758]: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040
>
> The above can be interleaved with other messages and we'll still be able
> to match process and its corresponding instruction dump.

Yeah prefixing with the comm and pid is nice.

Also the "Instruction dump:" line is a waste of space.

I prefer the x86 format, where it's prefixed with "code:", eg:

  pandafault[10758]: segfault (11) at 00000000100007d0 nip 000000001000061c lr 00007fffabc85100 code 2 in pandafault[10000000+10000]
  pandafault[10758]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
  pandafault[10758]: code: 392988d0 f93f0020 e93f0020 39400048 <99490000> 39200000 7d234b78 383f0040

cheers