
[RFC,0/9] powerpc/64s: fast interrupt exit

Message ID 20201106155929.2246055-1-npiggin@gmail.com

Message

Nicholas Piggin Nov. 6, 2020, 3:59 p.m. UTC
This series attempts to improve the speed of interrupts and system calls
in two major ways.

Firstly, the SRR/HSRR registers do not need to be reloaded if they were
not used or clobbered for the duration of the interrupt.

Secondly, an alternate return location facility is added for soft-masked
asynchronous interrupts and then that's used to set everything up for
return without having to disable MSR RI or EE.

After this series, the entire system call / interrupt handler fast path
executes no mtsprs and one mtmsrd to enable interrupts initially, and
the system call vectored path doesn't even need to do that.
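
As an illustration of the first point, here is a minimal userspace C sketch
(not code from the series; the struct, field and function names below are all
invented): keep a per-CPU flag recording whether SRR0/SRR1 still hold the
values we want to return with, and only reload them on exit when they don't.

        #include <stdbool.h>
        #include <stdio.h>

        /* Stand-in for the paca plus the SRR0/SRR1 SPRs. */
        struct fake_paca {
                unsigned long srr0, srr1;  /* models the SPR contents */
                bool srr_valid;            /* cleared when SRR0/1 get clobbered */
        };

        /* On entry the hardware has loaded SRR0/1 with the interrupted NIP/MSR. */
        static void interrupt_entry(struct fake_paca *p, unsigned long nip,
                                    unsigned long msr)
        {
                p->srr0 = nip;
                p->srr1 = msr;
                p->srr_valid = true;
        }

        /* Anything that reuses the SPRs (a nested interrupt, an explicit
         * mtspr) must mark them invalid. */
        static void clobber_srr(struct fake_paca *p)
        {
                p->srr_valid = false;
        }

        static void interrupt_exit(struct fake_paca *p, unsigned long ret_nip,
                                   unsigned long ret_msr)
        {
                if (!p->srr_valid || p->srr0 != ret_nip || p->srr1 != ret_msr) {
                        /* slow path: reload the SPRs (mtspr SRR0/SRR1 on hardware) */
                        p->srr0 = ret_nip;
                        p->srr1 = ret_msr;
                        printf("reloaded SRR0/1\n");
                } else {
                        printf("skipped the reload\n");  /* fast path: no mtspr */
                }
                /* rfid would happen here */
        }

        int main(void)
        {
                struct fake_paca paca = { 0, 0, false };

                interrupt_entry(&paca, 0x700, 0x9033);  /* made-up NIP/MSR values */
                interrupt_exit(&paca, 0x700, 0x9033);   /* prints "skipped ..." */

                interrupt_entry(&paca, 0x900, 0x9033);
                clobber_srr(&paca);
                interrupt_exit(&paca, 0x900, 0x9033);   /* prints "reloaded ..." */
                return 0;
        }

The real series of course does this in the kernel entry/exit paths rather
than in a toy model like this.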

Thanks,
Nick

Nicholas Piggin (9):
  powerpc/64s: syscall real mode entry use mtmsrd rather than rfid
  powerpc/64s: system call avoid setting MSR[RI] until we set MSR[EE]
  powerpc/64s: introduce different functions to return from SRR vs HSRR
    interrupts
  powerpc/64s: avoid reloading (H)SRR registers if they are still valid
  powerpc/64: move interrupt return asm to interrupt_64.S
  powerpc/64s: save one more register in the masked interrupt handler
  powerpc/64s: allow alternate return locations for soft-masked
    interrupts
  powerpc/64s: interrupt soft-enable race fix
  powerpc/64s: use interrupt restart table to speed up return from
    interrupt

 arch/powerpc/Kconfig.debug                 |   5 +
 arch/powerpc/include/asm/asm-prototypes.h  |   4 +-
 arch/powerpc/include/asm/head-64.h         |   2 +-
 arch/powerpc/include/asm/interrupt.h       |  18 +
 arch/powerpc/include/asm/paca.h            |   3 +
 arch/powerpc/include/asm/ppc_asm.h         |   8 +
 arch/powerpc/include/asm/ptrace.h          |  28 +-
 arch/powerpc/kernel/asm-offsets.c          |   5 +
 arch/powerpc/kernel/entry_64.S             | 508 ---------------
 arch/powerpc/kernel/exceptions-64s.S       | 180 ++++--
 arch/powerpc/kernel/fpu.S                  |   2 +
 arch/powerpc/kernel/head_64.S              |   5 +-
 arch/powerpc/kernel/interrupt_64.S         | 720 +++++++++++++++++++++
 arch/powerpc/kernel/irq.c                  |  79 ++-
 arch/powerpc/kernel/kgdb.c                 |   2 +-
 arch/powerpc/kernel/kprobes-ftrace.c       |   2 +-
 arch/powerpc/kernel/kprobes.c              |  10 +-
 arch/powerpc/kernel/process.c              |  21 +-
 arch/powerpc/kernel/rtas.c                 |  13 +-
 arch/powerpc/kernel/signal.c               |   2 +-
 arch/powerpc/kernel/signal_64.c            |  14 +
 arch/powerpc/kernel/syscall_64.c           | 242 ++++---
 arch/powerpc/kernel/syscalls.c             |   2 +
 arch/powerpc/kernel/traps.c                |  18 +-
 arch/powerpc/kernel/vector.S               |   6 +-
 arch/powerpc/kernel/vmlinux.lds.S          |  10 +
 arch/powerpc/lib/Makefile                  |   2 +-
 arch/powerpc/lib/restart_table.c           |  26 +
 arch/powerpc/lib/sstep.c                   |   5 +-
 arch/powerpc/math-emu/math.c               |   2 +-
 arch/powerpc/mm/fault.c                    |   2 +-
 arch/powerpc/perf/core-book3s.c            |  19 +-
 arch/powerpc/platforms/powernv/opal-call.c |   3 +
 arch/powerpc/sysdev/fsl_pci.c              |   2 +-
 34 files changed, 1244 insertions(+), 726 deletions(-)
 create mode 100644 arch/powerpc/kernel/interrupt_64.S
 create mode 100644 arch/powerpc/lib/restart_table.c

Comments

Christophe Leroy Nov. 7, 2020, 10:35 a.m. UTC | #1
On 06/11/2020 at 16:59, Nicholas Piggin wrote:
> This series attempts to improve the speed of interrupts and system calls
> in two major ways.
> 
> Firstly, the SRR/HSRR registers do not need to be reloaded if they were
> not used or clobbered for the duration of the interrupt.
> 
> Secondly, an alternate return location facility is added for soft-masked
> asynchronous interrupts and then that's used to set everything up for
> return without having to disable MSR RI or EE.
> 
> After this series, the entire system call / interrupt handler fast path
> executes no mtsprs and one mtmsrd to enable interrupts initially, and
> the system call vectored path doesn't even need to do that.

Interesting series.

Unfortunately, this can't be done on PPC32 (at least not on non-bookE), because it would mean
mapping the kernel at 0 instead of 0xC0000000. I'm not sure libc would like that, and anyway it
would be an issue for catching NULL pointer dereferences, unless we used page tables instead of
BATs to map kernel memory, which would be a serious performance cut.

Christophe

> 
> Thanks,
> Nick
> 
> Nicholas Piggin (9):
>    powerpc/64s: syscall real mode entry use mtmsrd rather than rfid
>    powerpc/64s: system call avoid setting MSR[RI] until we set MSR[EE]
>    powerpc/64s: introduce different functions to return from SRR vs HSRR
>      interrupts
>    powerpc/64s: avoid reloading (H)SRR registers if they are still valid
>    powerpc/64: move interrupt return asm to interrupt_64.S
>    powerpc/64s: save one more register in the masked interrupt handler
>    powerpc/64s: allow alternate return locations for soft-masked
>      interrupts
>    powerpc/64s: interrupt soft-enable race fix
>    powerpc/64s: use interrupt restart table to speed up return from
>      interrupt
> 
>   arch/powerpc/Kconfig.debug                 |   5 +
>   arch/powerpc/include/asm/asm-prototypes.h  |   4 +-
>   arch/powerpc/include/asm/head-64.h         |   2 +-
>   arch/powerpc/include/asm/interrupt.h       |  18 +
>   arch/powerpc/include/asm/paca.h            |   3 +
>   arch/powerpc/include/asm/ppc_asm.h         |   8 +
>   arch/powerpc/include/asm/ptrace.h          |  28 +-
>   arch/powerpc/kernel/asm-offsets.c          |   5 +
>   arch/powerpc/kernel/entry_64.S             | 508 ---------------
>   arch/powerpc/kernel/exceptions-64s.S       | 180 ++++--
>   arch/powerpc/kernel/fpu.S                  |   2 +
>   arch/powerpc/kernel/head_64.S              |   5 +-
>   arch/powerpc/kernel/interrupt_64.S         | 720 +++++++++++++++++++++
>   arch/powerpc/kernel/irq.c                  |  79 ++-
>   arch/powerpc/kernel/kgdb.c                 |   2 +-
>   arch/powerpc/kernel/kprobes-ftrace.c       |   2 +-
>   arch/powerpc/kernel/kprobes.c              |  10 +-
>   arch/powerpc/kernel/process.c              |  21 +-
>   arch/powerpc/kernel/rtas.c                 |  13 +-
>   arch/powerpc/kernel/signal.c               |   2 +-
>   arch/powerpc/kernel/signal_64.c            |  14 +
>   arch/powerpc/kernel/syscall_64.c           | 242 ++++---
>   arch/powerpc/kernel/syscalls.c             |   2 +
>   arch/powerpc/kernel/traps.c                |  18 +-
>   arch/powerpc/kernel/vector.S               |   6 +-
>   arch/powerpc/kernel/vmlinux.lds.S          |  10 +
>   arch/powerpc/lib/Makefile                  |   2 +-
>   arch/powerpc/lib/restart_table.c           |  26 +
>   arch/powerpc/lib/sstep.c                   |   5 +-
>   arch/powerpc/math-emu/math.c               |   2 +-
>   arch/powerpc/mm/fault.c                    |   2 +-
>   arch/powerpc/perf/core-book3s.c            |  19 +-
>   arch/powerpc/platforms/powernv/opal-call.c |   3 +
>   arch/powerpc/sysdev/fsl_pci.c              |   2 +-
>   34 files changed, 1244 insertions(+), 726 deletions(-)
>   create mode 100644 arch/powerpc/kernel/interrupt_64.S
>   create mode 100644 arch/powerpc/lib/restart_table.c
>
Nicholas Piggin Nov. 10, 2020, 8:49 a.m. UTC | #2
Excerpts from Christophe Leroy's message of November 7, 2020 8:35 pm:
> 
> 
> On 06/11/2020 at 16:59, Nicholas Piggin wrote:
>> This series attempts to improve the speed of interrupts and system calls
>> in two major ways.
>> 
>> Firstly, the SRR/HSRR registers do not need to be reloaded if they were
>> not used or clobbered for the duration of the interrupt.
>> 
>> Secondly, an alternate return location facility is added for soft-masked
>> asynchronous interrupts and then that's used to set everything up for
>> return without having to disable MSR RI or EE.
>> 
>> After this series, the entire system call / interrupt handler fast path
>> executes no mtsprs and one mtmsrd to enable interrupts initially, and
>> the system call vectored path doesn't even need to do that.
> 
> Interesting series.
> 
> Unfortunately, this can't be done on PPC32 (at least not on non-bookE), because it would mean
> mapping the kernel at 0 instead of 0xC0000000. I'm not sure libc would like that, and anyway it
> would be an issue for catching NULL pointer dereferences, unless we used page tables instead of
> BATs to map kernel memory, which would be a serious performance cut.

Hmm, why would you have to map at 0?

PPC32 doesn't have soft-masked interrupts, but you could still test all
MSR[PR]=0 interrupts to see whether they land inside some region, i.e.
whether they hit in the restart table, I think?
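
To make that concrete, here is a rough C sketch of what such a lookup could
look like (the entry layout and names are invented here, not taken from the
series' restart_table.c): each entry names a code region and a fixup address
to restart from when the interrupted NIP falls inside it.

        #include <stddef.h>
        #include <stdio.h>

        struct restart_entry {
                unsigned long start;   /* first instruction of the critical region */
                unsigned long end;     /* first instruction past the region */
                unsigned long fixup;   /* address to restart from instead */
        };

        /* A linear scan keeps the sketch short; a real table could be sorted
         * and binary-searched like the exception fixup tables. */
        static unsigned long search_restart_table(const struct restart_entry *table,
                                                  size_t nr, unsigned long nip)
        {
                size_t i;

                for (i = 0; i < nr; i++) {
                        if (nip >= table[i].start && nip < table[i].end)
                                return table[i].fixup;
                }
                return 0;  /* not in any region: return from the interrupt normally */
        }

        int main(void)
        {
                /* One made-up region covering 0x1000-0x1020, restarting at 0x2000. */
                struct restart_entry table[] = { { 0x1000, 0x1020, 0x2000 } };

                printf("0x1008 -> 0x%lx\n", search_restart_table(table, 1, 0x1008));
                printf("0x3000 -> 0x%lx\n", search_restart_table(table, 1, 0x3000));
                return 0;
        }

An MSR[PR]=0 interrupt would run something like this against regs->nip and,
on a hit, rewrite regs->nip to the fixup address before returning.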

Could PPC32 skip the SRR reload at least? That's simpler.

Thanks,
Nick
Christophe Leroy Nov. 10, 2020, 11:31 a.m. UTC | #3
On 10/11/2020 at 09:49, Nicholas Piggin wrote:
> Excerpts from Christophe Leroy's message of November 7, 2020 8:35 pm:
>>
>>
>> On 06/11/2020 at 16:59, Nicholas Piggin wrote:
>>> This series attempts to improve the speed of interrupts and system calls
>>> in two major ways.
>>>
>>> Firstly, the SRR/HSRR registers do not need to be reloaded if they were
>>> not used or clobbered for the duration of the interrupt.
>>>
>>> Secondly, an alternate return location facility is added for soft-masked
>>> asynchronous interrupts and then that's used to set everything up for
>>> return without having to disable MSR RI or EE.
>>>
>>> After this series, the entire system call / interrupt handler fast path
>>> executes no mtsprs and one mtmsrd to enable interrupts initially, and
>>> the system call vectored path doesn't even need to do that.
>>
>> Interesting series.
>>
>> Unfortunately, this can't be done on PPC32 (at least not on non-bookE), because it would mean
>> mapping the kernel at 0 instead of 0xC0000000. I'm not sure libc would like that, and anyway it
>> would be an issue for catching NULL pointer dereferences, unless we used page tables instead of
>> BATs to map kernel memory, which would be a serious performance cut.
> 
> Hmm, why would you have to map at 0?

In real mode, physical memory is at 0. If we want to switch from real to virtual mode by just
writing to the MSR, then we need virtual addresses to match real addresses, don't we?
We could play with chip selects to put RAM at 0xc0000000, but then we'd have a problem with
exceptions, as they have to be at 0. Or we could play with MSR[IP] and get the exceptions at
0xfff00000, but that would not be so easy, I guess.

> 
> PPC32 doesn't have soft-masked interrupts, but you could still test all
> MSR[PR]=0 interrupts to see whether they land inside some region, i.e.
> whether they hit in the restart table, I think?

Yes, and this is already what is done at exit for the ones that handle MSR[RI], I think.

> 
> Could PPC32 skip the SRR reload at least? That's simpler.

I think that would only be possible if real addresses matched virtual addresses, or am I
missing something?

Christophe
Nicholas Piggin Nov. 11, 2020, 4:49 a.m. UTC | #4
Excerpts from Christophe Leroy's message of November 10, 2020 9:31 pm:
> 
> 
> On 10/11/2020 at 09:49, Nicholas Piggin wrote:
>> Excerpts from Christophe Leroy's message of November 7, 2020 8:35 pm:
>>>
>>>
>>> On 06/11/2020 at 16:59, Nicholas Piggin wrote:
>>>> This series attempts to improve the speed of interrupts and system calls
>>>> in two major ways.
>>>>
>>>> Firstly, the SRR/HSRR registers do not need to be reloaded if they were
>>>> not used or clobbered for the duration of the interrupt.
>>>>
>>>> Secondly, an alternate return location facility is added for soft-masked
>>>> asynchronous interrupts and then that's used to set everything up for
>>>> return without having to disable MSR RI or EE.
>>>>
>>>> After this series, the entire system call / interrupt handler fast path
>>>> executes no mtsprs and one mtmsrd to enable interrupts initially, and
>>>> the system call vectored path doesn't even need to do that.
>>>
>>> Interesting series.
>>>
>>> Unfortunately, this can't be done on PPC32 (at least not on non-bookE), because it would mean
>>> mapping the kernel at 0 instead of 0xC0000000. I'm not sure libc would like that, and anyway it
>>> would be an issue for catching NULL pointer dereferences, unless we used page tables instead of
>>> BATs to map kernel memory, which would be a serious performance cut.
>> 
>> Hmm, why would you have to map at 0?
> 
> In real mode, physical memory is at 0. If we want to switch from real to virtual mode by just
> writing to the MSR, then we need virtual addresses to match real addresses, don't we?

Ah that's what I missed.

64s real mode masks out the top 2 bits of the address, which is how that
works. But I don't usually think about that path anyway, because most
interrupts arrive with the MMU on.
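
As a worked example of that masking (illustration only, with a made-up
address): a kernel linear-map effective address resolves to its real address
just by dropping the top two bits, so no SPR setup is needed to keep running
at the same code location in real mode.

        #include <stdio.h>

        int main(void)
        {
                /* made-up kernel linear-map address */
                unsigned long long ea = 0xc000000000013370ULL;
                /* real mode ignores the top two EA bits, equivalent to this mask */
                unsigned long long real = ea & 0x3fffffffffffffffULL;

                printf("EA 0x%llx -> real 0x%llx\n", ea, real); /* real 0x13370 */
                return 0;
        }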

> We could play with chip selects to put RAM at 0xc0000000, but then we'd
> have a problem with exceptions, as they have to be at 0. Or we could play
> with MSR[IP] and get the exceptions at 0xfff00000, but that would not be so
> easy, I guess.
> 
>> 
>> PPC32 doesn't have soft-masked interrupts, but you could still test all
>> MSR[PR]=0 interrupts to see whether they land inside some region, i.e.
>> whether they hit in the restart table, I think?
> 
> Yes, and this is already what is done at exit for the ones that handle MSR[RI], I think.

Interesting, I'll have to check that out.

>> 
>> Could PPC32 skip the SRR reload at least? That's simpler.
> 
> I think that would only be possible if real addresses matched virtual
> addresses, or am I missing something?

No you're right, I was missing something.

Thanks,
Nick