[RFC,00/11] New TM Model

Message ID 1536781219-13938-1-git-send-email-leitao@debian.org (mailing list archive)

Message

Breno Leitao Sept. 12, 2018, 7:40 p.m. UTC
This patchset for the hardware transactional memory (TM) subsystem aims to
avoid spending a lot of time in TM suspended mode in kernel space. It does so
by changing where the reclaim/recheckpoint is executed.

Once a CPU enters transactional state, it uses a footprint area to track
the loads/stores performed in the transaction, so that they can be checked
later to decide whether a conflict happened due to some change made in that
state. If a transaction is active in userspace and an exception takes the
CPU to kernel space, the CPU moves the transaction to suspended state but
does not discard the footprint area.

POWER9 has a known problem[1][2]: it does not have enough room in the
footprint area for several transactions to be suspended at the same time
on concurrent CPUs, which leads to CPU stalls.

This patchset aims to reclaim the checkpointed footprint as soon as the
kernel is invoked, at the beginning of the exception handlers, thus freeing
room for other CPUs to enter suspended mode and avoiding having too many CPUs
in suspended state, which can cause the CPUs to stall. The same approach is
applied on kernel exit: the recheckpoint (which reloads the checkpointed
state into the CPU's room) is done as late as possible, at the exception
return.

This is achieved by creating a macro (TM_KERNEL_ENTRY) which checks whether
userspace was in an active transaction just after entering the kernel, and
reclaims if that is the case. All exception handlers call this macro as
early as possible.

All exceptions should reclaim (if necessary) at this stage, and only
recheckpoint if the task is tagged as TIF_RESTORE_TM (i.e. was in
transactional state before being interrupted). The recheckpoint is done at
ret_from_except_lite().

Ideally all reclaims happen at exception entry. However, during the
recheckpoint process another exception can hit the CPU, which might cause
the current thread to be rescheduled, so there is another reclaim point to
be considered at __switch_to().
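
The flow described above can be illustrated with a small user-space model.
This is only a sketch under stated assumptions, not the kernel code from the
series: the Task class, its two boolean fields, and the Python function names
are hypothetical stand-ins for the TM hardware state, the TIF_RESTORE_TM flag,
TM_KERNEL_ENTRY, ret_from_except_lite() and __switch_to().

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical model fields, not kernel structures.
    # True while the CPU holds checkpointed state for this task
    # (transaction active or suspended).
    hw_holds_checkpoint: bool = False
    # Models the TIF_RESTORE_TM thread flag.
    restore_tm: bool = False


def reclaim(task):
    """Move the checkpointed state out of the CPU and tag the task."""
    task.hw_holds_checkpoint = False
    task.restore_tm = True


def tm_kernel_entry(task):
    """Models TM_KERNEL_ENTRY: reclaim as early as possible on entry."""
    if task.hw_holds_checkpoint:
        reclaim(task)


def ret_from_except(task):
    """Models the exit path: recheckpoint only if the task is tagged."""
    if task.restore_tm:
        task.hw_holds_checkpoint = True
        task.restore_tm = False


def switch_to(prev):
    """Models the extra reclaim point: prev was recheckpointed but got
    rescheduled before it returned to userspace."""
    if prev.hw_holds_checkpoint:
        reclaim(prev)
```

In this model a task that was in a transaction goes through tm_kernel_entry()
with the checkpoint reclaimed and the flag set, and ret_from_except() reverses
that. Calling switch_to() on a task that has already been recheckpointed puts
it back into the reclaimed, tagged state.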

Hence, by allowing the CPU to be in suspended state for only a brief period
it's possible to cope with the TM hardware limitations like the current
problem on the new POWER9.

This patchset was tested in different scenarios using different test
suites, such as the kernel selftests and htm-torture[3], in the following
configurations:

 * POWER8/pseries LE and BE
 * POWER8/powernv LE
 * POWER9/powernv LE hosting KVM guests running TM tests

This patchset is based on initial work done by Cyril Bur:
    https://patchwork.ozlabs.org/cover/875341/

Regards,
Breno

[1] Documentation/powerpc/transactional_memory.txt
[2] commit 4bb3c7a0208fc13ca70598efd109901a7cd45ae7
[3] https://github.com/leitao/htm_torture/

Breno Leitao (11):
  powerpc/tm: Reclaim transaction on kernel entry
  powerpc/tm: Reclaim on unavailable exception
  powerpc/tm: Recheckpoint when exiting from kernel
  powerpc/tm: Always set TIF_RESTORE_TM on reclaim
  powerpc/tm: Function that updates the failure code
  powerpc/tm: Refactor the __switch_to_tm code
  powerpc/tm: Do not recheckpoint at sigreturn
  powerpc/tm: Do not reclaim on ptrace
  powerpc/tm: Do not restore default DSCR
  powerpc/tm: Set failure summary
  selftests/powerpc: Adapt the test

 arch/powerpc/include/asm/exception-64s.h      |  46 +++++
 arch/powerpc/kernel/entry_64.S                |  10 +
 arch/powerpc/kernel/exceptions-64s.S          |  15 +-
 arch/powerpc/kernel/process.c                 | 185 +++++++++---------
 arch/powerpc/kernel/ptrace.c                  |  10 +-
 arch/powerpc/kernel/signal_32.c               |  25 +--
 arch/powerpc/kernel/signal_64.c               |  17 +-
 arch/powerpc/kernel/tm.S                      |   4 -
 arch/powerpc/kernel/traps.c                   |  16 +-
 .../testing/selftests/powerpc/tm/tm-syscall.c |   6 -
 10 files changed, 178 insertions(+), 156 deletions(-)

Comments

Michael Neuling Sept. 17, 2018, 5:25 a.m. UTC | #1
On Wed, 2018-09-12 at 16:40 -0300, Breno Leitao wrote:
> This patchset for the hardware transactional memory (TM) subsystem aims to
> avoid spending a lot of time on TM suspended mode in kernel space. It
> basically
> changes where the reclaim/recheckpoint will be executed.
> 
> Once a CPU enters in transactional state it uses a footprint area to track
> down the load/stores performed in transaction so it can be verified later
> to decide if a conflict happened due to some change done in that state. If
> a transaction is active in userspace and there is an exception that takes
> the CPU to the kernel space the CPU moves the transaction to suspended
> state but does not discard the footprint area.

In this description, you should differentiate between memory and register
(GPR/VSX/SPR) footprints.

In suspend, the CPU can disregard the memory footprint at any point, but it has
to keep the register footprint.  

In the above paragraph you are talking about register footprint but not memory
footprint. 

> 
> POWER9 has a known problem[1][2] and does not have enough room in
> footprint area for several transactions to be suspended at the same time
> on concurrent CPUs leading to CPU stalls.
> 
> This patchset aims to reclaim the checkpointed footprint as soon as the
> kernel is invoked, in the beginning of the exception handlers, thus freeing
> room to other CPUs enter in suspended mode, avoiding too many CPUs in
> suspended
> state that can cause the CPUs to stall. The same mechanism is done on kernel
> exit, doing a recheckpoint as late as possible (which will reload the
> checkpointed state into CPU's room) at the exception return.

OK, but we are still potentially in suspend in userspace, so that doesn't help
us on the lockup issue.

We need fake suspend in userspace to prevent lockups.

> The way to achieve this goal is creating a macro (TM_KERNEL_ENTRY) which
> will check if userspace was in an active transaction just after getting
> into kernel and reclaiming if that's the case. Thus all exception handlers
> will call this macro as soon as possible.
> 
> All exceptions should reclaim (if necessary) at this stage and only
> recheckpoint if the task is tagged as TIF_RESTORE_TM (i.e. was in
> transactional state before being interrupted), which will be done at
> ret_from_except_lite().
> 
> Ideally all reclaims will happen at the exception entrance, however during
> the recheckpoint process another exception can hit the CPU which might
> cause the current thread to be rescheduled, thus there is another reclaim
> point to be considered at __switch_to().

Can we do the recheckpoint() later so that it's when we have interrupts off and
can't be rescheduled?

> Hence, by allowing the CPU to be in suspended state for only a brief period
> it's possible to cope with the TM hardware limitations like the current
> problem on the new POWER9.

As mentioned, since we're still running userspace with real suspend, we still
have an issue.

> This patchset was tested in different scenarios using different test
> suites, as the kernel selftests and htm-torture[3], in the following
> configuration:
> 
>  * POWER8/pseries LE and BE
>  * POWER8/powernv LE
>  * POWER9/powernv LE hosting KVM guests running TM tests
> 
> This patchset is based on initial work done by Cyril Bur:
>     https://patchwork.ozlabs.org/cover/875341/

Adding Cyril to CC.

Mikey

Breno Leitao Sept. 27, 2018, 8:13 p.m. UTC | #2
Hi Mikey,

First of all, thanks for your detailed review. I really appreciate your
comments here.

On 09/17/2018 02:25 AM, Michael Neuling wrote:
> On Wed, 2018-09-12 at 16:40 -0300, Breno Leitao wrote:
>> This patchset for the hardware transactional memory (TM) subsystem aims to
>> avoid spending a lot of time on TM suspended mode in kernel space. It
>> basically
>> changes where the reclaim/recheckpoint will be executed.
>>
>> Once a CPU enters in transactional state it uses a footprint area to track
>> down the load/stores performed in transaction so it can be verified later
>> to decide if a conflict happened due to some change done in that state. If
>> a transaction is active in userspace and there is an exception that takes
>> the CPU to the kernel space the CPU moves the transaction to suspended
>> state but does not discard the footprint area.
> 
> In this description, you should differentiate between memory and register
> (GPR/VSX/SPR) footprints.

Right, reading the ISA, I understand that footprint is a term for memory only
and it represents the modified memory that was stored during a transactional
state (that is, after tbegin). For registers, the ISA talks about checkpointed
registers, which is the register state *before* a transaction starts.

I.e., for registers it is the previous state, and for memory it is the
current/live state.

That said, if the transaction is aborted, the memory footprint is discarded
and the checkpointed registers replace the live registers.
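
The distinction can be sketched with a toy model. It is an illustration of
the description above only, under simplifying assumptions: ToyCPU and its
begin/store/abort/commit methods are hypothetical names, registers and memory
are plain dictionaries, and conflict detection is not modelled at all.

```python
class ToyCPU:
    """Hypothetical model: checkpointed registers vs. memory footprint."""

    def __init__(self, regs, memory):
        self.regs = dict(regs)      # live register state
        self.memory = dict(memory)  # committed memory
        self.checkpoint = None      # register state *before* the transaction
        self.footprint = {}         # stores made *during* the transaction

    def begin(self):
        # Registers: keep the previous state.
        self.checkpoint = dict(self.regs)
        self.footprint = {}

    def store(self, addr, value):
        # Memory: track the current/live transactional stores.
        self.footprint[addr] = value

    def abort(self):
        # The footprint is discarded, checkpointed registers become live.
        self.footprint = {}
        self.regs = dict(self.checkpoint)
        self.checkpoint = None

    def commit(self):
        # The footprint becomes visible, the checkpoint is dropped.
        self.memory.update(self.footprint)
        self.footprint = {}
        self.checkpoint = None
```

In this model, aborting after modifying a register and storing to memory
leaves both the registers and the committed memory exactly as they were
before begin(), while commit() keeps the live registers and publishes the
stores.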

> In suspend, the CPU can disregard the memory footprint at any point, but it has
> to keep the register footprint.

Yes!

Anyway, in the paragraph above I was just trying to describe how the hardware
works, independently of the kernel, but I will make sure to rewrite it more
clearly.

>> This patchset aims to reclaim the checkpointed footprint as soon as the
>> kernel is invoked, in the beginning of the exception handlers, thus freeing
>> room to other CPUs enter in suspended mode, avoiding too many CPUs in
>> suspended
>> state that can cause the CPUs to stall. The same mechanism is done on kernel
>> exit, doing a recheckpoint as late as possible (which will reload the
>> checkpointed state into CPU's room) at the exception return.
> 
> OK, but we are still potentially in suspend in userspace, so that doesn't help
> us on the lockup issue.
> 
> We need fake suspend in userspace to prevent lockups.

Correct. I will make sure I document it. This patchset is just the first step
toward a workaround for the hardware limitations.

>> The way to achieve this goal is creating a macro (TM_KERNEL_ENTRY) which
>> will check if userspace was in an active transaction just after getting
>> into kernel and reclaiming if that's the case. Thus all exception handlers
>> will call this macro as soon as possible.
>>
>> All exceptions should reclaim (if necessary) at this stage and only
>> recheckpoint if the task is tagged as TIF_RESTORE_TM (i.e. was in
>> transactional state before being interrupted), which will be done at
>> ret_from_except_lite().
>>
>> Ideally all reclaims will happen at the exception entrance, however during
>> the recheckpoint process another exception can hit the CPU which might
>> cause the current thread to be rescheduled, thus there is another reclaim
>> point to be considered at __switch_to().
> 
> Can we do the recheckpoint() later so that it's when we have interrupts off and
> can't be rescheduled?

Yes! After thinking about it for a long time, this is definitely what should
be done. I will send a v2 with this change (and the others being discussed
here).

Thank you,
Breno