diff mbox

powerpc: Add support for userspace P9 copy paste

Message ID 20160426002844.GA9198@x1.distroguy.com (mailing list archive)
State Accepted
Headers show

Commit Message

Chris Smart April 26, 2016, 12:28 a.m. UTC
The copy paste facility introduced in POWER9 provides an optimised
mechanism for a userspace application to copy a cacheline. This is
provided by a pair of instructions, copy and paste, while a third,
cp_abort (copy paste abort), provides a clean up of the state in case of
a failure.

The copy instruction will read a 128 byte cacheline and store it in an
internal buffer. The subsequent paste instruction will store this
internal buffer to memory and set a CR field if the paste succeeds.

Since the state of the copy paste buffer is internal (and not
architecturally visible), in the unlikely event of a context switch, the
state cannot be stored and the paste should therefore fail.

The cp_abort instruction exists to fail and clean up any such
interrupted copy paste sequence and is to be called by the kernel as
part of the context switch. Doing so prevents data from a preceding copy
in one process leaking into the paste of another.

This code enables use of the cp_abort instruction if a supported
processor is detected.

NOTE: this is for userspace only, not in kernel, and does not deal
with KVM guests.

Patch created with much assistance from Michael Neuling
<mikey@neuling.org>

Signed-off-by: Chris Smart <chris@distroguy.com>
---

Note: A follow-up patch is expected soon with a working self-test.

 arch/powerpc/include/asm/ppc-opcode.h | 2 ++
 arch/powerpc/kernel/entry_64.S        | 9 +++++++++
 2 files changed, 11 insertions(+)

Comments

Cyril Bur April 26, 2016, 6:42 a.m. UTC | #1
On Tue, 26 Apr 2016 10:28:50 +1000
Chris Smart <chris@distroguy.com> wrote:

> The copy paste facility introduced in POWER9 provides an optimised
> mechanism for a userspace application to copy a cacheline. This is
> provided by a pair of instructions, copy and paste, while a third,
> cp_abort (copy paste abort), provides a clean up of the state in case of
> a failure.
> 
> The copy instruction will read a 128 byte cacheline and store it in an
> internal buffer. The subsequent paste instruction will store this
> internal buffer to memory and set a CR field if the paste succeeds.
> 
> Since the state of the copy paste buffer is internal (and not
> architecturally visible), in the unlikely event of a context switch, the
> state cannot be stored and the paste should therefore fail.
> 
> The cp_abort instruction exists to fail and clean up any such
> interrupted copy paste sequence and is to be called by the kernel as
> part of the context switch. Doing so prevents data from a preceding copy
> in one process leaking into the paste of another.
> 
> This code enables use of the cp_abort instruction if a supported
> processor is detected.
> 
> NOTE: this is for userspace only, not in kernel, and does not deal
> with KVM guests.
> 
> Patch created with much assistance from Michael Neuling
> <mikey@neuling.org>
> 

Hi Chris,

Patch looks good. Looks like you've put 8 spaces (instead of a tab) on the
PPC_CP_ABORT line.

Apart from that,

Reviewed-by: Cyril Bur <cyrilbur@gmail.com>

> Signed-off-by: Chris Smart <chris@distroguy.com>
> ---
> 
> Note: A follow-up patch is expected soon with a working self-test.
> 
>  arch/powerpc/include/asm/ppc-opcode.h | 2 ++
>  arch/powerpc/kernel/entry_64.S        | 9 +++++++++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
> index 7ab04fc59e24..1d035c1cc889 100644
> --- a/arch/powerpc/include/asm/ppc-opcode.h
> +++ b/arch/powerpc/include/asm/ppc-opcode.h
> @@ -131,6 +131,7 @@
>  /* sorted alphabetically */
>  #define PPC_INST_BHRBE			0x7c00025c
>  #define PPC_INST_CLRBHRB		0x7c00035c
> +#define PPC_INST_CP_ABORT		0x7c00068c
>  #define PPC_INST_DCBA			0x7c0005ec
>  #define PPC_INST_DCBA_MASK		0xfc0007fe
>  #define PPC_INST_DCBAL			0x7c2005ec
> @@ -285,6 +286,7 @@
>  #endif
>  
>  /* Deal with instructions that older assemblers aren't aware of */
> +#define	PPC_CP_ABORT		stringify_in_c(.long PPC_INST_CP_ABORT)
>  #define	PPC_DCBAL(a, b)		stringify_in_c(.long PPC_INST_DCBAL | \
>  					__PPC_RA(a) | __PPC_RB(b))
>  #define	PPC_DCBZL(a, b)		stringify_in_c(.long PPC_INST_DCBZL | \
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index 8b9d68676d2b..ab1457c3f1d1 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -36,6 +36,7 @@
>  #include <asm/hw_irq.h>
>  #include <asm/context_tracking.h>
>  #include <asm/tm.h>
> +#include <asm/ppc-opcode.h>
>  
>  /*
>   * System calls.
> @@ -508,6 +509,14 @@ BEGIN_FTR_SECTION
>  	ldarx	r6,0,r1
>  END_FTR_SECTION_IFSET(CPU_FTR_STCX_CHECKS_ADDRESS)
>  
> +BEGIN_FTR_SECTION
> +/*
> + * A cp_abort (copy paste abort) here ensures that when context switching, a
> + * copy from one process can't leak into the paste of another.
> + */
> +        PPC_CP_ABORT
> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
> +
>  #ifdef CONFIG_PPC_BOOK3S
>  /* Cancel all explict user streams as they will have no use after context
>   * switch and will stop the HW from creating streams itself
Chris Smart April 26, 2016, 6:49 a.m. UTC | #2
On Tue, Apr 26, 2016 at 04:42:09PM +1000, Cyril Bur wrote:
>Hi Chris,
>
>Patch looks good. Looks like you've put 8 spaces (instead of a tab) on the
>PPC_CP_ABORT line.
>

Argh. I'm not sure how that happened, thanks.

>Apart from that,
>
>Reviewed-by: Cyril Bur <cyrilbur@gmail.com>

Cheers,
-c
Balbir Singh April 26, 2016, 7:45 a.m. UTC | #3
> /*
>  * System calls.
> @@ -508,6 +509,14 @@ BEGIN_FTR_SECTION
>     ldarx    r6,0,r1
> END_FTR_SECTION_IFSET(CPU_FTR_STCX_CHECKS_ADDRESS)
> 
> +BEGIN_FTR_SECTION
> +/*
> + * A cp_abort (copy paste abort) here ensures that when context switching, a
> + * copy from one process can't leak into the paste of another.
> + */
> +        PPC_CP_ABORT

I think the alignment issue has been called out, but it is not clear from the changelog that
we do this during syscall_exit/syscalls.

And also, do we need to care about preemptions, etc by the scheduler?

> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
> +
> #ifdef CONFIG_PPC_BOOK3S
> /* Cancel all explict user streams as they will have no use after context
>  * switch and will stop the HW from creating streams itself
Michael Neuling April 27, 2016, 2:05 a.m. UTC | #4
On Tue, 2016-04-26 at 17:45 +1000, Balbir Singh wrote:
> > 
> > /*
> >  * System calls.
> > @@ -508,6 +509,14 @@ BEGIN_FTR_SECTION
> >     ldarx    r6,0,r1
> > END_FTR_SECTION_IFSET(CPU_FTR_STCX_CHECKS_ADDRESS)
> > 
> > +BEGIN_FTR_SECTION
> > +/*
> > + * A cp_abort (copy paste abort) here ensures that when context
> > switching, a
> > + * copy from one process can't leak into the paste of another.
> > + */
> > +        PPC_CP_ABORT
> I think the alignment issue has been called out, but it is not clear from
> the changelog that
> we do this during syscall_exit/syscalls.

I don't think we need this during syscall entry exit until we use this
in the kernel.

> And also, do we need to care about preemptions, etc by the scheduler?

This handles the userspace preemption case.

kernel support is not handled here, so kernel preemption is TBD.

Mikey

> > 
> > +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
> > +
> > #ifdef CONFIG_PPC_BOOK3S
> > /* Cancel all explict user streams as they will have no use after context
> >  * switch and will stop the HW from creating streams itself
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
David Laight April 27, 2016, 3:25 p.m. UTC | #5
From: Chris Smart

> Sent: 26 April 2016 01:29

> The copy paste facility introduced in POWER9 provides an optimised

> mechanism for a userspace application to copy a cacheline. This is

> provided by a pair of instructions, copy and paste, while a third,

> cp_abort (copy paste abort), provides a clean up of the state in case of

> a failure.

> 

> The copy instruction will read a 128 byte cacheline and store it in an

> internal buffer. The subsequent paste instruction will store this

> internal buffer to memory and set a CR field if the paste succeeds.

> 

> Since the state of the copy paste buffer is internal (and not

> architecturally visible), in the unlikely event of a context switch, the

> state cannot be stored and the paste should therefore fail.

> 

> The cp_abort instruction exists to fail and clean up any such

> interrupted copy paste sequence and is to be called by the kernel as

> part of the context switch. Doing so prevents data from a preceding copy

> in one process leaking into the paste of another.

> 

> This code enables use of the cp_abort instruction if a supported

> processor is detected.


In that case what actually completes the copy?
I think you'd need to be inside a 'restartable atomic sequence'
in which case the cp_abort need only be done when the/a RAS
block is detected.

Or have I missed something??

	David
Chris Smart April 27, 2016, 11:51 p.m. UTC | #6
On Wed, Apr 27, 2016 at 03:25:59PM +0000, David Laight wrote:
>From: Chris Smart
>> Sent: 26 April 2016 01:29
>> The copy paste facility introduced in POWER9 provides an optimised
>> mechanism for a userspace application to copy a cacheline. This is
>> provided by a pair of instructions, copy and paste, while a third,
>> cp_abort (copy paste abort), provides a clean up of the state in case of
>> a failure.
>>
>> The copy instruction will read a 128 byte cacheline and store it in an
>> internal buffer. The subsequent paste instruction will store this
>> internal buffer to memory and set a CR field if the paste succeeds.
>>
>> Since the state of the copy paste buffer is internal (and not
>> architecturally visible), in the unlikely event of a context switch, the
>> state cannot be stored and the paste should therefore fail.
>>
>> The cp_abort instruction exists to fail and clean up any such
>> interrupted copy paste sequence and is to be called by the kernel as
>> part of the context switch. Doing so prevents data from a preceding copy
>> in one process leaking into the paste of another.
>>
>> This code enables use of the cp_abort instruction if a supported
>> processor is detected.
>
>In that case what actually completes the copy?
>I think you'd need to be inside a 'restartable atomic sequence'
>in which case the cp_abort need only be done when the/a RAS
>block is detected.
>
>Or have I missed something??

It's up to the userspace process that's doing the copy paste to check
the CR field after the paste to see whether it has succeeded. If it has
not succeeded, then the process can choose to re-do the copy and paste.

This is very similar to a lwarx stwcx where the user checks the CR after
the stwcx to verify whether it has succeeded.

Hope that makes sense.

-c
Michael Ellerman April 28, 2016, 2:22 p.m. UTC | #7
On Tue, 2016-26-04 at 00:28:50 UTC, Chris Smart wrote:
> The copy paste facility introduced in POWER9 provides an optimised
> mechanism for a userspace application to copy a cacheline. This is
> provided by a pair of instructions, copy and paste, while a third,
> cp_abort (copy paste abort), provides a clean up of the state in case of
> a failure.
...
> Signed-off-by: Chris Smart <chris@distroguy.com>
> Reviewed-by: Cyril Bur <cyrilbur@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/8a649045e75a4b9091ea9d041f

cheers
David Laight April 28, 2016, 2:48 p.m. UTC | #8
From: Chris Smart

> Sent: 28 April 2016 00:52

...
> >In that case what actually completes the copy?

> >I think you'd need to be inside a 'restartable atomic sequence'

> >in which case the cp_abort need only be done when the/a RAS

> >block is detected.

> >

> >Or have I missed something??

> 

> It's up to the userspace process that's doing the copy paste to check

> the CR field after the paste to see whether it has succeeded. If it has

> not succeeded, then the process can choose to re-do the copy and paste.

>

> Hope that makes sense.


Yes, thanks.

	David
diff mbox

Patch

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index 7ab04fc59e24..1d035c1cc889 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -131,6 +131,7 @@ 
 /* sorted alphabetically */
 #define PPC_INST_BHRBE			0x7c00025c
 #define PPC_INST_CLRBHRB		0x7c00035c
+#define PPC_INST_CP_ABORT		0x7c00068c
 #define PPC_INST_DCBA			0x7c0005ec
 #define PPC_INST_DCBA_MASK		0xfc0007fe
 #define PPC_INST_DCBAL			0x7c2005ec
@@ -285,6 +286,7 @@ 
 #endif
 
 /* Deal with instructions that older assemblers aren't aware of */
+#define	PPC_CP_ABORT		stringify_in_c(.long PPC_INST_CP_ABORT)
 #define	PPC_DCBAL(a, b)		stringify_in_c(.long PPC_INST_DCBAL | \
 					__PPC_RA(a) | __PPC_RB(b))
 #define	PPC_DCBZL(a, b)		stringify_in_c(.long PPC_INST_DCBZL | \
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 8b9d68676d2b..ab1457c3f1d1 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -36,6 +36,7 @@ 
 #include <asm/hw_irq.h>
 #include <asm/context_tracking.h>
 #include <asm/tm.h>
+#include <asm/ppc-opcode.h>
 
 /*
  * System calls.
@@ -508,6 +509,14 @@  BEGIN_FTR_SECTION
 	ldarx	r6,0,r1
 END_FTR_SECTION_IFSET(CPU_FTR_STCX_CHECKS_ADDRESS)
 
+BEGIN_FTR_SECTION
+/*
+ * A cp_abort (copy paste abort) here ensures that when context switching, a
+ * copy from one process can't leak into the paste of another.
+ */
+        PPC_CP_ABORT
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300)
+
 #ifdef CONFIG_PPC_BOOK3S
 /* Cancel all explict user streams as they will have no use after context
  * switch and will stop the HW from creating streams itself