powerpc/lib: Split xor_vmx file to guarantee instruction ordering

Message ID 20170523234559.28080-1-matthew.brown.dev@gmail.com (mailing list archive)
State Accepted
Commit f718d426d7e42eec6e5d2932f52a51de23bd3b30

Commit Message

Matt Brown May 23, 2017, 11:45 p.m. UTC
The xor_vmx.c file is used for the RAID5 xor operations. In these functions
altivec is enabled to run the operation and then disabled. However due to
compiler instruction reordering, altivec instructions are being run before
enable_altivec() and after disable_altivec().

This patch splits the non-altivec code into xor_vmx_glue.c which calls the
altivec functions in xor_vmx.c. By compiling xor_vmx_glue.c without
-maltivec we can guarantee that altivec instructions will not be reordered
outside of the enable/disable block.

Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
---
 arch/powerpc/lib/Makefile       |  2 +-
 arch/powerpc/lib/xor_vmx.c      | 53 ++++++++---------------------------
 arch/powerpc/lib/xor_vmx.h      | 20 +++++++++++++
 arch/powerpc/lib/xor_vmx_glue.c | 62 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 94 insertions(+), 43 deletions(-)
 create mode 100644 arch/powerpc/lib/xor_vmx.h
 create mode 100644 arch/powerpc/lib/xor_vmx_glue.c

Comments

Paul A. Clarke May 24, 2017, 1:36 p.m. UTC | #1
On 05/23/2017 06:45 PM, Matt Brown wrote:
> The xor_vmx.c file is used for the RAID5 xor operations. In these functions
> altivec is enabled to run the operation and then disabled. However due to
> compiler instruction reordering, altivec instructions are being run before
> enable_altivec() and after disable_altivec().

If altivec instructions can be reordered after disable_altivec(), then disable_altivec() is broken, I'd think.

Could it be because the isync in mtmsr_isync() is after the mtmsr?

disable_kernel_altivec
- msr_check_and_clear
  - __msr_check_and_clear
    - mtmsr_isync
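
For reference, the construct in question is an mtmsr followed by an
isync.  A minimal sketch, purely illustrative and not the kernel's
actual mtmsr_isync() implementation:

/*
 * Illustrative sketch only.  The isync is a *hardware*
 * context-synchronising instruction: it orders the MSR update against
 * later instructions as executed by the CPU.  What constrains the
 * *compiler* is the "memory" clobber on the asm statement, not the
 * isync itself.
 */
static inline void mtmsr_isync_sketch(unsigned long val)
{
	asm volatile("mtmsr %0; isync" : : "r" (val) : "memory");
}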

PC
Matt Brown May 25, 2017, 4:56 a.m. UTC | #2
On Wed, May 24, 2017 at 11:36 PM, Paul Clarke <pc@us.ibm.com> wrote:
> On 05/23/2017 06:45 PM, Matt Brown wrote:
>> The xor_vmx.c file is used for the RAID5 xor operations. In these functions
>> altivec is enabled to run the operation and then disabled. However due to
>> compiler instruction reordering, altivec instructions are being run before
>> enable_altivec() and after disable_altivec().
>
> If altivec instructions can be reordered after disable_altivec(), then disable_altivec() is broken, I'd think.
>
> Could it be because the isync in mtmsr_isync() is after the mtmsr?
>
> disable_kernel_altivec
> - msr_check_and_clear
>   - __msr_check_and_clear
>     - mtmsr_isync

So it turns out the enable/disable functions don't actually enable
or disable the use of vector instructions.
If we have marked the file to be compiled with altivec, the compiler
has free rein to reorder the vector instructions wherever it likes,
including reordering them before or after the enable/disable_altivec
calls.

The enable_kernel_altivec and disable_kernel_altivec functions are
mainly there to empty and restore the vector registers which could
have been used in user-space. So those functions work as intended,
although not particularly intuitive.

Splitting the files and only compiling the xor_vmx.c file with altivec
will guarantee that there are no vector instructions in the
xor_vmx_glue.c file, and that no vector instructions are outside of
the enable/disable block.
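
To make that concrete, the wrapper ends up looking like this (a
simplified sketch of the pattern in the patch below; since the glue
file is built without -maltivec, the compiler cannot emit vector
instructions anywhere in it):

/* xor_vmx_glue.c -- compiled without -maltivec */
#include <linux/preempt.h>
#include <asm/switch_to.h>	/* enable/disable_kernel_altivec() */
#include "xor_vmx.h"

void xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
		   unsigned long *v2_in)
{
	preempt_disable();
	enable_kernel_altivec();
	/* only this call, compiled in xor_vmx.c with -maltivec,
	 * can contain vector instructions */
	__xor_altivec_2(bytes, v1_in, v2_in);
	disable_kernel_altivec();
	preempt_enable();
}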


- Matt Brown

>
> PC
>
Paul A. Clarke May 25, 2017, 1:25 p.m. UTC | #3
On 05/24/2017 11:56 PM, Matt Brown wrote:
> On Wed, May 24, 2017 at 11:36 PM, Paul Clarke <pc@us.ibm.com> wrote:
>> On 05/23/2017 06:45 PM, Matt Brown wrote:
>>> The xor_vmx.c file is used for the RAID5 xor operations. In these functions
>>> altivec is enabled to run the operation and then disabled. However due to
>>> compiler instruction reordering, altivec instructions are being run before
>>> enable_altivec() and after disable_altivec().
>>
>> If altivec instructions can be reordered after disable_altivec(), then disable_altivec() is broken, I'd think.
>>
>> Could it be because the isync in mtmsr_isync() is after the mtmsr?
>>
>> disable_kernel_altivec
>> - msr_check_and_clear
>>   - __msr_check_and_clear
>>     - mtmsr_isync
> 
> So it turns out the enable/disable functions don't actually enable
> or disable the use of vector instructions.
> If we have marked the file to be compiled with altivec, the compiler
> has free rein to reorder the vector instructions wherever it likes,
> including reordering them before or after the enable/disable_altivec
> calls.
> 
> The enable_kernel_altivec and disable_kernel_altivec functions are
> mainly there to empty and restore the vector registers which could
> have been used in user-space. So those functions work as intended,
> although not particularly intuitive.
> 
> Splitting the files and only compiling the xor_vmx.c file with altivec
> will guarantee that there are no vector instructions in the
> xor_vmx_glue.c file, and that no vector instructions are outside of
> the enable/disable block.

Thanks for the explanation!  That makes sense.

PC
Michael Ellerman May 26, 2017, 4:12 a.m. UTC | #4
Matt Brown <matthew.brown.dev@gmail.com> writes:

> On Wed, May 24, 2017 at 11:36 PM, Paul Clarke <pc@us.ibm.com> wrote:
>> On 05/23/2017 06:45 PM, Matt Brown wrote:
>>> The xor_vmx.c file is used for the RAID5 xor operations. In these functions
>>> altivec is enabled to run the operation and then disabled. However due to
>>> compiler instruction reordering, altivec instructions are being run before
>>> enable_altivec() and after disable_altivec().
>>
>> If altivec instructions can be reordered after disable_altivec(), then disable_altivec() is broken, I'd think.
>>
>> Could it be because the isync in mtmsr_isync() is after the mtmsr?
>>
>> disable_kernel_altivec
>> - msr_check_and_clear
>>   - __msr_check_and_clear
>>     - mtmsr_isync
>
> So it turns out the enable/disable functions don't actually enable
> or disable the use of vector instructions.
> If we have marked the file to be compiled with altivec, the compiler
> has free rein to reorder the vector instructions wherever it likes,
> including reordering them before or after the enable/disable_altivec
> calls.

That's true. Though it's not that the compiler is reordering vector
instructions, it's that it's generating them directly to save/restore
vector arguments.

In enable_kernel_altivec() we have an mtmsr() which is implemented with
inline asm that has a "memory" clobber, so it is effectively a full
compiler barrier, i.e. the compiler will not reorder code across it.
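
A rough sketch of the mechanism (generic example, not the kernel's
actual mtmsr() implementation):

/*
 * An asm statement with a "memory" clobber tells the compiler that
 * memory may be read or written in ways it cannot see, so it must not
 * move loads/stores, or keep values cached in registers, across the
 * statement.  The kernel's barrier() macro is exactly this with an
 * empty instruction string.
 */
static inline void mtmsr_sketch(unsigned long val)
{
	asm volatile("mtmsr %0" : : "r" (val) : "memory");
}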

> The enable_kernel_altivec and disable_kernel_altivec functions are
> mainly there to empty and restore the vector registers which could
> have been used in user-space. So those functions work as intended,
> although not particularly intuitive.

Yeah, perhaps "allow" would be better. In fact the whole API is a bit
crap and we need to rework it.

cheers
Michael Ellerman June 5, 2017, 10:21 a.m. UTC | #5
On Tue, 2017-05-23 at 23:45:59 UTC, Matt Brown wrote:
> The xor_vmx.c file is used for the RAID5 xor operations. In these functions
> altivec is enabled to run the operation and then disabled. However due to
> compiler instruction reordering, altivec instructions are being run before
> enable_altivec() and after disable_altivec().
> 
> This patch splits the non-altivec code into xor_vmx_glue.c which calls the
> altivec functions in xor_vmx.c. By compiling xor_vmx_glue.c without
> -maltivec we can guarantee that altivec instructions will not be reordered
> outside of the enable/disable block.
> 
> Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f718d426d7e42eec6e5d2932f52a51

cheers

Patch

diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 309361e8..a448464 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -31,7 +31,7 @@  obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o
 
 obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
 
-obj-$(CONFIG_ALTIVEC)	+= xor_vmx.o
+obj-$(CONFIG_ALTIVEC)	+= xor_vmx.o xor_vmx_glue.o
 CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
 
 obj-$(CONFIG_PPC64) += $(obj64-y)
diff --git a/arch/powerpc/lib/xor_vmx.c b/arch/powerpc/lib/xor_vmx.c
index f9de69a..4df240a 100644
--- a/arch/powerpc/lib/xor_vmx.c
+++ b/arch/powerpc/lib/xor_vmx.c
@@ -29,10 +29,7 @@ 
 #define vector __attribute__((vector_size(16)))
 #endif
 
-#include <linux/preempt.h>
-#include <linux/export.h>
-#include <linux/sched.h>
-#include <asm/switch_to.h>
+#include "xor_vmx.h"
 
 typedef vector signed char unative_t;
 
@@ -64,16 +61,13 @@  typedef vector signed char unative_t;
 		V1##_3 = vec_xor(V1##_3, V2##_3);	\
 	} while (0)
 
-void xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
-		   unsigned long *v2_in)
+void __xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
+		     unsigned long *v2_in)
 {
 	DEFINE(v1);
 	DEFINE(v2);
 	unsigned long lines = bytes / (sizeof(unative_t)) / 4;
 
-	preempt_disable();
-	enable_kernel_altivec();
-
 	do {
 		LOAD(v1);
 		LOAD(v2);
@@ -83,23 +77,16 @@  void xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
 		v1 += 4;
 		v2 += 4;
 	} while (--lines > 0);
-
-	disable_kernel_altivec();
-	preempt_enable();
 }
-EXPORT_SYMBOL(xor_altivec_2);
 
-void xor_altivec_3(unsigned long bytes, unsigned long *v1_in,
-		   unsigned long *v2_in, unsigned long *v3_in)
+void __xor_altivec_3(unsigned long bytes, unsigned long *v1_in,
+		     unsigned long *v2_in, unsigned long *v3_in)
 {
 	DEFINE(v1);
 	DEFINE(v2);
 	DEFINE(v3);
 	unsigned long lines = bytes / (sizeof(unative_t)) / 4;
 
-	preempt_disable();
-	enable_kernel_altivec();
-
 	do {
 		LOAD(v1);
 		LOAD(v2);
@@ -112,15 +99,11 @@  void xor_altivec_3(unsigned long bytes, unsigned long *v1_in,
 		v2 += 4;
 		v3 += 4;
 	} while (--lines > 0);
-
-	disable_kernel_altivec();
-	preempt_enable();
 }
-EXPORT_SYMBOL(xor_altivec_3);
 
-void xor_altivec_4(unsigned long bytes, unsigned long *v1_in,
-		   unsigned long *v2_in, unsigned long *v3_in,
-		   unsigned long *v4_in)
+void __xor_altivec_4(unsigned long bytes, unsigned long *v1_in,
+		     unsigned long *v2_in, unsigned long *v3_in,
+		     unsigned long *v4_in)
 {
 	DEFINE(v1);
 	DEFINE(v2);
@@ -128,9 +111,6 @@  void xor_altivec_4(unsigned long bytes, unsigned long *v1_in,
 	DEFINE(v4);
 	unsigned long lines = bytes / (sizeof(unative_t)) / 4;
 
-	preempt_disable();
-	enable_kernel_altivec();
-
 	do {
 		LOAD(v1);
 		LOAD(v2);
@@ -146,15 +126,11 @@  void xor_altivec_4(unsigned long bytes, unsigned long *v1_in,
 		v3 += 4;
 		v4 += 4;
 	} while (--lines > 0);
-
-	disable_kernel_altivec();
-	preempt_enable();
 }
-EXPORT_SYMBOL(xor_altivec_4);
 
-void xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
-		   unsigned long *v2_in, unsigned long *v3_in,
-		   unsigned long *v4_in, unsigned long *v5_in)
+void __xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
+		     unsigned long *v2_in, unsigned long *v3_in,
+		     unsigned long *v4_in, unsigned long *v5_in)
 {
 	DEFINE(v1);
 	DEFINE(v2);
@@ -163,9 +139,6 @@  void xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
 	DEFINE(v5);
 	unsigned long lines = bytes / (sizeof(unative_t)) / 4;
 
-	preempt_disable();
-	enable_kernel_altivec();
-
 	do {
 		LOAD(v1);
 		LOAD(v2);
@@ -184,8 +157,4 @@  void xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
 		v4 += 4;
 		v5 += 4;
 	} while (--lines > 0);
-
-	disable_kernel_altivec();
-	preempt_enable();
 }
-EXPORT_SYMBOL(xor_altivec_5);
diff --git a/arch/powerpc/lib/xor_vmx.h b/arch/powerpc/lib/xor_vmx.h
new file mode 100644
index 0000000..4414a24
--- /dev/null
+++ b/arch/powerpc/lib/xor_vmx.h
@@ -0,0 +1,20 @@ 
+/*
+ * Simple interface to link xor_vmx.c and xor_vmx_glue.c
+ *
+ * Separating these files ensures that no altivec instructions are run
+ * outside of the enable/disable altivec block.
+ */
+
+void __xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
+			     unsigned long *v2_in);
+
+void __xor_altivec_3(unsigned long bytes, unsigned long *v1_in,
+			     unsigned long *v2_in, unsigned long *v3_in);
+
+void __xor_altivec_4(unsigned long bytes, unsigned long *v1_in,
+			     unsigned long *v2_in, unsigned long *v3_in,
+			     unsigned long *v4_in);
+
+void __xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
+			     unsigned long *v2_in, unsigned long *v3_in,
+			     unsigned long *v4_in, unsigned long *v5_in);
diff --git a/arch/powerpc/lib/xor_vmx_glue.c b/arch/powerpc/lib/xor_vmx_glue.c
new file mode 100644
index 0000000..6521fe5
--- /dev/null
+++ b/arch/powerpc/lib/xor_vmx_glue.c
@@ -0,0 +1,62 @@ 
+/*
+ * Altivec XOR operations
+ *
+ * Copyright 2017 IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/preempt.h>
+#include <linux/export.h>
+#include <linux/sched.h>
+#include <asm/switch_to.h>
+#include "xor_vmx.h"
+
+void xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
+		   unsigned long *v2_in)
+{
+	preempt_disable();
+	enable_kernel_altivec();
+	__xor_altivec_2(bytes, v1_in, v2_in);
+	disable_kernel_altivec();
+	preempt_enable();
+}
+EXPORT_SYMBOL(xor_altivec_2);
+
+void xor_altivec_3(unsigned long bytes,  unsigned long *v1_in,
+		   unsigned long *v2_in, unsigned long *v3_in)
+{
+	preempt_disable();
+	enable_kernel_altivec();
+	__xor_altivec_3(bytes, v1_in, v2_in, v3_in);
+	disable_kernel_altivec();
+	preempt_enable();
+}
+EXPORT_SYMBOL(xor_altivec_3);
+
+void xor_altivec_4(unsigned long bytes,  unsigned long *v1_in,
+		   unsigned long *v2_in, unsigned long *v3_in,
+		   unsigned long *v4_in)
+{
+	preempt_disable();
+	enable_kernel_altivec();
+	__xor_altivec_4(bytes, v1_in, v2_in, v3_in, v4_in);
+	disable_kernel_altivec();
+	preempt_enable();
+}
+EXPORT_SYMBOL(xor_altivec_4);
+
+void xor_altivec_5(unsigned long bytes,  unsigned long *v1_in,
+		   unsigned long *v2_in, unsigned long *v3_in,
+		   unsigned long *v4_in, unsigned long *v5_in)
+{
+	preempt_disable();
+	enable_kernel_altivec();
+	__xor_altivec_5(bytes, v1_in, v2_in, v3_in, v4_in, v5_in);
+	disable_kernel_altivec();
+	preempt_enable();
+}
+EXPORT_SYMBOL(xor_altivec_5);