diff mbox series

Fix not 8-byte aligned ldrd/strd on ARMv5

Message ID VI1PR07MB48649075FF0E9EB53F64BC6DE46E0@VI1PR07MB4864.eurprd07.prod.outlook.com
State New
Headers show
Series Fix not 8-byte aligned ldrd/strd on ARMv5 | expand

Commit Message

Bernd Edlinger Feb. 5, 2019, 3:07 p.m. UTC
Hi,

due to the AAPCS parameter passing of 8-byte aligned structures, which happen to
be 8-byte aligned or only 4-byte aligned in the test case, ldrd instructions
are generated that may access 4-byte aligned stack slots, which will trap on ARMv5 and
ARMv6 according to the following document:


http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473m/dom1361290002364.html
says:

"In ARMv5TE, or in ARMv6 when SCTLR.U is 0, LDRD and STRD doubleword data transfers must be
eight-byte aligned.  Use ALIGN 8 before memory allocation directives such as DCQ if the data
is to be accessed using LDRD or STRD.  This is not required in ARMv6 when SCTLR.U is 1, or in
ARMv7, because in these versions, doubleword data transfers can be word-aligned."


The reason why the ldrd instruction is generated seems to be a missing alignment check in the
function output_move_double.  But when that is fixed, it turns out that if the parameter happens
to be 8-byte aligned by chance, they still have MEM_ALIGN = 4, which prevents the ldrd completely.

The reason for that is in function.c (assign_parm_find_stack_rtl), where values that happen to be
aligned to STACK_BOUNDARY, are only  aligned to PARM_BOUNDARY.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf with all languages.
Is it OK for trunk?


Thanks
Bernd.

Comments

Bernd Edlinger Feb. 12, 2019, 12:32 p.m. UTC | #1
Hi!

I'd like to ping for this patch:
https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00248.html

Thanks
Bernd.


On 2/5/19 4:07 PM, Bernd Edlinger wrote:
> Hi,
> 
> due to the AAPCS parameter passing of 8-byte aligned structures, which happen to
> be 8-byte aligned or only 4-byte aligned in the test case, ldrd instructions
> are generated that may access 4-byte aligned stack slots, which will trap on ARMv5 and
> ARMv6 according to the following document:
> 
> 
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473m/dom1361290002364.html
> says:
> 
> "In ARMv5TE, or in ARMv6 when SCTLR.U is 0, LDRD and STRD doubleword data transfers must be
> eight-byte aligned.  Use ALIGN 8 before memory allocation directives such as DCQ if the data
> is to be accessed using LDRD or STRD.  This is not required in ARMv6 when SCTLR.U is 1, or in
> ARMv7, because in these versions, doubleword data transfers can be word-aligned."
> 
> 
> The reason why the ldrd instruction is generated seems to be a missing alignment check in the
> function output_move_double.  But when that is fixed, it turns out that if the parameter happens
> to be 8-byte aligned by chance, they still have MEM_ALIGN = 4, which prevents the ldrd completely.
> 
> The reason for that is in function.c (assign_parm_find_stack_rtl), where values that happen to be
> aligned to STACK_BOUNDARY, are only  aligned to PARM_BOUNDARY.
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf with all languages.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
>
Bernd Edlinger Feb. 22, 2019, 5:53 p.m. UTC | #2
Ping...

On 2/12/19 1:32 PM, Bernd Edlinger wrote:
> Hi!
> 
> I'd like to ping for this patch:
> https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00248.html
> 
> Thanks
> Bernd.
> 
> 
> On 2/5/19 4:07 PM, Bernd Edlinger wrote:
>> Hi,
>>
>> due to the AAPCS parameter passing of 8-byte aligned structures, which happen to
>> be 8-byte aligned or only 4-byte aligned in the test case, ldrd instructions
>> are generated that may access 4-byte aligned stack slots, which will trap on ARMv5 and
>> ARMv6 according to the following document:
>>
>>
>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473m/dom1361290002364.html
>> says:
>>
>> "In ARMv5TE, or in ARMv6 when SCTLR.U is 0, LDRD and STRD doubleword data transfers must be
>> eight-byte aligned.  Use ALIGN 8 before memory allocation directives such as DCQ if the data
>> is to be accessed using LDRD or STRD.  This is not required in ARMv6 when SCTLR.U is 1, or in
>> ARMv7, because in these versions, doubleword data transfers can be word-aligned."
>>
>>
>> The reason why the ldrd instruction is generated seems to be a missing alignment check in the
>> function output_move_double.  But when that is fixed, it turns out that if the parameter happens
>> to be 8-byte aligned by chance, they still have MEM_ALIGN = 4, which prevents the ldrd completely.
>>
>> The reason for that is in function.c (assign_parm_find_stack_rtl), where values that happen to be
>> aligned to STACK_BOUNDARY, are only  aligned to PARM_BOUNDARY.
>>
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf with all languages.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
Richard Earnshaw (lists) March 1, 2019, 10:43 a.m. UTC | #3
On 05/02/2019 15:07, Bernd Edlinger wrote:
> Hi,
> 
> due to the AAPCS parameter passing of 8-byte aligned structures, which happen to
> be 8-byte aligned or only 4-byte aligned in the test case, ldrd instructions
> are generated that may access 4-byte aligned stack slots, which will trap on ARMv5 and
> ARMv6 according to the following document:
> 
> 
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473m/dom1361290002364.html
> says:
> 
> "In ARMv5TE, or in ARMv6 when SCTLR.U is 0, LDRD and STRD doubleword data transfers must be
> eight-byte aligned.  Use ALIGN 8 before memory allocation directives such as DCQ if the data
> is to be accessed using LDRD or STRD.  This is not required in ARMv6 when SCTLR.U is 1, or in
> ARMv7, because in these versions, doubleword data transfers can be word-aligned."
> 
> 
> The reason why the ldrd instruction is generated seems to be a missing alignment check in the
> function output_move_double.  But when that is fixed, it turns out that if the parameter happens
> to be 8-byte aligned by chance, they still have MEM_ALIGN = 4, which prevents the ldrd completely.
> 
> The reason for that is in function.c (assign_parm_find_stack_rtl), where values that happen to be
> aligned to STACK_BOUNDARY, are only  aligned to PARM_BOUNDARY.
> 
> 
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf with all languages.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 
> 
> patch-arm-align-abi.diff
> 
> 2019-02-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 
> 	* config/arm/arm.c (output_move_double): Check required memory
> 	alignment for ldrd/strd instructions.
> 	* function.c (assign_parm_find_stack_rtl): Use larger alignment
> 	when possible.
> 
> testsuite:
> 2019-02-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 
> 	* gcc.target/arm/unaligned-argument-1.c: New test.
> 	* gcc.target/arm/unaligned-argument-2.c: New test.
> 
> Index: gcc/config/arm/arm.c
> ===================================================================
> --- gcc/config/arm/arm.c	(revision 268337)
> +++ gcc/config/arm/arm.c	(working copy)
> @@ -18303,6 +18303,8 @@ output_move_double (rtx *operands, bool emit, int
>        otherops[0] = gen_rtx_REG (SImode, 1 + reg0);
>  
>        gcc_assert (code1 == MEM);  /* Constraints should ensure this.  */
> +      bool allow_ldrd = TARGET_LDRD
> +			&& align_ok_ldrd_strd (MEM_ALIGN (operands[1]), 0);
>  
>        switch (GET_CODE (XEXP (operands[1], 0)))
>  	{
> @@ -18310,8 +18312,8 @@ output_move_double (rtx *operands, bool emit, int
>  
>  	  if (emit)
>  	    {
> -	      if (TARGET_LDRD
> -		  && !(fix_cm3_ldrd && reg0 == REGNO(XEXP (operands[1], 0))))
> +	      if (allow_ldrd
> +		  && !(fix_cm3_ldrd && reg0 == REGNO (XEXP (operands[1], 0))))
>  		output_asm_insn ("ldrd%?\t%0, [%m1]", operands);
>  	      else
>  		output_asm_insn ("ldmia%?\t%m1, %M0", operands);
> @@ -18319,7 +18321,7 @@ output_move_double (rtx *operands, bool emit, int
>  	  break;
>  
>  	case PRE_INC:
> -	  gcc_assert (TARGET_LDRD);
> +	  gcc_assert (allow_ldrd);
>  	  if (emit)
>  	    output_asm_insn ("ldrd%?\t%0, [%m1, #8]!", operands);
>  	  break;
> @@ -18327,7 +18329,7 @@ output_move_double (rtx *operands, bool emit, int
>  	case PRE_DEC:
>  	  if (emit)
>  	    {
> -	      if (TARGET_LDRD)
> +	      if (allow_ldrd)
>  		output_asm_insn ("ldrd%?\t%0, [%m1, #-8]!", operands);
>  	      else
>  		output_asm_insn ("ldmdb%?\t%m1!, %M0", operands);
> @@ -18337,7 +18339,7 @@ output_move_double (rtx *operands, bool emit, int
>  	case POST_INC:
>  	  if (emit)
>  	    {
> -	      if (TARGET_LDRD)
> +	      if (allow_ldrd)
>  		output_asm_insn ("ldrd%?\t%0, [%m1], #8", operands);
>  	      else
>  		output_asm_insn ("ldmia%?\t%m1!, %M0", operands);
> @@ -18345,7 +18347,7 @@ output_move_double (rtx *operands, bool emit, int
>  	  break;
>  
>  	case POST_DEC:
> -	  gcc_assert (TARGET_LDRD);
> +	  gcc_assert (allow_ldrd);
>  	  if (emit)
>  	    output_asm_insn ("ldrd%?\t%0, [%m1], #-8", operands);
>  	  break;
> @@ -18483,7 +18485,7 @@ output_move_double (rtx *operands, bool emit, int
>  		    }
>  		  otherops[0] = gen_rtx_REG(SImode, REGNO(operands[0]) + 1);
>  		  operands[1] = otherops[0];
> -		  if (TARGET_LDRD
> +		  if (allow_ldrd
>  		      && (REG_P (otherops[2])
>  			  || TARGET_THUMB2
>  			  || (CONST_INT_P (otherops[2])
> @@ -18544,7 +18546,7 @@ output_move_double (rtx *operands, bool emit, int
>  	      if (count)
>  		*count = 2;
>  
> -	      if (TARGET_LDRD)
> +	      if (allow_ldrd)
>  		return "ldrd%?\t%0, [%1]";
>  
>  	      return "ldmia%?\t%1, %M0";
> @@ -18589,7 +18591,8 @@ output_move_double (rtx *operands, bool emit, int
>  	 values but user assembly constraints can force an odd
>  	 starting register.  */
>        bool allow_strd = TARGET_LDRD
> -			 && !(TARGET_ARM && (REGNO (operands[1]) & 1) == 1);
> +			 && !(TARGET_ARM && (REGNO (operands[1]) & 1) == 1)
> +			 && align_ok_ldrd_strd (MEM_ALIGN (operands[0]), 0);
>        switch (GET_CODE (XEXP (operands[0], 0)))
>          {
>  	case REG:
> Index: gcc/function.c
> ===================================================================
> --- gcc/function.c	(revision 268337)
> +++ gcc/function.c	(working copy)
> @@ -2698,8 +2698,20 @@ assign_parm_find_stack_rtl (tree parm, struct assi
>       intentionally forcing upward padding.  Otherwise we have to come
>       up with a guess at the alignment based on OFFSET_RTX.  */
>    poly_int64 offset;
> -  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
> +  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
>      align = boundary;
> +  else if (data->locate.where_pad == PAD_UPWARD)
> +    {
> +      align = boundary;
> +      if (poly_int_rtx_p (offset_rtx, &offset)
> +	  && STACK_POINTER_OFFSET == 0)
> +	{
> +	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
> +	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
> +	    offset_align = STACK_BOUNDARY;
> +	  align = MAX (align, offset_align);
> +	}
> +    }
>    else if (poly_int_rtx_p (offset_rtx, &offset))
>      {
>        align = least_bit_hwi (boundary);
> Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c
> ===================================================================
> --- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(revision 0)
> +++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(working copy)
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
> +
> +struct s {
> +  int a, b;
> +} __attribute__((aligned(8)));
> +
> +struct s f0;
> +
> +void f(int a, int b, int c, int d, struct s f)
> +{
> +  /* f is on a 64 bit aligned stack slot, thus ldrd OK.  */
> +  f0 = f;
> +}
> +
> +/* { dg-final { scan-assembler-times "ldrd" 1 } } */
> +/* { dg-final { scan-assembler-times "strd" 1 } } */
> Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
> ===================================================================
> --- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(revision 0)
> +++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(working copy)
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
> +
> +struct s {
> +  int a, b;
> +} __attribute__((aligned(8)));
> +
> +struct s f0;
> +
> +void f(int a, int b, int c, int d, int e, struct s f)
> +{
> +  /* f is on a 32 bit aligned stack slot, thus no ldrd.  */
> +  f0 = f;
> +}
> +
> +/* { dg-final { scan-assembler-times "ldrd" 0 } } */
> +/* { dg-final { scan-assembler-times "strd" 1 } } */
> 

As explained on our other thread, I think this is papering over a deeper
problem.  Put simply,

(mem/c:DI (plus:SI (reg/f:SI 104 virtual-incoming-args)
        (const_int 4 [0x4])) [1 f+0 S8 A32])

is not valid RTL on a strict alignment target (DImode with A32) and
should never be generated.

I've created PR89544.

R.
diff mbox series

Patch

2019-02-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* config/arm/arm.c (output_move_double): Check required memory
	alignment for ldrd/strd instructions.
	* function.c (assign_parm_find_stack_rtl): Use larger alignment
	when possible.

testsuite:
2019-02-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* gcc.target/arm/unaligned-argument-1.c: New test.
	* gcc.target/arm/unaligned-argument-2.c: New test.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 268337)
+++ gcc/config/arm/arm.c	(working copy)
@@ -18303,6 +18303,8 @@  output_move_double (rtx *operands, bool emit, int
       otherops[0] = gen_rtx_REG (SImode, 1 + reg0);
 
       gcc_assert (code1 == MEM);  /* Constraints should ensure this.  */
+      bool allow_ldrd = TARGET_LDRD
+			&& align_ok_ldrd_strd (MEM_ALIGN (operands[1]), 0);
 
       switch (GET_CODE (XEXP (operands[1], 0)))
 	{
@@ -18310,8 +18312,8 @@  output_move_double (rtx *operands, bool emit, int
 
 	  if (emit)
 	    {
-	      if (TARGET_LDRD
-		  && !(fix_cm3_ldrd && reg0 == REGNO(XEXP (operands[1], 0))))
+	      if (allow_ldrd
+		  && !(fix_cm3_ldrd && reg0 == REGNO (XEXP (operands[1], 0))))
 		output_asm_insn ("ldrd%?\t%0, [%m1]", operands);
 	      else
 		output_asm_insn ("ldmia%?\t%m1, %M0", operands);
@@ -18319,7 +18321,7 @@  output_move_double (rtx *operands, bool emit, int
 	  break;
 
 	case PRE_INC:
-	  gcc_assert (TARGET_LDRD);
+	  gcc_assert (allow_ldrd);
 	  if (emit)
 	    output_asm_insn ("ldrd%?\t%0, [%m1, #8]!", operands);
 	  break;
@@ -18327,7 +18329,7 @@  output_move_double (rtx *operands, bool emit, int
 	case PRE_DEC:
 	  if (emit)
 	    {
-	      if (TARGET_LDRD)
+	      if (allow_ldrd)
 		output_asm_insn ("ldrd%?\t%0, [%m1, #-8]!", operands);
 	      else
 		output_asm_insn ("ldmdb%?\t%m1!, %M0", operands);
@@ -18337,7 +18339,7 @@  output_move_double (rtx *operands, bool emit, int
 	case POST_INC:
 	  if (emit)
 	    {
-	      if (TARGET_LDRD)
+	      if (allow_ldrd)
 		output_asm_insn ("ldrd%?\t%0, [%m1], #8", operands);
 	      else
 		output_asm_insn ("ldmia%?\t%m1!, %M0", operands);
@@ -18345,7 +18347,7 @@  output_move_double (rtx *operands, bool emit, int
 	  break;
 
 	case POST_DEC:
-	  gcc_assert (TARGET_LDRD);
+	  gcc_assert (allow_ldrd);
 	  if (emit)
 	    output_asm_insn ("ldrd%?\t%0, [%m1], #-8", operands);
 	  break;
@@ -18483,7 +18485,7 @@  output_move_double (rtx *operands, bool emit, int
 		    }
 		  otherops[0] = gen_rtx_REG(SImode, REGNO(operands[0]) + 1);
 		  operands[1] = otherops[0];
-		  if (TARGET_LDRD
+		  if (allow_ldrd
 		      && (REG_P (otherops[2])
 			  || TARGET_THUMB2
 			  || (CONST_INT_P (otherops[2])
@@ -18544,7 +18546,7 @@  output_move_double (rtx *operands, bool emit, int
 	      if (count)
 		*count = 2;
 
-	      if (TARGET_LDRD)
+	      if (allow_ldrd)
 		return "ldrd%?\t%0, [%1]";
 
 	      return "ldmia%?\t%1, %M0";
@@ -18589,7 +18591,8 @@  output_move_double (rtx *operands, bool emit, int
 	 values but user assembly constraints can force an odd
 	 starting register.  */
       bool allow_strd = TARGET_LDRD
-			 && !(TARGET_ARM && (REGNO (operands[1]) & 1) == 1);
+			 && !(TARGET_ARM && (REGNO (operands[1]) & 1) == 1)
+			 && align_ok_ldrd_strd (MEM_ALIGN (operands[0]), 0);
       switch (GET_CODE (XEXP (operands[0], 0)))
         {
 	case REG:
Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 268337)
+++ gcc/function.c	(working copy)
@@ -2698,8 +2698,20 @@  assign_parm_find_stack_rtl (tree parm, struct assi
      intentionally forcing upward padding.  Otherwise we have to come
      up with a guess at the alignment based on OFFSET_RTX.  */
   poly_int64 offset;
-  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
+  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
     align = boundary;
+  else if (data->locate.where_pad == PAD_UPWARD)
+    {
+      align = boundary;
+      if (poly_int_rtx_p (offset_rtx, &offset)
+	  && STACK_POINTER_OFFSET == 0)
+	{
+	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
+	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
+	    offset_align = STACK_BOUNDARY;
+	  align = MAX (align, offset_align);
+	}
+    }
   else if (poly_int_rtx_p (offset_rtx, &offset))
     {
       align = least_bit_hwi (boundary);
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(working copy)
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, struct s f)
+{
+  /* f is on a 64 bit aligned stack slot, thus ldrd OK.  */
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(working copy)
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  /* f is on a 32 bit aligned stack slot, thus no ldrd.  */
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */