diff mbox series

[PATCHv2] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

Message ID AM6PR07MB4037775DF79E0229DCCA425AE44F0@AM6PR07MB4037.eurprd07.prod.outlook.com
State New
Headers show
Series [PATCHv2] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544) | expand

Commit Message

Bernd Edlinger March 10, 2019, 9:42 a.m. UTC
Hi,

This patch is an update to the previous patch, which takes into account that
the middle-end is not supposed to use the unaligned DI value directly which
was passed in an unaligned stack slot due to the AAPCS parameter passing rules.

The patch works by changing use_register_for_decl to return false if the
incoming RTL is not sufficiently aligned on a STRICT_ALIGNMENT target,
as if the address of the parameter was taken (which is TREE_ADDRESSABLE).
So not taking the address of the parameter is a necessary condition
for the wrong-code in PR 89544.

It works together with this check in assign_parm_adjust_stack_rtl:
  /* If we can't trust the parm stack slot to be aligned enough for its
     ultimate type, don't use that slot after entry.  We'll make another
     stack slot, if we need one.  */
  if (stack_parm
      && ((STRICT_ALIGNMENT
           && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
...
    stack_param = NULL

This makes assign_parms use assign_parm_setup_stack instead of
assign_parm_setup_reg, and the latter does assign a suitably aligned
stack slot, because stack_param == NULL by now, and uses emit_block_move
which avoids the issue with the back-end.

Additionally, to prevent unnecessary performance regressions,
assign_parm_find_stack_rtl is enhanced to make use of a possible larger
alignment if the parameter was passed in an aligned stack slot without
the ABI requiring it.


Bootstrapped and reg-tested with all languages on arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.

Comments

Bernd Edlinger March 19, 2019, 1:27 p.m. UTC | #1
Hi,

I'd like to ping for this patch:
https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00438.html

Thanks
Bernd.


On 3/10/19 10:42 AM, Bernd Edlinger wrote:
> Hi,
> 
> This patch is an update to the previous patch, which takes into account that
> the middle-end is not supposed to use the unaligned DI value directly which
> was passed in an unaligned stack slot due to the AAPCS parameter passing rules.
> 
> The patch works by changing use_register_for_decl to return false if the
> incoming RTL is not sufficiently aligned on a STRICT_ALIGNMENT target,
> as if the address of the parameter was taken (which is TREE_ADDRESSABLE).
> So not taking the address of the parameter is a necessary condition
> for the wrong-code in PR 89544.
> 
> It works together with this check in assign_parm_adjust_stack_rtl:
>   /* If we can't trust the parm stack slot to be aligned enough for its
>      ultimate type, don't use that slot after entry.  We'll make another
>      stack slot, if we need one.  */
>   if (stack_parm
>       && ((STRICT_ALIGNMENT
>            && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
> ...
>     stack_param = NULL
> 
> This makes assign_parms use assign_parm_setup_stack instead of
> assign_parm_setup_reg, and the latter does assign a suitably aligned
> stack slot, because stack_param == NULL by now, and uses emit_block_move
> which avoids the issue with the back-end.
> 
> Additionally, to prevent unnecessary performance regressions,
> assign_parm_find_stack_rtl is enhanced to make use of a possible larger
> alignment if the parameter was passed in an aligned stack slot without
> the ABI requiring it.
> 
> 
> Bootstrapped and reg-tested with all languages on arm-linux-gnueabihf.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
>
Richard Biener March 21, 2019, 11:15 a.m. UTC | #2
On Sun, 10 Mar 2019, Bernd Edlinger wrote:

> Hi,
> 
> This patch is an update to the previous patch, which takes into account that
> the middle-end is not supposed to use the unaligned DI value directly which
> was passed in an unaligned stack slot due to the AAPCS parameter passing rules.
> 
> The patch works by changing use_register_for_decl to return false if the
> incoming RTL is not sufficiently aligned on a STRICT_ALIGNMENT target,
> as if the address of the parameter was taken (which is TREE_ADDRESSABLE).
> So not taking the address of the parameter is a necessary condition
> for the wrong-code in PR 89544.
> 
> It works together with this check in assign_parm_adjust_stack_rtl:
>   /* If we can't trust the parm stack slot to be aligned enough for its
>      ultimate type, don't use that slot after entry.  We'll make another
>      stack slot, if we need one.  */
>   if (stack_parm
>       && ((STRICT_ALIGNMENT
>            && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
> ...
>     stack_param = NULL
> 
> This makes assign_parms use assign_parm_setup_stack instead of
> assign_parm_setup_reg, and the latter does assign a suitably aligned
> stack slot, because stack_param == NULL by now, and uses emit_block_move
> which avoids the issue with the back-end.
> 
> Additionally, to prevent unnecessary performance regressions,
> assign_parm_find_stack_rtl is enhanced to make use of a possible larger
> alignment if the parameter was passed in an aligned stack slot without
> the ABI requiring it.
> 
> 
> Bootstrapped and reg-tested with all languages on arm-linux-gnueabihf.
> Is it OK for trunk?

I think the assign_parm_find_stack_rtl is not appropriate at this stage,
I am also missing an update to the comment of the block you change.
It also changes code I'm not familar enough with to review...

Finally...

Index: gcc/function.c
===================================================================
--- gcc/function.c      (revision 269264)
+++ gcc/function.c      (working copy)
@@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
   if (DECL_MODE (decl) == BLKmode)
     return false;

+  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
+      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
+      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
+        > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
+    return false;
+
   /* If -ffloat-store specified, don't put explicit float variables
      into registers.  */
   /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa

I wonder if it is necessary to look at DECL_INCOMING_RTL here
and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
doesn't exist then shouldn't we return false for safety reasons?

Similarly the very same issue should exist on x86_64 which is
!STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
alignment on the caller side.  So the STRICT_ALIGNMENT check is
a wrong one.

Which makes me think that a proper fix is not here, but in
target(hook) code.

Changing use_register_for_decl sounds odd anyways since if we return true
we for the testcase still end up in memory, no?

The hunk obviously misses a comment since the effect that this
will cause a copy to be emitted isn't obvious (and relying on
this probably fragile).

Richard.
Bernd Edlinger March 22, 2019, 5:44 p.m. UTC | #3
On 3/21/19 12:15 PM, Richard Biener wrote:
> On Sun, 10 Mar 2019, Bernd Edlinger wrote:
> Finally...
> 
> Index: gcc/function.c
> ===================================================================
> --- gcc/function.c      (revision 269264)
> +++ gcc/function.c      (working copy)
> @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
>    if (DECL_MODE (decl) == BLKmode)
>      return false;
> 
> +  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
> +      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
> +      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
> +        > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
> +    return false;
> +
>    /* If -ffloat-store specified, don't put explicit float variables
>       into registers.  */
>    /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
> 
> I wonder if it is necessary to look at DECL_INCOMING_RTL here
> and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
> doesn't exist then shouldn't we return false for safety reasons?
> 

I think that happens a few times already before the INCOMING_RTL
is assigned.  I thought that might be too pessimistic.

> Similarly the very same issue should exist on x86_64 which is
> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
> alignment on the caller side.  So the STRICT_ALIGNMENT check is
> a wrong one.
> 

I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
is always 32-bit align on the DImode memory.  The x86_64 vector instructions
would look at MEM_ALIGN and do the right thing, yes?

It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
does not even have to look at MEM_ALIGN except in the mov_misalign_optab,
right?

The other hunk, where I admit I did not fully understand the comment, tries
only to increase the MEM_ALIGN to 64-bit if the stack slot is
64-bit aligned although the target said it only needs 32-bit alignment.
So that it is no longer necessary to copy the incoming value.


> Which makes me think that a proper fix is not here, but in
> target(hook) code.
> 
> Changing use_register_for_decl sounds odd anyways since if we return true
> we for the testcase still end up in memory, no?
> 

It seems to make us use the incoming register _or_ stack slot if this function
returns true here.

If it returns false here, a new stack slot is allocated, but only if the
original stack slot was not aligned.  This works together with the
other STRICT_ALIGNMENT check in assign_parm_adjust_stack_rtl.
Where also for !STRICT_ALIGNMENT target TYPE_ALIGN and MEM_ALIGN
are checked, but this seems to have only an effect if an address
is taken, in that case I see use_register_for_decl return false
due to TREE_ADDRESSABLE (decl), and whoops, we have an aligned copy
of the unaligned stack slot.

So I believe that there was already a fix for unaligned stack positions,
that relied on the addressability of the parameter, while the target
relied on the 8-byte alignment of the DImode access.

> The hunk obviously misses a comment since the effect that this
> will cause a copy to be emitted isn't obvious (and relying on
> this probably fragile).
> 

Yes, also that the copy is done using movmisalign optab is important.


Thanks
Bernd.
Richard Biener March 25, 2019, 8:41 a.m. UTC | #4
On Fri, 22 Mar 2019, Bernd Edlinger wrote:

> On 3/21/19 12:15 PM, Richard Biener wrote:
> > On Sun, 10 Mar 2019, Bernd Edlinger wrote:
> > Finally...
> > 
> > Index: gcc/function.c
> > ===================================================================
> > --- gcc/function.c      (revision 269264)
> > +++ gcc/function.c      (working copy)
> > @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
> >    if (DECL_MODE (decl) == BLKmode)
> >      return false;
> > 
> > +  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
> > +      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
> > +      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
> > +        > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
> > +    return false;
> > +
> >    /* If -ffloat-store specified, don't put explicit float variables
> >       into registers.  */
> >    /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
> > 
> > I wonder if it is necessary to look at DECL_INCOMING_RTL here
> > and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
> > doesn't exist then shouldn't we return false for safety reasons?
> > 
> 
> I think that happens a few times already before the INCOMING_RTL
> is assigned.  I thought that might be too pessimistic.
> 
> > Similarly the very same issue should exist on x86_64 which is
> > !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
> > alignment on the caller side.  So the STRICT_ALIGNMENT check is
> > a wrong one.
> > 
> 
> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
> is always 32-bit align on the DImode memory.  The x86_64 vector instructions
> would look at MEM_ALIGN and do the right thing, yes?

No, they need to use the movmisalign optab and end up with UNSPECs
for example.

> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
> does not even have to look at MEM_ALIGN except in the mov_misalign_optab,
> right?

Yes, I think we never losened that.  Note that RTL expansion has to
fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
specifies that x86 is strict-align wrt vector modes.

> The other hunk, where I admit I did not fully understand the comment, tries
> only to increase the MEM_ALIGN to 64-bit if the stack slot is
> 64-bit aligned although the target said it only needs 32-bit alignment.
> So that it is no longer necessary to copy the incoming value.
> 
> 
> > Which makes me think that a proper fix is not here, but in
> > target(hook) code.
> > 
> > Changing use_register_for_decl sounds odd anyways since if we return true
> > we for the testcase still end up in memory, no?
> > 
> 
> It seems to make us use the incoming register _or_ stack slot if this function
> returns true here.
>
> If it returns false here, a new stack slot is allocated, but only if the
> original stack slot was not aligned.  This works together with the
> other STRICT_ALIGNMENT check in assign_parm_adjust_stack_rtl.

Yes, I understood this - but then the check should be in that code
deciding whether to copy, not in use_register_for_decl?

> Where also for !STRICT_ALIGNMENT target TYPE_ALIGN and MEM_ALIGN
> are checked, but this seems to have only an effect if an address
> is taken, in that case I see use_register_for_decl return false
> due to TREE_ADDRESSABLE (decl), and whoops, we have an aligned copy
> of the unaligned stack slot.
> 
> So I believe that there was already a fix for unaligned stack positions,
> that relied on the addressability of the parameter, while the target
> relied on the 8-byte alignment of the DImode access.
> 
> > The hunk obviously misses a comment since the effect that this
> > will cause a copy to be emitted isn't obvious (and relying on
> > this probably fragile).
> > 
> 
> Yes, also that the copy is done using movmisalign optab is important.
> 
> 
> Thanks
> Bernd.
>
diff mbox series

Patch

2019-03-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* function.c (use_register_for_decl): Avoid using unaligned stack
	values on strict alignment targets.
	(assign_parm_find_stack_rtl): Use larger alignment when possible.

testsuite:
2019-03-05  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* gcc.target/arm/unaligned-argument-1.c: New test.
	* gcc.target/arm/unaligned-argument-2.c: New test.

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 269264)
+++ gcc/function.c	(working copy)
@@ -2210,6 +2210,12 @@  use_register_for_decl (const_tree decl)
   if (DECL_MODE (decl) == BLKmode)
     return false;
 
+  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
+      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
+      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
+	 > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
+    return false;
+
   /* If -ffloat-store specified, don't put explicit float variables
      into registers.  */
   /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
@@ -2698,8 +2704,20 @@  assign_parm_find_stack_rtl (tree parm, struct assi
      intentionally forcing upward padding.  Otherwise we have to come
      up with a guess at the alignment based on OFFSET_RTX.  */
   poly_int64 offset;
-  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
+  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
     align = boundary;
+  else if (data->locate.where_pad == PAD_UPWARD)
+    {
+      align = boundary;
+      if (poly_int_rtx_p (offset_rtx, &offset)
+	  && STACK_POINTER_OFFSET == 0)
+	{
+	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
+	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
+	    offset_align = STACK_BOUNDARY;
+	  align = MAX (align, offset_align);
+	}
+    }
   else if (poly_int_rtx_p (offset_rtx, &offset))
     {
       align = least_bit_hwi (boundary);
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(working copy)
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "stmdb" 0 } } */
+/* { dg-final { scan-assembler-times "ldrd\[^\\n\]*\\\[sp\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(working copy)
@@ -0,0 +1,18 @@ 
+/* { dg-do compile } */
+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "stmdb" 1 } } */
+/* { dg-final { scan-assembler-times "ldrd\[^\\n\]*\\\[sp\\\]" 1 } } */
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */