
[PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

Message ID AM6PR10MB256664D731C3CC92F2FBEDC5E4DC0@AM6PR10MB2566.EURPRD10.PROD.OUTLOOK.COM
State New
Series [PATCHv3] Fix not 8-byte aligned ldrd/strd on ARMv5 (PR 89544)

Commit Message

Bernd Edlinger July 30, 2019, 8:51 p.m. UTC
Hi Richard,

it has been a while, but I had not found time to continue
with this patch until now.

I think I now have a better solution, which properly addresses your
comments below.

On 3/25/19 9:41 AM, Richard Biener wrote:
> On Fri, 22 Mar 2019, Bernd Edlinger wrote:
> 
>> On 3/21/19 12:15 PM, Richard Biener wrote:
>>> On Sun, 10 Mar 2019, Bernd Edlinger wrote:
>>> Finally...
>>>
>>> Index: gcc/function.c
>>> ===================================================================
>>> --- gcc/function.c      (revision 269264)
>>> +++ gcc/function.c      (working copy)
>>> @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
>>>    if (DECL_MODE (decl) == BLKmode)
>>>      return false;
>>>
>>> +  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
>>> +      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
>>> +      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
>>> +        > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
>>> +    return false;
>>> +
>>>    /* If -ffloat-store specified, don't put explicit float variables
>>>       into registers.  */
>>>    /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
>>>
>>> I wonder if it is necessary to look at DECL_INCOMING_RTL here
>>> and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
>>> doesn't exist then shouldn't we return false for safety reasons?
>>>

You are right: it is not possible to return different results from
use_register_for_decl before vs. after the incoming RTL is assigned.
Doing so hits an assertion in set_rtl.

This hunk is gone now; instead I changed assign_parm_setup_reg
to use the movmisalign optab and/or extract_bit_field if a misaligned
entry_parm is to be assigned to a register.

I have no test coverage for the movmisalign optab though, so I
rely on your code review for that part.

>>> Similarly the very same issue should exist on x86_64 which is
>>> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
>>> alignment on the caller side.  So the STRICT_ALIGNMENT check is
>>> a wrong one.
>>>
>>
>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
>> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
>> is always 32-bit align on the DImode memory.  The x86_64 vector instructions
>> would look at MEM_ALIGN and do the right thing, yes?
> 
> No, they need to use the movmisalign optab and end up with UNSPECs
> for example.
Ah, thanks, now I see.

>> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
>> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
>> does not even have to look at MEM_ALIGN except in the mov_misalign_optab,
>> right?
> 
> Yes, I think we never loosened that.  Note that RTL expansion has to
> fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
> specifies that x86 is strict-align wrt vector modes.
> 

Yes, I agree: the code would be incorrect for x86 as well when the movmisalign_optab
is not used.  So I invoke the movmisalign optab if available, and if not fall
back to extract_bit_field.  As in assign_parm_setup_stack, assign_parm_setup_reg
assumes that data->promoted_mode != data->nominal_mode does not happen with
misaligned stack slots.


Attached is the v3 of my patch.

Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.

Is it OK for trunk?


Thanks
Bernd.

Comments

Richard Earnshaw (lists) July 31, 2019, 1:16 p.m. UTC | #1
On 30/07/2019 21:51, Bernd Edlinger wrote:
> +/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */

This isn't going to work as-is: we test many combinations of the
compiler, either with explicit dejagnu settings or with the compiler
defaults, and the dejagnu settings can't generally be overridden this way.

For -marm you require an effective-target of arm_arm_ok.  For ldrd, it 
should be enough to just require an effective-target of 
arm_ldrd_strd_ok, then you can .

I don't think we really care about any ABIs other than aapcs, so I'd 
just leave that off.  And as for setting the float-abi, I don't see 
anything in the tests that would require that, so that can probably be 
omitted as well.

I think with all this, you can then write something like

/* { dg-require-effective-target arm_arm_ok && arm_ldrd_strd_ok } */
/* { dg-options "-marm -mno-unaligned-access -O3" } */

But I haven't tested that, so you might need to fiddle with it a bit, 
especially the effective-target rule.

R.
Bernd Edlinger Aug. 1, 2019, 11:19 a.m. UTC | #2
On 7/31/19 3:16 PM, Richard Earnshaw (lists) wrote:
> 
> 
> On 30/07/2019 21:51, Bernd Edlinger wrote:
>> +/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
> 
> This isn't going to work as-is, we test many combinations of the compiler, either with explicit dejagnu settings or with the compiler defaults and the dejagnu settings can't generally be overridden this way.
> 
> For -marm you require an effective-target of arm_arm_ok.  For ldrd, it should be enough to just require an effective-target of arm_ldrd_strd_ok, then you can .
> 
> I don't think we really care about any ABIs other than aapcs, so I'd just leave that off.  And as for setting the float-abi, I don't see anything in the tests that would require that, so that can probably be omitted as well.
> 
> I think with all this, you can then write something like
> 
> /* { dg-require-effective-target arm_arm_ok && arm_ldrd_strd_ok } */
> /* { dg-options "-marm -mno-unaligned-access -O3" } */
> 
> But I haven't tested that, so you might need to fiddle with it a bit, especially the effective-target rule.
> 

Okay, it seems we need two dg-require-effective-target rules for this to work,
as in the attached new version of the patch which I am currently boot-strapping.
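For reference, the resulting test header would then presumably look like this (two separate dg-require-effective-target lines; exact spelling not verified against the committed tests):

```c
/* { dg-require-effective-target arm_arm_ok } */
/* { dg-require-effective-target arm_ldrd_strd_ok } */
/* { dg-options "-marm -mno-unaligned-access -O3" } */
```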

Is it OK for trunk after successful boot-strap and reg-testing?


Thanks
Bernd.
Richard Earnshaw (lists) Aug. 2, 2019, 9:10 a.m. UTC | #3
On 01/08/2019 12:19, Bernd Edlinger wrote:
> On 7/31/19 3:16 PM, Richard Earnshaw (lists) wrote:
>>
>>
>> On 30/07/2019 21:51, Bernd Edlinger wrote:
>>> +/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
>>
>> This isn't going to work as-is, we test many combinations of the compiler, either with explicit dejagnu settings or with the compiler defaults and the dejagnu settings can't generally be overridden this way.
>>
>> For -marm you require an effective-target of arm_arm_ok.  For ldrd, it should be enough to just require an effective-target of arm_ldrd_strd_ok, then you can .
>>
>> I don't think we really care about any ABIs other than aapcs, so I'd just leave that off.  And as for setting the float-abi, I don't see anything in the tests that would require that, so that can probably be omitted as well.
>>
>> I think with all this, you can then write something like
>>
>> /* { dg-require-effective-target arm_arm_ok && arm_ldrd_strd_ok } */
>> /* { dg-options "-marm -mno-unaligned-access -O3" } */
>>
>> But I haven't tested that, so you might need to fiddle with it a bit, especially the effective-target rule.
>>
> 
> Okay, it seems we need two dg-require-effective-target rules for this to work,
> as in the attached new version of the patch which I am currently boot-strapping.
> 
> Is it OK for trunk after successful boot-strap and reg-testing?
> 

The tests are OK.  If the match rules for the stm instruction turn out
to cause problems, I think we can just drop them without materially
weakening the tests.  But let's wait and see on that.
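If it helps later readers: the "match rules" in question are dejagnu scan-assembler directives, something of this shape (hypothetical example; the committed tests may use a different pattern):

```c
/* { dg-final { scan-assembler-times "stm" 1 } } */
```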

I'll leave the mid-end bits to Richi, I'm not familiar with that code.

R.

> 
> Thanks
> Bernd.
>
Richard Biener Aug. 2, 2019, 1:11 p.m. UTC | #4
On Tue, 30 Jul 2019, Bernd Edlinger wrote:

> Hi Richard,
> 
> it is already a while ago, but I had not found time to continue
> with this patch until now.
> 
> I think I have now a better solution, which properly addresses your
> comments below.
> 
> On 3/25/19 9:41 AM, Richard Biener wrote:
> > On Fri, 22 Mar 2019, Bernd Edlinger wrote:
> > 
> >> On 3/21/19 12:15 PM, Richard Biener wrote:
> >>> On Sun, 10 Mar 2019, Bernd Edlinger wrote:
> >>> Finally...
> >>>
> >>> Index: gcc/function.c
> >>> ===================================================================
> >>> --- gcc/function.c      (revision 269264)
> >>> +++ gcc/function.c      (working copy)
> >>> @@ -2210,6 +2210,12 @@ use_register_for_decl (const_tree decl)
> >>>    if (DECL_MODE (decl) == BLKmode)
> >>>      return false;
> >>>
> >>> +  if (STRICT_ALIGNMENT && TREE_CODE (decl) == PARM_DECL
> >>> +      && DECL_INCOMING_RTL (decl) && MEM_P (DECL_INCOMING_RTL (decl))
> >>> +      && GET_MODE_ALIGNMENT (DECL_MODE (decl))
> >>> +        > MEM_ALIGN (DECL_INCOMING_RTL (decl)))
> >>> +    return false;
> >>> +
> >>>    /* If -ffloat-store specified, don't put explicit float variables
> >>>       into registers.  */
> >>>    /* ??? This should be checked after DECL_ARTIFICIAL, but tree-ssa
> >>>
> >>> I wonder if it is necessary to look at DECL_INCOMING_RTL here
> >>> and why such RTL may not exist?  That is, iff DECL_INCOMING_RTL
> >>> doesn't exist then shouldn't we return false for safety reasons?
> >>>
> 
> You are right: it is not possible to return different results from
> use_register_for_decl before vs. after the incoming RTL is assigned.
> Doing so hits an assertion in set_rtl.
> 
> This hunk is gone now, instead I changed assign_parm_setup_reg
> to use movmisalign optab and/or extract_bit_field if misaligned
> entry_parm is to be assigned in a register.
> 
> I have no test coverage for the movmisalign optab though, so I
> rely on your code review for that part.

It looks OK.  I tried to make it trigger on the following on
i?86 with -msse2:

typedef int v4si __attribute__((vector_size (16)));

struct S { v4si v; } __attribute__((packed));

v4si foo (struct S s)
{
  return s.v;
}

but nowadays x86 seems to be happy with regular moves operating on
unaligned memory, using unaligned moves where necessary.

(insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ])
        (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11 1229 
{movv4si_internal}
     (nil))

and with GCC 4.8 we ended up with the following expansion which is
also correct.

(insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0)
        (unspec:V16QI [
                (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32])
            ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu}
     (nil))

So it seems it has been too long and I don't remember what is
special with arm that it doesn't work...  it possibly simply
trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN which
I think is OK-ish?

> >>> Similarly the very same issue should exist on x86_64 which is
> >>> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
> >>> alignment on the caller side.  So the STRICT_ALIGNMENT check is
> >>> a wrong one.
> >>>
> >>
> >> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
> >> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
> >> is always 32-bit align on the DImode memory.  The x86_64 vector instructions
> >> would look at MEM_ALIGN and do the right thing, yes?
> > 
> > No, they need to use the movmisalign optab and end up with UNSPECs
> > for example.
> Ah, thanks, now I see.
> 
> >> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
> >> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
> >> does not even have to look at MEM_ALIGN except in the mov_misalign_optab,
> >> right?
> > 
> > Yes, I think we never loosened that.  Note that RTL expansion has to
> > fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
> > specifies that x86 is strict-align wrt vector modes.
> > 
> 
> Yes I agree, the code would be incorrect for x86 as well when the movmisalign_optab
> is not used.  So I invoke the movmisalign optab if available and if not fall
> back to extract_bit_field.  As in the assign_parm_setup_stack assign_parm_setup_reg
> assumes that data->promoted_mode != data->nominal_mode does not happen with
> misaligned stack slots.
> 
> 
> Attached is the v3 of my patch.
> 
> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
> 
> Is it OK for trunk?

Few comments.

@@ -2274,8 +2274,6 @@ struct assign_parm_data_one
   int partial;
   BOOL_BITFIELD named_arg : 1;
   BOOL_BITFIELD passed_pointer : 1;
-  BOOL_BITFIELD on_stack : 1;
-  BOOL_BITFIELD loaded_in_reg : 1;
 };

 /* A subroutine of assign_parms.  Initialize ALL.  */

independently OK.

@@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
      ultimate type, don't use that slot after entry.  We'll make another
      stack slot, if we need one.  */
   if (stack_parm
-      && ((STRICT_ALIGNMENT
-          && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN 
(stack_parm))
+      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN 
(stack_parm)
+          && targetm.slow_unaligned_access (data->nominal_mode,
+                                            MEM_ALIGN (stack_parm)))
          || (data->nominal_type
              && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
              && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))

looks like something we should have as a separate commit as well.  It
also looks obvious to me.

@@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all

       did_conversion = true;
     }
+  else if (MEM_P (data->entry_parm)
+          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+             > MEM_ALIGN (data->entry_parm)

we arrive here by-passing

  else if (need_conversion)
    {
      /* We did not have an insn to convert directly, or the sequence
         generated appeared unsafe.  We must first copy the parm to a
         pseudo reg, and save the conversion until after all
         parameters have been moved.  */

      int save_tree_used;
      rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));

      emit_move_insn (tempreg, validated_mem);

but this move instruction is invalid in the same way as the case
you fix, no?  So wouldn't it be better to do

  if (moved)
    /* Nothing to do.  */
    ;
  else
    {
       if (unaligned)
         ...
       else
         emit_move_insn (...);

       if (need_conversion)
 ....
    }

?  Hopefully whatever "moved" things in the if (moved) case did
it correctly.

Can you check whether your patch does anything to the x86 testcase
posted above?

I'm not very familiar with this code so I'm leaving actual approval
to somebody else.  Still hope the comments were helpful.

Thanks,
Richard.
Bernd Edlinger Aug. 2, 2019, 7:01 p.m. UTC | #5
On 8/2/19 3:11 PM, Richard Biener wrote:
> On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> 
>>
>> I have no test coverage for the movmisalign optab though, so I
>> rely on your code review for that part.
> 
> It looks OK.  I tried to make it trigger on the following on
> i?86 with -msse2:
> 
> typedef int v4si __attribute__((vector_size (16)));
> 
> struct S { v4si v; } __attribute__((packed));
> 
> v4si foo (struct S s)
> {
>   return s.v;
> }
> 

Hmm, the entry_parm needs to be a MEM_P, and an unaligned one.
So the test case could be made to trigger it this way:

typedef int v4si __attribute__((vector_size (16)));

struct S { v4si v; } __attribute__((packed));

int t;
v4si foo (struct S a, struct S b, struct S c, struct S d,
          struct S e, struct S f, struct S g, struct S h,
          int i, int j, int k, int l, int m, int n,
          int o, struct S s)
{
  t = o;
  return s.v;
}

However, the code path is still not reached, since targetm.slow_unaligned_access
is always FALSE, which is probably a flaw in my patch.

So I think,

+  else if (MEM_P (data->entry_parm)
+          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+             > MEM_ALIGN (data->entry_parm)
+          && targetm.slow_unaligned_access (promoted_nominal_mode,
+                                            MEM_ALIGN (data->entry_parm)))

should probably better be

+  else if (MEM_P (data->entry_parm)
+          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+             > MEM_ALIGN (data->entry_parm)
+        && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode))
+             != CODE_FOR_nothing)
+            || targetm.slow_unaligned_access (promoted_nominal_mode,
+                                              MEM_ALIGN (data->entry_parm))))

Right?

Then the modified test case would use the movmisalign optab.
However nothing changes in the end, since the i386 back-end already works
around the middle end not using the movmisalign optab when it should.

I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand
patterns asserting that the memory is properly aligned?


> but nowadays x86 seems to be happy with regular moves operating on
> unaligned memory, using unaligned moves where necessary.
> 
> (insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ])
>         (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11 1229 
> {movv4si_internal}
>      (nil))
> 
> and with GCC 4.8 we ended up with the following expansion which is
> also correct.
> 
> (insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0)
>         (unspec:V16QI [
>                 (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32])
>             ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu}
>      (nil))
> 
> So it seems it has been too long and I don't remember what is
> special with arm that it doesn't work...  it possibly simply
> trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN which
> I think is OK-ish?
> 

Yes, that is what Richard said as well.

>>>>> Similarly the very same issue should exist on x86_64 which is
>>>>> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
>>>>> alignment on the caller side.  So the STRICT_ALIGNMENT check is
>>>>> a wrong one.
>>>>>
>>>>
>>>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
>>>> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
>>>> is always 32-bit align on the DImode memory.  The x86_64 vector instructions
>>>> would look at MEM_ALIGN and do the right thing, yes?
>>>
>>> No, they need to use the movmisalign optab and end up with UNSPECs
>>> for example.
>> Ah, thanks, now I see.
>>
>>>> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
>>>> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
>>>> does not even have to look at MEM_ALIGN except in the mov_misalign_optab,
>>>> right?
>>>
>>> Yes, I think we never loosened that.  Note that RTL expansion has to
>>> fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
>>> specifies that x86 is strict-align wrt vector modes.
>>>
>>
>> Yes I agree, the code would be incorrect for x86 as well when the movmisalign_optab
>> is not used.  So I invoke the movmisalign optab if available and if not fall
>> back to extract_bit_field.  As in the assign_parm_setup_stack assign_parm_setup_reg
>> assumes that data->promoted_mode != data->nominal_mode does not happen with
>> misaligned stack slots.
>>
>>
>> Attached is the v3 of my patch.
>>
>> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
>>
>> Is it OK for trunk?
> 
> Few comments.
> 
> @@ -2274,8 +2274,6 @@ struct assign_parm_data_one
>    int partial;
>    BOOL_BITFIELD named_arg : 1;
>    BOOL_BITFIELD passed_pointer : 1;
> -  BOOL_BITFIELD on_stack : 1;
> -  BOOL_BITFIELD loaded_in_reg : 1;
>  };
> 
>  /* A subroutine of assign_parms.  Initialize ALL.  */
> 
> independently OK.
> 
> @@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
>       ultimate type, don't use that slot after entry.  We'll make another
>       stack slot, if we need one.  */
>    if (stack_parm
> -      && ((STRICT_ALIGNMENT
> -          && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN 
> (stack_parm))
> +      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN 
> (stack_parm)
> +          && targetm.slow_unaligned_access (data->nominal_mode,
> +                                            MEM_ALIGN (stack_parm)))
>           || (data->nominal_type
>               && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
>               && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
> 
> looks like something we should have as a separate commit as well.  It
> also looks obvious to me.
> 

Okay, committed as two separate commits: r274023 & r274025

> @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> 
>        did_conversion = true;
>      }
> +  else if (MEM_P (data->entry_parm)
> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +             > MEM_ALIGN (data->entry_parm)
> 
> we arrive here by-passing
> 
>   else if (need_conversion)
>     {
>       /* We did not have an insn to convert directly, or the sequence
>          generated appeared unsafe.  We must first copy the parm to a
>          pseudo reg, and save the conversion until after all
>          parameters have been moved.  */
> 
>       int save_tree_used;
>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> 
>       emit_move_insn (tempreg, validated_mem);
> 
> but this move instruction is invalid in the same way as the case
> you fix, no?  So wouldn't it be better to do
> 

We could do that, but I supposed that there must be a reason why
assign_parm_setup_stack gets away with the same:

  if (data->promoted_mode != data->nominal_mode)
    {
      /* Conversion is required.  */
      rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));

      emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));


So either some back-ends are too permissive with us,
or there is a reason why promoted_mode != nominal_mode
does not happen together with an unaligned entry_parm.
Either way, that would be a rather unusual ABI.

>   if (moved)
>     /* Nothing to do.  */
>     ;
>   else
>     {
>        if (unaligned)
>          ...
>        else
>          emit_move_insn (...);
> 
>        if (need_conversion)
>  ....
>     }
> 
> ?  Hopefully whatever "moved" things in the if (moved) case did
> it correctly.
> 

It wouldn't.  It uses gen_extend_insn; would that be expected to
work with unaligned memory?

> Can you check whether your patch does anything to the x86 testcase
> posted above?
> 

Thanks, it might help to have at least a test case where the pattern
is expanded, even if it does not change anything.

> I'm not very familiar with this code so I'm leaving actual approval
> to somebody else.  Still hope the comments were helpful.
> 

Yes they are, thanks a lot.


Thanks
Bernd.
Richard Biener Aug. 14, 2019, 11:16 a.m. UTC | #6
On Fri, 2 Aug 2019, Bernd Edlinger wrote:

> On 8/2/19 3:11 PM, Richard Biener wrote:
> > On Tue, 30 Jul 2019, Bernd Edlinger wrote:
> > 
> >>
> >> I have no test coverage for the movmisalign optab though, so I
> >> rely on your code review for that part.
> > 
> > It looks OK.  I tried to make it trigger on the following on
> > i?86 with -msse2:
> > 
> > typedef int v4si __attribute__((vector_size (16)));
> > 
> > struct S { v4si v; } __attribute__((packed));
> > 
> > v4si foo (struct S s)
> > {
> >   return s.v;
> > }
> > 
> 
> Hmm, the entry_parm needs to be a MEM_P, and an unaligned one.
> So the test case could be made to trigger it this way:
> 
> typedef int v4si __attribute__((vector_size (16)));
> 
> struct S { v4si v; } __attribute__((packed));
> 
> int t;
> v4si foo (struct S a, struct S b, struct S c, struct S d,
>           struct S e, struct S f, struct S g, struct S h,
>           int i, int j, int k, int l, int m, int n,
>           int o, struct S s)
> {
>   t = o;
>   return s.v;
> }
> 
> However, the code path is still not reached, since targetm.slow_unaligned_access
> is always FALSE, which is probably a flaw in my patch.
> 
> So I think,
> 
> +  else if (MEM_P (data->entry_parm)
> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +             > MEM_ALIGN (data->entry_parm)
> +          && targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                            MEM_ALIGN (data->entry_parm)))
> 
> should probably better be
> 
> +  else if (MEM_P (data->entry_parm)
> +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> +             > MEM_ALIGN (data->entry_parm)
> +        && (((icode = optab_handler (movmisalign_optab, promoted_nominal_mode))
> +             != CODE_FOR_nothing)
> +            || targetm.slow_unaligned_access (promoted_nominal_mode,
> +                                              MEM_ALIGN (data->entry_parm))))
> 
> Right?

Ah, yes.  So it's really the presence of a movmisalign optab that makes
it mandatory for unaligned moves, and if it is not present then
targetm.slow_unaligned_access tells whether we need to use the bitfield
extraction/insertion code.

> Then the modified test case would use the movmisalign optab.
> However nothing changes in the end, since the i386 back-end is used to work
> around the middle end not using movmisalign optab when it should do so.

Yeah, in the past it would have failed though.  I wonder if movmisalign
is still needed for x86...

> I wonder if I should try to add a gcc_checking_assert to the mov<mode> expand
> patterns that the memory is properly aligned ?

I suppose gen* could add asserts that there is no movmisalign_optab
that would match when expanding a mov<mode>.  Maybe it's enough
to guard the mov_optab use in emit_move_insn_1 that way?  Or even
try movmisalign there...

> 
> > but nowadays x86 seems to be happy with regular moves operating on
> > unaligned memory, using unaligned moves where necessary.
> > 
> > (insn 5 2 8 2 (set (reg:V4SI 82 [ _2 ])
> >         (mem/c:V4SI (reg/f:SI 16 argp) [2 s.v+0 S16 A32])) "t.c":7:11 1229 
> > {movv4si_internal}
> >      (nil))
> > 
> > and with GCC 4.8 we ended up with the following expansion which is
> > also correct.
> > 
> > (insn 2 4 3 2 (set (subreg:V16QI (reg/v:V4SI 61 [ s ]) 0)
> >         (unspec:V16QI [
> >                 (mem/c:V16QI (reg/f:SI 16 argp) [0 s+0 S16 A32])
> >             ] UNSPEC_LOADU)) t.c:6 1164 {sse2_loaddqu}
> >      (nil))
> > 
> > So it seems it has been too long and I don't remember what is
> > special with arm that it doesn't work...  it possibly simply
> > trusts GET_MODE_ALIGNMENT, never looking at MEM_ALIGN which
> > I think is OK-ish?
> > 
> 
> Yes, that is what Richard said as well.
> 
> >>>>> Similarly the very same issue should exist on x86_64 which is
> >>>>> !STRICT_ALIGNMENT, it's just the ABI seems to provide the appropriate
> >>>>> alignment on the caller side.  So the STRICT_ALIGNMENT check is
> >>>>> a wrong one.
> >>>>>
> >>>>
> >>>> I may be plain wrong here, but I thought that !STRICT_ALIGNMENT targets
> >>>> just use MEM_ALIGN to select the right instructions.  MEM_ALIGN
> >>>> is always 32-bit align on the DImode memory.  The x86_64 vector instructions
> >>>> would look at MEM_ALIGN and do the right thing, yes?
> >>>
> >>> No, they need to use the movmisalign optab and end up with UNSPECs
> >>> for example.
> >> Ah, thanks, now I see.
> >>
> >>>> It seems to be the definition of STRICT_ALIGNMENT targets that all RTL
> >>>> instructions need to have MEM_ALIGN >= GET_MODE_ALIGNMENT, so the target
> >>>> does not even have to look at MEM_ALIGN except in the mov_misalign_optab,
> >>>> right?
> >>>
> >>> Yes, I think we never loosened that.  Note that RTL expansion has to
> >>> fix this up for them.  Note that strictly speaking SLOW_UNALIGNED_ACCESS
> >>> specifies that x86 is strict-align wrt vector modes.
> >>>
> >>
> >> Yes I agree, the code would be incorrect for x86 as well when the movmisalign_optab
> >> is not used.  So I invoke the movmisalign optab if available and if not fall
> >> back to extract_bit_field.  As in the assign_parm_setup_stack assign_parm_setup_reg
> >> assumes that data->promoted_mode != data->nominal_mode does not happen with
> >> misaligned stack slots.
> >>
> >>
> >> Attached is the v3 of my patch.
> >>
> >> Boot-strapped and reg-tested on x86_64-pc-linux-gnu and arm-linux-gnueabihf.
> >>
> >> Is it OK for trunk?
> > 
> > Few comments.
> > 
> > @@ -2274,8 +2274,6 @@ struct assign_parm_data_one
> >    int partial;
> >    BOOL_BITFIELD named_arg : 1;
> >    BOOL_BITFIELD passed_pointer : 1;
> > -  BOOL_BITFIELD on_stack : 1;
> > -  BOOL_BITFIELD loaded_in_reg : 1;
> >  };
> > 
> >  /* A subroutine of assign_parms.  Initialize ALL.  */
> > 
> > independently OK.
> > 
> > @@ -2813,8 +2826,9 @@ assign_parm_adjust_stack_rtl (struct assign_parm_d
> >       ultimate type, don't use that slot after entry.  We'll make another
> >       stack slot, if we need one.  */
> >    if (stack_parm
> > -      && ((STRICT_ALIGNMENT
> > -          && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN 
> > (stack_parm))
> > +      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN 
> > (stack_parm)
> > +          && targetm.slow_unaligned_access (data->nominal_mode,
> > +                                            MEM_ALIGN (stack_parm)))
> >           || (data->nominal_type
> >               && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
> >               && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
> > 
> > looks like something we should have as a separate commit as well.  It
> > also looks obvious to me.
> > 
> 
> Okay, committed as two separate commits: r274023 & r274025
> 
> > @@ -3292,6 +3306,23 @@ assign_parm_setup_reg (struct assign_parm_data_all
> > 
> >        did_conversion = true;
> >      }
> > +  else if (MEM_P (data->entry_parm)
> > +          && GET_MODE_ALIGNMENT (promoted_nominal_mode)
> > +             > MEM_ALIGN (data->entry_parm)
> > 
> > we arrive here by-passing
> > 
> >   else if (need_conversion)
> >     {
> >       /* We did not have an insn to convert directly, or the sequence
> >          generated appeared unsafe.  We must first copy the parm to a
> >          pseudo reg, and save the conversion until after all
> >          parameters have been moved.  */
> > 
> >       int save_tree_used;
> >       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> > 
> >       emit_move_insn (tempreg, validated_mem);
> > 
> > but this move instruction is invalid in the same way as the case
> > you fix, no?  So wouldn't it be better to do
> > 
> 
> We could do that, but I supposed that there must be a reason why
> assign_parm_setup_stack gets away with the same:
> 
>   if (data->promoted_mode != data->nominal_mode)
>     {
>       /* Conversion is required.  */
>       rtx tempreg = gen_reg_rtx (GET_MODE (data->entry_parm));
> 
>       emit_move_insn (tempreg, validize_mem (copy_rtx (data->entry_parm)));
> 
> 
> So either some back-ends are too permissive with us,
> or there is a reason why promoted_mode != nominal_mode
> does not happen together with unaligned entry_parm.
> In a way that would be a rather unusual ABI.
> 
> >   if (moved)
> >     /* Nothing to do.  */
> >     ;
> >   else
> >     {
> >        if (unaligned)
> >          ...
> >        else
> >          emit_move_insn (...);
> > 
> >        if (need_conversion)
> >  ....
> >     }
> > 
> > ?  Hopefully whatever "moved" things in the if (moved) case did
> > it correctly.
> > 
> 
> It wouldn't.  It uses gen_extend_insn; would that be expected to
> work with unaligned memory?

No idea..

> > Can you check whether your patch does anything to the x86 testcase
> > posted above?
> > 
> 
> Thanks, it might help to have at least a test case where the pattern
> is expanded, even if it does not change anything.
> 
> > I'm not very familiar with this code so I'm leaving actual approval
> > to somebody else.  Still hope the comments were helpful.
> > 
> 
> Yes they are, thanks a lot.

Sorry for the slow response(s).

Richard.

Patch

2019-07-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* function.c (assign_parm_data_one): Remove unused data members.
	(assign_parm_find_stack_rtl): Use larger alignment when possible.
	(assign_parm_adjust_stack_rtl): Revise STRICT_ALIGNMENT check.
	(assign_parm_setup_reg): Handle misaligned stack arguments.

testsuite:
2019-07-30  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	PR middle-end/89544
	* gcc.target/arm/unaligned-argument-1.c: New test.
	* gcc.target/arm/unaligned-argument-2.c: New test.

Index: gcc/function.c
===================================================================
--- gcc/function.c	(revision 273767)
+++ gcc/function.c	(working copy)
@@ -2274,8 +2274,6 @@  struct assign_parm_data_one
   int partial;
   BOOL_BITFIELD named_arg : 1;
   BOOL_BITFIELD passed_pointer : 1;
-  BOOL_BITFIELD on_stack : 1;
-  BOOL_BITFIELD loaded_in_reg : 1;
 };
 
 /* A subroutine of assign_parms.  Initialize ALL.  */
@@ -2699,8 +2697,23 @@  assign_parm_find_stack_rtl (tree parm, struct assi
      intentionally forcing upward padding.  Otherwise we have to come
      up with a guess at the alignment based on OFFSET_RTX.  */
   poly_int64 offset;
-  if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
+  if (data->locate.where_pad == PAD_NONE || data->entry_parm)
     align = boundary;
+  else if (data->locate.where_pad == PAD_UPWARD)
+    {
+      align = boundary;
+      /* If the argument offset is actually more aligned than the nominal
+	 stack slot boundary, take advantage of that excess alignment.
+	 Don't make any assumptions if STACK_POINTER_OFFSET is in use.  */
+      if (poly_int_rtx_p (offset_rtx, &offset)
+	  && STACK_POINTER_OFFSET == 0)
+	{
+	  unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
+	  if (offset_align == 0 || offset_align > STACK_BOUNDARY)
+	    offset_align = STACK_BOUNDARY;
+	  align = MAX (align, offset_align);
+	}
+    }
   else if (poly_int_rtx_p (offset_rtx, &offset))
     {
       align = least_bit_hwi (boundary);
@@ -2813,8 +2826,9 @@  assign_parm_adjust_stack_rtl (struct assign_parm_d
      ultimate type, don't use that slot after entry.  We'll make another
      stack slot, if we need one.  */
   if (stack_parm
-      && ((STRICT_ALIGNMENT
-	   && GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm))
+      && ((GET_MODE_ALIGNMENT (data->nominal_mode) > MEM_ALIGN (stack_parm)
+	   && targetm.slow_unaligned_access (data->nominal_mode,
+					     MEM_ALIGN (stack_parm)))
 	  || (data->nominal_type
 	      && TYPE_ALIGN (data->nominal_type) > MEM_ALIGN (stack_parm)
 	      && MEM_ALIGN (stack_parm) < PREFERRED_STACK_BOUNDARY)))
@@ -3292,6 +3306,23 @@  assign_parm_setup_reg (struct assign_parm_data_all
 
       did_conversion = true;
     }
+  else if (MEM_P (data->entry_parm)
+	   && GET_MODE_ALIGNMENT (promoted_nominal_mode)
+	      > MEM_ALIGN (data->entry_parm)
+	   && targetm.slow_unaligned_access (promoted_nominal_mode,
+					     MEM_ALIGN (data->entry_parm)))
+    {
+      enum insn_code icode = optab_handler (movmisalign_optab,
+					    promoted_nominal_mode);
+
+      if (icode != CODE_FOR_nothing)
+	emit_insn (GEN_FCN (icode) (parmreg, validated_mem));
+      else
+	rtl = parmreg = extract_bit_field (validated_mem,
+			GET_MODE_BITSIZE (promoted_nominal_mode), 0,
+			unsignedp, parmreg,
+			promoted_nominal_mode, VOIDmode, false, NULL);
+    }
   else
     emit_move_insn (parmreg, validated_mem);
 
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-1.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-1.c	(working copy)
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 1 } } */
+/* { dg-final { scan-assembler-times "strd" 1 } } */
+/* { dg-final { scan-assembler-times "stm" 0 } } */
Index: gcc/testsuite/gcc.target/arm/unaligned-argument-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/unaligned-argument-2.c	(working copy)
@@ -0,0 +1,17 @@ 
+/* { dg-do compile } */
+/* { dg-options "-marm -march=armv6 -mno-unaligned-access -mfloat-abi=soft -mabi=aapcs -O3" } */
+
+struct s {
+  int a, b;
+} __attribute__((aligned(8)));
+
+struct s f0;
+
+void f(int a, int b, int c, int d, int e, struct s f)
+{
+  f0 = f;
+}
+
+/* { dg-final { scan-assembler-times "ldrd" 0 } } */
+/* { dg-final { scan-assembler-times "strd" 0 } } */
+/* { dg-final { scan-assembler-times "stm" 1 } } */