diff mbox

Fix regression caused by PR65310 fix

Message ID alpine.LSU.2.11.1503111607530.10796@zhemvz.fhfr.qr
State New
Headers show

Commit Message

Richard Biener March 11, 2015, 3:08 p.m. UTC
This fixes a vectorizer testcase regression on powerpc where SRA
drops alignment info unnecessarily.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-03-11  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/65310
	* tree-sra.c (build_ref_for_offset): Also preserve larger
	alignment.

Comments

Alan Lawrence March 30, 2015, 12:15 p.m. UTC | #1
We've been seeing a bunch of new failures in the *libffi* testsuite on ARM Linux 
(arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this one-liner 
fix. I've reduced the testcase down to the attached (including removing any 
dependency on libffi); with gcc r221347, this prints the expected
7 8 9
whereas with gcc r221348, instead it prints
0 8 0

The action of r221348 is to change the alignment of a mem_ref, and a var_decl of 
b1, from 32 to 64; both have type
  type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0 BLK
         size <integer_cst 0x2b9b8d3720a8 constant 192>
         unit size <integer_cst 0x2b9b8d372078 constant 24>
         align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20
         fields <field_decl 0x2b9b8d42b098 a type <integer_type 0x2b9b8d092690 int>
             SI file reduced.c line 12 col 7
             size <integer_cst 0x2b9b8d08eeb8 constant 32>
             unit size <integer_cst 0x2b9b8d08eed0 constant 4>
             align 32 offset_align 64
             offset <integer_cst 0x2b9b8d08eee8 constant 0>
             bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context 
<record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl 0x2b9b8d42b130 
b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070>
         pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl 
0x2b9b8d42b000 D.6044>>

The tree-optimized output is the same with both compilers (as this does not 
mention alignment); the expand output differs.

Still investigating...

--Alan


Richard Biener wrote:
> This fixes a vectorizer testcase regression on powerpc where SRA
> drops alignment info unnecessarily.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> 
> Richard.
> 
> 2015-03-11  Richard Biener  <rguenther@suse.de>
> 
> 	PR tree-optimization/65310
> 	* tree-sra.c (build_ref_for_offset): Also preserve larger
> 	alignment.
> 
> Index: gcc/tree-sra.c
> ===================================================================
> --- gcc/tree-sra.c	(revision 221324)
> +++ gcc/tree-sra.c	(working copy)
> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>    misalign = (misalign + offset) & (align - 1);
>    if (misalign != 0)
>      align = (misalign & -misalign);
> -  if (align < TYPE_ALIGN (exp_type))
> +  if (align != TYPE_ALIGN (exp_type))
>      exp_type = build_aligned_type (exp_type, align);
>  
>    mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);
> 
>
Alan Lawrence March 30, 2015, 12:18 p.m. UTC | #2
...actually attach the testcase...

Alan Lawrence wrote:
> We've been seeing a bunch of new failures in the *libffi* testsuite on ARM Linux 
> (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this one-liner 
> fix. I've reduced the testcase down to the attached (including removing any 
> dependency on libffi); with gcc r221347, this prints the expected
> 7 8 9
> whereas with gcc r221348, instead it prints
> 0 8 0
> 
> The action of r221348 is to change the alignment of a mem_ref, and a var_decl of 
> b1, from 32 to 64; both have type
>   type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0 BLK
>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>          align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20
>          fields <field_decl 0x2b9b8d42b098 a type <integer_type 0x2b9b8d092690 int>
>              SI file reduced.c line 12 col 7
>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>              align 32 offset_align 64
>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context 
> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl 0x2b9b8d42b130 
> b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070>
>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl 
> 0x2b9b8d42b000 D.6044>>
> 
> The tree-optimized output is the same with both compilers (as this does not 
> mention alignment); the expand output differs.
> 
> Still investigating...
> 
> --Alan
> 
> 
> Richard Biener wrote:
>> This fixes a vectorizer testcase regression on powerpc where SRA
>> drops alignment info unnecessarily.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>
>> Richard.
>>
>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>
>> 	PR tree-optimization/65310
>> 	* tree-sra.c (build_ref_for_offset): Also preserve larger
>> 	alignment.
>>
>> Index: gcc/tree-sra.c
>> ===================================================================
>> --- gcc/tree-sra.c	(revision 221324)
>> +++ gcc/tree-sra.c	(working copy)
>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>    misalign = (misalign + offset) & (align - 1);
>>    if (misalign != 0)
>>      align = (misalign & -misalign);
>> -  if (align < TYPE_ALIGN (exp_type))
>> +  if (align != TYPE_ALIGN (exp_type))
>>      exp_type = build_aligned_type (exp_type, align);
>>  
>>    mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);
>>
>>
> 
>
Richard Biener March 30, 2015, 1:01 p.m. UTC | #3
On Mon, 30 Mar 2015, Alan Lawrence wrote:

> ...actually attach the testcase...

What compile options?

> Alan Lawrence wrote:
> > We've been seeing a bunch of new failures in the *libffi* testsuite on ARM
> > Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this
> > one-liner fix. I've reduced the testcase down to the attached (including
> > removing any dependency on libffi); with gcc r221347, this prints the
> > expected
> > 7 8 9
> > whereas with gcc r221348, instead it prints
> > 0 8 0
> > 
> > The action of r221348 is to change the alignment of a mem_ref, and a
> > var_decl of b1, from 32 to 64; both have type
> >   type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0
> > BLK
> >          size <integer_cst 0x2b9b8d3720a8 constant 192>
> >          unit size <integer_cst 0x2b9b8d372078 constant 24>
> >          align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20
> >          fields <field_decl 0x2b9b8d42b098 a type <integer_type
> > 0x2b9b8d092690 int>
> >              SI file reduced.c line 12 col 7
> >              size <integer_cst 0x2b9b8d08eeb8 constant 32>
> >              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
> >              align 32 offset_align 64
> >              offset <integer_cst 0x2b9b8d08eee8 constant 0>
> >              bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context
> > <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
> > 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070>
> >          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl
> > 0x2b9b8d42b000 D.6044>>
> > 
> > The tree-optimized output is the same with both compilers (as this does not
> > mention alignment); the expand output differs.
> > 
> > Still investigating...
> > 
> > --Alan
> > 
> > 
> > Richard Biener wrote:
> > > This fixes a vectorizer testcase regression on powerpc where SRA
> > > drops alignment info unnecessarily.
> > > 
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > > 
> > > Richard.
> > > 
> > > 2015-03-11  Richard Biener  <rguenther@suse.de>
> > > 
> > > 	PR tree-optimization/65310
> > > 	* tree-sra.c (build_ref_for_offset): Also preserve larger
> > > 	alignment.
> > > 
> > > Index: gcc/tree-sra.c
> > > ===================================================================
> > > --- gcc/tree-sra.c	(revision 221324)
> > > +++ gcc/tree-sra.c	(working copy)
> > > @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
> > >    misalign = (misalign + offset) & (align - 1);
> > >    if (misalign != 0)
> > >      align = (misalign & -misalign);
> > > -  if (align < TYPE_ALIGN (exp_type))
> > > +  if (align != TYPE_ALIGN (exp_type))
> > >      exp_type = build_aligned_type (exp_type, align);
> > >     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);
> > > 
> > > 
> > 
> > 
> 
> 
> 
>
Richard Biener March 30, 2015, 1:16 p.m. UTC | #4
On Mon, 30 Mar 2015, Richard Biener wrote:

> On Mon, 30 Mar 2015, Alan Lawrence wrote:
> 
> > ...actually attach the testcase...
> 
> What compile options?

Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
I can't see anything not guaranteeing that:

        .section        .rodata
        .align  3
.LANCHOR0 = . + 0
.LC1:
        .ascii  "%d %g %d\012\000"
        .space  6
.LC0:
        .word   7
        .space  4
        .word   0
        .word   1075838976
        .word   9
        .space  4

maybe there is some more generic code-gen bug for aligned aggregate
copy?  That is, the patch tells the backend that the loads and
stores to the 'int' vars (which have padding followed) is aligned
to 8 bytes.

I don't see what is wrong in the final assembler, but maybe
some endian issue exists?  The code looks quite ugly though ;)

Richard.

> 
> > Alan Lawrence wrote:
> > > We've been seeing a bunch of new failures in the *libffi* testsuite on ARM
> > > Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this
> > > one-liner fix. I've reduced the testcase down to the attached (including
> > > removing any dependency on libffi); with gcc r221347, this prints the
> > > expected
> > > 7 8 9
> > > whereas with gcc r221348, instead it prints
> > > 0 8 0
> > > 
> > > The action of r221348 is to change the alignment of a mem_ref, and a
> > > var_decl of b1, from 32 to 64; both have type
> > >   type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0
> > > BLK
> > >          size <integer_cst 0x2b9b8d3720a8 constant 192>
> > >          unit size <integer_cst 0x2b9b8d372078 constant 24>
> > >          align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20
> > >          fields <field_decl 0x2b9b8d42b098 a type <integer_type
> > > 0x2b9b8d092690 int>
> > >              SI file reduced.c line 12 col 7
> > >              size <integer_cst 0x2b9b8d08eeb8 constant 32>
> > >              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
> > >              align 32 offset_align 64
> > >              offset <integer_cst 0x2b9b8d08eee8 constant 0>
> > >              bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context
> > > <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
> > > 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070>
> > >          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl
> > > 0x2b9b8d42b000 D.6044>>
> > > 
> > > The tree-optimized output is the same with both compilers (as this does not
> > > mention alignment); the expand output differs.
> > > 
> > > Still investigating...
> > > 
> > > --Alan
> > > 
> > > 
> > > Richard Biener wrote:
> > > > This fixes a vectorizer testcase regression on powerpc where SRA
> > > > drops alignment info unnecessarily.
> > > > 
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > > > 
> > > > Richard.
> > > > 
> > > > 2015-03-11  Richard Biener  <rguenther@suse.de>
> > > > 
> > > > 	PR tree-optimization/65310
> > > > 	* tree-sra.c (build_ref_for_offset): Also preserve larger
> > > > 	alignment.
> > > > 
> > > > Index: gcc/tree-sra.c
> > > > ===================================================================
> > > > --- gcc/tree-sra.c	(revision 221324)
> > > > +++ gcc/tree-sra.c	(working copy)
> > > > @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
> > > >    misalign = (misalign + offset) & (align - 1);
> > > >    if (misalign != 0)
> > > >      align = (misalign & -misalign);
> > > > -  if (align < TYPE_ALIGN (exp_type))
> > > > +  if (align != TYPE_ALIGN (exp_type))
> > > >      exp_type = build_aligned_type (exp_type, align);
> > > >     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);
> > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> > 
> > 
> 
>
Alan Lawrence March 30, 2015, 4:45 p.m. UTC | #5
-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes it.

The problem appears to be in laying out arguments, specifically varargs. From 
the "good" -fdump-rtl-expand:

(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0  S4 A32])
         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
      (nil))
(insn 19 18 20 2 (set (reg:DF 2 r2)
         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
      (nil))
(insn 20 19 21 2 (set (reg:SI 1 r1)
         (reg:SI 113 [ b1 ])) reduced.c:14 -1
      (nil))
(insn 21 20 22 2 (set (reg:SI 0 r0)
         (reg:SI 118)) reduced.c:14 -1
      (nil))
(call_insn 22 21 23 2 (parallel [
             (set (reg:SI 0 r0)
                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] 
<function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 A32])

The struct members are
reg:SI 113 => int a;
reg:DF 112 => double b;
reg:SI 111 => int c;

r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is pushed 
into virtual-outgoing-args. In contrast, post-change to build_ref_of_offset, we get:

(insn 17 16 18 2 (set (reg:SI 118)
         (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750 
*.LC1>)) reduced.c:14 -1
      (nil))
(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 virtual-outgoing-args)
                 (const_int 8 [0x8])) [0  S4 A64])
         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
      (nil))
(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0  S8 A64])
         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
      (nil))
(insn 20 19 21 2 (set (reg:SI 2 r2)
         (reg:SI 113 [ b1 ])) reduced.c:14 -1
      (nil))
(insn 21 20 22 2 (set (reg:SI 0 r0)
         (reg:SI 118)) reduced.c:14 -1
      (nil))
(call_insn 22 21 23 2 (parallel [
             (set (reg:SI 0 r0)
                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] 
<function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 A32])

r0 still gets the format string, but 'int b1.a' now goes in r2, and the 
double+following int are all pushed into virtual-outgoing-args. This is because 
arm_function_arg is fed a 64-bit-aligned int as type of the second argument (the 
type constructed by build_ref_for_offset); it then executes (aapcs_layout_arg, 
arm.c line ~~5914)

   /* C3 - For double-word aligned arguments, round the NCRN up to the
      next even number.  */
   ncrn = pcum->aapcs_ncrn;
   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
     ncrn++;

Which changes r1 to r2. Passing -fno-tree-sra, or removing from the testcase 
"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a 
32-bit-aligned int instead, which works as previously.

Passing the same members of that struct in a non-vargs call, works ok - I think 
because these use the type of the declared parameters, rather than the provided 
arguments, and the former do not have the increased alignment from 
build_ref_for_offset.

FWIW, I also tried:

__attribute__((__aligned__((16)))) int x;
int main (void)
{
   __builtin_printf("%d\n", x);
}

but in that case, the arm_function_arg is still fed a type with alignment 32 
(bits), i.e. distinct from the type of the field 'x' in memory, which has 
alignment 128.

--Alan

Richard Biener wrote:
> On Mon, 30 Mar 2015, Richard Biener wrote:
> 
>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>
>>> ...actually attach the testcase...
>> What compile options?
> 
> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
> I can't see anything not guaranteeing that:
> 
>         .section        .rodata
>         .align  3
> .LANCHOR0 = . + 0
> .LC1:
>         .ascii  "%d %g %d\012\000"
>         .space  6
> .LC0:
>         .word   7
>         .space  4
>         .word   0
>         .word   1075838976
>         .word   9
>         .space  4
> 
> maybe there is some more generic code-gen bug for aligned aggregate
> copy?  That is, the patch tells the backend that the loads and
> stores to the 'int' vars (which have padding followed) is aligned
> to 8 bytes.
> 
> I don't see what is wrong in the final assembler, but maybe
> some endian issue exists?  The code looks quite ugly though ;)
> 
> Richard.
> 
>>> Alan Lawrence wrote:
>>>> We've been seeing a bunch of new failures in the *libffi* testsuite on ARM
>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this
>>>> one-liner fix. I've reduced the testcase down to the attached (including
>>>> removing any dependency on libffi); with gcc r221347, this prints the
>>>> expected
>>>> 7 8 9
>>>> whereas with gcc r221348, instead it prints
>>>> 0 8 0
>>>>
>>>> The action of r221348 is to change the alignment of a mem_ref, and a
>>>> var_decl of b1, from 32 to 64; both have type
>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0
>>>> BLK
>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>          align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20
>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>> 0x2b9b8d092690 int>
>>>>              SI file reduced.c line 12 col 7
>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>              align 32 offset_align 64
>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context
>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070>
>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl
>>>> 0x2b9b8d42b000 D.6044>>
>>>>
>>>> The tree-optimized output is the same with both compilers (as this does not
>>>> mention alignment); the expand output differs.
>>>>
>>>> Still investigating...
>>>>
>>>> --Alan
>>>>
>>>>
>>>> Richard Biener wrote:
>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>> drops alignment info unnecessarily.
>>>>>
>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>
>>>>> Richard.
>>>>>
>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>
>>>>> 	PR tree-optimization/65310
>>>>> 	* tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>> 	alignment.
>>>>>
>>>>> Index: gcc/tree-sra.c
>>>>> ===================================================================
>>>>> --- gcc/tree-sra.c	(revision 221324)
>>>>> +++ gcc/tree-sra.c	(working copy)
>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>    if (misalign != 0)
>>>>>      align = (misalign & -misalign);
>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
Richard Biener March 30, 2015, 8:13 p.m. UTC | #6
On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>it.
>
>The problem appears to be in laying out arguments, specifically
>varargs. From 
>the "good" -fdump-rtl-expand:
>
>(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 
>S4 A32])
>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>      (nil))
>(insn 19 18 20 2 (set (reg:DF 2 r2)
>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>      (nil))
>(insn 20 19 21 2 (set (reg:SI 1 r1)
>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>      (nil))
>(insn 21 20 22 2 (set (reg:SI 0 r0)
>         (reg:SI 118)) reduced.c:14 -1
>      (nil))
>(call_insn 22 21 23 2 (parallel [
>             (set (reg:SI 0 r0)
>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] 
><function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>A32])
>
>The struct members are
>reg:SI 113 => int a;
>reg:DF 112 => double b;
>reg:SI 111 => int c;
>
>r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>pushed 
>into virtual-outgoing-args. In contrast, post-change to
>build_ref_of_offset, we get:
>
>(insn 17 16 18 2 (set (reg:SI 118)
>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750 
>*.LC1>)) reduced.c:14 -1
>      (nil))
>(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>virtual-outgoing-args)
>                 (const_int 8 [0x8])) [0  S4 A64])
>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>      (nil))
>(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 
>S8 A64])
>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>      (nil))
>(insn 20 19 21 2 (set (reg:SI 2 r2)
>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>      (nil))
>(insn 21 20 22 2 (set (reg:SI 0 r0)
>         (reg:SI 118)) reduced.c:14 -1
>      (nil))
>(call_insn 22 21 23 2 (parallel [
>             (set (reg:SI 0 r0)
>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] 
><function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>A32])
>
>r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>
>double+following int are all pushed into virtual-outgoing-args. This is
>because 
>arm_function_arg is fed a 64-bit-aligned int as type of the second
>argument (the 
>type constructed by build_ref_for_offset); it then executes
>(aapcs_layout_arg, 
>arm.c line ~~5914)
>
>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>      next even number.  */
>   ncrn = pcum->aapcs_ncrn;
>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>     ncrn++;
>
>Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>testcase 
>"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a 
>32-bit-aligned int instead, which works as previously.
>
>Passing the same members of that struct in a non-vargs call, works ok -
>I think 
>because these use the type of the declared parameters, rather than the
>provided 
>arguments, and the former do not have the increased alignment from 
>build_ref_for_offset.

It doesn't make sense to use the alignment of passed values.  That looks like bs.

This means that

Int I __aligned__(8);

Is passed differently than int.

Arm_function_arg needs to be fixed.

Richard.

>
>FWIW, I also tried:
>
>__attribute__((__aligned__((16)))) int x;
>int main (void)
>{
>   __builtin_printf("%d\n", x);
>}
>
>but in that case, the arm_function_arg is still fed a type with
>alignment 32 
>(bits), i.e. distinct from the type of the field 'x' in memory, which
>has 
>alignment 128.
>
>--Alan
>
>Richard Biener wrote:
>> On Mon, 30 Mar 2015, Richard Biener wrote:
>> 
>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>
>>>> ...actually attach the testcase...
>>> What compile options?
>> 
>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>> I can't see anything not guaranteeing that:
>> 
>>         .section        .rodata
>>         .align  3
>> .LANCHOR0 = . + 0
>> .LC1:
>>         .ascii  "%d %g %d\012\000"
>>         .space  6
>> .LC0:
>>         .word   7
>>         .space  4
>>         .word   0
>>         .word   1075838976
>>         .word   9
>>         .space  4
>> 
>> maybe there is some more generic code-gen bug for aligned aggregate
>> copy?  That is, the patch tells the backend that the loads and
>> stores to the 'int' vars (which have padding followed) is aligned
>> to 8 bytes.
>> 
>> I don't see what is wrong in the final assembler, but maybe
>> some endian issue exists?  The code looks quite ugly though ;)
>> 
>> Richard.
>> 
>>>> Alan Lawrence wrote:
>>>>> We've been seeing a bunch of new failures in the *libffi*
>testsuite on ARM
>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>following this
>>>>> one-liner fix. I've reduced the testcase down to the attached
>(including
>>>>> removing any dependency on libffi); with gcc r221347, this prints
>the
>>>>> expected
>>>>> 7 8 9
>>>>> whereas with gcc r221348, instead it prints
>>>>> 0 8 0
>>>>>
>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>a
>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>sizes-gimplified type_0
>>>>> BLK
>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>          align 64 symtab 0 alias set 1 canonical type
>0x2b9b8d428d20
>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>> 0x2b9b8d092690 int>
>>>>>              SI file reduced.c line 12 col 7
>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>              align 32 offset_align 64
>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>context
>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>D.6070>
>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
><type_decl
>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>
>>>>> The tree-optimized output is the same with both compilers (as this
>does not
>>>>> mention alignment); the expand output differs.
>>>>>
>>>>> Still investigating...
>>>>>
>>>>> --Alan
>>>>>
>>>>>
>>>>> Richard Biener wrote:
>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>> drops alignment info unnecessarily.
>>>>>>
>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>
>>>>>> 	PR tree-optimization/65310
>>>>>> 	* tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>> 	alignment.
>>>>>>
>>>>>> Index: gcc/tree-sra.c
>>>>>>
>===================================================================
>>>>>> --- gcc/tree-sra.c	(revision 221324)
>>>>>> +++ gcc/tree-sra.c	(working copy)
>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>    if (misalign != 0)
>>>>>>      align = (misalign & -misalign);
>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>off);
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
Richard Biener March 31, 2015, 7:50 a.m. UTC | #7
On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>it.
>>
>>The problem appears to be in laying out arguments, specifically
>>varargs. From
>>the "good" -fdump-rtl-expand:
>>
>>(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>S4 A32])
>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>      (nil))
>>(insn 19 18 20 2 (set (reg:DF 2 r2)
>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>      (nil))
>>(insn 20 19 21 2 (set (reg:SI 1 r1)
>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>      (nil))
>>(insn 21 20 22 2 (set (reg:SI 0 r0)
>>         (reg:SI 118)) reduced.c:14 -1
>>      (nil))
>>(call_insn 22 21 23 2 (parallel [
>>             (set (reg:SI 0 r0)
>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>><function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>A32])
>>
>>The struct members are
>>reg:SI 113 => int a;
>>reg:DF 112 => double b;
>>reg:SI 111 => int c;
>>
>>r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>pushed
>>into virtual-outgoing-args. In contrast, post-change to
>>build_ref_of_offset, we get:
>>
>>(insn 17 16 18 2 (set (reg:SI 118)
>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>*.LC1>)) reduced.c:14 -1
>>      (nil))
>>(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>virtual-outgoing-args)
>>                 (const_int 8 [0x8])) [0  S4 A64])
>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>      (nil))
>>(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>S8 A64])
>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>      (nil))
>>(insn 20 19 21 2 (set (reg:SI 2 r2)
>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>      (nil))
>>(insn 21 20 22 2 (set (reg:SI 0 r0)
>>         (reg:SI 118)) reduced.c:14 -1
>>      (nil))
>>(call_insn 22 21 23 2 (parallel [
>>             (set (reg:SI 0 r0)
>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>><function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>A32])
>>
>>r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>
>>double+following int are all pushed into virtual-outgoing-args. This is
>>because
>>arm_function_arg is fed a 64-bit-aligned int as type of the second
>>argument (the
>>type constructed by build_ref_for_offset); it then executes
>>(aapcs_layout_arg,
>>arm.c line ~~5914)
>>
>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>      next even number.  */
>>   ncrn = pcum->aapcs_ncrn;
>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>     ncrn++;
>>
>>Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>testcase
>>"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>32-bit-aligned int instead, which works as previously.
>>
>>Passing the same members of that struct in a non-vargs call, works ok -
>>I think
>>because these use the type of the declared parameters, rather than the
>>provided
>>arguments, and the former do not have the increased alignment from
>>build_ref_for_offset.
>
> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>
> This means that
>
> Int I __aligned__(8);
>
> Is passed differently than int.
>
> Arm_function_arg needs to be fixed.

That is,

typedef int myint __attribute__((aligned(8)));

int main()
{
  myint i = 1;
  int j = 2;
  __builtin_printf("%d %d\n", i, j);
}

or

myint i;
int j;
myint *p = &i;
int *q = &j;

int main()
{
  __builtin_printf("%d %d", *p, *q);
}

should behave the same.  There isn't a printf modifier for an "aligned int"
because that sort of thing doesn't make sense.  Special-casing aligned vs.
non-aligned values only makes sense for things passed by value on the stack.
And then obviously only dependent on the functuion type signature, not
on the type of the passed value.

Richard.

> Richard.
>
>>
>>FWIW, I also tried:
>>
>>__attribute__((__aligned__((16)))) int x;
>>int main (void)
>>{
>>   __builtin_printf("%d\n", x);
>>}
>>
>>but in that case, the arm_function_arg is still fed a type with
>>alignment 32
>>(bits), i.e. distinct from the type of the field 'x' in memory, which
>>has
>>alignment 128.
>>
>>--Alan
>>
>>Richard Biener wrote:
>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>
>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>
>>>>> ...actually attach the testcase...
>>>> What compile options?
>>>
>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>> I can't see anything not guaranteeing that:
>>>
>>>         .section        .rodata
>>>         .align  3
>>> .LANCHOR0 = . + 0
>>> .LC1:
>>>         .ascii  "%d %g %d\012\000"
>>>         .space  6
>>> .LC0:
>>>         .word   7
>>>         .space  4
>>>         .word   0
>>>         .word   1075838976
>>>         .word   9
>>>         .space  4
>>>
>>> maybe there is some more generic code-gen bug for aligned aggregate
>>> copy?  That is, the patch tells the backend that the loads and
>>> stores to the 'int' vars (which have padding followed) is aligned
>>> to 8 bytes.
>>>
>>> I don't see what is wrong in the final assembler, but maybe
>>> some endian issue exists?  The code looks quite ugly though ;)
>>>
>>> Richard.
>>>
>>>>> Alan Lawrence wrote:
>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>testsuite on ARM
>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>following this
>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>(including
>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>the
>>>>>> expected
>>>>>> 7 8 9
>>>>>> whereas with gcc r221348, instead it prints
>>>>>> 0 8 0
>>>>>>
>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>a
>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>sizes-gimplified type_0
>>>>>> BLK
>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>0x2b9b8d428d20
>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>> 0x2b9b8d092690 int>
>>>>>>              SI file reduced.c line 12 col 7
>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>              align 32 offset_align 64
>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>context
>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>D.6070>
>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>><type_decl
>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>
>>>>>> The tree-optimized output is the same with both compilers (as this
>>does not
>>>>>> mention alignment); the expand output differs.
>>>>>>
>>>>>> Still investigating...
>>>>>>
>>>>>> --Alan
>>>>>>
>>>>>>
>>>>>> Richard Biener wrote:
>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>> drops alignment info unnecessarily.
>>>>>>>
>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>>
>>>>>>>  PR tree-optimization/65310
>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>  alignment.
>>>>>>>
>>>>>>> Index: gcc/tree-sra.c
>>>>>>>
>>===================================================================
>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>    if (misalign != 0)
>>>>>>>      align = (misalign & -misalign);
>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>off);
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>
>
Richard Earnshaw March 31, 2015, 9:43 a.m. UTC | #8
On 31/03/15 08:50, Richard Biener wrote:
> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>> it.
>>>
>>> The problem appears to be in laying out arguments, specifically
>>> varargs. From
>>> the "good" -fdump-rtl-expand:
>>>
>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>> S4 A32])
>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>      (nil))
>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>      (nil))
>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>      (nil))
>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>         (reg:SI 118)) reduced.c:14 -1
>>>      (nil))
>>> (call_insn 22 21 23 2 (parallel [
>>>             (set (reg:SI 0 r0)
>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>> A32])
>>>
>>> The struct members are
>>> reg:SI 113 => int a;
>>> reg:DF 112 => double b;
>>> reg:SI 111 => int c;
>>>
>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>> pushed
>>> into virtual-outgoing-args. In contrast, post-change to
>>> build_ref_of_offset, we get:
>>>
>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>> *.LC1>)) reduced.c:14 -1
>>>      (nil))
>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>> virtual-outgoing-args)
>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>      (nil))
>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>> S8 A64])
>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>      (nil))
>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>      (nil))
>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>         (reg:SI 118)) reduced.c:14 -1
>>>      (nil))
>>> (call_insn 22 21 23 2 (parallel [
>>>             (set (reg:SI 0 r0)
>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>> A32])
>>>
>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>
>>> double+following int are all pushed into virtual-outgoing-args. This is
>>> because
>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>> argument (the
>>> type constructed by build_ref_for_offset); it then executes
>>> (aapcs_layout_arg,
>>> arm.c line ~~5914)
>>>
>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>      next even number.  */
>>>   ncrn = pcum->aapcs_ncrn;
>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>     ncrn++;
>>>
>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>> testcase
>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>> 32-bit-aligned int instead, which works as previously.
>>>
>>> Passing the same members of that struct in a non-vargs call, works ok -
>>> I think
>>> because these use the type of the declared parameters, rather than the
>>> provided
>>> arguments, and the former do not have the increased alignment from
>>> build_ref_for_offset.
>>
>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>
>> This means that
>>
>> Int I __aligned__(8);
>>
>> Is passed differently than int.
>>
>> Arm_function_arg needs to be fixed.
> 
> That is,
> 
> typedef int myint __attribute__((aligned(8)));
> 
> int main()
> {
>   myint i = 1;
>   int j = 2;
>   __builtin_printf("%d %d\n", i, j);
> }
> 
> or
> 
> myint i;
> int j;
> myint *p = &i;
> int *q = &j;
> 
> int main()
> {
>   __builtin_printf("%d %d", *p, *q);
> }
> 
> should behave the same.  There isn't a printf modifier for an "aligned int"
> because that sort of thing doesn't make sense.  Special-casing aligned vs.
> non-aligned values only makes sense for things passed by value on the stack.
> And then obviously only dependent on the functuion type signature, not
> on the type of the passed value.
> 

I think the testcase is ill-formed.  Just because printf doesn't have
such a modifier, doesn't mean that another variadic function might not
have a means to detect when an object in the variadic list needs to be
over-aligned.  As such, the test should really be written as:

typedef int myint __attribute__((aligned(8)));

int main()
{
  myint i = 1;
  int j = 2;
  __builtin_printf("%d %d\n", (int) i, j);
}

Variadic functions take the types of their arguments from the types of
the actuals passed.  The compiler should either be applying appropriate
promotion rules to make the types conformant by the language
specification or respecting the types exactly.  However, that should be
done in the mid-end not the back-end.  If incorrect alignment
information is passed to the back-end it can't help but make the wrong
choice.  Examining the mode here wouldn't help either.  Consider:

int foo (int a, int b, int c, int d, int e, int f
__attribute__((aligned(8))));

On ARM that should pass f in a 64-bit aligned location (since the
parameter will be on the stack).

R.


> Richard.
> 
>> Richard.
>>
>>>
>>> FWIW, I also tried:
>>>
>>> __attribute__((__aligned__((16)))) int x;
>>> int main (void)
>>> {
>>>   __builtin_printf("%d\n", x);
>>> }
>>>
>>> but in that case, the arm_function_arg is still fed a type with
>>> alignment 32
>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
>>> has
>>> alignment 128.
>>>
>>> --Alan
>>>
>>> Richard Biener wrote:
>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>>
>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>>
>>>>>> ...actually attach the testcase...
>>>>> What compile options?
>>>>
>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>>> I can't see anything not guaranteeing that:
>>>>
>>>>         .section        .rodata
>>>>         .align  3
>>>> .LANCHOR0 = . + 0
>>>> .LC1:
>>>>         .ascii  "%d %g %d\012\000"
>>>>         .space  6
>>>> .LC0:
>>>>         .word   7
>>>>         .space  4
>>>>         .word   0
>>>>         .word   1075838976
>>>>         .word   9
>>>>         .space  4
>>>>
>>>> maybe there is some more generic code-gen bug for aligned aggregate
>>>> copy?  That is, the patch tells the backend that the loads and
>>>> stores to the 'int' vars (which have padding followed) is aligned
>>>> to 8 bytes.
>>>>
>>>> I don't see what is wrong in the final assembler, but maybe
>>>> some endian issue exists?  The code looks quite ugly though ;)
>>>>
>>>> Richard.
>>>>
>>>>>> Alan Lawrence wrote:
>>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>> testsuite on ARM
>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>> following this
>>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>> (including
>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>> the
>>>>>>> expected
>>>>>>> 7 8 9
>>>>>>> whereas with gcc r221348, instead it prints
>>>>>>> 0 8 0
>>>>>>>
>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>> a
>>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>> sizes-gimplified type_0
>>>>>>> BLK
>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>> 0x2b9b8d428d20
>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>>> 0x2b9b8d092690 int>
>>>>>>>              SI file reduced.c line 12 col 7
>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>>              align 32 offset_align 64
>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>> context
>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>> D.6070>
>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>>> <type_decl
>>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>>
>>>>>>> The tree-optimized output is the same with both compilers (as this
>>> does not
>>>>>>> mention alignment); the expand output differs.
>>>>>>>
>>>>>>> Still investigating...
>>>>>>>
>>>>>>> --Alan
>>>>>>>
>>>>>>>
>>>>>>> Richard Biener wrote:
>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>>> drops alignment info unnecessarily.
>>>>>>>>
>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>>>
>>>>>>>>  PR tree-optimization/65310
>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>>  alignment.
>>>>>>>>
>>>>>>>> Index: gcc/tree-sra.c
>>>>>>>>
>>> ===================================================================
>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>>    if (misalign != 0)
>>>>>>>>      align = (misalign & -misalign);
>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>> off);
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>>
>
Alan Lawrence March 31, 2015, 9:50 a.m. UTC | #9
Richard Biener wrote:
> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>
>> This means that
>>
>> Int I __aligned__(8);
>>
>> Is passed differently than int.
>>
>> Arm_function_arg needs to be fixed.
> 
> That is,
> 
> typedef int myint __attribute__((aligned(8)));
> 
> int main()
> {
>   myint i = 1;
>   int j = 2;
>   __builtin_printf("%d %d\n", i, j);
> }
> 
> or
> 
> myint i;
> int j;
> myint *p = &i;
> int *q = &j;
> 
> int main()
> {
>   __builtin_printf("%d %d", *p, *q);
> }
> 
> should behave the same.  There isn't a printf modifier for an "aligned int"
> because that sort of thing doesn't make sense.

Agreed. All of the cases you post do indeed behave the same, and correctly (they 
pass the format string in r0, the next argument in r1, and then r2). This is 
what my "aligned(16)" example was trying to achieve also. From the 
-fdump-rtl-expand of your last example:

(insn 7 6 8 2 (set (reg:SI 117)
         (mem:SI (reg/f:SI 116) [2 *_4+0 S4 A32])) richie2.c:10 -1
      (nil))
(insn 8 7 9 2 (set (reg/f:SI 118)
         (symbol_ref:SI ("*.LANCHOR0") [flags 0x182])) richie2.c:10 -1
      (nil))
(insn 9 8 10 2 (set (reg/f:SI 119)
         (mem/f/c:SI (plus:SI (reg/f:SI 118)
                 (const_int 4 [0x4])) [1 p+0 S4 A32])) richie2.c:10 -1
      (nil))
(insn 10 9 11 2 (set (reg:SI 120)
         (mem:SI (reg/f:SI 119) [2 *_2+0 S4 A64])) richie2.c:10 -1      ***
      (nil))
(insn 11 10 12 2 (set (reg:SI 121)
         (symbol_ref/v/f:SI ("*.LC0") [flags 0x82]  <var_decl 0x2b2c3b35d240 
*.LC0>)) richie2.c:10 -1
      (nil))
(insn 12 11 13 2 (set (reg:SI 2 r2)
         (reg:SI 117)) richie2.c:10 -1
      (nil))
(insn 13 12 14 2 (set (reg:SI 1 r1)
         (reg:SI 120)) richie2.c:10 -1
      (nil))
(insn 14 13 15 2 (set (reg:SI 0 r0)
         (reg:SI 121)) richie2.c:10 -1
      (nil))

*** is the load of *p. The mem has alignment 64, but the type describing the 
loaded value *p, passed to arm_function_arg, has alignment 32. Even in the first 
of your examples, or even with an explicit cast to the aligned type:
__builtin_printf("%d %d\n", (myint) i, j);
we still get alignment 32 in arm_function_arg. It's only if SRA is applied, do 
we get the situation where the int with 64-bit alignment, is passed to 
arm_function_arg.

Disclaimer, I haven't tracked down how the alignment information flows through 
the compiler i.e. from build_ref_for_offset into expand_call. But it looks to me 
like something different is happening in the SRA case...no?

--Alan
Richard Biener March 31, 2015, 10 a.m. UTC | #10
On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 08:50, Richard Biener wrote:
> > On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
> >> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
> >>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> >>> it.
> >>>
> >>> The problem appears to be in laying out arguments, specifically
> >>> varargs. From
> >>> the "good" -fdump-rtl-expand:
> >>>
> >>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> >>> S4 A32])
> >>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> >>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> >>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>         (reg:SI 118)) reduced.c:14 -1
> >>>      (nil))
> >>> (call_insn 22 21 23 2 (parallel [
> >>>             (set (reg:SI 0 r0)
> >>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
> >>> A32])
> >>>
> >>> The struct members are
> >>> reg:SI 113 => int a;
> >>> reg:DF 112 => double b;
> >>> reg:SI 111 => int c;
> >>>
> >>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> >>> pushed
> >>> into virtual-outgoing-args. In contrast, post-change to
> >>> build_ref_of_offset, we get:
> >>>
> >>> (insn 17 16 18 2 (set (reg:SI 118)
> >>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
> >>> *.LC1>)) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> >>> virtual-outgoing-args)
> >>>                 (const_int 8 [0x8])) [0  S4 A64])
> >>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> >>> S8 A64])
> >>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> >>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>      (nil))
> >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>         (reg:SI 118)) reduced.c:14 -1
> >>>      (nil))
> >>> (call_insn 22 21 23 2 (parallel [
> >>>             (set (reg:SI 0 r0)
> >>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
> >>> A32])
> >>>
> >>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> >>>
> >>> double+following int are all pushed into virtual-outgoing-args. This is
> >>> because
> >>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> >>> argument (the
> >>> type constructed by build_ref_for_offset); it then executes
> >>> (aapcs_layout_arg,
> >>> arm.c line ~~5914)
> >>>
> >>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> >>>      next even number.  */
> >>>   ncrn = pcum->aapcs_ncrn;
> >>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> >>>     ncrn++;
> >>>
> >>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> >>> testcase
> >>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> >>> 32-bit-aligned int instead, which works as previously.
> >>>
> >>> Passing the same members of that struct in a non-vargs call, works ok -
> >>> I think
> >>> because these use the type of the declared parameters, rather than the
> >>> provided
> >>> arguments, and the former do not have the increased alignment from
> >>> build_ref_for_offset.
> >>
> >> It doesn't make sense to use the alignment of passed values.  That looks like bs.
> >>
> >> This means that
> >>
> >> Int I __aligned__(8);
> >>
> >> Is passed differently than int.
> >>
> >> Arm_function_arg needs to be fixed.
> > 
> > That is,
> > 
> > typedef int myint __attribute__((aligned(8)));
> > 
> > int main()
> > {
> >   myint i = 1;
> >   int j = 2;
> >   __builtin_printf("%d %d\n", i, j);
> > }
> > 
> > or
> > 
> > myint i;
> > int j;
> > myint *p = &i;
> > int *q = &j;
> > 
> > int main()
> > {
> >   __builtin_printf("%d %d", *p, *q);
> > }
> > 
> > should behave the same.  There isn't a printf modifier for an "aligned int"
> > because that sort of thing doesn't make sense.  Special-casing aligned vs.
> > non-aligned values only makes sense for things passed by value on the stack.
> > And then obviously only dependent on the functuion type signature, not
> > on the type of the passed value.
> > 
> 
> I think the testcase is ill-formed.  Just because printf doesn't have
> such a modifier, doesn't mean that another variadic function might not
> have a means to detect when an object in the variadic list needs to be
> over-aligned.  As such, the test should really be written as:

A value doesn't have "alignment".  A function may have alignment
requirements on its arguments, clearly printf doesn't.

> typedef int myint __attribute__((aligned(8)));
> 
> int main()
> {
>   myint i = 1;
>   int j = 2;
>   __builtin_printf("%d %d\n", (int) i, j);
> }
> 
> Variadic functions take the types of their arguments from the types of
> the actuals passed.  The compiler should either be applying appropriate
> promotion rules to make the types conformant by the language
> specification or respecting the types exactly.  However, that should be
> done in the mid-end not the back-end.  If incorrect alignment
> information is passed to the back-end it can't help but make the wrong
> choice.  Examining the mode here wouldn't help either.  Consider:
> 
> int foo (int a, int b, int c, int d, int e, int f
> __attribute__((aligned(8))));
> 
> On ARM that should pass f in a 64-bit aligned location (since the
> parameter will be on the stack).
> 
> R.
> 
> 
> > Richard.
> > 
> >> Richard.
> >>
> >>>
> >>> FWIW, I also tried:
> >>>
> >>> __attribute__((__aligned__((16)))) int x;
> >>> int main (void)
> >>> {
> >>>   __builtin_printf("%d\n", x);
> >>> }
> >>>
> >>> but in that case, the arm_function_arg is still fed a type with
> >>> alignment 32
> >>> (bits), i.e. distinct from the type of the field 'x' in memory, which
> >>> has
> >>> alignment 128.
> >>>
> >>> --Alan
> >>>
> >>> Richard Biener wrote:
> >>>> On Mon, 30 Mar 2015, Richard Biener wrote:
> >>>>
> >>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
> >>>>>
> >>>>>> ...actually attach the testcase...
> >>>>> What compile options?
> >>>>
> >>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
> >>>> I can't see anything not guaranteeing that:
> >>>>
> >>>>         .section        .rodata
> >>>>         .align  3
> >>>> .LANCHOR0 = . + 0
> >>>> .LC1:
> >>>>         .ascii  "%d %g %d\012\000"
> >>>>         .space  6
> >>>> .LC0:
> >>>>         .word   7
> >>>>         .space  4
> >>>>         .word   0
> >>>>         .word   1075838976
> >>>>         .word   9
> >>>>         .space  4
> >>>>
> >>>> maybe there is some more generic code-gen bug for aligned aggregate
> >>>> copy?  That is, the patch tells the backend that the loads and
> >>>> stores to the 'int' vars (which have padding followed) is aligned
> >>>> to 8 bytes.
> >>>>
> >>>> I don't see what is wrong in the final assembler, but maybe
> >>>> some endian issue exists?  The code looks quite ugly though ;)
> >>>>
> >>>> Richard.
> >>>>
> >>>>>> Alan Lawrence wrote:
> >>>>>>> We've been seeing a bunch of new failures in the *libffi*
> >>> testsuite on ARM
> >>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
> >>> following this
> >>>>>>> one-liner fix. I've reduced the testcase down to the attached
> >>> (including
> >>>>>>> removing any dependency on libffi); with gcc r221347, this prints
> >>> the
> >>>>>>> expected
> >>>>>>> 7 8 9
> >>>>>>> whereas with gcc r221348, instead it prints
> >>>>>>> 0 8 0
> >>>>>>>
> >>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
> >>> a
> >>>>>>> var_decl of b1, from 32 to 64; both have type
> >>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
> >>> sizes-gimplified type_0
> >>>>>>> BLK
> >>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
> >>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
> >>>>>>>          align 64 symtab 0 alias set 1 canonical type
> >>> 0x2b9b8d428d20
> >>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
> >>>>>>> 0x2b9b8d092690 int>
> >>>>>>>              SI file reduced.c line 12 col 7
> >>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
> >>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
> >>>>>>>              align 32 offset_align 64
> >>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
> >>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
> >>> context
> >>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
> >>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
> >>> D.6070>
> >>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
> >>> <type_decl
> >>>>>>> 0x2b9b8d42b000 D.6044>>
> >>>>>>>
> >>>>>>> The tree-optimized output is the same with both compilers (as this
> >>> does not
> >>>>>>> mention alignment); the expand output differs.
> >>>>>>>
> >>>>>>> Still investigating...
> >>>>>>>
> >>>>>>> --Alan
> >>>>>>>
> >>>>>>>
> >>>>>>> Richard Biener wrote:
> >>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
> >>>>>>>> drops alignment info unnecessarily.
> >>>>>>>>
> >>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >>>>>>>>
> >>>>>>>> Richard.
> >>>>>>>>
> >>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
> >>>>>>>>
> >>>>>>>>  PR tree-optimization/65310
> >>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
> >>>>>>>>  alignment.
> >>>>>>>>
> >>>>>>>> Index: gcc/tree-sra.c
> >>>>>>>>
> >>> ===================================================================
> >>>>>>>> --- gcc/tree-sra.c       (revision 221324)
> >>>>>>>> +++ gcc/tree-sra.c       (working copy)
> >>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
> >>>>>>>>    misalign = (misalign + offset) & (align - 1);
> >>>>>>>>    if (misalign != 0)
> >>>>>>>>      align = (misalign & -misalign);
> >>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
> >>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
> >>>>>>>>      exp_type = build_aligned_type (exp_type, align);
> >>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
> >>> off);
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
> > 
> 
>
Richard Earnshaw March 31, 2015, 10:10 a.m. UTC | #11
On 31/03/15 11:00, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 08:50, Richard Biener wrote:
>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>> it.
>>>>>
>>>>> The problem appears to be in laying out arguments, specifically
>>>>> varargs. From
>>>>> the "good" -fdump-rtl-expand:
>>>>>
>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>> S4 A32])
>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>      (nil))
>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>             (set (reg:SI 0 r0)
>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>> A32])
>>>>>
>>>>> The struct members are
>>>>> reg:SI 113 => int a;
>>>>> reg:DF 112 => double b;
>>>>> reg:SI 111 => int c;
>>>>>
>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>> pushed
>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>> build_ref_of_offset, we get:
>>>>>
>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>>>> *.LC1>)) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>> virtual-outgoing-args)
>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>> S8 A64])
>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>      (nil))
>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>      (nil))
>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>             (set (reg:SI 0 r0)
>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>> A32])
>>>>>
>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>
>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>> because
>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>> argument (the
>>>>> type constructed by build_ref_for_offset); it then executes
>>>>> (aapcs_layout_arg,
>>>>> arm.c line ~~5914)
>>>>>
>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>      next even number.  */
>>>>>   ncrn = pcum->aapcs_ncrn;
>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>     ncrn++;
>>>>>
>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>> testcase
>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>
>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>> I think
>>>>> because these use the type of the declared parameters, rather than the
>>>>> provided
>>>>> arguments, and the former do not have the increased alignment from
>>>>> build_ref_for_offset.
>>>>
>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>>>
>>>> This means that
>>>>
>>>> Int I __aligned__(8);
>>>>
>>>> Is passed differently than int.
>>>>
>>>> Arm_function_arg needs to be fixed.
>>>
>>> That is,
>>>
>>> typedef int myint __attribute__((aligned(8)));
>>>
>>> int main()
>>> {
>>>   myint i = 1;
>>>   int j = 2;
>>>   __builtin_printf("%d %d\n", i, j);
>>> }
>>>
>>> or
>>>
>>> myint i;
>>> int j;
>>> myint *p = &i;
>>> int *q = &j;
>>>
>>> int main()
>>> {
>>>   __builtin_printf("%d %d", *p, *q);
>>> }
>>>
>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>> non-aligned values only makes sense for things passed by value on the stack.
>>> And then obviously only dependent on the functuion type signature, not
>>> on the type of the passed value.
>>>
>>
>> I think the testcase is ill-formed.  Just because printf doesn't have
>> such a modifier, doesn't mean that another variadic function might not
>> have a means to detect when an object in the variadic list needs to be
>> over-aligned.  As such, the test should really be written as:
> 
> A value doesn't have "alignment".  A function may have alignment
> requirements on its arguments, clearly printf doesn't.
> 

Values don't.  But types do and variadic functions are special in that
they derive their types from the types of the actual parameters passed
not from the formals in the prototype.  Any manipulation of the types
should be done in the front end, not in the back end.

R.

>> typedef int myint __attribute__((aligned(8)));
>>
>> int main()
>> {
>>   myint i = 1;
>>   int j = 2;
>>   __builtin_printf("%d %d\n", (int) i, j);
>> }
>>
>> Variadic functions take the types of their arguments from the types of
>> the actuals passed.  The compiler should either be applying appropriate
>> promotion rules to make the types conformant by the language
>> specification or respecting the types exactly.  However, that should be
>> done in the mid-end not the back-end.  If incorrect alignment
>> information is passed to the back-end it can't help but make the wrong
>> choice.  Examining the mode here wouldn't help either.  Consider:
>>
>> int foo (int a, int b, int c, int d, int e, int f
>> __attribute__((aligned(8))));
>>
>> On ARM that should pass f in a 64-bit aligned location (since the
>> parameter will be on the stack).
>>
>> R.
>>
>>
>>> Richard.
>>>
>>>> Richard.
>>>>
>>>>>
>>>>> FWIW, I also tried:
>>>>>
>>>>> __attribute__((__aligned__((16)))) int x;
>>>>> int main (void)
>>>>> {
>>>>>   __builtin_printf("%d\n", x);
>>>>> }
>>>>>
>>>>> but in that case, the arm_function_arg is still fed a type with
>>>>> alignment 32
>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
>>>>> has
>>>>> alignment 128.
>>>>>
>>>>> --Alan
>>>>>
>>>>> Richard Biener wrote:
>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>>>>
>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>>>>
>>>>>>>> ...actually attach the testcase...
>>>>>>> What compile options?
>>>>>>
>>>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>>>>> I can't see anything not guaranteeing that:
>>>>>>
>>>>>>         .section        .rodata
>>>>>>         .align  3
>>>>>> .LANCHOR0 = . + 0
>>>>>> .LC1:
>>>>>>         .ascii  "%d %g %d\012\000"
>>>>>>         .space  6
>>>>>> .LC0:
>>>>>>         .word   7
>>>>>>         .space  4
>>>>>>         .word   0
>>>>>>         .word   1075838976
>>>>>>         .word   9
>>>>>>         .space  4
>>>>>>
>>>>>> maybe there is some more generic code-gen bug for aligned aggregate
>>>>>> copy?  That is, the patch tells the backend that the loads and
>>>>>> stores to the 'int' vars (which have padding followed) is aligned
>>>>>> to 8 bytes.
>>>>>>
>>>>>> I don't see what is wrong in the final assembler, but maybe
>>>>>> some endian issue exists?  The code looks quite ugly though ;)
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>> Alan Lawrence wrote:
>>>>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>>>> testsuite on ARM
>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>>>> following this
>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>>>> (including
>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>>>> the
>>>>>>>>> expected
>>>>>>>>> 7 8 9
>>>>>>>>> whereas with gcc r221348, instead it prints
>>>>>>>>> 0 8 0
>>>>>>>>>
>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>>>> a
>>>>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>>>> sizes-gimplified type_0
>>>>>>>>> BLK
>>>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>>>> 0x2b9b8d428d20
>>>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>>>>> 0x2b9b8d092690 int>
>>>>>>>>>              SI file reduced.c line 12 col 7
>>>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>>>>              align 32 offset_align 64
>>>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>>>> context
>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>>>> D.6070>
>>>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>>>>> <type_decl
>>>>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>>>>
>>>>>>>>> The tree-optimized output is the same with both compilers (as this
>>>>> does not
>>>>>>>>> mention alignment); the expand output differs.
>>>>>>>>>
>>>>>>>>> Still investigating...
>>>>>>>>>
>>>>>>>>> --Alan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Richard Biener wrote:
>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>>>>> drops alignment info unnecessarily.
>>>>>>>>>>
>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>>>>>
>>>>>>>>>>  PR tree-optimization/65310
>>>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>>>>  alignment.
>>>>>>>>>>
>>>>>>>>>> Index: gcc/tree-sra.c
>>>>>>>>>>
>>>>> ===================================================================
>>>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>>>>    if (misalign != 0)
>>>>>>>>>>      align = (misalign & -misalign);
>>>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>>>> off);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>>
>>
>>
>
Richard Biener March 31, 2015, 10:20 a.m. UTC | #12
On Tue, 31 Mar 2015, Richard Biener wrote:

> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
> > On 31/03/15 08:50, Richard Biener wrote:
> > > On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
> > >> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
> > >>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> > >>> it.
> > >>>
> > >>> The problem appears to be in laying out arguments, specifically
> > >>> varargs. From
> > >>> the "good" -fdump-rtl-expand:
> > >>>
> > >>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> > >>> S4 A32])
> > >>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> > >>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> > >>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> > >>>         (reg:SI 118)) reduced.c:14 -1
> > >>>      (nil))
> > >>> (call_insn 22 21 23 2 (parallel [
> > >>>             (set (reg:SI 0 r0)
> > >>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> > >>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
> > >>> A32])
> > >>>
> > >>> The struct members are
> > >>> reg:SI 113 => int a;
> > >>> reg:DF 112 => double b;
> > >>> reg:SI 111 => int c;
> > >>>
> > >>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> > >>> pushed
> > >>> into virtual-outgoing-args. In contrast, post-change to
> > >>> build_ref_of_offset, we get:
> > >>>
> > >>> (insn 17 16 18 2 (set (reg:SI 118)
> > >>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
> > >>> *.LC1>)) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> > >>> virtual-outgoing-args)
> > >>>                 (const_int 8 [0x8])) [0  S4 A64])
> > >>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> > >>> S8 A64])
> > >>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> > >>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> > >>>      (nil))
> > >>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> > >>>         (reg:SI 118)) reduced.c:14 -1
> > >>>      (nil))
> > >>> (call_insn 22 21 23 2 (parallel [
> > >>>             (set (reg:SI 0 r0)
> > >>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> > >>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
> > >>> A32])
> > >>>
> > >>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> > >>>
> > >>> double+following int are all pushed into virtual-outgoing-args. This is
> > >>> because
> > >>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> > >>> argument (the
> > >>> type constructed by build_ref_for_offset); it then executes
> > >>> (aapcs_layout_arg,
> > >>> arm.c line ~~5914)
> > >>>
> > >>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> > >>>      next even number.  */
> > >>>   ncrn = pcum->aapcs_ncrn;
> > >>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> > >>>     ncrn++;
> > >>>
> > >>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> > >>> testcase
> > >>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> > >>> 32-bit-aligned int instead, which works as previously.
> > >>>
> > >>> Passing the same members of that struct in a non-vargs call, works ok -
> > >>> I think
> > >>> because these use the type of the declared parameters, rather than the
> > >>> provided
> > >>> arguments, and the former do not have the increased alignment from
> > >>> build_ref_for_offset.
> > >>
> > >> It doesn't make sense to use the alignment of passed values.  That looks like bs.
> > >>
> > >> This means that
> > >>
> > >> Int I __aligned__(8);
> > >>
> > >> Is passed differently than int.
> > >>
> > >> Arm_function_arg needs to be fixed.
> > > 
> > > That is,
> > > 
> > > typedef int myint __attribute__((aligned(8)));
> > > 
> > > int main()
> > > {
> > >   myint i = 1;
> > >   int j = 2;
> > >   __builtin_printf("%d %d\n", i, j);
> > > }
> > > 
> > > or
> > > 
> > > myint i;
> > > int j;
> > > myint *p = &i;
> > > int *q = &j;
> > > 
> > > int main()
> > > {
> > >   __builtin_printf("%d %d", *p, *q);
> > > }
> > > 
> > > should behave the same.  There isn't a printf modifier for an "aligned int"
> > > because that sort of thing doesn't make sense.  Special-casing aligned vs.
> > > non-aligned values only makes sense for things passed by value on the stack.
> > > And then obviously only dependent on the functuion type signature, not
> > > on the type of the passed value.
> > > 
> > 
> > I think the testcase is ill-formed.  Just because printf doesn't have
> > such a modifier, doesn't mean that another variadic function might not
> > have a means to detect when an object in the variadic list needs to be
> > over-aligned.  As such, the test should really be written as:
> 
> A value doesn't have "alignment".  A function may have alignment
> requirements on its arguments, clearly printf doesn't.

Btw, it looks like function_arg is supposed to pass whether the argument
is to the variadic part of the function or not, but it gets passed
false as in

      args[i].reg = targetm.calls.function_arg (args_so_far, mode, type,
                                                argpos < n_named_args);

n_named_args is 4.  That is because ARM asks to do this via

  if (type_arg_types != 0
      && targetm.calls.strict_argument_naming (args_so_far))
    ;
  else if (type_arg_types != 0
           && ! targetm.calls.pretend_outgoing_varargs_named 
(args_so_far))
    /* Don't include the last named arg.  */
    --n_named_args;
  else
    /* Treat all args as named.  */
    n_named_args = num_actuals;

thus you lose that info (you could have that readily available).

> > typedef int myint __attribute__((aligned(8)));
> > 
> > int main()
> > {
> >   myint i = 1;
> >   int j = 2;
> >   __builtin_printf("%d %d\n", (int) i, j);
> > }
> > 
> > Variadic functions take the types of their arguments from the types of
> > the actuals passed.  The compiler should either be applying appropriate
> > promotion rules to make the types conformant by the language
> > specification or respecting the types exactly.  However, that should be
> > done in the mid-end not the back-end.  If incorrect alignment
> > information is passed to the back-end it can't help but make the wrong
> > choice.  Examining the mode here wouldn't help either.  Consider:
> > 
> > int foo (int a, int b, int c, int d, int e, int f
> > __attribute__((aligned(8))));
> > 
> > On ARM that should pass f in a 64-bit aligned location (since the
> > parameter will be on the stack).
> > 
> > R.
> > 
> > 
> > > Richard.
> > > 
> > >> Richard.
> > >>
> > >>>
> > >>> FWIW, I also tried:
> > >>>
> > >>> __attribute__((__aligned__((16)))) int x;
> > >>> int main (void)
> > >>> {
> > >>>   __builtin_printf("%d\n", x);
> > >>> }
> > >>>
> > >>> but in that case, the arm_function_arg is still fed a type with
> > >>> alignment 32
> > >>> (bits), i.e. distinct from the type of the field 'x' in memory, which
> > >>> has
> > >>> alignment 128.
> > >>>
> > >>> --Alan
> > >>>
> > >>> Richard Biener wrote:
> > >>>> On Mon, 30 Mar 2015, Richard Biener wrote:
> > >>>>
> > >>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
> > >>>>>
> > >>>>>> ...actually attach the testcase...
> > >>>>> What compile options?
> > >>>>
> > >>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
> > >>>> I can't see anything not guaranteeing that:
> > >>>>
> > >>>>         .section        .rodata
> > >>>>         .align  3
> > >>>> .LANCHOR0 = . + 0
> > >>>> .LC1:
> > >>>>         .ascii  "%d %g %d\012\000"
> > >>>>         .space  6
> > >>>> .LC0:
> > >>>>         .word   7
> > >>>>         .space  4
> > >>>>         .word   0
> > >>>>         .word   1075838976
> > >>>>         .word   9
> > >>>>         .space  4
> > >>>>
> > >>>> maybe there is some more generic code-gen bug for aligned aggregate
> > >>>> copy?  That is, the patch tells the backend that the loads and
> > >>>> stores to the 'int' vars (which have padding followed) is aligned
> > >>>> to 8 bytes.
> > >>>>
> > >>>> I don't see what is wrong in the final assembler, but maybe
> > >>>> some endian issue exists?  The code looks quite ugly though ;)
> > >>>>
> > >>>> Richard.
> > >>>>
> > >>>>>> Alan Lawrence wrote:
> > >>>>>>> We've been seeing a bunch of new failures in the *libffi*
> > >>> testsuite on ARM
> > >>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
> > >>> following this
> > >>>>>>> one-liner fix. I've reduced the testcase down to the attached
> > >>> (including
> > >>>>>>> removing any dependency on libffi); with gcc r221347, this prints
> > >>> the
> > >>>>>>> expected
> > >>>>>>> 7 8 9
> > >>>>>>> whereas with gcc r221348, instead it prints
> > >>>>>>> 0 8 0
> > >>>>>>>
> > >>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
> > >>> a
> > >>>>>>> var_decl of b1, from 32 to 64; both have type
> > >>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
> > >>> sizes-gimplified type_0
> > >>>>>>> BLK
> > >>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
> > >>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
> > >>>>>>>          align 64 symtab 0 alias set 1 canonical type
> > >>> 0x2b9b8d428d20
> > >>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
> > >>>>>>> 0x2b9b8d092690 int>
> > >>>>>>>              SI file reduced.c line 12 col 7
> > >>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
> > >>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
> > >>>>>>>              align 32 offset_align 64
> > >>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
> > >>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
> > >>> context
> > >>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
> > >>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
> > >>> D.6070>
> > >>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
> > >>> <type_decl
> > >>>>>>> 0x2b9b8d42b000 D.6044>>
> > >>>>>>>
> > >>>>>>> The tree-optimized output is the same with both compilers (as this
> > >>> does not
> > >>>>>>> mention alignment); the expand output differs.
> > >>>>>>>
> > >>>>>>> Still investigating...
> > >>>>>>>
> > >>>>>>> --Alan
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Richard Biener wrote:
> > >>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
> > >>>>>>>> drops alignment info unnecessarily.
> > >>>>>>>>
> > >>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > >>>>>>>>
> > >>>>>>>> Richard.
> > >>>>>>>>
> > >>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
> > >>>>>>>>
> > >>>>>>>>  PR tree-optimization/65310
> > >>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
> > >>>>>>>>  alignment.
> > >>>>>>>>
> > >>>>>>>> Index: gcc/tree-sra.c
> > >>>>>>>>
> > >>> ===================================================================
> > >>>>>>>> --- gcc/tree-sra.c       (revision 221324)
> > >>>>>>>> +++ gcc/tree-sra.c       (working copy)
> > >>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
> > >>>>>>>>    misalign = (misalign + offset) & (align - 1);
> > >>>>>>>>    if (misalign != 0)
> > >>>>>>>>      align = (misalign & -misalign);
> > >>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
> > >>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
> > >>>>>>>>      exp_type = build_aligned_type (exp_type, align);
> > >>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
> > >>> off);
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>
> > >>
> > > 
> > 
> > 
> 
>
Richard Earnshaw March 31, 2015, 10:31 a.m. UTC | #13
On 31/03/15 11:20, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Biener wrote:
> 
>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>
>>> On 31/03/15 08:50, Richard Biener wrote:
>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>>> it.
>>>>>>
>>>>>> The problem appears to be in laying out arguments, specifically
>>>>>> varargs. From
>>>>>> the "good" -fdump-rtl-expand:
>>>>>>
>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>> S4 A32])
>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>             (set (reg:SI 0 r0)
>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>> A32])
>>>>>>
>>>>>> The struct members are
>>>>>> reg:SI 113 => int a;
>>>>>> reg:DF 112 => double b;
>>>>>> reg:SI 111 => int c;
>>>>>>
>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>>> pushed
>>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>>> build_ref_of_offset, we get:
>>>>>>
>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>>>>> *.LC1>)) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>>> virtual-outgoing-args)
>>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>> S8 A64])
>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>      (nil))
>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>             (set (reg:SI 0 r0)
>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>> A32])
>>>>>>
>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>>
>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>>> because
>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>>> argument (the
>>>>>> type constructed by build_ref_for_offset); it then executes
>>>>>> (aapcs_layout_arg,
>>>>>> arm.c line ~~5914)
>>>>>>
>>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>>      next even number.  */
>>>>>>   ncrn = pcum->aapcs_ncrn;
>>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>>     ncrn++;
>>>>>>
>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>>> testcase
>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>>
>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>>> I think
>>>>>> because these use the type of the declared parameters, rather than the
>>>>>> provided
>>>>>> arguments, and the former do not have the increased alignment from
>>>>>> build_ref_for_offset.
>>>>>
>>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>>>>
>>>>> This means that
>>>>>
>>>>> Int I __aligned__(8);
>>>>>
>>>>> Is passed differently than int.
>>>>>
>>>>> Arm_function_arg needs to be fixed.
>>>>
>>>> That is,
>>>>
>>>> typedef int myint __attribute__((aligned(8)));
>>>>
>>>> int main()
>>>> {
>>>>   myint i = 1;
>>>>   int j = 2;
>>>>   __builtin_printf("%d %d\n", i, j);
>>>> }
>>>>
>>>> or
>>>>
>>>> myint i;
>>>> int j;
>>>> myint *p = &i;
>>>> int *q = &j;
>>>>
>>>> int main()
>>>> {
>>>>   __builtin_printf("%d %d", *p, *q);
>>>> }
>>>>
>>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>>> non-aligned values only makes sense for things passed by value on the stack.
>>>> And then obviously only dependent on the functuion type signature, not
>>>> on the type of the passed value.
>>>>
>>>
>>> I think the testcase is ill-formed.  Just because printf doesn't have
>>> such a modifier, doesn't mean that another variadic function might not
>>> have a means to detect when an object in the variadic list needs to be
>>> over-aligned.  As such, the test should really be written as:
>>
>> A value doesn't have "alignment".  A function may have alignment
>> requirements on its arguments, clearly printf doesn't.
> 
> Btw, it looks like function_arg is supposed to pass whether the argument
> is to the variadic part of the function or not, but it gets passed
> false as in
> 
>       args[i].reg = targetm.calls.function_arg (args_so_far, mode, type,
>                                                 argpos < n_named_args);
> 
> n_named_args is 4.  That is because ARM asks to do this via
> 
>   if (type_arg_types != 0
>       && targetm.calls.strict_argument_naming (args_so_far))
>     ;
>   else if (type_arg_types != 0
>            && ! targetm.calls.pretend_outgoing_varargs_named 
> (args_so_far))
>     /* Don't include the last named arg.  */
>     --n_named_args;
>   else
>     /* Treat all args as named.  */
>     n_named_args = num_actuals;
> 
> thus you lose that info (you could have that readily available).
> 

From the calling side of a function it shouldn't matter.  They only
thing the caller has to know is that the function is variadic (and
therefore that the base-standard rules from the AAPCS apply -- no use of
FP registers for parameters).  The behaviour after that is *as if* all
the arguments were pushed onto the stack in the correct order and
finally the lowest 4 words are popped off the stack again into r0-r3.
Hence any alignment that would apply to arguments on the stack has to be
reflected in their allocation into r0-r3 (since the push/pop of those
registers never happens).

R.

>>> typedef int myint __attribute__((aligned(8)));
>>>
>>> int main()
>>> {
>>>   myint i = 1;
>>>   int j = 2;
>>>   __builtin_printf("%d %d\n", (int) i, j);
>>> }
>>>
>>> Variadic functions take the types of their arguments from the types of
>>> the actuals passed.  The compiler should either be applying appropriate
>>> promotion rules to make the types conformant by the language
>>> specification or respecting the types exactly.  However, that should be
>>> done in the mid-end not the back-end.  If incorrect alignment
>>> information is passed to the back-end it can't help but make the wrong
>>> choice.  Examining the mode here wouldn't help either.  Consider:
>>>
>>> int foo (int a, int b, int c, int d, int e, int f
>>> __attribute__((aligned(8))));
>>>
>>> On ARM that should pass f in a 64-bit aligned location (since the
>>> parameter will be on the stack).
>>>
>>> R.
>>>
>>>
>>>> Richard.
>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> FWIW, I also tried:
>>>>>>
>>>>>> __attribute__((__aligned__((16)))) int x;
>>>>>> int main (void)
>>>>>> {
>>>>>>   __builtin_printf("%d\n", x);
>>>>>> }
>>>>>>
>>>>>> but in that case, the arm_function_arg is still fed a type with
>>>>>> alignment 32
>>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
>>>>>> has
>>>>>> alignment 128.
>>>>>>
>>>>>> --Alan
>>>>>>
>>>>>> Richard Biener wrote:
>>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>>>>>
>>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>>>>>
>>>>>>>>> ...actually attach the testcase...
>>>>>>>> What compile options?
>>>>>>>
>>>>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>>>>>> I can't see anything not guaranteeing that:
>>>>>>>
>>>>>>>         .section        .rodata
>>>>>>>         .align  3
>>>>>>> .LANCHOR0 = . + 0
>>>>>>> .LC1:
>>>>>>>         .ascii  "%d %g %d\012\000"
>>>>>>>         .space  6
>>>>>>> .LC0:
>>>>>>>         .word   7
>>>>>>>         .space  4
>>>>>>>         .word   0
>>>>>>>         .word   1075838976
>>>>>>>         .word   9
>>>>>>>         .space  4
>>>>>>>
>>>>>>> maybe there is some more generic code-gen bug for aligned aggregate
>>>>>>> copy?  That is, the patch tells the backend that the loads and
>>>>>>> stores to the 'int' vars (which have padding followed) is aligned
>>>>>>> to 8 bytes.
>>>>>>>
>>>>>>> I don't see what is wrong in the final assembler, but maybe
>>>>>>> some endian issue exists?  The code looks quite ugly though ;)
>>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>> Alan Lawrence wrote:
>>>>>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>>>>> testsuite on ARM
>>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>>>>> following this
>>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>>>>> (including
>>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>>>>> the
>>>>>>>>>> expected
>>>>>>>>>> 7 8 9
>>>>>>>>>> whereas with gcc r221348, instead it prints
>>>>>>>>>> 0 8 0
>>>>>>>>>>
>>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>>>>> a
>>>>>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>>>>> sizes-gimplified type_0
>>>>>>>>>> BLK
>>>>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>>>>> 0x2b9b8d428d20
>>>>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>>>>>> 0x2b9b8d092690 int>
>>>>>>>>>>              SI file reduced.c line 12 col 7
>>>>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>>>>>              align 32 offset_align 64
>>>>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>>>>> context
>>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>>>>> D.6070>
>>>>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>>>>>> <type_decl
>>>>>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>>>>>
>>>>>>>>>> The tree-optimized output is the same with both compilers (as this
>>>>>> does not
>>>>>>>>>> mention alignment); the expand output differs.
>>>>>>>>>>
>>>>>>>>>> Still investigating...
>>>>>>>>>>
>>>>>>>>>> --Alan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Richard Biener wrote:
>>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>>>>>> drops alignment info unnecessarily.
>>>>>>>>>>>
>>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>>>>>
>>>>>>>>>>> Richard.
>>>>>>>>>>>
>>>>>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>>>>>>
>>>>>>>>>>>  PR tree-optimization/65310
>>>>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>>>>>  alignment.
>>>>>>>>>>>
>>>>>>>>>>> Index: gcc/tree-sra.c
>>>>>>>>>>>
>>>>>> ===================================================================
>>>>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>>>>>    if (misalign != 0)
>>>>>>>>>>>      align = (misalign & -misalign);
>>>>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>>>>> off);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
Jakub Jelinek March 31, 2015, 10:32 a.m. UTC | #14
On Tue, Mar 31, 2015 at 11:10:39AM +0100, Richard Earnshaw wrote:
> >>> That is,
> >>>
> >>> typedef int myint __attribute__((aligned(8)));
> >>>
> >>> int main()
> >>> {
> >>>   myint i = 1;
> >>>   int j = 2;
> >>>   __builtin_printf("%d %d\n", i, j);
> >>> }
> >>>
> >>> or
> >>>
> >>> myint i;
> >>> int j;
> >>> myint *p = &i;
> >>> int *q = &j;
> >>>
> >>> int main()
> >>> {
> >>>   __builtin_printf("%d %d", *p, *q);
> >>> }

Note that starting with r221348, gcc fails to profiledbootstrap on
armv7hl-linux-gnueabi.  I'd hope it is the same thing.

To middle-end, all integral type conversions that differ just in alignment
are useless - for INTEGRAL_TYPE_P all useless_type_conversion_p cares about
is sign and precision.

So, if arm has a weirdo ABI that wants to pass aligned types differently
(why, I'd say it is just a bug in the ABI), then for named arguments it
really has to look at the function type - the type of the argument, rather
than the passed in value's type, and for varargs it would be best if it
remembered such thing early (in the FEs), because the middle-end can change
things any time, with or without Richard B.'s recent SRA change.

	Jakub
Richard Biener March 31, 2015, 10:44 a.m. UTC | #15
On Tue, 31 Mar 2015, Richard Earnshaw wrote:

> On 31/03/15 11:20, Richard Biener wrote:
> > On Tue, 31 Mar 2015, Richard Biener wrote:
> > 
> >> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> >>
> >>> On 31/03/15 08:50, Richard Biener wrote:
> >>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
> >>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
> >>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
> >>>>>> it.
> >>>>>>
> >>>>>> The problem appears to be in laying out arguments, specifically
> >>>>>> varargs. From
> >>>>>> the "good" -fdump-rtl-expand:
> >>>>>>
> >>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
> >>>>>> S4 A32])
> >>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
> >>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
> >>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>>>>         (reg:SI 118)) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (call_insn 22 21 23 2 (parallel [
> >>>>>>             (set (reg:SI 0 r0)
> >>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
> >>>>>> A32])
> >>>>>>
> >>>>>> The struct members are
> >>>>>> reg:SI 113 => int a;
> >>>>>> reg:DF 112 => double b;
> >>>>>> reg:SI 111 => int c;
> >>>>>>
> >>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
> >>>>>> pushed
> >>>>>> into virtual-outgoing-args. In contrast, post-change to
> >>>>>> build_ref_of_offset, we get:
> >>>>>>
> >>>>>> (insn 17 16 18 2 (set (reg:SI 118)
> >>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
> >>>>>> *.LC1>)) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
> >>>>>> virtual-outgoing-args)
> >>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
> >>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
> >>>>>> S8 A64])
> >>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
> >>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
> >>>>>>         (reg:SI 118)) reduced.c:14 -1
> >>>>>>      (nil))
> >>>>>> (call_insn 22 21 23 2 (parallel [
> >>>>>>             (set (reg:SI 0 r0)
> >>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
> >>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
> >>>>>> A32])
> >>>>>>
> >>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
> >>>>>>
> >>>>>> double+following int are all pushed into virtual-outgoing-args. This is
> >>>>>> because
> >>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
> >>>>>> argument (the
> >>>>>> type constructed by build_ref_for_offset); it then executes
> >>>>>> (aapcs_layout_arg,
> >>>>>> arm.c line ~~5914)
> >>>>>>
> >>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
> >>>>>>      next even number.  */
> >>>>>>   ncrn = pcum->aapcs_ncrn;
> >>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
> >>>>>>     ncrn++;
> >>>>>>
> >>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
> >>>>>> testcase
> >>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
> >>>>>> 32-bit-aligned int instead, which works as previously.
> >>>>>>
> >>>>>> Passing the same members of that struct in a non-vargs call, works ok -
> >>>>>> I think
> >>>>>> because these use the type of the declared parameters, rather than the
> >>>>>> provided
> >>>>>> arguments, and the former do not have the increased alignment from
> >>>>>> build_ref_for_offset.
> >>>>>
> >>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
> >>>>>
> >>>>> This means that
> >>>>>
> >>>>> Int I __aligned__(8);
> >>>>>
> >>>>> Is passed differently than int.
> >>>>>
> >>>>> Arm_function_arg needs to be fixed.
> >>>>
> >>>> That is,
> >>>>
> >>>> typedef int myint __attribute__((aligned(8)));
> >>>>
> >>>> int main()
> >>>> {
> >>>>   myint i = 1;
> >>>>   int j = 2;
> >>>>   __builtin_printf("%d %d\n", i, j);
> >>>> }
> >>>>
> >>>> or
> >>>>
> >>>> myint i;
> >>>> int j;
> >>>> myint *p = &i;
> >>>> int *q = &j;
> >>>>
> >>>> int main()
> >>>> {
> >>>>   __builtin_printf("%d %d", *p, *q);
> >>>> }
> >>>>
> >>>> should behave the same.  There isn't a printf modifier for an "aligned int"
> >>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
> >>>> non-aligned values only makes sense for things passed by value on the stack.
> >>>> And then obviously only dependent on the functuion type signature, not
> >>>> on the type of the passed value.
> >>>>
> >>>
> >>> I think the testcase is ill-formed.  Just because printf doesn't have
> >>> such a modifier, doesn't mean that another variadic function might not
> >>> have a means to detect when an object in the variadic list needs to be
> >>> over-aligned.  As such, the test should really be written as:
> >>
> >> A value doesn't have "alignment".  A function may have alignment
> >> requirements on its arguments, clearly printf doesn't.
> > 
> > Btw, it looks like function_arg is supposed to pass whether the argument
> > is to the variadic part of the function or not, but it gets passed
> > false as in
> > 
> >       args[i].reg = targetm.calls.function_arg (args_so_far, mode, type,
> >                                                 argpos < n_named_args);
> > 
> > n_named_args is 4.  That is because ARM asks to do this via
> > 
> >   if (type_arg_types != 0
> >       && targetm.calls.strict_argument_naming (args_so_far))
> >     ;
> >   else if (type_arg_types != 0
> >            && ! targetm.calls.pretend_outgoing_varargs_named 
> > (args_so_far))
> >     /* Don't include the last named arg.  */
> >     --n_named_args;
> >   else
> >     /* Treat all args as named.  */
> >     n_named_args = num_actuals;
> > 
> > thus you lose that info (you could have that readily available).
> > 
> 
> From the calling side of a function it shouldn't matter.  They only
> thing the caller has to know is that the function is variadic (and
> therefore that the base-standard rules from the AAPCS apply -- no use of
> FP registers for parameters).  The behaviour after that is *as if* all
> the arguments were pushed onto the stack in the correct order and
> finally the lowest 4 words are popped off the stack again into r0-r3.
> Hence any alignment that would apply to arguments on the stack has to be
> reflected in their allocation into r0-r3 (since the push/pop of those
> registers never happens).

But how do you compute the alignment of, say a myint '1'?  Of course
__attribute__((aligned())) is a C extension thus AAPCS likely doesn't 
consider it.

But yes, as given even on x86_64 we might spill a 8-byte aligned
int register to a 8-byte aligned stack slot so the proposed patch
makes sense in that regard.  I wonder how many other passes can
get confused by this (PRE, I suppose, inserting an explicitely
aligned load as well as all passes after the vectorizer which
also creates loads/store with explicit (usually lower) alignment).

Richard.

> R.
> 
> >>> typedef int myint __attribute__((aligned(8)));
> >>>
> >>> int main()
> >>> {
> >>>   myint i = 1;
> >>>   int j = 2;
> >>>   __builtin_printf("%d %d\n", (int) i, j);
> >>> }
> >>>
> >>> Variadic functions take the types of their arguments from the types of
> >>> the actuals passed.  The compiler should either be applying appropriate
> >>> promotion rules to make the types conformant by the language
> >>> specification or respecting the types exactly.  However, that should be
> >>> done in the mid-end not the back-end.  If incorrect alignment
> >>> information is passed to the back-end it can't help but make the wrong
> >>> choice.  Examining the mode here wouldn't help either.  Consider:
> >>>
> >>> int foo (int a, int b, int c, int d, int e, int f
> >>> __attribute__((aligned(8))));
> >>>
> >>> On ARM that should pass f in a 64-bit aligned location (since the
> >>> parameter will be on the stack).
> >>>
> >>> R.
> >>>
> >>>
> >>>> Richard.
> >>>>
> >>>>> Richard.
> >>>>>
> >>>>>>
> >>>>>> FWIW, I also tried:
> >>>>>>
> >>>>>> __attribute__((__aligned__((16)))) int x;
> >>>>>> int main (void)
> >>>>>> {
> >>>>>>   __builtin_printf("%d\n", x);
> >>>>>> }
> >>>>>>
> >>>>>> but in that case, the arm_function_arg is still fed a type with
> >>>>>> alignment 32
> >>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
> >>>>>> has
> >>>>>> alignment 128.
> >>>>>>
> >>>>>> --Alan
> >>>>>>
> >>>>>> Richard Biener wrote:
> >>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
> >>>>>>>
> >>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
> >>>>>>>>
> >>>>>>>>> ...actually attach the testcase...
> >>>>>>>> What compile options?
> >>>>>>>
> >>>>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
> >>>>>>> I can't see anything not guaranteeing that:
> >>>>>>>
> >>>>>>>         .section        .rodata
> >>>>>>>         .align  3
> >>>>>>> .LANCHOR0 = . + 0
> >>>>>>> .LC1:
> >>>>>>>         .ascii  "%d %g %d\012\000"
> >>>>>>>         .space  6
> >>>>>>> .LC0:
> >>>>>>>         .word   7
> >>>>>>>         .space  4
> >>>>>>>         .word   0
> >>>>>>>         .word   1075838976
> >>>>>>>         .word   9
> >>>>>>>         .space  4
> >>>>>>>
> >>>>>>> maybe there is some more generic code-gen bug for aligned aggregate
> >>>>>>> copy?  That is, the patch tells the backend that the loads and
> >>>>>>> stores to the 'int' vars (which have padding followed) is aligned
> >>>>>>> to 8 bytes.
> >>>>>>>
> >>>>>>> I don't see what is wrong in the final assembler, but maybe
> >>>>>>> some endian issue exists?  The code looks quite ugly though ;)
> >>>>>>>
> >>>>>>> Richard.
> >>>>>>>
> >>>>>>>>> Alan Lawrence wrote:
> >>>>>>>>>> We've been seeing a bunch of new failures in the *libffi*
> >>>>>> testsuite on ARM
> >>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
> >>>>>> following this
> >>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached
> >>>>>> (including
> >>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
> >>>>>> the
> >>>>>>>>>> expected
> >>>>>>>>>> 7 8 9
> >>>>>>>>>> whereas with gcc r221348, instead it prints
> >>>>>>>>>> 0 8 0
> >>>>>>>>>>
> >>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
> >>>>>> a
> >>>>>>>>>> var_decl of b1, from 32 to 64; both have type
> >>>>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
> >>>>>> sizes-gimplified type_0
> >>>>>>>>>> BLK
> >>>>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
> >>>>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
> >>>>>>>>>>          align 64 symtab 0 alias set 1 canonical type
> >>>>>> 0x2b9b8d428d20
> >>>>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
> >>>>>>>>>> 0x2b9b8d092690 int>
> >>>>>>>>>>              SI file reduced.c line 12 col 7
> >>>>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
> >>>>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
> >>>>>>>>>>              align 32 offset_align 64
> >>>>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
> >>>>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
> >>>>>> context
> >>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
> >>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
> >>>>>> D.6070>
> >>>>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
> >>>>>> <type_decl
> >>>>>>>>>> 0x2b9b8d42b000 D.6044>>
> >>>>>>>>>>
> >>>>>>>>>> The tree-optimized output is the same with both compilers (as this
> >>>>>> does not
> >>>>>>>>>> mention alignment); the expand output differs.
> >>>>>>>>>>
> >>>>>>>>>> Still investigating...
> >>>>>>>>>>
> >>>>>>>>>> --Alan
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Richard Biener wrote:
> >>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
> >>>>>>>>>>> drops alignment info unnecessarily.
> >>>>>>>>>>>
> >>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> >>>>>>>>>>>
> >>>>>>>>>>> Richard.
> >>>>>>>>>>>
> >>>>>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
> >>>>>>>>>>>
> >>>>>>>>>>>  PR tree-optimization/65310
> >>>>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
> >>>>>>>>>>>  alignment.
> >>>>>>>>>>>
> >>>>>>>>>>> Index: gcc/tree-sra.c
> >>>>>>>>>>>
> >>>>>> ===================================================================
> >>>>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
> >>>>>>>>>>> +++ gcc/tree-sra.c       (working copy)
> >>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
> >>>>>>>>>>>    misalign = (misalign + offset) & (align - 1);
> >>>>>>>>>>>    if (misalign != 0)
> >>>>>>>>>>>      align = (misalign & -misalign);
> >>>>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
> >>>>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
> >>>>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
> >>>>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
> >>>>>> off);
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >>
> > 
> 
>
Richard Earnshaw March 31, 2015, 10:53 a.m. UTC | #16
On 31/03/15 11:44, Richard Biener wrote:
> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
> 
>> On 31/03/15 11:20, Richard Biener wrote:
>>> On Tue, 31 Mar 2015, Richard Biener wrote:
>>>
>>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote:
>>>>
>>>>> On 31/03/15 08:50, Richard Biener wrote:
>>>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote:
>>>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>>>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes
>>>>>>>> it.
>>>>>>>>
>>>>>>>> The problem appears to be in laying out arguments, specifically
>>>>>>>> varargs. From
>>>>>>>> the "good" -fdump-rtl-expand:
>>>>>>>>
>>>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>> S4 A32])
>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2)
>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1)
>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>> A32])
>>>>>>>>
>>>>>>>> The struct members are
>>>>>>>> reg:SI 113 => int a;
>>>>>>>> reg:DF 112 => double b;
>>>>>>>> reg:SI 111 => int c;
>>>>>>>>
>>>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is
>>>>>>>> pushed
>>>>>>>> into virtual-outgoing-args. In contrast, post-change to
>>>>>>>> build_ref_of_offset, we get:
>>>>>>>>
>>>>>>>> (insn 17 16 18 2 (set (reg:SI 118)
>>>>>>>>   (symbol_ref/v/f:SI ("*.LC1") [flags 0x82]  <var_decl 0x2ba57fa8d750
>>>>>>>> *.LC1>)) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107
>>>>>>>> virtual-outgoing-args)
>>>>>>>>                 (const_int 8 [0x8])) [0  S4 A64])
>>>>>>>>         (reg:SI 111 [ b1$16 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0
>>>>>>>> S8 A64])
>>>>>>>>         (reg:DF 112 [ b1$8 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2)
>>>>>>>>         (reg:SI 113 [ b1 ])) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0)
>>>>>>>>         (reg:SI 118)) reduced.c:14 -1
>>>>>>>>      (nil))
>>>>>>>> (call_insn 22 21 23 2 (parallel [
>>>>>>>>             (set (reg:SI 0 r0)
>>>>>>>>                 (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41]
>>>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4
>>>>>>>> A32])
>>>>>>>>
>>>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the
>>>>>>>>
>>>>>>>> double+following int are all pushed into virtual-outgoing-args. This is
>>>>>>>> because
>>>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second
>>>>>>>> argument (the
>>>>>>>> type constructed by build_ref_for_offset); it then executes
>>>>>>>> (aapcs_layout_arg,
>>>>>>>> arm.c line ~~5914)
>>>>>>>>
>>>>>>>>   /* C3 - For double-word aligned arguments, round the NCRN up to the
>>>>>>>>      next even number.  */
>>>>>>>>   ncrn = pcum->aapcs_ncrn;
>>>>>>>>   if ((ncrn & 1) && arm_needs_doubleword_align (mode, type))
>>>>>>>>     ncrn++;
>>>>>>>>
>>>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the
>>>>>>>> testcase
>>>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a
>>>>>>>> 32-bit-aligned int instead, which works as previously.
>>>>>>>>
>>>>>>>> Passing the same members of that struct in a non-vargs call, works ok -
>>>>>>>> I think
>>>>>>>> because these use the type of the declared parameters, rather than the
>>>>>>>> provided
>>>>>>>> arguments, and the former do not have the increased alignment from
>>>>>>>> build_ref_for_offset.
>>>>>>>
>>>>>>> It doesn't make sense to use the alignment of passed values.  That looks like bs.
>>>>>>>
>>>>>>> This means that
>>>>>>>
>>>>>>> Int I __aligned__(8);
>>>>>>>
>>>>>>> Is passed differently than int.
>>>>>>>
>>>>>>> Arm_function_arg needs to be fixed.
>>>>>>
>>>>>> That is,
>>>>>>
>>>>>> typedef int myint __attribute__((aligned(8)));
>>>>>>
>>>>>> int main()
>>>>>> {
>>>>>>   myint i = 1;
>>>>>>   int j = 2;
>>>>>>   __builtin_printf("%d %d\n", i, j);
>>>>>> }
>>>>>>
>>>>>> or
>>>>>>
>>>>>> myint i;
>>>>>> int j;
>>>>>> myint *p = &i;
>>>>>> int *q = &j;
>>>>>>
>>>>>> int main()
>>>>>> {
>>>>>>   __builtin_printf("%d %d", *p, *q);
>>>>>> }
>>>>>>
>>>>>> should behave the same.  There isn't a printf modifier for an "aligned int"
>>>>>> because that sort of thing doesn't make sense.  Special-casing aligned vs.
>>>>>> non-aligned values only makes sense for things passed by value on the stack.
>>>>>> And then obviously only dependent on the functuion type signature, not
>>>>>> on the type of the passed value.
>>>>>>
>>>>>
>>>>> I think the testcase is ill-formed.  Just because printf doesn't have
>>>>> such a modifier, doesn't mean that another variadic function might not
>>>>> have a means to detect when an object in the variadic list needs to be
>>>>> over-aligned.  As such, the test should really be written as:
>>>>
>>>> A value doesn't have "alignment".  A function may have alignment
>>>> requirements on its arguments, clearly printf doesn't.
>>>
>>> Btw, it looks like function_arg is supposed to pass whether the argument
>>> is to the variadic part of the function or not, but it gets passed
>>> false as in
>>>
>>>       args[i].reg = targetm.calls.function_arg (args_so_far, mode, type,
>>>                                                 argpos < n_named_args);
>>>
>>> n_named_args is 4.  That is because ARM asks to do this via
>>>
>>>   if (type_arg_types != 0
>>>       && targetm.calls.strict_argument_naming (args_so_far))
>>>     ;
>>>   else if (type_arg_types != 0
>>>            && ! targetm.calls.pretend_outgoing_varargs_named 
>>> (args_so_far))
>>>     /* Don't include the last named arg.  */
>>>     --n_named_args;
>>>   else
>>>     /* Treat all args as named.  */
>>>     n_named_args = num_actuals;
>>>
>>> thus you lose that info (you could have that readily available).
>>>
>>
>> From the calling side of a function it shouldn't matter.  They only
>> thing the caller has to know is that the function is variadic (and
>> therefore that the base-standard rules from the AAPCS apply -- no use of
>> FP registers for parameters).  The behaviour after that is *as if* all
>> the arguments were pushed onto the stack in the correct order and
>> finally the lowest 4 words are popped off the stack again into r0-r3.
>> Hence any alignment that would apply to arguments on the stack has to be
>> reflected in their allocation into r0-r3 (since the push/pop of those
>> registers never happens).
> 
> But how do you compute the alignment of, say a myint '1'?  Of course
> __attribute__((aligned())) is a C extension thus AAPCS likely doesn't 
> consider it.
> 

Front-end problem :-)

> But yes, as given even on x86_64 we might spill a 8-byte aligned
> int register to a 8-byte aligned stack slot so the proposed patch
> makes sense in that regard.  I wonder how many other passes can
> get confused by this (PRE, I suppose, inserting an explicitely
> aligned load as well as all passes after the vectorizer which
> also creates loads/store with explicit (usually lower) alignment).
> 
> Richard.
> 
>> R.
>>
>>>>> typedef int myint __attribute__((aligned(8)));
>>>>>
>>>>> int main()
>>>>> {
>>>>>   myint i = 1;
>>>>>   int j = 2;
>>>>>   __builtin_printf("%d %d\n", (int) i, j);
>>>>> }
>>>>>
>>>>> Variadic functions take the types of their arguments from the types of
>>>>> the actuals passed.  The compiler should either be applying appropriate
>>>>> promotion rules to make the types conformant by the language
>>>>> specification or respecting the types exactly.  However, that should be
>>>>> done in the mid-end not the back-end.  If incorrect alignment
>>>>> information is passed to the back-end it can't help but make the wrong
>>>>> choice.  Examining the mode here wouldn't help either.  Consider:
>>>>>
>>>>> int foo (int a, int b, int c, int d, int e, int f
>>>>> __attribute__((aligned(8))));
>>>>>
>>>>> On ARM that should pass f in a 64-bit aligned location (since the
>>>>> parameter will be on the stack).
>>>>>
>>>>> R.
>>>>>
>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> FWIW, I also tried:
>>>>>>>>
>>>>>>>> __attribute__((__aligned__((16)))) int x;
>>>>>>>> int main (void)
>>>>>>>> {
>>>>>>>>   __builtin_printf("%d\n", x);
>>>>>>>> }
>>>>>>>>
>>>>>>>> but in that case, the arm_function_arg is still fed a type with
>>>>>>>> alignment 32
>>>>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which
>>>>>>>> has
>>>>>>>> alignment 128.
>>>>>>>>
>>>>>>>> --Alan
>>>>>>>>
>>>>>>>> Richard Biener wrote:
>>>>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote:
>>>>>>>>>
>>>>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote:
>>>>>>>>>>
>>>>>>>>>>> ...actually attach the testcase...
>>>>>>>>>> What compile options?
>>>>>>>>>
>>>>>>>>> Just tried -O2.  The GIMPLE IL assumes 64bit alignment of .LC0 but
>>>>>>>>> I can't see anything not guaranteeing that:
>>>>>>>>>
>>>>>>>>>         .section        .rodata
>>>>>>>>>         .align  3
>>>>>>>>> .LANCHOR0 = . + 0
>>>>>>>>> .LC1:
>>>>>>>>>         .ascii  "%d %g %d\012\000"
>>>>>>>>>         .space  6
>>>>>>>>> .LC0:
>>>>>>>>>         .word   7
>>>>>>>>>         .space  4
>>>>>>>>>         .word   0
>>>>>>>>>         .word   1075838976
>>>>>>>>>         .word   9
>>>>>>>>>         .space  4
>>>>>>>>>
>>>>>>>>> maybe there is some more generic code-gen bug for aligned aggregate
>>>>>>>>> copy?  That is, the patch tells the backend that the loads and
>>>>>>>>> stores to the 'int' vars (which have padding followed) is aligned
>>>>>>>>> to 8 bytes.
>>>>>>>>>
>>>>>>>>> I don't see what is wrong in the final assembler, but maybe
>>>>>>>>> some endian issue exists?  The code looks quite ugly though ;)
>>>>>>>>>
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>>> Alan Lawrence wrote:
>>>>>>>>>>>> We've been seeing a bunch of new failures in the *libffi*
>>>>>>>> testsuite on ARM
>>>>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf),
>>>>>>>> following this
>>>>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached
>>>>>>>> (including
>>>>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints
>>>>>>>> the
>>>>>>>>>>>> expected
>>>>>>>>>>>> 7 8 9
>>>>>>>>>>>> whereas with gcc r221348, instead it prints
>>>>>>>>>>>> 0 8 0
>>>>>>>>>>>>
>>>>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and
>>>>>>>> a
>>>>>>>>>>>> var_decl of b1, from 32 to 64; both have type
>>>>>>>>>>>>   type <record_type 0x2b9b8d428d20 cls_struct_16byte
>>>>>>>> sizes-gimplified type_0
>>>>>>>>>>>> BLK
>>>>>>>>>>>>          size <integer_cst 0x2b9b8d3720a8 constant 192>
>>>>>>>>>>>>          unit size <integer_cst 0x2b9b8d372078 constant 24>
>>>>>>>>>>>>          align 64 symtab 0 alias set 1 canonical type
>>>>>>>> 0x2b9b8d428d20
>>>>>>>>>>>>          fields <field_decl 0x2b9b8d42b098 a type <integer_type
>>>>>>>>>>>> 0x2b9b8d092690 int>
>>>>>>>>>>>>              SI file reduced.c line 12 col 7
>>>>>>>>>>>>              size <integer_cst 0x2b9b8d08eeb8 constant 32>
>>>>>>>>>>>>              unit size <integer_cst 0x2b9b8d08eed0 constant 4>
>>>>>>>>>>>>              align 32 offset_align 64
>>>>>>>>>>>>              offset <integer_cst 0x2b9b8d08eee8 constant 0>
>>>>>>>>>>>>              bit offset <integer_cst 0x2b9b8d08ef48 constant 0>
>>>>>>>> context
>>>>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl
>>>>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0
>>>>>>>> D.6070>
>>>>>>>>>>>>          pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain
>>>>>>>> <type_decl
>>>>>>>>>>>> 0x2b9b8d42b000 D.6044>>
>>>>>>>>>>>>
>>>>>>>>>>>> The tree-optimized output is the same with both compilers (as this
>>>>>>>> does not
>>>>>>>>>>>> mention alignment); the expand output differs.
>>>>>>>>>>>>
>>>>>>>>>>>> Still investigating...
>>>>>>>>>>>>
>>>>>>>>>>>> --Alan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Richard Biener wrote:
>>>>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA
>>>>>>>>>>>>> drops alignment info unnecessarily.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2015-03-11  Richard Biener  <rguenther@suse.de>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  PR tree-optimization/65310
>>>>>>>>>>>>>  * tree-sra.c (build_ref_for_offset): Also preserve larger
>>>>>>>>>>>>>  alignment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Index: gcc/tree-sra.c
>>>>>>>>>>>>>
>>>>>>>> ===================================================================
>>>>>>>>>>>>> --- gcc/tree-sra.c       (revision 221324)
>>>>>>>>>>>>> +++ gcc/tree-sra.c       (working copy)
>>>>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr
>>>>>>>>>>>>>    misalign = (misalign + offset) & (align - 1);
>>>>>>>>>>>>>    if (misalign != 0)
>>>>>>>>>>>>>      align = (misalign & -misalign);
>>>>>>>>>>>>> -  if (align < TYPE_ALIGN (exp_type))
>>>>>>>>>>>>> +  if (align != TYPE_ALIGN (exp_type))
>>>>>>>>>>>>>      exp_type = build_aligned_type (exp_type, align);
>>>>>>>>>>>>>     mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base,
>>>>>>>> off);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>
diff mbox

Patch

Index: gcc/tree-sra.c
===================================================================
--- gcc/tree-sra.c	(revision 221324)
+++ gcc/tree-sra.c	(working copy)
@@ -1597,7 +1597,7 @@  build_ref_for_offset (location_t loc, tr
   misalign = (misalign + offset) & (align - 1);
   if (misalign != 0)
     align = (misalign & -misalign);
-  if (align < TYPE_ALIGN (exp_type))
+  if (align != TYPE_ALIGN (exp_type))
     exp_type = build_aligned_type (exp_type, align);
 
   mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);