Message ID | alpine.LSU.2.11.1503111607530.10796@zhemvz.fhfr.qr |
---|---|
State | New |
Headers | show |
We've been seeing a bunch of new failures in the *libffi* testsuite on ARM Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this one-liner fix. I've reduced the testcase down to the attached (including removing any dependency on libffi); with gcc r221347, this prints the expected 7 8 9 whereas with gcc r221348, instead it prints 0 8 0 The action of r221348 is to change the alignment of a mem_ref, and a var_decl of b1, from 32 to 64; both have type type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0 BLK size <integer_cst 0x2b9b8d3720a8 constant 192> unit size <integer_cst 0x2b9b8d372078 constant 24> align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20 fields <field_decl 0x2b9b8d42b098 a type <integer_type 0x2b9b8d092690 int> SI file reduced.c line 12 col 7 size <integer_cst 0x2b9b8d08eeb8 constant 32> unit size <integer_cst 0x2b9b8d08eed0 constant 4> align 32 offset_align 64 offset <integer_cst 0x2b9b8d08eee8 constant 0> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl 0x2b9b8d42b000 D.6044>> The tree-optimized output is the same with both compilers (as this does not mention alignment); the expand output differs. Still investigating... --Alan Richard Biener wrote: > This fixes a vectorizer testcase regression on powerpc where SRA > drops alignment info unnecessarily. > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. > > Richard. > > 2015-03-11 Richard Biener <rguenther@suse.de> > > PR tree-optimization/65310 > * tree-sra.c (build_ref_for_offset): Also preserve larger > alignment. > > Index: gcc/tree-sra.c > =================================================================== > --- gcc/tree-sra.c (revision 221324) > +++ gcc/tree-sra.c (working copy) > @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr > misalign = (misalign + offset) & (align - 1); > if (misalign != 0) > align = (misalign & -misalign); > - if (align < TYPE_ALIGN (exp_type)) > + if (align != TYPE_ALIGN (exp_type)) > exp_type = build_aligned_type (exp_type, align); > > mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off); > >
...actually attach the testcase... Alan Lawrence wrote: > We've been seeing a bunch of new failures in the *libffi* testsuite on ARM Linux > (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this one-liner > fix. I've reduced the testcase down to the attached (including removing any > dependency on libffi); with gcc r221347, this prints the expected > 7 8 9 > whereas with gcc r221348, instead it prints > 0 8 0 > > The action of r221348 is to change the alignment of a mem_ref, and a var_decl of > b1, from 32 to 64; both have type > type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0 BLK > size <integer_cst 0x2b9b8d3720a8 constant 192> > unit size <integer_cst 0x2b9b8d372078 constant 24> > align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20 > fields <field_decl 0x2b9b8d42b098 a type <integer_type 0x2b9b8d092690 int> > SI file reduced.c line 12 col 7 > size <integer_cst 0x2b9b8d08eeb8 constant 32> > unit size <integer_cst 0x2b9b8d08eed0 constant 4> > align 32 offset_align 64 > offset <integer_cst 0x2b9b8d08eee8 constant 0> > bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context > <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl 0x2b9b8d42b130 > b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070> > pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl > 0x2b9b8d42b000 D.6044>> > > The tree-optimized output is the same with both compilers (as this does not > mention alignment); the expand output differs. > > Still investigating... > > --Alan > > > Richard Biener wrote: >> This fixes a vectorizer testcase regression on powerpc where SRA >> drops alignment info unnecessarily. >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. >> >> Richard. >> >> 2015-03-11 Richard Biener <rguenther@suse.de> >> >> PR tree-optimization/65310 >> * tree-sra.c (build_ref_for_offset): Also preserve larger >> alignment. >> >> Index: gcc/tree-sra.c >> =================================================================== >> --- gcc/tree-sra.c (revision 221324) >> +++ gcc/tree-sra.c (working copy) >> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr >> misalign = (misalign + offset) & (align - 1); >> if (misalign != 0) >> align = (misalign & -misalign); >> - if (align < TYPE_ALIGN (exp_type)) >> + if (align != TYPE_ALIGN (exp_type)) >> exp_type = build_aligned_type (exp_type, align); >> >> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off); >> >> > >
On Mon, 30 Mar 2015, Alan Lawrence wrote: > ...actually attach the testcase... What compile options? > Alan Lawrence wrote: > > We've been seeing a bunch of new failures in the *libffi* testsuite on ARM > > Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this > > one-liner fix. I've reduced the testcase down to the attached (including > > removing any dependency on libffi); with gcc r221347, this prints the > > expected > > 7 8 9 > > whereas with gcc r221348, instead it prints > > 0 8 0 > > > > The action of r221348 is to change the alignment of a mem_ref, and a > > var_decl of b1, from 32 to 64; both have type > > type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0 > > BLK > > size <integer_cst 0x2b9b8d3720a8 constant 192> > > unit size <integer_cst 0x2b9b8d372078 constant 24> > > align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20 > > fields <field_decl 0x2b9b8d42b098 a type <integer_type > > 0x2b9b8d092690 int> > > SI file reduced.c line 12 col 7 > > size <integer_cst 0x2b9b8d08eeb8 constant 32> > > unit size <integer_cst 0x2b9b8d08eed0 constant 4> > > align 32 offset_align 64 > > offset <integer_cst 0x2b9b8d08eee8 constant 0> > > bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context > > <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl > > 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070> > > pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl > > 0x2b9b8d42b000 D.6044>> > > > > The tree-optimized output is the same with both compilers (as this does not > > mention alignment); the expand output differs. > > > > Still investigating... > > > > --Alan > > > > > > Richard Biener wrote: > > > This fixes a vectorizer testcase regression on powerpc where SRA > > > drops alignment info unnecessarily. > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. > > > > > > Richard. > > > > > > 2015-03-11 Richard Biener <rguenther@suse.de> > > > > > > PR tree-optimization/65310 > > > * tree-sra.c (build_ref_for_offset): Also preserve larger > > > alignment. > > > > > > Index: gcc/tree-sra.c > > > =================================================================== > > > --- gcc/tree-sra.c (revision 221324) > > > +++ gcc/tree-sra.c (working copy) > > > @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr > > > misalign = (misalign + offset) & (align - 1); > > > if (misalign != 0) > > > align = (misalign & -misalign); > > > - if (align < TYPE_ALIGN (exp_type)) > > > + if (align != TYPE_ALIGN (exp_type)) > > > exp_type = build_aligned_type (exp_type, align); > > > mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off); > > > > > > > > > > > > > >
On Mon, 30 Mar 2015, Richard Biener wrote: > On Mon, 30 Mar 2015, Alan Lawrence wrote: > > > ...actually attach the testcase... > > What compile options? Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but I can't see anything not guaranteeing that: .section .rodata .align 3 .LANCHOR0 = . + 0 .LC1: .ascii "%d %g %d\012\000" .space 6 .LC0: .word 7 .space 4 .word 0 .word 1075838976 .word 9 .space 4 maybe there is some more generic code-gen bug for aligned aggregate copy? That is, the patch tells the backend that the loads and stores to the 'int' vars (which have padding followed) is aligned to 8 bytes. I don't see what is wrong in the final assembler, but maybe some endian issue exists? The code looks quite ugly though ;) Richard. > > > Alan Lawrence wrote: > > > We've been seeing a bunch of new failures in the *libffi* testsuite on ARM > > > Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this > > > one-liner fix. I've reduced the testcase down to the attached (including > > > removing any dependency on libffi); with gcc r221347, this prints the > > > expected > > > 7 8 9 > > > whereas with gcc r221348, instead it prints > > > 0 8 0 > > > > > > The action of r221348 is to change the alignment of a mem_ref, and a > > > var_decl of b1, from 32 to 64; both have type > > > type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0 > > > BLK > > > size <integer_cst 0x2b9b8d3720a8 constant 192> > > > unit size <integer_cst 0x2b9b8d372078 constant 24> > > > align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20 > > > fields <field_decl 0x2b9b8d42b098 a type <integer_type > > > 0x2b9b8d092690 int> > > > SI file reduced.c line 12 col 7 > > > size <integer_cst 0x2b9b8d08eeb8 constant 32> > > > unit size <integer_cst 0x2b9b8d08eed0 constant 4> > > > align 32 offset_align 64 > > > offset <integer_cst 0x2b9b8d08eee8 constant 0> > > > bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context > > > <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl > > > 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070> > > > pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl > > > 0x2b9b8d42b000 D.6044>> > > > > > > The tree-optimized output is the same with both compilers (as this does not > > > mention alignment); the expand output differs. > > > > > > Still investigating... > > > > > > --Alan > > > > > > > > > Richard Biener wrote: > > > > This fixes a vectorizer testcase regression on powerpc where SRA > > > > drops alignment info unnecessarily. > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. > > > > > > > > Richard. > > > > > > > > 2015-03-11 Richard Biener <rguenther@suse.de> > > > > > > > > PR tree-optimization/65310 > > > > * tree-sra.c (build_ref_for_offset): Also preserve larger > > > > alignment. > > > > > > > > Index: gcc/tree-sra.c > > > > =================================================================== > > > > --- gcc/tree-sra.c (revision 221324) > > > > +++ gcc/tree-sra.c (working copy) > > > > @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr > > > > misalign = (misalign + offset) & (align - 1); > > > > if (misalign != 0) > > > > align = (misalign & -misalign); > > > > - if (align < TYPE_ALIGN (exp_type)) > > > > + if (align != TYPE_ALIGN (exp_type)) > > > > exp_type = build_aligned_type (exp_type, align); > > > > mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off); > > > > > > > > > > > > > > > > > > > > > > > >
-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes it. The problem appears to be in laying out arguments, specifically varargs. From the "good" -fdump-rtl-expand: (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 S4 A32]) (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 (nil)) (insn 19 18 20 2 (set (reg:DF 2 r2) (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 (nil)) (insn 20 19 21 2 (set (reg:SI 1 r1) (reg:SI 113 [ b1 ])) reduced.c:14 -1 (nil)) (insn 21 20 22 2 (set (reg:SI 0 r0) (reg:SI 118)) reduced.c:14 -1 (nil)) (call_insn 22 21 23 2 (parallel [ (set (reg:SI 0 r0) (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 A32]) The struct members are reg:SI 113 => int a; reg:DF 112 => double b; reg:SI 111 => int c; r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is pushed into virtual-outgoing-args. In contrast, post-change to build_ref_of_offset, we get: (insn 17 16 18 2 (set (reg:SI 118) (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 *.LC1>)) reduced.c:14 -1 (nil)) (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 virtual-outgoing-args) (const_int 8 [0x8])) [0 S4 A64]) (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 (nil)) (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 S8 A64]) (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 (nil)) (insn 20 19 21 2 (set (reg:SI 2 r2) (reg:SI 113 [ b1 ])) reduced.c:14 -1 (nil)) (insn 21 20 22 2 (set (reg:SI 0 r0) (reg:SI 118)) reduced.c:14 -1 (nil)) (call_insn 22 21 23 2 (parallel [ (set (reg:SI 0 r0) (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 A32]) r0 still gets the format string, but 'int b1.a' now goes in r2, and the double+following int are all pushed into virtual-outgoing-args. This is because arm_function_arg is fed a 64-bit-aligned int as type of the second argument (the type constructed by build_ref_for_offset); it then executes (aapcs_layout_arg, arm.c line ~~5914) /* C3 - For double-word aligned arguments, round the NCRN up to the next even number. */ ncrn = pcum->aapcs_ncrn; if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) ncrn++; Which changes r1 to r2. Passing -fno-tree-sra, or removing from the testcase "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a 32-bit-aligned int instead, which works as previously. Passing the same members of that struct in a non-vargs call, works ok - I think because these use the type of the declared parameters, rather than the provided arguments, and the former do not have the increased alignment from build_ref_for_offset. FWIW, I also tried: __attribute__((__aligned__((16)))) int x; int main (void) { __builtin_printf("%d\n", x); } but in that case, the arm_function_arg is still fed a type with alignment 32 (bits), i.e. distinct from the type of the field 'x' in memory, which has alignment 128. --Alan Richard Biener wrote: > On Mon, 30 Mar 2015, Richard Biener wrote: > >> On Mon, 30 Mar 2015, Alan Lawrence wrote: >> >>> ...actually attach the testcase... >> What compile options? > > Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but > I can't see anything not guaranteeing that: > > .section .rodata > .align 3 > .LANCHOR0 = . + 0 > .LC1: > .ascii "%d %g %d\012\000" > .space 6 > .LC0: > .word 7 > .space 4 > .word 0 > .word 1075838976 > .word 9 > .space 4 > > maybe there is some more generic code-gen bug for aligned aggregate > copy? That is, the patch tells the backend that the loads and > stores to the 'int' vars (which have padding followed) is aligned > to 8 bytes. > > I don't see what is wrong in the final assembler, but maybe > some endian issue exists? The code looks quite ugly though ;) > > Richard. > >>> Alan Lawrence wrote: >>>> We've been seeing a bunch of new failures in the *libffi* testsuite on ARM >>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), following this >>>> one-liner fix. I've reduced the testcase down to the attached (including >>>> removing any dependency on libffi); with gcc r221347, this prints the >>>> expected >>>> 7 8 9 >>>> whereas with gcc r221348, instead it prints >>>> 0 8 0 >>>> >>>> The action of r221348 is to change the alignment of a mem_ref, and a >>>> var_decl of b1, from 32 to 64; both have type >>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte sizes-gimplified type_0 >>>> BLK >>>> size <integer_cst 0x2b9b8d3720a8 constant 192> >>>> unit size <integer_cst 0x2b9b8d372078 constant 24> >>>> align 64 symtab 0 alias set 1 canonical type 0x2b9b8d428d20 >>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type >>>> 0x2b9b8d092690 int> >>>> SI file reduced.c line 12 col 7 >>>> size <integer_cst 0x2b9b8d08eeb8 constant 32> >>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4> >>>> align 32 offset_align 64 >>>> offset <integer_cst 0x2b9b8d08eee8 constant 0> >>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> context >>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl >>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 D.6070> >>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain <type_decl >>>> 0x2b9b8d42b000 D.6044>> >>>> >>>> The tree-optimized output is the same with both compilers (as this does not >>>> mention alignment); the expand output differs. >>>> >>>> Still investigating... >>>> >>>> --Alan >>>> >>>> >>>> Richard Biener wrote: >>>>> This fixes a vectorizer testcase regression on powerpc where SRA >>>>> drops alignment info unnecessarily. >>>>> >>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. >>>>> >>>>> Richard. >>>>> >>>>> 2015-03-11 Richard Biener <rguenther@suse.de> >>>>> >>>>> PR tree-optimization/65310 >>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger >>>>> alignment. >>>>> >>>>> Index: gcc/tree-sra.c >>>>> =================================================================== >>>>> --- gcc/tree-sra.c (revision 221324) >>>>> +++ gcc/tree-sra.c (working copy) >>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr >>>>> misalign = (misalign + offset) & (align - 1); >>>>> if (misalign != 0) >>>>> align = (misalign & -misalign); >>>>> - if (align < TYPE_ALIGN (exp_type)) >>>>> + if (align != TYPE_ALIGN (exp_type)) >>>>> exp_type = build_aligned_type (exp_type, align); >>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off); >>>>> >>>>> >>>> >>> >>> >>> >> >
On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote: >-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes >it. > >The problem appears to be in laying out arguments, specifically >varargs. From >the "good" -fdump-rtl-expand: > >(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 >S4 A32]) > (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > (nil)) >(insn 19 18 20 2 (set (reg:DF 2 r2) > (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > (nil)) >(insn 20 19 21 2 (set (reg:SI 1 r1) > (reg:SI 113 [ b1 ])) reduced.c:14 -1 > (nil)) >(insn 21 20 22 2 (set (reg:SI 0 r0) > (reg:SI 118)) reduced.c:14 -1 > (nil)) >(call_insn 22 21 23 2 (parallel [ > (set (reg:SI 0 r0) > (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] ><function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 >A32]) > >The struct members are >reg:SI 113 => int a; >reg:DF 112 => double b; >reg:SI 111 => int c; > >r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is >pushed >into virtual-outgoing-args. In contrast, post-change to >build_ref_of_offset, we get: > >(insn 17 16 18 2 (set (reg:SI 118) > (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 >*.LC1>)) reduced.c:14 -1 > (nil)) >(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 >virtual-outgoing-args) > (const_int 8 [0x8])) [0 S4 A64]) > (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > (nil)) >(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 >S8 A64]) > (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > (nil)) >(insn 20 19 21 2 (set (reg:SI 2 r2) > (reg:SI 113 [ b1 ])) reduced.c:14 -1 > (nil)) >(insn 21 20 22 2 (set (reg:SI 0 r0) > (reg:SI 118)) reduced.c:14 -1 > (nil)) >(call_insn 22 21 23 2 (parallel [ > (set (reg:SI 0 r0) > (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] ><function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 >A32]) > >r0 still gets the format string, but 'int b1.a' now goes in r2, and the > >double+following int are all pushed into virtual-outgoing-args. This is >because >arm_function_arg is fed a 64-bit-aligned int as type of the second >argument (the >type constructed by build_ref_for_offset); it then executes >(aapcs_layout_arg, >arm.c line ~~5914) > > /* C3 - For double-word aligned arguments, round the NCRN up to the > next even number. */ > ncrn = pcum->aapcs_ncrn; > if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) > ncrn++; > >Which changes r1 to r2. Passing -fno-tree-sra, or removing from the >testcase >"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a >32-bit-aligned int instead, which works as previously. > >Passing the same members of that struct in a non-vargs call, works ok - >I think >because these use the type of the declared parameters, rather than the >provided >arguments, and the former do not have the increased alignment from >build_ref_for_offset. It doesn't make sense to use the alignment of passed values. That looks like bs. This means that Int I __aligned__(8); Is passed differently than int. Arm_function_arg needs to be fixed. Richard. > >FWIW, I also tried: > >__attribute__((__aligned__((16)))) int x; >int main (void) >{ > __builtin_printf("%d\n", x); >} > >but in that case, the arm_function_arg is still fed a type with >alignment 32 >(bits), i.e. distinct from the type of the field 'x' in memory, which >has >alignment 128. > >--Alan > >Richard Biener wrote: >> On Mon, 30 Mar 2015, Richard Biener wrote: >> >>> On Mon, 30 Mar 2015, Alan Lawrence wrote: >>> >>>> ...actually attach the testcase... >>> What compile options? >> >> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but >> I can't see anything not guaranteeing that: >> >> .section .rodata >> .align 3 >> .LANCHOR0 = . + 0 >> .LC1: >> .ascii "%d %g %d\012\000" >> .space 6 >> .LC0: >> .word 7 >> .space 4 >> .word 0 >> .word 1075838976 >> .word 9 >> .space 4 >> >> maybe there is some more generic code-gen bug for aligned aggregate >> copy? That is, the patch tells the backend that the loads and >> stores to the 'int' vars (which have padding followed) is aligned >> to 8 bytes. >> >> I don't see what is wrong in the final assembler, but maybe >> some endian issue exists? The code looks quite ugly though ;) >> >> Richard. >> >>>> Alan Lawrence wrote: >>>>> We've been seeing a bunch of new failures in the *libffi* >testsuite on ARM >>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), >following this >>>>> one-liner fix. I've reduced the testcase down to the attached >(including >>>>> removing any dependency on libffi); with gcc r221347, this prints >the >>>>> expected >>>>> 7 8 9 >>>>> whereas with gcc r221348, instead it prints >>>>> 0 8 0 >>>>> >>>>> The action of r221348 is to change the alignment of a mem_ref, and >a >>>>> var_decl of b1, from 32 to 64; both have type >>>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte >sizes-gimplified type_0 >>>>> BLK >>>>> size <integer_cst 0x2b9b8d3720a8 constant 192> >>>>> unit size <integer_cst 0x2b9b8d372078 constant 24> >>>>> align 64 symtab 0 alias set 1 canonical type >0x2b9b8d428d20 >>>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type >>>>> 0x2b9b8d092690 int> >>>>> SI file reduced.c line 12 col 7 >>>>> size <integer_cst 0x2b9b8d08eeb8 constant 32> >>>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4> >>>>> align 32 offset_align 64 >>>>> offset <integer_cst 0x2b9b8d08eee8 constant 0> >>>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> >context >>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl >>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 >D.6070> >>>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain ><type_decl >>>>> 0x2b9b8d42b000 D.6044>> >>>>> >>>>> The tree-optimized output is the same with both compilers (as this >does not >>>>> mention alignment); the expand output differs. >>>>> >>>>> Still investigating... >>>>> >>>>> --Alan >>>>> >>>>> >>>>> Richard Biener wrote: >>>>>> This fixes a vectorizer testcase regression on powerpc where SRA >>>>>> drops alignment info unnecessarily. >>>>>> >>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. >>>>>> >>>>>> Richard. >>>>>> >>>>>> 2015-03-11 Richard Biener <rguenther@suse.de> >>>>>> >>>>>> PR tree-optimization/65310 >>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger >>>>>> alignment. >>>>>> >>>>>> Index: gcc/tree-sra.c >>>>>> >=================================================================== >>>>>> --- gcc/tree-sra.c (revision 221324) >>>>>> +++ gcc/tree-sra.c (working copy) >>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr >>>>>> misalign = (misalign + offset) & (align - 1); >>>>>> if (misalign != 0) >>>>>> align = (misalign & -misalign); >>>>>> - if (align < TYPE_ALIGN (exp_type)) >>>>>> + if (align != TYPE_ALIGN (exp_type)) >>>>>> exp_type = build_aligned_type (exp_type, align); >>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, >off); >>>>>> >>>>>> >>>>> >>>> >>>> >>>> >>> >>
On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote: > On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote: >>-O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes >>it. >> >>The problem appears to be in laying out arguments, specifically >>varargs. From >>the "good" -fdump-rtl-expand: >> >>(insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 >>S4 A32]) >> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >> (nil)) >>(insn 19 18 20 2 (set (reg:DF 2 r2) >> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >> (nil)) >>(insn 20 19 21 2 (set (reg:SI 1 r1) >> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >> (nil)) >>(insn 21 20 22 2 (set (reg:SI 0 r0) >> (reg:SI 118)) reduced.c:14 -1 >> (nil)) >>(call_insn 22 21 23 2 (parallel [ >> (set (reg:SI 0 r0) >> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >><function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 >>A32]) >> >>The struct members are >>reg:SI 113 => int a; >>reg:DF 112 => double b; >>reg:SI 111 => int c; >> >>r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is >>pushed >>into virtual-outgoing-args. In contrast, post-change to >>build_ref_of_offset, we get: >> >>(insn 17 16 18 2 (set (reg:SI 118) >> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 >>*.LC1>)) reduced.c:14 -1 >> (nil)) >>(insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 >>virtual-outgoing-args) >> (const_int 8 [0x8])) [0 S4 A64]) >> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >> (nil)) >>(insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 >>S8 A64]) >> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >> (nil)) >>(insn 20 19 21 2 (set (reg:SI 2 r2) >> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >> (nil)) >>(insn 21 20 22 2 (set (reg:SI 0 r0) >> (reg:SI 118)) reduced.c:14 -1 >> (nil)) >>(call_insn 22 21 23 2 (parallel [ >> (set (reg:SI 0 r0) >> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >><function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 >>A32]) >> >>r0 still gets the format string, but 'int b1.a' now goes in r2, and the >> >>double+following int are all pushed into virtual-outgoing-args. This is >>because >>arm_function_arg is fed a 64-bit-aligned int as type of the second >>argument (the >>type constructed by build_ref_for_offset); it then executes >>(aapcs_layout_arg, >>arm.c line ~~5914) >> >> /* C3 - For double-word aligned arguments, round the NCRN up to the >> next even number. */ >> ncrn = pcum->aapcs_ncrn; >> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) >> ncrn++; >> >>Which changes r1 to r2. Passing -fno-tree-sra, or removing from the >>testcase >>"*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a >>32-bit-aligned int instead, which works as previously. >> >>Passing the same members of that struct in a non-vargs call, works ok - >>I think >>because these use the type of the declared parameters, rather than the >>provided >>arguments, and the former do not have the increased alignment from >>build_ref_for_offset. > > It doesn't make sense to use the alignment of passed values. That looks like bs. > > This means that > > Int I __aligned__(8); > > Is passed differently than int. > > Arm_function_arg needs to be fixed. That is, typedef int myint __attribute__((aligned(8))); int main() { myint i = 1; int j = 2; __builtin_printf("%d %d\n", i, j); } or myint i; int j; myint *p = &i; int *q = &j; int main() { __builtin_printf("%d %d", *p, *q); } should behave the same. There isn't a printf modifier for an "aligned int" because that sort of thing doesn't make sense. Special-casing aligned vs. non-aligned values only makes sense for things passed by value on the stack. And then obviously only dependent on the functuion type signature, not on the type of the passed value. Richard. > Richard. > >> >>FWIW, I also tried: >> >>__attribute__((__aligned__((16)))) int x; >>int main (void) >>{ >> __builtin_printf("%d\n", x); >>} >> >>but in that case, the arm_function_arg is still fed a type with >>alignment 32 >>(bits), i.e. distinct from the type of the field 'x' in memory, which >>has >>alignment 128. >> >>--Alan >> >>Richard Biener wrote: >>> On Mon, 30 Mar 2015, Richard Biener wrote: >>> >>>> On Mon, 30 Mar 2015, Alan Lawrence wrote: >>>> >>>>> ...actually attach the testcase... >>>> What compile options? >>> >>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but >>> I can't see anything not guaranteeing that: >>> >>> .section .rodata >>> .align 3 >>> .LANCHOR0 = . + 0 >>> .LC1: >>> .ascii "%d %g %d\012\000" >>> .space 6 >>> .LC0: >>> .word 7 >>> .space 4 >>> .word 0 >>> .word 1075838976 >>> .word 9 >>> .space 4 >>> >>> maybe there is some more generic code-gen bug for aligned aggregate >>> copy? That is, the patch tells the backend that the loads and >>> stores to the 'int' vars (which have padding followed) is aligned >>> to 8 bytes. >>> >>> I don't see what is wrong in the final assembler, but maybe >>> some endian issue exists? The code looks quite ugly though ;) >>> >>> Richard. >>> >>>>> Alan Lawrence wrote: >>>>>> We've been seeing a bunch of new failures in the *libffi* >>testsuite on ARM >>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), >>following this >>>>>> one-liner fix. I've reduced the testcase down to the attached >>(including >>>>>> removing any dependency on libffi); with gcc r221347, this prints >>the >>>>>> expected >>>>>> 7 8 9 >>>>>> whereas with gcc r221348, instead it prints >>>>>> 0 8 0 >>>>>> >>>>>> The action of r221348 is to change the alignment of a mem_ref, and >>a >>>>>> var_decl of b1, from 32 to 64; both have type >>>>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte >>sizes-gimplified type_0 >>>>>> BLK >>>>>> size <integer_cst 0x2b9b8d3720a8 constant 192> >>>>>> unit size <integer_cst 0x2b9b8d372078 constant 24> >>>>>> align 64 symtab 0 alias set 1 canonical type >>0x2b9b8d428d20 >>>>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type >>>>>> 0x2b9b8d092690 int> >>>>>> SI file reduced.c line 12 col 7 >>>>>> size <integer_cst 0x2b9b8d08eeb8 constant 32> >>>>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4> >>>>>> align 32 offset_align 64 >>>>>> offset <integer_cst 0x2b9b8d08eee8 constant 0> >>>>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> >>context >>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl >>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 >>D.6070> >>>>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain >><type_decl >>>>>> 0x2b9b8d42b000 D.6044>> >>>>>> >>>>>> The tree-optimized output is the same with both compilers (as this >>does not >>>>>> mention alignment); the expand output differs. >>>>>> >>>>>> Still investigating... >>>>>> >>>>>> --Alan >>>>>> >>>>>> >>>>>> Richard Biener wrote: >>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA >>>>>>> drops alignment info unnecessarily. >>>>>>> >>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>> 2015-03-11 Richard Biener <rguenther@suse.de> >>>>>>> >>>>>>> PR tree-optimization/65310 >>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger >>>>>>> alignment. >>>>>>> >>>>>>> Index: gcc/tree-sra.c >>>>>>> >>=================================================================== >>>>>>> --- gcc/tree-sra.c (revision 221324) >>>>>>> +++ gcc/tree-sra.c (working copy) >>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr >>>>>>> misalign = (misalign + offset) & (align - 1); >>>>>>> if (misalign != 0) >>>>>>> align = (misalign & -misalign); >>>>>>> - if (align < TYPE_ALIGN (exp_type)) >>>>>>> + if (align != TYPE_ALIGN (exp_type)) >>>>>>> exp_type = build_aligned_type (exp_type, align); >>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, >>off); >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>> > >
On 31/03/15 08:50, Richard Biener wrote: > On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote: >> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote: >>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes >>> it. >>> >>> The problem appears to be in laying out arguments, specifically >>> varargs. From >>> the "good" -fdump-rtl-expand: >>> >>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 >>> S4 A32]) >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >>> (nil)) >>> (insn 19 18 20 2 (set (reg:DF 2 r2) >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >>> (nil)) >>> (insn 20 19 21 2 (set (reg:SI 1 r1) >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >>> (nil)) >>> (insn 21 20 22 2 (set (reg:SI 0 r0) >>> (reg:SI 118)) reduced.c:14 -1 >>> (nil)) >>> (call_insn 22 21 23 2 (parallel [ >>> (set (reg:SI 0 r0) >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 >>> A32]) >>> >>> The struct members are >>> reg:SI 113 => int a; >>> reg:DF 112 => double b; >>> reg:SI 111 => int c; >>> >>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is >>> pushed >>> into virtual-outgoing-args. In contrast, post-change to >>> build_ref_of_offset, we get: >>> >>> (insn 17 16 18 2 (set (reg:SI 118) >>> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 >>> *.LC1>)) reduced.c:14 -1 >>> (nil)) >>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 >>> virtual-outgoing-args) >>> (const_int 8 [0x8])) [0 S4 A64]) >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >>> (nil)) >>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 >>> S8 A64]) >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >>> (nil)) >>> (insn 20 19 21 2 (set (reg:SI 2 r2) >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >>> (nil)) >>> (insn 21 20 22 2 (set (reg:SI 0 r0) >>> (reg:SI 118)) reduced.c:14 -1 >>> (nil)) >>> (call_insn 22 21 23 2 (parallel [ >>> (set (reg:SI 0 r0) >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 >>> A32]) >>> >>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the >>> >>> double+following int are all pushed into virtual-outgoing-args. This is >>> because >>> arm_function_arg is fed a 64-bit-aligned int as type of the second >>> argument (the >>> type constructed by build_ref_for_offset); it then executes >>> (aapcs_layout_arg, >>> arm.c line ~~5914) >>> >>> /* C3 - For double-word aligned arguments, round the NCRN up to the >>> next even number. */ >>> ncrn = pcum->aapcs_ncrn; >>> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) >>> ncrn++; >>> >>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the >>> testcase >>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a >>> 32-bit-aligned int instead, which works as previously. >>> >>> Passing the same members of that struct in a non-vargs call, works ok - >>> I think >>> because these use the type of the declared parameters, rather than the >>> provided >>> arguments, and the former do not have the increased alignment from >>> build_ref_for_offset. >> >> It doesn't make sense to use the alignment of passed values. That looks like bs. >> >> This means that >> >> Int I __aligned__(8); >> >> Is passed differently than int. >> >> Arm_function_arg needs to be fixed. > > That is, > > typedef int myint __attribute__((aligned(8))); > > int main() > { > myint i = 1; > int j = 2; > __builtin_printf("%d %d\n", i, j); > } > > or > > myint i; > int j; > myint *p = &i; > int *q = &j; > > int main() > { > __builtin_printf("%d %d", *p, *q); > } > > should behave the same. There isn't a printf modifier for an "aligned int" > because that sort of thing doesn't make sense. Special-casing aligned vs. > non-aligned values only makes sense for things passed by value on the stack. > And then obviously only dependent on the functuion type signature, not > on the type of the passed value. > I think the testcase is ill-formed. Just because printf doesn't have such a modifier, doesn't mean that another variadic function might not have a means to detect when an object in the variadic list needs to be over-aligned. As such, the test should really be written as: typedef int myint __attribute__((aligned(8))); int main() { myint i = 1; int j = 2; __builtin_printf("%d %d\n", (int) i, j); } Variadic functions take the types of their arguments from the types of the actuals passed. The compiler should either be applying appropriate promotion rules to make the types conformant by the language specification or respecting the types exactly. However, that should be done in the mid-end not the back-end. If incorrect alignment information is passed to the back-end it can't help but make the wrong choice. Examining the mode here wouldn't help either. Consider: int foo (int a, int b, int c, int d, int e, int f __attribute__((aligned(8)))); On ARM that should pass f in a 64-bit aligned location (since the parameter will be on the stack). R. > Richard. > >> Richard. >> >>> >>> FWIW, I also tried: >>> >>> __attribute__((__aligned__((16)))) int x; >>> int main (void) >>> { >>> __builtin_printf("%d\n", x); >>> } >>> >>> but in that case, the arm_function_arg is still fed a type with >>> alignment 32 >>> (bits), i.e. distinct from the type of the field 'x' in memory, which >>> has >>> alignment 128. >>> >>> --Alan >>> >>> Richard Biener wrote: >>>> On Mon, 30 Mar 2015, Richard Biener wrote: >>>> >>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote: >>>>> >>>>>> ...actually attach the testcase... >>>>> What compile options? >>>> >>>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but >>>> I can't see anything not guaranteeing that: >>>> >>>> .section .rodata >>>> .align 3 >>>> .LANCHOR0 = . + 0 >>>> .LC1: >>>> .ascii "%d %g %d\012\000" >>>> .space 6 >>>> .LC0: >>>> .word 7 >>>> .space 4 >>>> .word 0 >>>> .word 1075838976 >>>> .word 9 >>>> .space 4 >>>> >>>> maybe there is some more generic code-gen bug for aligned aggregate >>>> copy? That is, the patch tells the backend that the loads and >>>> stores to the 'int' vars (which have padding followed) is aligned >>>> to 8 bytes. >>>> >>>> I don't see what is wrong in the final assembler, but maybe >>>> some endian issue exists? The code looks quite ugly though ;) >>>> >>>> Richard. >>>> >>>>>> Alan Lawrence wrote: >>>>>>> We've been seeing a bunch of new failures in the *libffi* >>> testsuite on ARM >>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), >>> following this >>>>>>> one-liner fix. I've reduced the testcase down to the attached >>> (including >>>>>>> removing any dependency on libffi); with gcc r221347, this prints >>> the >>>>>>> expected >>>>>>> 7 8 9 >>>>>>> whereas with gcc r221348, instead it prints >>>>>>> 0 8 0 >>>>>>> >>>>>>> The action of r221348 is to change the alignment of a mem_ref, and >>> a >>>>>>> var_decl of b1, from 32 to 64; both have type >>>>>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte >>> sizes-gimplified type_0 >>>>>>> BLK >>>>>>> size <integer_cst 0x2b9b8d3720a8 constant 192> >>>>>>> unit size <integer_cst 0x2b9b8d372078 constant 24> >>>>>>> align 64 symtab 0 alias set 1 canonical type >>> 0x2b9b8d428d20 >>>>>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type >>>>>>> 0x2b9b8d092690 int> >>>>>>> SI file reduced.c line 12 col 7 >>>>>>> size <integer_cst 0x2b9b8d08eeb8 constant 32> >>>>>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4> >>>>>>> align 32 offset_align 64 >>>>>>> offset <integer_cst 0x2b9b8d08eee8 constant 0> >>>>>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> >>> context >>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl >>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 >>> D.6070> >>>>>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain >>> <type_decl >>>>>>> 0x2b9b8d42b000 D.6044>> >>>>>>> >>>>>>> The tree-optimized output is the same with both compilers (as this >>> does not >>>>>>> mention alignment); the expand output differs. >>>>>>> >>>>>>> Still investigating... >>>>>>> >>>>>>> --Alan >>>>>>> >>>>>>> >>>>>>> Richard Biener wrote: >>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA >>>>>>>> drops alignment info unnecessarily. >>>>>>>> >>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>> 2015-03-11 Richard Biener <rguenther@suse.de> >>>>>>>> >>>>>>>> PR tree-optimization/65310 >>>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger >>>>>>>> alignment. >>>>>>>> >>>>>>>> Index: gcc/tree-sra.c >>>>>>>> >>> =================================================================== >>>>>>>> --- gcc/tree-sra.c (revision 221324) >>>>>>>> +++ gcc/tree-sra.c (working copy) >>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr >>>>>>>> misalign = (misalign + offset) & (align - 1); >>>>>>>> if (misalign != 0) >>>>>>>> align = (misalign & -misalign); >>>>>>>> - if (align < TYPE_ALIGN (exp_type)) >>>>>>>> + if (align != TYPE_ALIGN (exp_type)) >>>>>>>> exp_type = build_aligned_type (exp_type, align); >>>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, >>> off); >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>> >> >> >
Richard Biener wrote: > On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote: >> It doesn't make sense to use the alignment of passed values. That looks like bs. >> >> This means that >> >> Int I __aligned__(8); >> >> Is passed differently than int. >> >> Arm_function_arg needs to be fixed. > > That is, > > typedef int myint __attribute__((aligned(8))); > > int main() > { > myint i = 1; > int j = 2; > __builtin_printf("%d %d\n", i, j); > } > > or > > myint i; > int j; > myint *p = &i; > int *q = &j; > > int main() > { > __builtin_printf("%d %d", *p, *q); > } > > should behave the same. There isn't a printf modifier for an "aligned int" > because that sort of thing doesn't make sense. Agreed. All of the cases you post do indeed behave the same, and correctly (they pass the format string in r0, the next argument in r1, and then r2). This is what my "aligned(16)" example was trying to achieve also. From the -fdump-rtl-expand of your last example: (insn 7 6 8 2 (set (reg:SI 117) (mem:SI (reg/f:SI 116) [2 *_4+0 S4 A32])) richie2.c:10 -1 (nil)) (insn 8 7 9 2 (set (reg/f:SI 118) (symbol_ref:SI ("*.LANCHOR0") [flags 0x182])) richie2.c:10 -1 (nil)) (insn 9 8 10 2 (set (reg/f:SI 119) (mem/f/c:SI (plus:SI (reg/f:SI 118) (const_int 4 [0x4])) [1 p+0 S4 A32])) richie2.c:10 -1 (nil)) (insn 10 9 11 2 (set (reg:SI 120) (mem:SI (reg/f:SI 119) [2 *_2+0 S4 A64])) richie2.c:10 -1 *** (nil)) (insn 11 10 12 2 (set (reg:SI 121) (symbol_ref/v/f:SI ("*.LC0") [flags 0x82] <var_decl 0x2b2c3b35d240 *.LC0>)) richie2.c:10 -1 (nil)) (insn 12 11 13 2 (set (reg:SI 2 r2) (reg:SI 117)) richie2.c:10 -1 (nil)) (insn 13 12 14 2 (set (reg:SI 1 r1) (reg:SI 120)) richie2.c:10 -1 (nil)) (insn 14 13 15 2 (set (reg:SI 0 r0) (reg:SI 121)) richie2.c:10 -1 (nil)) *** is the load of *p. The mem has alignment 64, but the type describing the loaded value *p, passed to arm_function_arg, has alignment 32. Even in the first of your examples, or even with an explicit cast to the aligned type: __builtin_printf("%d %d\n", (myint) i, j); we still get alignment 32 in arm_function_arg. It's only if SRA is applied, do we get the situation where the int with 64-bit alignment, is passed to arm_function_arg. Disclaimer, I haven't tracked down how the alignment information flows through the compiler i.e. from build_ref_for_offset into expand_call. But it looks to me like something different is happening in the SRA case...no? --Alan
On Tue, 31 Mar 2015, Richard Earnshaw wrote: > On 31/03/15 08:50, Richard Biener wrote: > > On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote: > >> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote: > >>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes > >>> it. > >>> > >>> The problem appears to be in laying out arguments, specifically > >>> varargs. From > >>> the "good" -fdump-rtl-expand: > >>> > >>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 > >>> S4 A32]) > >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > >>> (nil)) > >>> (insn 19 18 20 2 (set (reg:DF 2 r2) > >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > >>> (nil)) > >>> (insn 20 19 21 2 (set (reg:SI 1 r1) > >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 > >>> (nil)) > >>> (insn 21 20 22 2 (set (reg:SI 0 r0) > >>> (reg:SI 118)) reduced.c:14 -1 > >>> (nil)) > >>> (call_insn 22 21 23 2 (parallel [ > >>> (set (reg:SI 0 r0) > >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] > >>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 > >>> A32]) > >>> > >>> The struct members are > >>> reg:SI 113 => int a; > >>> reg:DF 112 => double b; > >>> reg:SI 111 => int c; > >>> > >>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is > >>> pushed > >>> into virtual-outgoing-args. In contrast, post-change to > >>> build_ref_of_offset, we get: > >>> > >>> (insn 17 16 18 2 (set (reg:SI 118) > >>> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 > >>> *.LC1>)) reduced.c:14 -1 > >>> (nil)) > >>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 > >>> virtual-outgoing-args) > >>> (const_int 8 [0x8])) [0 S4 A64]) > >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > >>> (nil)) > >>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 > >>> S8 A64]) > >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > >>> (nil)) > >>> (insn 20 19 21 2 (set (reg:SI 2 r2) > >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 > >>> (nil)) > >>> (insn 21 20 22 2 (set (reg:SI 0 r0) > >>> (reg:SI 118)) reduced.c:14 -1 > >>> (nil)) > >>> (call_insn 22 21 23 2 (parallel [ > >>> (set (reg:SI 0 r0) > >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] > >>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 > >>> A32]) > >>> > >>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the > >>> > >>> double+following int are all pushed into virtual-outgoing-args. This is > >>> because > >>> arm_function_arg is fed a 64-bit-aligned int as type of the second > >>> argument (the > >>> type constructed by build_ref_for_offset); it then executes > >>> (aapcs_layout_arg, > >>> arm.c line ~~5914) > >>> > >>> /* C3 - For double-word aligned arguments, round the NCRN up to the > >>> next even number. */ > >>> ncrn = pcum->aapcs_ncrn; > >>> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) > >>> ncrn++; > >>> > >>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the > >>> testcase > >>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a > >>> 32-bit-aligned int instead, which works as previously. > >>> > >>> Passing the same members of that struct in a non-vargs call, works ok - > >>> I think > >>> because these use the type of the declared parameters, rather than the > >>> provided > >>> arguments, and the former do not have the increased alignment from > >>> build_ref_for_offset. > >> > >> It doesn't make sense to use the alignment of passed values. That looks like bs. > >> > >> This means that > >> > >> Int I __aligned__(8); > >> > >> Is passed differently than int. > >> > >> Arm_function_arg needs to be fixed. > > > > That is, > > > > typedef int myint __attribute__((aligned(8))); > > > > int main() > > { > > myint i = 1; > > int j = 2; > > __builtin_printf("%d %d\n", i, j); > > } > > > > or > > > > myint i; > > int j; > > myint *p = &i; > > int *q = &j; > > > > int main() > > { > > __builtin_printf("%d %d", *p, *q); > > } > > > > should behave the same. There isn't a printf modifier for an "aligned int" > > because that sort of thing doesn't make sense. Special-casing aligned vs. > > non-aligned values only makes sense for things passed by value on the stack. > > And then obviously only dependent on the functuion type signature, not > > on the type of the passed value. > > > > I think the testcase is ill-formed. Just because printf doesn't have > such a modifier, doesn't mean that another variadic function might not > have a means to detect when an object in the variadic list needs to be > over-aligned. As such, the test should really be written as: A value doesn't have "alignment". A function may have alignment requirements on its arguments, clearly printf doesn't. > typedef int myint __attribute__((aligned(8))); > > int main() > { > myint i = 1; > int j = 2; > __builtin_printf("%d %d\n", (int) i, j); > } > > Variadic functions take the types of their arguments from the types of > the actuals passed. The compiler should either be applying appropriate > promotion rules to make the types conformant by the language > specification or respecting the types exactly. However, that should be > done in the mid-end not the back-end. If incorrect alignment > information is passed to the back-end it can't help but make the wrong > choice. Examining the mode here wouldn't help either. Consider: > > int foo (int a, int b, int c, int d, int e, int f > __attribute__((aligned(8)))); > > On ARM that should pass f in a 64-bit aligned location (since the > parameter will be on the stack). > > R. > > > > Richard. > > > >> Richard. > >> > >>> > >>> FWIW, I also tried: > >>> > >>> __attribute__((__aligned__((16)))) int x; > >>> int main (void) > >>> { > >>> __builtin_printf("%d\n", x); > >>> } > >>> > >>> but in that case, the arm_function_arg is still fed a type with > >>> alignment 32 > >>> (bits), i.e. distinct from the type of the field 'x' in memory, which > >>> has > >>> alignment 128. > >>> > >>> --Alan > >>> > >>> Richard Biener wrote: > >>>> On Mon, 30 Mar 2015, Richard Biener wrote: > >>>> > >>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote: > >>>>> > >>>>>> ...actually attach the testcase... > >>>>> What compile options? > >>>> > >>>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but > >>>> I can't see anything not guaranteeing that: > >>>> > >>>> .section .rodata > >>>> .align 3 > >>>> .LANCHOR0 = . + 0 > >>>> .LC1: > >>>> .ascii "%d %g %d\012\000" > >>>> .space 6 > >>>> .LC0: > >>>> .word 7 > >>>> .space 4 > >>>> .word 0 > >>>> .word 1075838976 > >>>> .word 9 > >>>> .space 4 > >>>> > >>>> maybe there is some more generic code-gen bug for aligned aggregate > >>>> copy? That is, the patch tells the backend that the loads and > >>>> stores to the 'int' vars (which have padding followed) is aligned > >>>> to 8 bytes. > >>>> > >>>> I don't see what is wrong in the final assembler, but maybe > >>>> some endian issue exists? The code looks quite ugly though ;) > >>>> > >>>> Richard. > >>>> > >>>>>> Alan Lawrence wrote: > >>>>>>> We've been seeing a bunch of new failures in the *libffi* > >>> testsuite on ARM > >>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), > >>> following this > >>>>>>> one-liner fix. I've reduced the testcase down to the attached > >>> (including > >>>>>>> removing any dependency on libffi); with gcc r221347, this prints > >>> the > >>>>>>> expected > >>>>>>> 7 8 9 > >>>>>>> whereas with gcc r221348, instead it prints > >>>>>>> 0 8 0 > >>>>>>> > >>>>>>> The action of r221348 is to change the alignment of a mem_ref, and > >>> a > >>>>>>> var_decl of b1, from 32 to 64; both have type > >>>>>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte > >>> sizes-gimplified type_0 > >>>>>>> BLK > >>>>>>> size <integer_cst 0x2b9b8d3720a8 constant 192> > >>>>>>> unit size <integer_cst 0x2b9b8d372078 constant 24> > >>>>>>> align 64 symtab 0 alias set 1 canonical type > >>> 0x2b9b8d428d20 > >>>>>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type > >>>>>>> 0x2b9b8d092690 int> > >>>>>>> SI file reduced.c line 12 col 7 > >>>>>>> size <integer_cst 0x2b9b8d08eeb8 constant 32> > >>>>>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4> > >>>>>>> align 32 offset_align 64 > >>>>>>> offset <integer_cst 0x2b9b8d08eee8 constant 0> > >>>>>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> > >>> context > >>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl > >>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 > >>> D.6070> > >>>>>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain > >>> <type_decl > >>>>>>> 0x2b9b8d42b000 D.6044>> > >>>>>>> > >>>>>>> The tree-optimized output is the same with both compilers (as this > >>> does not > >>>>>>> mention alignment); the expand output differs. > >>>>>>> > >>>>>>> Still investigating... > >>>>>>> > >>>>>>> --Alan > >>>>>>> > >>>>>>> > >>>>>>> Richard Biener wrote: > >>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA > >>>>>>>> drops alignment info unnecessarily. > >>>>>>>> > >>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. > >>>>>>>> > >>>>>>>> Richard. > >>>>>>>> > >>>>>>>> 2015-03-11 Richard Biener <rguenther@suse.de> > >>>>>>>> > >>>>>>>> PR tree-optimization/65310 > >>>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger > >>>>>>>> alignment. > >>>>>>>> > >>>>>>>> Index: gcc/tree-sra.c > >>>>>>>> > >>> =================================================================== > >>>>>>>> --- gcc/tree-sra.c (revision 221324) > >>>>>>>> +++ gcc/tree-sra.c (working copy) > >>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr > >>>>>>>> misalign = (misalign + offset) & (align - 1); > >>>>>>>> if (misalign != 0) > >>>>>>>> align = (misalign & -misalign); > >>>>>>>> - if (align < TYPE_ALIGN (exp_type)) > >>>>>>>> + if (align != TYPE_ALIGN (exp_type)) > >>>>>>>> exp_type = build_aligned_type (exp_type, align); > >>>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, > >>> off); > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>> > >>>> > >> > >> > > > >
On 31/03/15 11:00, Richard Biener wrote: > On Tue, 31 Mar 2015, Richard Earnshaw wrote: > >> On 31/03/15 08:50, Richard Biener wrote: >>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote: >>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote: >>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes >>>>> it. >>>>> >>>>> The problem appears to be in laying out arguments, specifically >>>>> varargs. From >>>>> the "good" -fdump-rtl-expand: >>>>> >>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 >>>>> S4 A32]) >>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >>>>> (nil)) >>>>> (insn 19 18 20 2 (set (reg:DF 2 r2) >>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >>>>> (nil)) >>>>> (insn 20 19 21 2 (set (reg:SI 1 r1) >>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >>>>> (nil)) >>>>> (insn 21 20 22 2 (set (reg:SI 0 r0) >>>>> (reg:SI 118)) reduced.c:14 -1 >>>>> (nil)) >>>>> (call_insn 22 21 23 2 (parallel [ >>>>> (set (reg:SI 0 r0) >>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 >>>>> A32]) >>>>> >>>>> The struct members are >>>>> reg:SI 113 => int a; >>>>> reg:DF 112 => double b; >>>>> reg:SI 111 => int c; >>>>> >>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is >>>>> pushed >>>>> into virtual-outgoing-args. In contrast, post-change to >>>>> build_ref_of_offset, we get: >>>>> >>>>> (insn 17 16 18 2 (set (reg:SI 118) >>>>> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 >>>>> *.LC1>)) reduced.c:14 -1 >>>>> (nil)) >>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 >>>>> virtual-outgoing-args) >>>>> (const_int 8 [0x8])) [0 S4 A64]) >>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >>>>> (nil)) >>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 >>>>> S8 A64]) >>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >>>>> (nil)) >>>>> (insn 20 19 21 2 (set (reg:SI 2 r2) >>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >>>>> (nil)) >>>>> (insn 21 20 22 2 (set (reg:SI 0 r0) >>>>> (reg:SI 118)) reduced.c:14 -1 >>>>> (nil)) >>>>> (call_insn 22 21 23 2 (parallel [ >>>>> (set (reg:SI 0 r0) >>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 >>>>> A32]) >>>>> >>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the >>>>> >>>>> double+following int are all pushed into virtual-outgoing-args. This is >>>>> because >>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second >>>>> argument (the >>>>> type constructed by build_ref_for_offset); it then executes >>>>> (aapcs_layout_arg, >>>>> arm.c line ~~5914) >>>>> >>>>> /* C3 - For double-word aligned arguments, round the NCRN up to the >>>>> next even number. */ >>>>> ncrn = pcum->aapcs_ncrn; >>>>> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) >>>>> ncrn++; >>>>> >>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the >>>>> testcase >>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a >>>>> 32-bit-aligned int instead, which works as previously. >>>>> >>>>> Passing the same members of that struct in a non-vargs call, works ok - >>>>> I think >>>>> because these use the type of the declared parameters, rather than the >>>>> provided >>>>> arguments, and the former do not have the increased alignment from >>>>> build_ref_for_offset. >>>> >>>> It doesn't make sense to use the alignment of passed values. That looks like bs. >>>> >>>> This means that >>>> >>>> Int I __aligned__(8); >>>> >>>> Is passed differently than int. >>>> >>>> Arm_function_arg needs to be fixed. >>> >>> That is, >>> >>> typedef int myint __attribute__((aligned(8))); >>> >>> int main() >>> { >>> myint i = 1; >>> int j = 2; >>> __builtin_printf("%d %d\n", i, j); >>> } >>> >>> or >>> >>> myint i; >>> int j; >>> myint *p = &i; >>> int *q = &j; >>> >>> int main() >>> { >>> __builtin_printf("%d %d", *p, *q); >>> } >>> >>> should behave the same. There isn't a printf modifier for an "aligned int" >>> because that sort of thing doesn't make sense. Special-casing aligned vs. >>> non-aligned values only makes sense for things passed by value on the stack. >>> And then obviously only dependent on the functuion type signature, not >>> on the type of the passed value. >>> >> >> I think the testcase is ill-formed. Just because printf doesn't have >> such a modifier, doesn't mean that another variadic function might not >> have a means to detect when an object in the variadic list needs to be >> over-aligned. As such, the test should really be written as: > > A value doesn't have "alignment". A function may have alignment > requirements on its arguments, clearly printf doesn't. > Values don't. But types do and variadic functions are special in that they derive their types from the types of the actual parameters passed not from the formals in the prototype. Any manipulation of the types should be done in the front end, not in the back end. R. >> typedef int myint __attribute__((aligned(8))); >> >> int main() >> { >> myint i = 1; >> int j = 2; >> __builtin_printf("%d %d\n", (int) i, j); >> } >> >> Variadic functions take the types of their arguments from the types of >> the actuals passed. The compiler should either be applying appropriate >> promotion rules to make the types conformant by the language >> specification or respecting the types exactly. However, that should be >> done in the mid-end not the back-end. If incorrect alignment >> information is passed to the back-end it can't help but make the wrong >> choice. Examining the mode here wouldn't help either. Consider: >> >> int foo (int a, int b, int c, int d, int e, int f >> __attribute__((aligned(8)))); >> >> On ARM that should pass f in a 64-bit aligned location (since the >> parameter will be on the stack). >> >> R. >> >> >>> Richard. >>> >>>> Richard. >>>> >>>>> >>>>> FWIW, I also tried: >>>>> >>>>> __attribute__((__aligned__((16)))) int x; >>>>> int main (void) >>>>> { >>>>> __builtin_printf("%d\n", x); >>>>> } >>>>> >>>>> but in that case, the arm_function_arg is still fed a type with >>>>> alignment 32 >>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which >>>>> has >>>>> alignment 128. >>>>> >>>>> --Alan >>>>> >>>>> Richard Biener wrote: >>>>>> On Mon, 30 Mar 2015, Richard Biener wrote: >>>>>> >>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote: >>>>>>> >>>>>>>> ...actually attach the testcase... >>>>>>> What compile options? >>>>>> >>>>>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but >>>>>> I can't see anything not guaranteeing that: >>>>>> >>>>>> .section .rodata >>>>>> .align 3 >>>>>> .LANCHOR0 = . + 0 >>>>>> .LC1: >>>>>> .ascii "%d %g %d\012\000" >>>>>> .space 6 >>>>>> .LC0: >>>>>> .word 7 >>>>>> .space 4 >>>>>> .word 0 >>>>>> .word 1075838976 >>>>>> .word 9 >>>>>> .space 4 >>>>>> >>>>>> maybe there is some more generic code-gen bug for aligned aggregate >>>>>> copy? That is, the patch tells the backend that the loads and >>>>>> stores to the 'int' vars (which have padding followed) is aligned >>>>>> to 8 bytes. >>>>>> >>>>>> I don't see what is wrong in the final assembler, but maybe >>>>>> some endian issue exists? The code looks quite ugly though ;) >>>>>> >>>>>> Richard. >>>>>> >>>>>>>> Alan Lawrence wrote: >>>>>>>>> We've been seeing a bunch of new failures in the *libffi* >>>>> testsuite on ARM >>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), >>>>> following this >>>>>>>>> one-liner fix. I've reduced the testcase down to the attached >>>>> (including >>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints >>>>> the >>>>>>>>> expected >>>>>>>>> 7 8 9 >>>>>>>>> whereas with gcc r221348, instead it prints >>>>>>>>> 0 8 0 >>>>>>>>> >>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and >>>>> a >>>>>>>>> var_decl of b1, from 32 to 64; both have type >>>>>>>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte >>>>> sizes-gimplified type_0 >>>>>>>>> BLK >>>>>>>>> size <integer_cst 0x2b9b8d3720a8 constant 192> >>>>>>>>> unit size <integer_cst 0x2b9b8d372078 constant 24> >>>>>>>>> align 64 symtab 0 alias set 1 canonical type >>>>> 0x2b9b8d428d20 >>>>>>>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type >>>>>>>>> 0x2b9b8d092690 int> >>>>>>>>> SI file reduced.c line 12 col 7 >>>>>>>>> size <integer_cst 0x2b9b8d08eeb8 constant 32> >>>>>>>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4> >>>>>>>>> align 32 offset_align 64 >>>>>>>>> offset <integer_cst 0x2b9b8d08eee8 constant 0> >>>>>>>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> >>>>> context >>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl >>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 >>>>> D.6070> >>>>>>>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain >>>>> <type_decl >>>>>>>>> 0x2b9b8d42b000 D.6044>> >>>>>>>>> >>>>>>>>> The tree-optimized output is the same with both compilers (as this >>>>> does not >>>>>>>>> mention alignment); the expand output differs. >>>>>>>>> >>>>>>>>> Still investigating... >>>>>>>>> >>>>>>>>> --Alan >>>>>>>>> >>>>>>>>> >>>>>>>>> Richard Biener wrote: >>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA >>>>>>>>>> drops alignment info unnecessarily. >>>>>>>>>> >>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>> 2015-03-11 Richard Biener <rguenther@suse.de> >>>>>>>>>> >>>>>>>>>> PR tree-optimization/65310 >>>>>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger >>>>>>>>>> alignment. >>>>>>>>>> >>>>>>>>>> Index: gcc/tree-sra.c >>>>>>>>>> >>>>> =================================================================== >>>>>>>>>> --- gcc/tree-sra.c (revision 221324) >>>>>>>>>> +++ gcc/tree-sra.c (working copy) >>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr >>>>>>>>>> misalign = (misalign + offset) & (align - 1); >>>>>>>>>> if (misalign != 0) >>>>>>>>>> align = (misalign & -misalign); >>>>>>>>>> - if (align < TYPE_ALIGN (exp_type)) >>>>>>>>>> + if (align != TYPE_ALIGN (exp_type)) >>>>>>>>>> exp_type = build_aligned_type (exp_type, align); >>>>>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, >>>>> off); >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >>>> >>> >> >> >
On Tue, 31 Mar 2015, Richard Biener wrote: > On Tue, 31 Mar 2015, Richard Earnshaw wrote: > > > On 31/03/15 08:50, Richard Biener wrote: > > > On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote: > > >> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote: > > >>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes > > >>> it. > > >>> > > >>> The problem appears to be in laying out arguments, specifically > > >>> varargs. From > > >>> the "good" -fdump-rtl-expand: > > >>> > > >>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 > > >>> S4 A32]) > > >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 19 18 20 2 (set (reg:DF 2 r2) > > >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 20 19 21 2 (set (reg:SI 1 r1) > > >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 21 20 22 2 (set (reg:SI 0 r0) > > >>> (reg:SI 118)) reduced.c:14 -1 > > >>> (nil)) > > >>> (call_insn 22 21 23 2 (parallel [ > > >>> (set (reg:SI 0 r0) > > >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] > > >>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 > > >>> A32]) > > >>> > > >>> The struct members are > > >>> reg:SI 113 => int a; > > >>> reg:DF 112 => double b; > > >>> reg:SI 111 => int c; > > >>> > > >>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is > > >>> pushed > > >>> into virtual-outgoing-args. In contrast, post-change to > > >>> build_ref_of_offset, we get: > > >>> > > >>> (insn 17 16 18 2 (set (reg:SI 118) > > >>> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 > > >>> *.LC1>)) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 > > >>> virtual-outgoing-args) > > >>> (const_int 8 [0x8])) [0 S4 A64]) > > >>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 > > >>> S8 A64]) > > >>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 20 19 21 2 (set (reg:SI 2 r2) > > >>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 > > >>> (nil)) > > >>> (insn 21 20 22 2 (set (reg:SI 0 r0) > > >>> (reg:SI 118)) reduced.c:14 -1 > > >>> (nil)) > > >>> (call_insn 22 21 23 2 (parallel [ > > >>> (set (reg:SI 0 r0) > > >>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] > > >>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 > > >>> A32]) > > >>> > > >>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the > > >>> > > >>> double+following int are all pushed into virtual-outgoing-args. This is > > >>> because > > >>> arm_function_arg is fed a 64-bit-aligned int as type of the second > > >>> argument (the > > >>> type constructed by build_ref_for_offset); it then executes > > >>> (aapcs_layout_arg, > > >>> arm.c line ~~5914) > > >>> > > >>> /* C3 - For double-word aligned arguments, round the NCRN up to the > > >>> next even number. */ > > >>> ncrn = pcum->aapcs_ncrn; > > >>> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) > > >>> ncrn++; > > >>> > > >>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the > > >>> testcase > > >>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a > > >>> 32-bit-aligned int instead, which works as previously. > > >>> > > >>> Passing the same members of that struct in a non-vargs call, works ok - > > >>> I think > > >>> because these use the type of the declared parameters, rather than the > > >>> provided > > >>> arguments, and the former do not have the increased alignment from > > >>> build_ref_for_offset. > > >> > > >> It doesn't make sense to use the alignment of passed values. That looks like bs. > > >> > > >> This means that > > >> > > >> Int I __aligned__(8); > > >> > > >> Is passed differently than int. > > >> > > >> Arm_function_arg needs to be fixed. > > > > > > That is, > > > > > > typedef int myint __attribute__((aligned(8))); > > > > > > int main() > > > { > > > myint i = 1; > > > int j = 2; > > > __builtin_printf("%d %d\n", i, j); > > > } > > > > > > or > > > > > > myint i; > > > int j; > > > myint *p = &i; > > > int *q = &j; > > > > > > int main() > > > { > > > __builtin_printf("%d %d", *p, *q); > > > } > > > > > > should behave the same. There isn't a printf modifier for an "aligned int" > > > because that sort of thing doesn't make sense. Special-casing aligned vs. > > > non-aligned values only makes sense for things passed by value on the stack. > > > And then obviously only dependent on the functuion type signature, not > > > on the type of the passed value. > > > > > > > I think the testcase is ill-formed. Just because printf doesn't have > > such a modifier, doesn't mean that another variadic function might not > > have a means to detect when an object in the variadic list needs to be > > over-aligned. As such, the test should really be written as: > > A value doesn't have "alignment". A function may have alignment > requirements on its arguments, clearly printf doesn't. Btw, it looks like function_arg is supposed to pass whether the argument is to the variadic part of the function or not, but it gets passed false as in args[i].reg = targetm.calls.function_arg (args_so_far, mode, type, argpos < n_named_args); n_named_args is 4. That is because ARM asks to do this via if (type_arg_types != 0 && targetm.calls.strict_argument_naming (args_so_far)) ; else if (type_arg_types != 0 && ! targetm.calls.pretend_outgoing_varargs_named (args_so_far)) /* Don't include the last named arg. */ --n_named_args; else /* Treat all args as named. */ n_named_args = num_actuals; thus you lose that info (you could have that readily available). > > typedef int myint __attribute__((aligned(8))); > > > > int main() > > { > > myint i = 1; > > int j = 2; > > __builtin_printf("%d %d\n", (int) i, j); > > } > > > > Variadic functions take the types of their arguments from the types of > > the actuals passed. The compiler should either be applying appropriate > > promotion rules to make the types conformant by the language > > specification or respecting the types exactly. However, that should be > > done in the mid-end not the back-end. If incorrect alignment > > information is passed to the back-end it can't help but make the wrong > > choice. Examining the mode here wouldn't help either. Consider: > > > > int foo (int a, int b, int c, int d, int e, int f > > __attribute__((aligned(8)))); > > > > On ARM that should pass f in a 64-bit aligned location (since the > > parameter will be on the stack). > > > > R. > > > > > > > Richard. > > > > > >> Richard. > > >> > > >>> > > >>> FWIW, I also tried: > > >>> > > >>> __attribute__((__aligned__((16)))) int x; > > >>> int main (void) > > >>> { > > >>> __builtin_printf("%d\n", x); > > >>> } > > >>> > > >>> but in that case, the arm_function_arg is still fed a type with > > >>> alignment 32 > > >>> (bits), i.e. distinct from the type of the field 'x' in memory, which > > >>> has > > >>> alignment 128. > > >>> > > >>> --Alan > > >>> > > >>> Richard Biener wrote: > > >>>> On Mon, 30 Mar 2015, Richard Biener wrote: > > >>>> > > >>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote: > > >>>>> > > >>>>>> ...actually attach the testcase... > > >>>>> What compile options? > > >>>> > > >>>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but > > >>>> I can't see anything not guaranteeing that: > > >>>> > > >>>> .section .rodata > > >>>> .align 3 > > >>>> .LANCHOR0 = . + 0 > > >>>> .LC1: > > >>>> .ascii "%d %g %d\012\000" > > >>>> .space 6 > > >>>> .LC0: > > >>>> .word 7 > > >>>> .space 4 > > >>>> .word 0 > > >>>> .word 1075838976 > > >>>> .word 9 > > >>>> .space 4 > > >>>> > > >>>> maybe there is some more generic code-gen bug for aligned aggregate > > >>>> copy? That is, the patch tells the backend that the loads and > > >>>> stores to the 'int' vars (which have padding followed) is aligned > > >>>> to 8 bytes. > > >>>> > > >>>> I don't see what is wrong in the final assembler, but maybe > > >>>> some endian issue exists? The code looks quite ugly though ;) > > >>>> > > >>>> Richard. > > >>>> > > >>>>>> Alan Lawrence wrote: > > >>>>>>> We've been seeing a bunch of new failures in the *libffi* > > >>> testsuite on ARM > > >>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), > > >>> following this > > >>>>>>> one-liner fix. I've reduced the testcase down to the attached > > >>> (including > > >>>>>>> removing any dependency on libffi); with gcc r221347, this prints > > >>> the > > >>>>>>> expected > > >>>>>>> 7 8 9 > > >>>>>>> whereas with gcc r221348, instead it prints > > >>>>>>> 0 8 0 > > >>>>>>> > > >>>>>>> The action of r221348 is to change the alignment of a mem_ref, and > > >>> a > > >>>>>>> var_decl of b1, from 32 to 64; both have type > > >>>>>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte > > >>> sizes-gimplified type_0 > > >>>>>>> BLK > > >>>>>>> size <integer_cst 0x2b9b8d3720a8 constant 192> > > >>>>>>> unit size <integer_cst 0x2b9b8d372078 constant 24> > > >>>>>>> align 64 symtab 0 alias set 1 canonical type > > >>> 0x2b9b8d428d20 > > >>>>>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type > > >>>>>>> 0x2b9b8d092690 int> > > >>>>>>> SI file reduced.c line 12 col 7 > > >>>>>>> size <integer_cst 0x2b9b8d08eeb8 constant 32> > > >>>>>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4> > > >>>>>>> align 32 offset_align 64 > > >>>>>>> offset <integer_cst 0x2b9b8d08eee8 constant 0> > > >>>>>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> > > >>> context > > >>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl > > >>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 > > >>> D.6070> > > >>>>>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain > > >>> <type_decl > > >>>>>>> 0x2b9b8d42b000 D.6044>> > > >>>>>>> > > >>>>>>> The tree-optimized output is the same with both compilers (as this > > >>> does not > > >>>>>>> mention alignment); the expand output differs. > > >>>>>>> > > >>>>>>> Still investigating... > > >>>>>>> > > >>>>>>> --Alan > > >>>>>>> > > >>>>>>> > > >>>>>>> Richard Biener wrote: > > >>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA > > >>>>>>>> drops alignment info unnecessarily. > > >>>>>>>> > > >>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. > > >>>>>>>> > > >>>>>>>> Richard. > > >>>>>>>> > > >>>>>>>> 2015-03-11 Richard Biener <rguenther@suse.de> > > >>>>>>>> > > >>>>>>>> PR tree-optimization/65310 > > >>>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger > > >>>>>>>> alignment. > > >>>>>>>> > > >>>>>>>> Index: gcc/tree-sra.c > > >>>>>>>> > > >>> =================================================================== > > >>>>>>>> --- gcc/tree-sra.c (revision 221324) > > >>>>>>>> +++ gcc/tree-sra.c (working copy) > > >>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr > > >>>>>>>> misalign = (misalign + offset) & (align - 1); > > >>>>>>>> if (misalign != 0) > > >>>>>>>> align = (misalign & -misalign); > > >>>>>>>> - if (align < TYPE_ALIGN (exp_type)) > > >>>>>>>> + if (align != TYPE_ALIGN (exp_type)) > > >>>>>>>> exp_type = build_aligned_type (exp_type, align); > > >>>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, > > >>> off); > > >>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>> > > >>>> > > >> > > >> > > > > > > > > >
On 31/03/15 11:20, Richard Biener wrote: > On Tue, 31 Mar 2015, Richard Biener wrote: > >> On Tue, 31 Mar 2015, Richard Earnshaw wrote: >> >>> On 31/03/15 08:50, Richard Biener wrote: >>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote: >>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote: >>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes >>>>>> it. >>>>>> >>>>>> The problem appears to be in laying out arguments, specifically >>>>>> varargs. From >>>>>> the "good" -fdump-rtl-expand: >>>>>> >>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 >>>>>> S4 A32]) >>>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >>>>>> (nil)) >>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2) >>>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >>>>>> (nil)) >>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1) >>>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >>>>>> (nil)) >>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0) >>>>>> (reg:SI 118)) reduced.c:14 -1 >>>>>> (nil)) >>>>>> (call_insn 22 21 23 2 (parallel [ >>>>>> (set (reg:SI 0 r0) >>>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 >>>>>> A32]) >>>>>> >>>>>> The struct members are >>>>>> reg:SI 113 => int a; >>>>>> reg:DF 112 => double b; >>>>>> reg:SI 111 => int c; >>>>>> >>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is >>>>>> pushed >>>>>> into virtual-outgoing-args. In contrast, post-change to >>>>>> build_ref_of_offset, we get: >>>>>> >>>>>> (insn 17 16 18 2 (set (reg:SI 118) >>>>>> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 >>>>>> *.LC1>)) reduced.c:14 -1 >>>>>> (nil)) >>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 >>>>>> virtual-outgoing-args) >>>>>> (const_int 8 [0x8])) [0 S4 A64]) >>>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >>>>>> (nil)) >>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 >>>>>> S8 A64]) >>>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >>>>>> (nil)) >>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2) >>>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >>>>>> (nil)) >>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0) >>>>>> (reg:SI 118)) reduced.c:14 -1 >>>>>> (nil)) >>>>>> (call_insn 22 21 23 2 (parallel [ >>>>>> (set (reg:SI 0 r0) >>>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 >>>>>> A32]) >>>>>> >>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the >>>>>> >>>>>> double+following int are all pushed into virtual-outgoing-args. This is >>>>>> because >>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second >>>>>> argument (the >>>>>> type constructed by build_ref_for_offset); it then executes >>>>>> (aapcs_layout_arg, >>>>>> arm.c line ~~5914) >>>>>> >>>>>> /* C3 - For double-word aligned arguments, round the NCRN up to the >>>>>> next even number. */ >>>>>> ncrn = pcum->aapcs_ncrn; >>>>>> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) >>>>>> ncrn++; >>>>>> >>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the >>>>>> testcase >>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a >>>>>> 32-bit-aligned int instead, which works as previously. >>>>>> >>>>>> Passing the same members of that struct in a non-vargs call, works ok - >>>>>> I think >>>>>> because these use the type of the declared parameters, rather than the >>>>>> provided >>>>>> arguments, and the former do not have the increased alignment from >>>>>> build_ref_for_offset. >>>>> >>>>> It doesn't make sense to use the alignment of passed values. That looks like bs. >>>>> >>>>> This means that >>>>> >>>>> Int I __aligned__(8); >>>>> >>>>> Is passed differently than int. >>>>> >>>>> Arm_function_arg needs to be fixed. >>>> >>>> That is, >>>> >>>> typedef int myint __attribute__((aligned(8))); >>>> >>>> int main() >>>> { >>>> myint i = 1; >>>> int j = 2; >>>> __builtin_printf("%d %d\n", i, j); >>>> } >>>> >>>> or >>>> >>>> myint i; >>>> int j; >>>> myint *p = &i; >>>> int *q = &j; >>>> >>>> int main() >>>> { >>>> __builtin_printf("%d %d", *p, *q); >>>> } >>>> >>>> should behave the same. There isn't a printf modifier for an "aligned int" >>>> because that sort of thing doesn't make sense. Special-casing aligned vs. >>>> non-aligned values only makes sense for things passed by value on the stack. >>>> And then obviously only dependent on the functuion type signature, not >>>> on the type of the passed value. >>>> >>> >>> I think the testcase is ill-formed. Just because printf doesn't have >>> such a modifier, doesn't mean that another variadic function might not >>> have a means to detect when an object in the variadic list needs to be >>> over-aligned. As such, the test should really be written as: >> >> A value doesn't have "alignment". A function may have alignment >> requirements on its arguments, clearly printf doesn't. > > Btw, it looks like function_arg is supposed to pass whether the argument > is to the variadic part of the function or not, but it gets passed > false as in > > args[i].reg = targetm.calls.function_arg (args_so_far, mode, type, > argpos < n_named_args); > > n_named_args is 4. That is because ARM asks to do this via > > if (type_arg_types != 0 > && targetm.calls.strict_argument_naming (args_so_far)) > ; > else if (type_arg_types != 0 > && ! targetm.calls.pretend_outgoing_varargs_named > (args_so_far)) > /* Don't include the last named arg. */ > --n_named_args; > else > /* Treat all args as named. */ > n_named_args = num_actuals; > > thus you lose that info (you could have that readily available). > From the calling side of a function it shouldn't matter. They only thing the caller has to know is that the function is variadic (and therefore that the base-standard rules from the AAPCS apply -- no use of FP registers for parameters). The behaviour after that is *as if* all the arguments were pushed onto the stack in the correct order and finally the lowest 4 words are popped off the stack again into r0-r3. Hence any alignment that would apply to arguments on the stack has to be reflected in their allocation into r0-r3 (since the push/pop of those registers never happens). R. >>> typedef int myint __attribute__((aligned(8))); >>> >>> int main() >>> { >>> myint i = 1; >>> int j = 2; >>> __builtin_printf("%d %d\n", (int) i, j); >>> } >>> >>> Variadic functions take the types of their arguments from the types of >>> the actuals passed. The compiler should either be applying appropriate >>> promotion rules to make the types conformant by the language >>> specification or respecting the types exactly. However, that should be >>> done in the mid-end not the back-end. If incorrect alignment >>> information is passed to the back-end it can't help but make the wrong >>> choice. Examining the mode here wouldn't help either. Consider: >>> >>> int foo (int a, int b, int c, int d, int e, int f >>> __attribute__((aligned(8)))); >>> >>> On ARM that should pass f in a 64-bit aligned location (since the >>> parameter will be on the stack). >>> >>> R. >>> >>> >>>> Richard. >>>> >>>>> Richard. >>>>> >>>>>> >>>>>> FWIW, I also tried: >>>>>> >>>>>> __attribute__((__aligned__((16)))) int x; >>>>>> int main (void) >>>>>> { >>>>>> __builtin_printf("%d\n", x); >>>>>> } >>>>>> >>>>>> but in that case, the arm_function_arg is still fed a type with >>>>>> alignment 32 >>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which >>>>>> has >>>>>> alignment 128. >>>>>> >>>>>> --Alan >>>>>> >>>>>> Richard Biener wrote: >>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote: >>>>>>> >>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote: >>>>>>>> >>>>>>>>> ...actually attach the testcase... >>>>>>>> What compile options? >>>>>>> >>>>>>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but >>>>>>> I can't see anything not guaranteeing that: >>>>>>> >>>>>>> .section .rodata >>>>>>> .align 3 >>>>>>> .LANCHOR0 = . + 0 >>>>>>> .LC1: >>>>>>> .ascii "%d %g %d\012\000" >>>>>>> .space 6 >>>>>>> .LC0: >>>>>>> .word 7 >>>>>>> .space 4 >>>>>>> .word 0 >>>>>>> .word 1075838976 >>>>>>> .word 9 >>>>>>> .space 4 >>>>>>> >>>>>>> maybe there is some more generic code-gen bug for aligned aggregate >>>>>>> copy? That is, the patch tells the backend that the loads and >>>>>>> stores to the 'int' vars (which have padding followed) is aligned >>>>>>> to 8 bytes. >>>>>>> >>>>>>> I don't see what is wrong in the final assembler, but maybe >>>>>>> some endian issue exists? The code looks quite ugly though ;) >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>>> Alan Lawrence wrote: >>>>>>>>>> We've been seeing a bunch of new failures in the *libffi* >>>>>> testsuite on ARM >>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), >>>>>> following this >>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached >>>>>> (including >>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints >>>>>> the >>>>>>>>>> expected >>>>>>>>>> 7 8 9 >>>>>>>>>> whereas with gcc r221348, instead it prints >>>>>>>>>> 0 8 0 >>>>>>>>>> >>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and >>>>>> a >>>>>>>>>> var_decl of b1, from 32 to 64; both have type >>>>>>>>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte >>>>>> sizes-gimplified type_0 >>>>>>>>>> BLK >>>>>>>>>> size <integer_cst 0x2b9b8d3720a8 constant 192> >>>>>>>>>> unit size <integer_cst 0x2b9b8d372078 constant 24> >>>>>>>>>> align 64 symtab 0 alias set 1 canonical type >>>>>> 0x2b9b8d428d20 >>>>>>>>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type >>>>>>>>>> 0x2b9b8d092690 int> >>>>>>>>>> SI file reduced.c line 12 col 7 >>>>>>>>>> size <integer_cst 0x2b9b8d08eeb8 constant 32> >>>>>>>>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4> >>>>>>>>>> align 32 offset_align 64 >>>>>>>>>> offset <integer_cst 0x2b9b8d08eee8 constant 0> >>>>>>>>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> >>>>>> context >>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl >>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 >>>>>> D.6070> >>>>>>>>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain >>>>>> <type_decl >>>>>>>>>> 0x2b9b8d42b000 D.6044>> >>>>>>>>>> >>>>>>>>>> The tree-optimized output is the same with both compilers (as this >>>>>> does not >>>>>>>>>> mention alignment); the expand output differs. >>>>>>>>>> >>>>>>>>>> Still investigating... >>>>>>>>>> >>>>>>>>>> --Alan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Richard Biener wrote: >>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA >>>>>>>>>>> drops alignment info unnecessarily. >>>>>>>>>>> >>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. >>>>>>>>>>> >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>> 2015-03-11 Richard Biener <rguenther@suse.de> >>>>>>>>>>> >>>>>>>>>>> PR tree-optimization/65310 >>>>>>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger >>>>>>>>>>> alignment. >>>>>>>>>>> >>>>>>>>>>> Index: gcc/tree-sra.c >>>>>>>>>>> >>>>>> =================================================================== >>>>>>>>>>> --- gcc/tree-sra.c (revision 221324) >>>>>>>>>>> +++ gcc/tree-sra.c (working copy) >>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr >>>>>>>>>>> misalign = (misalign + offset) & (align - 1); >>>>>>>>>>> if (misalign != 0) >>>>>>>>>>> align = (misalign & -misalign); >>>>>>>>>>> - if (align < TYPE_ALIGN (exp_type)) >>>>>>>>>>> + if (align != TYPE_ALIGN (exp_type)) >>>>>>>>>>> exp_type = build_aligned_type (exp_type, align); >>>>>>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, >>>>>> off); >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>>> >>>> >>> >>> >> >> >
On Tue, Mar 31, 2015 at 11:10:39AM +0100, Richard Earnshaw wrote: > >>> That is, > >>> > >>> typedef int myint __attribute__((aligned(8))); > >>> > >>> int main() > >>> { > >>> myint i = 1; > >>> int j = 2; > >>> __builtin_printf("%d %d\n", i, j); > >>> } > >>> > >>> or > >>> > >>> myint i; > >>> int j; > >>> myint *p = &i; > >>> int *q = &j; > >>> > >>> int main() > >>> { > >>> __builtin_printf("%d %d", *p, *q); > >>> } Note that starting with r221348, gcc fails to profiledbootstrap on armv7hl-linux-gnueabi. I'd hope it is the same thing. To middle-end, all integral type conversions that differ just in alignment are useless - for INTEGRAL_TYPE_P all useless_type_conversion_p cares about is sign and precision. So, if arm has a weirdo ABI that wants to pass aligned types differently (why, I'd say it is just a bug in the ABI), then for named arguments it really has to look at the function type - the type of the argument, rather than the passed in value's type, and for varargs it would be best if it remembered such thing early (in the FEs), because the middle-end can change things any time, with or without Richard B.'s recent SRA change. Jakub
On Tue, 31 Mar 2015, Richard Earnshaw wrote: > On 31/03/15 11:20, Richard Biener wrote: > > On Tue, 31 Mar 2015, Richard Biener wrote: > > > >> On Tue, 31 Mar 2015, Richard Earnshaw wrote: > >> > >>> On 31/03/15 08:50, Richard Biener wrote: > >>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote: > >>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote: > >>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes > >>>>>> it. > >>>>>> > >>>>>> The problem appears to be in laying out arguments, specifically > >>>>>> varargs. From > >>>>>> the "good" -fdump-rtl-expand: > >>>>>> > >>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 > >>>>>> S4 A32]) > >>>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2) > >>>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1) > >>>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0) > >>>>>> (reg:SI 118)) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (call_insn 22 21 23 2 (parallel [ > >>>>>> (set (reg:SI 0 r0) > >>>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] > >>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 > >>>>>> A32]) > >>>>>> > >>>>>> The struct members are > >>>>>> reg:SI 113 => int a; > >>>>>> reg:DF 112 => double b; > >>>>>> reg:SI 111 => int c; > >>>>>> > >>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is > >>>>>> pushed > >>>>>> into virtual-outgoing-args. In contrast, post-change to > >>>>>> build_ref_of_offset, we get: > >>>>>> > >>>>>> (insn 17 16 18 2 (set (reg:SI 118) > >>>>>> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 > >>>>>> *.LC1>)) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 > >>>>>> virtual-outgoing-args) > >>>>>> (const_int 8 [0x8])) [0 S4 A64]) > >>>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 > >>>>>> S8 A64]) > >>>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2) > >>>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0) > >>>>>> (reg:SI 118)) reduced.c:14 -1 > >>>>>> (nil)) > >>>>>> (call_insn 22 21 23 2 (parallel [ > >>>>>> (set (reg:SI 0 r0) > >>>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] > >>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 > >>>>>> A32]) > >>>>>> > >>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the > >>>>>> > >>>>>> double+following int are all pushed into virtual-outgoing-args. This is > >>>>>> because > >>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second > >>>>>> argument (the > >>>>>> type constructed by build_ref_for_offset); it then executes > >>>>>> (aapcs_layout_arg, > >>>>>> arm.c line ~~5914) > >>>>>> > >>>>>> /* C3 - For double-word aligned arguments, round the NCRN up to the > >>>>>> next even number. */ > >>>>>> ncrn = pcum->aapcs_ncrn; > >>>>>> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) > >>>>>> ncrn++; > >>>>>> > >>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the > >>>>>> testcase > >>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a > >>>>>> 32-bit-aligned int instead, which works as previously. > >>>>>> > >>>>>> Passing the same members of that struct in a non-vargs call, works ok - > >>>>>> I think > >>>>>> because these use the type of the declared parameters, rather than the > >>>>>> provided > >>>>>> arguments, and the former do not have the increased alignment from > >>>>>> build_ref_for_offset. > >>>>> > >>>>> It doesn't make sense to use the alignment of passed values. That looks like bs. > >>>>> > >>>>> This means that > >>>>> > >>>>> Int I __aligned__(8); > >>>>> > >>>>> Is passed differently than int. > >>>>> > >>>>> Arm_function_arg needs to be fixed. > >>>> > >>>> That is, > >>>> > >>>> typedef int myint __attribute__((aligned(8))); > >>>> > >>>> int main() > >>>> { > >>>> myint i = 1; > >>>> int j = 2; > >>>> __builtin_printf("%d %d\n", i, j); > >>>> } > >>>> > >>>> or > >>>> > >>>> myint i; > >>>> int j; > >>>> myint *p = &i; > >>>> int *q = &j; > >>>> > >>>> int main() > >>>> { > >>>> __builtin_printf("%d %d", *p, *q); > >>>> } > >>>> > >>>> should behave the same. There isn't a printf modifier for an "aligned int" > >>>> because that sort of thing doesn't make sense. Special-casing aligned vs. > >>>> non-aligned values only makes sense for things passed by value on the stack. > >>>> And then obviously only dependent on the functuion type signature, not > >>>> on the type of the passed value. > >>>> > >>> > >>> I think the testcase is ill-formed. Just because printf doesn't have > >>> such a modifier, doesn't mean that another variadic function might not > >>> have a means to detect when an object in the variadic list needs to be > >>> over-aligned. As such, the test should really be written as: > >> > >> A value doesn't have "alignment". A function may have alignment > >> requirements on its arguments, clearly printf doesn't. > > > > Btw, it looks like function_arg is supposed to pass whether the argument > > is to the variadic part of the function or not, but it gets passed > > false as in > > > > args[i].reg = targetm.calls.function_arg (args_so_far, mode, type, > > argpos < n_named_args); > > > > n_named_args is 4. That is because ARM asks to do this via > > > > if (type_arg_types != 0 > > && targetm.calls.strict_argument_naming (args_so_far)) > > ; > > else if (type_arg_types != 0 > > && ! targetm.calls.pretend_outgoing_varargs_named > > (args_so_far)) > > /* Don't include the last named arg. */ > > --n_named_args; > > else > > /* Treat all args as named. */ > > n_named_args = num_actuals; > > > > thus you lose that info (you could have that readily available). > > > > From the calling side of a function it shouldn't matter. They only > thing the caller has to know is that the function is variadic (and > therefore that the base-standard rules from the AAPCS apply -- no use of > FP registers for parameters). The behaviour after that is *as if* all > the arguments were pushed onto the stack in the correct order and > finally the lowest 4 words are popped off the stack again into r0-r3. > Hence any alignment that would apply to arguments on the stack has to be > reflected in their allocation into r0-r3 (since the push/pop of those > registers never happens). But how do you compute the alignment of, say a myint '1'? Of course __attribute__((aligned())) is a C extension thus AAPCS likely doesn't consider it. But yes, as given even on x86_64 we might spill a 8-byte aligned int register to a 8-byte aligned stack slot so the proposed patch makes sense in that regard. I wonder how many other passes can get confused by this (PRE, I suppose, inserting an explicitely aligned load as well as all passes after the vectorizer which also creates loads/store with explicit (usually lower) alignment). Richard. > R. > > >>> typedef int myint __attribute__((aligned(8))); > >>> > >>> int main() > >>> { > >>> myint i = 1; > >>> int j = 2; > >>> __builtin_printf("%d %d\n", (int) i, j); > >>> } > >>> > >>> Variadic functions take the types of their arguments from the types of > >>> the actuals passed. The compiler should either be applying appropriate > >>> promotion rules to make the types conformant by the language > >>> specification or respecting the types exactly. However, that should be > >>> done in the mid-end not the back-end. If incorrect alignment > >>> information is passed to the back-end it can't help but make the wrong > >>> choice. Examining the mode here wouldn't help either. Consider: > >>> > >>> int foo (int a, int b, int c, int d, int e, int f > >>> __attribute__((aligned(8)))); > >>> > >>> On ARM that should pass f in a 64-bit aligned location (since the > >>> parameter will be on the stack). > >>> > >>> R. > >>> > >>> > >>>> Richard. > >>>> > >>>>> Richard. > >>>>> > >>>>>> > >>>>>> FWIW, I also tried: > >>>>>> > >>>>>> __attribute__((__aligned__((16)))) int x; > >>>>>> int main (void) > >>>>>> { > >>>>>> __builtin_printf("%d\n", x); > >>>>>> } > >>>>>> > >>>>>> but in that case, the arm_function_arg is still fed a type with > >>>>>> alignment 32 > >>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which > >>>>>> has > >>>>>> alignment 128. > >>>>>> > >>>>>> --Alan > >>>>>> > >>>>>> Richard Biener wrote: > >>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote: > >>>>>>> > >>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote: > >>>>>>>> > >>>>>>>>> ...actually attach the testcase... > >>>>>>>> What compile options? > >>>>>>> > >>>>>>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but > >>>>>>> I can't see anything not guaranteeing that: > >>>>>>> > >>>>>>> .section .rodata > >>>>>>> .align 3 > >>>>>>> .LANCHOR0 = . + 0 > >>>>>>> .LC1: > >>>>>>> .ascii "%d %g %d\012\000" > >>>>>>> .space 6 > >>>>>>> .LC0: > >>>>>>> .word 7 > >>>>>>> .space 4 > >>>>>>> .word 0 > >>>>>>> .word 1075838976 > >>>>>>> .word 9 > >>>>>>> .space 4 > >>>>>>> > >>>>>>> maybe there is some more generic code-gen bug for aligned aggregate > >>>>>>> copy? That is, the patch tells the backend that the loads and > >>>>>>> stores to the 'int' vars (which have padding followed) is aligned > >>>>>>> to 8 bytes. > >>>>>>> > >>>>>>> I don't see what is wrong in the final assembler, but maybe > >>>>>>> some endian issue exists? The code looks quite ugly though ;) > >>>>>>> > >>>>>>> Richard. > >>>>>>> > >>>>>>>>> Alan Lawrence wrote: > >>>>>>>>>> We've been seeing a bunch of new failures in the *libffi* > >>>>>> testsuite on ARM > >>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), > >>>>>> following this > >>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached > >>>>>> (including > >>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints > >>>>>> the > >>>>>>>>>> expected > >>>>>>>>>> 7 8 9 > >>>>>>>>>> whereas with gcc r221348, instead it prints > >>>>>>>>>> 0 8 0 > >>>>>>>>>> > >>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and > >>>>>> a > >>>>>>>>>> var_decl of b1, from 32 to 64; both have type > >>>>>>>>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte > >>>>>> sizes-gimplified type_0 > >>>>>>>>>> BLK > >>>>>>>>>> size <integer_cst 0x2b9b8d3720a8 constant 192> > >>>>>>>>>> unit size <integer_cst 0x2b9b8d372078 constant 24> > >>>>>>>>>> align 64 symtab 0 alias set 1 canonical type > >>>>>> 0x2b9b8d428d20 > >>>>>>>>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type > >>>>>>>>>> 0x2b9b8d092690 int> > >>>>>>>>>> SI file reduced.c line 12 col 7 > >>>>>>>>>> size <integer_cst 0x2b9b8d08eeb8 constant 32> > >>>>>>>>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4> > >>>>>>>>>> align 32 offset_align 64 > >>>>>>>>>> offset <integer_cst 0x2b9b8d08eee8 constant 0> > >>>>>>>>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> > >>>>>> context > >>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl > >>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 > >>>>>> D.6070> > >>>>>>>>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain > >>>>>> <type_decl > >>>>>>>>>> 0x2b9b8d42b000 D.6044>> > >>>>>>>>>> > >>>>>>>>>> The tree-optimized output is the same with both compilers (as this > >>>>>> does not > >>>>>>>>>> mention alignment); the expand output differs. > >>>>>>>>>> > >>>>>>>>>> Still investigating... > >>>>>>>>>> > >>>>>>>>>> --Alan > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Richard Biener wrote: > >>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA > >>>>>>>>>>> drops alignment info unnecessarily. > >>>>>>>>>>> > >>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. > >>>>>>>>>>> > >>>>>>>>>>> Richard. > >>>>>>>>>>> > >>>>>>>>>>> 2015-03-11 Richard Biener <rguenther@suse.de> > >>>>>>>>>>> > >>>>>>>>>>> PR tree-optimization/65310 > >>>>>>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger > >>>>>>>>>>> alignment. > >>>>>>>>>>> > >>>>>>>>>>> Index: gcc/tree-sra.c > >>>>>>>>>>> > >>>>>> =================================================================== > >>>>>>>>>>> --- gcc/tree-sra.c (revision 221324) > >>>>>>>>>>> +++ gcc/tree-sra.c (working copy) > >>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr > >>>>>>>>>>> misalign = (misalign + offset) & (align - 1); > >>>>>>>>>>> if (misalign != 0) > >>>>>>>>>>> align = (misalign & -misalign); > >>>>>>>>>>> - if (align < TYPE_ALIGN (exp_type)) > >>>>>>>>>>> + if (align != TYPE_ALIGN (exp_type)) > >>>>>>>>>>> exp_type = build_aligned_type (exp_type, align); > >>>>>>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, > >>>>>> off); > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>> > >>>>> > >>>> > >>> > >>> > >> > >> > > > >
On 31/03/15 11:44, Richard Biener wrote: > On Tue, 31 Mar 2015, Richard Earnshaw wrote: > >> On 31/03/15 11:20, Richard Biener wrote: >>> On Tue, 31 Mar 2015, Richard Biener wrote: >>> >>>> On Tue, 31 Mar 2015, Richard Earnshaw wrote: >>>> >>>>> On 31/03/15 08:50, Richard Biener wrote: >>>>>> On Mon, Mar 30, 2015 at 10:13 PM, Richard Biener <rguenther@suse.de> wrote: >>>>>>> On March 30, 2015 6:45:34 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote: >>>>>>>> -O2 was what I first used; it also occurs at -O1. -fno-tree-sra fixes >>>>>>>> it. >>>>>>>> >>>>>>>> The problem appears to be in laying out arguments, specifically >>>>>>>> varargs. From >>>>>>>> the "good" -fdump-rtl-expand: >>>>>>>> >>>>>>>> (insn 18 17 19 2 (set (mem:SI (reg/f:SI 107 virtual-outgoing-args) [0 >>>>>>>> S4 A32]) >>>>>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 19 18 20 2 (set (reg:DF 2 r2) >>>>>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 20 19 21 2 (set (reg:SI 1 r1) >>>>>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0) >>>>>>>> (reg:SI 118)) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (call_insn 22 21 23 2 (parallel [ >>>>>>>> (set (reg:SI 0 r0) >>>>>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >>>>>>>> <function_decl 0x2ab50e856d00 __builtin_printf>) [0 __builtin_printf S4 >>>>>>>> A32]) >>>>>>>> >>>>>>>> The struct members are >>>>>>>> reg:SI 113 => int a; >>>>>>>> reg:DF 112 => double b; >>>>>>>> reg:SI 111 => int c; >>>>>>>> >>>>>>>> r0 gets the format string; r1 gets int a; r2+r3 get double b; int c is >>>>>>>> pushed >>>>>>>> into virtual-outgoing-args. In contrast, post-change to >>>>>>>> build_ref_of_offset, we get: >>>>>>>> >>>>>>>> (insn 17 16 18 2 (set (reg:SI 118) >>>>>>>> (symbol_ref/v/f:SI ("*.LC1") [flags 0x82] <var_decl 0x2ba57fa8d750 >>>>>>>> *.LC1>)) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 18 17 19 2 (set (mem:SI (plus:SI (reg/f:SI 107 >>>>>>>> virtual-outgoing-args) >>>>>>>> (const_int 8 [0x8])) [0 S4 A64]) >>>>>>>> (reg:SI 111 [ b1$16 ])) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 19 18 20 2 (set (mem:DF (reg/f:SI 107 virtual-outgoing-args) [0 >>>>>>>> S8 A64]) >>>>>>>> (reg:DF 112 [ b1$8 ])) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 20 19 21 2 (set (reg:SI 2 r2) >>>>>>>> (reg:SI 113 [ b1 ])) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (insn 21 20 22 2 (set (reg:SI 0 r0) >>>>>>>> (reg:SI 118)) reduced.c:14 -1 >>>>>>>> (nil)) >>>>>>>> (call_insn 22 21 23 2 (parallel [ >>>>>>>> (set (reg:SI 0 r0) >>>>>>>> (call (mem:SI (symbol_ref:SI ("printf") [flags 0x41] >>>>>>>> <function_decl 0x2ba57f843d00 __builtin_printf>) [0 __builtin_printf S4 >>>>>>>> A32]) >>>>>>>> >>>>>>>> r0 still gets the format string, but 'int b1.a' now goes in r2, and the >>>>>>>> >>>>>>>> double+following int are all pushed into virtual-outgoing-args. This is >>>>>>>> because >>>>>>>> arm_function_arg is fed a 64-bit-aligned int as type of the second >>>>>>>> argument (the >>>>>>>> type constructed by build_ref_for_offset); it then executes >>>>>>>> (aapcs_layout_arg, >>>>>>>> arm.c line ~~5914) >>>>>>>> >>>>>>>> /* C3 - For double-word aligned arguments, round the NCRN up to the >>>>>>>> next even number. */ >>>>>>>> ncrn = pcum->aapcs_ncrn; >>>>>>>> if ((ncrn & 1) && arm_needs_doubleword_align (mode, type)) >>>>>>>> ncrn++; >>>>>>>> >>>>>>>> Which changes r1 to r2. Passing -fno-tree-sra, or removing from the >>>>>>>> testcase >>>>>>>> "*(cls_struct_16byte *)resp = b1", causes arm_function_arg to be fed a >>>>>>>> 32-bit-aligned int instead, which works as previously. >>>>>>>> >>>>>>>> Passing the same members of that struct in a non-vargs call, works ok - >>>>>>>> I think >>>>>>>> because these use the type of the declared parameters, rather than the >>>>>>>> provided >>>>>>>> arguments, and the former do not have the increased alignment from >>>>>>>> build_ref_for_offset. >>>>>>> >>>>>>> It doesn't make sense to use the alignment of passed values. That looks like bs. >>>>>>> >>>>>>> This means that >>>>>>> >>>>>>> Int I __aligned__(8); >>>>>>> >>>>>>> Is passed differently than int. >>>>>>> >>>>>>> Arm_function_arg needs to be fixed. >>>>>> >>>>>> That is, >>>>>> >>>>>> typedef int myint __attribute__((aligned(8))); >>>>>> >>>>>> int main() >>>>>> { >>>>>> myint i = 1; >>>>>> int j = 2; >>>>>> __builtin_printf("%d %d\n", i, j); >>>>>> } >>>>>> >>>>>> or >>>>>> >>>>>> myint i; >>>>>> int j; >>>>>> myint *p = &i; >>>>>> int *q = &j; >>>>>> >>>>>> int main() >>>>>> { >>>>>> __builtin_printf("%d %d", *p, *q); >>>>>> } >>>>>> >>>>>> should behave the same. There isn't a printf modifier for an "aligned int" >>>>>> because that sort of thing doesn't make sense. Special-casing aligned vs. >>>>>> non-aligned values only makes sense for things passed by value on the stack. >>>>>> And then obviously only dependent on the functuion type signature, not >>>>>> on the type of the passed value. >>>>>> >>>>> >>>>> I think the testcase is ill-formed. Just because printf doesn't have >>>>> such a modifier, doesn't mean that another variadic function might not >>>>> have a means to detect when an object in the variadic list needs to be >>>>> over-aligned. As such, the test should really be written as: >>>> >>>> A value doesn't have "alignment". A function may have alignment >>>> requirements on its arguments, clearly printf doesn't. >>> >>> Btw, it looks like function_arg is supposed to pass whether the argument >>> is to the variadic part of the function or not, but it gets passed >>> false as in >>> >>> args[i].reg = targetm.calls.function_arg (args_so_far, mode, type, >>> argpos < n_named_args); >>> >>> n_named_args is 4. That is because ARM asks to do this via >>> >>> if (type_arg_types != 0 >>> && targetm.calls.strict_argument_naming (args_so_far)) >>> ; >>> else if (type_arg_types != 0 >>> && ! targetm.calls.pretend_outgoing_varargs_named >>> (args_so_far)) >>> /* Don't include the last named arg. */ >>> --n_named_args; >>> else >>> /* Treat all args as named. */ >>> n_named_args = num_actuals; >>> >>> thus you lose that info (you could have that readily available). >>> >> >> From the calling side of a function it shouldn't matter. They only >> thing the caller has to know is that the function is variadic (and >> therefore that the base-standard rules from the AAPCS apply -- no use of >> FP registers for parameters). The behaviour after that is *as if* all >> the arguments were pushed onto the stack in the correct order and >> finally the lowest 4 words are popped off the stack again into r0-r3. >> Hence any alignment that would apply to arguments on the stack has to be >> reflected in their allocation into r0-r3 (since the push/pop of those >> registers never happens). > > But how do you compute the alignment of, say a myint '1'? Of course > __attribute__((aligned())) is a C extension thus AAPCS likely doesn't > consider it. > Front-end problem :-) > But yes, as given even on x86_64 we might spill a 8-byte aligned > int register to a 8-byte aligned stack slot so the proposed patch > makes sense in that regard. I wonder how many other passes can > get confused by this (PRE, I suppose, inserting an explicitely > aligned load as well as all passes after the vectorizer which > also creates loads/store with explicit (usually lower) alignment). > > Richard. > >> R. >> >>>>> typedef int myint __attribute__((aligned(8))); >>>>> >>>>> int main() >>>>> { >>>>> myint i = 1; >>>>> int j = 2; >>>>> __builtin_printf("%d %d\n", (int) i, j); >>>>> } >>>>> >>>>> Variadic functions take the types of their arguments from the types of >>>>> the actuals passed. The compiler should either be applying appropriate >>>>> promotion rules to make the types conformant by the language >>>>> specification or respecting the types exactly. However, that should be >>>>> done in the mid-end not the back-end. If incorrect alignment >>>>> information is passed to the back-end it can't help but make the wrong >>>>> choice. Examining the mode here wouldn't help either. Consider: >>>>> >>>>> int foo (int a, int b, int c, int d, int e, int f >>>>> __attribute__((aligned(8)))); >>>>> >>>>> On ARM that should pass f in a 64-bit aligned location (since the >>>>> parameter will be on the stack). >>>>> >>>>> R. >>>>> >>>>> >>>>>> Richard. >>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> >>>>>>>> FWIW, I also tried: >>>>>>>> >>>>>>>> __attribute__((__aligned__((16)))) int x; >>>>>>>> int main (void) >>>>>>>> { >>>>>>>> __builtin_printf("%d\n", x); >>>>>>>> } >>>>>>>> >>>>>>>> but in that case, the arm_function_arg is still fed a type with >>>>>>>> alignment 32 >>>>>>>> (bits), i.e. distinct from the type of the field 'x' in memory, which >>>>>>>> has >>>>>>>> alignment 128. >>>>>>>> >>>>>>>> --Alan >>>>>>>> >>>>>>>> Richard Biener wrote: >>>>>>>>> On Mon, 30 Mar 2015, Richard Biener wrote: >>>>>>>>> >>>>>>>>>> On Mon, 30 Mar 2015, Alan Lawrence wrote: >>>>>>>>>> >>>>>>>>>>> ...actually attach the testcase... >>>>>>>>>> What compile options? >>>>>>>>> >>>>>>>>> Just tried -O2. The GIMPLE IL assumes 64bit alignment of .LC0 but >>>>>>>>> I can't see anything not guaranteeing that: >>>>>>>>> >>>>>>>>> .section .rodata >>>>>>>>> .align 3 >>>>>>>>> .LANCHOR0 = . + 0 >>>>>>>>> .LC1: >>>>>>>>> .ascii "%d %g %d\012\000" >>>>>>>>> .space 6 >>>>>>>>> .LC0: >>>>>>>>> .word 7 >>>>>>>>> .space 4 >>>>>>>>> .word 0 >>>>>>>>> .word 1075838976 >>>>>>>>> .word 9 >>>>>>>>> .space 4 >>>>>>>>> >>>>>>>>> maybe there is some more generic code-gen bug for aligned aggregate >>>>>>>>> copy? That is, the patch tells the backend that the loads and >>>>>>>>> stores to the 'int' vars (which have padding followed) is aligned >>>>>>>>> to 8 bytes. >>>>>>>>> >>>>>>>>> I don't see what is wrong in the final assembler, but maybe >>>>>>>>> some endian issue exists? The code looks quite ugly though ;) >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>>> Alan Lawrence wrote: >>>>>>>>>>>> We've been seeing a bunch of new failures in the *libffi* >>>>>>>> testsuite on ARM >>>>>>>>>>>> Linux (arm-none-linux-gnueabi, arm-none-linux-gnueabihf), >>>>>>>> following this >>>>>>>>>>>> one-liner fix. I've reduced the testcase down to the attached >>>>>>>> (including >>>>>>>>>>>> removing any dependency on libffi); with gcc r221347, this prints >>>>>>>> the >>>>>>>>>>>> expected >>>>>>>>>>>> 7 8 9 >>>>>>>>>>>> whereas with gcc r221348, instead it prints >>>>>>>>>>>> 0 8 0 >>>>>>>>>>>> >>>>>>>>>>>> The action of r221348 is to change the alignment of a mem_ref, and >>>>>>>> a >>>>>>>>>>>> var_decl of b1, from 32 to 64; both have type >>>>>>>>>>>> type <record_type 0x2b9b8d428d20 cls_struct_16byte >>>>>>>> sizes-gimplified type_0 >>>>>>>>>>>> BLK >>>>>>>>>>>> size <integer_cst 0x2b9b8d3720a8 constant 192> >>>>>>>>>>>> unit size <integer_cst 0x2b9b8d372078 constant 24> >>>>>>>>>>>> align 64 symtab 0 alias set 1 canonical type >>>>>>>> 0x2b9b8d428d20 >>>>>>>>>>>> fields <field_decl 0x2b9b8d42b098 a type <integer_type >>>>>>>>>>>> 0x2b9b8d092690 int> >>>>>>>>>>>> SI file reduced.c line 12 col 7 >>>>>>>>>>>> size <integer_cst 0x2b9b8d08eeb8 constant 32> >>>>>>>>>>>> unit size <integer_cst 0x2b9b8d08eed0 constant 4> >>>>>>>>>>>> align 32 offset_align 64 >>>>>>>>>>>> offset <integer_cst 0x2b9b8d08eee8 constant 0> >>>>>>>>>>>> bit offset <integer_cst 0x2b9b8d08ef48 constant 0> >>>>>>>> context >>>>>>>>>>>> <record_type 0x2b9b8d428d20 cls_struct_16byte> chain <field_decl >>>>>>>>>>>> 0x2b9b8d42b130 b>> context <translation_unit_decl 0x2b9b8d4232d0 >>>>>>>> D.6070> >>>>>>>>>>>> pointer_to_this <pointer_type 0x2b9b8d42d0a8> chain >>>>>>>> <type_decl >>>>>>>>>>>> 0x2b9b8d42b000 D.6044>> >>>>>>>>>>>> >>>>>>>>>>>> The tree-optimized output is the same with both compilers (as this >>>>>>>> does not >>>>>>>>>>>> mention alignment); the expand output differs. >>>>>>>>>>>> >>>>>>>>>>>> Still investigating... >>>>>>>>>>>> >>>>>>>>>>>> --Alan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Richard Biener wrote: >>>>>>>>>>>>> This fixes a vectorizer testcase regression on powerpc where SRA >>>>>>>>>>>>> drops alignment info unnecessarily. >>>>>>>>>>>>> >>>>>>>>>>>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. >>>>>>>>>>>>> >>>>>>>>>>>>> Richard. >>>>>>>>>>>>> >>>>>>>>>>>>> 2015-03-11 Richard Biener <rguenther@suse.de> >>>>>>>>>>>>> >>>>>>>>>>>>> PR tree-optimization/65310 >>>>>>>>>>>>> * tree-sra.c (build_ref_for_offset): Also preserve larger >>>>>>>>>>>>> alignment. >>>>>>>>>>>>> >>>>>>>>>>>>> Index: gcc/tree-sra.c >>>>>>>>>>>>> >>>>>>>> =================================================================== >>>>>>>>>>>>> --- gcc/tree-sra.c (revision 221324) >>>>>>>>>>>>> +++ gcc/tree-sra.c (working copy) >>>>>>>>>>>>> @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr >>>>>>>>>>>>> misalign = (misalign + offset) & (align - 1); >>>>>>>>>>>>> if (misalign != 0) >>>>>>>>>>>>> align = (misalign & -misalign); >>>>>>>>>>>>> - if (align < TYPE_ALIGN (exp_type)) >>>>>>>>>>>>> + if (align != TYPE_ALIGN (exp_type)) >>>>>>>>>>>>> exp_type = build_aligned_type (exp_type, align); >>>>>>>>>>>>> mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, >>>>>>>> off); >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>>> >>> >> >> >
Index: gcc/tree-sra.c =================================================================== --- gcc/tree-sra.c (revision 221324) +++ gcc/tree-sra.c (working copy) @@ -1597,7 +1597,7 @@ build_ref_for_offset (location_t loc, tr misalign = (misalign + offset) & (align - 1); if (misalign != 0) align = (misalign & -misalign); - if (align < TYPE_ALIGN (exp_type)) + if (align != TYPE_ALIGN (exp_type)) exp_type = build_aligned_type (exp_type, align); mem_ref = fold_build2_loc (loc, MEM_REF, exp_type, base, off);