Message ID | 20130607192540.GH1493@tucnak.redhat.com |
---|---|
State | New |
Headers | show |
On 06/07/2013 12:25 PM, Jakub Jelinek wrote: > This PR is about DATA_ALIGNMENT macro increasing alignment of some decls > for optimization purposes beyond ABI mandated levels. It is fine to emit > the vars aligned as much as we want for optimization purposes, but if we > can't be sure that references to that decl bind to the definition we > increased the alignment on (e.g. common variables, or -fpic code without > hidden visibility, weak vars etc.), we can't assume that alignment. When the linker merges common blocks, it chooses both maximum size and maximum alignment. Thus for any common block for which we can prove the block must reside in the module (any executable, or hidden common in shared object), we can go ahead and use the increased alignment. It's only in shared objects with non-hidden common blocks that we have a problem, since in that case we may resolve the common block via copy reloc to a memory block in another module. So while decl_binds_to_current_def_p is a good starting point, I think we can do a little better with common blocks. Which ought to take care of those vectorization regressions you mention. > @@ -966,8 +966,12 @@ align_variable (tree decl, bool dont_out > align = MAX_OFILE_ALIGNMENT; > } > > - /* On some machines, it is good to increase alignment sometimes. */ > - if (! DECL_USER_ALIGN (decl)) > + /* On some machines, it is good to increase alignment sometimes. > + But as DECL_ALIGN is used both for actually emitting the variable > + and for code accessing the variable as guaranteed alignment, we > + can only increase the alignment if it is a performance optimization > + if the references to it must bind to the current definition. */ > + if (! DECL_USER_ALIGN (decl) && decl_binds_to_current_def_p (decl)) > { > #ifdef DATA_ALIGNMENT > unsigned int data_align = DATA_ALIGNMENT (TREE_TYPE (decl), align); > @@ -988,12 +992,69 @@ align_variable (tree decl, bool dont_out > } > #endif > } > +#ifdef DATA_ABI_ALIGNMENT > + else if (! DECL_USER_ALIGN (decl)) > + { > + unsigned int data_align = DATA_ABI_ALIGNMENT (TREE_TYPE (decl), align); > + /* For backwards compatibility, don't assume the ABI alignment for > + TLS variables. */ > + if (! DECL_THREAD_LOCAL_P (decl) || data_align <= BITS_PER_WORD) > + align = data_align; > + } > +#endif This structure would seem to do the wrong thing if DATA_ABI_ALIGNMENT is defined, but DATA_ALIGNMENT isn't. And while I realize you documented it, I don't like the restriction that D_A /must/ return something larger than D_A_A. All that means is that in complex cases D_A will have to call D_A_A itself. I would think that it would be better to rearrange as if (!D_U_A) { #ifdef D_A_A align = ... #endif #ifdef D_A if (d_b_t_c_d_p) align = ... #endif } Why the special case for TLS? If we really want that special case surely that test should go into D_A_A itself, and not here in generic code. > Bootstrapped/regtested on x86_64-linux and i686-linux. No idea about other > targets, I've kept them all using DATA_ALIGNMENT, which is considered > optimization increase only now, if there is some ABI mandated alignment > increase on other targets, that should be done in DATA_ABI_ALIGNMENT as > well as DATA_ALIGNMENT. I've had a brief look over the instances of D_A within the tree atm. Most of them carry the cut-n-paste comment "for the same reasons". These I believe never intended an ABI change, and were really only interested in optimization. But these I think require a good hard look to see if they really intended an ABI alignment: c6x comment explicitly mentions abi cris compiler options for alignment -- systemwide or local? mmix comment mentions GETA instruction s390 comment mentions LARL instruction rs6000 SPE and E500 portion of the alignment non-optional? Relevant port maintainers CCed. r~
On Fri, Jun 07, 2013 at 01:43:27PM -0700, Richard Henderson wrote: > On 06/07/2013 12:25 PM, Jakub Jelinek wrote: > > This PR is about DATA_ALIGNMENT macro increasing alignment of some decls > > for optimization purposes beyond ABI mandated levels. It is fine to emit > > the vars aligned as much as we want for optimization purposes, but if we > > can't be sure that references to that decl bind to the definition we > > increased the alignment on (e.g. common variables, or -fpic code without > > hidden visibility, weak vars etc.), we can't assume that alignment. > > When the linker merges common blocks, it chooses both maximum size and maximum > alignment. Thus for any common block for which we can prove the block must > reside in the module (any executable, or hidden common in shared object), we > can go ahead and use the increased alignment. But consider say: one TU: struct S { char buf[15]; } s __attribute__((aligned (32))); another TU: char c = 7; struct S { char buf[15]; } s = { { 1, 2 } }; char d = 8; int main () { return 0; } (the aligned(32) is there just to simulate the DATA_ALIGNMENT optimization increase). Linker warns about this (thus the question is if we want to increase the alignment for optimization on commons at all) and doesn't align it. > This structure would seem to do the wrong thing if DATA_ABI_ALIGNMENT is > defined, but DATA_ALIGNMENT isn't. And while I realize you documented it, I > don't like the restriction that D_A /must/ return something larger than D_A_A. > All that means is that in complex cases D_A will have to call D_A_A itself. Yeah, I guess I can rearrange it. The reason I wrote it that way was to avoid an extra function call, but that is probably not big enough overhead. > I would think that it would be better to rearrange as > > if (!D_U_A) > { > #ifdef D_A_A > align = ... > #endif > #ifdef D_A > if (d_b_t_c_d_p) > align = ... > #endif > } > > Why the special case for TLS? If we really want that special case surely that > test should go into D_A_A itself, and not here in generic code. I special case TLS for backwards ABI compatibility with GCC <= 4.8.1, because we ignored DATA_ALIGNMENT for TLS vars, even if what it returned wasn't just an optimization, but ABI requirement. So, if you have: __thread char a[16] = { 1, 2 }; in one shared library and __thread char a[16] = { 1, 2 }; use (a); in another shared library, compile one with GCC 4.8.1 e.g. and the other one with 4.9.0, if use (a) would assume the psABI mandated 16 byte alignment, but the actual definition was from 4.8.1 compiled object and was only 1 byte aligned, it could crash. Also, neither DATA_ALIGNMENT nor DATA_ABI_ALIGNMENT sees the decl itself, it just looks at type and previously assumed alignment. > > Bootstrapped/regtested on x86_64-linux and i686-linux. No idea about other > > targets, I've kept them all using DATA_ALIGNMENT, which is considered > > optimization increase only now, if there is some ABI mandated alignment > > increase on other targets, that should be done in DATA_ABI_ALIGNMENT as > > well as DATA_ALIGNMENT. > > I've had a brief look over the instances of D_A within the tree atm. Most of > them carry the cut-n-paste comment "for the same reasons". These I believe > never intended an ABI change, and were really only interested in optimization. Thanks for looking into this. > But these I think require a good hard look to see if they really intended an > ABI alignment: > > c6x comment explicitly mentions abi > cris compiler options for alignment -- systemwide or local? > mmix comment mentions GETA instruction > s390 comment mentions LARL instruction > rs6000 SPE and E500 portion of the alignment non-optional? > > Relevant port maintainers CCed. Jakub
On Fri, 7 Jun 2013, Richard Henderson wrote: > I've had a brief look over the instances of D_A within the tree atm. Most of > them carry the cut-n-paste comment "for the same reasons". These I believe > never intended an ABI change, and were really only interested in optimization. > > But these I think require a good hard look to see if they really intended an > ABI alignment: I'm not sure what is about to change how? > cris compiler options for alignment -- systemwide or local? No, DATA_ALIGNMENT in cris.h is not intended as an ABI indication, but as an optimization when emitting data. (This was the way to do it at the time. Has this changed?) The ABI is as indicated by BIGGEST_ALIGNMENT: 8 (bits; one byte). Nothing is guaranteed (to the data referer) to have a bigger alignment - unless otherwise indicated by attribute-align. (Unfortunately I can't change BIGGEST_ALIGNMENT to indicate that atomic variables require "natural alignment", or actually not to straddle a cache-boundary, as increasing BIGGEST_ALIGNMENT makes GCC change the ABI. But that's a slightly different issue.) > mmix comment mentions GETA instruction Yep, data must be at least 32-bit-aligned so addresses can be formed with a GETA insn. BIGGEST_ALIGNMENT is 64 though and STRICT_ALIGNMENT; natural alignment is required for proper interpretation as the low bits are ignored. brgds, H-P
On Fri, Jun 07, 2013 at 06:56:34PM -0400, Hans-Peter Nilsson wrote: > > cris compiler options for alignment -- systemwide or local? > > No, DATA_ALIGNMENT in cris.h is not intended as an ABI > indication, but as an optimization when emitting data. > (This was the way to do it at the time. Has this changed?) > > The ABI is as indicated by BIGGEST_ALIGNMENT: 8 (bits; one > byte). Nothing is guaranteed (to the data referer) to have a > bigger alignment - unless otherwise indicated by attribute-align. > > (Unfortunately I can't change BIGGEST_ALIGNMENT to indicate that > atomic variables require "natural alignment", or actually not to > straddle a cache-boundary, as increasing BIGGEST_ALIGNMENT makes > GCC change the ABI. But that's a slightly different issue.) Right now it is unfortunately part of ABI, which is something the patch attempts to cure in a backwards compatible way (i.e., data will be still alignment the way it used to be, but code will no longer assume it is that much aligned, unless it is DATA_ABI_ALIGNMENT). > > > mmix comment mentions GETA instruction > > Yep, data must be at least 32-bit-aligned so addresses can be > formed with a GETA insn. BIGGEST_ALIGNMENT is 64 though and > STRICT_ALIGNMENT; natural alignment is required for proper > interpretation as the low bits are ignored. Then mmix would probably want to define DATA_ABI_ALIGNMENT instead of DATA_ALIGNMENT after the patch (or both). Jakub
On 06/07/2013 10:43 PM, Richard Henderson wrote: > But these I think require a good hard look to see if they really intended an > ABI alignment: > > c6x comment explicitly mentions abi The ABI specifies a minimum alignment for arrays. Bernd
On Mon, Jun 10, 2013 at 12:51:05PM +0200, Bernd Schmidt wrote: > On 06/07/2013 10:43 PM, Richard Henderson wrote: > > But these I think require a good hard look to see if they really intended an > > ABI alignment: > > > > c6x comment explicitly mentions abi > > The ABI specifies a minimum alignment for arrays. Thus after the patch c6x.h (DATA_ALIGNMENT) should be renamed to DATA_ABI_ALIGNMENT, right? Jakub
On 06/10/2013 12:55 PM, Jakub Jelinek wrote: > On Mon, Jun 10, 2013 at 12:51:05PM +0200, Bernd Schmidt wrote: >> On 06/07/2013 10:43 PM, Richard Henderson wrote: >>> But these I think require a good hard look to see if they really intended an >>> ABI alignment: >>> >>> c6x comment explicitly mentions abi >> >> The ABI specifies a minimum alignment for arrays. > > Thus after the patch c6x.h (DATA_ALIGNMENT) should be renamed to > DATA_ABI_ALIGNMENT, right? I think so. Bernd
On 06/07/2013 02:14 PM, Jakub Jelinek wrote: >> > When the linker merges common blocks, it chooses both maximum size and maximum >> > alignment. Thus for any common block for which we can prove the block must >> > reside in the module (any executable, or hidden common in shared object), we >> > can go ahead and use the increased alignment. > But consider say: > one TU: > struct S { char buf[15]; } s __attribute__((aligned (32))); > > another TU: > char c = 7; > struct S { char buf[15]; } s = { { 1, 2 } }; > char d = 8; > int main () { return 0; } > > (the aligned(32) is there just to simulate the DATA_ALIGNMENT optimization > increase). Linker warns about this (thus the question is if we want to > increase the alignment for optimization on commons at all) and doesn't align > it. > Oh, right. I hadn't considered commons unifying with non-common variables, and the failure of commoning in that case. I'd mostly been thinking of uninitialized Fortran-like common blocks, where it is more common for the blocks to have nothing in common but the name. r~
The e500v2 (SPE) hardware is such that if the address of vector (double world load / stores) are not double world aligned the instruction will trap. So this alignment is not optional. Edmar On Fri, Jun 7, 2013 at 3:43 PM, Richard Henderson <rth@redhat.com> wrote: > On 06/07/2013 12:25 PM, Jakub Jelinek wrote: >> This PR is about DATA_ALIGNMENT macro increasing alignment of some decls >> for optimization purposes beyond ABI mandated levels. It is fine to emit >> the vars aligned as much as we want for optimization purposes, but if we >> can't be sure that references to that decl bind to the definition we >> increased the alignment on (e.g. common variables, or -fpic code without >> hidden visibility, weak vars etc.), we can't assume that alignment. > > When the linker merges common blocks, it chooses both maximum size and maximum > alignment. Thus for any common block for which we can prove the block must > reside in the module (any executable, or hidden common in shared object), we > can go ahead and use the increased alignment. > > It's only in shared objects with non-hidden common blocks that we have a > problem, since in that case we may resolve the common block via copy reloc to a > memory block in another module. > > So while decl_binds_to_current_def_p is a good starting point, I think we can > do a little better with common blocks. Which ought to take care of those > vectorization regressions you mention. > >> @@ -966,8 +966,12 @@ align_variable (tree decl, bool dont_out >> align = MAX_OFILE_ALIGNMENT; >> } >> >> - /* On some machines, it is good to increase alignment sometimes. */ >> - if (! DECL_USER_ALIGN (decl)) >> + /* On some machines, it is good to increase alignment sometimes. >> + But as DECL_ALIGN is used both for actually emitting the variable >> + and for code accessing the variable as guaranteed alignment, we >> + can only increase the alignment if it is a performance optimization >> + if the references to it must bind to the current definition. */ >> + if (! DECL_USER_ALIGN (decl) && decl_binds_to_current_def_p (decl)) >> { >> #ifdef DATA_ALIGNMENT >> unsigned int data_align = DATA_ALIGNMENT (TREE_TYPE (decl), align); >> @@ -988,12 +992,69 @@ align_variable (tree decl, bool dont_out >> } >> #endif >> } >> +#ifdef DATA_ABI_ALIGNMENT >> + else if (! DECL_USER_ALIGN (decl)) >> + { >> + unsigned int data_align = DATA_ABI_ALIGNMENT (TREE_TYPE (decl), align); >> + /* For backwards compatibility, don't assume the ABI alignment for >> + TLS variables. */ >> + if (! DECL_THREAD_LOCAL_P (decl) || data_align <= BITS_PER_WORD) >> + align = data_align; >> + } >> +#endif > > This structure would seem to do the wrong thing if DATA_ABI_ALIGNMENT is > defined, but DATA_ALIGNMENT isn't. And while I realize you documented it, I > don't like the restriction that D_A /must/ return something larger than D_A_A. > All that means is that in complex cases D_A will have to call D_A_A itself. > > I would think that it would be better to rearrange as > > if (!D_U_A) > { > #ifdef D_A_A > align = ... > #endif > #ifdef D_A > if (d_b_t_c_d_p) > align = ... > #endif > } > > Why the special case for TLS? If we really want that special case surely that > test should go into D_A_A itself, and not here in generic code. > >> Bootstrapped/regtested on x86_64-linux and i686-linux. No idea about other >> targets, I've kept them all using DATA_ALIGNMENT, which is considered >> optimization increase only now, if there is some ABI mandated alignment >> increase on other targets, that should be done in DATA_ABI_ALIGNMENT as >> well as DATA_ALIGNMENT. > > I've had a brief look over the instances of D_A within the tree atm. Most of > them carry the cut-n-paste comment "for the same reasons". These I believe > never intended an ABI change, and were really only interested in optimization. > > But these I think require a good hard look to see if they really intended an > ABI alignment: > > c6x comment explicitly mentions abi > cris compiler options for alignment -- systemwide or local? > mmix comment mentions GETA instruction > s390 comment mentions LARL instruction > rs6000 SPE and E500 portion of the alignment non-optional? > > Relevant port maintainers CCed. > > > r~
The change also affects vectorizer in avx case which could be seen for gcc.dg/tree-ssa/loop-19.c test. After the change report says loop-19_bad.c:16: note: === vect_analyze_data_refs_alignment === loop-19_bad.c:16: note: vect_compute_data_ref_alignment: loop-19_bad.c:16: note: can't force alignment of ref: a[j_9] loop-19_bad.c:16: note: vect_compute_data_ref_alignment: loop-19_bad.c:16: note: can't force alignment of ref: c[j_9] AFAICS first condition in ix86_data_alignment was true before the change so 256 was a return value. Do we need to tweak this test also? Thanks, Igor > Hi! > > This PR is about DATA_ALIGNMENT macro increasing alignment of some decls for optimization purposes beyond ABI mandated levels. It is fine to emit the vars aligned as much as we want for optimization purposes, but if we can't be sure that references to that decl bind to the definition we increased the alignment on (e.g. common variables, or -fpic code without hidden visibility, weak vars etc.), we can't assume that alignment. > As DECL_ALIGN is used for both the alignment emitted for the definitions and alignment assumed on code referring to it, this patch increases DECL_ALIGN only on decls where decl_binds_to_current_def_p is true, and otherwise the optimization part on top of that emits only when aligning definition. > On x86_64, DATA_ALIGNMENT macro was partly an optimization, partly ABI mandated alignment increase, so I've introduced a new macro, DATA_ABI_ALIGNMENT, which is the ABI mandated increase only (on x86-64 I think the only one is that arrays with size 16 bytes or more (and VLAs, but that is not handled by DATA*ALIGNMENT) are at least 16 byte aligned). > > Bootstrapped/regtested on x86_64-linux and i686-linux. No idea about other targets, I've kept them all using DATA_ALIGNMENT, which is considered optimization increase only now, if there is some ABI mandated alignment increase on other targets, that should be done in DATA_ABI_ALIGNMENT as well as DATA_ALIGNMENT. The patch causes some vectorization regressions (tweaked in the testsuite), especially for common vars where we used to align say common arrays to 256 bits rather than the ABI mandated 128 bits, or for -fpic code, but I'm afraid we need to live with that, if you compile another file with say icc or some other compiler which doesn't increase alignment beyond ABI mandated level and that other file defines the var say as non-common, we have wrong-code. > > 2013-06-07 Jakub Jelinek <jakub@redhat.com> > > PR target/56564 > * varasm.c (align_variable): Don't use DATA_ALIGNMENT or > CONSTANT_ALIGNMENT if !decl_binds_to_current_def_p (decl). > Use DATA_ABI_ALIGNMENT for that case instead if defined. > (get_variable_align): New function. > (get_variable_section, emit_bss, emit_common, > assemble_variable_contents, place_block_symbol): Use > get_variable_align instead of DECL_ALIGN. > (assemble_noswitch_variable): Add align argument, use it > instead of DECL_ALIGN. > (assemble_variable): Adjust caller. Use get_variable_align > instead of DECL_ALIGN. > * config/i386/i386.h (DATA_ALIGNMENT): Adjust x86_data_alignment > caller. > (DATA_ABI_ALIGNMENT): Define. > * config/i386/i386-protos.h (x86_data_alignment): Adjust prototype. > * config/i386/i386.c (x86_data_alignment): Add opt argument. If > opt is false, only return the psABI mandated alignment increase. > * doc/tm.texi.in (DATA_ABI_ALIGNMENT): Document. > * doc/tm.texi: Regenerated. > > * gcc.target/i386/pr56564-1.c: New test. > * gcc.target/i386/pr56564-2.c: New test. > * gcc.target/i386/pr56564-3.c: New test. > * gcc.target/i386/pr56564-4.c: New test. > * gcc.target/i386/avx256-unaligned-load-4.c: Add -fno-common. > * gcc.target/i386/avx256-unaligned-store-1.c: Likewise. > * gcc.target/i386/avx256-unaligned-store-3.c: Likewise. > * gcc.target/i386/avx256-unaligned-store-4.c: Likewise. > * gcc.target/i386/vect-sizes-1.c: Likewise. > * gcc.target/i386/memcpy-1.c: Likewise. > * gcc.dg/vect/costmodel/i386/costmodel-vect-31.c (tmp): Initialize. > * gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c (tmp): Likewise. >
On Wed, Jun 19, 2013 at 11:01:59AM +0400, Igor Zamyatin wrote: > The change also affects vectorizer in avx case which could be seen for > gcc.dg/tree-ssa/loop-19.c test. > > After the change report says > > loop-19_bad.c:16: note: === vect_analyze_data_refs_alignment === > loop-19_bad.c:16: note: vect_compute_data_ref_alignment: > loop-19_bad.c:16: note: can't force alignment of ref: a[j_9] > loop-19_bad.c:16: note: vect_compute_data_ref_alignment: > loop-19_bad.c:16: note: can't force alignment of ref: c[j_9] > > AFAICS first condition in ix86_data_alignment was true before the > change so 256 was a return value. > > Do we need to tweak this test also? I'd add -fno-common to the test. Jakub
--- gcc/varasm.c.jj 2013-06-07 13:17:17.000000000 +0200 +++ gcc/varasm.c 2013-06-07 15:38:36.710908852 +0200 @@ -966,8 +966,12 @@ align_variable (tree decl, bool dont_out align = MAX_OFILE_ALIGNMENT; } - /* On some machines, it is good to increase alignment sometimes. */ - if (! DECL_USER_ALIGN (decl)) + /* On some machines, it is good to increase alignment sometimes. + But as DECL_ALIGN is used both for actually emitting the variable + and for code accessing the variable as guaranteed alignment, we + can only increase the alignment if it is a performance optimization + if the references to it must bind to the current definition. */ + if (! DECL_USER_ALIGN (decl) && decl_binds_to_current_def_p (decl)) { #ifdef DATA_ALIGNMENT unsigned int data_align = DATA_ALIGNMENT (TREE_TYPE (decl), align); @@ -988,12 +992,69 @@ align_variable (tree decl, bool dont_out } #endif } +#ifdef DATA_ABI_ALIGNMENT + else if (! DECL_USER_ALIGN (decl)) + { + unsigned int data_align = DATA_ABI_ALIGNMENT (TREE_TYPE (decl), align); + /* For backwards compatibility, don't assume the ABI alignment for + TLS variables. */ + if (! DECL_THREAD_LOCAL_P (decl) || data_align <= BITS_PER_WORD) + align = data_align; + } +#endif /* Reset the alignment in case we have made it tighter, so we can benefit from it in get_pointer_alignment. */ DECL_ALIGN (decl) = align; } +/* Return DECL_ALIGN (decl), possibly increased for optimization purposes + beyond what align_variable returned. */ + +static unsigned int +get_variable_align (tree decl) +{ + unsigned int align = DECL_ALIGN (decl); + + /* For user aligned vars or static vars align_variable already did + everything. */ + if (DECL_USER_ALIGN (decl) || !TREE_PUBLIC (decl)) + return align; + + /* For decls that bind to the current definition, align_variable + did also everything, except for not assuming ABI required alignment + of TLS variables. For other vars, increase the alignment here + as an optimization. */ + if (!decl_binds_to_current_def_p (decl)) + { + /* On some machines, it is good to increase alignment sometimes. */ +#ifdef DATA_ALIGNMENT + unsigned int data_align = DATA_ALIGNMENT (TREE_TYPE (decl), align); + /* Don't increase alignment too much for TLS variables - TLS space + is too precious. */ + if (! DECL_THREAD_LOCAL_P (decl) || data_align <= BITS_PER_WORD) + align = data_align; +#endif +#ifdef CONSTANT_ALIGNMENT + if (DECL_INITIAL (decl) != 0 && DECL_INITIAL (decl) != error_mark_node) + { + unsigned int const_align = CONSTANT_ALIGNMENT (DECL_INITIAL (decl), + align); + /* Don't increase alignment too much for TLS variables - TLS space + is too precious. */ + if (! DECL_THREAD_LOCAL_P (decl) || const_align <= BITS_PER_WORD) + align = const_align; + } + } +#endif + +#ifdef DATA_ABI_ALIGNMENT + if (DECL_THREAD_LOCAL_P (decl)) + return DATA_ABI_ALIGNMENT (TREE_TYPE (decl), align); +#endif + return align; +} + /* Return the section into which the given VAR_DECL or CONST_DECL should be placed. PREFER_NOSWITCH_P is true if a noswitch section should be used wherever possible. */ @@ -1043,7 +1104,8 @@ get_variable_section (tree decl, bool pr return bss_noswitch_section; } - return targetm.asm_out.select_section (decl, reloc, DECL_ALIGN (decl)); + return targetm.asm_out.select_section (decl, reloc, + get_variable_align (decl)); } /* Return the block into which object_block DECL should be placed. */ @@ -1780,7 +1842,8 @@ emit_bss (tree decl ATTRIBUTE_UNUSED, unsigned HOST_WIDE_INT rounded ATTRIBUTE_UNUSED) { #if defined ASM_OUTPUT_ALIGNED_BSS - ASM_OUTPUT_ALIGNED_BSS (asm_out_file, decl, name, size, DECL_ALIGN (decl)); + ASM_OUTPUT_ALIGNED_BSS (asm_out_file, decl, name, size, + get_variable_align (decl)); return true; #endif } @@ -1796,10 +1859,11 @@ emit_common (tree decl ATTRIBUTE_UNUSED, { #if defined ASM_OUTPUT_ALIGNED_DECL_COMMON ASM_OUTPUT_ALIGNED_DECL_COMMON (asm_out_file, decl, name, - size, DECL_ALIGN (decl)); + size, get_variable_align (decl)); return true; #elif defined ASM_OUTPUT_ALIGNED_COMMON - ASM_OUTPUT_ALIGNED_COMMON (asm_out_file, name, size, DECL_ALIGN (decl)); + ASM_OUTPUT_ALIGNED_COMMON (asm_out_file, name, size, + get_variable_align (decl)); return true; #else ASM_OUTPUT_COMMON (asm_out_file, name, size, rounded); @@ -1828,7 +1892,8 @@ emit_tls_common (tree decl ATTRIBUTE_UNU NAME is the name of DECL's SYMBOL_REF. */ static void -assemble_noswitch_variable (tree decl, const char *name, section *sect) +assemble_noswitch_variable (tree decl, const char *name, section *sect, + unsigned int align) { unsigned HOST_WIDE_INT size, rounded; @@ -1850,7 +1915,7 @@ assemble_noswitch_variable (tree decl, c * (BIGGEST_ALIGNMENT / BITS_PER_UNIT)); if (!sect->noswitch.callback (decl, name, size, rounded) - && (unsigned HOST_WIDE_INT) DECL_ALIGN_UNIT (decl) > rounded) + && (unsigned HOST_WIDE_INT) (align / BITS_PER_UNIT) > rounded) warning (0, "requested alignment for %q+D is greater than " "implemented alignment of %wu", decl, rounded); } @@ -1880,7 +1945,7 @@ assemble_variable_contents (tree decl, c /* Output the actual data. */ output_constant (DECL_INITIAL (decl), tree_low_cst (DECL_SIZE_UNIT (decl), 1), - DECL_ALIGN (decl)); + get_variable_align (decl)); else /* Leave space for it. */ assemble_zeros (tree_low_cst (DECL_SIZE_UNIT (decl), 1)); @@ -1904,6 +1969,7 @@ assemble_variable (tree decl, int top_le const char *name; rtx decl_rtl, symbol; section *sect; + unsigned int align; bool asan_protected = false; /* This function is supposed to handle VARIABLES. Ensure we have one. */ @@ -2003,6 +2069,8 @@ assemble_variable (tree decl, int top_le set_mem_align (decl_rtl, DECL_ALIGN (decl)); + align = get_variable_align (decl); + if (TREE_PUBLIC (decl)) maybe_assemble_visibility (decl); @@ -2032,12 +2100,12 @@ assemble_variable (tree decl, int top_le place_block_symbol (symbol); } else if (SECTION_STYLE (sect) == SECTION_NOSWITCH) - assemble_noswitch_variable (decl, name, sect); + assemble_noswitch_variable (decl, name, sect, align); else { switch_to_section (sect); - if (DECL_ALIGN (decl) > BITS_PER_UNIT) - ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (DECL_ALIGN_UNIT (decl))); + if (align > BITS_PER_UNIT) + ASM_OUTPUT_ALIGN (asm_out_file, floor_log2 (align / BITS_PER_UNIT)); assemble_variable_contents (decl, name, dont_output_data); if (asan_protected) { @@ -6967,7 +7035,7 @@ place_block_symbol (rtx symbol) else { decl = SYMBOL_REF_DECL (symbol); - alignment = DECL_ALIGN (decl); + alignment = get_variable_align (decl); size = tree_low_cst (DECL_SIZE_UNIT (decl), 1); if (flag_asan && asan_protect_global (decl)) { --- gcc/config/i386/i386.h.jj 2013-06-03 19:15:34.000000000 +0200 +++ gcc/config/i386/i386.h 2013-06-07 14:48:36.430589051 +0200 @@ -859,7 +859,19 @@ enum target_cpu_default cause character arrays to be word-aligned so that `strcpy' calls that copy constants to character arrays can be done inline. */ -#define DATA_ALIGNMENT(TYPE, ALIGN) ix86_data_alignment ((TYPE), (ALIGN)) +#define DATA_ALIGNMENT(TYPE, ALIGN) \ + ix86_data_alignment ((TYPE), (ALIGN), true) + +/* Similar to DATA_ALIGNMENT, but for the cases where the ABI mandates + some alignment increase, instead of optimization only purposes. E.g. + AMD x86-64 psABI says that variables with array type larger than 15 bytes + must be aligned to 16 byte boundaries. DATA_ALIGNMENT should always + return the same or larger value than DATA_ABI_ALIGNMENT. + + If this macro is not defined, then ALIGN is used. */ + +#define DATA_ABI_ALIGNMENT(TYPE, ALIGN) \ + ix86_data_alignment ((TYPE), (ALIGN), false) /* If defined, a C expression to compute the alignment for a local variable. TYPE is the data type, and ALIGN is the alignment that --- gcc/config/i386/i386-protos.h.jj 2013-05-14 21:30:19.000000000 +0200 +++ gcc/config/i386/i386-protos.h 2013-06-07 13:31:21.937823575 +0200 @@ -207,7 +207,7 @@ extern void init_cumulative_args (CUMULA #endif /* RTX_CODE */ #ifdef TREE_CODE -extern int ix86_data_alignment (tree, int); +extern int ix86_data_alignment (tree, int, bool); extern unsigned int ix86_local_alignment (tree, enum machine_mode, unsigned int); extern unsigned int ix86_minimum_alignment (tree, enum machine_mode, --- gcc/config/i386/i386.c.jj 2013-06-07 13:17:17.000000000 +0200 +++ gcc/config/i386/i386.c 2013-06-07 13:37:24.845416361 +0200 @@ -25375,11 +25375,12 @@ ix86_constant_alignment (tree exp, int a instead of that alignment to align the object. */ int -ix86_data_alignment (tree type, int align) +ix86_data_alignment (tree type, int align, bool opt) { int max_align = optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT); - if (AGGREGATE_TYPE_P (type) + if (opt + && AGGREGATE_TYPE_P (type) && TYPE_SIZE (type) && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= (unsigned) max_align @@ -25391,14 +25392,17 @@ ix86_data_alignment (tree type, int alig to 16byte boundary. */ if (TARGET_64BIT) { - if (AGGREGATE_TYPE_P (type) - && TYPE_SIZE (type) - && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST - && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 128 - || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 128) + if ((opt ? AGGREGATE_TYPE_P (type) : TREE_CODE (type) == ARRAY_TYPE) + && TYPE_SIZE (type) + && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST + && (TREE_INT_CST_LOW (TYPE_SIZE (type)) >= 128 + || TREE_INT_CST_HIGH (TYPE_SIZE (type))) && align < 128) return 128; } + if (!opt) + return align; + if (TREE_CODE (type) == ARRAY_TYPE) { if (TYPE_MODE (TREE_TYPE (type)) == DFmode && align < 64) --- gcc/doc/tm.texi.in.jj 2013-06-01 14:47:17.000000000 +0200 +++ gcc/doc/tm.texi.in 2013-06-07 14:47:10.192003375 +0200 @@ -1062,6 +1062,16 @@ arrays to be word-aligned so that @code{ constants to character arrays can be done inline. @end defmac +@defmac DATA_ABI_ALIGNMENT (@var{type}, @var{basic-align}) +Similar to @code{DATA_ALIGNMENT}, but for the cases where the ABI mandates +some alignment increase, instead of optimization only purposes. E.g.@ +AMD x86-64 psABI says that variables with array type larger than 15 bytes +must be aligned to 16 byte boundaries. @code{DATA_ALIGNMENT} should always +return the same or larger value than @code{DATA_ABI_ALIGNMENT}. + +If this macro is not defined, then @var{basic-align} is used. +@end defmac + @defmac CONSTANT_ALIGNMENT (@var{constant}, @var{basic-align}) If defined, a C expression to compute the alignment given to a constant that is being placed in memory. @var{constant} is the constant and --- gcc/doc/tm.texi.jj 2013-06-01 14:47:17.241680273 +0200 +++ gcc/doc/tm.texi 2013-06-07 14:47:29.400694547 +0200 @@ -1078,6 +1078,16 @@ arrays to be word-aligned so that @code{ constants to character arrays can be done inline. @end defmac +@defmac DATA_ABI_ALIGNMENT (@var{type}, @var{basic-align}) +Similar to @code{DATA_ALIGNMENT}, but for the cases where the ABI mandates +some alignment increase, instead of optimization only purposes. E.g.@ +AMD x86-64 psABI says that variables with array type larger than 15 bytes +must be aligned to 16 byte boundaries. @code{DATA_ALIGNMENT} should always +return the same or larger value than @code{DATA_ABI_ALIGNMENT}. + +If this macro is not defined, then @var{basic-align} is used. +@end defmac + @defmac CONSTANT_ALIGNMENT (@var{constant}, @var{basic-align}) If defined, a C expression to compute the alignment given to a constant that is being placed in memory. @var{constant} is the constant and --- gcc/testsuite/gcc.target/i386/pr56564-1.c.jj 2013-06-07 15:17:15.879403383 +0200 +++ gcc/testsuite/gcc.target/i386/pr56564-1.c 2013-06-07 15:44:18.386232149 +0200 @@ -0,0 +1,25 @@ +/* PR target/56564 */ +/* { dg-do compile { target { fpic && lp64 } } } */ +/* { dg-options "-O3 -fpic -fdump-tree-optimized" } */ + +struct S { long a, b; } s = { 5, 6 }; +char t[16] = { 7 }; + +int +foo (void) +{ + return ((__UINTPTR_TYPE__) &s) & 15; +} + +int +bar (void) +{ + return ((__UINTPTR_TYPE__) &t[0]) & 15; +} + +/* { dg-final { scan-tree-dump-times "&s" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "&t" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "return 0" 1 "optimized" } } */ +/* { dg-final { scan-assembler ".align\[ \t]*16\[^:]*\[\n\r]s:" { target { *-*-linux* } } } } */ +/* { dg-final { scan-assembler ".align\[ \t]*16\[^:]*\[\n\r]t:" { target { *-*-linux* } } } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ --- gcc/testsuite/gcc.target/i386/pr56564-2.c.jj 2013-06-07 15:19:28.900986237 +0200 +++ gcc/testsuite/gcc.target/i386/pr56564-2.c 2013-06-07 15:44:25.120129885 +0200 @@ -0,0 +1,25 @@ +/* PR target/56564 */ +/* { dg-do compile { target { *-*-linux* && lp64 } } } */ +/* { dg-options "-O3 -fno-pic -fdump-tree-optimized" } */ + +struct S { long a, b; } s = { 5, 6 }; +char t[16] = { 7 }; + +int +foo (void) +{ + return ((__UINTPTR_TYPE__) &s) & 15; +} + +int +bar (void) +{ + return ((__UINTPTR_TYPE__) &t[0]) & 15; +} + +/* { dg-final { scan-tree-dump-times "&s" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "&t" 0 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "return 0" 2 "optimized" } } */ +/* { dg-final { scan-assembler ".align\[ \t]*16\[^:]*\[\n\r]s:" { target { *-*-linux* } } } } */ +/* { dg-final { scan-assembler ".align\[ \t]*16\[^:]*\[\n\r]t:" { target { *-*-linux* } } } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ --- gcc/testsuite/gcc.target/i386/pr56564-3.c.jj 2013-06-07 15:22:26.470079983 +0200 +++ gcc/testsuite/gcc.target/i386/pr56564-3.c 2013-06-07 15:44:34.872968234 +0200 @@ -0,0 +1,28 @@ +/* PR target/56564 */ +/* { dg-do compile { target { fpic && lp64 } } } */ +/* { dg-options "-O3 -fpic -fdump-tree-optimized" } */ + +__thread struct S { long a, b; } s = { 5, 6 }; +__thread char t[16] = { 7 }; + +int +foo (void) +{ + return ((__UINTPTR_TYPE__) &s) & 15; +} + +/* For backwards compatibility we don't assume that t must + be aligned to 16 bytes, but align it anyway. */ + +int +bar (void) +{ + return ((__UINTPTR_TYPE__) &t[0]) & 15; +} + +/* { dg-final { scan-tree-dump-times "&s" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "&t" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "return 0" 0 "optimized" } } */ +/* { dg-final { scan-assembler-not ".align\[ \t]*16\[^:]*\[\n\r]s:" { target { *-*-linux* } } } } */ +/* { dg-final { scan-assembler ".align\[ \t]*16\[^:]*\[\n\r]t:" { target { *-*-linux* } } } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ --- gcc/testsuite/gcc.target/i386/pr56564-4.c.jj 2013-06-07 15:25:12.638326084 +0200 +++ gcc/testsuite/gcc.target/i386/pr56564-4.c 2013-06-07 15:45:59.489567940 +0200 @@ -0,0 +1,22 @@ +/* PR target/56564 */ +/* { dg-do compile { target { *-*-linux* && lp64 } } } */ +/* { dg-options "-O3 -fno-pic -fdump-tree-optimized" } */ + +__thread struct S { long a, b; } s = { 5, 6 }; +__thread char t[16] = { 7 }; + +int +foo (void) +{ + return ((__UINTPTR_TYPE__) &s) & 15; +} + +int +bar (void) +{ + return ((__UINTPTR_TYPE__) &t[0]) & 15; +} + +/* { dg-final { scan-assembler-not ".align\[ \t]*16\[^:]*\[\n\r]s:" { target { *-*-linux* } } } } */ +/* { dg-final { scan-assembler ".align\[ \t]*16\[^:]*\[\n\r]t:" { target { *-*-linux* } } } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ --- gcc/testsuite/gcc.target/i386/avx256-unaligned-load-4.c.jj 2012-10-16 13:15:44.000000000 +0200 +++ gcc/testsuite/gcc.target/i386/avx256-unaligned-load-4.c 2013-06-07 21:07:22.341380267 +0200 @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -dp -mavx -mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store" } */ +/* { dg-options "-O3 -dp -mavx -mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store -fno-common" } */ #define N 1024 --- gcc/testsuite/gcc.target/i386/avx256-unaligned-store-1.c.jj 2012-10-16 13:15:44.000000000 +0200 +++ gcc/testsuite/gcc.target/i386/avx256-unaligned-store-1.c 2013-06-07 21:06:06.911946765 +0200 @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store" } */ +/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store -fno-common" } */ #define N 1024 --- gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c.jj 2012-10-16 13:15:44.000000000 +0200 +++ gcc/testsuite/gcc.target/i386/avx256-unaligned-store-3.c 2013-06-07 21:06:41.421248309 +0200 @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store -mtune=generic" } */ +/* { dg-options "-O3 -dp -mavx -mavx256-split-unaligned-store -mtune=generic -fno-common" } */ #define N 1024 --- gcc/testsuite/gcc.target/i386/avx256-unaligned-store-4.c.jj 2012-10-16 13:15:44.000000000 +0200 +++ gcc/testsuite/gcc.target/i386/avx256-unaligned-store-4.c 2013-06-07 21:06:53.516302188 +0200 @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -dp -mavx -mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store" } */ +/* { dg-options "-O3 -dp -mavx -mno-avx256-split-unaligned-load -mno-avx256-split-unaligned-store -fno-common" } */ #define N 1024 --- gcc/testsuite/gcc.target/i386/vect-sizes-1.c.jj 2010-11-01 09:06:30.000000000 +0100 +++ gcc/testsuite/gcc.target/i386/vect-sizes-1.c 2013-06-07 21:08:07.851581595 +0200 @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O3 -ffast-math -mavx -mtune=generic" } */ +/* { dg-options "-O3 -ffast-math -mavx -mtune=generic -fno-common" } */ double a[1024]; --- gcc/testsuite/gcc.target/i386/memcpy-1.c.jj 2011-07-11 10:39:29.000000000 +0200 +++ gcc/testsuite/gcc.target/i386/memcpy-1.c 2013-06-07 21:08:46.263945653 +0200 @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target ia32 } */ -/* { dg-options "-O2 -march=pentiumpro -minline-all-stringops" } */ +/* { dg-options "-O2 -march=pentiumpro -minline-all-stringops -fno-common" } */ /* { dg-final { scan-assembler "rep" } } */ /* { dg-final { scan-assembler "movs" } } */ /* { dg-final { scan-assembler-not "test" } } */ --- gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c.jj 2009-11-04 08:15:26.000000000 +0100 +++ gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c 2013-06-07 20:58:02.091268267 +0200 @@ -18,7 +18,7 @@ struct s{ struct t e; /* unaligned (offset 2N+4N+4 B) */ }; -struct s tmp; +struct s tmp = { 1 }; int main1 () { --- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c.jj 2009-11-04 08:15:26.000000000 +0100 +++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c 2013-06-07 20:58:38.201668248 +0200 @@ -18,7 +18,7 @@ struct s{ struct t e; /* unaligned (offset 2N+4N+4 B) */ }; -struct s tmp; +struct s tmp = { 1 }; int main1 () {