diff mbox

[rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC

Message ID 1368758423.3482.88.camel@gnopaine
State New
Headers show

Commit Message

Bill Schmidt May 17, 2013, 2:40 a.m. UTC
This removes two degradations in CPU2006 for 32-bit PowerPC due to lost
vectorization opportunities.  Previously, GCC treated malloc'd arrays as
only guaranteeing 4-byte alignment, even though the glibc implementation
guarantees 8-byte alignment.  This raises the guarantee to 8 bytes,
which is sufficient to permit the missed vectorization opportunities.

The guarantee for 64-bit PowerPC should be raised to 16-byte alignment,
but doing so currently exposes a latent bug that degrades a 64-bit
benchmark.  I have therefore not included that change at this time, but
added a FIXME recording the information.

Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
regressions.  Verified that SPEC CPU2006 degradations are fixed with no
new degradations.  Ok for trunk?  Also, do you want any backports?

Thanks,
Bill


2013-05-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	* config/rs6000/rs6000.h (MALLOC_ABI_ALIGNMENT): New #define.

Comments

David Edelsohn May 17, 2013, 5:11 a.m. UTC | #1
On Thu, May 16, 2013 at 10:40 PM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> This removes two degradations in CPU2006 for 32-bit PowerPC due to lost
> vectorization opportunities.  Previously, GCC treated malloc'd arrays as
> only guaranteeing 4-byte alignment, even though the glibc implementation
> guarantees 8-byte alignment.  This raises the guarantee to 8 bytes,
> which is sufficient to permit the missed vectorization opportunities.
>
> The guarantee for 64-bit PowerPC should be raised to 16-byte alignment,
> but doing so currently exposes a latent bug that degrades a 64-bit
> benchmark.  I have therefore not included that change at this time, but
> added a FIXME recording the information.
>
> Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
> regressions.  Verified that SPEC CPU2006 degradations are fixed with no
> new degradations.  Ok for trunk?  Also, do you want any backports?

Okay.  If you think that this is appropriate for 4.8 branch, that is
okay as well.

Is there a FSF GCC Bugzilla open for the 437.leslie3d problem?  The
FIXME doesn't need to contain the complete explanation, but it would
be nice to reference a longer explanation and not pretend to be
Fermat's theorem that is too large to write in the marge.

Thanks, David
Richard Biener May 17, 2013, 8:32 a.m. UTC | #2
On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> This removes two degradations in CPU2006 for 32-bit PowerPC due to lost
> vectorization opportunities.  Previously, GCC treated malloc'd arrays as
> only guaranteeing 4-byte alignment, even though the glibc implementation
> guarantees 8-byte alignment.  This raises the guarantee to 8 bytes,
> which is sufficient to permit the missed vectorization opportunities.
>
> The guarantee for 64-bit PowerPC should be raised to 16-byte alignment,
> but doing so currently exposes a latent bug that degrades a 64-bit
> benchmark.  I have therefore not included that change at this time, but
> added a FIXME recording the information.
>
> Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
> regressions.  Verified that SPEC CPU2006 degradations are fixed with no
> new degradations.  Ok for trunk?  Also, do you want any backports?

You say this is because glibc guarantees such alignment - shouldn't this
be guarded by a proper check then?  Probably AIX guarantees similar
alignment but there are also embedded elf/newlib targets, no?

Btw, the #define should possibly move to config/linux.h guarded by
OPTION_GLIBC?

Richard.

> Thanks,
> Bill
>
>
> 2013-05-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
>         * config/rs6000/rs6000.h (MALLOC_ABI_ALIGNMENT): New #define.
>
>
> Index: gcc/config/rs6000/rs6000.h
> ===================================================================
> --- gcc/config/rs6000/rs6000.h  (revision 198998)
> +++ gcc/config/rs6000/rs6000.h  (working copy)
> @@ -2297,6 +2297,13 @@ extern char rs6000_reg_names[][8];       /* register nam
>  /* How to align the given loop. */
>  #define LOOP_ALIGN(LABEL)  rs6000_loop_align(LABEL)
>
> +/* Alignment guaranteed by __builtin_malloc.  */
> +/* FIXME:  128-bit alignment is guaranteed by glibc for TARGET_64BIT.
> +   However, specifying the stronger guarantee currently leads to
> +   a regression in SPEC CPU2006 437.leslie3d.  The stronger
> +   guarantee should be implemented here once that's fixed.  */
> +#define MALLOC_ABI_ALIGNMENT (64)
> +
>  /* Pick up the return address upon entry to a procedure. Used for
>     dwarf2 unwind information.  This also enables the table driven
>     mechanism.  */
>
>
Jakub Jelinek May 17, 2013, 8:47 a.m. UTC | #3
On Fri, May 17, 2013 at 10:32:11AM +0200, Richard Biener wrote:
> On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > This removes two degradations in CPU2006 for 32-bit PowerPC due to lost
> > vectorization opportunities.  Previously, GCC treated malloc'd arrays as
> > only guaranteeing 4-byte alignment, even though the glibc implementation
> > guarantees 8-byte alignment.  This raises the guarantee to 8 bytes,
> > which is sufficient to permit the missed vectorization opportunities.
> >
> > The guarantee for 64-bit PowerPC should be raised to 16-byte alignment,
> > but doing so currently exposes a latent bug that degrades a 64-bit
> > benchmark.  I have therefore not included that change at this time, but
> > added a FIXME recording the information.
> >
> > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
> > regressions.  Verified that SPEC CPU2006 degradations are fixed with no
> > new degradations.  Ok for trunk?  Also, do you want any backports?
> 
> You say this is because glibc guarantees such alignment - shouldn't this
> be guarded by a proper check then?  Probably AIX guarantees similar
> alignment but there are also embedded elf/newlib targets, no?
> 
> Btw, the #define should possibly move to config/linux.h guarded by
> OPTION_GLIBC?

For that location it shouldn't be 64, but 2 * POINTER_SIZE, which is what
glibc really guarantees on most architectures (I think x32 provides more
than that, but it can override).  But changing it requires some analysis
on commonly used malloc alternatives on Linux, what exactly do they
guarantee.  Say valgrind, ElectricFence, jemalloc, tcmalloc, libasan.
Note that on most targets there is some standard type that requires
such alignment, thus at least 2 * POINTER_SIZE alignment is also a C
conformance matter.

	Jakub
David Edelsohn May 17, 2013, 8:55 a.m. UTC | #4
On Fri, May 17, 2013 at 4:47 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, May 17, 2013 at 10:32:11AM +0200, Richard Biener wrote:
>> On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>> > This removes two degradations in CPU2006 for 32-bit PowerPC due to lost
>> > vectorization opportunities.  Previously, GCC treated malloc'd arrays as
>> > only guaranteeing 4-byte alignment, even though the glibc implementation
>> > guarantees 8-byte alignment.  This raises the guarantee to 8 bytes,
>> > which is sufficient to permit the missed vectorization opportunities.
>> >
>> > The guarantee for 64-bit PowerPC should be raised to 16-byte alignment,
>> > but doing so currently exposes a latent bug that degrades a 64-bit
>> > benchmark.  I have therefore not included that change at this time, but
>> > added a FIXME recording the information.
>> >
>> > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
>> > regressions.  Verified that SPEC CPU2006 degradations are fixed with no
>> > new degradations.  Ok for trunk?  Also, do you want any backports?
>>
>> You say this is because glibc guarantees such alignment - shouldn't this
>> be guarded by a proper check then?  Probably AIX guarantees similar
>> alignment but there are also embedded elf/newlib targets, no?
>>
>> Btw, the #define should possibly move to config/linux.h guarded by
>> OPTION_GLIBC?
>
> For that location it shouldn't be 64, but 2 * POINTER_SIZE, which is what
> glibc really guarantees on most architectures (I think x32 provides more
> than that, but it can override).  But changing it requires some analysis
> on commonly used malloc alternatives on Linux, what exactly do they
> guarantee.  Say valgrind, ElectricFence, jemalloc, tcmalloc, libasan.
> Note that on most targets there is some standard type that requires
> such alignment, thus at least 2 * POINTER_SIZE alignment is also a C
> conformance matter.

Jakub,

As Bill wrote earlier, 2 * POINTER_SIZE causes a different performance
regression for 16 byte alignment on PPC64.  8 bytes is a work-around
for now.

- David
Jakub Jelinek May 17, 2013, 9 a.m. UTC | #5
On Fri, May 17, 2013 at 04:55:20AM -0400, David Edelsohn wrote:
> As Bill wrote earlier, 2 * POINTER_SIZE causes a different performance
> regression for 16 byte alignment on PPC64.  8 bytes is a work-around
> for now.

I was talking about the config/linux.h definition, PPC64 can of course
override it to workaround something temporarily, but because there is some
problem on PPC64 we shouldn't penalize code say on x86_64, s390x or aarch64.

	Jakub
Richard Biener May 17, 2013, 9:17 a.m. UTC | #6
On Fri, May 17, 2013 at 10:47 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, May 17, 2013 at 10:32:11AM +0200, Richard Biener wrote:
>> On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>> > This removes two degradations in CPU2006 for 32-bit PowerPC due to lost
>> > vectorization opportunities.  Previously, GCC treated malloc'd arrays as
>> > only guaranteeing 4-byte alignment, even though the glibc implementation
>> > guarantees 8-byte alignment.  This raises the guarantee to 8 bytes,
>> > which is sufficient to permit the missed vectorization opportunities.
>> >
>> > The guarantee for 64-bit PowerPC should be raised to 16-byte alignment,
>> > but doing so currently exposes a latent bug that degrades a 64-bit
>> > benchmark.  I have therefore not included that change at this time, but
>> > added a FIXME recording the information.
>> >
>> > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
>> > regressions.  Verified that SPEC CPU2006 degradations are fixed with no
>> > new degradations.  Ok for trunk?  Also, do you want any backports?
>>
>> You say this is because glibc guarantees such alignment - shouldn't this
>> be guarded by a proper check then?  Probably AIX guarantees similar
>> alignment but there are also embedded elf/newlib targets, no?
>>
>> Btw, the #define should possibly move to config/linux.h guarded by
>> OPTION_GLIBC?
>
> For that location it shouldn't be 64, but 2 * POINTER_SIZE, which is what
> glibc really guarantees on most architectures (I think x32 provides more
> than that, but it can override).  But changing it requires some analysis
> on commonly used malloc alternatives on Linux, what exactly do they
> guarantee.  Say valgrind, ElectricFence, jemalloc, tcmalloc, libasan.
> Note that on most targets there is some standard type that requires
> such alignment, thus at least 2 * POINTER_SIZE alignment is also a C
> conformance matter.

Yes - note that it's called MALLOC_*ABI*_ALIGNMENT for a reason - it
is supposed to be the alignment that is required for conforming to the C ABI
on the target.  For different allocators I'd rather have a new function
attribute that tells us the alignment of the allocators allocation function
(of course replacing the allocator at runtime via ELF interposition makes
that possibly fragile as well).  We shouldn't assume just because glibc
may at some day guarantee 256bit alignment that other allocators on linux
do so, too.

So - I'd rather play safe (and the default to align to BITS_PER_WORD
probably catches the C ABI guarantee on most targets I suppose).

Richard.

>         Jakub
Jakub Jelinek May 17, 2013, 9:35 a.m. UTC | #7
On Fri, May 17, 2013 at 11:17:26AM +0200, Richard Biener wrote:
> Yes - note that it's called MALLOC_*ABI*_ALIGNMENT for a reason - it
> is supposed to be the alignment that is required for conforming to the C ABI
> on the target.  For different allocators I'd rather have a new function
> attribute that tells us the alignment of the allocators allocation function
> (of course replacing the allocator at runtime via ELF interposition makes
> that possibly fragile as well).  We shouldn't assume just because glibc

Most of the allocators I've mentioned, if not all, are providing just
malloc/calloc etc. entrypoints and are being in search scope before -lc,
so it isn't anything you can control with attributes.

I've looked at libasan so far (seems to guarantee either 16, or 64 or 128
bytes depending on flags), valgrind (seems to guarantee 8 bytes on 32-bit
arches except ppc32, and 16 bytes elsewhere), ElectricFence (no guarantees
at all, the default is 4 bytes, but can be overridden with EF_ALIGNMENT
env option to smaller/larger value, I guess let's ignore efence).

	Jakub
Bill Schmidt May 17, 2013, 12:01 p.m. UTC | #8
On Fri, 2013-05-17 at 01:11 -0400, David Edelsohn wrote:
> On Thu, May 16, 2013 at 10:40 PM, Bill Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> > This removes two degradations in CPU2006 for 32-bit PowerPC due to lost
> > vectorization opportunities.  Previously, GCC treated malloc'd arrays as
> > only guaranteeing 4-byte alignment, even though the glibc implementation
> > guarantees 8-byte alignment.  This raises the guarantee to 8 bytes,
> > which is sufficient to permit the missed vectorization opportunities.
> >
> > The guarantee for 64-bit PowerPC should be raised to 16-byte alignment,
> > but doing so currently exposes a latent bug that degrades a 64-bit
> > benchmark.  I have therefore not included that change at this time, but
> > added a FIXME recording the information.
> >
> > Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new
> > regressions.  Verified that SPEC CPU2006 degradations are fixed with no
> > new degradations.  Ok for trunk?  Also, do you want any backports?
> 
> Okay.  If you think that this is appropriate for 4.8 branch, that is
> okay as well.

Yes, the degradation appears to have started last summer, so updating
4.8 would be good.

> 
> Is there a FSF GCC Bugzilla open for the 437.leslie3d problem?  The
> FIXME doesn't need to contain the complete explanation, but it would
> be nice to reference a longer explanation and not pretend to be
> Fermat's theorem that is too large to write in the marge.

There is now.  I opened 57309 after submitting the patch, so I can
reference that.

Thanks,
Bill

> 
> Thanks, David
>
diff mbox

Patch

Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	(revision 198998)
+++ gcc/config/rs6000/rs6000.h	(working copy)
@@ -2297,6 +2297,13 @@  extern char rs6000_reg_names[][8];	/* register nam
 /* How to align the given loop. */
 #define LOOP_ALIGN(LABEL)  rs6000_loop_align(LABEL)
 
+/* Alignment guaranteed by __builtin_malloc.  */
+/* FIXME:  128-bit alignment is guaranteed by glibc for TARGET_64BIT.
+   However, specifying the stronger guarantee currently leads to
+   a regression in SPEC CPU2006 437.leslie3d.  The stronger
+   guarantee should be implemented here once that's fixed.  */
+#define MALLOC_ABI_ALIGNMENT (64)
+
 /* Pick up the return address upon entry to a procedure. Used for
    dwarf2 unwind information.  This also enables the table driven
    mechanism.  */