diff mbox

Make max_align_t respect _Float128 [version 2]

Message ID alpine.DEB.2.20.1609051705400.16161@digraph.polyomino.org.uk
State New
Headers show

Commit Message

Joseph Myers Sept. 5, 2016, 5:06 p.m. UTC
[Patch version 2 adds an update to cxx_fundamental_alignment_p.]

The _FloatN, _FloatNx, _DecimalN and _DecimalNx types are specified in
such a way that they are basic types, meaning that max_align_t must be
at least as aligned as those types.

On 32-bit x86, max_align_t is currently 8-byte aligned, but
_Decimal128 and _Float128 are 16-byte aligned, so the alignment of
max_align_t needs to increase to meet the standard requirements for
these types.

This patch implements such an increase.  Because max_align_t needs to
be usable for C++ as well as for C, <stddef.h> can't actually refer to
_Float128, but needs to use __float128 (or some other notation for the
type) instead.  And since __float128 is architecture-specific, there
isn't a preprocessor conditional that means "__float128 is available"
(whereas one could test __FLT128_MANT_DIG__ to see if _Float128 is
available; __SIZEOF_FLOAT128__ is available on x86 only).  But I
believe the only case that actually has an alignment problem here is
32-bit x86, and <stddef.h> already has lots of conditional specific to
particular architectures or OSes, so this patch uses a conditional on
__i386__; that also is the minimal change that ensures neither size
nor alignment of max_align_t is changed in any case other than where
it is necessary.  If any other architectures turn out to have such an
issue, it will show up as failures of one of the testcases added by
this patch.

Such an increase is of course an ABI change, but a reasonably safe
one, in that max_align_t doesn't tend to appear in library interfaces
(rather, it's something to use when writing allocators and similar
code; most matches found on codesearch.debian.net look like copies of
the gnulib stddef.h module rather than anything actually using
max_align_t at all).

cxx_fundamental_alignment_p has a corresponding change (adding
_Float128 to the types considered).

(I think glibc malloc alignment should also increase to 16-byte on
32-bit x86 so it works for allocating objects of these types, which is
now straightforward given the fix made for 32-bit powerpc.)

Bootstrapped with no regressions on x86_64-pc-linux-gnu, and
spot-tested with -m32 that the new float128-align.c test now compiles
where previously it didn't.  OK to commit?

gcc:
2016-09-05  Joseph Myers  <joseph@codesourcery.com>

	* ginclude/stddef.h (max_align_t) [__i386__]: Add __float128
	element.

gcc/c-family:
2016-09-05  Joseph Myers  <joseph@codesourcery.com>

	* c-common.c (cxx_fundamental_alignment_p): Also consider
	alignment of float128_type_node.

gcc/testsuite:
2016-09-05  Joseph Myers  <joseph@codesourcery.com>

	* gcc.dg/float128-align.c, gcc.dg/float128x-align.c,
	gcc.dg/float16-align.c, gcc.dg/float32-align.c,
	gcc.dg/float32x-align.c, gcc.dg/float64-align.c,
	gcc.dg/float64x-align.c, gcc.dg/floatn-align.h: New tests.

Comments

Richard Biener Sept. 6, 2016, 9:02 a.m. UTC | #1
On Mon, Sep 5, 2016 at 7:06 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> [Patch version 2 adds an update to cxx_fundamental_alignment_p.]
>
> The _FloatN, _FloatNx, _DecimalN and _DecimalNx types are specified in
> such a way that they are basic types, meaning that max_align_t must be
> at least as aligned as those types.
>
> On 32-bit x86, max_align_t is currently 8-byte aligned, but
> _Decimal128 and _Float128 are 16-byte aligned, so the alignment of
> max_align_t needs to increase to meet the standard requirements for
> these types.

As doubles on 32bit x86 are 4 byte aligned I suppose the choice could have
been to make those types 8 byte aligned to avoid changes of max_align_t?
(with the obvious eventual performance penalty of needing to do unaligned
loads/stores to xmm registers).

Richard.

> This patch implements such an increase.  Because max_align_t needs to
> be usable for C++ as well as for C, <stddef.h> can't actually refer to
> _Float128, but needs to use __float128 (or some other notation for the
> type) instead.  And since __float128 is architecture-specific, there
> isn't a preprocessor conditional that means "__float128 is available"
> (whereas one could test __FLT128_MANT_DIG__ to see if _Float128 is
> available; __SIZEOF_FLOAT128__ is available on x86 only).  But I
> believe the only case that actually has an alignment problem here is
> 32-bit x86, and <stddef.h> already has lots of conditional specific to
> particular architectures or OSes, so this patch uses a conditional on
> __i386__; that also is the minimal change that ensures neither size
> nor alignment of max_align_t is changed in any case other than where
> it is necessary.  If any other architectures turn out to have such an
> issue, it will show up as failures of one of the testcases added by
> this patch.
>
> Such an increase is of course an ABI change, but a reasonably safe
> one, in that max_align_t doesn't tend to appear in library interfaces
> (rather, it's something to use when writing allocators and similar
> code; most matches found on codesearch.debian.net look like copies of
> the gnulib stddef.h module rather than anything actually using
> max_align_t at all).
>
> cxx_fundamental_alignment_p has a corresponding change (adding
> _Float128 to the types considered).
>
> (I think glibc malloc alignment should also increase to 16-byte on
> 32-bit x86 so it works for allocating objects of these types, which is
> now straightforward given the fix made for 32-bit powerpc.)
>
> Bootstrapped with no regressions on x86_64-pc-linux-gnu, and
> spot-tested with -m32 that the new float128-align.c test now compiles
> where previously it didn't.  OK to commit?
>
> gcc:
> 2016-09-05  Joseph Myers  <joseph@codesourcery.com>
>
>         * ginclude/stddef.h (max_align_t) [__i386__]: Add __float128
>         element.
>
> gcc/c-family:
> 2016-09-05  Joseph Myers  <joseph@codesourcery.com>
>
>         * c-common.c (cxx_fundamental_alignment_p): Also consider
>         alignment of float128_type_node.
>
> gcc/testsuite:
> 2016-09-05  Joseph Myers  <joseph@codesourcery.com>
>
>         * gcc.dg/float128-align.c, gcc.dg/float128x-align.c,
>         gcc.dg/float16-align.c, gcc.dg/float32-align.c,
>         gcc.dg/float32x-align.c, gcc.dg/float64-align.c,
>         gcc.dg/float64x-align.c, gcc.dg/floatn-align.h: New tests.
>
> Index: gcc/c-family/c-common.c
> ===================================================================
> --- gcc/c-family/c-common.c     (revision 239987)
> +++ gcc/c-family/c-common.c     (working copy)
> @@ -12855,8 +12855,11 @@ scalar_to_vector (location_t loc, enum tree_code c
>  bool
>  cxx_fundamental_alignment_p  (unsigned align)
>  {
> -  return (align <=  MAX (TYPE_ALIGN (long_long_integer_type_node),
> -                        TYPE_ALIGN (long_double_type_node)));
> +  unsigned int max_align = MAX (TYPE_ALIGN (long_long_integer_type_node),
> +                               TYPE_ALIGN (long_double_type_node));
> +  if (float128_type_node != NULL_TREE)
> +    max_align = MAX (max_align, TYPE_ALIGN (float128_type_node));
> +  return align <= max_align;
>  }
>
>  /* Return true if T is a pointer to a zero-sized aggregate.  */
> Index: gcc/ginclude/stddef.h
> ===================================================================
> --- gcc/ginclude/stddef.h       (revision 239987)
> +++ gcc/ginclude/stddef.h       (working copy)
> @@ -426,6 +426,14 @@ typedef __WINT_TYPE__ wint_t;
>  typedef struct {
>    long long __max_align_ll __attribute__((__aligned__(__alignof__(long long))));
>    long double __max_align_ld __attribute__((__aligned__(__alignof__(long double))));
> +  /* _Float128 is defined as a basic type, so max_align_t must be
> +     sufficiently aligned for it.  This code must work in C++, so we
> +     use __float128 here; that is only available on some
> +     architectures, but only on i386 is extra alignment needed for
> +     __float128.  */
> +#ifdef __i386__
> +  __float128 __max_align_f128 __attribute__((__aligned__(__alignof(__float128))));
> +#endif
>  } max_align_t;
>  #endif
>  #endif /* C11 or C++11.  */
> Index: gcc/testsuite/gcc.dg/float128-align.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/float128-align.c       (nonexistent)
> +++ gcc/testsuite/gcc.dg/float128-align.c       (working copy)
> @@ -0,0 +1,9 @@
> +/* Test _Float128 alignment.  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +/* { dg-add-options float128 } */
> +/* { dg-require-effective-target float128 } */
> +
> +#define WIDTH 128
> +#define EXT 0
> +#include "floatn-align.h"
> Index: gcc/testsuite/gcc.dg/float128x-align.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/float128x-align.c      (nonexistent)
> +++ gcc/testsuite/gcc.dg/float128x-align.c      (working copy)
> @@ -0,0 +1,9 @@
> +/* Test _Float128 alignment.  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +/* { dg-add-options float128x } */
> +/* { dg-require-effective-target float128x } */
> +
> +#define WIDTH 128
> +#define EXT 1
> +#include "floatn-align.h"
> Index: gcc/testsuite/gcc.dg/float16-align.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/float16-align.c        (nonexistent)
> +++ gcc/testsuite/gcc.dg/float16-align.c        (working copy)
> @@ -0,0 +1,9 @@
> +/* Test _Float16 alignment.  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +/* { dg-add-options float16 } */
> +/* { dg-require-effective-target float16 } */
> +
> +#define WIDTH 16
> +#define EXT 0
> +#include "floatn-align.h"
> Index: gcc/testsuite/gcc.dg/float32-align.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/float32-align.c        (nonexistent)
> +++ gcc/testsuite/gcc.dg/float32-align.c        (working copy)
> @@ -0,0 +1,9 @@
> +/* Test _Float32 alignment.  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +/* { dg-add-options float32 } */
> +/* { dg-require-effective-target float32 } */
> +
> +#define WIDTH 32
> +#define EXT 0
> +#include "floatn-align.h"
> Index: gcc/testsuite/gcc.dg/float32x-align.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/float32x-align.c       (nonexistent)
> +++ gcc/testsuite/gcc.dg/float32x-align.c       (working copy)
> @@ -0,0 +1,9 @@
> +/* Test _Float32 alignment.  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +/* { dg-add-options float32x } */
> +/* { dg-require-effective-target float32x } */
> +
> +#define WIDTH 32
> +#define EXT 1
> +#include "floatn-align.h"
> Index: gcc/testsuite/gcc.dg/float64-align.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/float64-align.c        (nonexistent)
> +++ gcc/testsuite/gcc.dg/float64-align.c        (working copy)
> @@ -0,0 +1,9 @@
> +/* Test _Float64 alignment.  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +/* { dg-add-options float64 } */
> +/* { dg-require-effective-target float64 } */
> +
> +#define WIDTH 64
> +#define EXT 0
> +#include "floatn-align.h"
> Index: gcc/testsuite/gcc.dg/float64x-align.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/float64x-align.c       (nonexistent)
> +++ gcc/testsuite/gcc.dg/float64x-align.c       (working copy)
> @@ -0,0 +1,9 @@
> +/* Test _Float64 alignment.  */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +/* { dg-add-options float64x } */
> +/* { dg-require-effective-target float64x } */
> +
> +#define WIDTH 64
> +#define EXT 1
> +#include "floatn-align.h"
> Index: gcc/testsuite/gcc.dg/floatn-align.h
> ===================================================================
> --- gcc/testsuite/gcc.dg/floatn-align.h (nonexistent)
> +++ gcc/testsuite/gcc.dg/floatn-align.h (working copy)
> @@ -0,0 +1,18 @@
> +/* Tests for _FloatN / _FloatNx types: test max_align_t alignment.
> +   Before including this file, define WIDTH as the value N; define EXT
> +   to 1 for _FloatNx and 0 for _FloatN.  */
> +
> +#define CONCATX(X, Y) X ## Y
> +#define CONCAT(X, Y) CONCATX (X, Y)
> +#define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
> +
> +#if EXT
> +# define TYPE CONCAT3 (_Float, WIDTH, x)
> +#else
> +# define TYPE CONCAT (_Float, WIDTH)
> +#endif
> +
> +#include <stddef.h>
> +
> +_Static_assert (_Alignof (max_align_t) >= _Alignof (TYPE),
> +               "max_align_t must be at least as aligned as _Float* types");
>
> --
> Joseph S. Myers
> joseph@codesourcery.com
Florian Weimer Sept. 6, 2016, 9:11 a.m. UTC | #2
On 09/05/2016 07:06 PM, Joseph Myers wrote:

> Such an increase is of course an ABI change, but a reasonably safe
> one, in that max_align_t doesn't tend to appear in library interfaces
> (rather, it's something to use when writing allocators and similar
> code; most matches found on codesearch.debian.net look like copies of
> the gnulib stddef.h module rather than anything actually using
> max_align_t at all).

I've thought some more about this.  I'll try to rephrase my reservations 
in a less diplomatic way, hopefully making my thoughts clearer.

I think this change is only safe because nothing relies on it. 
max_align_t is a committee invention with no actual users.  As such, you 
can change it in any way you see fit and it won't make a difference in 
any way.

Why aren't there any users?  The standard isn't very clear what the 
value of _Alignof (max_align_t) actually means.  Does the phrase “all 
contexts” include pointers returned malloc?  Even if the requested size 
is smaller than the fundamental alignment?  Did those who wrote the 
standard expect there to be *any* relationship between malloc and 
max_align_t?

The existing situation is that most mallocs to do not provide _Alignof 
(max_align_t) alignment unconditionally.  But they can provide 
arbitrarily large alignment with aligned_alloc/memalign-style interfaces.

For object alignment declarations, I think GCC can satisfy most 
requests, perhaps subject to binutils/dynamic linker limitations for 
global variables on some targets.

The question then is, how can we make max_align_t more useful?  If it is 
supposed to reflect a malloc property, it cannot be a compile-time 
constant because we don't know at compile time which malloc is going to 
be used (malloc has to remain interposable).  If there is no 
relationship to malloc, GCC essentially supports unbounded alignment on 
many targets.  How is it possible to have a meaningful definition of 
max_align_t under such circumstances?

Florian
Richard Biener Sept. 6, 2016, 9:23 a.m. UTC | #3
On Tue, Sep 6, 2016 at 11:11 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 09/05/2016 07:06 PM, Joseph Myers wrote:
>
>> Such an increase is of course an ABI change, but a reasonably safe
>> one, in that max_align_t doesn't tend to appear in library interfaces
>> (rather, it's something to use when writing allocators and similar
>> code; most matches found on codesearch.debian.net look like copies of
>> the gnulib stddef.h module rather than anything actually using
>> max_align_t at all).
>
>
> I've thought some more about this.  I'll try to rephrase my reservations in
> a less diplomatic way, hopefully making my thoughts clearer.
>
> I think this change is only safe because nothing relies on it. max_align_t
> is a committee invention with no actual users.  As such, you can change it
> in any way you see fit and it won't make a difference in any way.
>
> Why aren't there any users?  The standard isn't very clear what the value of
> _Alignof (max_align_t) actually means.  Does the phrase “all contexts”
> include pointers returned malloc?  Even if the requested size is smaller
> than the fundamental alignment?  Did those who wrote the standard expect
> there to be *any* relationship between malloc and max_align_t?
>
> The existing situation is that most mallocs to do not provide _Alignof
> (max_align_t) alignment unconditionally.  But they can provide arbitrarily
> large alignment with aligned_alloc/memalign-style interfaces.
>
> For object alignment declarations, I think GCC can satisfy most requests,
> perhaps subject to binutils/dynamic linker limitations for global variables
> on some targets.
>
> The question then is, how can we make max_align_t more useful?  If it is
> supposed to reflect a malloc property, it cannot be a compile-time constant
> because we don't know at compile time which malloc is going to be used
> (malloc has to remain interposable).  If there is no relationship to malloc,
> GCC essentially supports unbounded alignment on many targets.  How is it
> possible to have a meaningful definition of max_align_t under such
> circumstances?

I thought max_align_t was to replace uses like

  union {
    long long x;
    double y;
    ...
    char buf[N];
  };

to get statically allocated "max" aligned memory.  That is, you now know
_which_ type you need to put into the union.

Richard.

> Florian
Joseph Myers Sept. 6, 2016, 11:18 a.m. UTC | #4
On Tue, 6 Sep 2016, Richard Biener wrote:

> > On 32-bit x86, max_align_t is currently 8-byte aligned, but
> > _Decimal128 and _Float128 are 16-byte aligned, so the alignment of
> > max_align_t needs to increase to meet the standard requirements for
> > these types.
> 
> As doubles on 32bit x86 are 4 byte aligned I suppose the choice could have
> been to make those types 8 byte aligned to avoid changes of max_align_t?
> (with the obvious eventual performance penalty of needing to do unaligned
> loads/stores to xmm registers).

However, the choice made some years ago (and documented in HJ's ABI 
document) was 16-byte alignment.
Joseph Myers Sept. 6, 2016, 11:25 a.m. UTC | #5
On Tue, 6 Sep 2016, Florian Weimer wrote:

> Why aren't there any users?  The standard isn't very clear what the value of
> _Alignof (max_align_t) actually means.  Does the phrase “all contexts” include
> pointers returned malloc?  Even if the requested size is smaller than the
> fundamental alignment?  Did those who wrote the standard expect there to be
> *any* relationship between malloc and max_align_t?

See my cleanup of the wording in DR#445 
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2059.htm#dr_445>, which 
is intended to reflect the intent and stay compatible with C99.  malloc 
should be usable to allocate memory for any type from the standard library 
headers, including max_align_t.

> The existing situation is that most mallocs to do not provide _Alignof
> (max_align_t) alignment unconditionally.  But they can provide arbitrarily
> large alignment with aligned_alloc/memalign-style interfaces.

Well, that's a conformance bug in the implementation as a whole.  The 
nonconforming modes in question are still useful and it's useful for GCC 
to support such mallocs.

> has to remain interposable).  If there is no relationship to malloc, GCC
> essentially supports unbounded alignment on many targets.  How is it possible
> to have a meaningful definition of max_align_t under such circumstances?

max_align_t only covers fundamental alignments, not necessarily all 
alignments that are supported.
Bernd Schmidt Sept. 6, 2016, 11:59 a.m. UTC | #6
On 09/06/2016 11:11 AM, Florian Weimer wrote:

> I think this change is only safe because nothing relies on it.
> max_align_t is a committee invention with no actual users.

I tried to verify that and ran grep over all the packages in 
/usr/portage/distfiles. That did get a number of matches, fairly obvious 
and expected ones like gcc and glibc, and several more which match only 
because gnulib seems to try to provide a default, but I also found a few 
real uses:

grep-2.25/lib/fts.c:
         /* Align the allocation size so that it works for FTSENT,
            so that trailing padding may be referenced by direct access
            to the flexible array members, without triggering undefined 
behavior
            by accessing bytes beyond the heap allocation.  This 
implicit access
            was seen for example with ISDOT() and GCC 5.1.1 at -O2.
            Do not use alignof (FTSENT) here, since C11 prohibits
            taking the alignment of a structure containing a flexible
            array member.  */
         len += alignof (max_align_t) - 1;
         len &= ~ (alignof (max_align_t) - 1);


libvpx-v1.3.0/nestegg/halloc/src/halloc.c:
/*
  *      block control header
  */
typedef struct hblock
{
#ifndef NDEBUG
#define HH_MAGIC    0x20040518L
         long          magic;
#endif
         hlist_item_t  siblings; /* 2 pointers */
         hlist_head_t  children; /* 1 pointer  */
         max_align_t   data[1];  /* not allocated, see below */

} hblock_t;


Bernd
Florian Weimer Sept. 6, 2016, 2:39 p.m. UTC | #7
On 09/06/2016 01:59 PM, Bernd Schmidt wrote:
> On 09/06/2016 11:11 AM, Florian Weimer wrote:
>
>> I think this change is only safe because nothing relies on it.
>> max_align_t is a committee invention with no actual users.
>
> I tried to verify that and ran grep over all the packages in
> /usr/portage/distfiles. That did get a number of matches, fairly obvious
> and expected ones like gcc and glibc, and several more which match only
> because gnulib seems to try to provide a default, but I also found a few
> real uses:
>
> grep-2.25/lib/fts.c:
>         /* Align the allocation size so that it works for FTSENT,
>            so that trailing padding may be referenced by direct access
>            to the flexible array members, without triggering undefined
> behavior
>            by accessing bytes beyond the heap allocation.  This implicit
> access
>            was seen for example with ISDOT() and GCC 5.1.1 at -O2.
>            Do not use alignof (FTSENT) here, since C11 prohibits
>            taking the alignment of a structure containing a flexible
>            array member.  */
>         len += alignof (max_align_t) - 1;
>         len &= ~ (alignof (max_align_t) - 1);

The context looks like this:

   /*
    * The file name is a variable length array.  Allocate the FTSENT
    * structure and the file name in one chunk.
    */
   len = offsetof(FTSENT, fts_name) + namelen + 1;
   /* Align the allocation size so that it works for FTSENT,
      so that trailing padding may be referenced by direct access
      to the flexible array members, without triggering undefined behavior
      by accessing bytes beyond the heap allocation.  This implicit access
      was seen for example with ISDOT() and GCC 5.1.1 at -O2.
      Do not use alignof (FTSENT) here, since C11 prohibits
      taking the alignment of a structure containing a flexible
      array member.  */
   len += alignof (max_align_t) - 1;
   len &= ~ (alignof (max_align_t) - 1);
   if ((p = malloc(len)) == NULL)
           return (NULL);

This code was added to work around a GCC bug:

   <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66661>

It's rather dubious that this fix is necessary or even correct.  Where 
would this stop?  Would

   char buf[23];
   memset (buf, 'X', sizeof buf);
   strndup (buf, 23);

have to allocate 32 bytes so that the object size is a multiple of 16 on 
x86_64 (where alignof (max_align_t) is 16)?

> libvpx-v1.3.0/nestegg/halloc/src/halloc.c:
> /*
>  *      block control header
>  */
> typedef struct hblock
> {
> #ifndef NDEBUG
> #define HH_MAGIC    0x20040518L
>         long          magic;
> #endif
>         hlist_item_t  siblings; /* 2 pointers */
>         hlist_head_t  children; /* 1 pointer  */
>         max_align_t   data[1];  /* not allocated, see below */
>
> } hblock_t;

This is a false hit.  Upstream is here:

   https://github.com/apankrat/halloc

src/align.h has:

union max_align
{
         char   c;
         short  s;
         long   l;
         int    i;
         float  f;
         double d;
         void * v;
         void (*q)(void);
};

typedef union max_align max_align_t;

I'm not sure if it is an attempt to emulate max_align_t, or if it is a 
name collision.  It's from 2011, so perhaps the latter.  It's certainly 
incorrect for x86_64.  It works by accident in the halloc context 
because the object header is four words large, so it preserves the 
alignment of the underlying allocator.

Florian
Florian Weimer Sept. 6, 2016, 3 p.m. UTC | #8
On 09/06/2016 01:25 PM, Joseph Myers wrote:
> On Tue, 6 Sep 2016, Florian Weimer wrote:
>
>> Why aren't there any users?  The standard isn't very clear what the value of
>> _Alignof (max_align_t) actually means.  Does the phrase “all contexts” include
>> pointers returned malloc?  Even if the requested size is smaller than the
>> fundamental alignment?  Did those who wrote the standard expect there to be
>> *any* relationship between malloc and max_align_t?
>
> See my cleanup of the wording in DR#445
> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2059.htm#dr_445>, which
> is intended to reflect the intent and stay compatible with C99.  malloc
> should be usable to allocate memory for any type from the standard library
> headers, including max_align_t.

And the old text for malloc says:

“
The pointer returned if the allocation succeeds is suitably aligned so 
that it may be assigned to a pointer to any type of object with a 
fundamental alignment requirement and then used to access such an object 
or an array of such objects in the space allocated (until the space is 
explicitly deallocated).
”

So that's what ties the two things together.  I still don't like what's 
implied in PR66661, that all object sizes have to be multiples of the 
fundamental alignment.

>> The existing situation is that most mallocs to do not provide _Alignof
>> (max_align_t) alignment unconditionally.  But they can provide arbitrarily
>> large alignment with aligned_alloc/memalign-style interfaces.
>
> Well, that's a conformance bug in the implementation as a whole.  The
> nonconforming modes in question are still useful and it's useful for GCC
> to support such mallocs.

PR66661 shows that GCC does not want to support such mallocs (or even 
glibc's malloc).

Florian
Joseph Myers Sept. 6, 2016, 3:16 p.m. UTC | #9
On Tue, 6 Sep 2016, Florian Weimer wrote:

> So that's what ties the two things together.  I still don't like what's
> implied in PR66661, that all object sizes have to be multiples of the
> fundamental alignment.

I don't think there's any such requirement in the case of flexible array 
members; if you use malloc to allocate a structure with a flexible array 
member, you can access as many trailing array elements as would fit within 
the allocated size, whether or not that size is a multiple of either the 
alignment of the structure, or the alignment of max_align_t.

> > Well, that's a conformance bug in the implementation as a whole.  The
> > nonconforming modes in question are still useful and it's useful for GCC
> > to support such mallocs.
> 
> PR66661 shows that GCC does not want to support such mallocs (or even glibc's
> malloc).

GCC is supposed to support all mallocs that produce results aligned to at 
least MALLOC_ABI_ALIGNMENT (which may be smaller than the alignment of 
max_align_t).
Paul Eggert Sept. 6, 2016, 3:58 p.m. UTC | #10
On 09/06/2016 08:16 AM, Joseph Myers wrote:
> I don't think there's any such requirement in the case of flexible array
> members; if you use malloc to allocate a structure with a flexible array
> member, you can access as many trailing array elements as would fit within
> the allocated size, whether or not that size is a multiple of either the
> alignment of the structure, or the alignment of max_align_t.

Although I would prefer you to be correct I'm afraid the standard 
doesn't agree, as C11's wording appears to allow the GCC optimization in 
question. (At least, draft N1570 does; I don't have the final standard.) 
C11 section 6.7.2.1 paragraph 18 says:

"when a . (or ->) operator has a left operand that is (a pointer to) a 
structure with a flexible array member and the right operand names that 
member, it behaves as if that member were replaced with the longest 
array (with the same element type) that would not make the structure 
larger than the object being accessed"

Suppose 'short' size and alignment is 2, and the following code 
allocates and uses 7 bytes:

   struct s { short n; char a[]; } *p = malloc (offsetof (struct s, a) + 5);

   return &p->a[5];

A strict reading of C11 says this code is supposed to behave as if 'char 
a[];' were replaced by 'char a[4];'. It is not the 'char a[5]' that one 
might expect, because using 'char a[5];' would mean the size of struct s 
(after alignment) would be 8, whereas the size of the object being 
accessed is only 7. Hence the above return expression has a subscript error.

One way to correct the code is to increase malloc's argument up to a 
multiple of alignof(max_align_t). (One cannot portably use 
alignof(struct s) due to C11's prohibition of using alignof on 
incomplete types.) This is why gnulib/lib/fts.c uses max_align_t.
Joseph Myers Sept. 6, 2016, 8:40 p.m. UTC | #11
On Tue, 6 Sep 2016, Paul Eggert wrote:

> One way to correct the code is to increase malloc's argument up to a multiple
> of alignof(max_align_t). (One cannot portably use alignof(struct s) due to

Sounds like a defect in C11 to me - none of the examples of flexible array 
members anticipate needing to add to the size to allow for tail padding 
with unknown alignment requirements.

> C11's prohibition of using alignof on incomplete types.) This is why

Structures with flexible array members are not incomplete types.
Jason Merrill Sept. 6, 2016, 8:57 p.m. UTC | #12
On Tue, Sep 6, 2016 at 11:16 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> GCC is supposed to support all mallocs that produce results aligned to at
> least MALLOC_ABI_ALIGNMENT (which may be smaller than the alignment of
> max_align_t).

I've just been running into problems with MALLOC_ABI_ALIGNMENT being
smaller than max_align_t, which doesn't make sense to me; the C11
passage Florian quoted says that malloc needs to support all
fundamental alignments, i.e. max_align_t.

Jason
Joseph Myers Sept. 6, 2016, 9:03 p.m. UTC | #13
On Tue, 6 Sep 2016, Jason Merrill wrote:

> On Tue, Sep 6, 2016 at 11:16 AM, Joseph Myers <joseph@codesourcery.com> wrote:
> > GCC is supposed to support all mallocs that produce results aligned to at
> > least MALLOC_ABI_ALIGNMENT (which may be smaller than the alignment of
> > max_align_t).
> 
> I've just been running into problems with MALLOC_ABI_ALIGNMENT being
> smaller than max_align_t, which doesn't make sense to me; the C11
> passage Florian quoted says that malloc needs to support all
> fundamental alignments, i.e. max_align_t.

The point is that GCC supports being used in nonconforming implementations 
as well as in conforming ones, and in nonconforming contexts people may 
e.g. interpose malloc with a version that yields sufficient alignment for 
common cases but may not use 16-byte alignment, or may not align small 
objects as much as larger ones (whereas ISO C requires all alignments, 
even 1-byte ones, be aligned as much as max_align_t).

Of course if users of such an interposed malloc try to allocate memory for 
_Float128 with it, they can expect that to break.
Paul Eggert Sept. 6, 2016, 9:31 p.m. UTC | #14
On 09/06/2016 01:40 PM, Joseph Myers wrote:
> Sounds like a defect in C11 to me - none of the examples of flexible 
> array
> members anticipate needing to add to the size to allow for tail padding
> with unknown alignment requirements.

Yes, I would prefer calling it a defect, as most code I've seen dealing 
with flexible array members does not align the tail size. However, GCC + 
valgrind does take advantage of this "defect" and I would not be 
surprised if other picky implementations do too.

The C11 standard's examples are weird, in that their flexible members 
are typically arrays of double, where the alignment in practice is 
invariably no less than that of the containing structure so our problem 
does not occur. And for the only example that calls malloc and is 
directly on point (assuming a weird platform where doubles are 
unaligned), the associated commentary says that the flexible array 
member behaves "for most purposes" as if it had the natural size, i.e., 
all bets are off unless you read the entire standard carefully! It is 
almost as if the C11 authors knew about the problem but did not want to 
call the reader's attention to it (I doubt whether that occurred -- it's 
just that it reads that way).

>> C11's prohibition of using alignof on incomplete types.) This is why
> Structures with flexible array members are not incomplete types.
>
Ah, right you are. Sorry, I confused C11 alignof with Gnulib alignof. On 
pre-C11 platforms, the Gnulib substitute alignof does not work on such 
structures, so code like Gnulib fts.c that is intended to be portable to 
pre-C11 compilers can't use that C11 feature. I have corrected the 
Gnulib manual's documentation of this limitation.
Jason Merrill Sept. 6, 2016, 9:41 p.m. UTC | #15
On Tue, Sep 6, 2016 at 5:03 PM, Joseph Myers <joseph@codesourcery.com> wrote:
> On Tue, 6 Sep 2016, Jason Merrill wrote:
>
>> On Tue, Sep 6, 2016 at 11:16 AM, Joseph Myers <joseph@codesourcery.com> wrote:
>> > GCC is supposed to support all mallocs that produce results aligned to at
>> > least MALLOC_ABI_ALIGNMENT (which may be smaller than the alignment of
>> > max_align_t).
>>
>> I've just been running into problems with MALLOC_ABI_ALIGNMENT being
>> smaller than max_align_t, which doesn't make sense to me; the C11
>> passage Florian quoted says that malloc needs to support all
>> fundamental alignments, i.e. max_align_t.
>
> The point is that GCC supports being used in nonconforming implementations
> as well as in conforming ones, and in nonconforming contexts people may
> e.g. interpose malloc with a version that yields sufficient alignment for
> common cases but may not use 16-byte alignment, or may not align small
> objects as much as larger ones (whereas ISO C requires all alignments,
> even 1-byte ones, be aligned as much as max_align_t).

MALLOC_ABI_ALIGNMENT is documented as "Alignment, in bits, a C
conformant malloc implementation has to provide," which doesn't sound
like it's intended to allow for non-conforming implementations.  And
doesn't a smaller default lead to lost optimization opportunities in
the typical case?

The reason I care is that C++17 aligned new (wg21.link/p0035)
specifies that for types that require more alignment than the usual
operator new provides, the new-expression instead calls an operator
new with an explicit alignment parameter.  MALLOC_ABI_ALIGNMENT
sounded like exactly what I was looking for, but then I noticed that
'new long double' was going to the aligned new operator, which breaks
older code that replaces operator new (without, of course, replacing
the aligned variant).

Jason
Joseph Myers Sept. 6, 2016, 9:53 p.m. UTC | #16
On Tue, 6 Sep 2016, Jason Merrill wrote:

> The reason I care is that C++17 aligned new (wg21.link/p0035)
> specifies that for types that require more alignment than the usual
> operator new provides, the new-expression instead calls an operator
> new with an explicit alignment parameter.  MALLOC_ABI_ALIGNMENT
> sounded like exactly what I was looking for, but then I noticed that
> 'new long double' was going to the aligned new operator, which breaks
> older code that replaces operator new (without, of course, replacing
> the aligned variant).

I'd say that cxx_fundamental_alignment_p is the right thing to use here, 
but maybe you want an option to override the alignment threshold for 
aligned new (in case someone interposes a malloc that uses lower 
alignment, but still wants to allocate some objects with higher 
alignment).
Florian Weimer Sept. 7, 2016, 7:44 a.m. UTC | #17
On 09/06/2016 11:23 AM, Richard Biener wrote:

>> The question then is, how can we make max_align_t more useful?  If it is
>> supposed to reflect a malloc property, it cannot be a compile-time constant
>> because we don't know at compile time which malloc is going to be used
>> (malloc has to remain interposable).  If there is no relationship to malloc,
>> GCC essentially supports unbounded alignment on many targets.  How is it
>> possible to have a meaningful definition of max_align_t under such
>> circumstances?
>
> I thought max_align_t was to replace uses like
>
>   union {
>     long long x;
>     double y;
>     ...
>     char buf[N];
>   };
>
> to get statically allocated "max" aligned memory.  That is, you now know
> _which_ type you need to put into the union.

Yes, that's the way I have used it too.  But it's difficult to use such 
an idiom and make it valid C11 (you basically have to use memcpy to 
access the buffer, direct access is not permitted because the dynamic 
type is char).

But it's unclear what this trying to achieve.  Ignoring the aliasing 
issue, you can store any object there if you haven't used an _Alignas 
specification.  If you used an _Alignas specification for the object's 
type, you may have to manually align the pointer if the alignment is 
larger than that of max_align_t.

The existence of such a cut-off constant appears useful, but it's not 
clear if it should be tied to the fundamental alignment (especially, as 
this discussion shows, the fundamental alignment will be somewhat in flux).

Florian
Florian Weimer Sept. 7, 2016, 9:10 a.m. UTC | #18
On 09/06/2016 10:40 PM, Joseph Myers wrote:
> On Tue, 6 Sep 2016, Paul Eggert wrote:
>
>> One way to correct the code is to increase malloc's argument up to a multiple
>> of alignof(max_align_t). (One cannot portably use alignof(struct s) due to
>
> Sounds like a defect in C11 to me - none of the examples of flexible array
> members anticipate needing to add to the size to allow for tail padding
> with unknown alignment requirements.

I agree, this is a defect in C99 and C11.  The language hasn't changed 
since C99, and C99 has the same issue because it's unrelated to 
alignment specifiers.  It's a confusion between struct sizes (which are 
multiples of the struct alignment) and object sizes (which are not 
necessarily so).

I have reopened PR66661 with a more elaborate test case which shows that 
GCC packs objects more tightly than the struct alignment would permit.

Thanks,
Florian
Florian Weimer Sept. 7, 2016, 9:15 a.m. UTC | #19
On 09/06/2016 11:31 PM, Paul Eggert wrote:
> On 09/06/2016 01:40 PM, Joseph Myers wrote:
>> Sounds like a defect in C11 to me - none of the examples of flexible
>> array
>> members anticipate needing to add to the size to allow for tail padding
>> with unknown alignment requirements.
>
> Yes, I would prefer calling it a defect, as most code I've seen dealing
> with flexible array members does not align the tail size. However, GCC +
> valgrind does take advantage of this "defect" and I would not be
> surprised if other picky implementations do too.

It might be an inherent limitation of the valgrind approach. 
Speculative loads which cannot result in data races (in the C11 sense) 
due to the way the architecture behaves should be fine.  The alignment 
ensures that the load is on the same page, which is what typically 
prevent this optimization.

Some implementation techniques for C string functions result in the same 
behavior.  valgrind intercepts them or suppresses errors there, but 
that's not possible for code that GCC emits inline, obviously.

valgrind would still treat the bytes beyond the allocation boundary as 
undefined.  But I agree that false positives in this area are annoying.

Florian
Mark Wielaard Sept. 7, 2016, 11:52 a.m. UTC | #20
On Wed, Sep 07, 2016 at 11:15:34AM +0200, Florian Weimer wrote:
> On 09/06/2016 11:31 PM, Paul Eggert wrote:
> >On 09/06/2016 01:40 PM, Joseph Myers wrote:
> >>Sounds like a defect in C11 to me - none of the examples of flexible
> >>array
> >>members anticipate needing to add to the size to allow for tail padding
> >>with unknown alignment requirements.
> >
> >Yes, I would prefer calling it a defect, as most code I've seen dealing
> >with flexible array members does not align the tail size. However, GCC +
> >valgrind does take advantage of this "defect" and I would not be
> >surprised if other picky implementations do too.
> 
> It might be an inherent limitation of the valgrind approach. 
> Speculative loads which cannot result in data races (in the C11 sense) 
> due to the way the architecture behaves should be fine.  The alignment 
> ensures that the load is on the same page, which is what typically 
> prevent this optimization.

It might or might not be an issue for valgrind. If valgrind believes the
memory isn't in valid memory then it will complain about an invalid access.
But if the memory is accessible but uninitialised then it will just track
the undefinedness complain later if such a value is used.

> Some implementation techniques for C string functions result in the same 
> behavior.  valgrind intercepts them or suppresses errors there, but 
> that's not possible for code that GCC emits inline, obviously.

valgrind also has --partial-loads-ok (which in newer versions defaults
to =yes):

   Controls how Memcheck handles 32-, 64-, 128- and 256-bit naturally
   aligned loads from addresses for which some bytes are addressable
   and others are not. When yes, such loads do not produce an address
   error. Instead, loaded bytes originating from illegal addresses are
   marked as uninitialised, and those corresponding to legal addresses
   are handled in the normal way.

> valgrind would still treat the bytes beyond the allocation boundary as 
> undefined.  But I agree that false positives in this area are annoying.

Does anybody have an example program of the above issue compiled with
gcc that produces false positives with valgrind?

Thanks,

Mark
Joseph Myers Sept. 7, 2016, 5:45 p.m. UTC | #21
On Wed, 7 Sep 2016, Florian Weimer wrote:

> The existence of such a cut-off constant appears useful, but it's not clear if
> it should be tied to the fundamental alignment (especially, as this discussion
> shows, the fundamental alignment will be somewhat in flux).

I don't think it's in flux.  I think it's very rare for a new basic type 
to be added that has increased alignment requirements compared to the 
existing basic types.  _Decimal128 was added in GCC 4.3, which increased 
the requirement on 32-bit x86 (only) (32-bit x86 malloc having been buggy 
in that regard ever since then); __float128 / _Float128 did not increase 
the requirement relative to that introduced with _Decimal128.  (Obviously 
if CPLEX part 2 defines vector types that might have larger alignment 
requirements it must avoid defining them as basic types.)
Paul Eggert Sept. 7, 2016, 11:21 p.m. UTC | #22
On 09/07/2016 04:52 AM, Mark Wielaard wrote:
> If valgrind believes the
> memory isn't in valid memory then it will complain about an invalid access.
> But if the memory is accessible but uninitialised then it will just track
> the undefinedness complain later if such a value is used.

I think the former is what happened in Gnulib fts.c before Gnulib was fixed.

> valgrind also has --partial-loads-ok (which in newer versions defaults
> to =yes):
>
>     Controls how Memcheck handles 32-, 64-, 128- and 256-bit naturally
>     aligned loads from addresses for which some bytes are addressable
>     and others are not.

Although this helps in some cases, it does not suffice in general since 
the problem can occur with 16-bit aligned loads. I think that is what 
happened with fts.c.

> Does anybody have an example program of the above issue compiled with
> gcc that produces false positives with valgrind?
>

Sure, attached. On Fedora 24 x86-64 (GCC 6.1.1 20160621, valgrind 
3.11.0), when I compile with "gcc -O2 flexouch.c" and run with "valgrind 
./a.out", valgrind complains "Invalid read of size 2". This is because 
GCC compiles "p->d[0] == 2 && p->d[1] == 3" into "cmpw $770, 8(%rax); 
sete %al", which loads the uninitialized byte p->d[1] simultaneously 
with the initialized byte p->d[0].

As mentioned previously, although flexouch.c does not conform to C11, 
this is arguably a defect in C11.
#include <stddef.h>
#include <stdlib.h>

struct s { struct s *next; char d[]; };

int
main (void)
{
  struct s *p = malloc (offsetof (struct s, d) + 1);
  p->d[0] = 1;
  return p->d[0] == 2 && p->d[1] == 3;
}
Florian Weimer Sept. 8, 2016, 9:27 a.m. UTC | #23
On 09/07/2016 07:45 PM, Joseph Myers wrote:
> On Wed, 7 Sep 2016, Florian Weimer wrote:
>
>> The existence of such a cut-off constant appears useful, but it's not clear if
>> it should be tied to the fundamental alignment (especially, as this discussion
>> shows, the fundamental alignment will be somewhat in flux).
>
> I don't think it's in flux.  I think it's very rare for a new basic type
> to be added that has increased alignment requirements compared to the
> existing basic types.  _Decimal128 was added in GCC 4.3, which increased
> the requirement on 32-bit x86 (only) (32-bit x86 malloc having been buggy
> in that regard ever since then); __float128 / _Float128 did not increase
> the requirement relative to that introduced with _Decimal128.

I said “somewhat”. :)  I don't expect many changes.  Although with the 
gnulib emulation, you could easily get different results depending on 
whether your toolchain has C11 support or not, on the same architecture.

If max_align_t denotes the cut-off point between automatic and manual 
alignment (as suggested by Richard), it really has to be set in stone, IMHO.

Florian
Mark Wielaard Sept. 8, 2016, 11:56 a.m. UTC | #24
Hi,

I added Julian to the CC who will probably correct any mistakes
I make. Or even might know a simple way to make valgrind detect
this idiom.

On Wed, Sep 07, 2016 at 04:21:29PM -0700, Paul Eggert wrote:
> On 09/07/2016 04:52 AM, Mark Wielaard wrote:
> > If valgrind believes the
> > memory isn't in valid memory then it will complain about an invalid access.
> > But if the memory is accessible but uninitialised then it will just track
> > the undefinedness complain later if such a value is used.
> 
> I think the former is what happened in Gnulib fts.c before Gnulib was fixed.

BTW. Do gnulib and glibc syncronize? I had no idea gnulib also contained
fts. I recently fixed some issue (adding LFS support) in glibc.

> > valgrind also has --partial-loads-ok (which in newer versions defaults
> > to =yes):
> > 
> >     Controls how Memcheck handles 32-, 64-, 128- and 256-bit naturally
> >     aligned loads from addresses for which some bytes are addressable
> >     and others are not.
> 
> Although this helps in some cases, it does not suffice in general since the
> problem can occur with 16-bit aligned loads. I think that is what happened
> with fts.c.
> 
> > Does anybody have an example program of the above issue compiled with
> > gcc that produces false positives with valgrind?
> > 
> 
> Sure, attached. On Fedora 24 x86-64 (GCC 6.1.1 20160621, valgrind 3.11.0),
> when I compile with "gcc -O2 flexouch.c" and run with "valgrind ./a.out",
> valgrind complains "Invalid read of size 2". This is because GCC compiles
> "p->d[0] == 2 && p->d[1] == 3" into "cmpw $770, 8(%rax); sete %al", which
> loads the uninitialized byte p->d[1] simultaneously with the initialized
> byte p->d[0].
> 
> As mentioned previously, although flexouch.c does not conform to C11, this
> is arguably a defect in C11.
> 

> #include <stddef.h>
> #include <stdlib.h>
> 
> struct s { struct s *next; char d[]; };
> 
> int
> main (void)
> {
>   struct s *p = malloc (offsetof (struct s, d) + 1);
>   p->d[0] = 1;
>   return p->d[0] == 2 && p->d[1] == 3;
> }

OK, I can replicate that with the same GCC when using -O2.

==25520== Invalid read of size 2
==25520==    at 0x400442: main (in /home/mark/src/tests/flexouch)
==25520==  Address 0x51fa048 is 8 bytes inside a block of size 9 alloc'd
==25520==    at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==25520==    by 0x40043D: main (in /home/mark/src/tests/flexouch)

Without optimization the p->d[1] == 3 is never executed.

Dump of assembler code for function main:
=> 0x0000000000400430 <+0>:	sub    $0x8,%rsp
   0x0000000000400434 <+4>:	mov    $0x9,%edi
   0x0000000000400439 <+9>:	callq  0x400410 <malloc@plt>
   0x000000000040043e <+14>:	movb   $0x1,0x8(%rax)
   0x0000000000400442 <+18>:	cmpw   $0x302,0x8(%rax)
   0x0000000000400448 <+24>:	sete   %al
   0x000000000040044b <+27>:	add    $0x8,%rsp
   0x000000000040044f <+31>:	movzbl %al,%eax
   0x0000000000400452 <+34>:	retq   
End of assembler dump.

Note that valgrind will also get this "wrong" if you do
allocate enough space, but don't initialize d[1]:

==25906== Syscall param exit_group(status) contains uninitialised byte(s)
==25906==    at 0x4F00CC8: _Exit (in /usr/lib64/libc-2.23.so)
==25906==    by 0x4E7119A: __run_exit_handlers (in /usr/lib64/libc-2.23.so)
==25906==    by 0x4E71234: exit (in /usr/lib64/libc-2.23.so)
==25906==    by 0x4E58737: (below main) (in /usr/lib64/libc-2.23.so)
==25906==  Uninitialised value was created by a heap allocation
==25906==    at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==25906==    by 0x40043D: main (in /home/mark/src/tests/flexouch)

I don't think there is anything valgrind can do to detect that
compw really only depends on d[0] if the result is false.

Cheers,

Mark
Florian Weimer Sept. 8, 2016, 11:58 a.m. UTC | #25
On 09/08/2016 01:56 PM, Mark Wielaard wrote:

> On Wed, Sep 07, 2016 at 04:21:29PM -0700, Paul Eggert wrote:
>> On 09/07/2016 04:52 AM, Mark Wielaard wrote:
>>> If valgrind believes the
>>> memory isn't in valid memory then it will complain about an invalid access.
>>> But if the memory is accessible but uninitialised then it will just track
>>> the undefinedness complain later if such a value is used.
>>
>> I think the former is what happened in Gnulib fts.c before Gnulib was fixed.
>
> BTW. Do gnulib and glibc syncronize? I had no idea gnulib also contained
> fts. I recently fixed some issue (adding LFS support) in glibc.

They are expected to synchronize.  It does not always happen.

Florian
Bernd Schmidt Sept. 8, 2016, 12:26 p.m. UTC | #26
On 09/08/2016 01:21 AM, Paul Eggert wrote:

> Sure, attached. On Fedora 24 x86-64 (GCC 6.1.1 20160621, valgrind
> 3.11.0), when I compile with "gcc -O2 flexouch.c" and run with "valgrind
> ./a.out", valgrind complains "Invalid read of size 2". This is because
> GCC compiles "p->d[0] == 2 && p->d[1] == 3" into "cmpw $770, 8(%rax);
> sete %al", which loads the uninitialized byte p->d[1] simultaneously
> with the initialized byte p->d[0].

Interesting. That optimization doesn't really depend on d being a 
flexible array, so you can also reproduce a (different) valgrind warning 
with the following:

#include <stddef.h>
#include <stdlib.h>

struct s { int x; char d[2]; };

__attribute__((noinline,noclone)) void foo (struct s *p)
{
   p->d[0] = 1;
}

int
main (void)
{
   struct s *p = malloc (sizeof *p);
   foo (p);
   return p->d[0] == 2 && p->d[1] == 3;
}


Bernd
Florian Weimer Sept. 8, 2016, 12:30 p.m. UTC | #27
On 09/08/2016 02:26 PM, Bernd Schmidt wrote:
> On 09/08/2016 01:21 AM, Paul Eggert wrote:
>
>> Sure, attached. On Fedora 24 x86-64 (GCC 6.1.1 20160621, valgrind
>> 3.11.0), when I compile with "gcc -O2 flexouch.c" and run with "valgrind
>> ./a.out", valgrind complains "Invalid read of size 2". This is because
>> GCC compiles "p->d[0] == 2 && p->d[1] == 3" into "cmpw $770, 8(%rax);
>> sete %al", which loads the uninitialized byte p->d[1] simultaneously
>> with the initialized byte p->d[0].
>
> Interesting. That optimization doesn't really depend on d being a
> flexible array, so you can also reproduce a (different) valgrind warning
> with the following:
>
> #include <stddef.h>
> #include <stdlib.h>
>
> struct s { int x; char d[2]; };
>
> __attribute__((noinline,noclone)) void foo (struct s *p)
> {
>   p->d[0] = 1;
> }
>
> int
> main (void)
> {
>   struct s *p = malloc (sizeof *p);
>   foo (p);
>   return p->d[0] == 2 && p->d[1] == 3;
> }

Very interesting.  So the ASan failure reported for gnulib fts and this 
valgrind issue have separate causes (ASan does not care about undefined 
memory).

Thanks,
Florian
Paul Eggert Sept. 8, 2016, 2:55 p.m. UTC | #28
On 09/08/2016 04:56 AM, Mark Wielaard wrote:
> I don't think there is anything valgrind can do to detect that
> compw really only depends on d[0] if the result is false.
>
valgrind (with the default --partial-loads-ok=yes) could and should do 
the same thing with cmpw that it already does with cmpl. The attached 
program compiles and runs OK with gcc -O2 and valgrind on x86-64, even 
though the machine code uses cmpl to load bytes that were not allocated.

> Do gnulib and glibc syncronize?

They do at times, though not as often as we'd like. fts.c in particular 
has strayed quite a bit, to cater GNU utilities' robustness needs over 
what glibc provides. (GNU coreutils, for example, uses Gnulib fts.c even 
on glibc platforms.) At some point somebody should merge Gnulib fts.c 
back into glibc.
#include <stddef.h>
#include <stdlib.h>

struct s { struct s *next; char d[]; };

int
main (void)
{
  struct s *p = malloc (offsetof (struct s, d) + 1);
  p->d[0] = 1;
  return p->d[0] == 2 && p->d[1] == 3 && p->d[2] == 4 && p->d[3] == 5;
}
diff mbox

Patch

Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 239987)
+++ gcc/c-family/c-common.c	(working copy)
@@ -12855,8 +12855,11 @@  scalar_to_vector (location_t loc, enum tree_code c
 bool
 cxx_fundamental_alignment_p  (unsigned align)
 {
-  return (align <=  MAX (TYPE_ALIGN (long_long_integer_type_node),
-			 TYPE_ALIGN (long_double_type_node)));
+  unsigned int max_align = MAX (TYPE_ALIGN (long_long_integer_type_node),
+				TYPE_ALIGN (long_double_type_node));
+  if (float128_type_node != NULL_TREE)
+    max_align = MAX (max_align, TYPE_ALIGN (float128_type_node));
+  return align <= max_align;
 }
 
 /* Return true if T is a pointer to a zero-sized aggregate.  */
Index: gcc/ginclude/stddef.h
===================================================================
--- gcc/ginclude/stddef.h	(revision 239987)
+++ gcc/ginclude/stddef.h	(working copy)
@@ -426,6 +426,14 @@  typedef __WINT_TYPE__ wint_t;
 typedef struct {
   long long __max_align_ll __attribute__((__aligned__(__alignof__(long long))));
   long double __max_align_ld __attribute__((__aligned__(__alignof__(long double))));
+  /* _Float128 is defined as a basic type, so max_align_t must be
+     sufficiently aligned for it.  This code must work in C++, so we
+     use __float128 here; that is only available on some
+     architectures, but only on i386 is extra alignment needed for
+     __float128.  */
+#ifdef __i386__
+  __float128 __max_align_f128 __attribute__((__aligned__(__alignof(__float128))));
+#endif
 } max_align_t;
 #endif
 #endif /* C11 or C++11.  */
Index: gcc/testsuite/gcc.dg/float128-align.c
===================================================================
--- gcc/testsuite/gcc.dg/float128-align.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/float128-align.c	(working copy)
@@ -0,0 +1,9 @@ 
+/* Test _Float128 alignment.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-add-options float128 } */
+/* { dg-require-effective-target float128 } */
+
+#define WIDTH 128
+#define EXT 0
+#include "floatn-align.h"
Index: gcc/testsuite/gcc.dg/float128x-align.c
===================================================================
--- gcc/testsuite/gcc.dg/float128x-align.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/float128x-align.c	(working copy)
@@ -0,0 +1,9 @@ 
+/* Test _Float128 alignment.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-add-options float128x } */
+/* { dg-require-effective-target float128x } */
+
+#define WIDTH 128
+#define EXT 1
+#include "floatn-align.h"
Index: gcc/testsuite/gcc.dg/float16-align.c
===================================================================
--- gcc/testsuite/gcc.dg/float16-align.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/float16-align.c	(working copy)
@@ -0,0 +1,9 @@ 
+/* Test _Float16 alignment.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-add-options float16 } */
+/* { dg-require-effective-target float16 } */
+
+#define WIDTH 16
+#define EXT 0
+#include "floatn-align.h"
Index: gcc/testsuite/gcc.dg/float32-align.c
===================================================================
--- gcc/testsuite/gcc.dg/float32-align.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/float32-align.c	(working copy)
@@ -0,0 +1,9 @@ 
+/* Test _Float32 alignment.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-add-options float32 } */
+/* { dg-require-effective-target float32 } */
+
+#define WIDTH 32
+#define EXT 0
+#include "floatn-align.h"
Index: gcc/testsuite/gcc.dg/float32x-align.c
===================================================================
--- gcc/testsuite/gcc.dg/float32x-align.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/float32x-align.c	(working copy)
@@ -0,0 +1,9 @@ 
+/* Test _Float32 alignment.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-add-options float32x } */
+/* { dg-require-effective-target float32x } */
+
+#define WIDTH 32
+#define EXT 1
+#include "floatn-align.h"
Index: gcc/testsuite/gcc.dg/float64-align.c
===================================================================
--- gcc/testsuite/gcc.dg/float64-align.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/float64-align.c	(working copy)
@@ -0,0 +1,9 @@ 
+/* Test _Float64 alignment.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-add-options float64 } */
+/* { dg-require-effective-target float64 } */
+
+#define WIDTH 64
+#define EXT 0
+#include "floatn-align.h"
Index: gcc/testsuite/gcc.dg/float64x-align.c
===================================================================
--- gcc/testsuite/gcc.dg/float64x-align.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/float64x-align.c	(working copy)
@@ -0,0 +1,9 @@ 
+/* Test _Float64 alignment.  */
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-add-options float64x } */
+/* { dg-require-effective-target float64x } */
+
+#define WIDTH 64
+#define EXT 1
+#include "floatn-align.h"
Index: gcc/testsuite/gcc.dg/floatn-align.h
===================================================================
--- gcc/testsuite/gcc.dg/floatn-align.h	(nonexistent)
+++ gcc/testsuite/gcc.dg/floatn-align.h	(working copy)
@@ -0,0 +1,18 @@ 
+/* Tests for _FloatN / _FloatNx types: test max_align_t alignment.
+   Before including this file, define WIDTH as the value N; define EXT
+   to 1 for _FloatNx and 0 for _FloatN.  */
+
+#define CONCATX(X, Y) X ## Y
+#define CONCAT(X, Y) CONCATX (X, Y)
+#define CONCAT3(X, Y, Z) CONCAT (CONCAT (X, Y), Z)
+
+#if EXT
+# define TYPE CONCAT3 (_Float, WIDTH, x)
+#else
+# define TYPE CONCAT (_Float, WIDTH)
+#endif
+
+#include <stddef.h>
+
+_Static_assert (_Alignof (max_align_t) >= _Alignof (TYPE),
+		"max_align_t must be at least as aligned as _Float* types");