diff mbox

c/67882 - improve -Warray-bounds for invalid offsetof

Message ID 56263F80.1090203@t-online.de
State New
Headers show

Commit Message

Bernd Schmidt Oct. 20, 2015, 1:20 p.m. UTC
On 10/16/2015 09:34 PM, Martin Sebor wrote:
> Thank you for the review. Attached is an updated patch that hopefully
> addresses all your comments. I ran the check_GNU_style.sh script on
> it to make sure I didn't miss something.  I've also added replies to
> a few of your comments below.

Ok, thanks. However - the logic around the context struct in the patch 
still seems fairly impenetrable to me, and I've been wondering whether a 
simpler approach wouldn't work just as well. I came up with the patch 
below, which just passes a tree_code context to recursive invocations 
and increasing the upper bound by one only if not in an ARRAY_REF or 
COMPONENT_REF context.

As far as I can tell, this produces the same warnings on the testcase 
(modulo differences in type printout), except for the final few FA5_7 
testcases, which I had doubts about anyway:

typedef struct FA5_7 {
   int i;
   char a5_7 [5][7];
} FA5_7;

     __builtin_offsetof (FA5_7, a5_7 [0][7]),         // { dg-warning 
"index" }
     __builtin_offsetof (FA5_7, a5_7 [1][7]),         // { dg-warning 
"index" }
     __builtin_offsetof (FA5_7, a5_7 [5][0]),         // { dg-warning 
"index" }
     __builtin_offsetof (FA5_7, a5_7 [5][7]),         // { dg-warning 
"index" }

Why wouldn't at least the first two be convered by the 
one-past-the-end-is-ok rule?


Bernd

Comments

Martin Sebor Oct. 20, 2015, 3:31 p.m. UTC | #1
On 10/20/2015 07:20 AM, Bernd Schmidt wrote:
> On 10/16/2015 09:34 PM, Martin Sebor wrote:
>> Thank you for the review. Attached is an updated patch that hopefully
>> addresses all your comments. I ran the check_GNU_style.sh script on
>> it to make sure I didn't miss something.  I've also added replies to
>> a few of your comments below.
>
> Ok, thanks. However - the logic around the context struct in the patch
> still seems fairly impenetrable to me, and I've been wondering whether a
> simpler approach wouldn't work just as well. I came up with the patch
> below, which just passes a tree_code context to recursive invocations
> and increasing the upper bound by one only if not in an ARRAY_REF or
> COMPONENT_REF context.
>
> As far as I can tell, this produces the same warnings on the testcase
> (modulo differences in type printout), except for the final few FA5_7
> testcases, which I had doubts about anyway:
>
> typedef struct FA5_7 {
>    int i;
>    char a5_7 [5][7];
> } FA5_7;
>
>      __builtin_offsetof (FA5_7, a5_7 [0][7]),         // { dg-warning
> "index" }
>      __builtin_offsetof (FA5_7, a5_7 [1][7]),         // { dg-warning
> "index" }
>      __builtin_offsetof (FA5_7, a5_7 [5][0]),         // { dg-warning
> "index" }
>      __builtin_offsetof (FA5_7, a5_7 [5][7]),         // { dg-warning
> "index" }
>
> Why wouldn't at least the first two be convered by the
> one-past-the-end-is-ok rule?

Thanks for taking the time to try to simplify the patch!

Diagnosing the cases above is at the core of the issue (and can
be seen by substituting a5_7 into the oversimplified test case
in the bug).

In a 5 by 7 array, the offset computed for the out-of-bounds
a_5_7[0][7] is the same as the offset computed for the in-bounds
a_5_7[1][0].  The result might be unexpected because it suggests
that two "distinct" elements of an array share the same offset.

The "one-past-the-end rule" applies only to the offset corresponding
to the address just past the end of the whole array because like its
address, the offset of the element just beyond the end of the array
is valid and cannot be mistaken for an offset of a valid element.

My initial approach was similar to the patch you attached.  Most
of the additional complexity (i.e., the recursive use of the context)
grew out of wanting to diagnose these less than trivial cases of
surprising offsets in multi-dimensional arrays.  I don't think it
can be done without passing some state down the recursive calls but
I'd be happy to be proven wrong.

FWIW, teh bug and my patch for it were precipitated by looking for
the problems behind PR 67872 - missing -Warray-bounds warning (which
was in turn prompted by developing and testing a solution for PR
67942 - diagnose placement new buffer overflow).

I think -Warray-bounds should emit consistent diagnostics for invalid
array references regardless of the contexts. I.e., given

     struct S {
         int A [5][7];
         int x;
     } s;

these should both be diagnosed:

     int i = offsetof (struct S, A [0][7]);

     int *p = &s.A [0][7];

because they are both undefined and both can lead to surprising
results when used.

With my patch for 67882, GCC diagnoses the first case. I hope to
work on 67872 in the near future to diagnose the second to bring
GCC on par with Clang. (Clang doesn't yet diagnose any offsetof
errors).

Martin
Bernd Schmidt Oct. 20, 2015, 3:48 p.m. UTC | #2
On 10/20/2015 05:31 PM, Martin Sebor wrote:
> On 10/20/2015 07:20 AM, Bernd Schmidt wrote:
>> On 10/16/2015 09:34 PM, Martin Sebor wrote:
>>> Thank you for the review. Attached is an updated patch that hopefully
>>> addresses all your comments. I ran the check_GNU_style.sh script on
>>> it to make sure I didn't miss something.  I've also added replies to
>>> a few of your comments below.
>>
>> Ok, thanks. However - the logic around the context struct in the patch
>> still seems fairly impenetrable to me, and I've been wondering whether a
>> simpler approach wouldn't work just as well. I came up with the patch
>> below, which just passes a tree_code context to recursive invocations
>> and increasing the upper bound by one only if not in an ARRAY_REF or
>> COMPONENT_REF context.
>>
>> As far as I can tell, this produces the same warnings on the testcase
>> (modulo differences in type printout), except for the final few FA5_7
>> testcases, which I had doubts about anyway:
>>
>> typedef struct FA5_7 {
>>    int i;
>>    char a5_7 [5][7];
>> } FA5_7;
>>
>>      __builtin_offsetof (FA5_7, a5_7 [0][7]),         // { dg-warning
>> "index" }
>>      __builtin_offsetof (FA5_7, a5_7 [1][7]),         // { dg-warning
>> "index" }
>>      __builtin_offsetof (FA5_7, a5_7 [5][0]),         // { dg-warning
>> "index" }
>>      __builtin_offsetof (FA5_7, a5_7 [5][7]),         // { dg-warning
>> "index" }
>>
>> Why wouldn't at least the first two be convered by the
>> one-past-the-end-is-ok rule?
>
> Thanks for taking the time to try to simplify the patch!
>
> Diagnosing the cases above is at the core of the issue (and can
> be seen by substituting a5_7 into the oversimplified test case
> in the bug).
>
> In a 5 by 7 array, the offset computed for the out-of-bounds
> a_5_7[0][7] is the same as the offset computed for the in-bounds
> a_5_7[1][0].  The result might be unexpected because it suggests
> that two "distinct" elements of an array share the same offset.

Yeah, but Joseph wrote:

   Given such a flexible array member, [anything][0] and
   [anything][1] are logically valid if [anything] is
   nonnegative (of course, the array might or might not
   have enough element at runtime).  [anything][2] is a
   reference to just past the end of an element of the
   flexible array member; [anything][3] is clearly invalid.

   Regarding the actual testcase, accepting a1_1[0][1] seems
   fine (it's referencing the just-past-end element of the
   valid array a1_1[0]).

So how's that different from a5_7[0][7]? In case it matters, a1_1 in the 
earlier test that was discussed could not be a flexible array because it 
was followed by another field in the same struct.

Handing over review duties to Joseph because he'll have to decide what 
to warn about and what not to.

>     struct S {
>         int A [5][7];
>         int x;
>     } s;
>
> these should both be diagnosed:
>
>     int i = offsetof (struct S, A [0][7]);
>
>     int *p = &s.A [0][7];
>
> because they are both undefined and both can lead to surprising
> results when used.

Is that really undefined? I thought it was explicitly allowed. 
Dereferencing the pointer is what's undefined.


Bernd

Bernd
Joseph Myers Oct. 20, 2015, 4:53 p.m. UTC | #3
On Tue, 20 Oct 2015, Martin Sebor wrote:

> I think -Warray-bounds should emit consistent diagnostics for invalid
> array references regardless of the contexts. I.e., given
> 
>     struct S {
>         int A [5][7];
>         int x;
>     } s;
> 
> these should both be diagnosed:
> 
>     int i = offsetof (struct S, A [0][7]);
> 
>     int *p = &s.A [0][7];
> 
> because they are both undefined and both can lead to surprising
> results when used.

But both are valid.  &s.A [0][7] means s.A[0] + 7 (as explicitly specified 
in C11, neither the & nor the [] is evaluated in this case, but the [] 
turns into a +), and s.A[0] is an object of type int[7], which decays to a 
pointer to the first element of that array, so adding 7 produces a 
just-past-end pointer.  It's not valid to dereference that pointer, but 
the pointer itself is valid (and subtracting 1 from it produces a pointer 
you can dereference).
Martin Sebor Oct. 20, 2015, 4:53 p.m. UTC | #4
On 10/20/2015 09:48 AM, Bernd Schmidt wrote:
>
>
> On 10/20/2015 05:31 PM, Martin Sebor wrote:
>> On 10/20/2015 07:20 AM, Bernd Schmidt wrote:
>>> On 10/16/2015 09:34 PM, Martin Sebor wrote:
>>>> Thank you for the review. Attached is an updated patch that hopefully
>>>> addresses all your comments. I ran the check_GNU_style.sh script on
>>>> it to make sure I didn't miss something.  I've also added replies to
>>>> a few of your comments below.
>>>
>>> Ok, thanks. However - the logic around the context struct in the patch
>>> still seems fairly impenetrable to me, and I've been wondering whether a
>>> simpler approach wouldn't work just as well. I came up with the patch
>>> below, which just passes a tree_code context to recursive invocations
>>> and increasing the upper bound by one only if not in an ARRAY_REF or
>>> COMPONENT_REF context.
>>>
>>> As far as I can tell, this produces the same warnings on the testcase
>>> (modulo differences in type printout), except for the final few FA5_7
>>> testcases, which I had doubts about anyway:
>>>
>>> typedef struct FA5_7 {
>>>    int i;
>>>    char a5_7 [5][7];
>>> } FA5_7;
>>>
>>>      __builtin_offsetof (FA5_7, a5_7 [0][7]),         // { dg-warning
>>> "index" }
>>>      __builtin_offsetof (FA5_7, a5_7 [1][7]),         // { dg-warning
>>> "index" }
>>>      __builtin_offsetof (FA5_7, a5_7 [5][0]),         // { dg-warning
>>> "index" }
>>>      __builtin_offsetof (FA5_7, a5_7 [5][7]),         // { dg-warning
>>> "index" }
>>>
>>> Why wouldn't at least the first two be convered by the
>>> one-past-the-end-is-ok rule?
>>
>> Thanks for taking the time to try to simplify the patch!
>>
>> Diagnosing the cases above is at the core of the issue (and can
>> be seen by substituting a5_7 into the oversimplified test case
>> in the bug).
>>
>> In a 5 by 7 array, the offset computed for the out-of-bounds
>> a_5_7[0][7] is the same as the offset computed for the in-bounds
>> a_5_7[1][0].  The result might be unexpected because it suggests
>> that two "distinct" elements of an array share the same offset.
>
> Yeah, but Joseph wrote:
>
>    Given such a flexible array member, [anything][0] and
>    [anything][1] are logically valid if [anything] is
>    nonnegative (of course, the array might or might not
>    have enough element at runtime).  [anything][2] is a
>    reference to just past the end of an element of the
>    flexible array member; [anything][3] is clearly invalid.
>
>    Regarding the actual testcase, accepting a1_1[0][1] seems
>    fine (it's referencing the just-past-end element of the
>    valid array a1_1[0]).
>
> So how's that different from a5_7[0][7]? In case it matters, a1_1 in the
> earlier test that was discussed could not be a flexible array because it
> was followed by another field in the same struct.

Let me try to explain using a couple of examples. Given

   struct S {
       a1_1 [1][1];
       int x;
   } s;

where a1_1 is an ordinary array of a single element

   offseof (struct S, a1_1[0][1])

is accepted as a GCC extension (and, IMO, should be made valid
in C) because a) there is no way to mistake the offset for the
offset of an element of the array at a valid index, and because
b) it yields to the same value as the following valid expression:

   (char*)&s.a1_1[0][1] - (char*)&s.a1_1;

However, given

   struct S {
       a5_7 [5][7];
       int x;
   } s;

the invalid offsetof expression

   offseof (struct S, a5_7[0][7])

should be diagnosed because a) it yields the same offset of the
valid

   offseof (struct S, a5_7[1][0])

and b) because s.a5_7[0][7] is not a valid reference in any context
(i.e., (char*)&s.a5_7[0][7] - (char*)&s.a5_7 is not a valid expression).

>
> Handing over review duties to Joseph because he'll have to decide what
> to warn about and what not to.
>
>>     struct S {
>>         int A [5][7];
>>         int x;
>>     } s;
>>
>> these should both be diagnosed:
>>
>>     int i = offsetof (struct S, A [0][7]);
>>
>>     int *p = &s.A [0][7];
>>
>> because they are both undefined and both can lead to surprising
>> results when used.
>
> Is that really undefined? I thought it was explicitly allowed.
> Dereferencing the pointer is what's undefined.

Both are undefined. The most succinct statements to this effect are
in the Unde3fined Behavior section of Annex J:

   An array subscript is out of range, even if an object is apparently
   accessible with the given subscript (as in the lvalue expression
   a[1][7] given the declaration int a[4][5]) (6.5.6).

   The member designator parameter of an offsetof macro is an
   invalid right operand of the . operator for the type parameter,
   or designates a bit-field (7.19).

Martin
Joseph Myers Oct. 20, 2015, 4:57 p.m. UTC | #5
On Tue, 20 Oct 2015, Martin Sebor wrote:

>   An array subscript is out of range, even if an object is apparently
>   accessible with the given subscript (as in the lvalue expression
>   a[1][7] given the declaration int a[4][5]) (6.5.6).

Just-past-the-end is only out of range if the dereference is executed, not 
in the &array[size] case which is equivalent to array + size.
Bernd Schmidt Oct. 20, 2015, 8:35 p.m. UTC | #6
On 10/20/2015 06:53 PM, Joseph Myers wrote:
> On Tue, 20 Oct 2015, Martin Sebor wrote:
>
>> I think -Warray-bounds should emit consistent diagnostics for invalid
>> array references regardless of the contexts. I.e., given
>>
>>      struct S {
>>          int A [5][7];
>>          int x;
>>      } s;
>>
>> these should both be diagnosed:
>>
>>      int i = offsetof (struct S, A [0][7]);
>>
>>      int *p = &s.A [0][7];
>>
>> because they are both undefined and both can lead to surprising
>> results when used.
>
> But both are valid.  &s.A [0][7] means s.A[0] + 7 (as explicitly specified
> in C11, neither the & nor the [] is evaluated in this case, but the []
> turns into a +), and s.A[0] is an object of type int[7], which decays to a
> pointer to the first element of that array, so adding 7 produces a
> just-past-end pointer.  It's not valid to dereference that pointer, but
> the pointer itself is valid (and subtracting 1 from it produces a pointer
> you can dereference).

Thanks, Joseph, this is helpful. There are a few other cases that I'm 
uncertain about and which might require additional changes to the patch; 
the one I sent was an experiment more than a finished product.

Consider

struct t { int a; int b; };
struct A { struct t v[2]; } a;

So I think we've established that
   &a.v[2]
is valid, giving a pointer just past the end of the structure. How about
   &a.v[2].a
and
   &a.v[2].b
The first of these gives the same pointer just past the end of the 
array, while the second one gives a higher address. I would expect that 
the second one is invalid, but I'm not so sure about the first one. 
Syntactically we have an access to an out-of-bounds field, but the whole 
expression just computes the valid one-past-the-end address.

I think this has an impact on the tests I quoted in my last mail:

typedef struct FA5_7 {
   int i;
   char a5_7 [5][7];
} FA5_7;

     __builtin_offsetof (FA5_7, a5_7 [0][7]),         // { dg-warning 
"index" }
     __builtin_offsetof (FA5_7, a5_7 [1][7]),         // { dg-warning 
"index" }
     __builtin_offsetof (FA5_7, a5_7 [5][0]),         // { dg-warning 
"index" }
     __builtin_offsetof (FA5_7, a5_7 [5][7]),         // { dg-warning 
"index" }

Here I think the last one of these is most likely invalid (being 8 bytes 
past the end of the object, rather than just one) and the others valid. 
Can you confirm this? (If the &a.v[2].a example is considered invalid, 
then I think the a5_7[5][0] test would be the equivalent and ought to 
also be considered invalid).

It might turn out that we have to do something like pass a pointer to an 
"at_end" boolean to recursive invocations, which would set it to true if 
the expression dealt with by the caller should not increase the address 
any further. Then, when folding the addition we check that the offset we 
have is either zero or we don't have at_end, and warn otherwise.


Bernd
Joseph Myers Oct. 20, 2015, 10:10 p.m. UTC | #7
On Tue, 20 Oct 2015, Bernd Schmidt wrote:

> Consider
> 
> struct t { int a; int b; };
> struct A { struct t v[2]; } a;
> 
> So I think we've established that
>   &a.v[2]
> is valid, giving a pointer just past the end of the structure. How about
>   &a.v[2].a
> and
>   &a.v[2].b
> The first of these gives the same pointer just past the end of the array,
> while the second one gives a higher address. I would expect that the second
> one is invalid, but I'm not so sure about the first one. Syntactically we have
> an access to an out-of-bounds field, but the whole expression just computes
> the valid one-past-the-end address.

I don't think either is valid.  The address operator '&' requires "a 
function designator, the result of a [] or unary * operator, or an lvalue 
that designates an object".  So because a.v[2].a does not designate an 
object, there is undefined behavior.  The special case for [] allows the 
address of a just-past-end array element to be taken, but that doesn't 
apply here.

> I think this has an impact on the tests I quoted in my last mail:
> 
> typedef struct FA5_7 {
>   int i;
>   char a5_7 [5][7];
> } FA5_7;
> 
>     __builtin_offsetof (FA5_7, a5_7 [0][7]),         // { dg-warning "index" }
>     __builtin_offsetof (FA5_7, a5_7 [1][7]),         // { dg-warning "index" }
>     __builtin_offsetof (FA5_7, a5_7 [5][0]),         // { dg-warning "index" }
>     __builtin_offsetof (FA5_7, a5_7 [5][7]),         // { dg-warning "index" }
> 
> Here I think the last one of these is most likely invalid (being 8 bytes past
> the end of the object, rather than just one) and the others valid. Can you
> confirm this? (If the &a.v[2].a example is considered invalid, then I think
> the a5_7[5][0] test would be the equivalent and ought to also be considered
> invalid).

The last one is certainly invalid.  The one before is arguably invalid as 
well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to 
a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5] 
from array to pointer - an array expression gets converted to an 
expression "that points to the initial element of the array object", but 
there is no array object a5_7[5] here).
Bernd Schmidt Oct. 23, 2015, 11:13 a.m. UTC | #8
On 10/21/2015 12:10 AM, Joseph Myers wrote:
> On Tue, 20 Oct 2015, Bernd Schmidt wrote:
>>  How about
>>    &a.v[2].a
>> and
>>    &a.v[2].b
>
> I don't think either is valid.

>> typedef struct FA5_7 {
>>    int i;
>>    char a5_7 [5][7];
>> } FA5_7;
>>
>>      __builtin_offsetof (FA5_7, a5_7 [0][7]),         // { dg-warning "index" }
>>      __builtin_offsetof (FA5_7, a5_7 [1][7]),         // { dg-warning "index" }
>>      __builtin_offsetof (FA5_7, a5_7 [5][0]),         // { dg-warning "index" }
>>      __builtin_offsetof (FA5_7, a5_7 [5][7]),         // { dg-warning "index" }
>
> The last one is certainly invalid.  The one before is arguably invalid as
> well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to
> a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5]
> from array to pointer - an array expression gets converted to an
> expression "that points to the initial element of the array object", but
> there is no array object a5_7[5] here).

Martin, is this something you're working on, or should I have a go?


Bernd
Martin Sebor Oct. 23, 2015, 3:14 p.m. UTC | #9
On 10/23/2015 05:13 AM, Bernd Schmidt wrote:
> On 10/21/2015 12:10 AM, Joseph Myers wrote:
>> On Tue, 20 Oct 2015, Bernd Schmidt wrote:
>>>  How about
>>>    &a.v[2].a
>>> and
>>>    &a.v[2].b
>>
>> I don't think either is valid.
>
>>> typedef struct FA5_7 {
>>>    int i;
>>>    char a5_7 [5][7];
>>> } FA5_7;
>>>
>>>      __builtin_offsetof (FA5_7, a5_7 [0][7]),         // { dg-warning
>>> "index" }
>>>      __builtin_offsetof (FA5_7, a5_7 [1][7]),         // { dg-warning
>>> "index" }
>>>      __builtin_offsetof (FA5_7, a5_7 [5][0]),         // { dg-warning
>>> "index" }
>>>      __builtin_offsetof (FA5_7, a5_7 [5][7]),         // { dg-warning
>>> "index" }
>>
>> The last one is certainly invalid.  The one before is arguably invalid as
>> well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to
>> a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5]
>> from array to pointer - an array expression gets converted to an
>> expression "that points to the initial element of the array object", but
>> there is no array object a5_7[5] here).
>
> Martin, is this something you're working on, or should I have a go?

I thought I made all the tweaks to these test cases in the last patch:

   https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01946.html

But now that I'm re-reading the answer above I see that Joseph
was suggesting that a5_7[5][0] should be diagnosed when the patch
accepts it as an extension.  I think we do want to accept it
because a5_7 is treated as a flexible array member (as an extension)
and so the upper bound of the major index is unknown. I.e., FA5_7
is defined like so:

   typedef struct FA5_7 {
       int i;
       char a5_7 [5][7];   // treated as char [][7]
    } FA5_7;

Martin
Joseph Myers Oct. 23, 2015, 4:50 p.m. UTC | #10
On Fri, 23 Oct 2015, Martin Sebor wrote:

> But now that I'm re-reading the answer above I see that Joseph
> was suggesting that a5_7[5][0] should be diagnosed when the patch
> accepts it as an extension.  I think we do want to accept it
> because a5_7 is treated as a flexible array member (as an extension)
> and so the upper bound of the major index is unknown. I.e., FA5_7
> is defined like so:

If you treat it as a flexible array member, then, yes, it would be valid.
Bernd Schmidt Oct. 23, 2015, 5:45 p.m. UTC | #11
On 10/23/2015 06:50 PM, Joseph Myers wrote:
> On Fri, 23 Oct 2015, Martin Sebor wrote:
>
>> But now that I'm re-reading the answer above I see that Joseph
>> was suggesting that a5_7[5][0] should be diagnosed when the patch
>> accepts it as an extension.  I think we do want to accept it
>> because a5_7 is treated as a flexible array member (as an extension)
>> and so the upper bound of the major index is unknown. I.e., FA5_7
>> is defined like so:
>
> If you treat it as a flexible array member, then, yes, it would be valid.

Ok, let's install the patch as-is, and postpone the discussion of 
whether that is a valid flexible array member (I certainly wouldn't have 
guessed so from the documentation which only mentions [], [0] and [1] as 
valid cases).

I guess this is a case where I could say either "I wrote the patch" or 
"I requested changes to a patch in review"; in the latter case I can 
approve it. Joseph seems on board with what we've discussed, so I'd say 
please wait until Tuesday for objections then commit.


Bernd
Martin Sebor Oct. 23, 2015, 8:40 p.m. UTC | #12
On 10/23/2015 11:45 AM, Bernd Schmidt wrote:
> On 10/23/2015 06:50 PM, Joseph Myers wrote:
>> On Fri, 23 Oct 2015, Martin Sebor wrote:
>>
>>> But now that I'm re-reading the answer above I see that Joseph
>>> was suggesting that a5_7[5][0] should be diagnosed when the patch
>>> accepts it as an extension.  I think we do want to accept it
>>> because a5_7 is treated as a flexible array member (as an extension)
>>> and so the upper bound of the major index is unknown. I.e., FA5_7
>>> is defined like so:
>>
>> If you treat it as a flexible array member, then, yes, it would be valid.
>
> Ok, let's install the patch as-is, and postpone the discussion of
> whether that is a valid flexible array member (I certainly wouldn't have
> guessed so from the documentation which only mentions [], [0] and [1] as
> valid cases).

The original code deliberately avoids diagnosing the case of last
array members with bounds greater than 1 (see the comment about
"a poor man's flexible array member" added with a fix for bug
41935) and I didn't want to change that.

But if there is sentiment for tightening it up I would be very
much in favor. IMO, it would be ideal if we could agree on and
apply the same rules for offsetof as for other expressions (and
diagnose, for example, &a5_7[5][0], which currently isn't
diagnosed).

As I mentioned, I'm planning to work on bug 67872 and it would
be helpful to know what our rules are up front. I can go back
and update this patch after it's been committed if the rules
evolve between now and then.

>
> I guess this is a case where I could say either "I wrote the patch" or
> "I requested changes to a patch in review"; in the latter case I can
> approve it. Joseph seems on board with what we've discussed, so I'd say
> please wait until Tuesday for objections then commit.

Okay.

Martin
Bernd Schmidt Oct. 26, 2015, 11:40 a.m. UTC | #13
On 10/23/2015 10:40 PM, Martin Sebor wrote:
>
> The original code deliberately avoids diagnosing the case of last
> array members with bounds greater than 1 (see the comment about
> "a poor man's flexible array member" added with a fix for bug
> 41935) and I didn't want to change that.

Jakub added that, Cc'd. Do you recall why this was done?

> But if there is sentiment for tightening it up I would be very
> much in favor.

I'd be in favor, but this is Joseph's call really.


Bernd
Jakub Jelinek Oct. 26, 2015, 11:47 a.m. UTC | #14
On Mon, Oct 26, 2015 at 12:40:18PM +0100, Bernd Schmidt wrote:
> On 10/23/2015 10:40 PM, Martin Sebor wrote:
> >
> >The original code deliberately avoids diagnosing the case of last
> >array members with bounds greater than 1 (see the comment about
> >"a poor man's flexible array member" added with a fix for bug
> >41935) and I didn't want to change that.
> 
> Jakub added that, Cc'd. Do you recall why this was done?

Because the amount of code that uses this (including GCC itself) is just too
huge, so everywhere in the middle-end we also special case last array members of
structs.  While C99+ has flexible array members, e.g. C++ does not, so users
are left with using struct { ... type fld[1]; };
or similar to stand for the flexible array members, even when they want to
be able to use ->fld[3] etc.

	Jakub
Bernd Schmidt Oct. 26, 2015, 11:57 a.m. UTC | #15
On 10/26/2015 12:47 PM, Jakub Jelinek wrote:

> Because the amount of code that uses this (including GCC itself) is just too
> huge, so everywhere in the middle-end we also special case last array members of
> structs.  While C99+ has flexible array members, e.g. C++ does not, so users
> are left with using struct { ... type fld[1]; };

Yes, and that case is documented. However, the issue is arrays declared 
with a larger size than 1 or 0 - is there really code using them as 
flexible array members?


Bernd
Jakub Jelinek Oct. 26, 2015, 12:01 p.m. UTC | #16
On Mon, Oct 26, 2015 at 12:57:53PM +0100, Bernd Schmidt wrote:
> On 10/26/2015 12:47 PM, Jakub Jelinek wrote:
> 
> >Because the amount of code that uses this (including GCC itself) is just too
> >huge, so everywhere in the middle-end we also special case last array members of
> >structs.  While C99+ has flexible array members, e.g. C++ does not, so users
> >are left with using struct { ... type fld[1]; };
> 
> Yes, and that case is documented. However, the issue is arrays declared with
> a larger size than 1 or 0 - is there really code using them as flexible
> array members?

I believe so, though don't have pointers to that right now.  But vaguely
remember we saw various cases of using 2 or other values too.

	Jakub
Richard Biener Oct. 26, 2015, 12:27 p.m. UTC | #17
On Mon, Oct 26, 2015 at 1:01 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Oct 26, 2015 at 12:57:53PM +0100, Bernd Schmidt wrote:
>> On 10/26/2015 12:47 PM, Jakub Jelinek wrote:
>>
>> >Because the amount of code that uses this (including GCC itself) is just too
>> >huge, so everywhere in the middle-end we also special case last array members of
>> >structs.  While C99+ has flexible array members, e.g. C++ does not, so users
>> >are left with using struct { ... type fld[1]; };
>>
>> Yes, and that case is documented. However, the issue is arrays declared with
>> a larger size than 1 or 0 - is there really code using them as flexible
>> array members?
>
> I believe so, though don't have pointers to that right now.  But vaguely
> remember we saw various cases of using 2 or other values too.

Yes, char[4] is quite common (basically making sure there is no appearant
padding behind the array due to alignment - just in case compilers might
be clever because of that).

Richard.

>         Jakub
Bernd Schmidt Oct. 27, 2015, 11:13 a.m. UTC | #18
On 10/26/2015 01:27 PM, Richard Biener wrote:
> On Mon, Oct 26, 2015 at 1:01 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Mon, Oct 26, 2015 at 12:57:53PM +0100, Bernd Schmidt wrote:
>>> On 10/26/2015 12:47 PM, Jakub Jelinek wrote:
>>>
>>>> Because the amount of code that uses this (including GCC itself) is just too
>>>> huge, so everywhere in the middle-end we also special case last array members of
>>>> structs.  While C99+ has flexible array members, e.g. C++ does not, so users
>>>> are left with using struct { ... type fld[1]; };
>>>
>>> Yes, and that case is documented. However, the issue is arrays declared with
>>> a larger size than 1 or 0 - is there really code using them as flexible
>>> array members?
>>
>> I believe so, though don't have pointers to that right now.  But vaguely
>> remember we saw various cases of using 2 or other values too.
>
> Yes, char[4] is quite common (basically making sure there is no appearant
> padding behind the array due to alignment - just in case compilers might
> be clever because of that).

Ugh, how ugly. Can we at least agree not to allow multi-dimensional 
arrays with a size larger than one to be considered flexible?


Bernd
Martin Sebor Nov. 3, 2015, 7:14 p.m. UTC | #19
>> I guess this is a case where I could say either "I wrote the patch" or
>> "I requested changes to a patch in review"; in the latter case I can
>> approve it. Joseph seems on board with what we've discussed, so I'd say
>> please wait until Tuesday for objections then commit.

I didn't get to committing the patch last week but I have done so
just now.

Martin
Segher Boessenkool Nov. 7, 2015, 11:38 p.m. UTC | #20
On Tue, Oct 20, 2015 at 10:10:44PM +0000, Joseph Myers wrote:
> > typedef struct FA5_7 {
> >   int i;
> >   char a5_7 [5][7];
> > } FA5_7;
> > 
> >     __builtin_offsetof (FA5_7, a5_7 [0][7]),         // { dg-warning "index" }
> >     __builtin_offsetof (FA5_7, a5_7 [1][7]),         // { dg-warning "index" }
> >     __builtin_offsetof (FA5_7, a5_7 [5][0]),         // { dg-warning "index" }
> >     __builtin_offsetof (FA5_7, a5_7 [5][7]),         // { dg-warning "index" }
> > 
> > Here I think the last one of these is most likely invalid (being 8 bytes past
> > the end of the object, rather than just one) and the others valid. Can you
> > confirm this? (If the &a.v[2].a example is considered invalid, then I think
> > the a5_7[5][0] test would be the equivalent and ought to also be considered
> > invalid).
> 
> The last one is certainly invalid.  The one before is arguably invalid as 
> well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to 
> a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5] 
> from array to pointer - an array expression gets converted to an 
> expression "that points to the initial element of the array object", but 
> there is no array object a5_7[5] here).

C11, 6.5.2.1/3:
Successive subscript operators designate an element of a
multidimensional array object. If E is an n-dimensional array (n >= 2)
with dimensions i x j x . . . x k, then E (used as other than an lvalue)
is converted to a pointer to an (n - 1)-dimensional array with
dimensions j x . . . x k. If the unary * operator is applied to this
pointer explicitly, or implicitly as a result of subscripting, the
result is the referenced (n - 1)-dimensional array, which itself is
converted into a pointer if used as other than an lvalue. It follows
from this that arrays are stored in row-major order (last subscript
varies fastest).

As far as I see, a5_7[5] here is never treated as an array, just as a
pointer, and &a5_7[5][0] is valid.


Segher
Martin Sebor Nov. 9, 2015, 10:46 p.m. UTC | #21
On 11/07/2015 04:38 PM, Segher Boessenkool wrote:
> On Tue, Oct 20, 2015 at 10:10:44PM +0000, Joseph Myers wrote:
>>> typedef struct FA5_7 {
>>>    int i;
>>>    char a5_7 [5][7];
>>> } FA5_7;
>>>
>>>      __builtin_offsetof (FA5_7, a5_7 [0][7]),         // { dg-warning "index" }
>>>      __builtin_offsetof (FA5_7, a5_7 [1][7]),         // { dg-warning "index" }
>>>      __builtin_offsetof (FA5_7, a5_7 [5][0]),         // { dg-warning "index" }
>>>      __builtin_offsetof (FA5_7, a5_7 [5][7]),         // { dg-warning "index" }
>>>
>>> Here I think the last one of these is most likely invalid (being 8 bytes past
>>> the end of the object, rather than just one) and the others valid. Can you
>>> confirm this? (If the &a.v[2].a example is considered invalid, then I think
>>> the a5_7[5][0] test would be the equivalent and ought to also be considered
>>> invalid).
>>
>> The last one is certainly invalid.  The one before is arguably invalid as
>> well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to
>> a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5]
>> from array to pointer - an array expression gets converted to an
>> expression "that points to the initial element of the array object", but
>> there is no array object a5_7[5] here).
>
> C11, 6.5.2.1/3:
> Successive subscript operators designate an element of a
> multidimensional array object. If E is an n-dimensional array (n >= 2)
> with dimensions i x j x . . . x k, then E (used as other than an lvalue)
> is converted to a pointer to an (n - 1)-dimensional array with
> dimensions j x . . . x k. If the unary * operator is applied to this
> pointer explicitly, or implicitly as a result of subscripting, the
> result is the referenced (n - 1)-dimensional array, which itself is
> converted into a pointer if used as other than an lvalue. It follows
> from this that arrays are stored in row-major order (last subscript
> varies fastest).
>
> As far as I see, a5_7[5] here is never treated as an array, just as a
> pointer, and &a5_7[5][0] is valid.

Segher and I discussed this briefly on IRC over the weekend and
I agreed to try to get a confirmation of the interpretation the
warning is based on from WG14. I'll report back what I learn
(if anything). I defer to Bernd and Joseph as to whether to make
any changes in the meantime.

Martin
Joseph Myers Nov. 10, 2015, 12:02 a.m. UTC | #22
On Sat, 7 Nov 2015, Segher Boessenkool wrote:

> > The last one is certainly invalid.  The one before is arguably invalid as 
> > well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to 
> > a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5] 
> > from array to pointer - an array expression gets converted to an 
> > expression "that points to the initial element of the array object", but 
> > there is no array object a5_7[5] here).
> 
> C11, 6.5.2.1/3:
> Successive subscript operators designate an element of a
> multidimensional array object. If E is an n-dimensional array (n >= 2)
> with dimensions i x j x . . . x k, then E (used as other than an lvalue)
> is converted to a pointer to an (n - 1)-dimensional array with
> dimensions j x . . . x k. If the unary * operator is applied to this
> pointer explicitly, or implicitly as a result of subscripting, the
> result is the referenced (n - 1)-dimensional array, which itself is
> converted into a pointer if used as other than an lvalue. It follows
> from this that arrays are stored in row-major order (last subscript
> varies fastest).
> 
> As far as I see, a5_7[5] here is never treated as an array, just as a
> pointer, and &a5_7[5][0] is valid.

As usual, based on taking the address, not offsetof where there's the open 
question of whether the C standard actually requires support for anything 
other than a single element name there:

a5_7[5] is an expression of array type.  The only way for it to be treated 
as a pointer is for it to be converted implicitly to pointer type.  That 
implicit conversion is what I think is problematic.

Only once the implicit conversion has taken place do the special rules 
about &A[B] meaning A + B take effect.  But since the problem I see is 
with the conversion of A to a pointer, you still have undefined behavior.

The paragraph you quote seems to not to add anything to the semantics 
defined elsewhere in the standard; it's purely descriptive of some 
consequences of those semantics.

Whether we wish to be more permissive about some such cases (depending on 
-Warray-bounds=N) is a pragmatic matter depending on the extent to which 
they are used in practice.
diff mbox

Patch

Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	(revision 229049)
+++ gcc/c-family/c-common.c	(working copy)
@@ -10589,11 +10589,11 @@  c_common_to_target_charset (HOST_WIDE_IN
    traditional rendering of offsetof as a macro.  Return the folded result.  */
 
 tree
-fold_offsetof_1 (tree expr)
+fold_offsetof_1 (tree expr, enum tree_code ctx)
 {
   tree base, off, t;
-
-  switch (TREE_CODE (expr))
+  tree_code code = TREE_CODE (expr);
+  switch (code)
     {
     case ERROR_MARK:
       return expr;
@@ -10617,7 +10617,7 @@  fold_offsetof_1 (tree expr)
       return TREE_OPERAND (expr, 0);
 
     case COMPONENT_REF:
-      base = fold_offsetof_1 (TREE_OPERAND (expr, 0));
+      base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
       if (base == error_mark_node)
 	return base;
 
@@ -10634,7 +10634,7 @@  fold_offsetof_1 (tree expr)
       break;
 
     case ARRAY_REF:
-      base = fold_offsetof_1 (TREE_OPERAND (expr, 0));
+      base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code);
       if (base == error_mark_node)
 	return base;
 
@@ -10649,8 +10649,9 @@  fold_offsetof_1 (tree expr)
 	      && !tree_int_cst_equal (upbound,
 				      TYPE_MAX_VALUE (TREE_TYPE (upbound))))
 	    {
-	      upbound = size_binop (PLUS_EXPR, upbound,
-				    build_int_cst (TREE_TYPE (upbound), 1));
+	      if (ctx != ARRAY_REF && ctx != COMPONENT_REF)
+		upbound = size_binop (PLUS_EXPR, upbound,
+				      build_int_cst (TREE_TYPE (upbound), 1));
 	      if (tree_int_cst_lt (upbound, t))
 		{
 		  tree v;
Index: gcc/c-family/c-common.h
===================================================================
--- gcc/c-family/c-common.h	(revision 229049)
+++ gcc/c-family/c-common.h	(working copy)
@@ -1029,7 +1029,7 @@  extern bool c_dump_tree (void *, tree);
 
 extern void verify_sequence_points (tree);
 
-extern tree fold_offsetof_1 (tree);
+extern tree fold_offsetof_1 (tree, tree_code ctx = ERROR_MARK);
 extern tree fold_offsetof (tree);
 
 /* Places where an lvalue, or modifiable lvalue, may be required.