Message ID | 56263F80.1090203@t-online.de |
---|---|
State | New |
Headers | show |
On 10/20/2015 07:20 AM, Bernd Schmidt wrote: > On 10/16/2015 09:34 PM, Martin Sebor wrote: >> Thank you for the review. Attached is an updated patch that hopefully >> addresses all your comments. I ran the check_GNU_style.sh script on >> it to make sure I didn't miss something. I've also added replies to >> a few of your comments below. > > Ok, thanks. However - the logic around the context struct in the patch > still seems fairly impenetrable to me, and I've been wondering whether a > simpler approach wouldn't work just as well. I came up with the patch > below, which just passes a tree_code context to recursive invocations > and increasing the upper bound by one only if not in an ARRAY_REF or > COMPONENT_REF context. > > As far as I can tell, this produces the same warnings on the testcase > (modulo differences in type printout), except for the final few FA5_7 > testcases, which I had doubts about anyway: > > typedef struct FA5_7 { > int i; > char a5_7 [5][7]; > } FA5_7; > > __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning > "index" } > __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning > "index" } > __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning > "index" } > __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning > "index" } > > Why wouldn't at least the first two be convered by the > one-past-the-end-is-ok rule? Thanks for taking the time to try to simplify the patch! Diagnosing the cases above is at the core of the issue (and can be seen by substituting a5_7 into the oversimplified test case in the bug). In a 5 by 7 array, the offset computed for the out-of-bounds a_5_7[0][7] is the same as the offset computed for the in-bounds a_5_7[1][0]. The result might be unexpected because it suggests that two "distinct" elements of an array share the same offset. The "one-past-the-end rule" applies only to the offset corresponding to the address just past the end of the whole array because like its address, the offset of the element just beyond the end of the array is valid and cannot be mistaken for an offset of a valid element. My initial approach was similar to the patch you attached. Most of the additional complexity (i.e., the recursive use of the context) grew out of wanting to diagnose these less than trivial cases of surprising offsets in multi-dimensional arrays. I don't think it can be done without passing some state down the recursive calls but I'd be happy to be proven wrong. FWIW, teh bug and my patch for it were precipitated by looking for the problems behind PR 67872 - missing -Warray-bounds warning (which was in turn prompted by developing and testing a solution for PR 67942 - diagnose placement new buffer overflow). I think -Warray-bounds should emit consistent diagnostics for invalid array references regardless of the contexts. I.e., given struct S { int A [5][7]; int x; } s; these should both be diagnosed: int i = offsetof (struct S, A [0][7]); int *p = &s.A [0][7]; because they are both undefined and both can lead to surprising results when used. With my patch for 67882, GCC diagnoses the first case. I hope to work on 67872 in the near future to diagnose the second to bring GCC on par with Clang. (Clang doesn't yet diagnose any offsetof errors). Martin
On 10/20/2015 05:31 PM, Martin Sebor wrote: > On 10/20/2015 07:20 AM, Bernd Schmidt wrote: >> On 10/16/2015 09:34 PM, Martin Sebor wrote: >>> Thank you for the review. Attached is an updated patch that hopefully >>> addresses all your comments. I ran the check_GNU_style.sh script on >>> it to make sure I didn't miss something. I've also added replies to >>> a few of your comments below. >> >> Ok, thanks. However - the logic around the context struct in the patch >> still seems fairly impenetrable to me, and I've been wondering whether a >> simpler approach wouldn't work just as well. I came up with the patch >> below, which just passes a tree_code context to recursive invocations >> and increasing the upper bound by one only if not in an ARRAY_REF or >> COMPONENT_REF context. >> >> As far as I can tell, this produces the same warnings on the testcase >> (modulo differences in type printout), except for the final few FA5_7 >> testcases, which I had doubts about anyway: >> >> typedef struct FA5_7 { >> int i; >> char a5_7 [5][7]; >> } FA5_7; >> >> __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning >> "index" } >> __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning >> "index" } >> __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning >> "index" } >> __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning >> "index" } >> >> Why wouldn't at least the first two be convered by the >> one-past-the-end-is-ok rule? > > Thanks for taking the time to try to simplify the patch! > > Diagnosing the cases above is at the core of the issue (and can > be seen by substituting a5_7 into the oversimplified test case > in the bug). > > In a 5 by 7 array, the offset computed for the out-of-bounds > a_5_7[0][7] is the same as the offset computed for the in-bounds > a_5_7[1][0]. The result might be unexpected because it suggests > that two "distinct" elements of an array share the same offset. Yeah, but Joseph wrote: Given such a flexible array member, [anything][0] and [anything][1] are logically valid if [anything] is nonnegative (of course, the array might or might not have enough element at runtime). [anything][2] is a reference to just past the end of an element of the flexible array member; [anything][3] is clearly invalid. Regarding the actual testcase, accepting a1_1[0][1] seems fine (it's referencing the just-past-end element of the valid array a1_1[0]). So how's that different from a5_7[0][7]? In case it matters, a1_1 in the earlier test that was discussed could not be a flexible array because it was followed by another field in the same struct. Handing over review duties to Joseph because he'll have to decide what to warn about and what not to. > struct S { > int A [5][7]; > int x; > } s; > > these should both be diagnosed: > > int i = offsetof (struct S, A [0][7]); > > int *p = &s.A [0][7]; > > because they are both undefined and both can lead to surprising > results when used. Is that really undefined? I thought it was explicitly allowed. Dereferencing the pointer is what's undefined. Bernd Bernd
On Tue, 20 Oct 2015, Martin Sebor wrote: > I think -Warray-bounds should emit consistent diagnostics for invalid > array references regardless of the contexts. I.e., given > > struct S { > int A [5][7]; > int x; > } s; > > these should both be diagnosed: > > int i = offsetof (struct S, A [0][7]); > > int *p = &s.A [0][7]; > > because they are both undefined and both can lead to surprising > results when used. But both are valid. &s.A [0][7] means s.A[0] + 7 (as explicitly specified in C11, neither the & nor the [] is evaluated in this case, but the [] turns into a +), and s.A[0] is an object of type int[7], which decays to a pointer to the first element of that array, so adding 7 produces a just-past-end pointer. It's not valid to dereference that pointer, but the pointer itself is valid (and subtracting 1 from it produces a pointer you can dereference).
On 10/20/2015 09:48 AM, Bernd Schmidt wrote: > > > On 10/20/2015 05:31 PM, Martin Sebor wrote: >> On 10/20/2015 07:20 AM, Bernd Schmidt wrote: >>> On 10/16/2015 09:34 PM, Martin Sebor wrote: >>>> Thank you for the review. Attached is an updated patch that hopefully >>>> addresses all your comments. I ran the check_GNU_style.sh script on >>>> it to make sure I didn't miss something. I've also added replies to >>>> a few of your comments below. >>> >>> Ok, thanks. However - the logic around the context struct in the patch >>> still seems fairly impenetrable to me, and I've been wondering whether a >>> simpler approach wouldn't work just as well. I came up with the patch >>> below, which just passes a tree_code context to recursive invocations >>> and increasing the upper bound by one only if not in an ARRAY_REF or >>> COMPONENT_REF context. >>> >>> As far as I can tell, this produces the same warnings on the testcase >>> (modulo differences in type printout), except for the final few FA5_7 >>> testcases, which I had doubts about anyway: >>> >>> typedef struct FA5_7 { >>> int i; >>> char a5_7 [5][7]; >>> } FA5_7; >>> >>> __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning >>> "index" } >>> __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning >>> "index" } >>> __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning >>> "index" } >>> __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning >>> "index" } >>> >>> Why wouldn't at least the first two be convered by the >>> one-past-the-end-is-ok rule? >> >> Thanks for taking the time to try to simplify the patch! >> >> Diagnosing the cases above is at the core of the issue (and can >> be seen by substituting a5_7 into the oversimplified test case >> in the bug). >> >> In a 5 by 7 array, the offset computed for the out-of-bounds >> a_5_7[0][7] is the same as the offset computed for the in-bounds >> a_5_7[1][0]. The result might be unexpected because it suggests >> that two "distinct" elements of an array share the same offset. > > Yeah, but Joseph wrote: > > Given such a flexible array member, [anything][0] and > [anything][1] are logically valid if [anything] is > nonnegative (of course, the array might or might not > have enough element at runtime). [anything][2] is a > reference to just past the end of an element of the > flexible array member; [anything][3] is clearly invalid. > > Regarding the actual testcase, accepting a1_1[0][1] seems > fine (it's referencing the just-past-end element of the > valid array a1_1[0]). > > So how's that different from a5_7[0][7]? In case it matters, a1_1 in the > earlier test that was discussed could not be a flexible array because it > was followed by another field in the same struct. Let me try to explain using a couple of examples. Given struct S { a1_1 [1][1]; int x; } s; where a1_1 is an ordinary array of a single element offseof (struct S, a1_1[0][1]) is accepted as a GCC extension (and, IMO, should be made valid in C) because a) there is no way to mistake the offset for the offset of an element of the array at a valid index, and because b) it yields to the same value as the following valid expression: (char*)&s.a1_1[0][1] - (char*)&s.a1_1; However, given struct S { a5_7 [5][7]; int x; } s; the invalid offsetof expression offseof (struct S, a5_7[0][7]) should be diagnosed because a) it yields the same offset of the valid offseof (struct S, a5_7[1][0]) and b) because s.a5_7[0][7] is not a valid reference in any context (i.e., (char*)&s.a5_7[0][7] - (char*)&s.a5_7 is not a valid expression). > > Handing over review duties to Joseph because he'll have to decide what > to warn about and what not to. > >> struct S { >> int A [5][7]; >> int x; >> } s; >> >> these should both be diagnosed: >> >> int i = offsetof (struct S, A [0][7]); >> >> int *p = &s.A [0][7]; >> >> because they are both undefined and both can lead to surprising >> results when used. > > Is that really undefined? I thought it was explicitly allowed. > Dereferencing the pointer is what's undefined. Both are undefined. The most succinct statements to this effect are in the Unde3fined Behavior section of Annex J: An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6). The member designator parameter of an offsetof macro is an invalid right operand of the . operator for the type parameter, or designates a bit-field (7.19). Martin
On Tue, 20 Oct 2015, Martin Sebor wrote: > An array subscript is out of range, even if an object is apparently > accessible with the given subscript (as in the lvalue expression > a[1][7] given the declaration int a[4][5]) (6.5.6). Just-past-the-end is only out of range if the dereference is executed, not in the &array[size] case which is equivalent to array + size.
On 10/20/2015 06:53 PM, Joseph Myers wrote: > On Tue, 20 Oct 2015, Martin Sebor wrote: > >> I think -Warray-bounds should emit consistent diagnostics for invalid >> array references regardless of the contexts. I.e., given >> >> struct S { >> int A [5][7]; >> int x; >> } s; >> >> these should both be diagnosed: >> >> int i = offsetof (struct S, A [0][7]); >> >> int *p = &s.A [0][7]; >> >> because they are both undefined and both can lead to surprising >> results when used. > > But both are valid. &s.A [0][7] means s.A[0] + 7 (as explicitly specified > in C11, neither the & nor the [] is evaluated in this case, but the [] > turns into a +), and s.A[0] is an object of type int[7], which decays to a > pointer to the first element of that array, so adding 7 produces a > just-past-end pointer. It's not valid to dereference that pointer, but > the pointer itself is valid (and subtracting 1 from it produces a pointer > you can dereference). Thanks, Joseph, this is helpful. There are a few other cases that I'm uncertain about and which might require additional changes to the patch; the one I sent was an experiment more than a finished product. Consider struct t { int a; int b; }; struct A { struct t v[2]; } a; So I think we've established that &a.v[2] is valid, giving a pointer just past the end of the structure. How about &a.v[2].a and &a.v[2].b The first of these gives the same pointer just past the end of the array, while the second one gives a higher address. I would expect that the second one is invalid, but I'm not so sure about the first one. Syntactically we have an access to an out-of-bounds field, but the whole expression just computes the valid one-past-the-end address. I think this has an impact on the tests I quoted in my last mail: typedef struct FA5_7 { int i; char a5_7 [5][7]; } FA5_7; __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning "index" } __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning "index" } __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning "index" } __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning "index" } Here I think the last one of these is most likely invalid (being 8 bytes past the end of the object, rather than just one) and the others valid. Can you confirm this? (If the &a.v[2].a example is considered invalid, then I think the a5_7[5][0] test would be the equivalent and ought to also be considered invalid). It might turn out that we have to do something like pass a pointer to an "at_end" boolean to recursive invocations, which would set it to true if the expression dealt with by the caller should not increase the address any further. Then, when folding the addition we check that the offset we have is either zero or we don't have at_end, and warn otherwise. Bernd
On Tue, 20 Oct 2015, Bernd Schmidt wrote: > Consider > > struct t { int a; int b; }; > struct A { struct t v[2]; } a; > > So I think we've established that > &a.v[2] > is valid, giving a pointer just past the end of the structure. How about > &a.v[2].a > and > &a.v[2].b > The first of these gives the same pointer just past the end of the array, > while the second one gives a higher address. I would expect that the second > one is invalid, but I'm not so sure about the first one. Syntactically we have > an access to an out-of-bounds field, but the whole expression just computes > the valid one-past-the-end address. I don't think either is valid. The address operator '&' requires "a function designator, the result of a [] or unary * operator, or an lvalue that designates an object". So because a.v[2].a does not designate an object, there is undefined behavior. The special case for [] allows the address of a just-past-end array element to be taken, but that doesn't apply here. > I think this has an impact on the tests I quoted in my last mail: > > typedef struct FA5_7 { > int i; > char a5_7 [5][7]; > } FA5_7; > > __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning "index" } > __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning "index" } > __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning "index" } > __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning "index" } > > Here I think the last one of these is most likely invalid (being 8 bytes past > the end of the object, rather than just one) and the others valid. Can you > confirm this? (If the &a.v[2].a example is considered invalid, then I think > the a5_7[5][0] test would be the equivalent and ought to also be considered > invalid). The last one is certainly invalid. The one before is arguably invalid as well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5] from array to pointer - an array expression gets converted to an expression "that points to the initial element of the array object", but there is no array object a5_7[5] here).
On 10/21/2015 12:10 AM, Joseph Myers wrote: > On Tue, 20 Oct 2015, Bernd Schmidt wrote: >> How about >> &a.v[2].a >> and >> &a.v[2].b > > I don't think either is valid. >> typedef struct FA5_7 { >> int i; >> char a5_7 [5][7]; >> } FA5_7; >> >> __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning "index" } >> __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning "index" } >> __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning "index" } >> __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning "index" } > > The last one is certainly invalid. The one before is arguably invalid as > well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to > a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5] > from array to pointer - an array expression gets converted to an > expression "that points to the initial element of the array object", but > there is no array object a5_7[5] here). Martin, is this something you're working on, or should I have a go? Bernd
On 10/23/2015 05:13 AM, Bernd Schmidt wrote: > On 10/21/2015 12:10 AM, Joseph Myers wrote: >> On Tue, 20 Oct 2015, Bernd Schmidt wrote: >>> How about >>> &a.v[2].a >>> and >>> &a.v[2].b >> >> I don't think either is valid. > >>> typedef struct FA5_7 { >>> int i; >>> char a5_7 [5][7]; >>> } FA5_7; >>> >>> __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning >>> "index" } >>> __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning >>> "index" } >>> __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning >>> "index" } >>> __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning >>> "index" } >> >> The last one is certainly invalid. The one before is arguably invalid as >> well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to >> a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5] >> from array to pointer - an array expression gets converted to an >> expression "that points to the initial element of the array object", but >> there is no array object a5_7[5] here). > > Martin, is this something you're working on, or should I have a go? I thought I made all the tweaks to these test cases in the last patch: https://gcc.gnu.org/ml/gcc-patches/2015-10/msg01946.html But now that I'm re-reading the answer above I see that Joseph was suggesting that a5_7[5][0] should be diagnosed when the patch accepts it as an extension. I think we do want to accept it because a5_7 is treated as a flexible array member (as an extension) and so the upper bound of the major index is unknown. I.e., FA5_7 is defined like so: typedef struct FA5_7 { int i; char a5_7 [5][7]; // treated as char [][7] } FA5_7; Martin
On Fri, 23 Oct 2015, Martin Sebor wrote: > But now that I'm re-reading the answer above I see that Joseph > was suggesting that a5_7[5][0] should be diagnosed when the patch > accepts it as an extension. I think we do want to accept it > because a5_7 is treated as a flexible array member (as an extension) > and so the upper bound of the major index is unknown. I.e., FA5_7 > is defined like so: If you treat it as a flexible array member, then, yes, it would be valid.
On 10/23/2015 06:50 PM, Joseph Myers wrote: > On Fri, 23 Oct 2015, Martin Sebor wrote: > >> But now that I'm re-reading the answer above I see that Joseph >> was suggesting that a5_7[5][0] should be diagnosed when the patch >> accepts it as an extension. I think we do want to accept it >> because a5_7 is treated as a flexible array member (as an extension) >> and so the upper bound of the major index is unknown. I.e., FA5_7 >> is defined like so: > > If you treat it as a flexible array member, then, yes, it would be valid. Ok, let's install the patch as-is, and postpone the discussion of whether that is a valid flexible array member (I certainly wouldn't have guessed so from the documentation which only mentions [], [0] and [1] as valid cases). I guess this is a case where I could say either "I wrote the patch" or "I requested changes to a patch in review"; in the latter case I can approve it. Joseph seems on board with what we've discussed, so I'd say please wait until Tuesday for objections then commit. Bernd
On 10/23/2015 11:45 AM, Bernd Schmidt wrote: > On 10/23/2015 06:50 PM, Joseph Myers wrote: >> On Fri, 23 Oct 2015, Martin Sebor wrote: >> >>> But now that I'm re-reading the answer above I see that Joseph >>> was suggesting that a5_7[5][0] should be diagnosed when the patch >>> accepts it as an extension. I think we do want to accept it >>> because a5_7 is treated as a flexible array member (as an extension) >>> and so the upper bound of the major index is unknown. I.e., FA5_7 >>> is defined like so: >> >> If you treat it as a flexible array member, then, yes, it would be valid. > > Ok, let's install the patch as-is, and postpone the discussion of > whether that is a valid flexible array member (I certainly wouldn't have > guessed so from the documentation which only mentions [], [0] and [1] as > valid cases). The original code deliberately avoids diagnosing the case of last array members with bounds greater than 1 (see the comment about "a poor man's flexible array member" added with a fix for bug 41935) and I didn't want to change that. But if there is sentiment for tightening it up I would be very much in favor. IMO, it would be ideal if we could agree on and apply the same rules for offsetof as for other expressions (and diagnose, for example, &a5_7[5][0], which currently isn't diagnosed). As I mentioned, I'm planning to work on bug 67872 and it would be helpful to know what our rules are up front. I can go back and update this patch after it's been committed if the rules evolve between now and then. > > I guess this is a case where I could say either "I wrote the patch" or > "I requested changes to a patch in review"; in the latter case I can > approve it. Joseph seems on board with what we've discussed, so I'd say > please wait until Tuesday for objections then commit. Okay. Martin
On 10/23/2015 10:40 PM, Martin Sebor wrote: > > The original code deliberately avoids diagnosing the case of last > array members with bounds greater than 1 (see the comment about > "a poor man's flexible array member" added with a fix for bug > 41935) and I didn't want to change that. Jakub added that, Cc'd. Do you recall why this was done? > But if there is sentiment for tightening it up I would be very > much in favor. I'd be in favor, but this is Joseph's call really. Bernd
On Mon, Oct 26, 2015 at 12:40:18PM +0100, Bernd Schmidt wrote: > On 10/23/2015 10:40 PM, Martin Sebor wrote: > > > >The original code deliberately avoids diagnosing the case of last > >array members with bounds greater than 1 (see the comment about > >"a poor man's flexible array member" added with a fix for bug > >41935) and I didn't want to change that. > > Jakub added that, Cc'd. Do you recall why this was done? Because the amount of code that uses this (including GCC itself) is just too huge, so everywhere in the middle-end we also special case last array members of structs. While C99+ has flexible array members, e.g. C++ does not, so users are left with using struct { ... type fld[1]; }; or similar to stand for the flexible array members, even when they want to be able to use ->fld[3] etc. Jakub
On 10/26/2015 12:47 PM, Jakub Jelinek wrote: > Because the amount of code that uses this (including GCC itself) is just too > huge, so everywhere in the middle-end we also special case last array members of > structs. While C99+ has flexible array members, e.g. C++ does not, so users > are left with using struct { ... type fld[1]; }; Yes, and that case is documented. However, the issue is arrays declared with a larger size than 1 or 0 - is there really code using them as flexible array members? Bernd
On Mon, Oct 26, 2015 at 12:57:53PM +0100, Bernd Schmidt wrote: > On 10/26/2015 12:47 PM, Jakub Jelinek wrote: > > >Because the amount of code that uses this (including GCC itself) is just too > >huge, so everywhere in the middle-end we also special case last array members of > >structs. While C99+ has flexible array members, e.g. C++ does not, so users > >are left with using struct { ... type fld[1]; }; > > Yes, and that case is documented. However, the issue is arrays declared with > a larger size than 1 or 0 - is there really code using them as flexible > array members? I believe so, though don't have pointers to that right now. But vaguely remember we saw various cases of using 2 or other values too. Jakub
On Mon, Oct 26, 2015 at 1:01 PM, Jakub Jelinek <jakub@redhat.com> wrote: > On Mon, Oct 26, 2015 at 12:57:53PM +0100, Bernd Schmidt wrote: >> On 10/26/2015 12:47 PM, Jakub Jelinek wrote: >> >> >Because the amount of code that uses this (including GCC itself) is just too >> >huge, so everywhere in the middle-end we also special case last array members of >> >structs. While C99+ has flexible array members, e.g. C++ does not, so users >> >are left with using struct { ... type fld[1]; }; >> >> Yes, and that case is documented. However, the issue is arrays declared with >> a larger size than 1 or 0 - is there really code using them as flexible >> array members? > > I believe so, though don't have pointers to that right now. But vaguely > remember we saw various cases of using 2 or other values too. Yes, char[4] is quite common (basically making sure there is no appearant padding behind the array due to alignment - just in case compilers might be clever because of that). Richard. > Jakub
On 10/26/2015 01:27 PM, Richard Biener wrote: > On Mon, Oct 26, 2015 at 1:01 PM, Jakub Jelinek <jakub@redhat.com> wrote: >> On Mon, Oct 26, 2015 at 12:57:53PM +0100, Bernd Schmidt wrote: >>> On 10/26/2015 12:47 PM, Jakub Jelinek wrote: >>> >>>> Because the amount of code that uses this (including GCC itself) is just too >>>> huge, so everywhere in the middle-end we also special case last array members of >>>> structs. While C99+ has flexible array members, e.g. C++ does not, so users >>>> are left with using struct { ... type fld[1]; }; >>> >>> Yes, and that case is documented. However, the issue is arrays declared with >>> a larger size than 1 or 0 - is there really code using them as flexible >>> array members? >> >> I believe so, though don't have pointers to that right now. But vaguely >> remember we saw various cases of using 2 or other values too. > > Yes, char[4] is quite common (basically making sure there is no appearant > padding behind the array due to alignment - just in case compilers might > be clever because of that). Ugh, how ugly. Can we at least agree not to allow multi-dimensional arrays with a size larger than one to be considered flexible? Bernd
>> I guess this is a case where I could say either "I wrote the patch" or >> "I requested changes to a patch in review"; in the latter case I can >> approve it. Joseph seems on board with what we've discussed, so I'd say >> please wait until Tuesday for objections then commit. I didn't get to committing the patch last week but I have done so just now. Martin
On Tue, Oct 20, 2015 at 10:10:44PM +0000, Joseph Myers wrote: > > typedef struct FA5_7 { > > int i; > > char a5_7 [5][7]; > > } FA5_7; > > > > __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning "index" } > > __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning "index" } > > __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning "index" } > > __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning "index" } > > > > Here I think the last one of these is most likely invalid (being 8 bytes past > > the end of the object, rather than just one) and the others valid. Can you > > confirm this? (If the &a.v[2].a example is considered invalid, then I think > > the a5_7[5][0] test would be the equivalent and ought to also be considered > > invalid). > > The last one is certainly invalid. The one before is arguably invalid as > well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to > a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5] > from array to pointer - an array expression gets converted to an > expression "that points to the initial element of the array object", but > there is no array object a5_7[5] here). C11, 6.5.2.1/3: Successive subscript operators designate an element of a multidimensional array object. If E is an n-dimensional array (n >= 2) with dimensions i x j x . . . x k, then E (used as other than an lvalue) is converted to a pointer to an (n - 1)-dimensional array with dimensions j x . . . x k. If the unary * operator is applied to this pointer explicitly, or implicitly as a result of subscripting, the result is the referenced (n - 1)-dimensional array, which itself is converted into a pointer if used as other than an lvalue. It follows from this that arrays are stored in row-major order (last subscript varies fastest). As far as I see, a5_7[5] here is never treated as an array, just as a pointer, and &a5_7[5][0] is valid. Segher
On 11/07/2015 04:38 PM, Segher Boessenkool wrote: > On Tue, Oct 20, 2015 at 10:10:44PM +0000, Joseph Myers wrote: >>> typedef struct FA5_7 { >>> int i; >>> char a5_7 [5][7]; >>> } FA5_7; >>> >>> __builtin_offsetof (FA5_7, a5_7 [0][7]), // { dg-warning "index" } >>> __builtin_offsetof (FA5_7, a5_7 [1][7]), // { dg-warning "index" } >>> __builtin_offsetof (FA5_7, a5_7 [5][0]), // { dg-warning "index" } >>> __builtin_offsetof (FA5_7, a5_7 [5][7]), // { dg-warning "index" } >>> >>> Here I think the last one of these is most likely invalid (being 8 bytes past >>> the end of the object, rather than just one) and the others valid. Can you >>> confirm this? (If the &a.v[2].a example is considered invalid, then I think >>> the a5_7[5][0] test would be the equivalent and ought to also be considered >>> invalid). >> >> The last one is certainly invalid. The one before is arguably invalid as >> well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to >> a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5] >> from array to pointer - an array expression gets converted to an >> expression "that points to the initial element of the array object", but >> there is no array object a5_7[5] here). > > C11, 6.5.2.1/3: > Successive subscript operators designate an element of a > multidimensional array object. If E is an n-dimensional array (n >= 2) > with dimensions i x j x . . . x k, then E (used as other than an lvalue) > is converted to a pointer to an (n - 1)-dimensional array with > dimensions j x . . . x k. If the unary * operator is applied to this > pointer explicitly, or implicitly as a result of subscripting, the > result is the referenced (n - 1)-dimensional array, which itself is > converted into a pointer if used as other than an lvalue. It follows > from this that arrays are stored in row-major order (last subscript > varies fastest). > > As far as I see, a5_7[5] here is never treated as an array, just as a > pointer, and &a5_7[5][0] is valid. Segher and I discussed this briefly on IRC over the weekend and I agreed to try to get a confirmation of the interpretation the warning is based on from WG14. I'll report back what I learn (if anything). I defer to Bernd and Joseph as to whether to make any changes in the meantime. Martin
On Sat, 7 Nov 2015, Segher Boessenkool wrote: > > The last one is certainly invalid. The one before is arguably invalid as > > well (in the unary '&' equivalent, &a5_7[5][0] which is equivalent to > > a5_7[5] + 0, the questionable operation is implicit conversion of a5_7[5] > > from array to pointer - an array expression gets converted to an > > expression "that points to the initial element of the array object", but > > there is no array object a5_7[5] here). > > C11, 6.5.2.1/3: > Successive subscript operators designate an element of a > multidimensional array object. If E is an n-dimensional array (n >= 2) > with dimensions i x j x . . . x k, then E (used as other than an lvalue) > is converted to a pointer to an (n - 1)-dimensional array with > dimensions j x . . . x k. If the unary * operator is applied to this > pointer explicitly, or implicitly as a result of subscripting, the > result is the referenced (n - 1)-dimensional array, which itself is > converted into a pointer if used as other than an lvalue. It follows > from this that arrays are stored in row-major order (last subscript > varies fastest). > > As far as I see, a5_7[5] here is never treated as an array, just as a > pointer, and &a5_7[5][0] is valid. As usual, based on taking the address, not offsetof where there's the open question of whether the C standard actually requires support for anything other than a single element name there: a5_7[5] is an expression of array type. The only way for it to be treated as a pointer is for it to be converted implicitly to pointer type. That implicit conversion is what I think is problematic. Only once the implicit conversion has taken place do the special rules about &A[B] meaning A + B take effect. But since the problem I see is with the conversion of A to a pointer, you still have undefined behavior. The paragraph you quote seems to not to add anything to the semantics defined elsewhere in the standard; it's purely descriptive of some consequences of those semantics. Whether we wish to be more permissive about some such cases (depending on -Warray-bounds=N) is a pragmatic matter depending on the extent to which they are used in practice.
Index: gcc/c-family/c-common.c =================================================================== --- gcc/c-family/c-common.c (revision 229049) +++ gcc/c-family/c-common.c (working copy) @@ -10589,11 +10589,11 @@ c_common_to_target_charset (HOST_WIDE_IN traditional rendering of offsetof as a macro. Return the folded result. */ tree -fold_offsetof_1 (tree expr) +fold_offsetof_1 (tree expr, enum tree_code ctx) { tree base, off, t; - - switch (TREE_CODE (expr)) + tree_code code = TREE_CODE (expr); + switch (code) { case ERROR_MARK: return expr; @@ -10617,7 +10617,7 @@ fold_offsetof_1 (tree expr) return TREE_OPERAND (expr, 0); case COMPONENT_REF: - base = fold_offsetof_1 (TREE_OPERAND (expr, 0)); + base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code); if (base == error_mark_node) return base; @@ -10634,7 +10634,7 @@ fold_offsetof_1 (tree expr) break; case ARRAY_REF: - base = fold_offsetof_1 (TREE_OPERAND (expr, 0)); + base = fold_offsetof_1 (TREE_OPERAND (expr, 0), code); if (base == error_mark_node) return base; @@ -10649,8 +10649,9 @@ fold_offsetof_1 (tree expr) && !tree_int_cst_equal (upbound, TYPE_MAX_VALUE (TREE_TYPE (upbound)))) { - upbound = size_binop (PLUS_EXPR, upbound, - build_int_cst (TREE_TYPE (upbound), 1)); + if (ctx != ARRAY_REF && ctx != COMPONENT_REF) + upbound = size_binop (PLUS_EXPR, upbound, + build_int_cst (TREE_TYPE (upbound), 1)); if (tree_int_cst_lt (upbound, t)) { tree v; Index: gcc/c-family/c-common.h =================================================================== --- gcc/c-family/c-common.h (revision 229049) +++ gcc/c-family/c-common.h (working copy) @@ -1029,7 +1029,7 @@ extern bool c_dump_tree (void *, tree); extern void verify_sequence_points (tree); -extern tree fold_offsetof_1 (tree); +extern tree fold_offsetof_1 (tree, tree_code ctx = ERROR_MARK); extern tree fold_offsetof (tree); /* Places where an lvalue, or modifiable lvalue, may be required.