Message ID | AM5PR0701MB26578DF7C6DA998944B41E5BE42D0@AM5PR0701MB2657.eurprd07.prod.outlook.com |
---|---|
State | New |
Headers | show |
Series | Handle overlength strings in the C FE | expand |
On Wed, 1 Aug 2018, Bernd Edlinger wrote: > On 07/30/18 17:49, Joseph Myers wrote: > > On Mon, 30 Jul 2018, Bernd Edlinger wrote: > > > >> Hi, > >> > >> this is how I would like to handle the over length strings issue in the C FE. > >> If the string constant is exactly the right length and ends in one explicit > >> NUL character, shorten it by one character. > > > > I don't think shortening should be limited to that case. I think the case > > where the constant is longer than that (and so gets an unconditional > > pedwarn) should also have it shortened - any constant that doesn't fit in > > the object being initialized should be shortened to fit, whether diagnosed > > or not, we should define GENERIC / GIMPLE to disallow too-large string > > constants in initializers, and should add an assertion somewhere in the > > middle-end that no too-large string constants reach it. > > > > Okay, there is an update following your suggestion. It seems odd to me to have two separate bits of code dealing with reducing the length, rather than something like if (too long) { /* Decide whether to do a pedwarn_init, or a warn_cxx_compat warning, or neither. */ /* Shorten string, in either case. */ } The memcmp with "\0\0\0\0" is introducing a hidden assumption that any sort of character in strings is never more than four bytes. It also seems unnecessary, in that ultimately the over-long string should be shortened regardless of whether what's being removed is a zero character or not. It should not be possible to be over-long and fail tree_fits_uhwi_p (TYPE_SIZE_UNIT (type)), simply because STRING_CST lengths are stored in host int (even if, ideally, they'd use some other type to allow for STRING_CSTs over 2GB in size). (And I don't think GCC can represent target type sizes that don't fit in unsigned HOST_WIDE_INT anyway; the only way for a target type size in bytes to fail to be representable in unsigned HOST_WIDE_INT should be if the size is not constant.)
On 08/01/2018 05:20 AM, Bernd Edlinger wrote: > On 07/30/18 17:49, Joseph Myers wrote: >> On Mon, 30 Jul 2018, Bernd Edlinger wrote: >> >>> Hi, >>> >>> this is how I would like to handle the over length strings issue in the C FE. >>> If the string constant is exactly the right length and ends in one explicit >>> NUL character, shorten it by one character. >> >> I don't think shortening should be limited to that case. I think the case >> where the constant is longer than that (and so gets an unconditional >> pedwarn) should also have it shortened - any constant that doesn't fit in >> the object being initialized should be shortened to fit, whether diagnosed >> or not, we should define GENERIC / GIMPLE to disallow too-large string >> constants in initializers, and should add an assertion somewhere in the >> middle-end that no too-large string constants reach it. >> > > Okay, there is an update following your suggestion. > > Bootstrapped and reg-tested on x86_64-pc-linux-gnu. > Is it OK for trunk? The ChangeLog description says: * c-typeck.c (digest_init): Fix overlength strings. suggesting there is a bug but there is no test case. If there is a bug in there that can be triggered by C code (valid or otherwise), it would be good to have a test case and a bug in Bugzilla. If there is no bug and this is just cleanup, I would suggest to adjust the description. Other than that, while making improvements here, I think it would be helpful to also add more detail to the text of the warning: 1) mention the type of the array being initialized in case it's not obvious from the declaration (the array bound could be a symbol, not a literal, or the type could be a typedef) 2) mention the number of elements in the initializer in case it's a macro (such as __FILE__) whose definition isn't visible in the diagnostic 3) mention that the excess elements are ignored (since it's undefined in the standard, it will let users know what happens in GCC). Here's a test case and a suggested warning: #define S __FILE__ "\000" enum { N = sizeof __FILE__ }; const char a[N] = S; warning: discarding 1 excess element from initializer-string for 'char[4]' [-Wc++-compat] #define S __FILE__ "\000" ^~~~~~~~ note: in expansion of macro ‘S’ const char a[N] = S; ^ (Similarly for more than 1 excess element.) Martin
On 08/01/18 19:07, Martin Sebor wrote: > On 08/01/2018 05:20 AM, Bernd Edlinger wrote: >> On 07/30/18 17:49, Joseph Myers wrote: >>> On Mon, 30 Jul 2018, Bernd Edlinger wrote: >>> >>>> Hi, >>>> >>>> this is how I would like to handle the over length strings issue in the C FE. >>>> If the string constant is exactly the right length and ends in one explicit >>>> NUL character, shorten it by one character. >>> >>> I don't think shortening should be limited to that case. I think the case >>> where the constant is longer than that (and so gets an unconditional >>> pedwarn) should also have it shortened - any constant that doesn't fit in >>> the object being initialized should be shortened to fit, whether diagnosed >>> or not, we should define GENERIC / GIMPLE to disallow too-large string >>> constants in initializers, and should add an assertion somewhere in the >>> middle-end that no too-large string constants reach it. >>> >> >> Okay, there is an update following your suggestion. >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >> Is it OK for trunk? > > The ChangeLog description says: > > * c-typeck.c (digest_init): Fix overlength strings. > > suggesting there is a bug but there is no test case. If there > is a bug in there that can be triggered by C code (valid or > otherwise), it would be good to have a test case and a bug > in Bugzilla. If there is no bug and this is just cleanup, > I would suggest to adjust the description. > Yes, thanks for looking at this. This is an attempt to reduce the number of different encodings for otherwise identical strings in the middle-end. I could say "Shorten overlength strings." is that better? There are already numerous test cases with overlength strings. I verified that, because I have tested this patch on top of https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00050.html > Other than that, while making improvements here, I think it > would be helpful to also add more detail to the text of > the warning: > Sure, but it is important to do only one thing at a time. I see this as a preparation to fixing the remaining string_constant folding issues which are potential wrong-code issues. > 1) mention the type of the array being initialized in case > it's not obvious from the declaration (the array bound could > be a symbol, not a literal, or the type could be a typedef) > > 2) mention the number of elements in the initializer in case > it's a macro (such as __FILE__) whose definition isn't visible > in the diagnostic > > 3) mention that the excess elements are ignored (since it's > undefined in the standard, it will let users know what > happens in GCC). > > Here's a test case and a suggested warning: > > #define S __FILE__ "\000" > enum { N = sizeof __FILE__ }; > const char a[N] = S; > > warning: discarding 1 excess element from initializer-string for 'char[4]' [-Wc++-compat] > #define S __FILE__ "\000" > ^~~~~~~~ > note: in expansion of macro ‘S’ > const char a[N] = S; > ^ > (Similarly for more than 1 excess element.) > Yes, definitely helpful, but not part of this patch. Probably one of your next patches, I would assume. Bernd.
On 08/01/18 18:04, Joseph Myers wrote: > On Wed, 1 Aug 2018, Bernd Edlinger wrote: > >> On 07/30/18 17:49, Joseph Myers wrote: >>> On Mon, 30 Jul 2018, Bernd Edlinger wrote: >>> >>>> Hi, >>>> >>>> this is how I would like to handle the over length strings issue in the C FE. >>>> If the string constant is exactly the right length and ends in one explicit >>>> NUL character, shorten it by one character. >>> >>> I don't think shortening should be limited to that case. I think the case >>> where the constant is longer than that (and so gets an unconditional >>> pedwarn) should also have it shortened - any constant that doesn't fit in >>> the object being initialized should be shortened to fit, whether diagnosed >>> or not, we should define GENERIC / GIMPLE to disallow too-large string >>> constants in initializers, and should add an assertion somewhere in the >>> middle-end that no too-large string constants reach it. >>> >> >> Okay, there is an update following your suggestion. > > It seems odd to me to have two separate bits of code dealing with reducing > the length, rather than something like > > if (too long) > { > /* Decide whether to do a pedwarn_init, or a warn_cxx_compat warning, > or neither. */ > /* Shorten string, in either case. */ > } > > The memcmp with "\0\0\0\0" is introducing a hidden assumption that any > sort of character in strings is never more than four bytes. It also seems > unnecessary, in that ultimately the over-long string should be shortened > regardless of whether what's being removed is a zero character or not. > > It should not be possible to be over-long and fail tree_fits_uhwi_p > (TYPE_SIZE_UNIT (type)), simply because STRING_CST lengths are stored in > host int (even if, ideally, they'd use some other type to allow for > STRING_CSTs over 2GB in size). (And I don't think GCC can represent > target type sizes that don't fit in unsigned HOST_WIDE_INT anyway; the > only way for a target type size in bytes to fail to be representable in > unsigned HOST_WIDE_INT should be if the size is not constant.) > Agreed. A new simplified version of the patch is attached. Bootstrapped and reg-tested as usual. Is it OK for trunk? Thanks Bernd. 2018-08-01 Bernd Edlinger <bernd.edlinger@hotmail.de> * c-typeck.c (digest_init): Shorten overlength strings. diff -pur gcc/c/c-typeck.c gcc/c/c-typeck.c --- gcc/c/c-typeck.c 2018-06-20 18:35:15.000000000 +0200 +++ gcc/c/c-typeck.c 2018-07-31 18:49:50.757586625 +0200 @@ -7435,19 +7435,17 @@ digest_init (location_t init_loc, tree type, tree } } - TREE_TYPE (inside_init) = type; if (TYPE_DOMAIN (type) != NULL_TREE && TYPE_SIZE (type) != NULL_TREE && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST) { unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init); + unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT; /* Subtract the size of a single (possibly wide) character because it's ok to ignore the terminating null char that is counted in the length of the constant. */ - if (compare_tree_int (TYPE_SIZE_UNIT (type), - (len - (TYPE_PRECISION (typ1) - / BITS_PER_UNIT))) < 0) + if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0) pedwarn_init (init_loc, 0, ("initializer-string for array of chars " "is too long")); @@ -7456,8 +7454,21 @@ digest_init (location_t init_loc, tree type, tree warning_at (init_loc, OPT_Wc___compat, ("initializer-string for array chars " "is too long for C++")); + if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0) + { + unsigned HOST_WIDE_INT size + = tree_to_uhwi (TYPE_SIZE_UNIT (type)); + const char *p = TREE_STRING_POINTER (inside_init); + char *q = (char *)xmalloc (size + unit); + + memcpy (q, p, size); + memset (q + size, 0, unit); + inside_init = build_string (size + unit, q); + free (q); + } } + TREE_TYPE (inside_init) = type; return inside_init; } else if (INTEGRAL_TYPE_P (typ1))
On Wed, Aug 01, 2018 at 08:06:53PM +0000, Bernd Edlinger wrote: > On 08/01/18 18:04, Joseph Myers wrote: > > On Wed, 1 Aug 2018, Bernd Edlinger wrote: > > > >> On 07/30/18 17:49, Joseph Myers wrote: > >>> On Mon, 30 Jul 2018, Bernd Edlinger wrote: > >>> > >>>> Hi, > >>>> > >>>> this is how I would like to handle the over length strings issue in the C FE. > >>>> If the string constant is exactly the right length and ends in one explicit > >>>> NUL character, shorten it by one character. > >>> > >>> I don't think shortening should be limited to that case. I think the case > >>> where the constant is longer than that (and so gets an unconditional > >>> pedwarn) should also have it shortened - any constant that doesn't fit in > >>> the object being initialized should be shortened to fit, whether diagnosed > >>> or not, we should define GENERIC / GIMPLE to disallow too-large string > >>> constants in initializers, and should add an assertion somewhere in the > >>> middle-end that no too-large string constants reach it. > >>> > >> > >> Okay, there is an update following your suggestion. > > > > It seems odd to me to have two separate bits of code dealing with reducing > > the length, rather than something like > > > > if (too long) > > { > > /* Decide whether to do a pedwarn_init, or a warn_cxx_compat warning, > > or neither. */ > > /* Shorten string, in either case. */ > > } > > > > The memcmp with "\0\0\0\0" is introducing a hidden assumption that any > > sort of character in strings is never more than four bytes. It also seems > > unnecessary, in that ultimately the over-long string should be shortened > > regardless of whether what's being removed is a zero character or not. > > > It should not be possible to be over-long and fail tree_fits_uhwi_p > > (TYPE_SIZE_UNIT (type)), simply because STRING_CST lengths are stored in > > host int (even if, ideally, they'd use some other type to allow for > > STRING_CSTs over 2GB in size). (And I don't think GCC can represent > > target type sizes that don't fit in unsigned HOST_WIDE_INT anyway; the > > only way for a target type size in bytes to fail to be representable in > > unsigned HOST_WIDE_INT should be if the size is not constant.) > > > > Agreed. > A new simplified version of the patch is attached. > > Bootstrapped and reg-tested as usual. > Is it OK for trunk? > > > Thanks > Bernd. > 2018-08-01 Bernd Edlinger <bernd.edlinger@hotmail.de> > > * c-typeck.c (digest_init): Shorten overlength strings. > > diff -pur gcc/c/c-typeck.c gcc/c/c-typeck.c > --- gcc/c/c-typeck.c 2018-06-20 18:35:15.000000000 +0200 > +++ gcc/c/c-typeck.c 2018-07-31 18:49:50.757586625 +0200 > @@ -7435,19 +7435,17 @@ digest_init (location_t init_loc, tree type, tree > } > } > > - TREE_TYPE (inside_init) = type; > if (TYPE_DOMAIN (type) != NULL_TREE > && TYPE_SIZE (type) != NULL_TREE > && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST) > { > unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init); > + unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT; > > /* Subtract the size of a single (possibly wide) character > because it's ok to ignore the terminating null char > that is counted in the length of the constant. */ > - if (compare_tree_int (TYPE_SIZE_UNIT (type), > - (len - (TYPE_PRECISION (typ1) > - / BITS_PER_UNIT))) < 0) > + if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0) > pedwarn_init (init_loc, 0, > ("initializer-string for array of chars " > "is too long")); > @@ -7456,8 +7454,21 @@ digest_init (location_t init_loc, tree type, tree > warning_at (init_loc, OPT_Wc___compat, > ("initializer-string for array chars " > "is too long for C++")); > + if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0) > + { > + unsigned HOST_WIDE_INT size > + = tree_to_uhwi (TYPE_SIZE_UNIT (type)); > + const char *p = TREE_STRING_POINTER (inside_init); > + char *q = (char *)xmalloc (size + unit); I guess you want XALLOCAVAR or XNEWVAR. Marek
On Wed, 1 Aug 2018, Marek Polacek wrote:
> I guess you want XALLOCAVAR or XNEWVAR.
Not XALLOCAVAR; GCC supports string constants up to 2 GB (minus one byte),
which is far too much to put on the stack.
On 8/1/18, Martin Sebor <msebor@gmail.com> wrote: > On 08/01/2018 05:20 AM, Bernd Edlinger wrote: >> On 07/30/18 17:49, Joseph Myers wrote: >>> On Mon, 30 Jul 2018, Bernd Edlinger wrote: >>> >>>> Hi, >>>> >>>> this is how I would like to handle the over length strings issue in the >>>> C FE. >>>> If the string constant is exactly the right length and ends in one >>>> explicit >>>> NUL character, shorten it by one character. >>> >>> I don't think shortening should be limited to that case. I think the >>> case >>> where the constant is longer than that (and so gets an unconditional >>> pedwarn) should also have it shortened - any constant that doesn't fit >>> in >>> the object being initialized should be shortened to fit, whether >>> diagnosed >>> or not, we should define GENERIC / GIMPLE to disallow too-large string >>> constants in initializers, and should add an assertion somewhere in the >>> middle-end that no too-large string constants reach it. >>> >> >> Okay, there is an update following your suggestion. >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu. >> Is it OK for trunk? > > The ChangeLog description says: > > * c-typeck.c (digest_init): Fix overlength strings. > > suggesting there is a bug but there is no test case. If there > is a bug in there that can be triggered by C code (valid or > otherwise), it would be good to have a test case and a bug > in Bugzilla. If there is no bug and this is just cleanup, > I would suggest to adjust the description. > > Other than that, while making improvements here, I think it > would be helpful to also add more detail to the text of > the warning: > > 1) mention the type of the array being initialized in case > it's not obvious from the declaration (the array bound could > be a symbol, not a literal, or the type could be a typedef) > > 2) mention the number of elements in the initializer in case > it's a macro (such as __FILE__) whose definition isn't visible > in the diagnostic > > 3) mention that the excess elements are ignored (since it's > undefined in the standard, it will let users know what > happens in GCC). > > Here's a test case and a suggested warning: > > #define S __FILE__ "\000" > enum { N = sizeof __FILE__ }; > const char a[N] = S; > > warning: discarding 1 excess element from initializer-string for > 'char[4]' [-Wc++-compat] > #define S __FILE__ "\000" > ^~~~~~~~ > note: in expansion of macro ‘S’ > const char a[N] = S; > ^ > (Similarly for more than 1 excess element.) > > Martin > While modifying -Woverlength-strings, I think a good way to test it would be to enable it during bootstrap. Even though it is currently disabled, I have done bootstraps before while manually enabling it, and they still succeeded. Plus there is documentation that says to avoid overlength strings in GCC sources, so it would be good if this were verified. I also brought this up in bug 80942: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80942 Eric
On Wed, 1 Aug 2018, Eric Gallager wrote:
> While modifying -Woverlength-strings, I think a good way to test it
The -Woverlength-strings option (warn about string constants longer than
the minimum guaranteed by the C standard to be supported) has nothing
whatever to do with the present patch discussion (which relates to string
constants longer than the array they are used as an initializer for).
On 08/01/18 22:43, Joseph Myers wrote: > On Wed, 1 Aug 2018, Marek Polacek wrote: > >> I guess you want XALLOCAVAR or XNEWVAR. > > Not XALLOCAVAR; GCC supports string constants up to 2 GB (minus one byte), > which is far too much to put on the stack. > I assume you want me to use XNEWVEC/XDELETEVEC. Attached is a new version using those macros. Is it OK for trunk? Thanks Bernd. 2018-08-01 Bernd Edlinger <bernd.edlinger@hotmail.de> * c-typeck.c (digest_init): Shorten overlength strings. Index: gcc/c/c-typeck.c =================================================================== --- gcc/c/c-typeck.c (revision 263072) +++ gcc/c/c-typeck.c (working copy) @@ -7435,19 +7435,17 @@ digest_init (location_t init_loc, tree type, tree } } - TREE_TYPE (inside_init) = type; if (TYPE_DOMAIN (type) != NULL_TREE && TYPE_SIZE (type) != NULL_TREE && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST) { unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init); + unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT; /* Subtract the size of a single (possibly wide) character because it's ok to ignore the terminating null char that is counted in the length of the constant. */ - if (compare_tree_int (TYPE_SIZE_UNIT (type), - (len - (TYPE_PRECISION (typ1) - / BITS_PER_UNIT))) < 0) + if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0) pedwarn_init (init_loc, 0, ("initializer-string for array of chars " "is too long")); @@ -7456,8 +7454,21 @@ digest_init (location_t init_loc, tree type, tree warning_at (init_loc, OPT_Wc___compat, ("initializer-string for array chars " "is too long for C++")); + if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0) + { + unsigned HOST_WIDE_INT size + = tree_to_uhwi (TYPE_SIZE_UNIT (type)); + const char *p = TREE_STRING_POINTER (inside_init); + char *q = XNEWVEC (char, size + unit); + + memcpy (q, p, size); + memset (q + size, 0, unit); + inside_init = build_string (size + unit, q); + XDELETEVEC (q); + } } + TREE_TYPE (inside_init) = type; return inside_init; } else if (INTEGRAL_TYPE_P (typ1))
On Thu, 9 Aug 2018, Bernd Edlinger wrote: > On 08/01/18 22:43, Joseph Myers wrote: > > On Wed, 1 Aug 2018, Marek Polacek wrote: > > > >> I guess you want XALLOCAVAR or XNEWVAR. > > > > Not XALLOCAVAR; GCC supports string constants up to 2 GB (minus one byte), > > which is far too much to put on the stack. > > > > I assume you want me to use XNEWVEC/XDELETEVEC. > > Attached is a new version using those macros. > Is it OK for trunk? This version is OK.
2018-08-01 Bernd Edlinger <bernd.edlinger@hotmail.de> * c-typeck.c (digest_init): Fix overlength strings. diff -pur gcc/c/c-typeck.c gcc/c/c-typeck.c --- gcc/c/c-typeck.c 2018-06-20 18:35:15.000000000 +0200 +++ gcc/c/c-typeck.c 2018-07-31 18:49:50.757586625 +0200 @@ -7435,29 +7435,52 @@ digest_init (location_t init_loc, tree t } } - TREE_TYPE (inside_init) = type; if (TYPE_DOMAIN (type) != NULL_TREE && TYPE_SIZE (type) != NULL_TREE && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST) { unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init); + unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT; /* Subtract the size of a single (possibly wide) character because it's ok to ignore the terminating null char that is counted in the length of the constant. */ - if (compare_tree_int (TYPE_SIZE_UNIT (type), - (len - (TYPE_PRECISION (typ1) - / BITS_PER_UNIT))) < 0) - pedwarn_init (init_loc, 0, - ("initializer-string for array of chars " - "is too long")); - else if (warn_cxx_compat - && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0) - warning_at (init_loc, OPT_Wc___compat, - ("initializer-string for array chars " - "is too long for C++")); + if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0) + { + pedwarn_init (init_loc, 0, + ("initializer-string for array of chars " + "is too long")); + if (tree_fits_uhwi_p (TYPE_SIZE_UNIT (type))) + { + unsigned HOST_WIDE_INT size + = tree_to_uhwi (TYPE_SIZE_UNIT (type)); + const char *p = TREE_STRING_POINTER (inside_init); + char *q = (char *)xmalloc (size + unit); + + memcpy (q, p, size); + memset (q + size, 0, unit); + inside_init = build_string (size + unit, q); + free (q); + } + } + else if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0) + { + if (warn_cxx_compat) + warning_at (init_loc, OPT_Wc___compat, + ("initializer-string for array chars " + "is too long for C++")); + if (len >= 2 * unit) + { + const char *p = TREE_STRING_POINTER (inside_init); + + len -= unit; + if (memcmp (p + len - unit, "\0\0\0\0", unit) == 0) + inside_init = build_string (len, p); + } + } } + TREE_TYPE (inside_init) = type; return inside_init; } else if (INTEGRAL_TYPE_P (typ1))