diff mbox series

Handle overlength strings in the C FE

Message ID AM5PR0701MB26578DF7C6DA998944B41E5BE42D0@AM5PR0701MB2657.eurprd07.prod.outlook.com
State New
Headers show
Series Handle overlength strings in the C FE | expand

Commit Message

Bernd Edlinger Aug. 1, 2018, 11:20 a.m. UTC
On 07/30/18 17:49, Joseph Myers wrote:
> On Mon, 30 Jul 2018, Bernd Edlinger wrote:
> 
>> Hi,
>>
>> this is how I would like to handle the over length strings issue in the C FE.
>> If the string constant is exactly the right length and ends in one explicit
>> NUL character, shorten it by one character.
> 
> I don't think shortening should be limited to that case.  I think the case
> where the constant is longer than that (and so gets an unconditional
> pedwarn) should also have it shortened - any constant that doesn't fit in
> the object being initialized should be shortened to fit, whether diagnosed
> or not, we should define GENERIC / GIMPLE to disallow too-large string
> constants in initializers, and should add an assertion somewhere in the
> middle-end that no too-large string constants reach it.
> 

Okay, there is an update following your suggestion.

Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.

Comments

Joseph Myers Aug. 1, 2018, 4:04 p.m. UTC | #1
On Wed, 1 Aug 2018, Bernd Edlinger wrote:

> On 07/30/18 17:49, Joseph Myers wrote:
> > On Mon, 30 Jul 2018, Bernd Edlinger wrote:
> > 
> >> Hi,
> >>
> >> this is how I would like to handle the over length strings issue in the C FE.
> >> If the string constant is exactly the right length and ends in one explicit
> >> NUL character, shorten it by one character.
> > 
> > I don't think shortening should be limited to that case.  I think the case
> > where the constant is longer than that (and so gets an unconditional
> > pedwarn) should also have it shortened - any constant that doesn't fit in
> > the object being initialized should be shortened to fit, whether diagnosed
> > or not, we should define GENERIC / GIMPLE to disallow too-large string
> > constants in initializers, and should add an assertion somewhere in the
> > middle-end that no too-large string constants reach it.
> > 
> 
> Okay, there is an update following your suggestion.

It seems odd to me to have two separate bits of code dealing with reducing 
the length, rather than something like

if (too long)
  {
    /* Decide whether to do a pedwarn_init, or a warn_cxx_compat warning,
       or neither.  */
    /* Shorten string, in either case.  */
  }

The memcmp with "\0\0\0\0" is introducing a hidden assumption that any 
sort of character in strings is never more than four bytes.  It also seems 
unnecessary, in that ultimately the over-long string should be shortened 
regardless of whether what's being removed is a zero character or not.

It should not be possible to be over-long and fail tree_fits_uhwi_p 
(TYPE_SIZE_UNIT (type)), simply because STRING_CST lengths are stored in 
host int (even if, ideally, they'd use some other type to allow for 
STRING_CSTs over 2GB in size).  (And I don't think GCC can represent 
target type sizes that don't fit in unsigned HOST_WIDE_INT anyway; the 
only way for a target type size in bytes to fail to be representable in 
unsigned HOST_WIDE_INT should be if the size is not constant.)
Martin Sebor Aug. 1, 2018, 5:07 p.m. UTC | #2
On 08/01/2018 05:20 AM, Bernd Edlinger wrote:
> On 07/30/18 17:49, Joseph Myers wrote:
>> On Mon, 30 Jul 2018, Bernd Edlinger wrote:
>>
>>> Hi,
>>>
>>> this is how I would like to handle the over length strings issue in the C FE.
>>> If the string constant is exactly the right length and ends in one explicit
>>> NUL character, shorten it by one character.
>>
>> I don't think shortening should be limited to that case.  I think the case
>> where the constant is longer than that (and so gets an unconditional
>> pedwarn) should also have it shortened - any constant that doesn't fit in
>> the object being initialized should be shortened to fit, whether diagnosed
>> or not, we should define GENERIC / GIMPLE to disallow too-large string
>> constants in initializers, and should add an assertion somewhere in the
>> middle-end that no too-large string constants reach it.
>>
>
> Okay, there is an update following your suggestion.
>
> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
> Is it OK for trunk?

The ChangeLog description says:

	* c-typeck.c (digest_init): Fix overlength strings.

suggesting there is a bug but there is no test case.  If there
is a bug in there that can be triggered by C code (valid or
otherwise), it would be good to have a test case and a bug
in Bugzilla.  If there is no bug and this is just cleanup,
I would suggest to adjust the description.

Other than that, while making improvements here, I think it
would be helpful to also add more detail to the text of
the warning:

1) mention the type of the array being initialized in case
it's not obvious from the declaration (the array bound could
be a symbol, not a literal, or the type could be a typedef)

2) mention the number of elements in the initializer in case
it's a macro (such as __FILE__) whose definition isn't visible
in the diagnostic

3) mention that the excess elements are ignored (since it's
undefined in the standard, it will let users know what
happens in GCC).

Here's a test case and a suggested warning:

   #define S __FILE__ "\000"
   enum { N = sizeof __FILE__ };
   const char a[N] = S;

   warning: discarding 1 excess element from initializer-string for 
'char[4]' [-Wc++-compat]
    #define S __FILE__ "\000"
              ^~~~~~~~
   note: in expansion of macro ‘S’
    const char a[N] = S;
                      ^
(Similarly for more than 1 excess element.)

Martin
Bernd Edlinger Aug. 1, 2018, 5:37 p.m. UTC | #3
On 08/01/18 19:07, Martin Sebor wrote:
> On 08/01/2018 05:20 AM, Bernd Edlinger wrote:
>> On 07/30/18 17:49, Joseph Myers wrote:
>>> On Mon, 30 Jul 2018, Bernd Edlinger wrote:
>>>
>>>> Hi,
>>>>
>>>> this is how I would like to handle the over length strings issue in the C FE.
>>>> If the string constant is exactly the right length and ends in one explicit
>>>> NUL character, shorten it by one character.
>>>
>>> I don't think shortening should be limited to that case.  I think the case
>>> where the constant is longer than that (and so gets an unconditional
>>> pedwarn) should also have it shortened - any constant that doesn't fit in
>>> the object being initialized should be shortened to fit, whether diagnosed
>>> or not, we should define GENERIC / GIMPLE to disallow too-large string
>>> constants in initializers, and should add an assertion somewhere in the
>>> middle-end that no too-large string constants reach it.
>>>
>>
>> Okay, there is an update following your suggestion.
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
> 
> The ChangeLog description says:
> 
>      * c-typeck.c (digest_init): Fix overlength strings.
> 
> suggesting there is a bug but there is no test case.  If there
> is a bug in there that can be triggered by C code (valid or
> otherwise), it would be good to have a test case and a bug
> in Bugzilla.  If there is no bug and this is just cleanup,
> I would suggest to adjust the description.
> 

Yes, thanks for looking at this.  This is an attempt to
reduce the number of different encodings for otherwise
identical strings in the middle-end.

I could say "Shorten overlength strings." is that better?
There are already numerous test cases with overlength strings.
I verified that, because I have tested this patch on top
of https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00050.html

> Other than that, while making improvements here, I think it
> would be helpful to also add more detail to the text of
> the warning:
> 

Sure, but it is important to do only one thing at a time.
I see this as a preparation to fixing the remaining
string_constant folding issues which are potential wrong-code
issues.

> 1) mention the type of the array being initialized in case
> it's not obvious from the declaration (the array bound could
> be a symbol, not a literal, or the type could be a typedef)
> 
> 2) mention the number of elements in the initializer in case
> it's a macro (such as __FILE__) whose definition isn't visible
> in the diagnostic
> 
> 3) mention that the excess elements are ignored (since it's
> undefined in the standard, it will let users know what
> happens in GCC).
> 
> Here's a test case and a suggested warning:
> 
>    #define S __FILE__ "\000"
>    enum { N = sizeof __FILE__ };
>    const char a[N] = S;
> 
>    warning: discarding 1 excess element from initializer-string for 'char[4]' [-Wc++-compat]
>     #define S __FILE__ "\000"
>               ^~~~~~~~
>    note: in expansion of macro ‘S’
>     const char a[N] = S;
>                       ^
> (Similarly for more than 1 excess element.)
> 

Yes, definitely helpful, but not part of this patch.
Probably one of your next patches, I would assume.

Bernd.
Bernd Edlinger Aug. 1, 2018, 8:06 p.m. UTC | #4
On 08/01/18 18:04, Joseph Myers wrote:
> On Wed, 1 Aug 2018, Bernd Edlinger wrote:
> 
>> On 07/30/18 17:49, Joseph Myers wrote:
>>> On Mon, 30 Jul 2018, Bernd Edlinger wrote:
>>>
>>>> Hi,
>>>>
>>>> this is how I would like to handle the over length strings issue in the C FE.
>>>> If the string constant is exactly the right length and ends in one explicit
>>>> NUL character, shorten it by one character.
>>>
>>> I don't think shortening should be limited to that case.  I think the case
>>> where the constant is longer than that (and so gets an unconditional
>>> pedwarn) should also have it shortened - any constant that doesn't fit in
>>> the object being initialized should be shortened to fit, whether diagnosed
>>> or not, we should define GENERIC / GIMPLE to disallow too-large string
>>> constants in initializers, and should add an assertion somewhere in the
>>> middle-end that no too-large string constants reach it.
>>>
>>
>> Okay, there is an update following your suggestion.
> 
> It seems odd to me to have two separate bits of code dealing with reducing
> the length, rather than something like
> 
> if (too long)
>    {
>      /* Decide whether to do a pedwarn_init, or a warn_cxx_compat warning,
>         or neither.  */
>      /* Shorten string, in either case.  */
>    }
> 
> The memcmp with "\0\0\0\0" is introducing a hidden assumption that any
> sort of character in strings is never more than four bytes.  It also seems
> unnecessary, in that ultimately the over-long string should be shortened
> regardless of whether what's being removed is a zero character or not.
> > It should not be possible to be over-long and fail tree_fits_uhwi_p
> (TYPE_SIZE_UNIT (type)), simply because STRING_CST lengths are stored in
> host int (even if, ideally, they'd use some other type to allow for
> STRING_CSTs over 2GB in size).  (And I don't think GCC can represent
> target type sizes that don't fit in unsigned HOST_WIDE_INT anyway; the
> only way for a target type size in bytes to fail to be representable in
> unsigned HOST_WIDE_INT should be if the size is not constant.)
> 

Agreed.
A new simplified version of the patch is attached.

Bootstrapped and reg-tested as usual.
Is it OK for trunk?


Thanks
Bernd.
2018-08-01  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* c-typeck.c (digest_init): Shorten overlength strings.

diff -pur gcc/c/c-typeck.c gcc/c/c-typeck.c
--- gcc/c/c-typeck.c	2018-06-20 18:35:15.000000000 +0200
+++ gcc/c/c-typeck.c	2018-07-31 18:49:50.757586625 +0200
@@ -7435,19 +7435,17 @@ digest_init (location_t init_loc, tree type, tree
 		}
 	    }
 
-	  TREE_TYPE (inside_init) = type;
 	  if (TYPE_DOMAIN (type) != NULL_TREE
 	      && TYPE_SIZE (type) != NULL_TREE
 	      && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
 	    {
 	      unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init);
+	      unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT;
 
 	      /* Subtract the size of a single (possibly wide) character
 		 because it's ok to ignore the terminating null char
 		 that is counted in the length of the constant.  */
-	      if (compare_tree_int (TYPE_SIZE_UNIT (type),
-				    (len - (TYPE_PRECISION (typ1)
-					    / BITS_PER_UNIT))) < 0)
+	      if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0)
 		pedwarn_init (init_loc, 0,
 			      ("initializer-string for array of chars "
 			       "is too long"));
@@ -7456,8 +7454,21 @@ digest_init (location_t init_loc, tree type, tree
 		warning_at (init_loc, OPT_Wc___compat,
 			    ("initializer-string for array chars "
 			     "is too long for C++"));
+	      if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
+		{
+		  unsigned HOST_WIDE_INT size
+		    = tree_to_uhwi (TYPE_SIZE_UNIT (type));
+		  const char *p = TREE_STRING_POINTER (inside_init);
+		  char *q = (char *)xmalloc (size + unit);
+
+		  memcpy (q, p, size);
+		  memset (q + size, 0, unit);
+		  inside_init = build_string (size + unit, q);
+		  free (q);
+		}
 	    }
 
+	  TREE_TYPE (inside_init) = type;
 	  return inside_init;
 	}
       else if (INTEGRAL_TYPE_P (typ1))
Marek Polacek Aug. 1, 2018, 8:28 p.m. UTC | #5
On Wed, Aug 01, 2018 at 08:06:53PM +0000, Bernd Edlinger wrote:
> On 08/01/18 18:04, Joseph Myers wrote:
> > On Wed, 1 Aug 2018, Bernd Edlinger wrote:
> > 
> >> On 07/30/18 17:49, Joseph Myers wrote:
> >>> On Mon, 30 Jul 2018, Bernd Edlinger wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> this is how I would like to handle the over length strings issue in the C FE.
> >>>> If the string constant is exactly the right length and ends in one explicit
> >>>> NUL character, shorten it by one character.
> >>>
> >>> I don't think shortening should be limited to that case.  I think the case
> >>> where the constant is longer than that (and so gets an unconditional
> >>> pedwarn) should also have it shortened - any constant that doesn't fit in
> >>> the object being initialized should be shortened to fit, whether diagnosed
> >>> or not, we should define GENERIC / GIMPLE to disallow too-large string
> >>> constants in initializers, and should add an assertion somewhere in the
> >>> middle-end that no too-large string constants reach it.
> >>>
> >>
> >> Okay, there is an update following your suggestion.
> > 
> > It seems odd to me to have two separate bits of code dealing with reducing
> > the length, rather than something like
> > 
> > if (too long)
> >    {
> >      /* Decide whether to do a pedwarn_init, or a warn_cxx_compat warning,
> >         or neither.  */
> >      /* Shorten string, in either case.  */
> >    }
> > 
> > The memcmp with "\0\0\0\0" is introducing a hidden assumption that any
> > sort of character in strings is never more than four bytes.  It also seems
> > unnecessary, in that ultimately the over-long string should be shortened
> > regardless of whether what's being removed is a zero character or not.
> > > It should not be possible to be over-long and fail tree_fits_uhwi_p
> > (TYPE_SIZE_UNIT (type)), simply because STRING_CST lengths are stored in
> > host int (even if, ideally, they'd use some other type to allow for
> > STRING_CSTs over 2GB in size).  (And I don't think GCC can represent
> > target type sizes that don't fit in unsigned HOST_WIDE_INT anyway; the
> > only way for a target type size in bytes to fail to be representable in
> > unsigned HOST_WIDE_INT should be if the size is not constant.)
> > 
> 
> Agreed.
> A new simplified version of the patch is attached.
> 
> Bootstrapped and reg-tested as usual.
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.

> 2018-08-01  Bernd Edlinger  <bernd.edlinger@hotmail.de>
> 
> 	* c-typeck.c (digest_init): Shorten overlength strings.
> 
> diff -pur gcc/c/c-typeck.c gcc/c/c-typeck.c
> --- gcc/c/c-typeck.c	2018-06-20 18:35:15.000000000 +0200
> +++ gcc/c/c-typeck.c	2018-07-31 18:49:50.757586625 +0200
> @@ -7435,19 +7435,17 @@ digest_init (location_t init_loc, tree type, tree
>  		}
>  	    }
>  
> -	  TREE_TYPE (inside_init) = type;
>  	  if (TYPE_DOMAIN (type) != NULL_TREE
>  	      && TYPE_SIZE (type) != NULL_TREE
>  	      && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
>  	    {
>  	      unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init);
> +	      unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT;
>  
>  	      /* Subtract the size of a single (possibly wide) character
>  		 because it's ok to ignore the terminating null char
>  		 that is counted in the length of the constant.  */
> -	      if (compare_tree_int (TYPE_SIZE_UNIT (type),
> -				    (len - (TYPE_PRECISION (typ1)
> -					    / BITS_PER_UNIT))) < 0)
> +	      if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0)
>  		pedwarn_init (init_loc, 0,
>  			      ("initializer-string for array of chars "
>  			       "is too long"));
> @@ -7456,8 +7454,21 @@ digest_init (location_t init_loc, tree type, tree
>  		warning_at (init_loc, OPT_Wc___compat,
>  			    ("initializer-string for array chars "
>  			     "is too long for C++"));
> +	      if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
> +		{
> +		  unsigned HOST_WIDE_INT size
> +		    = tree_to_uhwi (TYPE_SIZE_UNIT (type));
> +		  const char *p = TREE_STRING_POINTER (inside_init);
> +		  char *q = (char *)xmalloc (size + unit);

I guess you want XALLOCAVAR or XNEWVAR.

Marek
Joseph Myers Aug. 1, 2018, 8:43 p.m. UTC | #6
On Wed, 1 Aug 2018, Marek Polacek wrote:

> I guess you want XALLOCAVAR or XNEWVAR.

Not XALLOCAVAR; GCC supports string constants up to 2 GB (minus one byte), 
which is far too much to put on the stack.
Eric Gallager Aug. 1, 2018, 9:03 p.m. UTC | #7
On 8/1/18, Martin Sebor <msebor@gmail.com> wrote:
> On 08/01/2018 05:20 AM, Bernd Edlinger wrote:
>> On 07/30/18 17:49, Joseph Myers wrote:
>>> On Mon, 30 Jul 2018, Bernd Edlinger wrote:
>>>
>>>> Hi,
>>>>
>>>> this is how I would like to handle the over length strings issue in the
>>>> C FE.
>>>> If the string constant is exactly the right length and ends in one
>>>> explicit
>>>> NUL character, shorten it by one character.
>>>
>>> I don't think shortening should be limited to that case.  I think the
>>> case
>>> where the constant is longer than that (and so gets an unconditional
>>> pedwarn) should also have it shortened - any constant that doesn't fit
>>> in
>>> the object being initialized should be shortened to fit, whether
>>> diagnosed
>>> or not, we should define GENERIC / GIMPLE to disallow too-large string
>>> constants in initializers, and should add an assertion somewhere in the
>>> middle-end that no too-large string constants reach it.
>>>
>>
>> Okay, there is an update following your suggestion.
>>
>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>> Is it OK for trunk?
>
> The ChangeLog description says:
>
> 	* c-typeck.c (digest_init): Fix overlength strings.
>
> suggesting there is a bug but there is no test case.  If there
> is a bug in there that can be triggered by C code (valid or
> otherwise), it would be good to have a test case and a bug
> in Bugzilla.  If there is no bug and this is just cleanup,
> I would suggest to adjust the description.
>
> Other than that, while making improvements here, I think it
> would be helpful to also add more detail to the text of
> the warning:
>
> 1) mention the type of the array being initialized in case
> it's not obvious from the declaration (the array bound could
> be a symbol, not a literal, or the type could be a typedef)
>
> 2) mention the number of elements in the initializer in case
> it's a macro (such as __FILE__) whose definition isn't visible
> in the diagnostic
>
> 3) mention that the excess elements are ignored (since it's
> undefined in the standard, it will let users know what
> happens in GCC).
>
> Here's a test case and a suggested warning:
>
>    #define S __FILE__ "\000"
>    enum { N = sizeof __FILE__ };
>    const char a[N] = S;
>
>    warning: discarding 1 excess element from initializer-string for
> 'char[4]' [-Wc++-compat]
>     #define S __FILE__ "\000"
>               ^~~~~~~~
>    note: in expansion of macro ‘S’
>     const char a[N] = S;
>                       ^
> (Similarly for more than 1 excess element.)
>
> Martin
>

While modifying -Woverlength-strings, I think a good way to test it
would be to enable it during bootstrap. Even though it is currently
disabled, I have done bootstraps before while manually enabling it,
and they still succeeded. Plus there is documentation that says to
avoid overlength strings in GCC sources, so it would be good if this
were verified. I also brought this up in bug 80942:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80942

Eric
Joseph Myers Aug. 1, 2018, 10:09 p.m. UTC | #8
On Wed, 1 Aug 2018, Eric Gallager wrote:

> While modifying -Woverlength-strings, I think a good way to test it

The -Woverlength-strings option (warn about string constants longer than 
the minimum guaranteed by the C standard to be supported) has nothing 
whatever to do with the present patch discussion (which relates to string 
constants longer than the array they are used as an initializer for).
Bernd Edlinger Aug. 9, 2018, 2:07 p.m. UTC | #9
On 08/01/18 22:43, Joseph Myers wrote:
> On Wed, 1 Aug 2018, Marek Polacek wrote:
> 
>> I guess you want XALLOCAVAR or XNEWVAR.
> 
> Not XALLOCAVAR; GCC supports string constants up to 2 GB (minus one byte),
> which is far too much to put on the stack.
> 

I assume you want me to use XNEWVEC/XDELETEVEC.

Attached is a new version using those macros.
Is it OK for trunk?


Thanks
Bernd.
2018-08-01  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* c-typeck.c (digest_init): Shorten overlength strings.

Index: gcc/c/c-typeck.c
===================================================================
--- gcc/c/c-typeck.c	(revision 263072)
+++ gcc/c/c-typeck.c	(working copy)
@@ -7435,19 +7435,17 @@ digest_init (location_t init_loc, tree type, tree
 		}
 	    }
 
-	  TREE_TYPE (inside_init) = type;
 	  if (TYPE_DOMAIN (type) != NULL_TREE
 	      && TYPE_SIZE (type) != NULL_TREE
 	      && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
 	    {
 	      unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init);
+	      unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT;
 
 	      /* Subtract the size of a single (possibly wide) character
 		 because it's ok to ignore the terminating null char
 		 that is counted in the length of the constant.  */
-	      if (compare_tree_int (TYPE_SIZE_UNIT (type),
-				    (len - (TYPE_PRECISION (typ1)
-					    / BITS_PER_UNIT))) < 0)
+	      if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0)
 		pedwarn_init (init_loc, 0,
 			      ("initializer-string for array of chars "
 			       "is too long"));
@@ -7456,8 +7454,21 @@ digest_init (location_t init_loc, tree type, tree
 		warning_at (init_loc, OPT_Wc___compat,
 			    ("initializer-string for array chars "
 			     "is too long for C++"));
+	      if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
+		{
+		  unsigned HOST_WIDE_INT size
+		    = tree_to_uhwi (TYPE_SIZE_UNIT (type));
+		  const char *p = TREE_STRING_POINTER (inside_init);
+		  char *q = XNEWVEC (char, size + unit);
+
+		  memcpy (q, p, size);
+		  memset (q + size, 0, unit);
+		  inside_init = build_string (size + unit, q);
+		  XDELETEVEC (q);
+		}
 	    }
 
+	  TREE_TYPE (inside_init) = type;
 	  return inside_init;
 	}
       else if (INTEGRAL_TYPE_P (typ1))
Joseph Myers Aug. 9, 2018, 10:08 p.m. UTC | #10
On Thu, 9 Aug 2018, Bernd Edlinger wrote:

> On 08/01/18 22:43, Joseph Myers wrote:
> > On Wed, 1 Aug 2018, Marek Polacek wrote:
> > 
> >> I guess you want XALLOCAVAR or XNEWVAR.
> > 
> > Not XALLOCAVAR; GCC supports string constants up to 2 GB (minus one byte),
> > which is far too much to put on the stack.
> > 
> 
> I assume you want me to use XNEWVEC/XDELETEVEC.
> 
> Attached is a new version using those macros.
> Is it OK for trunk?

This version is OK.
diff mbox series

Patch

2018-08-01  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* c-typeck.c (digest_init): Fix overlength strings.

diff -pur gcc/c/c-typeck.c gcc/c/c-typeck.c
--- gcc/c/c-typeck.c	2018-06-20 18:35:15.000000000 +0200
+++ gcc/c/c-typeck.c	2018-07-31 18:49:50.757586625 +0200
@@ -7435,29 +7435,52 @@  digest_init (location_t init_loc, tree t
 		}
 	    }
 
-	  TREE_TYPE (inside_init) = type;
 	  if (TYPE_DOMAIN (type) != NULL_TREE
 	      && TYPE_SIZE (type) != NULL_TREE
 	      && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST)
 	    {
 	      unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (inside_init);
+	      unsigned unit = TYPE_PRECISION (typ1) / BITS_PER_UNIT;
 
 	      /* Subtract the size of a single (possibly wide) character
 		 because it's ok to ignore the terminating null char
 		 that is counted in the length of the constant.  */
-	      if (compare_tree_int (TYPE_SIZE_UNIT (type),
-				    (len - (TYPE_PRECISION (typ1)
-					    / BITS_PER_UNIT))) < 0)
-		pedwarn_init (init_loc, 0,
-			      ("initializer-string for array of chars "
-			       "is too long"));
-	      else if (warn_cxx_compat
-		       && compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
-		warning_at (init_loc, OPT_Wc___compat,
-			    ("initializer-string for array chars "
-			     "is too long for C++"));
+	      if (compare_tree_int (TYPE_SIZE_UNIT (type), len - unit) < 0)
+		{
+		  pedwarn_init (init_loc, 0,
+				("initializer-string for array of chars "
+				 "is too long"));
+		  if (tree_fits_uhwi_p (TYPE_SIZE_UNIT (type)))
+		    {
+		      unsigned HOST_WIDE_INT size
+			= tree_to_uhwi (TYPE_SIZE_UNIT (type));
+		      const char *p = TREE_STRING_POINTER (inside_init);
+		      char *q = (char *)xmalloc (size + unit);
+
+		      memcpy (q, p, size);
+		      memset (q + size, 0, unit);
+		      inside_init = build_string (size + unit, q);
+		      free (q);
+		    }
+		}
+	      else if (compare_tree_int (TYPE_SIZE_UNIT (type), len) < 0)
+		{
+		  if (warn_cxx_compat)
+		    warning_at (init_loc, OPT_Wc___compat,
+				("initializer-string for array chars "
+				 "is too long for C++"));
+		  if (len >= 2 * unit)
+		    {
+		      const char *p = TREE_STRING_POINTER (inside_init);
+
+		      len -= unit;
+		      if (memcmp (p + len - unit, "\0\0\0\0", unit) == 0)
+			inside_init = build_string (len, p);
+		    }
+		}
 	    }
 
+	  TREE_TYPE (inside_init) = type;
 	  return inside_init;
 	}
       else if (INTEGRAL_TYPE_P (typ1))