Reject too large string literals (PR middle-end/87854)

Message ID 20181116084325.GD11625@tucnak
State New

Commit Message

Jakub Jelinek Nov. 16, 2018, 8:43 a.m. UTC
Hi!

Both C and C++ FE diagnose arrays larger than half of the address space:
/tmp/1.c:1:6: error: size of array ‘a’ is too large
 char a[__SIZE_MAX__ / 2 + 1];
      ^
because one can't do pointer arithmetic on them.  But we don't have
anything similar for string literals.  As internally we use host int
as TREE_STRING_LENGTH, this is relevant only to targets whose size_t
is narrower than 32 bits.

The following patch adds that diagnostic and truncates the oversized
string literals.
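
The truncation arithmetic can be illustrated outside the compiler.  The
following is a minimal standalone sketch, not GCC code: the helper name
truncated_length and its parameters are made up for illustration, with
max_bytes standing in for TYPE_MAX_VALUE (ssizetype) and char_size for
the element size in bytes.  It only shows that the clamped length is
rounded down to a whole number of elements.

```c
#include <assert.h>

/* Hypothetical helper (not part of GCC): return LENGTH unchanged when it
   fits in MAX_BYTES, otherwise the largest byte length not exceeding
   MAX_BYTES that is a whole number of CHAR_SIZE-byte elements.  */
long
truncated_length (long length, long max_bytes, int char_size)
{
  assert (char_size > 0);
  if (length <= max_bytes)
    return length;
  /* Integer division drops the partial element, so the result is a
     multiple of CHAR_SIZE.  */
  return max_bytes / char_size * char_size;
}
```

With a 16-bit ssizetype the limit would be 32767 bytes, so an oversized
literal of 2-byte elements would be clamped to 32766 bytes and one of
4-byte elements to 32764 bytes.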

Bootstrapped/regtested on x86_64-linux and i686-linux and tested with
a cross to avr.  I'll defer adjusting testcases to the maintainers of 16-bit
ports.  From the PR it seems gcc.dg/concat2.c, g++.dg/parse/concat1.C and
pr46534.c tests are affected.

Ok for trunk?

2018-11-16  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/87854
	* c-common.c (fix_string_type): Reject string literals larger than
	TYPE_MAX_VALUE (ssizetype) bytes.


	Jakub

Comments

Nathan Sidwell Nov. 16, 2018, 12:06 p.m. UTC | #1
On 11/16/18 3:43 AM, Jakub Jelinek wrote:
> Hi!
> 
> Both C and C++ FE diagnose arrays larger than half of the address space:
> /tmp/1.c:1:6: error: size of array ‘a’ is too large
>   char a[__SIZE_MAX__ / 2 + 1];
>        ^
> because one can't do pointer arithmetics on them.  But we don't have
> anything similar for string literals.  As internally we use host int
> as TREE_STRING_LENGTH, this is relevant to targets that have < 32-bit
> size_t only.
> 
> The following patch adds that diagnostics and truncates the string literals.

Ok by me.

nathan
Marek Polacek Nov. 16, 2018, 2:33 p.m. UTC | #2
On Fri, Nov 16, 2018 at 07:06:51AM -0500, Nathan Sidwell wrote:
> On 11/16/18 3:43 AM, Jakub Jelinek wrote:
> > Hi!
> > 
> > Both C and C++ FE diagnose arrays larger than half of the address space:
> > /tmp/1.c:1:6: error: size of array ‘a’ is too large
> >   char a[__SIZE_MAX__ / 2 + 1];
> >        ^
> > because one can't do pointer arithmetics on them.  But we don't have
> > anything similar for string literals.  As internally we use host int
> > as TREE_STRING_LENGTH, this is relevant to targets that have < 32-bit
> > size_t only.
> > 
> > The following patch adds that diagnostics and truncates the string literals.
> 
> Ok by me.

No objections from me, either.

Marek
Joseph Myers Nov. 16, 2018, 5:34 p.m. UTC | #3
On Fri, 16 Nov 2018, Jakub Jelinek wrote:

> Hi!
> 
> Both C and C++ FE diagnose arrays larger than half of the address space:
> /tmp/1.c:1:6: error: size of array ‘a’ is too large
>  char a[__SIZE_MAX__ / 2 + 1];
>       ^
> because one can't do pointer arithmetics on them.  But we don't have
> anything similar for string literals.  As internally we use host int
> as TREE_STRING_LENGTH, this is relevant to targets that have < 32-bit
> size_t only.
> 
> The following patch adds that diagnostics and truncates the string literals.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux and tested with
> a cross to avr.  I'll defer adjusting testcases to the maintainers of 16-bit
> ports.  From the PR it seems gcc.dg/concat2.c, g++.dg/parse/concat1.C and
> pr46534.c tests are affected.
> 
> Ok for trunk?

OK with me.  I'd hope at least one test (existing or new) would actually 
test the new diagnostic on 16-bit systems, rather than just those tests 
being disabled for affected platforms.
Martin Sebor Nov. 16, 2018, 6:25 p.m. UTC | #4
On 11/16/2018 01:43 AM, Jakub Jelinek wrote:
> Hi!
>
> Both C and C++ FE diagnose arrays larger than half of the address space:
> /tmp/1.c:1:6: error: size of array ‘a’ is too large
>  char a[__SIZE_MAX__ / 2 + 1];
>       ^
> because one can't do pointer arithmetics on them.  But we don't have
> anything similar for string literals.  As internally we use host int
> as TREE_STRING_LENGTH, this is relevant to targets that have < 32-bit
> size_t only.
>
> The following patch adds that diagnostics and truncates the string literals.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux and tested with
> a cross to avr.  I'll defer adjusting testcases to the maintainers of 16-bit
> ports.  From the PR it seems gcc.dg/concat2.c, g++.dg/parse/concat1.C and
> pr46534.c tests are affected.
>
> Ok for trunk?
>
> 2018-11-16  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR middle-end/87854
> 	* c-common.c (fix_string_type): Reject string literals larger than
> 	TYPE_MAX_VALUE (ssizetype) bytes.
>
> --- gcc/c-family/c-common.c.jj	2018-11-14 13:37:46.921050615 +0100
> +++ gcc/c-family/c-common.c	2018-11-15 15:20:31.138056115 +0100
> @@ -737,31 +737,44 @@ tree
>  fix_string_type (tree value)
>  {
>    int length = TREE_STRING_LENGTH (value);
> -  int nchars;
> +  int nchars, charsz;
>    tree e_type, i_type, a_type;
>
>    /* Compute the number of elements, for the array type.  */
>    if (TREE_TYPE (value) == char_array_type_node || !TREE_TYPE (value))
>      {
> -      nchars = length;
> +      charsz = 1;
>        e_type = char_type_node;
>      }
>    else if (TREE_TYPE (value) == char16_array_type_node)
>      {
> -      nchars = length / (TYPE_PRECISION (char16_type_node) / BITS_PER_UNIT);
> +      charsz = TYPE_PRECISION (char16_type_node) / BITS_PER_UNIT;
>        e_type = char16_type_node;
>      }
>    else if (TREE_TYPE (value) == char32_array_type_node)
>      {
> -      nchars = length / (TYPE_PRECISION (char32_type_node) / BITS_PER_UNIT);
> +      charsz = TYPE_PRECISION (char32_type_node) / BITS_PER_UNIT;
>        e_type = char32_type_node;
>      }
>    else
>      {
> -      nchars = length / (TYPE_PRECISION (wchar_type_node) / BITS_PER_UNIT);
> +      charsz = TYPE_PRECISION (wchar_type_node) / BITS_PER_UNIT;
>        e_type = wchar_type_node;
>      }
>
> +  /* This matters only for targets where ssizetype has smaller precision
> +     than 32 bits.  */
> +  if (wi::lts_p (wi::to_wide (TYPE_MAX_VALUE (ssizetype)), length))
> +    {
> +      error ("size of string literal is too large");

It would be helpful to mention the size of the literal and the limit
so users who do run into the error don't wonder how to fix it.

Martin
Jakub Jelinek Nov. 16, 2018, 6:31 p.m. UTC | #5
On Fri, Nov 16, 2018 at 11:25:15AM -0700, Martin Sebor wrote:
> On 11/16/2018 01:43 AM, Jakub Jelinek wrote:
> > 
> > +  /* This matters only for targets where ssizetype has smaller precision
> > +     than 32 bits.  */
> > +  if (wi::lts_p (wi::to_wide (TYPE_MAX_VALUE (ssizetype)), length))
> > +    {
> > +      error ("size of string literal is too large");
> 
> It would be helpful to mention the size of the literal and the limit
> so users who do run into the error don't wonder how to fix it.

It is consistent with what we emit for the arrays.
So, if the size and limit info is helpful to users, we should provide that
for those too.  I mean the:
                        if (name)
                          error_at (loc, "size of array %qE is too large",
                                    name);
                        else
                          error_at (loc, "size of unnamed array is too large");
calls in the C FE and similar stuff in C++ FE.
Feel free to add that to all of those.

	Jakub
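
For reference, a message carrying both the size and the limit, as
suggested in #4, could look roughly like the sketch below.  This is
purely illustrative: the wording and the helper name format_too_large are
assumptions, and it uses plain snprintf, whereas GCC's real diagnostics go
through error/error_at with their own format directives.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of GCC): format a diagnostic that names
   both the literal's size and the limit it exceeds.  Returns what
   snprintf returns, i.e. the length the full message needs.  */
int
format_too_large (char *buf, size_t buflen, long size, long limit)
{
  assert (buf != NULL && buflen > 0);
  return snprintf (buf, buflen,
		   "size of string literal (%ld bytes) exceeds maximum %ld",
		   size, limit);
}
```

The same two numbers would apply to the array diagnostics quoted above if
those were extended in the same way.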
Patch

--- gcc/c-family/c-common.c.jj	2018-11-14 13:37:46.921050615 +0100
+++ gcc/c-family/c-common.c	2018-11-15 15:20:31.138056115 +0100
@@ -737,31 +737,44 @@  tree
 fix_string_type (tree value)
 {
   int length = TREE_STRING_LENGTH (value);
-  int nchars;
+  int nchars, charsz;
   tree e_type, i_type, a_type;
 
   /* Compute the number of elements, for the array type.  */
   if (TREE_TYPE (value) == char_array_type_node || !TREE_TYPE (value))
     {
-      nchars = length;
+      charsz = 1;
       e_type = char_type_node;
     }
   else if (TREE_TYPE (value) == char16_array_type_node)
     {
-      nchars = length / (TYPE_PRECISION (char16_type_node) / BITS_PER_UNIT);
+      charsz = TYPE_PRECISION (char16_type_node) / BITS_PER_UNIT;
       e_type = char16_type_node;
     }
   else if (TREE_TYPE (value) == char32_array_type_node)
     {
-      nchars = length / (TYPE_PRECISION (char32_type_node) / BITS_PER_UNIT);
+      charsz = TYPE_PRECISION (char32_type_node) / BITS_PER_UNIT;
       e_type = char32_type_node;
     }
   else
     {
-      nchars = length / (TYPE_PRECISION (wchar_type_node) / BITS_PER_UNIT);
+      charsz = TYPE_PRECISION (wchar_type_node) / BITS_PER_UNIT;
       e_type = wchar_type_node;
     }
 
+  /* This matters only for targets where ssizetype has smaller precision
+     than 32 bits.  */
+  if (wi::lts_p (wi::to_wide (TYPE_MAX_VALUE (ssizetype)), length))
+    {
+      error ("size of string literal is too large");
+      length = tree_to_shwi (TYPE_MAX_VALUE (ssizetype)) / charsz * charsz;
+      char *str = CONST_CAST (char *, TREE_STRING_POINTER (value));
+      memset (str + length, '\0',
+	      MIN (TREE_STRING_LENGTH (value) - length, charsz));
+      TREE_STRING_LENGTH (value) = length;
+    }
+  nchars = length / charsz;
+
   /* C89 2.2.4.1, C99 5.2.4.1 (Translation limits).  The analogous
      limit in C++98 Annex B is very large (65536) and is not normative,
      so we do not diagnose it (warn_overlength_strings is forced off