diff mbox series

fix a couple of bugs in const string folding (PR 86532)

Message ID d798c8fe-71fa-b6e2-c01b-1480bfcf5200@gmail.com
State New
Headers show
Series fix a couple of bugs in const string folding (PR 86532) | expand

Commit Message

Martin Sebor July 17, 2018, 3:19 p.m. UTC
My enhancement to extract constant strings out of complex
aggregates committed last week introduced a couple of bugs in
dealing with non-constant indices and offsets.  One of the bugs
was fixed earlier today (PR 86528) but another one remains.  It
causes strlen (among other functions) to incorrectly fold
expressions involving a non-constant index into an array of
strings by treating the index the same as a non-consatnt
offset into it.

The non-constant index should either prevent the folding, or it
needs to handle it differently from an offset.

The attached patch takes the conservative approach of avoiding
the folding in this case.  The remaining changes deal with
the fallout from the fix.

Tested on x86_64-linux.

Martin

Comments

Jeff Law July 17, 2018, 3:38 p.m. UTC | #1
On 07/17/2018 09:19 AM, Martin Sebor wrote:
> My enhancement to extract constant strings out of complex
> aggregates committed last week introduced a couple of bugs in
> dealing with non-constant indices and offsets.  One of the bugs
> was fixed earlier today (PR 86528) but another one remains.  It
> causes strlen (among other functions) to incorrectly fold
> expressions involving a non-constant index into an array of
> strings by treating the index the same as a non-consatnt
> offset into it.
> 
> The non-constant index should either prevent the folding, or it
> needs to handle it differently from an offset.
> 
> The attached patch takes the conservative approach of avoiding
> the folding in this case.  The remaining changes deal with
> the fallout from the fix.
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> 
> gcc-86532.diff
> 
> 
> PR tree-optimization/86532 - Wrong code due to a wrong strlen folding starting with r262522
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/86532
> 	* builtins.c (c_strlen): Correct handling of non-constant offsets.	
> 	(check_access): Be prepared for non-constant length ranges.
> 	* expr.c (string_constant): Only handle the minor non-constant
> 	array index.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/86532
> 	* gcc.c-torture/execute/strlen-2.c: New test.

[ ... ]

> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c	(revision 262764)
> +++ gcc/expr.c	(working copy)

> @@ -11343,16 +11356,15 @@ string_constant (tree arg, tree *ptr_offset)
>      {
>        if (TREE_CODE (TREE_TYPE (array)) != ARRAY_TYPE)
>  	return NULL_TREE;
> -      if (tree eltsize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array))))
> -	{
> -	  /* Add the scaled variable index to the constant offset.  */
> -	  tree eltoff = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
> -				     fold_convert (sizetype, varidx),
> -				     eltsize);
> -	  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, eltoff);
> -	}
> -      else
> -	return NULL_TREE;
> +
> +      while (TREE_CODE (chartype) != INTEGER_TYPE)
> +	chartype = TREE_TYPE (chartype);
This is a bit concerning.  First under what conditions is chartype not
going to be an INTEGER_TYPE?  And under what conditions will extracting
its type ultimately lead to something that is an INTEGER_TYPE?


The rest looks pretty reasonable.

Jeff
Martin Sebor July 17, 2018, 4:08 p.m. UTC | #2
On 07/17/2018 09:38 AM, Jeff Law wrote:
> On 07/17/2018 09:19 AM, Martin Sebor wrote:
>> My enhancement to extract constant strings out of complex
>> aggregates committed last week introduced a couple of bugs in
>> dealing with non-constant indices and offsets.  One of the bugs
>> was fixed earlier today (PR 86528) but another one remains.  It
>> causes strlen (among other functions) to incorrectly fold
>> expressions involving a non-constant index into an array of
>> strings by treating the index the same as a non-consatnt
>> offset into it.
>>
>> The non-constant index should either prevent the folding, or it
>> needs to handle it differently from an offset.
>>
>> The attached patch takes the conservative approach of avoiding
>> the folding in this case.  The remaining changes deal with
>> the fallout from the fix.
>>
>> Tested on x86_64-linux.
>>
>> Martin
>>
>>
>> gcc-86532.diff
>>
>>
>> PR tree-optimization/86532 - Wrong code due to a wrong strlen folding starting with r262522
>>
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/86532
>> 	* builtins.c (c_strlen): Correct handling of non-constant offsets.	
>> 	(check_access): Be prepared for non-constant length ranges.
>> 	* expr.c (string_constant): Only handle the minor non-constant
>> 	array index.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR tree-optimization/86532
>> 	* gcc.c-torture/execute/strlen-2.c: New test.
>
> [ ... ]
>
>> Index: gcc/expr.c
>> ===================================================================
>> --- gcc/expr.c	(revision 262764)
>> +++ gcc/expr.c	(working copy)
>
>> @@ -11343,16 +11356,15 @@ string_constant (tree arg, tree *ptr_offset)
>>      {
>>        if (TREE_CODE (TREE_TYPE (array)) != ARRAY_TYPE)
>>  	return NULL_TREE;
>> -      if (tree eltsize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array))))
>> -	{
>> -	  /* Add the scaled variable index to the constant offset.  */
>> -	  tree eltoff = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
>> -				     fold_convert (sizetype, varidx),
>> -				     eltsize);
>> -	  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, eltoff);
>> -	}
>> -      else
>> -	return NULL_TREE;
>> +
>> +      while (TREE_CODE (chartype) != INTEGER_TYPE)
>> +	chartype = TREE_TYPE (chartype);
> This is a bit concerning.  First under what conditions is chartype not
> going to be an INTEGER_TYPE?  And under what conditions will extracting
> its type ultimately lead to something that is an INTEGER_TYPE?

chartype is usually (maybe even always) pointer type here:

   const char a[] = "123";
   extern int i;
   n = strlen (&a[i]);

Martin
Martin Sebor July 18, 2018, 2:45 a.m. UTC | #3
The attached update takes care of a couple of problems pointed
out by Bernd Edlinger in his comments on the bug.  The ICE he
mentioned in comment #20 was due mixing sizetype, ssizetype,
and size_type_node in c_strlen().  AFAICS, some of it predates
the patch but my changes made it worse and also managed trigger
it.

On 07/17/2018 09:19 AM, Martin Sebor wrote:
> My enhancement to extract constant strings out of complex
> aggregates committed last week introduced a couple of bugs in
> dealing with non-constant indices and offsets.  One of the bugs
> was fixed earlier today (PR 86528) but another one remains.  It
> causes strlen (among other functions) to incorrectly fold
> expressions involving a non-constant index into an array of
> strings by treating the index the same as a non-consatnt
> offset into it.
>
> The non-constant index should either prevent the folding, or it
> needs to handle it differently from an offset.
>
> The attached patch takes the conservative approach of avoiding
> the folding in this case.  The remaining changes deal with
> the fallout from the fix.
>
> Tested on x86_64-linux.
>
> Martin
>
PR tree-optimization/86532 - Wrong code due to a wrong strlen folding starting with r262522

gcc/ChangeLog:

	PR tree-optimization/86532
	* builtins.c (c_strlen): Correct handling of non-constant offsets.	
	(check_access): Be prepared for non-constant length ranges.
	* expr.c (string_constant): Only handle the minor non-constant
	array index.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86532
	* gcc.c-torture/execute/strlen-2.c: New test.
	* gcc.c-torture/execute/strlen-3.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index c069d66..03cf012 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -517,7 +517,7 @@ get_pointer_alignment (tree exp)
   return align;
 }
 
-/* Return the number of non-zero elements in the sequence
+/* Return the number of leading non-zero elements in the sequence
    [ PTR, PTR + MAXELTS ) where each element's size is ELTSIZE bytes.
    ELTSIZE must be a power of 2 less than 8.  Used by c_strlen.  */
 
@@ -605,14 +605,21 @@ c_strlen (tree src, int only_value)
 
   /* Set MAXELTS to sizeof (SRC) / sizeof (*SRC) - 1, the maximum possible
      length of SRC.  Prefer TYPE_SIZE() to TREE_STRING_LENGTH() if possible
-     in case the latter is less than the size of the array.  */
-  HOST_WIDE_INT maxelts = TREE_STRING_LENGTH (src);
+     in case the latter is less than the size of the array, such as when
+     SRC refers to a short string literal used to initialize a large array.
+     In that case, the elements of the array after the terminating NUL are
+     all NUL.  */
+  HOST_WIDE_INT strelts = TREE_STRING_LENGTH (src);
+  strelts = strelts / eltsize - 1;
+
+  HOST_WIDE_INT maxelts = strelts;
   tree type = TREE_TYPE (src);
   if (tree size = TYPE_SIZE_UNIT (type))
     if (tree_fits_shwi_p (size))
-      maxelts = tree_to_uhwi (size);
-
-  maxelts = maxelts / eltsize - 1;
+      {
+	maxelts = tree_to_uhwi (size);
+	maxelts = maxelts / eltsize - 1;
+      }
 
   /* PTR can point to the byte representation of any string type, including
      char* and wchar_t*.  */
@@ -620,10 +627,12 @@ c_strlen (tree src, int only_value)
 
   if (byteoff && TREE_CODE (byteoff) != INTEGER_CST)
     {
-      /* If the string has an internal zero byte (e.g., "foo\0bar"), we can't
-	 compute the offset to the following null if we don't know where to
+      /* If the string has an internal NUL character followed by any
+	 non-NUL characters (e.g., "foo\0bar"), we can't compute
+	 the offset to the following NUL if we don't know where to
 	 start searching for it.  */
-      if (string_length (ptr, eltsize, maxelts) < maxelts)
+      unsigned len = string_length (ptr, eltsize, strelts);
+      if (len < strelts)
 	{
 	  /* Return when an embedded null character is found.  */
 	  return NULL_TREE;
@@ -633,12 +642,17 @@ c_strlen (tree src, int only_value)
 	return ssize_int (0);
 
       /* We don't know the starting offset, but we do know that the string
-	 has no internal zero bytes.  We can assume that the offset falls
-	 within the bounds of the string; otherwise, the programmer deserves
-	 what he gets.  Subtract the offset from the length of the string,
-	 and return that.  This would perhaps not be valid if we were dealing
-	 with named arrays in addition to literal string constants.  */
-      return size_diffop_loc (loc, size_int (maxelts * eltsize), byteoff);
+	 has no internal zero bytes.  If the offset falls within the bounds
+	 of the string subtract the offset from the length of the string,
+	 and return that.  Otherwise the length is zero.  Take care to
+	 use SAVE_EXPR in case the OFFSET has side-effects.  */
+      tree offsave = TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) : byteoff;
+      offsave = fold_convert (ssizetype, offsave);
+      tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
+				      build_int_cst (ssizetype, len * eltsize));
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+      return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
+			      build_zero_cst (ssizetype));
     }
 
   /* Offset from the beginning of the string in elements.  */
@@ -3192,15 +3206,13 @@ check_access (tree exp, tree, tree, tree dstwrite,
   if (dstwrite)
     get_size_range (dstwrite, range);
 
-  /* This can happen at -O0.  */
-  if (range[0] && TREE_CODE (range[0]) != INTEGER_CST)
-    return false;
-
   tree func = get_callee_fndecl (exp);
 
   /* First check the number of bytes to be written against the maximum
      object size.  */
-  if (range[0] && tree_int_cst_lt (maxobjsize, range[0]))
+  if (range[0]
+      && TREE_CODE (range[0]) == INTEGER_CST
+      && tree_int_cst_lt (maxobjsize, range[0]))
     {
       if (TREE_NO_WARNING (exp))
 	return false;
@@ -3235,9 +3247,11 @@ check_access (tree exp, tree, tree, tree dstwrite,
   if (range[0] || !exactwrite || integer_all_onesp (dstwrite))
     {
       if (range[0]
+	  && TREE_CODE (range[0]) == INTEGER_CST
 	  && ((tree_fits_uhwi_p (dstsize)
 	       && tree_int_cst_lt (dstsize, range[0]))
-	      || (tree_fits_uhwi_p (dstwrite)
+	      || (dstwrite
+		  && tree_fits_uhwi_p (dstwrite)
 		  && tree_int_cst_lt (dstwrite, range[0]))))
 	{
 	  if (TREE_NO_WARNING (exp))
diff --git a/gcc/expr.c b/gcc/expr.c
index f665e18..79f6196 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11282,24 +11282,34 @@ string_constant (tree arg, tree *ptr_offset)
   /* Non-constant index into the character array in an ARRAY_REF
      expression or null.  */
   tree varidx = NULL_TREE;
+  tree chartype;
 
   poly_int64 base_off = 0;
 
   if (TREE_CODE (arg) == ADDR_EXPR)
     {
+      tree argtype = TREE_TYPE (arg);
+      chartype = argtype;
+
       arg = TREE_OPERAND (arg, 0);
       tree ref = arg;
       if (TREE_CODE (arg) == ARRAY_REF)
 	{
 	  tree idx = TREE_OPERAND (arg, 1);
-	  if (TREE_CODE (idx) != INTEGER_CST)
+	  if (TREE_CODE (idx) != INTEGER_CST
+	      && TREE_CODE (argtype) == POINTER_TYPE)
 	    {
-	      /* Extract the variable index to prevent
-		 get_addr_base_and_unit_offset() from failing due to
-		 it.  Use it later to compute the non-constant offset
+	      /* From a pointer (but not array) argument extract the variable
+		 index to prevent get_addr_base_and_unit_offset() from failing
+		 due to it.  Use it later to compute the non-constant offset
 		 into the string and return it to the caller.  */
 	      varidx = idx;
 	      ref = TREE_OPERAND (arg, 0);
+
+	      tree type = TREE_TYPE (arg);
+	      if (TREE_CODE (type) == ARRAY_TYPE
+		  && TREE_CODE (type) != INTEGER_TYPE)
+		return NULL_TREE;
 	    }
 	}
       array = get_addr_base_and_unit_offset (ref, &base_off);
@@ -11334,7 +11344,10 @@ string_constant (tree arg, tree *ptr_offset)
       return NULL_TREE;
     }
   else if (DECL_P (arg))
-    array = arg;
+    {
+      array = arg;
+      chartype = TREE_TYPE (arg);
+    }
   else
     return NULL_TREE;
 
@@ -11343,16 +11356,15 @@ string_constant (tree arg, tree *ptr_offset)
     {
       if (TREE_CODE (TREE_TYPE (array)) != ARRAY_TYPE)
 	return NULL_TREE;
-      if (tree eltsize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array))))
-	{
-	  /* Add the scaled variable index to the constant offset.  */
-	  tree eltoff = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
-				     fold_convert (sizetype, varidx),
-				     eltsize);
-	  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, eltoff);
-	}
-      else
-	return NULL_TREE;
+
+      while (TREE_CODE (chartype) != INTEGER_TYPE)
+	chartype = TREE_TYPE (chartype);
+
+      /* Set the non-constant offset to the non-constant index scaled
+	 by the size of the character type.  */
+      offset = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
+			    fold_convert (sizetype, varidx),
+			    TYPE_SIZE_UNIT (chartype));
     }
 
   if (TREE_CODE (array) == STRING_CST)
@@ -11371,11 +11383,6 @@ string_constant (tree arg, tree *ptr_offset)
     return NULL_TREE;
   if (TREE_CODE (init) == CONSTRUCTOR)
     {
-      if (TREE_CODE (arg) != ARRAY_REF
-	  && TREE_CODE (arg) == COMPONENT_REF
-	  && TREE_CODE (arg) == MEM_REF)
-	return NULL_TREE;
-
       /* Convert the 64-bit constant offset to a wider type to avoid
 	 overflow.  */
       offset_int wioff;
@@ -11391,11 +11398,15 @@ string_constant (tree arg, tree *ptr_offset)
       init = fold_ctor_reference (NULL_TREE, init, base_off, 0, array,
 				  &fieldoff);
       HOST_WIDE_INT cstoff;
-      if (init && base_off.is_constant (&cstoff))
-	{
-	  cstoff = (cstoff - fieldoff) / BITS_PER_UNIT;
-	  offset = build_int_cst (sizetype, cstoff);
-	}
+      if (!base_off.is_constant (&cstoff))
+	return NULL_TREE;
+
+      cstoff = (cstoff - fieldoff) / BITS_PER_UNIT;
+      tree off = build_int_cst (sizetype, cstoff);
+      if (varidx)
+	offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, off);
+      else
+	offset = off;
     }
 
   if (!init || TREE_CODE (init) != STRING_CST)
diff --git a/gcc/testsuite/gcc.c-torture/execute/strlen-2.c b/gcc/testsuite/gcc.c-torture/execute/strlen-2.c
new file mode 100644
index 0000000..4519f6a
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/strlen-2.c
@@ -0,0 +1,210 @@
+/* PR tree-optimization/86532 - Wrong code due to a wrong strlen folding  */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+static const char a[2][3] = { "1", "12" };
+static const char b[2][2][5] = { { "1", "12" }, { "123", "1234" } };
+
+volatile int v0 = 0;
+volatile int v1 = 1;
+volatile int v2 = 2;
+
+#define A(expr)								\
+  ((expr) ? (void)0 : (__builtin_printf ("assertion on line %i: %s\n",	\
+					 __LINE__, #expr),		\
+		       __builtin_abort ()))
+
+void test_array_ref_2_3 (void)
+{
+  A (strlen (a[v0]) == 1);
+  A (strlen (&a[v0][v0]) == 1);
+  A (strlen (&a[0][v0]) == 1);
+  A (strlen (&a[v0][0]) == 1);
+
+  A (strlen (a[v1]) == 2);
+  A (strlen (&a[v1][0]) == 2);
+  A (strlen (&a[1][v0]) == 2);
+  A (strlen (&a[v1][v0]) == 2);
+
+  A (strlen (&a[v1][1]) == 1);
+  A (strlen (&a[v1][1]) == 1);
+
+  A (strlen (&a[v1][2]) == 0);
+  A (strlen (&a[v1][v2]) == 0);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (a[v0]) == 1);
+  A (strlen (&a[v0][v0]) == 1);
+  A (strlen (&a[i0][v0]) == 1);
+  A (strlen (&a[v0][i0]) == 1);
+
+  A (strlen (a[v1]) == 2);
+  A (strlen (&a[v1][i0]) == 2);
+  A (strlen (&a[i1][v0]) == 2);
+  A (strlen (&a[v1][v0]) == 2);
+
+  A (strlen (&a[v1][i1]) == 1);
+  A (strlen (&a[v1][i1]) == 1);
+
+  A (strlen (&a[v1][i2]) == 0);
+  A (strlen (&a[v1][v2]) == 0);
+}
+
+void test_array_off_2_3 (void)
+{
+  A (strlen (a[0] + 0) == 1);
+  A (strlen (a[0] + v0) == 1);
+  A (strlen (a[v0] + 0) == 1);
+  A (strlen (a[v0] + v0) == 1);
+
+  A (strlen (a[v1] + 0) == 2);
+  A (strlen (a[1] + v0) == 2);
+  A (strlen (a[v1] + 0) == 2);
+  A (strlen (a[v1] + v0) == 2);
+
+  A (strlen (a[v1] + 1) == 1);
+  A (strlen (a[v1] + v1) == 1);
+
+  A (strlen (a[v1] + 2) == 0);
+  A (strlen (a[v1] + v2) == 0);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (a[i0] + i0) == 1);
+  A (strlen (a[i0] + v0) == 1);
+  A (strlen (a[v0] + i0) == 1);
+  A (strlen (a[v0] + v0) == 1);
+
+  A (strlen (a[v1] + i0) == 2);
+  A (strlen (a[i1] + v0) == 2);
+  A (strlen (a[v1] + i0) == 2);
+  A (strlen (a[v1] + v0) == 2);
+
+  A (strlen (a[v1] + i1) == 1);
+  A (strlen (a[v1] + v1) == 1);
+
+  A (strlen (a[v1] + i2) == 0);
+  A (strlen (a[v1] + v2) == 0);
+}
+
+void test_array_ref_2_2_5 (void)
+{
+  A (strlen (b[0][v0]) == 1);
+  A (strlen (b[v0][0]) == 1);
+
+  A (strlen (&b[0][0][v0]) == 1);
+  A (strlen (&b[0][v0][0]) == 1);
+  A (strlen (&b[v0][0][0]) == 1);
+
+  A (strlen (&b[0][v0][v0]) == 1);
+  A (strlen (&b[v0][0][v0]) == 1);
+  A (strlen (&b[v0][v0][0]) == 1);
+
+  A (strlen (b[0][v1]) == 2);
+  A (strlen (b[v1][0]) == 3);
+
+  A (strlen (&b[0][0][v1]) == 0);
+  A (strlen (&b[0][v1][0]) == 2);
+  A (strlen (&b[v0][0][0]) == 1);
+
+  A (strlen (&b[0][v0][v0]) == 1);
+  A (strlen (&b[v0][0][v0]) == 1);
+  A (strlen (&b[v0][v0][0]) == 1);
+
+  A (strlen (&b[0][v1][v1]) == 1);
+  A (strlen (&b[v1][0][v1]) == 2);
+  A (strlen (&b[v1][v1][0]) == 4);
+  A (strlen (&b[v1][v1][1]) == 3);
+  A (strlen (&b[v1][v1][2]) == 2);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (b[i0][v0]) == 1);
+  A (strlen (b[v0][i0]) == 1);
+
+  A (strlen (&b[i0][i0][v0]) == 1);
+  A (strlen (&b[i0][v0][i0]) == 1);
+  A (strlen (&b[v0][i0][i0]) == 1);
+
+  A (strlen (&b[i0][v0][v0]) == 1);
+  A (strlen (&b[v0][i0][v0]) == 1);
+  A (strlen (&b[v0][v0][i0]) == 1);
+
+  A (strlen (b[i0][v1]) == 2);
+  A (strlen (b[v1][i0]) == 3);
+
+  A (strlen (&b[i0][i0][v1]) == 0);
+  A (strlen (&b[i0][v1][i0]) == 2);
+  A (strlen (&b[v0][i0][i0]) == 1);
+
+  A (strlen (&b[i0][v0][v0]) == 1);
+  A (strlen (&b[v0][i0][v0]) == 1);
+  A (strlen (&b[v0][v0][i0]) == 1);
+
+  A (strlen (&b[i0][v1][v1]) == 1);
+  A (strlen (&b[v1][i0][v1]) == 2);
+  A (strlen (&b[v1][v1][i0]) == 4);
+  A (strlen (&b[v1][v1][i1]) == 3);
+  A (strlen (&b[v1][v1][i2]) == 2);
+}
+
+void test_array_off_2_2_5 (void)
+{
+  A (strlen (b[0][0] + v0) == 1);
+  A (strlen (b[0][v0] + v0) == 1);
+  A (strlen (b[v0][0] + v0) == 1);
+  A (strlen (b[v0][v0] + v0) == 1);
+
+  A (strlen (b[0][0] + v1) == 0);
+  A (strlen (b[0][v1] + 0) == 2);
+  A (strlen (b[v0][0] + 0) == 1);
+
+  A (strlen (b[0][v0] + v0) == 1);
+  A (strlen (b[v0][0] + v0) == 1);
+  A (strlen (b[v0][v0] + 0) == 1);
+
+  A (strlen (b[0][v1] + v1) == 1);
+  A (strlen (b[v1][0] + v1) == 2);
+  A (strlen (b[v1][v1] + 0) == 4);
+  A (strlen (b[v1][v1] + 1) == 3);
+  A (strlen (b[v1][v1] + 2) == 2);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (b[i0][i0] + v0) == 1);
+  A (strlen (b[i0][v0] + v0) == 1);
+  A (strlen (b[v0][i0] + v0) == 1);
+  A (strlen (b[v0][v0] + v0) == 1);
+
+  A (strlen (b[i0][i0] + v1) == 0);
+  A (strlen (b[i0][v1] + i0) == 2);
+  A (strlen (b[v0][i0] + i0) == 1);
+
+  A (strlen (b[i0][v0] + v0) == 1);
+  A (strlen (b[v0][i0] + v0) == 1);
+  A (strlen (b[v0][v0] + i0) == 1);
+
+  A (strlen (b[i0][v1] + v1) == 1);
+  A (strlen (b[v1][i0] + v1) == 2);
+  A (strlen (b[v1][v1] + i0) == 4);
+  A (strlen (b[v1][v1] + i1) == 3);
+  A (strlen (b[v1][v1] + i2) == 2);
+}
+
+int main ()
+{
+  test_array_ref_2_3 ();
+  test_array_off_2_3 ();
+
+  test_array_ref_2_2_5 ();
+  test_array_off_2_2_5 ();
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/strlen-3.c b/gcc/testsuite/gcc.c-torture/execute/strlen-3.c
new file mode 100644
index 0000000..2696948
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/strlen-3.c
@@ -0,0 +1,133 @@
+/* PR tree-optimization/86532 - Wrong code due to a wrong strlen folding
+   starting with r262522
+   Exercise strlen() with a multi-dimensional array of strings with
+   embedded nuls.  */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+static const char a[2][3][9] = {
+  { "1", "1\0002" },
+  { "12\0003", "123\0004" }
+};
+
+volatile int v0 = 0;
+volatile int v1 = 1;
+volatile int v2 = 2;
+volatile int v3 = 3;
+volatile int v4 = 4;
+volatile int v5 = 5;
+volatile int v6 = 6;
+volatile int v7 = 7;
+
+#define A(expr)								\
+  ((expr) ? (void)0 : (__builtin_printf ("assertion on line %i: %s\n",	\
+					 __LINE__, #expr),		\
+		       __builtin_abort ()))
+
+void test_array_ref (void)
+{
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+  int i3 = i2 + 1;
+  int i4 = i3 + 1;
+  int i5 = i4 + 1;
+  int i6 = i5 + 1;
+  int i7 = i6 + 1;
+
+  A (strlen (a[0][0]) == 1);
+  A (strlen (a[0][1]) == 1);
+
+  A (strlen (a[1][0]) == 2);
+  A (strlen (a[1][1]) == 3);
+
+  A (strlen (&a[0][0][0]) == 1);
+  A (strlen (&a[0][1][0]) == 1);
+
+  A (strlen (&a[1][0][0]) == 2);
+  A (strlen (&a[1][1][0]) == 3);
+
+  A (strlen (&a[0][0][0] + 1) == 0);
+  A (strlen (&a[0][1][0] + 1) == 0);
+  A (strlen (&a[0][1][0] + 2) == 1);
+  A (strlen (&a[0][1][0] + 3) == 0);
+  A (strlen (&a[0][1][0] + 7) == 0);
+
+  A (strlen (&a[1][0][0] + 1) == 1);
+  A (strlen (&a[1][1][0] + 1) == 2);
+  A (strlen (&a[1][1][0] + 2) == 1);
+  A (strlen (&a[1][1][0] + 7) == 0);
+
+
+  A (strlen (a[i0][i0]) == 1);
+  A (strlen (a[i0][i1]) == 1);
+
+  A (strlen (a[i1][i0]) == 2);
+  A (strlen (a[i1][i1]) == 3);
+
+  A (strlen (&a[i0][i0][i0]) == 1);
+  A (strlen (&a[i0][i1][i0]) == 1);
+  A (strlen (&a[i0][i1][i1]) == 0);
+  A (strlen (&a[i0][i1][i2]) == 1);
+  A (strlen (&a[i0][i1][i3]) == 0);
+  A (strlen (&a[i0][i1][i3]) == 0);
+
+  A (strlen (&a[i1][i0][i0]) == 2);
+  A (strlen (&a[i1][i1][i0]) == 3);
+  A (strlen (&a[i1][i1][i1]) == 2);
+  A (strlen (&a[i1][i1][i2]) == 1);
+  A (strlen (&a[i1][i1][i3]) == 0);
+  A (strlen (&a[i1][i1][i4]) == 1);
+  A (strlen (&a[i1][i1][i5]) == 0);
+  A (strlen (&a[i1][i1][i6]) == 0);
+  A (strlen (&a[i1][i1][i7]) == 0);
+
+  A (strlen (&a[i0][i0][i0] + i1) == 0);
+  A (strlen (&a[i0][i1][i0] + i1) == 0);
+  A (strlen (&a[i0][i1][i0] + i7) == 0);
+
+  A (strlen (&a[i1][i0][i0] + i1) == 1);
+  A (strlen (&a[i1][i1][i0] + i1) == 2);
+  A (strlen (&a[i1][i1][i0] + i2) == 1);
+  A (strlen (&a[i1][i1][i0] + i3) == 0);
+  A (strlen (&a[i1][i1][i0] + i4) == 1);
+  A (strlen (&a[i1][i1][i0] + i5) == 0);
+  A (strlen (&a[i1][i1][i0] + i6) == 0);
+  A (strlen (&a[i1][i1][i0] + i7) == 0);
+
+
+  A (strlen (a[i0][i0]) == 1);
+  A (strlen (a[i0][i1]) == 1);
+
+  A (strlen (a[i1][i0]) == 2);
+  A (strlen (a[i1][i1]) == 3);
+
+  A (strlen (&a[i0][i0][i0]) == 1);
+  A (strlen (&a[i0][i1][i0]) == 1);
+
+  A (strlen (&a[i1][i0][i0]) == 2);
+  A (strlen (&a[i1][i1][i0]) == 3);
+
+  A (strlen (&a[i0][i0][i0] + v1) == 0);
+  A (strlen (&a[i0][i0][i0] + v2) == 0);
+  A (strlen (&a[i0][i0][i0] + v7) == 0);
+
+  A (strlen (&a[i0][i1][i0] + v1) == 0);
+  A (strlen (&a[i0][i1][i0] + v2) == 1);
+  A (strlen (&a[i0][i1][i0] + v3) == 0);
+
+  A (strlen (&a[i1][i0][i0] + v1) == 1);
+  A (strlen (&a[i1][i1][i0] + v1) == 2);
+  A (strlen (&a[i1][i1][i0] + v2) == 1);
+  A (strlen (&a[i1][i1][i0] + v3) == 0);
+  A (strlen (&a[i1][i1][i0] + v4) == 1);
+  A (strlen (&a[i1][i1][i0] + v5) == 0);
+  A (strlen (&a[i1][i1][i0] + v6) == 0);
+  A (strlen (&a[i1][i1][i0] + v7) == 0);
+}
+
+
+int main (void)
+{
+  test_array_ref ();
+}
Richard Biener July 18, 2018, 8:31 a.m. UTC | #4
On Tue, 17 Jul 2018, Martin Sebor wrote:

> The attached update takes care of a couple of problems pointed
> out by Bernd Edlinger in his comments on the bug.  The ICE he
> mentioned in comment #20 was due mixing sizetype, ssizetype,
> and size_type_node in c_strlen().  AFAICS, some of it predates
> the patch but my changes made it worse and also managed trigger
> it.

+        has no internal zero bytes.  If the offset falls within the 
bounds
+        of the string subtract the offset from the length of the string,
+        and return that.  Otherwise the length is zero.  Take care to
+        use SAVE_EXPR in case the OFFSET has side-effects.  */
+      tree offsave = TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) :
byteoff;
+      offsave = fold_convert (ssizetype, offsave);
+      tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node,
offsave,
+                                     build_int_cst (ssizetype, len *
eltsize));
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
offsave);
+      return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
+                             build_zero_cst (ssizetype));

in what case are you expecting to return an actual COND_EXRP and
why is that useful?  You return a signed value but bother to
guard it so it is never less than zero.  Why?  Why not simply
return the difference as you did before but with the side-effects
properly handled?

> On 07/17/2018 09:19 AM, Martin Sebor wrote:
> > My enhancement to extract constant strings out of complex
> > aggregates committed last week introduced a couple of bugs in
> > dealing with non-constant indices and offsets.  One of the bugs
> > was fixed earlier today (PR 86528) but another one remains.  It
> > causes strlen (among other functions) to incorrectly fold
> > expressions involving a non-constant index into an array of
> > strings by treating the index the same as a non-consatnt
> > offset into it.
> > 
> > The non-constant index should either prevent the folding, or it
> > needs to handle it differently from an offset.
> > 
> > The attached patch takes the conservative approach of avoiding
> > the folding in this case.  The remaining changes deal with
> > the fallout from the fix.
> > 
> > Tested on x86_64-linux.
> > 
> > Martin
> > 
> 
>
Martin Sebor July 18, 2018, 2:41 p.m. UTC | #5
On 07/18/2018 02:31 AM, Richard Biener wrote:
> On Tue, 17 Jul 2018, Martin Sebor wrote:
>
>> The attached update takes care of a couple of problems pointed
>> out by Bernd Edlinger in his comments on the bug.  The ICE he
>> mentioned in comment #20 was due mixing sizetype, ssizetype,
>> and size_type_node in c_strlen().  AFAICS, some of it predates
>> the patch but my changes made it worse and also managed trigger
>> it.
>
> +        has no internal zero bytes.  If the offset falls within the
> bounds
> +        of the string subtract the offset from the length of the string,
> +        and return that.  Otherwise the length is zero.  Take care to
> +        use SAVE_EXPR in case the OFFSET has side-effects.  */
> +      tree offsave = TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) :
> byteoff;
> +      offsave = fold_convert (ssizetype, offsave);
> +      tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node,
> offsave,
> +                                     build_int_cst (ssizetype, len *
> eltsize));
> +      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
> offsave);
> +      return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
> +                             build_zero_cst (ssizetype));
>
> in what case are you expecting to return an actual COND_EXRP and
> why is that useful?

It's necessary to correctly handle strings with multiple trailing
nuls, like in:

   const char a[8] = "123";
   int f (int i)
   {
     return strlen (a + i);
   }

If (i <= 3) then the length is i.  If it's greater than 3 then
the length is zero.  I'd expect such strings to be quite common,
even pervasive, in the case of multidimensional arrays or arrays
of structs with array members.  (Probably less so in plain one-
dimensional arrays like the one above.)

> You return a signed value but bother to
> guard it so it is never less than zero.  Why?  Why not simply
> return the difference as you did before but with the side-effects
> properly handled?

Hopefully the above answers this question (if there's a way
to do it in a more straightforward way please let me know).

FWIW, as I said in bug 86434, I think this folding is premature
and prevents other optimizations that I suspect would be more
profitable.  I'm only preserving it here for now but at some
point I hope we can agree to defer it until later when more
information about the offset is known and when it will
benefit other optimizations.  I read your comments on the bug
and I'll see if it's possible to have it both ways.

Martin

>
>> On 07/17/2018 09:19 AM, Martin Sebor wrote:
>>> My enhancement to extract constant strings out of complex
>>> aggregates committed last week introduced a couple of bugs in
>>> dealing with non-constant indices and offsets.  One of the bugs
>>> was fixed earlier today (PR 86528) but another one remains.  It
>>> causes strlen (among other functions) to incorrectly fold
>>> expressions involving a non-constant index into an array of
>>> strings by treating the index the same as a non-consatnt
>>> offset into it.
>>>
>>> The non-constant index should either prevent the folding, or it
>>> needs to handle it differently from an offset.
>>>
>>> The attached patch takes the conservative approach of avoiding
>>> the folding in this case.  The remaining changes deal with
>>> the fallout from the fix.
>>>
>>> Tested on x86_64-linux.
>>>
>>> Martin
>>>
>>
>>
>
Martin Sebor July 18, 2018, 7:47 p.m. UTC | #6
>>> +      while (TREE_CODE (chartype) != INTEGER_TYPE)
>>> +    chartype = TREE_TYPE (chartype);
>> This is a bit concerning.  First under what conditions is chartype not
>> going to be an INTEGER_TYPE?  And under what conditions will extracting
>> its type ultimately lead to something that is an INTEGER_TYPE?
>
> chartype is usually (maybe even always) pointer type here:
>
>   const char a[] = "123";
>   extern int i;
>   n = strlen (&a[i]);

But your hunch was correct that the loop isn't safe because
the element type need not be an integer (I didn't know/forgot
that the function is called for non-strings too).  The loop
should be replaced by:

       while (TREE_CODE (chartype) == ARRAY_TYPE
	     || TREE_CODE (chartype) == POINTER_TYPE)
	chartype = TREE_TYPE (chartype);

      if (TREE_CODE (chartype) != INTEGER_TYPE)
	return NULL;

I will update the patch before committing.

FWIW, it seems like it would be useful to extend the function to
non-string arguments.  That way it would be able to return array
initializers in cases like this:

   const struct A { int a[2]; }
     a = { { 1, 2 } },
     b = { { 1, 2 } };

   int f (void)
   {
     return __builtin_memcmp (&a, &b, sizeof a);
   }

which would in turn make it possible to fold the result of
such calls analogously to strlen or strcmp.

Martin
Richard Biener July 19, 2018, 7:17 a.m. UTC | #7
On Wed, 18 Jul 2018, Martin Sebor wrote:

> > > > +      while (TREE_CODE (chartype) != INTEGER_TYPE)
> > > > +    chartype = TREE_TYPE (chartype);
> > > This is a bit concerning.  First under what conditions is chartype not
> > > going to be an INTEGER_TYPE?  And under what conditions will extracting
> > > its type ultimately lead to something that is an INTEGER_TYPE?
> > 
> > chartype is usually (maybe even always) pointer type here:
> > 
> >   const char a[] = "123";
> >   extern int i;
> >   n = strlen (&a[i]);
> 
> But your hunch was correct that the loop isn't safe because
> the element type need not be an integer (I didn't know/forgot
> that the function is called for non-strings too).  The loop
> should be replaced by:
> 
>       while (TREE_CODE (chartype) == ARRAY_TYPE
> 	     || TREE_CODE (chartype) == POINTER_TYPE)
> 	chartype = TREE_TYPE (chartype);

As this function may be called "late" you need to cope with
the middle-end ignoring type changes and thus happily
passing int *** directly rather than (char *) of that.

Also doesn't the above yield int for int *[]?

I guess you really want

   if (POINTER_TYPE_P (chartype))
     chartype = TREE_TYPE (chartype);
   while (TREE_CODE (chartype) == ARRAY_TYPE)
     chartype = TREE_TYPE (chartype);

?

>      if (TREE_CODE (chartype) != INTEGER_TYPE)
> 	return NULL;
> 
> I will update the patch before committing.
> 
> FWIW, it seems like it would be useful to extend the function to
> non-string arguments.  That way it would be able to return array
> initializers in cases like this:
> 
>   const struct A { int a[2]; }
>     a = { { 1, 2 } },
>     b = { { 1, 2 } };
> 
>   int f (void)
>   {
>     return __builtin_memcmp (&a, &b, sizeof a);
>   }
> 
> which would in turn make it possible to fold the result of
> such calls analogously to strlen or strcmp.
> 
> Martin
> 
>
Martin Sebor July 19, 2018, 7:49 p.m. UTC | #8
On 07/19/2018 01:17 AM, Richard Biener wrote:
> On Wed, 18 Jul 2018, Martin Sebor wrote:
>
>>>>> +      while (TREE_CODE (chartype) != INTEGER_TYPE)
>>>>> +    chartype = TREE_TYPE (chartype);
>>>> This is a bit concerning.  First under what conditions is chartype not
>>>> going to be an INTEGER_TYPE?  And under what conditions will extracting
>>>> its type ultimately lead to something that is an INTEGER_TYPE?
>>>
>>> chartype is usually (maybe even always) pointer type here:
>>>
>>>   const char a[] = "123";
>>>   extern int i;
>>>   n = strlen (&a[i]);
>>
>> But your hunch was correct that the loop isn't safe because
>> the element type need not be an integer (I didn't know/forgot
>> that the function is called for non-strings too).  The loop
>> should be replaced by:
>>
>>       while (TREE_CODE (chartype) == ARRAY_TYPE
>> 	     || TREE_CODE (chartype) == POINTER_TYPE)
>> 	chartype = TREE_TYPE (chartype);
>
> As this function may be called "late" you need to cope with
> the middle-end ignoring type changes and thus happily
> passing int *** directly rather than (char *) of that.
>
> Also doesn't the above yield int for int *[]?

I don't think it ever gets this far for either a pointer to
an array of int, or for an array of pointers to int.  So for
something like the following the function fails earlier:

   const int* const a[2] = { ... };
   const char* (const *p)[2] = &a;

   int f (void)
   {
     return __builtin_memcmp (*p, "12345678", 8);
   }

(Assuming this is what you were asking about.)

> I guess you really want
>
>    if (POINTER_TYPE_P (chartype))
>      chartype = TREE_TYPE (chartype);
>    while (TREE_CODE (chartype) == ARRAY_TYPE)
>      chartype = TREE_TYPE (chartype);
>
> ?

That seems to work too.  Attached is an update with this tweak.
The update also addresses some of Bernd's comments: it removes
the pointless second test in:

	if (TREE_CODE (type) == ARRAY_TYPE
	    && TREE_CODE (type) != INTEGER_TYPE)

the unused assignment to chartype in:

    else if (DECL_P (arg))
      {
        array = arg;
        chartype = TREE_TYPE (arg);
      }

and calls string_constant() instead of strnlen() to compute
the length of a generic string.

Other improvements  are possible in this area but they are
orthogonal to the bug I'm trying to fix so I'll post separate
patches for some of those.

Martin
PR tree-optimization/86532 - Wrong code due to a wrong strlen folding starting with r262522

gcc/ChangeLog:

	PR tree-optimization/86532
	* builtins.h (string_length): Declare.
	* builtins.c (c_strlen): Correct handling of non-constant offsets.	
	(check_access): Be prepared for non-constant length ranges.
	(string_length): Make extern.
	* expr.c (string_constant): Only handle the minor non-constant
	array index.  Use string_constant to compute the length of
	a generic string constant.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86532
	* gcc.c-torture/execute/strlen-2.c: New test.
	* gcc.c-torture/execute/strlen-3.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index c069d66..ceb477d 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -517,11 +517,11 @@ get_pointer_alignment (tree exp)
   return align;
 }
 
-/* Return the number of non-zero elements in the sequence
+/* Return the number of leading non-zero elements in the sequence
    [ PTR, PTR + MAXELTS ) where each element's size is ELTSIZE bytes.
    ELTSIZE must be a power of 2 less than 8.  Used by c_strlen.  */
 
-static unsigned
+unsigned
 string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
 {
   gcc_checking_assert (eltsize == 1 || eltsize == 2 || eltsize == 4);
@@ -605,14 +605,21 @@ c_strlen (tree src, int only_value)
 
   /* Set MAXELTS to sizeof (SRC) / sizeof (*SRC) - 1, the maximum possible
      length of SRC.  Prefer TYPE_SIZE() to TREE_STRING_LENGTH() if possible
-     in case the latter is less than the size of the array.  */
-  HOST_WIDE_INT maxelts = TREE_STRING_LENGTH (src);
+     in case the latter is less than the size of the array, such as when
+     SRC refers to a short string literal used to initialize a large array.
+     In that case, the elements of the array after the terminating NUL are
+     all NUL.  */
+  HOST_WIDE_INT strelts = TREE_STRING_LENGTH (src);
+  strelts = strelts / eltsize - 1;
+
+  HOST_WIDE_INT maxelts = strelts;
   tree type = TREE_TYPE (src);
   if (tree size = TYPE_SIZE_UNIT (type))
     if (tree_fits_shwi_p (size))
-      maxelts = tree_to_uhwi (size);
-
-  maxelts = maxelts / eltsize - 1;
+      {
+	maxelts = tree_to_uhwi (size);
+	maxelts = maxelts / eltsize - 1;
+      }
 
   /* PTR can point to the byte representation of any string type, including
      char* and wchar_t*.  */
@@ -620,10 +627,12 @@ c_strlen (tree src, int only_value)
 
   if (byteoff && TREE_CODE (byteoff) != INTEGER_CST)
     {
-      /* If the string has an internal zero byte (e.g., "foo\0bar"), we can't
-	 compute the offset to the following null if we don't know where to
+      /* If the string has an internal NUL character followed by any
+	 non-NUL characters (e.g., "foo\0bar"), we can't compute
+	 the offset to the following NUL if we don't know where to
 	 start searching for it.  */
-      if (string_length (ptr, eltsize, maxelts) < maxelts)
+      unsigned len = string_length (ptr, eltsize, strelts);
+      if (len < strelts)
 	{
 	  /* Return when an embedded null character is found.  */
 	  return NULL_TREE;
@@ -633,12 +642,17 @@ c_strlen (tree src, int only_value)
 	return ssize_int (0);
 
       /* We don't know the starting offset, but we do know that the string
-	 has no internal zero bytes.  We can assume that the offset falls
-	 within the bounds of the string; otherwise, the programmer deserves
-	 what he gets.  Subtract the offset from the length of the string,
-	 and return that.  This would perhaps not be valid if we were dealing
-	 with named arrays in addition to literal string constants.  */
-      return size_diffop_loc (loc, size_int (maxelts * eltsize), byteoff);
+	 has no internal zero bytes.  If the offset falls within the bounds
+	 of the string subtract the offset from the length of the string,
+	 and return that.  Otherwise the length is zero.  Take care to
+	 use SAVE_EXPR in case the OFFSET has side-effects.  */
+      tree offsave = TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) : byteoff;
+      offsave = fold_convert (ssizetype, offsave);
+      tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
+				      build_int_cst (ssizetype, len * eltsize));
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+      return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
+			      build_zero_cst (ssizetype));
     }
 
   /* Offset from the beginning of the string in elements.  */
@@ -3192,15 +3206,13 @@ check_access (tree exp, tree, tree, tree dstwrite,
   if (dstwrite)
     get_size_range (dstwrite, range);
 
-  /* This can happen at -O0.  */
-  if (range[0] && TREE_CODE (range[0]) != INTEGER_CST)
-    return false;
-
   tree func = get_callee_fndecl (exp);
 
   /* First check the number of bytes to be written against the maximum
      object size.  */
-  if (range[0] && tree_int_cst_lt (maxobjsize, range[0]))
+  if (range[0]
+      && TREE_CODE (range[0]) == INTEGER_CST
+      && tree_int_cst_lt (maxobjsize, range[0]))
     {
       if (TREE_NO_WARNING (exp))
 	return false;
@@ -3235,9 +3247,11 @@ check_access (tree exp, tree, tree, tree dstwrite,
   if (range[0] || !exactwrite || integer_all_onesp (dstwrite))
     {
       if (range[0]
+	  && TREE_CODE (range[0]) == INTEGER_CST
 	  && ((tree_fits_uhwi_p (dstsize)
 	       && tree_int_cst_lt (dstsize, range[0]))
-	      || (tree_fits_uhwi_p (dstwrite)
+	      || (dstwrite
+		  && tree_fits_uhwi_p (dstwrite)
 		  && tree_int_cst_lt (dstwrite, range[0]))))
 	{
 	  if (TREE_NO_WARNING (exp))
diff --git a/gcc/builtins.h b/gcc/builtins.h
index c922904..2e0a2f9 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -57,6 +57,7 @@ extern unsigned int get_object_alignment (tree);
 extern bool get_pointer_alignment_1 (tree, unsigned int *,
 				     unsigned HOST_WIDE_INT *);
 extern unsigned int get_pointer_alignment (tree);
+extern unsigned string_length (const void*, unsigned, unsigned);
 extern tree c_strlen (tree, int);
 extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
diff --git a/gcc/expr.c b/gcc/expr.c
index f665e18..2162ca9 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11282,24 +11282,33 @@ string_constant (tree arg, tree *ptr_offset)
   /* Non-constant index into the character array in an ARRAY_REF
      expression or null.  */
   tree varidx = NULL_TREE;
+  tree chartype = NULL_TREE;
 
   poly_int64 base_off = 0;
 
   if (TREE_CODE (arg) == ADDR_EXPR)
     {
+      tree argtype = TREE_TYPE (arg);
+      chartype = argtype;
+
       arg = TREE_OPERAND (arg, 0);
       tree ref = arg;
       if (TREE_CODE (arg) == ARRAY_REF)
 	{
 	  tree idx = TREE_OPERAND (arg, 1);
-	  if (TREE_CODE (idx) != INTEGER_CST)
+	  if (TREE_CODE (idx) != INTEGER_CST
+	      && TREE_CODE (argtype) == POINTER_TYPE)
 	    {
-	      /* Extract the variable index to prevent
-		 get_addr_base_and_unit_offset() from failing due to
-		 it.  Use it later to compute the non-constant offset
+	      /* From a pointer (but not array) argument extract the variable
+		 index to prevent get_addr_base_and_unit_offset() from failing
+		 due to it.  Use it later to compute the non-constant offset
 		 into the string and return it to the caller.  */
 	      varidx = idx;
 	      ref = TREE_OPERAND (arg, 0);
+
+	      tree type = TREE_TYPE (arg);
+	      if (TREE_CODE (type) == ARRAY_TYPE)
+		return NULL_TREE;
 	    }
 	}
       array = get_addr_base_and_unit_offset (ref, &base_off);
@@ -11343,16 +11352,24 @@ string_constant (tree arg, tree *ptr_offset)
     {
       if (TREE_CODE (TREE_TYPE (array)) != ARRAY_TYPE)
 	return NULL_TREE;
-      if (tree eltsize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array))))
-	{
-	  /* Add the scaled variable index to the constant offset.  */
-	  tree eltoff = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
-				     fold_convert (sizetype, varidx),
-				     eltsize);
-	  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, eltoff);
-	}
-      else
-	return NULL_TREE;
+
+      /* At this point, CHARTYPE is in most (all?) cases a pointer.
+	 Determine the underlying character type, taking care to
+	 avoid non-integer types (callers need not pass in arguments
+	 that only refer to character arrays).  */
+      if (POINTER_TYPE_P (chartype))
+	chartype = TREE_TYPE (chartype);
+      while (TREE_CODE (chartype) == ARRAY_TYPE)
+       chartype = TREE_TYPE (chartype);
+
+      if (TREE_CODE (chartype) != INTEGER_TYPE)
+	return NULL;
+
+      /* Set the non-constant offset to the non-constant index scaled
+	 by the size of the character type.  */
+      offset = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
+			    fold_convert (sizetype, varidx),
+			    TYPE_SIZE_UNIT (chartype));
     }
 
   if (TREE_CODE (array) == STRING_CST)
@@ -11371,11 +11388,6 @@ string_constant (tree arg, tree *ptr_offset)
     return NULL_TREE;
   if (TREE_CODE (init) == CONSTRUCTOR)
     {
-      if (TREE_CODE (arg) != ARRAY_REF
-	  && TREE_CODE (arg) == COMPONENT_REF
-	  && TREE_CODE (arg) == MEM_REF)
-	return NULL_TREE;
-
       /* Convert the 64-bit constant offset to a wider type to avoid
 	 overflow.  */
       offset_int wioff;
@@ -11391,11 +11403,15 @@ string_constant (tree arg, tree *ptr_offset)
       init = fold_ctor_reference (NULL_TREE, init, base_off, 0, array,
 				  &fieldoff);
       HOST_WIDE_INT cstoff;
-      if (init && base_off.is_constant (&cstoff))
-	{
-	  cstoff = (cstoff - fieldoff) / BITS_PER_UNIT;
-	  offset = build_int_cst (sizetype, cstoff);
-	}
+      if (!base_off.is_constant (&cstoff))
+	return NULL_TREE;
+
+      cstoff = (cstoff - fieldoff) / BITS_PER_UNIT;
+      tree off = build_int_cst (sizetype, cstoff);
+      if (varidx)
+	offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, off);
+      else
+	offset = off;
     }
 
   if (!init || TREE_CODE (init) != STRING_CST)
@@ -11413,8 +11429,10 @@ string_constant (tree arg, tree *ptr_offset)
      const char a[4] = "abc\000\000";
      The excess elements contribute to TREE_STRING_LENGTH()
      but not to strlen().  */
-  unsigned HOST_WIDE_INT length
-    = strnlen (TREE_STRING_POINTER (init), TREE_STRING_LENGTH (init));
+  unsigned HOST_WIDE_INT charsize
+    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (init))));
+  unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
+  length = string_length (TREE_STRING_POINTER (init), charsize, length);
   if (compare_tree_int (array_size, length + 1) < 0)
     return NULL_TREE;
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/strlen-2.c b/gcc/testsuite/gcc.c-torture/execute/strlen-2.c
new file mode 100644
index 0000000..4519f6a
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/strlen-2.c
@@ -0,0 +1,210 @@
+/* PR tree-optimization/86532 - Wrong code due to a wrong strlen folding  */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+static const char a[2][3] = { "1", "12" };
+static const char b[2][2][5] = { { "1", "12" }, { "123", "1234" } };
+
+volatile int v0 = 0;
+volatile int v1 = 1;
+volatile int v2 = 2;
+
+#define A(expr)								\
+  ((expr) ? (void)0 : (__builtin_printf ("assertion on line %i: %s\n",	\
+					 __LINE__, #expr),		\
+		       __builtin_abort ()))
+
+void test_array_ref_2_3 (void)
+{
+  A (strlen (a[v0]) == 1);
+  A (strlen (&a[v0][v0]) == 1);
+  A (strlen (&a[0][v0]) == 1);
+  A (strlen (&a[v0][0]) == 1);
+
+  A (strlen (a[v1]) == 2);
+  A (strlen (&a[v1][0]) == 2);
+  A (strlen (&a[1][v0]) == 2);
+  A (strlen (&a[v1][v0]) == 2);
+
+  A (strlen (&a[v1][1]) == 1);
+  A (strlen (&a[v1][1]) == 1);
+
+  A (strlen (&a[v1][2]) == 0);
+  A (strlen (&a[v1][v2]) == 0);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (a[v0]) == 1);
+  A (strlen (&a[v0][v0]) == 1);
+  A (strlen (&a[i0][v0]) == 1);
+  A (strlen (&a[v0][i0]) == 1);
+
+  A (strlen (a[v1]) == 2);
+  A (strlen (&a[v1][i0]) == 2);
+  A (strlen (&a[i1][v0]) == 2);
+  A (strlen (&a[v1][v0]) == 2);
+
+  A (strlen (&a[v1][i1]) == 1);
+  A (strlen (&a[v1][i1]) == 1);
+
+  A (strlen (&a[v1][i2]) == 0);
+  A (strlen (&a[v1][v2]) == 0);
+}
+
+void test_array_off_2_3 (void)
+{
+  A (strlen (a[0] + 0) == 1);
+  A (strlen (a[0] + v0) == 1);
+  A (strlen (a[v0] + 0) == 1);
+  A (strlen (a[v0] + v0) == 1);
+
+  A (strlen (a[v1] + 0) == 2);
+  A (strlen (a[1] + v0) == 2);
+  A (strlen (a[v1] + 0) == 2);
+  A (strlen (a[v1] + v0) == 2);
+
+  A (strlen (a[v1] + 1) == 1);
+  A (strlen (a[v1] + v1) == 1);
+
+  A (strlen (a[v1] + 2) == 0);
+  A (strlen (a[v1] + v2) == 0);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (a[i0] + i0) == 1);
+  A (strlen (a[i0] + v0) == 1);
+  A (strlen (a[v0] + i0) == 1);
+  A (strlen (a[v0] + v0) == 1);
+
+  A (strlen (a[v1] + i0) == 2);
+  A (strlen (a[i1] + v0) == 2);
+  A (strlen (a[v1] + i0) == 2);
+  A (strlen (a[v1] + v0) == 2);
+
+  A (strlen (a[v1] + i1) == 1);
+  A (strlen (a[v1] + v1) == 1);
+
+  A (strlen (a[v1] + i2) == 0);
+  A (strlen (a[v1] + v2) == 0);
+}
+
+void test_array_ref_2_2_5 (void)
+{
+  A (strlen (b[0][v0]) == 1);
+  A (strlen (b[v0][0]) == 1);
+
+  A (strlen (&b[0][0][v0]) == 1);
+  A (strlen (&b[0][v0][0]) == 1);
+  A (strlen (&b[v0][0][0]) == 1);
+
+  A (strlen (&b[0][v0][v0]) == 1);
+  A (strlen (&b[v0][0][v0]) == 1);
+  A (strlen (&b[v0][v0][0]) == 1);
+
+  A (strlen (b[0][v1]) == 2);
+  A (strlen (b[v1][0]) == 3);
+
+  A (strlen (&b[0][0][v1]) == 0);
+  A (strlen (&b[0][v1][0]) == 2);
+  A (strlen (&b[v0][0][0]) == 1);
+
+  A (strlen (&b[0][v0][v0]) == 1);
+  A (strlen (&b[v0][0][v0]) == 1);
+  A (strlen (&b[v0][v0][0]) == 1);
+
+  A (strlen (&b[0][v1][v1]) == 1);
+  A (strlen (&b[v1][0][v1]) == 2);
+  A (strlen (&b[v1][v1][0]) == 4);
+  A (strlen (&b[v1][v1][1]) == 3);
+  A (strlen (&b[v1][v1][2]) == 2);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (b[i0][v0]) == 1);
+  A (strlen (b[v0][i0]) == 1);
+
+  A (strlen (&b[i0][i0][v0]) == 1);
+  A (strlen (&b[i0][v0][i0]) == 1);
+  A (strlen (&b[v0][i0][i0]) == 1);
+
+  A (strlen (&b[i0][v0][v0]) == 1);
+  A (strlen (&b[v0][i0][v0]) == 1);
+  A (strlen (&b[v0][v0][i0]) == 1);
+
+  A (strlen (b[i0][v1]) == 2);
+  A (strlen (b[v1][i0]) == 3);
+
+  A (strlen (&b[i0][i0][v1]) == 0);
+  A (strlen (&b[i0][v1][i0]) == 2);
+  A (strlen (&b[v0][i0][i0]) == 1);
+
+  A (strlen (&b[i0][v0][v0]) == 1);
+  A (strlen (&b[v0][i0][v0]) == 1);
+  A (strlen (&b[v0][v0][i0]) == 1);
+
+  A (strlen (&b[i0][v1][v1]) == 1);
+  A (strlen (&b[v1][i0][v1]) == 2);
+  A (strlen (&b[v1][v1][i0]) == 4);
+  A (strlen (&b[v1][v1][i1]) == 3);
+  A (strlen (&b[v1][v1][i2]) == 2);
+}
+
+void test_array_off_2_2_5 (void)
+{
+  A (strlen (b[0][0] + v0) == 1);
+  A (strlen (b[0][v0] + v0) == 1);
+  A (strlen (b[v0][0] + v0) == 1);
+  A (strlen (b[v0][v0] + v0) == 1);
+
+  A (strlen (b[0][0] + v1) == 0);
+  A (strlen (b[0][v1] + 0) == 2);
+  A (strlen (b[v0][0] + 0) == 1);
+
+  A (strlen (b[0][v0] + v0) == 1);
+  A (strlen (b[v0][0] + v0) == 1);
+  A (strlen (b[v0][v0] + 0) == 1);
+
+  A (strlen (b[0][v1] + v1) == 1);
+  A (strlen (b[v1][0] + v1) == 2);
+  A (strlen (b[v1][v1] + 0) == 4);
+  A (strlen (b[v1][v1] + 1) == 3);
+  A (strlen (b[v1][v1] + 2) == 2);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (b[i0][i0] + v0) == 1);
+  A (strlen (b[i0][v0] + v0) == 1);
+  A (strlen (b[v0][i0] + v0) == 1);
+  A (strlen (b[v0][v0] + v0) == 1);
+
+  A (strlen (b[i0][i0] + v1) == 0);
+  A (strlen (b[i0][v1] + i0) == 2);
+  A (strlen (b[v0][i0] + i0) == 1);
+
+  A (strlen (b[i0][v0] + v0) == 1);
+  A (strlen (b[v0][i0] + v0) == 1);
+  A (strlen (b[v0][v0] + i0) == 1);
+
+  A (strlen (b[i0][v1] + v1) == 1);
+  A (strlen (b[v1][i0] + v1) == 2);
+  A (strlen (b[v1][v1] + i0) == 4);
+  A (strlen (b[v1][v1] + i1) == 3);
+  A (strlen (b[v1][v1] + i2) == 2);
+}
+
+int main ()
+{
+  test_array_ref_2_3 ();
+  test_array_off_2_3 ();
+
+  test_array_ref_2_2_5 ();
+  test_array_off_2_2_5 ();
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/strlen-3.c b/gcc/testsuite/gcc.c-torture/execute/strlen-3.c
new file mode 100644
index 0000000..2696948
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/strlen-3.c
@@ -0,0 +1,133 @@
+/* PR tree-optimization/86532 - Wrong code due to a wrong strlen folding
+   starting with r262522
+   Exercise strlen() with a multi-dimensional array of strings with
+   embedded nuls.  */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+static const char a[2][3][9] = {
+  { "1", "1\0002" },
+  { "12\0003", "123\0004" }
+};
+
+volatile int v0 = 0;
+volatile int v1 = 1;
+volatile int v2 = 2;
+volatile int v3 = 3;
+volatile int v4 = 4;
+volatile int v5 = 5;
+volatile int v6 = 6;
+volatile int v7 = 7;
+
+#define A(expr)								\
+  ((expr) ? (void)0 : (__builtin_printf ("assertion on line %i: %s\n",	\
+					 __LINE__, #expr),		\
+		       __builtin_abort ()))
+
+void test_array_ref (void)
+{
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+  int i3 = i2 + 1;
+  int i4 = i3 + 1;
+  int i5 = i4 + 1;
+  int i6 = i5 + 1;
+  int i7 = i6 + 1;
+
+  A (strlen (a[0][0]) == 1);
+  A (strlen (a[0][1]) == 1);
+
+  A (strlen (a[1][0]) == 2);
+  A (strlen (a[1][1]) == 3);
+
+  A (strlen (&a[0][0][0]) == 1);
+  A (strlen (&a[0][1][0]) == 1);
+
+  A (strlen (&a[1][0][0]) == 2);
+  A (strlen (&a[1][1][0]) == 3);
+
+  A (strlen (&a[0][0][0] + 1) == 0);
+  A (strlen (&a[0][1][0] + 1) == 0);
+  A (strlen (&a[0][1][0] + 2) == 1);
+  A (strlen (&a[0][1][0] + 3) == 0);
+  A (strlen (&a[0][1][0] + 7) == 0);
+
+  A (strlen (&a[1][0][0] + 1) == 1);
+  A (strlen (&a[1][1][0] + 1) == 2);
+  A (strlen (&a[1][1][0] + 2) == 1);
+  A (strlen (&a[1][1][0] + 7) == 0);
+
+
+  A (strlen (a[i0][i0]) == 1);
+  A (strlen (a[i0][i1]) == 1);
+
+  A (strlen (a[i1][i0]) == 2);
+  A (strlen (a[i1][i1]) == 3);
+
+  A (strlen (&a[i0][i0][i0]) == 1);
+  A (strlen (&a[i0][i1][i0]) == 1);
+  A (strlen (&a[i0][i1][i1]) == 0);
+  A (strlen (&a[i0][i1][i2]) == 1);
+  A (strlen (&a[i0][i1][i3]) == 0);
+  A (strlen (&a[i0][i1][i3]) == 0);
+
+  A (strlen (&a[i1][i0][i0]) == 2);
+  A (strlen (&a[i1][i1][i0]) == 3);
+  A (strlen (&a[i1][i1][i1]) == 2);
+  A (strlen (&a[i1][i1][i2]) == 1);
+  A (strlen (&a[i1][i1][i3]) == 0);
+  A (strlen (&a[i1][i1][i4]) == 1);
+  A (strlen (&a[i1][i1][i5]) == 0);
+  A (strlen (&a[i1][i1][i6]) == 0);
+  A (strlen (&a[i1][i1][i7]) == 0);
+
+  A (strlen (&a[i0][i0][i0] + i1) == 0);
+  A (strlen (&a[i0][i1][i0] + i1) == 0);
+  A (strlen (&a[i0][i1][i0] + i7) == 0);
+
+  A (strlen (&a[i1][i0][i0] + i1) == 1);
+  A (strlen (&a[i1][i1][i0] + i1) == 2);
+  A (strlen (&a[i1][i1][i0] + i2) == 1);
+  A (strlen (&a[i1][i1][i0] + i3) == 0);
+  A (strlen (&a[i1][i1][i0] + i4) == 1);
+  A (strlen (&a[i1][i1][i0] + i5) == 0);
+  A (strlen (&a[i1][i1][i0] + i6) == 0);
+  A (strlen (&a[i1][i1][i0] + i7) == 0);
+
+
+  A (strlen (a[i0][i0]) == 1);
+  A (strlen (a[i0][i1]) == 1);
+
+  A (strlen (a[i1][i0]) == 2);
+  A (strlen (a[i1][i1]) == 3);
+
+  A (strlen (&a[i0][i0][i0]) == 1);
+  A (strlen (&a[i0][i1][i0]) == 1);
+
+  A (strlen (&a[i1][i0][i0]) == 2);
+  A (strlen (&a[i1][i1][i0]) == 3);
+
+  A (strlen (&a[i0][i0][i0] + v1) == 0);
+  A (strlen (&a[i0][i0][i0] + v2) == 0);
+  A (strlen (&a[i0][i0][i0] + v7) == 0);
+
+  A (strlen (&a[i0][i1][i0] + v1) == 0);
+  A (strlen (&a[i0][i1][i0] + v2) == 1);
+  A (strlen (&a[i0][i1][i0] + v3) == 0);
+
+  A (strlen (&a[i1][i0][i0] + v1) == 1);
+  A (strlen (&a[i1][i1][i0] + v1) == 2);
+  A (strlen (&a[i1][i1][i0] + v2) == 1);
+  A (strlen (&a[i1][i1][i0] + v3) == 0);
+  A (strlen (&a[i1][i1][i0] + v4) == 1);
+  A (strlen (&a[i1][i1][i0] + v5) == 0);
+  A (strlen (&a[i1][i1][i0] + v6) == 0);
+  A (strlen (&a[i1][i1][i0] + v7) == 0);
+}
+
+
+int main (void)
+{
+  test_array_ref ();
+}
Martin Sebor July 19, 2018, 11:26 p.m. UTC | #9
Here's one more update with tweaks addressing a couple more
of Bernd's comments:

1) correct the use of TREE_STRING_LENGTH() where a number of
array elements is expected and not bytes
2) set CHARTYPE as soon as it's first determined rather than
trying to extract it again later

On 07/19/2018 01:49 PM, Martin Sebor wrote:
> On 07/19/2018 01:17 AM, Richard Biener wrote:
>> On Wed, 18 Jul 2018, Martin Sebor wrote:
>>
>>>>>> +      while (TREE_CODE (chartype) != INTEGER_TYPE)
>>>>>> +    chartype = TREE_TYPE (chartype);
>>>>> This is a bit concerning.  First under what conditions is chartype not
>>>>> going to be an INTEGER_TYPE?  And under what conditions will
>>>>> extracting
>>>>> its type ultimately lead to something that is an INTEGER_TYPE?
>>>>
>>>> chartype is usually (maybe even always) pointer type here:
>>>>
>>>>   const char a[] = "123";
>>>>   extern int i;
>>>>   n = strlen (&a[i]);
>>>
>>> But your hunch was correct that the loop isn't safe because
>>> the element type need not be an integer (I didn't know/forgot
>>> that the function is called for non-strings too).  The loop
>>> should be replaced by:
>>>
>>>       while (TREE_CODE (chartype) == ARRAY_TYPE
>>>          || TREE_CODE (chartype) == POINTER_TYPE)
>>>     chartype = TREE_TYPE (chartype);
>>
>> As this function may be called "late" you need to cope with
>> the middle-end ignoring type changes and thus happily
>> passing int *** directly rather than (char *) of that.
>>
>> Also doesn't the above yield int for int *[]?
>
> I don't think it ever gets this far for either a pointer to
> an array of int, or for an array of pointers to int.  So for
> something like the following the function fails earlier:
>
>   const int* const a[2] = { ... };
>   const char* (const *p)[2] = &a;
>
>   int f (void)
>   {
>     return __builtin_memcmp (*p, "12345678", 8);
>   }
>
> (Assuming this is what you were asking about.)
>
>> I guess you really want
>>
>>    if (POINTER_TYPE_P (chartype))
>>      chartype = TREE_TYPE (chartype);
>>    while (TREE_CODE (chartype) == ARRAY_TYPE)
>>      chartype = TREE_TYPE (chartype);
>>
>> ?
>
> That seems to work too.  Attached is an update with this tweak.
> The update also addresses some of Bernd's comments: it removes
> the pointless second test in:
>
>     if (TREE_CODE (type) == ARRAY_TYPE
>         && TREE_CODE (type) != INTEGER_TYPE)
>
> the unused assignment to chartype in:
>
>    else if (DECL_P (arg))
>      {
>        array = arg;
>        chartype = TREE_TYPE (arg);
>      }
>
> and calls string_constant() instead of strnlen() to compute
> the length of a generic string.
>
> Other improvements  are possible in this area but they are
> orthogonal to the bug I'm trying to fix so I'll post separate
> patches for some of those.
>
> Martin
PR tree-optimization/86532 - Wrong code due to a wrong strlen folding starting with r262522

gcc/ChangeLog:

	PR tree-optimization/86532
	* builtins.h (string_length): Declare.
	* builtins.c (c_strlen): Correct handling of non-constant offsets.	
	(check_access): Be prepared for non-constant length ranges.
	(string_length): Make extern.
	* expr.c (string_constant): Only handle the minor non-constant
	array index.  Use string_constant to compute the length of
	a generic string constant.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86532
	* gcc.c-torture/execute/strlen-2.c: New test.
	* gcc.c-torture/execute/strlen-3.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index c069d66..ceb477d 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -517,11 +517,11 @@ get_pointer_alignment (tree exp)
   return align;
 }
 
-/* Return the number of non-zero elements in the sequence
+/* Return the number of leading non-zero elements in the sequence
    [ PTR, PTR + MAXELTS ) where each element's size is ELTSIZE bytes.
    ELTSIZE must be a power of 2 less than 8.  Used by c_strlen.  */
 
-static unsigned
+unsigned
 string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
 {
   gcc_checking_assert (eltsize == 1 || eltsize == 2 || eltsize == 4);
@@ -605,14 +605,21 @@ c_strlen (tree src, int only_value)
 
   /* Set MAXELTS to sizeof (SRC) / sizeof (*SRC) - 1, the maximum possible
      length of SRC.  Prefer TYPE_SIZE() to TREE_STRING_LENGTH() if possible
-     in case the latter is less than the size of the array.  */
-  HOST_WIDE_INT maxelts = TREE_STRING_LENGTH (src);
+     in case the latter is less than the size of the array, such as when
+     SRC refers to a short string literal used to initialize a large array.
+     In that case, the elements of the array after the terminating NUL are
+     all NUL.  */
+  HOST_WIDE_INT strelts = TREE_STRING_LENGTH (src);
+  strelts = strelts / eltsize - 1;
+
+  HOST_WIDE_INT maxelts = strelts;
   tree type = TREE_TYPE (src);
   if (tree size = TYPE_SIZE_UNIT (type))
     if (tree_fits_shwi_p (size))
-      maxelts = tree_to_uhwi (size);
-
-  maxelts = maxelts / eltsize - 1;
+      {
+	maxelts = tree_to_uhwi (size);
+	maxelts = maxelts / eltsize - 1;
+      }
 
   /* PTR can point to the byte representation of any string type, including
      char* and wchar_t*.  */
@@ -620,10 +627,12 @@ c_strlen (tree src, int only_value)
 
   if (byteoff && TREE_CODE (byteoff) != INTEGER_CST)
     {
-      /* If the string has an internal zero byte (e.g., "foo\0bar"), we can't
-	 compute the offset to the following null if we don't know where to
+      /* If the string has an internal NUL character followed by any
+	 non-NUL characters (e.g., "foo\0bar"), we can't compute
+	 the offset to the following NUL if we don't know where to
 	 start searching for it.  */
-      if (string_length (ptr, eltsize, maxelts) < maxelts)
+      unsigned len = string_length (ptr, eltsize, strelts);
+      if (len < strelts)
 	{
 	  /* Return when an embedded null character is found.  */
 	  return NULL_TREE;
@@ -633,12 +642,17 @@ c_strlen (tree src, int only_value)
 	return ssize_int (0);
 
       /* We don't know the starting offset, but we do know that the string
-	 has no internal zero bytes.  We can assume that the offset falls
-	 within the bounds of the string; otherwise, the programmer deserves
-	 what he gets.  Subtract the offset from the length of the string,
-	 and return that.  This would perhaps not be valid if we were dealing
-	 with named arrays in addition to literal string constants.  */
-      return size_diffop_loc (loc, size_int (maxelts * eltsize), byteoff);
+	 has no internal zero bytes.  If the offset falls within the bounds
+	 of the string subtract the offset from the length of the string,
+	 and return that.  Otherwise the length is zero.  Take care to
+	 use SAVE_EXPR in case the OFFSET has side-effects.  */
+      tree offsave = TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) : byteoff;
+      offsave = fold_convert (ssizetype, offsave);
+      tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
+				      build_int_cst (ssizetype, len * eltsize));
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+      return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
+			      build_zero_cst (ssizetype));
     }
 
   /* Offset from the beginning of the string in elements.  */
@@ -3192,15 +3206,13 @@ check_access (tree exp, tree, tree, tree dstwrite,
   if (dstwrite)
     get_size_range (dstwrite, range);
 
-  /* This can happen at -O0.  */
-  if (range[0] && TREE_CODE (range[0]) != INTEGER_CST)
-    return false;
-
   tree func = get_callee_fndecl (exp);
 
   /* First check the number of bytes to be written against the maximum
      object size.  */
-  if (range[0] && tree_int_cst_lt (maxobjsize, range[0]))
+  if (range[0]
+      && TREE_CODE (range[0]) == INTEGER_CST
+      && tree_int_cst_lt (maxobjsize, range[0]))
     {
       if (TREE_NO_WARNING (exp))
 	return false;
@@ -3235,9 +3247,11 @@ check_access (tree exp, tree, tree, tree dstwrite,
   if (range[0] || !exactwrite || integer_all_onesp (dstwrite))
     {
       if (range[0]
+	  && TREE_CODE (range[0]) == INTEGER_CST
 	  && ((tree_fits_uhwi_p (dstsize)
 	       && tree_int_cst_lt (dstsize, range[0]))
-	      || (tree_fits_uhwi_p (dstwrite)
+	      || (dstwrite
+		  && tree_fits_uhwi_p (dstwrite)
 		  && tree_int_cst_lt (dstwrite, range[0]))))
 	{
 	  if (TREE_NO_WARNING (exp))
diff --git a/gcc/builtins.h b/gcc/builtins.h
index c922904..2e0a2f9 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -57,6 +57,7 @@ extern unsigned int get_object_alignment (tree);
 extern bool get_pointer_alignment_1 (tree, unsigned int *,
 				     unsigned HOST_WIDE_INT *);
 extern unsigned int get_pointer_alignment (tree);
+extern unsigned string_length (const void*, unsigned, unsigned);
 extern tree c_strlen (tree, int);
 extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
diff --git a/gcc/expr.c b/gcc/expr.c
index f665e18..d7e1914 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11282,24 +11282,35 @@ string_constant (tree arg, tree *ptr_offset)
   /* Non-constant index into the character array in an ARRAY_REF
      expression or null.  */
   tree varidx = NULL_TREE;
+  /* When VARIDX is non-null, set to the type of the character
+     the string consist of.  */
+  tree chartype = NULL_TREE;
 
   poly_int64 base_off = 0;
 
   if (TREE_CODE (arg) == ADDR_EXPR)
     {
+      tree argtype = TREE_TYPE (arg);
+      chartype = argtype;
+
       arg = TREE_OPERAND (arg, 0);
       tree ref = arg;
       if (TREE_CODE (arg) == ARRAY_REF)
 	{
 	  tree idx = TREE_OPERAND (arg, 1);
-	  if (TREE_CODE (idx) != INTEGER_CST)
+	  if (TREE_CODE (idx) != INTEGER_CST
+	      && TREE_CODE (argtype) == POINTER_TYPE)
 	    {
-	      /* Extract the variable index to prevent
-		 get_addr_base_and_unit_offset() from failing due to
-		 it.  Use it later to compute the non-constant offset
+	      /* From a pointer (but not array) argument extract the variable
+		 index to prevent get_addr_base_and_unit_offset() from failing
+		 due to it.  Use it later to compute the non-constant offset
 		 into the string and return it to the caller.  */
 	      varidx = idx;
 	      ref = TREE_OPERAND (arg, 0);
+
+	      chartype = TREE_TYPE (arg);
+	      if (TREE_CODE (chartype) == ARRAY_TYPE)
+		return NULL_TREE;
 	    }
 	}
       array = get_addr_base_and_unit_offset (ref, &base_off);
@@ -11343,16 +11354,15 @@ string_constant (tree arg, tree *ptr_offset)
     {
       if (TREE_CODE (TREE_TYPE (array)) != ARRAY_TYPE)
 	return NULL_TREE;
-      if (tree eltsize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array))))
-	{
-	  /* Add the scaled variable index to the constant offset.  */
-	  tree eltoff = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
-				     fold_convert (sizetype, varidx),
-				     eltsize);
-	  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, eltoff);
-	}
-      else
-	return NULL_TREE;
+
+      if (TREE_CODE (chartype) != INTEGER_TYPE)
+	return NULL;
+
+      /* Set the non-constant offset to the non-constant index scaled
+	 by the size of the character type.  */
+      offset = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
+			    fold_convert (sizetype, varidx),
+			    TYPE_SIZE_UNIT (chartype));
     }
 
   if (TREE_CODE (array) == STRING_CST)
@@ -11371,11 +11381,6 @@ string_constant (tree arg, tree *ptr_offset)
     return NULL_TREE;
   if (TREE_CODE (init) == CONSTRUCTOR)
     {
-      if (TREE_CODE (arg) != ARRAY_REF
-	  && TREE_CODE (arg) == COMPONENT_REF
-	  && TREE_CODE (arg) == MEM_REF)
-	return NULL_TREE;
-
       /* Convert the 64-bit constant offset to a wider type to avoid
 	 overflow.  */
       offset_int wioff;
@@ -11391,11 +11396,15 @@ string_constant (tree arg, tree *ptr_offset)
       init = fold_ctor_reference (NULL_TREE, init, base_off, 0, array,
 				  &fieldoff);
       HOST_WIDE_INT cstoff;
-      if (init && base_off.is_constant (&cstoff))
-	{
-	  cstoff = (cstoff - fieldoff) / BITS_PER_UNIT;
-	  offset = build_int_cst (sizetype, cstoff);
-	}
+      if (!base_off.is_constant (&cstoff))
+	return NULL_TREE;
+
+      cstoff = (cstoff - fieldoff) / BITS_PER_UNIT;
+      tree off = build_int_cst (sizetype, cstoff);
+      if (varidx)
+	offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, off);
+      else
+	offset = off;
     }
 
   if (!init || TREE_CODE (init) != STRING_CST)
@@ -11413,8 +11422,11 @@ string_constant (tree arg, tree *ptr_offset)
      const char a[4] = "abc\000\000";
      The excess elements contribute to TREE_STRING_LENGTH()
      but not to strlen().  */
-  unsigned HOST_WIDE_INT length
-    = strnlen (TREE_STRING_POINTER (init), TREE_STRING_LENGTH (init));
+  unsigned HOST_WIDE_INT charsize
+    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (init))));
+  unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
+  length = string_length (TREE_STRING_POINTER (init), charsize,
+			  length / charsize);
   if (compare_tree_int (array_size, length + 1) < 0)
     return NULL_TREE;
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/strlen-2.c b/gcc/testsuite/gcc.c-torture/execute/strlen-2.c
new file mode 100644
index 0000000..4519f6a
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/strlen-2.c
@@ -0,0 +1,210 @@
+/* PR tree-optimization/86532 - Wrong code due to a wrong strlen folding  */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+static const char a[2][3] = { "1", "12" };
+static const char b[2][2][5] = { { "1", "12" }, { "123", "1234" } };
+
+volatile int v0 = 0;
+volatile int v1 = 1;
+volatile int v2 = 2;
+
+#define A(expr)								\
+  ((expr) ? (void)0 : (__builtin_printf ("assertion on line %i: %s\n",	\
+					 __LINE__, #expr),		\
+		       __builtin_abort ()))
+
+void test_array_ref_2_3 (void)
+{
+  A (strlen (a[v0]) == 1);
+  A (strlen (&a[v0][v0]) == 1);
+  A (strlen (&a[0][v0]) == 1);
+  A (strlen (&a[v0][0]) == 1);
+
+  A (strlen (a[v1]) == 2);
+  A (strlen (&a[v1][0]) == 2);
+  A (strlen (&a[1][v0]) == 2);
+  A (strlen (&a[v1][v0]) == 2);
+
+  A (strlen (&a[v1][1]) == 1);
+  A (strlen (&a[v1][1]) == 1);
+
+  A (strlen (&a[v1][2]) == 0);
+  A (strlen (&a[v1][v2]) == 0);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (a[v0]) == 1);
+  A (strlen (&a[v0][v0]) == 1);
+  A (strlen (&a[i0][v0]) == 1);
+  A (strlen (&a[v0][i0]) == 1);
+
+  A (strlen (a[v1]) == 2);
+  A (strlen (&a[v1][i0]) == 2);
+  A (strlen (&a[i1][v0]) == 2);
+  A (strlen (&a[v1][v0]) == 2);
+
+  A (strlen (&a[v1][i1]) == 1);
+  A (strlen (&a[v1][i1]) == 1);
+
+  A (strlen (&a[v1][i2]) == 0);
+  A (strlen (&a[v1][v2]) == 0);
+}
+
+void test_array_off_2_3 (void)
+{
+  A (strlen (a[0] + 0) == 1);
+  A (strlen (a[0] + v0) == 1);
+  A (strlen (a[v0] + 0) == 1);
+  A (strlen (a[v0] + v0) == 1);
+
+  A (strlen (a[v1] + 0) == 2);
+  A (strlen (a[1] + v0) == 2);
+  A (strlen (a[v1] + 0) == 2);
+  A (strlen (a[v1] + v0) == 2);
+
+  A (strlen (a[v1] + 1) == 1);
+  A (strlen (a[v1] + v1) == 1);
+
+  A (strlen (a[v1] + 2) == 0);
+  A (strlen (a[v1] + v2) == 0);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (a[i0] + i0) == 1);
+  A (strlen (a[i0] + v0) == 1);
+  A (strlen (a[v0] + i0) == 1);
+  A (strlen (a[v0] + v0) == 1);
+
+  A (strlen (a[v1] + i0) == 2);
+  A (strlen (a[i1] + v0) == 2);
+  A (strlen (a[v1] + i0) == 2);
+  A (strlen (a[v1] + v0) == 2);
+
+  A (strlen (a[v1] + i1) == 1);
+  A (strlen (a[v1] + v1) == 1);
+
+  A (strlen (a[v1] + i2) == 0);
+  A (strlen (a[v1] + v2) == 0);
+}
+
+void test_array_ref_2_2_5 (void)
+{
+  A (strlen (b[0][v0]) == 1);
+  A (strlen (b[v0][0]) == 1);
+
+  A (strlen (&b[0][0][v0]) == 1);
+  A (strlen (&b[0][v0][0]) == 1);
+  A (strlen (&b[v0][0][0]) == 1);
+
+  A (strlen (&b[0][v0][v0]) == 1);
+  A (strlen (&b[v0][0][v0]) == 1);
+  A (strlen (&b[v0][v0][0]) == 1);
+
+  A (strlen (b[0][v1]) == 2);
+  A (strlen (b[v1][0]) == 3);
+
+  A (strlen (&b[0][0][v1]) == 0);
+  A (strlen (&b[0][v1][0]) == 2);
+  A (strlen (&b[v0][0][0]) == 1);
+
+  A (strlen (&b[0][v0][v0]) == 1);
+  A (strlen (&b[v0][0][v0]) == 1);
+  A (strlen (&b[v0][v0][0]) == 1);
+
+  A (strlen (&b[0][v1][v1]) == 1);
+  A (strlen (&b[v1][0][v1]) == 2);
+  A (strlen (&b[v1][v1][0]) == 4);
+  A (strlen (&b[v1][v1][1]) == 3);
+  A (strlen (&b[v1][v1][2]) == 2);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (b[i0][v0]) == 1);
+  A (strlen (b[v0][i0]) == 1);
+
+  A (strlen (&b[i0][i0][v0]) == 1);
+  A (strlen (&b[i0][v0][i0]) == 1);
+  A (strlen (&b[v0][i0][i0]) == 1);
+
+  A (strlen (&b[i0][v0][v0]) == 1);
+  A (strlen (&b[v0][i0][v0]) == 1);
+  A (strlen (&b[v0][v0][i0]) == 1);
+
+  A (strlen (b[i0][v1]) == 2);
+  A (strlen (b[v1][i0]) == 3);
+
+  A (strlen (&b[i0][i0][v1]) == 0);
+  A (strlen (&b[i0][v1][i0]) == 2);
+  A (strlen (&b[v0][i0][i0]) == 1);
+
+  A (strlen (&b[i0][v0][v0]) == 1);
+  A (strlen (&b[v0][i0][v0]) == 1);
+  A (strlen (&b[v0][v0][i0]) == 1);
+
+  A (strlen (&b[i0][v1][v1]) == 1);
+  A (strlen (&b[v1][i0][v1]) == 2);
+  A (strlen (&b[v1][v1][i0]) == 4);
+  A (strlen (&b[v1][v1][i1]) == 3);
+  A (strlen (&b[v1][v1][i2]) == 2);
+}
+
+void test_array_off_2_2_5 (void)
+{
+  A (strlen (b[0][0] + v0) == 1);
+  A (strlen (b[0][v0] + v0) == 1);
+  A (strlen (b[v0][0] + v0) == 1);
+  A (strlen (b[v0][v0] + v0) == 1);
+
+  A (strlen (b[0][0] + v1) == 0);
+  A (strlen (b[0][v1] + 0) == 2);
+  A (strlen (b[v0][0] + 0) == 1);
+
+  A (strlen (b[0][v0] + v0) == 1);
+  A (strlen (b[v0][0] + v0) == 1);
+  A (strlen (b[v0][v0] + 0) == 1);
+
+  A (strlen (b[0][v1] + v1) == 1);
+  A (strlen (b[v1][0] + v1) == 2);
+  A (strlen (b[v1][v1] + 0) == 4);
+  A (strlen (b[v1][v1] + 1) == 3);
+  A (strlen (b[v1][v1] + 2) == 2);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (b[i0][i0] + v0) == 1);
+  A (strlen (b[i0][v0] + v0) == 1);
+  A (strlen (b[v0][i0] + v0) == 1);
+  A (strlen (b[v0][v0] + v0) == 1);
+
+  A (strlen (b[i0][i0] + v1) == 0);
+  A (strlen (b[i0][v1] + i0) == 2);
+  A (strlen (b[v0][i0] + i0) == 1);
+
+  A (strlen (b[i0][v0] + v0) == 1);
+  A (strlen (b[v0][i0] + v0) == 1);
+  A (strlen (b[v0][v0] + i0) == 1);
+
+  A (strlen (b[i0][v1] + v1) == 1);
+  A (strlen (b[v1][i0] + v1) == 2);
+  A (strlen (b[v1][v1] + i0) == 4);
+  A (strlen (b[v1][v1] + i1) == 3);
+  A (strlen (b[v1][v1] + i2) == 2);
+}
+
+int main ()
+{
+  test_array_ref_2_3 ();
+  test_array_off_2_3 ();
+
+  test_array_ref_2_2_5 ();
+  test_array_off_2_2_5 ();
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/strlen-3.c b/gcc/testsuite/gcc.c-torture/execute/strlen-3.c
new file mode 100644
index 0000000..2696948
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/strlen-3.c
@@ -0,0 +1,133 @@
+/* PR tree-optimization/86532 - Wrong code due to a wrong strlen folding
+   starting with r262522
+   Exercise strlen() with a multi-dimensional array of strings with
+   embedded nuls.  */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+static const char a[2][3][9] = {
+  { "1", "1\0002" },
+  { "12\0003", "123\0004" }
+};
+
+volatile int v0 = 0;
+volatile int v1 = 1;
+volatile int v2 = 2;
+volatile int v3 = 3;
+volatile int v4 = 4;
+volatile int v5 = 5;
+volatile int v6 = 6;
+volatile int v7 = 7;
+
+#define A(expr)								\
+  ((expr) ? (void)0 : (__builtin_printf ("assertion on line %i: %s\n",	\
+					 __LINE__, #expr),		\
+		       __builtin_abort ()))
+
+void test_array_ref (void)
+{
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+  int i3 = i2 + 1;
+  int i4 = i3 + 1;
+  int i5 = i4 + 1;
+  int i6 = i5 + 1;
+  int i7 = i6 + 1;
+
+  A (strlen (a[0][0]) == 1);
+  A (strlen (a[0][1]) == 1);
+
+  A (strlen (a[1][0]) == 2);
+  A (strlen (a[1][1]) == 3);
+
+  A (strlen (&a[0][0][0]) == 1);
+  A (strlen (&a[0][1][0]) == 1);
+
+  A (strlen (&a[1][0][0]) == 2);
+  A (strlen (&a[1][1][0]) == 3);
+
+  A (strlen (&a[0][0][0] + 1) == 0);
+  A (strlen (&a[0][1][0] + 1) == 0);
+  A (strlen (&a[0][1][0] + 2) == 1);
+  A (strlen (&a[0][1][0] + 3) == 0);
+  A (strlen (&a[0][1][0] + 7) == 0);
+
+  A (strlen (&a[1][0][0] + 1) == 1);
+  A (strlen (&a[1][1][0] + 1) == 2);
+  A (strlen (&a[1][1][0] + 2) == 1);
+  A (strlen (&a[1][1][0] + 7) == 0);
+
+
+  A (strlen (a[i0][i0]) == 1);
+  A (strlen (a[i0][i1]) == 1);
+
+  A (strlen (a[i1][i0]) == 2);
+  A (strlen (a[i1][i1]) == 3);
+
+  A (strlen (&a[i0][i0][i0]) == 1);
+  A (strlen (&a[i0][i1][i0]) == 1);
+  A (strlen (&a[i0][i1][i1]) == 0);
+  A (strlen (&a[i0][i1][i2]) == 1);
+  A (strlen (&a[i0][i1][i3]) == 0);
+  A (strlen (&a[i0][i1][i3]) == 0);
+
+  A (strlen (&a[i1][i0][i0]) == 2);
+  A (strlen (&a[i1][i1][i0]) == 3);
+  A (strlen (&a[i1][i1][i1]) == 2);
+  A (strlen (&a[i1][i1][i2]) == 1);
+  A (strlen (&a[i1][i1][i3]) == 0);
+  A (strlen (&a[i1][i1][i4]) == 1);
+  A (strlen (&a[i1][i1][i5]) == 0);
+  A (strlen (&a[i1][i1][i6]) == 0);
+  A (strlen (&a[i1][i1][i7]) == 0);
+
+  A (strlen (&a[i0][i0][i0] + i1) == 0);
+  A (strlen (&a[i0][i1][i0] + i1) == 0);
+  A (strlen (&a[i0][i1][i0] + i7) == 0);
+
+  A (strlen (&a[i1][i0][i0] + i1) == 1);
+  A (strlen (&a[i1][i1][i0] + i1) == 2);
+  A (strlen (&a[i1][i1][i0] + i2) == 1);
+  A (strlen (&a[i1][i1][i0] + i3) == 0);
+  A (strlen (&a[i1][i1][i0] + i4) == 1);
+  A (strlen (&a[i1][i1][i0] + i5) == 0);
+  A (strlen (&a[i1][i1][i0] + i6) == 0);
+  A (strlen (&a[i1][i1][i0] + i7) == 0);
+
+
+  A (strlen (a[i0][i0]) == 1);
+  A (strlen (a[i0][i1]) == 1);
+
+  A (strlen (a[i1][i0]) == 2);
+  A (strlen (a[i1][i1]) == 3);
+
+  A (strlen (&a[i0][i0][i0]) == 1);
+  A (strlen (&a[i0][i1][i0]) == 1);
+
+  A (strlen (&a[i1][i0][i0]) == 2);
+  A (strlen (&a[i1][i1][i0]) == 3);
+
+  A (strlen (&a[i0][i0][i0] + v1) == 0);
+  A (strlen (&a[i0][i0][i0] + v2) == 0);
+  A (strlen (&a[i0][i0][i0] + v7) == 0);
+
+  A (strlen (&a[i0][i1][i0] + v1) == 0);
+  A (strlen (&a[i0][i1][i0] + v2) == 1);
+  A (strlen (&a[i0][i1][i0] + v3) == 0);
+
+  A (strlen (&a[i1][i0][i0] + v1) == 1);
+  A (strlen (&a[i1][i1][i0] + v1) == 2);
+  A (strlen (&a[i1][i1][i0] + v2) == 1);
+  A (strlen (&a[i1][i1][i0] + v3) == 0);
+  A (strlen (&a[i1][i1][i0] + v4) == 1);
+  A (strlen (&a[i1][i1][i0] + v5) == 0);
+  A (strlen (&a[i1][i1][i0] + v6) == 0);
+  A (strlen (&a[i1][i1][i0] + v7) == 0);
+}
+
+
+int main (void)
+{
+  test_array_ref ();
+}
Richard Biener July 20, 2018, 10:20 a.m. UTC | #10
On Thu, 19 Jul 2018, Martin Sebor wrote:

> Here's one more update with tweaks addressing a couple more
> of Bernd's comments:
> 
> 1) correct the use of TREE_STRING_LENGTH() where a number of
> array elements is expected and not bytes
> 2) set CHARTYPE as soon as it's first determined rather than
> trying to extract it again later

Please look at Bernds followup comments.  One additional note:

I see you are ultimatively using CHARTYPE to get at the size
of the access.  That is wrong.

   if (TREE_CODE (arg) == ADDR_EXPR)
     {
+      tree argtype = TREE_TYPE (arg);
+      chartype = argtype;
+
       arg = TREE_OPERAND (arg, 0);
       tree ref = arg;
       if (TREE_CODE (arg) == ARRAY_REF)
        {

so the "access" is of size array_ref_element_size (arg) here.  You
may not simply use TYPE_SIZE_UNIT of sth.

That is, lookign at current trunk,

  if (TREE_CODE (arg) == ADDR_EXPR)
    {
      arg = TREE_OPERAND (arg, 0);
      tree ref = arg;
      if (TREE_CODE (arg) == ARRAY_REF)
        {
          tree idx = TREE_OPERAND (arg, 1);
          if (TREE_CODE (idx) != INTEGER_CST)
            {
              /* Extract the variable index to prevent
                 get_addr_base_and_unit_offset() from failing due to
                 it.  Use it later to compute the non-constant offset
                 into the string and return it to the caller.  */
              varidx = idx;
              ref = TREE_OPERAND (arg, 0);

you should scale the index here by array_ref_element_size (arg).
Or simply rewrite this to instead of using get_addr_base_and_unit_offset,
use get_inner_reference which does all that magic for you.

That is, you shouldn't need chartype.

Richard.


> On 07/19/2018 01:49 PM, Martin Sebor wrote:
> > On 07/19/2018 01:17 AM, Richard Biener wrote:
> > > On Wed, 18 Jul 2018, Martin Sebor wrote:
> > > 
> > > > > > > +      while (TREE_CODE (chartype) != INTEGER_TYPE)
> > > > > > > +    chartype = TREE_TYPE (chartype);
> > > > > > This is a bit concerning.  First under what conditions is chartype
> > > > > > not
> > > > > > going to be an INTEGER_TYPE?  And under what conditions will
> > > > > > extracting
> > > > > > its type ultimately lead to something that is an INTEGER_TYPE?
> > > > > 
> > > > > chartype is usually (maybe even always) pointer type here:
> > > > > 
> > > > >   const char a[] = "123";
> > > > >   extern int i;
> > > > >   n = strlen (&a[i]);
> > > > 
> > > > But your hunch was correct that the loop isn't safe because
> > > > the element type need not be an integer (I didn't know/forgot
> > > > that the function is called for non-strings too).  The loop
> > > > should be replaced by:
> > > > 
> > > >       while (TREE_CODE (chartype) == ARRAY_TYPE
> > > >          || TREE_CODE (chartype) == POINTER_TYPE)
> > > >     chartype = TREE_TYPE (chartype);
> > > 
> > > As this function may be called "late" you need to cope with
> > > the middle-end ignoring type changes and thus happily
> > > passing int *** directly rather than (char *) of that.
> > > 
> > > Also doesn't the above yield int for int *[]?
> > 
> > I don't think it ever gets this far for either a pointer to
> > an array of int, or for an array of pointers to int.  So for
> > something like the following the function fails earlier:
> > 
> >   const int* const a[2] = { ... };
> >   const char* (const *p)[2] = &a;
> > 
> >   int f (void)
> >   {
> >     return __builtin_memcmp (*p, "12345678", 8);
> >   }
> > 
> > (Assuming this is what you were asking about.)
> > 
> > > I guess you really want
> > > 
> > >    if (POINTER_TYPE_P (chartype))
> > >      chartype = TREE_TYPE (chartype);
> > >    while (TREE_CODE (chartype) == ARRAY_TYPE)
> > >      chartype = TREE_TYPE (chartype);
> > > 
> > > ?
> > 
> > That seems to work too.  Attached is an update with this tweak.
> > The update also addresses some of Bernd's comments: it removes
> > the pointless second test in:
> > 
> >     if (TREE_CODE (type) == ARRAY_TYPE
> >         && TREE_CODE (type) != INTEGER_TYPE)
> > 
> > the unused assignment to chartype in:
> > 
> >    else if (DECL_P (arg))
> >      {
> >        array = arg;
> >        chartype = TREE_TYPE (arg);
> >      }
> > 
> > and calls string_constant() instead of strnlen() to compute
> > the length of a generic string.
> > 
> > Other improvements  are possible in this area but they are
> > orthogonal to the bug I'm trying to fix so I'll post separate
> > patches for some of those.
> > 
> > Martin
> 
>
Martin Sebor July 24, 2018, 8:16 p.m. UTC | #11
On 07/20/2018 04:20 AM, Richard Biener wrote:
> On Thu, 19 Jul 2018, Martin Sebor wrote:
>
>> Here's one more update with tweaks addressing a couple more
>> of Bernd's comments:
>>
>> 1) correct the use of TREE_STRING_LENGTH() where a number of
>> array elements is expected and not bytes
>> 2) set CHARTYPE as soon as it's first determined rather than
>> trying to extract it again later
>
> Please look at Bernds followup comments.  One additional note:
>
> I see you are ultimatively using CHARTYPE to get at the size
> of the access.  That is wrong.
>
>    if (TREE_CODE (arg) == ADDR_EXPR)
>      {
> +      tree argtype = TREE_TYPE (arg);
> +      chartype = argtype;
> +
>        arg = TREE_OPERAND (arg, 0);
>        tree ref = arg;
>        if (TREE_CODE (arg) == ARRAY_REF)
>         {
>
> so the "access" is of size array_ref_element_size (arg) here.  You
> may not simply use TYPE_SIZE_UNIT of sth.
>
> That is, lookign at current trunk,
>
>   if (TREE_CODE (arg) == ADDR_EXPR)
>     {
>       arg = TREE_OPERAND (arg, 0);
>       tree ref = arg;
>       if (TREE_CODE (arg) == ARRAY_REF)
>         {
>           tree idx = TREE_OPERAND (arg, 1);
>           if (TREE_CODE (idx) != INTEGER_CST)
>             {
>               /* Extract the variable index to prevent
>                  get_addr_base_and_unit_offset() from failing due to
>                  it.  Use it later to compute the non-constant offset
>                  into the string and return it to the caller.  */
>               varidx = idx;
>               ref = TREE_OPERAND (arg, 0);
>
> you should scale the index here by array_ref_element_size (arg).
> Or simply rewrite this to instead of using get_addr_base_and_unit_offset,
> use get_inner_reference which does all that magic for you.
>
> That is, you shouldn't need chartype.

I've made use of size array_ref_element_size() here as you suggest
and eliminated the type.  For the purposes of testing though,
I haven't been able to come up with a test case that would have
the function return something other than TYPE_SIZE_UNIT().  IIUC,
the function is used to compute the size of elements of overaligned
types and there is no way that I know of to create an array of
overaligned characters.  (If there is a way to exercise this I'd
appreciate a test case so I can add it to the test suite).

I've also fixed the other bug Bernd pointed with pointers to arrays.
The fix seems small enough that it makes sense to handle at the same
time as this bug.

Attached is an update with these changes.

Martin

>
> Richard.
>
>
>> On 07/19/2018 01:49 PM, Martin Sebor wrote:
>>> On 07/19/2018 01:17 AM, Richard Biener wrote:
>>>> On Wed, 18 Jul 2018, Martin Sebor wrote:
>>>>
>>>>>>>> +      while (TREE_CODE (chartype) != INTEGER_TYPE)
>>>>>>>> +    chartype = TREE_TYPE (chartype);
>>>>>>> This is a bit concerning.  First under what conditions is chartype
>>>>>>> not
>>>>>>> going to be an INTEGER_TYPE?  And under what conditions will
>>>>>>> extracting
>>>>>>> its type ultimately lead to something that is an INTEGER_TYPE?
>>>>>>
>>>>>> chartype is usually (maybe even always) pointer type here:
>>>>>>
>>>>>>   const char a[] = "123";
>>>>>>   extern int i;
>>>>>>   n = strlen (&a[i]);
>>>>>
>>>>> But your hunch was correct that the loop isn't safe because
>>>>> the element type need not be an integer (I didn't know/forgot
>>>>> that the function is called for non-strings too).  The loop
>>>>> should be replaced by:
>>>>>
>>>>>       while (TREE_CODE (chartype) == ARRAY_TYPE
>>>>>          || TREE_CODE (chartype) == POINTER_TYPE)
>>>>>     chartype = TREE_TYPE (chartype);
>>>>
>>>> As this function may be called "late" you need to cope with
>>>> the middle-end ignoring type changes and thus happily
>>>> passing int *** directly rather than (char *) of that.
>>>>
>>>> Also doesn't the above yield int for int *[]?
>>>
>>> I don't think it ever gets this far for either a pointer to
>>> an array of int, or for an array of pointers to int.  So for
>>> something like the following the function fails earlier:
>>>
>>>   const int* const a[2] = { ... };
>>>   const char* (const *p)[2] = &a;
>>>
>>>   int f (void)
>>>   {
>>>     return __builtin_memcmp (*p, "12345678", 8);
>>>   }
>>>
>>> (Assuming this is what you were asking about.)
>>>
>>>> I guess you really want
>>>>
>>>>    if (POINTER_TYPE_P (chartype))
>>>>      chartype = TREE_TYPE (chartype);
>>>>    while (TREE_CODE (chartype) == ARRAY_TYPE)
>>>>      chartype = TREE_TYPE (chartype);
>>>>
>>>> ?
>>>
>>> That seems to work too.  Attached is an update with this tweak.
>>> The update also addresses some of Bernd's comments: it removes
>>> the pointless second test in:
>>>
>>>     if (TREE_CODE (type) == ARRAY_TYPE
>>>         && TREE_CODE (type) != INTEGER_TYPE)
>>>
>>> the unused assignment to chartype in:
>>>
>>>    else if (DECL_P (arg))
>>>      {
>>>        array = arg;
>>>        chartype = TREE_TYPE (arg);
>>>      }
>>>
>>> and calls string_constant() instead of strnlen() to compute
>>> the length of a generic string.
>>>
>>> Other improvements  are possible in this area but they are
>>> orthogonal to the bug I'm trying to fix so I'll post separate
>>> patches for some of those.
>>>
>>> Martin
>>
>>
>
PR tree-optimization/86622 - incorrect strlen of array of array plus variable offset
PR tree-optimization/86532 - Wrong code due to a wrong strlen folding starting with r262522

gcc/ChangeLog:

	PR tree-optimization/86622
	PR tree-optimization/86532
	* builtins.h (string_length): Declare.
	* builtins.c (c_strlen): Correct handling of non-constant offsets.	
	(check_access): Be prepared for non-constant length ranges.
	(string_length): Make extern.
	* expr.c (string_constant): Only handle the minor non-constant
	array index.  Use string_constant to compute the length of
	a generic string constant.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86622
	PR tree-optimization/86532
	* gcc.c-torture/execute/strlen-2.c: New test.
	* gcc.c-torture/execute/strlen-3.c: New test.
	* gcc.c-torture/execute/strlen-4.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 539a6d1..aa3e0d8 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -517,11 +517,11 @@ get_pointer_alignment (tree exp)
   return align;
 }
 
-/* Return the number of non-zero elements in the sequence
+/* Return the number of leading non-zero elements in the sequence
    [ PTR, PTR + MAXELTS ) where each element's size is ELTSIZE bytes.
    ELTSIZE must be a power of 2 less than 8.  Used by c_strlen.  */
 
-static unsigned
+unsigned
 string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
 {
   gcc_checking_assert (eltsize == 1 || eltsize == 2 || eltsize == 4);
@@ -605,14 +605,21 @@ c_strlen (tree src, int only_value)
 
   /* Set MAXELTS to sizeof (SRC) / sizeof (*SRC) - 1, the maximum possible
      length of SRC.  Prefer TYPE_SIZE() to TREE_STRING_LENGTH() if possible
-     in case the latter is less than the size of the array.  */
-  HOST_WIDE_INT maxelts = TREE_STRING_LENGTH (src);
+     in case the latter is less than the size of the array, such as when
+     SRC refers to a short string literal used to initialize a large array.
+     In that case, the elements of the array after the terminating NUL are
+     all NUL.  */
+  HOST_WIDE_INT strelts = TREE_STRING_LENGTH (src);
+  strelts = strelts / eltsize - 1;
+
+  HOST_WIDE_INT maxelts = strelts;
   tree type = TREE_TYPE (src);
   if (tree size = TYPE_SIZE_UNIT (type))
     if (tree_fits_shwi_p (size))
-      maxelts = tree_to_uhwi (size);
-
-  maxelts = maxelts / eltsize - 1;
+      {
+	maxelts = tree_to_uhwi (size);
+	maxelts = maxelts / eltsize - 1;
+      }
 
   /* PTR can point to the byte representation of any string type, including
      char* and wchar_t*.  */
@@ -620,10 +627,12 @@ c_strlen (tree src, int only_value)
 
   if (byteoff && TREE_CODE (byteoff) != INTEGER_CST)
     {
-      /* If the string has an internal zero byte (e.g., "foo\0bar"), we can't
-	 compute the offset to the following null if we don't know where to
+      /* If the string has an internal NUL character followed by any
+	 non-NUL characters (e.g., "foo\0bar"), we can't compute
+	 the offset to the following NUL if we don't know where to
 	 start searching for it.  */
-      if (string_length (ptr, eltsize, maxelts) < maxelts)
+      unsigned len = string_length (ptr, eltsize, strelts);
+      if (len < strelts)
 	{
 	  /* Return when an embedded null character is found.  */
 	  return NULL_TREE;
@@ -633,12 +642,17 @@ c_strlen (tree src, int only_value)
 	return ssize_int (0);
 
       /* We don't know the starting offset, but we do know that the string
-	 has no internal zero bytes.  We can assume that the offset falls
-	 within the bounds of the string; otherwise, the programmer deserves
-	 what he gets.  Subtract the offset from the length of the string,
-	 and return that.  This would perhaps not be valid if we were dealing
-	 with named arrays in addition to literal string constants.  */
-      return size_diffop_loc (loc, size_int (maxelts * eltsize), byteoff);
+	 has no internal zero bytes.  If the offset falls within the bounds
+	 of the string subtract the offset from the length of the string,
+	 and return that.  Otherwise the length is zero.  Take care to
+	 use SAVE_EXPR in case the OFFSET has side-effects.  */
+      tree offsave = TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) : byteoff;
+      offsave = fold_convert (ssizetype, offsave);
+      tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
+				      build_int_cst (ssizetype, len * eltsize));
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+      return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
+			      build_zero_cst (ssizetype));
     }
 
   /* Offset from the beginning of the string in elements.  */
@@ -3192,15 +3206,13 @@ check_access (tree exp, tree, tree, tree dstwrite,
   if (dstwrite)
     get_size_range (dstwrite, range);
 
-  /* This can happen at -O0.  */
-  if (range[0] && TREE_CODE (range[0]) != INTEGER_CST)
-    return false;
-
   tree func = get_callee_fndecl (exp);
 
   /* First check the number of bytes to be written against the maximum
      object size.  */
-  if (range[0] && tree_int_cst_lt (maxobjsize, range[0]))
+  if (range[0]
+      && TREE_CODE (range[0]) == INTEGER_CST
+      && tree_int_cst_lt (maxobjsize, range[0]))
     {
       if (TREE_NO_WARNING (exp))
 	return false;
@@ -3235,9 +3247,11 @@ check_access (tree exp, tree, tree, tree dstwrite,
   if (range[0] || !exactwrite || integer_all_onesp (dstwrite))
     {
       if (range[0]
+	  && TREE_CODE (range[0]) == INTEGER_CST
 	  && ((tree_fits_uhwi_p (dstsize)
 	       && tree_int_cst_lt (dstsize, range[0]))
-	      || (tree_fits_uhwi_p (dstwrite)
+	      || (dstwrite
+		  && tree_fits_uhwi_p (dstwrite)
 		  && tree_int_cst_lt (dstwrite, range[0]))))
 	{
 	  if (TREE_NO_WARNING (exp))
diff --git a/gcc/builtins.h b/gcc/builtins.h
index c922904..2e0a2f9 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -57,6 +57,7 @@ extern unsigned int get_object_alignment (tree);
 extern bool get_pointer_alignment_1 (tree, unsigned int *,
 				     unsigned HOST_WIDE_INT *);
 extern unsigned int get_pointer_alignment (tree);
+extern unsigned string_length (const void*, unsigned, unsigned);
 extern tree c_strlen (tree, int);
 extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
diff --git a/gcc/expr.c b/gcc/expr.c
index f665e18..de6709d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11294,12 +11294,15 @@ string_constant (tree arg, tree *ptr_offset)
 	  tree idx = TREE_OPERAND (arg, 1);
 	  if (TREE_CODE (idx) != INTEGER_CST)
 	    {
-	      /* Extract the variable index to prevent
-		 get_addr_base_and_unit_offset() from failing due to
-		 it.  Use it later to compute the non-constant offset
+	      /* From a pointer (but not array) argument extract the variable
+		 index to prevent get_addr_base_and_unit_offset() from failing
+		 due to it.  Use it later to compute the non-constant offset
 		 into the string and return it to the caller.  */
 	      varidx = idx;
 	      ref = TREE_OPERAND (arg, 0);
+
+	      if (TREE_CODE (TREE_TYPE (arg)) == ARRAY_TYPE)
+		return NULL_TREE;
 	    }
 	}
       array = get_addr_base_and_unit_offset (ref, &base_off);
@@ -11327,6 +11330,12 @@ string_constant (tree arg, tree *ptr_offset)
       tree offset;
       if (tree str = string_constant (arg0, &offset))
 	{
+	  /* Avoid pointers to arrays (see bug 86622).  */
+	  if (POINTER_TYPE_P (TREE_TYPE (arg))
+	      && TREE_CODE (TREE_TYPE (TREE_TYPE (arg))) == ARRAY_TYPE
+	      && TREE_CODE (TREE_OPERAND (arg0, 0)) == ARRAY_REF)
+	    return NULL_TREE;
+
 	  tree type = TREE_TYPE (arg1);
 	  *ptr_offset = fold_build2 (PLUS_EXPR, type, offset, arg1);
 	  return str;
@@ -11343,16 +11352,17 @@ string_constant (tree arg, tree *ptr_offset)
     {
       if (TREE_CODE (TREE_TYPE (array)) != ARRAY_TYPE)
 	return NULL_TREE;
-      if (tree eltsize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array))))
-	{
-	  /* Add the scaled variable index to the constant offset.  */
-	  tree eltoff = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
-				     fold_convert (sizetype, varidx),
-				     eltsize);
-	  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, eltoff);
-	}
-      else
-	return NULL_TREE;
+
+      gcc_assert (TREE_CODE (arg) == ARRAY_REF);
+      tree chartype = TREE_TYPE (TREE_TYPE (TREE_OPERAND (arg, 0)));
+      if (TREE_CODE (chartype) != INTEGER_TYPE)
+	return NULL;
+
+      tree charsize = array_ref_element_size (arg);
+      /* Set the non-constant offset to the non-constant index scaled
+	 by the size of the character type.  */
+      offset = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
+			    fold_convert (sizetype, varidx), charsize);
     }
 
   if (TREE_CODE (array) == STRING_CST)
@@ -11371,11 +11381,6 @@ string_constant (tree arg, tree *ptr_offset)
     return NULL_TREE;
   if (TREE_CODE (init) == CONSTRUCTOR)
     {
-      if (TREE_CODE (arg) != ARRAY_REF
-	  && TREE_CODE (arg) == COMPONENT_REF
-	  && TREE_CODE (arg) == MEM_REF)
-	return NULL_TREE;
-
       /* Convert the 64-bit constant offset to a wider type to avoid
 	 overflow.  */
       offset_int wioff;
@@ -11391,11 +11396,15 @@ string_constant (tree arg, tree *ptr_offset)
       init = fold_ctor_reference (NULL_TREE, init, base_off, 0, array,
 				  &fieldoff);
       HOST_WIDE_INT cstoff;
-      if (init && base_off.is_constant (&cstoff))
-	{
-	  cstoff = (cstoff - fieldoff) / BITS_PER_UNIT;
-	  offset = build_int_cst (sizetype, cstoff);
-	}
+      if (!base_off.is_constant (&cstoff))
+	return NULL_TREE;
+
+      cstoff = (cstoff - fieldoff) / BITS_PER_UNIT;
+      tree off = build_int_cst (sizetype, cstoff);
+      if (varidx)
+	offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, off);
+      else
+	offset = off;
     }
 
   if (!init || TREE_CODE (init) != STRING_CST)
@@ -11413,8 +11422,11 @@ string_constant (tree arg, tree *ptr_offset)
      const char a[4] = "abc\000\000";
      The excess elements contribute to TREE_STRING_LENGTH()
      but not to strlen().  */
-  unsigned HOST_WIDE_INT length
-    = strnlen (TREE_STRING_POINTER (init), TREE_STRING_LENGTH (init));
+  unsigned HOST_WIDE_INT charsize
+    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (init))));
+  unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
+  length = string_length (TREE_STRING_POINTER (init), charsize,
+			  length / charsize);
   if (compare_tree_int (array_size, length + 1) < 0)
     return NULL_TREE;
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/strlen-2.c b/gcc/testsuite/gcc.c-torture/execute/strlen-2.c
new file mode 100644
index 0000000..4519f6a
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/strlen-2.c
@@ -0,0 +1,210 @@
+/* PR tree-optimization/86532 - Wrong code due to a wrong strlen folding  */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+static const char a[2][3] = { "1", "12" };
+static const char b[2][2][5] = { { "1", "12" }, { "123", "1234" } };
+
+volatile int v0 = 0;
+volatile int v1 = 1;
+volatile int v2 = 2;
+
+#define A(expr)								\
+  ((expr) ? (void)0 : (__builtin_printf ("assertion on line %i: %s\n",	\
+					 __LINE__, #expr),		\
+		       __builtin_abort ()))
+
+void test_array_ref_2_3 (void)
+{
+  A (strlen (a[v0]) == 1);
+  A (strlen (&a[v0][v0]) == 1);
+  A (strlen (&a[0][v0]) == 1);
+  A (strlen (&a[v0][0]) == 1);
+
+  A (strlen (a[v1]) == 2);
+  A (strlen (&a[v1][0]) == 2);
+  A (strlen (&a[1][v0]) == 2);
+  A (strlen (&a[v1][v0]) == 2);
+
+  A (strlen (&a[v1][1]) == 1);
+  A (strlen (&a[v1][1]) == 1);
+
+  A (strlen (&a[v1][2]) == 0);
+  A (strlen (&a[v1][v2]) == 0);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (a[v0]) == 1);
+  A (strlen (&a[v0][v0]) == 1);
+  A (strlen (&a[i0][v0]) == 1);
+  A (strlen (&a[v0][i0]) == 1);
+
+  A (strlen (a[v1]) == 2);
+  A (strlen (&a[v1][i0]) == 2);
+  A (strlen (&a[i1][v0]) == 2);
+  A (strlen (&a[v1][v0]) == 2);
+
+  A (strlen (&a[v1][i1]) == 1);
+  A (strlen (&a[v1][i1]) == 1);
+
+  A (strlen (&a[v1][i2]) == 0);
+  A (strlen (&a[v1][v2]) == 0);
+}
+
+void test_array_off_2_3 (void)
+{
+  A (strlen (a[0] + 0) == 1);
+  A (strlen (a[0] + v0) == 1);
+  A (strlen (a[v0] + 0) == 1);
+  A (strlen (a[v0] + v0) == 1);
+
+  A (strlen (a[v1] + 0) == 2);
+  A (strlen (a[1] + v0) == 2);
+  A (strlen (a[v1] + 0) == 2);
+  A (strlen (a[v1] + v0) == 2);
+
+  A (strlen (a[v1] + 1) == 1);
+  A (strlen (a[v1] + v1) == 1);
+
+  A (strlen (a[v1] + 2) == 0);
+  A (strlen (a[v1] + v2) == 0);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (a[i0] + i0) == 1);
+  A (strlen (a[i0] + v0) == 1);
+  A (strlen (a[v0] + i0) == 1);
+  A (strlen (a[v0] + v0) == 1);
+
+  A (strlen (a[v1] + i0) == 2);
+  A (strlen (a[i1] + v0) == 2);
+  A (strlen (a[v1] + i0) == 2);
+  A (strlen (a[v1] + v0) == 2);
+
+  A (strlen (a[v1] + i1) == 1);
+  A (strlen (a[v1] + v1) == 1);
+
+  A (strlen (a[v1] + i2) == 0);
+  A (strlen (a[v1] + v2) == 0);
+}
+
+void test_array_ref_2_2_5 (void)
+{
+  A (strlen (b[0][v0]) == 1);
+  A (strlen (b[v0][0]) == 1);
+
+  A (strlen (&b[0][0][v0]) == 1);
+  A (strlen (&b[0][v0][0]) == 1);
+  A (strlen (&b[v0][0][0]) == 1);
+
+  A (strlen (&b[0][v0][v0]) == 1);
+  A (strlen (&b[v0][0][v0]) == 1);
+  A (strlen (&b[v0][v0][0]) == 1);
+
+  A (strlen (b[0][v1]) == 2);
+  A (strlen (b[v1][0]) == 3);
+
+  A (strlen (&b[0][0][v1]) == 0);
+  A (strlen (&b[0][v1][0]) == 2);
+  A (strlen (&b[v0][0][0]) == 1);
+
+  A (strlen (&b[0][v0][v0]) == 1);
+  A (strlen (&b[v0][0][v0]) == 1);
+  A (strlen (&b[v0][v0][0]) == 1);
+
+  A (strlen (&b[0][v1][v1]) == 1);
+  A (strlen (&b[v1][0][v1]) == 2);
+  A (strlen (&b[v1][v1][0]) == 4);
+  A (strlen (&b[v1][v1][1]) == 3);
+  A (strlen (&b[v1][v1][2]) == 2);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (b[i0][v0]) == 1);
+  A (strlen (b[v0][i0]) == 1);
+
+  A (strlen (&b[i0][i0][v0]) == 1);
+  A (strlen (&b[i0][v0][i0]) == 1);
+  A (strlen (&b[v0][i0][i0]) == 1);
+
+  A (strlen (&b[i0][v0][v0]) == 1);
+  A (strlen (&b[v0][i0][v0]) == 1);
+  A (strlen (&b[v0][v0][i0]) == 1);
+
+  A (strlen (b[i0][v1]) == 2);
+  A (strlen (b[v1][i0]) == 3);
+
+  A (strlen (&b[i0][i0][v1]) == 0);
+  A (strlen (&b[i0][v1][i0]) == 2);
+  A (strlen (&b[v0][i0][i0]) == 1);
+
+  A (strlen (&b[i0][v0][v0]) == 1);
+  A (strlen (&b[v0][i0][v0]) == 1);
+  A (strlen (&b[v0][v0][i0]) == 1);
+
+  A (strlen (&b[i0][v1][v1]) == 1);
+  A (strlen (&b[v1][i0][v1]) == 2);
+  A (strlen (&b[v1][v1][i0]) == 4);
+  A (strlen (&b[v1][v1][i1]) == 3);
+  A (strlen (&b[v1][v1][i2]) == 2);
+}
+
+void test_array_off_2_2_5 (void)
+{
+  A (strlen (b[0][0] + v0) == 1);
+  A (strlen (b[0][v0] + v0) == 1);
+  A (strlen (b[v0][0] + v0) == 1);
+  A (strlen (b[v0][v0] + v0) == 1);
+
+  A (strlen (b[0][0] + v1) == 0);
+  A (strlen (b[0][v1] + 0) == 2);
+  A (strlen (b[v0][0] + 0) == 1);
+
+  A (strlen (b[0][v0] + v0) == 1);
+  A (strlen (b[v0][0] + v0) == 1);
+  A (strlen (b[v0][v0] + 0) == 1);
+
+  A (strlen (b[0][v1] + v1) == 1);
+  A (strlen (b[v1][0] + v1) == 2);
+  A (strlen (b[v1][v1] + 0) == 4);
+  A (strlen (b[v1][v1] + 1) == 3);
+  A (strlen (b[v1][v1] + 2) == 2);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (b[i0][i0] + v0) == 1);
+  A (strlen (b[i0][v0] + v0) == 1);
+  A (strlen (b[v0][i0] + v0) == 1);
+  A (strlen (b[v0][v0] + v0) == 1);
+
+  A (strlen (b[i0][i0] + v1) == 0);
+  A (strlen (b[i0][v1] + i0) == 2);
+  A (strlen (b[v0][i0] + i0) == 1);
+
+  A (strlen (b[i0][v0] + v0) == 1);
+  A (strlen (b[v0][i0] + v0) == 1);
+  A (strlen (b[v0][v0] + i0) == 1);
+
+  A (strlen (b[i0][v1] + v1) == 1);
+  A (strlen (b[v1][i0] + v1) == 2);
+  A (strlen (b[v1][v1] + i0) == 4);
+  A (strlen (b[v1][v1] + i1) == 3);
+  A (strlen (b[v1][v1] + i2) == 2);
+}
+
+int main ()
+{
+  test_array_ref_2_3 ();
+  test_array_off_2_3 ();
+
+  test_array_ref_2_2_5 ();
+  test_array_off_2_2_5 ();
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/strlen-3.c b/gcc/testsuite/gcc.c-torture/execute/strlen-3.c
new file mode 100644
index 0000000..182afdd
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/strlen-3.c
@@ -0,0 +1,132 @@
+/* PR tree-optimization/86532 - Wrong code due to a wrong strlen folding
+   starting with r262522
+   Exercise strlen() with a multi-dimensional array of strings with
+   embedded nuls.  */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+static const char a[2][3][9] = {
+  { "1", "1\0002" },
+  { "12\0003", "123\0004" }
+};
+
+volatile int v0 = 0;
+volatile int v1 = 1;
+volatile int v2 = 2;
+volatile int v3 = 3;
+volatile int v4 = 4;
+volatile int v5 = 5;
+volatile int v6 = 6;
+volatile int v7 = 7;
+
+#define A(expr)								\
+  ((expr) ? (void)0 : (__builtin_printf ("assertion on line %i: %s\n",	\
+					 __LINE__, #expr),		\
+		       __builtin_abort ()))
+
+void test_array_ref (void)
+{
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+  int i3 = i2 + 1;
+  int i4 = i3 + 1;
+  int i5 = i4 + 1;
+  int i6 = i5 + 1;
+  int i7 = i6 + 1;
+
+  A (strlen (a[0][0]) == 1);
+  A (strlen (a[0][1]) == 1);
+
+  A (strlen (a[1][0]) == 2);
+  A (strlen (a[1][1]) == 3);
+
+  A (strlen (&a[0][0][0]) == 1);
+  A (strlen (&a[0][1][0]) == 1);
+
+  A (strlen (&a[1][0][0]) == 2);
+  A (strlen (&a[1][1][0]) == 3);
+
+  A (strlen (&a[0][0][0] + 1) == 0);
+  A (strlen (&a[0][1][0] + 1) == 0);
+  A (strlen (&a[0][1][0] + 2) == 1);
+  A (strlen (&a[0][1][0] + 3) == 0);
+  A (strlen (&a[0][1][0] + 7) == 0);
+
+  A (strlen (&a[1][0][0] + 1) == 1);
+  A (strlen (&a[1][1][0] + 1) == 2);
+  A (strlen (&a[1][1][0] + 2) == 1);
+  A (strlen (&a[1][1][0] + 7) == 0);
+
+
+  A (strlen (a[i0][i0]) == 1);
+  A (strlen (a[i0][i1]) == 1);
+
+  A (strlen (a[i1][i0]) == 2);
+  A (strlen (a[i1][i1]) == 3);
+
+  A (strlen (&a[i0][i0][i0]) == 1);
+  A (strlen (&a[i0][i1][i0]) == 1);
+  A (strlen (&a[i0][i1][i1]) == 0);
+  A (strlen (&a[i0][i1][i2]) == 1);
+  A (strlen (&a[i0][i1][i3]) == 0);
+  A (strlen (&a[i0][i1][i3]) == 0);
+
+  A (strlen (&a[i1][i0][i0]) == 2);
+  A (strlen (&a[i1][i1][i0]) == 3);
+  A (strlen (&a[i1][i1][i1]) == 2);
+  A (strlen (&a[i1][i1][i2]) == 1);
+  A (strlen (&a[i1][i1][i3]) == 0);
+  A (strlen (&a[i1][i1][i4]) == 1);
+  A (strlen (&a[i1][i1][i5]) == 0);
+  A (strlen (&a[i1][i1][i6]) == 0);
+  A (strlen (&a[i1][i1][i7]) == 0);
+
+  A (strlen (&a[i0][i0][i0] + i1) == 0);
+  A (strlen (&a[i0][i1][i0] + i1) == 0);
+  A (strlen (&a[i0][i1][i0] + i7) == 0);
+
+  A (strlen (&a[i1][i0][i0] + i1) == 1);
+  A (strlen (&a[i1][i1][i0] + i1) == 2);
+  A (strlen (&a[i1][i1][i0] + i2) == 1);
+  A (strlen (&a[i1][i1][i0] + i3) == 0);
+  A (strlen (&a[i1][i1][i0] + i4) == 1);
+  A (strlen (&a[i1][i1][i0] + i5) == 0);
+  A (strlen (&a[i1][i1][i0] + i6) == 0);
+  A (strlen (&a[i1][i1][i0] + i7) == 0);
+
+
+  A (strlen (a[i0][i0]) == 1);
+  A (strlen (a[i0][i1]) == 1);
+
+  A (strlen (a[i1][i0]) == 2);
+  A (strlen (a[i1][i1]) == 3);
+
+  A (strlen (&a[i0][i0][i0]) == 1);
+  A (strlen (&a[i0][i1][i0]) == 1);
+
+  A (strlen (&a[i1][i0][i0]) == 2);
+  A (strlen (&a[i1][i1][i0]) == 3);
+
+  A (strlen (&a[i0][i0][i0] + v1) == 0);
+  A (strlen (&a[i0][i0][i0] + v2) == 0);
+  A (strlen (&a[i0][i0][i0] + v7) == 0);
+
+  A (strlen (&a[i0][i1][i0] + v1) == 0);
+  A (strlen (&a[i0][i1][i0] + v2) == 1);
+  A (strlen (&a[i0][i1][i0] + v3) == 0);
+
+  A (strlen (&a[i1][i0][i0] + v1) == 1);
+  A (strlen (&a[i1][i1][i0] + v1) == 2);
+  A (strlen (&a[i1][i1][i0] + v2) == 1);
+  A (strlen (&a[i1][i1][i0] + v3) == 0);
+  A (strlen (&a[i1][i1][i0] + v4) == 1);
+  A (strlen (&a[i1][i1][i0] + v5) == 0);
+  A (strlen (&a[i1][i1][i0] + v6) == 0);
+  A (strlen (&a[i1][i1][i0] + v7) == 0);
+}
+
+int main (void)
+{
+  test_array_ref ();
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/strlen-4.c b/gcc/testsuite/gcc.c-torture/execute/strlen-4.c
new file mode 100644
index 0000000..c3b2c77
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/strlen-4.c
@@ -0,0 +1,232 @@
+/* PR tree-optimization/86622 - incorrect strlen of array of array plus
+   variable offset
+   Exercise strlen() with a multi-dimensional array of strings with
+   offsets.  */
+
+extern int printf (const char*, ...);
+extern __SIZE_TYPE__ strlen (const char*);
+
+typedef char A28[28];
+typedef A28 A3_28[3];
+typedef A3_28 A2_3_28[2];
+
+static const A2_3_28 a = {
+  /* [0][0]    [0][1]         [0][2] */
+  { "1\00012", "123\0001234", "12345\000123456" },
+  /* [1][0]    [1][1]         [1][2] */
+  { "1234567\00012345678", "123456789\0001234567890", "12345678901\000123456789012" }
+};
+
+volatile int v0 = 0;
+volatile int v1 = 1;
+volatile int v2 = 2;
+volatile int v3 = 3;
+volatile int v4 = 4;
+volatile int v5 = 5;
+volatile int v6 = 6;
+volatile int v7 = 7;
+
+#define A(expr, N)							\
+  ((strlen (expr) == N)							\
+   ? (void)0 : (printf ("line %i: strlen (%s = \"%s\") != %i\n",	\
+			__LINE__, #expr, expr, N),			\
+		__builtin_abort ()))
+
+/* Verify that strlen() involving pointer to array arguments computes
+   the correct result.  */
+
+void test_array_ptr (void)
+{
+  /* Compute the length of the string at the refeenced array.  */
+  A (*(&a[0][0] + 0), 1);
+  A (*(&a[0][0] + 1), 3);
+  A (*(&a[0][0] + 2), 5);
+
+  A (*(&a[0][1] - 1), 1);
+  A (*(&a[0][1] + 0), 3);
+  A (*(&a[0][1] + 1), 5);
+
+  A (*(&a[0][2] - 2), 1);
+  A (*(&a[0][2] - 1), 3);
+  A (*(&a[0][2] + 0), 5);
+
+  A (*(&a[1][0] + 0), 7);
+  A (*(&a[1][0] + 1), 9);
+  A (*(&a[1][0] + 2), 11);
+
+  A (*(&a[1][1] - 1), 7);
+  A (*(&a[1][1] + 0), 9);
+  A (*(&a[1][1] + 1), 11);
+
+  A (*(&a[1][2] - 2), 7);
+  A (*(&a[1][2] - 1), 9);
+  A (*(&a[1][2] - 0), 11);
+
+  /* Compute the length of the string past the first nul.  */
+  A (*(&a[0][0] + 0) + 2, 2);
+  A (*(&a[0][0] + 1) + 4, 4);
+  A (*(&a[0][0] + 2) + 6, 6);
+
+  /* Compute the length of the string past the second nul.  */
+  A (*(&a[0][0] + 0) + 5, 0);
+  A (*(&a[0][0] + 1) + 10, 0);
+  A (*(&a[0][0] + 2) + 14, 0);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+  int i3 = i2 + 1;
+  int i4 = i3 + 1;
+  int i5 = i4 + 1;
+
+  A (*(&a[0][0] + i0), 1);
+  A (*(&a[0][0] + i1), 3);
+  A (*(&a[0][0] + i2), 5);
+
+  A (*(&a[0][1] - i1), 1);
+  A (*(&a[0][1] + i0), 3);
+  A (*(&a[0][1] + i1), 5);
+
+  A (*(&a[0][2] - i2), 1);
+  A (*(&a[0][2] - i1), 3);
+  A (*(&a[0][2] + i0), 5);
+
+  A (*(&a[1][0] + i0), 7);
+  A (*(&a[1][0] + i1), 9);
+  A (*(&a[1][0] + i2), 11);
+
+  A (*(&a[1][1] - i1), 7);
+  A (*(&a[1][1] + i0), 9);
+  A (*(&a[1][1] + i1), 11);
+
+  A (*(&a[1][2] - i2), 7);
+  A (*(&a[1][2] - i1), 9);
+  A (*(&a[1][2] - i0), 11);
+
+
+  A (*(&a[i0][i0] + i0), 1);
+  A (*(&a[i0][i0] + i1), 3);
+  A (*(&a[i0][i0] + i2), 5);
+
+  A (*(&a[i0][i1] - i1), 1);
+  A (*(&a[i0][i1] + i0), 3);
+  A (*(&a[i0][i1] + i1), 5);
+
+  A (*(&a[i0][i2] - i2), 1);
+  A (*(&a[i0][i2] - i1), 3);
+  A (*(&a[i0][i2] + i0), 5);
+
+  A (*(&a[i1][i0] + i0), 7);
+  A (*(&a[i1][i0] + i1), 9);
+  A (*(&a[i1][i0] + i2), 11);
+
+  A (*(&a[i1][i1] - i1), 7);
+  A (*(&a[i1][i1] + i0), 9);
+  A (*(&a[i1][i1] + i1), 11);
+
+  A (*(&a[i1][i2] - i2), 7);
+  A (*(&a[i1][i2] - i1), 9);
+  A (*(&a[i1][i2] - i0), 11);
+
+
+  A (*(&a[i0][i0] + v0), 1);
+  A (*(&a[i0][i0] + v1), 3);
+  A (*(&a[i0][i0] + v2), 5);
+
+  A (*(&a[i0][i1] - v1), 1);
+  A (*(&a[i0][i1] + v0), 3);
+  A (*(&a[i0][i1] + v1), 5);
+
+  A (*(&a[i0][i2] - v2), 1);
+  A (*(&a[i0][i2] - v1), 3);
+  A (*(&a[i0][i2] + v0), 5);
+
+  A (*(&a[i1][i0] + v0), 7);
+  A (*(&a[i1][i0] + v1), 9);
+  A (*(&a[i1][i0] + v2), 11);
+
+  A (*(&a[i1][i1] - v1), 7);
+  A (*(&a[i1][i1] + v0), 9);
+  A (*(&a[i1][i1] + v1), 11);
+
+  A (*(&a[i1][i2] - v2), 7);
+  A (*(&a[i1][i2] - v1), 9);
+  A (*(&a[i1][i2] - v0), 11);
+
+
+  A (*(&a[i0][i0] + v0) + i1, 0);
+  A (*(&a[i0][i0] + v1) + i2, 1);
+  A (*(&a[i0][i0] + v2) + i3, 2);
+
+  A (*(&a[i0][i1] - v1) + v1, 0);
+  A (*(&a[i0][i1] + v0) + v3, 0);
+  A (*(&a[i0][i1] + v1) + v5, 0);
+
+  A (*(&a[i0][v1] - i1) + i1, 0);
+  A (*(&a[i0][v1] + i0) + i3, 0);
+  A (*(&a[i0][v1] + i1) + i5, 0);
+}
+
+static const A3_28* const pa0 = &a[0];
+static const A3_28* const pa1 = &a[1];
+
+static const A3_28* const paa[] = { &a[0], &a[1] };
+
+/* Verify that strlen() involving pointers and arrays of pointers
+   to array arguments computes the correct result.  */
+
+void test_ptr_array (void)
+{
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+  int i3 = i2 + 1;
+
+  A (*((*pa0) + i0), 1);
+  A (*((*pa0) + i1), 3);
+  A (*((*pa0) + i2), 5);
+
+  A (*(pa0[0] + i0), 1);
+  A (*(pa0[0] + i1), 3);
+  A (*(pa0[0] + i2), 5);
+
+  A ((*pa0)[i0] + i1, 0);
+  A ((*pa0)[i1] + i2, 1);
+  A ((*pa0)[i2] + i3, 2);
+
+
+  A (*((*pa1) + i0), 7);
+  A (*((*pa1) + i1), 9);
+  A (*((*pa1) + i2), 11);
+
+  A (*(pa1[0] + i0), 7);
+  A (*(pa1[0] + i1), 9);
+  A (*(pa1[0] + i2), 11);
+
+  A ((*pa1)[i0] + i1, 6);
+  A ((*pa1)[i1] + i2, 7);
+  A ((*pa1)[i2] + i3, 8);
+
+  A (*(*(paa[0]) + i0), 1);
+  A (*(*(paa[0]) + i1), 3);
+  A (*(*(paa[0]) + i2), 5);
+
+  A (*(*(paa[1]) + i0), 7);
+  A (*(*(paa[1]) + i1), 9);
+  A (*(*(paa[1]) + i2), 11);
+
+  A (*(*(paa[1]) - i1), 5);
+  A (*(*(paa[1]) - i2), 3);
+  A (*(*(paa[1]) - i3), 1);
+
+  A (*(*(paa[0]) + i0) + i1, 0);
+  A (*(*(paa[0]) + i1) + i2, 1);
+  A (*(*(paa[0]) + i2) + i3, 2);
+}
+
+int main (void)
+{
+  test_array_ptr ();
+
+  test_ptr_array ();
+}
Jeff Law July 24, 2018, 10:27 p.m. UTC | #12
On 07/24/2018 02:16 PM, Martin Sebor wrote:
> On 07/20/2018 04:20 AM, Richard Biener wrote:
>> On Thu, 19 Jul 2018, Martin Sebor wrote:
>>
>>> Here's one more update with tweaks addressing a couple more
>>> of Bernd's comments:
>>>
>>> 1) correct the use of TREE_STRING_LENGTH() where a number of
>>> array elements is expected and not bytes
>>> 2) set CHARTYPE as soon as it's first determined rather than
>>> trying to extract it again later
TREE_STRING_LENGTH is *really* poorly named.  It practically invites misuse.

[ Snip ]


> 
> 
> gcc-86532.diff
> 
> 
> PR tree-optimization/86622 - incorrect strlen of array of array plus variable offset
> PR tree-optimization/86532 - Wrong code due to a wrong strlen folding starting with r262522
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/86622
> 	PR tree-optimization/86532
> 	* builtins.h (string_length): Declare.
> 	* builtins.c (c_strlen): Correct handling of non-constant offsets.	
> 	(check_access): Be prepared for non-constant length ranges.
> 	(string_length): Make extern.
> 	* expr.c (string_constant): Only handle the minor non-constant
> 	array index.  Use string_constant to compute the length of
> 	a generic string constant.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/86622
> 	PR tree-optimization/86532
> 	* gcc.c-torture/execute/strlen-2.c: New test.
> 	* gcc.c-torture/execute/strlen-3.c: New test.
> 	* gcc.c-torture/execute/strlen-4.c: New test.
> 
OK
jeff
diff mbox series

Patch

PR tree-optimization/86532 - Wrong code due to a wrong strlen folding starting with r262522

gcc/ChangeLog:

	PR tree-optimization/86532
	* builtins.c (c_strlen): Correct handling of non-constant offsets.	
	(check_access): Be prepared for non-constant length ranges.
	* expr.c (string_constant): Only handle the minor non-constant
	array index.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86532
	* gcc.c-torture/execute/strlen-2.c: New test.

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 262764)
+++ gcc/builtins.c	(working copy)
@@ -517,7 +517,7 @@  get_pointer_alignment (tree exp)
   return align;
 }
 
-/* Return the number of non-zero elements in the sequence
+/* Return the number of leading non-zero elements in the sequence
    [ PTR, PTR + MAXELTS ) where each element's size is ELTSIZE bytes.
    ELTSIZE must be a power of 2 less than 8.  Used by c_strlen.  */
 
@@ -605,15 +605,22 @@  c_strlen (tree src, int only_value)
 
   /* Set MAXELTS to sizeof (SRC) / sizeof (*SRC) - 1, the maximum possible
      length of SRC.  Prefer TYPE_SIZE() to TREE_STRING_LENGTH() if possible
-     in case the latter is less than the size of the array.  */
-  HOST_WIDE_INT maxelts = TREE_STRING_LENGTH (src);
+     in case the latter is less than the size of the array, such as when
+     SRC refers to a short string literal used to initialize a large array.
+     In that case, the elements of the array after the terminating NUL are
+     all NUL.  */
+  HOST_WIDE_INT strelts = TREE_STRING_LENGTH (src);
+  strelts = strelts / eltsize - 1;
+
+  HOST_WIDE_INT maxelts = strelts;
   tree type = TREE_TYPE (src);
   if (tree size = TYPE_SIZE_UNIT (type))
     if (tree_fits_shwi_p (size))
-      maxelts = tree_to_uhwi (size);
+      {
+	maxelts = tree_to_uhwi (size);
+	maxelts = maxelts / eltsize - 1;
+      }
 
-  maxelts = maxelts / eltsize - 1;
-
   /* PTR can point to the byte representation of any string type, including
      char* and wchar_t*.  */
   const char *ptr = TREE_STRING_POINTER (src);
@@ -620,10 +627,12 @@  c_strlen (tree src, int only_value)
 
   if (byteoff && TREE_CODE (byteoff) != INTEGER_CST)
     {
-      /* If the string has an internal zero byte (e.g., "foo\0bar"), we can't
-	 compute the offset to the following null if we don't know where to
+      /* If the string has an internal NUL character followed by any
+	 non-NUL characters (e.g., "foo\0bar"), we can't compute
+	 the offset to the following NUL if we don't know where to
 	 start searching for it.  */
-      if (string_length (ptr, eltsize, maxelts) < maxelts)
+      unsigned len = string_length (ptr, eltsize, strelts);
+      if (len < strelts)
 	{
 	  /* Return when an embedded null character is found.  */
 	  return NULL_TREE;
@@ -633,12 +642,18 @@  c_strlen (tree src, int only_value)
 	return ssize_int (0);
 
       /* We don't know the starting offset, but we do know that the string
-	 has no internal zero bytes.  We can assume that the offset falls
-	 within the bounds of the string; otherwise, the programmer deserves
-	 what he gets.  Subtract the offset from the length of the string,
-	 and return that.  This would perhaps not be valid if we were dealing
-	 with named arrays in addition to literal string constants.  */
-      return size_diffop_loc (loc, size_int (maxelts * eltsize), byteoff);
+	 has no internal zero bytes.  If the offset falls within the bounds
+	 of the string subtract the offset from the length of the string,
+	 and return that.  Otherwise the length is zero.  Take care to
+	 use SAVE_EXPR in case the OFFSET has side-effects.  */
+      tree offsave = fold_build1_loc (loc, SAVE_EXPR,
+				      TREE_TYPE (byteoff), byteoff);
+      tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
+				      build_int_cst (size_type_node,
+						     len * eltsize));
+      tree lenexp = size_diffop_loc (loc, size_int (strelts * eltsize), offsave);
+      return fold_build3_loc (loc, COND_EXPR, size_type_node, condexp, lenexp,
+			      build_zero_cst (size_type_node));
     }
 
   /* Offset from the beginning of the string in elements.  */
@@ -3192,15 +3207,13 @@  check_access (tree exp, tree, tree, tree dstwrite,
   if (dstwrite)
     get_size_range (dstwrite, range);
 
-  /* This can happen at -O0.  */
-  if (range[0] && TREE_CODE (range[0]) != INTEGER_CST)
-    return false;
-
   tree func = get_callee_fndecl (exp);
 
   /* First check the number of bytes to be written against the maximum
      object size.  */
-  if (range[0] && tree_int_cst_lt (maxobjsize, range[0]))
+  if (range[0]
+      && TREE_CODE (range[0]) == INTEGER_CST
+      && tree_int_cst_lt (maxobjsize, range[0]))
     {
       if (TREE_NO_WARNING (exp))
 	return false;
@@ -3235,6 +3248,7 @@  check_access (tree exp, tree, tree, tree dstwrite,
   if (range[0] || !exactwrite || integer_all_onesp (dstwrite))
     {
       if (range[0]
+	  && dstwrite
 	  && ((tree_fits_uhwi_p (dstsize)
 	       && tree_int_cst_lt (dstsize, range[0]))
 	      || (tree_fits_uhwi_p (dstwrite)
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 262764)
+++ gcc/expr.c	(working copy)
@@ -11282,24 +11282,34 @@  string_constant (tree arg, tree *ptr_offset)
   /* Non-constant index into the character array in an ARRAY_REF
      expression or null.  */
   tree varidx = NULL_TREE;
+  tree chartype;
 
   poly_int64 base_off = 0;
 
   if (TREE_CODE (arg) == ADDR_EXPR)
     {
+      tree argtype = TREE_TYPE (arg);
+      chartype = argtype;
+
       arg = TREE_OPERAND (arg, 0);
       tree ref = arg;
       if (TREE_CODE (arg) == ARRAY_REF)
 	{
 	  tree idx = TREE_OPERAND (arg, 1);
-	  if (TREE_CODE (idx) != INTEGER_CST)
+	  if (TREE_CODE (idx) != INTEGER_CST
+	      && TREE_CODE (argtype) == POINTER_TYPE)
 	    {
-	      /* Extract the variable index to prevent
-		 get_addr_base_and_unit_offset() from failing due to
-		 it.  Use it later to compute the non-constant offset
+	      /* From a pointer (but not array) argument extract the variable
+		 index to prevent get_addr_base_and_unit_offset() from failing
+		 due to it.  Use it later to compute the non-constant offset
 		 into the string and return it to the caller.  */
 	      varidx = idx;
 	      ref = TREE_OPERAND (arg, 0);
+
+	      tree type = TREE_TYPE (arg);
+	      if (TREE_CODE (type) == ARRAY_TYPE
+		  && TREE_CODE (type) != INTEGER_TYPE)
+		return NULL_TREE;
 	    }
 	}
       array = get_addr_base_and_unit_offset (ref, &base_off);
@@ -11334,7 +11344,10 @@  string_constant (tree arg, tree *ptr_offset)
       return NULL_TREE;
     }
   else if (DECL_P (arg))
-    array = arg;
+    {
+      array = arg;
+      chartype = TREE_TYPE (arg);
+    }
   else
     return NULL_TREE;
 
@@ -11343,16 +11356,15 @@  string_constant (tree arg, tree *ptr_offset)
     {
       if (TREE_CODE (TREE_TYPE (array)) != ARRAY_TYPE)
 	return NULL_TREE;
-      if (tree eltsize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (array))))
-	{
-	  /* Add the scaled variable index to the constant offset.  */
-	  tree eltoff = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
-				     fold_convert (sizetype, varidx),
-				     eltsize);
-	  offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, eltoff);
-	}
-      else
-	return NULL_TREE;
+
+      while (TREE_CODE (chartype) != INTEGER_TYPE)
+	chartype = TREE_TYPE (chartype);
+
+      /* Set the non-constant offset to the non-constant index scaled
+	 by the size of the character type.  */
+      offset = fold_build2 (MULT_EXPR, TREE_TYPE (offset),
+			    fold_convert (sizetype, varidx),
+			    TYPE_SIZE_UNIT (chartype));
     }
 
   if (TREE_CODE (array) == STRING_CST)
@@ -11391,11 +11403,15 @@  string_constant (tree arg, tree *ptr_offset)
       init = fold_ctor_reference (NULL_TREE, init, base_off, 0, array,
 				  &fieldoff);
       HOST_WIDE_INT cstoff;
-      if (init && base_off.is_constant (&cstoff))
-	{
-	  cstoff = (cstoff - fieldoff) / BITS_PER_UNIT;
-	  offset = build_int_cst (sizetype, cstoff);
-	}
+      if (!base_off.is_constant (&cstoff))
+	return NULL_TREE;
+
+      cstoff = (cstoff - fieldoff) / BITS_PER_UNIT;
+      tree off = build_int_cst (sizetype, cstoff);
+      if (varidx)
+	offset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, off);
+      else
+	offset = off;
     }
 
   if (!init || TREE_CODE (init) != STRING_CST)
Index: gcc/testsuite/gcc.c-torture/execute/strlen-2.c
===================================================================
--- gcc/testsuite/gcc.c-torture/execute/strlen-2.c	(nonexistent)
+++ gcc/testsuite/gcc.c-torture/execute/strlen-2.c	(working copy)
@@ -0,0 +1,210 @@ 
+/* PR tree-optimization/86532 - Wrong code due to a wrong strlen folding  */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+static const char a[2][3] = { "1", "12" };
+static const char b[2][2][5] = { { "1", "12" }, { "123", "1234" } };
+
+volatile int v0 = 0;
+volatile int v1 = 1;
+volatile int v2 = 2;
+
+#define A(expr)								\
+  ((expr) ? (void)0 : (__builtin_printf ("assertion on line %i: %s\n",	\
+					 __LINE__, #expr),		\
+		       __builtin_abort ()))
+
+void test_array_ref_2_3 (void)
+{
+  A (strlen (a[v0]) == 1);
+  A (strlen (&a[v0][v0]) == 1);
+  A (strlen (&a[0][v0]) == 1);
+  A (strlen (&a[v0][0]) == 1);
+
+  A (strlen (a[v1]) == 2);
+  A (strlen (&a[v1][0]) == 2);
+  A (strlen (&a[1][v0]) == 2);
+  A (strlen (&a[v1][v0]) == 2);
+
+  A (strlen (&a[v1][1]) == 1);
+  A (strlen (&a[v1][1]) == 1);
+
+  A (strlen (&a[v1][2]) == 0);
+  A (strlen (&a[v1][v2]) == 0);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (a[v0]) == 1);
+  A (strlen (&a[v0][v0]) == 1);
+  A (strlen (&a[i0][v0]) == 1);
+  A (strlen (&a[v0][i0]) == 1);
+
+  A (strlen (a[v1]) == 2);
+  A (strlen (&a[v1][i0]) == 2);
+  A (strlen (&a[i1][v0]) == 2);
+  A (strlen (&a[v1][v0]) == 2);
+
+  A (strlen (&a[v1][i1]) == 1);
+  A (strlen (&a[v1][i1]) == 1);
+
+  A (strlen (&a[v1][i2]) == 0);
+  A (strlen (&a[v1][v2]) == 0);
+}
+
+void test_array_off_2_3 (void)
+{
+  A (strlen (a[0] + 0) == 1);
+  A (strlen (a[0] + v0) == 1);
+  A (strlen (a[v0] + 0) == 1);
+  A (strlen (a[v0] + v0) == 1);
+
+  A (strlen (a[v1] + 0) == 2);
+  A (strlen (a[1] + v0) == 2);
+  A (strlen (a[v1] + 0) == 2);
+  A (strlen (a[v1] + v0) == 2);
+
+  A (strlen (a[v1] + 1) == 1);
+  A (strlen (a[v1] + v1) == 1);
+
+  A (strlen (a[v1] + 2) == 0);
+  A (strlen (a[v1] + v2) == 0);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (a[i0] + i0) == 1);
+  A (strlen (a[i0] + v0) == 1);
+  A (strlen (a[v0] + i0) == 1);
+  A (strlen (a[v0] + v0) == 1);
+
+  A (strlen (a[v1] + i0) == 2);
+  A (strlen (a[i1] + v0) == 2);
+  A (strlen (a[v1] + i0) == 2);
+  A (strlen (a[v1] + v0) == 2);
+
+  A (strlen (a[v1] + i1) == 1);
+  A (strlen (a[v1] + v1) == 1);
+
+  A (strlen (a[v1] + i2) == 0);
+  A (strlen (a[v1] + v2) == 0);
+}
+
+void test_array_ref_2_2_5 (void)
+{
+  A (strlen (b[0][v0]) == 1);
+  A (strlen (b[v0][0]) == 1);
+
+  A (strlen (&b[0][0][v0]) == 1);
+  A (strlen (&b[0][v0][0]) == 1);
+  A (strlen (&b[v0][0][0]) == 1);
+
+  A (strlen (&b[0][v0][v0]) == 1);
+  A (strlen (&b[v0][0][v0]) == 1);
+  A (strlen (&b[v0][v0][0]) == 1);
+
+  A (strlen (b[0][v1]) == 2);
+  A (strlen (b[v1][0]) == 3);
+
+  A (strlen (&b[0][0][v1]) == 0);
+  A (strlen (&b[0][v1][0]) == 2);
+  A (strlen (&b[v0][0][0]) == 1);
+
+  A (strlen (&b[0][v0][v0]) == 1);
+  A (strlen (&b[v0][0][v0]) == 1);
+  A (strlen (&b[v0][v0][0]) == 1);
+
+  A (strlen (&b[0][v1][v1]) == 1);
+  A (strlen (&b[v1][0][v1]) == 2);
+  A (strlen (&b[v1][v1][0]) == 4);
+  A (strlen (&b[v1][v1][1]) == 3);
+  A (strlen (&b[v1][v1][2]) == 2);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (b[i0][v0]) == 1);
+  A (strlen (b[v0][i0]) == 1);
+
+  A (strlen (&b[i0][i0][v0]) == 1);
+  A (strlen (&b[i0][v0][i0]) == 1);
+  A (strlen (&b[v0][i0][i0]) == 1);
+
+  A (strlen (&b[i0][v0][v0]) == 1);
+  A (strlen (&b[v0][i0][v0]) == 1);
+  A (strlen (&b[v0][v0][i0]) == 1);
+
+  A (strlen (b[i0][v1]) == 2);
+  A (strlen (b[v1][i0]) == 3);
+
+  A (strlen (&b[i0][i0][v1]) == 0);
+  A (strlen (&b[i0][v1][i0]) == 2);
+  A (strlen (&b[v0][i0][i0]) == 1);
+
+  A (strlen (&b[i0][v0][v0]) == 1);
+  A (strlen (&b[v0][i0][v0]) == 1);
+  A (strlen (&b[v0][v0][i0]) == 1);
+
+  A (strlen (&b[i0][v1][v1]) == 1);
+  A (strlen (&b[v1][i0][v1]) == 2);
+  A (strlen (&b[v1][v1][i0]) == 4);
+  A (strlen (&b[v1][v1][i1]) == 3);
+  A (strlen (&b[v1][v1][i2]) == 2);
+}
+
+void test_array_off_2_2_5 (void)
+{
+  A (strlen (b[0][0] + v0) == 1);
+  A (strlen (b[0][v0] + v0) == 1);
+  A (strlen (b[v0][0] + v0) == 1);
+  A (strlen (b[v0][v0] + v0) == 1);
+
+  A (strlen (b[0][0] + v1) == 0);
+  A (strlen (b[0][v1] + 0) == 2);
+  A (strlen (b[v0][0] + 0) == 1);
+
+  A (strlen (b[0][v0] + v0) == 1);
+  A (strlen (b[v0][0] + v0) == 1);
+  A (strlen (b[v0][v0] + 0) == 1);
+
+  A (strlen (b[0][v1] + v1) == 1);
+  A (strlen (b[v1][0] + v1) == 2);
+  A (strlen (b[v1][v1] + 0) == 4);
+  A (strlen (b[v1][v1] + 1) == 3);
+  A (strlen (b[v1][v1] + 2) == 2);
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+
+  A (strlen (b[i0][i0] + v0) == 1);
+  A (strlen (b[i0][v0] + v0) == 1);
+  A (strlen (b[v0][i0] + v0) == 1);
+  A (strlen (b[v0][v0] + v0) == 1);
+
+  A (strlen (b[i0][i0] + v1) == 0);
+  A (strlen (b[i0][v1] + i0) == 2);
+  A (strlen (b[v0][i0] + i0) == 1);
+
+  A (strlen (b[i0][v0] + v0) == 1);
+  A (strlen (b[v0][i0] + v0) == 1);
+  A (strlen (b[v0][v0] + i0) == 1);
+
+  A (strlen (b[i0][v1] + v1) == 1);
+  A (strlen (b[v1][i0] + v1) == 2);
+  A (strlen (b[v1][v1] + i0) == 4);
+  A (strlen (b[v1][v1] + i1) == 3);
+  A (strlen (b[v1][v1] + i2) == 2);
+}
+
+int main ()
+{
+  test_array_ref_2_3 ();
+  test_array_off_2_3 ();
+
+  test_array_ref_2_2_5 ();
+  test_array_off_2_2_5 ();
+}