diff mbox series

avoid assuming known string length is constant (PR 83896)

Message ID df25c3ab-2e13-38c2-3883-25a00edf4d96@gmail.com
State New
Headers show
Series avoid assuming known string length is constant (PR 83896) | expand

Commit Message

Martin Sebor Jan. 16, 2018, 6:56 p.m. UTC
Recent improvements to the strlen pass introduced the assumption
that when the length of a string has been recorded by the pass
the length is necessarily constant.  Bug 83896 shows that this
assumption is not always true, and that GCC fails with an ICE
when it doesn't hold.  To avoid the ICE the attached patch
removes the assumption.

x86_64-linux bootstrap successful, regression test in progress.

Martin

Comments

Richard Sandiford Jan. 16, 2018, 7:37 p.m. UTC | #1
Martin Sebor <msebor@gmail.com> writes:
> Recent improvements to the strlen pass introduced the assumption
> that when the length of a string has been recorded by the pass
> the length is necessarily constant.  Bug 83896 shows that this
> assumption is not always true, and that GCC fails with an ICE
> when it doesn't hold.  To avoid the ICE the attached patch
> removes the assumption.
>
> x86_64-linux bootstrap successful, regression test in progress.
>
> Martin
>
> PR tree-optimization/83896 - ice in get_string_len on a call to strlen with non-constant length
>
> gcc/ChangeLog:
>
> 	PR tree-optimization/83896
> 	* tree-ssa-strlen.c (get_string_len): Avoid assuming length is constant.
>
> gcc/testsuite/ChangeLog:
>
> 	PR tree-optimization/83896
> 	* gcc.dg/strlenopt-43.c: New test.
>
> Index: gcc/tree-ssa-strlen.c
> ===================================================================
> --- gcc/tree-ssa-strlen.c	(revision 256752)
> +++ gcc/tree-ssa-strlen.c	(working copy)
> @@ -2772,7 +2772,9 @@ handle_pointer_plus (gimple_stmt_iterator *gsi)
>      }
>  }
>  
> -/* Check if RHS is string_cst possibly wrapped by mem_ref.  */
> +/* If RHS, either directly or indirectly, refers to a string of constant
> +   length, return it.  Otherwise return a negative value.  */
> +
>  static int
>  get_string_len (tree rhs)
>  {

I think this should be returning HOST_WIDE_INT given the unconstrained
tree_to_shwi return.  Same type change for rhslen in the caller.

(Not my call, but it might be better to have a more specific function name,
given that the file already had "get_string_length" before this function
was added.)

> @@ -2789,7 +2791,8 @@ get_string_len (tree rhs)
>  	      if (idx > 0)
>  		{
>  		  strinfo *si = get_strinfo (idx);
> -		  if (si && si->full_string_p)
> +		  if (si && si->full_string_p
> +		      && TREE_CODE (si->nonzero_chars) == INTEGER_CST)
>  		    return tree_to_shwi (si->nonzero_chars);

tree_fits_shwi_p?

Thanks,
Richard

>  		}
>  	    }
> Index: gcc/testsuite/gcc.dg/strlenopt-43.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/strlenopt-43.c	(nonexistent)
> +++ gcc/testsuite/gcc.dg/strlenopt-43.c	(working copy)
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen
> +   with non-constant length
> +   { dg-do compile }
> +   { dg-options "-O2 -Wall" } */
> +
> +extern char a[5];
> +extern char b[];
> +
> +void f (void)
> +{
> +  if (__builtin_strlen (b) != 4)
> +    __builtin_memcpy (a, b, sizeof a);
> +}
Jakub Jelinek Jan. 16, 2018, 9:26 p.m. UTC | #2
On Tue, Jan 16, 2018 at 07:37:30PM +0000, Richard Sandiford wrote:
> > -/* Check if RHS is string_cst possibly wrapped by mem_ref.  */
> > +/* If RHS, either directly or indirectly, refers to a string of constant
> > +   length, return it.  Otherwise return a negative value.  */
> > +
> >  static int
> >  get_string_len (tree rhs)
> >  {
> 
> I think this should be returning HOST_WIDE_INT given the unconstrained
> tree_to_shwi return.  Same type change for rhslen in the caller.
> 
> (Not my call, but it might be better to have a more specific function name,
> given that the file already had "get_string_length" before this function
> was added.)

Yeah, certainly for both.

> > @@ -2789,7 +2791,8 @@ get_string_len (tree rhs)
> >  	      if (idx > 0)
> >  		{
> >  		  strinfo *si = get_strinfo (idx);
> > -		  if (si && si->full_string_p)
> > +		  if (si && si->full_string_p
> > +		      && TREE_CODE (si->nonzero_chars) == INTEGER_CST)
> >  		    return tree_to_shwi (si->nonzero_chars);
> 
> tree_fits_shwi_p?

Surely that instead of TREE_CODE check, but even that will not make sure it
fits into host int, so yes, it should be HOST_WIDE_INT and the code should
make sure it is also >= 0.

	Jakub
Martin Sebor Jan. 16, 2018, 10:20 p.m. UTC | #3
On 01/16/2018 12:37 PM, Richard Sandiford wrote:
> Martin Sebor <msebor@gmail.com> writes:
>> Recent improvements to the strlen pass introduced the assumption
>> that when the length of a string has been recorded by the pass
>> the length is necessarily constant.  Bug 83896 shows that this
>> assumption is not always true, and that GCC fails with an ICE
>> when it doesn't hold.  To avoid the ICE the attached patch
>> removes the assumption.
>>
>> x86_64-linux bootstrap successful, regression test in progress.
>>
>> Martin
>>
>> PR tree-optimization/83896 - ice in get_string_len on a call to strlen with non-constant length
>>
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/83896
>> 	* tree-ssa-strlen.c (get_string_len): Avoid assuming length is constant.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR tree-optimization/83896
>> 	* gcc.dg/strlenopt-43.c: New test.
>>
>> Index: gcc/tree-ssa-strlen.c
>> ===================================================================
>> --- gcc/tree-ssa-strlen.c	(revision 256752)
>> +++ gcc/tree-ssa-strlen.c	(working copy)
>> @@ -2772,7 +2772,9 @@ handle_pointer_plus (gimple_stmt_iterator *gsi)
>>      }
>>  }
>>
>> -/* Check if RHS is string_cst possibly wrapped by mem_ref.  */
>> +/* If RHS, either directly or indirectly, refers to a string of constant
>> +   length, return it.  Otherwise return a negative value.  */
>> +
>>  static int
>>  get_string_len (tree rhs)
>>  {
>
> I think this should be returning HOST_WIDE_INT given the unconstrained
> tree_to_shwi return.  Same type change for rhslen in the caller.

Thanks for looking at it!  I confess it's not completely clear
to me in what type the pass tracks string lengths.  For string
constants, get_stridx() returns an int with the their length
bit-flipped.  I tried to maintain that invariant in the change
I introduced in the block toward the end of the function (in
a different patch).  But then in other places the pass works
with HOST_WIDE_INT, so it looks like it would be appropriate
to use here as well.

I tried to come up with a test case that would exercise this
conversion but couldn't.  If you (or someone else) have an idea
for one I'd be more than happy to add it to the test suite.

> (Not my call, but it might be better to have a more specific function name,
> given that the file already had "get_string_length" before this function
> was added.)

I renamed it (again), this time to get_string_cst_length().
Nothing better came to mind.

>
>> @@ -2789,7 +2791,8 @@ get_string_len (tree rhs)
>>  	      if (idx > 0)
>>  		{
>>  		  strinfo *si = get_strinfo (idx);
>> -		  if (si && si->full_string_p)
>> +		  if (si && si->full_string_p
>> +		      && TREE_CODE (si->nonzero_chars) == INTEGER_CST)
>>  		    return tree_to_shwi (si->nonzero_chars);
>
> tree_fits_shwi_p?

Sigh.  Yes.  I still keep forgetting about all these gotchas.
Dealing with integers is so painfully error-prone in GCC (as
evident from all the bug reports with ICEs for these things).

It would be much simpler and safer if tree_to_shwi() returned
true on success and false for error (e.g., null, non-const,
or overflow) and took an extra argument for the result.  Then
the code would become:

   HOST_WIDE_INT result;
   if (si && tree_to_shwi (&result, si->nonzero_chars))
     return result;

and it would be nearly impossible to forget to check for bad
input.

Anyway, attached is an updated patch.

Martin

>
> Thanks,
> Richard
>
>>  		}
>>  	    }
>> Index: gcc/testsuite/gcc.dg/strlenopt-43.c
>> ===================================================================
>> --- gcc/testsuite/gcc.dg/strlenopt-43.c	(nonexistent)
>> +++ gcc/testsuite/gcc.dg/strlenopt-43.c	(working copy)
>> @@ -0,0 +1,13 @@
>> +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen
>> +   with non-constant length
>> +   { dg-do compile }
>> +   { dg-options "-O2 -Wall" } */
>> +
>> +extern char a[5];
>> +extern char b[];
>> +
>> +void f (void)
>> +{
>> +  if (__builtin_strlen (b) != 4)
>> +    __builtin_memcpy (a, b, sizeof a);
>> +}
PR tree-optimization/83896 - ice in get_string_len on a call to strlen with non-constant length

gcc/ChangeLog:

	PR tree-optimization/83896
	* tree-ssa-strlen.c (get_string_len): Rename...
	(get_string_cst_length): ...to this.  Return HOST_WIDE_INT.
	Avoid assuming length is constant.
	(handle_char_store): Use HOST_WIDE_INT for string length.

gcc/testsuite/ChangeLog:

	PR tree-optimization/83896
	* gcc.dg/strlenopt-43.c: New test.

Index: gcc/tree-ssa-strlen.c
===================================================================
--- gcc/tree-ssa-strlen.c	(revision 256752)
+++ gcc/tree-ssa-strlen.c	(working copy)
@@ -2772,16 +2772,20 @@ handle_pointer_plus (gimple_stmt_iterator *gsi)
     }
 }
 
-/* Check if RHS is string_cst possibly wrapped by mem_ref.  */
-static int
-get_string_len (tree rhs)
+/* If RHS, either directly or indirectly, refers to a string of constant
+   length, return it.  Otherwise return a negative value.  */
+
+static HOST_WIDE_INT
+get_string_cst_length (tree rhs)
 {
   if (TREE_CODE (rhs) == MEM_REF
       && integer_zerop (TREE_OPERAND (rhs, 1)))
     {
-      tree rhs_addr = rhs = TREE_OPERAND (rhs, 0);
+      rhs = TREE_OPERAND (rhs, 0);
       if (TREE_CODE (rhs) == ADDR_EXPR)
 	{
+	  tree rhs_addr = rhs;
+
 	  rhs = TREE_OPERAND (rhs, 0);
 	  if (TREE_CODE (rhs) != STRING_CST)
 	    {
@@ -2789,7 +2793,8 @@ handle_pointer_plus (gimple_stmt_iterator *gsi)
 	      if (idx > 0)
 		{
 		  strinfo *si = get_strinfo (idx);
-		  if (si && si->full_string_p)
+		  if (si && si->full_string_p
+		      && tree_fits_shwi_p (si->nonzero_chars))
 		    return tree_to_shwi (si->nonzero_chars);
 		}
 	    }
@@ -2801,10 +2806,7 @@ handle_pointer_plus (gimple_stmt_iterator *gsi)
     rhs = DECL_INITIAL (rhs);
 
   if (rhs && TREE_CODE (rhs) == STRING_CST)
-    {
-      unsigned HOST_WIDE_INT ilen = strlen (TREE_STRING_POINTER (rhs));
-      return ilen <= INT_MAX ? ilen : -1;
-    }
+    return strlen (TREE_STRING_POINTER (rhs));
 
   return -1;
 }
@@ -2822,7 +2824,7 @@ handle_char_store (gimple_stmt_iterator *gsi)
   unsigned HOST_WIDE_INT offset = 0;
 
   /* Set to the length of the string being assigned if known.  */
-  int rhslen;
+  HOST_WIDE_INT rhslen;
 
   if (TREE_CODE (lhs) == MEM_REF
       && TREE_CODE (TREE_OPERAND (lhs, 0)) == SSA_NAME)
@@ -2967,7 +2969,7 @@ handle_char_store (gimple_stmt_iterator *gsi)
 	}
     }
   else if (idx == 0
-	   && (rhslen = get_string_len (gimple_assign_rhs1 (stmt))) >= 0
+	   && (rhslen = get_string_cst_length (gimple_assign_rhs1 (stmt))) >= 0
 	   && ssaname == NULL_TREE
 	   && TREE_CODE (TREE_TYPE (lhs)) == ARRAY_TYPE)
     {
Index: gcc/testsuite/gcc.dg/strlenopt-43.c
===================================================================
--- gcc/testsuite/gcc.dg/strlenopt-43.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/strlenopt-43.c	(working copy)
@@ -0,0 +1,13 @@
+/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen
+   with non-constant length
+   { dg-do compile }
+   { dg-options "-O2 -Wall -fdump-tree-optimized" } */
+
+extern char a[5];
+extern char b[];
+
+void f (void)
+{
+  if (__builtin_strlen (b) != 4)
+    __builtin_memcpy (a, b, sizeof a);
+}
Martin Sebor Jan. 16, 2018, 10:25 p.m. UTC | #4
On 01/16/2018 02:26 PM, Jakub Jelinek wrote:
> On Tue, Jan 16, 2018 at 07:37:30PM +0000, Richard Sandiford wrote:
>>> -/* Check if RHS is string_cst possibly wrapped by mem_ref.  */
>>> +/* If RHS, either directly or indirectly, refers to a string of constant
>>> +   length, return it.  Otherwise return a negative value.  */
>>> +
>>>  static int
>>>  get_string_len (tree rhs)
>>>  {
>>
>> I think this should be returning HOST_WIDE_INT given the unconstrained
>> tree_to_shwi return.  Same type change for rhslen in the caller.
>>
>> (Not my call, but it might be better to have a more specific function name,
>> given that the file already had "get_string_length" before this function
>> was added.)
>
> Yeah, certainly for both.
>
>>> @@ -2789,7 +2791,8 @@ get_string_len (tree rhs)
>>>  	      if (idx > 0)
>>>  		{
>>>  		  strinfo *si = get_strinfo (idx);
>>> -		  if (si && si->full_string_p)
>>> +		  if (si && si->full_string_p
>>> +		      && TREE_CODE (si->nonzero_chars) == INTEGER_CST)
>>>  		    return tree_to_shwi (si->nonzero_chars);
>>
>> tree_fits_shwi_p?
>
> Surely that instead of TREE_CODE check, but even that will not make sure it
> fits into host int, so yes, it should be HOST_WIDE_INT and the code should
> make sure it is also >= 0.

I made these changes except for the last part:  How/when can
the length be negative?

Martin
Jakub Jelinek Jan. 16, 2018, 10:34 p.m. UTC | #5
On Tue, Jan 16, 2018 at 03:20:24PM -0700, Martin Sebor wrote:
> Thanks for looking at it!  I confess it's not completely clear
> to me in what type the pass tracks string lengths.  For string
> constants, get_stridx() returns an int with the their length
> bit-flipped.  I tried to maintain that invariant in the change

That is because TREE_STRING_LENGTH is an int, so gcc doesn't allow
string literals longer than 2GB.  All other length are tracked as tree.

	Jakub
Jakub Jelinek Jan. 22, 2018, 4:33 p.m. UTC | #6
On Tue, Jan 16, 2018 at 03:20:24PM -0700, Martin Sebor wrote:
> gcc/ChangeLog:
> 
> 	PR tree-optimization/83896
> 	* tree-ssa-strlen.c (get_string_len): Rename...
> 	(get_string_cst_length): ...to this.  Return HOST_WIDE_INT.
> 	Avoid assuming length is constant.
> 	(handle_char_store): Use HOST_WIDE_INT for string length.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/83896
> 	* gcc.dg/strlenopt-43.c: New test.
...
> 
>    if (TREE_CODE (rhs) == MEM_REF
>        && integer_zerop (TREE_OPERAND (rhs, 1)))

For the future, there is no reason it couldn't handle
also non-zero offsets if the to be returned length is bigger or
equal than that offset, where the offset can be then subtracted.
But let's defer that for GCC9.

> @@ -2789,7 +2793,8 @@ handle_pointer_plus (gimple_stmt_iterator *gsi)
>  	      if (idx > 0)
>  		{
>  		  strinfo *si = get_strinfo (idx);
> -		  if (si && si->full_string_p)
> +		  if (si && si->full_string_p
> +		      && tree_fits_shwi_p (si->nonzero_chars))

If a && or || using condition doesn't fit onto a single line, it should be
split into one condition per line, so please change the above into:
		  if (si
		      && si->full_string_p
		      && tree_fits_shwi_p (si->nonzero_chars))

> @@ -2822,7 +2824,7 @@ handle_char_store (gimple_stmt_iterator *gsi)
>    unsigned HOST_WIDE_INT offset = 0;
>  
>    /* Set to the length of the string being assigned if known.  */
> -  int rhslen;
> +  HOST_WIDE_INT rhslen;

Please remove the empty line above the above comment, the function
starts with definitions of multiple variables, rhslen isn't even
initialized and there is nothing special on it compared to others,
so they should be in one block.

Ok for trunk with those changes.

> --- gcc/testsuite/gcc.dg/strlenopt-43.c	(nonexistent)
> +++ gcc/testsuite/gcc.dg/strlenopt-43.c	(working copy)
> @@ -0,0 +1,13 @@
> +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen
> +   with non-constant length
> +   { dg-do compile }
> +   { dg-options "-O2 -Wall -fdump-tree-optimized" } */
> +
> +extern char a[5];
> +extern char b[];
> +
> +void f (void)
> +{
> +  if (__builtin_strlen (b) != 4)
> +    __builtin_memcpy (a, b, sizeof a);
> +}

Most of the strlenopt*.c tests use the strlenopt.h header and then
use the non-__builtin_ function names, if you want to change that too,
please do so, but not a requirement.

	Jakub
Martin Sebor Jan. 26, 2018, 5:23 p.m. UTC | #7
On 01/22/2018 09:33 AM, Jakub Jelinek wrote:
> On Tue, Jan 16, 2018 at 03:20:24PM -0700, Martin Sebor wrote:
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/83896
>> 	* tree-ssa-strlen.c (get_string_len): Rename...
>> 	(get_string_cst_length): ...to this.  Return HOST_WIDE_INT.
>> 	Avoid assuming length is constant.
>> 	(handle_char_store): Use HOST_WIDE_INT for string length.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR tree-optimization/83896
>> 	* gcc.dg/strlenopt-43.c: New test.
> ...
>>
>>    if (TREE_CODE (rhs) == MEM_REF
>>        && integer_zerop (TREE_OPERAND (rhs, 1)))
>
> For the future, there is no reason it couldn't handle
> also non-zero offsets if the to be returned length is bigger or
> equal than that offset, where the offset can be then subtracted.
> But let's defer that for GCC9.

I opened bug 84069 for this oversight.

>
>> @@ -2789,7 +2793,8 @@ handle_pointer_plus (gimple_stmt_iterator *gsi)
>>  	      if (idx > 0)
>>  		{
>>  		  strinfo *si = get_strinfo (idx);
>> -		  if (si && si->full_string_p)
>> +		  if (si && si->full_string_p
>> +		      && tree_fits_shwi_p (si->nonzero_chars))
>
> If a && or || using condition doesn't fit onto a single line, it should be
> split into one condition per line, so please change the above into:
> 		  if (si
> 		      && si->full_string_p
> 		      && tree_fits_shwi_p (si->nonzero_chars))
>
>> @@ -2822,7 +2824,7 @@ handle_char_store (gimple_stmt_iterator *gsi)
>>    unsigned HOST_WIDE_INT offset = 0;
>>
>>    /* Set to the length of the string being assigned if known.  */
>> -  int rhslen;
>> +  HOST_WIDE_INT rhslen;
>
> Please remove the empty line above the above comment, the function
> starts with definitions of multiple variables, rhslen isn't even
> initialized and there is nothing special on it compared to others,
> so they should be in one block.
>
> Ok for trunk with those changes.

I committed the patch with the tweaks suggested above and below
in 257100.

Martin

>
>> --- gcc/testsuite/gcc.dg/strlenopt-43.c	(nonexistent)
>> +++ gcc/testsuite/gcc.dg/strlenopt-43.c	(working copy)
>> @@ -0,0 +1,13 @@
>> +/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen
>> +   with non-constant length
>> +   { dg-do compile }
>> +   { dg-options "-O2 -Wall -fdump-tree-optimized" } */
>> +
>> +extern char a[5];
>> +extern char b[];
>> +
>> +void f (void)
>> +{
>> +  if (__builtin_strlen (b) != 4)
>> +    __builtin_memcpy (a, b, sizeof a);
>> +}
>
> Most of the strlenopt*.c tests use the strlenopt.h header and then
> use the non-__builtin_ function names, if you want to change that too,
> please do so, but not a requirement.
>
> 	Jakub
>
diff mbox series

Patch

PR tree-optimization/83896 - ice in get_string_len on a call to strlen with non-constant length

gcc/ChangeLog:

	PR tree-optimization/83896
	* tree-ssa-strlen.c (get_string_len): Avoid assuming length is constant.

gcc/testsuite/ChangeLog:

	PR tree-optimization/83896
	* gcc.dg/strlenopt-43.c: New test.

Index: gcc/tree-ssa-strlen.c
===================================================================
--- gcc/tree-ssa-strlen.c	(revision 256752)
+++ gcc/tree-ssa-strlen.c	(working copy)
@@ -2772,7 +2772,9 @@  handle_pointer_plus (gimple_stmt_iterator *gsi)
     }
 }
 
-/* Check if RHS is string_cst possibly wrapped by mem_ref.  */
+/* If RHS, either directly or indirectly, refers to a string of constant
+   length, return it.  Otherwise return a negative value.  */
+
 static int
 get_string_len (tree rhs)
 {
@@ -2789,7 +2791,8 @@  get_string_len (tree rhs)
 	      if (idx > 0)
 		{
 		  strinfo *si = get_strinfo (idx);
-		  if (si && si->full_string_p)
+		  if (si && si->full_string_p
+		      && TREE_CODE (si->nonzero_chars) == INTEGER_CST)
 		    return tree_to_shwi (si->nonzero_chars);
 		}
 	    }
Index: gcc/testsuite/gcc.dg/strlenopt-43.c
===================================================================
--- gcc/testsuite/gcc.dg/strlenopt-43.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/strlenopt-43.c	(working copy)
@@ -0,0 +1,13 @@ 
+/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen
+   with non-constant length
+   { dg-do compile }
+   { dg-options "-O2 -Wall" } */
+
+extern char a[5];
+extern char b[];
+
+void f (void)
+{
+  if (__builtin_strlen (b) != 4)
+    __builtin_memcpy (a, b, sizeof a);
+}