diff mbox series

Fix up strnlen handling in tree-ssa-strlen.c (PR tree-optimization/89500)

Message ID 20190225231249.GD7611@tucnak
State New
Headers show
Series Fix up strnlen handling in tree-ssa-strlen.c (PR tree-optimization/89500) | expand

Commit Message

Jakub Jelinek Feb. 25, 2019, 11:12 p.m. UTC
Hi!

The following testcase ICEs, because rhs (previously known string length)
is SSA_NAME (return value from the earlier strlen), but bound is 0 and so
MIN_EXPR yields also size_zero_node.  The noncst_bound initialization
checks if new_rhs is INTEGER_CST and if it is, assumes rhs must be too
and uses tree_int_cst_lt.  While we could just add
|| TREE_CODE (rhs) != INTEGER_CST
before the || tree_int_cst_lt (new_rhs, rhs), I fail to see what
noncst_bound (which would be misnamed anyway) is good for.
For further optimizations in the strlen pass we care about string lengths
only, and we can prove strnlen returns that length only if all of rhs,
new_rhs and bound are actually INTEGER_CSTs for which we can prove that
new_rhs is equal to rhs.  The
          if (si != NULL
              && TREE_CODE (si->nonzero_chars) != SSA_NAME
              && TREE_CODE (si->nonzero_chars) != INTEGER_CST
              && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
            {
              si = unshare_strinfo (si);
              si->nonzero_chars = lhs;
              gcc_assert (si->full_string_p);
            }
hunk is though about trying to improve the case where we could compute
the length through some extra code (e.g. through strchr transformations,
stpcpy etc.) and for the next time we don't want to repeat that computation,
but just use the lhs of the strlen call, i.e. SSA_NAME.  As written above,
we explicitly don't do anything if it is INTEGER_CST, we already know very
well the length.  So the only case where we can prove something is not
useful here.
The only remaining code in the function before returning is:
          if (strlen_to_stridx)
            strlen_to_stridx->put (lhs, stridx_strlenloc (idx, loc));
but I believe doing that for strnlen wasn't really intentional, in the
case we don't know the length before we don't store it either:
      if (strlen_to_stridx && !bound)
        strlen_to_stridx->put (lhs, stridx_strlenloc (idx, loc));
it is for the warnings and I believe it is desirable to do that for strlen
only.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-02-25  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/89500
	* tree-ssa-strlen.c (handle_builtin_strlen): Remove noncst_bound
	variable, return early after optimizing stmt if bound is non-NULL.

	* gcc.dg/pr89500.c: New test.


	Jakub

Comments

Martin Sebor Feb. 25, 2019, 11:55 p.m. UTC | #1
On 2/25/19 4:12 PM, Jakub Jelinek wrote:
> Hi!
> 
> The following testcase ICEs, because rhs (previously known string length)
> is SSA_NAME (return value from the earlier strlen), but bound is 0 and so
> MIN_EXPR yields also size_zero_node.  The noncst_bound initialization
> checks if new_rhs is INTEGER_CST and if it is, assumes rhs must be too
> and uses tree_int_cst_lt.  While we could just add
> || TREE_CODE (rhs) != INTEGER_CST
> before the || tree_int_cst_lt (new_rhs, rhs), I fail to see what
> noncst_bound (which would be misnamed anyway) is good for.
> For further optimizations in the strlen pass we care about string lengths
> only, and we can prove strnlen returns that length only if all of rhs,
> new_rhs and bound are actually INTEGER_CSTs for which we can prove that
> new_rhs is equal to rhs.  The
>            if (si != NULL
>                && TREE_CODE (si->nonzero_chars) != SSA_NAME
>                && TREE_CODE (si->nonzero_chars) != INTEGER_CST
>                && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
>              {
>                si = unshare_strinfo (si);
>                si->nonzero_chars = lhs;
>                gcc_assert (si->full_string_p);
>              }
> hunk is though about trying to improve the case where we could compute
> the length through some extra code (e.g. through strchr transformations,
> stpcpy etc.) and for the next time we don't want to repeat that computation,
> but just use the lhs of the strlen call, i.e. SSA_NAME.  As written above,
> we explicitly don't do anything if it is INTEGER_CST, we already know very
> well the length.  So the only case where we can prove something is not
> useful here.
> The only remaining code in the function before returning is:
>            if (strlen_to_stridx)
>              strlen_to_stridx->put (lhs, stridx_strlenloc (idx, loc));
> but I believe doing that for strnlen wasn't really intentional, in the
> case we don't know the length before we don't store it either:
>        if (strlen_to_stridx && !bound)
>          strlen_to_stridx->put (lhs, stridx_strlenloc (idx, loc));
> it is for the warnings and I believe it is desirable to do that for strlen
> only.

Storing the strnlen result is intentional when it equals strlen.
I'm surprised there is no test case exercising it but the before
and after difference can be seen in this test case:

   void f (void)
   {
     char a[30] = "123456789";

     int n = __builtin_strlen (a);

     __builtin_strncpy (d, a, n);   // -Wstringop-overflow here
   }


   void g (void)
   {
     char a[30] = "123456789";

     int n = __builtin_strnlen (a, 20);

     __builtin_strncpy (d, a, n);   // -Wstringop-overflow here
  }

By not storing the length we lose the second warning.  I'd prefer
not to lose the warning in the second case so your first suggestion
sound like a good fix to me.  I can handle that and add tests for
this if you like.

Martin

> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2019-02-25  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR tree-optimization/89500
> 	* tree-ssa-strlen.c (handle_builtin_strlen): Remove noncst_bound
> 	variable, return early after optimizing stmt if bound is non-NULL.
> 
> 	* gcc.dg/pr89500.c: New test.
> 
> --- gcc/tree-ssa-strlen.c.jj	2019-01-18 00:33:19.466003372 +0100
> +++ gcc/tree-ssa-strlen.c	2019-02-25 22:13:27.165521677 +0100
> @@ -1294,18 +1294,8 @@ handle_builtin_strlen (gimple_stmt_itera
>   	  if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
>   	    rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
>   
> -	  /* Set for strnlen() calls with a non-constant bound.  */
> -	  bool noncst_bound = false;
>   	  if (bound)
> -	    {
> -	      tree new_rhs
> -		= fold_build2_loc (loc, MIN_EXPR, TREE_TYPE (rhs), rhs, bound);
> -
> -	      noncst_bound = (TREE_CODE (new_rhs) != INTEGER_CST
> -			      || tree_int_cst_lt (new_rhs, rhs));
> -
> -	      rhs = new_rhs;
> -	    }
> +	    rhs = fold_build2_loc (loc, MIN_EXPR, TREE_TYPE (rhs), rhs, bound);
>   
>   	  if (!update_call_from_tree (gsi, rhs))
>   	    gimplify_and_update_call_from_tree (gsi, rhs);
> @@ -1317,9 +1307,12 @@ handle_builtin_strlen (gimple_stmt_itera
>   	      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>   	    }
>   
> -	  /* Avoid storing the length for calls to strnlen() with
> -	     a non-constant bound.  */
> -	  if (noncst_bound)
> +	  /* Don't record anything below for strnlen, it is to simplify
> +	     length computation if si->nonzero_chars is complex, but
> +	     the only case where we could prove strnlen return value is
> +	     the string length is if si->nonzero_chars is constant and
> +	     bound is too and si->nonzero_chars <= bound.  */
> +	  if (bound)
>   	    return;
>   
>   	  if (si != NULL
> --- gcc/testsuite/gcc.dg/pr89500.c.jj	2019-02-25 22:14:46.468222667 +0100
> +++ gcc/testsuite/gcc.dg/pr89500.c	2019-02-25 22:14:26.948542413 +0100
> @@ -0,0 +1,17 @@
> +/* PR tree-optimization/89500 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +typedef __SIZE_TYPE__ size_t;
> +extern size_t strlen (const char *);
> +extern size_t strnlen (const char *, size_t);
> +extern void bar (char *);
> +
> +void
> +foo (int *a)
> +{
> +  char c[64];
> +  bar (c);
> +  a[0] = strlen (c);
> +  a[1] = strnlen (c, 0);
> +}
> 
> 	Jakub
>
diff mbox series

Patch

--- gcc/tree-ssa-strlen.c.jj	2019-01-18 00:33:19.466003372 +0100
+++ gcc/tree-ssa-strlen.c	2019-02-25 22:13:27.165521677 +0100
@@ -1294,18 +1294,8 @@  handle_builtin_strlen (gimple_stmt_itera
 	  if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))
 	    rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
 
-	  /* Set for strnlen() calls with a non-constant bound.  */
-	  bool noncst_bound = false;
 	  if (bound)
-	    {
-	      tree new_rhs
-		= fold_build2_loc (loc, MIN_EXPR, TREE_TYPE (rhs), rhs, bound);
-
-	      noncst_bound = (TREE_CODE (new_rhs) != INTEGER_CST
-			      || tree_int_cst_lt (new_rhs, rhs));
-
-	      rhs = new_rhs;
-	    }
+	    rhs = fold_build2_loc (loc, MIN_EXPR, TREE_TYPE (rhs), rhs, bound);
 
 	  if (!update_call_from_tree (gsi, rhs))
 	    gimplify_and_update_call_from_tree (gsi, rhs);
@@ -1317,9 +1307,12 @@  handle_builtin_strlen (gimple_stmt_itera
 	      print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
 	    }
 
-	  /* Avoid storing the length for calls to strnlen() with
-	     a non-constant bound.  */
-	  if (noncst_bound)
+	  /* Don't record anything below for strnlen, it is to simplify
+	     length computation if si->nonzero_chars is complex, but
+	     the only case where we could prove strnlen return value is
+	     the string length is if si->nonzero_chars is constant and
+	     bound is too and si->nonzero_chars <= bound.  */
+	  if (bound)
 	    return;
 
 	  if (si != NULL
--- gcc/testsuite/gcc.dg/pr89500.c.jj	2019-02-25 22:14:46.468222667 +0100
+++ gcc/testsuite/gcc.dg/pr89500.c	2019-02-25 22:14:26.948542413 +0100
@@ -0,0 +1,17 @@ 
+/* PR tree-optimization/89500 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef __SIZE_TYPE__ size_t;
+extern size_t strlen (const char *);
+extern size_t strnlen (const char *, size_t);
+extern void bar (char *);
+
+void
+foo (int *a)
+{
+  char c[64];
+  bar (c);
+  a[0] = strlen (c);
+  a[1] = strnlen (c, 0);
+}