diff mbox series

[committed] Use structure to bubble up information about unterminated strings from c_strlen

Message ID 5bac7adf-fad3-bd4a-985c-7f9d30bcebbd@redhat.com
State New
Headers show
Series [committed] Use structure to bubble up information about unterminated strings from c_strlen | expand

Commit Message

Jeff Law Sept. 29, 2018, 4:06 p.m. UTC
This patch changes the NONSTR argument to c_strlen to instead be a
little data structure c_strlen can populate with nuggets of information
about the string.

There's clearly a need for the decl related to the non-string argument.
I see an immediate need for the length of a non-terminated string
(c_strlen returns NULL for non-terminated strings).  I also see a need
for the offset within the non-terminated strong as well.

We only populate the structure when c_strlen encounters a non-terminated
string.  One could argue we should always fill in the members.  Right
now I think filling it in for unterminated cases makes the most sense,
but I could be convinced otherwise.

I won't be surprised if subsequent warnings from Martin need additional
information about the string.  The idea here is we can add more elements
to the structure without continually adding arguments to c_strlen.

Bootstrapped in isolation as well as with Martin's patches for strnlen
and sprintf checking.  Installing on the trunk.

Jeff
* builtins.c (unterminated_array): Pass in c_strlen_data * to
            c_strlen rather than just a tree *.
            (c_strlen): Change NONSTR argument to a c_strlen_data pointer.
            Update recursive calls appropriately.  If caller did not provide a
            suitable data pointer, create a local one.  When a non-terminated
            string is discovered, bubble up information about the string via the
            c_strlen_data object.
            * builtins.h (c_strlen): Update prototype.
            (c_strlen_data): New structure.
            * gimple-fold.c (get_range_strlen): Update calls to c_strlen.
            For a type 2 call, if c_strlen indicates a non-terminated string
            use the length of the non-terminated string.
            (gimple_fold_builtin_stpcpy): Update calls to c_strlen.

Comments

Christophe Lyon Oct. 1, 2018, 9:46 p.m. UTC | #1
On Sat, 29 Sep 2018 at 18:06, Jeff Law <law@redhat.com> wrote:
>
>
> This patch changes the NONSTR argument to c_strlen to instead be a
> little data structure c_strlen can populate with nuggets of information
> about the string.
>
> There's clearly a need for the decl related to the non-string argument.
> I see an immediate need for the length of a non-terminated string
> (c_strlen returns NULL for non-terminated strings).  I also see a need
> for the offset within the non-terminated strong as well.
>
> We only populate the structure when c_strlen encounters a non-terminated
> string.  One could argue we should always fill in the members.  Right
> now I think filling it in for unterminated cases makes the most sense,
> but I could be convinced otherwise.
>
> I won't be surprised if subsequent warnings from Martin need additional
> information about the string.  The idea here is we can add more elements
> to the structure without continually adding arguments to c_strlen.
>
> Bootstrapped in isolation as well as with Martin's patches for strnlen
> and sprintf checking.  Installing on the trunk.
>

Hi Jeff,

+             /* If TYPE is asking for a maximum, then use any
+                length (including the length of an unterminated
+                string) for VAL.  */
+             if (type == 2)
+               val = data.len;

It seems this part is dead-code, since the case type==2 is handled in
the "then" part of the "if" (this code is in the "else" part).

Since you added a comment, I suspect you explicitly tested it, though?

Christophe

> Jeff
Jeff Law Oct. 1, 2018, 9:49 p.m. UTC | #2
On 10/1/18 3:46 PM, Christophe Lyon wrote:
> On Sat, 29 Sep 2018 at 18:06, Jeff Law <law@redhat.com> wrote:
>>
>>
>> This patch changes the NONSTR argument to c_strlen to instead be a
>> little data structure c_strlen can populate with nuggets of information
>> about the string.
>>
>> There's clearly a need for the decl related to the non-string argument.
>> I see an immediate need for the length of a non-terminated string
>> (c_strlen returns NULL for non-terminated strings).  I also see a need
>> for the offset within the non-terminated strong as well.
>>
>> We only populate the structure when c_strlen encounters a non-terminated
>> string.  One could argue we should always fill in the members.  Right
>> now I think filling it in for unterminated cases makes the most sense,
>> but I could be convinced otherwise.
>>
>> I won't be surprised if subsequent warnings from Martin need additional
>> information about the string.  The idea here is we can add more elements
>> to the structure without continually adding arguments to c_strlen.
>>
>> Bootstrapped in isolation as well as with Martin's patches for strnlen
>> and sprintf checking.  Installing on the trunk.
>>
> 
> Hi Jeff,
> 
> +             /* If TYPE is asking for a maximum, then use any
> +                length (including the length of an unterminated
> +                string) for VAL.  */
> +             if (type == 2)
> +               val = data.len;
> 
> It seems this part is dead-code, since the case type==2 is handled in
> the "then" part of the "if" (this code is in the "else" part).
> 
> Since you added a comment, I suspect you explicitly tested it, though?
Yea, I know that code got triggered at some point.  It may be dead now
after some cleanups, or it might have been needed by the patch I just
installed on the sprintf bits.  I'll double-check either way. (I'd seen
it in the coverity scans this morning as well).

jeff
Jeff Law Oct. 2, 2018, 2:11 p.m. UTC | #3
On 10/1/18 3:46 PM, Christophe Lyon wrote:
> On Sat, 29 Sep 2018 at 18:06, Jeff Law <law@redhat.com> wrote:
>>
>>
>> This patch changes the NONSTR argument to c_strlen to instead be a
>> little data structure c_strlen can populate with nuggets of information
>> about the string.
>>
>> There's clearly a need for the decl related to the non-string argument.
>> I see an immediate need for the length of a non-terminated string
>> (c_strlen returns NULL for non-terminated strings).  I also see a need
>> for the offset within the non-terminated strong as well.
>>
>> We only populate the structure when c_strlen encounters a non-terminated
>> string.  One could argue we should always fill in the members.  Right
>> now I think filling it in for unterminated cases makes the most sense,
>> but I could be convinced otherwise.
>>
>> I won't be surprised if subsequent warnings from Martin need additional
>> information about the string.  The idea here is we can add more elements
>> to the structure without continually adding arguments to c_strlen.
>>
>> Bootstrapped in isolation as well as with Martin's patches for strnlen
>> and sprintf checking.  Installing on the trunk.
>>
> 
> Hi Jeff,
> 
> +             /* If TYPE is asking for a maximum, then use any
> +                length (including the length of an unterminated
> +                string) for VAL.  */
> +             if (type == 2)
> +               val = data.len;
> 
> It seems this part is dead-code, since the case type==2 is handled in
> the "then" part of the "if" (this code is in the "else" part).
> 
> Since you added a comment, I suspect you explicitly tested it, though?
So it's not needed by patch #6 (strnlen), but patch #4 (not yet
committed) does need some work in this area -- but even so I'm not sure
what that's going to look like yet.

So I've pulled the dead code out and pushed that commit to the trunk
after the usual bootstrap and regression test cycle.

Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 40f87aaeae5..59a01f86436 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2018-10-02  Jeff Law  <law@redhat.com>
+
+	* gimple-fold.c (get_range_strlen): Remove dead code.
+
 2018-10-02  Martin Sebor  <msebor@redhat.com>
 	    Jeff Law  <law@redhat.com>
 
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index cf04c92180b..fa1fc60876c 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1345,14 +1345,7 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	  /* If we potentially had a non-terminated string, then
 	     bubble that information up to the caller.  */
 	  if (!val)
-	    {
-	      *nonstr = data.decl;
-	      /* If TYPE is asking for a maximum, then use any
-		 length (including the length of an unterminated
-		 string) for VAL.  */
-	      if (type == 2)
-		val = data.len;
-	    }
+	    *nonstr = data.decl;
 	}
 
       if (!val && fuzzy)
diff mbox series

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 7be6ceb3d62..23e0ec7b34d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,19 @@ 
+2018-09-29  Jeff Law  <law@redhat.com>
+
+	* builtins.c (unterminated_array): Pass in c_strlen_data * to
+	c_strlen rather than just a tree *.
+	(c_strlen): Change NONSTR argument to a c_strlen_data pointer.
+	Update recursive calls appropriately.  If caller did not provide a
+	suitable data pointer, create a local one.  When a non-terminated
+	string is discovered, bubble up information about the string via the
+	c_strlen_data object.
+	* builtins.h (c_strlen): Update prototype.
+	(c_strlen_data): New structure.
+	* gimple-fold.c (get_range_strlen): Update calls to c_strlen.
+	For a type 2 call, if c_strlen indicates a non-terminated string
+	use the length of the non-terminated string.
+	(gimple_fold_builtin_stpcpy): Update calls to c_strlen.
+
 2018-09-28  John David Anglin  <danglin@gcc.gnu.org>
 
 	* match.pd (simple_comparison): Don't optimize if either operand is
diff --git a/gcc/builtins.c b/gcc/builtins.c
index e655623febd..fe411efd9a9 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -570,9 +570,10 @@  warn_string_no_nul (location_t loc, const char *fn, tree arg, tree decl)
 tree
 unterminated_array (tree exp)
 {
-  tree nonstr = NULL;
-  c_strlen (exp, 1, &nonstr);
-  return nonstr;
+  c_strlen_data data;
+  memset (&data, 0, sizeof (c_strlen_data));
+  c_strlen (exp, 1, &data);
+  return data.decl;
 }
 
 /* Compute the length of a null-terminated character string or wide
@@ -592,10 +593,12 @@  unterminated_array (tree exp)
    accesses.  Note that this implies the result is not going to be emitted
    into the instruction stream.
 
-   If a not zero-terminated string value is encountered and NONSTR is
-   non-zero, the declaration of the string value is assigned to *NONSTR.
-   *NONSTR is accumulating, thus not cleared on success, therefore it has
-   to be initialized to NULL_TREE by the caller.
+   Additional information about the string accessed may be recorded
+   in DATA.  For example, if SRC references an unterminated string,
+   then the declaration will be stored in the DECL field.   If the
+   length of the unterminated string can be determined, it'll be
+   stored in the LEN field.  Note this length could well be different
+   than what a C strlen call would return.
 
    ELTSIZE is 1 for normal single byte character strings, and 2 or
    4 for wide characer strings.  ELTSIZE is by default 1.
@@ -603,8 +606,16 @@  unterminated_array (tree exp)
    The value returned is of type `ssizetype'.  */
 
 tree
-c_strlen (tree src, int only_value, tree *nonstr, unsigned eltsize)
+c_strlen (tree src, int only_value, c_strlen_data *data, unsigned eltsize)
 {
+  /* If we were not passed a DATA pointer, then get one to a local
+     structure.  That avoids having to check DATA for NULL before
+     each time we want to use it.  */
+  c_strlen_data local_strlen_data;
+  memset (&local_strlen_data, 0, sizeof (c_strlen_data));
+  if (!data)
+    data = &local_strlen_data;
+
   gcc_checking_assert (eltsize == 1 || eltsize == 2 || eltsize == 4);
   STRIP_NOPS (src);
   if (TREE_CODE (src) == COND_EXPR
@@ -612,15 +623,15 @@  c_strlen (tree src, int only_value, tree *nonstr, unsigned eltsize)
     {
       tree len1, len2;
 
-      len1 = c_strlen (TREE_OPERAND (src, 1), only_value, nonstr, eltsize);
-      len2 = c_strlen (TREE_OPERAND (src, 2), only_value, nonstr, eltsize);
+      len1 = c_strlen (TREE_OPERAND (src, 1), only_value, data, eltsize);
+      len2 = c_strlen (TREE_OPERAND (src, 2), only_value, data, eltsize);
       if (tree_int_cst_equal (len1, len2))
 	return len1;
     }
 
   if (TREE_CODE (src) == COMPOUND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
-    return c_strlen (TREE_OPERAND (src, 1), only_value, nonstr, eltsize);
+    return c_strlen (TREE_OPERAND (src, 1), only_value, data, eltsize);
 
   location_t loc = EXPR_LOC_OR_LOC (src, input_location);
 
@@ -666,13 +677,15 @@  c_strlen (tree src, int only_value, tree *nonstr, unsigned eltsize)
 	 start searching for it.  */
       unsigned len = string_length (ptr, eltsize, strelts);
 
-      /* Return when an embedded null character is found or none at all.  */
+      /* Return when an embedded null character is found or none at all.
+	 In the latter case, set the DECL/LEN field in the DATA structure
+	 so that callers may examine them.  */
       if (len + 1 < strelts)
 	return NULL_TREE;
       else if (len >= maxelts)
 	{
-	  if (nonstr && decl)
-	    *nonstr = decl;
+	  data->decl = decl;
+	  data->len = ssize_int (len);
 	  return NULL_TREE;
 	}
 
@@ -737,11 +750,12 @@  c_strlen (tree src, int only_value, tree *nonstr, unsigned eltsize)
 				strelts - eltoff);
 
   /* Don't know what to return if there was no zero termination.
-     Ideally this would turn into a gcc_checking_assert over time.  */
+     Ideally this would turn into a gcc_checking_assert over time.
+     Set DECL/LEN so callers can examine them.  */
   if (len >= maxelts - eltoff)
     {
-      if (nonstr && decl)
-	*nonstr = decl;
+      data->decl = decl;
+      data->len = ssize_int (len);
       return NULL_TREE;
     }
 
@@ -3965,13 +3979,14 @@  expand_builtin_stpcpy_1 (tree exp, rtx target, machine_mode mode)
 	 compile-time, not an expression containing a string.  This is
 	 because the latter will potentially produce pessimized code
 	 when used to produce the return value.  */
-      tree nonstr = NULL_TREE;
+      c_strlen_data data;
+      memset (&data, 0, sizeof (c_strlen_data));
       if (!c_getstr (src, NULL)
-	  || !(len = c_strlen (src, 0, &nonstr, 1)))
+	  || !(len = c_strlen (src, 0, &data, 1)))
 	return expand_movstr (dst, src, target, /*endp=*/2);
 
-      if (nonstr && !TREE_NO_WARNING (exp))
-	warn_string_no_nul (EXPR_LOCATION (exp), "stpcpy", src, nonstr);
+      if (data.decl && !TREE_NO_WARNING (exp))
+	warn_string_no_nul (EXPR_LOCATION (exp), "stpcpy", src, data.decl);
 
       lenp1 = size_binop_loc (loc, PLUS_EXPR, len, ssize_int (1));
       ret = expand_builtin_mempcpy_args (dst, src, lenp1,
@@ -8444,22 +8459,23 @@  fold_builtin_strlen (location_t loc, tree type, tree arg)
     return NULL_TREE;
   else
     {
-      tree nonstr = NULL_TREE;
-      tree len = c_strlen (arg, 0, &nonstr);
+      c_strlen_data data;
+      memset (&data, 0, sizeof (c_strlen_data));
+      tree len = c_strlen (arg, 0, &data);
 
       if (len)
 	return fold_convert_loc (loc, type, len);
 
-      if (!nonstr)
-	c_strlen (arg, 1, &nonstr);
+      if (!data.decl)
+	c_strlen (arg, 1, &data);
 
-      if (nonstr)
+      if (data.decl)
 	{
 	  if (EXPR_HAS_LOCATION (arg))
 	    loc = EXPR_LOCATION (arg);
 	  else if (loc == UNKNOWN_LOCATION)
 	    loc = input_location;
-	  warn_string_no_nul (loc, "strlen", arg, nonstr);
+	  warn_string_no_nul (loc, "strlen", arg, data.decl);
 	}
 
       return NULL_TREE;
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 45ad684cb52..3801251f372 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -57,7 +57,14 @@  extern bool get_pointer_alignment_1 (tree, unsigned int *,
 				     unsigned HOST_WIDE_INT *);
 extern unsigned int get_pointer_alignment (tree);
 extern unsigned string_length (const void*, unsigned, unsigned);
-extern tree c_strlen (tree, int, tree * = NULL, unsigned = 1);
+struct c_strlen_data
+{
+  tree decl;
+  tree len;
+  tree off;
+};
+
+extern tree c_strlen (tree, int, c_strlen_data * = NULL, unsigned = 1);
 extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
 extern void expand_builtin_update_setjmp_buf (rtx);
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 1e84722d22d..cf04c92180b 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1337,7 +1337,23 @@  get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	    return false;
 	}
       else
-	val = c_strlen (arg, 1, nonstr, eltsize);
+	{
+	  c_strlen_data data;
+	  memset (&data, 0, sizeof (c_strlen_data));
+	  val = c_strlen (arg, 1, &data, eltsize);
+
+	  /* If we potentially had a non-terminated string, then
+	     bubble that information up to the caller.  */
+	  if (!val)
+	    {
+	      *nonstr = data.decl;
+	      /* If TYPE is asking for a maximum, then use any
+		 length (including the length of an unterminated
+		 string) for VAL.  */
+	      if (type == 2)
+		val = data.len;
+	    }
+	}
 
       if (!val && fuzzy)
 	{
@@ -2812,21 +2828,22 @@  gimple_fold_builtin_stpcpy (gimple_stmt_iterator *gsi)
     }
 
   /* Set to non-null if ARG refers to an unterminated array.  */
-  tree nonstr = NULL;
-  tree len = c_strlen (src, 1, &nonstr, 1);
+  c_strlen_data data;
+  memset (&data, 0, sizeof (c_strlen_data));
+  tree len = c_strlen (src, 1, &data, 1);
   if (!len
       || TREE_CODE (len) != INTEGER_CST)
     {
-      nonstr = unterminated_array (src);
-      if (!nonstr)
+      data.decl = unterminated_array (src);
+      if (!data.decl)
 	return false;
     }
 
-  if (nonstr)
+  if (data.decl)
     {
       /* Avoid folding calls with unterminated arrays.  */
       if (!gimple_no_warning_p (stmt))
-	warn_string_no_nul (loc, "stpcpy", src, nonstr);
+	warn_string_no_nul (loc, "stpcpy", src, data.decl);
       gimple_set_no_warning (stmt, true);
       return false;
     }