diff mbox series

include size and offset in -Wstringop-overflow

Message ID 9d3210bf-e1bd-81ac-b1a0-5c66c8ec6db7@gmail.com
State New
Headers show
Series include size and offset in -Wstringop-overflow | expand

Commit Message

Martin Sebor Nov. 6, 2019, 6 p.m. UTC
The -Wstringop-overflow warnings for single-byte and multi-byte
stores mention the amount of data being stored and the amount of
space remaining in the destination, such as:

warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
   123 |   *p = 0;
       |   ~~~^~~
note: destination object declared here
    45 |   char b[N];
       |        ^

A warning like this can take some time to analyze.  First, the size
of the destination isn't mentioned and may not be easy to tell from
the sources.  In the note above, when N's value is the result of
some non-trivial computation, chasing it down may be a small project
in and of itself.  Second, it's also not clear why the region size
is zero.  It could be because the offset is exactly N, or because
it's negative, or because it's in some range greater than N.

Mentioning both the size of the destination object and the offset
makes the existing messages clearer, are will become essential when
GCC starts diagnosing overflow into allocated buffers (as my
follow-on patch does).

The attached patch enhances -Wstringop-overflow to do this by
letting compute_objsize return the offset to its caller, doing
something similar in get_stridx, and adding a new function to
the strlen pass to issue this enhanced warning (eventually, I'd
like the function to replace the -Wstringop-overflow handler in
builtins.c).  With the change, the note above might read something
like:

note: at offset 11 to object ‘b’ with size 8 declared here
    45 |   char b[N];
       |        ^

Tested on x86_64-linux.

Martin

Comments

Jeff Law Nov. 6, 2019, 6:55 p.m. UTC | #1
On 11/6/19 11:00 AM, Martin Sebor wrote:
> The -Wstringop-overflow warnings for single-byte and multi-byte
> stores mention the amount of data being stored and the amount of
> space remaining in the destination, such as:
> 
> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>   123 |   *p = 0;
>       |   ~~~^~~
> note: destination object declared here
>    45 |   char b[N];
>       |        ^
> 
> A warning like this can take some time to analyze.  First, the size
> of the destination isn't mentioned and may not be easy to tell from
> the sources.  In the note above, when N's value is the result of
> some non-trivial computation, chasing it down may be a small project
> in and of itself.  Second, it's also not clear why the region size
> is zero.  It could be because the offset is exactly N, or because
> it's negative, or because it's in some range greater than N.
> 
> Mentioning both the size of the destination object and the offset
> makes the existing messages clearer, are will become essential when
> GCC starts diagnosing overflow into allocated buffers (as my
> follow-on patch does).
> 
> The attached patch enhances -Wstringop-overflow to do this by
> letting compute_objsize return the offset to its caller, doing
> something similar in get_stridx, and adding a new function to
> the strlen pass to issue this enhanced warning (eventually, I'd
> like the function to replace the -Wstringop-overflow handler in
> builtins.c).  With the change, the note above might read something
> like:
> 
> note: at offset 11 to object ‘b’ with size 8 declared here
>    45 |   char b[N];
>       |        ^
> 
> Tested on x86_64-linux.
> 
> Martin
> 
> gcc-store-offset.diff
> 
> gcc/ChangeLog:
> 
> 	* builtins.c (compute_objsize): Add an argument and set it to offset
> 	into destination.
> 	* builtins.h (compute_objsize): Add an argument.
> 	* tree-object-size.c (addr_object_size): Add an argument and set it
> 	to offset into destination.
> 	(compute_builtin_object_size): Same.
> 	* tree-object-size.h (compute_builtin_object_size): Add an argument.
> 	* tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
> 	to offset into destination.
> 	(maybe_warn_overflow): New function.
> 	(handle_store): Call maybe_warn_overflow to issue warnings.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* c-c++-common/Wstringop-overflow-2.c: Adjust text of expected messages.
> 	* g++.dg/warn/Wstringop-overflow-3.C: Same.
> 	* gcc.dg/Wstringop-overflow-17.c: Same.
> 

> Index: gcc/tree-ssa-strlen.c
> ===================================================================
> --- gcc/tree-ssa-strlen.c	(revision 277886)
> +++ gcc/tree-ssa-strlen.c	(working copy)
> @@ -189,6 +189,52 @@ struct laststmt_struct
>  static int get_stridx_plus_constant (strinfo *, unsigned HOST_WIDE_INT, tree);
>  static void handle_builtin_stxncpy (built_in_function, gimple_stmt_iterator *);
>  
> +/* Sets MINMAX to either the constant value or the range VAL is in
> +   and returns true on success.  */
> +
> +static bool
> +get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
> +{
> +  if (tree_fits_uhwi_p (val))
> +    {
> +      minmax[0] = minmax[1] = wi::to_wide (val);
> +      return true;
> +    }
> +
> +  if (TREE_CODE (val) != SSA_NAME)
> +    return false;
> +
> +  if (rvals)
> +    {
> +      gimple *def = SSA_NAME_DEF_STMT (val);
> +      if (gimple_assign_single_p (def)
> +	  && gimple_assign_rhs_code (def) == INTEGER_CST)
> +	{
> +	  /* get_value_range returns [0, N] for constant assignments.  */
> +	  val = gimple_assign_rhs1 (def);
> +	  minmax[0] = minmax[1] = wi::to_wide (val);
> +	  return true;
> +	}
Umm, something seems really off with this hunk.  If the SSA_NAME is set
via a simple constant assignment, then the range ought to be a singleton
ie [CONST,CONST].   Is there are particular test were this is not true?

The only way offhand I could see this happening is if originally the RHS
wasn't a constant, but due to optimizations it either simplified into a
constant or a constant was propagated into an SSA_NAME appearing on the
RHS.  This would have to happen between the last range analysis and the
point where you're making this query.




> +
> +  // FIXME: handle anti-ranges?
> +  return false;
Please don't if we can avoid them.  anti-ranges are on the chopping
block, so I'd prefer not to add cases where we're trying to handle them
if we can reasonably avoid it.


No objections elsewhere.  So I think we just need to figure out what's
going on with the ranges when you've got an INTEGER_CST on the RHS of an
assignment.

jeff
Martin Sebor Nov. 6, 2019, 8:27 p.m. UTC | #2
On 11/6/19 11:55 AM, Jeff Law wrote:
> On 11/6/19 11:00 AM, Martin Sebor wrote:
>> The -Wstringop-overflow warnings for single-byte and multi-byte
>> stores mention the amount of data being stored and the amount of
>> space remaining in the destination, such as:
>>
>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>    123 |   *p = 0;
>>        |   ~~~^~~
>> note: destination object declared here
>>     45 |   char b[N];
>>        |        ^
>>
>> A warning like this can take some time to analyze.  First, the size
>> of the destination isn't mentioned and may not be easy to tell from
>> the sources.  In the note above, when N's value is the result of
>> some non-trivial computation, chasing it down may be a small project
>> in and of itself.  Second, it's also not clear why the region size
>> is zero.  It could be because the offset is exactly N, or because
>> it's negative, or because it's in some range greater than N.
>>
>> Mentioning both the size of the destination object and the offset
>> makes the existing messages clearer, are will become essential when
>> GCC starts diagnosing overflow into allocated buffers (as my
>> follow-on patch does).
>>
>> The attached patch enhances -Wstringop-overflow to do this by
>> letting compute_objsize return the offset to its caller, doing
>> something similar in get_stridx, and adding a new function to
>> the strlen pass to issue this enhanced warning (eventually, I'd
>> like the function to replace the -Wstringop-overflow handler in
>> builtins.c).  With the change, the note above might read something
>> like:
>>
>> note: at offset 11 to object ‘b’ with size 8 declared here
>>     45 |   char b[N];
>>        |        ^
>>
>> Tested on x86_64-linux.
>>
>> Martin
>>
>> gcc-store-offset.diff
>>
>> gcc/ChangeLog:
>>
>> 	* builtins.c (compute_objsize): Add an argument and set it to offset
>> 	into destination.
>> 	* builtins.h (compute_objsize): Add an argument.
>> 	* tree-object-size.c (addr_object_size): Add an argument and set it
>> 	to offset into destination.
>> 	(compute_builtin_object_size): Same.
>> 	* tree-object-size.h (compute_builtin_object_size): Add an argument.
>> 	* tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
>> 	to offset into destination.
>> 	(maybe_warn_overflow): New function.
>> 	(handle_store): Call maybe_warn_overflow to issue warnings.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* c-c++-common/Wstringop-overflow-2.c: Adjust text of expected messages.
>> 	* g++.dg/warn/Wstringop-overflow-3.C: Same.
>> 	* gcc.dg/Wstringop-overflow-17.c: Same.
>>
> 
>> Index: gcc/tree-ssa-strlen.c
>> ===================================================================
>> --- gcc/tree-ssa-strlen.c	(revision 277886)
>> +++ gcc/tree-ssa-strlen.c	(working copy)
>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>   static int get_stridx_plus_constant (strinfo *, unsigned HOST_WIDE_INT, tree);
>>   static void handle_builtin_stxncpy (built_in_function, gimple_stmt_iterator *);
>>   
>> +/* Sets MINMAX to either the constant value or the range VAL is in
>> +   and returns true on success.  */
>> +
>> +static bool
>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
>> +{
>> +  if (tree_fits_uhwi_p (val))
>> +    {
>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>> +      return true;
>> +    }
>> +
>> +  if (TREE_CODE (val) != SSA_NAME)
>> +    return false;
>> +
>> +  if (rvals)
>> +    {
>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>> +      if (gimple_assign_single_p (def)
>> +	  && gimple_assign_rhs_code (def) == INTEGER_CST)
>> +	{
>> +	  /* get_value_range returns [0, N] for constant assignments.  */
>> +	  val = gimple_assign_rhs1 (def);
>> +	  minmax[0] = minmax[1] = wi::to_wide (val);
>> +	  return true;
>> +	}
> Umm, something seems really off with this hunk.  If the SSA_NAME is set
> via a simple constant assignment, then the range ought to be a singleton
> ie [CONST,CONST].   Is there are particular test were this is not true?
> 
> The only way offhand I could see this happening is if originally the RHS
> wasn't a constant, but due to optimizations it either simplified into a
> constant or a constant was propagated into an SSA_NAME appearing on the
> RHS.  This would have to happen between the last range analysis and the
> point where you're making this query.

Yes, I think that's right.  Here's an example where it happens:

   void f (void)
   {
     char s[] = "1234";
     unsigned n = strlen (s);
     char vla[n];   // or malloc (n)
     vla[n] = 0;    // n = [4, 4]
     ...
   }

The strlen call is folded to 4 but that's not propagated to
the access until sometime after the strlen pass is done.

>> +  // FIXME: handle anti-ranges?
>> +  return false;
> Please don't if we can avoid them.  anti-ranges are on the chopping
> block, so I'd prefer not to add cases where we're trying to handle them
> if we can reasonably avoid it.

It's mostly a reminder that there may be room for improvement
here.  Maybe not for ranges of sizes but possibly for ranges
of offsets (e.g., if an offset's range is the union of
[-4, -1] and [7, 9] and the destination array is 4 byes big
the access is invalid).

I've been thinking about how to handle multiple ranges when
the new range info makes them available.  I'm not sure I see
how it will be possible to retrofit the existing code to make
use of them.  It seems that even code that tries to put anti-
ranges to use today will need to change.  It will be a fun
exercise.

Martin

> 
> 
> No objections elsewhere.  So I think we just need to figure out what's
> going on with the ranges when you've got an INTEGER_CST on the RHS of an
> assignment.
> 
> jeff
>
Jeff Law Nov. 6, 2019, 8:39 p.m. UTC | #3
On 11/6/19 1:27 PM, Martin Sebor wrote:
> On 11/6/19 11:55 AM, Jeff Law wrote:
>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>> stores mention the amount of data being stored and the amount of
>>> space remaining in the destination, such as:
>>>
>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>>    123 |   *p = 0;
>>>        |   ~~~^~~
>>> note: destination object declared here
>>>     45 |   char b[N];
>>>        |        ^
>>>
>>> A warning like this can take some time to analyze.  First, the size
>>> of the destination isn't mentioned and may not be easy to tell from
>>> the sources.  In the note above, when N's value is the result of
>>> some non-trivial computation, chasing it down may be a small project
>>> in and of itself.  Second, it's also not clear why the region size
>>> is zero.  It could be because the offset is exactly N, or because
>>> it's negative, or because it's in some range greater than N.
>>>
>>> Mentioning both the size of the destination object and the offset
>>> makes the existing messages clearer, are will become essential when
>>> GCC starts diagnosing overflow into allocated buffers (as my
>>> follow-on patch does).
>>>
>>> The attached patch enhances -Wstringop-overflow to do this by
>>> letting compute_objsize return the offset to its caller, doing
>>> something similar in get_stridx, and adding a new function to
>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>> like the function to replace the -Wstringop-overflow handler in
>>> builtins.c).  With the change, the note above might read something
>>> like:
>>>
>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>     45 |   char b[N];
>>>        |        ^
>>>
>>> Tested on x86_64-linux.
>>>
>>> Martin
>>>
>>> gcc-store-offset.diff
>>>
>>> gcc/ChangeLog:
>>>
>>>     * builtins.c (compute_objsize): Add an argument and set it to offset
>>>     into destination.
>>>     * builtins.h (compute_objsize): Add an argument.
>>>     * tree-object-size.c (addr_object_size): Add an argument and set it
>>>     to offset into destination.
>>>     (compute_builtin_object_size): Same.
>>>     * tree-object-size.h (compute_builtin_object_size): Add an argument.
>>>     * tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
>>>     to offset into destination.
>>>     (maybe_warn_overflow): New function.
>>>     (handle_store): Call maybe_warn_overflow to issue warnings.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>     * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>> messages.
>>>     * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>     * gcc.dg/Wstringop-overflow-17.c: Same.
>>>
>>
>>> Index: gcc/tree-ssa-strlen.c
>>> ===================================================================
>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>   static int get_stridx_plus_constant (strinfo *, unsigned
>>> HOST_WIDE_INT, tree);
>>>   static void handle_builtin_stxncpy (built_in_function,
>>> gimple_stmt_iterator *);
>>>   +/* Sets MINMAX to either the constant value or the range VAL is in
>>> +   and returns true on success.  */
>>> +
>>> +static bool
>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
>>> +{
>>> +  if (tree_fits_uhwi_p (val))
>>> +    {
>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>> +      return true;
>>> +    }
>>> +
>>> +  if (TREE_CODE (val) != SSA_NAME)
>>> +    return false;
>>> +
>>> +  if (rvals)
>>> +    {
>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>> +      if (gimple_assign_single_p (def)
>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>> +    {
>>> +      /* get_value_range returns [0, N] for constant assignments.  */
>>> +      val = gimple_assign_rhs1 (def);
>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>> +      return true;
>>> +    }
>> Umm, something seems really off with this hunk.  If the SSA_NAME is set
>> via a simple constant assignment, then the range ought to be a singleton
>> ie [CONST,CONST].   Is there are particular test were this is not true?
>>
>> The only way offhand I could see this happening is if originally the RHS
>> wasn't a constant, but due to optimizations it either simplified into a
>> constant or a constant was propagated into an SSA_NAME appearing on the
>> RHS.  This would have to happen between the last range analysis and the
>> point where you're making this query.
> 
> Yes, I think that's right.  Here's an example where it happens:
> 
>   void f (void)
>   {
>     char s[] = "1234";
>     unsigned n = strlen (s);
>     char vla[n];   // or malloc (n)
>     vla[n] = 0;    // n = [4, 4]
>     ...
>   }
> 
> The strlen call is folded to 4 but that's not propagated to
> the access until sometime after the strlen pass is done.
Hmm.  Are we calling set_range_info in that case?  That goes behind the
back of pass instance of vr_values.  If so, that might argue we want to
be setting it in vr_values too.

> 
> It's mostly a reminder that there may be room for improvement
> here.  Maybe not for ranges of sizes but possibly for ranges
> of offsets (e.g., if an offset's range is the union of
> [-4, -1] and [7, 9] and the destination array is 4 byes big
> the access is invalid).
I'd be surprised if this happens in practice, but maybe I'm missing
something.

> 
> I've been thinking about how to handle multiple ranges when
> the new range info makes them available.  I'm not sure I see
> how it will be possible to retrofit the existing code to make
> use of them.  It seems that even code that tries to put anti-
> ranges to use today will need to change.  It will be a fun
> exercise.
It's certainly going to be interesting.  As you may have heard in the
meeting yesterday, Aldy and Andrew are looking at expanding how many
subranges are in the representation because we're losing data with the
current representation.

Jeff
Martin Sebor Nov. 6, 2019, 9:06 p.m. UTC | #4
On 11/6/19 1:39 PM, Jeff Law wrote:
> On 11/6/19 1:27 PM, Martin Sebor wrote:
>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>> stores mention the amount of data being stored and the amount of
>>>> space remaining in the destination, such as:
>>>>
>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>>>     123 |   *p = 0;
>>>>         |   ~~~^~~
>>>> note: destination object declared here
>>>>      45 |   char b[N];
>>>>         |        ^
>>>>
>>>> A warning like this can take some time to analyze.  First, the size
>>>> of the destination isn't mentioned and may not be easy to tell from
>>>> the sources.  In the note above, when N's value is the result of
>>>> some non-trivial computation, chasing it down may be a small project
>>>> in and of itself.  Second, it's also not clear why the region size
>>>> is zero.  It could be because the offset is exactly N, or because
>>>> it's negative, or because it's in some range greater than N.
>>>>
>>>> Mentioning both the size of the destination object and the offset
>>>> makes the existing messages clearer, are will become essential when
>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>> follow-on patch does).
>>>>
>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>> letting compute_objsize return the offset to its caller, doing
>>>> something similar in get_stridx, and adding a new function to
>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>> like the function to replace the -Wstringop-overflow handler in
>>>> builtins.c).  With the change, the note above might read something
>>>> like:
>>>>
>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>      45 |   char b[N];
>>>>         |        ^
>>>>
>>>> Tested on x86_64-linux.
>>>>
>>>> Martin
>>>>
>>>> gcc-store-offset.diff
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>      * builtins.c (compute_objsize): Add an argument and set it to offset
>>>>      into destination.
>>>>      * builtins.h (compute_objsize): Add an argument.
>>>>      * tree-object-size.c (addr_object_size): Add an argument and set it
>>>>      to offset into destination.
>>>>      (compute_builtin_object_size): Same.
>>>>      * tree-object-size.h (compute_builtin_object_size): Add an argument.
>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
>>>>      to offset into destination.
>>>>      (maybe_warn_overflow): New function.
>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>>> messages.
>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>
>>>
>>>> Index: gcc/tree-ssa-strlen.c
>>>> ===================================================================
>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
>>>> HOST_WIDE_INT, tree);
>>>>    static void handle_builtin_stxncpy (built_in_function,
>>>> gimple_stmt_iterator *);
>>>>    +/* Sets MINMAX to either the constant value or the range VAL is in
>>>> +   and returns true on success.  */
>>>> +
>>>> +static bool
>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
>>>> +{
>>>> +  if (tree_fits_uhwi_p (val))
>>>> +    {
>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>> +      return true;
>>>> +    }
>>>> +
>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>> +    return false;
>>>> +
>>>> +  if (rvals)
>>>> +    {
>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>> +      if (gimple_assign_single_p (def)
>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>> +    {
>>>> +      /* get_value_range returns [0, N] for constant assignments.  */
>>>> +      val = gimple_assign_rhs1 (def);
>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>> +      return true;
>>>> +    }
>>> Umm, something seems really off with this hunk.  If the SSA_NAME is set
>>> via a simple constant assignment, then the range ought to be a singleton
>>> ie [CONST,CONST].   Is there are particular test were this is not true?
>>>
>>> The only way offhand I could see this happening is if originally the RHS
>>> wasn't a constant, but due to optimizations it either simplified into a
>>> constant or a constant was propagated into an SSA_NAME appearing on the
>>> RHS.  This would have to happen between the last range analysis and the
>>> point where you're making this query.
>>
>> Yes, I think that's right.  Here's an example where it happens:
>>
>>    void f (void)
>>    {
>>      char s[] = "1234";
>>      unsigned n = strlen (s);
>>      char vla[n];   // or malloc (n)
>>      vla[n] = 0;    // n = [4, 4]
>>      ...
>>    }
>>
>> The strlen call is folded to 4 but that's not propagated to
>> the access until sometime after the strlen pass is done.
> Hmm.  Are we calling set_range_info in that case?  That goes behind the
> back of pass instance of vr_values.  If so, that might argue we want to
> be setting it in vr_values too.

No, set_range_info is only called for ranges.  In this case,
handle_builtin_strlen replaces the strlen() call with 4:

   s = "1234";
   _1 = __builtin_strlen (&s);
   n_2 = (unsigned int) _1;
   a.1_8 = __builtin_alloca_with_align (_1, 8);
   (*a.1_8)[n_2] = 0;

When the access is made, the __builtin_alloca_with_align call
is found as the destination and the _1 SSA_NAME is used to
get its size.  We get back the range [4, 4].

This is only done when we record the allocation statements;
this patch doesn't do that yet.  It's meant be just the simple
infrastructure bits for the follow-up work.

>> It's mostly a reminder that there may be room for improvement
>> here.  Maybe not for ranges of sizes but possibly for ranges
>> of offsets (e.g., if an offset's range is the union of
>> [-4, -1] and [7, 9] and the destination array is 4 byes big
>> the access is invalid).
> I'd be surprised if this happens in practice, but maybe I'm missing
> something.
> 
>>
>> I've been thinking about how to handle multiple ranges when
>> the new range info makes them available.  I'm not sure I see
>> how it will be possible to retrofit the existing code to make
>> use of them.  It seems that even code that tries to put anti-
>> ranges to use today will need to change.  It will be a fun
>> exercise.
> It's certainly going to be interesting.  As you may have heard in the
> meeting yesterday, Aldy and Andrew are looking at expanding how many
> subranges are in the representation because we're losing data with the
> current representation.

To more than three?  Fun!

Martin
Martin Sebor Nov. 6, 2019, 10:34 p.m. UTC | #5
On 11/6/19 2:06 PM, Martin Sebor wrote:
> On 11/6/19 1:39 PM, Jeff Law wrote:
>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>> stores mention the amount of data being stored and the amount of
>>>>> space remaining in the destination, such as:
>>>>>
>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=] 
>>>>>
>>>>>     123 |   *p = 0;
>>>>>         |   ~~~^~~
>>>>> note: destination object declared here
>>>>>      45 |   char b[N];
>>>>>         |        ^
>>>>>
>>>>> A warning like this can take some time to analyze.  First, the size
>>>>> of the destination isn't mentioned and may not be easy to tell from
>>>>> the sources.  In the note above, when N's value is the result of
>>>>> some non-trivial computation, chasing it down may be a small project
>>>>> in and of itself.  Second, it's also not clear why the region size
>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>> it's negative, or because it's in some range greater than N.
>>>>>
>>>>> Mentioning both the size of the destination object and the offset
>>>>> makes the existing messages clearer, are will become essential when
>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>> follow-on patch does).
>>>>>
>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>> letting compute_objsize return the offset to its caller, doing
>>>>> something similar in get_stridx, and adding a new function to
>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>> builtins.c).  With the change, the note above might read something
>>>>> like:
>>>>>
>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>      45 |   char b[N];
>>>>>         |        ^
>>>>>
>>>>> Tested on x86_64-linux.
>>>>>
>>>>> Martin
>>>>>
>>>>> gcc-store-offset.diff
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>      * builtins.c (compute_objsize): Add an argument and set it to 
>>>>> offset
>>>>>      into destination.
>>>>>      * builtins.h (compute_objsize): Add an argument.
>>>>>      * tree-object-size.c (addr_object_size): Add an argument and 
>>>>> set it
>>>>>      to offset into destination.
>>>>>      (compute_builtin_object_size): Same.
>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an 
>>>>> argument.
>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
>>>>>      to offset into destination.
>>>>>      (maybe_warn_overflow): New function.
>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>>>> messages.
>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>
>>>>
>>>>> Index: gcc/tree-ssa-strlen.c
>>>>> ===================================================================
>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
>>>>> HOST_WIDE_INT, tree);
>>>>>    static void handle_builtin_stxncpy (built_in_function,
>>>>> gimple_stmt_iterator *);
>>>>>    +/* Sets MINMAX to either the constant value or the range VAL is in
>>>>> +   and returns true on success.  */
>>>>> +
>>>>> +static bool
>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals = 
>>>>> NULL)
>>>>> +{
>>>>> +  if (tree_fits_uhwi_p (val))
>>>>> +    {
>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>> +      return true;
>>>>> +    }
>>>>> +
>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>> +    return false;
>>>>> +
>>>>> +  if (rvals)
>>>>> +    {
>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>> +      if (gimple_assign_single_p (def)
>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>> +    {
>>>>> +      /* get_value_range returns [0, N] for constant assignments.  */
>>>>> +      val = gimple_assign_rhs1 (def);
>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>> +      return true;
>>>>> +    }
>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is set
>>>> via a simple constant assignment, then the range ought to be a 
>>>> singleton
>>>> ie [CONST,CONST].   Is there are particular test were this is not true?
>>>>
>>>> The only way offhand I could see this happening is if originally the 
>>>> RHS
>>>> wasn't a constant, but due to optimizations it either simplified into a
>>>> constant or a constant was propagated into an SSA_NAME appearing on the
>>>> RHS.  This would have to happen between the last range analysis and the
>>>> point where you're making this query.
>>>
>>> Yes, I think that's right.  Here's an example where it happens:
>>>
>>>    void f (void)
>>>    {
>>>      char s[] = "1234";
>>>      unsigned n = strlen (s);
>>>      char vla[n];   // or malloc (n)
>>>      vla[n] = 0;    // n = [4, 4]
>>>      ...
>>>    }
>>>
>>> The strlen call is folded to 4 but that's not propagated to
>>> the access until sometime after the strlen pass is done.
>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
>> back of pass instance of vr_values.  If so, that might argue we want to
>> be setting it in vr_values too.
> 
> No, set_range_info is only called for ranges.  In this case,
> handle_builtin_strlen replaces the strlen() call with 4:
> 
>    s = "1234";
>    _1 = __builtin_strlen (&s);
>    n_2 = (unsigned int) _1;
>    a.1_8 = __builtin_alloca_with_align (_1, 8);
>    (*a.1_8)[n_2] = 0;
> 
> When the access is made, the __builtin_alloca_with_align call
> is found as the destination and the _1 SSA_NAME is used to
> get its size.  We get back the range [4, 4].

By the way, I glossed over one detail.  The above doesn't work
exactly as is because the allocation size is the SSA_NAME _1
(with the range [4, 4]) but the index is the SSA_NAME n_2 (with
the range [0, 4]; the range is [0, 4] because it was set based
on the size of the argument to the strlen() call well before
the strlen pass even ran).

To make it work across assignments we need to propagate the strlen
results down the CFG somehow.  I'm hoping the on-demand VRP will
do this automagically.

Martin

> 
> This is only done when we record the allocation statements;
> this patch doesn't do that yet.  It's meant be just the simple
> infrastructure bits for the follow-up work.
> 
>>> It's mostly a reminder that there may be room for improvement
>>> here.  Maybe not for ranges of sizes but possibly for ranges
>>> of offsets (e.g., if an offset's range is the union of
>>> [-4, -1] and [7, 9] and the destination array is 4 byes big
>>> the access is invalid).
>> I'd be surprised if this happens in practice, but maybe I'm missing
>> something.
>>
>>>
>>> I've been thinking about how to handle multiple ranges when
>>> the new range info makes them available.  I'm not sure I see
>>> how it will be possible to retrofit the existing code to make
>>> use of them.  It seems that even code that tries to put anti-
>>> ranges to use today will need to change.  It will be a fun
>>> exercise.
>> It's certainly going to be interesting.  As you may have heard in the
>> meeting yesterday, Aldy and Andrew are looking at expanding how many
>> subranges are in the representation because we're losing data with the
>> current representation.
> 
> To more than three?  Fun!
> 
> Martin
Martin Sebor Nov. 8, 2019, 4:57 p.m. UTC | #6
On 11/6/19 2:06 PM, Martin Sebor wrote:
> On 11/6/19 1:39 PM, Jeff Law wrote:
>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>> stores mention the amount of data being stored and the amount of
>>>>> space remaining in the destination, such as:
>>>>>
>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=] 
>>>>>
>>>>>     123 |   *p = 0;
>>>>>         |   ~~~^~~
>>>>> note: destination object declared here
>>>>>      45 |   char b[N];
>>>>>         |        ^
>>>>>
>>>>> A warning like this can take some time to analyze.  First, the size
>>>>> of the destination isn't mentioned and may not be easy to tell from
>>>>> the sources.  In the note above, when N's value is the result of
>>>>> some non-trivial computation, chasing it down may be a small project
>>>>> in and of itself.  Second, it's also not clear why the region size
>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>> it's negative, or because it's in some range greater than N.
>>>>>
>>>>> Mentioning both the size of the destination object and the offset
>>>>> makes the existing messages clearer, are will become essential when
>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>> follow-on patch does).
>>>>>
>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>> letting compute_objsize return the offset to its caller, doing
>>>>> something similar in get_stridx, and adding a new function to
>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>> builtins.c).  With the change, the note above might read something
>>>>> like:
>>>>>
>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>      45 |   char b[N];
>>>>>         |        ^
>>>>>
>>>>> Tested on x86_64-linux.
>>>>>
>>>>> Martin
>>>>>
>>>>> gcc-store-offset.diff
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>      * builtins.c (compute_objsize): Add an argument and set it to 
>>>>> offset
>>>>>      into destination.
>>>>>      * builtins.h (compute_objsize): Add an argument.
>>>>>      * tree-object-size.c (addr_object_size): Add an argument and 
>>>>> set it
>>>>>      to offset into destination.
>>>>>      (compute_builtin_object_size): Same.
>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an 
>>>>> argument.
>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
>>>>>      to offset into destination.
>>>>>      (maybe_warn_overflow): New function.
>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>>>> messages.
>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>
>>>>
>>>>> Index: gcc/tree-ssa-strlen.c
>>>>> ===================================================================
>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
>>>>> HOST_WIDE_INT, tree);
>>>>>    static void handle_builtin_stxncpy (built_in_function,
>>>>> gimple_stmt_iterator *);
>>>>>    +/* Sets MINMAX to either the constant value or the range VAL is in
>>>>> +   and returns true on success.  */
>>>>> +
>>>>> +static bool
>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals = 
>>>>> NULL)
>>>>> +{
>>>>> +  if (tree_fits_uhwi_p (val))
>>>>> +    {
>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>> +      return true;
>>>>> +    }
>>>>> +
>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>> +    return false;
>>>>> +
>>>>> +  if (rvals)
>>>>> +    {
>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>> +      if (gimple_assign_single_p (def)
>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>> +    {
>>>>> +      /* get_value_range returns [0, N] for constant assignments.  */
>>>>> +      val = gimple_assign_rhs1 (def);
>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>> +      return true;
>>>>> +    }
>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is set
>>>> via a simple constant assignment, then the range ought to be a 
>>>> singleton
>>>> ie [CONST,CONST].   Is there are particular test were this is not true?
>>>>
>>>> The only way offhand I could see this happening is if originally the 
>>>> RHS
>>>> wasn't a constant, but due to optimizations it either simplified into a
>>>> constant or a constant was propagated into an SSA_NAME appearing on the
>>>> RHS.  This would have to happen between the last range analysis and the
>>>> point where you're making this query.
>>>
>>> Yes, I think that's right.  Here's an example where it happens:
>>>
>>>    void f (void)
>>>    {
>>>      char s[] = "1234";
>>>      unsigned n = strlen (s);
>>>      char vla[n];   // or malloc (n)
>>>      vla[n] = 0;    // n = [4, 4]
>>>      ...
>>>    }
>>>
>>> The strlen call is folded to 4 but that's not propagated to
>>> the access until sometime after the strlen pass is done.
>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
>> back of pass instance of vr_values.  If so, that might argue we want to
>> be setting it in vr_values too.
> 
> No, set_range_info is only called for ranges.  In this case,
> handle_builtin_strlen replaces the strlen() call with 4:
> 
>    s = "1234";
>    _1 = __builtin_strlen (&s);
>    n_2 = (unsigned int) _1;
>    a.1_8 = __builtin_alloca_with_align (_1, 8);
>    (*a.1_8)[n_2] = 0;
> 
> When the access is made, the __builtin_alloca_with_align call
> is found as the destination and the _1 SSA_NAME is used to
> get its size.  We get back the range [4, 4].
> 
> This is only done when we record the allocation statements;
> this patch doesn't do that yet.  It's meant be just the simple
> infrastructure bits for the follow-up work.

Did I answer your question or is there something else?

Martin
Bernhard Reutner-Fischer Nov. 11, 2019, 11:30 a.m. UTC | #7
On 8 November 2019 17:57:51 CET, Martin Sebor <msebor@gmail.com> wrote:
>On 11/6/19 2:06 PM, Martin Sebor wrote:
>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>> stores mention the amount of data being stored and the amount of
>>>>>> space remaining in the destination, such as:
>>>>>>
>>>>>>
>warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>
>>>>>>
>>>>>>     123 |   *p = 0;
>>>>>>         |   ~~~^~~
>>>>>> note: destination object declared here
>>>>>>      45 |   char b[N];
>>>>>>         |        ^
>>>>>>
>>>>>>
>A warning like this can take some time to analyze.  First, the size
>>>>>>
>of the destination isn't mentioned and may not be easy to tell from
>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>>
>some non-trivial computation, chasing it down may be a small project
>>>>>>
>in and of itself.  Second, it's also not clear why the region size
>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>
>>>>>> Mentioning both the size of the destination object and the offset
>>>>>>
>makes the existing messages clearer, are will become essential when
>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>> follow-on patch does).
>>>>>>
>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>> something similar in get_stridx, and adding a new function to
>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>
>builtins.c).  With the change, the note above might read something
>>>>>> like:
>>>>>>
>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>      45 |   char b[N];
>>>>>>         |        ^
>>>>>>

Is "to object" correct? Into? I somehow fund it hard to read as proposed.

thanks,
Martin Sebor Nov. 11, 2019, 3:43 p.m. UTC | #8
On 11/11/19 4:30 AM, Bernhard Reutner-Fischer wrote:
> On 8 November 2019 17:57:51 CET, Martin Sebor <msebor@gmail.com> wrote:
>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>> stores mention the amount of data being stored and the amount of
>>>>>>> space remaining in the destination, such as:
>>>>>>>
>>>>>>>
>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>
>>>>>>>
>>>>>>>      123 |   *p = 0;
>>>>>>>          |   ~~~^~~
>>>>>>> note: destination object declared here
>>>>>>>       45 |   char b[N];
>>>>>>>          |        ^
>>>>>>>
>>>>>>>
>> A warning like this can take some time to analyze.  First, the size
>>>>>>>
>> of the destination isn't mentioned and may not be easy to tell from
>>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>>>
>> some non-trivial computation, chasing it down may be a small project
>>>>>>>
>> in and of itself.  Second, it's also not clear why the region size
>>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>
>>>>>>> Mentioning both the size of the destination object and the offset
>>>>>>>
>> makes the existing messages clearer, are will become essential when
>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>> follow-on patch does).
>>>>>>>
>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>>
>> builtins.c).  With the change, the note above might read something
>>>>>>> like:
>>>>>>>
>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>       45 |   char b[N];
>>>>>>>          |        ^
>>>>>>>
> 
> Is "to object" correct? Into? I somehow fund it hard to read as proposed.

I agree.  Other messages use "offset from" so let me change it to
that.

Thanks for the suggestion!

Martin
Jeff Law Nov. 12, 2019, 5:07 a.m. UTC | #9
On 11/6/19 2:06 PM, Martin Sebor wrote:
> On 11/6/19 1:39 PM, Jeff Law wrote:
>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>> stores mention the amount of data being stored and the amount of
>>>>> space remaining in the destination, such as:
>>>>>
>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>>>>
>>>>>     123 |   *p = 0;
>>>>>         |   ~~~^~~
>>>>> note: destination object declared here
>>>>>      45 |   char b[N];
>>>>>         |        ^
>>>>>
>>>>> A warning like this can take some time to analyze.  First, the size
>>>>> of the destination isn't mentioned and may not be easy to tell from
>>>>> the sources.  In the note above, when N's value is the result of
>>>>> some non-trivial computation, chasing it down may be a small project
>>>>> in and of itself.  Second, it's also not clear why the region size
>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>> it's negative, or because it's in some range greater than N.
>>>>>
>>>>> Mentioning both the size of the destination object and the offset
>>>>> makes the existing messages clearer, are will become essential when
>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>> follow-on patch does).
>>>>>
>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>> letting compute_objsize return the offset to its caller, doing
>>>>> something similar in get_stridx, and adding a new function to
>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>> builtins.c).  With the change, the note above might read something
>>>>> like:
>>>>>
>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>      45 |   char b[N];
>>>>>         |        ^
>>>>>
>>>>> Tested on x86_64-linux.
>>>>>
>>>>> Martin
>>>>>
>>>>> gcc-store-offset.diff
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>      * builtins.c (compute_objsize): Add an argument and set it to
>>>>> offset
>>>>>      into destination.
>>>>>      * builtins.h (compute_objsize): Add an argument.
>>>>>      * tree-object-size.c (addr_object_size): Add an argument and
>>>>> set it
>>>>>      to offset into destination.
>>>>>      (compute_builtin_object_size): Same.
>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an
>>>>> argument.
>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
>>>>>      to offset into destination.
>>>>>      (maybe_warn_overflow): New function.
>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>>>> messages.
>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>
>>>>
>>>>> Index: gcc/tree-ssa-strlen.c
>>>>> ===================================================================
>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
>>>>> HOST_WIDE_INT, tree);
>>>>>    static void handle_builtin_stxncpy (built_in_function,
>>>>> gimple_stmt_iterator *);
>>>>>    +/* Sets MINMAX to either the constant value or the range VAL is in
>>>>> +   and returns true on success.  */
>>>>> +
>>>>> +static bool
>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
>>>>> NULL)
>>>>> +{
>>>>> +  if (tree_fits_uhwi_p (val))
>>>>> +    {
>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>> +      return true;
>>>>> +    }
>>>>> +
>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>> +    return false;
>>>>> +
>>>>> +  if (rvals)
>>>>> +    {
>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>> +      if (gimple_assign_single_p (def)
>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>> +    {
>>>>> +      /* get_value_range returns [0, N] for constant assignments.  */
>>>>> +      val = gimple_assign_rhs1 (def);
>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>> +      return true;
>>>>> +    }
>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is set
>>>> via a simple constant assignment, then the range ought to be a
>>>> singleton
>>>> ie [CONST,CONST].   Is there are particular test were this is not true?
>>>>
>>>> The only way offhand I could see this happening is if originally the
>>>> RHS
>>>> wasn't a constant, but due to optimizations it either simplified into a
>>>> constant or a constant was propagated into an SSA_NAME appearing on the
>>>> RHS.  This would have to happen between the last range analysis and the
>>>> point where you're making this query.
>>>
>>> Yes, I think that's right.  Here's an example where it happens:
>>>
>>>    void f (void)
>>>    {
>>>      char s[] = "1234";
>>>      unsigned n = strlen (s);
>>>      char vla[n];   // or malloc (n)
>>>      vla[n] = 0;    // n = [4, 4]
>>>      ...
>>>    }
>>>
>>> The strlen call is folded to 4 but that's not propagated to
>>> the access until sometime after the strlen pass is done.
>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
>> back of pass instance of vr_values.  If so, that might argue we want to
>> be setting it in vr_values too.
> 
> No, set_range_info is only called for ranges.  In this case,
> handle_builtin_strlen replaces the strlen() call with 4:
> 
>   s = "1234";
>   _1 = __builtin_strlen (&s);
>   n_2 = (unsigned int) _1;
>   a.1_8 = __builtin_alloca_with_align (_1, 8);
>   (*a.1_8)[n_2] = 0;
Right.  But at the point where we make the substitution for the call on
the RHS the range is a singleton and we could set the range of _1 to [4,
4].  We could also set its SSA_NAME_VALUE to 4.  Hell, we could even
forward propagate the constant to the uses.  Any/all of those would seem
better than the hack in question.


> 
> When the access is made, the __builtin_alloca_with_align call
> is found as the destination and the _1 SSA_NAME is used to
> get its size.  We get back the range [4, 4].
Now I'm confused.  If we're getting [4, 4], then that's exactly what we
want.

Jeff
Jeff Law Nov. 12, 2019, 5:10 a.m. UTC | #10
On 11/6/19 3:34 PM, Martin Sebor wrote:
> On 11/6/19 2:06 PM, Martin Sebor wrote:
>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>> stores mention the amount of data being stored and the amount of
>>>>>> space remaining in the destination, such as:
>>>>>>
>>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>>>>>
>>>>>>     123 |   *p = 0;
>>>>>>         |   ~~~^~~
>>>>>> note: destination object declared here
>>>>>>      45 |   char b[N];
>>>>>>         |        ^
>>>>>>
>>>>>> A warning like this can take some time to analyze.  First, the size
>>>>>> of the destination isn't mentioned and may not be easy to tell from
>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>> some non-trivial computation, chasing it down may be a small project
>>>>>> in and of itself.  Second, it's also not clear why the region size
>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>
>>>>>> Mentioning both the size of the destination object and the offset
>>>>>> makes the existing messages clearer, are will become essential when
>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>> follow-on patch does).
>>>>>>
>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>> something similar in get_stridx, and adding a new function to
>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>> builtins.c).  With the change, the note above might read something
>>>>>> like:
>>>>>>
>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>      45 |   char b[N];
>>>>>>         |        ^
>>>>>>
>>>>>> Tested on x86_64-linux.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> gcc-store-offset.diff
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>>      * builtins.c (compute_objsize): Add an argument and set it to
>>>>>> offset
>>>>>>      into destination.
>>>>>>      * builtins.h (compute_objsize): Add an argument.
>>>>>>      * tree-object-size.c (addr_object_size): Add an argument and
>>>>>> set it
>>>>>>      to offset into destination.
>>>>>>      (compute_builtin_object_size): Same.
>>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an
>>>>>> argument.
>>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>>>>>> set it
>>>>>>      to offset into destination.
>>>>>>      (maybe_warn_overflow): New function.
>>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>>>>> messages.
>>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>
>>>>>
>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>> ===================================================================
>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>> HOST_WIDE_INT, tree);
>>>>>>    static void handle_builtin_stxncpy (built_in_function,
>>>>>> gimple_stmt_iterator *);
>>>>>>    +/* Sets MINMAX to either the constant value or the range VAL
>>>>>> is in
>>>>>> +   and returns true on success.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
>>>>>> NULL)
>>>>>> +{
>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>> +    {
>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>> +      return true;
>>>>>> +    }
>>>>>> +
>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (rvals)
>>>>>> +    {
>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>> +      if (gimple_assign_single_p (def)
>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>> +    {
>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>> assignments.  */
>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>> +      return true;
>>>>>> +    }
>>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is
>>>>> set
>>>>> via a simple constant assignment, then the range ought to be a
>>>>> singleton
>>>>> ie [CONST,CONST].   Is there are particular test were this is not
>>>>> true?
>>>>>
>>>>> The only way offhand I could see this happening is if originally
>>>>> the RHS
>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>> into a
>>>>> constant or a constant was propagated into an SSA_NAME appearing on
>>>>> the
>>>>> RHS.  This would have to happen between the last range analysis and
>>>>> the
>>>>> point where you're making this query.
>>>>
>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>
>>>>    void f (void)
>>>>    {
>>>>      char s[] = "1234";
>>>>      unsigned n = strlen (s);
>>>>      char vla[n];   // or malloc (n)
>>>>      vla[n] = 0;    // n = [4, 4]
>>>>      ...
>>>>    }
>>>>
>>>> The strlen call is folded to 4 but that's not propagated to
>>>> the access until sometime after the strlen pass is done.
>>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
>>> back of pass instance of vr_values.  If so, that might argue we want to
>>> be setting it in vr_values too.
>>
>> No, set_range_info is only called for ranges.  In this case,
>> handle_builtin_strlen replaces the strlen() call with 4:
>>
>>    s = "1234";
>>    _1 = __builtin_strlen (&s);
>>    n_2 = (unsigned int) _1;
>>    a.1_8 = __builtin_alloca_with_align (_1, 8);
>>    (*a.1_8)[n_2] = 0;
>>
>> When the access is made, the __builtin_alloca_with_align call
>> is found as the destination and the _1 SSA_NAME is used to
>> get its size.  We get back the range [4, 4].
> 
> By the way, I glossed over one detail.  The above doesn't work
> exactly as is because the allocation size is the SSA_NAME _1
> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
> the range [0, 4]; the range is [0, 4] because it was set based
> on the size of the argument to the strlen() call well before
> the strlen pass even ran).
Which would tend to argue that we should forward propagate the constant
to the uses of _1.  That should expose that the RHS of the assignment to
n_2 is a constant as well.


> 
> To make it work across assignments we need to propagate the strlen
> results down the CFG somehow.  I'm hoping the on-demand VRP will
> do this automagically.
It would, but it's probably more heavyweight than we need.  We just need
to forward propagate the discovered constant to the use points and pick
up any secondary opportunities that arise.

jeff
Richard Biener Nov. 12, 2019, 8:15 a.m. UTC | #11
On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <law@redhat.com> wrote:
>
> On 11/6/19 3:34 PM, Martin Sebor wrote:
> > On 11/6/19 2:06 PM, Martin Sebor wrote:
> >> On 11/6/19 1:39 PM, Jeff Law wrote:
> >>> On 11/6/19 1:27 PM, Martin Sebor wrote:
> >>>> On 11/6/19 11:55 AM, Jeff Law wrote:
> >>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
> >>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
> >>>>>> stores mention the amount of data being stored and the amount of
> >>>>>> space remaining in the destination, such as:
> >>>>>>
> >>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
> >>>>>>
> >>>>>>     123 |   *p = 0;
> >>>>>>         |   ~~~^~~
> >>>>>> note: destination object declared here
> >>>>>>      45 |   char b[N];
> >>>>>>         |        ^
> >>>>>>
> >>>>>> A warning like this can take some time to analyze.  First, the size
> >>>>>> of the destination isn't mentioned and may not be easy to tell from
> >>>>>> the sources.  In the note above, when N's value is the result of
> >>>>>> some non-trivial computation, chasing it down may be a small project
> >>>>>> in and of itself.  Second, it's also not clear why the region size
> >>>>>> is zero.  It could be because the offset is exactly N, or because
> >>>>>> it's negative, or because it's in some range greater than N.
> >>>>>>
> >>>>>> Mentioning both the size of the destination object and the offset
> >>>>>> makes the existing messages clearer, are will become essential when
> >>>>>> GCC starts diagnosing overflow into allocated buffers (as my
> >>>>>> follow-on patch does).
> >>>>>>
> >>>>>> The attached patch enhances -Wstringop-overflow to do this by
> >>>>>> letting compute_objsize return the offset to its caller, doing
> >>>>>> something similar in get_stridx, and adding a new function to
> >>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
> >>>>>> like the function to replace the -Wstringop-overflow handler in
> >>>>>> builtins.c).  With the change, the note above might read something
> >>>>>> like:
> >>>>>>
> >>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
> >>>>>>      45 |   char b[N];
> >>>>>>         |        ^
> >>>>>>
> >>>>>> Tested on x86_64-linux.
> >>>>>>
> >>>>>> Martin
> >>>>>>
> >>>>>> gcc-store-offset.diff
> >>>>>>
> >>>>>> gcc/ChangeLog:
> >>>>>>
> >>>>>>      * builtins.c (compute_objsize): Add an argument and set it to
> >>>>>> offset
> >>>>>>      into destination.
> >>>>>>      * builtins.h (compute_objsize): Add an argument.
> >>>>>>      * tree-object-size.c (addr_object_size): Add an argument and
> >>>>>> set it
> >>>>>>      to offset into destination.
> >>>>>>      (compute_builtin_object_size): Same.
> >>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an
> >>>>>> argument.
> >>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
> >>>>>> set it
> >>>>>>      to offset into destination.
> >>>>>>      (maybe_warn_overflow): New function.
> >>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
> >>>>>>
> >>>>>> gcc/testsuite/ChangeLog:
> >>>>>>
> >>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
> >>>>>> messages.
> >>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
> >>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
> >>>>>>
> >>>>>
> >>>>>> Index: gcc/tree-ssa-strlen.c
> >>>>>> ===================================================================
> >>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
> >>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
> >>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
> >>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
> >>>>>> HOST_WIDE_INT, tree);
> >>>>>>    static void handle_builtin_stxncpy (built_in_function,
> >>>>>> gimple_stmt_iterator *);
> >>>>>>    +/* Sets MINMAX to either the constant value or the range VAL
> >>>>>> is in
> >>>>>> +   and returns true on success.  */
> >>>>>> +
> >>>>>> +static bool
> >>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
> >>>>>> NULL)
> >>>>>> +{
> >>>>>> +  if (tree_fits_uhwi_p (val))
> >>>>>> +    {
> >>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
> >>>>>> +      return true;
> >>>>>> +    }
> >>>>>> +
> >>>>>> +  if (TREE_CODE (val) != SSA_NAME)
> >>>>>> +    return false;
> >>>>>> +
> >>>>>> +  if (rvals)
> >>>>>> +    {
> >>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
> >>>>>> +      if (gimple_assign_single_p (def)
> >>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
> >>>>>> +    {
> >>>>>> +      /* get_value_range returns [0, N] for constant
> >>>>>> assignments.  */
> >>>>>> +      val = gimple_assign_rhs1 (def);
> >>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
> >>>>>> +      return true;
> >>>>>> +    }
> >>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is
> >>>>> set
> >>>>> via a simple constant assignment, then the range ought to be a
> >>>>> singleton
> >>>>> ie [CONST,CONST].   Is there are particular test were this is not
> >>>>> true?
> >>>>>
> >>>>> The only way offhand I could see this happening is if originally
> >>>>> the RHS
> >>>>> wasn't a constant, but due to optimizations it either simplified
> >>>>> into a
> >>>>> constant or a constant was propagated into an SSA_NAME appearing on
> >>>>> the
> >>>>> RHS.  This would have to happen between the last range analysis and
> >>>>> the
> >>>>> point where you're making this query.
> >>>>
> >>>> Yes, I think that's right.  Here's an example where it happens:
> >>>>
> >>>>    void f (void)
> >>>>    {
> >>>>      char s[] = "1234";
> >>>>      unsigned n = strlen (s);
> >>>>      char vla[n];   // or malloc (n)
> >>>>      vla[n] = 0;    // n = [4, 4]
> >>>>      ...
> >>>>    }
> >>>>
> >>>> The strlen call is folded to 4 but that's not propagated to
> >>>> the access until sometime after the strlen pass is done.
> >>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
> >>> back of pass instance of vr_values.  If so, that might argue we want to
> >>> be setting it in vr_values too.
> >>
> >> No, set_range_info is only called for ranges.  In this case,
> >> handle_builtin_strlen replaces the strlen() call with 4:
> >>
> >>    s = "1234";
> >>    _1 = __builtin_strlen (&s);
> >>    n_2 = (unsigned int) _1;
> >>    a.1_8 = __builtin_alloca_with_align (_1, 8);
> >>    (*a.1_8)[n_2] = 0;
> >>
> >> When the access is made, the __builtin_alloca_with_align call
> >> is found as the destination and the _1 SSA_NAME is used to
> >> get its size.  We get back the range [4, 4].
> >
> > By the way, I glossed over one detail.  The above doesn't work
> > exactly as is because the allocation size is the SSA_NAME _1
> > (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
> > the range [0, 4]; the range is [0, 4] because it was set based
> > on the size of the argument to the strlen() call well before
> > the strlen pass even ran).
> Which would tend to argue that we should forward propagate the constant
> to the uses of _1.  That should expose that the RHS of the assignment to
> n_2 is a constant as well.
>
>
> >
> > To make it work across assignments we need to propagate the strlen
> > results down the CFG somehow.  I'm hoping the on-demand VRP will
> > do this automagically.
> It would, but it's probably more heavyweight than we need.  We just need
> to forward propagate the discovered constant to the use points and pick
> up any secondary opportunities that arise.

Yes.  And the usual way of doing this is to keep a constant-and-copy
lattice (and for copies you'd need to track availability) and before optimizing
a stmt substitute its operands with the lattice contents.

forwprop has a scheme that can be followed doing a RPO walk, strlen
does a DOM walk, there you can follow what DOM/PRE elimination do
(for tracking copy availability - if you just track constants you can
elide that).

Richard.

> jeff
>
Richard Biener Nov. 12, 2019, 8:16 a.m. UTC | #12
On Tue, Nov 12, 2019 at 9:15 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <law@redhat.com> wrote:
> >
> > On 11/6/19 3:34 PM, Martin Sebor wrote:
> > > On 11/6/19 2:06 PM, Martin Sebor wrote:
> > >> On 11/6/19 1:39 PM, Jeff Law wrote:
> > >>> On 11/6/19 1:27 PM, Martin Sebor wrote:
> > >>>> On 11/6/19 11:55 AM, Jeff Law wrote:
> > >>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
> > >>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
> > >>>>>> stores mention the amount of data being stored and the amount of
> > >>>>>> space remaining in the destination, such as:
> > >>>>>>
> > >>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
> > >>>>>>
> > >>>>>>     123 |   *p = 0;
> > >>>>>>         |   ~~~^~~
> > >>>>>> note: destination object declared here
> > >>>>>>      45 |   char b[N];
> > >>>>>>         |        ^
> > >>>>>>
> > >>>>>> A warning like this can take some time to analyze.  First, the size
> > >>>>>> of the destination isn't mentioned and may not be easy to tell from
> > >>>>>> the sources.  In the note above, when N's value is the result of
> > >>>>>> some non-trivial computation, chasing it down may be a small project
> > >>>>>> in and of itself.  Second, it's also not clear why the region size
> > >>>>>> is zero.  It could be because the offset is exactly N, or because
> > >>>>>> it's negative, or because it's in some range greater than N.
> > >>>>>>
> > >>>>>> Mentioning both the size of the destination object and the offset
> > >>>>>> makes the existing messages clearer, are will become essential when
> > >>>>>> GCC starts diagnosing overflow into allocated buffers (as my
> > >>>>>> follow-on patch does).
> > >>>>>>
> > >>>>>> The attached patch enhances -Wstringop-overflow to do this by
> > >>>>>> letting compute_objsize return the offset to its caller, doing
> > >>>>>> something similar in get_stridx, and adding a new function to
> > >>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
> > >>>>>> like the function to replace the -Wstringop-overflow handler in
> > >>>>>> builtins.c).  With the change, the note above might read something
> > >>>>>> like:
> > >>>>>>
> > >>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
> > >>>>>>      45 |   char b[N];
> > >>>>>>         |        ^
> > >>>>>>
> > >>>>>> Tested on x86_64-linux.
> > >>>>>>
> > >>>>>> Martin
> > >>>>>>
> > >>>>>> gcc-store-offset.diff
> > >>>>>>
> > >>>>>> gcc/ChangeLog:
> > >>>>>>
> > >>>>>>      * builtins.c (compute_objsize): Add an argument and set it to
> > >>>>>> offset
> > >>>>>>      into destination.
> > >>>>>>      * builtins.h (compute_objsize): Add an argument.
> > >>>>>>      * tree-object-size.c (addr_object_size): Add an argument and
> > >>>>>> set it
> > >>>>>>      to offset into destination.
> > >>>>>>      (compute_builtin_object_size): Same.
> > >>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an
> > >>>>>> argument.
> > >>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
> > >>>>>> set it
> > >>>>>>      to offset into destination.
> > >>>>>>      (maybe_warn_overflow): New function.
> > >>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
> > >>>>>>
> > >>>>>> gcc/testsuite/ChangeLog:
> > >>>>>>
> > >>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
> > >>>>>> messages.
> > >>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
> > >>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
> > >>>>>>
> > >>>>>
> > >>>>>> Index: gcc/tree-ssa-strlen.c
> > >>>>>> ===================================================================
> > >>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
> > >>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
> > >>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
> > >>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
> > >>>>>> HOST_WIDE_INT, tree);
> > >>>>>>    static void handle_builtin_stxncpy (built_in_function,
> > >>>>>> gimple_stmt_iterator *);
> > >>>>>>    +/* Sets MINMAX to either the constant value or the range VAL
> > >>>>>> is in
> > >>>>>> +   and returns true on success.  */
> > >>>>>> +
> > >>>>>> +static bool
> > >>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
> > >>>>>> NULL)
> > >>>>>> +{
> > >>>>>> +  if (tree_fits_uhwi_p (val))
> > >>>>>> +    {
> > >>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
> > >>>>>> +      return true;
> > >>>>>> +    }
> > >>>>>> +
> > >>>>>> +  if (TREE_CODE (val) != SSA_NAME)
> > >>>>>> +    return false;
> > >>>>>> +
> > >>>>>> +  if (rvals)
> > >>>>>> +    {
> > >>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
> > >>>>>> +      if (gimple_assign_single_p (def)
> > >>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
> > >>>>>> +    {
> > >>>>>> +      /* get_value_range returns [0, N] for constant
> > >>>>>> assignments.  */
> > >>>>>> +      val = gimple_assign_rhs1 (def);
> > >>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
> > >>>>>> +      return true;
> > >>>>>> +    }
> > >>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is
> > >>>>> set
> > >>>>> via a simple constant assignment, then the range ought to be a
> > >>>>> singleton
> > >>>>> ie [CONST,CONST].   Is there are particular test were this is not
> > >>>>> true?
> > >>>>>
> > >>>>> The only way offhand I could see this happening is if originally
> > >>>>> the RHS
> > >>>>> wasn't a constant, but due to optimizations it either simplified
> > >>>>> into a
> > >>>>> constant or a constant was propagated into an SSA_NAME appearing on
> > >>>>> the
> > >>>>> RHS.  This would have to happen between the last range analysis and
> > >>>>> the
> > >>>>> point where you're making this query.
> > >>>>
> > >>>> Yes, I think that's right.  Here's an example where it happens:
> > >>>>
> > >>>>    void f (void)
> > >>>>    {
> > >>>>      char s[] = "1234";
> > >>>>      unsigned n = strlen (s);
> > >>>>      char vla[n];   // or malloc (n)
> > >>>>      vla[n] = 0;    // n = [4, 4]
> > >>>>      ...
> > >>>>    }
> > >>>>
> > >>>> The strlen call is folded to 4 but that's not propagated to
> > >>>> the access until sometime after the strlen pass is done.
> > >>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
> > >>> back of pass instance of vr_values.  If so, that might argue we want to
> > >>> be setting it in vr_values too.
> > >>
> > >> No, set_range_info is only called for ranges.  In this case,
> > >> handle_builtin_strlen replaces the strlen() call with 4:
> > >>
> > >>    s = "1234";
> > >>    _1 = __builtin_strlen (&s);
> > >>    n_2 = (unsigned int) _1;
> > >>    a.1_8 = __builtin_alloca_with_align (_1, 8);
> > >>    (*a.1_8)[n_2] = 0;
> > >>
> > >> When the access is made, the __builtin_alloca_with_align call
> > >> is found as the destination and the _1 SSA_NAME is used to
> > >> get its size.  We get back the range [4, 4].
> > >
> > > By the way, I glossed over one detail.  The above doesn't work
> > > exactly as is because the allocation size is the SSA_NAME _1
> > > (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
> > > the range [0, 4]; the range is [0, 4] because it was set based
> > > on the size of the argument to the strlen() call well before
> > > the strlen pass even ran).
> > Which would tend to argue that we should forward propagate the constant
> > to the uses of _1.  That should expose that the RHS of the assignment to
> > n_2 is a constant as well.
> >
> >
> > >
> > > To make it work across assignments we need to propagate the strlen
> > > results down the CFG somehow.  I'm hoping the on-demand VRP will
> > > do this automagically.
> > It would, but it's probably more heavyweight than we need.  We just need
> > to forward propagate the discovered constant to the use points and pick
> > up any secondary opportunities that arise.
>
> Yes.  And the usual way of doing this is to keep a constant-and-copy
> lattice (and for copies you'd need to track availability) and before optimizing
> a stmt substitute its operands with the lattice contents.
>
> forwprop has a scheme that can be followed doing a RPO walk, strlen
> does a DOM walk, there you can follow what DOM/PRE elimination do
> (for tracking copy availability - if you just track constants you can
> elide that).

I guess we could enhance domwalk with lattice tracking utilities as well
(in a derived class).

> Richard.
>
> > jeff
> >
Martin Sebor Nov. 12, 2019, 3:45 p.m. UTC | #13
On 11/11/19 10:10 PM, Jeff Law wrote:
> On 11/6/19 3:34 PM, Martin Sebor wrote:
>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>> stores mention the amount of data being stored and the amount of
>>>>>>> space remaining in the destination, such as:
>>>>>>>
>>>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>>>>>>
>>>>>>>      123 |   *p = 0;
>>>>>>>          |   ~~~^~~
>>>>>>> note: destination object declared here
>>>>>>>       45 |   char b[N];
>>>>>>>          |        ^
>>>>>>>
>>>>>>> A warning like this can take some time to analyze.  First, the size
>>>>>>> of the destination isn't mentioned and may not be easy to tell from
>>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>>> some non-trivial computation, chasing it down may be a small project
>>>>>>> in and of itself.  Second, it's also not clear why the region size
>>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>
>>>>>>> Mentioning both the size of the destination object and the offset
>>>>>>> makes the existing messages clearer, are will become essential when
>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>> follow-on patch does).
>>>>>>>
>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>> builtins.c).  With the change, the note above might read something
>>>>>>> like:
>>>>>>>
>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>       45 |   char b[N];
>>>>>>>          |        ^
>>>>>>>
>>>>>>> Tested on x86_64-linux.
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>> gcc-store-offset.diff
>>>>>>>
>>>>>>> gcc/ChangeLog:
>>>>>>>
>>>>>>>       * builtins.c (compute_objsize): Add an argument and set it to
>>>>>>> offset
>>>>>>>       into destination.
>>>>>>>       * builtins.h (compute_objsize): Add an argument.
>>>>>>>       * tree-object-size.c (addr_object_size): Add an argument and
>>>>>>> set it
>>>>>>>       to offset into destination.
>>>>>>>       (compute_builtin_object_size): Same.
>>>>>>>       * tree-object-size.h (compute_builtin_object_size): Add an
>>>>>>> argument.
>>>>>>>       * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>>>>>>> set it
>>>>>>>       to offset into destination.
>>>>>>>       (maybe_warn_overflow): New function.
>>>>>>>       (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>>>
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>
>>>>>>>       * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>>>>>> messages.
>>>>>>>       * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>>       * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>>
>>>>>>
>>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>>     static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>>> HOST_WIDE_INT, tree);
>>>>>>>     static void handle_builtin_stxncpy (built_in_function,
>>>>>>> gimple_stmt_iterator *);
>>>>>>>     +/* Sets MINMAX to either the constant value or the range VAL
>>>>>>> is in
>>>>>>> +   and returns true on success.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
>>>>>>> NULL)
>>>>>>> +{
>>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>>> +    {
>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>> +      return true;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  if (rvals)
>>>>>>> +    {
>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>>> +      if (gimple_assign_single_p (def)
>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>>> +    {
>>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>>> assignments.  */
>>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>> +      return true;
>>>>>>> +    }
>>>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is
>>>>>> set
>>>>>> via a simple constant assignment, then the range ought to be a
>>>>>> singleton
>>>>>> ie [CONST,CONST].   Is there are particular test were this is not
>>>>>> true?
>>>>>>
>>>>>> The only way offhand I could see this happening is if originally
>>>>>> the RHS
>>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>>> into a
>>>>>> constant or a constant was propagated into an SSA_NAME appearing on
>>>>>> the
>>>>>> RHS.  This would have to happen between the last range analysis and
>>>>>> the
>>>>>> point where you're making this query.
>>>>>
>>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>>
>>>>>     void f (void)
>>>>>     {
>>>>>       char s[] = "1234";
>>>>>       unsigned n = strlen (s);
>>>>>       char vla[n];   // or malloc (n)
>>>>>       vla[n] = 0;    // n = [4, 4]
>>>>>       ...
>>>>>     }
>>>>>
>>>>> The strlen call is folded to 4 but that's not propagated to
>>>>> the access until sometime after the strlen pass is done.
>>>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
>>>> back of pass instance of vr_values.  If so, that might argue we want to
>>>> be setting it in vr_values too.
>>>
>>> No, set_range_info is only called for ranges.  In this case,
>>> handle_builtin_strlen replaces the strlen() call with 4:
>>>
>>>     s = "1234";
>>>     _1 = __builtin_strlen (&s);
>>>     n_2 = (unsigned int) _1;
>>>     a.1_8 = __builtin_alloca_with_align (_1, 8);
>>>     (*a.1_8)[n_2] = 0;
>>>
>>> When the access is made, the __builtin_alloca_with_align call
>>> is found as the destination and the _1 SSA_NAME is used to
>>> get its size.  We get back the range [4, 4].
>>
>> By the way, I glossed over one detail.  The above doesn't work
>> exactly as is because the allocation size is the SSA_NAME _1
>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
>> the range [0, 4]; the range is [0, 4] because it was set based
>> on the size of the argument to the strlen() call well before
>> the strlen pass even ran).
> Which would tend to argue that we should forward propagate the constant
> to the uses of _1.  That should expose that the RHS of the assignment to
> n_2 is a constant as well.
> 
> 
>>
>> To make it work across assignments we need to propagate the strlen
>> results down the CFG somehow.  I'm hoping the on-demand VRP will
>> do this automagically.
> It would, but it's probably more heavyweight than we need.  We just need
> to forward propagate the discovered constant to the use points and pick
> up any secondary opportunities that arise.

How do I do that?

Martin
Jeff Law Nov. 12, 2019, 5:54 p.m. UTC | #14
On 11/12/19 1:15 AM, Richard Biener wrote:
> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <law@redhat.com> wrote:
>>
>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>>> stores mention the amount of data being stored and the amount of
>>>>>>>> space remaining in the destination, such as:
>>>>>>>>
>>>>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>>>>>>>
>>>>>>>>     123 |   *p = 0;
>>>>>>>>         |   ~~~^~~
>>>>>>>> note: destination object declared here
>>>>>>>>      45 |   char b[N];
>>>>>>>>         |        ^
>>>>>>>>
>>>>>>>> A warning like this can take some time to analyze.  First, the size
>>>>>>>> of the destination isn't mentioned and may not be easy to tell from
>>>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>>>> some non-trivial computation, chasing it down may be a small project
>>>>>>>> in and of itself.  Second, it's also not clear why the region size
>>>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>>
>>>>>>>> Mentioning both the size of the destination object and the offset
>>>>>>>> makes the existing messages clearer, are will become essential when
>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>>> follow-on patch does).
>>>>>>>>
>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>>> builtins.c).  With the change, the note above might read something
>>>>>>>> like:
>>>>>>>>
>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>>      45 |   char b[N];
>>>>>>>>         |        ^
>>>>>>>>
>>>>>>>> Tested on x86_64-linux.
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> gcc-store-offset.diff
>>>>>>>>
>>>>>>>> gcc/ChangeLog:
>>>>>>>>
>>>>>>>>      * builtins.c (compute_objsize): Add an argument and set it to
>>>>>>>> offset
>>>>>>>>      into destination.
>>>>>>>>      * builtins.h (compute_objsize): Add an argument.
>>>>>>>>      * tree-object-size.c (addr_object_size): Add an argument and
>>>>>>>> set it
>>>>>>>>      to offset into destination.
>>>>>>>>      (compute_builtin_object_size): Same.
>>>>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an
>>>>>>>> argument.
>>>>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>>>>>>>> set it
>>>>>>>>      to offset into destination.
>>>>>>>>      (maybe_warn_overflow): New function.
>>>>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>>>>
>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>
>>>>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>>>>>>> messages.
>>>>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>>>
>>>>>>>
>>>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>>>> ===================================================================
>>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>>>> HOST_WIDE_INT, tree);
>>>>>>>>    static void handle_builtin_stxncpy (built_in_function,
>>>>>>>> gimple_stmt_iterator *);
>>>>>>>>    +/* Sets MINMAX to either the constant value or the range VAL
>>>>>>>> is in
>>>>>>>> +   and returns true on success.  */
>>>>>>>> +
>>>>>>>> +static bool
>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
>>>>>>>> NULL)
>>>>>>>> +{
>>>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>>>> +    {
>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>> +      return true;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>>>> +    return false;
>>>>>>>> +
>>>>>>>> +  if (rvals)
>>>>>>>> +    {
>>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>>>> +      if (gimple_assign_single_p (def)
>>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>>>> +    {
>>>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>>>> assignments.  */
>>>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>> +      return true;
>>>>>>>> +    }
>>>>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is
>>>>>>> set
>>>>>>> via a simple constant assignment, then the range ought to be a
>>>>>>> singleton
>>>>>>> ie [CONST,CONST].   Is there are particular test were this is not
>>>>>>> true?
>>>>>>>
>>>>>>> The only way offhand I could see this happening is if originally
>>>>>>> the RHS
>>>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>>>> into a
>>>>>>> constant or a constant was propagated into an SSA_NAME appearing on
>>>>>>> the
>>>>>>> RHS.  This would have to happen between the last range analysis and
>>>>>>> the
>>>>>>> point where you're making this query.
>>>>>>
>>>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>>>
>>>>>>    void f (void)
>>>>>>    {
>>>>>>      char s[] = "1234";
>>>>>>      unsigned n = strlen (s);
>>>>>>      char vla[n];   // or malloc (n)
>>>>>>      vla[n] = 0;    // n = [4, 4]
>>>>>>      ...
>>>>>>    }
>>>>>>
>>>>>> The strlen call is folded to 4 but that's not propagated to
>>>>>> the access until sometime after the strlen pass is done.
>>>>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
>>>>> back of pass instance of vr_values.  If so, that might argue we want to
>>>>> be setting it in vr_values too.
>>>>
>>>> No, set_range_info is only called for ranges.  In this case,
>>>> handle_builtin_strlen replaces the strlen() call with 4:
>>>>
>>>>    s = "1234";
>>>>    _1 = __builtin_strlen (&s);
>>>>    n_2 = (unsigned int) _1;
>>>>    a.1_8 = __builtin_alloca_with_align (_1, 8);
>>>>    (*a.1_8)[n_2] = 0;
>>>>
>>>> When the access is made, the __builtin_alloca_with_align call
>>>> is found as the destination and the _1 SSA_NAME is used to
>>>> get its size.  We get back the range [4, 4].
>>>
>>> By the way, I glossed over one detail.  The above doesn't work
>>> exactly as is because the allocation size is the SSA_NAME _1
>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
>>> the range [0, 4]; the range is [0, 4] because it was set based
>>> on the size of the argument to the strlen() call well before
>>> the strlen pass even ran).
>> Which would tend to argue that we should forward propagate the constant
>> to the uses of _1.  That should expose that the RHS of the assignment to
>> n_2 is a constant as well.
>>
>>
>>>
>>> To make it work across assignments we need to propagate the strlen
>>> results down the CFG somehow.  I'm hoping the on-demand VRP will
>>> do this automagically.
>> It would, but it's probably more heavyweight than we need.  We just need
>> to forward propagate the discovered constant to the use points and pick
>> up any secondary opportunities that arise.
> 
> Yes.  And the usual way of doing this is to keep a constant-and-copy
> lattice (and for copies you'd need to track availability) and before optimizing
> a stmt substitute its operands with the lattice contents.
> 
> forwprop has a scheme that can be followed doing a RPO walk, strlen
> does a DOM walk, there you can follow what DOM/PRE elimination do
> (for tracking copy availability - if you just track constants you can
> elide that).
I'm also note sure handling copies is all that interesting here and if
we just handle constants/invariants, then it's pretty easy.

Whenever we replace a strlen call with a const, we put the LHS (assuming
its an SSA_NAME) of the statement on a worklist.

We pull items off the worklist and propagate the value to the use points
and try to simplify the resulting statement.  If the RHS of the use
point simplified to a constant, then put the LHS of the use statement
onto the worklist.  Iterate until the list is empty.

That would capture everything of interest I suspect and ought to be cheap.

jeff



I think whenever we substitute a constant or SSA_NAME for a strlen call,
we can just replace uses of the LHS of the assignment with the
const/copy.  Any statements we propagate into are put on a worklist.

We pull statements off the worklist and try to simplify their RHS.  If
the RHS simplifies to a const/copy, then then we repeat the process of
propagating to the use points and

x = <whatever>;

If we find <whatever> collapses to a constant or copy we can just record
it in SSA_NAME_VALUE.  As we walk through statements, we can propagate
> 
> Richard.
> 
>> jeff
>>
>
Martin Sebor Nov. 12, 2019, 7:55 p.m. UTC | #15
On 11/12/19 10:54 AM, Jeff Law wrote:
> On 11/12/19 1:15 AM, Richard Biener wrote:
>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <law@redhat.com> wrote:
>>>
>>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>>>> stores mention the amount of data being stored and the amount of
>>>>>>>>> space remaining in the destination, such as:
>>>>>>>>>
>>>>>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>>>>>>>>
>>>>>>>>>      123 |   *p = 0;
>>>>>>>>>          |   ~~~^~~
>>>>>>>>> note: destination object declared here
>>>>>>>>>       45 |   char b[N];
>>>>>>>>>          |        ^
>>>>>>>>>
>>>>>>>>> A warning like this can take some time to analyze.  First, the size
>>>>>>>>> of the destination isn't mentioned and may not be easy to tell from
>>>>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>>>>> some non-trivial computation, chasing it down may be a small project
>>>>>>>>> in and of itself.  Second, it's also not clear why the region size
>>>>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>>>
>>>>>>>>> Mentioning both the size of the destination object and the offset
>>>>>>>>> makes the existing messages clearer, are will become essential when
>>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>>>> follow-on patch does).
>>>>>>>>>
>>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>>>> builtins.c).  With the change, the note above might read something
>>>>>>>>> like:
>>>>>>>>>
>>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>>>       45 |   char b[N];
>>>>>>>>>          |        ^
>>>>>>>>>
>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>> gcc-store-offset.diff
>>>>>>>>>
>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>
>>>>>>>>>       * builtins.c (compute_objsize): Add an argument and set it to
>>>>>>>>> offset
>>>>>>>>>       into destination.
>>>>>>>>>       * builtins.h (compute_objsize): Add an argument.
>>>>>>>>>       * tree-object-size.c (addr_object_size): Add an argument and
>>>>>>>>> set it
>>>>>>>>>       to offset into destination.
>>>>>>>>>       (compute_builtin_object_size): Same.
>>>>>>>>>       * tree-object-size.h (compute_builtin_object_size): Add an
>>>>>>>>> argument.
>>>>>>>>>       * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>>>>>>>>> set it
>>>>>>>>>       to offset into destination.
>>>>>>>>>       (maybe_warn_overflow): New function.
>>>>>>>>>       (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>>>>>
>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>
>>>>>>>>>       * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>>>>>>>> messages.
>>>>>>>>>       * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>>>>       * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>>>>
>>>>>>>>
>>>>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>>>>> ===================================================================
>>>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>>>>     static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>>>>> HOST_WIDE_INT, tree);
>>>>>>>>>     static void handle_builtin_stxncpy (built_in_function,
>>>>>>>>> gimple_stmt_iterator *);
>>>>>>>>>     +/* Sets MINMAX to either the constant value or the range VAL
>>>>>>>>> is in
>>>>>>>>> +   and returns true on success.  */
>>>>>>>>> +
>>>>>>>>> +static bool
>>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
>>>>>>>>> NULL)
>>>>>>>>> +{
>>>>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>>>>> +    {
>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>> +      return true;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>>>>> +    return false;
>>>>>>>>> +
>>>>>>>>> +  if (rvals)
>>>>>>>>> +    {
>>>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>>>>> +      if (gimple_assign_single_p (def)
>>>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>>>>> +    {
>>>>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>>>>> assignments.  */
>>>>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>> +      return true;
>>>>>>>>> +    }
>>>>>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is
>>>>>>>> set
>>>>>>>> via a simple constant assignment, then the range ought to be a
>>>>>>>> singleton
>>>>>>>> ie [CONST,CONST].   Is there are particular test were this is not
>>>>>>>> true?
>>>>>>>>
>>>>>>>> The only way offhand I could see this happening is if originally
>>>>>>>> the RHS
>>>>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>>>>> into a
>>>>>>>> constant or a constant was propagated into an SSA_NAME appearing on
>>>>>>>> the
>>>>>>>> RHS.  This would have to happen between the last range analysis and
>>>>>>>> the
>>>>>>>> point where you're making this query.
>>>>>>>
>>>>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>>>>
>>>>>>>     void f (void)
>>>>>>>     {
>>>>>>>       char s[] = "1234";
>>>>>>>       unsigned n = strlen (s);
>>>>>>>       char vla[n];   // or malloc (n)
>>>>>>>       vla[n] = 0;    // n = [4, 4]
>>>>>>>       ...
>>>>>>>     }
>>>>>>>
>>>>>>> The strlen call is folded to 4 but that's not propagated to
>>>>>>> the access until sometime after the strlen pass is done.
>>>>>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
>>>>>> back of pass instance of vr_values.  If so, that might argue we want to
>>>>>> be setting it in vr_values too.
>>>>>
>>>>> No, set_range_info is only called for ranges.  In this case,
>>>>> handle_builtin_strlen replaces the strlen() call with 4:
>>>>>
>>>>>     s = "1234";
>>>>>     _1 = __builtin_strlen (&s);
>>>>>     n_2 = (unsigned int) _1;
>>>>>     a.1_8 = __builtin_alloca_with_align (_1, 8);
>>>>>     (*a.1_8)[n_2] = 0;
>>>>>
>>>>> When the access is made, the __builtin_alloca_with_align call
>>>>> is found as the destination and the _1 SSA_NAME is used to
>>>>> get its size.  We get back the range [4, 4].
>>>>
>>>> By the way, I glossed over one detail.  The above doesn't work
>>>> exactly as is because the allocation size is the SSA_NAME _1
>>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
>>>> the range [0, 4]; the range is [0, 4] because it was set based
>>>> on the size of the argument to the strlen() call well before
>>>> the strlen pass even ran).
>>> Which would tend to argue that we should forward propagate the constant
>>> to the uses of _1.  That should expose that the RHS of the assignment to
>>> n_2 is a constant as well.
>>>
>>>
>>>>
>>>> To make it work across assignments we need to propagate the strlen
>>>> results down the CFG somehow.  I'm hoping the on-demand VRP will
>>>> do this automagically.
>>> It would, but it's probably more heavyweight than we need.  We just need
>>> to forward propagate the discovered constant to the use points and pick
>>> up any secondary opportunities that arise.
>>
>> Yes.  And the usual way of doing this is to keep a constant-and-copy
>> lattice (and for copies you'd need to track availability) and before optimizing
>> a stmt substitute its operands with the lattice contents.
>>
>> forwprop has a scheme that can be followed doing a RPO walk, strlen
>> does a DOM walk, there you can follow what DOM/PRE elimination do
>> (for tracking copy availability - if you just track constants you can
>> elide that).
> I'm also note sure handling copies is all that interesting here and if
> we just handle constants/invariants, then it's pretty easy.
> 
> Whenever we replace a strlen call with a const, we put the LHS (assuming
> its an SSA_NAME) of the statement on a worklist.
> 
> We pull items off the worklist and propagate the value to the use points
> and try to simplify the resulting statement.  If the RHS of the use
> point simplified to a constant, then put the LHS of the use statement
> onto the worklist.  Iterate until the list is empty.
> 
> That would capture everything of interest I suspect and ought to be cheap.

This sounds like a significant project of its own, and well beyond
the scope of the simple infrastructure enhancement I'm making here:
all this does is improve the accuracy of the diagnostics.  Is
implementing it a precondition for accepting this patch?

If yes, since I don't think I have the time for it for GCC 10
I need to decide whether to drop just this improvement or all
of the buffer overflow checks that depend on it.

Let me know which you prefer.

Martin

> 
> jeff
> 
> 
> 
> I think whenever we substitute a constant or SSA_NAME for a strlen call,
> we can just replace uses of the LHS of the assignment with the
> const/copy.  Any statements we propagate into are put on a worklist.
> 
> We pull statements off the worklist and try to simplify their RHS.  If
> the RHS simplifies to a const/copy, then then we repeat the process of
> propagating to the use points and
> 
> x = <whatever>;
> 
> If we find <whatever> collapses to a constant or copy we can just record
> it in SSA_NAME_VALUE.  As we walk through statements, we can propagate
>>
>> Richard.
>>
>>> jeff
>>>
>>
>
Richard Biener Nov. 13, 2019, 2:34 p.m. UTC | #16
On Tue, Nov 12, 2019 at 6:55 PM Jeff Law <law@redhat.com> wrote:
>
> On 11/12/19 1:15 AM, Richard Biener wrote:
> > On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <law@redhat.com> wrote:
> >>
> >> On 11/6/19 3:34 PM, Martin Sebor wrote:
> >>> On 11/6/19 2:06 PM, Martin Sebor wrote:
> >>>> On 11/6/19 1:39 PM, Jeff Law wrote:
> >>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
> >>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
> >>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
> >>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
> >>>>>>>> stores mention the amount of data being stored and the amount of
> >>>>>>>> space remaining in the destination, such as:
> >>>>>>>>
> >>>>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
> >>>>>>>>
> >>>>>>>>     123 |   *p = 0;
> >>>>>>>>         |   ~~~^~~
> >>>>>>>> note: destination object declared here
> >>>>>>>>      45 |   char b[N];
> >>>>>>>>         |        ^
> >>>>>>>>
> >>>>>>>> A warning like this can take some time to analyze.  First, the size
> >>>>>>>> of the destination isn't mentioned and may not be easy to tell from
> >>>>>>>> the sources.  In the note above, when N's value is the result of
> >>>>>>>> some non-trivial computation, chasing it down may be a small project
> >>>>>>>> in and of itself.  Second, it's also not clear why the region size
> >>>>>>>> is zero.  It could be because the offset is exactly N, or because
> >>>>>>>> it's negative, or because it's in some range greater than N.
> >>>>>>>>
> >>>>>>>> Mentioning both the size of the destination object and the offset
> >>>>>>>> makes the existing messages clearer, are will become essential when
> >>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
> >>>>>>>> follow-on patch does).
> >>>>>>>>
> >>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
> >>>>>>>> letting compute_objsize return the offset to its caller, doing
> >>>>>>>> something similar in get_stridx, and adding a new function to
> >>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
> >>>>>>>> like the function to replace the -Wstringop-overflow handler in
> >>>>>>>> builtins.c).  With the change, the note above might read something
> >>>>>>>> like:
> >>>>>>>>
> >>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
> >>>>>>>>      45 |   char b[N];
> >>>>>>>>         |        ^
> >>>>>>>>
> >>>>>>>> Tested on x86_64-linux.
> >>>>>>>>
> >>>>>>>> Martin
> >>>>>>>>
> >>>>>>>> gcc-store-offset.diff
> >>>>>>>>
> >>>>>>>> gcc/ChangeLog:
> >>>>>>>>
> >>>>>>>>      * builtins.c (compute_objsize): Add an argument and set it to
> >>>>>>>> offset
> >>>>>>>>      into destination.
> >>>>>>>>      * builtins.h (compute_objsize): Add an argument.
> >>>>>>>>      * tree-object-size.c (addr_object_size): Add an argument and
> >>>>>>>> set it
> >>>>>>>>      to offset into destination.
> >>>>>>>>      (compute_builtin_object_size): Same.
> >>>>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an
> >>>>>>>> argument.
> >>>>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
> >>>>>>>> set it
> >>>>>>>>      to offset into destination.
> >>>>>>>>      (maybe_warn_overflow): New function.
> >>>>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
> >>>>>>>>
> >>>>>>>> gcc/testsuite/ChangeLog:
> >>>>>>>>
> >>>>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
> >>>>>>>> messages.
> >>>>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
> >>>>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
> >>>>>>>>
> >>>>>>>
> >>>>>>>> Index: gcc/tree-ssa-strlen.c
> >>>>>>>> ===================================================================
> >>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
> >>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
> >>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
> >>>>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
> >>>>>>>> HOST_WIDE_INT, tree);
> >>>>>>>>    static void handle_builtin_stxncpy (built_in_function,
> >>>>>>>> gimple_stmt_iterator *);
> >>>>>>>>    +/* Sets MINMAX to either the constant value or the range VAL
> >>>>>>>> is in
> >>>>>>>> +   and returns true on success.  */
> >>>>>>>> +
> >>>>>>>> +static bool
> >>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
> >>>>>>>> NULL)
> >>>>>>>> +{
> >>>>>>>> +  if (tree_fits_uhwi_p (val))
> >>>>>>>> +    {
> >>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
> >>>>>>>> +      return true;
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
> >>>>>>>> +    return false;
> >>>>>>>> +
> >>>>>>>> +  if (rvals)
> >>>>>>>> +    {
> >>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
> >>>>>>>> +      if (gimple_assign_single_p (def)
> >>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
> >>>>>>>> +    {
> >>>>>>>> +      /* get_value_range returns [0, N] for constant
> >>>>>>>> assignments.  */
> >>>>>>>> +      val = gimple_assign_rhs1 (def);
> >>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
> >>>>>>>> +      return true;
> >>>>>>>> +    }
> >>>>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is
> >>>>>>> set
> >>>>>>> via a simple constant assignment, then the range ought to be a
> >>>>>>> singleton
> >>>>>>> ie [CONST,CONST].   Is there are particular test were this is not
> >>>>>>> true?
> >>>>>>>
> >>>>>>> The only way offhand I could see this happening is if originally
> >>>>>>> the RHS
> >>>>>>> wasn't a constant, but due to optimizations it either simplified
> >>>>>>> into a
> >>>>>>> constant or a constant was propagated into an SSA_NAME appearing on
> >>>>>>> the
> >>>>>>> RHS.  This would have to happen between the last range analysis and
> >>>>>>> the
> >>>>>>> point where you're making this query.
> >>>>>>
> >>>>>> Yes, I think that's right.  Here's an example where it happens:
> >>>>>>
> >>>>>>    void f (void)
> >>>>>>    {
> >>>>>>      char s[] = "1234";
> >>>>>>      unsigned n = strlen (s);
> >>>>>>      char vla[n];   // or malloc (n)
> >>>>>>      vla[n] = 0;    // n = [4, 4]
> >>>>>>      ...
> >>>>>>    }
> >>>>>>
> >>>>>> The strlen call is folded to 4 but that's not propagated to
> >>>>>> the access until sometime after the strlen pass is done.
> >>>>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
> >>>>> back of pass instance of vr_values.  If so, that might argue we want to
> >>>>> be setting it in vr_values too.
> >>>>
> >>>> No, set_range_info is only called for ranges.  In this case,
> >>>> handle_builtin_strlen replaces the strlen() call with 4:
> >>>>
> >>>>    s = "1234";
> >>>>    _1 = __builtin_strlen (&s);
> >>>>    n_2 = (unsigned int) _1;
> >>>>    a.1_8 = __builtin_alloca_with_align (_1, 8);
> >>>>    (*a.1_8)[n_2] = 0;
> >>>>
> >>>> When the access is made, the __builtin_alloca_with_align call
> >>>> is found as the destination and the _1 SSA_NAME is used to
> >>>> get its size.  We get back the range [4, 4].
> >>>
> >>> By the way, I glossed over one detail.  The above doesn't work
> >>> exactly as is because the allocation size is the SSA_NAME _1
> >>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
> >>> the range [0, 4]; the range is [0, 4] because it was set based
> >>> on the size of the argument to the strlen() call well before
> >>> the strlen pass even ran).
> >> Which would tend to argue that we should forward propagate the constant
> >> to the uses of _1.  That should expose that the RHS of the assignment to
> >> n_2 is a constant as well.
> >>
> >>
> >>>
> >>> To make it work across assignments we need to propagate the strlen
> >>> results down the CFG somehow.  I'm hoping the on-demand VRP will
> >>> do this automagically.
> >> It would, but it's probably more heavyweight than we need.  We just need
> >> to forward propagate the discovered constant to the use points and pick
> >> up any secondary opportunities that arise.
> >
> > Yes.  And the usual way of doing this is to keep a constant-and-copy
> > lattice (and for copies you'd need to track availability) and before optimizing
> > a stmt substitute its operands with the lattice contents.
> >
> > forwprop has a scheme that can be followed doing a RPO walk, strlen
> > does a DOM walk, there you can follow what DOM/PRE elimination do
> > (for tracking copy availability - if you just track constants you can
> > elide that).
> I'm also note sure handling copies is all that interesting here and if
> we just handle constants/invariants, then it's pretty easy.
>
> Whenever we replace a strlen call with a const, we put the LHS (assuming
> its an SSA_NAME) of the statement on a worklist.
>
> We pull items off the worklist and propagate the value to the use points
> and try to simplify the resulting statement.  If the RHS of the use
> point simplified to a constant, then put the LHS of the use statement
> onto the worklist.  Iterate until the list is empty.
>
> That would capture everything of interest I suspect and ought to be cheap.

That's the ad-hoc way, yes.

To Martin: no, this shouldn't be a prerequesite

> jeff
>
>
>
> I think whenever we substitute a constant or SSA_NAME for a strlen call,
> we can just replace uses of the LHS of the assignment with the
> const/copy.  Any statements we propagate into are put on a worklist.
>
> We pull statements off the worklist and try to simplify their RHS.  If
> the RHS simplifies to a const/copy, then then we repeat the process of
> propagating to the use points and
>
> x = <whatever>;
>
> If we find <whatever> collapses to a constant or copy we can just record
> it in SSA_NAME_VALUE.  As we walk through statements, we can propagate
> >
> > Richard.
> >
> >> jeff
> >>
> >
>
Jeff Law Nov. 17, 2019, 7 p.m. UTC | #17
On 11/13/19 7:34 AM, Richard Biener wrote:
> On Tue, Nov 12, 2019 at 6:55 PM Jeff Law <law@redhat.com> wrote:
>>
>> On 11/12/19 1:15 AM, Richard Biener wrote:
>>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <law@redhat.com> wrote:
>>>>
>>>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>>>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>>>>> stores mention the amount of data being stored and the amount of
>>>>>>>>>> space remaining in the destination, such as:
>>>>>>>>>>
>>>>>>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>>>>>>>>>
>>>>>>>>>>     123 |   *p = 0;
>>>>>>>>>>         |   ~~~^~~
>>>>>>>>>> note: destination object declared here
>>>>>>>>>>      45 |   char b[N];
>>>>>>>>>>         |        ^
>>>>>>>>>>
>>>>>>>>>> A warning like this can take some time to analyze.  First, the size
>>>>>>>>>> of the destination isn't mentioned and may not be easy to tell from
>>>>>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>>>>>> some non-trivial computation, chasing it down may be a small project
>>>>>>>>>> in and of itself.  Second, it's also not clear why the region size
>>>>>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>>>>
>>>>>>>>>> Mentioning both the size of the destination object and the offset
>>>>>>>>>> makes the existing messages clearer, are will become essential when
>>>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>>>>> follow-on patch does).
>>>>>>>>>>
>>>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>>>>> builtins.c).  With the change, the note above might read something
>>>>>>>>>> like:
>>>>>>>>>>
>>>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>>>>      45 |   char b[N];
>>>>>>>>>>         |        ^
>>>>>>>>>>
>>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>>
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>> gcc-store-offset.diff
>>>>>>>>>>
>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>
>>>>>>>>>>      * builtins.c (compute_objsize): Add an argument and set it to
>>>>>>>>>> offset
>>>>>>>>>>      into destination.
>>>>>>>>>>      * builtins.h (compute_objsize): Add an argument.
>>>>>>>>>>      * tree-object-size.c (addr_object_size): Add an argument and
>>>>>>>>>> set it
>>>>>>>>>>      to offset into destination.
>>>>>>>>>>      (compute_builtin_object_size): Same.
>>>>>>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an
>>>>>>>>>> argument.
>>>>>>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>>>>>>>>>> set it
>>>>>>>>>>      to offset into destination.
>>>>>>>>>>      (maybe_warn_overflow): New function.
>>>>>>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>>>>>>
>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>
>>>>>>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>>>>>>>>> messages.
>>>>>>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>>>>>> ===================================================================
>>>>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>>>>>> HOST_WIDE_INT, tree);
>>>>>>>>>>    static void handle_builtin_stxncpy (built_in_function,
>>>>>>>>>> gimple_stmt_iterator *);
>>>>>>>>>>    +/* Sets MINMAX to either the constant value or the range VAL
>>>>>>>>>> is in
>>>>>>>>>> +   and returns true on success.  */
>>>>>>>>>> +
>>>>>>>>>> +static bool
>>>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
>>>>>>>>>> NULL)
>>>>>>>>>> +{
>>>>>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>>>>>> +    {
>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>> +      return true;
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>>>>>> +    return false;
>>>>>>>>>> +
>>>>>>>>>> +  if (rvals)
>>>>>>>>>> +    {
>>>>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>>>>>> +      if (gimple_assign_single_p (def)
>>>>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>>>>>> +    {
>>>>>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>>>>>> assignments.  */
>>>>>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>> +      return true;
>>>>>>>>>> +    }
>>>>>>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is
>>>>>>>>> set
>>>>>>>>> via a simple constant assignment, then the range ought to be a
>>>>>>>>> singleton
>>>>>>>>> ie [CONST,CONST].   Is there are particular test were this is not
>>>>>>>>> true?
>>>>>>>>>
>>>>>>>>> The only way offhand I could see this happening is if originally
>>>>>>>>> the RHS
>>>>>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>>>>>> into a
>>>>>>>>> constant or a constant was propagated into an SSA_NAME appearing on
>>>>>>>>> the
>>>>>>>>> RHS.  This would have to happen between the last range analysis and
>>>>>>>>> the
>>>>>>>>> point where you're making this query.
>>>>>>>>
>>>>>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>>>>>
>>>>>>>>    void f (void)
>>>>>>>>    {
>>>>>>>>      char s[] = "1234";
>>>>>>>>      unsigned n = strlen (s);
>>>>>>>>      char vla[n];   // or malloc (n)
>>>>>>>>      vla[n] = 0;    // n = [4, 4]
>>>>>>>>      ...
>>>>>>>>    }
>>>>>>>>
>>>>>>>> The strlen call is folded to 4 but that's not propagated to
>>>>>>>> the access until sometime after the strlen pass is done.
>>>>>>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
>>>>>>> back of pass instance of vr_values.  If so, that might argue we want to
>>>>>>> be setting it in vr_values too.
>>>>>>
>>>>>> No, set_range_info is only called for ranges.  In this case,
>>>>>> handle_builtin_strlen replaces the strlen() call with 4:
>>>>>>
>>>>>>    s = "1234";
>>>>>>    _1 = __builtin_strlen (&s);
>>>>>>    n_2 = (unsigned int) _1;
>>>>>>    a.1_8 = __builtin_alloca_with_align (_1, 8);
>>>>>>    (*a.1_8)[n_2] = 0;
>>>>>>
>>>>>> When the access is made, the __builtin_alloca_with_align call
>>>>>> is found as the destination and the _1 SSA_NAME is used to
>>>>>> get its size.  We get back the range [4, 4].
>>>>>
>>>>> By the way, I glossed over one detail.  The above doesn't work
>>>>> exactly as is because the allocation size is the SSA_NAME _1
>>>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
>>>>> the range [0, 4]; the range is [0, 4] because it was set based
>>>>> on the size of the argument to the strlen() call well before
>>>>> the strlen pass even ran).
>>>> Which would tend to argue that we should forward propagate the constant
>>>> to the uses of _1.  That should expose that the RHS of the assignment to
>>>> n_2 is a constant as well.
>>>>
>>>>
>>>>>
>>>>> To make it work across assignments we need to propagate the strlen
>>>>> results down the CFG somehow.  I'm hoping the on-demand VRP will
>>>>> do this automagically.
>>>> It would, but it's probably more heavyweight than we need.  We just need
>>>> to forward propagate the discovered constant to the use points and pick
>>>> up any secondary opportunities that arise.
>>>
>>> Yes.  And the usual way of doing this is to keep a constant-and-copy
>>> lattice (and for copies you'd need to track availability) and before optimizing
>>> a stmt substitute its operands with the lattice contents.
>>>
>>> forwprop has a scheme that can be followed doing a RPO walk, strlen
>>> does a DOM walk, there you can follow what DOM/PRE elimination do
>>> (for tracking copy availability - if you just track constants you can
>>> elide that).
>> I'm also note sure handling copies is all that interesting here and if
>> we just handle constants/invariants, then it's pretty easy.
>>
>> Whenever we replace a strlen call with a const, we put the LHS (assuming
>> its an SSA_NAME) of the statement on a worklist.
>>
>> We pull items off the worklist and propagate the value to the use points
>> and try to simplify the resulting statement.  If the RHS of the use
>> point simplified to a constant, then put the LHS of the use statement
>> onto the worklist.  Iterate until the list is empty.
>>
>> That would capture everything of interest I suspect and ought to be cheap.
> 
> That's the ad-hoc way, yes.

> 
> To Martin: no, this shouldn't be a prerequesite
I think that leaves the question of what to do with that one hunk in the
kit.  I think our choices are:

1. Leave the hunk as-is.

2. Pull out the hunk and adjust tests (with xfails I'd think).


I'd tend to lean towards #2, but I don't know how badly we'd be
compromising Martin's goals.

Jeff
Jeff Law Nov. 17, 2019, 7:03 p.m. UTC | #18
On 11/12/19 12:55 PM, Martin Sebor wrote:
> On 11/12/19 10:54 AM, Jeff Law wrote:
>> On 11/12/19 1:15 AM, Richard Biener wrote:
>>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <law@redhat.com> wrote:
>>>>
>>>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>>>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>>>>> stores mention the amount of data being stored and the amount of
>>>>>>>>>> space remaining in the destination, such as:
>>>>>>>>>>
>>>>>>>>>> warning: writing 4 bytes into a region of size 0
>>>>>>>>>> [-Wstringop-overflow=]
>>>>>>>>>>
>>>>>>>>>>      123 |   *p = 0;
>>>>>>>>>>          |   ~~~^~~
>>>>>>>>>> note: destination object declared here
>>>>>>>>>>       45 |   char b[N];
>>>>>>>>>>          |        ^
>>>>>>>>>>
>>>>>>>>>> A warning like this can take some time to analyze.  First, the
>>>>>>>>>> size
>>>>>>>>>> of the destination isn't mentioned and may not be easy to tell
>>>>>>>>>> from
>>>>>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>>>>>> some non-trivial computation, chasing it down may be a small
>>>>>>>>>> project
>>>>>>>>>> in and of itself.  Second, it's also not clear why the region
>>>>>>>>>> size
>>>>>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>>>>
>>>>>>>>>> Mentioning both the size of the destination object and the offset
>>>>>>>>>> makes the existing messages clearer, are will become essential
>>>>>>>>>> when
>>>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>>>>> follow-on patch does).
>>>>>>>>>>
>>>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>>>>> builtins.c).  With the change, the note above might read
>>>>>>>>>> something
>>>>>>>>>> like:
>>>>>>>>>>
>>>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>>>>       45 |   char b[N];
>>>>>>>>>>          |        ^
>>>>>>>>>>
>>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>>
>>>>>>>>>> Martin
>>>>>>>>>>
>>>>>>>>>> gcc-store-offset.diff
>>>>>>>>>>
>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>
>>>>>>>>>>       * builtins.c (compute_objsize): Add an argument and set
>>>>>>>>>> it to
>>>>>>>>>> offset
>>>>>>>>>>       into destination.
>>>>>>>>>>       * builtins.h (compute_objsize): Add an argument.
>>>>>>>>>>       * tree-object-size.c (addr_object_size): Add an argument
>>>>>>>>>> and
>>>>>>>>>> set it
>>>>>>>>>>       to offset into destination.
>>>>>>>>>>       (compute_builtin_object_size): Same.
>>>>>>>>>>       * tree-object-size.h (compute_builtin_object_size): Add an
>>>>>>>>>> argument.
>>>>>>>>>>       * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>>>>>>>>>> set it
>>>>>>>>>>       to offset into destination.
>>>>>>>>>>       (maybe_warn_overflow): New function.
>>>>>>>>>>       (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>>>>>>
>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>
>>>>>>>>>>       * c-c++-common/Wstringop-overflow-2.c: Adjust text of
>>>>>>>>>> expected
>>>>>>>>>> messages.
>>>>>>>>>>       * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>>>>>       * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>>>>>> ===================================================================
>>>>>>>>>>
>>>>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>>>>>     static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>>>>>> HOST_WIDE_INT, tree);
>>>>>>>>>>     static void handle_builtin_stxncpy (built_in_function,
>>>>>>>>>> gimple_stmt_iterator *);
>>>>>>>>>>     +/* Sets MINMAX to either the constant value or the range VAL
>>>>>>>>>> is in
>>>>>>>>>> +   and returns true on success.  */
>>>>>>>>>> +
>>>>>>>>>> +static bool
>>>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values
>>>>>>>>>> *rvals =
>>>>>>>>>> NULL)
>>>>>>>>>> +{
>>>>>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>>>>>> +    {
>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>> +      return true;
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>>>>>> +    return false;
>>>>>>>>>> +
>>>>>>>>>> +  if (rvals)
>>>>>>>>>> +    {
>>>>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>>>>>> +      if (gimple_assign_single_p (def)
>>>>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>>>>>> +    {
>>>>>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>>>>>> assignments.  */
>>>>>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>> +      return true;
>>>>>>>>>> +    }
>>>>>>>>> Umm, something seems really off with this hunk.  If the
>>>>>>>>> SSA_NAME is
>>>>>>>>> set
>>>>>>>>> via a simple constant assignment, then the range ought to be a
>>>>>>>>> singleton
>>>>>>>>> ie [CONST,CONST].   Is there are particular test were this is not
>>>>>>>>> true?
>>>>>>>>>
>>>>>>>>> The only way offhand I could see this happening is if originally
>>>>>>>>> the RHS
>>>>>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>>>>>> into a
>>>>>>>>> constant or a constant was propagated into an SSA_NAME
>>>>>>>>> appearing on
>>>>>>>>> the
>>>>>>>>> RHS.  This would have to happen between the last range analysis
>>>>>>>>> and
>>>>>>>>> the
>>>>>>>>> point where you're making this query.
>>>>>>>>
>>>>>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>>>>>
>>>>>>>>     void f (void)
>>>>>>>>     {
>>>>>>>>       char s[] = "1234";
>>>>>>>>       unsigned n = strlen (s);
>>>>>>>>       char vla[n];   // or malloc (n)
>>>>>>>>       vla[n] = 0;    // n = [4, 4]
>>>>>>>>       ...
>>>>>>>>     }
>>>>>>>>
>>>>>>>> The strlen call is folded to 4 but that's not propagated to
>>>>>>>> the access until sometime after the strlen pass is done.
>>>>>>> Hmm.  Are we calling set_range_info in that case?  That goes
>>>>>>> behind the
>>>>>>> back of pass instance of vr_values.  If so, that might argue we
>>>>>>> want to
>>>>>>> be setting it in vr_values too.
>>>>>>
>>>>>> No, set_range_info is only called for ranges.  In this case,
>>>>>> handle_builtin_strlen replaces the strlen() call with 4:
>>>>>>
>>>>>>     s = "1234";
>>>>>>     _1 = __builtin_strlen (&s);
>>>>>>     n_2 = (unsigned int) _1;
>>>>>>     a.1_8 = __builtin_alloca_with_align (_1, 8);
>>>>>>     (*a.1_8)[n_2] = 0;
>>>>>>
>>>>>> When the access is made, the __builtin_alloca_with_align call
>>>>>> is found as the destination and the _1 SSA_NAME is used to
>>>>>> get its size.  We get back the range [4, 4].
>>>>>
>>>>> By the way, I glossed over one detail.  The above doesn't work
>>>>> exactly as is because the allocation size is the SSA_NAME _1
>>>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
>>>>> the range [0, 4]; the range is [0, 4] because it was set based
>>>>> on the size of the argument to the strlen() call well before
>>>>> the strlen pass even ran).
>>>> Which would tend to argue that we should forward propagate the constant
>>>> to the uses of _1.  That should expose that the RHS of the
>>>> assignment to
>>>> n_2 is a constant as well.
>>>>
>>>>
>>>>>
>>>>> To make it work across assignments we need to propagate the strlen
>>>>> results down the CFG somehow.  I'm hoping the on-demand VRP will
>>>>> do this automagically.
>>>> It would, but it's probably more heavyweight than we need.  We just
>>>> need
>>>> to forward propagate the discovered constant to the use points and pick
>>>> up any secondary opportunities that arise.
>>>
>>> Yes.  And the usual way of doing this is to keep a constant-and-copy
>>> lattice (and for copies you'd need to track availability) and before
>>> optimizing
>>> a stmt substitute its operands with the lattice contents.
>>>
>>> forwprop has a scheme that can be followed doing a RPO walk, strlen
>>> does a DOM walk, there you can follow what DOM/PRE elimination do
>>> (for tracking copy availability - if you just track constants you can
>>> elide that).
>> I'm also note sure handling copies is all that interesting here and if
>> we just handle constants/invariants, then it's pretty easy.
>>
>> Whenever we replace a strlen call with a const, we put the LHS (assuming
>> its an SSA_NAME) of the statement on a worklist.
>>
>> We pull items off the worklist and propagate the value to the use points
>> and try to simplify the resulting statement.  If the RHS of the use
>> point simplified to a constant, then put the LHS of the use statement
>> onto the worklist.  Iterate until the list is empty.
>>
>> That would capture everything of interest I suspect and ought to be
>> cheap.
> 
> This sounds like a significant project of its own, and well beyond
> the scope of the simple infrastructure enhancement I'm making here:
> all this does is improve the accuracy of the diagnostics.  Is
> implementing it a precondition for accepting this patch?
It's not bad as probably 90%+ of the code could be cribbed from
elsewhere and a simple API built around it.

> 
> If yes, since I don't think I have the time for it for GCC 10
> I need to decide whether to drop just this improvement or all
> of the buffer overflow checks that depend on it.
> 
> Let me know which you prefer.
As I mentioned in my previous message, I think we've got two potential
paths.  We could just drop the problem hunk and adjust the tests with
xfails, but I'm not sure how badly that compromises what you're trying
to do.  We could also leave the hunk in with a fixme or somesuch.

Jeff
Jeff Law Nov. 17, 2019, 7:04 p.m. UTC | #19
On 11/12/19 1:16 AM, Richard Biener wrote:
> On Tue, Nov 12, 2019 at 9:15 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <law@redhat.com> wrote:
>>>
>>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>>>> stores mention the amount of data being stored and the amount of
>>>>>>>>> space remaining in the destination, such as:
>>>>>>>>>
>>>>>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=]
>>>>>>>>>
>>>>>>>>>     123 |   *p = 0;
>>>>>>>>>         |   ~~~^~~
>>>>>>>>> note: destination object declared here
>>>>>>>>>      45 |   char b[N];
>>>>>>>>>         |        ^
>>>>>>>>>
>>>>>>>>> A warning like this can take some time to analyze.  First, the size
>>>>>>>>> of the destination isn't mentioned and may not be easy to tell from
>>>>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>>>>> some non-trivial computation, chasing it down may be a small project
>>>>>>>>> in and of itself.  Second, it's also not clear why the region size
>>>>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>>>
>>>>>>>>> Mentioning both the size of the destination object and the offset
>>>>>>>>> makes the existing messages clearer, are will become essential when
>>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>>>> follow-on patch does).
>>>>>>>>>
>>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>>>> builtins.c).  With the change, the note above might read something
>>>>>>>>> like:
>>>>>>>>>
>>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>>>      45 |   char b[N];
>>>>>>>>>         |        ^
>>>>>>>>>
>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>
>>>>>>>>> Martin
>>>>>>>>>
>>>>>>>>> gcc-store-offset.diff
>>>>>>>>>
>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>
>>>>>>>>>      * builtins.c (compute_objsize): Add an argument and set it to
>>>>>>>>> offset
>>>>>>>>>      into destination.
>>>>>>>>>      * builtins.h (compute_objsize): Add an argument.
>>>>>>>>>      * tree-object-size.c (addr_object_size): Add an argument and
>>>>>>>>> set it
>>>>>>>>>      to offset into destination.
>>>>>>>>>      (compute_builtin_object_size): Same.
>>>>>>>>>      * tree-object-size.h (compute_builtin_object_size): Add an
>>>>>>>>> argument.
>>>>>>>>>      * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>>>>>>>>> set it
>>>>>>>>>      to offset into destination.
>>>>>>>>>      (maybe_warn_overflow): New function.
>>>>>>>>>      (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>>>>>
>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>
>>>>>>>>>      * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected
>>>>>>>>> messages.
>>>>>>>>>      * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>>>>      * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>>>>
>>>>>>>>
>>>>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>>>>> ===================================================================
>>>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>>>>    static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>>>>> HOST_WIDE_INT, tree);
>>>>>>>>>    static void handle_builtin_stxncpy (built_in_function,
>>>>>>>>> gimple_stmt_iterator *);
>>>>>>>>>    +/* Sets MINMAX to either the constant value or the range VAL
>>>>>>>>> is in
>>>>>>>>> +   and returns true on success.  */
>>>>>>>>> +
>>>>>>>>> +static bool
>>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals =
>>>>>>>>> NULL)
>>>>>>>>> +{
>>>>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>>>>> +    {
>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>> +      return true;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>>>>> +    return false;
>>>>>>>>> +
>>>>>>>>> +  if (rvals)
>>>>>>>>> +    {
>>>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>>>>> +      if (gimple_assign_single_p (def)
>>>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>>>>> +    {
>>>>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>>>>> assignments.  */
>>>>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>> +      return true;
>>>>>>>>> +    }
>>>>>>>> Umm, something seems really off with this hunk.  If the SSA_NAME is
>>>>>>>> set
>>>>>>>> via a simple constant assignment, then the range ought to be a
>>>>>>>> singleton
>>>>>>>> ie [CONST,CONST].   Is there are particular test were this is not
>>>>>>>> true?
>>>>>>>>
>>>>>>>> The only way offhand I could see this happening is if originally
>>>>>>>> the RHS
>>>>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>>>>> into a
>>>>>>>> constant or a constant was propagated into an SSA_NAME appearing on
>>>>>>>> the
>>>>>>>> RHS.  This would have to happen between the last range analysis and
>>>>>>>> the
>>>>>>>> point where you're making this query.
>>>>>>>
>>>>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>>>>
>>>>>>>    void f (void)
>>>>>>>    {
>>>>>>>      char s[] = "1234";
>>>>>>>      unsigned n = strlen (s);
>>>>>>>      char vla[n];   // or malloc (n)
>>>>>>>      vla[n] = 0;    // n = [4, 4]
>>>>>>>      ...
>>>>>>>    }
>>>>>>>
>>>>>>> The strlen call is folded to 4 but that's not propagated to
>>>>>>> the access until sometime after the strlen pass is done.
>>>>>> Hmm.  Are we calling set_range_info in that case?  That goes behind the
>>>>>> back of pass instance of vr_values.  If so, that might argue we want to
>>>>>> be setting it in vr_values too.
>>>>>
>>>>> No, set_range_info is only called for ranges.  In this case,
>>>>> handle_builtin_strlen replaces the strlen() call with 4:
>>>>>
>>>>>    s = "1234";
>>>>>    _1 = __builtin_strlen (&s);
>>>>>    n_2 = (unsigned int) _1;
>>>>>    a.1_8 = __builtin_alloca_with_align (_1, 8);
>>>>>    (*a.1_8)[n_2] = 0;
>>>>>
>>>>> When the access is made, the __builtin_alloca_with_align call
>>>>> is found as the destination and the _1 SSA_NAME is used to
>>>>> get its size.  We get back the range [4, 4].
>>>>
>>>> By the way, I glossed over one detail.  The above doesn't work
>>>> exactly as is because the allocation size is the SSA_NAME _1
>>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
>>>> the range [0, 4]; the range is [0, 4] because it was set based
>>>> on the size of the argument to the strlen() call well before
>>>> the strlen pass even ran).
>>> Which would tend to argue that we should forward propagate the constant
>>> to the uses of _1.  That should expose that the RHS of the assignment to
>>> n_2 is a constant as well.
>>>
>>>
>>>>
>>>> To make it work across assignments we need to propagate the strlen
>>>> results down the CFG somehow.  I'm hoping the on-demand VRP will
>>>> do this automagically.
>>> It would, but it's probably more heavyweight than we need.  We just need
>>> to forward propagate the discovered constant to the use points and pick
>>> up any secondary opportunities that arise.
>>
>> Yes.  And the usual way of doing this is to keep a constant-and-copy
>> lattice (and for copies you'd need to track availability) and before optimizing
>> a stmt substitute its operands with the lattice contents.
>>
>> forwprop has a scheme that can be followed doing a RPO walk, strlen
>> does a DOM walk, there you can follow what DOM/PRE elimination do
>> (for tracking copy availability - if you just track constants you can
>> elide that).
> 
> I guess we could enhance domwalk with lattice tracking utilities as well
> (in a derived class).
Yea.  That would actually be helpful in other contexts as well.

jeff
Martin Sebor Nov. 18, 2019, 4:15 p.m. UTC | #20
On 11/17/19 12:03 PM, Jeff Law wrote:
> On 11/12/19 12:55 PM, Martin Sebor wrote:
>> On 11/12/19 10:54 AM, Jeff Law wrote:
>>> On 11/12/19 1:15 AM, Richard Biener wrote:
>>>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <law@redhat.com> wrote:
>>>>>
>>>>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>>>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>>>>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>>>>>> stores mention the amount of data being stored and the amount of
>>>>>>>>>>> space remaining in the destination, such as:
>>>>>>>>>>>
>>>>>>>>>>> warning: writing 4 bytes into a region of size 0
>>>>>>>>>>> [-Wstringop-overflow=]
>>>>>>>>>>>
>>>>>>>>>>>       123 |   *p = 0;
>>>>>>>>>>>           |   ~~~^~~
>>>>>>>>>>> note: destination object declared here
>>>>>>>>>>>        45 |   char b[N];
>>>>>>>>>>>           |        ^
>>>>>>>>>>>
>>>>>>>>>>> A warning like this can take some time to analyze.  First, the
>>>>>>>>>>> size
>>>>>>>>>>> of the destination isn't mentioned and may not be easy to tell
>>>>>>>>>>> from
>>>>>>>>>>> the sources.  In the note above, when N's value is the result of
>>>>>>>>>>> some non-trivial computation, chasing it down may be a small
>>>>>>>>>>> project
>>>>>>>>>>> in and of itself.  Second, it's also not clear why the region
>>>>>>>>>>> size
>>>>>>>>>>> is zero.  It could be because the offset is exactly N, or because
>>>>>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>>>>>
>>>>>>>>>>> Mentioning both the size of the destination object and the offset
>>>>>>>>>>> makes the existing messages clearer, are will become essential
>>>>>>>>>>> when
>>>>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>>>>>> follow-on patch does).
>>>>>>>>>>>
>>>>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>>>>>> builtins.c).  With the change, the note above might read
>>>>>>>>>>> something
>>>>>>>>>>> like:
>>>>>>>>>>>
>>>>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>>>>>        45 |   char b[N];
>>>>>>>>>>>           |        ^
>>>>>>>>>>>
>>>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>>>
>>>>>>>>>>> Martin
>>>>>>>>>>>
>>>>>>>>>>> gcc-store-offset.diff
>>>>>>>>>>>
>>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>>        * builtins.c (compute_objsize): Add an argument and set
>>>>>>>>>>> it to
>>>>>>>>>>> offset
>>>>>>>>>>>        into destination.
>>>>>>>>>>>        * builtins.h (compute_objsize): Add an argument.
>>>>>>>>>>>        * tree-object-size.c (addr_object_size): Add an argument
>>>>>>>>>>> and
>>>>>>>>>>> set it
>>>>>>>>>>>        to offset into destination.
>>>>>>>>>>>        (compute_builtin_object_size): Same.
>>>>>>>>>>>        * tree-object-size.h (compute_builtin_object_size): Add an
>>>>>>>>>>> argument.
>>>>>>>>>>>        * tree-ssa-strlen.c (get_addr_stridx): Add an argument and
>>>>>>>>>>> set it
>>>>>>>>>>>        to offset into destination.
>>>>>>>>>>>        (maybe_warn_overflow): New function.
>>>>>>>>>>>        (handle_store): Call maybe_warn_overflow to issue warnings.
>>>>>>>>>>>
>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>>        * c-c++-common/Wstringop-overflow-2.c: Adjust text of
>>>>>>>>>>> expected
>>>>>>>>>>> messages.
>>>>>>>>>>>        * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>>>>>>        * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>>>>>>> ===================================================================
>>>>>>>>>>>
>>>>>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>>>>>>      static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>>>>>>> HOST_WIDE_INT, tree);
>>>>>>>>>>>      static void handle_builtin_stxncpy (built_in_function,
>>>>>>>>>>> gimple_stmt_iterator *);
>>>>>>>>>>>      +/* Sets MINMAX to either the constant value or the range VAL
>>>>>>>>>>> is in
>>>>>>>>>>> +   and returns true on success.  */
>>>>>>>>>>> +
>>>>>>>>>>> +static bool
>>>>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values
>>>>>>>>>>> *rvals =
>>>>>>>>>>> NULL)
>>>>>>>>>>> +{
>>>>>>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>>>>>>> +    {
>>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>>> +      return true;
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>>>>>>> +    return false;
>>>>>>>>>>> +
>>>>>>>>>>> +  if (rvals)
>>>>>>>>>>> +    {
>>>>>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>>>>>>> +      if (gimple_assign_single_p (def)
>>>>>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>>>>>>> +    {
>>>>>>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>>>>>>> assignments.  */
>>>>>>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>>> +      return true;
>>>>>>>>>>> +    }
>>>>>>>>>> Umm, something seems really off with this hunk.  If the
>>>>>>>>>> SSA_NAME is
>>>>>>>>>> set
>>>>>>>>>> via a simple constant assignment, then the range ought to be a
>>>>>>>>>> singleton
>>>>>>>>>> ie [CONST,CONST].   Is there are particular test were this is not
>>>>>>>>>> true?
>>>>>>>>>>
>>>>>>>>>> The only way offhand I could see this happening is if originally
>>>>>>>>>> the RHS
>>>>>>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>>>>>>> into a
>>>>>>>>>> constant or a constant was propagated into an SSA_NAME
>>>>>>>>>> appearing on
>>>>>>>>>> the
>>>>>>>>>> RHS.  This would have to happen between the last range analysis
>>>>>>>>>> and
>>>>>>>>>> the
>>>>>>>>>> point where you're making this query.
>>>>>>>>>
>>>>>>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>>>>>>
>>>>>>>>>      void f (void)
>>>>>>>>>      {
>>>>>>>>>        char s[] = "1234";
>>>>>>>>>        unsigned n = strlen (s);
>>>>>>>>>        char vla[n];   // or malloc (n)
>>>>>>>>>        vla[n] = 0;    // n = [4, 4]
>>>>>>>>>        ...
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> The strlen call is folded to 4 but that's not propagated to
>>>>>>>>> the access until sometime after the strlen pass is done.
>>>>>>>> Hmm.  Are we calling set_range_info in that case?  That goes
>>>>>>>> behind the
>>>>>>>> back of pass instance of vr_values.  If so, that might argue we
>>>>>>>> want to
>>>>>>>> be setting it in vr_values too.
>>>>>>>
>>>>>>> No, set_range_info is only called for ranges.  In this case,
>>>>>>> handle_builtin_strlen replaces the strlen() call with 4:
>>>>>>>
>>>>>>>      s = "1234";
>>>>>>>      _1 = __builtin_strlen (&s);
>>>>>>>      n_2 = (unsigned int) _1;
>>>>>>>      a.1_8 = __builtin_alloca_with_align (_1, 8);
>>>>>>>      (*a.1_8)[n_2] = 0;
>>>>>>>
>>>>>>> When the access is made, the __builtin_alloca_with_align call
>>>>>>> is found as the destination and the _1 SSA_NAME is used to
>>>>>>> get its size.  We get back the range [4, 4].
>>>>>>
>>>>>> By the way, I glossed over one detail.  The above doesn't work
>>>>>> exactly as is because the allocation size is the SSA_NAME _1
>>>>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
>>>>>> the range [0, 4]; the range is [0, 4] because it was set based
>>>>>> on the size of the argument to the strlen() call well before
>>>>>> the strlen pass even ran).
>>>>> Which would tend to argue that we should forward propagate the constant
>>>>> to the uses of _1.  That should expose that the RHS of the
>>>>> assignment to
>>>>> n_2 is a constant as well.
>>>>>
>>>>>
>>>>>>
>>>>>> To make it work across assignments we need to propagate the strlen
>>>>>> results down the CFG somehow.  I'm hoping the on-demand VRP will
>>>>>> do this automagically.
>>>>> It would, but it's probably more heavyweight than we need.  We just
>>>>> need
>>>>> to forward propagate the discovered constant to the use points and pick
>>>>> up any secondary opportunities that arise.
>>>>
>>>> Yes.  And the usual way of doing this is to keep a constant-and-copy
>>>> lattice (and for copies you'd need to track availability) and before
>>>> optimizing
>>>> a stmt substitute its operands with the lattice contents.
>>>>
>>>> forwprop has a scheme that can be followed doing a RPO walk, strlen
>>>> does a DOM walk, there you can follow what DOM/PRE elimination do
>>>> (for tracking copy availability - if you just track constants you can
>>>> elide that).
>>> I'm also note sure handling copies is all that interesting here and if
>>> we just handle constants/invariants, then it's pretty easy.
>>>
>>> Whenever we replace a strlen call with a const, we put the LHS (assuming
>>> its an SSA_NAME) of the statement on a worklist.
>>>
>>> We pull items off the worklist and propagate the value to the use points
>>> and try to simplify the resulting statement.  If the RHS of the use
>>> point simplified to a constant, then put the LHS of the use statement
>>> onto the worklist.  Iterate until the list is empty.
>>>
>>> That would capture everything of interest I suspect and ought to be
>>> cheap.
>>
>> This sounds like a significant project of its own, and well beyond
>> the scope of the simple infrastructure enhancement I'm making here:
>> all this does is improve the accuracy of the diagnostics.  Is
>> implementing it a precondition for accepting this patch?
> It's not bad as probably 90%+ of the code could be cribbed from
> elsewhere and a simple API built around it.

Maybe.  I just don't expect to have the time to do it.  Plus, with
the on-demand VRP expected to land in GCC 11, I don't think it would
be time well spent.  We'd be adding code only to remove it a few
months from now.  (Unless, of course, the new VRP doesn't handle it.)

> 
>>
>> If yes, since I don't think I have the time for it for GCC 10
>> I need to decide whether to drop just this improvement or all
>> of the buffer overflow checks that depend on it.
>>
>> Let me know which you prefer.
> As I mentioned in my previous message, I think we've got two potential
> paths.  We could just drop the problem hunk and adjust the tests with
> xfails, but I'm not sure how badly that compromises what you're trying
> to do.  We could also leave the hunk in with a fixme or somesuch.

It would compromise the buffer overflow detection in cases where
the length of a string computed by the pass is then used in the same
function to access another string.  There are more of these now as
the strlen has been getting better and better at tracking the string
lenghts.

Rather than using fixed size arrays, code that deals with strings
of unlimited lengths has to allocate space for them dynamically.
Until now, the strlen pass hasn't tried to detect or prevent
overflow into those (and transformed writes to them in ways that
prevented it from being detected later by _FORTIFY_SOURCE).  But
the follow-on enhancement to this patch changes that, and so this
limitation becomes more of an impediment.

But I don't understand why you think using the RHS of an SSA_NAME
assignment is a problem when it's an INTEGER_CST.  Is that unsafe
for some reason, or likely to fail somehow?

Martin
Jeff Law Nov. 20, 2019, 5:47 p.m. UTC | #21
On 11/18/19 9:15 AM, Martin Sebor wrote:
> On 11/17/19 12:03 PM, Jeff Law wrote:
>> On 11/12/19 12:55 PM, Martin Sebor wrote:
>>> On 11/12/19 10:54 AM, Jeff Law wrote:
>>>> On 11/12/19 1:15 AM, Richard Biener wrote:
>>>>> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law <law@redhat.com> wrote:
>>>>>>
>>>>>> On 11/6/19 3:34 PM, Martin Sebor wrote:
>>>>>>> On 11/6/19 2:06 PM, Martin Sebor wrote:
>>>>>>>> On 11/6/19 1:39 PM, Jeff Law wrote:
>>>>>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote:
>>>>>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote:
>>>>>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote:
>>>>>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte
>>>>>>>>>>>> stores mention the amount of data being stored and the
>>>>>>>>>>>> amount of
>>>>>>>>>>>> space remaining in the destination, such as:
>>>>>>>>>>>>
>>>>>>>>>>>> warning: writing 4 bytes into a region of size 0
>>>>>>>>>>>> [-Wstringop-overflow=]
>>>>>>>>>>>>
>>>>>>>>>>>>       123 |   *p = 0;
>>>>>>>>>>>>           |   ~~~^~~
>>>>>>>>>>>> note: destination object declared here
>>>>>>>>>>>>        45 |   char b[N];
>>>>>>>>>>>>           |        ^
>>>>>>>>>>>>
>>>>>>>>>>>> A warning like this can take some time to analyze.  First, the
>>>>>>>>>>>> size
>>>>>>>>>>>> of the destination isn't mentioned and may not be easy to tell
>>>>>>>>>>>> from
>>>>>>>>>>>> the sources.  In the note above, when N's value is the
>>>>>>>>>>>> result of
>>>>>>>>>>>> some non-trivial computation, chasing it down may be a small
>>>>>>>>>>>> project
>>>>>>>>>>>> in and of itself.  Second, it's also not clear why the region
>>>>>>>>>>>> size
>>>>>>>>>>>> is zero.  It could be because the offset is exactly N, or
>>>>>>>>>>>> because
>>>>>>>>>>>> it's negative, or because it's in some range greater than N.
>>>>>>>>>>>>
>>>>>>>>>>>> Mentioning both the size of the destination object and the
>>>>>>>>>>>> offset
>>>>>>>>>>>> makes the existing messages clearer, are will become essential
>>>>>>>>>>>> when
>>>>>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my
>>>>>>>>>>>> follow-on patch does).
>>>>>>>>>>>>
>>>>>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by
>>>>>>>>>>>> letting compute_objsize return the offset to its caller, doing
>>>>>>>>>>>> something similar in get_stridx, and adding a new function to
>>>>>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd
>>>>>>>>>>>> like the function to replace the -Wstringop-overflow handler in
>>>>>>>>>>>> builtins.c).  With the change, the note above might read
>>>>>>>>>>>> something
>>>>>>>>>>>> like:
>>>>>>>>>>>>
>>>>>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here
>>>>>>>>>>>>        45 |   char b[N];
>>>>>>>>>>>>           |        ^
>>>>>>>>>>>>
>>>>>>>>>>>> Tested on x86_64-linux.
>>>>>>>>>>>>
>>>>>>>>>>>> Martin
>>>>>>>>>>>>
>>>>>>>>>>>> gcc-store-offset.diff
>>>>>>>>>>>>
>>>>>>>>>>>> gcc/ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>>        * builtins.c (compute_objsize): Add an argument and set
>>>>>>>>>>>> it to
>>>>>>>>>>>> offset
>>>>>>>>>>>>        into destination.
>>>>>>>>>>>>        * builtins.h (compute_objsize): Add an argument.
>>>>>>>>>>>>        * tree-object-size.c (addr_object_size): Add an argument
>>>>>>>>>>>> and
>>>>>>>>>>>> set it
>>>>>>>>>>>>        to offset into destination.
>>>>>>>>>>>>        (compute_builtin_object_size): Same.
>>>>>>>>>>>>        * tree-object-size.h (compute_builtin_object_size):
>>>>>>>>>>>> Add an
>>>>>>>>>>>> argument.
>>>>>>>>>>>>        * tree-ssa-strlen.c (get_addr_stridx): Add an
>>>>>>>>>>>> argument and
>>>>>>>>>>>> set it
>>>>>>>>>>>>        to offset into destination.
>>>>>>>>>>>>        (maybe_warn_overflow): New function.
>>>>>>>>>>>>        (handle_store): Call maybe_warn_overflow to issue
>>>>>>>>>>>> warnings.
>>>>>>>>>>>>
>>>>>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>>>>>
>>>>>>>>>>>>        * c-c++-common/Wstringop-overflow-2.c: Adjust text of
>>>>>>>>>>>> expected
>>>>>>>>>>>> messages.
>>>>>>>>>>>>        * g++.dg/warn/Wstringop-overflow-3.C: Same.
>>>>>>>>>>>>        * gcc.dg/Wstringop-overflow-17.c: Same.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Index: gcc/tree-ssa-strlen.c
>>>>>>>>>>>> ===================================================================
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --- gcc/tree-ssa-strlen.c    (revision 277886)
>>>>>>>>>>>> +++ gcc/tree-ssa-strlen.c    (working copy)
>>>>>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct
>>>>>>>>>>>>      static int get_stridx_plus_constant (strinfo *, unsigned
>>>>>>>>>>>> HOST_WIDE_INT, tree);
>>>>>>>>>>>>      static void handle_builtin_stxncpy (built_in_function,
>>>>>>>>>>>> gimple_stmt_iterator *);
>>>>>>>>>>>>      +/* Sets MINMAX to either the constant value or the
>>>>>>>>>>>> range VAL
>>>>>>>>>>>> is in
>>>>>>>>>>>> +   and returns true on success.  */
>>>>>>>>>>>> +
>>>>>>>>>>>> +static bool
>>>>>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values
>>>>>>>>>>>> *rvals =
>>>>>>>>>>>> NULL)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +  if (tree_fits_uhwi_p (val))
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>>>> +      return true;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +  if (TREE_CODE (val) != SSA_NAME)
>>>>>>>>>>>> +    return false;
>>>>>>>>>>>> +
>>>>>>>>>>>> +  if (rvals)
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +      gimple *def = SSA_NAME_DEF_STMT (val);
>>>>>>>>>>>> +      if (gimple_assign_single_p (def)
>>>>>>>>>>>> +      && gimple_assign_rhs_code (def) == INTEGER_CST)
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +      /* get_value_range returns [0, N] for constant
>>>>>>>>>>>> assignments.  */
>>>>>>>>>>>> +      val = gimple_assign_rhs1 (def);
>>>>>>>>>>>> +      minmax[0] = minmax[1] = wi::to_wide (val);
>>>>>>>>>>>> +      return true;
>>>>>>>>>>>> +    }
>>>>>>>>>>> Umm, something seems really off with this hunk.  If the
>>>>>>>>>>> SSA_NAME is
>>>>>>>>>>> set
>>>>>>>>>>> via a simple constant assignment, then the range ought to be a
>>>>>>>>>>> singleton
>>>>>>>>>>> ie [CONST,CONST].   Is there are particular test were this is
>>>>>>>>>>> not
>>>>>>>>>>> true?
>>>>>>>>>>>
>>>>>>>>>>> The only way offhand I could see this happening is if originally
>>>>>>>>>>> the RHS
>>>>>>>>>>> wasn't a constant, but due to optimizations it either simplified
>>>>>>>>>>> into a
>>>>>>>>>>> constant or a constant was propagated into an SSA_NAME
>>>>>>>>>>> appearing on
>>>>>>>>>>> the
>>>>>>>>>>> RHS.  This would have to happen between the last range analysis
>>>>>>>>>>> and
>>>>>>>>>>> the
>>>>>>>>>>> point where you're making this query.
>>>>>>>>>>
>>>>>>>>>> Yes, I think that's right.  Here's an example where it happens:
>>>>>>>>>>
>>>>>>>>>>      void f (void)
>>>>>>>>>>      {
>>>>>>>>>>        char s[] = "1234";
>>>>>>>>>>        unsigned n = strlen (s);
>>>>>>>>>>        char vla[n];   // or malloc (n)
>>>>>>>>>>        vla[n] = 0;    // n = [4, 4]
>>>>>>>>>>        ...
>>>>>>>>>>      }
>>>>>>>>>>
>>>>>>>>>> The strlen call is folded to 4 but that's not propagated to
>>>>>>>>>> the access until sometime after the strlen pass is done.
>>>>>>>>> Hmm.  Are we calling set_range_info in that case?  That goes
>>>>>>>>> behind the
>>>>>>>>> back of pass instance of vr_values.  If so, that might argue we
>>>>>>>>> want to
>>>>>>>>> be setting it in vr_values too.
>>>>>>>>
>>>>>>>> No, set_range_info is only called for ranges.  In this case,
>>>>>>>> handle_builtin_strlen replaces the strlen() call with 4:
>>>>>>>>
>>>>>>>>      s = "1234";
>>>>>>>>      _1 = __builtin_strlen (&s);
>>>>>>>>      n_2 = (unsigned int) _1;
>>>>>>>>      a.1_8 = __builtin_alloca_with_align (_1, 8);
>>>>>>>>      (*a.1_8)[n_2] = 0;
>>>>>>>>
>>>>>>>> When the access is made, the __builtin_alloca_with_align call
>>>>>>>> is found as the destination and the _1 SSA_NAME is used to
>>>>>>>> get its size.  We get back the range [4, 4].
>>>>>>>
>>>>>>> By the way, I glossed over one detail.  The above doesn't work
>>>>>>> exactly as is because the allocation size is the SSA_NAME _1
>>>>>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with
>>>>>>> the range [0, 4]; the range is [0, 4] because it was set based
>>>>>>> on the size of the argument to the strlen() call well before
>>>>>>> the strlen pass even ran).
>>>>>> Which would tend to argue that we should forward propagate the
>>>>>> constant
>>>>>> to the uses of _1.  That should expose that the RHS of the
>>>>>> assignment to
>>>>>> n_2 is a constant as well.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> To make it work across assignments we need to propagate the strlen
>>>>>>> results down the CFG somehow.  I'm hoping the on-demand VRP will
>>>>>>> do this automagically.
>>>>>> It would, but it's probably more heavyweight than we need.  We just
>>>>>> need
>>>>>> to forward propagate the discovered constant to the use points and
>>>>>> pick
>>>>>> up any secondary opportunities that arise.
>>>>>
>>>>> Yes.  And the usual way of doing this is to keep a constant-and-copy
>>>>> lattice (and for copies you'd need to track availability) and before
>>>>> optimizing
>>>>> a stmt substitute its operands with the lattice contents.
>>>>>
>>>>> forwprop has a scheme that can be followed doing a RPO walk, strlen
>>>>> does a DOM walk, there you can follow what DOM/PRE elimination do
>>>>> (for tracking copy availability - if you just track constants you can
>>>>> elide that).
>>>> I'm also note sure handling copies is all that interesting here and if
>>>> we just handle constants/invariants, then it's pretty easy.
>>>>
>>>> Whenever we replace a strlen call with a const, we put the LHS
>>>> (assuming
>>>> its an SSA_NAME) of the statement on a worklist.
>>>>
>>>> We pull items off the worklist and propagate the value to the use
>>>> points
>>>> and try to simplify the resulting statement.  If the RHS of the use
>>>> point simplified to a constant, then put the LHS of the use statement
>>>> onto the worklist.  Iterate until the list is empty.
>>>>
>>>> That would capture everything of interest I suspect and ought to be
>>>> cheap.
>>>
>>> This sounds like a significant project of its own, and well beyond
>>> the scope of the simple infrastructure enhancement I'm making here:
>>> all this does is improve the accuracy of the diagnostics.  Is
>>> implementing it a precondition for accepting this patch?
>> It's not bad as probably 90%+ of the code could be cribbed from
>> elsewhere and a simple API built around it.
> 
> Maybe.  I just don't expect to have the time to do it.  Plus, with
> the on-demand VRP expected to land in GCC 11, I don't think it would
> be time well spent.  We'd be adding code only to remove it a few
> months from now.  (Unless, of course, the new VRP doesn't handle it.)
> 
>>
>>>
>>> If yes, since I don't think I have the time for it for GCC 10
>>> I need to decide whether to drop just this improvement or all
>>> of the buffer overflow checks that depend on it.
>>>
>>> Let me know which you prefer.
>> As I mentioned in my previous message, I think we've got two potential
>> paths.  We could just drop the problem hunk and adjust the tests with
>> xfails, but I'm not sure how badly that compromises what you're trying
>> to do.  We could also leave the hunk in with a fixme or somesuch.
> 
> It would compromise the buffer overflow detection in cases where
> the length of a string computed by the pass is then used in the same
> function to access another string.  There are more of these now as
> the strlen has been getting better and better at tracking the string
> lenghts.
> 
> Rather than using fixed size arrays, code that deals with strings
> of unlimited lengths has to allocate space for them dynamically.
> Until now, the strlen pass hasn't tried to detect or prevent
> overflow into those (and transformed writes to them in ways that
> prevented it from being detected later by _FORTIFY_SOURCE).  But
> the follow-on enhancement to this patch changes that, and so this
> limitation becomes more of an impediment.
Maybe I would ask the question differently.  In real world scenarios how
often does this happen?  How about in tests you're adding?  Is there
value in the patch without this hunk?

Essentially I'm trying to make a determination if the patch and its
follow-ups have value without this hack.

> 
> But I don't understand why you think using the RHS of an SSA_NAME
> assignment is a problem when it's an INTEGER_CST.  Is that unsafe
> for some reason, or likely to fail somehow?
Because it's just working around a failing in our optimizers and a
fairly trivial one at that.  I'd rather fix the real problem rather than
just paper over it.

jeff
Martin Sebor Dec. 5, 2019, 11:37 p.m. UTC | #22
>>>> If yes, since I don't think I have the time for it for GCC 10
>>>> I need to decide whether to drop just this improvement or all
>>>> of the buffer overflow checks that depend on it.
>>>>
>>>> Let me know which you prefer.
>>> As I mentioned in my previous message, I think we've got two potential
>>> paths.  We could just drop the problem hunk and adjust the tests with
>>> xfails, but I'm not sure how badly that compromises what you're trying
>>> to do.  We could also leave the hunk in with a fixme or somesuch.
>>
>> It would compromise the buffer overflow detection in cases where
>> the length of a string computed by the pass is then used in the same
>> function to access another string.  There are more of these now as
>> the strlen has been getting better and better at tracking the string
>> lenghts.
>>
>> Rather than using fixed size arrays, code that deals with strings
>> of unlimited lengths has to allocate space for them dynamically.
>> Until now, the strlen pass hasn't tried to detect or prevent
>> overflow into those (and transformed writes to them in ways that
>> prevented it from being detected later by _FORTIFY_SOURCE).  But
>> the follow-on enhancement to this patch changes that, and so this
>> limitation becomes more of an impediment.
> Maybe I would ask the question differently.  In real world scenarios how
> often does this happen?  How about in tests you're adding?  Is there
> value in the patch without this hunk?

To reiterate what we discussed privately: the patches in this
series (the dynamic buffer overflow detection) have considerable
value even without the hunk.  The detection will work correctly
in most cases.  It will fail and cause false negatives in
the specific cases I mentioned uptread, namely:

where the out-of-bounds store is to a character array whose size
or offset depend on the length of a string previously computed
by the strlen pass.  The class of these cases is large but, due
to the limitation (it has to be the same SSA_NAME that's used
in both the allocation context and the destination offset),
the subset where it is effective is relatively small.

> 
> Essentially I'm trying to make a determination if the patch and its
> follow-ups have value without this hack.
> 
>>
>> But I don't understand why you think using the RHS of an SSA_NAME
>> assignment is a problem when it's an INTEGER_CST.  Is that unsafe
>> for some reason, or likely to fail somehow?
> Because it's just working around a failing in our optimizers and a
> fairly trivial one at that.  I'd rather fix the real problem rather than
> just paper over it.

The robust solution is to implement the full constant propagation
in (or for) the strlen pass.  Hopefully, the on-demand VRP will
give us that in GCC 11.  If not, I will look into implementing
it myself.

Since the workaround stands in the way of getting the patch
approved I have removed it.  Attached is a revised patch, rebased
on the top of trunk and retested on x86_64-linux.

Martin
Martin Sebor Dec. 11, 2019, 7:52 p.m. UTC | #23
On 12/5/19 4:37 PM, Martin Sebor wrote:
>>>>> If yes, since I don't think I have the time for it for GCC 10
>>>>> I need to decide whether to drop just this improvement or all
>>>>> of the buffer overflow checks that depend on it.
>>>>>
>>>>> Let me know which you prefer.
>>>> As I mentioned in my previous message, I think we've got two potential
>>>> paths.  We could just drop the problem hunk and adjust the tests with
>>>> xfails, but I'm not sure how badly that compromises what you're trying
>>>> to do.  We could also leave the hunk in with a fixme or somesuch.
>>>
>>> It would compromise the buffer overflow detection in cases where
>>> the length of a string computed by the pass is then used in the same
>>> function to access another string.  There are more of these now as
>>> the strlen has been getting better and better at tracking the string
>>> lenghts.
>>>
>>> Rather than using fixed size arrays, code that deals with strings
>>> of unlimited lengths has to allocate space for them dynamically.
>>> Until now, the strlen pass hasn't tried to detect or prevent
>>> overflow into those (and transformed writes to them in ways that
>>> prevented it from being detected later by _FORTIFY_SOURCE).  But
>>> the follow-on enhancement to this patch changes that, and so this
>>> limitation becomes more of an impediment.
>> Maybe I would ask the question differently.  In real world scenarios how
>> often does this happen?  How about in tests you're adding?  Is there
>> value in the patch without this hunk?
> 
> To reiterate what we discussed privately: the patches in this
> series (the dynamic buffer overflow detection) have considerable
> value even without the hunk.  The detection will work correctly
> in most cases.  It will fail and cause false negatives in
> the specific cases I mentioned uptread, namely:
> 
> where the out-of-bounds store is to a character array whose size
> or offset depend on the length of a string previously computed
> by the strlen pass.  The class of these cases is large but, due
> to the limitation (it has to be the same SSA_NAME that's used
> in both the allocation context and the destination offset),
> the subset where it is effective is relatively small.
> 
>>
>> Essentially I'm trying to make a determination if the patch and its
>> follow-ups have value without this hack.
>>
>>>
>>> But I don't understand why you think using the RHS of an SSA_NAME
>>> assignment is a problem when it's an INTEGER_CST.  Is that unsafe
>>> for some reason, or likely to fail somehow?
>> Because it's just working around a failing in our optimizers and a
>> fairly trivial one at that.  I'd rather fix the real problem rather than
>> just paper over it.
> 
> The robust solution is to implement the full constant propagation
> in (or for) the strlen pass.  Hopefully, the on-demand VRP will
> give us that in GCC 11.  If not, I will look into implementing
> it myself.
> 
> Since the workaround stands in the way of getting the patch
> approved I have removed it.  Attached is a revised patch, rebased
> on the top of trunk and retested on x86_64-linux.

I have committed this in r279248 with Jeff's off-list approval.

Martin
diff mbox series

Patch

gcc/ChangeLog:

	* builtins.c (compute_objsize): Add an argument and set it to offset
	into destination.
	* builtins.h (compute_objsize): Add an argument.
	* tree-object-size.c (addr_object_size): Add an argument and set it
	to offset into destination.
	(compute_builtin_object_size): Same.
	* tree-object-size.h (compute_builtin_object_size): Add an argument.
	* tree-ssa-strlen.c (get_addr_stridx): Add an argument and set it
	to offset into destination.
	(maybe_warn_overflow): New function.
	(handle_store): Call maybe_warn_overflow to issue warnings.

gcc/testsuite/ChangeLog:

	* c-c++-common/Wstringop-overflow-2.c: Adjust text of expected messages.
	* g++.dg/warn/Wstringop-overflow-3.C: Same.
	* gcc.dg/Wstringop-overflow-17.c: Same.

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 277886)
+++ gcc/builtins.c	(working copy)
@@ -3571,23 +3571,29 @@  check_access (tree exp, tree, tree, tree dstwrite,
    a non-constant offset in some range the returned value represents
    the largest size given the smallest non-negative offset in the
    range.  If nonnull, set *PDECL to the decl of the referenced
-   subobject if it can be determined, or to null otherwise.
+   subobject if it can be determined, or to null otherwise.  Likewise,
+   when POFF is nonnull *POFF is set to the offset into *PDECL.
    The function is intended for diagnostics and should not be used
    to influence code generation or optimization.  */
 
 tree
-compute_objsize (tree dest, int ostype, tree *pdecl /* = NULL */)
+compute_objsize (tree dest, int ostype, tree *pdecl /* = NULL */,
+		 tree *poff /* = NULL */)
 {
-  tree dummy = NULL_TREE;
+  tree dummy_decl = NULL_TREE;
   if (!pdecl)
-    pdecl = &dummy;
+    pdecl = &dummy_decl;
 
+  tree dummy_off = size_zero_node;
+  if (!poff)
+    poff = &dummy_off;
+
   unsigned HOST_WIDE_INT size;
 
   /* Only the two least significant bits are meaningful.  */
   ostype &= 3;
 
-  if (compute_builtin_object_size (dest, ostype, &size, pdecl))
+  if (compute_builtin_object_size (dest, ostype, &size, pdecl, poff))
     return build_int_cst (sizetype, size);
 
   if (TREE_CODE (dest) == SSA_NAME)
@@ -3608,7 +3614,7 @@  tree
 	  tree off = gimple_assign_rhs2 (stmt);
 	  if (TREE_CODE (off) == INTEGER_CST)
 	    {
-	      if (tree size = compute_objsize (dest, ostype, pdecl))
+	      if (tree size = compute_objsize (dest, ostype, pdecl, poff))
 		{
 		  wide_int wioff = wi::to_wide (off);
 		  wide_int wisiz = wi::to_wide (size);
@@ -3619,10 +3625,16 @@  tree
 		  if (wi::sign_mask (wioff))
 		    ;
 		  else if (wi::ltu_p (wioff, wisiz))
-		    return wide_int_to_tree (TREE_TYPE (size),
-					     wi::sub (wisiz, wioff));
+		    {
+		      *poff = size_binop (PLUS_EXPR, *poff, off);
+		      return wide_int_to_tree (TREE_TYPE (size),
+					       wi::sub (wisiz, wioff));
+		    }
 		  else
-		    return size_zero_node;
+		    {
+		      *poff = size_binop (PLUS_EXPR, *poff, off);
+		      return size_zero_node;
+		    }
 		}
 	    }
 	  else if (TREE_CODE (off) == SSA_NAME
@@ -3644,10 +3656,18 @@  tree
 			  || wi::sign_mask (max))
 			;
 		      else if (wi::ltu_p (min, wisiz))
-			return wide_int_to_tree (TREE_TYPE (size),
-						 wi::sub (wisiz, min));
+			{
+			  *poff = size_binop (PLUS_EXPR, *poff,
+					      wide_int_to_tree (sizetype, min));
+			  return wide_int_to_tree (TREE_TYPE (size),
+						   wi::sub (wisiz, min));
+			}
 		      else
-			return size_zero_node;
+			{
+			  *poff = size_binop (PLUS_EXPR, *poff,
+					      wide_int_to_tree (sizetype, min));
+			  return size_zero_node;
+			}
 		    }
 		}
 	    }
@@ -3666,17 +3686,21 @@  tree
     {
       tree ref = TREE_OPERAND (dest, 0);
       tree off = TREE_OPERAND (dest, 1);
-      if (tree size = compute_objsize (ref, ostype, pdecl))
+      if (tree size = compute_objsize (ref, ostype, pdecl, poff))
 	{
+	  *poff = size_binop (PLUS_EXPR, *poff, fold_convert (sizetype, off));
+
 	  /* If the declaration of the destination object is known
 	     to have zero size, return zero.  */
 	  if (integer_zerop (size))
-	    return integer_zero_node;
+	    return size_zero_node;
 
-	  if (TREE_CODE (off) != INTEGER_CST
-	      || TREE_CODE (size) != INTEGER_CST)
+	  if (TREE_CODE (size) != INTEGER_CST
+	      || TREE_CODE (off) != INTEGER_CST)
 	    return NULL_TREE;
 
+	  off = fold_convert (ptrdiff_type_node, off);
+
 	  if (TREE_CODE (dest) == ARRAY_REF)
 	    {
 	      tree eltype = TREE_TYPE (dest);
@@ -3687,9 +3711,19 @@  tree
 		return NULL_TREE;
 	    }
 
+	  if (tree_int_cst_sgn (off) < 0)
+	    {
+	      /* Add the negative offset to the size and return the smaller
+		 of the difference and zero.  */
+	      tree szdiff = fold_build2 (PLUS_EXPR, size_type_node, size, off);
+	      if (tree_int_cst_lt (szdiff, size))
+		return szdiff;
+	      return size_zero_node;
+	    }
+
 	  if (tree_int_cst_lt (off, size))
 	    return fold_build2 (MINUS_EXPR, size_type_node, size, off);
-	  return integer_zero_node;
+	  return size_zero_node;
 	}
 
       return NULL_TREE;
Index: gcc/builtins.h
===================================================================
--- gcc/builtins.h	(revision 277886)
+++ gcc/builtins.h	(working copy)
@@ -134,7 +134,7 @@  extern tree fold_call_stmt (gcall *, bool);
 extern void set_builtin_user_assembler_name (tree decl, const char *asmspec);
 extern bool is_simple_builtin (tree);
 extern bool is_inexpensive_builtin (tree);
-extern tree compute_objsize (tree, int, tree * = NULL);
+extern tree compute_objsize (tree, int, tree * = NULL, tree * = NULL);
 
 extern bool readonly_data_expr (tree exp);
 extern bool init_target_chars (void);
Index: gcc/testsuite/c-c++-common/Wstringop-overflow-2.c
===================================================================
--- gcc/testsuite/c-c++-common/Wstringop-overflow-2.c	(revision 277886)
+++ gcc/testsuite/c-c++-common/Wstringop-overflow-2.c	(working copy)
@@ -10,7 +10,7 @@  void sink (void*);
 struct Ax
 {
   char n;
-  char a[];                     // { dg-message "destination object declared here" }
+  char a[];                     // { dg-message "declared here" }
 };
 
 // Verify warning for a definition with no initializer.
@@ -91,7 +91,7 @@  void gaxx (void)
 struct A0
 {
   char n;
-  char a[0];                    // { dg-message "destination object declared here" }
+  char a[0];                    // { dg-message "declared here" }
 };
 
 // Verify warning for a definition with no initializer.
@@ -158,7 +158,7 @@  void ga0x (void)
 struct A1
 {
   char n;
-  char a[1];                    // { dg-message "destination object declared here" }
+  char a[1];                    // { dg-message "declared here" }
 };
 
 // Verify warning for a definition with no initializer.
@@ -256,7 +256,7 @@  void ga1x (void)
 struct A1i
 {
   char n;
-  char a[1];                    // { dg-message "destination object declared here" }
+  char a[1];                    // { dg-message "declared here" }
   char x;
 };
 
Index: gcc/testsuite/g++.dg/warn/Wstringop-overflow-3.C
===================================================================
--- gcc/testsuite/g++.dg/warn/Wstringop-overflow-3.C	(revision 277886)
+++ gcc/testsuite/g++.dg/warn/Wstringop-overflow-3.C	(working copy)
@@ -3,6 +3,8 @@ 
    { dg-do compile }
    { dg-options "-O2 -Wall -Wno-array-bounds" } */
 
+#define NOIPA __attribute__ ((noipa))
+
 void sink (void*);
 
 // Exercise flexible array members.
@@ -10,13 +12,13 @@  void sink (void*);
 struct Ax
 {
   char n;
-  char a[];                     // { dg-message "destination object declared here" }
+  char a[];                     // { dg-message "at offset \[0-2\] to object 'Ax::a' declared here" }
 };
 
 // Verify warning for a definition with no initializer.
 Ax ax_;
 
-void gax_ ()
+NOIPA void gax_ ()
 {
   ax_.a[0] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
   ax_.a[1] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
@@ -27,7 +29,7 @@  Ax ax_;
 // initialize the flexible array member.
 Ax ax0 = { 0 };
 
-void gax0 ()
+NOIPA void gax0 ()
 {
   ax0.a[0] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
   ax0.a[1] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
@@ -38,7 +40,7 @@  Ax ax0 = { 0 };
 // initializes the flexible array member to empty.
 Ax ax0_ = { 0, { } };
 
-void gax0_ ()
+NOIPA void gax0_ ()
 {
   ax0_.a[0] = 0;                // { dg-warning "\\\[-Wstringop-overflow" }
   ax0_.a[1] = 0;                // { dg-warning "\\\[-Wstringop-overflow" }
@@ -49,7 +51,7 @@  Ax ax0_ = { 0, { } };
 // an initializer.
 Ax ax1 = { 1, { 0 } };
 
-void gax1 ()
+NOIPA void gax1 ()
 {
   ax1.a[0] = 0;
   ax1.a[1] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
@@ -58,7 +60,7 @@  Ax ax1 = { 1, { 0 } };
 
 Ax ax2 = { 2, { 1, 0 } };
 
-void gax2 ()
+NOIPA void gax2 ()
 {
   ax2.a[0] = 0;
   ax2.a[1] = 0;
@@ -67,7 +69,7 @@  Ax ax2 = { 2, { 1, 0 } };
 
 
 // Verify no warning for an unknown struct object.
-void gaxp (Ax *p)
+NOIPA void gaxp (Ax *p)
 {
   p->a[0] = 0;
   p->a[3] = 0;
@@ -79,7 +81,7 @@  Ax ax2 = { 2, { 1, 0 } };
 // initialized to any number of elements.
 extern Ax axx;
 
-void gaxx ()
+NOIPA void gaxx ()
 {
   axx.a[0] = 0;
   axx.a[3] = 0;
@@ -91,13 +93,13 @@  extern Ax axx;
 struct A0
 {
   char n;
-  char a[0];                    // { dg-message "destination object declared here" }
+  char a[0];                    // { dg-message "at offset \[0-2\] to object 'A0::a' with size 0 declared here" }
 };
 
 // Verify warning for a definition with no initializer.
 A0 a0_;
 
-void ga0_ ()
+NOIPA void ga0_ ()
 {
   a0_.a[0] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
   a0_.a[1] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
@@ -108,7 +110,7 @@  A0 a0_;
 // initialize the flexible array member.
 A0 a00 = { 0 };
 
-void ga00 ()
+NOIPA void ga00 ()
 {
   a00.a[0] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
   a00.a[1] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
@@ -119,7 +121,7 @@  A0 a00 = { 0 };
 // initializes the flexible array member to empty.
 A0 a00_ = { 0, { } };
 
-void ga00_ ()
+NOIPA void ga00_ ()
 {
   a00_.a[0] = 0;                // { dg-warning "\\\[-Wstringop-overflow" }
   a00_.a[1] = 0;                // { dg-warning "\\\[-Wstringop-overflow" }
@@ -133,7 +135,7 @@  A0 a00_ = { 0, { } };
 
 
 // Verify no warning for an unknown struct object.
-void ga0p (A0 *p)
+NOIPA void ga0p (A0 *p)
 {
   p->a[0] = 0;
   p->a[3] = 0;
@@ -145,7 +147,7 @@  A0 a00_ = { 0, { } };
 // flexible array member) may not be initialized.
 extern A0 a0x;
 
-void ga0x ()
+NOIPA void ga0x ()
 {
   a0x.a[0] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
   a0x.a[3] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
@@ -158,13 +160,13 @@  extern A0 a0x;
 struct A1
 {
   char n;
-  char a[1];                    // { dg-message "destination object declared here" }
+  char a[1];                    // { dg-message "at offset \[1-9\] to object 'A1::a' with size 1 declared here" }
 };
 
 // Verify warning for a definition with no initializer.
 A1 a1_;
 
-void ga1_ ()
+NOIPA void ga1_ ()
 {
   a1_.a[0] = 0;
   a1_.a[1] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
@@ -175,7 +177,7 @@  A1 a1_;
 // initialize the one-element array member.
 A1 a1__ = { 0 };
 
-void ga1__ ()
+NOIPA void ga1__ ()
 {
   a1__.a[0] = 0;
   a1__.a[1] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
@@ -186,7 +188,7 @@  A1 a1__ = { 0 };
 // initializes the one-element array member to empty.
 A1 a1_0 = { 0, { } };
 
-void ga1_0_ ()
+NOIPA void ga1_0_ ()
 {
   a1_0.a[0] = 0;
   a1_0.a[1] = 0;                // { dg-warning "\\\[-Wstringop-overflow" }
@@ -197,7 +199,7 @@  A1 a1_0 = { 0, { } };
 // initializes the one-element array member.
 A1 a1_1 = { 0, { 1 } };
 
-void ga1_1 ()
+NOIPA void ga1_1 ()
 {
   a1_1.a[0] = 0;
   a1_1.a[1] = 0;                // { dg-warning "\\\[-Wstringop-overflow" }
@@ -206,7 +208,7 @@  A1 a1_1 = { 0, { 1 } };
 
 
 // Verify no warning for an unknown struct object.
-void ga1p (A1 *p)
+NOIPA void ga1p (A1 *p)
 {
   p->a[0] = 0;
   p->a[3] = 0;
@@ -219,7 +221,7 @@  A1 a1_1 = { 0, { 1 } };
 // a single element.
 extern A1 a1x;
 
-void ga1x ()
+NOIPA void ga1x ()
 {
   a1x.a[0] = 0;
   a1x.a[3] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
@@ -232,7 +234,7 @@  extern A1 a1x;
 struct A1i
 {
   char n;
-  char a[1];                    // { dg-message "destination object declared here" }
+  char a[1];                    // { dg-message "at offset \[1-9\] to object 'A1i::a' with size 1 declared here" }
   char x;
 };
 
@@ -239,7 +241,7 @@  struct A1i
 // Verify warning for a definition with no initializer.
 A1i a1i_;
 
-void ga1i_ ()
+NOIPA void ga1i_ ()
 {
   a1i_.a[0] = 0;
   a1i_.a[1] = 0;                // { dg-warning "\\\[-Wstringop-overflow" }
@@ -250,7 +252,7 @@  A1i a1i_;
 // initialize the one-element array member.
 A1i a1i__ = { 0 };
 
-void ga1i__ ()
+NOIPA void ga1i__ ()
 {
   a1i__.a[0] = 0;
   a1i__.a[1] = 0;                // { dg-warning "\\\[-Wstringop-overflow" }
@@ -261,7 +263,7 @@  A1i a1i__ = { 0 };
 // initializes the one-element array member to empty.
 A1 a1i_0 = { 0, { } };
 
-void ga1i_0_ ()
+NOIPA void ga1i_0_ ()
 {
   a1i_0.a[0] = 0;
   a1i_0.a[1] = 0;               // { dg-warning "\\\[-Wstringop-overflow" }
@@ -272,7 +274,7 @@  A1 a1i_0 = { 0, { } };
 // initializes the one-element array member.
 A1 a1i_1 = { 0, { 1 } };
 
-void ga1i_1 ()
+NOIPA void ga1i_1 ()
 {
   a1i_1.a[0] = 0;
   a1i_1.a[1] = 0;               // { dg-warning "\\\[-Wstringop-overflow" }
@@ -281,7 +283,7 @@  A1 a1i_1 = { 0, { 1 } };
 
 
 // Verify no warning for an unknown struct object.
-void ga1ip (A1i *p)
+NOIPA void ga1ip (A1i *p)
 {
   p->a[0] = 0;
   p->a[3] = 0;                  // { dg-warning "\\\[-Wstringop-overflow" }
@@ -292,7 +294,7 @@  A1 a1i_1 = { 0, { 1 } };
 // Verify no warning for an extern struct object.
 extern A1i a1ix;
 
-void ga1ix ()
+NOIPA void ga1ix ()
 {
   a1ix.a[0] = 0;
   a1ix.a[3] = 0;                 // { dg-warning "\\\[-Wstringop-overflow" }
@@ -305,7 +307,7 @@  extern A1i a1ix;
 struct Bx
 {
   char n;
-  char a[];                     // { dg-message "destination object declared here" }
+  char a[];                     // { dg-message "at offset 0 to object 'Bx::a' declared here" }
 
   // Verify the warning for a constant.
   Bx () { a[0] = 0; }           // { dg-warning "\\\[-Wstringop-overflow" }
@@ -315,13 +317,13 @@  struct Bx
   Bx (int i) { a[i] = 0; }      // { dg-warning "\\\[-Wstringop-overflow" }
 };
 
-void gbx (void)
+NOIPA void gbx (void)
 {
   struct Bx bx;
   sink (&bx);
 }
 
-void gbxi (int i)
+NOIPA void gbxi (int i)
 {
   struct Bx bxi (i);
   sink (&bxi);
@@ -330,13 +332,13 @@  struct Bx
 struct B0
 {
   char n;
-  char a[0];                    // { dg-message "destination object declared here" }
+  char a[0];                    // { dg-message "at offset 0 to object 'B0::a' with size 0 declared here" }
 
   B0 () { a[0] = 0; }           // { dg-warning "\\\[-Wstringop-overflow" }
 };
 
 
-void gb0 (void)
+NOIPA void gb0 (void)
 {
   struct B0 b0;
   sink (&b0);
@@ -346,12 +348,12 @@  struct B0
 struct B1
 {
   char n;
-  char a[1];                    // { dg-message "destination object declared here" }
+  char a[1];                    // { dg-message "at offset 1 to object 'B1::a' with size 1 declared here" }
 
   B1 () { a[1] = 0; }           // { dg-warning "\\\[-Wstringop-overflow" }
 };
 
-void gb1 (void)
+NOIPA void gb1 (void)
 {
   struct B1 b1;
   sink (&b1);
@@ -360,12 +362,12 @@  struct B1
 
 struct B123
 {
-  char a[123];                  // { dg-message "destination object declared here" }
+  char a[123];                  // { dg-message "at offset 123 to object 'B123::a' with size 123 declared here" }
 
   B123 () { a[123] = 0; }       // { dg-warning "\\\[-Wstringop-overflow" }
 };
 
-void gb123 (void)
+NOIPA void gb123 (void)
 {
   struct B123 b123;
   sink (&b123);
@@ -374,12 +376,12 @@  struct B123
 
 struct B234
 {
-  char a[234];                  // { dg-message "destination object declared here" }
+  char a[234];                  // { dg-message "at offset 234 to object 'B234::a' with size 234 declared here" }
 
   B234 (int i) { a[i] = 0; }    // { dg-warning "\\\[-Wstringop-overflow" }
 };
 
-void g234 (void)
+NOIPA void g234 (void)
 {
   struct B234 b234 (234);
   sink (&b234);
Index: gcc/testsuite/gcc.dg/Wstringop-overflow-17.c
===================================================================
--- gcc/testsuite/gcc.dg/Wstringop-overflow-17.c	(revision 277886)
+++ gcc/testsuite/gcc.dg/Wstringop-overflow-17.c	(working copy)
@@ -13,7 +13,7 @@  void sink (void*);
 
 void call_copy_n (const char *s)
 {
-  char a[3];        // { dg-message "destination object declared here" }
+  char a[3];        // { dg-message "declared here" }
   copy_n (a, "1234567", 7);
   sink (a);
 }
Index: gcc/tree-object-size.c
===================================================================
--- gcc/tree-object-size.c	(revision 277886)
+++ gcc/tree-object-size.c	(working copy)
@@ -55,7 +55,7 @@  static const unsigned HOST_WIDE_INT unknown[4] = {
 static tree compute_object_offset (const_tree, const_tree);
 static bool addr_object_size (struct object_size_info *,
 			      const_tree, int, unsigned HOST_WIDE_INT *,
-			      tree * = NULL);
+			      tree * = NULL, tree * = NULL);
 static unsigned HOST_WIDE_INT alloc_object_size (const gcall *, int);
 static tree pass_through_call (const gcall *);
 static void collect_object_sizes_for (struct object_size_info *, tree);
@@ -174,13 +174,15 @@  compute_object_offset (const_tree expr, const_tree
 static bool
 addr_object_size (struct object_size_info *osi, const_tree ptr,
 		  int object_size_type, unsigned HOST_WIDE_INT *psize,
-		  tree *pdecl /* = NULL */)
+		  tree *pdecl /* = NULL */, tree *poff /* = NULL */)
 {
   tree pt_var, pt_var_size = NULL_TREE, var_size, bytes;
 
-  tree dummy;
+  tree dummy_decl, dummy_off = size_zero_node;
   if (!pdecl)
-    pdecl = &dummy;
+    pdecl = &dummy_decl;
+  if (!poff)
+    poff = &dummy_off;
 
   gcc_assert (TREE_CODE (ptr) == ADDR_EXPR);
 
@@ -201,7 +203,7 @@  addr_object_size (struct object_size_info *osi, co
 	  || TREE_CODE (TREE_OPERAND (pt_var, 0)) != SSA_NAME)
 	{
 	  compute_builtin_object_size (TREE_OPERAND (pt_var, 0),
-				       object_size_type & ~1, &sz, pdecl);
+				       object_size_type & ~1, &sz, pdecl, poff);
 	}
       else
 	{
@@ -376,6 +378,7 @@  addr_object_size (struct object_size_info *osi, co
 	    bytes = size_zero_node;
 	  else
 	    bytes = size_binop (MINUS_EXPR, var_size, bytes);
+	  *poff = bytes;
 	}
       if (var != pt_var
 	  && pt_var_size
@@ -390,6 +393,7 @@  addr_object_size (struct object_size_info *osi, co
 		bytes2 = size_zero_node;
 	      else
 		bytes2 = size_binop (MINUS_EXPR, pt_var_size, bytes2);
+	      *poff = size_binop (PLUS_EXPR, *poff, bytes2);
 	      bytes = size_binop (MIN_EXPR, bytes, bytes2);
 	    }
 	}
@@ -496,10 +500,16 @@  pass_through_call (const gcall *call)
 bool
 compute_builtin_object_size (tree ptr, int object_size_type,
 			     unsigned HOST_WIDE_INT *psize,
-			     tree *pdecl /* = NULL */)
+			     tree *pdecl /* = NULL */, tree *poff /* = NULL */)
 {
   gcc_assert (object_size_type >= 0 && object_size_type <= 3);
 
+  tree dummy_decl, dummy_off = size_zero_node;
+  if (!pdecl)
+    pdecl = &dummy_decl;
+  if (!poff)
+    poff = &dummy_off;
+
   /* Set to unknown and overwrite just before returning if the size
      could be determined.  */
   *psize = unknown[object_size_type];
@@ -508,7 +518,7 @@  compute_builtin_object_size (tree ptr, int object_
     init_offset_limit ();
 
   if (TREE_CODE (ptr) == ADDR_EXPR)
-    return addr_object_size (NULL, ptr, object_size_type, psize, pdecl);
+    return addr_object_size (NULL, ptr, object_size_type, psize, pdecl, poff);
 
   if (TREE_CODE (ptr) != SSA_NAME
       || !POINTER_TYPE_P (TREE_TYPE (ptr)))
@@ -533,11 +543,12 @@  compute_builtin_object_size (tree ptr, int object_
 
 	      if (tree_fits_shwi_p (offset)
 		  && compute_builtin_object_size (ptr, object_size_type,
-						  psize, pdecl))
+						  psize, pdecl, poff))
 		{
 		  /* Return zero when the offset is out of bounds.  */
 		  unsigned HOST_WIDE_INT off = tree_to_shwi (offset);
 		  *psize = off < *psize ? *psize - off : 0;
+		  *poff = offset;
 		  return true;
 		}
 	    }
Index: gcc/tree-object-size.h
===================================================================
--- gcc/tree-object-size.h	(revision 277886)
+++ gcc/tree-object-size.h	(working copy)
@@ -23,6 +23,6 @@  along with GCC; see the file COPYING3.  If not see
 extern void init_object_sizes (void);
 extern void fini_object_sizes (void);
 extern bool compute_builtin_object_size (tree, int, unsigned HOST_WIDE_INT *,
-					 tree * = NULL);
+					 tree * = NULL, tree * = NULL);
 
 #endif  // GCC_TREE_OBJECT_SIZE_H
Index: gcc/tree-ssa-strlen.c
===================================================================
--- gcc/tree-ssa-strlen.c	(revision 277886)
+++ gcc/tree-ssa-strlen.c	(working copy)
@@ -189,6 +189,52 @@  struct laststmt_struct
 static int get_stridx_plus_constant (strinfo *, unsigned HOST_WIDE_INT, tree);
 static void handle_builtin_stxncpy (built_in_function, gimple_stmt_iterator *);
 
+/* Sets MINMAX to either the constant value or the range VAL is in
+   and returns true on success.  */
+
+static bool
+get_range (tree val, wide_int minmax[2], const vr_values *rvals = NULL)
+{
+  if (tree_fits_uhwi_p (val))
+    {
+      minmax[0] = minmax[1] = wi::to_wide (val);
+      return true;
+    }
+
+  if (TREE_CODE (val) != SSA_NAME)
+    return false;
+
+  if (rvals)
+    {
+      gimple *def = SSA_NAME_DEF_STMT (val);
+      if (gimple_assign_single_p (def)
+	  && gimple_assign_rhs_code (def) == INTEGER_CST)
+	{
+	  /* get_value_range returns [0, N] for constant assignments.  */
+	  val = gimple_assign_rhs1 (def);
+	  minmax[0] = minmax[1] = wi::to_wide (val);
+	  return true;
+	}
+
+      const value_range *vr
+	= (CONST_CAST (class vr_values *, rvals)->get_value_range (val));
+      value_range_kind rng = vr->kind ();
+      if (rng != VR_RANGE || !range_int_cst_p (vr))
+	return false;
+
+      minmax[0] = wi::to_wide (vr->min ());
+      minmax[1] = wi::to_wide (vr->max ());
+      return true;
+    }
+
+  value_range_kind rng = get_range_info (val, minmax, minmax + 1);
+  if (rng == VR_RANGE)
+    return true;
+
+  // FIXME: handle anti-ranges?
+  return false;
+}
+
 /* Return:
 
    *  +1  if SI is known to start with more than OFF nonzero characters.
@@ -334,11 +380,18 @@  get_addr_stridx (tree exp, tree ptr, unsigned HOST
   return 0;
 }
 
-/* Return string index for EXP.  */
+/* Returns string index for EXP.  When EXP is an SSA_NAME that refers
+   to a known strinfo with an offset and OFFRNG is non-null, sets
+   both elements of the OFFRNG array to the range of the offset and
+   returns the index of the known strinfo.  In this case the result
+   must not be used in for functions that modify the string.  */
 
 static int
-get_stridx (tree exp)
+get_stridx (tree exp, wide_int offrng[2] = NULL)
 {
+  if (offrng)
+    offrng[0] = offrng[1] = wi::zero (TYPE_PRECISION (sizetype));
+
   if (TREE_CODE (exp) == SSA_NAME)
     {
       if (ssa_ver_to_stridx[SSA_NAME_VERSION (exp)])
@@ -345,6 +398,7 @@  static int
 	return ssa_ver_to_stridx[SSA_NAME_VERSION (exp)];
 
       tree e = exp;
+      int last_idx = 0;
       HOST_WIDE_INT offset = 0;
       /* Follow a chain of at most 5 assignments.  */
       for (int i = 0; i < 5; i++)
@@ -351,7 +405,7 @@  static int
 	{
 	  gimple *def_stmt = SSA_NAME_DEF_STMT (e);
 	  if (!is_gimple_assign (def_stmt))
-	    return 0;
+	    return last_idx;
 
 	  tree_code rhs_code = gimple_assign_rhs_code (def_stmt);
 	  tree ptr, off;
@@ -403,25 +457,69 @@  static int
 	  else
 	    return 0;
 
-	  if (TREE_CODE (ptr) != SSA_NAME
-	      || !tree_fits_shwi_p (off))
+	  if (TREE_CODE (ptr) != SSA_NAME)
 	    return 0;
+
+	  if (!tree_fits_shwi_p (off))
+	    {
+	      if (int idx = ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)])
+		if (offrng)
+		  {
+		    /* Only when requested by setting OFFRNG to non-null,
+		       return the index corresponding to the SSA_NAME.
+		       Do this irrespective of the whether the offset
+		       is known.  */
+		    if (get_range (off, offrng))
+		      {
+			/* When the offset range is known, increment it
+			   it by the constant offset computed in prior
+			   iterations and store it in the OFFRNG array.  */
+ 			offrng[0] += offset;
+			offrng[1] += offset;
+		      }
+		    else
+		      {
+			/* When the offset range cannot be determined
+			   store [0, SIZE_MAX] and let the caller decide
+			   if the offset matters.  */
+			offrng[1] = wi::to_wide (TYPE_MAX_VALUE (sizetype));
+			offrng[0] = wi::zero (offrng[1].get_precision ());
+		      }
+		    return idx;
+		  }
+	      return 0;
+	    }
+
 	  HOST_WIDE_INT this_off = tree_to_shwi (off);
+	  if (offrng)
+	    {
+	      offrng[0] += wi::shwi (this_off, offrng->get_precision ());
+	      offrng[1] += offrng[0];
+	    }
+
 	  if (this_off < 0)
-	    return 0;
+	    return last_idx;
+
 	  offset = (unsigned HOST_WIDE_INT) offset + this_off;
 	  if (offset < 0)
-	    return 0;
-	  if (ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)])
+	    return last_idx;
+
+	  if (int idx = ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)])
 	    {
-	      strinfo *si
-	        = get_strinfo (ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)]);
-	      if (si && compare_nonzero_chars (si, offset) >= 0)
-	        return get_stridx_plus_constant (si, offset, exp);
+	      strinfo *si = get_strinfo (idx);
+	      if (si)
+		{
+		  if (compare_nonzero_chars (si, offset) >= 0)
+		    return get_stridx_plus_constant (si, offset, exp);
+
+		  if (offrng)
+		    last_idx = idx;
+		}
 	    }
 	  e = ptr;
 	}
-      return 0;
+
+      return last_idx;
     }
 
   if (TREE_CODE (exp) == ADDR_EXPR)
@@ -1763,6 +1861,279 @@  maybe_set_strlen_range (tree lhs, tree src, tree b
   return set_strlen_range (lhs, min, max, bound);
 }
 
+/* Diagnose buffer overflow by a STMT writing LEN + PLUS_ONE bytes,
+   into an object designated by the LHS of STMT otherise.  */
+
+static void
+maybe_warn_overflow (gimple *stmt, tree len,
+		     const vr_values *rvals = NULL,
+		     strinfo *si = NULL, bool plus_one = false)
+{
+  if (!len || gimple_no_warning_p (stmt))
+    return;
+
+  tree writefn = NULL_TREE;
+  tree destdecl = NULL_TREE;
+  tree destsize = NULL_TREE;
+  tree dest = NULL_TREE;
+
+  /* The offset into the destination object set by compute_objsize
+     but already reflected in DESTSIZE.  */
+  tree destoff = NULL_TREE;
+
+  if (is_gimple_assign (stmt))
+    {
+      dest = gimple_assign_lhs (stmt);
+      if (TREE_NO_WARNING (dest))
+	return;
+
+      /* For assignments try to determine the size of the destination
+	 first.  Set DESTOFF to the the offset on success.  */
+      tree off = size_zero_node;
+      destsize = compute_objsize (dest, 1, &destdecl, &off);
+      if (destsize)
+	destoff = off;
+    }
+  else if (is_gimple_call (stmt))
+    {
+      writefn = gimple_call_fndecl (stmt);
+      dest = gimple_call_arg (stmt, 0);
+    }
+
+  /* The offset into the destination object computed below and not
+     reflected in DESTSIZE.  Either DESTOFF is set above or OFFRNG
+     below.  */
+  wide_int offrng[2];
+  offrng[0] = wi::zero (TYPE_PRECISION (sizetype));
+  offrng[1] = offrng[0];
+
+  if (!destsize && !si && dest)
+    {
+      /* For both assignments and calls, if no destination STRINFO was
+	 provided, try to get it from the DEST.  */
+      tree ref = dest;
+      tree off = NULL_TREE;
+      if (TREE_CODE (ref) == ARRAY_REF)
+	{
+	  /* Handle stores to VLAs (represented as
+	     ARRAY_REF (MEM_REF (vlaptr, 0), N].  */
+	  off = TREE_OPERAND (ref, 1);
+	  ref = TREE_OPERAND (ref, 0);
+	}
+
+      if (TREE_CODE (ref) == MEM_REF)
+	{
+	  tree mem_off = TREE_OPERAND (ref, 1);
+	  if (off)
+	    {
+	      if (!integer_zerop (mem_off))
+		return;
+	    }
+	  else
+	    off = mem_off;
+	  ref = TREE_OPERAND (ref, 0);
+	}
+
+      if (int idx = get_stridx (ref, offrng))
+	{
+	  si = get_strinfo (idx);
+	  if (off && TREE_CODE (off) == INTEGER_CST)
+	    {
+	      wide_int wioff = wi::to_wide (off, offrng->get_precision ());
+	      offrng[0] += wioff;
+	      offrng[1] += wioff;
+	    }
+	}
+      else
+	return;
+    }
+
+  /* Return early if the DESTSIZE size expression is the same as LEN
+     and the offset into the destination is zero.  This might happen
+     in the case of a pair of malloc and memset calls to allocate
+     an object and clear it as if by calloc.  */
+  if (destsize == len && !plus_one && offrng[0] == 0 && offrng[0] == offrng[1])
+    return;
+
+  wide_int lenrng[2];
+  if (!get_range (len, lenrng, rvals))
+    return;
+
+  if (plus_one)
+    {
+      lenrng[0] += 1;
+      lenrng[1] += 1;
+    }
+
+  /* Compute the range of sizes of the destination object.  The range
+     is constant for declared objects but may be a range for allocated
+     objects.  */
+  wide_int sizrng[2];
+  if (!destsize || !get_range (destsize, sizrng, rvals))
+    {
+      /* On failure, rather than bailing outright, use the maximum range
+	 so that overflow in allocated objects whose size depends on
+	 the strlen of the source can still be diagnosed below.  */
+      sizrng[0] = wi::zero (lenrng->get_precision ());
+      sizrng[1] = wi::to_wide (TYPE_MAX_VALUE (ptrdiff_type_node));
+    }
+
+  /* The size of the remaining space in the destination computed as
+     the size of the latter minus the offset into it.  */
+  wide_int spcrng[2] = { sizrng[0], sizrng[1] };
+  if (wi::sign_mask (offrng[0]))
+    {
+      /* FIXME: Handle negative offsets into allocated objects.  */
+      if (destdecl)
+	spcrng[0] = spcrng[1] = wi::zero (spcrng->get_precision ());
+      else
+	return;
+    }
+  else
+    {
+      spcrng[0] -= wi::ltu_p (offrng[0], spcrng[0]) ? offrng[0] : spcrng[0];
+      spcrng[1] -= wi::ltu_p (offrng[0], spcrng[1]) ? offrng[0] : spcrng[1];
+    }
+
+  if (wi::leu_p (lenrng[0], spcrng[0]))
+    return;
+
+  if (lenrng[0] == spcrng[1]
+      && (len != destsize
+	  || !si || !is_strlen_related_p (si->ptr, len)))
+    return;
+
+  location_t loc = gimple_nonartificial_location (stmt);
+  if (loc == UNKNOWN_LOCATION && dest && EXPR_HAS_LOCATION (dest))
+    loc = tree_nonartificial_location (dest);
+  loc = expansion_point_location_if_in_system_header (loc);
+
+  bool warned = false;
+  if (wi::leu_p (lenrng[0], spcrng[1]))
+    {
+      if (len != destsize
+	  && (!si || !is_strlen_related_p (si->ptr, len)))
+	return;
+
+      warned = (writefn
+		? warning_at (loc, OPT_Wstringop_overflow_,
+			      "%G%qD writing one too many bytes into a region "
+			      "of a size that depends on %<strlen%>",
+			      stmt, writefn)
+		: warning_at (loc, OPT_Wstringop_overflow_,
+			      "%Gwriting one too many bytes into a region "
+			      "of a size that depends on %<strlen%>",
+			      stmt));
+    }
+  else if (lenrng[0] == lenrng[1])
+    {
+      if (spcrng[0] == spcrng[1])
+	warned = (writefn
+		  ? warning_n (loc, OPT_Wstringop_overflow_,
+			       lenrng[0].to_uhwi (),
+			       "%G%qD writing %wu byte into a region "
+			       "of size %wu",
+			       "%G%qD writing %wu bytes into a region "
+			       "of size %wu",
+			       stmt, writefn, lenrng[0].to_uhwi (),
+			       spcrng[0].to_uhwi ())
+		  : warning_n (loc, OPT_Wstringop_overflow_,
+			       lenrng[0].to_uhwi (),
+			       "%Gwriting %wu byte into a region "
+			       "of size %wu",
+			       "%Gwriting %wu bytes into a region "
+			       "of size %wu",
+			       stmt, lenrng[0].to_uhwi (),
+			       spcrng[0].to_uhwi ()));
+      else
+	warned = (writefn
+		  ? warning_n (loc, OPT_Wstringop_overflow_,
+			       lenrng[0].to_uhwi (),
+			       "%G%qD writing %wu byte into a region "
+			       "of size between %wu and %wu",
+			       "%G%qD writing %wu bytes into a region "
+			       "of size between %wu and %wu",
+			       stmt, writefn, lenrng[0].to_uhwi (),
+			       spcrng[0].to_uhwi (), spcrng[1].to_uhwi ())
+		  : warning_n (loc, OPT_Wstringop_overflow_,
+			       lenrng[0].to_uhwi (),
+			       "%Gwriting %wu byte into a region "
+			       "of size between %wu and %wu",
+			       "%Gwriting %wu bytes into a region "
+			       "of size between %wu and %wu",
+			       stmt, lenrng[0].to_uhwi (),
+			       spcrng[0].to_uhwi (), spcrng[1].to_uhwi ()));
+    }
+  else if (spcrng[0] == spcrng[1])
+    warned = (writefn
+	      ? warning_at (loc, OPT_Wstringop_overflow_,
+			    "%G%qD writing between %wu and %wu bytes "
+			    "into a region of size %wu",
+			    stmt, writefn, lenrng[0].to_uhwi (),
+			    lenrng[1].to_uhwi (),
+			    spcrng[0].to_uhwi ())
+	      : warning_at (loc, OPT_Wstringop_overflow_,
+			    "%Gwriting between %wu and %wu bytes "
+			    "into a region of size %wu",
+			    stmt, lenrng[0].to_uhwi (),
+			    lenrng[1].to_uhwi (),
+			    spcrng[0].to_uhwi ()));
+  else
+    warned = (writefn
+	      ? warning_at (loc, OPT_Wstringop_overflow_,
+			    "%G%qD writing between %wu and %wu bytes "
+			    "into a region of size between %wu and %wu",
+			    stmt, writefn, lenrng[0].to_uhwi (),
+			    lenrng[1].to_uhwi (),
+			    spcrng[0].to_uhwi (), spcrng[1].to_uhwi ())
+	      : warning_at (loc, OPT_Wstringop_overflow_,
+			    "%Gwriting between %wu and %wu bytes "
+			    "into a region of size between %wu and %wu",
+			    stmt, lenrng[0].to_uhwi (),
+			    lenrng[1].to_uhwi (),
+			    spcrng[0].to_uhwi (), spcrng[1].to_uhwi ()));
+
+  if (!warned)
+    return;
+
+  /* If DESTOFF is not null, use it to format the offset value/range.  */
+  if (destoff)
+    get_range (destoff, offrng);
+
+  /* Format the offset to keep the number of inform calls from growing
+     out of control.  */
+  char offstr[64];
+  if (offrng[0] == offrng[1])
+    sprintf (offstr, "%lli", (long long) offrng[0].to_shwi ());
+  else
+    sprintf (offstr, "[%lli, %lli]",
+	     (long long) offrng[0].to_shwi (), (long long) offrng[1].to_shwi ());
+
+  if (destdecl)
+    {
+      if (tree size = DECL_SIZE_UNIT (destdecl))
+	inform (DECL_SOURCE_LOCATION (destdecl),
+		"at offset %s to object %qD with size %E declared here",
+		offstr, destdecl, size);
+      else
+	inform (DECL_SOURCE_LOCATION (destdecl),
+		"at offset %s to object %qD declared here",
+		offstr, destdecl);
+      return;
+    }
+}
+
+/* Convenience wrapper for the above.  */
+
+static inline void
+maybe_warn_overflow (gimple *stmt, unsigned HOST_WIDE_INT len,
+		     const vr_values *rvals = NULL,
+		     strinfo *si = NULL, bool plus_one = false)
+{
+  maybe_warn_overflow (stmt, build_int_cst (size_type_node, len), rvals,
+		       si, plus_one);
+}
+
 /* Handle a strlen call.  If strlen of the argument is known, replace
    the strlen call with the known value, otherwise remember that strlen
    of the argument is stored in the lhs SSA_NAME.  */
@@ -4323,6 +4694,13 @@  handle_store (gimple_stmt_iterator *gsi, bool *zer
 	  else if (si == NULL || compare_nonzero_chars (si, offset, rvals) < 0)
 	    {
 	      *zero_write = initializer_zerop (rhs);
+
+	      bool dummy;
+	      unsigned lenrange[] = { UINT_MAX, 0, 0 };
+	      if (count_nonzero_bytes (rhs, lenrange, &dummy, &dummy, &dummy,
+				       rvals))
+		maybe_warn_overflow (stmt, lenrange[2], rvals);
+
 	      return true;
 	    }
 	}
@@ -4360,36 +4738,7 @@  handle_store (gimple_stmt_iterator *gsi, bool *zer
       rhs_minlen = lenrange[0];
       storing_nonzero_p = lenrange[1] > 0;
       *zero_write = storing_all_zeros_p;
-
-      /* Avoid issuing multiple warnings for the same LHS or statement.
-	 For example, -Warray-bounds may have already been issued for
-	 an out-of-bounds subscript.  */
-      if (!TREE_NO_WARNING (lhs) && !gimple_no_warning_p (stmt))
-	{
-	  /* Set to the declaration referenced by LHS (if known).  */
-	  tree decl = NULL_TREE;
-	  if (tree dstsize = compute_objsize (lhs, 1, &decl))
-	    if (compare_tree_int (dstsize, lenrange[2]) < 0)
-	      {
-		/* Fall back on the LHS location if the statement
-		   doesn't have one.  */
-		location_t loc = gimple_nonartificial_location (stmt);
-		if (loc == UNKNOWN_LOCATION && EXPR_HAS_LOCATION (lhs))
-		  loc = tree_nonartificial_location (lhs);
-		loc = expansion_point_location_if_in_system_header (loc);
-		if (warning_n (loc, OPT_Wstringop_overflow_,
-			       lenrange[2],
-			       "%Gwriting %u byte into a region of size %E",
-			       "%Gwriting %u bytes into a region of size %E",
-			       stmt, lenrange[2], dstsize))
-		  {
-		    if (decl)
-		      inform (DECL_SOURCE_LOCATION (decl),
-			      "destination object declared here");
-		    gimple_set_no_warning (stmt, true);
-		  }
-	      }
-	}
+      maybe_warn_overflow (stmt, lenrange[2], rvals);
     }
   else
     {