
handle non-constant offsets in -Wstringop-overflow (PR 77608)

Message ID 3b35c9b9-491c-4d2d-a2db-737dc2c535d5@gmail.com
State New
Series handle non-constant offsets in -Wstringop-overflow (PR 77608)

Commit Message

Martin Sebor Nov. 17, 2017, 7:36 p.m. UTC
The attached patch enhances -Wstringop-overflow to detect more
instances of buffer overflow at compile time by handling non-
constant offsets into the destination object that are known to
be in some range.  The solution could be improved by handling
even more cases (e.g., anti-ranges or offsets relative to
pointers beyond the beginning of an object) but it's a start.
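
As a rough sketch of the kind of code this targets (this example is
made up for illustration; it is not taken from the patch's test
suite), the offset below is only known to lie in a range, and the
patch lets the warning make use of that range:

   char buf[8];

   void f (const char *s, unsigned i)
   {
     if (i < 4 || 6 < i)
       i = 4;

     /* i is known to be in [4, 6], so at most 4 bytes remain past
        buf + i; copying 6 bytes can be diagnosed at compile time.  */
     __builtin_memcpy (buf + i, s, 6);
   }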

In addition to bootstrapping/regtesting GCC, also tested with
Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
regressions.

Martin

The top of GDB fails to compile at the moment so the validation
there was incomplete.

Comments

Jeff Law Nov. 18, 2017, 7:53 a.m. UTC | #1
On 11/17/2017 12:36 PM, Martin Sebor wrote:
> The attached patch enhances -Wstringop-overflow to detect more
> instances of buffer overflow at compile time by handling non-
> constant offsets into the destination object that are known to
> be in some range.  The solution could be improved by handling
> even more cases (e.g., anti-ranges or offsets relative to
> pointers beyond the beginning of an object) but it's a start.
> 
> In addition to bootstrapping/regtesting GCC, also tested with
> Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
> regressions.
> 
> Martin
> 
> The top of GDB fails to compile at the moment so the validation
> there was incomplete.
> 
> gcc-77608.diff
> 
> 
> PR middle-end/77608 - missing protection on trivially detectable runtime buffer overflow
> 
> gcc/ChangeLog:
> 
> 	PR middle-end/77608
> 	* builtins.c (compute_objsize): Handle non-constant offsets.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR middle-end/77608
> 	* gcc.dg/Wstringop-overflow.c: New test.
The recursive call into compute_objsize passing in the ostype avoids
having to think about the whole object vs nearest containing object
issues.  Right?

What's left to worry about is maximum or minimum remaining bytes in the
object.  At least that's my understanding of how ostype works here.

So we get the amount remaining, ignoring the variable offset, from the
recursive call (SIZE).  The space left after we account for the variable
offset is [SIZE - MAX, SIZE - MIN].  So ISTM for type 0/1 you have to
return SIZE-MIN (which you do) and for type 2/3 you have to return
SIZE-MAX which I think you get wrong (and you have to account for the
possibility that MAX or MIN is greater than SIZE and thus there's
nothing left).
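
(Concretely: with SIZE == 7 and an offset range of [2, 9], type 0/1
would return 7 - 2 == 5, while type 2/3 would want 7 - 9, which is
negative, so the conservative answer there is 0.)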

The other concern here is precision of the answer and how it's going to
be used by callers.  But after a lot of thought I think we're ok on this
front based on how b_o_s is supposed to handle cases where it's given a
pointer to multiple objects which all have known, but possibly different
sizes.
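
A sketch of the documented b_o_s behavior being referred to (the
names are made up, not from the patch):

  char a4[4], a8[8];

  char *pick (int cond)
  {
    char *p = cond ? a4 : a8;
    /* Per the __builtin_object_size documentation, when a pointer
       may point to several objects whose sizes are all known, type 0
       yields the maximum of the remaining sizes (8 here) and type 2
       the minimum (4).  */
    return p;
  }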






> +	  /* compute_builtin_object_size fails for addresses with
> +	     non-constant offsets.  Try to determine the range of
> +	     such an offset here and use it to adjust the constant
> +	     size.  */
> +	  tree off = gimple_assign_rhs2 (stmt);
> +	  if (TREE_CODE (off) == SSA_NAME
> +	      && INTEGRAL_TYPE_P (TREE_TYPE (off)))
> +	    {
> +	      wide_int min, max;
> +	      enum value_range_type rng = get_range_info (off, &min, &max);
> +
> +	      if (rng == VR_RANGE)
> +		{
> +		  if (tree size = compute_objsize (dest, ostype))
> +		    {
> +		      wide_int wisiz = wi::to_wide (size);
> +		      if (wi::sign_mask (min))
> +			/* ignore negative offsets for now */;
Put the comment before the condition.  like this

  /* Ignore negative offsets for now.  */
  if (wi::sign_mask (min))
    ;

That makes the empty if clause clearer to spot.  You could even argue
that instead of an empty if clause it should be:

  /* Ignore negative offsets for now.  */
  if (wi::sign_mask (min))
    return NULL_TREE;

We're giving up at this point anyway, so why make the reader follow
further control flow to see that's what's ultimately going to happen?



> +		      else if (wi::ltu_p (min, wisiz))
> +			return wide_int_to_tree (TREE_TYPE (size),
> +						 wi::sub (wisiz, min));
So for types 0/1 (max space available), I think the condition is

	else if ((ostype & 2) == 0
	         && wi::ltu_p (min, wisize))
	  return SIZE-MIN;

Then you'll need

	else if ((ostype & 2) == 2
		 && wi::ltu_p (max, wisize))
	  return SIZE-MAX;
	else
	  return size_zero_node;



I'd like Jakub or Richi to chime in here to check my analysis.  It's
getting late :-)

Jeff
Martin Sebor Nov. 18, 2017, 4:47 p.m. UTC | #2
On 11/18/2017 12:53 AM, Jeff Law wrote:
> On 11/17/2017 12:36 PM, Martin Sebor wrote:
>> The attached patch enhances -Wstringop-overflow to detect more
>> instances of buffer overflow at compile time by handling non-
>> constant offsets into the destination object that are known to
>> be in some range.  The solution could be improved by handling
>> even more cases (e.g., anti-ranges or offsets relative to
>> pointers beyond the beginning of an object) but it's a start.

I should have made it even more clear that this change affects
just warnings, not the runtime protection with _FORTIFY_SOURCE.
The latter is driven by the __builtin_object_size intrinsic and
by the compute_builtin_object_size GCC function.

This patch does not change what __builtin_object_size returns.

>> In addition to bootstrapping/regtesting GCC, also tested with
>> Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
>> regressions.
>>
>> Martin
>>
>> The top of GDB fails to compile at the moment so the validation
>> there was incomplete.
>>
>> gcc-77608.diff
>>
>>
>> PR middle-end/77608 - missing protection on trivially detectable runtime buffer overflow
>>
>> gcc/ChangeLog:
>>
>> 	PR middle-end/77608
>> 	* builtins.c (compute_objsize): Handle non-constant offsets.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR middle-end/77608
>> 	* gcc.dg/Wstringop-overflow.c: New test.
> The recursive call into compute_objsize passing in the ostype avoids
> having to think about the whole object vs nearest containing object
> issues.  Right?

Yes, something like that.

> What's left to worry about is maximum or minimum remaining bytes in the
> object.  At least that's my understanding of how ostype works here.

-Wstringop-overflow (i.e., warnings only) by default always uses
the largest object size for memxxx functions (type 0) and the
smallest (type 2) for string functions.  This can be changed, but
I suspect no one does because (IME) few users understand what the
other modes mean or what benefit using them over the default might
bring.  It was probably a mistake to expose the mode via the option.
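
For reference, a rough sketch of what the Object Size types mean,
using a made-up struct (types 2 and 3 compute the corresponding
minimums instead of the maximums):

   struct S { char a[4]; char b[8]; } s;

   /* Type 0: maximum size of the whole enclosing object.  */
   __SIZE_TYPE__ whole (void)  { return __builtin_object_size (s.a, 0); }  /* 12 */

   /* Type 1: maximum size of the closest enclosing subobject.  */
   __SIZE_TYPE__ member (void) { return __builtin_object_size (s.a, 1); }  /* 4 */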

I'll address the remaining comments separately.

Martin
Jeff Law Nov. 19, 2017, 10:08 p.m. UTC | #3
On 11/18/2017 09:47 AM, Martin Sebor wrote:
> On 11/18/2017 12:53 AM, Jeff Law wrote:
>> On 11/17/2017 12:36 PM, Martin Sebor wrote:
>>> The attached patch enhances -Wstringop-overflow to detect more
>>> instances of buffer overflow at compile time by handling non-
>>> constant offsets into the destination object that are known to
>>> be in some range.  The solution could be improved by handling
>>> even more cases (e.g., anti-ranges or offsets relative to
>>> pointers beyond the beginning of an object) but it's a start.
> 
> I should have made it even more clear that this change affects
> just warnings, not the runtime protection with _FORTIFY_SOURCE.
> The latter is driven by the __builtin_object_size intrinsic and
> by the compute_builtin_object_size GCC function.
> 
> This patch does not change what __builtin_object_size returns.
Understood.  But I think that the documented behavior of b_o_s directly
impacts how this routine is supposed to work.

>> 
> -Wstringop-overflow (i.e., warnings only) always uses the largest
> object size for memxxx functions (type 0) and the smallest (type
> 2) for string functions by default.  This can be changed but I
> suspect no one does because (IME) few users understand what
> the other modes mean or what good using them vs the default might
> do.  It was probably a mistake to expose the mode via the option.
Then maybe do the deeper analysis for max mode only?  As it stands,
if someone were to call it in a min mode I think they're going to be
rather surprised by the answer.

jeff
Martin Sebor Nov. 19, 2017, 11:28 p.m. UTC | #4
On 11/18/2017 12:53 AM, Jeff Law wrote:
> On 11/17/2017 12:36 PM, Martin Sebor wrote:
>> The attached patch enhances -Wstringop-overflow to detect more
>> instances of buffer overflow at compile time by handling non-
>> constant offsets into the destination object that are known to
>> be in some range.  The solution could be improved by handling
>> even more cases (e.g., anti-ranges or offsets relative to
>> pointers beyond the beginning of an object) but it's a start.
>>
>> In addition to bootstrapping/regtesting GCC, also tested with
>> Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
>> regressions.
>>
>> Martin
>>
>> The top of GDB fails to compile at the moment so the validation
>> there was incomplete.
>>
>> gcc-77608.diff
>>
>>
>> PR middle-end/77608 - missing protection on trivially detectable runtime buffer overflow
>>
>> gcc/ChangeLog:
>>
>> 	PR middle-end/77608
>> 	* builtins.c (compute_objsize): Handle non-constant offsets.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR middle-end/77608
>> 	* gcc.dg/Wstringop-overflow.c: New test.
> The recursive call into compute_objsize passing in the ostype avoids
> having to think about the whole object vs nearest containing object
> issues.  Right?
>
> What's left to worry about is maximum or minimum remaining bytes in the
> object.  At least that's my understanding of how ostype works here.
>
> So we get the amount remaining, ignoring the variable offset, from the
> recursive call (SIZE).  The space left after we account for the variable
> offset is [SIZE - MAX, SIZE - MIN].  So ISTM for type 0/1 you have to
> return SIZE-MIN (which you do) and for type 2/3 you have to return
> SIZE-MAX which I think you get wrong (and you have to account for the
> possibility that MAX or MIN is greater than SIZE and thus there's
> nothing left).

Subtracting the upper bound of the offset from the size instead
of the lower bound when the caller is asking for the minimum
object size would make the result essentially meaningless in
common cases where the offset is smaller than size_t, as in:

   char a[7];

   void f (const char *s, unsigned i)
   {
     __builtin_strcpy (a + i, s);
   }

Here, i's range is [0, UINT_MAX].

IMO, it's only useful to use the lower bound here, otherwise
the result would only rarely be non-zero.
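
(In this example SIZE is 7, MIN is 0, and MAX is UINT_MAX, so
SIZE - MAX would clamp to zero while SIZE - MIN keeps the full
7 bytes.)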

This is also what other warnings that deal with ranges do.  For
example, -Warray-bounds only considers the lower bound (unless it's
negative) when deciding whether or not to warn for

   int g (unsigned i)
   {
     return a[i];
   }

It would be too noisy to be of practical use otherwise (even at
level 2).

Martin
Jeff Law Nov. 21, 2017, 4:55 p.m. UTC | #5
On 11/19/2017 04:28 PM, Martin Sebor wrote:
> On 11/18/2017 12:53 AM, Jeff Law wrote:
>> On 11/17/2017 12:36 PM, Martin Sebor wrote:
>>> The attached patch enhances -Wstringop-overflow to detect more
>>> instances of buffer overflow at compile time by handling non-
>>> constant offsets into the destination object that are known to
>>> be in some range.  The solution could be improved by handling
>>> even more cases (e.g., anti-ranges or offsets relative to
>>> pointers beyond the beginning of an object) but it's a start.
>>>
>>> In addition to bootstrapping/regtesting GCC, also tested with
>>> Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
>>> regressions.
>>>
>>> Martin
>>>
>>> The top of GDB fails to compile at the moment so the validation
>>> there was incomplete.
>>>
>>> gcc-77608.diff
>>>
>>>
>>> PR middle-end/77608 - missing protection on trivially detectable
>>> runtime buffer overflow
>>>
>>> gcc/ChangeLog:
>>>
>>>     PR middle-end/77608
>>>     * builtins.c (compute_objsize): Handle non-constant offsets.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>     PR middle-end/77608
>>>     * gcc.dg/Wstringop-overflow.c: New test.
>> The recursive call into compute_objsize passing in the ostype avoids
>> having to think about the whole object vs nearest containing object
>> issues.  Right?
>>
>> What's left to worry about is maximum or minimum remaining bytes in the
>> object.  At least that's my understanding of how ostype works here.
>>
>> So we get the amount remaining, ignoring the variable offset, from the
>> recursive call (SIZE).  The space left after we account for the variable
>> offset is [SIZE - MAX, SIZE - MIN].  So ISTM for type 0/1 you have to
>> return SIZE-MIN (which you do) and for type 2/3 you have to return
>> SIZE-MAX which I think you get wrong (and you have to account for the
>> possibility that MAX or MIN is greater than SIZE and thus there's
>> nothing left).
> 
> Subtracting the upper bound of the offset from the size instead
> of the lower bound when the caller is asking for the minimum
> object size would make the result essentially meaningless in
> common cases where the offset is smaller than size_t, as in:
> 
>   char a[7];
> 
>   void f (const char *s, unsigned i)
>   {
>     __builtin_strcpy (a + i, s);
>   }
> 
> Here, i's range is [0, UINT_MAX].
> 
> IMO, it's only useful to use the lower bound here, otherwise
> the result would only rarely be non-zero.
But when we're asking for the minimum left, aren't we essentially asking
for "how much space am I absolutely sure I can write"?  And if that is
the question, then the only conservatively correct answer is to subtract
the high bound.


> 
> This is also what other warnings that deal with ranges do.  For
> example, -Warray-bounds only considers the lower bound (unless it's
> negative) when deciding whether or not to warn for
> 
>   int g (unsigned i)
>   {
>     return a[i];
>   }
>
> It would be too noisy to be of practical use otherwise (even at
> level 2).
Which argues that:

1. Our ranges suck
2. We're currently avoiding dealing with that by giving answers that are
not conservatively correct.

Right?


Jeff
Martin Sebor Nov. 21, 2017, 7:07 p.m. UTC | #6
On 11/21/2017 09:55 AM, Jeff Law wrote:
> On 11/19/2017 04:28 PM, Martin Sebor wrote:
>> On 11/18/2017 12:53 AM, Jeff Law wrote:
>>> On 11/17/2017 12:36 PM, Martin Sebor wrote:
>>>> The attached patch enhances -Wstringop-overflow to detect more
>>>> instances of buffer overflow at compile time by handling non-
>>>> constant offsets into the destination object that are known to
>>>> be in some range.  The solution could be improved by handling
>>>> even more cases (e.g., anti-ranges or offsets relative to
>>>> pointers beyond the beginning of an object) but it's a start.
>>>>
>>>> In addition to bootstrapping/regtesting GCC, also tested with
>>>> Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
>>>> regressions.
>>>>
>>>> Martin
>>>>
>>>> The top of GDB fails to compile at the moment so the validation
>>>> there was incomplete.
>>>>
>>>> gcc-77608.diff
>>>>
>>>>
>>>> PR middle-end/77608 - missing protection on trivially detectable
>>>> runtime buffer overflow
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>     PR middle-end/77608
>>>>     * builtins.c (compute_objsize): Handle non-constant offsets.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>     PR middle-end/77608
>>>>     * gcc.dg/Wstringop-overflow.c: New test.
>>> The recursive call into compute_objsize passing in the ostype avoids
>>> having to think about the whole object vs nearest containing object
>>> issues.  Right?
>>>
>>> What's left to worry about is maximum or minimum remaining bytes in the
>>> object.  At least that's my understanding of how ostype works here.
>>>
>>> So we get the amount remaining, ignoring the variable offset, from the
>>> recursive call (SIZE).  The space left after we account for the variable
>>> offset is [SIZE - MAX, SIZE - MIN].  So ISTM for type 0/1 you have to
>>> return SIZE-MIN (which you do) and for type 2/3 you have to return
>>> SIZE-MAX which I think you get wrong (and you have to account for the
>>> possibility that MAX or MIN is greater than SIZE and thus there's
>>> nothing left).
>>
>> Subtracting the upper bound of the offset from the size instead
>> of the lower bound when the caller is asking for the minimum
>> object size would make the result essentially meaningless in
>> common cases where the offset is smaller than size_t, as in:
>>
>>   char a[7];
>>
>>   void f (const char *s, unsigned i)
>>   {
>>     __builtin_strcpy (a + i, s);
>>   }
>>
>> Here, i's range is [0, UINT_MAX].
>>
>> IMO, it's only useful to use the lower bound here, otherwise
>> the result would only rarely be non-zero.
> But when we're asking for the minimum left, aren't we essentially asking
> for "how much space am I absolutely sure I can write"?  And if that is
> the question, then the only conservatively correct answer is to subtract
> the high bound.

I suppose you could look at it that way but IME with this work
(now, and also last year when I submitted a patch actually
changing the built-in), using the upper bound is just not that
useful because it's too often way too big.  There's no way to
distinguish an out-of-range upper bound that's the result of
an inadequate attempt to constrain a value from an out-of-range
upper bound that is sufficiently constrained but in a way GCC
doesn't see.

There are no clients of this API that would be affected by
the decision one way or the other (unless the user specifies
a -Wstringop-overflow= argument greater than the default 2)
so I don't think what we do now matters much, if at all.
Perhaps it will in the future, with some of the range improvements
that you, Aldy, and Andrew have been working on.

That said, if it helps us move forward with this enhancement
I'll use the upper bound -- let me know.  In the future, when
it is actually used, we'll adjust it as necessary.

>> This is also what other warnings that deal with ranges do.  For
>> example, -Warray-bounds only considers the lower bound (unless it's
>> negative) when deciding whether or not to warn for
>>
>>   int g (unsigned i)
>>   {
>>     return a[i];
>>   }
>>
>> It would be too noisy to be of practical use otherwise (even at
>> level 2).
> Which argues that:
>
> 1. Our ranges suck
> 2. We're currently avoiding dealing with that by giving answers that are
> not conservatively correct.
>
> Right?

I don't feel quite as strongly.  Modulo the pesky bugs we get
every so often for some of the corner cases or known limitations,
the ranges seem reasonably accurate in most cases, and the
warnings, for the most part, strike a reasonable balance between
false positives and negatives.  But I certainly agree that there
is room to improve and I look forward to taking some of the range
enhancements out for a spin.

Martin
Jeff Law Nov. 23, 2017, 12:03 a.m. UTC | #7
On 11/21/2017 12:07 PM, Martin Sebor wrote:
> On 11/21/2017 09:55 AM, Jeff Law wrote:
>> On 11/19/2017 04:28 PM, Martin Sebor wrote:
>>> On 11/18/2017 12:53 AM, Jeff Law wrote:
>>>> On 11/17/2017 12:36 PM, Martin Sebor wrote:
>>>>> The attached patch enhances -Wstringop-overflow to detect more
>>>>> instances of buffer overflow at compile time by handling non-
>>>>> constant offsets into the destination object that are known to
>>>>> be in some range.  The solution could be improved by handling
>>>>> even more cases (e.g., anti-ranges or offsets relative to
>>>>> pointers beyond the beginning of an object) but it's a start.
>>>>>
>>>>> In addition to bootstrapping/regtesting GCC, also tested with
>>>>> Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
>>>>> regressions.
>>>>>
>>>>> Martin
>>>>>
>>>>> The top of GDB fails to compile at the moment so the validation
>>>>> there was incomplete.
>>>>>
>>>>> gcc-77608.diff
>>>>>
>>>>>
>>>>> PR middle-end/77608 - missing protection on trivially detectable
>>>>> runtime buffer overflow
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>     PR middle-end/77608
>>>>>     * builtins.c (compute_objsize): Handle non-constant offsets.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>     PR middle-end/77608
>>>>>     * gcc.dg/Wstringop-overflow.c: New test.
>>>> The recursive call into compute_objsize passing in the ostype avoids
>>>> having to think about the whole object vs nearest containing object
>>>> issues.  Right?
>>>>
>>>> What's left to worry about is maximum or minimum remaining bytes in the
>>>> object.  At least that's my understanding of how ostype works here.
>>>>
>>>> So we get the amount remaining, ignoring the variable offset, from the
>>>> recursive call (SIZE).  The space left after we account for the
>>>> variable
>>>> offset is [SIZE - MAX, SIZE - MIN].  So ISTM for type 0/1 you have to
>>>> return SIZE-MIN (which you do) and for type 2/3 you have to return
>>>> SIZE-MAX which I think you get wrong (and you have to account for the
>>>> possibility that MAX or MIN is greater than SIZE and thus there's
>>>> nothing left).
>>>
>>> Subtracting the upper bound of the offset from the size instead
>>> of the lower bound when the caller is asking for the minimum
>>> object size would make the result essentially meaningless in
>>> common cases where the offset is smaller than size_t, as in:
>>>
>>>   char a[7];
>>>
>>>   void f (const char *s, unsigned i)
>>>   {
>>>     __builtin_strcpy (a + i, s);
>>>   }
>>>
>>> Here, i's range is [0, UINT_MAX].
>>>
>>> IMO, it's only useful to use the lower bound here, otherwise
>>> the result would only rarely be non-zero.
>> But when we're asking for the minimum left, aren't we essentially asking
>> for "how much space am I absolutely sure I can write"?  And if that is
>> the question, then the only conservatively correct answer is to subtract
>> the high bound.
> 
> I suppose you could look at it that way but IME with this work
> (now, and also last year when I submitted a patch actually
> changing the built-in), using the upper bound is just not that
> useful because it's too often way too big.  There's no way to
> distinguish an out-of-range upper bound that's the result of
> an inadequate attempt to constrain a value from an out-of-range
> upper bound that is sufficiently constrained but in a way GCC
> doesn't see.
Understood.

So while it's reasonable to not warn in those cases where we just have
crap range information (that's always going to be the case for some code
regardless of how good my work or Andrew/Aldy's work is), we have to be
very careful and make sure that nobody acts on this information for
optimization purposes because what we're returning is not conservatively
correct.


> 
> There are no clients of this API that would be affected by
> the decision one way or the other (unless the user specifies
> a -Wstringop-overflow= argument greater than the default 2)
> so I don't think what we do now matters much, if at all.
Right, but what's to stop someone without knowledge of the
implementation and its quirk of not returning the conservatively safe
result from using the results in other ways?

What would the impact be of simply not supporting those queries,
essentially returning "I don't know" rather than returning something
that isn't conservatively correct?

If we have to support both queries and we're going to return something
that is not conservatively correct, then it really needs to be
documented to avoid folks using the code in ways that are not safe.



> Perhaps in the future with some of the range improvements
> that you, Aldy and Andrew have been working on.
I think they'll help, but there's always going to be code that can't be
analyzed; it's just inherent in the languages we use.

> 
> That said, if it helps us move forward with this enhancement
> I'll use the upper bound -- let me know.  In the future, when
> it is actually used, we'll adjust it as necessary.
Understood.  Let's answer the questions above first.  Namely, do we have
to support the other mode?  I thought I saw something that indicated it
wasn't ever used that way by the current clients.  So if we can just
return "don't know" for those ostype queries that's a reasonable place
to go.  Else we should look at a clear comment about the limitations of
the code to help prevent (ab)use of the analysis in contexts where it's
not appropriate.

Jeff
Martin Sebor Nov. 30, 2017, 8:30 p.m. UTC | #8
On 11/22/2017 05:03 PM, Jeff Law wrote:
> On 11/21/2017 12:07 PM, Martin Sebor wrote:
>> On 11/21/2017 09:55 AM, Jeff Law wrote:
>>> On 11/19/2017 04:28 PM, Martin Sebor wrote:
>>>> On 11/18/2017 12:53 AM, Jeff Law wrote:
>>>>> On 11/17/2017 12:36 PM, Martin Sebor wrote:
>>>>>> The attached patch enhances -Wstringop-overflow to detect more
>>>>>> instances of buffer overflow at compile time by handling non-
>>>>>> constant offsets into the destination object that are known to
>>>>>> be in some range.  The solution could be improved by handling
>>>>>> even more cases (e.g., anti-ranges or offsets relative to
>>>>>> pointers beyond the beginning of an object) but it's a start.
>>>>>>
>>>>>> In addition to bootstrapping/regtesting GCC, also tested with
>>>>>> Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
>>>>>> regressions.
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>> The top of GDB fails to compile at the moment so the validation
>>>>>> there was incomplete.
>>>>>>
>>>>>> gcc-77608.diff
>>>>>>
>>>>>>
>>>>>> PR middle-end/77608 - missing protection on trivially detectable
>>>>>> runtime buffer overflow
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>>     PR middle-end/77608
>>>>>>     * builtins.c (compute_objsize): Handle non-constant offsets.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>>     PR middle-end/77608
>>>>>>     * gcc.dg/Wstringop-overflow.c: New test.
>>>>> The recursive call into compute_objsize passing in the ostype avoids
>>>>> having to think about the whole object vs nearest containing object
>>>>> issues.  Right?
>>>>>
>>>>> What's left to worry about is maximum or minimum remaining bytes in the
>>>>> object.  At least that's my understanding of how ostype works here.
>>>>>
>>>>> So we get the amount remaining, ignoring the variable offset, from the
>>>>> recursive call (SIZE).  The space left after we account for the
>>>>> variable
>>>>> offset is [SIZE - MAX, SIZE - MIN].  So ISTM for type 0/1 you have to
>>>>> return SIZE-MIN (which you do) and for type 2/3 you have to return
>>>>> SIZE-MAX which I think you get wrong (and you have to account for the
>>>>> possibility that MAX or MIN is greater than SIZE and thus there's
>>>>> nothing left).
>>>>
>>>> Subtracting the upper bound of the offset from the size instead
>>>> of the lower bound when the caller is asking for the minimum
>>>> object size would make the result essentially meaningless in
>>>> common cases where the offset is smaller than size_t, as in:
>>>>
>>>>   char a[7];
>>>>
>>>>   void f (const char *s, unsigned i)
>>>>   {
>>>>     __builtin_strcpy (a + i, s);
>>>>   }
>>>>
>>>> Here, i's range is [0, UINT_MAX].
>>>>
>>>> IMO, it's only useful to use the lower bound here, otherwise
>>>> the result would only rarely be non-zero.
>>> But when we're asking for the minimum left, aren't we essentially asking
>>> for "how much space am I absolutely sure I can write"?  And if that is
>>> the question, then the only conservatively correct answer is to subtract
>>> the high bound.
>>
>> I suppose you could look at it that way but IME with this work
>> (now, and also last year when I submitted a patch actually
>> changing the built-in), using the upper bound is just not that
>> useful because it's too often way too big.  There's no way to
>> distinguish an out-of-range upper bound that's the result of
>> an inadequate attempt to constrain a value from an out-of-range
>> upper bound that is sufficiently constrained but in a way GCC
>> doesn't see.
> Understood.
>
> So while it's reasonable to not warn in those cases where we just have
> crap range information (that's always going to be the case for some code
> regardless of how good my work or Andrew/Aldy's work is), we have to be
> very careful and make sure that nobody acts on this information for
> optimization purposes because what we're returning is not conservatively
> correct.
>
>
>>
>> There are no clients of this API that would be affected by
>> the decision one way or the other (unless the user specifies
>> a -Wstringop-overflow= argument greater than the default 2)
>> so I don't think what we do now matters much, if at all.
> Right, but what's to stop someone without knowledge of the
> implementation and its quirk of not returning the conservatively safe
> result from using the results in other ways.

Presumably they would find out by testing their code.  But this
is a hypothetical scenario.  I added the function for warnings.
I wasn't expecting it to be used for optimization, no such uses
have emerged, and I don't have the impression that anyone is
contemplating adding them (certainly not in stage 3).  If you
think the function could be useful for optimization then we
should certainly consider changing it as we gain experience
with it under those conditions.

> What would the impact be of simply not supporting those queries,
> essentially returning "I don't know" rather than returning something
> that isn't conservatively correct?

Except for false negatives with -Wstringop-overflow=3 and =4
(i.e., Object Size type 2 and 3) I don't think there would be
any impact.  As I said, the function isn't used for optimization
and I don't think the option is used in these modes either, so
in my mind it matters very little.  I don't even think there
are any tests for it in these modes, either.

>
> If we have to support both queries and we're going to return something
> that is not conservatively correct, then it really needs to be
> documented to avoid folks using the code in ways that are not safe.
>
>
>
>> Perhaps in the future with some of the range improvements
>> that you, Aldy and Andrew have been working on.
> I think they'll help, but there's always going to be code that can't be
> analyzed; it's just inherent in the languages we use.
>
>>
>> That said, if it helps us move forward with this enhancement
>> I'll use the upper bound -- let me know.  In the future, when
>> it is actually used, we'll adjust it as necessary.
> Understood.  Let's answer the questions above first.  Namely, do we have
> to support the other mode?  I thought I saw something that indicated it
> wasn't ever used that way by the current clients.  So if we can just
> return "don't know" for those ostype queries that's a reasonable place
> to go.  Else we should look at a clear comment about the limitations of
> the code to help prevent (ab)use of the analysis in contexts where it's
> not appropriate.

The -Wstringop-overflow= option supports all four object size
types.  I suspect only types 0 and 1 are used but I also don't
see a pressing need to disable or compromise the others.  If
you do I'm fine with changing the option either to only support
modes 0 and 1, or to remove the argument altogether (and using
the same default as today, i.e., mode 0 for memxxx functions
and mode 1 for strxxx functions).  But I see no benefit in
changing the function to give up in modes 2 and 3 and have it
cause false negatives.  It simply isn't necessary today.

For now, I've added comments to clarify the return value and
the purpose of the function.  Let me know if that's good enough
or if you want me to make any other changes.

Martin
PR middle-end/77608 - missing protection on trivially detectable runtime buffer overflow

gcc/ChangeLog:

	PR middle-end/77608
	* builtins.c (compute_objsize): Handle non-constant offsets.

gcc/testsuite/ChangeLog:

	PR middle-end/77608
	* gcc.dg/Wstringop-overflow.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 650de0d..e35c514 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3262,8 +3262,12 @@ check_sizes (int opt, tree exp, tree size, tree maxlen, tree src, tree objsize)
 /* Helper to compute the size of the object referenced by the DEST
    expression which must have pointer type, using Object Size type
    OSTYPE (only the least significant 2 bits are used).  Return
-   the size of the object if successful or NULL when the size cannot
-   be determined.  */
+   an estimate of the size of the object if successful or NULL when
+   the size cannot be determined.  When the referenced object involves
+   a non-constant offset in some range, the returned value represents
+   the largest size given the smallest non-negative offset in the
+   range.  The function is intended for diagnostics and is not
+   suitable for optimization.  */
 
 tree
 compute_objsize (tree dest, int ostype)
@@ -3276,24 +3280,57 @@ compute_objsize (tree dest, int ostype)
   if (compute_builtin_object_size (dest, ostype, &size))
     return build_int_cst (sizetype, size);
 
-  /* Unless computing the largest size (for memcpy and other raw memory
-     functions), try to determine the size of the object from its type.  */
-  if (!ostype)
-    return NULL_TREE;
-
   if (TREE_CODE (dest) == SSA_NAME)
     {
       gimple *stmt = SSA_NAME_DEF_STMT (dest);
       if (!is_gimple_assign (stmt))
 	return NULL_TREE;
 
+      dest = gimple_assign_rhs1 (stmt);
+
       tree_code code = gimple_assign_rhs_code (stmt);
-      if (code != ADDR_EXPR && code != POINTER_PLUS_EXPR)
-	return NULL_TREE;
+      if (code == POINTER_PLUS_EXPR)
+	{
+	  /* compute_builtin_object_size fails for addresses with
+	     non-constant offsets.  Try to determine the range of
+	     such an offset here and use it to adjust the constant
+	     size.  */
+	  tree off = gimple_assign_rhs2 (stmt);
+	  if (TREE_CODE (off) == SSA_NAME
+	      && INTEGRAL_TYPE_P (TREE_TYPE (off)))
+	    {
+	      wide_int min, max;
+	      enum value_range_type rng = get_range_info (off, &min, &max);
 
-      dest = gimple_assign_rhs1 (stmt);
+	      if (rng == VR_RANGE)
+		{
+		  if (tree size = compute_objsize (dest, ostype))
+		    {
+		      wide_int wisiz = wi::to_wide (size);
+
+		      /* Ignore negative offsets for now.  For others,
+			 use the lower bound as the most optimistic
+			 estimate of the (remaining) size.  */
+		      if (wi::sign_mask (min))
+			;
+		      else if (wi::ltu_p (min, wisiz))
+			return wide_int_to_tree (TREE_TYPE (size),
+						 wi::sub (wisiz, min));
+		      else
+			return size_zero_node;
+		    }
+		}
+	    }
+	}
+      else if (code != ADDR_EXPR)
+	return NULL_TREE;
     }
 
+  /* Unless computing the largest size (for memcpy and other raw memory
+     functions), try to determine the size of the object from its type.  */
+  if (!ostype)
+    return NULL_TREE;
+
   if (TREE_CODE (dest) != ADDR_EXPR)
     return NULL_TREE;
 
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow.c b/gcc/testsuite/gcc.dg/Wstringop-overflow.c
new file mode 100644
index 0000000..b5bd40e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow.c
@@ -0,0 +1,132 @@
+/* PR middle-end/77608 - missing protection on trivially detectable runtime
+   buffer overflow
+   { dg-do compile }
+   { dg-options "-O2 -Wstringop-overflow -ftrack-macro-expansion=0" }  */
+
+#define SIZE_MAX   __SIZE_MAX__
+#define DIFF_MAX   __PTRDIFF_MAX__
+#define DIFF_MIN   (-DIFF_MAX - 1)
+
+typedef __SIZE_TYPE__ size_t;
+
+extern void* memcpy (void*, const void*, size_t);
+extern char* strcpy (char*, const char*);
+extern char* strncpy (char*, const char*, size_t);
+
+void sink (void*);
+
+extern size_t unsigned_value (void)
+{
+  extern volatile size_t unsigned_value_source;
+  return unsigned_value_source;
+}
+
+size_t unsigned_range (size_t min, size_t max)
+{
+  size_t val = unsigned_value ();
+  return val < min || max < val ? min : val;
+}
+
+#define UR(min, max) unsigned_range (min, max)
+
+
+char a7[7];
+
+struct MemArray { char a9[9]; char a1[1]; };
+
+void test_memcpy_array (const void *s)
+{
+#define T(d, s, n) (memcpy ((d), (s), (n)), sink (d))
+
+  T (a7 + UR (0, 1), s, 7);
+  T (a7 + UR (0, 7), s, 7);
+  T (a7 + UR (0, 8), s, 7);
+  T (a7 + UR (0, DIFF_MAX), s, 7);
+  T (a7 + UR (0, SIZE_MAX), s, 7);
+
+  T (a7 + UR (1, 2), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 6" } */
+  T (a7 + UR (2, 3), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 5" } */
+  T (a7 + UR (6, 9), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 1" } */
+  T (a7 + UR (7, 9), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (8, 9), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+  T (a7 + UR (9, 10), s, 7);  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (DIFF_MAX, DIFF_MAX + (size_t)1), s, 7);  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (DIFF_MAX, SIZE_MAX), s, 7);  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+  /* This is valid.  */
+  char *d = a7 + 7;
+  T (d + UR (-8, -7), s, 7);
+}
+
+/* Verify the absence of warnings for memcpy writing beyond object
+   boundaries. */
+
+void test_memcpy_memarray (struct MemArray *p, const void *s)
+{
+#undef T
+#define T(d, s, n) (memcpy ((d), (s), (n)), sink (d))
+
+  /* The following are valid.  */
+  T (p->a9 + UR (0, 1), s, 9);
+  T (p->a9 + UR (0, 7), s, 9);
+  T (p->a9 + UR (0, 8), s, 9);
+  T (p->a9 + UR (0, DIFF_MAX), s, 9);
+  T (p->a9 + UR (0, SIZE_MAX), s, 9);
+
+  /* The following are invalid.  Unfortunately, there is apparently enough
+     code out there that abuses memcpy to write past the end of one member
+     and into the members that follow so the following are not diagnosed
+     by design.  It sure would be nice not to have to cater to hacks like
+     these...  */
+  T (p->a9 + UR (1, 2), s, 9);
+  T (p->a9 + UR (1, 2), s, 123);
+}
+
+
+void test_strcpy_array (void)
+{
+#undef T
+#define T(d, s) (strcpy ((d), (s)), sink (d))
+
+  T (a7 + UR (0, 1), "012345");
+  T (a7 + UR (0, 7), "012345");
+  T (a7 + UR (0, 8), "012345");
+  T (a7 + UR (0, DIFF_MAX), "012345");
+  T (a7 + UR (0, SIZE_MAX), "012345");
+
+  T (a7 + UR (1, 2), "012345");   /* { dg-warning "writing 7 bytes into a region of size 6" } */
+  T (a7 + UR (2, 3), "012345");   /* { dg-warning "writing 7 bytes into a region of size 5" } */
+  T (a7 + UR (6, 9), "012345");   /* { dg-warning "writing 7 bytes into a region of size 1" } */
+  T (a7 + UR (7, 9), "012345");   /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (8, 9), "012345");   /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+  T (a7 + UR (9, 10), "012345");  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (DIFF_MAX, DIFF_MAX + (size_t)1), "012345");  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (DIFF_MAX, SIZE_MAX), "012345");  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+  char *d = a7 + 7;
+
+  T (d + UR (-8, -7), "012345");
+}
+
+void test_strncpy_memarray (struct MemArray *p, const void *s)
+{
+#undef T
+#define T(d, s, n) (strncpy ((d), (s), (n)), sink (d))
+
+  T (p->a9 + UR (0, 1), s, 9);
+  T (p->a9 + UR (0, 7), s, 9);
+  T (p->a9 + UR (0, 8), s, 9);
+  T (p->a9 + UR (0, DIFF_MAX), s, 9);
+  T (p->a9 + UR (0, SIZE_MAX), s, 9);
+
+  T (p->a9 + UR (1, 2), s, 9);    /* { dg-warning "writing 9 bytes into a region of size 8" } */
+  T (p->a9 + UR (2, 3), s, 9);    /* { dg-warning "writing 9 bytes into a region of size 7" } */
+  T (p->a9 + UR (6, 9), s, 9);    /* { dg-warning "writing 9 bytes into a region of size 3" } */
+  T (p->a9 + UR (9, 10), s, 9);   /* { dg-warning "writing 9 bytes into a region of size 0" } */
+  T (p->a9 + UR (10, 11), s, 9);  /* { dg-warning "writing 9 bytes into a region of size 0" } */
+
+  T (p->a9 + UR (DIFF_MAX, DIFF_MAX + (size_t)1), s, 1);  /* { dg-warning "writing 1 byte into a region of size 0" } */
+  T (p->a9 + UR (DIFF_MAX, SIZE_MAX), s, 3);  /* { dg-warning "writing 3 bytes into a region of size 0" } */
+}
Jeff Law Dec. 1, 2017, 8:26 a.m. UTC | #9
On 11/30/2017 01:30 PM, Martin Sebor wrote:
> On 11/22/2017 05:03 PM, Jeff Law wrote:
>> On 11/21/2017 12:07 PM, Martin Sebor wrote:
>>> On 11/21/2017 09:55 AM, Jeff Law wrote:
>>>> On 11/19/2017 04:28 PM, Martin Sebor wrote:
>>>>> On 11/18/2017 12:53 AM, Jeff Law wrote:
>>>>>> On 11/17/2017 12:36 PM, Martin Sebor wrote:
>>>>>>> The attached patch enhances -Wstringop-overflow to detect more
>>>>>>> instances of buffer overflow at compile time by handling non-
>>>>>>> constant offsets into the destination object that are known to
>>>>>>> be in some range.  The solution could be improved by handling
>>>>>>> even more cases (e.g., anti-ranges or offsets relative to
>>>>>>> pointers beyond the beginning of an object) but it's a start.
>>>>>>>
>>>>>>> In addition to bootstrapping/regtesting GCC, also tested with
>>>>>>> Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
>>>>>>> regressions.
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>> The top of GDB fails to compile at the moment so the validation
>>>>>>> there was incomplete.
>>>>>>>
>>>>>>> gcc-77608.diff
>>>>>>>
>>>>>>>
>>>>>>> PR middle-end/77608 - missing protection on trivially detectable
>>>>>>> runtime buffer overflow
>>>>>>>
>>>>>>> gcc/ChangeLog:
>>>>>>>
>>>>>>>     PR middle-end/77608
>>>>>>>     * builtins.c (compute_objsize): Handle non-constant offsets.
>>>>>>>
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>
>>>>>>>     PR middle-end/77608
>>>>>>>     * gcc.dg/Wstringop-overflow.c: New test.
>>>>>> The recursive call into compute_objsize passing in the ostype avoids
>>>>>> having to think about the whole object vs nearest containing object
>>>>>> issues.  Right?
>>>>>>
>>>>>> What's left to worry about is maximum or minimum remaining bytes
>>>>>> in the
>>>>>> object.  At least that's my understanding of how ostype works here.
>>>>>>
>>>>>> So we get the amount remaining, ignoring the variable offset, from
>>>>>> the
>>>>>> recursive call (SIZE).  The space left after we account for the
>>>>>> variable
>>>>>> offset is [SIZE - MAX, SIZE - MIN].  So ISTM for type 0/1 you have to
>>>>>> return SIZE-MIN (which you do) and for type 2/3 you have to return
>>>>>> SIZE-MAX which I think you get wrong (and you have to account for the
>>>>>> possibility that MAX or MIN is greater than SIZE and thus there's
>>>>>> nothing left).
>>>>>
>>>>> Subtracting the upper bound of the offset from the size instead
>>>>> of the lower bound when the caller is asking for the minimum
>>>>> object size would make the result essentially meaningless in
>>>>> common cases where the offset is smaller than size_t, as in:
>>>>>
>>>>>   char a[7];
>>>>>
>>>>>   void f (const char *s, unsigned i)
>>>>>   {
>>>>>     __builtin_strcpy (a + i, s);
>>>>>   }
>>>>>
>>>>> Here, i's range is [0, UINT_MAX].
>>>>>
>>>>> IMO, it's only useful to use the lower bound here, otherwise
>>>>> the result would only rarely be non-zero.
>>>> But when we're asking for the minimum left, aren't we essentially
>>>> asking
>>>> for "how much space am I absolutely sure I can write"?  And if that is
>>>> the question, then the only conservatively correct answer is to
>>>> subtract
>>>> the high bound.
>>>
>>> I suppose you could look at it that way but IME with this work
>>> (now, and also last year when I submitted a patch actually
>>> changing the built-in), using the upper bound is just not that
>>> useful because it's too often way too big.  There's no way to
>>> distinguish an out-of-range upper bound that's the result of
>>> an inadequate attempt to constrain a value from an out-of-range
>>> upper bound that is sufficiently constrained but in a way GCC
>>> doesn't see.
>> Understood.
>>
>> So while it's reasonable to not warn in those cases where we just have
>> crap range information (that's always going to be the case for some code
>> regardless of how good my work or Andrew/Aldy's work is), we have to be
>> very careful and make sure that nobody acts on this information for
>> optimization purposes because what we're returning is not conservatively
>> correct.
>>
>>
>>>
>>> There are no clients of this API that would be affected by
>>> the decision one way or the other (unless the user specifies
>>> a -Wstringop-overflow= argument greater than the default 2)
>>> so I don't think what we do now matters much, if at all.
>> Right, but what's to stop someone without knowledge of the
>> implementation and its quirk of not returning the conservatively safe
>> result from using the results in other ways.
> 
> Presumably they would find out by testing their code.  But this
> is a hypothetical scenario.  I added the function for warnings.
> I wasn't expecting it to be used for optimization, no such uses
> have emerged, and I don't have the impression that anyone is
> contemplating adding them (certainly not in stage 3).  If you
> think the function could be useful for optimization then we
> should certainly consider changing it as we gain experience
> with it under those conditions.
Merely passing tests does not mean the code is correct; we both have the
war stories and scars to prove it. :-)  Hell, I prove it to myself
nearly daily :(

Furthermore, just because nobody is using the function today in an
optimization context does not mean it will always be the case.  Worse
yet, someone could potentially use a caller of compute_objsize without
knowing about the limitations.

In fact, if I look at how we handle expand_builtin_mempcpy we have:

 /* Avoid expanding mempcpy into memcpy when the call is determined
     to overflow the buffer.  This also prevents the same overflow
     from being diagnosed again when expanding memcpy.  */
  if (!check_memop_sizes (exp, dest, src, len))
    return NULL_RTX;

While that's not strictly meant to be an optimization, it is in effect
changing the code we generate based on the return value of
check_memop_sizes, which comes from compute_objsize.  Thankfully, the
worst that happens here is we'll fail to turn the mempcpy into a memcpy
if compute_objsize/check_memop_sizes returns a value that is not
strictly correct.  But I think it highlights how easy it is to end up
having code generation change based on the results of compute_objsize.


>> What would the impact be of simply not supporting those queries,
>> essentially returning "I don't know" rather than returning something
>> that isn't conservatively correct?
> 
> Except for false negatives with -Wstringop-overflow=3 and =4
> (i.e., Object Size type 2 and 3) I don't think there would be
> any impact.  As I said, the function isn't used for optimization
> and I don't think the option is used in these modes either, so
> in my mind it matters very little.  I don't even think there
> are any tests for it in these modes, either.
So based on my research around the mempcpy stuff above, my thinking is
refined a bit.

Inherently the result of compute_objsize is inexact when presented with
a non-constant offset (and it may be in other contexts as well).  That
introduces a level of risk in that callers may well end up using the
result of compute_objsize in ways that are unsafe.

The best way I know of to deal with that is to make the result a
multi-state so that we can distinguish between the exact results and
inexact results, i.e., the fact that the function returns an inexact
result gets baked into its API.  It's a lot harder to misuse.
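
Purely as a hypothetical sketch of what baking that into the API
might look like (none of these names exist in the tree):

  /* Hypothetical tri-state result so callers can tell exact sizes
     from best-effort estimates.  */
  enum objsize_kind { OBJSIZE_UNKNOWN, OBJSIZE_EXACT, OBJSIZE_ESTIMATE };

  struct objsize_result
  {
    enum objsize_kind kind;
    tree size;   /* Meaningful only when kind != OBJSIZE_UNKNOWN.  */
  };

  /* Code generation would act only on OBJSIZE_EXACT results; the
     warning code could also use OBJSIZE_ESTIMATE.  */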

I could possibly live with a pair of comments though.  In
compute_objsize we'd want a comment clarifying that the result is
inexact and discouraging use of the result to make code generation
decisions of any kind (i.e., more general than just discouraging use
of the return value for optimization purposes).

In the mempcpy code we'd want a comment indicating why it's safe to use
the result the way we do.  Essentially if compute_objsize returns true,
then we transform the result into a memcpy + return value adjustment,
otherwise we just call mempcpy.  In both cases we copy the same data and
the final return value is the same.

> 
> For now, I've added comments to clarify the return value and
> the purpose of the function.  Let me know if that's good enough
> or if you want me to make any other changes.
So would you rather go with baking the inexact nature of the return
value into the API or the pair of comments noted above?

jeff
Martin Sebor Dec. 1, 2017, 6:06 p.m. UTC | #10
On 12/01/2017 01:26 AM, Jeff Law wrote:
> On 11/30/2017 01:30 PM, Martin Sebor wrote:
>> On 11/22/2017 05:03 PM, Jeff Law wrote:
>>> On 11/21/2017 12:07 PM, Martin Sebor wrote:
>>>> On 11/21/2017 09:55 AM, Jeff Law wrote:
>>>>> On 11/19/2017 04:28 PM, Martin Sebor wrote:
>>>>>> On 11/18/2017 12:53 AM, Jeff Law wrote:
>>>>>>> On 11/17/2017 12:36 PM, Martin Sebor wrote:
>>>>>>>> The attached patch enhances -Wstringop-overflow to detect more
>>>>>>>> instances of buffer overflow at compile time by handling non-
>>>>>>>> constant offsets into the destination object that are known to
>>>>>>>> be in some range.  The solution could be improved by handling
>>>>>>>> even more cases (e.g., anti-ranges or offsets relative to
>>>>>>>> pointers beyond the beginning of an object) but it's a start.
>>>>>>>>
>>>>>>>> In addition to bootstrapping/regtesting GCC, also tested with
>>>>>>>> Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
>>>>>>>> regressions.
>>>>>>>>
>>>>>>>> Martin
>>>>>>>>
>>>>>>>> The top of GDB fails to compile at the moment so the validation
>>>>>>>> there was incomplete.
>>>>>>>>
>>>>>>>> gcc-77608.diff
>>>>>>>>
>>>>>>>>
>>>>>>>> PR middle-end/77608 - missing protection on trivially detectable
>>>>>>>> runtime buffer overflow
>>>>>>>>
>>>>>>>> gcc/ChangeLog:
>>>>>>>>
>>>>>>>>     PR middle-end/77608
>>>>>>>>     * builtins.c (compute_objsize): Handle non-constant offsets.
>>>>>>>>
>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>>
>>>>>>>>     PR middle-end/77608
>>>>>>>>     * gcc.dg/Wstringop-overflow.c: New test.
>>>>>>> The recursive call into compute_objsize passing in the ostype avoids
>>>>>>> having to think about the whole object vs nearest containing object
>>>>>>> issues.  Right?
>>>>>>>
>>>>>>> What's left to worry about is maximum or minimum remaining bytes
>>>>>>> in the
>>>>>>> object.  At least that's my understanding of how ostype works here.
>>>>>>>
>>>>>>> So we get the amount remaining, ignoring the variable offset, from
>>>>>>> the
>>>>>>> recursive call (SIZE).  The space left after we account for the
>>>>>>> variable
>>>>>>> offset is [SIZE - MAX, SIZE - MIN].  So ISTM for type 0/1 you have to
>>>>>>> return SIZE-MIN (which you do) and for type 2/3 you have to return
>>>>>>> SIZE-MAX which I think you get wrong (and you have to account for the
>>>>>>> possibility that MAX or MIN is greater than SIZE and thus there's
>>>>>>> nothing left).
>>>>>>
>>>>>> Subtracting the upper bound of the offset from the size instead
>>>>>> of the lower bound when the caller is asking for the minimum
>>>>>> object size would make the result essentially meaningless in
>>>>>> common cases where the offset is smaller than size_t, as in:
>>>>>>
>>>>>>   char a[7];
>>>>>>
>>>>>>   void f (const char *s, unsigned i)
>>>>>>   {
>>>>>>     __builtin_strcpy (a + i, s);
>>>>>>   }
>>>>>>
>>>>>> Here, i's range is [0, UINT_MAX].
>>>>>>
>>>>>> IMO, it's only useful to use the lower bound here, otherwise
>>>>>> the result would only rarely be non-zero.
>>>>> But when we're asking for the minimum left, aren't we essentially
>>>>> asking
>>>>> for "how much space am I absolutely sure I can write"?  And if that is
>>>>> the question, then the only conservatively correct answer is to
>>>>> subtract
>>>>> the high bound.
>>>>
>>>> I suppose you could look at it that way but IME with this work
>>>> (now, and also last year when I submitted a patch actually
>>>> changing the built-in), using the upper bound is just not that
>>>> useful because it's too often way too big.  There's no way to
>>>> distinguish an out-of-range upper bound that's the result of
>>>> an inadequate attempt to constrain a value from an out-of-range
>>>> upper bound that is sufficiently constrained but in a way GCC
>>>> doesn't see.
>>> Understood.
>>>
>>> So while it's reasonable to not warn in those cases where we just have
>>> crap range information (that's always going to be the case for some code
>>> regardless of how good my work or Andrew/Aldy's work is), we have to be
>>> very careful and make sure that nobody acts on this information for
>>> optimization purposes because what we're returning is not conservatively
>>> correct.
>>>
>>>
>>>>
>>>> There are no clients of this API that would be affected by
>>>> the decision one way or the other (unless the user specifies
>>>> a -Wstringop-overflow= argument greater than the default 2)
>>>> so I don't think what we do now matters much, if at all.
>>> Right, but what's to stop someone without knowledge of the
>>> implementation and its quirk of not returning the conservatively safe
>>> result from using the results in other ways.
>>
>> Presumably they would find out by testing their code.  But this
>> is a hypothetical scenario.  I added the function for warnings.
>> I wasn't expecting it to be used for optimization, no such uses
>> have emerged, and I don't have the impression that anyone is
>> contemplating adding them (certainly not in stage 3).  If you
>> think the function could be useful for optimization then we
>> should certainly consider changing it as we gain experience
>> with it under those conditions.
> Merely passing tests does not mean the code is correct; we both have the
> war stories and scars to prove it. :-)  Hell, I prove it to myself
> nearly daily :(
>
> Furthermore, just because nobody is using the function today in an
> optimization context does not mean it will always be the case.  Worse
> yet, someone could potentially use a caller of compute_objsize without
> knowing about the limitations.
>
> In fact, if I look at how we handle expand_builtin_mempcpy we have:
>
>  /* Avoid expanding mempcpy into memcpy when the call is determined
>      to overflow the buffer.  This also prevents the same overflow
>      from being diagnosed again when expanding memcpy.  */
>   if (!check_memop_sizes (exp, dest, src, len))
>     return NULL_RTX;
>
> While that's not strictly meant to be an optimization, it is in effect
> changing the code we generate based on the return value of
> check_memop_sizes, which comes from compute_objsize.  Thankfully, the
> worst that happens here is we'll fail to turn the mempcpy into a memcpy
> if compute_objsize/check_memop_sizes returns a value that is not
> strictly correct.  But I think it highlights how easy it is to end up
> having code generation change based on the results of compute_objsize.

I think you have misread the code (which is easy to do), so I'm
not sure it does highlight it.

The mempcpy code unconditionally uses Object Size type 0 (see
check_memop_sizes) so it cannot be used the way you are concerned
about.  All raw memory built-ins that write into memory do.

The string functions do use other Object Size modes, depending
on the -Wstringop-overflow=N argument (by default mode 1), but
they don't use it to disable the expansion.

Even if any of the existing functions did use the function to
disable the expansion, because compute_objsize uses the lower
bound of the offset, the result is the upper bound of the size,
and so a call is only considered invalid if it definitely is
out of bounds.  In this scenario, using the upper bound of the
offset would have the effect of disabling the expansion even in
cases that are not provably in bounds (though only in Object
Size modes 2 and 3, which I suspect are never used).  I.e., it
would have an adverse impact not only in terms of false
positives but also on the expansion.

If we were talking about changing __builtin_object_size (we're
not), using the upper bound would cause string function calls
with an in-bounds lower bound and an out-of-bounds upper bound
to abort with _FORTIFY_SOURCE.  Conversely, giving up by
returning a "don't know" result in this case would fail to
prevent provable buffer overflows.
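
To make that trade-off concrete under such a hypothetical change
(this fragment is illustrative only):

   char a[7];

   void h (const char *s, unsigned i)
   {
     if (9 < i)
       i = 0;
     /* i is in [0, 9]: the lower bound is in bounds, the upper is
        not.  Using the upper bound, the reported remaining size
        would be zero and a fortified strcpy would abort at run time
        even when i is small; a "don't know" result means no check
        at all, so the overflow when i is 9 goes undetected.  */
     __builtin_strcpy (a + i, s);
   }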

I believe that where ranges are involved, it only makes sense
to use the conservative bound (which may be the lower or the
upper bound, depending on the circumstances).  But it just
isn't viable to use the other one because it leads to wrong
(suboptimal) results.

>>> What would the impact be of simply not supporting those queries,
>>> essentially returning "I don't know" rather than returning something
>>> that isn't conservatively correct?
>>
>> Except for false negatives with -Wstringop-overflow=3 and =4
>> (i.e., Object Size type 2 and 3) I don't think there would be
>> any impact.  As I said, the function isn't used for optimization
>> and I don't think the option is used in these modes either, so
>> in my mind it matters very little.  I don't even think there
>> are any tests for it in these modes, either.
> So based on my research around the mempcpy stuff above, my thinking is
> refined a bit.
>
> Inherently the result of compute_objsize is inexact when presented with
> a non-constant offset (and it may be in other contexts as well).  That
> introduces a level of risk in that callers may well end up using the
> result of compute_objsize in ways that are unsafe.
>
> The best way I know of to deal with that is to make the result a
> multi-state so that we can distinguish between the exact results and
> inexact results.  I.e., the fact that the function returns an inexact
> result gets baked into its API.  It's a lot harder to misuse.

The result already is multi-state: NULL means "don't know," which
is interpreted as valid.
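
I.e., callers already follow roughly this pattern (a simplified
sketch, not the exact code in builtins.c):

   if (tree destsize = compute_objsize (dest, ostype))
     {
       /* compare the size of the write against DESTSIZE and
          issue -Wstringop-overflow if it doesn't fit */
     }
   /* otherwise the size is unknown and the call is assumed
      to be valid -- no warning */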

> I could possibly live with a pair of comments though.  In
> compute_objsize we'd want a comment clarifying that the result is
> inexact and discourage using the result to make code generation
> decisions of any kind (i.e., more general than just discouraging using the
> return value for optimization purposes).
>
> In the mempcpy code we'd want a comment indicating why it's safe to use
> the result the way we do.  Essentially if compute_objsize returns true,
> then we transform the result into a memcpy + return value adjustment,
> otherwise we just call mempcpy.  In both cases we copy the same data and
> the final return value is the same.
>
>>
>> For now, I've added comments to clarify the return value and
>> the purpose of the function.  Let me know if that's good enough
>> or if you want me to make any other changes.
> So would you rather go with baking the inexact nature of the return
> value into the API or the pair of comments noted above?

I added the comment to compute_objsize in the last iteration
of the patch.

As I explained above, expand_builtin_mempcpy isn't affected by
this patch.  The comments above and within the function
already explain what happens when a buffer overflow is
detected and why, so I'm not sure what else to say there.  If
I misunderstood your suggestion please clarify.
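
For what it's worth, the identity the mempcpy expansion relies
on is simply

   mempcpy (d, s, n)

being equivalent to

   (char *) memcpy (d, s, n) + n

i.e., the same n bytes are copied and the same value is returned,
so the emitted code behaves the same whether or not the check
allows the transformation.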

The other change you are suggesting means restoring the false
negatives in Object Size modes 2 and 3 where my patch detected
true positives.  Here's an example to demonstrate the effect.
With my original patch, the overflow in both functions below
is diagnosed with all -Wstringop-overflow=N arguments.  With
the change you asked for, only the memcpy overflow is diagnosed
with all arguments.  The strncpy overflow is only diagnosed
with N=1 and 2, and suppressed with N=3 and N=4.  That's clearly
a worse outcome, but in the interest of moving forward I've made
the change.  I think the overall improvement is worthwhile despite
this flaw.

$ cat a.c && gcc -O2 -S -Wstringop-overflow=3 a.c
   char d[7];

   void f (const void *s, unsigned i)
   {
     if (i < 3 || 9 < i)
       i = 3;

     __builtin_memcpy (d + i, s, 5);   // diagnosed
   }

   void g (const char *s, unsigned i)
   {
     if (i < 3 || 9 < i)
       i = 3;

     __builtin_strncpy (d + i, s, 5);   // false negative
   }

a.c: In function ‘f’:
a.c:8:3: warning: ‘__builtin_memcpy’ writing 5 bytes into a region of 
size 4 overflows the destination [-Wstringop-overflow=]
    __builtin_memcpy (d + i, s, 5);   // diagnosed
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Martin
PR middle-end/77608 - missing protection on trivially detectable runtime buffer overflow

gcc/ChangeLog:

	PR middle-end/77608
	* builtins.c (compute_objsize): Handle non-constant offsets.

gcc/testsuite/ChangeLog:

	PR middle-end/77608
	* gcc.dg/Wstringop-overflow.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 650de0d..0565eaf 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3262,8 +3262,12 @@ check_sizes (int opt, tree exp, tree size, tree maxlen, tree src, tree objsize)
 /* Helper to compute the size of the object referenced by the DEST
    expression which must have pointer type, using Object Size type
    OSTYPE (only the least significant 2 bits are used).  Return
-   the size of the object if successful or NULL when the size cannot
-   be determined.  */
+   an estimate of the size of the object if successful or NULL when
+   the size cannot be determined.  When the referenced object involves
+   a non-constant offset in some range the returned value represents
+   the largest size given the smallest non-negative offset in the
+   range.  The function is intended for diagnostics and is not
+   suitable for optimization.  */
 
 tree
 compute_objsize (tree dest, int ostype)
@@ -3276,24 +3280,68 @@ compute_objsize (tree dest, int ostype)
   if (compute_builtin_object_size (dest, ostype, &size))
     return build_int_cst (sizetype, size);
 
-  /* Unless computing the largest size (for memcpy and other raw memory
-     functions), try to determine the size of the object from its type.  */
-  if (!ostype)
-    return NULL_TREE;
-
   if (TREE_CODE (dest) == SSA_NAME)
     {
       gimple *stmt = SSA_NAME_DEF_STMT (dest);
       if (!is_gimple_assign (stmt))
 	return NULL_TREE;
 
+      dest = gimple_assign_rhs1 (stmt);
+
       tree_code code = gimple_assign_rhs_code (stmt);
-      if (code != ADDR_EXPR && code != POINTER_PLUS_EXPR)
-	return NULL_TREE;
+      if (code == POINTER_PLUS_EXPR)
+	{
+	  /* compute_builtin_object_size fails for addresses with
+	     non-constant offsets.  Try to determine the range of
+	     such an offset here and use it to adjust the constant
+	     size.  */
+	  tree off = gimple_assign_rhs2 (stmt);
+	  if (TREE_CODE (off) == SSA_NAME
+	      && INTEGRAL_TYPE_P (TREE_TYPE (off)))
+	    {
+	      wide_int min, max;
+	      enum value_range_type rng = get_range_info (off, &min, &max);
 
-      dest = gimple_assign_rhs1 (stmt);
+	      if (rng == VR_RANGE)
+		{
+		  if (tree size = compute_objsize (dest, ostype))
+		    {
+		      wide_int wisiz = wi::to_wide (size);
+
+		      /* Ignore negative offsets for now.  */
+		      if (wi::sign_mask (min))
+			;
+		      else if (ostype > 1)
+			{	
+			  /* In Object Size modes 2 and 3, return zero
+			     if the remaining size is definitely zero,
+			     otherwise return a "don't know" result.  */
+			  if (wi::leu_p (wisiz, min))
+			    return size_zero_node;
+
+			  return NULL_TREE;
+			}
+		      else if (wi::ltu_p (min, wisiz))	
+			/* In Object Size modes 0 and 1, use the lower
+			   bound of the offset to compute the most
+			   optimistic estimate of the (remaining) size.  */
+			return wide_int_to_tree (TREE_TYPE (size),
+						 wi::sub (wisiz, min));
+		      else
+			return size_zero_node;
+		    }
+		}
+	    }
+	}
+      else if (code != ADDR_EXPR)
+	return NULL_TREE;
     }
 
+  /* Unless computing the largest size (for memcpy and other raw memory
+     functions), try to determine the size of the object from its type.  */
+  if (!ostype)
+    return NULL_TREE;
+
   if (TREE_CODE (dest) != ADDR_EXPR)
     return NULL_TREE;
 
diff --git a/gcc/testsuite/gcc.dg/Wstringop-overflow.c b/gcc/testsuite/gcc.dg/Wstringop-overflow.c
new file mode 100644
index 0000000..b5bd40e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wstringop-overflow.c
@@ -0,0 +1,132 @@
+/* PR middle-end/77608 - missing protection on trivially detectable runtime
+   buffer overflow
+   { dg-do compile }
+   { dg-options "-O2 -Wstringop-overflow -ftrack-macro-expansion=0" }  */
+
+#define SIZE_MAX   __SIZE_MAX__
+#define DIFF_MAX   __PTRDIFF_MAX__
+#define DIFF_MIN   (-DIFF_MAX - 1)
+
+typedef __SIZE_TYPE__ size_t;
+
+extern void* memcpy (void*, const void*, size_t);
+extern char* strcpy (char*, const char*);
+extern char* strncpy (char*, const char*, size_t);
+
+void sink (void*);
+
+extern size_t unsigned_value (void)
+{
+  extern volatile size_t unsigned_value_source;
+  return unsigned_value_source;
+}
+
+size_t unsigned_range (size_t min, size_t max)
+{
+  size_t val = unsigned_value ();
+  return val < min || max < val ? min : val;
+}
+
+#define UR(min, max) unsigned_range (min, max)
+
+
+char a7[7];
+
+struct MemArray { char a9[9]; char a1[1]; };
+
+void test_memcpy_array (const void *s)
+{
+#define T(d, s, n) (memcpy ((d), (s), (n)), sink (d))
+
+  T (a7 + UR (0, 1), s, 7);
+  T (a7 + UR (0, 7), s, 7);
+  T (a7 + UR (0, 8), s, 7);
+  T (a7 + UR (0, DIFF_MAX), s, 7);
+  T (a7 + UR (0, SIZE_MAX), s, 7);
+
+  T (a7 + UR (1, 2), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 6" } */
+  T (a7 + UR (2, 3), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 5" } */
+  T (a7 + UR (6, 9), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 1" } */
+  T (a7 + UR (7, 9), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (8, 9), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+  T (a7 + UR (9, 10), s, 7);  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (DIFF_MAX, DIFF_MAX + (size_t)1), s, 7);  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (DIFF_MAX, SIZE_MAX), s, 7);  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+  /* This is valid.  */
+  char *d = a7 + 7;
+  T (d + UR (-8, -7), s, 7);
+}
+
+/* Verify the absence of warnings for memcpy writing beyond object
+   boundaries. */
+
+void test_memcpy_memarray (struct MemArray *p, const void *s)
+{
+#undef T
+#define T(d, s, n) (memcpy ((d), (s), (n)), sink (d))
+
+  /* The following are valid.  */
+  T (p->a9 + UR (0, 1), s, 9);
+  T (p->a9 + UR (0, 7), s, 9);
+  T (p->a9 + UR (0, 8), s, 9);
+  T (p->a9 + UR (0, DIFF_MAX), s, 9);
+  T (p->a9 + UR (0, SIZE_MAX), s, 9);
+
+  /* The following are invalid.  Unfortunately, there is apparently enough
+     code out there that abuses memcpy to write past the end of one member
+     and into the members that follow so the following are not diagnosed
+     by design.  It sure would be nice not to have to cater to hacks like
+     these...  */
+  T (p->a9 + UR (1, 2), s, 9);
+  T (p->a9 + UR (1, 2), s, 123);
+}
+
+
+void test_strcpy_array (void)
+{
+#undef T
+#define T(d, s) (strcpy ((d), (s)), sink (d))
+
+  T (a7 + UR (0, 1), "012345");
+  T (a7 + UR (0, 7), "012345");
+  T (a7 + UR (0, 8), "012345");
+  T (a7 + UR (0, DIFF_MAX), "012345");
+  T (a7 + UR (0, SIZE_MAX), "012345");
+
+  T (a7 + UR (1, 2), "012345");   /* { dg-warning "writing 7 bytes into a region of size 6" } */
+  T (a7 + UR (2, 3), "012345");   /* { dg-warning "writing 7 bytes into a region of size 5" } */
+  T (a7 + UR (6, 9), "012345");   /* { dg-warning "writing 7 bytes into a region of size 1" } */
+  T (a7 + UR (7, 9), "012345");   /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (8, 9), "012345");   /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+  T (a7 + UR (9, 10), "012345");  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (DIFF_MAX, DIFF_MAX + (size_t)1), "012345");  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (DIFF_MAX, SIZE_MAX), "012345");  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+  char *d = a7 + 7;
+
+  T (d + UR (-8, -7), "012345");
+}
+
+void test_strncpy_memarray (struct MemArray *p, const void *s)
+{
+#undef T
+#define T(d, s, n) (strncpy ((d), (s), (n)), sink (d))
+
+  T (p->a9 + UR (0, 1), s, 9);
+  T (p->a9 + UR (0, 7), s, 9);
+  T (p->a9 + UR (0, 8), s, 9);
+  T (p->a9 + UR (0, DIFF_MAX), s, 9);
+  T (p->a9 + UR (0, SIZE_MAX), s, 9);
+
+  T (p->a9 + UR (1, 2), s, 9);    /* { dg-warning "writing 9 bytes into a region of size 8" } */
+  T (p->a9 + UR (2, 3), s, 9);    /* { dg-warning "writing 9 bytes into a region of size 7" } */
+  T (p->a9 + UR (6, 9), s, 9);    /* { dg-warning "writing 9 bytes into a region of size 3" } */
+  T (p->a9 + UR (9, 10), s, 9);   /* { dg-warning "writing 9 bytes into a region of size 0" } */
+  T (p->a9 + UR (10, 11), s, 9);  /* { dg-warning "writing 9 bytes into a region of size 0" } */
+
+  T (p->a9 + UR (DIFF_MAX, DIFF_MAX + (size_t)1), s, 1);  /* { dg-warning "writing 1 byte into a region of size 0" } */
+  T (p->a9 + UR (DIFF_MAX, SIZE_MAX), s, 3);  /* { dg-warning "writing 3 bytes into a region of size 0" } */
+}
Jeff Law Dec. 7, 2017, 9:06 p.m. UTC | #11
On 12/01/2017 11:06 AM, Martin Sebor wrote:
> On 12/01/2017 01:26 AM, Jeff Law wrote:
>> On 11/30/2017 01:30 PM, Martin Sebor wrote:
>>> On 11/22/2017 05:03 PM, Jeff Law wrote:
>>>> On 11/21/2017 12:07 PM, Martin Sebor wrote:
>>>>> On 11/21/2017 09:55 AM, Jeff Law wrote:
>>>>>> On 11/19/2017 04:28 PM, Martin Sebor wrote:
>>>>>>> On 11/18/2017 12:53 AM, Jeff Law wrote:
>>>>>>>> On 11/17/2017 12:36 PM, Martin Sebor wrote:
[ Lots of snipping ]
>>
>> In fact, if I look at how we handle expand_builtin_mempcpy we have:
>>
>>  /* Avoid expanding mempcpy into memcpy when the call is determined
>>      to overflow the buffer.  This also prevents the same overflow
>>      from being diagnosed again when expanding memcpy.  */
>>   if (!check_memop_sizes (exp, dest, src, len))
>>     return NULL_RTX;
>>
>> While that's not strictly meant to be an optimization, it is in effect
>> changing the code we generate based on the return value of
>> check_memop_sizes, which comes from compute_objsize.  Thankfully, the
>> worst that happens here is we'll fail to turn the mempcpy into a memcpy
>> if compute_objsize/check_memop_sizes returns a value that is not
>> strictly correct.  But I think it highlights how easy it is to end up
>> having code generation changing based on the results of compute_objsize.
> 
> I think you have misread the code (which is easy to do), so I'm
> not sure it does highlight it.
Just to be clear, I'm convinced the code should DTRT as-written and that
its behavior is not changed by your patch.

My point is that these routines should not generally be used to
influence code generation/optimization decisions.  In an ideal world we
could express and enforce that rule.  But we don't live in that ideal
world and as a result it's relatively easy to use those routines to
influence code generation decisions without even knowing it.


So again, anytime we have something like what we see above where we call
check_memop_sizes/compute_objsize and the result is used to select
between paths that change code generation/optimization we need a
comment.  We also need a comment in compute_objsize to discourage its
use in any contexts where the result affects code generation/optimization.

Again, the code is safe as-is and not affected by your patch.  I'm just
asking for a comment for those functions and at any site where those
functions are used to influence code generation/optimization decisions.


[ More snipping..  Hopefully not losing too much context.. ]
>> SO would you rather go with baking the inexact nature of the return
>> value into the API or the pair of comments noted above?
> 
> I added the comment to compute_objsize in the last iteration
> of the patch.
> 
> As I explained above, expand_builtin_mempcpy isn't affected by
> this patch.  The comments above and within the function
> already explain what happens when a buffer overflow is
> detected and why, so I'm not sure what else to say there.  If
> I misunderstood your suggestion please clarify.
The comment I'm looking for in expand_builtin_mempcpy would be something
like this:

/* Policy does not generally allow using compute_objsize (which
   is used internally by check_memop_size) to change code generation
   or drive optimization decisions.

   In this instance it is safe because the code we generate has
   the same semantics regardless of the return value of
   check_memop_sizes.   Exactly the same amount of data is copied
   and the return value is exactly the same in both cases.

   Furthermore, check_memop_sizes always uses mode 0 for the call to
   compute_objsize, so the imprecise nature of compute_objsize is
   avoided.  */



> 
> The other change you are suggesting means restoring the false
> negatives in Object Size modes 2 and 3 where my patch detected
> true positives.  Here's an example to demonstrate the effect.
> With my original patch, the overflow in both functions below
> is diagnosed with all -Wstringop-overflow=N arguments.  With
> the change you asked for, only the memcpy overflow is diagnosed
> with all arguments.  The strncpy overflow is only diagnosed
> with N=1 and 2, and suppressed with N=3 and N=4.  That's clearly
> a worse outcome, but in the interest of moving forward I've made
> the change.  I think the overall improvement is worthwhile despite
> this flaw.
So given the comments about restrictions in how compute_objsize is used,
I don't think we need to make the change I originally asked for.  Go
with whatever you think is best WRT handling of that bound computation.

Either computation of that bound has a degree of imprecision due to
range involvement and makes the result unsuitable for influencing code
generation or optimization.
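
(To put numbers on it: for the 7-byte array in your example with
an offset in [3, 9], anywhere from 4 down to 0 bytes remain
depending on the run-time offset, so picking either endpoint of
that range is a guess as far as code generation is concerned.)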

In the compute_objsize comment I'd use

"The function is intended for diagnostics and should not be used to
influence code generation or optimization."

I think that's slightly better.


OK with either computation and the updated comment for compute_objsize.
Please add a comment to the mempcpy bits as well.

Jeff

Patch

PR middle-end/77608 - missing protection on trivially detectable runtime buffer overflow

gcc/ChangeLog:

	PR middle-end/77608
	* builtins.c (compute_objsize): Handle non-constant offsets.

gcc/testsuite/ChangeLog:

	PR middle-end/77608
	* gcc.dg/Wstringop-overflow.c: New test.

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	(revision 254879)
+++ gcc/builtins.c	(working copy)
@@ -3276,11 +3276,6 @@  compute_objsize (tree dest, int ostype)
   if (compute_builtin_object_size (dest, ostype, &size))
     return build_int_cst (sizetype, size);
 
-  /* Unless computing the largest size (for memcpy and other raw memory
-     functions), try to determine the size of the object from its type.  */
-  if (!ostype)
-    return NULL_TREE;
-
   if (TREE_CODE (dest) == SSA_NAME)
     {
       gimple *stmt = SSA_NAME_DEF_STMT (dest);
@@ -3287,13 +3282,47 @@  compute_objsize (tree dest, int ostype)
       if (!is_gimple_assign (stmt))
 	return NULL_TREE;
 
+      dest = gimple_assign_rhs1 (stmt);
+
       tree_code code = gimple_assign_rhs_code (stmt);
-      if (code != ADDR_EXPR && code != POINTER_PLUS_EXPR)
+      if (code == POINTER_PLUS_EXPR)
+	{
+	  /* compute_builtin_object_size fails for addresses with
+	     non-constant offsets.  Try to determine the range of
+	     such an offset here and use it to adjust the constant
+	     size.  */
+	  tree off = gimple_assign_rhs2 (stmt);
+	  if (TREE_CODE (off) == SSA_NAME
+	      && INTEGRAL_TYPE_P (TREE_TYPE (off)))
+	    {
+	      wide_int min, max;
+	      enum value_range_type rng = get_range_info (off, &min, &max);
+
+	      if (rng == VR_RANGE)
+		{
+		  if (tree size = compute_objsize (dest, ostype))
+		    {
+		      wide_int wisiz = wi::to_wide (size);
+		      if (wi::sign_mask (min))
+			/* ignore negative offsets for now */;
+		      else if (wi::ltu_p (min, wisiz))
+			return wide_int_to_tree (TREE_TYPE (size),
+						 wi::sub (wisiz, min));
+		      else
+			return size_zero_node;
+		    }
+		}
+	    }
+	}
+      else if (code != ADDR_EXPR)
 	return NULL_TREE;
-
-      dest = gimple_assign_rhs1 (stmt);
     }
 
+  /* Unless computing the largest size (for memcpy and other raw memory
+     functions), try to determine the size of the object from its type.  */
+  if (!ostype)
+    return NULL_TREE;
+
   if (TREE_CODE (dest) != ADDR_EXPR)
     return NULL_TREE;
 
Index: gcc/testsuite/gcc.dg/Wstringop-overflow.c
===================================================================
--- gcc/testsuite/gcc.dg/Wstringop-overflow.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/Wstringop-overflow.c	(working copy)
@@ -0,0 +1,132 @@ 
+/* PR middle-end/77608 - missing protection on trivially detectable runtime
+   buffer overflow
+   { dg-do compile }
+   { dg-options "-O2 -Wstringop-overflow -ftrack-macro-expansion=0" }  */
+
+#define SIZE_MAX   __SIZE_MAX__
+#define DIFF_MAX   __PTRDIFF_MAX__
+#define DIFF_MIN   (-DIFF_MAX - 1)
+
+typedef __SIZE_TYPE__ size_t;
+
+extern void* memcpy (void*, const void*, size_t);
+extern char* strcpy (char*, const char*);
+extern char* strncpy (char*, const char*, size_t);
+
+void sink (void*);
+
+extern size_t unsigned_value (void)
+{
+  extern volatile size_t unsigned_value_source;
+  return unsigned_value_source;
+}
+
+size_t unsigned_range (size_t min, size_t max)
+{
+  size_t val = unsigned_value ();
+  return val < min || max < val ? min : val;
+}
+
+#define UR(min, max) unsigned_range (min, max)
+
+
+char a7[7];
+
+struct MemArray { char a9[9]; char a1[1]; };
+
+void test_memcpy_array (const void *s)
+{
+#define T(d, s, n) (memcpy ((d), (s), (n)), sink (d))
+
+  T (a7 + UR (0, 1), s, 7);
+  T (a7 + UR (0, 7), s, 7);
+  T (a7 + UR (0, 8), s, 7);
+  T (a7 + UR (0, DIFF_MAX), s, 7);
+  T (a7 + UR (0, SIZE_MAX), s, 7);
+
+  T (a7 + UR (1, 2), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 6" } */
+  T (a7 + UR (2, 3), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 5" } */
+  T (a7 + UR (6, 9), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 1" } */
+  T (a7 + UR (7, 9), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (8, 9), s, 7);   /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+  T (a7 + UR (9, 10), s, 7);  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (DIFF_MAX, DIFF_MAX + (size_t)1), s, 7);  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (DIFF_MAX, SIZE_MAX), s, 7);  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+  /* This is valid.  */
+  char *d = a7 + 7;
+  T (d + UR (-8, -7), s, 7);
+}
+
+/* Verify the absence of warnings for memcpy writing beyond object
+   boundaries. */
+
+void test_memcpy_memarray (struct MemArray *p, const void *s)
+{
+#undef T
+#define T(d, s, n) (memcpy ((d), (s), (n)), sink (d))
+
+  /* The following are valid.  */
+  T (p->a9 + UR (0, 1), s, 9);
+  T (p->a9 + UR (0, 7), s, 9);
+  T (p->a9 + UR (0, 8), s, 9);
+  T (p->a9 + UR (0, DIFF_MAX), s, 9);
+  T (p->a9 + UR (0, SIZE_MAX), s, 9);
+
+  /* The following are invalid.  Unfortunately, there is apparently enough
+     code out there that abuses memcpy to write past the end of one member
+     and into the members that follow so the following are not diagnosed
+     by design.  It sure would be nice not to have to cater to hacks like
+     these...  */
+  T (p->a9 + UR (1, 2), s, 9);
+  T (p->a9 + UR (1, 2), s, 123);
+}
+
+
+void test_strcpy_array (void)
+{
+#undef T
+#define T(d, s) (strcpy ((d), (s)), sink (d))
+
+  T (a7 + UR (0, 1), "012345");
+  T (a7 + UR (0, 7), "012345");
+  T (a7 + UR (0, 8), "012345");
+  T (a7 + UR (0, DIFF_MAX), "012345");
+  T (a7 + UR (0, SIZE_MAX), "012345");
+
+  T (a7 + UR (1, 2), "012345");   /* { dg-warning "writing 7 bytes into a region of size 6" } */
+  T (a7 + UR (2, 3), "012345");   /* { dg-warning "writing 7 bytes into a region of size 5" } */
+  T (a7 + UR (6, 9), "012345");   /* { dg-warning "writing 7 bytes into a region of size 1" } */
+  T (a7 + UR (7, 9), "012345");   /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (8, 9), "012345");   /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+  T (a7 + UR (9, 10), "012345");  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (DIFF_MAX, DIFF_MAX + (size_t)1), "012345");  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+  T (a7 + UR (DIFF_MAX, SIZE_MAX), "012345");  /* { dg-warning "writing 7 bytes into a region of size 0" } */
+
+  char *d = a7 + 7;
+
+  T (d + UR (-8, -7), "012345");
+}
+
+void test_strncpy_memarray (struct MemArray *p, const void *s)
+{
+#undef T
+#define T(d, s, n) (strncpy ((d), (s), (n)), sink (d))
+
+  T (p->a9 + UR (0, 1), s, 9);
+  T (p->a9 + UR (0, 7), s, 9);
+  T (p->a9 + UR (0, 8), s, 9);
+  T (p->a9 + UR (0, DIFF_MAX), s, 9);
+  T (p->a9 + UR (0, SIZE_MAX), s, 9);
+
+  T (p->a9 + UR (1, 2), s, 9);    /* { dg-warning "writing 9 bytes into a region of size 8" } */
+  T (p->a9 + UR (2, 3), s, 9);    /* { dg-warning "writing 9 bytes into a region of size 7" } */
+  T (p->a9 + UR (6, 9), s, 9);    /* { dg-warning "writing 9 bytes into a region of size 3" } */
+  T (p->a9 + UR (9, 10), s, 9);   /* { dg-warning "writing 9 bytes into a region of size 0" } */
+  T (p->a9 + UR (10, 11), s, 9);  /* { dg-warning "writing 9 bytes into a region of size 0" } */
+
+  T (p->a9 + UR (DIFF_MAX, DIFF_MAX + (size_t)1), s, 1);  /* { dg-warning "writing 1 byte into a region of size 0" } */
+  T (p->a9 + UR (DIFF_MAX, SIZE_MAX), s, 3);  /* { dg-warning "writing 3 bytes into a region of size 0" } */
+}