diff mbox series

consider MIN_EXPR in get_size_range() (PR 85888)

Message ID a7d66288-6108-d8da-4929-2ce58419db72@gmail.com
State New
Headers show
Series consider MIN_EXPR in get_size_range() (PR 85888) | expand

Commit Message

Martin Sebor May 23, 2018, 10:50 p.m. UTC
The attached patch enhances the get_size_range() function to
extract more accurate ranges out of MIN_EXPR and MAX_EXPR nodes
with SSA_NAME operand(s).  This helps -Wstringop-overflow avoid
false positives on some targets in cases like some of those
reported in bug 85623 - strncmp() warns about attribute nonstring
incorrectly in -Wstringop-overflow

When first trying to expand calls to __builtin_strncmp, most back
ends succeed even when the expansion results in a call to the library
function.  The powerpc64 back-end, however, fails and and allows
the generic expansion to take place.  There is a subtle difference
between the two cases that shows up when examining the bound of
the strncmp call expression (the third argument): in the original
call expression (in expand_builtin_strncmp) the bound is or can be
an SSA_NAME.  But when the early expansion fails,
expand_builtin_strncmp transforms the original call expression into
a new one, substituting MIN_EXPR (bound, CST) for the bound where
CST is the constant result of strlen (STR), with STR being either
the first or second strncmp argument.  When the bound is an SSA_NAME
the subsequent attempt to determine its range from the MIN_EXPR fails.

Martin

Comments

Richard Biener May 24, 2018, 9:39 a.m. UTC | #1
On Thu, May 24, 2018 at 12:50 AM Martin Sebor <msebor@gmail.com> wrote:

> The attached patch enhances the get_size_range() function to
> extract more accurate ranges out of MIN_EXPR and MAX_EXPR nodes
> with SSA_NAME operand(s).  This helps -Wstringop-overflow avoid
> false positives on some targets in cases like some of those
> reported in bug 85623 - strncmp() warns about attribute nonstring
> incorrectly in -Wstringop-overflow

> When first trying to expand calls to __builtin_strncmp, most back
> ends succeed even when the expansion results in a call to the library
> function.  The powerpc64 back-end, however, fails and and allows
> the generic expansion to take place.  There is a subtle difference
> between the two cases that shows up when examining the bound of
> the strncmp call expression (the third argument): in the original
> call expression (in expand_builtin_strncmp) the bound is or can be
> an SSA_NAME.  But when the early expansion fails,
> expand_builtin_strncmp transforms the original call expression into
> a new one, substituting MIN_EXPR (bound, CST) for the bound where
> CST is the constant result of strlen (STR), with STR being either
> the first or second strncmp argument.  When the bound is an SSA_NAME
> the subsequent attempt to determine its range from the MIN_EXPR fails.

Hmm, wouldn't sth extracted from the attached random patch I have lying
around be more useful in general?  That is, refactor the GENERIC
expression walk in determine_value_range_1 to sth that can be used
in the current get_range_info () interface?  Or if we don't want to change
that to accept non-SSA names add a corresponding get_expr_range_info ()
interface.

If you do that please cut out the dominator/loop condition handling and
refrain from the idea of looking at SSA def stmts.

I always wanted an excuse to actually push parts of the idea in this patch
to trunk.

Thanks,
Richard.

> Martin
Martin Sebor May 24, 2018, 5:02 p.m. UTC | #2
On 05/24/2018 03:39 AM, Richard Biener wrote:
> On Thu, May 24, 2018 at 12:50 AM Martin Sebor <msebor@gmail.com> wrote:
>
>> The attached patch enhances the get_size_range() function to
>> extract more accurate ranges out of MIN_EXPR and MAX_EXPR nodes
>> with SSA_NAME operand(s).  This helps -Wstringop-overflow avoid
>> false positives on some targets in cases like some of those
>> reported in bug 85623 - strncmp() warns about attribute nonstring
>> incorrectly in -Wstringop-overflow
>
>> When first trying to expand calls to __builtin_strncmp, most back
>> ends succeed even when the expansion results in a call to the library
>> function.  The powerpc64 back-end, however, fails and and allows
>> the generic expansion to take place.  There is a subtle difference
>> between the two cases that shows up when examining the bound of
>> the strncmp call expression (the third argument): in the original
>> call expression (in expand_builtin_strncmp) the bound is or can be
>> an SSA_NAME.  But when the early expansion fails,
>> expand_builtin_strncmp transforms the original call expression into
>> a new one, substituting MIN_EXPR (bound, CST) for the bound where
>> CST is the constant result of strlen (STR), with STR being either
>> the first or second strncmp argument.  When the bound is an SSA_NAME
>> the subsequent attempt to determine its range from the MIN_EXPR fails.
>
> Hmm, wouldn't sth extracted from the attached random patch I have lying
> around be more useful in general?  That is, refactor the GENERIC
> expression walk in determine_value_range_1 to sth that can be used
> in the current get_range_info () interface?  Or if we don't want to change
> that to accept non-SSA names add a corresponding get_expr_range_info ()
> interface.

Thanks.  It is certainly more general.  I'm just not sure how much
use the generality can be put to because...

> If you do that please cut out the dominator/loop condition handling and
> refrain from the idea of looking at SSA def stmts.

...I've done that but I'm having trouble exercising the code except
for this bug.  (I've run the C testsuite but the only tests that end
up in it call it with uninteresting arguments like VAR_DECL).  Do
you have any suggestions?

Martin

PS I was able to create a test that at -O0 could call it with
a more interesting argument such as:

   const __SIZE_TYPE__ i = 3, j = 5;

   int f (void)
   {
     return __builtin_strncmp ("1234", "123", i < j ? i : j);
   }

Here, the first call to gimple_fold_builtin_string_compare() that
I picked as a test case sees:

   j.0_1 = 5;
   i.1_2 = 3;
   _3 = MIN_EXPR <j.0_1, i.1_2>;
   D.1963 = __builtin_strncmp ("1234", "123", _3);

Above, _3's range info is not available yet at this point so
the expression could be evaluated but only by looking at def stmts.
Why is it not a good idea?

Martin
Richard Biener May 24, 2018, 5:15 p.m. UTC | #3
On May 24, 2018 7:02:17 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote:
>On 05/24/2018 03:39 AM, Richard Biener wrote:
>> On Thu, May 24, 2018 at 12:50 AM Martin Sebor <msebor@gmail.com>
>wrote:
>>
>>> The attached patch enhances the get_size_range() function to
>>> extract more accurate ranges out of MIN_EXPR and MAX_EXPR nodes
>>> with SSA_NAME operand(s).  This helps -Wstringop-overflow avoid
>>> false positives on some targets in cases like some of those
>>> reported in bug 85623 - strncmp() warns about attribute nonstring
>>> incorrectly in -Wstringop-overflow
>>
>>> When first trying to expand calls to __builtin_strncmp, most back
>>> ends succeed even when the expansion results in a call to the
>library
>>> function.  The powerpc64 back-end, however, fails and and allows
>>> the generic expansion to take place.  There is a subtle difference
>>> between the two cases that shows up when examining the bound of
>>> the strncmp call expression (the third argument): in the original
>>> call expression (in expand_builtin_strncmp) the bound is or can be
>>> an SSA_NAME.  But when the early expansion fails,
>>> expand_builtin_strncmp transforms the original call expression into
>>> a new one, substituting MIN_EXPR (bound, CST) for the bound where
>>> CST is the constant result of strlen (STR), with STR being either
>>> the first or second strncmp argument.  When the bound is an SSA_NAME
>>> the subsequent attempt to determine its range from the MIN_EXPR
>fails.
>>
>> Hmm, wouldn't sth extracted from the attached random patch I have
>lying
>> around be more useful in general?  That is, refactor the GENERIC
>> expression walk in determine_value_range_1 to sth that can be used
>> in the current get_range_info () interface?  Or if we don't want to
>change
>> that to accept non-SSA names add a corresponding get_expr_range_info
>()
>> interface.
>
>Thanks.  It is certainly more general.  I'm just not sure how much
>use the generality can be put to because...
>
>> If you do that please cut out the dominator/loop condition handling
>and
>> refrain from the idea of looking at SSA def stmts.
>
>...I've done that but I'm having trouble exercising the code except
>for this bug.  (I've run the C testsuite but the only tests that end
>up in it call it with uninteresting arguments like VAR_DECL).  Do
>you have any suggestions?

Well, what builds the MIN_EXPR for your testcase? I orinigally used it for niters analysis which deals with complex GENERIC expressions. If you just changed get_range_info then of course callers are guarded with SSA name checks. 

>Martin
>
>PS I was able to create a test that at -O0 could call it with
>a more interesting argument such as:
>
>   const __SIZE_TYPE__ i = 3, j = 5;
>
>   int f (void)
>   {
>     return __builtin_strncmp ("1234", "123", i < j ? i : j);
>   }
>
>Here, the first call to gimple_fold_builtin_string_compare() that
>I picked as a test case sees:
>
>   j.0_1 = 5;
>   i.1_2 = 3;
>   _3 = MIN_EXPR <j.0_1, i.1_2>;
>   D.1963 = __builtin_strncmp ("1234", "123", _3);
>
>Above, _3's range info is not available yet at this point so
>the expression could be evaluated but only by looking at def stmts.
>Why is it not a good idea?

Because it raises complexity concerns unless you also populate intermediate SSA range Infos and stop at present ones. And you can't deal with conditional info as the patch tries to do (but I wouldn't keep that either initially). 

Richard. 

>
>Martin
Martin Sebor May 24, 2018, 9:22 p.m. UTC | #4
On 05/24/2018 11:15 AM, Richard Biener wrote:
> On May 24, 2018 7:02:17 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote:
>> On 05/24/2018 03:39 AM, Richard Biener wrote:
>>> On Thu, May 24, 2018 at 12:50 AM Martin Sebor <msebor@gmail.com>
>> wrote:
>>>
>>>> The attached patch enhances the get_size_range() function to
>>>> extract more accurate ranges out of MIN_EXPR and MAX_EXPR nodes
>>>> with SSA_NAME operand(s).  This helps -Wstringop-overflow avoid
>>>> false positives on some targets in cases like some of those
>>>> reported in bug 85623 - strncmp() warns about attribute nonstring
>>>> incorrectly in -Wstringop-overflow
>>>
>>>> When first trying to expand calls to __builtin_strncmp, most back
>>>> ends succeed even when the expansion results in a call to the
>> library
>>>> function.  The powerpc64 back-end, however, fails and and allows
>>>> the generic expansion to take place.  There is a subtle difference
>>>> between the two cases that shows up when examining the bound of
>>>> the strncmp call expression (the third argument): in the original
>>>> call expression (in expand_builtin_strncmp) the bound is or can be
>>>> an SSA_NAME.  But when the early expansion fails,
>>>> expand_builtin_strncmp transforms the original call expression into
>>>> a new one, substituting MIN_EXPR (bound, CST) for the bound where
>>>> CST is the constant result of strlen (STR), with STR being either
>>>> the first or second strncmp argument.  When the bound is an SSA_NAME
>>>> the subsequent attempt to determine its range from the MIN_EXPR
>> fails.
>>>
>>> Hmm, wouldn't sth extracted from the attached random patch I have
>> lying
>>> around be more useful in general?  That is, refactor the GENERIC
>>> expression walk in determine_value_range_1 to sth that can be used
>>> in the current get_range_info () interface?  Or if we don't want to
>> change
>>> that to accept non-SSA names add a corresponding get_expr_range_info
>> ()
>>> interface.
>>
>> Thanks.  It is certainly more general.  I'm just not sure how much
>> use the generality can be put to because...
>>
>>> If you do that please cut out the dominator/loop condition handling
>> and
>>> refrain from the idea of looking at SSA def stmts.
>>
>> ...I've done that but I'm having trouble exercising the code except
>> for this bug.  (I've run the C testsuite but the only tests that end
>> up in it call it with uninteresting arguments like VAR_DECL).  Do
>> you have any suggestions?
>
> Well, what builds the MIN_EXPR for your testcase? I orinigally used it for niters analysis which deals with complex GENERIC expressions. If you just changed get_range_info then of course callers are guarded with SSA name checks.

In pr85888 it's built by expand_builtin_strncmp (so very late).
It doesn't seem like a good use case because of this code:

   /* Expand the library call ourselves using a stabilized argument
      list to avoid re-evaluating the function's arguments twice.  */
   tree fn = build_call_nofold_loc (loc, fndecl, 3, arg1, arg2, len);
   gcc_assert (TREE_CODE (fn) == CALL_EXPR);
   CALL_EXPR_TAILCALL (fn) = CALL_EXPR_TAILCALL (exp);
   return expand_call (fn, target, target == const0_rtx);

AFAICS, the new call expression is the same as the old one, the only
difference is the LEN argument which is the MIN_EXPR.

The code was added in r72380 where the stabilization referred to
calling save_expr() on the arguments.  That code has been gone
since r242556 so I think a better/more targeted solution to fix
pr85888 than either of our approaches might be to simply call
expand_call() on the original CALL_EXPR with the original
arguments, including ARG3.

I don't fully understand the point of the save_expr() calls there
(and they're still in builtin_expand_strcmp), so if they need to
be restored then they would presumably include ARG3.  I.e.,
expand_call() would be called using ARG3 in the original SSA_NAME
form rather than the MIN_EXPR wrapper created by the function.

Attached is a patch that implements the first approach above.

If the SAVE_EXPRs are needed for some reason, it would be good
to have a test case to verify it.  I haven't been able to come
up with one (or find an existing test that fails due to their
removal) but that could very well be because my limited
understanding of what they're for (and poor testsuite coverage).

I'm also up for taking your patch and trying to make use of it
for other trees where get_range_info() is currently being called
for SSA_NAMEs only.  But it feels like a separate project from
this bug fix.

Martin

>
>> Martin
>>
>> PS I was able to create a test that at -O0 could call it with
>> a more interesting argument such as:
>>
>>   const __SIZE_TYPE__ i = 3, j = 5;
>>
>>   int f (void)
>>   {
>>     return __builtin_strncmp ("1234", "123", i < j ? i : j);
>>   }
>>
>> Here, the first call to gimple_fold_builtin_string_compare() that
>> I picked as a test case sees:
>>
>>   j.0_1 = 5;
>>   i.1_2 = 3;
>>   _3 = MIN_EXPR <j.0_1, i.1_2>;
>>   D.1963 = __builtin_strncmp ("1234", "123", _3);
>>
>> Above, _3's range info is not available yet at this point so
>> the expression could be evaluated but only by looking at def stmts.
>> Why is it not a good idea?
>
> Because it raises complexity concerns unless you also populate intermediate SSA range Infos and stop at present ones. And you can't deal with conditional info as the patch tries to do (but I wouldn't keep that either initially).
>
> Richard.
>
>>
>> Martin
>
PR testsuite/85888 - New test case c-c++-common/attr-nonstring-6.c from r260541 fails with excess errors

gcc/ChangeLog:

	PR testsuite/85888
	* builtins.c (expand_builtin_strncmp): Call expand_call with
	the original CALL_EXPR since no stabilization has been done.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 841c1ef..5b9085b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -4708,12 +4708,7 @@ expand_builtin_strncmp (tree exp, ATTRIBUTE_UNUSED rtx target,
       return target;
     }
 
-  /* Expand the library call ourselves using a stabilized argument
-     list to avoid re-evaluating the function's arguments twice.  */
-  tree fn = build_call_nofold_loc (loc, fndecl, 3, arg1, arg2, len);
-  gcc_assert (TREE_CODE (fn) == CALL_EXPR);
-  CALL_EXPR_TAILCALL (fn) = CALL_EXPR_TAILCALL (exp);
-  return expand_call (fn, target, target == const0_rtx);
+  return expand_call (exp, target, target == const0_rtx);
 }
 
 /* Expand a call to __builtin_saveregs, generating the result in TARGET,
Richard Biener May 25, 2018, 7:02 a.m. UTC | #5
On Thu, May 24, 2018 at 11:22 PM Martin Sebor <msebor@gmail.com> wrote:

> On 05/24/2018 11:15 AM, Richard Biener wrote:
> > On May 24, 2018 7:02:17 PM GMT+02:00, Martin Sebor <msebor@gmail.com>
wrote:
> >> On 05/24/2018 03:39 AM, Richard Biener wrote:
> >>> On Thu, May 24, 2018 at 12:50 AM Martin Sebor <msebor@gmail.com>
> >> wrote:
> >>>
> >>>> The attached patch enhances the get_size_range() function to
> >>>> extract more accurate ranges out of MIN_EXPR and MAX_EXPR nodes
> >>>> with SSA_NAME operand(s).  This helps -Wstringop-overflow avoid
> >>>> false positives on some targets in cases like some of those
> >>>> reported in bug 85623 - strncmp() warns about attribute nonstring
> >>>> incorrectly in -Wstringop-overflow
> >>>
> >>>> When first trying to expand calls to __builtin_strncmp, most back
> >>>> ends succeed even when the expansion results in a call to the
> >> library
> >>>> function.  The powerpc64 back-end, however, fails and and allows
> >>>> the generic expansion to take place.  There is a subtle difference
> >>>> between the two cases that shows up when examining the bound of
> >>>> the strncmp call expression (the third argument): in the original
> >>>> call expression (in expand_builtin_strncmp) the bound is or can be
> >>>> an SSA_NAME.  But when the early expansion fails,
> >>>> expand_builtin_strncmp transforms the original call expression into
> >>>> a new one, substituting MIN_EXPR (bound, CST) for the bound where
> >>>> CST is the constant result of strlen (STR), with STR being either
> >>>> the first or second strncmp argument.  When the bound is an SSA_NAME
> >>>> the subsequent attempt to determine its range from the MIN_EXPR
> >> fails.
> >>>
> >>> Hmm, wouldn't sth extracted from the attached random patch I have
> >> lying
> >>> around be more useful in general?  That is, refactor the GENERIC
> >>> expression walk in determine_value_range_1 to sth that can be used
> >>> in the current get_range_info () interface?  Or if we don't want to
> >> change
> >>> that to accept non-SSA names add a corresponding get_expr_range_info
> >> ()
> >>> interface.
> >>
> >> Thanks.  It is certainly more general.  I'm just not sure how much
> >> use the generality can be put to because...
> >>
> >>> If you do that please cut out the dominator/loop condition handling
> >> and
> >>> refrain from the idea of looking at SSA def stmts.
> >>
> >> ...I've done that but I'm having trouble exercising the code except
> >> for this bug.  (I've run the C testsuite but the only tests that end
> >> up in it call it with uninteresting arguments like VAR_DECL).  Do
> >> you have any suggestions?
> >
> > Well, what builds the MIN_EXPR for your testcase? I orinigally used it
for niters analysis which deals with complex GENERIC expressions. If you
just changed get_range_info then of course callers are guarded with SSA
name checks.

> In pr85888 it's built by expand_builtin_strncmp (so very late).
> It doesn't seem like a good use case because of this code:

>     /* Expand the library call ourselves using a stabilized argument
>        list to avoid re-evaluating the function's arguments twice.  */
>     tree fn = build_call_nofold_loc (loc, fndecl, 3, arg1, arg2, len);
>     gcc_assert (TREE_CODE (fn) == CALL_EXPR);
>     CALL_EXPR_TAILCALL (fn) = CALL_EXPR_TAILCALL (exp);
>     return expand_call (fn, target, target == const0_rtx);

> AFAICS, the new call expression is the same as the old one, the only
> difference is the LEN argument which is the MIN_EXPR.

> The code was added in r72380 where the stabilization referred to
> calling save_expr() on the arguments.  That code has been gone
> since r242556 so I think a better/more targeted solution to fix
> pr85888 than either of our approaches might be to simply call
> expand_call() on the original CALL_EXPR with the original
> arguments, including ARG3.

> I don't fully understand the point of the save_expr() calls there
> (and they're still in builtin_expand_strcmp), so if they need to
> be restored then they would presumably include ARG3.  I.e.,
> expand_call() would be called using ARG3 in the original SSA_NAME
> form rather than the MIN_EXPR wrapper created by the function.

> Attached is a patch that implements the first approach above.

> If the SAVE_EXPRs are needed for some reason, it would be good
> to have a test case to verify it.  I haven't been able to come
> up with one (or find an existing test that fails due to their
> removal) but that could very well be because my limited
> understanding of what they're for (and poor testsuite coverage).

> I'm also up for taking your patch and trying to make use of it
> for other trees where get_range_info() is currently being called
> for SSA_NAMEs only.  But it feels like a separate project from
> this bug fix.

But your patch makes the whole 'len' computation after

   /* If we don't have a constant length for the first, use the length
      of the second, if we know it.  If neither string is constant length,
      use the given length argument.  We don't require a constant for
      this case; some cost analysis could be done if both are available
      but neither is constant.  For now, assume they're equally cheap,
      unless one has side effects.  If both strings have constant lengths,
      use the smaller.  */

   if (!len1 && !len2)
     len = len3;
   else if (!len1)
...
   /* If we are not using the given length, we must incorporate it here.
      The actual new length parameter will be MIN(len,arg3) in this case.  */
   if (len != len3)
     len = fold_build2_loc (loc, MIN_EXPR, TREE_TYPE (len), len, len3);

dead if expand_cmpstrn_or_cmpmem doesn't expand things inline.
I believe the idea is to possibly reduce 'len' for expansion of the
libcall.

OTOH all this "optimization" should have been done as folding
and not at RTL expansion time.  Thus if the issue is getting
a warning from the recursive re-expansion of the adjusted
call then removing this "postmature" optimization and making
sure it happens earlier as folding might be indeed a good thing.

But your patch as-is looks like it loses some optimization
(and I'm not caring about "folding" at -O0).

Richard.

> Martin

> >
> >> Martin
> >>
> >> PS I was able to create a test that at -O0 could call it with
> >> a more interesting argument such as:
> >>
> >>   const __SIZE_TYPE__ i = 3, j = 5;
> >>
> >>   int f (void)
> >>   {
> >>     return __builtin_strncmp ("1234", "123", i < j ? i : j);
> >>   }
> >>
> >> Here, the first call to gimple_fold_builtin_string_compare() that
> >> I picked as a test case sees:
> >>
> >>   j.0_1 = 5;
> >>   i.1_2 = 3;
> >>   _3 = MIN_EXPR <j.0_1, i.1_2>;
> >>   D.1963 = __builtin_strncmp ("1234", "123", _3);
> >>
> >> Above, _3's range info is not available yet at this point so
> >> the expression could be evaluated but only by looking at def stmts.
> >> Why is it not a good idea?
> >
> > Because it raises complexity concerns unless you also populate
intermediate SSA range Infos and stop at present ones. And you can't deal
with conditional info as the patch tries to do (but I wouldn't keep that
either initially).
> >
> > Richard.
> >
> >>
> >> Martin
> >
Martin Sebor May 25, 2018, 5:31 p.m. UTC | #6
On 05/25/2018 01:02 AM, Richard Biener wrote:
> On Thu, May 24, 2018 at 11:22 PM Martin Sebor <msebor@gmail.com> wrote:
>
>> On 05/24/2018 11:15 AM, Richard Biener wrote:
>>> On May 24, 2018 7:02:17 PM GMT+02:00, Martin Sebor <msebor@gmail.com>
> wrote:
>>>> On 05/24/2018 03:39 AM, Richard Biener wrote:
>>>>> On Thu, May 24, 2018 at 12:50 AM Martin Sebor <msebor@gmail.com>
>>>> wrote:
>>>>>
>>>>>> The attached patch enhances the get_size_range() function to
>>>>>> extract more accurate ranges out of MIN_EXPR and MAX_EXPR nodes
>>>>>> with SSA_NAME operand(s).  This helps -Wstringop-overflow avoid
>>>>>> false positives on some targets in cases like some of those
>>>>>> reported in bug 85623 - strncmp() warns about attribute nonstring
>>>>>> incorrectly in -Wstringop-overflow
>>>>>
>>>>>> When first trying to expand calls to __builtin_strncmp, most back
>>>>>> ends succeed even when the expansion results in a call to the
>>>> library
>>>>>> function.  The powerpc64 back-end, however, fails and and allows
>>>>>> the generic expansion to take place.  There is a subtle difference
>>>>>> between the two cases that shows up when examining the bound of
>>>>>> the strncmp call expression (the third argument): in the original
>>>>>> call expression (in expand_builtin_strncmp) the bound is or can be
>>>>>> an SSA_NAME.  But when the early expansion fails,
>>>>>> expand_builtin_strncmp transforms the original call expression into
>>>>>> a new one, substituting MIN_EXPR (bound, CST) for the bound where
>>>>>> CST is the constant result of strlen (STR), with STR being either
>>>>>> the first or second strncmp argument.  When the bound is an SSA_NAME
>>>>>> the subsequent attempt to determine its range from the MIN_EXPR
>>>> fails.
>>>>>
>>>>> Hmm, wouldn't sth extracted from the attached random patch I have
>>>> lying
>>>>> around be more useful in general?  That is, refactor the GENERIC
>>>>> expression walk in determine_value_range_1 to sth that can be used
>>>>> in the current get_range_info () interface?  Or if we don't want to
>>>> change
>>>>> that to accept non-SSA names add a corresponding get_expr_range_info
>>>> ()
>>>>> interface.
>>>>
>>>> Thanks.  It is certainly more general.  I'm just not sure how much
>>>> use the generality can be put to because...
>>>>
>>>>> If you do that please cut out the dominator/loop condition handling
>>>> and
>>>>> refrain from the idea of looking at SSA def stmts.
>>>>
>>>> ...I've done that but I'm having trouble exercising the code except
>>>> for this bug.  (I've run the C testsuite but the only tests that end
>>>> up in it call it with uninteresting arguments like VAR_DECL).  Do
>>>> you have any suggestions?
>>>
>>> Well, what builds the MIN_EXPR for your testcase? I orinigally used it
> for niters analysis which deals with complex GENERIC expressions. If you
> just changed get_range_info then of course callers are guarded with SSA
> name checks.
>
>> In pr85888 it's built by expand_builtin_strncmp (so very late).
>> It doesn't seem like a good use case because of this code:
>
>>     /* Expand the library call ourselves using a stabilized argument
>>        list to avoid re-evaluating the function's arguments twice.  */
>>     tree fn = build_call_nofold_loc (loc, fndecl, 3, arg1, arg2, len);
>>     gcc_assert (TREE_CODE (fn) == CALL_EXPR);
>>     CALL_EXPR_TAILCALL (fn) = CALL_EXPR_TAILCALL (exp);
>>     return expand_call (fn, target, target == const0_rtx);
>
>> AFAICS, the new call expression is the same as the old one, the only
>> difference is the LEN argument which is the MIN_EXPR.
>
>> The code was added in r72380 where the stabilization referred to
>> calling save_expr() on the arguments.  That code has been gone
>> since r242556 so I think a better/more targeted solution to fix
>> pr85888 than either of our approaches might be to simply call
>> expand_call() on the original CALL_EXPR with the original
>> arguments, including ARG3.
>
>> I don't fully understand the point of the save_expr() calls there
>> (and they're still in builtin_expand_strcmp), so if they need to
>> be restored then they would presumably include ARG3.  I.e.,
>> expand_call() would be called using ARG3 in the original SSA_NAME
>> form rather than the MIN_EXPR wrapper created by the function.
>
>> Attached is a patch that implements the first approach above.
>
>> If the SAVE_EXPRs are needed for some reason, it would be good
>> to have a test case to verify it.  I haven't been able to come
>> up with one (or find an existing test that fails due to their
>> removal) but that could very well be because my limited
>> understanding of what they're for (and poor testsuite coverage).
>
>> I'm also up for taking your patch and trying to make use of it
>> for other trees where get_range_info() is currently being called
>> for SSA_NAMEs only.  But it feels like a separate project from
>> this bug fix.
>
> But your patch makes the whole 'len' computation after
>
>    /* If we don't have a constant length for the first, use the length
>       of the second, if we know it.  If neither string is constant length,
>       use the given length argument.  We don't require a constant for
>       this case; some cost analysis could be done if both are available
>       but neither is constant.  For now, assume they're equally cheap,
>       unless one has side effects.  If both strings have constant lengths,
>       use the smaller.  */
>
>    if (!len1 && !len2)
>      len = len3;
>    else if (!len1)
> ...
>    /* If we are not using the given length, we must incorporate it here.
>       The actual new length parameter will be MIN(len,arg3) in this case.  */
>    if (len != len3)
>      len = fold_build2_loc (loc, MIN_EXPR, TREE_TYPE (len), len, len3);
>
> dead if expand_cmpstrn_or_cmpmem doesn't expand things inline.
> I believe the idea is to possibly reduce 'len' for expansion of the
> libcall.
>
> OTOH all this "optimization" should have been done as folding
> and not at RTL expansion time.  Thus if the issue is getting
> a warning from the recursive re-expansion of the adjusted
> call then removing this "postmature" optimization and making
> sure it happens earlier as folding might be indeed a good thing.
>
> But your patch as-is looks like it loses some optimization
> (and I'm not caring about "folding" at -O0).

On the one affected target (as far as I know) it increases
the code size by adding a branch.  I wouldn't have thought
of that as an optimization.  Since it bloats code, at
a minimum I'd expect the decision to be gated for -Os.  But
I'm not sure the branch is necessarily a win either -- it
will certainly depend on whether strncpy makes use of it.

At the same time, if you believe this code is deliberately
crafted this way I wouldn't want to undo it just to suppress
a warning.  So I'm back to the question: is my first patch
good enough for now to suppress the warning or do you want
me to use your patch?

I don't have a problem using yours but in the first pass
I wouldn't want to do more than just fix the warning; with
it fixed, I could the look into removing some of the guards
for SSA_NAME around calls to get_range_info().

Martin

> Richard.
>
>> Martin
>
>>>
>>>> Martin
>>>>
>>>> PS I was able to create a test that at -O0 could call it with
>>>> a more interesting argument such as:
>>>>
>>>>   const __SIZE_TYPE__ i = 3, j = 5;
>>>>
>>>>   int f (void)
>>>>   {
>>>>     return __builtin_strncmp ("1234", "123", i < j ? i : j);
>>>>   }
>>>>
>>>> Here, the first call to gimple_fold_builtin_string_compare() that
>>>> I picked as a test case sees:
>>>>
>>>>   j.0_1 = 5;
>>>>   i.1_2 = 3;
>>>>   _3 = MIN_EXPR <j.0_1, i.1_2>;
>>>>   D.1963 = __builtin_strncmp ("1234", "123", _3);
>>>>
>>>> Above, _3's range info is not available yet at this point so
>>>> the expression could be evaluated but only by looking at def stmts.
>>>> Why is it not a good idea?
>>>
>>> Because it raises complexity concerns unless you also populate
> intermediate SSA range Infos and stop at present ones. And you can't deal
> with conditional info as the patch tries to do (but I wouldn't keep that
> either initially).
>>>
>>> Richard.
>>>
>>>>
>>>> Martin
>>>
Richard Biener May 25, 2018, 5:51 p.m. UTC | #7
On May 25, 2018 7:31:43 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote:
>On 05/25/2018 01:02 AM, Richard Biener wrote:
>> On Thu, May 24, 2018 at 11:22 PM Martin Sebor <msebor@gmail.com>
>wrote:
>>
>>> On 05/24/2018 11:15 AM, Richard Biener wrote:
>>>> On May 24, 2018 7:02:17 PM GMT+02:00, Martin Sebor
><msebor@gmail.com>
>> wrote:
>>>>> On 05/24/2018 03:39 AM, Richard Biener wrote:
>>>>>> On Thu, May 24, 2018 at 12:50 AM Martin Sebor <msebor@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>> The attached patch enhances the get_size_range() function to
>>>>>>> extract more accurate ranges out of MIN_EXPR and MAX_EXPR nodes
>>>>>>> with SSA_NAME operand(s).  This helps -Wstringop-overflow avoid
>>>>>>> false positives on some targets in cases like some of those
>>>>>>> reported in bug 85623 - strncmp() warns about attribute
>nonstring
>>>>>>> incorrectly in -Wstringop-overflow
>>>>>>
>>>>>>> When first trying to expand calls to __builtin_strncmp, most
>back
>>>>>>> ends succeed even when the expansion results in a call to the
>>>>> library
>>>>>>> function.  The powerpc64 back-end, however, fails and and allows
>>>>>>> the generic expansion to take place.  There is a subtle
>difference
>>>>>>> between the two cases that shows up when examining the bound of
>>>>>>> the strncmp call expression (the third argument): in the
>original
>>>>>>> call expression (in expand_builtin_strncmp) the bound is or can
>be
>>>>>>> an SSA_NAME.  But when the early expansion fails,
>>>>>>> expand_builtin_strncmp transforms the original call expression
>into
>>>>>>> a new one, substituting MIN_EXPR (bound, CST) for the bound
>where
>>>>>>> CST is the constant result of strlen (STR), with STR being
>either
>>>>>>> the first or second strncmp argument.  When the bound is an
>SSA_NAME
>>>>>>> the subsequent attempt to determine its range from the MIN_EXPR
>>>>> fails.
>>>>>>
>>>>>> Hmm, wouldn't sth extracted from the attached random patch I have
>>>>> lying
>>>>>> around be more useful in general?  That is, refactor the GENERIC
>>>>>> expression walk in determine_value_range_1 to sth that can be
>used
>>>>>> in the current get_range_info () interface?  Or if we don't want
>to
>>>>> change
>>>>>> that to accept non-SSA names add a corresponding
>get_expr_range_info
>>>>> ()
>>>>>> interface.
>>>>>
>>>>> Thanks.  It is certainly more general.  I'm just not sure how much
>>>>> use the generality can be put to because...
>>>>>
>>>>>> If you do that please cut out the dominator/loop condition
>handling
>>>>> and
>>>>>> refrain from the idea of looking at SSA def stmts.
>>>>>
>>>>> ...I've done that but I'm having trouble exercising the code
>except
>>>>> for this bug.  (I've run the C testsuite but the only tests that
>end
>>>>> up in it call it with uninteresting arguments like VAR_DECL).  Do
>>>>> you have any suggestions?
>>>>
>>>> Well, what builds the MIN_EXPR for your testcase? I orinigally used
>it
>> for niters analysis which deals with complex GENERIC expressions. If
>you
>> just changed get_range_info then of course callers are guarded with
>SSA
>> name checks.
>>
>>> In pr85888 it's built by expand_builtin_strncmp (so very late).
>>> It doesn't seem like a good use case because of this code:
>>
>>>     /* Expand the library call ourselves using a stabilized argument
>>>        list to avoid re-evaluating the function's arguments twice. 
>*/
>>>     tree fn = build_call_nofold_loc (loc, fndecl, 3, arg1, arg2,
>len);
>>>     gcc_assert (TREE_CODE (fn) == CALL_EXPR);
>>>     CALL_EXPR_TAILCALL (fn) = CALL_EXPR_TAILCALL (exp);
>>>     return expand_call (fn, target, target == const0_rtx);
>>
>>> AFAICS, the new call expression is the same as the old one, the only
>>> difference is the LEN argument which is the MIN_EXPR.
>>
>>> The code was added in r72380 where the stabilization referred to
>>> calling save_expr() on the arguments.  That code has been gone
>>> since r242556 so I think a better/more targeted solution to fix
>>> pr85888 than either of our approaches might be to simply call
>>> expand_call() on the original CALL_EXPR with the original
>>> arguments, including ARG3.
>>
>>> I don't fully understand the point of the save_expr() calls there
>>> (and they're still in builtin_expand_strcmp), so if they need to
>>> be restored then they would presumably include ARG3.  I.e.,
>>> expand_call() would be called using ARG3 in the original SSA_NAME
>>> form rather than the MIN_EXPR wrapper created by the function.
>>
>>> Attached is a patch that implements the first approach above.
>>
>>> If the SAVE_EXPRs are needed for some reason, it would be good
>>> to have a test case to verify it.  I haven't been able to come
>>> up with one (or find an existing test that fails due to their
>>> removal) but that could very well be because my limited
>>> understanding of what they're for (and poor testsuite coverage).
>>
>>> I'm also up for taking your patch and trying to make use of it
>>> for other trees where get_range_info() is currently being called
>>> for SSA_NAMEs only.  But it feels like a separate project from
>>> this bug fix.
>>
>> But your patch makes the whole 'len' computation after
>>
>>    /* If we don't have a constant length for the first, use the
>length
>>       of the second, if we know it.  If neither string is constant
>length,
>>       use the given length argument.  We don't require a constant for
>>       this case; some cost analysis could be done if both are
>available
>>       but neither is constant.  For now, assume they're equally
>cheap,
>>       unless one has side effects.  If both strings have constant
>lengths,
>>       use the smaller.  */
>>
>>    if (!len1 && !len2)
>>      len = len3;
>>    else if (!len1)
>> ...
>>    /* If we are not using the given length, we must incorporate it
>here.
>>       The actual new length parameter will be MIN(len,arg3) in this
>case.  */
>>    if (len != len3)
>>      len = fold_build2_loc (loc, MIN_EXPR, TREE_TYPE (len), len,
>len3);
>>
>> dead if expand_cmpstrn_or_cmpmem doesn't expand things inline.
>> I believe the idea is to possibly reduce 'len' for expansion of the
>> libcall.
>>
>> OTOH all this "optimization" should have been done as folding
>> and not at RTL expansion time.  Thus if the issue is getting
>> a warning from the recursive re-expansion of the adjusted
>> call then removing this "postmature" optimization and making
>> sure it happens earlier as folding might be indeed a good thing.
>>
>> But your patch as-is looks like it loses some optimization
>> (and I'm not caring about "folding" at -O0).
>
>On the one affected target (as far as I know) it increases
>the code size by adding a branch.  I wouldn't have thought
>of that as an optimization.  Since it bloats code, at
>a minimum I'd expect the decision to be gated for -Os.  But
>I'm not sure the branch is necessarily a win either -- it
>will certainly depend on whether strncpy makes use of it.
>
>At the same time, if you believe this code is deliberately
>crafted this way I wouldn't want to undo it just to suppress
>a warning.  So I'm back to the question: is my first patch
>good enough for now to suppress the warning or do you want
>me to use your patch?

I don't like to have random pieces of range propagation code spread throughout the compiler so no, your first patch isn't OK. A variant using a new get_expr_range_info is preferred. 

Richard. 

>I don't have a problem using yours but in the first pass
>I wouldn't want to do more than just fix the warning; with
>it fixed, I could the look into removing some of the guards
>for SSA_NAME around calls to get_range_info().
>
>Martin
>
>> Richard.
>>
>>> Martin
>>
>>>>
>>>>> Martin
>>>>>
>>>>> PS I was able to create a test that at -O0 could call it with
>>>>> a more interesting argument such as:
>>>>>
>>>>>   const __SIZE_TYPE__ i = 3, j = 5;
>>>>>
>>>>>   int f (void)
>>>>>   {
>>>>>     return __builtin_strncmp ("1234", "123", i < j ? i : j);
>>>>>   }
>>>>>
>>>>> Here, the first call to gimple_fold_builtin_string_compare() that
>>>>> I picked as a test case sees:
>>>>>
>>>>>   j.0_1 = 5;
>>>>>   i.1_2 = 3;
>>>>>   _3 = MIN_EXPR <j.0_1, i.1_2>;
>>>>>   D.1963 = __builtin_strncmp ("1234", "123", _3);
>>>>>
>>>>> Above, _3's range info is not available yet at this point so
>>>>> the expression could be evaluated but only by looking at def
>stmts.
>>>>> Why is it not a good idea?
>>>>
>>>> Because it raises complexity concerns unless you also populate
>> intermediate SSA range Infos and stop at present ones. And you can't
>deal
>> with conditional info as the patch tries to do (but I wouldn't keep
>that
>> either initially).
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> Martin
>>>>
Martin Sebor May 25, 2018, 6:02 p.m. UTC | #8
On 05/25/2018 11:51 AM, Richard Biener wrote:
> On May 25, 2018 7:31:43 PM GMT+02:00, Martin Sebor <msebor@gmail.com> wrote:
>> On 05/25/2018 01:02 AM, Richard Biener wrote:
>>> On Thu, May 24, 2018 at 11:22 PM Martin Sebor <msebor@gmail.com>
>> wrote:
>>>
>>>> On 05/24/2018 11:15 AM, Richard Biener wrote:
>>>>> On May 24, 2018 7:02:17 PM GMT+02:00, Martin Sebor
>> <msebor@gmail.com>
>>> wrote:
>>>>>> On 05/24/2018 03:39 AM, Richard Biener wrote:
>>>>>>> On Thu, May 24, 2018 at 12:50 AM Martin Sebor <msebor@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>> The attached patch enhances the get_size_range() function to
>>>>>>>> extract more accurate ranges out of MIN_EXPR and MAX_EXPR nodes
>>>>>>>> with SSA_NAME operand(s).  This helps -Wstringop-overflow avoid
>>>>>>>> false positives on some targets in cases like some of those
>>>>>>>> reported in bug 85623 - strncmp() warns about attribute
>> nonstring
>>>>>>>> incorrectly in -Wstringop-overflow
>>>>>>>
>>>>>>>> When first trying to expand calls to __builtin_strncmp, most
>> back
>>>>>>>> ends succeed even when the expansion results in a call to the
>>>>>> library
>>>>>>>> function.  The powerpc64 back-end, however, fails and and allows
>>>>>>>> the generic expansion to take place.  There is a subtle
>> difference
>>>>>>>> between the two cases that shows up when examining the bound of
>>>>>>>> the strncmp call expression (the third argument): in the
>> original
>>>>>>>> call expression (in expand_builtin_strncmp) the bound is or can
>> be
>>>>>>>> an SSA_NAME.  But when the early expansion fails,
>>>>>>>> expand_builtin_strncmp transforms the original call expression
>> into
>>>>>>>> a new one, substituting MIN_EXPR (bound, CST) for the bound
>> where
>>>>>>>> CST is the constant result of strlen (STR), with STR being
>> either
>>>>>>>> the first or second strncmp argument.  When the bound is an
>> SSA_NAME
>>>>>>>> the subsequent attempt to determine its range from the MIN_EXPR
>>>>>> fails.
>>>>>>>
>>>>>>> Hmm, wouldn't sth extracted from the attached random patch I have
>>>>>> lying
>>>>>>> around be more useful in general?  That is, refactor the GENERIC
>>>>>>> expression walk in determine_value_range_1 to sth that can be
>> used
>>>>>>> in the current get_range_info () interface?  Or if we don't want
>> to
>>>>>> change
>>>>>>> that to accept non-SSA names add a corresponding
>> get_expr_range_info
>>>>>> ()
>>>>>>> interface.
>>>>>>
>>>>>> Thanks.  It is certainly more general.  I'm just not sure how much
>>>>>> use the generality can be put to because...
>>>>>>
>>>>>>> If you do that please cut out the dominator/loop condition
>> handling
>>>>>> and
>>>>>>> refrain from the idea of looking at SSA def stmts.
>>>>>>
>>>>>> ...I've done that but I'm having trouble exercising the code
>> except
>>>>>> for this bug.  (I've run the C testsuite but the only tests that
>> end
>>>>>> up in it call it with uninteresting arguments like VAR_DECL).  Do
>>>>>> you have any suggestions?
>>>>>
>>>>> Well, what builds the MIN_EXPR for your testcase? I orinigally used
>> it
>>> for niters analysis which deals with complex GENERIC expressions. If
>> you
>>> just changed get_range_info then of course callers are guarded with
>> SSA
>>> name checks.
>>>
>>>> In pr85888 it's built by expand_builtin_strncmp (so very late).
>>>> It doesn't seem like a good use case because of this code:
>>>
>>>>     /* Expand the library call ourselves using a stabilized argument
>>>>        list to avoid re-evaluating the function's arguments twice.
>> */
>>>>     tree fn = build_call_nofold_loc (loc, fndecl, 3, arg1, arg2,
>> len);
>>>>     gcc_assert (TREE_CODE (fn) == CALL_EXPR);
>>>>     CALL_EXPR_TAILCALL (fn) = CALL_EXPR_TAILCALL (exp);
>>>>     return expand_call (fn, target, target == const0_rtx);
>>>
>>>> AFAICS, the new call expression is the same as the old one, the only
>>>> difference is the LEN argument which is the MIN_EXPR.
>>>
>>>> The code was added in r72380 where the stabilization referred to
>>>> calling save_expr() on the arguments.  That code has been gone
>>>> since r242556 so I think a better/more targeted solution to fix
>>>> pr85888 than either of our approaches might be to simply call
>>>> expand_call() on the original CALL_EXPR with the original
>>>> arguments, including ARG3.
>>>
>>>> I don't fully understand the point of the save_expr() calls there
>>>> (and they're still in builtin_expand_strcmp), so if they need to
>>>> be restored then they would presumably include ARG3.  I.e.,
>>>> expand_call() would be called using ARG3 in the original SSA_NAME
>>>> form rather than the MIN_EXPR wrapper created by the function.
>>>
>>>> Attached is a patch that implements the first approach above.
>>>
>>>> If the SAVE_EXPRs are needed for some reason, it would be good
>>>> to have a test case to verify it.  I haven't been able to come
>>>> up with one (or find an existing test that fails due to their
>>>> removal) but that could very well be because my limited
>>>> understanding of what they're for (and poor testsuite coverage).
>>>
>>>> I'm also up for taking your patch and trying to make use of it
>>>> for other trees where get_range_info() is currently being called
>>>> for SSA_NAMEs only.  But it feels like a separate project from
>>>> this bug fix.
>>>
>>> But your patch makes the whole 'len' computation after
>>>
>>>    /* If we don't have a constant length for the first, use the
>> length
>>>       of the second, if we know it.  If neither string is constant
>> length,
>>>       use the given length argument.  We don't require a constant for
>>>       this case; some cost analysis could be done if both are
>> available
>>>       but neither is constant.  For now, assume they're equally
>> cheap,
>>>       unless one has side effects.  If both strings have constant
>> lengths,
>>>       use the smaller.  */
>>>
>>>    if (!len1 && !len2)
>>>      len = len3;
>>>    else if (!len1)
>>> ...
>>>    /* If we are not using the given length, we must incorporate it
>> here.
>>>       The actual new length parameter will be MIN(len,arg3) in this
>> case.  */
>>>    if (len != len3)
>>>      len = fold_build2_loc (loc, MIN_EXPR, TREE_TYPE (len), len,
>> len3);
>>>
>>> dead if expand_cmpstrn_or_cmpmem doesn't expand things inline.
>>> I believe the idea is to possibly reduce 'len' for expansion of the
>>> libcall.
>>>
>>> OTOH all this "optimization" should have been done as folding
>>> and not at RTL expansion time.  Thus if the issue is getting
>>> a warning from the recursive re-expansion of the adjusted
>>> call then removing this "postmature" optimization and making
>>> sure it happens earlier as folding might be indeed a good thing.
>>>
>>> But your patch as-is looks like it loses some optimization
>>> (and I'm not caring about "folding" at -O0).
>>
>> On the one affected target (as far as I know) it increases
>> the code size by adding a branch.  I wouldn't have thought
>> of that as an optimization.  Since it bloats code, at
>> a minimum I'd expect the decision to be gated for -Os.  But
>> I'm not sure the branch is necessarily a win either -- it
>> will certainly depend on whether strncpy makes use of it.
>>
>> At the same time, if you believe this code is deliberately
>> crafted this way I wouldn't want to undo it just to suppress
>> a warning.  So I'm back to the question: is my first patch
>> good enough for now to suppress the warning or do you want
>> me to use your patch?
>
> I don't like to have random pieces of range propagation code spread throughout the compiler so no, your first patch isn't OK. A variant using a new get_expr_range_info is preferred.

Okay, I'll use it.

I do want to follow up on the optimization you referred to above.
After thinking about it some more I don't see what benefit it
could provide.  The value of the bound expression (LEN) ends up
computed at the call site and stored in a register that the library
strncmp has to read.  I can't think of any difference the MIN_EXPR
could make in the library, so the code as is seems strictly worse
than if it used the SSA_NAME directly instead.  Please let me know
if you don't agree (and what benefit you think it might provide).
Otherwise I'll open a bug to change it.

Thanks
Martin

>
> Richard.
>
>> I don't have a problem using yours but in the first pass
>> I wouldn't want to do more than just fix the warning; with
>> it fixed, I could the look into removing some of the guards
>> for SSA_NAME around calls to get_range_info().
>>
>> Martin
>>
>>> Richard.
>>>
>>>> Martin
>>>
>>>>>
>>>>>> Martin
>>>>>>
>>>>>> PS I was able to create a test that at -O0 could call it with
>>>>>> a more interesting argument such as:
>>>>>>
>>>>>>   const __SIZE_TYPE__ i = 3, j = 5;
>>>>>>
>>>>>>   int f (void)
>>>>>>   {
>>>>>>     return __builtin_strncmp ("1234", "123", i < j ? i : j);
>>>>>>   }
>>>>>>
>>>>>> Here, the first call to gimple_fold_builtin_string_compare() that
>>>>>> I picked as a test case sees:
>>>>>>
>>>>>>   j.0_1 = 5;
>>>>>>   i.1_2 = 3;
>>>>>>   _3 = MIN_EXPR <j.0_1, i.1_2>;
>>>>>>   D.1963 = __builtin_strncmp ("1234", "123", _3);
>>>>>>
>>>>>> Above, _3's range info is not available yet at this point so
>>>>>> the expression could be evaluated but only by looking at def
>> stmts.
>>>>>> Why is it not a good idea?
>>>>>
>>>>> Because it raises complexity concerns unless you also populate
>>> intermediate SSA range Infos and stop at present ones. And you can't
>> deal
>>> with conditional info as the patch tries to do (but I wouldn't keep
>> that
>>> either initially).
>>>>>
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> Martin
>>>>>
>
Martin Sebor May 25, 2018, 8:15 p.m. UTC | #9
Attached is revision 3 of the patch incorporating your
determine_value_range function with the requested changes.

Martin
PR testsuite/85888 - New test case c-c++-common/attr-nonstring-6.c from r260541 fails with excess errors

gcc/ChangeLog:

	PR testsuite/85888
	* builtins.c (expand_builtin_strncmp): Call expand_call with
	the original CALL_EXPR since no stabilization has been done.
	* gcc/tree-ssanames.c (get_range_info): Call determine_value_range
	for arguments other than SSA_NAME.
	* calls.c (get_size_range): Call get_range_info even for non-SSA
	arguments.
	* tree-vrp.h (determine_value_range): Declared new function.
	* tree-vrp.c (determine_value_range_1, determine_value_range): New.

diff --git a/gcc/calls.c b/gcc/calls.c
index 35bcff7..f6bf3a6 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1319,7 +1319,7 @@ get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
   wide_int min, max;
   enum value_range_type range_type;
 
-  if (TREE_CODE (exp) == SSA_NAME && integral)
+  if (integral)
     range_type = get_range_info (exp, &min, &max);
   else
     range_type = VR_VARYING;
diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c
index 6cce43b..b5d257b 100644
--- a/gcc/tree-ssanames.c
+++ b/gcc/tree-ssanames.c
@@ -398,25 +398,30 @@ set_range_info (tree name, enum value_range_type range_type,
 
 
 /* Gets range information MIN, MAX and returns enum value_range_type
-   corresponding to tree ssa_name NAME.  enum value_range_type returned
-   is used to determine if MIN and MAX are valid values.  */
+   corresponding to expression EXPR.  Returned value_range_type is
+   used to determine if MIN and MAX are valid values.  */
 
 enum value_range_type
-get_range_info (const_tree name, wide_int *min, wide_int *max)
+get_range_info (const_tree expr, wide_int *min, wide_int *max)
 {
-  gcc_assert (!POINTER_TYPE_P (TREE_TYPE (name)));
+  gcc_assert (!POINTER_TYPE_P (TREE_TYPE (expr)));
   gcc_assert (min && max);
-  range_info_def *ri = SSA_NAME_RANGE_INFO (name);
+  if (TREE_CODE (expr) == SSA_NAME)
+    {
+      range_info_def *ri = SSA_NAME_RANGE_INFO (expr);
 
-  /* Return VR_VARYING for SSA_NAMEs with NULL RANGE_INFO or SSA_NAMEs
-     with integral types width > 2 * HOST_BITS_PER_WIDE_INT precision.  */
-  if (!ri || (GET_MODE_PRECISION (SCALAR_INT_TYPE_MODE (TREE_TYPE (name)))
-	      > 2 * HOST_BITS_PER_WIDE_INT))
-    return VR_VARYING;
+      /* Return VR_VARYING for SSA_NAMEs with NULL RANGE_INFO or SSA_NAMEs
+	 with integral types width > 2 * HOST_BITS_PER_WIDE_INT precision.  */
+      if (!ri || (GET_MODE_PRECISION (SCALAR_INT_TYPE_MODE (TREE_TYPE (expr)))
+		  > 2 * HOST_BITS_PER_WIDE_INT))
+	return VR_VARYING;
+
+      *min = ri->get_min ();
+      *max = ri->get_max ();
+      return SSA_NAME_RANGE_TYPE (expr);
+    }
 
-  *min = ri->get_min ();
-  *max = ri->get_max ();
-  return SSA_NAME_RANGE_TYPE (name);
+  return determine_value_range (const_cast<tree> (expr), min, max);
 }
 
 /* Set nonnull attribute to pointer NAME.  */
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 6c482dd..1b1084b 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -7130,3 +7130,62 @@ make_pass_vrp (gcc::context *ctxt)
 {
   return new pass_vrp (ctxt);
 }
+
+
+/* Worker for determine_value_range.  */
+
+static void
+determine_value_range_1 (value_range *vr, tree expr)
+{
+  if (BINARY_CLASS_P (expr))
+    {
+      value_range vr0 = VR_INITIALIZER, vr1 = VR_INITIALIZER;
+      determine_value_range_1 (&vr0, TREE_OPERAND (expr, 0));
+      determine_value_range_1 (&vr1, TREE_OPERAND (expr, 1));
+      extract_range_from_binary_expr_1 (vr, TREE_CODE (expr), TREE_TYPE (expr),
+					&vr0, &vr1);
+    }
+  else if (UNARY_CLASS_P (expr))
+    {
+      value_range vr0 = VR_INITIALIZER;
+      determine_value_range_1 (&vr0, TREE_OPERAND (expr, 0));
+      extract_range_from_unary_expr (vr, TREE_CODE (expr), TREE_TYPE (expr),
+				     &vr0, TREE_TYPE (TREE_OPERAND (expr, 0)));
+    }
+  else if (TREE_CODE (expr) == INTEGER_CST)
+    set_value_range_to_value (vr, expr, NULL);
+  else
+    {
+      value_range_type kind;
+      wide_int min, max;
+      /* For SSA names try to extract range info computed by VRP.  Otherwise
+	 fall back to varying.  */
+      if (TREE_CODE (expr) == SSA_NAME
+	  && INTEGRAL_TYPE_P (TREE_TYPE (expr))
+	  && (kind = get_range_info (expr, &min, &max)) != VR_VARYING)
+	set_value_range (vr, kind, wide_int_to_tree (TREE_TYPE (expr), min),
+			 wide_int_to_tree (TREE_TYPE (expr), max), NULL);
+      else
+	set_value_range_to_varying (vr);
+    }
+}
+
+/* Compute a value-range for EXPR and set it in *MIN and *MAX.  Return
+   determined the range type.  */
+
+value_range_type
+determine_value_range (tree expr, wide_int *min, wide_int *max)
+{
+  value_range vr = VR_INITIALIZER;
+  determine_value_range_1 (&vr, expr);
+  if ((vr.type == VR_RANGE
+       || vr.type == VR_ANTI_RANGE)
+      && !symbolic_range_p (&vr))
+    {
+      *min = wi::to_wide (vr.min);
+      *max = wi::to_wide (vr.max);
+      return vr.type;
+    }
+
+  return VR_VARYING;
+}
diff --git a/gcc/tree-vrp.h b/gcc/tree-vrp.h
index d8f60be..608ca56 100644
--- a/gcc/tree-vrp.h
+++ b/gcc/tree-vrp.h
@@ -119,7 +119,7 @@ extern bool range_int_cst_singleton_p (value_range *);
 extern int value_inside_range (tree, tree, tree);
 extern tree get_single_symbol (tree, bool *, tree *);
 extern void maybe_set_nonzero_bits (edge, tree);
-
+extern value_range_type determine_value_range (tree, wide_int *, wide_int *);
 
 struct switch_update {
   gswitch *stmt;
Richard Biener May 28, 2018, 9:11 a.m. UTC | #10
On Fri, May 25, 2018 at 10:15 PM Martin Sebor <msebor@gmail.com> wrote:

> Attached is revision 3 of the patch incorporating your
> determine_value_range function with the requested changes.

I'm somewhat torn about removing the "basic" interface on SSA names
so can you please not change get_range_info for now and instead
use determine_value_range in get_size_range for now?

OK with that change.
Thanks,
Richard.

> Martin
Martin Sebor May 29, 2018, 2:58 p.m. UTC | #11
On 05/28/2018 03:11 AM, Richard Biener wrote:
> On Fri, May 25, 2018 at 10:15 PM Martin Sebor <msebor@gmail.com> wrote:
>
>> Attached is revision 3 of the patch incorporating your
>> determine_value_range function with the requested changes.
>
> I'm somewhat torn about removing the "basic" interface on SSA names
> so can you please not change get_range_info for now and instead
> use determine_value_range in get_size_range for now?

I can do that.  Can you explain why you're having second thoughts
about going this route?

FWIW: I've already made changes to most clients to get_range_info
to let them call the function for non-SSA arguments and have been
testing the (incremental) patch.  I haven't seen a dramatic increase
in the number of successful calls to the function as a result so it
doesn't seem like it would be too much of an improvement.  I did
notice that some calls end up returning a one-element range, i.e.,
N, N].  Unless callers are prepared to handle such ranges this could
expose bugs or optimization opportunities.  Is it worth finishing
this up?

Martin
Martin Sebor May 29, 2018, 6:19 p.m. UTC | #12
> I do want to follow up on the optimization you referred to above.
> After thinking about it some more I don't see what benefit it
> could provide.  The value of the bound expression (LEN) ends up
> computed at the call site and stored in a register that the library
> strncmp has to read.  I can't think of any difference the MIN_EXPR
> could make in the library, so the code as is seems strictly worse
> than if it used the SSA_NAME directly instead.  Please let me know
> if you don't agree (and what benefit you think it might provide).
> Otherwise I'll open a bug to change it.

I've raised bug 85980 for this.

Martin
Richard Biener May 30, 2018, 9:37 a.m. UTC | #13
On Tue, May 29, 2018 at 4:58 PM Martin Sebor <msebor@gmail.com> wrote:

> On 05/28/2018 03:11 AM, Richard Biener wrote:
> > On Fri, May 25, 2018 at 10:15 PM Martin Sebor <msebor@gmail.com> wrote:
> >
> >> Attached is revision 3 of the patch incorporating your
> >> determine_value_range function with the requested changes.
> >
> > I'm somewhat torn about removing the "basic" interface on SSA names
> > so can you please not change get_range_info for now and instead
> > use determine_value_range in get_size_range for now?

> I can do that.  Can you explain why you're having second thoughts
> about going this route?

I've seen you recurse between both APIs, thus they call each other.
That's ugly which is why I prefer to keep one of them a simple accessor
to the range-info associated with an SSA name.

A future enhancement for the new API would be to walk def stmts
but then the API should stop at SSA names that do have range-info
associated and record range-info it computed into SSA names it
walked so the IL itself serves as a cache.  That requires a way
to see whether an SSA name has range-info rather than having
get_range_info recurse into the walking machinery again.

> FWIW: I've already made changes to most clients to get_range_info
> to let them call the function for non-SSA arguments and have been
> testing the (incremental) patch.  I haven't seen a dramatic increase
> in the number of successful calls to the function as a result so it
> doesn't seem like it would be too much of an improvement.  I did
> notice that some calls end up returning a one-element range, i.e.,
> N, N].  Unless callers are prepared to handle such ranges this could
> expose bugs or optimization opportunities.  Is it worth finishing
> this up?

It's probably missed optimization opportunities but they might be
caused by "bad" pass placement.

But yes, in principle finishing this up is worthwhile - we should just
keep a bare-metal get_range_info somehow.  We can use an
alternate name for that one for example - get_ssa_range_info
for example?  And adjust the setters accordingly.

Or go away with a new name for the enhanced API - I like
get_expr_range_info more there but it of course requires
changing all affected users.  If I had the choice and would
do the implementation I would do this - make the enhanced API
called get_expr_range_info.

Richard.


> Martin
Jeff Law May 30, 2018, 10:42 p.m. UTC | #14
On 05/30/2018 03:37 AM, Richard Biener wrote:
> On Tue, May 29, 2018 at 4:58 PM Martin Sebor <msebor@gmail.com> wrote:
> 
>> On 05/28/2018 03:11 AM, Richard Biener wrote:
>>> On Fri, May 25, 2018 at 10:15 PM Martin Sebor <msebor@gmail.com> wrote:
>>>
>>>> Attached is revision 3 of the patch incorporating your
>>>> determine_value_range function with the requested changes.
>>>
>>> I'm somewhat torn about removing the "basic" interface on SSA names
>>> so can you please not change get_range_info for now and instead
>>> use determine_value_range in get_size_range for now?
> 
>> I can do that.  Can you explain why you're having second thoughts
>> about going this route?
> 
> I've seen you recurse between both APIs, thus they call each other.
> That's ugly which is why I prefer to keep one of them a simple accessor
> to the range-info associated with an SSA name.
> 
> A future enhancement for the new API would be to walk def stmts
> but then the API should stop at SSA names that do have range-info
> associated and record range-info it computed into SSA names it
> walked so the IL itself serves as a cache.  That requires a way
> to see whether an SSA name has range-info rather than having
> get_range_info recurse into the walking machinery again.
There's a lot of similarities between what you're suggesting here and
the ranger API that Andrew has been working on.


Jeff
Richard Biener May 31, 2018, 8:15 a.m. UTC | #15
On May 31, 2018 12:42:39 AM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>On 05/30/2018 03:37 AM, Richard Biener wrote:
>> On Tue, May 29, 2018 at 4:58 PM Martin Sebor <msebor@gmail.com>
>wrote:
>> 
>>> On 05/28/2018 03:11 AM, Richard Biener wrote:
>>>> On Fri, May 25, 2018 at 10:15 PM Martin Sebor <msebor@gmail.com>
>wrote:
>>>>
>>>>> Attached is revision 3 of the patch incorporating your
>>>>> determine_value_range function with the requested changes.
>>>>
>>>> I'm somewhat torn about removing the "basic" interface on SSA names
>>>> so can you please not change get_range_info for now and instead
>>>> use determine_value_range in get_size_range for now?
>> 
>>> I can do that.  Can you explain why you're having second thoughts
>>> about going this route?
>> 
>> I've seen you recurse between both APIs, thus they call each other.
>> That's ugly which is why I prefer to keep one of them a simple
>accessor
>> to the range-info associated with an SSA name.
>> 
>> A future enhancement for the new API would be to walk def stmts
>> but then the API should stop at SSA names that do have range-info
>> associated and record range-info it computed into SSA names it
>> walked so the IL itself serves as a cache.  That requires a way
>> to see whether an SSA name has range-info rather than having
>> get_range_info recurse into the walking machinery again.
>There's a lot of similarities between what you're suggesting here and
>the ranger API that Andrew has been working on.

Maybe - at least it integrates easily with existing infrastructure and doesn't require yet another set of operations on ranges. 

I'll yet have to review the ranger stuff. 

Richard. 

>
>Jeff
diff mbox series

Patch

PR testsuite/85888 - New test case c-c++-common/attr-nonstring-6.c from r260541 fails with excess errors

gcc/ChangeLog:

	PR testsuite/85888
	* calls.c (get_size_range): Handle MIN_EXPR and MAX_EXPR.

diff --git a/gcc/calls.c b/gcc/calls.c
index 35bcff7..4998780 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1295,7 +1295,7 @@  alloc_max_size (void)
 
 /* Return true when EXP's range can be determined and set RANGE[] to it
    after adjusting it if necessary to make EXP a represents a valid size
-   of object, or a valid size argument to an allocation function declared
+   of an object, or a valid size argument to an allocation function declared
    with attribute alloc_size (whose argument may be signed), or to a string
    manipulation function like memset.  When ALLOW_ZERO is true, allow
    returning a range of [0, 0] for a size in an anti-range [1, N] where
@@ -1317,12 +1317,53 @@  get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
   bool integral = INTEGRAL_TYPE_P (exptype);
 
   wide_int min, max;
-  enum value_range_type range_type;
+  enum value_range_type range_type = VR_VARYING;
 
-  if (TREE_CODE (exp) == SSA_NAME && integral)
-    range_type = get_range_info (exp, &min, &max);
-  else
-    range_type = VR_VARYING;
+  if (integral)
+    {
+      bool min_expr = TREE_CODE (exp) == MIN_EXPR;
+      if (min_expr || TREE_CODE (exp) == MAX_EXPR)
+	{
+	  /* EXP may be an SSA_NAME wrapped in a MIN_/MAX_EXPR.  Try
+	     to extract the range from the SSA_NAME and, if successful,
+	     set the final range to the intersection of the ranges
+	     extracted from the expressions' arguments.  */
+	  tree arg0 = TREE_OPERAND (exp, 0);
+	  tree arg1 = TREE_OPERAND (exp, 1);
+
+	  STRIP_NOPS (arg0);
+	  STRIP_NOPS (arg1);
+
+	  tree rng0[2], rng1[2];
+	  if (get_size_range (arg0, rng0, allow_zero)
+	      && get_size_range (arg1, rng1, allow_zero))
+	    {
+	      if (min_expr)
+		{
+		  range[0]
+		    = tree_int_cst_lt (rng0[0], rng1[0]) ? rng0[0] : rng1[0];
+		  range[1]
+		    = tree_int_cst_lt (rng0[1], rng1[1]) ? rng0[1] : rng1[1];
+		}
+	      else
+		{
+		  range[0]
+		    = tree_int_cst_lt (rng1[0], rng0[0]) ? rng0[0] : rng1[0];
+		  range[1]
+		    = tree_int_cst_lt (rng1[1], rng0[1]) ? rng0[1] : rng1[1];
+		}
+
+	      inform (UNKNOWN_LOCATION,
+		      "%E (%s_EXPR): [%E, %E] U [%E, %E] ==> [%E, %E]",
+		      exp, TREE_CODE (exp) == MIN_EXPR ? "MIN" : "MAX",
+		      rng0[0], rng0[1], rng1[0], rng1[1], range[0], range[1]);
+	      return true;
+	    }
+	}
+
+      if (TREE_CODE (exp) == SSA_NAME)
+	range_type = get_range_info (exp, &min, &max);
+    }
 
   if (range_type == VR_VARYING)
     {