diff mbox series

generalized range_query class for multiple contexts

Message ID 9bf12c00-b502-79b1-3cdc-b9256c2aa420@redhat.com
State New
Headers show
Series generalized range_query class for multiple contexts | expand

Commit Message

Aldy Hernandez Sept. 18, 2020, 6:38 p.m. UTC
As part of the ranger work, we have been trying to clean up and 
generalize interfaces whenever possible.  This not only helps in 
reducing the maintenance burden going forward, but provides mechanisms 
for backwards compatibility between ranger and other providers/users of 
ranges throughout the compiler like evrp and VRP.

One such interface is the range_query class in vr_values.h, which 
provides a range query mechanism for use in the simplify_using_ranges 
module.  With it, simplify_using_ranges can be used with the ranger, or 
the VRP twins by providing a get_value_range() method.  This has helped 
us in comparing apples to apples while doing our work, and has also 
future proofed the interface so that asking for a range can be done 
within the context in which it appeared.  For example, get_value_range 
now takes a gimple statement which provides context.  We are no longer 
tied to asking for a global SSA range, but can ask for the range of an 
SSA within a statement.  Granted, this functionality is currently only 
in the ranger, but evrp/vrp could be adapted to pass such context.

The range_query is a good first step, but what we really want is a 
generic query mechanism that can ask for SSA ranges within an 
expression, a statement, an edge, or anything else that may come up.  We 
think that a generic mechanism can be used not only for range producers, 
but consumers such as the substitute_and_fold_engine (see get_value 
virtual) and possibly the gimple folder (see valueize).

The attached patchset provides such an interface.  It is meant to be a 
replacement for range_query that can be used for vr_values, 
substitute_and_fold, the subsitute_and_fold_engine, as well as the 
ranger.  The general API is:

class value_query
{
public:
   // Return the singleton expression for NAME at a gimple statement,
   // or NULL if none found.
   virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
   // Return the singleton expression for NAME at an edge, or NULL if
   // none found.
   virtual tree value_on_edge (edge, tree name);
   // Return the singleton expression for the LHS of a gimple
   // statement, assuming an (optional) initial value of NAME.  Returns
   // NULL if none found.
   //
   // Note this method calculates the range the LHS would have *after*
   // the statement has executed.
   virtual tree value_of_stmt (gimple *, tree name = NULL);
};

class range_query : public value_query
{
public:
   range_query ();
   virtual ~range_query ();

   virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
   virtual tree value_on_edge (edge, tree name) OVERRIDE;
   virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;

   // These are the range equivalents of the value_* methods.  Instead
   // of returning a singleton, they calculate a range and return it in
   // R.  TRUE is returned on success or FALSE if no range was found.
   virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0;
   virtual bool range_on_edge (irange &r, edge, tree name);
   virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);

   // DEPRECATED: This method is used from vr-values.  The plan is to
   // rewrite all uses of it to the above API.
   virtual const class value_range_equiv *get_value_range (const_tree,
							  gimple * = NULL);
};

The duality of the API (value_of_* and range_on_*) is because some 
passes are interested in a singleton value 
(substitute_and_fold_enginge), while others are interested in ranges 
(vr_values).  Passes that are only interested in singletons can take a 
value_query, while passes that are interested in full ranges, can take a 
range_query.  Of course, for future proofing, we would recommend taking 
a range_query, since if you provide a default range_of_expr, sensible 
defaults will be provided for the others in terms of range_of_expr.

Note, that the absolute bare minimum that must be provided is a 
value_of_expr and a range_of_expr respectively.

One piece of the API which is missing is a method  to return the range 
of an arbitrary SSA_NAME *after* a statement.  Currently range_of_expr 
calculates the range of an expression upon entry to the statement, 
whereas range_of_stmt calculates the range of *only* the LHS of a 
statement AFTER the statement has executed.

This would allow for complete representation of the ranges/values in 
something like:

     d_4 = *g_7;

Here the range of g_7 upon entry could be VARYING, but after the 
dereference we know it must be non-zero.  Well for sane targets anyhow.

Choices would be to:

   1) add a 4th method such as "range_after_stmt", or

   2) merge that functionality with the existing range_of_stmt method to 
provide "after" functionality for any ssa_name.  Currently the SSA_NAME 
must be the same as the LHS if specified.  It also does not need to be 
specified to allow evaluation of statements without a LHS, (if (x < 1) 
has no LHS, but will return [0,0] [1,1] or [0,1] as there is a boolean 
"result" to the statement.

Also, we've debated whether to use the two classes (value_query and 
range_query) separately as above, or perhaps join them into one class. 
I personally like the separation of powers.  Some users only care for 
singletons, no sense in polluting their implementations with talk of 
ranges.  But we're open to suggestions.

The patchset is divided into 3 parts:

1. Generic implementation of class.

2. Conversion of vr_values and substitute_and_fold_engine to class.

The conversion of the substitute_and_fold_engine class involved changing 
all calls to get_value() to a corresponding method providing context. 
For example, when the edge or statement is available, we pass these to 
the appropriate method.

With this, we've been able to plug in the ranger into our hybrid evrp, 
which can then get better ranges.  Our hybrid evrp has moved a lot of 
optimizations that were happening much later into the early vrp stage.

3. Conversion of sprintf/strlen pass to class.

This is a nonfunctional change to the sprintf/strlen passes.  That is, 
no effort was made to change the passes to multi-ranges.  However, with 
this patch, we are able to plug in a ranger or evrp with just a few 
lines, since the range query mechanism share a common API.

---------------------

We think this interface is generic enough to be used in any place that 
queries ranges or singletons, basically any place we have *valueize* in 
the compiler.  It can also be used to replace range generators at will, 
or even combine them to get the best of both worlds.  For example, 
places that currently have a value_* interface because they work with 
constants, like CCP, could have an alternate valueizer plugged in and if 
a range folds to a constant, CCP could then propagate that constant.

Tested on x86-64 Linux.

Aldy

Comments

Aldy Hernandez Sept. 18, 2020, 8:40 p.m. UTC | #1
Forgot to include ChangeLog entries.

Aldy
Martin Sebor Sept. 23, 2020, 11:53 p.m. UTC | #2
On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
> As part of the ranger work, we have been trying to clean up and 
> generalize interfaces whenever possible.  This not only helps in 
> reducing the maintenance burden going forward, but provides mechanisms 
> for backwards compatibility between ranger and other providers/users of 
> ranges throughout the compiler like evrp and VRP.
> 
> One such interface is the range_query class in vr_values.h, which 
> provides a range query mechanism for use in the simplify_using_ranges 
> module.  With it, simplify_using_ranges can be used with the ranger, or 
> the VRP twins by providing a get_value_range() method.  This has helped 
> us in comparing apples to apples while doing our work, and has also 
> future proofed the interface so that asking for a range can be done 
> within the context in which it appeared.  For example, get_value_range 
> now takes a gimple statement which provides context.  We are no longer 
> tied to asking for a global SSA range, but can ask for the range of an 
> SSA within a statement.  Granted, this functionality is currently only 
> in the ranger, but evrp/vrp could be adapted to pass such context.
> 
> The range_query is a good first step, but what we really want is a 
> generic query mechanism that can ask for SSA ranges within an 
> expression, a statement, an edge, or anything else that may come up.  We 
> think that a generic mechanism can be used not only for range producers, 
> but consumers such as the substitute_and_fold_engine (see get_value 
> virtual) and possibly the gimple folder (see valueize).
> 
> The attached patchset provides such an interface.  It is meant to be a 
> replacement for range_query that can be used for vr_values, 
> substitute_and_fold, the subsitute_and_fold_engine, as well as the 
> ranger.  The general API is:
> 
> class value_query
> {
> public:
>    // Return the singleton expression for NAME at a gimple statement,
>    // or NULL if none found.
>    virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
>    // Return the singleton expression for NAME at an edge, or NULL if
>    // none found.
>    virtual tree value_on_edge (edge, tree name);
>    // Return the singleton expression for the LHS of a gimple
>    // statement, assuming an (optional) initial value of NAME.  Returns
>    // NULL if none found.
>    //
>    // Note this method calculates the range the LHS would have *after*
>    // the statement has executed.
>    virtual tree value_of_stmt (gimple *, tree name = NULL);
> };
> 
> class range_query : public value_query
> {
> public:
>    range_query ();
>    virtual ~range_query ();
> 
>    virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
>    virtual tree value_on_edge (edge, tree name) OVERRIDE;
>    virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;
> 
>    // These are the range equivalents of the value_* methods.  Instead
>    // of returning a singleton, they calculate a range and return it in
>    // R.  TRUE is returned on success or FALSE if no range was found.
>    virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0;
>    virtual bool range_on_edge (irange &r, edge, tree name);
>    virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);
> 
>    // DEPRECATED: This method is used from vr-values.  The plan is to
>    // rewrite all uses of it to the above API.
>    virtual const class value_range_equiv *get_value_range (const_tree,
>                                gimple * = NULL);
> };
> 
> The duality of the API (value_of_* and range_on_*) is because some 
> passes are interested in a singleton value 
> (substitute_and_fold_enginge), while others are interested in ranges 
> (vr_values).  Passes that are only interested in singletons can take a 
> value_query, while passes that are interested in full ranges, can take a 
> range_query.  Of course, for future proofing, we would recommend taking 
> a range_query, since if you provide a default range_of_expr, sensible 
> defaults will be provided for the others in terms of range_of_expr.
> 
> Note, that the absolute bare minimum that must be provided is a 
> value_of_expr and a range_of_expr respectively.
> 
> One piece of the API which is missing is a method  to return the range 
> of an arbitrary SSA_NAME *after* a statement.  Currently range_of_expr 
> calculates the range of an expression upon entry to the statement, 
> whereas range_of_stmt calculates the range of *only* the LHS of a 
> statement AFTER the statement has executed.
> 
> This would allow for complete representation of the ranges/values in 
> something like:
> 
>      d_4 = *g_7;
> 
> Here the range of g_7 upon entry could be VARYING, but after the 
> dereference we know it must be non-zero.  Well for sane targets anyhow.
> 
> Choices would be to:
> 
>    1) add a 4th method such as "range_after_stmt", or
> 
>    2) merge that functionality with the existing range_of_stmt method to 
> provide "after" functionality for any ssa_name.  Currently the SSA_NAME 
> must be the same as the LHS if specified.  It also does not need to be 
> specified to allow evaluation of statements without a LHS, (if (x < 1) 
> has no LHS, but will return [0,0] [1,1] or [0,1] as there is a boolean 
> "result" to the statement.
> 
> Also, we've debated whether to use the two classes (value_query and 
> range_query) separately as above, or perhaps join them into one class. I 
> personally like the separation of powers.  Some users only care for 
> singletons, no sense in polluting their implementations with talk of 
> ranges.  But we're open to suggestions.
> 
> The patchset is divided into 3 parts:
> 
> 1. Generic implementation of class.

Just a few comments on the new APIs in part 1.

Here I would again recommend preventing the copying and assigning
the new range_query class.  Ditto for equiv_allocator unless its
base takes care of that.  For value_query, since it has a pure
virtual function, I would suggest to declare its default ctor
protected (and = default).  That way, if someone tries to define
an object of it, they'll get a nice compilation error as opposed
to a confusing linker error.

The range_query default ctor uses the new epression to allocate
and construct its equiv_allocator object but the range_query dtor
just calls release on the object without invoking the delete
expression on it.  Unless the release() function does that (I'd
hope not but I have seen it) that seems like a leak.

Also, defining virtual functions to take default arguments tends
to be tricky and is generally discouraged.  Usually the default
argument value from the base class is used, unless the overridden
virtual function is called directly (either via Derived::foo()
or via ptr_to_derived->foo()).  To avoid surprises all overridden
virtual functions need to have the same defaults as the base,
which is easy to get wrong without compiler warnings.

One way to get around this is to define a non-virtual public
function with defaults in the base class and have it call
a protected virtual without the defaults.

Finally, unless both a type and function with the same name exist
in the same scope there is no reason to mention the class-id when
referencing a class name.  I.e., this

   value_range_equiv *allocate_value_range_equiv ();
   void free_value_range_equiv (value_range_equiv *);

is the same as this:

   class value_range_equiv *allocate_value_range_equiv ();
   void free_value_range_equiv (class value_range_equiv *);

but the former is shorter and clearer (and in line with existing
practice).

I'm hoping to look at the remaining parts soon (especially part
3) but right now I'm being summoned to dinner.

Martin

> 
> 2. Conversion of vr_values and substitute_and_fold_engine to class.
> 
> The conversion of the substitute_and_fold_engine class involved changing 
> all calls to get_value() to a corresponding method providing context. 
> For example, when the edge or statement is available, we pass these to 
> the appropriate method.
> 
> With this, we've been able to plug in the ranger into our hybrid evrp, 
> which can then get better ranges.  Our hybrid evrp has moved a lot of 
> optimizations that were happening much later into the early vrp stage.
> 
> 3. Conversion of sprintf/strlen pass to class.
> 
> This is a nonfunctional change to the sprintf/strlen passes.  That is, 
> no effort was made to change the passes to multi-ranges.  However, with 
> this patch, we are able to plug in a ranger or evrp with just a few 
> lines, since the range query mechanism share a common API.
> 
> ---------------------
> 
> We think this interface is generic enough to be used in any place that 
> queries ranges or singletons, basically any place we have *valueize* in 
> the compiler.  It can also be used to replace range generators at will, 
> or even combine them to get the best of both worlds.  For example, 
> places that currently have a value_* interface because they work with 
> constants, like CCP, could have an alternate valueizer plugged in and if 
> a range folds to a constant, CCP could then propagate that constant.
> 
> Tested on x86-64 Linux.
> 
> Aldy
Aldy Hernandez Sept. 24, 2020, 6:46 a.m. UTC | #3
On 9/24/20 1:53 AM, Martin Sebor wrote:

> Finally, unless both a type and function with the same name exist
> in the same scope there is no reason to mention the class-id when
> referencing a class name.  I.e., this
> 
>    value_range_equiv *allocate_value_range_equiv ();
>    void free_value_range_equiv (value_range_equiv *);
> 
> is the same as this:
> 
>    class value_range_equiv *allocate_value_range_equiv ();
>    void free_value_range_equiv (class value_range_equiv *);
> 
> but the former is shorter and clearer (and in line with existing
> practice).

value_range_equiv may not be defined in the scope of range-query.h, so 
that is why the class specifier is there.

Aldy
Martin Sebor Sept. 24, 2020, 5:10 p.m. UTC | #4
On 9/24/20 12:46 AM, Aldy Hernandez wrote:
> 
> 
> On 9/24/20 1:53 AM, Martin Sebor wrote:
> 
>> Finally, unless both a type and function with the same name exist
>> in the same scope there is no reason to mention the class-id when
>> referencing a class name.  I.e., this
>>
>>    value_range_equiv *allocate_value_range_equiv ();
>>    void free_value_range_equiv (value_range_equiv *);
>>
>> is the same as this:
>>
>>    class value_range_equiv *allocate_value_range_equiv ();
>>    void free_value_range_equiv (class value_range_equiv *);
>>
>> but the former is shorter and clearer (and in line with existing
>> practice).
> 
> value_range_equiv may not be defined in the scope of range-query.h, so 
> that is why the class specifier is there.

I see.  It's probably a reflection of my C++ background that this
style catches my eye.  In C++ I think it's more common to introduce
a forward declaration of a class before using it.

Just as a side note, the first declaration of a type introduces it
into the enclosing namespace so that from that point forward it can
be used without the class-id.  E.g., this is valid:

   struct A
   {
     // Need class here...
     class B *f ();
     // ...but not here...
     void g (B *);
   };

  // ...or even here:
  B* A::f () { return 0; }

Either way, the code is correct as is and I don't object to it,
just noting that (at least some of) the class-ids are redundant.

Martin
Martin Sebor Sept. 24, 2020, 9:51 p.m. UTC | #5
On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
> As part of the ranger work, we have been trying to clean up and 
> generalize interfaces whenever possible.  This not only helps in 
> reducing the maintenance burden going forward, but provides mechanisms 
> for backwards compatibility between ranger and other providers/users of 
> ranges throughout the compiler like evrp and VRP.
> 
> One such interface is the range_query class in vr_values.h, which 
> provides a range query mechanism for use in the simplify_using_ranges 
> module.  With it, simplify_using_ranges can be used with the ranger, or 
> the VRP twins by providing a get_value_range() method.  This has helped 
> us in comparing apples to apples while doing our work, and has also 
> future proofed the interface so that asking for a range can be done 
> within the context in which it appeared.  For example, get_value_range 
> now takes a gimple statement which provides context.  We are no longer 
> tied to asking for a global SSA range, but can ask for the range of an 
> SSA within a statement.  Granted, this functionality is currently only 
> in the ranger, but evrp/vrp could be adapted to pass such context.
> 
> The range_query is a good first step, but what we really want is a 
> generic query mechanism that can ask for SSA ranges within an 
> expression, a statement, an edge, or anything else that may come up.  We 
> think that a generic mechanism can be used not only for range producers, 
> but consumers such as the substitute_and_fold_engine (see get_value 
> virtual) and possibly the gimple folder (see valueize).
> 
> The attached patchset provides such an interface.  It is meant to be a 
> replacement for range_query that can be used for vr_values, 
> substitute_and_fold, the subsitute_and_fold_engine, as well as the 
> ranger.  The general API is:
> 
> class value_query
> {
> public:
>    // Return the singleton expression for NAME at a gimple statement,
>    // or NULL if none found.
>    virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
>    // Return the singleton expression for NAME at an edge, or NULL if
>    // none found.
>    virtual tree value_on_edge (edge, tree name);
>    // Return the singleton expression for the LHS of a gimple
>    // statement, assuming an (optional) initial value of NAME.  Returns
>    // NULL if none found.
>    //
>    // Note this method calculates the range the LHS would have *after*
>    // the statement has executed.
>    virtual tree value_of_stmt (gimple *, tree name = NULL);
> };
> 
> class range_query : public value_query
> {
> public:
>    range_query ();
>    virtual ~range_query ();
> 
>    virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
>    virtual tree value_on_edge (edge, tree name) OVERRIDE;
>    virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;
> 
>    // These are the range equivalents of the value_* methods.  Instead
>    // of returning a singleton, they calculate a range and return it in
>    // R.  TRUE is returned on success or FALSE if no range was found.
>    virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0;
>    virtual bool range_on_edge (irange &r, edge, tree name);
>    virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);
> 
>    // DEPRECATED: This method is used from vr-values.  The plan is to
>    // rewrite all uses of it to the above API.
>    virtual const class value_range_equiv *get_value_range (const_tree,
>                                gimple * = NULL);
> };
> 
> The duality of the API (value_of_* and range_on_*) is because some 
> passes are interested in a singleton value 
> (substitute_and_fold_enginge), while others are interested in ranges 
> (vr_values).  Passes that are only interested in singletons can take a 
> value_query, while passes that are interested in full ranges, can take a 
> range_query.  Of course, for future proofing, we would recommend taking 
> a range_query, since if you provide a default range_of_expr, sensible 
> defaults will be provided for the others in terms of range_of_expr.
> 
> Note, that the absolute bare minimum that must be provided is a 
> value_of_expr and a range_of_expr respectively.
> 
> One piece of the API which is missing is a method  to return the range 
> of an arbitrary SSA_NAME *after* a statement.  Currently range_of_expr 
> calculates the range of an expression upon entry to the statement, 
> whereas range_of_stmt calculates the range of *only* the LHS of a 
> statement AFTER the statement has executed.
> 
> This would allow for complete representation of the ranges/values in 
> something like:
> 
>      d_4 = *g_7;
> 
> Here the range of g_7 upon entry could be VARYING, but after the 
> dereference we know it must be non-zero.  Well for sane targets anyhow.
> 
> Choices would be to:
> 
>    1) add a 4th method such as "range_after_stmt", or
> 
>    2) merge that functionality with the existing range_of_stmt method to 
> provide "after" functionality for any ssa_name.  Currently the SSA_NAME 
> must be the same as the LHS if specified.  It also does not need to be 
> specified to allow evaluation of statements without a LHS, (if (x < 1) 
> has no LHS, but will return [0,0] [1,1] or [0,1] as there is a boolean 
> "result" to the statement.
> 
> Also, we've debated whether to use the two classes (value_query and 
> range_query) separately as above, or perhaps join them into one class. I 
> personally like the separation of powers.  Some users only care for 
> singletons, no sense in polluting their implementations with talk of 
> ranges.  But we're open to suggestions.
> 
> The patchset is divided into 3 parts:
> 
> 1. Generic implementation of class.
> 
> 2. Conversion of vr_values and substitute_and_fold_engine to class.
> 
> The conversion of the substitute_and_fold_engine class involved changing 
> all calls to get_value() to a corresponding method providing context. 
> For example, when the edge or statement is available, we pass these to 
> the appropriate method.
> 
> With this, we've been able to plug in the ranger into our hybrid evrp, 
> which can then get better ranges.  Our hybrid evrp has moved a lot of 
> optimizations that were happening much later into the early vrp stage.
> 
> 3. Conversion of sprintf/strlen pass to class.
> 
> This is a nonfunctional change to the sprintf/strlen passes.  That is, 
> no effort was made to change the passes to multi-ranges.  However, with 
> this patch, we are able to plug in a ranger or evrp with just a few 
> lines, since the range query mechanism share a common API.

Thanks for doing all this!  There isn't anything I don't understand
in the sprintf changes so no questions from me (well, almost none).
Just some comments:

The current call statement is available in all functions that take
a directive argument, as dir->info.callstmt.  There should be no need
to also add it as a new argument to the functions that now need it.

The change adds code along these lines in a bunch of places:

+	  value_range vr;
+	  if (!query->range_of_expr (vr, arg, stmt))
+	    vr.set_varying (TREE_TYPE (arg));

I thought under the new Ranger APIs when a range couldn't be
determined it would be automatically set to the maximum for
the type.  I like that and have been moving in that direction
with my code myself (rather than having an API fail, have it
set the max range and succeed).

Since that isn't so in this case, I think it would still be nice
if the added code could be written as if the range were set to
varying in this case and (ideally) reduced to just initialization:

   value_range vr = some-function (query, stmt, arg);

some-function could be an inline helper defined just for the sprintf
pass (and maybe also strlen which also seems to use the same pattern),
or it could be a value_range AKA irange ctor, or it could be a member
of range_query, whatever is the most appropriate.

(If assigning/copying a value_range is thought to be too expensive,
declaring it first and then passing it to that helper to set it
would work too).

In strlen, is the removed comment no longer relevant?  (I.e., does
the ranger solve the problem?)

-      /* The range below may be "inaccurate" if a constant has been
-	 substituted earlier for VAL by this pass that hasn't been
-	 propagated through the CFG.  This shoud be fixed by the new
-	 on-demand VRP if/when it becomes available (hopefully in
-	 GCC 11).  */

I'm wondering about the comment added to get_range_strlen_dynamic
and other places:

+	  // FIXME: Use range_query instead of global ranges.

Is that something you're planning to do in a followup or should
I remember to do it at some point?

Otherwise I have no concern with the changes.

Thanks
Martin

> 
> ---------------------
> 
> We think this interface is generic enough to be used in any place that 
> queries ranges or singletons, basically any place we have *valueize* in 
> the compiler.  It can also be used to replace range generators at will, 
> or even combine them to get the best of both worlds.  For example, 
> places that currently have a value_* interface because they work with 
> constants, like CCP, could have an alternate valueizer plugged in and if 
> a range folds to a constant, CCP could then propagate that constant.
> 
> Tested on x86-64 Linux.
> 
> Aldy
Andrew MacLeod Sept. 25, 2020, 1:17 p.m. UTC | #6
On 9/24/20 5:51 PM, Martin Sebor via Gcc-patches wrote:
> On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
>>
>>
>> 3. Conversion of sprintf/strlen pass to class.
>>
>> This is a nonfunctional change to the sprintf/strlen passes. That is, 
>> no effort was made to change the passes to multi-ranges.  However, 
>> with this patch, we are able to plug in a ranger or evrp with just a 
>> few lines, since the range query mechanism share a common API.
>
> Thanks for doing all this!  There isn't anything I don't understand
> in the sprintf changes so no questions from me (well, almost none).
> Just some comments:
>
> The current call statement is available in all functions that take
> a directive argument, as dir->info.callstmt.  There should be no need
> to also add it as a new argument to the functions that now need it.
>
> The change adds code along these lines in a bunch of places:
>
> +      value_range vr;
> +      if (!query->range_of_expr (vr, arg, stmt))
> +        vr.set_varying (TREE_TYPE (arg));
>
> I thought under the new Ranger APIs when a range couldn't be
> determined it would be automatically set to the maximum for
> the type.  I like that and have been moving in that direction
> with my code myself (rather than having an API fail, have it
> set the max range and succeed).
>
Aldy will have to comment why that is there, probably an oversight The 
API should return VARYING if it cant calculate a better range. The only 
time the API returns a FALSE for a query is when the range is 
unsupported..  ie, you ask for the range of a float statement or argument.

Andrew
Andrew MacLeod Sept. 25, 2020, 5:41 p.m. UTC | #7
On 9/23/20 7:53 PM, Martin Sebor via Gcc-patches wrote:
> On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
>> As part of the ranger work, we have been trying to clean up and 
>> generalize interfaces whenever possible. This not only helps in 
>> reducing the maintenance burden going forward, but provides 
>> mechanisms for backwards compatibility between ranger and other 
>> providers/users of ranges throughout the compiler like evrp and VRP.
>>
>> One such interface is the range_query class in vr_values.h, which 
>> provides a range query mechanism for use in the simplify_using_ranges 
>> module.  With it, simplify_using_ranges can be used with the ranger, 
>> or the VRP twins by providing a get_value_range() method.  This has 
>> helped us in comparing apples to apples while doing our work, and has 
>> also future proofed the interface so that asking for a range can be 
>> done within the context in which it appeared.  For example, 
>> get_value_range now takes a gimple statement which provides context.  
>> We are no longer tied to asking for a global SSA range, but can ask 
>> for the range of an SSA within a statement. Granted, this 
>> functionality is currently only in the ranger, but evrp/vrp could be 
>> adapted to pass such context.
>>
>> The range_query is a good first step, but what we really want is a 
>> generic query mechanism that can ask for SSA ranges within an 
>> expression, a statement, an edge, or anything else that may come up.  
>> We think that a generic mechanism can be used not only for range 
>> producers, but consumers such as the substitute_and_fold_engine (see 
>> get_value virtual) and possibly the gimple folder (see valueize).
>>
>> The attached patchset provides such an interface.  It is meant to be 
>> a replacement for range_query that can be used for vr_values, 
>> substitute_and_fold, the subsitute_and_fold_engine, as well as the 
>> ranger.  The general API is:
>>
>> class value_query
>> {
>> public:
>>    // Return the singleton expression for NAME at a gimple statement,
>>    // or NULL if none found.
>>    virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
>>    // Return the singleton expression for NAME at an edge, or NULL if
>>    // none found.
>>    virtual tree value_on_edge (edge, tree name);
>>    // Return the singleton expression for the LHS of a gimple
>>    // statement, assuming an (optional) initial value of NAME. Returns
>>    // NULL if none found.
>>    //
>>    // Note this method calculates the range the LHS would have *after*
>>    // the statement has executed.
>>    virtual tree value_of_stmt (gimple *, tree name = NULL);
>> };
>>
>> class range_query : public value_query
>> {
>> public:
>>    range_query ();
>>    virtual ~range_query ();
>>
>>    virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
>>    virtual tree value_on_edge (edge, tree name) OVERRIDE;
>>    virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;
>>
>>    // These are the range equivalents of the value_* methods. Instead
>>    // of returning a singleton, they calculate a range and return it in
>>    // R.  TRUE is returned on success or FALSE if no range was found.
>>    virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) 
>> = 0;
>>    virtual bool range_on_edge (irange &r, edge, tree name);
>>    virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);
>>
>>    // DEPRECATED: This method is used from vr-values.  The plan is to
>>    // rewrite all uses of it to the above API.
>>    virtual const class value_range_equiv *get_value_range (const_tree,
>>                                gimple * = NULL);
>> };
>>
>> The duality of the API (value_of_* and range_on_*) is because some 
>> passes are interested in a singleton value 
>> (substitute_and_fold_enginge), while others are interested in ranges 
>> (vr_values).  Passes that are only interested in singletons can take 
>> a value_query, while passes that are interested in full ranges, can 
>> take a range_query.  Of course, for future proofing, we would 
>> recommend taking a range_query, since if you provide a default 
>> range_of_expr, sensible defaults will be provided for the others in 
>> terms of range_of_expr.
>>
>> Note, that the absolute bare minimum that must be provided is a 
>> value_of_expr and a range_of_expr respectively.
>>
>> One piece of the API which is missing is a method  to return the 
>> range of an arbitrary SSA_NAME *after* a statement.  Currently 
>> range_of_expr calculates the range of an expression upon entry to the 
>> statement, whereas range_of_stmt calculates the range of *only* the 
>> LHS of a statement AFTER the statement has executed.
>>
>> This would allow for complete representation of the ranges/values in 
>> something like:
>>
>>      d_4 = *g_7;
>>
>> Here the range of g_7 upon entry could be VARYING, but after the 
>> dereference we know it must be non-zero.  Well for sane targets anyhow.
>>
>> Choices would be to:
>>
>>    1) add a 4th method such as "range_after_stmt", or
>>
>>    2) merge that functionality with the existing range_of_stmt method 
>> to provide "after" functionality for any ssa_name. Currently the 
>> SSA_NAME must be the same as the LHS if specified.  It also does not 
>> need to be specified to allow evaluation of statements without a LHS, 
>> (if (x < 1) has no LHS, but will return [0,0] [1,1] or [0,1] as there 
>> is a boolean "result" to the statement.
>>
>> Also, we've debated whether to use the two classes (value_query and 
>> range_query) separately as above, or perhaps join them into one 
>> class. I personally like the separation of powers.  Some users only 
>> care for singletons, no sense in polluting their implementations with 
>> talk of ranges.  But we're open to suggestions.
>>
>> The patchset is divided into 3 parts:
>>
>> 1. Generic implementation of class.
>
> Just a few comments on the new APIs in part 1.
>
> Here I would again recommend preventing the copying and assigning
> the new range_query class.  Ditto for equiv_allocator unless its
> base takes care of that.  For value_query, since it has a pure
> virtual function, I would suggest to declare its default ctor
> protected (and = default).  That way, if someone tries to define
> an object of it, they'll get a nice compilation error as opposed
> to a confusing linker error.
>
> The range_query default ctor uses the new epression to allocate
> and construct its equiv_allocator object but the range_query dtor
> just calls release on the object without invoking the delete
> expression on it.  Unless the release() function does that (I'd
> hope not but I have seen it) that seems like a leak.
>

looks like an oversight, definitely call delete.

> Also, defining virtual functions to take default arguments tends
> to be tricky and is generally discouraged.  Usually the default
> argument value from the base class is used, unless the overridden
> virtual function is called directly (either via Derived::foo()
> or via ptr_to_derived->foo()).  To avoid surprises all overridden
> virtual functions need to have the same defaults as the base,
> which is easy to get wrong without compiler warnings.
>

> One way to get around this is to define a non-virtual public
> function with defaults in the base class and have it call
> a protected virtual without the defaults.
>
>
I'm not overly concerned about this.. If there is a default, its always 
NULL, and it is simply to bypass having multiple versions of the routine 
which come with varies degreees of confusion of its own.  I experimented 
with multiple approaches, including that one.. I found each thing was a 
little awkward one way or another.

Given a NULL default is going to be pretty consistent, I'd leave it be 
for now.   Its primarily for legacy code, and when we nuke those uses 
down the road, we could consider removing the default, forcing the 
client to explicitly provide the NULL indicating the know they are 
looking for a global value, not a contextual one.


Since you have replied to this thread, whats your opinion whether there 
should be an extra API entry point for range/value_after_stmt or whether 
that should be rolled into  the range_of_stmt routine, and any ssa-name 
specified would be the it's range after the stmt..   perhaps renaming it 
to something like "range_post_stmt" and if no ssa-name is specified, it 
applies to the stmt result.

Andrew
Aldy Hernandez Sept. 28, 2020, 8:12 a.m. UTC | #8
On 9/25/20 3:17 PM, Andrew MacLeod wrote:
> On 9/24/20 5:51 PM, Martin Sebor via Gcc-patches wrote:
>> On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
>>>
>>>
>>> 3. Conversion of sprintf/strlen pass to class.
>>>
>>> This is a nonfunctional change to the sprintf/strlen passes. That is, 
>>> no effort was made to change the passes to multi-ranges.  However, 
>>> with this patch, we are able to plug in a ranger or evrp with just a 
>>> few lines, since the range query mechanism share a common API.
>>
>> Thanks for doing all this!  There isn't anything I don't understand
>> in the sprintf changes so no questions from me (well, almost none).
>> Just some comments:
>>
>> The current call statement is available in all functions that take
>> a directive argument, as dir->info.callstmt.  There should be no need
>> to also add it as a new argument to the functions that now need it.
>>
>> The change adds code along these lines in a bunch of places:
>>
>> +      value_range vr;
>> +      if (!query->range_of_expr (vr, arg, stmt))
>> +        vr.set_varying (TREE_TYPE (arg));
>>
>> I thought under the new Ranger APIs when a range couldn't be
>> determined it would be automatically set to the maximum for
>> the type.  I like that and have been moving in that direction
>> with my code myself (rather than having an API fail, have it
>> set the max range and succeed).
>>
> Aldy will have to comment why that is there, probably an oversight The 
> API should return VARYING if it cant calculate a better range. The only 
> time the API returns a FALSE for a query is when the range is 
> unsupported..  ie, you ask for the range of a float statement or argument.

Right on both counts.

We could ignore the return value, or my preferred method (which I know 
Andrew hates):

gcc_assert (query->range_of_expr(vr, arg, stmt));

Aldy
Martin Sebor Sept. 28, 2020, 3:27 p.m. UTC | #9
On 9/25/20 11:41 AM, Andrew MacLeod wrote:
> On 9/23/20 7:53 PM, Martin Sebor via Gcc-patches wrote:
>> On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
>>> As part of the ranger work, we have been trying to clean up and 
>>> generalize interfaces whenever possible. This not only helps in 
>>> reducing the maintenance burden going forward, but provides 
>>> mechanisms for backwards compatibility between ranger and other 
>>> providers/users of ranges throughout the compiler like evrp and VRP.
>>>
>>> One such interface is the range_query class in vr_values.h, which 
>>> provides a range query mechanism for use in the simplify_using_ranges 
>>> module.  With it, simplify_using_ranges can be used with the ranger, 
>>> or the VRP twins by providing a get_value_range() method.  This has 
>>> helped us in comparing apples to apples while doing our work, and has 
>>> also future proofed the interface so that asking for a range can be 
>>> done within the context in which it appeared.  For example, 
>>> get_value_range now takes a gimple statement which provides context. 
>>> We are no longer tied to asking for a global SSA range, but can ask 
>>> for the range of an SSA within a statement. Granted, this 
>>> functionality is currently only in the ranger, but evrp/vrp could be 
>>> adapted to pass such context.
>>>
>>> The range_query is a good first step, but what we really want is a 
>>> generic query mechanism that can ask for SSA ranges within an 
>>> expression, a statement, an edge, or anything else that may come up. 
>>> We think that a generic mechanism can be used not only for range 
>>> producers, but consumers such as the substitute_and_fold_engine (see 
>>> get_value virtual) and possibly the gimple folder (see valueize).
>>>
>>> The attached patchset provides such an interface.  It is meant to be 
>>> a replacement for range_query that can be used for vr_values, 
>>> substitute_and_fold, the subsitute_and_fold_engine, as well as the 
>>> ranger.  The general API is:
>>>
>>> class value_query
>>> {
>>> public:
>>>    // Return the singleton expression for NAME at a gimple statement,
>>>    // or NULL if none found.
>>>    virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
>>>    // Return the singleton expression for NAME at an edge, or NULL if
>>>    // none found.
>>>    virtual tree value_on_edge (edge, tree name);
>>>    // Return the singleton expression for the LHS of a gimple
>>>    // statement, assuming an (optional) initial value of NAME. Returns
>>>    // NULL if none found.
>>>    //
>>>    // Note this method calculates the range the LHS would have *after*
>>>    // the statement has executed.
>>>    virtual tree value_of_stmt (gimple *, tree name = NULL);
>>> };
>>>
>>> class range_query : public value_query
>>> {
>>> public:
>>>    range_query ();
>>>    virtual ~range_query ();
>>>
>>>    virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
>>>    virtual tree value_on_edge (edge, tree name) OVERRIDE;
>>>    virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;
>>>
>>>    // These are the range equivalents of the value_* methods. Instead
>>>    // of returning a singleton, they calculate a range and return it in
>>>    // R.  TRUE is returned on success or FALSE if no range was found.
>>>    virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) 
>>> = 0;
>>>    virtual bool range_on_edge (irange &r, edge, tree name);
>>>    virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);
>>>
>>>    // DEPRECATED: This method is used from vr-values.  The plan is to
>>>    // rewrite all uses of it to the above API.
>>>    virtual const class value_range_equiv *get_value_range (const_tree,
>>>                                gimple * = NULL);
>>> };
>>>
>>> The duality of the API (value_of_* and range_on_*) is because some 
>>> passes are interested in a singleton value 
>>> (substitute_and_fold_enginge), while others are interested in ranges 
>>> (vr_values).  Passes that are only interested in singletons can take 
>>> a value_query, while passes that are interested in full ranges, can 
>>> take a range_query.  Of course, for future proofing, we would 
>>> recommend taking a range_query, since if you provide a default 
>>> range_of_expr, sensible defaults will be provided for the others in 
>>> terms of range_of_expr.
>>>
>>> Note, that the absolute bare minimum that must be provided is a 
>>> value_of_expr and a range_of_expr respectively.
>>>
>>> One piece of the API which is missing is a method  to return the 
>>> range of an arbitrary SSA_NAME *after* a statement.  Currently 
>>> range_of_expr calculates the range of an expression upon entry to the 
>>> statement, whereas range_of_stmt calculates the range of *only* the 
>>> LHS of a statement AFTER the statement has executed.
>>>
>>> This would allow for complete representation of the ranges/values in 
>>> something like:
>>>
>>>      d_4 = *g_7;
>>>
>>> Here the range of g_7 upon entry could be VARYING, but after the 
>>> dereference we know it must be non-zero.  Well for sane targets anyhow.
>>>
>>> Choices would be to:
>>>
>>>    1) add a 4th method such as "range_after_stmt", or
>>>
>>>    2) merge that functionality with the existing range_of_stmt method 
>>> to provide "after" functionality for any ssa_name. Currently the 
>>> SSA_NAME must be the same as the LHS if specified.  It also does not 
>>> need to be specified to allow evaluation of statements without a LHS, 
>>> (if (x < 1) has no LHS, but will return [0,0] [1,1] or [0,1] as there 
>>> is a boolean "result" to the statement.
>>>
>>> Also, we've debated whether to use the two classes (value_query and 
>>> range_query) separately as above, or perhaps join them into one 
>>> class. I personally like the separation of powers.  Some users only 
>>> care for singletons, no sense in polluting their implementations with 
>>> talk of ranges.  But we're open to suggestions.
>>>
>>> The patchset is divided into 3 parts:
>>>
>>> 1. Generic implementation of class.
>>
>> Just a few comments on the new APIs in part 1.
>>
>> Here I would again recommend preventing the copying and assigning
>> the new range_query class.  Ditto for equiv_allocator unless its
>> base takes care of that.  For value_query, since it has a pure
>> virtual function, I would suggest to declare its default ctor
>> protected (and = default).  That way, if someone tries to define
>> an object of it, they'll get a nice compilation error as opposed
>> to a confusing linker error.
>>
>> The range_query default ctor uses the new epression to allocate
>> and construct its equiv_allocator object but the range_query dtor
>> just calls release on the object without invoking the delete
>> expression on it.  Unless the release() function does that (I'd
>> hope not but I have seen it) that seems like a leak.
>>
> 
> looks like an oversight, definitely call delete.
> 
>> Also, defining virtual functions to take default arguments tends
>> to be tricky and is generally discouraged.  Usually the default
>> argument value from the base class is used, unless the overridden
>> virtual function is called directly (either via Derived::foo()
>> or via ptr_to_derived->foo()).  To avoid surprises all overridden
>> virtual functions need to have the same defaults as the base,
>> which is easy to get wrong without compiler warnings.
>>
> 
>> One way to get around this is to define a non-virtual public
>> function with defaults in the base class and have it call
>> a protected virtual without the defaults.
>>
>>
> I'm not overly concerned about this.. If there is a default, its always 
> NULL, and it is simply to bypass having multiple versions of the routine 
> which come with varies degreees of confusion of its own.  I experimented 
> with multiple approaches, including that one.. I found each thing was a 
> little awkward one way or another.
> 
> Given a NULL default is going to be pretty consistent, I'd leave it be 
> for now.   Its primarily for legacy code, and when we nuke those uses 
> down the road, we could consider removing the default, forcing the 
> client to explicitly provide the NULL indicating the know they are 
> looking for a global value, not a contextual one.

I agree that the risk with the null is lower than with other
default values.  I also don't have a sense of how common deriving
from these classes will be so if it's expected to be rare it's
probably not something to be concerned about.

> Since you have replied to this thread, whats your opinion whether there 
> should be an extra API entry point for range/value_after_stmt or whether 
> that should be rolled into  the range_of_stmt routine, and any ssa-name 
> specified would be the it's range after the stmt..   perhaps renaming it 
> to something like "range_post_stmt" and if no ssa-name is specified, it 
> applies to the stmt result.

I prefer API parameterization over distinct names.  It makes APIs
easier for me to reason about, and when implementing them, often
lets me avoid duplicating code between.  A good rule of thumb is
to ask if it might ever make sense to defer the decision to call
one or the other alternative until runtime.  If the answer isn't
an unequivocal "no" then a single entry point with a parameter is
probably a better choice.

By the way, your question about API design makes me think about
something else.  A range is a generalization of a singleton value
(it's a collection of singletons, which is particularly true with
the new multi-range design).  Conversely, a singleton value is
a specialization of a range.  It always makes sense to ask for
the range of an expression, even if it evaluates to a constant.
It only makes sense to ask for its value if it's know to evaluate
to a constant.  So if it's expected to be useful to make
a distinction between the two concepts in the type system I would
think it natural to model the relationship by having value_query
derive from range_query rather than the other way around.

Martin
Andrew MacLeod Sept. 28, 2020, 3:48 p.m. UTC | #10
On 9/28/20 11:27 AM, Martin Sebor wrote:
> On 9/25/20 11:41 AM, Andrew MacLeod wrote:

>
>> Since you have replied to this thread, whats your opinion whether 
>> there should be an extra API entry point for range/value_after_stmt 
>> or whether that should be rolled into  the range_of_stmt routine, and 
>> any ssa-name specified would be the it's range after the stmt..   
>> perhaps renaming it to something like "range_post_stmt" and if no 
>> ssa-name is specified, it applies to the stmt result.
>
> I prefer API parameterization over distinct names.  It makes APIs
> easier for me to reason about, and when implementing them, often
> lets me avoid duplicating code between.  A good rule of thumb is
> to ask if it might ever make sense to defer the decision to call
> one or the other alternative until runtime.  If the answer isn't
> an unequivocal "no" then a single entry point with a parameter is
> probably a better choice.
>
> By the way, your question about API design makes me think about
> something else.  A range is a generalization of a singleton value
> (it's a collection of singletons, which is particularly true with
> the new multi-range design).  Conversely, a singleton value is
> a specialization of a range.  It always makes sense to ask for
> the range of an expression, even if it evaluates to a constant.
> It only makes sense to ask for its value if it's know to evaluate
> to a constant.  So if it's expected to be useful to make
> a distinction between the two concepts in the type system I would
> think it natural to model the relationship by having value_query
> derive from range_query rather than the other way around.
>

We've considered a single class for queries rather than the existing 
derived structure, but it all depends on your point of view.

You can have values without knowing about ranges, but you can't have 
ranges without knowing about values.  Its persepective, so there isnt a 
really a "right" answer. Either can be implemented in terms of the other.

Lots of places in the compiler deal with value callbacks, only a few 
know or care about ranges, and those usually care about both ranges and 
values... thus the current structure.

Regardless, its a starting point and easy to combine as a single class 
later should that be worthwhile.

Andrew
Andrew MacLeod Oct. 1, 2020, 4:17 a.m. UTC | #11
On 9/25/20 1:41 PM, Andrew MacLeod via Gcc-patches wrote:
> On 9/23/20 7:53 PM, Martin Sebor via Gcc-patches wrote:
>> On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote:
>>> As part of the ranger work, we have been trying to clean up and 
>>> generalize interfaces whenever possible. This not only helps in 
>>> reducing the maintenance burden going forward, but provides 
>>> mechanisms for backwards compatibility between ranger and other 
>>> providers/users of ranges throughout the compiler like evrp and VRP.
>>>
>>> One such interface is the range_query class in vr_values.h, which 
>>> provides a range query mechanism for use in the 
>>> simplify_using_ranges module.  With it, simplify_using_ranges can be 
>>> used with the ranger, or the VRP twins by providing a 
>>> get_value_range() method.  This has helped us in comparing apples to 
>>> apples while doing our work, and has also future proofed the 
>>> interface so that asking for a range can be done within the context 
>>> in which it appeared.  For example, get_value_range now takes a 
>>> gimple statement which provides context.  We are no longer tied to 
>>> asking for a global SSA range, but can ask for the range of an SSA 
>>> within a statement. Granted, this functionality is currently only in 
>>> the ranger, but evrp/vrp could be adapted to pass such context.
>>>
>>> The range_query is a good first step, but what we really want is a 
>>> generic query mechanism that can ask for SSA ranges within an 
>>> expression, a statement, an edge, or anything else that may come 
>>> up.  We think that a generic mechanism can be used not only for 
>>> range producers, but consumers such as the 
>>> substitute_and_fold_engine (see get_value virtual) and possibly the 
>>> gimple folder (see valueize).
>>>
>>> The attached patchset provides such an interface.  It is meant to be 
>>> a replacement for range_query that can be used for vr_values, 
>>> substitute_and_fold, the subsitute_and_fold_engine, as well as the 
>>> ranger.  The general API is:
>>>
>>> class value_query
>>> {
>>> public:
>>>    // Return the singleton expression for NAME at a gimple statement,
>>>    // or NULL if none found.
>>>    virtual tree value_of_expr (tree name, gimple * = NULL) = 0;
>>>    // Return the singleton expression for NAME at an edge, or NULL if
>>>    // none found.
>>>    virtual tree value_on_edge (edge, tree name);
>>>    // Return the singleton expression for the LHS of a gimple
>>>    // statement, assuming an (optional) initial value of NAME. Returns
>>>    // NULL if none found.
>>>    //
>>>    // Note this method calculates the range the LHS would have *after*
>>>    // the statement has executed.
>>>    virtual tree value_of_stmt (gimple *, tree name = NULL);
>>> };
>>>
>>> class range_query : public value_query
>>> {
>>> public:
>>>    range_query ();
>>>    virtual ~range_query ();
>>>
>>>    virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE;
>>>    virtual tree value_on_edge (edge, tree name) OVERRIDE;
>>>    virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE;
>>>
>>>    // These are the range equivalents of the value_* methods. Instead
>>>    // of returning a singleton, they calculate a range and return it in
>>>    // R.  TRUE is returned on success or FALSE if no range was found.
>>>    virtual bool range_of_expr (irange &r, tree name, gimple * = 
>>> NULL) = 0;
>>>    virtual bool range_on_edge (irange &r, edge, tree name);
>>>    virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);
>>>
>>>    // DEPRECATED: This method is used from vr-values.  The plan is to
>>>    // rewrite all uses of it to the above API.
>>>    virtual const class value_range_equiv *get_value_range (const_tree,
>>>                                gimple * = NULL);
>>> };
>>>
>>> The duality of the API (value_of_* and range_on_*) is because some 
>>> passes are interested in a singleton value 
>>> (substitute_and_fold_enginge), while others are interested in ranges 
>>> (vr_values).  Passes that are only interested in singletons can take 
>>> a value_query, while passes that are interested in full ranges, can 
>>> take a range_query.  Of course, for future proofing, we would 
>>> recommend taking a range_query, since if you provide a default 
>>> range_of_expr, sensible defaults will be provided for the others in 
>>> terms of range_of_expr.
>>>
>>> Note, that the absolute bare minimum that must be provided is a 
>>> value_of_expr and a range_of_expr respectively.
>>>
>>> One piece of the API which is missing is a method  to return the 
>>> range of an arbitrary SSA_NAME *after* a statement. Currently 
>>> range_of_expr calculates the range of an expression upon entry to 
>>> the statement, whereas range_of_stmt calculates the range of *only* 
>>> the LHS of a statement AFTER the statement has executed.
>>>
>>> This would allow for complete representation of the ranges/values in 
>>> something like:
>>>
>>>      d_4 = *g_7;
>>>
>>> Here the range of g_7 upon entry could be VARYING, but after the 
>>> dereference we know it must be non-zero.  Well for sane targets anyhow.
>>>
>>> Choices would be to:
>>>
>>>    1) add a 4th method such as "range_after_stmt", or
>>>
>>>    2) merge that functionality with the existing range_of_stmt 
>>> method to provide "after" functionality for any ssa_name. Currently 
>>> the SSA_NAME must be the same as the LHS if specified.  It also does 
>>> not need to be specified to allow evaluation of statements without a 
>>> LHS, (if (x < 1) has no LHS, but will return [0,0] [1,1] or [0,1] as 
>>> there is a boolean "result" to the statement.
>>>
>>> Also, we've debated whether to use the two classes (value_query and 
>>> range_query) separately as above, or perhaps join them into one 
>>> class. I personally like the separation of powers.  Some users only 
>>> care for singletons, no sense in polluting their implementations 
>>> with talk of ranges.  But we're open to suggestions.
>>>
>>> The patchset is divided into 3 parts:
>>>
>>> 1. Generic implementation of class.
>>
>> Just a few comments on the new APIs in part 1.
>>
>> Here I would again recommend preventing the copying and assigning
>> the new range_query class.  Ditto for equiv_allocator unless its
>> base takes care of that.  For value_query, since it has a pure
>> virtual function, I would suggest to declare its default ctor
>> protected (and = default).  That way, if someone tries to define
>> an object of it, they'll get a nice compilation error as opposed
>> to a confusing linker error.
>>
>> The range_query default ctor uses the new epression to allocate
>> and construct its equiv_allocator object but the range_query dtor
>> just calls release on the object without invoking the delete
>> expression on it.  Unless the release() function does that (I'd
>> hope not but I have seen it) that seems like a leak.
>>
>
> looks like an oversight, definitely call delete.
>
>> Also, defining virtual functions to take default arguments tends
>> to be tricky and is generally discouraged.  Usually the default
>> argument value from the base class is used, unless the overridden
>> virtual function is called directly (either via Derived::foo()
>> or via ptr_to_derived->foo()).  To avoid surprises all overridden
>> virtual functions need to have the same defaults as the base,
>> which is easy to get wrong without compiler warnings.
>>
>
>> One way to get around this is to define a non-virtual public
>> function with defaults in the base class and have it call
>> a protected virtual without the defaults.
>>
>>
> I'm not overly concerned about this.. If there is a default, its 
> always NULL, and it is simply to bypass having multiple versions of 
> the routine which come with varies degreees of confusion of its own.  
> I experimented with multiple approaches, including that one.. I found 
> each thing was a little awkward one way or another.
>
> Given a NULL default is going to be pretty consistent, I'd leave it be 
> for now.   Its primarily for legacy code, and when we nuke those uses 
> down the road, we could consider removing the default, forcing the 
> client to explicitly provide the NULL indicating the know they are 
> looking for a global value, not a contextual one.
>
>
> Since you have replied to this thread, whats your opinion whether 
> there should be an extra API entry point for range/value_after_stmt or 
> whether that should be rolled into  the range_of_stmt routine, and any 
> ssa-name specified would be the it's range after the stmt..   perhaps 
> renaming it to something like "range_post_stmt" and if no ssa-name is 
> specified, it applies to the stmt result.
>
> Andrew
>
>
With the change to add the delete and DISABLE_COPY_AND_ASSIGN this is OK.
Its not in widespread use and easy enough to tweak later.

Andrew
Aldy Hernandez Oct. 1, 2020, 9:05 a.m. UTC | #12
> Thanks for doing all this!  There isn't anything I don't understand
> in the sprintf changes so no questions from me (well, almost none).
> Just some comments:

Thanks for your comments on the sprintf/strlen API conversion.

>
> The current call statement is available in all functions that take
> a directive argument, as dir->info.callstmt.  There should be no need
> to also add it as a new argument to the functions that now need it.

Fixed.

>
> The change adds code along these lines in a bunch of places:
>
> +         value_range vr;
> +         if (!query->range_of_expr (vr, arg, stmt))
> +           vr.set_varying (TREE_TYPE (arg));
>
> I thought under the new Ranger APIs when a range couldn't be
> determined it would be automatically set to the maximum for
> the type.  I like that and have been moving in that direction
> with my code myself (rather than having an API fail, have it
> set the max range and succeed).

I went through all the above idioms and noticed all are being used on
supported types (integers or pointers).  So range_of_expr will always
return true.  I've removed the if() and the set_varying.

>
> Since that isn't so in this case, I think it would still be nice
> if the added code could be written as if the range were set to
> varying in this case and (ideally) reduced to just initialization:
>
>    value_range vr = some-function (query, stmt, arg);
>
> some-function could be an inline helper defined just for the sprintf
> pass (and maybe also strlen which also seems to use the same pattern),
> or it could be a value_range AKA irange ctor, or it could be a member
> of range_query, whatever is the most appropriate.
>
> (If assigning/copying a value_range is thought to be too expensive,
> declaring it first and then passing it to that helper to set it
> would work too).
>
> In strlen, is the removed comment no longer relevant?  (I.e., does
> the ranger solve the problem?)
>
> -      /* The range below may be "inaccurate" if a constant has been
> -        substituted earlier for VAL by this pass that hasn't been
> -        propagated through the CFG.  This shoud be fixed by the new
> -        on-demand VRP if/when it becomes available (hopefully in
> -        GCC 11).  */

It should.

>
> I'm wondering about the comment added to get_range_strlen_dynamic
> and other places:
>
> +         // FIXME: Use range_query instead of global ranges.
>
> Is that something you're planning to do in a followup or should
> I remember to do it at some point?

I'm not planning on doing it.  It's just a reminder that it would be
beneficial to do so.

>
> Otherwise I have no concern with the changes.

It's not cleared whether Andrew approved all 3 parts of the patchset
or just the valuation part.  I'll wait for his nod before committing
this chunk.

Aldy
Andrew MacLeod Oct. 1, 2020, 1:22 p.m. UTC | #13
On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
>> Thanks for doing all this!  There isn't anything I don't understand
>> in the sprintf changes so no questions from me (well, almost none).
>> Just some comments:
> Thanks for your comments on the sprintf/strlen API conversion.
>
>> The current call statement is available in all functions that take
>> a directive argument, as dir->info.callstmt.  There should be no need
>> to also add it as a new argument to the functions that now need it.
> Fixed.
>
>> The change adds code along these lines in a bunch of places:
>>
>> +         value_range vr;
>> +         if (!query->range_of_expr (vr, arg, stmt))
>> +           vr.set_varying (TREE_TYPE (arg));
>>
>> I thought under the new Ranger APIs when a range couldn't be
>> determined it would be automatically set to the maximum for
>> the type.  I like that and have been moving in that direction
>> with my code myself (rather than having an API fail, have it
>> set the max range and succeed).
> I went through all the above idioms and noticed all are being used on
> supported types (integers or pointers).  So range_of_expr will always
> return true.  I've removed the if() and the set_varying.
>
>> Since that isn't so in this case, I think it would still be nice
>> if the added code could be written as if the range were set to
>> varying in this case and (ideally) reduced to just initialization:
>>
>>     value_range vr = some-function (query, stmt, arg);
>>
>> some-function could be an inline helper defined just for the sprintf
>> pass (and maybe also strlen which also seems to use the same pattern),
>> or it could be a value_range AKA irange ctor, or it could be a member
>> of range_query, whatever is the most appropriate.
>>
>> (If assigning/copying a value_range is thought to be too expensive,
>> declaring it first and then passing it to that helper to set it
>> would work too).
>>
>> In strlen, is the removed comment no longer relevant?  (I.e., does
>> the ranger solve the problem?)
>>
>> -      /* The range below may be "inaccurate" if a constant has been
>> -        substituted earlier for VAL by this pass that hasn't been
>> -        propagated through the CFG.  This shoud be fixed by the new
>> -        on-demand VRP if/when it becomes available (hopefully in
>> -        GCC 11).  */
> It should.
>
>> I'm wondering about the comment added to get_range_strlen_dynamic
>> and other places:
>>
>> +         // FIXME: Use range_query instead of global ranges.
>>
>> Is that something you're planning to do in a followup or should
>> I remember to do it at some point?
> I'm not planning on doing it.  It's just a reminder that it would be
> beneficial to do so.
>
>> Otherwise I have no concern with the changes.
> It's not cleared whether Andrew approved all 3 parts of the patchset
> or just the valuation part.  I'll wait for his nod before committing
> this chunk.
>
> Aldy
>
I have no issue with it, so OK.

Just an observation that should be pointed out, I believe Aldy has all 
the code for converting to a ranger, but we have not pursued that any 
further yet since there is a regression due to our lack of equivalence 
processing I think?  That should be resolved in the coming month, but at 
the moment is a holdback/concern for converting these passes...  iirc.

Andrew
Aldy Hernandez Oct. 1, 2020, 3:34 p.m. UTC | #14
On 10/1/20 3:22 PM, Andrew MacLeod wrote:
 > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
 >>> Thanks for doing all this!  There isn't anything I don't understand
 >>> in the sprintf changes so no questions from me (well, almost none).
 >>> Just some comments:
 >> Thanks for your comments on the sprintf/strlen API conversion.
 >>
 >>> The current call statement is available in all functions that take
 >>> a directive argument, as dir->info.callstmt.  There should be no need
 >>> to also add it as a new argument to the functions that now need it.
 >> Fixed.
 >>
 >>> The change adds code along these lines in a bunch of places:
 >>>
 >>> +         value_range vr;
 >>> +         if (!query->range_of_expr (vr, arg, stmt))
 >>> +           vr.set_varying (TREE_TYPE (arg));
 >>>
 >>> I thought under the new Ranger APIs when a range couldn't be
 >>> determined it would be automatically set to the maximum for
 >>> the type.  I like that and have been moving in that direction
 >>> with my code myself (rather than having an API fail, have it
 >>> set the max range and succeed).
 >> I went through all the above idioms and noticed all are being used on
 >> supported types (integers or pointers).  So range_of_expr will always
 >> return true.  I've removed the if() and the set_varying.
 >>
 >>> Since that isn't so in this case, I think it would still be nice
 >>> if the added code could be written as if the range were set to
 >>> varying in this case and (ideally) reduced to just initialization:
 >>>
 >>>     value_range vr = some-function (query, stmt, arg);
 >>>
 >>> some-function could be an inline helper defined just for the sprintf
 >>> pass (and maybe also strlen which also seems to use the same pattern),
 >>> or it could be a value_range AKA irange ctor, or it could be a member
 >>> of range_query, whatever is the most appropriate.
 >>>
 >>> (If assigning/copying a value_range is thought to be too expensive,
 >>> declaring it first and then passing it to that helper to set it
 >>> would work too).
 >>>
 >>> In strlen, is the removed comment no longer relevant?  (I.e., does
 >>> the ranger solve the problem?)
 >>>
 >>> -      /* The range below may be "inaccurate" if a constant has been
 >>> -        substituted earlier for VAL by this pass that hasn't been
 >>> -        propagated through the CFG.  This shoud be fixed by the new
 >>> -        on-demand VRP if/when it becomes available (hopefully in
 >>> -        GCC 11).  */
 >> It should.
 >>
 >>> I'm wondering about the comment added to get_range_strlen_dynamic
 >>> and other places:
 >>>
 >>> +         // FIXME: Use range_query instead of global ranges.
 >>>
 >>> Is that something you're planning to do in a followup or should
 >>> I remember to do it at some point?
 >> I'm not planning on doing it.  It's just a reminder that it would be
 >> beneficial to do so.
 >>
 >>> Otherwise I have no concern with the changes.
 >> It's not cleared whether Andrew approved all 3 parts of the patchset
 >> or just the valuation part.  I'll wait for his nod before committing
 >> this chunk.
 >>
 >> Aldy
 >>
 > I have no issue with it, so OK.

Pushed all 3 patches.

 >
 > Just an observation that should be pointed out, I believe Aldy has all
 > the code for converting to a ranger, but we have not pursued that any
 > further yet since there is a regression due to our lack of equivalence
 > processing I think?  That should be resolved in the coming month, but at
 > the moment is a holdback/concern for converting these passes...  iirc.

Yes.  Martin, the take away here is that the strlen/sprintf pass has 
been converted to the new API, but ranger is still not up and running on 
it (even on the branch).

With the new API, all you have to do is remove all instances of 
evrp_range_analyzer and replace them with a ranger.  That's it.
Below is an untested patch that would convert you to a ranger once it's 
contributed.

IIRC when I enabled the ranger for your pass a while back, there was one 
or two regressions due to missing equivalences, and the rest were 
because the tests were expecting an actual specific range, and the 
ranger returned a slightly different/better one.  You'll need to adjust 
your tests.

Aldy

diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index f4d1c5ca256..9f9e95b7155 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -58,8 +58,8 @@ along with GCC; see the file COPYING3.  If not see
  #include "tree-ssa-loop.h"
  #include "tree-scalar-evolution.h"
  #include "vr-values.h"
-#include "gimple-ssa-evrp-analyze.h"
  #include "tree-ssa.h"
+#include "gimple-range.h"

  /* A vector indexed by SSA_NAME_VERSION.  0 means unknown, positive value
     is an index into strinfo vector, negative value stands for
@@ -5755,16 +5755,13 @@ class strlen_dom_walker : public dom_walker
  public:
    strlen_dom_walker (cdi_direction direction)
      : dom_walker (direction),
-    evrp (false),
      m_cleanup_cfg (false)
    {}

    virtual edge before_dom_children (basic_block);
    virtual void after_dom_children (basic_block);

-  /* EVRP analyzer used for printf argument range processing, and
-     to track strlen results across integer variable assignments.  */
-  evrp_range_analyzer evrp;
+  gimple_ranger ranger;

    /* Flag that will trigger TODO_cleanup_cfg to be returned in strlen
       execute function.  */
@@ -5777,8 +5774,6 @@ public:
  edge
  strlen_dom_walker::before_dom_children (basic_block bb)
  {
-  evrp.enter (bb);
-
    basic_block dombb = get_immediate_dominator (CDI_DOMINATORS, bb);

    if (dombb == NULL)
@@ -5853,13 +5848,7 @@ strlen_dom_walker::before_dom_children 
(basic_block bb)
    /* Attempt to optimize individual statements.  */
    for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); )
      {
-      gimple *stmt = gsi_stmt (gsi);
-
-      /* First record ranges generated by this statement so they
-	 can be used by printf argument processing.  */
-      evrp.record_ranges_from_stmt (stmt, false);
-
-      if (check_and_optimize_stmt (&gsi, &cleanup_eh, &evrp))
+      if (check_and_optimize_stmt (&gsi, &cleanup_eh, &ranger))
  	gsi_next (&gsi);
      }

@@ -5878,8 +5867,6 @@ strlen_dom_walker::before_dom_children 
(basic_block bb)
  void
  strlen_dom_walker::after_dom_children (basic_block bb)
  {
-  evrp.leave (bb);
-
    if (bb->aux)
      {
        stridx_to_strinfo = ((vec<strinfo *, va_heap, vl_embed> *) bb->aux);
Martin Sebor Oct. 1, 2020, 5:25 p.m. UTC | #15
On 10/1/20 9:34 AM, Aldy Hernandez wrote:
> 
> 
> On 10/1/20 3:22 PM, Andrew MacLeod wrote:
>  > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
>  >>> Thanks for doing all this!  There isn't anything I don't understand
>  >>> in the sprintf changes so no questions from me (well, almost none).
>  >>> Just some comments:
>  >> Thanks for your comments on the sprintf/strlen API conversion.
>  >>
>  >>> The current call statement is available in all functions that take
>  >>> a directive argument, as dir->info.callstmt.  There should be no need
>  >>> to also add it as a new argument to the functions that now need it.
>  >> Fixed.
>  >>
>  >>> The change adds code along these lines in a bunch of places:
>  >>>
>  >>> +         value_range vr;
>  >>> +         if (!query->range_of_expr (vr, arg, stmt))
>  >>> +           vr.set_varying (TREE_TYPE (arg));
>  >>>
>  >>> I thought under the new Ranger APIs when a range couldn't be
>  >>> determined it would be automatically set to the maximum for
>  >>> the type.  I like that and have been moving in that direction
>  >>> with my code myself (rather than having an API fail, have it
>  >>> set the max range and succeed).
>  >> I went through all the above idioms and noticed all are being used on
>  >> supported types (integers or pointers).  So range_of_expr will always
>  >> return true.  I've removed the if() and the set_varying.
>  >>
>  >>> Since that isn't so in this case, I think it would still be nice
>  >>> if the added code could be written as if the range were set to
>  >>> varying in this case and (ideally) reduced to just initialization:
>  >>>
>  >>>     value_range vr = some-function (query, stmt, arg);
>  >>>
>  >>> some-function could be an inline helper defined just for the sprintf
>  >>> pass (and maybe also strlen which also seems to use the same pattern),
>  >>> or it could be a value_range AKA irange ctor, or it could be a member
>  >>> of range_query, whatever is the most appropriate.
>  >>>
>  >>> (If assigning/copying a value_range is thought to be too expensive,
>  >>> declaring it first and then passing it to that helper to set it
>  >>> would work too).
>  >>>
>  >>> In strlen, is the removed comment no longer relevant?  (I.e., does
>  >>> the ranger solve the problem?)
>  >>>
>  >>> -      /* The range below may be "inaccurate" if a constant has been
>  >>> -        substituted earlier for VAL by this pass that hasn't been
>  >>> -        propagated through the CFG.  This shoud be fixed by the new
>  >>> -        on-demand VRP if/when it becomes available (hopefully in
>  >>> -        GCC 11).  */
>  >> It should.
>  >>
>  >>> I'm wondering about the comment added to get_range_strlen_dynamic
>  >>> and other places:
>  >>>
>  >>> +         // FIXME: Use range_query instead of global ranges.
>  >>>
>  >>> Is that something you're planning to do in a followup or should
>  >>> I remember to do it at some point?
>  >> I'm not planning on doing it.  It's just a reminder that it would be
>  >> beneficial to do so.
>  >>
>  >>> Otherwise I have no concern with the changes.
>  >> It's not cleared whether Andrew approved all 3 parts of the patchset
>  >> or just the valuation part.  I'll wait for his nod before committing
>  >> this chunk.
>  >>
>  >> Aldy
>  >>
>  > I have no issue with it, so OK.
> 
> Pushed all 3 patches.
> 
>  >
>  > Just an observation that should be pointed out, I believe Aldy has all
>  > the code for converting to a ranger, but we have not pursued that any
>  > further yet since there is a regression due to our lack of equivalence
>  > processing I think?  That should be resolved in the coming month, but at
>  > the moment is a holdback/concern for converting these passes...  iirc.
> 
> Yes.  Martin, the take away here is that the strlen/sprintf pass has 
> been converted to the new API, but ranger is still not up and running on 
> it (even on the branch).
> 
> With the new API, all you have to do is remove all instances of 
> evrp_range_analyzer and replace them with a ranger.  That's it.
> Below is an untested patch that would convert you to a ranger once it's 
> contributed.
> 
> IIRC when I enabled the ranger for your pass a while back, there was one 
> or two regressions due to missing equivalences, and the rest were 
> because the tests were expecting an actual specific range, and the 
> ranger returned a slightly different/better one.  You'll need to adjust 
> your tests.

Ack.  I'll be on the lookout for the ranger commit (if you hppen
to remember and CC me on it just in case I might miss it that would
be great).

Thanks
Martin

> 
> Aldy
> 
> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
> index f4d1c5ca256..9f9e95b7155 100644
> --- a/gcc/tree-ssa-strlen.c
> +++ b/gcc/tree-ssa-strlen.c
> @@ -58,8 +58,8 @@ along with GCC; see the file COPYING3.  If not see
>   #include "tree-ssa-loop.h"
>   #include "tree-scalar-evolution.h"
>   #include "vr-values.h"
> -#include "gimple-ssa-evrp-analyze.h"
>   #include "tree-ssa.h"
> +#include "gimple-range.h"
> 
>   /* A vector indexed by SSA_NAME_VERSION.  0 means unknown, positive value
>      is an index into strinfo vector, negative value stands for
> @@ -5755,16 +5755,13 @@ class strlen_dom_walker : public dom_walker
>   public:
>     strlen_dom_walker (cdi_direction direction)
>       : dom_walker (direction),
> -    evrp (false),
>       m_cleanup_cfg (false)
>     {}
> 
>     virtual edge before_dom_children (basic_block);
>     virtual void after_dom_children (basic_block);
> 
> -  /* EVRP analyzer used for printf argument range processing, and
> -     to track strlen results across integer variable assignments.  */
> -  evrp_range_analyzer evrp;
> +  gimple_ranger ranger;
> 
>     /* Flag that will trigger TODO_cleanup_cfg to be returned in strlen
>        execute function.  */
> @@ -5777,8 +5774,6 @@ public:
>   edge
>   strlen_dom_walker::before_dom_children (basic_block bb)
>   {
> -  evrp.enter (bb);
> -
>     basic_block dombb = get_immediate_dominator (CDI_DOMINATORS, bb);
> 
>     if (dombb == NULL)
> @@ -5853,13 +5848,7 @@ strlen_dom_walker::before_dom_children 
> (basic_block bb)
>     /* Attempt to optimize individual statements.  */
>     for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); )
>       {
> -      gimple *stmt = gsi_stmt (gsi);
> -
> -      /* First record ranges generated by this statement so they
> -     can be used by printf argument processing.  */
> -      evrp.record_ranges_from_stmt (stmt, false);
> -
> -      if (check_and_optimize_stmt (&gsi, &cleanup_eh, &evrp))
> +      if (check_and_optimize_stmt (&gsi, &cleanup_eh, &ranger))
>       gsi_next (&gsi);
>       }
> 
> @@ -5878,8 +5867,6 @@ strlen_dom_walker::before_dom_children 
> (basic_block bb)
>   void
>   strlen_dom_walker::after_dom_children (basic_block bb)
>   {
> -  evrp.leave (bb);
> -
>     if (bb->aux)
>       {
>         stridx_to_strinfo = ((vec<strinfo *, va_heap, vl_embed> *) 
> bb->aux);
>
Martin Sebor Nov. 5, 2020, 7:29 p.m. UTC | #16
On 10/1/20 11:25 AM, Martin Sebor wrote:
> On 10/1/20 9:34 AM, Aldy Hernandez wrote:
>>
>>
>> On 10/1/20 3:22 PM, Andrew MacLeod wrote:
>>  > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
>>  >>> Thanks for doing all this!  There isn't anything I don't understand
>>  >>> in the sprintf changes so no questions from me (well, almost none).
>>  >>> Just some comments:
>>  >> Thanks for your comments on the sprintf/strlen API conversion.
>>  >>
>>  >>> The current call statement is available in all functions that take
>>  >>> a directive argument, as dir->info.callstmt.  There should be no 
>> need
>>  >>> to also add it as a new argument to the functions that now need it.
>>  >> Fixed.
>>  >>
>>  >>> The change adds code along these lines in a bunch of places:
>>  >>>
>>  >>> +         value_range vr;
>>  >>> +         if (!query->range_of_expr (vr, arg, stmt))
>>  >>> +           vr.set_varying (TREE_TYPE (arg));
>>  >>>
>>  >>> I thought under the new Ranger APIs when a range couldn't be
>>  >>> determined it would be automatically set to the maximum for
>>  >>> the type.  I like that and have been moving in that direction
>>  >>> with my code myself (rather than having an API fail, have it
>>  >>> set the max range and succeed).
>>  >> I went through all the above idioms and noticed all are being used on
>>  >> supported types (integers or pointers).  So range_of_expr will always
>>  >> return true.  I've removed the if() and the set_varying.
>>  >>
>>  >>> Since that isn't so in this case, I think it would still be nice
>>  >>> if the added code could be written as if the range were set to
>>  >>> varying in this case and (ideally) reduced to just initialization:
>>  >>>
>>  >>>     value_range vr = some-function (query, stmt, arg);
>>  >>>
>>  >>> some-function could be an inline helper defined just for the sprintf
>>  >>> pass (and maybe also strlen which also seems to use the same 
>> pattern),
>>  >>> or it could be a value_range AKA irange ctor, or it could be a 
>> member
>>  >>> of range_query, whatever is the most appropriate.
>>  >>>
>>  >>> (If assigning/copying a value_range is thought to be too expensive,
>>  >>> declaring it first and then passing it to that helper to set it
>>  >>> would work too).
>>  >>>
>>  >>> In strlen, is the removed comment no longer relevant?  (I.e., does
>>  >>> the ranger solve the problem?)
>>  >>>
>>  >>> -      /* The range below may be "inaccurate" if a constant has been
>>  >>> -        substituted earlier for VAL by this pass that hasn't been
>>  >>> -        propagated through the CFG.  This shoud be fixed by the new
>>  >>> -        on-demand VRP if/when it becomes available (hopefully in
>>  >>> -        GCC 11).  */
>>  >> It should.
>>  >>
>>  >>> I'm wondering about the comment added to get_range_strlen_dynamic
>>  >>> and other places:
>>  >>>
>>  >>> +         // FIXME: Use range_query instead of global ranges.
>>  >>>
>>  >>> Is that something you're planning to do in a followup or should
>>  >>> I remember to do it at some point?
>>  >> I'm not planning on doing it.  It's just a reminder that it would be
>>  >> beneficial to do so.
>>  >>
>>  >>> Otherwise I have no concern with the changes.
>>  >> It's not cleared whether Andrew approved all 3 parts of the patchset
>>  >> or just the valuation part.  I'll wait for his nod before committing
>>  >> this chunk.
>>  >>
>>  >> Aldy
>>  >>
>>  > I have no issue with it, so OK.
>>
>> Pushed all 3 patches.
>>
>>  >
>>  > Just an observation that should be pointed out, I believe Aldy has all
>>  > the code for converting to a ranger, but we have not pursued that any
>>  > further yet since there is a regression due to our lack of equivalence
>>  > processing I think?  That should be resolved in the coming month, 
>> but at
>>  > the moment is a holdback/concern for converting these passes...  iirc.
>>
>> Yes.  Martin, the take away here is that the strlen/sprintf pass has 
>> been converted to the new API, but ranger is still not up and running 
>> on it (even on the branch).
>>
>> With the new API, all you have to do is remove all instances of 
>> evrp_range_analyzer and replace them with a ranger.  That's it.
>> Below is an untested patch that would convert you to a ranger once 
>> it's contributed.
>>
>> IIRC when I enabled the ranger for your pass a while back, there was 
>> one or two regressions due to missing equivalences, and the rest were 
>> because the tests were expecting an actual specific range, and the 
>> ranger returned a slightly different/better one.  You'll need to 
>> adjust your tests.
> 
> Ack.  I'll be on the lookout for the ranger commit (if you hppen
> to remember and CC me on it just in case I might miss it that would
> be great).

I have applied the patch and ran some tests.  There are quite
a few failures (see the list below).  I have only looked at
a couple.  The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
boils down to the following test case.  There should be no warning
for either sprintf call.  The one in h() is a false positive and
the reason for at least some of the regressions.  Somehow,
the conversions between int and char are causing Ranger to lose
the range.

$ cat t.c && gcc -O2 -S -Wall t.c
char a[2];

extern int x;

signed char f (int min, int max)
{
   signed char i = x;
   return i < min || max < i ? min : i;
}

void ff (signed char i)
{
   __builtin_sprintf (a, "%i", f (0, 9));   // okay
}

signed char g (signed char min, signed char max)
{
   signed char i = x;
   return i < min || max < i ? min : i;
}

void gg (void)
{
   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
}

t.c: In function ‘gg’:
t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes into a 
region of size 2 [-Wformat-overflow=]
    24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
       |                          ^~
t.c:24:25: note: directive argument in the range [-128, 127]
    24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
       |                         ^~~~
t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes into a 
destination of size 2
    24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
       |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Another failure (the one in builtin-sprintf-warn-22.c) is this where
the false positive is due to the strlen pass somehow having lost
the upper bound on the sum of the two string lengths.

$ cat t.c && gcc -O2 -S -Wall t.c
typedef __SIZE_TYPE__ size_t;

char a[1025];

void g (char *s1, char *s2)
{
   size_t n = __builtin_strlen (s1), d = __builtin_strlen (s2);
   if (n + d + 1 >= 1025)
     return;

   __builtin_sprintf (a, "%s.%s", s1, s2);     // { dg-bogus 
"\\\[-Wformat-overflow" }
}

t.c: In function ‘g’:
t.c:11:29: warning: ‘%s’ directive writing up to 1023 bytes into a 
region of size between 1 and 1024 [-Wformat-overflow=]
    11 |   __builtin_sprintf (a, "%s.%s", s1, s2);     // { dg-bogus 
"\\\[-Wformat-overflow" }
       |                             ^~
t.c:11:3: note: ‘__builtin_sprintf’ output between 2 and 2048 bytes into 
a destination of size 1025
    11 |   __builtin_sprintf (a, "%s.%s", s1, s2);     // { dg-bogus 
"\\\[-Wformat-overflow" }
       |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


I'll see if I can reduce the others into test cases but I'll have
to hold off on enabling the ranger in the sprintf/strlen pass until
these issues are resolved.

Thanks
Martin

FAIL: gcc.dg/Wstringop-overflow-20.c  (test for warnings, line 28)
FAIL: gcc.dg/Wstringop-overflow-20.c  (test for warnings, line 29)
FAIL: gcc.dg/Wstringop-overflow-20.c  (test for warnings, line 32)
FAIL: gcc.dg/Wstringop-overflow-20.c  (test for warnings, line 38)
FAIL: gcc.dg/Wstringop-overflow-20.c  (test for warnings, line 39)
XPASS: gcc.dg/pr80776-1.c  (test for bogus messages, line 22)
FAIL: gcc.dg/strlenopt-81.c (test for excess errors)
FAIL: gcc.dg/strlenopt-90.c scan-tree-dump-not strlen1 "strlen \\(\\&a"
FAIL: gcc.dg/strlenopt-80.c scan-tree-dump-times optimized 
"failure_on_line \\(" 0
FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "strlen \\(\\&a"
FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a1\\] = 0;"
FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a2 \\+ 1B\\] = 0;"
FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a3 \\+ 2B\\] = 0;"
FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a4 \\+ 3B\\] = 0;"
FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a5 \\+ 4B\\] = 0;"
FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a6 \\+ 5B\\] = 0;"
FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a7 \\+ 6B\\] = 0;"
FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a8 \\+ 7B\\] = 0;"
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-3.c (test for excess errors)
(many failures)
FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-22.c  (test for bogus 
messages, line 21)


> 
> Thanks
> Martin
> 
>>
>> Aldy
>>
>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>> index f4d1c5ca256..9f9e95b7155 100644
>> --- a/gcc/tree-ssa-strlen.c
>> +++ b/gcc/tree-ssa-strlen.c
>> @@ -58,8 +58,8 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "tree-ssa-loop.h"
>>   #include "tree-scalar-evolution.h"
>>   #include "vr-values.h"
>> -#include "gimple-ssa-evrp-analyze.h"
>>   #include "tree-ssa.h"
>> +#include "gimple-range.h"
>>
>>   /* A vector indexed by SSA_NAME_VERSION.  0 means unknown, positive 
>> value
>>      is an index into strinfo vector, negative value stands for
>> @@ -5755,16 +5755,13 @@ class strlen_dom_walker : public dom_walker
>>   public:
>>     strlen_dom_walker (cdi_direction direction)
>>       : dom_walker (direction),
>> -    evrp (false),
>>       m_cleanup_cfg (false)
>>     {}
>>
>>     virtual edge before_dom_children (basic_block);
>>     virtual void after_dom_children (basic_block);
>>
>> -  /* EVRP analyzer used for printf argument range processing, and
>> -     to track strlen results across integer variable assignments.  */
>> -  evrp_range_analyzer evrp;
>> +  gimple_ranger ranger;
>>
>>     /* Flag that will trigger TODO_cleanup_cfg to be returned in strlen
>>        execute function.  */
>> @@ -5777,8 +5774,6 @@ public:
>>   edge
>>   strlen_dom_walker::before_dom_children (basic_block bb)
>>   {
>> -  evrp.enter (bb);
>> -
>>     basic_block dombb = get_immediate_dominator (CDI_DOMINATORS, bb);
>>
>>     if (dombb == NULL)
>> @@ -5853,13 +5848,7 @@ strlen_dom_walker::before_dom_children 
>> (basic_block bb)
>>     /* Attempt to optimize individual statements.  */
>>     for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p 
>> (gsi); )
>>       {
>> -      gimple *stmt = gsi_stmt (gsi);
>> -
>> -      /* First record ranges generated by this statement so they
>> -     can be used by printf argument processing.  */
>> -      evrp.record_ranges_from_stmt (stmt, false);
>> -
>> -      if (check_and_optimize_stmt (&gsi, &cleanup_eh, &evrp))
>> +      if (check_and_optimize_stmt (&gsi, &cleanup_eh, &ranger))
>>       gsi_next (&gsi);
>>       }
>>
>> @@ -5878,8 +5867,6 @@ strlen_dom_walker::before_dom_children 
>> (basic_block bb)
>>   void
>>   strlen_dom_walker::after_dom_children (basic_block bb)
>>   {
>> -  evrp.leave (bb);
>> -
>>     if (bb->aux)
>>       {
>>         stridx_to_strinfo = ((vec<strinfo *, va_heap, vl_embed> *) 
>> bb->aux);
>>
>
Martin Sebor Nov. 5, 2020, 9:02 p.m. UTC | #17
On 11/5/20 12:29 PM, Martin Sebor wrote:
> On 10/1/20 11:25 AM, Martin Sebor wrote:
>> On 10/1/20 9:34 AM, Aldy Hernandez wrote:
>>>
>>>
>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote:
>>>  > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
>>>  >>> Thanks for doing all this!  There isn't anything I don't understand
>>>  >>> in the sprintf changes so no questions from me (well, almost none).
>>>  >>> Just some comments:
>>>  >> Thanks for your comments on the sprintf/strlen API conversion.
>>>  >>
>>>  >>> The current call statement is available in all functions that take
>>>  >>> a directive argument, as dir->info.callstmt.  There should be no 
>>> need
>>>  >>> to also add it as a new argument to the functions that now need it.
>>>  >> Fixed.
>>>  >>
>>>  >>> The change adds code along these lines in a bunch of places:
>>>  >>>
>>>  >>> +         value_range vr;
>>>  >>> +         if (!query->range_of_expr (vr, arg, stmt))
>>>  >>> +           vr.set_varying (TREE_TYPE (arg));
>>>  >>>
>>>  >>> I thought under the new Ranger APIs when a range couldn't be
>>>  >>> determined it would be automatically set to the maximum for
>>>  >>> the type.  I like that and have been moving in that direction
>>>  >>> with my code myself (rather than having an API fail, have it
>>>  >>> set the max range and succeed).
>>>  >> I went through all the above idioms and noticed all are being 
>>> used on
>>>  >> supported types (integers or pointers).  So range_of_expr will 
>>> always
>>>  >> return true.  I've removed the if() and the set_varying.
>>>  >>
>>>  >>> Since that isn't so in this case, I think it would still be nice
>>>  >>> if the added code could be written as if the range were set to
>>>  >>> varying in this case and (ideally) reduced to just initialization:
>>>  >>>
>>>  >>>     value_range vr = some-function (query, stmt, arg);
>>>  >>>
>>>  >>> some-function could be an inline helper defined just for the 
>>> sprintf
>>>  >>> pass (and maybe also strlen which also seems to use the same 
>>> pattern),
>>>  >>> or it could be a value_range AKA irange ctor, or it could be a 
>>> member
>>>  >>> of range_query, whatever is the most appropriate.
>>>  >>>
>>>  >>> (If assigning/copying a value_range is thought to be too expensive,
>>>  >>> declaring it first and then passing it to that helper to set it
>>>  >>> would work too).
>>>  >>>
>>>  >>> In strlen, is the removed comment no longer relevant?  (I.e., does
>>>  >>> the ranger solve the problem?)
>>>  >>>
>>>  >>> -      /* The range below may be "inaccurate" if a constant has 
>>> been
>>>  >>> -        substituted earlier for VAL by this pass that hasn't been
>>>  >>> -        propagated through the CFG.  This shoud be fixed by the 
>>> new
>>>  >>> -        on-demand VRP if/when it becomes available (hopefully in
>>>  >>> -        GCC 11).  */
>>>  >> It should.
>>>  >>
>>>  >>> I'm wondering about the comment added to get_range_strlen_dynamic
>>>  >>> and other places:
>>>  >>>
>>>  >>> +         // FIXME: Use range_query instead of global ranges.
>>>  >>>
>>>  >>> Is that something you're planning to do in a followup or should
>>>  >>> I remember to do it at some point?
>>>  >> I'm not planning on doing it.  It's just a reminder that it would be
>>>  >> beneficial to do so.
>>>  >>
>>>  >>> Otherwise I have no concern with the changes.
>>>  >> It's not cleared whether Andrew approved all 3 parts of the patchset
>>>  >> or just the valuation part.  I'll wait for his nod before committing
>>>  >> this chunk.
>>>  >>
>>>  >> Aldy
>>>  >>
>>>  > I have no issue with it, so OK.
>>>
>>> Pushed all 3 patches.
>>>
>>>  >
>>>  > Just an observation that should be pointed out, I believe Aldy has 
>>> all
>>>  > the code for converting to a ranger, but we have not pursued that any
>>>  > further yet since there is a regression due to our lack of 
>>> equivalence
>>>  > processing I think?  That should be resolved in the coming month, 
>>> but at
>>>  > the moment is a holdback/concern for converting these passes...  
>>> iirc.
>>>
>>> Yes.  Martin, the take away here is that the strlen/sprintf pass has 
>>> been converted to the new API, but ranger is still not up and running 
>>> on it (even on the branch).
>>>
>>> With the new API, all you have to do is remove all instances of 
>>> evrp_range_analyzer and replace them with a ranger.  That's it.
>>> Below is an untested patch that would convert you to a ranger once 
>>> it's contributed.
>>>
>>> IIRC when I enabled the ranger for your pass a while back, there was 
>>> one or two regressions due to missing equivalences, and the rest were 
>>> because the tests were expecting an actual specific range, and the 
>>> ranger returned a slightly different/better one.  You'll need to 
>>> adjust your tests.
>>
>> Ack.  I'll be on the lookout for the ranger commit (if you hppen
>> to remember and CC me on it just in case I might miss it that would
>> be great).
> 
> I have applied the patch and ran some tests.  There are quite
> a few failures (see the list below).  I have only looked at
> a couple.  The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
> boils down to the following test case.  There should be no warning
> for either sprintf call.  The one in h() is a false positive and
> the reason for at least some of the regressions.  Somehow,
> the conversions between int and char are causing Ranger to lose
> the range.
> 
> $ cat t.c && gcc -O2 -S -Wall t.c
> char a[2];
> 
> extern int x;
> 
> signed char f (int min, int max)
> {
>    signed char i = x;
>    return i < min || max < i ? min : i;
> }
> 
> void ff (signed char i)
> {
>    __builtin_sprintf (a, "%i", f (0, 9));   // okay
> }
> 
> signed char g (signed char min, signed char max)
> {
>    signed char i = x;
>    return i < min || max < i ? min : i;
> }
> 
> void gg (void)
> {
>    __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
> }
> 
> t.c: In function ‘gg’:
> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes into a 
> region of size 2 [-Wformat-overflow=]
>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>        |                          ^~
> t.c:24:25: note: directive argument in the range [-128, 127]
>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>        |                         ^~~~
> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes into a 
> destination of size 2
>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>        |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> 
> Another failure (the one in builtin-sprintf-warn-22.c) is this where
> the false positive is due to the strlen pass somehow having lost
> the upper bound on the sum of the two string lengths.

I'm guessing this might have something to do with the strlen pass
calling set_range_info() on the nonconstant results of strlen calls
it can determine the range for.  How would I go about doing it with
ranger?  I see ranger_cache::set_global_range() and the class ctor
takes a gimple_ranger as argument.  Would calling the function be
the right way to do it?  (I couldn't find anything on the Wiki.)

> 
> $ cat t.c && gcc -O2 -S -Wall t.c
> typedef __SIZE_TYPE__ size_t;
> 
> char a[1025];
> 
> void g (char *s1, char *s2)
> {
>    size_t n = __builtin_strlen (s1), d = __builtin_strlen (s2);
>    if (n + d + 1 >= 1025)
>      return;
> 
>    __builtin_sprintf (a, "%s.%s", s1, s2);     // { dg-bogus 
> "\\\[-Wformat-overflow" }
> }
> 
> t.c: In function ‘g’:
> t.c:11:29: warning: ‘%s’ directive writing up to 1023 bytes into a 
> region of size between 1 and 1024 [-Wformat-overflow=]
>     11 |   __builtin_sprintf (a, "%s.%s", s1, s2);     // { dg-bogus 
> "\\\[-Wformat-overflow" }
>        |                             ^~
> t.c:11:3: note: ‘__builtin_sprintf’ output between 2 and 2048 bytes into 
> a destination of size 1025
>     11 |   __builtin_sprintf (a, "%s.%s", s1, s2);     // { dg-bogus 
> "\\\[-Wformat-overflow" }
>        |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> 
> I'll see if I can reduce the others into test cases but I'll have
> to hold off on enabling the ranger in the sprintf/strlen pass until
> these issues are resolved.
> 
> Thanks
> Martin
> 
> FAIL: gcc.dg/Wstringop-overflow-20.c  (test for warnings, line 28)
> FAIL: gcc.dg/Wstringop-overflow-20.c  (test for warnings, line 29)
> FAIL: gcc.dg/Wstringop-overflow-20.c  (test for warnings, line 32)
> FAIL: gcc.dg/Wstringop-overflow-20.c  (test for warnings, line 38)
> FAIL: gcc.dg/Wstringop-overflow-20.c  (test for warnings, line 39)
> XPASS: gcc.dg/pr80776-1.c  (test for bogus messages, line 22)
> FAIL: gcc.dg/strlenopt-81.c (test for excess errors)
> FAIL: gcc.dg/strlenopt-90.c scan-tree-dump-not strlen1 "strlen \\(\\&a"
> FAIL: gcc.dg/strlenopt-80.c scan-tree-dump-times optimized 
> "failure_on_line \\(" 0
> FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "strlen \\(\\&a"
> FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a1\\] = 0;"
> FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a2 \\+ 1B\\] = 0;"
> FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a3 \\+ 2B\\] = 0;"
> FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a4 \\+ 3B\\] = 0;"
> FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a5 \\+ 4B\\] = 0;"
> FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a6 \\+ 5B\\] = 0;"
> FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a7 \\+ 6B\\] = 0;"
> FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a8 \\+ 7B\\] = 0;"
> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-3.c (test for excess errors)
> (many failures)
> FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-22.c  (test for bogus 
> messages, line 21)
> 
> 
>>
>> Thanks
>> Martin
>>
>>>
>>> Aldy
>>>
>>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>>> index f4d1c5ca256..9f9e95b7155 100644
>>> --- a/gcc/tree-ssa-strlen.c
>>> +++ b/gcc/tree-ssa-strlen.c
>>> @@ -58,8 +58,8 @@ along with GCC; see the file COPYING3.  If not see
>>>   #include "tree-ssa-loop.h"
>>>   #include "tree-scalar-evolution.h"
>>>   #include "vr-values.h"
>>> -#include "gimple-ssa-evrp-analyze.h"
>>>   #include "tree-ssa.h"
>>> +#include "gimple-range.h"
>>>
>>>   /* A vector indexed by SSA_NAME_VERSION.  0 means unknown, positive 
>>> value
>>>      is an index into strinfo vector, negative value stands for
>>> @@ -5755,16 +5755,13 @@ class strlen_dom_walker : public dom_walker
>>>   public:
>>>     strlen_dom_walker (cdi_direction direction)
>>>       : dom_walker (direction),
>>> -    evrp (false),
>>>       m_cleanup_cfg (false)
>>>     {}
>>>
>>>     virtual edge before_dom_children (basic_block);
>>>     virtual void after_dom_children (basic_block);
>>>
>>> -  /* EVRP analyzer used for printf argument range processing, and
>>> -     to track strlen results across integer variable assignments.  */
>>> -  evrp_range_analyzer evrp;
>>> +  gimple_ranger ranger;
>>>
>>>     /* Flag that will trigger TODO_cleanup_cfg to be returned in strlen
>>>        execute function.  */
>>> @@ -5777,8 +5774,6 @@ public:
>>>   edge
>>>   strlen_dom_walker::before_dom_children (basic_block bb)
>>>   {
>>> -  evrp.enter (bb);
>>> -
>>>     basic_block dombb = get_immediate_dominator (CDI_DOMINATORS, bb);
>>>
>>>     if (dombb == NULL)
>>> @@ -5853,13 +5848,7 @@ strlen_dom_walker::before_dom_children 
>>> (basic_block bb)
>>>     /* Attempt to optimize individual statements.  */
>>>     for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p 
>>> (gsi); )
>>>       {
>>> -      gimple *stmt = gsi_stmt (gsi);
>>> -
>>> -      /* First record ranges generated by this statement so they
>>> -     can be used by printf argument processing.  */
>>> -      evrp.record_ranges_from_stmt (stmt, false);
>>> -
>>> -      if (check_and_optimize_stmt (&gsi, &cleanup_eh, &evrp))
>>> +      if (check_and_optimize_stmt (&gsi, &cleanup_eh, &ranger))
>>>       gsi_next (&gsi);
>>>       }
>>>
>>> @@ -5878,8 +5867,6 @@ strlen_dom_walker::before_dom_children 
>>> (basic_block bb)
>>>   void
>>>   strlen_dom_walker::after_dom_children (basic_block bb)
>>>   {
>>> -  evrp.leave (bb);
>>> -
>>>     if (bb->aux)
>>>       {
>>>         stridx_to_strinfo = ((vec<strinfo *, va_heap, vl_embed> *) 
>>> bb->aux);
>>>
>>
>
Andrew MacLeod Nov. 6, 2020, 12:02 a.m. UTC | #18
On 11/5/20 4:02 PM, Martin Sebor wrote:
> On 11/5/20 12:29 PM, Martin Sebor wrote:
>> On 10/1/20 11:25 AM, Martin Sebor wrote:
>>> On 10/1/20 9:34 AM, Aldy Hernandez wrote:
>>>>
>>>>
>>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote:
>>>>  > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
>>>>  >>> Thanks for doing all this!  There isn't anything I don't 
>>>> understand
>>>>  >>> in the sprintf changes so no questions from me (well, almost 
>>>> none).
>>>>  >>> Just some comments:
>>>>  >> Thanks for your comments on the sprintf/strlen API conversion.
>>>>  >>
>>>>  >>> The current call statement is available in all functions that 
>>>> take
>>>>  >>> a directive argument, as dir->info.callstmt.  There should be 
>>>> no need
>>>>  >>> to also add it as a new argument to the functions that now 
>>>> need it.
>>>>  >> Fixed.
>>>>  >>
>>>>  >>> The change adds code along these lines in a bunch of places:
>>>>  >>>
>>>>  >>> +         value_range vr;
>>>>  >>> +         if (!query->range_of_expr (vr, arg, stmt))
>>>>  >>> +           vr.set_varying (TREE_TYPE (arg));
>>>>  >>>
>>>>  >>> I thought under the new Ranger APIs when a range couldn't be
>>>>  >>> determined it would be automatically set to the maximum for
>>>>  >>> the type.  I like that and have been moving in that direction
>>>>  >>> with my code myself (rather than having an API fail, have it
>>>>  >>> set the max range and succeed).
>>>>  >> I went through all the above idioms and noticed all are being 
>>>> used on
>>>>  >> supported types (integers or pointers).  So range_of_expr will 
>>>> always
>>>>  >> return true.  I've removed the if() and the set_varying.
>>>>  >>
>>>>  >>> Since that isn't so in this case, I think it would still be nice
>>>>  >>> if the added code could be written as if the range were set to
>>>>  >>> varying in this case and (ideally) reduced to just 
>>>> initialization:
>>>>  >>>
>>>>  >>>     value_range vr = some-function (query, stmt, arg);
>>>>  >>>
>>>>  >>> some-function could be an inline helper defined just for the 
>>>> sprintf
>>>>  >>> pass (and maybe also strlen which also seems to use the same 
>>>> pattern),
>>>>  >>> or it could be a value_range AKA irange ctor, or it could be a 
>>>> member
>>>>  >>> of range_query, whatever is the most appropriate.
>>>>  >>>
>>>>  >>> (If assigning/copying a value_range is thought to be too 
>>>> expensive,
>>>>  >>> declaring it first and then passing it to that helper to set it
>>>>  >>> would work too).
>>>>  >>>
>>>>  >>> In strlen, is the removed comment no longer relevant?  (I.e., 
>>>> does
>>>>  >>> the ranger solve the problem?)
>>>>  >>>
>>>>  >>> -      /* The range below may be "inaccurate" if a constant 
>>>> has been
>>>>  >>> -        substituted earlier for VAL by this pass that hasn't 
>>>> been
>>>>  >>> -        propagated through the CFG.  This shoud be fixed by 
>>>> the new
>>>>  >>> -        on-demand VRP if/when it becomes available (hopefully in
>>>>  >>> -        GCC 11).  */
>>>>  >> It should.
>>>>  >>
>>>>  >>> I'm wondering about the comment added to get_range_strlen_dynamic
>>>>  >>> and other places:
>>>>  >>>
>>>>  >>> +         // FIXME: Use range_query instead of global ranges.
>>>>  >>>
>>>>  >>> Is that something you're planning to do in a followup or should
>>>>  >>> I remember to do it at some point?
>>>>  >> I'm not planning on doing it.  It's just a reminder that it 
>>>> would be
>>>>  >> beneficial to do so.
>>>>  >>
>>>>  >>> Otherwise I have no concern with the changes.
>>>>  >> It's not cleared whether Andrew approved all 3 parts of the 
>>>> patchset
>>>>  >> or just the valuation part.  I'll wait for his nod before 
>>>> committing
>>>>  >> this chunk.
>>>>  >>
>>>>  >> Aldy
>>>>  >>
>>>>  > I have no issue with it, so OK.
>>>>
>>>> Pushed all 3 patches.
>>>>
>>>>  >
>>>>  > Just an observation that should be pointed out, I believe Aldy 
>>>> has all
>>>>  > the code for converting to a ranger, but we have not pursued 
>>>> that any
>>>>  > further yet since there is a regression due to our lack of 
>>>> equivalence
>>>>  > processing I think?  That should be resolved in the coming 
>>>> month, but at
>>>>  > the moment is a holdback/concern for converting these passes...  
>>>> iirc.
>>>>
>>>> Yes.  Martin, the take away here is that the strlen/sprintf pass 
>>>> has been converted to the new API, but ranger is still not up and 
>>>> running on it (even on the branch).
>>>>
>>>> With the new API, all you have to do is remove all instances of 
>>>> evrp_range_analyzer and replace them with a ranger. That's it.
>>>> Below is an untested patch that would convert you to a ranger once 
>>>> it's contributed.
>>>>
>>>> IIRC when I enabled the ranger for your pass a while back, there 
>>>> was one or two regressions due to missing equivalences, and the 
>>>> rest were because the tests were expecting an actual specific 
>>>> range, and the ranger returned a slightly different/better one.  
>>>> You'll need to adjust your tests.
>>>
>>> Ack.  I'll be on the lookout for the ranger commit (if you hppen
>>> to remember and CC me on it just in case I might miss it that would
>>> be great).
>>
>> I have applied the patch and ran some tests.  There are quite
>> a few failures (see the list below).  I have only looked at
>> a couple.  The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
>> boils down to the following test case.  There should be no warning
>> for either sprintf call.  The one in h() is a false positive and
>> the reason for at least some of the regressions.  Somehow,
>> the conversions between int and char are causing Ranger to lose
>> the range.
>>
>> $ cat t.c && gcc -O2 -S -Wall t.c
>> char a[2];
>>
>> extern int x;
>>
>> signed char f (int min, int max)
>> {
>>    signed char i = x;
>>    return i < min || max < i ? min : i;
>> }
>>
>> void ff (signed char i)
>> {
>>    __builtin_sprintf (a, "%i", f (0, 9));   // okay
>> }
>>
>> signed char g (signed char min, signed char max)
>> {
>>    signed char i = x;
>>    return i < min || max < i ? min : i;
>> }
>>
>> void gg (void)
>> {
>>    __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>> }
>>
>> t.c: In function ‘gg’:
>> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes into 
>> a region of size 2 [-Wformat-overflow=]
>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>        |                          ^~
>> t.c:24:25: note: directive argument in the range [-128, 127]
>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>        |                         ^~~~
>> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes into 
>> a destination of size 2
>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>        |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>>
>> Another failure (the one in builtin-sprintf-warn-22.c) is this where
>> the false positive is due to the strlen pass somehow having lost
>> the upper bound on the sum of the two string lengths.
>
> I'm guessing this might have something to do with the strlen pass
> calling set_range_info() on the nonconstant results of strlen calls
> it can determine the range for.  How would I go about doing it with
> ranger?  I see ranger_cache::set_global_range() and the class ctor
> takes a gimple_ranger as argument.  Would calling the function be
> the right way to do it?  (I couldn't find anything on the Wiki.)

I haven't had time to write a new modern wiki yet... probably won't 
until well after stage 1 closes either.

And no, thats the internal cache of the ranger, which you dont  have 
access to.

At the moment, there isnt a way to inject a "new" global range out of 
the blue.  It hasnt been a  requirement before, but wouldn't be 
particularly difficult.

Short answer, there is no current equivalent of set_range_info() because 
we dont have a persistent global cache in the same way, and the ranger 
doesnt set  the RANGE_INFO field at this point.. we havent gotten to the 
point of reworking that yet.

So the question is, what exactly are you trying to do? I presume you are 
analyzing  a strlen statement, and are making range conclusions about 
either the result or some of the operands are that are not otherwise 
exposed in the code?

And you are looking to have those values injected into the rangers 
calculations as refinements?

Andrew
Andrew MacLeod Nov. 6, 2020, 12:05 a.m. UTC | #19
On 11/5/20 2:29 PM, Martin Sebor wrote:
>
>
> signed char g (signed char min, signed char max)
> {
>   signed char i = x;
>   return i < min || max < i ? min : i;
> }
>
> void gg (void)
> {
>   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
> }
Im looking at this. its actually completely different code thats 
generated for this signed char case.  And something is being missed that 
shouldnt be.     I'll get back to you.

Andrew
Martin Sebor Nov. 6, 2020, 12:50 a.m. UTC | #20
On 11/5/20 5:02 PM, Andrew MacLeod wrote:
> On 11/5/20 4:02 PM, Martin Sebor wrote:
>> On 11/5/20 12:29 PM, Martin Sebor wrote:
>>> On 10/1/20 11:25 AM, Martin Sebor wrote:
>>>> On 10/1/20 9:34 AM, Aldy Hernandez wrote:
>>>>>
>>>>>
>>>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote:
>>>>>  > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
>>>>>  >>> Thanks for doing all this!  There isn't anything I don't 
>>>>> understand
>>>>>  >>> in the sprintf changes so no questions from me (well, almost 
>>>>> none).
>>>>>  >>> Just some comments:
>>>>>  >> Thanks for your comments on the sprintf/strlen API conversion.
>>>>>  >>
>>>>>  >>> The current call statement is available in all functions that 
>>>>> take
>>>>>  >>> a directive argument, as dir->info.callstmt.  There should be 
>>>>> no need
>>>>>  >>> to also add it as a new argument to the functions that now 
>>>>> need it.
>>>>>  >> Fixed.
>>>>>  >>
>>>>>  >>> The change adds code along these lines in a bunch of places:
>>>>>  >>>
>>>>>  >>> +         value_range vr;
>>>>>  >>> +         if (!query->range_of_expr (vr, arg, stmt))
>>>>>  >>> +           vr.set_varying (TREE_TYPE (arg));
>>>>>  >>>
>>>>>  >>> I thought under the new Ranger APIs when a range couldn't be
>>>>>  >>> determined it would be automatically set to the maximum for
>>>>>  >>> the type.  I like that and have been moving in that direction
>>>>>  >>> with my code myself (rather than having an API fail, have it
>>>>>  >>> set the max range and succeed).
>>>>>  >> I went through all the above idioms and noticed all are being 
>>>>> used on
>>>>>  >> supported types (integers or pointers).  So range_of_expr will 
>>>>> always
>>>>>  >> return true.  I've removed the if() and the set_varying.
>>>>>  >>
>>>>>  >>> Since that isn't so in this case, I think it would still be nice
>>>>>  >>> if the added code could be written as if the range were set to
>>>>>  >>> varying in this case and (ideally) reduced to just 
>>>>> initialization:
>>>>>  >>>
>>>>>  >>>     value_range vr = some-function (query, stmt, arg);
>>>>>  >>>
>>>>>  >>> some-function could be an inline helper defined just for the 
>>>>> sprintf
>>>>>  >>> pass (and maybe also strlen which also seems to use the same 
>>>>> pattern),
>>>>>  >>> or it could be a value_range AKA irange ctor, or it could be a 
>>>>> member
>>>>>  >>> of range_query, whatever is the most appropriate.
>>>>>  >>>
>>>>>  >>> (If assigning/copying a value_range is thought to be too 
>>>>> expensive,
>>>>>  >>> declaring it first and then passing it to that helper to set it
>>>>>  >>> would work too).
>>>>>  >>>
>>>>>  >>> In strlen, is the removed comment no longer relevant?  (I.e., 
>>>>> does
>>>>>  >>> the ranger solve the problem?)
>>>>>  >>>
>>>>>  >>> -      /* The range below may be "inaccurate" if a constant 
>>>>> has been
>>>>>  >>> -        substituted earlier for VAL by this pass that hasn't 
>>>>> been
>>>>>  >>> -        propagated through the CFG.  This shoud be fixed by 
>>>>> the new
>>>>>  >>> -        on-demand VRP if/when it becomes available (hopefully in
>>>>>  >>> -        GCC 11).  */
>>>>>  >> It should.
>>>>>  >>
>>>>>  >>> I'm wondering about the comment added to get_range_strlen_dynamic
>>>>>  >>> and other places:
>>>>>  >>>
>>>>>  >>> +         // FIXME: Use range_query instead of global ranges.
>>>>>  >>>
>>>>>  >>> Is that something you're planning to do in a followup or should
>>>>>  >>> I remember to do it at some point?
>>>>>  >> I'm not planning on doing it.  It's just a reminder that it 
>>>>> would be
>>>>>  >> beneficial to do so.
>>>>>  >>
>>>>>  >>> Otherwise I have no concern with the changes.
>>>>>  >> It's not cleared whether Andrew approved all 3 parts of the 
>>>>> patchset
>>>>>  >> or just the valuation part.  I'll wait for his nod before 
>>>>> committing
>>>>>  >> this chunk.
>>>>>  >>
>>>>>  >> Aldy
>>>>>  >>
>>>>>  > I have no issue with it, so OK.
>>>>>
>>>>> Pushed all 3 patches.
>>>>>
>>>>>  >
>>>>>  > Just an observation that should be pointed out, I believe Aldy 
>>>>> has all
>>>>>  > the code for converting to a ranger, but we have not pursued 
>>>>> that any
>>>>>  > further yet since there is a regression due to our lack of 
>>>>> equivalence
>>>>>  > processing I think?  That should be resolved in the coming 
>>>>> month, but at
>>>>>  > the moment is a holdback/concern for converting these passes... 
>>>>> iirc.
>>>>>
>>>>> Yes.  Martin, the take away here is that the strlen/sprintf pass 
>>>>> has been converted to the new API, but ranger is still not up and 
>>>>> running on it (even on the branch).
>>>>>
>>>>> With the new API, all you have to do is remove all instances of 
>>>>> evrp_range_analyzer and replace them with a ranger. That's it.
>>>>> Below is an untested patch that would convert you to a ranger once 
>>>>> it's contributed.
>>>>>
>>>>> IIRC when I enabled the ranger for your pass a while back, there 
>>>>> was one or two regressions due to missing equivalences, and the 
>>>>> rest were because the tests were expecting an actual specific 
>>>>> range, and the ranger returned a slightly different/better one. 
>>>>> You'll need to adjust your tests.
>>>>
>>>> Ack.  I'll be on the lookout for the ranger commit (if you hppen
>>>> to remember and CC me on it just in case I might miss it that would
>>>> be great).
>>>
>>> I have applied the patch and ran some tests.  There are quite
>>> a few failures (see the list below).  I have only looked at
>>> a couple.  The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
>>> boils down to the following test case.  There should be no warning
>>> for either sprintf call.  The one in h() is a false positive and
>>> the reason for at least some of the regressions.  Somehow,
>>> the conversions between int and char are causing Ranger to lose
>>> the range.
>>>
>>> $ cat t.c && gcc -O2 -S -Wall t.c
>>> char a[2];
>>>
>>> extern int x;
>>>
>>> signed char f (int min, int max)
>>> {
>>>    signed char i = x;
>>>    return i < min || max < i ? min : i;
>>> }
>>>
>>> void ff (signed char i)
>>> {
>>>    __builtin_sprintf (a, "%i", f (0, 9));   // okay
>>> }
>>>
>>> signed char g (signed char min, signed char max)
>>> {
>>>    signed char i = x;
>>>    return i < min || max < i ? min : i;
>>> }
>>>
>>> void gg (void)
>>> {
>>>    __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>> }
>>>
>>> t.c: In function ‘gg’:
>>> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes into 
>>> a region of size 2 [-Wformat-overflow=]
>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>        |                          ^~
>>> t.c:24:25: note: directive argument in the range [-128, 127]
>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>        |                         ^~~~
>>> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes into 
>>> a destination of size 2
>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>        |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>
>>>
>>> Another failure (the one in builtin-sprintf-warn-22.c) is this where
>>> the false positive is due to the strlen pass somehow having lost
>>> the upper bound on the sum of the two string lengths.
>>
>> I'm guessing this might have something to do with the strlen pass
>> calling set_range_info() on the nonconstant results of strlen calls
>> it can determine the range for.  How would I go about doing it with
>> ranger?  I see ranger_cache::set_global_range() and the class ctor
>> takes a gimple_ranger as argument.  Would calling the function be
>> the right way to do it?  (I couldn't find anything on the Wiki.)
> 
> I haven't had time to write a new modern wiki yet... probably won't 
> until well after stage 1 closes either.
> 
> And no, thats the internal cache of the ranger, which you dont  have 
> access to.
> 
> At the moment, there isnt a way to inject a "new" global range out of 
> the blue.  It hasnt been a  requirement before, but wouldn't be 
> particularly difficult.
> 
> Short answer, there is no current equivalent of set_range_info() because 
> we dont have a persistent global cache in the same way, and the ranger 
> doesnt set  the RANGE_INFO field at this point.. we havent gotten to the 
> point of reworking that yet.
> 
> So the question is, what exactly are you trying to do? I presume you are 
> analyzing  a strlen statement, and are making range conclusions about 
> either the result or some of the operands are that are not otherwise 
> exposed in the code?
> 
> And you are looking to have those values injected into the rangers 
> calculations as refinements?

Yes, exactly.  This is used both for optimization and for warnings
(both to enable them and to suppress false positives).

Besides folding strlen and sprintf calls into constants, the strlen
pass sets the range of the results of the calls when the lengths of
the arguments/output are known to be at least some number of bytes.

Besides exposing optimization opportunities downstream, the pass
uses this information to either issue or suppress warnings like in
the sprintf case above.  -Wformat-overflow issues warnings either
based on string lengths (or ranges) if they're known, or based on
the sizes of the arrays the strings are stored in when nothing is
known about the lengths.  The array size is used as the worst case
estimate, which can cause false positives.

Here's an example of a downstream optimization made possible by
this.  Because i's range is known, the range of the snprintf result
is also known (the range is computed for all snprintf arguments).
The strlen pass sets the range and subsequent passes eliminate
the unreachable code.

   void f (int i)
   {
     if (i < 100 || 9999 < i)
       i = 100;

     int n = __builtin_snprintf (0, 0, "%i", i);
     if (n < 3 || 4 < n)
       __builtin_abort ();
   }

Martin
Andrew MacLeod Nov. 6, 2020, 3:08 a.m. UTC | #21
On 11/5/20 7:50 PM, Martin Sebor wrote:
> On 11/5/20 5:02 PM, Andrew MacLeod wrote:
>> On 11/5/20 4:02 PM, Martin Sebor wrote:
>>> On 11/5/20 12:29 PM, Martin Sebor wrote:
>>>> On 10/1/20 11:25 AM, Martin Sebor wrote:
>>>>> On 10/1/20 9:34 AM, Aldy Hernandez wrote:
>>>>>>
>>>>>>
>>>>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote:
>>>>>>  > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
>>>>>>  >>> Thanks for doing all this!  There isn't anything I don't 
>>>>>> understand
>>>>>>  >>> in the sprintf changes so no questions from me (well, almost 
>>>>>> none).
>>>>>>  >>> Just some comments:
>>>>>>  >> Thanks for your comments on the sprintf/strlen API conversion.
>>>>>>  >>
>>>>>>  >>> The current call statement is available in all functions 
>>>>>> that take
>>>>>>  >>> a directive argument, as dir->info.callstmt.  There should 
>>>>>> be no need
>>>>>>  >>> to also add it as a new argument to the functions that now 
>>>>>> need it.
>>>>>>  >> Fixed.
>>>>>>  >>
>>>>>>  >>> The change adds code along these lines in a bunch of places:
>>>>>>  >>>
>>>>>>  >>> +         value_range vr;
>>>>>>  >>> +         if (!query->range_of_expr (vr, arg, stmt))
>>>>>>  >>> +           vr.set_varying (TREE_TYPE (arg));
>>>>>>  >>>
>>>>>>  >>> I thought under the new Ranger APIs when a range couldn't be
>>>>>>  >>> determined it would be automatically set to the maximum for
>>>>>>  >>> the type.  I like that and have been moving in that direction
>>>>>>  >>> with my code myself (rather than having an API fail, have it
>>>>>>  >>> set the max range and succeed).
>>>>>>  >> I went through all the above idioms and noticed all are being 
>>>>>> used on
>>>>>>  >> supported types (integers or pointers).  So range_of_expr 
>>>>>> will always
>>>>>>  >> return true.  I've removed the if() and the set_varying.
>>>>>>  >>
>>>>>>  >>> Since that isn't so in this case, I think it would still be 
>>>>>> nice
>>>>>>  >>> if the added code could be written as if the range were set to
>>>>>>  >>> varying in this case and (ideally) reduced to just 
>>>>>> initialization:
>>>>>>  >>>
>>>>>>  >>>     value_range vr = some-function (query, stmt, arg);
>>>>>>  >>>
>>>>>>  >>> some-function could be an inline helper defined just for the 
>>>>>> sprintf
>>>>>>  >>> pass (and maybe also strlen which also seems to use the same 
>>>>>> pattern),
>>>>>>  >>> or it could be a value_range AKA irange ctor, or it could be 
>>>>>> a member
>>>>>>  >>> of range_query, whatever is the most appropriate.
>>>>>>  >>>
>>>>>>  >>> (If assigning/copying a value_range is thought to be too 
>>>>>> expensive,
>>>>>>  >>> declaring it first and then passing it to that helper to set it
>>>>>>  >>> would work too).
>>>>>>  >>>
>>>>>>  >>> In strlen, is the removed comment no longer relevant?  
>>>>>> (I.e., does
>>>>>>  >>> the ranger solve the problem?)
>>>>>>  >>>
>>>>>>  >>> -      /* The range below may be "inaccurate" if a constant 
>>>>>> has been
>>>>>>  >>> -        substituted earlier for VAL by this pass that 
>>>>>> hasn't been
>>>>>>  >>> -        propagated through the CFG.  This shoud be fixed by 
>>>>>> the new
>>>>>>  >>> -        on-demand VRP if/when it becomes available 
>>>>>> (hopefully in
>>>>>>  >>> -        GCC 11).  */
>>>>>>  >> It should.
>>>>>>  >>
>>>>>>  >>> I'm wondering about the comment added to 
>>>>>> get_range_strlen_dynamic
>>>>>>  >>> and other places:
>>>>>>  >>>
>>>>>>  >>> +         // FIXME: Use range_query instead of global ranges.
>>>>>>  >>>
>>>>>>  >>> Is that something you're planning to do in a followup or should
>>>>>>  >>> I remember to do it at some point?
>>>>>>  >> I'm not planning on doing it.  It's just a reminder that it 
>>>>>> would be
>>>>>>  >> beneficial to do so.
>>>>>>  >>
>>>>>>  >>> Otherwise I have no concern with the changes.
>>>>>>  >> It's not cleared whether Andrew approved all 3 parts of the 
>>>>>> patchset
>>>>>>  >> or just the valuation part.  I'll wait for his nod before 
>>>>>> committing
>>>>>>  >> this chunk.
>>>>>>  >>
>>>>>>  >> Aldy
>>>>>>  >>
>>>>>>  > I have no issue with it, so OK.
>>>>>>
>>>>>> Pushed all 3 patches.
>>>>>>
>>>>>>  >
>>>>>>  > Just an observation that should be pointed out, I believe Aldy 
>>>>>> has all
>>>>>>  > the code for converting to a ranger, but we have not pursued 
>>>>>> that any
>>>>>>  > further yet since there is a regression due to our lack of 
>>>>>> equivalence
>>>>>>  > processing I think?  That should be resolved in the coming 
>>>>>> month, but at
>>>>>>  > the moment is a holdback/concern for converting these 
>>>>>> passes... iirc.
>>>>>>
>>>>>> Yes.  Martin, the take away here is that the strlen/sprintf pass 
>>>>>> has been converted to the new API, but ranger is still not up and 
>>>>>> running on it (even on the branch).
>>>>>>
>>>>>> With the new API, all you have to do is remove all instances of 
>>>>>> evrp_range_analyzer and replace them with a ranger. That's it.
>>>>>> Below is an untested patch that would convert you to a ranger 
>>>>>> once it's contributed.
>>>>>>
>>>>>> IIRC when I enabled the ranger for your pass a while back, there 
>>>>>> was one or two regressions due to missing equivalences, and the 
>>>>>> rest were because the tests were expecting an actual specific 
>>>>>> range, and the ranger returned a slightly different/better one. 
>>>>>> You'll need to adjust your tests.
>>>>>
>>>>> Ack.  I'll be on the lookout for the ranger commit (if you hppen
>>>>> to remember and CC me on it just in case I might miss it that would
>>>>> be great).
>>>>
>>>> I have applied the patch and ran some tests.  There are quite
>>>> a few failures (see the list below).  I have only looked at
>>>> a couple.  The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
>>>> boils down to the following test case.  There should be no warning
>>>> for either sprintf call.  The one in h() is a false positive and
>>>> the reason for at least some of the regressions.  Somehow,
>>>> the conversions between int and char are causing Ranger to lose
>>>> the range.
>>>>
>>>> $ cat t.c && gcc -O2 -S -Wall t.c
>>>> char a[2];
>>>>
>>>> extern int x;
>>>>
>>>> signed char f (int min, int max)
>>>> {
>>>>    signed char i = x;
>>>>    return i < min || max < i ? min : i;
>>>> }
>>>>
>>>> void ff (signed char i)
>>>> {
>>>>    __builtin_sprintf (a, "%i", f (0, 9));   // okay
>>>> }
>>>>
>>>> signed char g (signed char min, signed char max)
>>>> {
>>>>    signed char i = x;
>>>>    return i < min || max < i ? min : i;
>>>> }
>>>>
>>>> void gg (void)
>>>> {
>>>>    __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>> }
>>>>
>>>> t.c: In function ‘gg’:
>>>> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes 
>>>> into a region of size 2 [-Wformat-overflow=]
>>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>        |                          ^~
>>>> t.c:24:25: note: directive argument in the range [-128, 127]
>>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>        |                         ^~~~
>>>> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes 
>>>> into a destination of size 2
>>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>        |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>
>>>>
>>>> Another failure (the one in builtin-sprintf-warn-22.c) is this where
>>>> the false positive is due to the strlen pass somehow having lost
>>>> the upper bound on the sum of the two string lengths.
>>>
>>> I'm guessing this might have something to do with the strlen pass
>>> calling set_range_info() on the nonconstant results of strlen calls
>>> it can determine the range for.  How would I go about doing it with
>>> ranger?  I see ranger_cache::set_global_range() and the class ctor
>>> takes a gimple_ranger as argument.  Would calling the function be
>>> the right way to do it?  (I couldn't find anything on the Wiki.)
>>
>> I haven't had time to write a new modern wiki yet... probably won't 
>> until well after stage 1 closes either.
>>
>> And no, thats the internal cache of the ranger, which you dont have 
>> access to.
>>
>> At the moment, there isnt a way to inject a "new" global range out of 
>> the blue.  It hasnt been a  requirement before, but wouldn't be 
>> particularly difficult.
>>
>> Short answer, there is no current equivalent of set_range_info() 
>> because we dont have a persistent global cache in the same way, and 
>> the ranger doesnt set  the RANGE_INFO field at this point.. we havent 
>> gotten to the point of reworking that yet.
>>
>> So the question is, what exactly are you trying to do? I presume you 
>> are analyzing  a strlen statement, and are making range conclusions 
>> about either the result or some of the operands are that are not 
>> otherwise exposed in the code?
>>
>> And you are looking to have those values injected into the rangers 
>> calculations as refinements?
>
> Yes, exactly.  This is used both for optimization and for warnings
> (both to enable them and to suppress false positives).
>
> Besides folding strlen and sprintf calls into constants, the strlen
> pass sets the range of the results of the calls when the lengths of
> the arguments/output are known to be at least some number of bytes.
>
> Besides exposing optimization opportunities downstream, the pass
> uses this information to either issue or suppress warnings like in
> the sprintf case above.  -Wformat-overflow issues warnings either
> based on string lengths (or ranges) if they're known, or based on
> the sizes of the arrays the strings are stored in when nothing is
> known about the lengths.  The array size is used as the worst case
> estimate, which can cause false positives.
>
> Here's an example of a downstream optimization made possible by
> this.  Because i's range is known, the range of the snprintf result
> is also known (the range is computed for all snprintf arguments).
> The strlen pass sets the range and subsequent passes eliminate
> the unreachable code.
>
>   void f (int i)
>   {
>     if (i < 100 || 9999 < i)
>       i = 100;
>
>     int n = __builtin_snprintf (0, 0, "%i", i);
>     if (n < 3 || 4 < n)
>       __builtin_abort ();
>   }
>
> Martin
>
That seems like somethings that the range code for builtins should take 
care of rather than be calculated externally and injected

  they are just like any other  builtins that if the range of certain 
arguments are known, you can produce a result? I gather the range of i 
being known to be [100,9999] means we know something about n.  The the 
ranger would be able to do that anywhere and everywhere, not just within 
this pass.  One of the ranger goals is to centralize all range tracking

and the warning pass can continue to check the ranges and issue warnings 
when the ranges arent what it likes.

There may be other hoops you have to continue to jump thru, i don't know 
the specifics of what you deal with,  but at least the basics of ranges 
produced from  a builtin sounds like something that should be internal 
to the range engine.

Andrew
Martin Sebor Nov. 6, 2020, 4:13 p.m. UTC | #22
On 11/5/20 8:08 PM, Andrew MacLeod wrote:
> On 11/5/20 7:50 PM, Martin Sebor wrote:
>> On 11/5/20 5:02 PM, Andrew MacLeod wrote:
>>> On 11/5/20 4:02 PM, Martin Sebor wrote:
>>>> On 11/5/20 12:29 PM, Martin Sebor wrote:
>>>>> On 10/1/20 11:25 AM, Martin Sebor wrote:
>>>>>> On 10/1/20 9:34 AM, Aldy Hernandez wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote:
>>>>>>>  > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
>>>>>>>  >>> Thanks for doing all this!  There isn't anything I don't 
>>>>>>> understand
>>>>>>>  >>> in the sprintf changes so no questions from me (well, almost 
>>>>>>> none).
>>>>>>>  >>> Just some comments:
>>>>>>>  >> Thanks for your comments on the sprintf/strlen API conversion.
>>>>>>>  >>
>>>>>>>  >>> The current call statement is available in all functions 
>>>>>>> that take
>>>>>>>  >>> a directive argument, as dir->info.callstmt.  There should 
>>>>>>> be no need
>>>>>>>  >>> to also add it as a new argument to the functions that now 
>>>>>>> need it.
>>>>>>>  >> Fixed.
>>>>>>>  >>
>>>>>>>  >>> The change adds code along these lines in a bunch of places:
>>>>>>>  >>>
>>>>>>>  >>> +         value_range vr;
>>>>>>>  >>> +         if (!query->range_of_expr (vr, arg, stmt))
>>>>>>>  >>> +           vr.set_varying (TREE_TYPE (arg));
>>>>>>>  >>>
>>>>>>>  >>> I thought under the new Ranger APIs when a range couldn't be
>>>>>>>  >>> determined it would be automatically set to the maximum for
>>>>>>>  >>> the type.  I like that and have been moving in that direction
>>>>>>>  >>> with my code myself (rather than having an API fail, have it
>>>>>>>  >>> set the max range and succeed).
>>>>>>>  >> I went through all the above idioms and noticed all are being 
>>>>>>> used on
>>>>>>>  >> supported types (integers or pointers).  So range_of_expr 
>>>>>>> will always
>>>>>>>  >> return true.  I've removed the if() and the set_varying.
>>>>>>>  >>
>>>>>>>  >>> Since that isn't so in this case, I think it would still be 
>>>>>>> nice
>>>>>>>  >>> if the added code could be written as if the range were set to
>>>>>>>  >>> varying in this case and (ideally) reduced to just 
>>>>>>> initialization:
>>>>>>>  >>>
>>>>>>>  >>>     value_range vr = some-function (query, stmt, arg);
>>>>>>>  >>>
>>>>>>>  >>> some-function could be an inline helper defined just for the 
>>>>>>> sprintf
>>>>>>>  >>> pass (and maybe also strlen which also seems to use the same 
>>>>>>> pattern),
>>>>>>>  >>> or it could be a value_range AKA irange ctor, or it could be 
>>>>>>> a member
>>>>>>>  >>> of range_query, whatever is the most appropriate.
>>>>>>>  >>>
>>>>>>>  >>> (If assigning/copying a value_range is thought to be too 
>>>>>>> expensive,
>>>>>>>  >>> declaring it first and then passing it to that helper to set it
>>>>>>>  >>> would work too).
>>>>>>>  >>>
>>>>>>>  >>> In strlen, is the removed comment no longer relevant? (I.e., 
>>>>>>> does
>>>>>>>  >>> the ranger solve the problem?)
>>>>>>>  >>>
>>>>>>>  >>> -      /* The range below may be "inaccurate" if a constant 
>>>>>>> has been
>>>>>>>  >>> -        substituted earlier for VAL by this pass that 
>>>>>>> hasn't been
>>>>>>>  >>> -        propagated through the CFG.  This shoud be fixed by 
>>>>>>> the new
>>>>>>>  >>> -        on-demand VRP if/when it becomes available 
>>>>>>> (hopefully in
>>>>>>>  >>> -        GCC 11).  */
>>>>>>>  >> It should.
>>>>>>>  >>
>>>>>>>  >>> I'm wondering about the comment added to 
>>>>>>> get_range_strlen_dynamic
>>>>>>>  >>> and other places:
>>>>>>>  >>>
>>>>>>>  >>> +         // FIXME: Use range_query instead of global ranges.
>>>>>>>  >>>
>>>>>>>  >>> Is that something you're planning to do in a followup or should
>>>>>>>  >>> I remember to do it at some point?
>>>>>>>  >> I'm not planning on doing it.  It's just a reminder that it 
>>>>>>> would be
>>>>>>>  >> beneficial to do so.
>>>>>>>  >>
>>>>>>>  >>> Otherwise I have no concern with the changes.
>>>>>>>  >> It's not cleared whether Andrew approved all 3 parts of the 
>>>>>>> patchset
>>>>>>>  >> or just the valuation part.  I'll wait for his nod before 
>>>>>>> committing
>>>>>>>  >> this chunk.
>>>>>>>  >>
>>>>>>>  >> Aldy
>>>>>>>  >>
>>>>>>>  > I have no issue with it, so OK.
>>>>>>>
>>>>>>> Pushed all 3 patches.
>>>>>>>
>>>>>>>  >
>>>>>>>  > Just an observation that should be pointed out, I believe Aldy 
>>>>>>> has all
>>>>>>>  > the code for converting to a ranger, but we have not pursued 
>>>>>>> that any
>>>>>>>  > further yet since there is a regression due to our lack of 
>>>>>>> equivalence
>>>>>>>  > processing I think?  That should be resolved in the coming 
>>>>>>> month, but at
>>>>>>>  > the moment is a holdback/concern for converting these 
>>>>>>> passes... iirc.
>>>>>>>
>>>>>>> Yes.  Martin, the take away here is that the strlen/sprintf pass 
>>>>>>> has been converted to the new API, but ranger is still not up and 
>>>>>>> running on it (even on the branch).
>>>>>>>
>>>>>>> With the new API, all you have to do is remove all instances of 
>>>>>>> evrp_range_analyzer and replace them with a ranger. That's it.
>>>>>>> Below is an untested patch that would convert you to a ranger 
>>>>>>> once it's contributed.
>>>>>>>
>>>>>>> IIRC when I enabled the ranger for your pass a while back, there 
>>>>>>> was one or two regressions due to missing equivalences, and the 
>>>>>>> rest were because the tests were expecting an actual specific 
>>>>>>> range, and the ranger returned a slightly different/better one. 
>>>>>>> You'll need to adjust your tests.
>>>>>>
>>>>>> Ack.  I'll be on the lookout for the ranger commit (if you hppen
>>>>>> to remember and CC me on it just in case I might miss it that would
>>>>>> be great).
>>>>>
>>>>> I have applied the patch and ran some tests.  There are quite
>>>>> a few failures (see the list below).  I have only looked at
>>>>> a couple.  The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
>>>>> boils down to the following test case.  There should be no warning
>>>>> for either sprintf call.  The one in h() is a false positive and
>>>>> the reason for at least some of the regressions.  Somehow,
>>>>> the conversions between int and char are causing Ranger to lose
>>>>> the range.
>>>>>
>>>>> $ cat t.c && gcc -O2 -S -Wall t.c
>>>>> char a[2];
>>>>>
>>>>> extern int x;
>>>>>
>>>>> signed char f (int min, int max)
>>>>> {
>>>>>    signed char i = x;
>>>>>    return i < min || max < i ? min : i;
>>>>> }
>>>>>
>>>>> void ff (signed char i)
>>>>> {
>>>>>    __builtin_sprintf (a, "%i", f (0, 9));   // okay
>>>>> }
>>>>>
>>>>> signed char g (signed char min, signed char max)
>>>>> {
>>>>>    signed char i = x;
>>>>>    return i < min || max < i ? min : i;
>>>>> }
>>>>>
>>>>> void gg (void)
>>>>> {
>>>>>    __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>> }
>>>>>
>>>>> t.c: In function ‘gg’:
>>>>> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes 
>>>>> into a region of size 2 [-Wformat-overflow=]
>>>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>>        |                          ^~
>>>>> t.c:24:25: note: directive argument in the range [-128, 127]
>>>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>>        |                         ^~~~
>>>>> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes 
>>>>> into a destination of size 2
>>>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>>        |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>
>>>>>
>>>>> Another failure (the one in builtin-sprintf-warn-22.c) is this where
>>>>> the false positive is due to the strlen pass somehow having lost
>>>>> the upper bound on the sum of the two string lengths.
>>>>
>>>> I'm guessing this might have something to do with the strlen pass
>>>> calling set_range_info() on the nonconstant results of strlen calls
>>>> it can determine the range for.  How would I go about doing it with
>>>> ranger?  I see ranger_cache::set_global_range() and the class ctor
>>>> takes a gimple_ranger as argument.  Would calling the function be
>>>> the right way to do it?  (I couldn't find anything on the Wiki.)
>>>
>>> I haven't had time to write a new modern wiki yet... probably won't 
>>> until well after stage 1 closes either.
>>>
>>> And no, thats the internal cache of the ranger, which you dont have 
>>> access to.
>>>
>>> At the moment, there isnt a way to inject a "new" global range out of 
>>> the blue.  It hasnt been a  requirement before, but wouldn't be 
>>> particularly difficult.
>>>
>>> Short answer, there is no current equivalent of set_range_info() 
>>> because we dont have a persistent global cache in the same way, and 
>>> the ranger doesnt set  the RANGE_INFO field at this point.. we havent 
>>> gotten to the point of reworking that yet.
>>>
>>> So the question is, what exactly are you trying to do? I presume you 
>>> are analyzing  a strlen statement, and are making range conclusions 
>>> about either the result or some of the operands are that are not 
>>> otherwise exposed in the code?
>>>
>>> And you are looking to have those values injected into the rangers 
>>> calculations as refinements?
>>
>> Yes, exactly.  This is used both for optimization and for warnings
>> (both to enable them and to suppress false positives).
>>
>> Besides folding strlen and sprintf calls into constants, the strlen
>> pass sets the range of the results of the calls when the lengths of
>> the arguments/output are known to be at least some number of bytes.
>>
>> Besides exposing optimization opportunities downstream, the pass
>> uses this information to either issue or suppress warnings like in
>> the sprintf case above.  -Wformat-overflow issues warnings either
>> based on string lengths (or ranges) if they're known, or based on
>> the sizes of the arrays the strings are stored in when nothing is
>> known about the lengths.  The array size is used as the worst case
>> estimate, which can cause false positives.
>>
>> Here's an example of a downstream optimization made possible by
>> this.  Because i's range is known, the range of the snprintf result
>> is also known (the range is computed for all snprintf arguments).
>> The strlen pass sets the range and subsequent passes eliminate
>> the unreachable code.
>>
>>   void f (int i)
>>   {
>>     if (i < 100 || 9999 < i)
>>       i = 100;
>>
>>     int n = __builtin_snprintf (0, 0, "%i", i);
>>     if (n < 3 || 4 < n)
>>       __builtin_abort ();
>>   }
>>
>> Martin
>>
> That seems like somethings that the range code for builtins should take 
> care of rather than be calculated externally and injected
> 
>   they are just like any other  builtins that if the range of certain 
> arguments are known, you can produce a result?

The strlen optimization is more involved in that it keeps track of
the state of the strings as they're being created by calls to string
functions like strcpy (or single and multi-byte stores).  So the only
way for Ranger to figure out the range of the result of a strlen call
is to query the strlen pass.  That might work via a callback but it
would of course only work while the strlen pass is running.  The same
goes for the sprintf results (which are based on the strlen data).
It might be good enough as long as the ranges can also be exported
to downstream passes.

It might be possible to compute this on demand too (and introduce
something like a strlen_range_query) but it would need reworking
the strlen pass.  I like the on-demand design but in the strlen
case I'm not sure the benefits would justify the costs.

> I gather the range of i 
> being known to be [100,9999] means we know something about n.  The the 
> ranger would be able to do that anywhere and everywhere, not just within 
> this pass.  One of the ranger goals is to centralize all range tracking
> 
> and the warning pass can continue to check the ranges and issue warnings 
> when the ranges arent what it likes.

In this simple case, yes.  But consider a more involved example:

   char a[8];

   void f (int i)
   {
     if (i < 100 || 9999 < i) i = 100;
     memcpy (a, "123", 3);   // strlen(a) is in [3, 7]
     int n = snprintf (0, 0, "%s %i", a, i);   // n is in [7, 12]
     if (n < 7 || 12 < n)   // folded to false
       abort ();
   }

To compute n's range you need to know not only that of i but also
the length of a (or its range in this case).  That's what the strlen
optimization enables.

> 
> There may be other hoops you have to continue to jump thru, i don't know 
> the specifics of what you deal with,  but at least the basics of ranges 
> produced from  a builtin sounds like something that should be internal 
> to the range engine.

It might help to meet to help each other come up to speed on what
we're doing, what we're planning, and how to best integrate the work
and enable future enhancements.  I still know very little about
the Ranger internals and it might be helpful to you to learn more
about some of the warnings I work on and the optimizations that
enable them (and what I'd like to do in the future).

Martin
Andrew MacLeod Nov. 6, 2020, 9:54 p.m. UTC | #23
On 11/6/20 11:13 AM, Martin Sebor wrote:
> On 11/5/20 8:08 PM, Andrew MacLeod wrote:
>> On 11/5/20 7:50 PM, Martin Sebor wrote:
>>> On 11/5/20 5:02 PM, Andrew MacLeod wrote:
>>>> On 11/5/20 4:02 PM, Martin Sebor wrote:
>>>>> On 11/5/20 12:29 PM, Martin Sebor wrote:
>>>>>> On 10/1/20 11:25 AM, Martin Sebor wrote:
>>>>>>> On 10/1/20 9:34 AM, Aldy Hernandez wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote:
>>>>>>>>  > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
>>>>>>>>  >>> Thanks for doing all this!  There isn't anything I don't 
>>>>>>>> understand
>>>>>>>>  >>> in the sprintf changes so no questions from me (well, 
>>>>>>>> almost none).
>>>>>>>>  >>> Just some comments:
>>>>>>>>  >> Thanks for your comments on the sprintf/strlen API conversion.
>>>>>>>>  >>
>>>>>>>>  >>> The current call statement is available in all functions 
>>>>>>>> that take
>>>>>>>>  >>> a directive argument, as dir->info.callstmt.  There should 
>>>>>>>> be no need
>>>>>>>>  >>> to also add it as a new argument to the functions that now 
>>>>>>>> need it.
>>>>>>>>  >> Fixed.
>>>>>>>>  >>
>>>>>>>>  >>> The change adds code along these lines in a bunch of places:
>>>>>>>>  >>>
>>>>>>>>  >>> +         value_range vr;
>>>>>>>>  >>> +         if (!query->range_of_expr (vr, arg, stmt))
>>>>>>>>  >>> +           vr.set_varying (TREE_TYPE (arg));
>>>>>>>>  >>>
>>>>>>>>  >>> I thought under the new Ranger APIs when a range couldn't be
>>>>>>>>  >>> determined it would be automatically set to the maximum for
>>>>>>>>  >>> the type.  I like that and have been moving in that direction
>>>>>>>>  >>> with my code myself (rather than having an API fail, have it
>>>>>>>>  >>> set the max range and succeed).
>>>>>>>>  >> I went through all the above idioms and noticed all are 
>>>>>>>> being used on
>>>>>>>>  >> supported types (integers or pointers). So range_of_expr 
>>>>>>>> will always
>>>>>>>>  >> return true.  I've removed the if() and the set_varying.
>>>>>>>>  >>
>>>>>>>>  >>> Since that isn't so in this case, I think it would still 
>>>>>>>> be nice
>>>>>>>>  >>> if the added code could be written as if the range were 
>>>>>>>> set to
>>>>>>>>  >>> varying in this case and (ideally) reduced to just 
>>>>>>>> initialization:
>>>>>>>>  >>>
>>>>>>>>  >>>     value_range vr = some-function (query, stmt, arg);
>>>>>>>>  >>>
>>>>>>>>  >>> some-function could be an inline helper defined just for 
>>>>>>>> the sprintf
>>>>>>>>  >>> pass (and maybe also strlen which also seems to use the 
>>>>>>>> same pattern),
>>>>>>>>  >>> or it could be a value_range AKA irange ctor, or it could 
>>>>>>>> be a member
>>>>>>>>  >>> of range_query, whatever is the most appropriate.
>>>>>>>>  >>>
>>>>>>>>  >>> (If assigning/copying a value_range is thought to be too 
>>>>>>>> expensive,
>>>>>>>>  >>> declaring it first and then passing it to that helper to 
>>>>>>>> set it
>>>>>>>>  >>> would work too).
>>>>>>>>  >>>
>>>>>>>>  >>> In strlen, is the removed comment no longer relevant? 
>>>>>>>> (I.e., does
>>>>>>>>  >>> the ranger solve the problem?)
>>>>>>>>  >>>
>>>>>>>>  >>> -      /* The range below may be "inaccurate" if a 
>>>>>>>> constant has been
>>>>>>>>  >>> -        substituted earlier for VAL by this pass that 
>>>>>>>> hasn't been
>>>>>>>>  >>> -        propagated through the CFG. This shoud be fixed 
>>>>>>>> by the new
>>>>>>>>  >>> -        on-demand VRP if/when it becomes available 
>>>>>>>> (hopefully in
>>>>>>>>  >>> -        GCC 11).  */
>>>>>>>>  >> It should.
>>>>>>>>  >>
>>>>>>>>  >>> I'm wondering about the comment added to 
>>>>>>>> get_range_strlen_dynamic
>>>>>>>>  >>> and other places:
>>>>>>>>  >>>
>>>>>>>>  >>> +         // FIXME: Use range_query instead of global ranges.
>>>>>>>>  >>>
>>>>>>>>  >>> Is that something you're planning to do in a followup or 
>>>>>>>> should
>>>>>>>>  >>> I remember to do it at some point?
>>>>>>>>  >> I'm not planning on doing it.  It's just a reminder that it 
>>>>>>>> would be
>>>>>>>>  >> beneficial to do so.
>>>>>>>>  >>
>>>>>>>>  >>> Otherwise I have no concern with the changes.
>>>>>>>>  >> It's not cleared whether Andrew approved all 3 parts of the 
>>>>>>>> patchset
>>>>>>>>  >> or just the valuation part.  I'll wait for his nod before 
>>>>>>>> committing
>>>>>>>>  >> this chunk.
>>>>>>>>  >>
>>>>>>>>  >> Aldy
>>>>>>>>  >>
>>>>>>>>  > I have no issue with it, so OK.
>>>>>>>>
>>>>>>>> Pushed all 3 patches.
>>>>>>>>
>>>>>>>>  >
>>>>>>>>  > Just an observation that should be pointed out, I believe 
>>>>>>>> Aldy has all
>>>>>>>>  > the code for converting to a ranger, but we have not pursued 
>>>>>>>> that any
>>>>>>>>  > further yet since there is a regression due to our lack of 
>>>>>>>> equivalence
>>>>>>>>  > processing I think?  That should be resolved in the coming 
>>>>>>>> month, but at
>>>>>>>>  > the moment is a holdback/concern for converting these 
>>>>>>>> passes... iirc.
>>>>>>>>
>>>>>>>> Yes.  Martin, the take away here is that the strlen/sprintf 
>>>>>>>> pass has been converted to the new API, but ranger is still not 
>>>>>>>> up and running on it (even on the branch).
>>>>>>>>
>>>>>>>> With the new API, all you have to do is remove all instances of 
>>>>>>>> evrp_range_analyzer and replace them with a ranger. That's it.
>>>>>>>> Below is an untested patch that would convert you to a ranger 
>>>>>>>> once it's contributed.
>>>>>>>>
>>>>>>>> IIRC when I enabled the ranger for your pass a while back, 
>>>>>>>> there was one or two regressions due to missing equivalences, 
>>>>>>>> and the rest were because the tests were expecting an actual 
>>>>>>>> specific range, and the ranger returned a slightly 
>>>>>>>> different/better one. You'll need to adjust your tests.
>>>>>>>
>>>>>>> Ack.  I'll be on the lookout for the ranger commit (if you hppen
>>>>>>> to remember and CC me on it just in case I might miss it that would
>>>>>>> be great).
>>>>>>
>>>>>> I have applied the patch and ran some tests.  There are quite
>>>>>> a few failures (see the list below).  I have only looked at
>>>>>> a couple.  The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
>>>>>> boils down to the following test case.  There should be no warning
>>>>>> for either sprintf call.  The one in h() is a false positive and
>>>>>> the reason for at least some of the regressions. Somehow,
>>>>>> the conversions between int and char are causing Ranger to lose
>>>>>> the range.
>>>>>>
>>>>>> $ cat t.c && gcc -O2 -S -Wall t.c
>>>>>> char a[2];
>>>>>>
>>>>>> extern int x;
>>>>>>
>>>>>> signed char f (int min, int max)
>>>>>> {
>>>>>>    signed char i = x;
>>>>>>    return i < min || max < i ? min : i;
>>>>>> }
>>>>>>
>>>>>> void ff (signed char i)
>>>>>> {
>>>>>>    __builtin_sprintf (a, "%i", f (0, 9));   // okay
>>>>>> }
>>>>>>
>>>>>> signed char g (signed char min, signed char max)
>>>>>> {
>>>>>>    signed char i = x;
>>>>>>    return i < min || max < i ? min : i;
>>>>>> }
>>>>>>
>>>>>> void gg (void)
>>>>>> {
>>>>>>    __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>>> }
>>>>>>
>>>>>> t.c: In function ‘gg’:
>>>>>> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes 
>>>>>> into a region of size 2 [-Wformat-overflow=]
>>>>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>>>        |                          ^~
>>>>>> t.c:24:25: note: directive argument in the range [-128, 127]
>>>>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>>>        |                         ^~~~
>>>>>> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes 
>>>>>> into a destination of size 2
>>>>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>>>        |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>
>>>>>>
>>>>>> Another failure (the one in builtin-sprintf-warn-22.c) is this where
>>>>>> the false positive is due to the strlen pass somehow having lost
>>>>>> the upper bound on the sum of the two string lengths.
>>>>>
>>>>> I'm guessing this might have something to do with the strlen pass
>>>>> calling set_range_info() on the nonconstant results of strlen calls
>>>>> it can determine the range for.  How would I go about doing it with
>>>>> ranger?  I see ranger_cache::set_global_range() and the class ctor
>>>>> takes a gimple_ranger as argument.  Would calling the function be
>>>>> the right way to do it?  (I couldn't find anything on the Wiki.)
>>>>
>>>> I haven't had time to write a new modern wiki yet... probably won't 
>>>> until well after stage 1 closes either.
>>>>
>>>> And no, thats the internal cache of the ranger, which you dont have 
>>>> access to.
>>>>
>>>> At the moment, there isnt a way to inject a "new" global range out 
>>>> of the blue.  It hasnt been a  requirement before, but wouldn't be 
>>>> particularly difficult.
>>>>
>>>> Short answer, there is no current equivalent of set_range_info() 
>>>> because we dont have a persistent global cache in the same way, and 
>>>> the ranger doesnt set  the RANGE_INFO field at this point.. we 
>>>> havent gotten to the point of reworking that yet.
>>>>
>>>> So the question is, what exactly are you trying to do? I presume 
>>>> you are analyzing  a strlen statement, and are making range 
>>>> conclusions about either the result or some of the operands are 
>>>> that are not otherwise exposed in the code?
>>>>
>>>> And you are looking to have those values injected into the rangers 
>>>> calculations as refinements?
>>>
>>> Yes, exactly.  This is used both for optimization and for warnings
>>> (both to enable them and to suppress false positives).
>>>
>>> Besides folding strlen and sprintf calls into constants, the strlen
>>> pass sets the range of the results of the calls when the lengths of
>>> the arguments/output are known to be at least some number of bytes.
>>>
>>> Besides exposing optimization opportunities downstream, the pass
>>> uses this information to either issue or suppress warnings like in
>>> the sprintf case above.  -Wformat-overflow issues warnings either
>>> based on string lengths (or ranges) if they're known, or based on
>>> the sizes of the arrays the strings are stored in when nothing is
>>> known about the lengths.  The array size is used as the worst case
>>> estimate, which can cause false positives.
>>>
>>> Here's an example of a downstream optimization made possible by
>>> this.  Because i's range is known, the range of the snprintf result
>>> is also known (the range is computed for all snprintf arguments).
>>> The strlen pass sets the range and subsequent passes eliminate
>>> the unreachable code.
>>>
>>>   void f (int i)
>>>   {
>>>     if (i < 100 || 9999 < i)
>>>       i = 100;
>>>
>>>     int n = __builtin_snprintf (0, 0, "%i", i);
>>>     if (n < 3 || 4 < n)
>>>       __builtin_abort ();
>>>   }
>>>
>>> Martin
>>>
>> That seems like somethings that the range code for builtins should 
>> take care of rather than be calculated externally and injected
>>
>>   they are just like any other  builtins that if the range of certain 
>> arguments are known, you can produce a result?
>
> The strlen optimization is more involved in that it keeps track of
> the state of the strings as they're being created by calls to string
> functions like strcpy (or single and multi-byte stores).  So the only
> way for Ranger to figure out the range of the result of a strlen call
> is to query the strlen pass.  That might work via a callback but it
> would of course only work while the strlen pass is running.  The same
> goes for the sprintf results (which are based on the strlen data).
> It might be good enough as long as the ranges can also be exported
> to downstream passes.
>
> It might be possible to compute this on demand too (and introduce
> something like a strlen_range_query) but it would need reworking
> the strlen pass.  I like the on-demand design but in the strlen
> case I'm not sure the benefits would justify the costs.
>
>> I gather the range of i being known to be [100,9999] means we know 
>> something about n.  The the ranger would be able to do that anywhere 
>> and everywhere, not just within this pass.  One of the ranger goals 
>> is to centralize all range tracking
>>
>> and the warning pass can continue to check the ranges and issue 
>> warnings when the ranges arent what it likes.
>
> In this simple case, yes.  But consider a more involved example:
>
>   char a[8];
>
>   void f (int i)
>   {
>     if (i < 100 || 9999 < i) i = 100;
>     memcpy (a, "123", 3);   // strlen(a) is in [3, 7]
>     int n = snprintf (0, 0, "%s %i", a, i);   // n is in [7, 12]
>     if (n < 7 || 12 < n)   // folded to false
>       abort ();
>   }
>
> To compute n's range you need to know not only that of i but also
> the length of a (or its range in this case).  That's what the strlen
> optimization enables.
>
Thats why we need precise examples :-)

Ive attached a thouroughly untested patch which you can apply to the 
latest trunk stuff I've checked  in and try.  It gives you an entry 
point into the ranger:

bool import_global_range (tree name, irange &r);
NAME is the ssa_name you are updating the global value for, and
R is the new range you wish to add.  Note it is not a replacement, but 
rather an intergration.  ie, and extra source of information the ranger 
can't see.

you can call it with your newly calculated value,and it will enhance the 
value the ranger already has with the extra information.. so above, you 
would call it with

  ranger->import_global_range (n_6, [7,12])

this will set the range for n_6 to the INTERSECTION of [7,12] and 
whatever the ranger already knows.  If its varying, then it'll set it to 
[7,12]. If the ranger had already figured it was [9,20] for some reason, 
then the result will be [9,12] aftewr this call, and it will reurn FALSE 
indicating the new range is different than the range you passed in.  if 
you care.

It should also push the range of n_6 forward into pother blocks, so on 
entry to the block with the abort(), it should have a range of [], 
indicating it can't eb reached.

There are a couple of caveats.

1) It may or may not affect other dependent value at this point 
depending on the order you visit things since the propagation of 
dependency chains is not complete yet.  ie  if the abort() block has
h_4 = n_6 + 2
g_2 = h_4 + 6   , g_2 may not be reevaluated with the new range... It 
might.. but I cant guarantee it just now.

2) If you wish this value to be exported into SSA_NAME_RANGE_INFO, you 
will have to continue doing that manually.  The ranger cache is not 
currently ever exported the RANGE_INFO fields.  We've run into numerous 
issues with "differences": that occur when a multi-range is converted to 
a value_range and although the range is correct, it may be different 
that the orignal and produce different results.   We are waiting until 
we replace the SSA_NAME_RANGE_INFO mechanism.

Anyway, let me known if this works. I havent had a chance to test it 
yet.   If it works we can add it to the full API

Andrew
Andrew MacLeod Nov. 10, 2020, 1:57 a.m. UTC | #24
On 11/5/20 2:29 PM, Martin Sebor wrote:
> On 10/1/20 11:25 AM, Martin Sebor wrote:
>>
>
> I have applied the patch and ran some tests.  There are quite
> a few failures (see the list below).  I have only looked at
> a couple.  The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
> boils down to the following test case.  There should be no warning
> for either sprintf call.  The one in h() is a false positive and
> the reason for at least some of the regressions.  Somehow,
> the conversions between int and char are causing Ranger to lose
> the range.
>
> $ cat t.c && gcc -O2 -S -Wall t.c
> char a[2];
>
> extern int x;
>
> signed char f (int min, int max)
> {
>   signed char i = x;
>   return i < min || max < i ? min : i;
> }
>
> void ff (signed char i)
> {
>   __builtin_sprintf (a, "%i", f (0, 9));   // okay
> }
>
> signed char g (signed char min, signed char max)
> {
>   signed char i = x;
>   return i < min || max < i ? min : i;
> }
>
> void gg (void)
> {
>   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
> }
>


The latest changes resolve the issues witg the gg() case.
you will now get a range of [0,9] for the temporary:
=========== BB 4 ============
     <bb 4> :
     # iftmp.3_10 = PHI <0(2), i_6(3)>
     _2 = (int) iftmp.3_10;
     __builtin_sprintf (&a, "%i", _2);
     return;

_2 : int [0, 9]
iftmp.3_10 : signed char [0, 9]




The code issued for the 2 routines is very different. The first routine 
produces:

  =========== BB 2 ============
     <bb 2> :
     x.0_4 = x;
     i_6 = (signed char) x.0_4;
     _7 = (int) i_6;
     if (_7 < 0)
       goto <bb 4>; [50.00%]
     else
       goto <bb 3>; [50.00%]

_7 : int [-128, 127]
2->4  (T) x.0_4 :       int [-INF, -257][-128, -1][128, +INF]
2->4  (T) i_6 :         signed char [-INF, -1]
2->4  (T) _7 :  int [-128, -1]
2->3  (F) x.0_4 :       int [-INF, -129][0, 127][256, +INF]
2->3  (F) i_6 :         signed char [0, +INF]
2->3  (F) _7 :  int [0, 127]

=========== BB 3 ============
i_6     signed char [0, +INF]
_7      int [0, 127]
     <bb 3> :
     if (_7 > 9)
       goto <bb 4>; [50.00%]
     else
       goto <bb 5>; [50.00%]

3->4  (T) _7 :  int [10, 127]
3->5  (F) _7 :  int [0, 9]

=========== BB 4 ============
     <bb 4> :

=========== BB 5 ============
     <bb 5> :
     # iftmp.1_9 = PHI <i_6(3), 0(4)>
     _2 = (int) iftmp.1_9;
     __builtin_sprintf (&a, "%i", _2);
     return;

_2 : int [0, 127]
iftmp.1_9 : signed char [0, +INF]


Note that we figure out that _7 is [0,9] from 3->5,  except the PHI node 
in block 5 refers to i_6...

i_6 is not referred to in BB3, so this generation of ranger doesnt see 
it as an exportable value, and thus doesnt do a calculation for it like 
it does in BB2, where it is defined.  So it only picks up the first of 
the 2 conditions for i_6 that is live on entry to BB3 ([0, +INF])

2 things are in the pipe which will resolve this
1 - equivalences.  _7 is a cast-copy of i_6.  since the range of _7 
falls within the range of a signed char, the forthcoming 
equivalency/relational work will have i_6 match the same value as _7.
2 - The next generation GORI engine will allow for recalculation of 
dependency chains from outside the block when appropriate, so this will 
cause i_6 to also be an exportable value from BB 3 as well. which will 
also then get the [0,9] range in circumstances in which it isnt a simply 
copy.


Neither of which helps you right now with the perfect precision  :-P

Andrew
Martin Sebor Nov. 10, 2020, 2:48 p.m. UTC | #25
On 11/6/20 2:54 PM, Andrew MacLeod wrote:
> On 11/6/20 11:13 AM, Martin Sebor wrote:
>> On 11/5/20 8:08 PM, Andrew MacLeod wrote:
>>> On 11/5/20 7:50 PM, Martin Sebor wrote:
>>>> On 11/5/20 5:02 PM, Andrew MacLeod wrote:
>>>>> On 11/5/20 4:02 PM, Martin Sebor wrote:
>>>>>> On 11/5/20 12:29 PM, Martin Sebor wrote:
>>>>>>> On 10/1/20 11:25 AM, Martin Sebor wrote:
>>>>>>>> On 10/1/20 9:34 AM, Aldy Hernandez wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote:
>>>>>>>>>  > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote:
>>>>>>>>>  >>> Thanks for doing all this!  There isn't anything I don't 
>>>>>>>>> understand
>>>>>>>>>  >>> in the sprintf changes so no questions from me (well, 
>>>>>>>>> almost none).
>>>>>>>>>  >>> Just some comments:
>>>>>>>>>  >> Thanks for your comments on the sprintf/strlen API conversion.
>>>>>>>>>  >>
>>>>>>>>>  >>> The current call statement is available in all functions 
>>>>>>>>> that take
>>>>>>>>>  >>> a directive argument, as dir->info.callstmt.  There should 
>>>>>>>>> be no need
>>>>>>>>>  >>> to also add it as a new argument to the functions that now 
>>>>>>>>> need it.
>>>>>>>>>  >> Fixed.
>>>>>>>>>  >>
>>>>>>>>>  >>> The change adds code along these lines in a bunch of places:
>>>>>>>>>  >>>
>>>>>>>>>  >>> +         value_range vr;
>>>>>>>>>  >>> +         if (!query->range_of_expr (vr, arg, stmt))
>>>>>>>>>  >>> +           vr.set_varying (TREE_TYPE (arg));
>>>>>>>>>  >>>
>>>>>>>>>  >>> I thought under the new Ranger APIs when a range couldn't be
>>>>>>>>>  >>> determined it would be automatically set to the maximum for
>>>>>>>>>  >>> the type.  I like that and have been moving in that direction
>>>>>>>>>  >>> with my code myself (rather than having an API fail, have it
>>>>>>>>>  >>> set the max range and succeed).
>>>>>>>>>  >> I went through all the above idioms and noticed all are 
>>>>>>>>> being used on
>>>>>>>>>  >> supported types (integers or pointers). So range_of_expr 
>>>>>>>>> will always
>>>>>>>>>  >> return true.  I've removed the if() and the set_varying.
>>>>>>>>>  >>
>>>>>>>>>  >>> Since that isn't so in this case, I think it would still 
>>>>>>>>> be nice
>>>>>>>>>  >>> if the added code could be written as if the range were 
>>>>>>>>> set to
>>>>>>>>>  >>> varying in this case and (ideally) reduced to just 
>>>>>>>>> initialization:
>>>>>>>>>  >>>
>>>>>>>>>  >>>     value_range vr = some-function (query, stmt, arg);
>>>>>>>>>  >>>
>>>>>>>>>  >>> some-function could be an inline helper defined just for 
>>>>>>>>> the sprintf
>>>>>>>>>  >>> pass (and maybe also strlen which also seems to use the 
>>>>>>>>> same pattern),
>>>>>>>>>  >>> or it could be a value_range AKA irange ctor, or it could 
>>>>>>>>> be a member
>>>>>>>>>  >>> of range_query, whatever is the most appropriate.
>>>>>>>>>  >>>
>>>>>>>>>  >>> (If assigning/copying a value_range is thought to be too 
>>>>>>>>> expensive,
>>>>>>>>>  >>> declaring it first and then passing it to that helper to 
>>>>>>>>> set it
>>>>>>>>>  >>> would work too).
>>>>>>>>>  >>>
>>>>>>>>>  >>> In strlen, is the removed comment no longer relevant? 
>>>>>>>>> (I.e., does
>>>>>>>>>  >>> the ranger solve the problem?)
>>>>>>>>>  >>>
>>>>>>>>>  >>> -      /* The range below may be "inaccurate" if a 
>>>>>>>>> constant has been
>>>>>>>>>  >>> -        substituted earlier for VAL by this pass that 
>>>>>>>>> hasn't been
>>>>>>>>>  >>> -        propagated through the CFG. This shoud be fixed 
>>>>>>>>> by the new
>>>>>>>>>  >>> -        on-demand VRP if/when it becomes available 
>>>>>>>>> (hopefully in
>>>>>>>>>  >>> -        GCC 11).  */
>>>>>>>>>  >> It should.
>>>>>>>>>  >>
>>>>>>>>>  >>> I'm wondering about the comment added to 
>>>>>>>>> get_range_strlen_dynamic
>>>>>>>>>  >>> and other places:
>>>>>>>>>  >>>
>>>>>>>>>  >>> +         // FIXME: Use range_query instead of global ranges.
>>>>>>>>>  >>>
>>>>>>>>>  >>> Is that something you're planning to do in a followup or 
>>>>>>>>> should
>>>>>>>>>  >>> I remember to do it at some point?
>>>>>>>>>  >> I'm not planning on doing it.  It's just a reminder that it 
>>>>>>>>> would be
>>>>>>>>>  >> beneficial to do so.
>>>>>>>>>  >>
>>>>>>>>>  >>> Otherwise I have no concern with the changes.
>>>>>>>>>  >> It's not cleared whether Andrew approved all 3 parts of the 
>>>>>>>>> patchset
>>>>>>>>>  >> or just the valuation part.  I'll wait for his nod before 
>>>>>>>>> committing
>>>>>>>>>  >> this chunk.
>>>>>>>>>  >>
>>>>>>>>>  >> Aldy
>>>>>>>>>  >>
>>>>>>>>>  > I have no issue with it, so OK.
>>>>>>>>>
>>>>>>>>> Pushed all 3 patches.
>>>>>>>>>
>>>>>>>>>  >
>>>>>>>>>  > Just an observation that should be pointed out, I believe 
>>>>>>>>> Aldy has all
>>>>>>>>>  > the code for converting to a ranger, but we have not pursued 
>>>>>>>>> that any
>>>>>>>>>  > further yet since there is a regression due to our lack of 
>>>>>>>>> equivalence
>>>>>>>>>  > processing I think?  That should be resolved in the coming 
>>>>>>>>> month, but at
>>>>>>>>>  > the moment is a holdback/concern for converting these 
>>>>>>>>> passes... iirc.
>>>>>>>>>
>>>>>>>>> Yes.  Martin, the take away here is that the strlen/sprintf 
>>>>>>>>> pass has been converted to the new API, but ranger is still not 
>>>>>>>>> up and running on it (even on the branch).
>>>>>>>>>
>>>>>>>>> With the new API, all you have to do is remove all instances of 
>>>>>>>>> evrp_range_analyzer and replace them with a ranger. That's it.
>>>>>>>>> Below is an untested patch that would convert you to a ranger 
>>>>>>>>> once it's contributed.
>>>>>>>>>
>>>>>>>>> IIRC when I enabled the ranger for your pass a while back, 
>>>>>>>>> there was one or two regressions due to missing equivalences, 
>>>>>>>>> and the rest were because the tests were expecting an actual 
>>>>>>>>> specific range, and the ranger returned a slightly 
>>>>>>>>> different/better one. You'll need to adjust your tests.
>>>>>>>>
>>>>>>>> Ack.  I'll be on the lookout for the ranger commit (if you hppen
>>>>>>>> to remember and CC me on it just in case I might miss it that would
>>>>>>>> be great).
>>>>>>>
>>>>>>> I have applied the patch and ran some tests.  There are quite
>>>>>>> a few failures (see the list below).  I have only looked at
>>>>>>> a couple.  The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
>>>>>>> boils down to the following test case.  There should be no warning
>>>>>>> for either sprintf call.  The one in h() is a false positive and
>>>>>>> the reason for at least some of the regressions. Somehow,
>>>>>>> the conversions between int and char are causing Ranger to lose
>>>>>>> the range.
>>>>>>>
>>>>>>> $ cat t.c && gcc -O2 -S -Wall t.c
>>>>>>> char a[2];
>>>>>>>
>>>>>>> extern int x;
>>>>>>>
>>>>>>> signed char f (int min, int max)
>>>>>>> {
>>>>>>>    signed char i = x;
>>>>>>>    return i < min || max < i ? min : i;
>>>>>>> }
>>>>>>>
>>>>>>> void ff (signed char i)
>>>>>>> {
>>>>>>>    __builtin_sprintf (a, "%i", f (0, 9));   // okay
>>>>>>> }
>>>>>>>
>>>>>>> signed char g (signed char min, signed char max)
>>>>>>> {
>>>>>>>    signed char i = x;
>>>>>>>    return i < min || max < i ? min : i;
>>>>>>> }
>>>>>>>
>>>>>>> void gg (void)
>>>>>>> {
>>>>>>>    __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>>>> }
>>>>>>>
>>>>>>> t.c: In function ‘gg’:
>>>>>>> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes 
>>>>>>> into a region of size 2 [-Wformat-overflow=]
>>>>>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>>>>        |                          ^~
>>>>>>> t.c:24:25: note: directive argument in the range [-128, 127]
>>>>>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>>>>        |                         ^~~~
>>>>>>> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes 
>>>>>>> into a destination of size 2
>>>>>>>     24 |   __builtin_sprintf (a, "%i", g (0, 9));   // bogus warning
>>>>>>>        |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>
>>>>>>>
>>>>>>> Another failure (the one in builtin-sprintf-warn-22.c) is this where
>>>>>>> the false positive is due to the strlen pass somehow having lost
>>>>>>> the upper bound on the sum of the two string lengths.
>>>>>>
>>>>>> I'm guessing this might have something to do with the strlen pass
>>>>>> calling set_range_info() on the nonconstant results of strlen calls
>>>>>> it can determine the range for.  How would I go about doing it with
>>>>>> ranger?  I see ranger_cache::set_global_range() and the class ctor
>>>>>> takes a gimple_ranger as argument.  Would calling the function be
>>>>>> the right way to do it?  (I couldn't find anything on the Wiki.)
>>>>>
>>>>> I haven't had time to write a new modern wiki yet... probably won't 
>>>>> until well after stage 1 closes either.
>>>>>
>>>>> And no, thats the internal cache of the ranger, which you dont have 
>>>>> access to.
>>>>>
>>>>> At the moment, there isnt a way to inject a "new" global range out 
>>>>> of the blue.  It hasnt been a  requirement before, but wouldn't be 
>>>>> particularly difficult.
>>>>>
>>>>> Short answer, there is no current equivalent of set_range_info() 
>>>>> because we dont have a persistent global cache in the same way, and 
>>>>> the ranger doesnt set  the RANGE_INFO field at this point.. we 
>>>>> havent gotten to the point of reworking that yet.
>>>>>
>>>>> So the question is, what exactly are you trying to do? I presume 
>>>>> you are analyzing  a strlen statement, and are making range 
>>>>> conclusions about either the result or some of the operands are 
>>>>> that are not otherwise exposed in the code?
>>>>>
>>>>> And you are looking to have those values injected into the rangers 
>>>>> calculations as refinements?
>>>>
>>>> Yes, exactly.  This is used both for optimization and for warnings
>>>> (both to enable them and to suppress false positives).
>>>>
>>>> Besides folding strlen and sprintf calls into constants, the strlen
>>>> pass sets the range of the results of the calls when the lengths of
>>>> the arguments/output are known to be at least some number of bytes.
>>>>
>>>> Besides exposing optimization opportunities downstream, the pass
>>>> uses this information to either issue or suppress warnings like in
>>>> the sprintf case above.  -Wformat-overflow issues warnings either
>>>> based on string lengths (or ranges) if they're known, or based on
>>>> the sizes of the arrays the strings are stored in when nothing is
>>>> known about the lengths.  The array size is used as the worst case
>>>> estimate, which can cause false positives.
>>>>
>>>> Here's an example of a downstream optimization made possible by
>>>> this.  Because i's range is known, the range of the snprintf result
>>>> is also known (the range is computed for all snprintf arguments).
>>>> The strlen pass sets the range and subsequent passes eliminate
>>>> the unreachable code.
>>>>
>>>>   void f (int i)
>>>>   {
>>>>     if (i < 100 || 9999 < i)
>>>>       i = 100;
>>>>
>>>>     int n = __builtin_snprintf (0, 0, "%i", i);
>>>>     if (n < 3 || 4 < n)
>>>>       __builtin_abort ();
>>>>   }
>>>>
>>>> Martin
>>>>
>>> That seems like somethings that the range code for builtins should 
>>> take care of rather than be calculated externally and injected
>>>
>>>   they are just like any other  builtins that if the range of certain 
>>> arguments are known, you can produce a result?
>>
>> The strlen optimization is more involved in that it keeps track of
>> the state of the strings as they're being created by calls to string
>> functions like strcpy (or single and multi-byte stores).  So the only
>> way for Ranger to figure out the range of the result of a strlen call
>> is to query the strlen pass.  That might work via a callback but it
>> would of course only work while the strlen pass is running.  The same
>> goes for the sprintf results (which are based on the strlen data).
>> It might be good enough as long as the ranges can also be exported
>> to downstream passes.
>>
>> It might be possible to compute this on demand too (and introduce
>> something like a strlen_range_query) but it would need reworking
>> the strlen pass.  I like the on-demand design but in the strlen
>> case I'm not sure the benefits would justify the costs.
>>
>>> I gather the range of i being known to be [100,9999] means we know 
>>> something about n.  The the ranger would be able to do that anywhere 
>>> and everywhere, not just within this pass.  One of the ranger goals 
>>> is to centralize all range tracking
>>>
>>> and the warning pass can continue to check the ranges and issue 
>>> warnings when the ranges arent what it likes.
>>
>> In this simple case, yes.  But consider a more involved example:
>>
>>   char a[8];
>>
>>   void f (int i)
>>   {
>>     if (i < 100 || 9999 < i) i = 100;
>>     memcpy (a, "123", 3);   // strlen(a) is in [3, 7]
>>     int n = snprintf (0, 0, "%s %i", a, i);   // n is in [7, 12]
>>     if (n < 7 || 12 < n)   // folded to false
>>       abort ();
>>   }
>>
>> To compute n's range you need to know not only that of i but also
>> the length of a (or its range in this case).  That's what the strlen
>> optimization enables.
>>
> Thats why we need precise examples :-)
> 
> Ive attached a thouroughly untested patch which you can apply to the 
> latest trunk stuff I've checked  in and try.  It gives you an entry 
> point into the ranger:
> 
> bool import_global_range (tree name, irange &r);
> NAME is the ssa_name you are updating the global value for, and
> R is the new range you wish to add.  Note it is not a replacement, but 
> rather an intergration.  ie, and extra source of information the ranger 
> can't see.
> 
> you can call it with your newly calculated value,and it will enhance the 
> value the ranger already has with the extra information.. so above, you 
> would call it with
> 
>   ranger->import_global_range (n_6, [7,12])
> 
> this will set the range for n_6 to the INTERSECTION of [7,12] and 
> whatever the ranger already knows.  If its varying, then it'll set it to 
> [7,12]. If the ranger had already figured it was [9,20] for some reason, 
> then the result will be [9,12] aftewr this call, and it will reurn FALSE 
> indicating the new range is different than the range you passed in.  if 
> you care.
> 
> It should also push the range of n_6 forward into pother blocks, so on 
> entry to the block with the abort(), it should have a range of [], 
> indicating it can't eb reached.
> 
> There are a couple of caveats.
> 
> 1) It may or may not affect other dependent value at this point 
> depending on the order you visit things since the propagation of 
> dependency chains is not complete yet.  ie  if the abort() block has
> h_4 = n_6 + 2
> g_2 = h_4 + 6   , g_2 may not be reevaluated with the new range... It 
> might.. but I cant guarantee it just now.
> 
> 2) If you wish this value to be exported into SSA_NAME_RANGE_INFO, you 
> will have to continue doing that manually.  The ranger cache is not 
> currently ever exported the RANGE_INFO fields.  We've run into numerous 
> issues with "differences": that occur when a multi-range is converted to 
> a value_range and although the range is correct, it may be different 
> that the orignal and produce different results.   We are waiting until 
> we replace the SSA_NAME_RANGE_INFO mechanism.
> 
> Anyway, let me known if this works. I havent had a chance to test it 
> yet.   If it works we can add it to the full API

Thanks a lot!  I'll try to get to it sometime this week if I can.

Martin
diff mbox series

Patch

From 75e48c3b6d0d744e511edfa075a50f635c879aa8 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez <aldyh@redhat.com>
Date: Thu, 17 Sep 2020 09:34:03 +0200
Subject: [PATCH 3/3] Convert sprintf/strlen passes to value query class.

---
 gcc/builtins.c           |  29 +++---
 gcc/builtins.h           |   5 +-
 gcc/gimple-ssa-sprintf.c | 136 +++++++++++++++------------
 gcc/tree-ssa-strlen.c    | 193 ++++++++++++++++++++-------------------
 gcc/tree-ssa-strlen.h    |   9 +-
 5 files changed, 196 insertions(+), 176 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 97f1a184dc6..fb94c8e0ac0 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -183,8 +183,9 @@  static void maybe_emit_chk_warning (tree, enum built_in_function);
 static void maybe_emit_sprintf_chk_warning (tree, enum built_in_function);
 static void maybe_emit_free_warning (tree);
 static tree fold_builtin_object_size (tree, tree);
-static tree compute_objsize (tree, int, access_ref *, const vr_values * = NULL);
-static bool get_range (tree, signop, offset_int[2], const vr_values * = NULL);
+static tree compute_objsize (tree, int, access_ref *, range_query * = NULL);
+static bool get_range (tree, gimple *, signop, offset_int[2],
+		       range_query * = NULL);
 static bool check_read_access (tree, tree, tree = NULL_TREE, int = 1);
 
 unsigned HOST_WIDE_INT target_newline;
@@ -4070,7 +4071,7 @@  check_read_access (tree exp, tree src, tree bound /* = NULL_TREE */,
 
 tree
 gimple_call_alloc_size (gimple *stmt, wide_int rng1[2] /* = NULL */,
-			const vr_values *rvals /* = NULL */)
+			range_query *rvals /* = NULL */)
 {
   if (!stmt)
     return NULL_TREE;
@@ -4122,7 +4123,7 @@  gimple_call_alloc_size (gimple *stmt, wide_int rng1[2] /* = NULL */,
   if (!rng1)
     rng1 = rng1_buf;
 
-  if (!get_range (size, rng1, rvals))
+  if (!get_range (size, stmt, rng1, rvals))
     return NULL_TREE;
 
   if (argidx2 > nargs && TREE_CODE (size) == INTEGER_CST)
@@ -4132,7 +4133,7 @@  gimple_call_alloc_size (gimple *stmt, wide_int rng1[2] /* = NULL */,
      of the upper bounds as a constant.  Ignore anti-ranges.  */
   tree n = argidx2 < nargs ? gimple_call_arg (stmt, argidx2) : integer_one_node;
   wide_int rng2[2];
-  if (!get_range (n, rng2, rvals))
+  if (!get_range (n, stmt, rng2, rvals))
     return NULL_TREE;
 
   /* Extend to the maximum precision to avoid overflow.  */
@@ -4160,11 +4161,11 @@  gimple_call_alloc_size (gimple *stmt, wide_int rng1[2] /* = NULL */,
    result but accepts offset_int instead.  */
 
 static bool
-get_range (tree x, signop sgn, offset_int r[2],
-	   const vr_values *rvals /* = NULL */)
+get_range (tree x, gimple *stmt, signop sgn, offset_int r[2],
+	   range_query *rvals /* = NULL */)
 {
   wide_int wr[2];
-  if (!get_range (x, wr, rvals))
+  if (!get_range (x, stmt, wr, rvals))
     return false;
 
   r[0] = offset_int::from (wr[0], sgn);
@@ -4188,7 +4189,7 @@  get_range (tree x, signop sgn, offset_int r[2],
 
 static bool
 compute_objsize (tree ptr, int ostype, access_ref *pref,
-		 bitmap *visited, const vr_values *rvals /* = NULL */)
+		 bitmap *visited, range_query *rvals /* = NULL */)
 {
   const bool addr = TREE_CODE (ptr) == ADDR_EXPR;
   if (addr)
@@ -4286,7 +4287,7 @@  compute_objsize (tree ptr, int ostype, access_ref *pref,
 
       offset_int orng[2];
       tree off = TREE_OPERAND (ptr, 1);
-      if (!get_range (off, SIGNED, orng, rvals))
+      if (!get_range (off, NULL, SIGNED, orng, rvals))
 	/* Fail unless the size of the object is zero.  */
 	return pref->sizrng[0] == 0 && pref->sizrng[0] == pref->sizrng[1];
 
@@ -4367,7 +4368,7 @@  compute_objsize (tree ptr, int ostype, access_ref *pref,
 	     offset to the maximum.  */
 	  offset_int orng[2];
 	  tree off = gimple_assign_rhs2 (stmt);
-	  if (!get_range (off, SIGNED, orng, rvals))
+	  if (!get_range (off, stmt, SIGNED, orng, rvals))
 	    {
 	      orng[0] = wi::to_offset (TYPE_MIN_VALUE (ptrdiff_type_node));
 	      orng[1] = wi::to_offset (TYPE_MAX_VALUE (ptrdiff_type_node));
@@ -4397,7 +4398,7 @@  compute_objsize (tree ptr, int ostype, access_ref *pref,
       && !array_at_struct_end_p (ptr))
     {
       if (tree size = TYPE_SIZE_UNIT (type))
-	return get_range (size, UNSIGNED, pref->sizrng, rvals);
+	return get_range (size, NULL, UNSIGNED, pref->sizrng, rvals);
     }
 
   return false;
@@ -4407,7 +4408,7 @@  compute_objsize (tree ptr, int ostype, access_ref *pref,
 
 static tree
 compute_objsize (tree ptr, int ostype, access_ref *pref,
-		 const vr_values *rvals /* = NULL */)
+		 range_query *rvals /* = NULL */)
 {
   bitmap visited = NULL;
 
@@ -4439,7 +4440,7 @@  compute_objsize (tree ptr, int ostype, access_ref *pref,
 
 tree
 compute_objsize (tree ptr, int ostype, tree *pdecl /* = NULL */,
-		 tree *poff /* = NULL */, const vr_values *rvals /* = NULL */)
+		 tree *poff /* = NULL */, class range_query *rvals /* = NULL */)
 {
   /* Set the initial offsets to zero and size to negative to indicate
      none has been computed yet.  */
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 94ff96b1292..5ebde7930cb 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -134,11 +134,10 @@  extern void set_builtin_user_assembler_name (tree decl, const char *asmspec);
 extern bool is_simple_builtin (tree);
 extern bool is_inexpensive_builtin (tree);
 
-class vr_values;
 tree gimple_call_alloc_size (gimple *, wide_int[2] = NULL,
-			     const vr_values * = NULL);
+			     class range_query * = NULL);
 extern tree compute_objsize (tree, int, tree * = NULL, tree * = NULL,
-			     const vr_values * = NULL);
+			     class range_query * = NULL);
 
 extern bool readonly_data_expr (tree exp);
 extern bool init_target_chars (void);
diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 70b031fe7b9..8b173987532 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -546,8 +546,8 @@  fmtresult::type_max_digits (tree type, int base)
 }
 
 static bool
-get_int_range (tree, HOST_WIDE_INT *, HOST_WIDE_INT *, bool, HOST_WIDE_INT,
-	       const vr_values *);
+get_int_range (tree, gimple *, HOST_WIDE_INT *, HOST_WIDE_INT *,
+	       bool, HOST_WIDE_INT, range_query *);
 
 struct call_info;
 
@@ -597,7 +597,7 @@  struct directive
 
   /* Format conversion function that given a directive and an argument
      returns the formatting result.  */
-  fmtresult (*fmtfunc) (const directive &, tree, const vr_values *);
+  fmtresult (*fmtfunc) (gimple *, const directive &, tree, range_query *);
 
   /* Return True when the format flag CHR has been used.  */
   bool get_flag (char chr) const
@@ -634,10 +634,7 @@  struct directive
      or 0, whichever is greater.  For a non-constant ARG in some range
      set width to its range adjusting each bound to -1 if it's less.
      For an indeterminate ARG set width to [0, INT_MAX].  */
-  void set_width (tree arg, const vr_values *vr)
-  {
-    get_int_range (arg, width, width + 1, true, 0, vr);
-  }
+  void set_width (tree arg, range_query *);
 
   /* Set both bounds of the precision range to VAL.  */
   void set_precision (HOST_WIDE_INT val)
@@ -650,10 +647,7 @@  struct directive
      or -1 whichever is greater.  For a non-constant ARG in some range
      set precision to its range adjusting each bound to -1 if it's less.
      For an indeterminate ARG set precision to [-1, INT_MAX].  */
-  void set_precision (tree arg, const vr_values *vr)
-  {
-    get_int_range (arg, prec, prec + 1, false, -1, vr);
-  }
+  void set_precision (tree arg, range_query *query);
 
   /* Return true if both width and precision are known to be
      either constant or in some range, false otherwise.  */
@@ -956,10 +950,22 @@  struct call_info
   }
 };
 
+void
+directive::set_width (tree arg, range_query *query)
+{
+  get_int_range (arg, info->callstmt, width, width + 1, true, 0, query);
+}
+
+void
+directive::set_precision (tree arg, range_query *query)
+{
+  get_int_range (arg, info->callstmt, prec, prec + 1, false, -1, query);
+}
+
 /* Return the result of formatting a no-op directive (such as '%n').  */
 
 static fmtresult
-format_none (const directive &, tree, const vr_values *)
+format_none (gimple *, const directive &, tree, range_query *)
 {
   fmtresult res (0);
   return res;
@@ -968,7 +974,7 @@  format_none (const directive &, tree, const vr_values *)
 /* Return the result of formatting the '%%' directive.  */
 
 static fmtresult
-format_percent (const directive &, tree, const vr_values *)
+format_percent (gimple *, const directive &, tree, range_query *)
 {
   fmtresult res (1);
   return res;
@@ -1026,9 +1032,10 @@  build_intmax_type_nodes (tree *pintmax, tree *puintmax)
    the determined range are replaced with NEGBOUND.  */
 
 static bool
-get_int_range (tree arg, HOST_WIDE_INT *pmin, HOST_WIDE_INT *pmax,
+get_int_range (tree arg, gimple *stmt,
+	       HOST_WIDE_INT *pmin, HOST_WIDE_INT *pmax,
 	       bool absolute, HOST_WIDE_INT negbound,
-	       const class vr_values *vr_values)
+	       range_query *query)
 {
   /* The type of the result.  */
   const_tree type = integer_type_node;
@@ -1067,10 +1074,11 @@  get_int_range (tree arg, HOST_WIDE_INT *pmin, HOST_WIDE_INT *pmax,
 	  && TYPE_PRECISION (argtype) <= TYPE_PRECISION (type))
 	{
 	  /* Try to determine the range of values of the integer argument.  */
-	  const value_range_equiv *vr
-	    = CONST_CAST (class vr_values *, vr_values)->get_value_range (arg);
+	  value_range vr;
+	  if (!query->range_of_expr (vr, arg, stmt))
+	    vr.set_varying (TREE_TYPE (arg));
 
-	  if (!vr->undefined_p () && !vr->varying_p () && !vr->symbolic_p ())
+	  if (!vr.undefined_p () && !vr.varying_p ())
 	    {
 	      HOST_WIDE_INT type_min
 		= (TYPE_UNSIGNED (argtype)
@@ -1080,8 +1088,8 @@  get_int_range (tree arg, HOST_WIDE_INT *pmin, HOST_WIDE_INT *pmax,
 	      HOST_WIDE_INT type_max = tree_to_uhwi (TYPE_MAX_VALUE (argtype));
 
 	      tree type = TREE_TYPE (arg);
-	      tree tmin = wide_int_to_tree (type, vr->lower_bound ());
-	      tree tmax = wide_int_to_tree (type, vr->upper_bound ());
+	      tree tmin = wide_int_to_tree (type, vr.lower_bound ());
+	      tree tmax = wide_int_to_tree (type, vr.upper_bound ());
 	      *pmin = TREE_INT_CST_LOW (tmin);
 	      *pmax = TREE_INT_CST_LOW (tmax);
 
@@ -1103,8 +1111,8 @@  get_int_range (tree arg, HOST_WIDE_INT *pmin, HOST_WIDE_INT *pmax,
       /* Handle an argument with an unknown range as if none had been
 	 provided.  */
       if (unknown)
-	return get_int_range (NULL_TREE, pmin, pmax, absolute,
-			      negbound, vr_values);
+	return get_int_range (NULL_TREE, NULL, pmin, pmax, absolute,
+			      negbound, query);
     }
 
   /* Adjust each bound as specified by ABSOLUTE and NEGBOUND.  */
@@ -1189,7 +1197,8 @@  adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
    used when the directive argument or its value isn't known.  */
 
 static fmtresult
-format_integer (const directive &dir, tree arg, const vr_values *vr_values)
+format_integer (gimple *stmt, const directive &dir, tree arg,
+		range_query *query)
 {
   tree intmax_type_node;
   tree uintmax_type_node;
@@ -1372,13 +1381,14 @@  format_integer (const directive &dir, tree arg, const vr_values *vr_values)
     {
       /* Try to determine the range of values of the integer argument
 	 (range information is not available for pointers).  */
-      const value_range_equiv *vr
-	= CONST_CAST (class vr_values *, vr_values)->get_value_range (arg);
+      value_range vr;
+      if (!query->range_of_expr (vr, arg, stmt))
+	vr.set_varying (TREE_TYPE (arg));
 
-      if (!vr->varying_p () && !vr->undefined_p () && !vr->symbolic_p ())
+      if (!vr.varying_p () && !vr.undefined_p ())
 	{
-	  argmin = wide_int_to_tree (TREE_TYPE (arg), vr->lower_bound ());
-	  argmax = wide_int_to_tree (TREE_TYPE (arg), vr->upper_bound ());
+	  argmin = wide_int_to_tree (TREE_TYPE (arg), vr.lower_bound ());
+	  argmax = wide_int_to_tree (TREE_TYPE (arg), vr.upper_bound ());
 
 	  /* Set KNOWNRANGE if the argument is in a known subrange
 	     of the directive's type and neither width nor precision
@@ -1404,7 +1414,7 @@  format_integer (const directive &dir, tree arg, const vr_values *vr_values)
 	      if (code == INTEGER_CST)
 		{
 		  arg = gimple_assign_rhs1 (def);
-		  return format_integer (dir, arg, vr_values);
+		  return format_integer (def, dir, arg, query);
 		}
 
 	      if (code == NOP_EXPR)
@@ -1449,16 +1459,16 @@  format_integer (const directive &dir, tree arg, const vr_values *vr_values)
       /* For unsigned conversions/directives or signed when
 	 the minimum is positive, use the minimum and maximum to compute
 	 the shortest and longest output, respectively.  */
-      res.range.min = format_integer (dir, argmin, vr_values).range.min;
-      res.range.max = format_integer (dir, argmax, vr_values).range.max;
+      res.range.min = format_integer (stmt, dir, argmin, query).range.min;
+      res.range.max = format_integer (stmt, dir, argmax, query).range.max;
     }
   else if (tree_int_cst_sgn (argmax) < 0)
     {
       /* For signed conversions/directives if maximum is negative,
 	 use the minimum as the longest output and maximum as the
 	 shortest output.  */
-      res.range.min = format_integer (dir, argmax, vr_values).range.min;
-      res.range.max = format_integer (dir, argmin, vr_values).range.max;
+      res.range.min = format_integer (stmt, dir, argmax, query).range.min;
+      res.range.max = format_integer (stmt, dir, argmin, query).range.max;
     }
   else
     {
@@ -1467,11 +1477,11 @@  format_integer (const directive &dir, tree arg, const vr_values *vr_values)
 	 length of the output of both minimum and maximum and pick the
 	 longer.  */
       unsigned HOST_WIDE_INT max1
-	= format_integer (dir, argmin, vr_values).range.max;
+	= format_integer (stmt, dir, argmin, query).range.max;
       unsigned HOST_WIDE_INT max2
-	= format_integer (dir, argmax, vr_values).range.max;
+	= format_integer (stmt, dir, argmax, query).range.max;
       res.range.min
-	= format_integer (dir, integer_zero_node, vr_values).range.min;
+	= format_integer (stmt, dir, integer_zero_node, query).range.min;
       res.range.max = MAX (max1, max2);
     }
 
@@ -1820,7 +1830,7 @@  format_floating (const directive &dir, const HOST_WIDE_INT prec[2])
    ARG.  */
 
 static fmtresult
-format_floating (const directive &dir, tree arg, const vr_values *)
+format_floating (gimple *, const directive &dir, tree arg, range_query *)
 {
   HOST_WIDE_INT prec[] = { dir.prec[0], dir.prec[1] };
   tree type = (dir.modifier == FMT_LEN_L || dir.modifier == FMT_LEN_ll
@@ -2014,7 +2024,8 @@  format_floating (const directive &dir, tree arg, const vr_values *)
    Used by the format_string function below.  */
 
 static fmtresult
-get_string_length (tree str, unsigned eltsize, const vr_values *vr)
+get_string_length (tree str, gimple *stmt, unsigned eltsize,
+		   range_query *query)
 {
   if (!str)
     return fmtresult ();
@@ -2025,7 +2036,7 @@  get_string_length (tree str, unsigned eltsize, const vr_values *vr)
   c_strlen_data lendata = { };
   lendata.maxbound = str;
   if (eltsize == 1)
-    get_range_strlen_dynamic (str, &lendata, vr);
+    get_range_strlen_dynamic (str, stmt, &lendata, query);
   else
     {
       /* Determine the length of the shortest and longest string referenced
@@ -2122,7 +2133,8 @@  get_string_length (tree str, unsigned eltsize, const vr_values *vr)
    vsprinf).  */
 
 static fmtresult
-format_character (const directive &dir, tree arg, const vr_values *vr_values)
+format_character (gimple *stmt, const directive &dir, tree arg,
+		  range_query *query)
 {
   fmtresult res;
 
@@ -2135,7 +2147,7 @@  format_character (const directive &dir, tree arg, const vr_values *vr_values)
       res.range.min = 0;
 
       HOST_WIDE_INT min, max;
-      if (get_int_range (arg, &min, &max, false, 0, vr_values))
+      if (get_int_range (arg, stmt, &min, &max, false, 0, query))
 	{
 	  if (min == 0 && max == 0)
 	    {
@@ -2433,7 +2445,8 @@  alias_offset (tree arg, tree dst, HOST_WIDE_INT dst_fld)
    vsprinf).  */
 
 static fmtresult
-format_string (const directive &dir, tree arg, const vr_values *vr_values)
+format_string (gimple *stmt, const directive &dir, tree arg,
+	       range_query *query)
 {
   fmtresult res;
 
@@ -2462,7 +2475,7 @@  format_string (const directive &dir, tree arg, const vr_values *vr_values)
       gcc_checking_assert (count_by == 2 || count_by == 4);
     }
 
-  fmtresult slen = get_string_length (arg, count_by, vr_values);
+  fmtresult slen = get_string_length (arg, stmt, count_by, query);
   if (slen.range.min == slen.range.max
       && slen.range.min < HOST_WIDE_INT_MAX)
     {
@@ -2634,7 +2647,7 @@  format_string (const directive &dir, tree arg, const vr_values *vr_values)
 /* Format plain string (part of the format string itself).  */
 
 static fmtresult
-format_plain (const directive &dir, tree, const vr_values *)
+format_plain (gimple *, const directive &dir, tree, range_query *)
 {
   fmtresult res (dir.len);
   return res;
@@ -3030,7 +3043,7 @@  bytes_remaining (unsigned HOST_WIDE_INT navail, const format_result &res)
 static bool
 format_directive (const call_info &info,
 		  format_result *res, const directive &dir,
-		  const class vr_values *vr_values)
+		  range_query *query)
 {
   /* Offset of the beginning of the directive from the beginning
      of the format string.  */
@@ -3055,7 +3068,7 @@  format_directive (const call_info &info,
     return false;
 
   /* Compute the range of lengths of the formatted output.  */
-  fmtresult fmtres = dir.fmtfunc (dir, dir.arg, vr_values);
+  fmtresult fmtres = dir.fmtfunc (info.callstmt, dir, dir.arg, query);
 
   /* Record whether the output of all directives is known to be
      bounded by some maximum, implying that their arguments are
@@ -3386,7 +3399,7 @@  static size_t
 parse_directive (call_info &info,
 		 directive &dir, format_result *res,
 		 const char *str, unsigned *argno,
-		 const vr_values *vr_values)
+		 range_query *query)
 {
   const char *pcnt = strchr (str, target_percent);
   dir.beg = str;
@@ -3711,7 +3724,7 @@  parse_directive (call_info &info,
   if (star_width)
     {
       if (INTEGRAL_TYPE_P (TREE_TYPE (star_width)))
-	dir.set_width (star_width, vr_values);
+	dir.set_width (star_width, query);
       else
 	{
 	  /* Width specified by a va_list takes on the range [0, -INT_MIN]
@@ -3744,7 +3757,7 @@  parse_directive (call_info &info,
   if (star_precision)
     {
       if (INTEGRAL_TYPE_P (TREE_TYPE (star_precision)))
-	dir.set_precision (star_precision, vr_values);
+	dir.set_precision (star_precision, query);
       else
 	{
 	  /* Precision specified by a va_list takes on the range [-1, INT_MAX]
@@ -3958,7 +3971,8 @@  maybe_warn_overlap (call_info &info, format_result *res)
    that caused the processing to be terminated early).  */
 
 static bool
-compute_format_length (call_info &info, format_result *res, const vr_values *vr)
+compute_format_length (call_info &info, format_result *res,
+		       range_query *query)
 {
   if (dump_file)
     {
@@ -3995,10 +4009,10 @@  compute_format_length (call_info &info, format_result *res, const vr_values *vr)
     {
       directive dir (&info, dirno);
 
-      size_t n = parse_directive (info, dir, res, pf, &argno, vr);
+      size_t n = parse_directive (info, dir, res, pf, &argno, query);
 
       /* Return failure if the format function fails.  */
-      if (!format_directive (info, res, dir, vr))
+      if (!format_directive (info, res, dir, query))
 	return false;
 
       /* Return success when the directive is zero bytes long and it's
@@ -4288,13 +4302,14 @@  get_user_idx_format (tree fndecl, unsigned *idx_args)
    gsi_next should not be performed in the caller.  */
 
 bool
-handle_printf_call (gimple_stmt_iterator *gsi, const vr_values *vr_values)
+handle_printf_call (gimple_stmt_iterator *gsi, range_query *query)
 {
   init_target_to_host_charmap ();
 
   call_info info = call_info ();
 
-  info.callstmt = gsi_stmt (*gsi);
+  gimple *stmt = gsi_stmt (*gsi);
+  info.callstmt = stmt;
   info.func = gimple_call_fndecl (info.callstmt);
   if (!info.func)
     return false;
@@ -4557,14 +4572,15 @@  handle_printf_call (gimple_stmt_iterator *gsi, const vr_values *vr_values)
 	  /* Try to determine the range of values of the argument
 	     and use the greater of the two at level 1 and the smaller
 	     of them at level 2.  */
-	  const value_range_equiv *vr
-	    = CONST_CAST (class vr_values *, vr_values)->get_value_range (size);
+	  value_range vr;
+	  if (!query->range_of_expr (vr, size, stmt))
+	    vr.set_varying (TREE_TYPE (size));
 
-	  if (!vr->undefined_p () && !vr->symbolic_p ())
+	  if (!vr.undefined_p ())
 	    {
 	      tree type = TREE_TYPE (size);
-	      tree tmin = wide_int_to_tree (type, vr->lower_bound ());
-	      tree tmax = wide_int_to_tree (type, vr->upper_bound ());
+	      tree tmin = wide_int_to_tree (type, vr.lower_bound ());
+	      tree tmax = wide_int_to_tree (type, vr.upper_bound ());
 	      unsigned HOST_WIDE_INT minsize = TREE_INT_CST_LOW (tmin);
 	      unsigned HOST_WIDE_INT maxsize = TREE_INT_CST_LOW (tmax);
 	      dstsize = warn_level < 2 ? maxsize : minsize;
@@ -4675,7 +4691,7 @@  handle_printf_call (gimple_stmt_iterator *gsi, const vr_values *vr_values)
      never set to true again).  */
   res.posunder4k = posunder4k && dstptr;
 
-  bool success = compute_format_length (info, &res, vr_values);
+  bool success = compute_format_length (info, &res, query);
   if (res.warned)
     gimple_set_no_warning (info.callstmt, true);
 
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 9907cc0c824..8f2413de3ff 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -200,7 +200,8 @@  static void handle_builtin_stxncpy_strncat (bool, gimple_stmt_iterator *);
    to determine the range, otherwise get_range_info.  */
 
 tree
-get_range (tree val, wide_int minmax[2], const vr_values *rvals /* = NULL */)
+get_range (tree val, gimple *stmt, wide_int minmax[2],
+	   range_query *rvals /* = NULL */)
 {
   if (TREE_CODE (val) == INTEGER_CST)
     {
@@ -211,21 +212,17 @@  get_range (tree val, wide_int minmax[2], const vr_values *rvals /* = NULL */)
   if (TREE_CODE (val) != SSA_NAME)
     return NULL_TREE;
 
-  if (rvals)
-    {
-      /* The range below may be "inaccurate" if a constant has been
-	 substituted earlier for VAL by this pass that hasn't been
-	 propagated through the CFG.  This shoud be fixed by the new
-	 on-demand VRP if/when it becomes available (hopefully in
-	 GCC 11).  */
-      const value_range *vr
-	= (CONST_CAST (class vr_values *, rvals)->get_value_range (val));
-      value_range_kind rng = vr->kind ();
-      if (rng != VR_RANGE || !range_int_cst_p (vr))
+  if (rvals && stmt)
+    {
+      value_range vr;
+      if (!rvals->range_of_expr (vr, val, stmt))
+	return NULL_TREE;
+      value_range_kind rng = vr.kind ();
+      if (rng != VR_RANGE)
 	return NULL_TREE;
 
-      minmax[0] = wi::to_wide (vr->min ());
-      minmax[1] = wi::to_wide (vr->max ());
+      minmax[0] = wi::to_wide (vr.min ());
+      minmax[1] = wi::to_wide (vr.max ());
       return val;
     }
 
@@ -263,7 +260,7 @@  compare_nonzero_chars (strinfo *si, unsigned HOST_WIDE_INT off)
 
 static int
 compare_nonzero_chars (strinfo *si, unsigned HOST_WIDE_INT off,
-		       const vr_values *rvals)
+		       range_query *rvals)
 {
   if (!si->nonzero_chars)
     return -1;
@@ -274,20 +271,19 @@  compare_nonzero_chars (strinfo *si, unsigned HOST_WIDE_INT off,
   if (!rvals || TREE_CODE (si->nonzero_chars) != SSA_NAME)
     return -1;
 
-  const value_range_equiv *vr
-    = (CONST_CAST (class vr_values *, rvals)
-       ->get_value_range (si->nonzero_chars));
-
-  value_range_kind rng = vr->kind ();
-  if (rng != VR_RANGE || !range_int_cst_p (vr))
+  value_range vr;
+  if (!rvals->range_of_expr (vr, si->nonzero_chars, si->stmt))
+    return -1;
+  value_range_kind rng = vr.kind ();
+  if (rng != VR_RANGE)
     return -1;
 
   /* If the offset is less than the minimum length or if the bounds
      of the length range are equal return the result of the comparison
      same as in the constant case.  Otherwise return a conservative
      result.  */
-  int cmpmin = compare_tree_int (vr->min (), off);
-  if (cmpmin > 0 || tree_int_cst_equal (vr->min (), vr->max ()))
+  int cmpmin = compare_tree_int (vr.min (), off);
+  if (cmpmin > 0 || tree_int_cst_equal (vr.min (), vr.max ()))
     return cmpmin;
 
   return -1;
@@ -332,7 +328,7 @@  get_next_strinfo (strinfo *si)
 
 static int
 get_addr_stridx (tree exp, tree ptr, unsigned HOST_WIDE_INT *offset_out,
-		 const vr_values *rvals = NULL)
+		 range_query *rvals = NULL)
 {
   HOST_WIDE_INT off;
   struct stridxlist *list, *last = NULL;
@@ -392,7 +388,7 @@  get_addr_stridx (tree exp, tree ptr, unsigned HOST_WIDE_INT *offset_out,
    When nonnull, uses RVALS to determine range information.  */
 
 static int
-get_stridx (tree exp, wide_int offrng[2] = NULL, const vr_values *rvals = NULL)
+get_stridx (tree exp, wide_int offrng[2] = NULL, range_query *rvals = NULL)
 {
   if (offrng)
     offrng[0] = offrng[1] = wi::zero (TYPE_PRECISION (ptrdiff_type_node));
@@ -474,7 +470,7 @@  get_stridx (tree exp, wide_int offrng[2] = NULL, const vr_values *rvals = NULL)
 		       return the index corresponding to the SSA_NAME.
 		       Do this irrespective of the whether the offset
 		       is known.  */
-		    if (get_range (off, offrng, rvals))
+		    if (get_range (off, def_stmt, offrng, rvals))
 		      {
 			/* When the offset range is known, increment it
 			   it by the constant offset computed in prior
@@ -864,11 +860,11 @@  get_string_length (strinfo *si)
 }
 
 /* Dump strlen data to FP for statement STMT.  When non-null, RVALS
-   points to EVRP info and is used to dump strlen range for non-constant
-   results.  */
+   points to the valuation engine used to calculate ranges, and is
+   used to dump strlen range for non-constant results.  */
 
 DEBUG_FUNCTION void
-dump_strlen_info (FILE *fp, gimple *stmt, const vr_values *rvals)
+dump_strlen_info (FILE *fp, gimple *stmt, range_query *rvals)
 {
   if (stmt)
     {
@@ -909,14 +905,15 @@  dump_strlen_info (FILE *fp, gimple *stmt, const vr_values *rvals)
 		      wide_int min, max;
 		      if (rvals)
 			{
-			  const value_range *vr
-			    = CONST_CAST (class vr_values *, rvals)
-			    ->get_value_range (si->nonzero_chars);
-			  rng = vr->kind ();
-			  if (range_int_cst_p (vr))
+			  value_range vr;
+			  if (!rvals->range_of_expr (vr, si->nonzero_chars,
+						     si->stmt))
+			    vr.set_varying (TREE_TYPE (si->nonzero_chars));
+			  rng = vr.kind ();
+			  if (range_int_cst_p (&vr))
 			    {
-			      min = wi::to_wide (vr->min ());
-			      max = wi::to_wide (vr->max ());
+			      min = wi::to_wide (vr.min ());
+			      max = wi::to_wide (vr.max ());
 			    }
 			  else
 			    rng = VR_UNDEFINED;
@@ -1004,13 +1001,14 @@  dump_strlen_info (FILE *fp, gimple *stmt, const vr_values *rvals)
 
 /* Attempt to determine the length of the string SRC.  On success, store
    the length in *PDATA and return true.  Otherwise, return false.
-   VISITED is a bitmap of visited PHI nodes.  RVALS points to EVRP info
-   and PSSA_DEF_MAX to an SSA_NAME assignment limit used to prevent runaway
-   recursion.  */
+   VISITED is a bitmap of visited PHI nodes.  RVALS points to the valuation
+   engine used to calculate ranges.  PSSA_DEF_MAX to an SSA_NAME
+   assignment limit used to prevent runaway recursion.  */
 
 static bool
-get_range_strlen_dynamic (tree src, c_strlen_data *pdata, bitmap *visited,
-			  const vr_values *rvals, unsigned *pssa_def_max)
+get_range_strlen_dynamic (tree src, gimple *stmt,
+			  c_strlen_data *pdata, bitmap *visited,
+			  range_query *rvals, unsigned *pssa_def_max)
 {
   int idx = get_stridx (src);
   if (!idx)
@@ -1042,8 +1040,8 @@  get_range_strlen_dynamic (tree src, c_strlen_data *pdata, bitmap *visited,
 		    continue;
 
 		  c_strlen_data argdata = { };
-		  if (get_range_strlen_dynamic (arg, &argdata, visited, rvals,
-						pssa_def_max))
+		  if (get_range_strlen_dynamic (arg, phi, &argdata, visited,
+						rvals, pssa_def_max))
 		    {
 		      /* Set the DECL of an unterminated array this argument
 			 refers to if one hasn't been found yet.  */
@@ -1110,14 +1108,13 @@  get_range_strlen_dynamic (tree src, c_strlen_data *pdata, bitmap *visited,
 	    pdata->minlen = si->nonzero_chars;
 	  else if (TREE_CODE (si->nonzero_chars) == SSA_NAME)
 	    {
-	      const value_range_equiv *vr
-		= CONST_CAST (class vr_values *, rvals)
-		->get_value_range (si->nonzero_chars);
-	      if (vr->kind () == VR_RANGE
-		  && range_int_cst_p (vr))
+	      value_range vr;
+	      if (!rvals->range_of_expr (vr, si->nonzero_chars, si->stmt))
+		vr.set_varying (TREE_TYPE (si->nonzero_chars));
+	      if (range_int_cst_p (&vr))
 		{
-		  pdata->minlen = vr->min ();
-		  pdata->maxlen = vr->max ();
+		  pdata->minlen = vr.min ();
+		  pdata->maxlen = vr.max ();
 		}
 	      else
 		pdata->minlen = build_zero_cst (size_type_node);
@@ -1156,14 +1153,13 @@  get_range_strlen_dynamic (tree src, c_strlen_data *pdata, bitmap *visited,
 	}
       else if (pdata->minlen && TREE_CODE (pdata->minlen) == SSA_NAME)
 	{
-	  const value_range_equiv *vr
-	    = CONST_CAST (class vr_values *, rvals)
-	    ->get_value_range (si->nonzero_chars);
-	  if (vr->kind () == VR_RANGE
-	      && range_int_cst_p (vr))
+	  value_range vr;
+	  if (!rvals->range_of_expr (vr, si->nonzero_chars, stmt))
+	    vr.set_varying (TREE_TYPE (si->nonzero_chars));
+	  if (range_int_cst_p (&vr))
 	    {
-	      pdata->minlen = vr->min ();
-	      pdata->maxlen = vr->max ();
+	      pdata->minlen = vr.min ();
+	      pdata->maxlen = vr.max ();
 	      pdata->maxbound = pdata->maxlen;
 	    }
 	  else
@@ -1198,17 +1194,17 @@  get_range_strlen_dynamic (tree src, c_strlen_data *pdata, bitmap *visited,
    Try to obtain the range of the lengths of the string(s) referenced
    by SRC, or the size of the largest array SRC refers to if the range
    of lengths cannot be determined, and store all in *PDATA.  RVALS
-   points to EVRP info.  */
+   points to the valuation engine used to calculate ranges.  */
 
 void
-get_range_strlen_dynamic (tree src, c_strlen_data *pdata,
-			  const vr_values *rvals)
+get_range_strlen_dynamic (tree src, gimple *stmt, c_strlen_data *pdata,
+			  range_query *rvals)
 {
   bitmap visited = NULL;
   tree maxbound = pdata->maxbound;
 
   unsigned limit = param_ssa_name_def_chain_limit;
-  if (!get_range_strlen_dynamic (src, pdata, &visited, rvals, &limit))
+  if (!get_range_strlen_dynamic (src, stmt, pdata, &visited, rvals, &limit))
     {
       /* On failure extend the length range to an impossible maximum
 	 (a valid MAXLEN must be less than PTRDIFF_MAX - 1).  Other
@@ -1803,6 +1799,7 @@  set_strlen_range (tree lhs, wide_int min, wide_int max,
       else if (TREE_CODE (bound) == SSA_NAME)
 	{
 	  wide_int minbound, maxbound;
+	  // FIXME: Use range_query instead of global ranges.
 	  value_range_kind rng = get_range_info (bound, &minbound, &maxbound);
 	  if (rng == VR_RANGE)
 	    {
@@ -1907,7 +1904,7 @@  maybe_set_strlen_range (tree lhs, tree src, tree bound)
 
 static void
 maybe_warn_overflow (gimple *stmt, tree len,
-		     const vr_values *rvals = NULL,
+		     range_query *rvals = NULL,
 		     strinfo *si = NULL, bool plus_one = false,
 		     bool rawmem = false)
 {
@@ -1959,7 +1956,7 @@  maybe_warn_overflow (gimple *stmt, tree len,
 	  tree off = TREE_OPERAND (ref, 1);
 	  ref = TREE_OPERAND (ref, 0);
 	  wide_int rng[2];
-	  if (get_range (off, rng, rvals))
+	  if (get_range (off, stmt, rng, rvals))
 	    {
 	      /* Convert offsets to the maximum precision.  */
 	      offrng[0] = widest_int::from (rng[0], SIGNED);
@@ -1977,7 +1974,7 @@  maybe_warn_overflow (gimple *stmt, tree len,
 	  tree mem_off = TREE_OPERAND (ref, 1);
 	  ref = TREE_OPERAND (ref, 0);
 	  wide_int rng[2];
-	  if (get_range (mem_off, rng, rvals))
+	  if (get_range (mem_off, stmt, rng, rvals))
 	    {
 	      offrng[0] += widest_int::from (rng[0], SIGNED);
 	      offrng[1] += widest_int::from (rng[1], SIGNED);
@@ -2049,7 +2046,7 @@  maybe_warn_overflow (gimple *stmt, tree len,
 	    }
 
 	  wide_int rng[2];
-	  if (get_range (destsize, rng, rvals))
+	  if (get_range (destsize, stmt, rng, rvals))
 	    {
 	      sizrng[0] = widest_int::from (rng[0], UNSIGNED);
 	      sizrng[1] = widest_int::from (rng[1], UNSIGNED);
@@ -2080,7 +2077,7 @@  maybe_warn_overflow (gimple *stmt, tree len,
     return;
 
   wide_int rng[2];
-  if (!get_range (len, rng, rvals))
+  if (!get_range (len, stmt, rng, rvals))
     return;
 
   widest_int lenrng[2] =
@@ -2231,7 +2228,7 @@  maybe_warn_overflow (gimple *stmt, tree len,
   if (destoff)
     {
       wide_int rng[2];
-      if (get_range (destoff, rng))
+      if (get_range (destoff, stmt, rng))
 	{
 	  offrng[0] = widest_int::from (rng[0], SIGNED);
 	  offrng[1] = widest_int::from (rng[1], SIGNED);
@@ -2339,7 +2336,7 @@  maybe_warn_overflow (gimple *stmt, tree len,
 
 static inline void
 maybe_warn_overflow (gimple *stmt, unsigned HOST_WIDE_INT len,
-		     const vr_values *rvals = NULL, strinfo *si = NULL,
+		     range_query *rvals = NULL, strinfo *si = NULL,
 		     bool plus_one = false, bool rawmem = false)
 {
   maybe_warn_overflow (stmt, build_int_cst (size_type_node, len), rvals,
@@ -2642,7 +2639,7 @@  handle_builtin_strchr (gimple_stmt_iterator *gsi)
 
 static void
 handle_builtin_strcpy (enum built_in_function bcode, gimple_stmt_iterator *gsi,
-		       const vr_values *rvals)
+		       range_query *rvals)
 {
   int idx, didx;
   tree src, dst, srclen, len, lhs, type, fn, oldlen;
@@ -3036,6 +3033,7 @@  maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt)
     cntrange[0] = cntrange[1] = wi::to_wide (cnt);
   else if (TREE_CODE (cnt) == SSA_NAME)
     {
+      // FIXME: Use range_query instead of global ranges.
       enum value_range_kind rng = get_range_info (cnt, cntrange, cntrange + 1);
       if (rng == VR_RANGE)
 	;
@@ -3444,7 +3442,7 @@  handle_builtin_stxncpy_strncat (bool append_p, gimple_stmt_iterator *gsi)
 
 static void
 handle_builtin_memcpy (enum built_in_function bcode, gimple_stmt_iterator *gsi,
-		       const vr_values *rvals)
+		       range_query *rvals)
 {
   tree lhs, oldlen, newlen;
   gimple *stmt = gsi_stmt (*gsi);
@@ -3909,7 +3907,7 @@  handle_alloc_call (enum built_in_function bcode, gimple_stmt_iterator *gsi)
 
 static bool
 handle_builtin_memset (gimple_stmt_iterator *gsi, bool *zero_write,
-		       const vr_values *rvals)
+		       range_query *rvals)
 {
   gimple *memset_stmt = gsi_stmt (*gsi);
   tree ptr = gimple_call_arg (memset_stmt, 0);
@@ -4103,9 +4101,10 @@  handle_builtin_memcmp (gimple_stmt_iterator *gsi)
    determine range information. Returns true on success.  */
 
 static bool
-get_len_or_size (tree arg, int idx, unsigned HOST_WIDE_INT lenrng[2],
+get_len_or_size (gimple *stmt, tree arg, int idx,
+		 unsigned HOST_WIDE_INT lenrng[2],
 		 unsigned HOST_WIDE_INT *size, bool *nulterm,
-		 const vr_values *rvals)
+		 range_query *rvals)
 {
   /* Invalidate.  */
   *size = HOST_WIDE_INT_M1U;
@@ -4140,6 +4139,7 @@  get_len_or_size (tree arg, int idx, unsigned HOST_WIDE_INT lenrng[2],
       else if (TREE_CODE (si->nonzero_chars) == SSA_NAME)
 	{
 	  wide_int min, max;
+	  // FIXME: Use range_query instead of global ranges.
 	  value_range_kind rng = get_range_info (si->nonzero_chars, &min, &max);
 	  if (rng == VR_RANGE)
 	    {
@@ -4158,7 +4158,7 @@  get_len_or_size (tree arg, int idx, unsigned HOST_WIDE_INT lenrng[2],
   /* Set MAXBOUND to an arbitrary non-null non-integer node as a request
      to have it set to the length of the longest string in a PHI.  */
   lendata.maxbound = arg;
-  get_range_strlen_dynamic (arg, &lendata, rvals);
+  get_range_strlen_dynamic (arg, stmt, &lendata, rvals);
 
   unsigned HOST_WIDE_INT maxbound = HOST_WIDE_INT_M1U;
   if (tree_fits_uhwi_p (lendata.maxbound)
@@ -4216,17 +4216,17 @@  get_len_or_size (tree arg, int idx, unsigned HOST_WIDE_INT lenrng[2],
    Otherwise return null.  */
 
 static tree
-strxcmp_eqz_result (tree arg1, int idx1, tree arg2, int idx2,
+strxcmp_eqz_result (gimple *stmt, tree arg1, int idx1, tree arg2, int idx2,
 		    unsigned HOST_WIDE_INT bound, unsigned HOST_WIDE_INT len[2],
-		    unsigned HOST_WIDE_INT *psize, const vr_values *rvals)
+		    unsigned HOST_WIDE_INT *psize, range_query *rvals)
 {
   /* Determine the range the length of each string is in and whether it's
      known to be nul-terminated, or the size of the array it's stored in.  */
   bool nul1, nul2;
   unsigned HOST_WIDE_INT siz1, siz2;
   unsigned HOST_WIDE_INT len1rng[2], len2rng[2];
-  if (!get_len_or_size (arg1, idx1, len1rng, &siz1, &nul1, rvals)
-      || !get_len_or_size (arg2, idx2, len2rng, &siz2, &nul2, rvals))
+  if (!get_len_or_size (stmt, arg1, idx1, len1rng, &siz1, &nul1, rvals)
+      || !get_len_or_size (stmt, arg2, idx2, len2rng, &siz2, &nul2, rvals))
     return NULL_TREE;
 
   /* BOUND is set to HWI_M1U for strcmp and less to strncmp, and LENiRNG
@@ -4375,7 +4375,7 @@  maybe_warn_pointless_strcmp (gimple *stmt, HOST_WIDE_INT bound,
    another and false otherwise.  */
 
 static bool
-handle_builtin_string_cmp (gimple_stmt_iterator *gsi, const vr_values *rvals)
+handle_builtin_string_cmp (gimple_stmt_iterator *gsi, range_query *rvals)
 {
   gcall *stmt = as_a <gcall *> (gsi_stmt (*gsi));
   tree lhs = gimple_call_lhs (stmt);
@@ -4420,7 +4420,7 @@  handle_builtin_string_cmp (gimple_stmt_iterator *gsi, const vr_values *rvals)
     /* Try to determine if the two strings are either definitely equal
        or definitely unequal and if so, either fold the result to zero
        (when equal) or set the range of the result to ~[0, 0] otherwise.  */
-    if (tree eqz = strxcmp_eqz_result (arg1, idx1, arg2, idx2, bound,
+    if (tree eqz = strxcmp_eqz_result (stmt, arg1, idx1, arg2, idx2, bound,
 				       len, &siz, rvals))
       {
 	if (integer_zerop (eqz))
@@ -4457,8 +4457,9 @@  handle_builtin_string_cmp (gimple_stmt_iterator *gsi, const vr_values *rvals)
     unsigned HOST_WIDE_INT arsz1, arsz2;
     bool nulterm[2];
 
-    if (!get_len_or_size (arg1, idx1, len1rng, &arsz1, nulterm, rvals)
-	|| !get_len_or_size (arg2, idx2, len2rng, &arsz2, nulterm + 1, rvals))
+    if (!get_len_or_size (stmt, arg1, idx1, len1rng, &arsz1, nulterm, rvals)
+	|| !get_len_or_size (stmt, arg2, idx2, len2rng, &arsz2, nulterm + 1,
+			     rvals))
       return false;
 
     if (len1rng[0] == len1rng[1] && len1rng[0] < HOST_WIDE_INT_MAX)
@@ -4623,7 +4624,7 @@  int ssa_name_limit_t::next_ssa_name (tree ssa_name)
 static bool
 count_nonzero_bytes_addr (tree, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
 			  unsigned [3], bool *, bool *, bool *,
-			  const vr_values *, ssa_name_limit_t &);
+			  range_query *, ssa_name_limit_t &);
 
 /* Determines the minimum and maximum number of leading non-zero bytes
    in the representation of EXP and set LENRANGE[0] and LENRANGE[1]
@@ -4644,7 +4645,7 @@  static bool
 count_nonzero_bytes (tree exp, unsigned HOST_WIDE_INT offset,
 		     unsigned HOST_WIDE_INT nbytes,
 		     unsigned lenrange[3], bool *nulterm,
-		     bool *allnul, bool *allnonnul, const vr_values *rvals,
+		     bool *allnul, bool *allnonnul, range_query *rvals,
 		     ssa_name_limit_t &snlim)
 {
   if (TREE_CODE (exp) == SSA_NAME)
@@ -4836,7 +4837,7 @@  count_nonzero_bytes_addr (tree exp, unsigned HOST_WIDE_INT offset,
 			  unsigned HOST_WIDE_INT nbytes,
 			  unsigned lenrange[3], bool *nulterm,
 			  bool *allnul, bool *allnonnul,
-			  const vr_values *rvals, ssa_name_limit_t &snlim)
+			  range_query *rvals, ssa_name_limit_t &snlim)
 {
   int idx = get_stridx (exp);
   if (idx > 0)
@@ -4853,13 +4854,14 @@  count_nonzero_bytes_addr (tree exp, unsigned HOST_WIDE_INT offset,
       else if (si->nonzero_chars
 	       && TREE_CODE (si->nonzero_chars) == SSA_NAME)
 	{
-	  vr_values *v = CONST_CAST (vr_values *, rvals);
-	  const value_range_equiv *vr = v->get_value_range (si->nonzero_chars);
-	  if (vr->kind () != VR_RANGE || !range_int_cst_p (vr))
+	  value_range vr;
+	  if (!rvals->range_of_expr (vr, si->nonzero_chars, si->stmt))
+	    vr.set_varying (TREE_TYPE (si->nonzero_chars));
+	  if (vr.kind () != VR_RANGE)
 	    return false;
 
-	  minlen = tree_to_uhwi (vr->min ());
-	  maxlen = tree_to_uhwi (vr->max ());
+	  minlen = tree_to_uhwi (vr.min ());
+	  maxlen = tree_to_uhwi (vr.max ());
 	}
       else
 	return false;
@@ -4948,7 +4950,7 @@  count_nonzero_bytes_addr (tree exp, unsigned HOST_WIDE_INT offset,
 
 static bool
 count_nonzero_bytes (tree exp, unsigned lenrange[3], bool *nulterm,
-		     bool *allnul, bool *allnonnul, const vr_values *rvals)
+		     bool *allnul, bool *allnonnul, range_query *rvals)
 {
   /* Set to optimistic values so the caller doesn't have to worry about
      initializing these and to what.  On success, the function will clear
@@ -4972,7 +4974,7 @@  count_nonzero_bytes (tree exp, unsigned lenrange[3], bool *nulterm,
 
 static bool
 handle_store (gimple_stmt_iterator *gsi, bool *zero_write,
-	      const vr_values *rvals)
+	      range_query *rvals)
 {
   int idx = -1;
   strinfo *si = NULL;
@@ -5382,7 +5384,7 @@  is_char_type (tree type)
 
 static bool
 strlen_check_and_optimize_call (gimple_stmt_iterator *gsi, bool *zero_write,
-				const vr_values *rvals)
+				range_query *rvals)
 {
   gimple *stmt = gsi_stmt (*gsi);
 
@@ -5473,7 +5475,7 @@  strlen_check_and_optimize_call (gimple_stmt_iterator *gsi, bool *zero_write,
 
 static void
 handle_integral_assign (gimple_stmt_iterator *gsi, bool *cleanup_eh,
-			const vr_values *rvals)
+			range_query *rvals)
 {
   gimple *stmt = gsi_stmt (*gsi);
   tree lhs = gimple_assign_lhs (stmt);
@@ -5565,6 +5567,7 @@  handle_integral_assign (gimple_stmt_iterator *gsi, bool *cleanup_eh,
 		  wide_int min, max;
 		  signop sign = TYPE_SIGN (lhs_type);
 		  int prec = TYPE_PRECISION (lhs_type);
+		  // FIXME: Use range_query instead of global ranges.
 		  value_range_kind vr = get_range_info (lhs, &min, &max);
 		  if (vr == VR_VARYING
 		      || (vr == VR_RANGE
@@ -5617,7 +5620,7 @@  handle_integral_assign (gimple_stmt_iterator *gsi, bool *cleanup_eh,
 
 static bool
 check_and_optimize_stmt (gimple_stmt_iterator *gsi, bool *cleanup_eh,
-			 const vr_values *rvals)
+			 range_query *rvals)
 {
   gimple *stmt = gsi_stmt (*gsi);
 
diff --git a/gcc/tree-ssa-strlen.h b/gcc/tree-ssa-strlen.h
index a11c4d579a1..225f64b1630 100644
--- a/gcc/tree-ssa-strlen.h
+++ b/gcc/tree-ssa-strlen.h
@@ -25,13 +25,14 @@  extern bool is_strlen_related_p (tree, tree);
 extern bool maybe_diag_stxncpy_trunc (gimple_stmt_iterator, tree, tree);
 extern tree set_strlen_range (tree, wide_int, wide_int, tree = NULL_TREE);
 
-class vr_values;
-extern tree get_range (tree, wide_int[2], const vr_values * = NULL);
+extern tree get_range (tree, gimple *, wide_int[2],
+		       class range_query * = NULL);
 
 struct c_strlen_data;
-extern void get_range_strlen_dynamic (tree , c_strlen_data *, const vr_values *);
+extern void get_range_strlen_dynamic (tree, gimple *, c_strlen_data *,
+				      class range_query *);
 
 /* APIs internal to strlen pass.  Defined in gimple-ssa-sprintf.c.  */
-extern bool handle_printf_call (gimple_stmt_iterator *,  const vr_values *);
+extern bool handle_printf_call (gimple_stmt_iterator *,  class range_query *);
 
 #endif   // GCC_TREE_SSA_STRLEN_H
-- 
2.26.2