Message ID | 9bf12c00-b502-79b1-3cdc-b9256c2aa420@redhat.com |
---|---|
State | New |
Headers | show |
Series | generalized range_query class for multiple contexts | expand |
Forgot to include ChangeLog entries. Aldy
On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote: > As part of the ranger work, we have been trying to clean up and > generalize interfaces whenever possible. This not only helps in > reducing the maintenance burden going forward, but provides mechanisms > for backwards compatibility between ranger and other providers/users of > ranges throughout the compiler like evrp and VRP. > > One such interface is the range_query class in vr_values.h, which > provides a range query mechanism for use in the simplify_using_ranges > module. With it, simplify_using_ranges can be used with the ranger, or > the VRP twins by providing a get_value_range() method. This has helped > us in comparing apples to apples while doing our work, and has also > future proofed the interface so that asking for a range can be done > within the context in which it appeared. For example, get_value_range > now takes a gimple statement which provides context. We are no longer > tied to asking for a global SSA range, but can ask for the range of an > SSA within a statement. Granted, this functionality is currently only > in the ranger, but evrp/vrp could be adapted to pass such context. > > The range_query is a good first step, but what we really want is a > generic query mechanism that can ask for SSA ranges within an > expression, a statement, an edge, or anything else that may come up. We > think that a generic mechanism can be used not only for range producers, > but consumers such as the substitute_and_fold_engine (see get_value > virtual) and possibly the gimple folder (see valueize). > > The attached patchset provides such an interface. It is meant to be a > replacement for range_query that can be used for vr_values, > substitute_and_fold, the subsitute_and_fold_engine, as well as the > ranger. The general API is: > > class value_query > { > public: > // Return the singleton expression for NAME at a gimple statement, > // or NULL if none found. > virtual tree value_of_expr (tree name, gimple * = NULL) = 0; > // Return the singleton expression for NAME at an edge, or NULL if > // none found. > virtual tree value_on_edge (edge, tree name); > // Return the singleton expression for the LHS of a gimple > // statement, assuming an (optional) initial value of NAME. Returns > // NULL if none found. > // > // Note this method calculates the range the LHS would have *after* > // the statement has executed. > virtual tree value_of_stmt (gimple *, tree name = NULL); > }; > > class range_query : public value_query > { > public: > range_query (); > virtual ~range_query (); > > virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE; > virtual tree value_on_edge (edge, tree name) OVERRIDE; > virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE; > > // These are the range equivalents of the value_* methods. Instead > // of returning a singleton, they calculate a range and return it in > // R. TRUE is returned on success or FALSE if no range was found. > virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0; > virtual bool range_on_edge (irange &r, edge, tree name); > virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL); > > // DEPRECATED: This method is used from vr-values. The plan is to > // rewrite all uses of it to the above API. > virtual const class value_range_equiv *get_value_range (const_tree, > gimple * = NULL); > }; > > The duality of the API (value_of_* and range_on_*) is because some > passes are interested in a singleton value > (substitute_and_fold_enginge), while others are interested in ranges > (vr_values). Passes that are only interested in singletons can take a > value_query, while passes that are interested in full ranges, can take a > range_query. Of course, for future proofing, we would recommend taking > a range_query, since if you provide a default range_of_expr, sensible > defaults will be provided for the others in terms of range_of_expr. > > Note, that the absolute bare minimum that must be provided is a > value_of_expr and a range_of_expr respectively. > > One piece of the API which is missing is a method to return the range > of an arbitrary SSA_NAME *after* a statement. Currently range_of_expr > calculates the range of an expression upon entry to the statement, > whereas range_of_stmt calculates the range of *only* the LHS of a > statement AFTER the statement has executed. > > This would allow for complete representation of the ranges/values in > something like: > > d_4 = *g_7; > > Here the range of g_7 upon entry could be VARYING, but after the > dereference we know it must be non-zero. Well for sane targets anyhow. > > Choices would be to: > > 1) add a 4th method such as "range_after_stmt", or > > 2) merge that functionality with the existing range_of_stmt method to > provide "after" functionality for any ssa_name. Currently the SSA_NAME > must be the same as the LHS if specified. It also does not need to be > specified to allow evaluation of statements without a LHS, (if (x < 1) > has no LHS, but will return [0,0] [1,1] or [0,1] as there is a boolean > "result" to the statement. > > Also, we've debated whether to use the two classes (value_query and > range_query) separately as above, or perhaps join them into one class. I > personally like the separation of powers. Some users only care for > singletons, no sense in polluting their implementations with talk of > ranges. But we're open to suggestions. > > The patchset is divided into 3 parts: > > 1. Generic implementation of class. Just a few comments on the new APIs in part 1. Here I would again recommend preventing the copying and assigning the new range_query class. Ditto for equiv_allocator unless its base takes care of that. For value_query, since it has a pure virtual function, I would suggest to declare its default ctor protected (and = default). That way, if someone tries to define an object of it, they'll get a nice compilation error as opposed to a confusing linker error. The range_query default ctor uses the new epression to allocate and construct its equiv_allocator object but the range_query dtor just calls release on the object without invoking the delete expression on it. Unless the release() function does that (I'd hope not but I have seen it) that seems like a leak. Also, defining virtual functions to take default arguments tends to be tricky and is generally discouraged. Usually the default argument value from the base class is used, unless the overridden virtual function is called directly (either via Derived::foo() or via ptr_to_derived->foo()). To avoid surprises all overridden virtual functions need to have the same defaults as the base, which is easy to get wrong without compiler warnings. One way to get around this is to define a non-virtual public function with defaults in the base class and have it call a protected virtual without the defaults. Finally, unless both a type and function with the same name exist in the same scope there is no reason to mention the class-id when referencing a class name. I.e., this value_range_equiv *allocate_value_range_equiv (); void free_value_range_equiv (value_range_equiv *); is the same as this: class value_range_equiv *allocate_value_range_equiv (); void free_value_range_equiv (class value_range_equiv *); but the former is shorter and clearer (and in line with existing practice). I'm hoping to look at the remaining parts soon (especially part 3) but right now I'm being summoned to dinner. Martin > > 2. Conversion of vr_values and substitute_and_fold_engine to class. > > The conversion of the substitute_and_fold_engine class involved changing > all calls to get_value() to a corresponding method providing context. > For example, when the edge or statement is available, we pass these to > the appropriate method. > > With this, we've been able to plug in the ranger into our hybrid evrp, > which can then get better ranges. Our hybrid evrp has moved a lot of > optimizations that were happening much later into the early vrp stage. > > 3. Conversion of sprintf/strlen pass to class. > > This is a nonfunctional change to the sprintf/strlen passes. That is, > no effort was made to change the passes to multi-ranges. However, with > this patch, we are able to plug in a ranger or evrp with just a few > lines, since the range query mechanism share a common API. > > --------------------- > > We think this interface is generic enough to be used in any place that > queries ranges or singletons, basically any place we have *valueize* in > the compiler. It can also be used to replace range generators at will, > or even combine them to get the best of both worlds. For example, > places that currently have a value_* interface because they work with > constants, like CCP, could have an alternate valueizer plugged in and if > a range folds to a constant, CCP could then propagate that constant. > > Tested on x86-64 Linux. > > Aldy
On 9/24/20 1:53 AM, Martin Sebor wrote: > Finally, unless both a type and function with the same name exist > in the same scope there is no reason to mention the class-id when > referencing a class name. I.e., this > > value_range_equiv *allocate_value_range_equiv (); > void free_value_range_equiv (value_range_equiv *); > > is the same as this: > > class value_range_equiv *allocate_value_range_equiv (); > void free_value_range_equiv (class value_range_equiv *); > > but the former is shorter and clearer (and in line with existing > practice). value_range_equiv may not be defined in the scope of range-query.h, so that is why the class specifier is there. Aldy
On 9/24/20 12:46 AM, Aldy Hernandez wrote: > > > On 9/24/20 1:53 AM, Martin Sebor wrote: > >> Finally, unless both a type and function with the same name exist >> in the same scope there is no reason to mention the class-id when >> referencing a class name. I.e., this >> >> value_range_equiv *allocate_value_range_equiv (); >> void free_value_range_equiv (value_range_equiv *); >> >> is the same as this: >> >> class value_range_equiv *allocate_value_range_equiv (); >> void free_value_range_equiv (class value_range_equiv *); >> >> but the former is shorter and clearer (and in line with existing >> practice). > > value_range_equiv may not be defined in the scope of range-query.h, so > that is why the class specifier is there. I see. It's probably a reflection of my C++ background that this style catches my eye. In C++ I think it's more common to introduce a forward declaration of a class before using it. Just as a side note, the first declaration of a type introduces it into the enclosing namespace so that from that point forward it can be used without the class-id. E.g., this is valid: struct A { // Need class here... class B *f (); // ...but not here... void g (B *); }; // ...or even here: B* A::f () { return 0; } Either way, the code is correct as is and I don't object to it, just noting that (at least some of) the class-ids are redundant. Martin
On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote: > As part of the ranger work, we have been trying to clean up and > generalize interfaces whenever possible. This not only helps in > reducing the maintenance burden going forward, but provides mechanisms > for backwards compatibility between ranger and other providers/users of > ranges throughout the compiler like evrp and VRP. > > One such interface is the range_query class in vr_values.h, which > provides a range query mechanism for use in the simplify_using_ranges > module. With it, simplify_using_ranges can be used with the ranger, or > the VRP twins by providing a get_value_range() method. This has helped > us in comparing apples to apples while doing our work, and has also > future proofed the interface so that asking for a range can be done > within the context in which it appeared. For example, get_value_range > now takes a gimple statement which provides context. We are no longer > tied to asking for a global SSA range, but can ask for the range of an > SSA within a statement. Granted, this functionality is currently only > in the ranger, but evrp/vrp could be adapted to pass such context. > > The range_query is a good first step, but what we really want is a > generic query mechanism that can ask for SSA ranges within an > expression, a statement, an edge, or anything else that may come up. We > think that a generic mechanism can be used not only for range producers, > but consumers such as the substitute_and_fold_engine (see get_value > virtual) and possibly the gimple folder (see valueize). > > The attached patchset provides such an interface. It is meant to be a > replacement for range_query that can be used for vr_values, > substitute_and_fold, the subsitute_and_fold_engine, as well as the > ranger. The general API is: > > class value_query > { > public: > // Return the singleton expression for NAME at a gimple statement, > // or NULL if none found. > virtual tree value_of_expr (tree name, gimple * = NULL) = 0; > // Return the singleton expression for NAME at an edge, or NULL if > // none found. > virtual tree value_on_edge (edge, tree name); > // Return the singleton expression for the LHS of a gimple > // statement, assuming an (optional) initial value of NAME. Returns > // NULL if none found. > // > // Note this method calculates the range the LHS would have *after* > // the statement has executed. > virtual tree value_of_stmt (gimple *, tree name = NULL); > }; > > class range_query : public value_query > { > public: > range_query (); > virtual ~range_query (); > > virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE; > virtual tree value_on_edge (edge, tree name) OVERRIDE; > virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE; > > // These are the range equivalents of the value_* methods. Instead > // of returning a singleton, they calculate a range and return it in > // R. TRUE is returned on success or FALSE if no range was found. > virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0; > virtual bool range_on_edge (irange &r, edge, tree name); > virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL); > > // DEPRECATED: This method is used from vr-values. The plan is to > // rewrite all uses of it to the above API. > virtual const class value_range_equiv *get_value_range (const_tree, > gimple * = NULL); > }; > > The duality of the API (value_of_* and range_on_*) is because some > passes are interested in a singleton value > (substitute_and_fold_enginge), while others are interested in ranges > (vr_values). Passes that are only interested in singletons can take a > value_query, while passes that are interested in full ranges, can take a > range_query. Of course, for future proofing, we would recommend taking > a range_query, since if you provide a default range_of_expr, sensible > defaults will be provided for the others in terms of range_of_expr. > > Note, that the absolute bare minimum that must be provided is a > value_of_expr and a range_of_expr respectively. > > One piece of the API which is missing is a method to return the range > of an arbitrary SSA_NAME *after* a statement. Currently range_of_expr > calculates the range of an expression upon entry to the statement, > whereas range_of_stmt calculates the range of *only* the LHS of a > statement AFTER the statement has executed. > > This would allow for complete representation of the ranges/values in > something like: > > d_4 = *g_7; > > Here the range of g_7 upon entry could be VARYING, but after the > dereference we know it must be non-zero. Well for sane targets anyhow. > > Choices would be to: > > 1) add a 4th method such as "range_after_stmt", or > > 2) merge that functionality with the existing range_of_stmt method to > provide "after" functionality for any ssa_name. Currently the SSA_NAME > must be the same as the LHS if specified. It also does not need to be > specified to allow evaluation of statements without a LHS, (if (x < 1) > has no LHS, but will return [0,0] [1,1] or [0,1] as there is a boolean > "result" to the statement. > > Also, we've debated whether to use the two classes (value_query and > range_query) separately as above, or perhaps join them into one class. I > personally like the separation of powers. Some users only care for > singletons, no sense in polluting their implementations with talk of > ranges. But we're open to suggestions. > > The patchset is divided into 3 parts: > > 1. Generic implementation of class. > > 2. Conversion of vr_values and substitute_and_fold_engine to class. > > The conversion of the substitute_and_fold_engine class involved changing > all calls to get_value() to a corresponding method providing context. > For example, when the edge or statement is available, we pass these to > the appropriate method. > > With this, we've been able to plug in the ranger into our hybrid evrp, > which can then get better ranges. Our hybrid evrp has moved a lot of > optimizations that were happening much later into the early vrp stage. > > 3. Conversion of sprintf/strlen pass to class. > > This is a nonfunctional change to the sprintf/strlen passes. That is, > no effort was made to change the passes to multi-ranges. However, with > this patch, we are able to plug in a ranger or evrp with just a few > lines, since the range query mechanism share a common API. Thanks for doing all this! There isn't anything I don't understand in the sprintf changes so no questions from me (well, almost none). Just some comments: The current call statement is available in all functions that take a directive argument, as dir->info.callstmt. There should be no need to also add it as a new argument to the functions that now need it. The change adds code along these lines in a bunch of places: + value_range vr; + if (!query->range_of_expr (vr, arg, stmt)) + vr.set_varying (TREE_TYPE (arg)); I thought under the new Ranger APIs when a range couldn't be determined it would be automatically set to the maximum for the type. I like that and have been moving in that direction with my code myself (rather than having an API fail, have it set the max range and succeed). Since that isn't so in this case, I think it would still be nice if the added code could be written as if the range were set to varying in this case and (ideally) reduced to just initialization: value_range vr = some-function (query, stmt, arg); some-function could be an inline helper defined just for the sprintf pass (and maybe also strlen which also seems to use the same pattern), or it could be a value_range AKA irange ctor, or it could be a member of range_query, whatever is the most appropriate. (If assigning/copying a value_range is thought to be too expensive, declaring it first and then passing it to that helper to set it would work too). In strlen, is the removed comment no longer relevant? (I.e., does the ranger solve the problem?) - /* The range below may be "inaccurate" if a constant has been - substituted earlier for VAL by this pass that hasn't been - propagated through the CFG. This shoud be fixed by the new - on-demand VRP if/when it becomes available (hopefully in - GCC 11). */ I'm wondering about the comment added to get_range_strlen_dynamic and other places: + // FIXME: Use range_query instead of global ranges. Is that something you're planning to do in a followup or should I remember to do it at some point? Otherwise I have no concern with the changes. Thanks Martin > > --------------------- > > We think this interface is generic enough to be used in any place that > queries ranges or singletons, basically any place we have *valueize* in > the compiler. It can also be used to replace range generators at will, > or even combine them to get the best of both worlds. For example, > places that currently have a value_* interface because they work with > constants, like CCP, could have an alternate valueizer plugged in and if > a range folds to a constant, CCP could then propagate that constant. > > Tested on x86-64 Linux. > > Aldy
On 9/24/20 5:51 PM, Martin Sebor via Gcc-patches wrote: > On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote: >> >> >> 3. Conversion of sprintf/strlen pass to class. >> >> This is a nonfunctional change to the sprintf/strlen passes. That is, >> no effort was made to change the passes to multi-ranges. However, >> with this patch, we are able to plug in a ranger or evrp with just a >> few lines, since the range query mechanism share a common API. > > Thanks for doing all this! There isn't anything I don't understand > in the sprintf changes so no questions from me (well, almost none). > Just some comments: > > The current call statement is available in all functions that take > a directive argument, as dir->info.callstmt. There should be no need > to also add it as a new argument to the functions that now need it. > > The change adds code along these lines in a bunch of places: > > + value_range vr; > + if (!query->range_of_expr (vr, arg, stmt)) > + vr.set_varying (TREE_TYPE (arg)); > > I thought under the new Ranger APIs when a range couldn't be > determined it would be automatically set to the maximum for > the type. I like that and have been moving in that direction > with my code myself (rather than having an API fail, have it > set the max range and succeed). > Aldy will have to comment why that is there, probably an oversight The API should return VARYING if it cant calculate a better range. The only time the API returns a FALSE for a query is when the range is unsupported.. ie, you ask for the range of a float statement or argument. Andrew
On 9/23/20 7:53 PM, Martin Sebor via Gcc-patches wrote: > On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote: >> As part of the ranger work, we have been trying to clean up and >> generalize interfaces whenever possible. This not only helps in >> reducing the maintenance burden going forward, but provides >> mechanisms for backwards compatibility between ranger and other >> providers/users of ranges throughout the compiler like evrp and VRP. >> >> One such interface is the range_query class in vr_values.h, which >> provides a range query mechanism for use in the simplify_using_ranges >> module. With it, simplify_using_ranges can be used with the ranger, >> or the VRP twins by providing a get_value_range() method. This has >> helped us in comparing apples to apples while doing our work, and has >> also future proofed the interface so that asking for a range can be >> done within the context in which it appeared. For example, >> get_value_range now takes a gimple statement which provides context. >> We are no longer tied to asking for a global SSA range, but can ask >> for the range of an SSA within a statement. Granted, this >> functionality is currently only in the ranger, but evrp/vrp could be >> adapted to pass such context. >> >> The range_query is a good first step, but what we really want is a >> generic query mechanism that can ask for SSA ranges within an >> expression, a statement, an edge, or anything else that may come up. >> We think that a generic mechanism can be used not only for range >> producers, but consumers such as the substitute_and_fold_engine (see >> get_value virtual) and possibly the gimple folder (see valueize). >> >> The attached patchset provides such an interface. It is meant to be >> a replacement for range_query that can be used for vr_values, >> substitute_and_fold, the subsitute_and_fold_engine, as well as the >> ranger. The general API is: >> >> class value_query >> { >> public: >> // Return the singleton expression for NAME at a gimple statement, >> // or NULL if none found. >> virtual tree value_of_expr (tree name, gimple * = NULL) = 0; >> // Return the singleton expression for NAME at an edge, or NULL if >> // none found. >> virtual tree value_on_edge (edge, tree name); >> // Return the singleton expression for the LHS of a gimple >> // statement, assuming an (optional) initial value of NAME. Returns >> // NULL if none found. >> // >> // Note this method calculates the range the LHS would have *after* >> // the statement has executed. >> virtual tree value_of_stmt (gimple *, tree name = NULL); >> }; >> >> class range_query : public value_query >> { >> public: >> range_query (); >> virtual ~range_query (); >> >> virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE; >> virtual tree value_on_edge (edge, tree name) OVERRIDE; >> virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE; >> >> // These are the range equivalents of the value_* methods. Instead >> // of returning a singleton, they calculate a range and return it in >> // R. TRUE is returned on success or FALSE if no range was found. >> virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) >> = 0; >> virtual bool range_on_edge (irange &r, edge, tree name); >> virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL); >> >> // DEPRECATED: This method is used from vr-values. The plan is to >> // rewrite all uses of it to the above API. >> virtual const class value_range_equiv *get_value_range (const_tree, >> gimple * = NULL); >> }; >> >> The duality of the API (value_of_* and range_on_*) is because some >> passes are interested in a singleton value >> (substitute_and_fold_enginge), while others are interested in ranges >> (vr_values). Passes that are only interested in singletons can take >> a value_query, while passes that are interested in full ranges, can >> take a range_query. Of course, for future proofing, we would >> recommend taking a range_query, since if you provide a default >> range_of_expr, sensible defaults will be provided for the others in >> terms of range_of_expr. >> >> Note, that the absolute bare minimum that must be provided is a >> value_of_expr and a range_of_expr respectively. >> >> One piece of the API which is missing is a method to return the >> range of an arbitrary SSA_NAME *after* a statement. Currently >> range_of_expr calculates the range of an expression upon entry to the >> statement, whereas range_of_stmt calculates the range of *only* the >> LHS of a statement AFTER the statement has executed. >> >> This would allow for complete representation of the ranges/values in >> something like: >> >> d_4 = *g_7; >> >> Here the range of g_7 upon entry could be VARYING, but after the >> dereference we know it must be non-zero. Well for sane targets anyhow. >> >> Choices would be to: >> >> 1) add a 4th method such as "range_after_stmt", or >> >> 2) merge that functionality with the existing range_of_stmt method >> to provide "after" functionality for any ssa_name. Currently the >> SSA_NAME must be the same as the LHS if specified. It also does not >> need to be specified to allow evaluation of statements without a LHS, >> (if (x < 1) has no LHS, but will return [0,0] [1,1] or [0,1] as there >> is a boolean "result" to the statement. >> >> Also, we've debated whether to use the two classes (value_query and >> range_query) separately as above, or perhaps join them into one >> class. I personally like the separation of powers. Some users only >> care for singletons, no sense in polluting their implementations with >> talk of ranges. But we're open to suggestions. >> >> The patchset is divided into 3 parts: >> >> 1. Generic implementation of class. > > Just a few comments on the new APIs in part 1. > > Here I would again recommend preventing the copying and assigning > the new range_query class. Ditto for equiv_allocator unless its > base takes care of that. For value_query, since it has a pure > virtual function, I would suggest to declare its default ctor > protected (and = default). That way, if someone tries to define > an object of it, they'll get a nice compilation error as opposed > to a confusing linker error. > > The range_query default ctor uses the new epression to allocate > and construct its equiv_allocator object but the range_query dtor > just calls release on the object without invoking the delete > expression on it. Unless the release() function does that (I'd > hope not but I have seen it) that seems like a leak. > looks like an oversight, definitely call delete. > Also, defining virtual functions to take default arguments tends > to be tricky and is generally discouraged. Usually the default > argument value from the base class is used, unless the overridden > virtual function is called directly (either via Derived::foo() > or via ptr_to_derived->foo()). To avoid surprises all overridden > virtual functions need to have the same defaults as the base, > which is easy to get wrong without compiler warnings. > > One way to get around this is to define a non-virtual public > function with defaults in the base class and have it call > a protected virtual without the defaults. > > I'm not overly concerned about this.. If there is a default, its always NULL, and it is simply to bypass having multiple versions of the routine which come with varies degreees of confusion of its own. I experimented with multiple approaches, including that one.. I found each thing was a little awkward one way or another. Given a NULL default is going to be pretty consistent, I'd leave it be for now. Its primarily for legacy code, and when we nuke those uses down the road, we could consider removing the default, forcing the client to explicitly provide the NULL indicating the know they are looking for a global value, not a contextual one. Since you have replied to this thread, whats your opinion whether there should be an extra API entry point for range/value_after_stmt or whether that should be rolled into the range_of_stmt routine, and any ssa-name specified would be the it's range after the stmt.. perhaps renaming it to something like "range_post_stmt" and if no ssa-name is specified, it applies to the stmt result. Andrew
On 9/25/20 3:17 PM, Andrew MacLeod wrote: > On 9/24/20 5:51 PM, Martin Sebor via Gcc-patches wrote: >> On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote: >>> >>> >>> 3. Conversion of sprintf/strlen pass to class. >>> >>> This is a nonfunctional change to the sprintf/strlen passes. That is, >>> no effort was made to change the passes to multi-ranges. However, >>> with this patch, we are able to plug in a ranger or evrp with just a >>> few lines, since the range query mechanism share a common API. >> >> Thanks for doing all this! There isn't anything I don't understand >> in the sprintf changes so no questions from me (well, almost none). >> Just some comments: >> >> The current call statement is available in all functions that take >> a directive argument, as dir->info.callstmt. There should be no need >> to also add it as a new argument to the functions that now need it. >> >> The change adds code along these lines in a bunch of places: >> >> + value_range vr; >> + if (!query->range_of_expr (vr, arg, stmt)) >> + vr.set_varying (TREE_TYPE (arg)); >> >> I thought under the new Ranger APIs when a range couldn't be >> determined it would be automatically set to the maximum for >> the type. I like that and have been moving in that direction >> with my code myself (rather than having an API fail, have it >> set the max range and succeed). >> > Aldy will have to comment why that is there, probably an oversight The > API should return VARYING if it cant calculate a better range. The only > time the API returns a FALSE for a query is when the range is > unsupported.. ie, you ask for the range of a float statement or argument. Right on both counts. We could ignore the return value, or my preferred method (which I know Andrew hates): gcc_assert (query->range_of_expr(vr, arg, stmt)); Aldy
On 9/25/20 11:41 AM, Andrew MacLeod wrote: > On 9/23/20 7:53 PM, Martin Sebor via Gcc-patches wrote: >> On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote: >>> As part of the ranger work, we have been trying to clean up and >>> generalize interfaces whenever possible. This not only helps in >>> reducing the maintenance burden going forward, but provides >>> mechanisms for backwards compatibility between ranger and other >>> providers/users of ranges throughout the compiler like evrp and VRP. >>> >>> One such interface is the range_query class in vr_values.h, which >>> provides a range query mechanism for use in the simplify_using_ranges >>> module. With it, simplify_using_ranges can be used with the ranger, >>> or the VRP twins by providing a get_value_range() method. This has >>> helped us in comparing apples to apples while doing our work, and has >>> also future proofed the interface so that asking for a range can be >>> done within the context in which it appeared. For example, >>> get_value_range now takes a gimple statement which provides context. >>> We are no longer tied to asking for a global SSA range, but can ask >>> for the range of an SSA within a statement. Granted, this >>> functionality is currently only in the ranger, but evrp/vrp could be >>> adapted to pass such context. >>> >>> The range_query is a good first step, but what we really want is a >>> generic query mechanism that can ask for SSA ranges within an >>> expression, a statement, an edge, or anything else that may come up. >>> We think that a generic mechanism can be used not only for range >>> producers, but consumers such as the substitute_and_fold_engine (see >>> get_value virtual) and possibly the gimple folder (see valueize). >>> >>> The attached patchset provides such an interface. It is meant to be >>> a replacement for range_query that can be used for vr_values, >>> substitute_and_fold, the subsitute_and_fold_engine, as well as the >>> ranger. The general API is: >>> >>> class value_query >>> { >>> public: >>> // Return the singleton expression for NAME at a gimple statement, >>> // or NULL if none found. >>> virtual tree value_of_expr (tree name, gimple * = NULL) = 0; >>> // Return the singleton expression for NAME at an edge, or NULL if >>> // none found. >>> virtual tree value_on_edge (edge, tree name); >>> // Return the singleton expression for the LHS of a gimple >>> // statement, assuming an (optional) initial value of NAME. Returns >>> // NULL if none found. >>> // >>> // Note this method calculates the range the LHS would have *after* >>> // the statement has executed. >>> virtual tree value_of_stmt (gimple *, tree name = NULL); >>> }; >>> >>> class range_query : public value_query >>> { >>> public: >>> range_query (); >>> virtual ~range_query (); >>> >>> virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE; >>> virtual tree value_on_edge (edge, tree name) OVERRIDE; >>> virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE; >>> >>> // These are the range equivalents of the value_* methods. Instead >>> // of returning a singleton, they calculate a range and return it in >>> // R. TRUE is returned on success or FALSE if no range was found. >>> virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) >>> = 0; >>> virtual bool range_on_edge (irange &r, edge, tree name); >>> virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL); >>> >>> // DEPRECATED: This method is used from vr-values. The plan is to >>> // rewrite all uses of it to the above API. >>> virtual const class value_range_equiv *get_value_range (const_tree, >>> gimple * = NULL); >>> }; >>> >>> The duality of the API (value_of_* and range_on_*) is because some >>> passes are interested in a singleton value >>> (substitute_and_fold_enginge), while others are interested in ranges >>> (vr_values). Passes that are only interested in singletons can take >>> a value_query, while passes that are interested in full ranges, can >>> take a range_query. Of course, for future proofing, we would >>> recommend taking a range_query, since if you provide a default >>> range_of_expr, sensible defaults will be provided for the others in >>> terms of range_of_expr. >>> >>> Note, that the absolute bare minimum that must be provided is a >>> value_of_expr and a range_of_expr respectively. >>> >>> One piece of the API which is missing is a method to return the >>> range of an arbitrary SSA_NAME *after* a statement. Currently >>> range_of_expr calculates the range of an expression upon entry to the >>> statement, whereas range_of_stmt calculates the range of *only* the >>> LHS of a statement AFTER the statement has executed. >>> >>> This would allow for complete representation of the ranges/values in >>> something like: >>> >>> d_4 = *g_7; >>> >>> Here the range of g_7 upon entry could be VARYING, but after the >>> dereference we know it must be non-zero. Well for sane targets anyhow. >>> >>> Choices would be to: >>> >>> 1) add a 4th method such as "range_after_stmt", or >>> >>> 2) merge that functionality with the existing range_of_stmt method >>> to provide "after" functionality for any ssa_name. Currently the >>> SSA_NAME must be the same as the LHS if specified. It also does not >>> need to be specified to allow evaluation of statements without a LHS, >>> (if (x < 1) has no LHS, but will return [0,0] [1,1] or [0,1] as there >>> is a boolean "result" to the statement. >>> >>> Also, we've debated whether to use the two classes (value_query and >>> range_query) separately as above, or perhaps join them into one >>> class. I personally like the separation of powers. Some users only >>> care for singletons, no sense in polluting their implementations with >>> talk of ranges. But we're open to suggestions. >>> >>> The patchset is divided into 3 parts: >>> >>> 1. Generic implementation of class. >> >> Just a few comments on the new APIs in part 1. >> >> Here I would again recommend preventing the copying and assigning >> the new range_query class. Ditto for equiv_allocator unless its >> base takes care of that. For value_query, since it has a pure >> virtual function, I would suggest to declare its default ctor >> protected (and = default). That way, if someone tries to define >> an object of it, they'll get a nice compilation error as opposed >> to a confusing linker error. >> >> The range_query default ctor uses the new epression to allocate >> and construct its equiv_allocator object but the range_query dtor >> just calls release on the object without invoking the delete >> expression on it. Unless the release() function does that (I'd >> hope not but I have seen it) that seems like a leak. >> > > looks like an oversight, definitely call delete. > >> Also, defining virtual functions to take default arguments tends >> to be tricky and is generally discouraged. Usually the default >> argument value from the base class is used, unless the overridden >> virtual function is called directly (either via Derived::foo() >> or via ptr_to_derived->foo()). To avoid surprises all overridden >> virtual functions need to have the same defaults as the base, >> which is easy to get wrong without compiler warnings. >> > >> One way to get around this is to define a non-virtual public >> function with defaults in the base class and have it call >> a protected virtual without the defaults. >> >> > I'm not overly concerned about this.. If there is a default, its always > NULL, and it is simply to bypass having multiple versions of the routine > which come with varies degreees of confusion of its own. I experimented > with multiple approaches, including that one.. I found each thing was a > little awkward one way or another. > > Given a NULL default is going to be pretty consistent, I'd leave it be > for now. Its primarily for legacy code, and when we nuke those uses > down the road, we could consider removing the default, forcing the > client to explicitly provide the NULL indicating the know they are > looking for a global value, not a contextual one. I agree that the risk with the null is lower than with other default values. I also don't have a sense of how common deriving from these classes will be so if it's expected to be rare it's probably not something to be concerned about. > Since you have replied to this thread, whats your opinion whether there > should be an extra API entry point for range/value_after_stmt or whether > that should be rolled into the range_of_stmt routine, and any ssa-name > specified would be the it's range after the stmt.. perhaps renaming it > to something like "range_post_stmt" and if no ssa-name is specified, it > applies to the stmt result. I prefer API parameterization over distinct names. It makes APIs easier for me to reason about, and when implementing them, often lets me avoid duplicating code between. A good rule of thumb is to ask if it might ever make sense to defer the decision to call one or the other alternative until runtime. If the answer isn't an unequivocal "no" then a single entry point with a parameter is probably a better choice. By the way, your question about API design makes me think about something else. A range is a generalization of a singleton value (it's a collection of singletons, which is particularly true with the new multi-range design). Conversely, a singleton value is a specialization of a range. It always makes sense to ask for the range of an expression, even if it evaluates to a constant. It only makes sense to ask for its value if it's know to evaluate to a constant. So if it's expected to be useful to make a distinction between the two concepts in the type system I would think it natural to model the relationship by having value_query derive from range_query rather than the other way around. Martin
On 9/28/20 11:27 AM, Martin Sebor wrote: > On 9/25/20 11:41 AM, Andrew MacLeod wrote: > >> Since you have replied to this thread, whats your opinion whether >> there should be an extra API entry point for range/value_after_stmt >> or whether that should be rolled into the range_of_stmt routine, and >> any ssa-name specified would be the it's range after the stmt.. >> perhaps renaming it to something like "range_post_stmt" and if no >> ssa-name is specified, it applies to the stmt result. > > I prefer API parameterization over distinct names. It makes APIs > easier for me to reason about, and when implementing them, often > lets me avoid duplicating code between. A good rule of thumb is > to ask if it might ever make sense to defer the decision to call > one or the other alternative until runtime. If the answer isn't > an unequivocal "no" then a single entry point with a parameter is > probably a better choice. > > By the way, your question about API design makes me think about > something else. A range is a generalization of a singleton value > (it's a collection of singletons, which is particularly true with > the new multi-range design). Conversely, a singleton value is > a specialization of a range. It always makes sense to ask for > the range of an expression, even if it evaluates to a constant. > It only makes sense to ask for its value if it's know to evaluate > to a constant. So if it's expected to be useful to make > a distinction between the two concepts in the type system I would > think it natural to model the relationship by having value_query > derive from range_query rather than the other way around. > We've considered a single class for queries rather than the existing derived structure, but it all depends on your point of view. You can have values without knowing about ranges, but you can't have ranges without knowing about values. Its persepective, so there isnt a really a "right" answer. Either can be implemented in terms of the other. Lots of places in the compiler deal with value callbacks, only a few know or care about ranges, and those usually care about both ranges and values... thus the current structure. Regardless, its a starting point and easy to combine as a single class later should that be worthwhile. Andrew
On 9/25/20 1:41 PM, Andrew MacLeod via Gcc-patches wrote: > On 9/23/20 7:53 PM, Martin Sebor via Gcc-patches wrote: >> On 9/18/20 12:38 PM, Aldy Hernandez via Gcc-patches wrote: >>> As part of the ranger work, we have been trying to clean up and >>> generalize interfaces whenever possible. This not only helps in >>> reducing the maintenance burden going forward, but provides >>> mechanisms for backwards compatibility between ranger and other >>> providers/users of ranges throughout the compiler like evrp and VRP. >>> >>> One such interface is the range_query class in vr_values.h, which >>> provides a range query mechanism for use in the >>> simplify_using_ranges module. With it, simplify_using_ranges can be >>> used with the ranger, or the VRP twins by providing a >>> get_value_range() method. This has helped us in comparing apples to >>> apples while doing our work, and has also future proofed the >>> interface so that asking for a range can be done within the context >>> in which it appeared. For example, get_value_range now takes a >>> gimple statement which provides context. We are no longer tied to >>> asking for a global SSA range, but can ask for the range of an SSA >>> within a statement. Granted, this functionality is currently only in >>> the ranger, but evrp/vrp could be adapted to pass such context. >>> >>> The range_query is a good first step, but what we really want is a >>> generic query mechanism that can ask for SSA ranges within an >>> expression, a statement, an edge, or anything else that may come >>> up. We think that a generic mechanism can be used not only for >>> range producers, but consumers such as the >>> substitute_and_fold_engine (see get_value virtual) and possibly the >>> gimple folder (see valueize). >>> >>> The attached patchset provides such an interface. It is meant to be >>> a replacement for range_query that can be used for vr_values, >>> substitute_and_fold, the subsitute_and_fold_engine, as well as the >>> ranger. The general API is: >>> >>> class value_query >>> { >>> public: >>> // Return the singleton expression for NAME at a gimple statement, >>> // or NULL if none found. >>> virtual tree value_of_expr (tree name, gimple * = NULL) = 0; >>> // Return the singleton expression for NAME at an edge, or NULL if >>> // none found. >>> virtual tree value_on_edge (edge, tree name); >>> // Return the singleton expression for the LHS of a gimple >>> // statement, assuming an (optional) initial value of NAME. Returns >>> // NULL if none found. >>> // >>> // Note this method calculates the range the LHS would have *after* >>> // the statement has executed. >>> virtual tree value_of_stmt (gimple *, tree name = NULL); >>> }; >>> >>> class range_query : public value_query >>> { >>> public: >>> range_query (); >>> virtual ~range_query (); >>> >>> virtual tree value_of_expr (tree name, gimple * = NULL) OVERRIDE; >>> virtual tree value_on_edge (edge, tree name) OVERRIDE; >>> virtual tree value_of_stmt (gimple *, tree name = NULL) OVERRIDE; >>> >>> // These are the range equivalents of the value_* methods. Instead >>> // of returning a singleton, they calculate a range and return it in >>> // R. TRUE is returned on success or FALSE if no range was found. >>> virtual bool range_of_expr (irange &r, tree name, gimple * = >>> NULL) = 0; >>> virtual bool range_on_edge (irange &r, edge, tree name); >>> virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL); >>> >>> // DEPRECATED: This method is used from vr-values. The plan is to >>> // rewrite all uses of it to the above API. >>> virtual const class value_range_equiv *get_value_range (const_tree, >>> gimple * = NULL); >>> }; >>> >>> The duality of the API (value_of_* and range_on_*) is because some >>> passes are interested in a singleton value >>> (substitute_and_fold_enginge), while others are interested in ranges >>> (vr_values). Passes that are only interested in singletons can take >>> a value_query, while passes that are interested in full ranges, can >>> take a range_query. Of course, for future proofing, we would >>> recommend taking a range_query, since if you provide a default >>> range_of_expr, sensible defaults will be provided for the others in >>> terms of range_of_expr. >>> >>> Note, that the absolute bare minimum that must be provided is a >>> value_of_expr and a range_of_expr respectively. >>> >>> One piece of the API which is missing is a method to return the >>> range of an arbitrary SSA_NAME *after* a statement. Currently >>> range_of_expr calculates the range of an expression upon entry to >>> the statement, whereas range_of_stmt calculates the range of *only* >>> the LHS of a statement AFTER the statement has executed. >>> >>> This would allow for complete representation of the ranges/values in >>> something like: >>> >>> d_4 = *g_7; >>> >>> Here the range of g_7 upon entry could be VARYING, but after the >>> dereference we know it must be non-zero. Well for sane targets anyhow. >>> >>> Choices would be to: >>> >>> 1) add a 4th method such as "range_after_stmt", or >>> >>> 2) merge that functionality with the existing range_of_stmt >>> method to provide "after" functionality for any ssa_name. Currently >>> the SSA_NAME must be the same as the LHS if specified. It also does >>> not need to be specified to allow evaluation of statements without a >>> LHS, (if (x < 1) has no LHS, but will return [0,0] [1,1] or [0,1] as >>> there is a boolean "result" to the statement. >>> >>> Also, we've debated whether to use the two classes (value_query and >>> range_query) separately as above, or perhaps join them into one >>> class. I personally like the separation of powers. Some users only >>> care for singletons, no sense in polluting their implementations >>> with talk of ranges. But we're open to suggestions. >>> >>> The patchset is divided into 3 parts: >>> >>> 1. Generic implementation of class. >> >> Just a few comments on the new APIs in part 1. >> >> Here I would again recommend preventing the copying and assigning >> the new range_query class. Ditto for equiv_allocator unless its >> base takes care of that. For value_query, since it has a pure >> virtual function, I would suggest to declare its default ctor >> protected (and = default). That way, if someone tries to define >> an object of it, they'll get a nice compilation error as opposed >> to a confusing linker error. >> >> The range_query default ctor uses the new epression to allocate >> and construct its equiv_allocator object but the range_query dtor >> just calls release on the object without invoking the delete >> expression on it. Unless the release() function does that (I'd >> hope not but I have seen it) that seems like a leak. >> > > looks like an oversight, definitely call delete. > >> Also, defining virtual functions to take default arguments tends >> to be tricky and is generally discouraged. Usually the default >> argument value from the base class is used, unless the overridden >> virtual function is called directly (either via Derived::foo() >> or via ptr_to_derived->foo()). To avoid surprises all overridden >> virtual functions need to have the same defaults as the base, >> which is easy to get wrong without compiler warnings. >> > >> One way to get around this is to define a non-virtual public >> function with defaults in the base class and have it call >> a protected virtual without the defaults. >> >> > I'm not overly concerned about this.. If there is a default, its > always NULL, and it is simply to bypass having multiple versions of > the routine which come with varies degreees of confusion of its own. > I experimented with multiple approaches, including that one.. I found > each thing was a little awkward one way or another. > > Given a NULL default is going to be pretty consistent, I'd leave it be > for now. Its primarily for legacy code, and when we nuke those uses > down the road, we could consider removing the default, forcing the > client to explicitly provide the NULL indicating the know they are > looking for a global value, not a contextual one. > > > Since you have replied to this thread, whats your opinion whether > there should be an extra API entry point for range/value_after_stmt or > whether that should be rolled into the range_of_stmt routine, and any > ssa-name specified would be the it's range after the stmt.. perhaps > renaming it to something like "range_post_stmt" and if no ssa-name is > specified, it applies to the stmt result. > > Andrew > > With the change to add the delete and DISABLE_COPY_AND_ASSIGN this is OK. Its not in widespread use and easy enough to tweak later. Andrew
> Thanks for doing all this! There isn't anything I don't understand > in the sprintf changes so no questions from me (well, almost none). > Just some comments: Thanks for your comments on the sprintf/strlen API conversion. > > The current call statement is available in all functions that take > a directive argument, as dir->info.callstmt. There should be no need > to also add it as a new argument to the functions that now need it. Fixed. > > The change adds code along these lines in a bunch of places: > > + value_range vr; > + if (!query->range_of_expr (vr, arg, stmt)) > + vr.set_varying (TREE_TYPE (arg)); > > I thought under the new Ranger APIs when a range couldn't be > determined it would be automatically set to the maximum for > the type. I like that and have been moving in that direction > with my code myself (rather than having an API fail, have it > set the max range and succeed). I went through all the above idioms and noticed all are being used on supported types (integers or pointers). So range_of_expr will always return true. I've removed the if() and the set_varying. > > Since that isn't so in this case, I think it would still be nice > if the added code could be written as if the range were set to > varying in this case and (ideally) reduced to just initialization: > > value_range vr = some-function (query, stmt, arg); > > some-function could be an inline helper defined just for the sprintf > pass (and maybe also strlen which also seems to use the same pattern), > or it could be a value_range AKA irange ctor, or it could be a member > of range_query, whatever is the most appropriate. > > (If assigning/copying a value_range is thought to be too expensive, > declaring it first and then passing it to that helper to set it > would work too). > > In strlen, is the removed comment no longer relevant? (I.e., does > the ranger solve the problem?) > > - /* The range below may be "inaccurate" if a constant has been > - substituted earlier for VAL by this pass that hasn't been > - propagated through the CFG. This shoud be fixed by the new > - on-demand VRP if/when it becomes available (hopefully in > - GCC 11). */ It should. > > I'm wondering about the comment added to get_range_strlen_dynamic > and other places: > > + // FIXME: Use range_query instead of global ranges. > > Is that something you're planning to do in a followup or should > I remember to do it at some point? I'm not planning on doing it. It's just a reminder that it would be beneficial to do so. > > Otherwise I have no concern with the changes. It's not cleared whether Andrew approved all 3 parts of the patchset or just the valuation part. I'll wait for his nod before committing this chunk. Aldy
On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote: >> Thanks for doing all this! There isn't anything I don't understand >> in the sprintf changes so no questions from me (well, almost none). >> Just some comments: > Thanks for your comments on the sprintf/strlen API conversion. > >> The current call statement is available in all functions that take >> a directive argument, as dir->info.callstmt. There should be no need >> to also add it as a new argument to the functions that now need it. > Fixed. > >> The change adds code along these lines in a bunch of places: >> >> + value_range vr; >> + if (!query->range_of_expr (vr, arg, stmt)) >> + vr.set_varying (TREE_TYPE (arg)); >> >> I thought under the new Ranger APIs when a range couldn't be >> determined it would be automatically set to the maximum for >> the type. I like that and have been moving in that direction >> with my code myself (rather than having an API fail, have it >> set the max range and succeed). > I went through all the above idioms and noticed all are being used on > supported types (integers or pointers). So range_of_expr will always > return true. I've removed the if() and the set_varying. > >> Since that isn't so in this case, I think it would still be nice >> if the added code could be written as if the range were set to >> varying in this case and (ideally) reduced to just initialization: >> >> value_range vr = some-function (query, stmt, arg); >> >> some-function could be an inline helper defined just for the sprintf >> pass (and maybe also strlen which also seems to use the same pattern), >> or it could be a value_range AKA irange ctor, or it could be a member >> of range_query, whatever is the most appropriate. >> >> (If assigning/copying a value_range is thought to be too expensive, >> declaring it first and then passing it to that helper to set it >> would work too). >> >> In strlen, is the removed comment no longer relevant? (I.e., does >> the ranger solve the problem?) >> >> - /* The range below may be "inaccurate" if a constant has been >> - substituted earlier for VAL by this pass that hasn't been >> - propagated through the CFG. This shoud be fixed by the new >> - on-demand VRP if/when it becomes available (hopefully in >> - GCC 11). */ > It should. > >> I'm wondering about the comment added to get_range_strlen_dynamic >> and other places: >> >> + // FIXME: Use range_query instead of global ranges. >> >> Is that something you're planning to do in a followup or should >> I remember to do it at some point? > I'm not planning on doing it. It's just a reminder that it would be > beneficial to do so. > >> Otherwise I have no concern with the changes. > It's not cleared whether Andrew approved all 3 parts of the patchset > or just the valuation part. I'll wait for his nod before committing > this chunk. > > Aldy > I have no issue with it, so OK. Just an observation that should be pointed out, I believe Aldy has all the code for converting to a ranger, but we have not pursued that any further yet since there is a regression due to our lack of equivalence processing I think? That should be resolved in the coming month, but at the moment is a holdback/concern for converting these passes... iirc. Andrew
On 10/1/20 3:22 PM, Andrew MacLeod wrote: > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote: >>> Thanks for doing all this! There isn't anything I don't understand >>> in the sprintf changes so no questions from me (well, almost none). >>> Just some comments: >> Thanks for your comments on the sprintf/strlen API conversion. >> >>> The current call statement is available in all functions that take >>> a directive argument, as dir->info.callstmt. There should be no need >>> to also add it as a new argument to the functions that now need it. >> Fixed. >> >>> The change adds code along these lines in a bunch of places: >>> >>> + value_range vr; >>> + if (!query->range_of_expr (vr, arg, stmt)) >>> + vr.set_varying (TREE_TYPE (arg)); >>> >>> I thought under the new Ranger APIs when a range couldn't be >>> determined it would be automatically set to the maximum for >>> the type. I like that and have been moving in that direction >>> with my code myself (rather than having an API fail, have it >>> set the max range and succeed). >> I went through all the above idioms and noticed all are being used on >> supported types (integers or pointers). So range_of_expr will always >> return true. I've removed the if() and the set_varying. >> >>> Since that isn't so in this case, I think it would still be nice >>> if the added code could be written as if the range were set to >>> varying in this case and (ideally) reduced to just initialization: >>> >>> value_range vr = some-function (query, stmt, arg); >>> >>> some-function could be an inline helper defined just for the sprintf >>> pass (and maybe also strlen which also seems to use the same pattern), >>> or it could be a value_range AKA irange ctor, or it could be a member >>> of range_query, whatever is the most appropriate. >>> >>> (If assigning/copying a value_range is thought to be too expensive, >>> declaring it first and then passing it to that helper to set it >>> would work too). >>> >>> In strlen, is the removed comment no longer relevant? (I.e., does >>> the ranger solve the problem?) >>> >>> - /* The range below may be "inaccurate" if a constant has been >>> - substituted earlier for VAL by this pass that hasn't been >>> - propagated through the CFG. This shoud be fixed by the new >>> - on-demand VRP if/when it becomes available (hopefully in >>> - GCC 11). */ >> It should. >> >>> I'm wondering about the comment added to get_range_strlen_dynamic >>> and other places: >>> >>> + // FIXME: Use range_query instead of global ranges. >>> >>> Is that something you're planning to do in a followup or should >>> I remember to do it at some point? >> I'm not planning on doing it. It's just a reminder that it would be >> beneficial to do so. >> >>> Otherwise I have no concern with the changes. >> It's not cleared whether Andrew approved all 3 parts of the patchset >> or just the valuation part. I'll wait for his nod before committing >> this chunk. >> >> Aldy >> > I have no issue with it, so OK. Pushed all 3 patches. > > Just an observation that should be pointed out, I believe Aldy has all > the code for converting to a ranger, but we have not pursued that any > further yet since there is a regression due to our lack of equivalence > processing I think? That should be resolved in the coming month, but at > the moment is a holdback/concern for converting these passes... iirc. Yes. Martin, the take away here is that the strlen/sprintf pass has been converted to the new API, but ranger is still not up and running on it (even on the branch). With the new API, all you have to do is remove all instances of evrp_range_analyzer and replace them with a ranger. That's it. Below is an untested patch that would convert you to a ranger once it's contributed. IIRC when I enabled the ranger for your pass a while back, there was one or two regressions due to missing equivalences, and the rest were because the tests were expecting an actual specific range, and the ranger returned a slightly different/better one. You'll need to adjust your tests. Aldy diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c index f4d1c5ca256..9f9e95b7155 100644 --- a/gcc/tree-ssa-strlen.c +++ b/gcc/tree-ssa-strlen.c @@ -58,8 +58,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa-loop.h" #include "tree-scalar-evolution.h" #include "vr-values.h" -#include "gimple-ssa-evrp-analyze.h" #include "tree-ssa.h" +#include "gimple-range.h" /* A vector indexed by SSA_NAME_VERSION. 0 means unknown, positive value is an index into strinfo vector, negative value stands for @@ -5755,16 +5755,13 @@ class strlen_dom_walker : public dom_walker public: strlen_dom_walker (cdi_direction direction) : dom_walker (direction), - evrp (false), m_cleanup_cfg (false) {} virtual edge before_dom_children (basic_block); virtual void after_dom_children (basic_block); - /* EVRP analyzer used for printf argument range processing, and - to track strlen results across integer variable assignments. */ - evrp_range_analyzer evrp; + gimple_ranger ranger; /* Flag that will trigger TODO_cleanup_cfg to be returned in strlen execute function. */ @@ -5777,8 +5774,6 @@ public: edge strlen_dom_walker::before_dom_children (basic_block bb) { - evrp.enter (bb); - basic_block dombb = get_immediate_dominator (CDI_DOMINATORS, bb); if (dombb == NULL) @@ -5853,13 +5848,7 @@ strlen_dom_walker::before_dom_children (basic_block bb) /* Attempt to optimize individual statements. */ for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); ) { - gimple *stmt = gsi_stmt (gsi); - - /* First record ranges generated by this statement so they - can be used by printf argument processing. */ - evrp.record_ranges_from_stmt (stmt, false); - - if (check_and_optimize_stmt (&gsi, &cleanup_eh, &evrp)) + if (check_and_optimize_stmt (&gsi, &cleanup_eh, &ranger)) gsi_next (&gsi); } @@ -5878,8 +5867,6 @@ strlen_dom_walker::before_dom_children (basic_block bb) void strlen_dom_walker::after_dom_children (basic_block bb) { - evrp.leave (bb); - if (bb->aux) { stridx_to_strinfo = ((vec<strinfo *, va_heap, vl_embed> *) bb->aux);
On 10/1/20 9:34 AM, Aldy Hernandez wrote: > > > On 10/1/20 3:22 PM, Andrew MacLeod wrote: > > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote: > >>> Thanks for doing all this! There isn't anything I don't understand > >>> in the sprintf changes so no questions from me (well, almost none). > >>> Just some comments: > >> Thanks for your comments on the sprintf/strlen API conversion. > >> > >>> The current call statement is available in all functions that take > >>> a directive argument, as dir->info.callstmt. There should be no need > >>> to also add it as a new argument to the functions that now need it. > >> Fixed. > >> > >>> The change adds code along these lines in a bunch of places: > >>> > >>> + value_range vr; > >>> + if (!query->range_of_expr (vr, arg, stmt)) > >>> + vr.set_varying (TREE_TYPE (arg)); > >>> > >>> I thought under the new Ranger APIs when a range couldn't be > >>> determined it would be automatically set to the maximum for > >>> the type. I like that and have been moving in that direction > >>> with my code myself (rather than having an API fail, have it > >>> set the max range and succeed). > >> I went through all the above idioms and noticed all are being used on > >> supported types (integers or pointers). So range_of_expr will always > >> return true. I've removed the if() and the set_varying. > >> > >>> Since that isn't so in this case, I think it would still be nice > >>> if the added code could be written as if the range were set to > >>> varying in this case and (ideally) reduced to just initialization: > >>> > >>> value_range vr = some-function (query, stmt, arg); > >>> > >>> some-function could be an inline helper defined just for the sprintf > >>> pass (and maybe also strlen which also seems to use the same pattern), > >>> or it could be a value_range AKA irange ctor, or it could be a member > >>> of range_query, whatever is the most appropriate. > >>> > >>> (If assigning/copying a value_range is thought to be too expensive, > >>> declaring it first and then passing it to that helper to set it > >>> would work too). > >>> > >>> In strlen, is the removed comment no longer relevant? (I.e., does > >>> the ranger solve the problem?) > >>> > >>> - /* The range below may be "inaccurate" if a constant has been > >>> - substituted earlier for VAL by this pass that hasn't been > >>> - propagated through the CFG. This shoud be fixed by the new > >>> - on-demand VRP if/when it becomes available (hopefully in > >>> - GCC 11). */ > >> It should. > >> > >>> I'm wondering about the comment added to get_range_strlen_dynamic > >>> and other places: > >>> > >>> + // FIXME: Use range_query instead of global ranges. > >>> > >>> Is that something you're planning to do in a followup or should > >>> I remember to do it at some point? > >> I'm not planning on doing it. It's just a reminder that it would be > >> beneficial to do so. > >> > >>> Otherwise I have no concern with the changes. > >> It's not cleared whether Andrew approved all 3 parts of the patchset > >> or just the valuation part. I'll wait for his nod before committing > >> this chunk. > >> > >> Aldy > >> > > I have no issue with it, so OK. > > Pushed all 3 patches. > > > > > Just an observation that should be pointed out, I believe Aldy has all > > the code for converting to a ranger, but we have not pursued that any > > further yet since there is a regression due to our lack of equivalence > > processing I think? That should be resolved in the coming month, but at > > the moment is a holdback/concern for converting these passes... iirc. > > Yes. Martin, the take away here is that the strlen/sprintf pass has > been converted to the new API, but ranger is still not up and running on > it (even on the branch). > > With the new API, all you have to do is remove all instances of > evrp_range_analyzer and replace them with a ranger. That's it. > Below is an untested patch that would convert you to a ranger once it's > contributed. > > IIRC when I enabled the ranger for your pass a while back, there was one > or two regressions due to missing equivalences, and the rest were > because the tests were expecting an actual specific range, and the > ranger returned a slightly different/better one. You'll need to adjust > your tests. Ack. I'll be on the lookout for the ranger commit (if you hppen to remember and CC me on it just in case I might miss it that would be great). Thanks Martin > > Aldy > > diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c > index f4d1c5ca256..9f9e95b7155 100644 > --- a/gcc/tree-ssa-strlen.c > +++ b/gcc/tree-ssa-strlen.c > @@ -58,8 +58,8 @@ along with GCC; see the file COPYING3. If not see > #include "tree-ssa-loop.h" > #include "tree-scalar-evolution.h" > #include "vr-values.h" > -#include "gimple-ssa-evrp-analyze.h" > #include "tree-ssa.h" > +#include "gimple-range.h" > > /* A vector indexed by SSA_NAME_VERSION. 0 means unknown, positive value > is an index into strinfo vector, negative value stands for > @@ -5755,16 +5755,13 @@ class strlen_dom_walker : public dom_walker > public: > strlen_dom_walker (cdi_direction direction) > : dom_walker (direction), > - evrp (false), > m_cleanup_cfg (false) > {} > > virtual edge before_dom_children (basic_block); > virtual void after_dom_children (basic_block); > > - /* EVRP analyzer used for printf argument range processing, and > - to track strlen results across integer variable assignments. */ > - evrp_range_analyzer evrp; > + gimple_ranger ranger; > > /* Flag that will trigger TODO_cleanup_cfg to be returned in strlen > execute function. */ > @@ -5777,8 +5774,6 @@ public: > edge > strlen_dom_walker::before_dom_children (basic_block bb) > { > - evrp.enter (bb); > - > basic_block dombb = get_immediate_dominator (CDI_DOMINATORS, bb); > > if (dombb == NULL) > @@ -5853,13 +5848,7 @@ strlen_dom_walker::before_dom_children > (basic_block bb) > /* Attempt to optimize individual statements. */ > for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); ) > { > - gimple *stmt = gsi_stmt (gsi); > - > - /* First record ranges generated by this statement so they > - can be used by printf argument processing. */ > - evrp.record_ranges_from_stmt (stmt, false); > - > - if (check_and_optimize_stmt (&gsi, &cleanup_eh, &evrp)) > + if (check_and_optimize_stmt (&gsi, &cleanup_eh, &ranger)) > gsi_next (&gsi); > } > > @@ -5878,8 +5867,6 @@ strlen_dom_walker::before_dom_children > (basic_block bb) > void > strlen_dom_walker::after_dom_children (basic_block bb) > { > - evrp.leave (bb); > - > if (bb->aux) > { > stridx_to_strinfo = ((vec<strinfo *, va_heap, vl_embed> *) > bb->aux); >
On 10/1/20 11:25 AM, Martin Sebor wrote: > On 10/1/20 9:34 AM, Aldy Hernandez wrote: >> >> >> On 10/1/20 3:22 PM, Andrew MacLeod wrote: >> > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote: >> >>> Thanks for doing all this! There isn't anything I don't understand >> >>> in the sprintf changes so no questions from me (well, almost none). >> >>> Just some comments: >> >> Thanks for your comments on the sprintf/strlen API conversion. >> >> >> >>> The current call statement is available in all functions that take >> >>> a directive argument, as dir->info.callstmt. There should be no >> need >> >>> to also add it as a new argument to the functions that now need it. >> >> Fixed. >> >> >> >>> The change adds code along these lines in a bunch of places: >> >>> >> >>> + value_range vr; >> >>> + if (!query->range_of_expr (vr, arg, stmt)) >> >>> + vr.set_varying (TREE_TYPE (arg)); >> >>> >> >>> I thought under the new Ranger APIs when a range couldn't be >> >>> determined it would be automatically set to the maximum for >> >>> the type. I like that and have been moving in that direction >> >>> with my code myself (rather than having an API fail, have it >> >>> set the max range and succeed). >> >> I went through all the above idioms and noticed all are being used on >> >> supported types (integers or pointers). So range_of_expr will always >> >> return true. I've removed the if() and the set_varying. >> >> >> >>> Since that isn't so in this case, I think it would still be nice >> >>> if the added code could be written as if the range were set to >> >>> varying in this case and (ideally) reduced to just initialization: >> >>> >> >>> value_range vr = some-function (query, stmt, arg); >> >>> >> >>> some-function could be an inline helper defined just for the sprintf >> >>> pass (and maybe also strlen which also seems to use the same >> pattern), >> >>> or it could be a value_range AKA irange ctor, or it could be a >> member >> >>> of range_query, whatever is the most appropriate. >> >>> >> >>> (If assigning/copying a value_range is thought to be too expensive, >> >>> declaring it first and then passing it to that helper to set it >> >>> would work too). >> >>> >> >>> In strlen, is the removed comment no longer relevant? (I.e., does >> >>> the ranger solve the problem?) >> >>> >> >>> - /* The range below may be "inaccurate" if a constant has been >> >>> - substituted earlier for VAL by this pass that hasn't been >> >>> - propagated through the CFG. This shoud be fixed by the new >> >>> - on-demand VRP if/when it becomes available (hopefully in >> >>> - GCC 11). */ >> >> It should. >> >> >> >>> I'm wondering about the comment added to get_range_strlen_dynamic >> >>> and other places: >> >>> >> >>> + // FIXME: Use range_query instead of global ranges. >> >>> >> >>> Is that something you're planning to do in a followup or should >> >>> I remember to do it at some point? >> >> I'm not planning on doing it. It's just a reminder that it would be >> >> beneficial to do so. >> >> >> >>> Otherwise I have no concern with the changes. >> >> It's not cleared whether Andrew approved all 3 parts of the patchset >> >> or just the valuation part. I'll wait for his nod before committing >> >> this chunk. >> >> >> >> Aldy >> >> >> > I have no issue with it, so OK. >> >> Pushed all 3 patches. >> >> > >> > Just an observation that should be pointed out, I believe Aldy has all >> > the code for converting to a ranger, but we have not pursued that any >> > further yet since there is a regression due to our lack of equivalence >> > processing I think? That should be resolved in the coming month, >> but at >> > the moment is a holdback/concern for converting these passes... iirc. >> >> Yes. Martin, the take away here is that the strlen/sprintf pass has >> been converted to the new API, but ranger is still not up and running >> on it (even on the branch). >> >> With the new API, all you have to do is remove all instances of >> evrp_range_analyzer and replace them with a ranger. That's it. >> Below is an untested patch that would convert you to a ranger once >> it's contributed. >> >> IIRC when I enabled the ranger for your pass a while back, there was >> one or two regressions due to missing equivalences, and the rest were >> because the tests were expecting an actual specific range, and the >> ranger returned a slightly different/better one. You'll need to >> adjust your tests. > > Ack. I'll be on the lookout for the ranger commit (if you hppen > to remember and CC me on it just in case I might miss it that would > be great). I have applied the patch and ran some tests. There are quite a few failures (see the list below). I have only looked at a couple. The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c boils down to the following test case. There should be no warning for either sprintf call. The one in h() is a false positive and the reason for at least some of the regressions. Somehow, the conversions between int and char are causing Ranger to lose the range. $ cat t.c && gcc -O2 -S -Wall t.c char a[2]; extern int x; signed char f (int min, int max) { signed char i = x; return i < min || max < i ? min : i; } void ff (signed char i) { __builtin_sprintf (a, "%i", f (0, 9)); // okay } signed char g (signed char min, signed char max) { signed char i = x; return i < min || max < i ? min : i; } void gg (void) { __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning } t.c: In function ‘gg’: t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes into a region of size 2 [-Wformat-overflow=] 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning | ^~ t.c:24:25: note: directive argument in the range [-128, 127] 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning | ^~~~ t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes into a destination of size 2 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Another failure (the one in builtin-sprintf-warn-22.c) is this where the false positive is due to the strlen pass somehow having lost the upper bound on the sum of the two string lengths. $ cat t.c && gcc -O2 -S -Wall t.c typedef __SIZE_TYPE__ size_t; char a[1025]; void g (char *s1, char *s2) { size_t n = __builtin_strlen (s1), d = __builtin_strlen (s2); if (n + d + 1 >= 1025) return; __builtin_sprintf (a, "%s.%s", s1, s2); // { dg-bogus "\\\[-Wformat-overflow" } } t.c: In function ‘g’: t.c:11:29: warning: ‘%s’ directive writing up to 1023 bytes into a region of size between 1 and 1024 [-Wformat-overflow=] 11 | __builtin_sprintf (a, "%s.%s", s1, s2); // { dg-bogus "\\\[-Wformat-overflow" } | ^~ t.c:11:3: note: ‘__builtin_sprintf’ output between 2 and 2048 bytes into a destination of size 1025 11 | __builtin_sprintf (a, "%s.%s", s1, s2); // { dg-bogus "\\\[-Wformat-overflow" } | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ I'll see if I can reduce the others into test cases but I'll have to hold off on enabling the ranger in the sprintf/strlen pass until these issues are resolved. Thanks Martin FAIL: gcc.dg/Wstringop-overflow-20.c (test for warnings, line 28) FAIL: gcc.dg/Wstringop-overflow-20.c (test for warnings, line 29) FAIL: gcc.dg/Wstringop-overflow-20.c (test for warnings, line 32) FAIL: gcc.dg/Wstringop-overflow-20.c (test for warnings, line 38) FAIL: gcc.dg/Wstringop-overflow-20.c (test for warnings, line 39) XPASS: gcc.dg/pr80776-1.c (test for bogus messages, line 22) FAIL: gcc.dg/strlenopt-81.c (test for excess errors) FAIL: gcc.dg/strlenopt-90.c scan-tree-dump-not strlen1 "strlen \\(\\&a" FAIL: gcc.dg/strlenopt-80.c scan-tree-dump-times optimized "failure_on_line \\(" 0 FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "strlen \\(\\&a" FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a1\\] = 0;" FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a2 \\+ 1B\\] = 0;" FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a3 \\+ 2B\\] = 0;" FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a4 \\+ 3B\\] = 0;" FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a5 \\+ 4B\\] = 0;" FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a6 \\+ 5B\\] = 0;" FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a7 \\+ 6B\\] = 0;" FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a8 \\+ 7B\\] = 0;" FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-3.c (test for excess errors) (many failures) FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-22.c (test for bogus messages, line 21) > > Thanks > Martin > >> >> Aldy >> >> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c >> index f4d1c5ca256..9f9e95b7155 100644 >> --- a/gcc/tree-ssa-strlen.c >> +++ b/gcc/tree-ssa-strlen.c >> @@ -58,8 +58,8 @@ along with GCC; see the file COPYING3. If not see >> #include "tree-ssa-loop.h" >> #include "tree-scalar-evolution.h" >> #include "vr-values.h" >> -#include "gimple-ssa-evrp-analyze.h" >> #include "tree-ssa.h" >> +#include "gimple-range.h" >> >> /* A vector indexed by SSA_NAME_VERSION. 0 means unknown, positive >> value >> is an index into strinfo vector, negative value stands for >> @@ -5755,16 +5755,13 @@ class strlen_dom_walker : public dom_walker >> public: >> strlen_dom_walker (cdi_direction direction) >> : dom_walker (direction), >> - evrp (false), >> m_cleanup_cfg (false) >> {} >> >> virtual edge before_dom_children (basic_block); >> virtual void after_dom_children (basic_block); >> >> - /* EVRP analyzer used for printf argument range processing, and >> - to track strlen results across integer variable assignments. */ >> - evrp_range_analyzer evrp; >> + gimple_ranger ranger; >> >> /* Flag that will trigger TODO_cleanup_cfg to be returned in strlen >> execute function. */ >> @@ -5777,8 +5774,6 @@ public: >> edge >> strlen_dom_walker::before_dom_children (basic_block bb) >> { >> - evrp.enter (bb); >> - >> basic_block dombb = get_immediate_dominator (CDI_DOMINATORS, bb); >> >> if (dombb == NULL) >> @@ -5853,13 +5848,7 @@ strlen_dom_walker::before_dom_children >> (basic_block bb) >> /* Attempt to optimize individual statements. */ >> for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p >> (gsi); ) >> { >> - gimple *stmt = gsi_stmt (gsi); >> - >> - /* First record ranges generated by this statement so they >> - can be used by printf argument processing. */ >> - evrp.record_ranges_from_stmt (stmt, false); >> - >> - if (check_and_optimize_stmt (&gsi, &cleanup_eh, &evrp)) >> + if (check_and_optimize_stmt (&gsi, &cleanup_eh, &ranger)) >> gsi_next (&gsi); >> } >> >> @@ -5878,8 +5867,6 @@ strlen_dom_walker::before_dom_children >> (basic_block bb) >> void >> strlen_dom_walker::after_dom_children (basic_block bb) >> { >> - evrp.leave (bb); >> - >> if (bb->aux) >> { >> stridx_to_strinfo = ((vec<strinfo *, va_heap, vl_embed> *) >> bb->aux); >> >
On 11/5/20 12:29 PM, Martin Sebor wrote: > On 10/1/20 11:25 AM, Martin Sebor wrote: >> On 10/1/20 9:34 AM, Aldy Hernandez wrote: >>> >>> >>> On 10/1/20 3:22 PM, Andrew MacLeod wrote: >>> > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote: >>> >>> Thanks for doing all this! There isn't anything I don't understand >>> >>> in the sprintf changes so no questions from me (well, almost none). >>> >>> Just some comments: >>> >> Thanks for your comments on the sprintf/strlen API conversion. >>> >> >>> >>> The current call statement is available in all functions that take >>> >>> a directive argument, as dir->info.callstmt. There should be no >>> need >>> >>> to also add it as a new argument to the functions that now need it. >>> >> Fixed. >>> >> >>> >>> The change adds code along these lines in a bunch of places: >>> >>> >>> >>> + value_range vr; >>> >>> + if (!query->range_of_expr (vr, arg, stmt)) >>> >>> + vr.set_varying (TREE_TYPE (arg)); >>> >>> >>> >>> I thought under the new Ranger APIs when a range couldn't be >>> >>> determined it would be automatically set to the maximum for >>> >>> the type. I like that and have been moving in that direction >>> >>> with my code myself (rather than having an API fail, have it >>> >>> set the max range and succeed). >>> >> I went through all the above idioms and noticed all are being >>> used on >>> >> supported types (integers or pointers). So range_of_expr will >>> always >>> >> return true. I've removed the if() and the set_varying. >>> >> >>> >>> Since that isn't so in this case, I think it would still be nice >>> >>> if the added code could be written as if the range were set to >>> >>> varying in this case and (ideally) reduced to just initialization: >>> >>> >>> >>> value_range vr = some-function (query, stmt, arg); >>> >>> >>> >>> some-function could be an inline helper defined just for the >>> sprintf >>> >>> pass (and maybe also strlen which also seems to use the same >>> pattern), >>> >>> or it could be a value_range AKA irange ctor, or it could be a >>> member >>> >>> of range_query, whatever is the most appropriate. >>> >>> >>> >>> (If assigning/copying a value_range is thought to be too expensive, >>> >>> declaring it first and then passing it to that helper to set it >>> >>> would work too). >>> >>> >>> >>> In strlen, is the removed comment no longer relevant? (I.e., does >>> >>> the ranger solve the problem?) >>> >>> >>> >>> - /* The range below may be "inaccurate" if a constant has >>> been >>> >>> - substituted earlier for VAL by this pass that hasn't been >>> >>> - propagated through the CFG. This shoud be fixed by the >>> new >>> >>> - on-demand VRP if/when it becomes available (hopefully in >>> >>> - GCC 11). */ >>> >> It should. >>> >> >>> >>> I'm wondering about the comment added to get_range_strlen_dynamic >>> >>> and other places: >>> >>> >>> >>> + // FIXME: Use range_query instead of global ranges. >>> >>> >>> >>> Is that something you're planning to do in a followup or should >>> >>> I remember to do it at some point? >>> >> I'm not planning on doing it. It's just a reminder that it would be >>> >> beneficial to do so. >>> >> >>> >>> Otherwise I have no concern with the changes. >>> >> It's not cleared whether Andrew approved all 3 parts of the patchset >>> >> or just the valuation part. I'll wait for his nod before committing >>> >> this chunk. >>> >> >>> >> Aldy >>> >> >>> > I have no issue with it, so OK. >>> >>> Pushed all 3 patches. >>> >>> > >>> > Just an observation that should be pointed out, I believe Aldy has >>> all >>> > the code for converting to a ranger, but we have not pursued that any >>> > further yet since there is a regression due to our lack of >>> equivalence >>> > processing I think? That should be resolved in the coming month, >>> but at >>> > the moment is a holdback/concern for converting these passes... >>> iirc. >>> >>> Yes. Martin, the take away here is that the strlen/sprintf pass has >>> been converted to the new API, but ranger is still not up and running >>> on it (even on the branch). >>> >>> With the new API, all you have to do is remove all instances of >>> evrp_range_analyzer and replace them with a ranger. That's it. >>> Below is an untested patch that would convert you to a ranger once >>> it's contributed. >>> >>> IIRC when I enabled the ranger for your pass a while back, there was >>> one or two regressions due to missing equivalences, and the rest were >>> because the tests were expecting an actual specific range, and the >>> ranger returned a slightly different/better one. You'll need to >>> adjust your tests. >> >> Ack. I'll be on the lookout for the ranger commit (if you hppen >> to remember and CC me on it just in case I might miss it that would >> be great). > > I have applied the patch and ran some tests. There are quite > a few failures (see the list below). I have only looked at > a couple. The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c > boils down to the following test case. There should be no warning > for either sprintf call. The one in h() is a false positive and > the reason for at least some of the regressions. Somehow, > the conversions between int and char are causing Ranger to lose > the range. > > $ cat t.c && gcc -O2 -S -Wall t.c > char a[2]; > > extern int x; > > signed char f (int min, int max) > { > signed char i = x; > return i < min || max < i ? min : i; > } > > void ff (signed char i) > { > __builtin_sprintf (a, "%i", f (0, 9)); // okay > } > > signed char g (signed char min, signed char max) > { > signed char i = x; > return i < min || max < i ? min : i; > } > > void gg (void) > { > __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning > } > > t.c: In function ‘gg’: > t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes into a > region of size 2 [-Wformat-overflow=] > 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning > | ^~ > t.c:24:25: note: directive argument in the range [-128, 127] > 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning > | ^~~~ > t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes into a > destination of size 2 > 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > Another failure (the one in builtin-sprintf-warn-22.c) is this where > the false positive is due to the strlen pass somehow having lost > the upper bound on the sum of the two string lengths. I'm guessing this might have something to do with the strlen pass calling set_range_info() on the nonconstant results of strlen calls it can determine the range for. How would I go about doing it with ranger? I see ranger_cache::set_global_range() and the class ctor takes a gimple_ranger as argument. Would calling the function be the right way to do it? (I couldn't find anything on the Wiki.) > > $ cat t.c && gcc -O2 -S -Wall t.c > typedef __SIZE_TYPE__ size_t; > > char a[1025]; > > void g (char *s1, char *s2) > { > size_t n = __builtin_strlen (s1), d = __builtin_strlen (s2); > if (n + d + 1 >= 1025) > return; > > __builtin_sprintf (a, "%s.%s", s1, s2); // { dg-bogus > "\\\[-Wformat-overflow" } > } > > t.c: In function ‘g’: > t.c:11:29: warning: ‘%s’ directive writing up to 1023 bytes into a > region of size between 1 and 1024 [-Wformat-overflow=] > 11 | __builtin_sprintf (a, "%s.%s", s1, s2); // { dg-bogus > "\\\[-Wformat-overflow" } > | ^~ > t.c:11:3: note: ‘__builtin_sprintf’ output between 2 and 2048 bytes into > a destination of size 1025 > 11 | __builtin_sprintf (a, "%s.%s", s1, s2); // { dg-bogus > "\\\[-Wformat-overflow" } > | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > I'll see if I can reduce the others into test cases but I'll have > to hold off on enabling the ranger in the sprintf/strlen pass until > these issues are resolved. > > Thanks > Martin > > FAIL: gcc.dg/Wstringop-overflow-20.c (test for warnings, line 28) > FAIL: gcc.dg/Wstringop-overflow-20.c (test for warnings, line 29) > FAIL: gcc.dg/Wstringop-overflow-20.c (test for warnings, line 32) > FAIL: gcc.dg/Wstringop-overflow-20.c (test for warnings, line 38) > FAIL: gcc.dg/Wstringop-overflow-20.c (test for warnings, line 39) > XPASS: gcc.dg/pr80776-1.c (test for bogus messages, line 22) > FAIL: gcc.dg/strlenopt-81.c (test for excess errors) > FAIL: gcc.dg/strlenopt-90.c scan-tree-dump-not strlen1 "strlen \\(\\&a" > FAIL: gcc.dg/strlenopt-80.c scan-tree-dump-times optimized > "failure_on_line \\(" 0 > FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "strlen \\(\\&a" > FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a1\\] = 0;" > FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a2 \\+ 1B\\] = 0;" > FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a3 \\+ 2B\\] = 0;" > FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a4 \\+ 3B\\] = 0;" > FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a5 \\+ 4B\\] = 0;" > FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a6 \\+ 5B\\] = 0;" > FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a7 \\+ 6B\\] = 0;" > FAIL: gcc.dg/strlenopt-89.c scan-tree-dump-not strlen1 "a8 \\+ 7B\\] = 0;" > FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-3.c (test for excess errors) > (many failures) > FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-22.c (test for bogus > messages, line 21) > > >> >> Thanks >> Martin >> >>> >>> Aldy >>> >>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c >>> index f4d1c5ca256..9f9e95b7155 100644 >>> --- a/gcc/tree-ssa-strlen.c >>> +++ b/gcc/tree-ssa-strlen.c >>> @@ -58,8 +58,8 @@ along with GCC; see the file COPYING3. If not see >>> #include "tree-ssa-loop.h" >>> #include "tree-scalar-evolution.h" >>> #include "vr-values.h" >>> -#include "gimple-ssa-evrp-analyze.h" >>> #include "tree-ssa.h" >>> +#include "gimple-range.h" >>> >>> /* A vector indexed by SSA_NAME_VERSION. 0 means unknown, positive >>> value >>> is an index into strinfo vector, negative value stands for >>> @@ -5755,16 +5755,13 @@ class strlen_dom_walker : public dom_walker >>> public: >>> strlen_dom_walker (cdi_direction direction) >>> : dom_walker (direction), >>> - evrp (false), >>> m_cleanup_cfg (false) >>> {} >>> >>> virtual edge before_dom_children (basic_block); >>> virtual void after_dom_children (basic_block); >>> >>> - /* EVRP analyzer used for printf argument range processing, and >>> - to track strlen results across integer variable assignments. */ >>> - evrp_range_analyzer evrp; >>> + gimple_ranger ranger; >>> >>> /* Flag that will trigger TODO_cleanup_cfg to be returned in strlen >>> execute function. */ >>> @@ -5777,8 +5774,6 @@ public: >>> edge >>> strlen_dom_walker::before_dom_children (basic_block bb) >>> { >>> - evrp.enter (bb); >>> - >>> basic_block dombb = get_immediate_dominator (CDI_DOMINATORS, bb); >>> >>> if (dombb == NULL) >>> @@ -5853,13 +5848,7 @@ strlen_dom_walker::before_dom_children >>> (basic_block bb) >>> /* Attempt to optimize individual statements. */ >>> for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p >>> (gsi); ) >>> { >>> - gimple *stmt = gsi_stmt (gsi); >>> - >>> - /* First record ranges generated by this statement so they >>> - can be used by printf argument processing. */ >>> - evrp.record_ranges_from_stmt (stmt, false); >>> - >>> - if (check_and_optimize_stmt (&gsi, &cleanup_eh, &evrp)) >>> + if (check_and_optimize_stmt (&gsi, &cleanup_eh, &ranger)) >>> gsi_next (&gsi); >>> } >>> >>> @@ -5878,8 +5867,6 @@ strlen_dom_walker::before_dom_children >>> (basic_block bb) >>> void >>> strlen_dom_walker::after_dom_children (basic_block bb) >>> { >>> - evrp.leave (bb); >>> - >>> if (bb->aux) >>> { >>> stridx_to_strinfo = ((vec<strinfo *, va_heap, vl_embed> *) >>> bb->aux); >>> >> >
On 11/5/20 4:02 PM, Martin Sebor wrote: > On 11/5/20 12:29 PM, Martin Sebor wrote: >> On 10/1/20 11:25 AM, Martin Sebor wrote: >>> On 10/1/20 9:34 AM, Aldy Hernandez wrote: >>>> >>>> >>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote: >>>> > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote: >>>> >>> Thanks for doing all this! There isn't anything I don't >>>> understand >>>> >>> in the sprintf changes so no questions from me (well, almost >>>> none). >>>> >>> Just some comments: >>>> >> Thanks for your comments on the sprintf/strlen API conversion. >>>> >> >>>> >>> The current call statement is available in all functions that >>>> take >>>> >>> a directive argument, as dir->info.callstmt. There should be >>>> no need >>>> >>> to also add it as a new argument to the functions that now >>>> need it. >>>> >> Fixed. >>>> >> >>>> >>> The change adds code along these lines in a bunch of places: >>>> >>> >>>> >>> + value_range vr; >>>> >>> + if (!query->range_of_expr (vr, arg, stmt)) >>>> >>> + vr.set_varying (TREE_TYPE (arg)); >>>> >>> >>>> >>> I thought under the new Ranger APIs when a range couldn't be >>>> >>> determined it would be automatically set to the maximum for >>>> >>> the type. I like that and have been moving in that direction >>>> >>> with my code myself (rather than having an API fail, have it >>>> >>> set the max range and succeed). >>>> >> I went through all the above idioms and noticed all are being >>>> used on >>>> >> supported types (integers or pointers). So range_of_expr will >>>> always >>>> >> return true. I've removed the if() and the set_varying. >>>> >> >>>> >>> Since that isn't so in this case, I think it would still be nice >>>> >>> if the added code could be written as if the range were set to >>>> >>> varying in this case and (ideally) reduced to just >>>> initialization: >>>> >>> >>>> >>> value_range vr = some-function (query, stmt, arg); >>>> >>> >>>> >>> some-function could be an inline helper defined just for the >>>> sprintf >>>> >>> pass (and maybe also strlen which also seems to use the same >>>> pattern), >>>> >>> or it could be a value_range AKA irange ctor, or it could be a >>>> member >>>> >>> of range_query, whatever is the most appropriate. >>>> >>> >>>> >>> (If assigning/copying a value_range is thought to be too >>>> expensive, >>>> >>> declaring it first and then passing it to that helper to set it >>>> >>> would work too). >>>> >>> >>>> >>> In strlen, is the removed comment no longer relevant? (I.e., >>>> does >>>> >>> the ranger solve the problem?) >>>> >>> >>>> >>> - /* The range below may be "inaccurate" if a constant >>>> has been >>>> >>> - substituted earlier for VAL by this pass that hasn't >>>> been >>>> >>> - propagated through the CFG. This shoud be fixed by >>>> the new >>>> >>> - on-demand VRP if/when it becomes available (hopefully in >>>> >>> - GCC 11). */ >>>> >> It should. >>>> >> >>>> >>> I'm wondering about the comment added to get_range_strlen_dynamic >>>> >>> and other places: >>>> >>> >>>> >>> + // FIXME: Use range_query instead of global ranges. >>>> >>> >>>> >>> Is that something you're planning to do in a followup or should >>>> >>> I remember to do it at some point? >>>> >> I'm not planning on doing it. It's just a reminder that it >>>> would be >>>> >> beneficial to do so. >>>> >> >>>> >>> Otherwise I have no concern with the changes. >>>> >> It's not cleared whether Andrew approved all 3 parts of the >>>> patchset >>>> >> or just the valuation part. I'll wait for his nod before >>>> committing >>>> >> this chunk. >>>> >> >>>> >> Aldy >>>> >> >>>> > I have no issue with it, so OK. >>>> >>>> Pushed all 3 patches. >>>> >>>> > >>>> > Just an observation that should be pointed out, I believe Aldy >>>> has all >>>> > the code for converting to a ranger, but we have not pursued >>>> that any >>>> > further yet since there is a regression due to our lack of >>>> equivalence >>>> > processing I think? That should be resolved in the coming >>>> month, but at >>>> > the moment is a holdback/concern for converting these passes... >>>> iirc. >>>> >>>> Yes. Martin, the take away here is that the strlen/sprintf pass >>>> has been converted to the new API, but ranger is still not up and >>>> running on it (even on the branch). >>>> >>>> With the new API, all you have to do is remove all instances of >>>> evrp_range_analyzer and replace them with a ranger. That's it. >>>> Below is an untested patch that would convert you to a ranger once >>>> it's contributed. >>>> >>>> IIRC when I enabled the ranger for your pass a while back, there >>>> was one or two regressions due to missing equivalences, and the >>>> rest were because the tests were expecting an actual specific >>>> range, and the ranger returned a slightly different/better one. >>>> You'll need to adjust your tests. >>> >>> Ack. I'll be on the lookout for the ranger commit (if you hppen >>> to remember and CC me on it just in case I might miss it that would >>> be great). >> >> I have applied the patch and ran some tests. There are quite >> a few failures (see the list below). I have only looked at >> a couple. The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c >> boils down to the following test case. There should be no warning >> for either sprintf call. The one in h() is a false positive and >> the reason for at least some of the regressions. Somehow, >> the conversions between int and char are causing Ranger to lose >> the range. >> >> $ cat t.c && gcc -O2 -S -Wall t.c >> char a[2]; >> >> extern int x; >> >> signed char f (int min, int max) >> { >> signed char i = x; >> return i < min || max < i ? min : i; >> } >> >> void ff (signed char i) >> { >> __builtin_sprintf (a, "%i", f (0, 9)); // okay >> } >> >> signed char g (signed char min, signed char max) >> { >> signed char i = x; >> return i < min || max < i ? min : i; >> } >> >> void gg (void) >> { >> __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >> } >> >> t.c: In function ‘gg’: >> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes into >> a region of size 2 [-Wformat-overflow=] >> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >> | ^~ >> t.c:24:25: note: directive argument in the range [-128, 127] >> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >> | ^~~~ >> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes into >> a destination of size 2 >> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> >> >> Another failure (the one in builtin-sprintf-warn-22.c) is this where >> the false positive is due to the strlen pass somehow having lost >> the upper bound on the sum of the two string lengths. > > I'm guessing this might have something to do with the strlen pass > calling set_range_info() on the nonconstant results of strlen calls > it can determine the range for. How would I go about doing it with > ranger? I see ranger_cache::set_global_range() and the class ctor > takes a gimple_ranger as argument. Would calling the function be > the right way to do it? (I couldn't find anything on the Wiki.) I haven't had time to write a new modern wiki yet... probably won't until well after stage 1 closes either. And no, thats the internal cache of the ranger, which you dont have access to. At the moment, there isnt a way to inject a "new" global range out of the blue. It hasnt been a requirement before, but wouldn't be particularly difficult. Short answer, there is no current equivalent of set_range_info() because we dont have a persistent global cache in the same way, and the ranger doesnt set the RANGE_INFO field at this point.. we havent gotten to the point of reworking that yet. So the question is, what exactly are you trying to do? I presume you are analyzing a strlen statement, and are making range conclusions about either the result or some of the operands are that are not otherwise exposed in the code? And you are looking to have those values injected into the rangers calculations as refinements? Andrew
On 11/5/20 2:29 PM, Martin Sebor wrote: > > > signed char g (signed char min, signed char max) > { > signed char i = x; > return i < min || max < i ? min : i; > } > > void gg (void) > { > __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning > } Im looking at this. its actually completely different code thats generated for this signed char case. And something is being missed that shouldnt be. I'll get back to you. Andrew
On 11/5/20 5:02 PM, Andrew MacLeod wrote: > On 11/5/20 4:02 PM, Martin Sebor wrote: >> On 11/5/20 12:29 PM, Martin Sebor wrote: >>> On 10/1/20 11:25 AM, Martin Sebor wrote: >>>> On 10/1/20 9:34 AM, Aldy Hernandez wrote: >>>>> >>>>> >>>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote: >>>>> > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote: >>>>> >>> Thanks for doing all this! There isn't anything I don't >>>>> understand >>>>> >>> in the sprintf changes so no questions from me (well, almost >>>>> none). >>>>> >>> Just some comments: >>>>> >> Thanks for your comments on the sprintf/strlen API conversion. >>>>> >> >>>>> >>> The current call statement is available in all functions that >>>>> take >>>>> >>> a directive argument, as dir->info.callstmt. There should be >>>>> no need >>>>> >>> to also add it as a new argument to the functions that now >>>>> need it. >>>>> >> Fixed. >>>>> >> >>>>> >>> The change adds code along these lines in a bunch of places: >>>>> >>> >>>>> >>> + value_range vr; >>>>> >>> + if (!query->range_of_expr (vr, arg, stmt)) >>>>> >>> + vr.set_varying (TREE_TYPE (arg)); >>>>> >>> >>>>> >>> I thought under the new Ranger APIs when a range couldn't be >>>>> >>> determined it would be automatically set to the maximum for >>>>> >>> the type. I like that and have been moving in that direction >>>>> >>> with my code myself (rather than having an API fail, have it >>>>> >>> set the max range and succeed). >>>>> >> I went through all the above idioms and noticed all are being >>>>> used on >>>>> >> supported types (integers or pointers). So range_of_expr will >>>>> always >>>>> >> return true. I've removed the if() and the set_varying. >>>>> >> >>>>> >>> Since that isn't so in this case, I think it would still be nice >>>>> >>> if the added code could be written as if the range were set to >>>>> >>> varying in this case and (ideally) reduced to just >>>>> initialization: >>>>> >>> >>>>> >>> value_range vr = some-function (query, stmt, arg); >>>>> >>> >>>>> >>> some-function could be an inline helper defined just for the >>>>> sprintf >>>>> >>> pass (and maybe also strlen which also seems to use the same >>>>> pattern), >>>>> >>> or it could be a value_range AKA irange ctor, or it could be a >>>>> member >>>>> >>> of range_query, whatever is the most appropriate. >>>>> >>> >>>>> >>> (If assigning/copying a value_range is thought to be too >>>>> expensive, >>>>> >>> declaring it first and then passing it to that helper to set it >>>>> >>> would work too). >>>>> >>> >>>>> >>> In strlen, is the removed comment no longer relevant? (I.e., >>>>> does >>>>> >>> the ranger solve the problem?) >>>>> >>> >>>>> >>> - /* The range below may be "inaccurate" if a constant >>>>> has been >>>>> >>> - substituted earlier for VAL by this pass that hasn't >>>>> been >>>>> >>> - propagated through the CFG. This shoud be fixed by >>>>> the new >>>>> >>> - on-demand VRP if/when it becomes available (hopefully in >>>>> >>> - GCC 11). */ >>>>> >> It should. >>>>> >> >>>>> >>> I'm wondering about the comment added to get_range_strlen_dynamic >>>>> >>> and other places: >>>>> >>> >>>>> >>> + // FIXME: Use range_query instead of global ranges. >>>>> >>> >>>>> >>> Is that something you're planning to do in a followup or should >>>>> >>> I remember to do it at some point? >>>>> >> I'm not planning on doing it. It's just a reminder that it >>>>> would be >>>>> >> beneficial to do so. >>>>> >> >>>>> >>> Otherwise I have no concern with the changes. >>>>> >> It's not cleared whether Andrew approved all 3 parts of the >>>>> patchset >>>>> >> or just the valuation part. I'll wait for his nod before >>>>> committing >>>>> >> this chunk. >>>>> >> >>>>> >> Aldy >>>>> >> >>>>> > I have no issue with it, so OK. >>>>> >>>>> Pushed all 3 patches. >>>>> >>>>> > >>>>> > Just an observation that should be pointed out, I believe Aldy >>>>> has all >>>>> > the code for converting to a ranger, but we have not pursued >>>>> that any >>>>> > further yet since there is a regression due to our lack of >>>>> equivalence >>>>> > processing I think? That should be resolved in the coming >>>>> month, but at >>>>> > the moment is a holdback/concern for converting these passes... >>>>> iirc. >>>>> >>>>> Yes. Martin, the take away here is that the strlen/sprintf pass >>>>> has been converted to the new API, but ranger is still not up and >>>>> running on it (even on the branch). >>>>> >>>>> With the new API, all you have to do is remove all instances of >>>>> evrp_range_analyzer and replace them with a ranger. That's it. >>>>> Below is an untested patch that would convert you to a ranger once >>>>> it's contributed. >>>>> >>>>> IIRC when I enabled the ranger for your pass a while back, there >>>>> was one or two regressions due to missing equivalences, and the >>>>> rest were because the tests were expecting an actual specific >>>>> range, and the ranger returned a slightly different/better one. >>>>> You'll need to adjust your tests. >>>> >>>> Ack. I'll be on the lookout for the ranger commit (if you hppen >>>> to remember and CC me on it just in case I might miss it that would >>>> be great). >>> >>> I have applied the patch and ran some tests. There are quite >>> a few failures (see the list below). I have only looked at >>> a couple. The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c >>> boils down to the following test case. There should be no warning >>> for either sprintf call. The one in h() is a false positive and >>> the reason for at least some of the regressions. Somehow, >>> the conversions between int and char are causing Ranger to lose >>> the range. >>> >>> $ cat t.c && gcc -O2 -S -Wall t.c >>> char a[2]; >>> >>> extern int x; >>> >>> signed char f (int min, int max) >>> { >>> signed char i = x; >>> return i < min || max < i ? min : i; >>> } >>> >>> void ff (signed char i) >>> { >>> __builtin_sprintf (a, "%i", f (0, 9)); // okay >>> } >>> >>> signed char g (signed char min, signed char max) >>> { >>> signed char i = x; >>> return i < min || max < i ? min : i; >>> } >>> >>> void gg (void) >>> { >>> __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>> } >>> >>> t.c: In function ‘gg’: >>> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes into >>> a region of size 2 [-Wformat-overflow=] >>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>> | ^~ >>> t.c:24:25: note: directive argument in the range [-128, 127] >>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>> | ^~~~ >>> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes into >>> a destination of size 2 >>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>> >>> >>> Another failure (the one in builtin-sprintf-warn-22.c) is this where >>> the false positive is due to the strlen pass somehow having lost >>> the upper bound on the sum of the two string lengths. >> >> I'm guessing this might have something to do with the strlen pass >> calling set_range_info() on the nonconstant results of strlen calls >> it can determine the range for. How would I go about doing it with >> ranger? I see ranger_cache::set_global_range() and the class ctor >> takes a gimple_ranger as argument. Would calling the function be >> the right way to do it? (I couldn't find anything on the Wiki.) > > I haven't had time to write a new modern wiki yet... probably won't > until well after stage 1 closes either. > > And no, thats the internal cache of the ranger, which you dont have > access to. > > At the moment, there isnt a way to inject a "new" global range out of > the blue. It hasnt been a requirement before, but wouldn't be > particularly difficult. > > Short answer, there is no current equivalent of set_range_info() because > we dont have a persistent global cache in the same way, and the ranger > doesnt set the RANGE_INFO field at this point.. we havent gotten to the > point of reworking that yet. > > So the question is, what exactly are you trying to do? I presume you are > analyzing a strlen statement, and are making range conclusions about > either the result or some of the operands are that are not otherwise > exposed in the code? > > And you are looking to have those values injected into the rangers > calculations as refinements? Yes, exactly. This is used both for optimization and for warnings (both to enable them and to suppress false positives). Besides folding strlen and sprintf calls into constants, the strlen pass sets the range of the results of the calls when the lengths of the arguments/output are known to be at least some number of bytes. Besides exposing optimization opportunities downstream, the pass uses this information to either issue or suppress warnings like in the sprintf case above. -Wformat-overflow issues warnings either based on string lengths (or ranges) if they're known, or based on the sizes of the arrays the strings are stored in when nothing is known about the lengths. The array size is used as the worst case estimate, which can cause false positives. Here's an example of a downstream optimization made possible by this. Because i's range is known, the range of the snprintf result is also known (the range is computed for all snprintf arguments). The strlen pass sets the range and subsequent passes eliminate the unreachable code. void f (int i) { if (i < 100 || 9999 < i) i = 100; int n = __builtin_snprintf (0, 0, "%i", i); if (n < 3 || 4 < n) __builtin_abort (); } Martin
On 11/5/20 7:50 PM, Martin Sebor wrote: > On 11/5/20 5:02 PM, Andrew MacLeod wrote: >> On 11/5/20 4:02 PM, Martin Sebor wrote: >>> On 11/5/20 12:29 PM, Martin Sebor wrote: >>>> On 10/1/20 11:25 AM, Martin Sebor wrote: >>>>> On 10/1/20 9:34 AM, Aldy Hernandez wrote: >>>>>> >>>>>> >>>>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote: >>>>>> > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote: >>>>>> >>> Thanks for doing all this! There isn't anything I don't >>>>>> understand >>>>>> >>> in the sprintf changes so no questions from me (well, almost >>>>>> none). >>>>>> >>> Just some comments: >>>>>> >> Thanks for your comments on the sprintf/strlen API conversion. >>>>>> >> >>>>>> >>> The current call statement is available in all functions >>>>>> that take >>>>>> >>> a directive argument, as dir->info.callstmt. There should >>>>>> be no need >>>>>> >>> to also add it as a new argument to the functions that now >>>>>> need it. >>>>>> >> Fixed. >>>>>> >> >>>>>> >>> The change adds code along these lines in a bunch of places: >>>>>> >>> >>>>>> >>> + value_range vr; >>>>>> >>> + if (!query->range_of_expr (vr, arg, stmt)) >>>>>> >>> + vr.set_varying (TREE_TYPE (arg)); >>>>>> >>> >>>>>> >>> I thought under the new Ranger APIs when a range couldn't be >>>>>> >>> determined it would be automatically set to the maximum for >>>>>> >>> the type. I like that and have been moving in that direction >>>>>> >>> with my code myself (rather than having an API fail, have it >>>>>> >>> set the max range and succeed). >>>>>> >> I went through all the above idioms and noticed all are being >>>>>> used on >>>>>> >> supported types (integers or pointers). So range_of_expr >>>>>> will always >>>>>> >> return true. I've removed the if() and the set_varying. >>>>>> >> >>>>>> >>> Since that isn't so in this case, I think it would still be >>>>>> nice >>>>>> >>> if the added code could be written as if the range were set to >>>>>> >>> varying in this case and (ideally) reduced to just >>>>>> initialization: >>>>>> >>> >>>>>> >>> value_range vr = some-function (query, stmt, arg); >>>>>> >>> >>>>>> >>> some-function could be an inline helper defined just for the >>>>>> sprintf >>>>>> >>> pass (and maybe also strlen which also seems to use the same >>>>>> pattern), >>>>>> >>> or it could be a value_range AKA irange ctor, or it could be >>>>>> a member >>>>>> >>> of range_query, whatever is the most appropriate. >>>>>> >>> >>>>>> >>> (If assigning/copying a value_range is thought to be too >>>>>> expensive, >>>>>> >>> declaring it first and then passing it to that helper to set it >>>>>> >>> would work too). >>>>>> >>> >>>>>> >>> In strlen, is the removed comment no longer relevant? >>>>>> (I.e., does >>>>>> >>> the ranger solve the problem?) >>>>>> >>> >>>>>> >>> - /* The range below may be "inaccurate" if a constant >>>>>> has been >>>>>> >>> - substituted earlier for VAL by this pass that >>>>>> hasn't been >>>>>> >>> - propagated through the CFG. This shoud be fixed by >>>>>> the new >>>>>> >>> - on-demand VRP if/when it becomes available >>>>>> (hopefully in >>>>>> >>> - GCC 11). */ >>>>>> >> It should. >>>>>> >> >>>>>> >>> I'm wondering about the comment added to >>>>>> get_range_strlen_dynamic >>>>>> >>> and other places: >>>>>> >>> >>>>>> >>> + // FIXME: Use range_query instead of global ranges. >>>>>> >>> >>>>>> >>> Is that something you're planning to do in a followup or should >>>>>> >>> I remember to do it at some point? >>>>>> >> I'm not planning on doing it. It's just a reminder that it >>>>>> would be >>>>>> >> beneficial to do so. >>>>>> >> >>>>>> >>> Otherwise I have no concern with the changes. >>>>>> >> It's not cleared whether Andrew approved all 3 parts of the >>>>>> patchset >>>>>> >> or just the valuation part. I'll wait for his nod before >>>>>> committing >>>>>> >> this chunk. >>>>>> >> >>>>>> >> Aldy >>>>>> >> >>>>>> > I have no issue with it, so OK. >>>>>> >>>>>> Pushed all 3 patches. >>>>>> >>>>>> > >>>>>> > Just an observation that should be pointed out, I believe Aldy >>>>>> has all >>>>>> > the code for converting to a ranger, but we have not pursued >>>>>> that any >>>>>> > further yet since there is a regression due to our lack of >>>>>> equivalence >>>>>> > processing I think? That should be resolved in the coming >>>>>> month, but at >>>>>> > the moment is a holdback/concern for converting these >>>>>> passes... iirc. >>>>>> >>>>>> Yes. Martin, the take away here is that the strlen/sprintf pass >>>>>> has been converted to the new API, but ranger is still not up and >>>>>> running on it (even on the branch). >>>>>> >>>>>> With the new API, all you have to do is remove all instances of >>>>>> evrp_range_analyzer and replace them with a ranger. That's it. >>>>>> Below is an untested patch that would convert you to a ranger >>>>>> once it's contributed. >>>>>> >>>>>> IIRC when I enabled the ranger for your pass a while back, there >>>>>> was one or two regressions due to missing equivalences, and the >>>>>> rest were because the tests were expecting an actual specific >>>>>> range, and the ranger returned a slightly different/better one. >>>>>> You'll need to adjust your tests. >>>>> >>>>> Ack. I'll be on the lookout for the ranger commit (if you hppen >>>>> to remember and CC me on it just in case I might miss it that would >>>>> be great). >>>> >>>> I have applied the patch and ran some tests. There are quite >>>> a few failures (see the list below). I have only looked at >>>> a couple. The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c >>>> boils down to the following test case. There should be no warning >>>> for either sprintf call. The one in h() is a false positive and >>>> the reason for at least some of the regressions. Somehow, >>>> the conversions between int and char are causing Ranger to lose >>>> the range. >>>> >>>> $ cat t.c && gcc -O2 -S -Wall t.c >>>> char a[2]; >>>> >>>> extern int x; >>>> >>>> signed char f (int min, int max) >>>> { >>>> signed char i = x; >>>> return i < min || max < i ? min : i; >>>> } >>>> >>>> void ff (signed char i) >>>> { >>>> __builtin_sprintf (a, "%i", f (0, 9)); // okay >>>> } >>>> >>>> signed char g (signed char min, signed char max) >>>> { >>>> signed char i = x; >>>> return i < min || max < i ? min : i; >>>> } >>>> >>>> void gg (void) >>>> { >>>> __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>> } >>>> >>>> t.c: In function ‘gg’: >>>> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes >>>> into a region of size 2 [-Wformat-overflow=] >>>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>> | ^~ >>>> t.c:24:25: note: directive argument in the range [-128, 127] >>>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>> | ^~~~ >>>> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes >>>> into a destination of size 2 >>>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>> >>>> >>>> Another failure (the one in builtin-sprintf-warn-22.c) is this where >>>> the false positive is due to the strlen pass somehow having lost >>>> the upper bound on the sum of the two string lengths. >>> >>> I'm guessing this might have something to do with the strlen pass >>> calling set_range_info() on the nonconstant results of strlen calls >>> it can determine the range for. How would I go about doing it with >>> ranger? I see ranger_cache::set_global_range() and the class ctor >>> takes a gimple_ranger as argument. Would calling the function be >>> the right way to do it? (I couldn't find anything on the Wiki.) >> >> I haven't had time to write a new modern wiki yet... probably won't >> until well after stage 1 closes either. >> >> And no, thats the internal cache of the ranger, which you dont have >> access to. >> >> At the moment, there isnt a way to inject a "new" global range out of >> the blue. It hasnt been a requirement before, but wouldn't be >> particularly difficult. >> >> Short answer, there is no current equivalent of set_range_info() >> because we dont have a persistent global cache in the same way, and >> the ranger doesnt set the RANGE_INFO field at this point.. we havent >> gotten to the point of reworking that yet. >> >> So the question is, what exactly are you trying to do? I presume you >> are analyzing a strlen statement, and are making range conclusions >> about either the result or some of the operands are that are not >> otherwise exposed in the code? >> >> And you are looking to have those values injected into the rangers >> calculations as refinements? > > Yes, exactly. This is used both for optimization and for warnings > (both to enable them and to suppress false positives). > > Besides folding strlen and sprintf calls into constants, the strlen > pass sets the range of the results of the calls when the lengths of > the arguments/output are known to be at least some number of bytes. > > Besides exposing optimization opportunities downstream, the pass > uses this information to either issue or suppress warnings like in > the sprintf case above. -Wformat-overflow issues warnings either > based on string lengths (or ranges) if they're known, or based on > the sizes of the arrays the strings are stored in when nothing is > known about the lengths. The array size is used as the worst case > estimate, which can cause false positives. > > Here's an example of a downstream optimization made possible by > this. Because i's range is known, the range of the snprintf result > is also known (the range is computed for all snprintf arguments). > The strlen pass sets the range and subsequent passes eliminate > the unreachable code. > > void f (int i) > { > if (i < 100 || 9999 < i) > i = 100; > > int n = __builtin_snprintf (0, 0, "%i", i); > if (n < 3 || 4 < n) > __builtin_abort (); > } > > Martin > That seems like somethings that the range code for builtins should take care of rather than be calculated externally and injected they are just like any other builtins that if the range of certain arguments are known, you can produce a result? I gather the range of i being known to be [100,9999] means we know something about n. The the ranger would be able to do that anywhere and everywhere, not just within this pass. One of the ranger goals is to centralize all range tracking and the warning pass can continue to check the ranges and issue warnings when the ranges arent what it likes. There may be other hoops you have to continue to jump thru, i don't know the specifics of what you deal with, but at least the basics of ranges produced from a builtin sounds like something that should be internal to the range engine. Andrew
On 11/5/20 8:08 PM, Andrew MacLeod wrote: > On 11/5/20 7:50 PM, Martin Sebor wrote: >> On 11/5/20 5:02 PM, Andrew MacLeod wrote: >>> On 11/5/20 4:02 PM, Martin Sebor wrote: >>>> On 11/5/20 12:29 PM, Martin Sebor wrote: >>>>> On 10/1/20 11:25 AM, Martin Sebor wrote: >>>>>> On 10/1/20 9:34 AM, Aldy Hernandez wrote: >>>>>>> >>>>>>> >>>>>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote: >>>>>>> > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote: >>>>>>> >>> Thanks for doing all this! There isn't anything I don't >>>>>>> understand >>>>>>> >>> in the sprintf changes so no questions from me (well, almost >>>>>>> none). >>>>>>> >>> Just some comments: >>>>>>> >> Thanks for your comments on the sprintf/strlen API conversion. >>>>>>> >> >>>>>>> >>> The current call statement is available in all functions >>>>>>> that take >>>>>>> >>> a directive argument, as dir->info.callstmt. There should >>>>>>> be no need >>>>>>> >>> to also add it as a new argument to the functions that now >>>>>>> need it. >>>>>>> >> Fixed. >>>>>>> >> >>>>>>> >>> The change adds code along these lines in a bunch of places: >>>>>>> >>> >>>>>>> >>> + value_range vr; >>>>>>> >>> + if (!query->range_of_expr (vr, arg, stmt)) >>>>>>> >>> + vr.set_varying (TREE_TYPE (arg)); >>>>>>> >>> >>>>>>> >>> I thought under the new Ranger APIs when a range couldn't be >>>>>>> >>> determined it would be automatically set to the maximum for >>>>>>> >>> the type. I like that and have been moving in that direction >>>>>>> >>> with my code myself (rather than having an API fail, have it >>>>>>> >>> set the max range and succeed). >>>>>>> >> I went through all the above idioms and noticed all are being >>>>>>> used on >>>>>>> >> supported types (integers or pointers). So range_of_expr >>>>>>> will always >>>>>>> >> return true. I've removed the if() and the set_varying. >>>>>>> >> >>>>>>> >>> Since that isn't so in this case, I think it would still be >>>>>>> nice >>>>>>> >>> if the added code could be written as if the range were set to >>>>>>> >>> varying in this case and (ideally) reduced to just >>>>>>> initialization: >>>>>>> >>> >>>>>>> >>> value_range vr = some-function (query, stmt, arg); >>>>>>> >>> >>>>>>> >>> some-function could be an inline helper defined just for the >>>>>>> sprintf >>>>>>> >>> pass (and maybe also strlen which also seems to use the same >>>>>>> pattern), >>>>>>> >>> or it could be a value_range AKA irange ctor, or it could be >>>>>>> a member >>>>>>> >>> of range_query, whatever is the most appropriate. >>>>>>> >>> >>>>>>> >>> (If assigning/copying a value_range is thought to be too >>>>>>> expensive, >>>>>>> >>> declaring it first and then passing it to that helper to set it >>>>>>> >>> would work too). >>>>>>> >>> >>>>>>> >>> In strlen, is the removed comment no longer relevant? (I.e., >>>>>>> does >>>>>>> >>> the ranger solve the problem?) >>>>>>> >>> >>>>>>> >>> - /* The range below may be "inaccurate" if a constant >>>>>>> has been >>>>>>> >>> - substituted earlier for VAL by this pass that >>>>>>> hasn't been >>>>>>> >>> - propagated through the CFG. This shoud be fixed by >>>>>>> the new >>>>>>> >>> - on-demand VRP if/when it becomes available >>>>>>> (hopefully in >>>>>>> >>> - GCC 11). */ >>>>>>> >> It should. >>>>>>> >> >>>>>>> >>> I'm wondering about the comment added to >>>>>>> get_range_strlen_dynamic >>>>>>> >>> and other places: >>>>>>> >>> >>>>>>> >>> + // FIXME: Use range_query instead of global ranges. >>>>>>> >>> >>>>>>> >>> Is that something you're planning to do in a followup or should >>>>>>> >>> I remember to do it at some point? >>>>>>> >> I'm not planning on doing it. It's just a reminder that it >>>>>>> would be >>>>>>> >> beneficial to do so. >>>>>>> >> >>>>>>> >>> Otherwise I have no concern with the changes. >>>>>>> >> It's not cleared whether Andrew approved all 3 parts of the >>>>>>> patchset >>>>>>> >> or just the valuation part. I'll wait for his nod before >>>>>>> committing >>>>>>> >> this chunk. >>>>>>> >> >>>>>>> >> Aldy >>>>>>> >> >>>>>>> > I have no issue with it, so OK. >>>>>>> >>>>>>> Pushed all 3 patches. >>>>>>> >>>>>>> > >>>>>>> > Just an observation that should be pointed out, I believe Aldy >>>>>>> has all >>>>>>> > the code for converting to a ranger, but we have not pursued >>>>>>> that any >>>>>>> > further yet since there is a regression due to our lack of >>>>>>> equivalence >>>>>>> > processing I think? That should be resolved in the coming >>>>>>> month, but at >>>>>>> > the moment is a holdback/concern for converting these >>>>>>> passes... iirc. >>>>>>> >>>>>>> Yes. Martin, the take away here is that the strlen/sprintf pass >>>>>>> has been converted to the new API, but ranger is still not up and >>>>>>> running on it (even on the branch). >>>>>>> >>>>>>> With the new API, all you have to do is remove all instances of >>>>>>> evrp_range_analyzer and replace them with a ranger. That's it. >>>>>>> Below is an untested patch that would convert you to a ranger >>>>>>> once it's contributed. >>>>>>> >>>>>>> IIRC when I enabled the ranger for your pass a while back, there >>>>>>> was one or two regressions due to missing equivalences, and the >>>>>>> rest were because the tests were expecting an actual specific >>>>>>> range, and the ranger returned a slightly different/better one. >>>>>>> You'll need to adjust your tests. >>>>>> >>>>>> Ack. I'll be on the lookout for the ranger commit (if you hppen >>>>>> to remember and CC me on it just in case I might miss it that would >>>>>> be great). >>>>> >>>>> I have applied the patch and ran some tests. There are quite >>>>> a few failures (see the list below). I have only looked at >>>>> a couple. The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c >>>>> boils down to the following test case. There should be no warning >>>>> for either sprintf call. The one in h() is a false positive and >>>>> the reason for at least some of the regressions. Somehow, >>>>> the conversions between int and char are causing Ranger to lose >>>>> the range. >>>>> >>>>> $ cat t.c && gcc -O2 -S -Wall t.c >>>>> char a[2]; >>>>> >>>>> extern int x; >>>>> >>>>> signed char f (int min, int max) >>>>> { >>>>> signed char i = x; >>>>> return i < min || max < i ? min : i; >>>>> } >>>>> >>>>> void ff (signed char i) >>>>> { >>>>> __builtin_sprintf (a, "%i", f (0, 9)); // okay >>>>> } >>>>> >>>>> signed char g (signed char min, signed char max) >>>>> { >>>>> signed char i = x; >>>>> return i < min || max < i ? min : i; >>>>> } >>>>> >>>>> void gg (void) >>>>> { >>>>> __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>>> } >>>>> >>>>> t.c: In function ‘gg’: >>>>> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes >>>>> into a region of size 2 [-Wformat-overflow=] >>>>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>>> | ^~ >>>>> t.c:24:25: note: directive argument in the range [-128, 127] >>>>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>>> | ^~~~ >>>>> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes >>>>> into a destination of size 2 >>>>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>>> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>> >>>>> >>>>> Another failure (the one in builtin-sprintf-warn-22.c) is this where >>>>> the false positive is due to the strlen pass somehow having lost >>>>> the upper bound on the sum of the two string lengths. >>>> >>>> I'm guessing this might have something to do with the strlen pass >>>> calling set_range_info() on the nonconstant results of strlen calls >>>> it can determine the range for. How would I go about doing it with >>>> ranger? I see ranger_cache::set_global_range() and the class ctor >>>> takes a gimple_ranger as argument. Would calling the function be >>>> the right way to do it? (I couldn't find anything on the Wiki.) >>> >>> I haven't had time to write a new modern wiki yet... probably won't >>> until well after stage 1 closes either. >>> >>> And no, thats the internal cache of the ranger, which you dont have >>> access to. >>> >>> At the moment, there isnt a way to inject a "new" global range out of >>> the blue. It hasnt been a requirement before, but wouldn't be >>> particularly difficult. >>> >>> Short answer, there is no current equivalent of set_range_info() >>> because we dont have a persistent global cache in the same way, and >>> the ranger doesnt set the RANGE_INFO field at this point.. we havent >>> gotten to the point of reworking that yet. >>> >>> So the question is, what exactly are you trying to do? I presume you >>> are analyzing a strlen statement, and are making range conclusions >>> about either the result or some of the operands are that are not >>> otherwise exposed in the code? >>> >>> And you are looking to have those values injected into the rangers >>> calculations as refinements? >> >> Yes, exactly. This is used both for optimization and for warnings >> (both to enable them and to suppress false positives). >> >> Besides folding strlen and sprintf calls into constants, the strlen >> pass sets the range of the results of the calls when the lengths of >> the arguments/output are known to be at least some number of bytes. >> >> Besides exposing optimization opportunities downstream, the pass >> uses this information to either issue or suppress warnings like in >> the sprintf case above. -Wformat-overflow issues warnings either >> based on string lengths (or ranges) if they're known, or based on >> the sizes of the arrays the strings are stored in when nothing is >> known about the lengths. The array size is used as the worst case >> estimate, which can cause false positives. >> >> Here's an example of a downstream optimization made possible by >> this. Because i's range is known, the range of the snprintf result >> is also known (the range is computed for all snprintf arguments). >> The strlen pass sets the range and subsequent passes eliminate >> the unreachable code. >> >> void f (int i) >> { >> if (i < 100 || 9999 < i) >> i = 100; >> >> int n = __builtin_snprintf (0, 0, "%i", i); >> if (n < 3 || 4 < n) >> __builtin_abort (); >> } >> >> Martin >> > That seems like somethings that the range code for builtins should take > care of rather than be calculated externally and injected > > they are just like any other builtins that if the range of certain > arguments are known, you can produce a result? The strlen optimization is more involved in that it keeps track of the state of the strings as they're being created by calls to string functions like strcpy (or single and multi-byte stores). So the only way for Ranger to figure out the range of the result of a strlen call is to query the strlen pass. That might work via a callback but it would of course only work while the strlen pass is running. The same goes for the sprintf results (which are based on the strlen data). It might be good enough as long as the ranges can also be exported to downstream passes. It might be possible to compute this on demand too (and introduce something like a strlen_range_query) but it would need reworking the strlen pass. I like the on-demand design but in the strlen case I'm not sure the benefits would justify the costs. > I gather the range of i > being known to be [100,9999] means we know something about n. The the > ranger would be able to do that anywhere and everywhere, not just within > this pass. One of the ranger goals is to centralize all range tracking > > and the warning pass can continue to check the ranges and issue warnings > when the ranges arent what it likes. In this simple case, yes. But consider a more involved example: char a[8]; void f (int i) { if (i < 100 || 9999 < i) i = 100; memcpy (a, "123", 3); // strlen(a) is in [3, 7] int n = snprintf (0, 0, "%s %i", a, i); // n is in [7, 12] if (n < 7 || 12 < n) // folded to false abort (); } To compute n's range you need to know not only that of i but also the length of a (or its range in this case). That's what the strlen optimization enables. > > There may be other hoops you have to continue to jump thru, i don't know > the specifics of what you deal with, but at least the basics of ranges > produced from a builtin sounds like something that should be internal > to the range engine. It might help to meet to help each other come up to speed on what we're doing, what we're planning, and how to best integrate the work and enable future enhancements. I still know very little about the Ranger internals and it might be helpful to you to learn more about some of the warnings I work on and the optimizations that enable them (and what I'd like to do in the future). Martin
On 11/6/20 11:13 AM, Martin Sebor wrote: > On 11/5/20 8:08 PM, Andrew MacLeod wrote: >> On 11/5/20 7:50 PM, Martin Sebor wrote: >>> On 11/5/20 5:02 PM, Andrew MacLeod wrote: >>>> On 11/5/20 4:02 PM, Martin Sebor wrote: >>>>> On 11/5/20 12:29 PM, Martin Sebor wrote: >>>>>> On 10/1/20 11:25 AM, Martin Sebor wrote: >>>>>>> On 10/1/20 9:34 AM, Aldy Hernandez wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote: >>>>>>>> > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote: >>>>>>>> >>> Thanks for doing all this! There isn't anything I don't >>>>>>>> understand >>>>>>>> >>> in the sprintf changes so no questions from me (well, >>>>>>>> almost none). >>>>>>>> >>> Just some comments: >>>>>>>> >> Thanks for your comments on the sprintf/strlen API conversion. >>>>>>>> >> >>>>>>>> >>> The current call statement is available in all functions >>>>>>>> that take >>>>>>>> >>> a directive argument, as dir->info.callstmt. There should >>>>>>>> be no need >>>>>>>> >>> to also add it as a new argument to the functions that now >>>>>>>> need it. >>>>>>>> >> Fixed. >>>>>>>> >> >>>>>>>> >>> The change adds code along these lines in a bunch of places: >>>>>>>> >>> >>>>>>>> >>> + value_range vr; >>>>>>>> >>> + if (!query->range_of_expr (vr, arg, stmt)) >>>>>>>> >>> + vr.set_varying (TREE_TYPE (arg)); >>>>>>>> >>> >>>>>>>> >>> I thought under the new Ranger APIs when a range couldn't be >>>>>>>> >>> determined it would be automatically set to the maximum for >>>>>>>> >>> the type. I like that and have been moving in that direction >>>>>>>> >>> with my code myself (rather than having an API fail, have it >>>>>>>> >>> set the max range and succeed). >>>>>>>> >> I went through all the above idioms and noticed all are >>>>>>>> being used on >>>>>>>> >> supported types (integers or pointers). So range_of_expr >>>>>>>> will always >>>>>>>> >> return true. I've removed the if() and the set_varying. >>>>>>>> >> >>>>>>>> >>> Since that isn't so in this case, I think it would still >>>>>>>> be nice >>>>>>>> >>> if the added code could be written as if the range were >>>>>>>> set to >>>>>>>> >>> varying in this case and (ideally) reduced to just >>>>>>>> initialization: >>>>>>>> >>> >>>>>>>> >>> value_range vr = some-function (query, stmt, arg); >>>>>>>> >>> >>>>>>>> >>> some-function could be an inline helper defined just for >>>>>>>> the sprintf >>>>>>>> >>> pass (and maybe also strlen which also seems to use the >>>>>>>> same pattern), >>>>>>>> >>> or it could be a value_range AKA irange ctor, or it could >>>>>>>> be a member >>>>>>>> >>> of range_query, whatever is the most appropriate. >>>>>>>> >>> >>>>>>>> >>> (If assigning/copying a value_range is thought to be too >>>>>>>> expensive, >>>>>>>> >>> declaring it first and then passing it to that helper to >>>>>>>> set it >>>>>>>> >>> would work too). >>>>>>>> >>> >>>>>>>> >>> In strlen, is the removed comment no longer relevant? >>>>>>>> (I.e., does >>>>>>>> >>> the ranger solve the problem?) >>>>>>>> >>> >>>>>>>> >>> - /* The range below may be "inaccurate" if a >>>>>>>> constant has been >>>>>>>> >>> - substituted earlier for VAL by this pass that >>>>>>>> hasn't been >>>>>>>> >>> - propagated through the CFG. This shoud be fixed >>>>>>>> by the new >>>>>>>> >>> - on-demand VRP if/when it becomes available >>>>>>>> (hopefully in >>>>>>>> >>> - GCC 11). */ >>>>>>>> >> It should. >>>>>>>> >> >>>>>>>> >>> I'm wondering about the comment added to >>>>>>>> get_range_strlen_dynamic >>>>>>>> >>> and other places: >>>>>>>> >>> >>>>>>>> >>> + // FIXME: Use range_query instead of global ranges. >>>>>>>> >>> >>>>>>>> >>> Is that something you're planning to do in a followup or >>>>>>>> should >>>>>>>> >>> I remember to do it at some point? >>>>>>>> >> I'm not planning on doing it. It's just a reminder that it >>>>>>>> would be >>>>>>>> >> beneficial to do so. >>>>>>>> >> >>>>>>>> >>> Otherwise I have no concern with the changes. >>>>>>>> >> It's not cleared whether Andrew approved all 3 parts of the >>>>>>>> patchset >>>>>>>> >> or just the valuation part. I'll wait for his nod before >>>>>>>> committing >>>>>>>> >> this chunk. >>>>>>>> >> >>>>>>>> >> Aldy >>>>>>>> >> >>>>>>>> > I have no issue with it, so OK. >>>>>>>> >>>>>>>> Pushed all 3 patches. >>>>>>>> >>>>>>>> > >>>>>>>> > Just an observation that should be pointed out, I believe >>>>>>>> Aldy has all >>>>>>>> > the code for converting to a ranger, but we have not pursued >>>>>>>> that any >>>>>>>> > further yet since there is a regression due to our lack of >>>>>>>> equivalence >>>>>>>> > processing I think? That should be resolved in the coming >>>>>>>> month, but at >>>>>>>> > the moment is a holdback/concern for converting these >>>>>>>> passes... iirc. >>>>>>>> >>>>>>>> Yes. Martin, the take away here is that the strlen/sprintf >>>>>>>> pass has been converted to the new API, but ranger is still not >>>>>>>> up and running on it (even on the branch). >>>>>>>> >>>>>>>> With the new API, all you have to do is remove all instances of >>>>>>>> evrp_range_analyzer and replace them with a ranger. That's it. >>>>>>>> Below is an untested patch that would convert you to a ranger >>>>>>>> once it's contributed. >>>>>>>> >>>>>>>> IIRC when I enabled the ranger for your pass a while back, >>>>>>>> there was one or two regressions due to missing equivalences, >>>>>>>> and the rest were because the tests were expecting an actual >>>>>>>> specific range, and the ranger returned a slightly >>>>>>>> different/better one. You'll need to adjust your tests. >>>>>>> >>>>>>> Ack. I'll be on the lookout for the ranger commit (if you hppen >>>>>>> to remember and CC me on it just in case I might miss it that would >>>>>>> be great). >>>>>> >>>>>> I have applied the patch and ran some tests. There are quite >>>>>> a few failures (see the list below). I have only looked at >>>>>> a couple. The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c >>>>>> boils down to the following test case. There should be no warning >>>>>> for either sprintf call. The one in h() is a false positive and >>>>>> the reason for at least some of the regressions. Somehow, >>>>>> the conversions between int and char are causing Ranger to lose >>>>>> the range. >>>>>> >>>>>> $ cat t.c && gcc -O2 -S -Wall t.c >>>>>> char a[2]; >>>>>> >>>>>> extern int x; >>>>>> >>>>>> signed char f (int min, int max) >>>>>> { >>>>>> signed char i = x; >>>>>> return i < min || max < i ? min : i; >>>>>> } >>>>>> >>>>>> void ff (signed char i) >>>>>> { >>>>>> __builtin_sprintf (a, "%i", f (0, 9)); // okay >>>>>> } >>>>>> >>>>>> signed char g (signed char min, signed char max) >>>>>> { >>>>>> signed char i = x; >>>>>> return i < min || max < i ? min : i; >>>>>> } >>>>>> >>>>>> void gg (void) >>>>>> { >>>>>> __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>>>> } >>>>>> >>>>>> t.c: In function ‘gg’: >>>>>> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes >>>>>> into a region of size 2 [-Wformat-overflow=] >>>>>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>>>> | ^~ >>>>>> t.c:24:25: note: directive argument in the range [-128, 127] >>>>>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>>>> | ^~~~ >>>>>> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes >>>>>> into a destination of size 2 >>>>>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>>>> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>>> >>>>>> >>>>>> Another failure (the one in builtin-sprintf-warn-22.c) is this where >>>>>> the false positive is due to the strlen pass somehow having lost >>>>>> the upper bound on the sum of the two string lengths. >>>>> >>>>> I'm guessing this might have something to do with the strlen pass >>>>> calling set_range_info() on the nonconstant results of strlen calls >>>>> it can determine the range for. How would I go about doing it with >>>>> ranger? I see ranger_cache::set_global_range() and the class ctor >>>>> takes a gimple_ranger as argument. Would calling the function be >>>>> the right way to do it? (I couldn't find anything on the Wiki.) >>>> >>>> I haven't had time to write a new modern wiki yet... probably won't >>>> until well after stage 1 closes either. >>>> >>>> And no, thats the internal cache of the ranger, which you dont have >>>> access to. >>>> >>>> At the moment, there isnt a way to inject a "new" global range out >>>> of the blue. It hasnt been a requirement before, but wouldn't be >>>> particularly difficult. >>>> >>>> Short answer, there is no current equivalent of set_range_info() >>>> because we dont have a persistent global cache in the same way, and >>>> the ranger doesnt set the RANGE_INFO field at this point.. we >>>> havent gotten to the point of reworking that yet. >>>> >>>> So the question is, what exactly are you trying to do? I presume >>>> you are analyzing a strlen statement, and are making range >>>> conclusions about either the result or some of the operands are >>>> that are not otherwise exposed in the code? >>>> >>>> And you are looking to have those values injected into the rangers >>>> calculations as refinements? >>> >>> Yes, exactly. This is used both for optimization and for warnings >>> (both to enable them and to suppress false positives). >>> >>> Besides folding strlen and sprintf calls into constants, the strlen >>> pass sets the range of the results of the calls when the lengths of >>> the arguments/output are known to be at least some number of bytes. >>> >>> Besides exposing optimization opportunities downstream, the pass >>> uses this information to either issue or suppress warnings like in >>> the sprintf case above. -Wformat-overflow issues warnings either >>> based on string lengths (or ranges) if they're known, or based on >>> the sizes of the arrays the strings are stored in when nothing is >>> known about the lengths. The array size is used as the worst case >>> estimate, which can cause false positives. >>> >>> Here's an example of a downstream optimization made possible by >>> this. Because i's range is known, the range of the snprintf result >>> is also known (the range is computed for all snprintf arguments). >>> The strlen pass sets the range and subsequent passes eliminate >>> the unreachable code. >>> >>> void f (int i) >>> { >>> if (i < 100 || 9999 < i) >>> i = 100; >>> >>> int n = __builtin_snprintf (0, 0, "%i", i); >>> if (n < 3 || 4 < n) >>> __builtin_abort (); >>> } >>> >>> Martin >>> >> That seems like somethings that the range code for builtins should >> take care of rather than be calculated externally and injected >> >> they are just like any other builtins that if the range of certain >> arguments are known, you can produce a result? > > The strlen optimization is more involved in that it keeps track of > the state of the strings as they're being created by calls to string > functions like strcpy (or single and multi-byte stores). So the only > way for Ranger to figure out the range of the result of a strlen call > is to query the strlen pass. That might work via a callback but it > would of course only work while the strlen pass is running. The same > goes for the sprintf results (which are based on the strlen data). > It might be good enough as long as the ranges can also be exported > to downstream passes. > > It might be possible to compute this on demand too (and introduce > something like a strlen_range_query) but it would need reworking > the strlen pass. I like the on-demand design but in the strlen > case I'm not sure the benefits would justify the costs. > >> I gather the range of i being known to be [100,9999] means we know >> something about n. The the ranger would be able to do that anywhere >> and everywhere, not just within this pass. One of the ranger goals >> is to centralize all range tracking >> >> and the warning pass can continue to check the ranges and issue >> warnings when the ranges arent what it likes. > > In this simple case, yes. But consider a more involved example: > > char a[8]; > > void f (int i) > { > if (i < 100 || 9999 < i) i = 100; > memcpy (a, "123", 3); // strlen(a) is in [3, 7] > int n = snprintf (0, 0, "%s %i", a, i); // n is in [7, 12] > if (n < 7 || 12 < n) // folded to false > abort (); > } > > To compute n's range you need to know not only that of i but also > the length of a (or its range in this case). That's what the strlen > optimization enables. > Thats why we need precise examples :-) Ive attached a thouroughly untested patch which you can apply to the latest trunk stuff I've checked in and try. It gives you an entry point into the ranger: bool import_global_range (tree name, irange &r); NAME is the ssa_name you are updating the global value for, and R is the new range you wish to add. Note it is not a replacement, but rather an intergration. ie, and extra source of information the ranger can't see. you can call it with your newly calculated value,and it will enhance the value the ranger already has with the extra information.. so above, you would call it with ranger->import_global_range (n_6, [7,12]) this will set the range for n_6 to the INTERSECTION of [7,12] and whatever the ranger already knows. If its varying, then it'll set it to [7,12]. If the ranger had already figured it was [9,20] for some reason, then the result will be [9,12] aftewr this call, and it will reurn FALSE indicating the new range is different than the range you passed in. if you care. It should also push the range of n_6 forward into pother blocks, so on entry to the block with the abort(), it should have a range of [], indicating it can't eb reached. There are a couple of caveats. 1) It may or may not affect other dependent value at this point depending on the order you visit things since the propagation of dependency chains is not complete yet. ie if the abort() block has h_4 = n_6 + 2 g_2 = h_4 + 6 , g_2 may not be reevaluated with the new range... It might.. but I cant guarantee it just now. 2) If you wish this value to be exported into SSA_NAME_RANGE_INFO, you will have to continue doing that manually. The ranger cache is not currently ever exported the RANGE_INFO fields. We've run into numerous issues with "differences": that occur when a multi-range is converted to a value_range and although the range is correct, it may be different that the orignal and produce different results. We are waiting until we replace the SSA_NAME_RANGE_INFO mechanism. Anyway, let me known if this works. I havent had a chance to test it yet. If it works we can add it to the full API Andrew
On 11/5/20 2:29 PM, Martin Sebor wrote: > On 10/1/20 11:25 AM, Martin Sebor wrote: >> > > I have applied the patch and ran some tests. There are quite > a few failures (see the list below). I have only looked at > a couple. The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c > boils down to the following test case. There should be no warning > for either sprintf call. The one in h() is a false positive and > the reason for at least some of the regressions. Somehow, > the conversions between int and char are causing Ranger to lose > the range. > > $ cat t.c && gcc -O2 -S -Wall t.c > char a[2]; > > extern int x; > > signed char f (int min, int max) > { > signed char i = x; > return i < min || max < i ? min : i; > } > > void ff (signed char i) > { > __builtin_sprintf (a, "%i", f (0, 9)); // okay > } > > signed char g (signed char min, signed char max) > { > signed char i = x; > return i < min || max < i ? min : i; > } > > void gg (void) > { > __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning > } > The latest changes resolve the issues witg the gg() case. you will now get a range of [0,9] for the temporary: =========== BB 4 ============ <bb 4> : # iftmp.3_10 = PHI <0(2), i_6(3)> _2 = (int) iftmp.3_10; __builtin_sprintf (&a, "%i", _2); return; _2 : int [0, 9] iftmp.3_10 : signed char [0, 9] The code issued for the 2 routines is very different. The first routine produces: =========== BB 2 ============ <bb 2> : x.0_4 = x; i_6 = (signed char) x.0_4; _7 = (int) i_6; if (_7 < 0) goto <bb 4>; [50.00%] else goto <bb 3>; [50.00%] _7 : int [-128, 127] 2->4 (T) x.0_4 : int [-INF, -257][-128, -1][128, +INF] 2->4 (T) i_6 : signed char [-INF, -1] 2->4 (T) _7 : int [-128, -1] 2->3 (F) x.0_4 : int [-INF, -129][0, 127][256, +INF] 2->3 (F) i_6 : signed char [0, +INF] 2->3 (F) _7 : int [0, 127] =========== BB 3 ============ i_6 signed char [0, +INF] _7 int [0, 127] <bb 3> : if (_7 > 9) goto <bb 4>; [50.00%] else goto <bb 5>; [50.00%] 3->4 (T) _7 : int [10, 127] 3->5 (F) _7 : int [0, 9] =========== BB 4 ============ <bb 4> : =========== BB 5 ============ <bb 5> : # iftmp.1_9 = PHI <i_6(3), 0(4)> _2 = (int) iftmp.1_9; __builtin_sprintf (&a, "%i", _2); return; _2 : int [0, 127] iftmp.1_9 : signed char [0, +INF] Note that we figure out that _7 is [0,9] from 3->5, except the PHI node in block 5 refers to i_6... i_6 is not referred to in BB3, so this generation of ranger doesnt see it as an exportable value, and thus doesnt do a calculation for it like it does in BB2, where it is defined. So it only picks up the first of the 2 conditions for i_6 that is live on entry to BB3 ([0, +INF]) 2 things are in the pipe which will resolve this 1 - equivalences. _7 is a cast-copy of i_6. since the range of _7 falls within the range of a signed char, the forthcoming equivalency/relational work will have i_6 match the same value as _7. 2 - The next generation GORI engine will allow for recalculation of dependency chains from outside the block when appropriate, so this will cause i_6 to also be an exportable value from BB 3 as well. which will also then get the [0,9] range in circumstances in which it isnt a simply copy. Neither of which helps you right now with the perfect precision :-P Andrew
On 11/6/20 2:54 PM, Andrew MacLeod wrote: > On 11/6/20 11:13 AM, Martin Sebor wrote: >> On 11/5/20 8:08 PM, Andrew MacLeod wrote: >>> On 11/5/20 7:50 PM, Martin Sebor wrote: >>>> On 11/5/20 5:02 PM, Andrew MacLeod wrote: >>>>> On 11/5/20 4:02 PM, Martin Sebor wrote: >>>>>> On 11/5/20 12:29 PM, Martin Sebor wrote: >>>>>>> On 10/1/20 11:25 AM, Martin Sebor wrote: >>>>>>>> On 10/1/20 9:34 AM, Aldy Hernandez wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> On 10/1/20 3:22 PM, Andrew MacLeod wrote: >>>>>>>>> > On 10/1/20 5:05 AM, Aldy Hernandez via Gcc-patches wrote: >>>>>>>>> >>> Thanks for doing all this! There isn't anything I don't >>>>>>>>> understand >>>>>>>>> >>> in the sprintf changes so no questions from me (well, >>>>>>>>> almost none). >>>>>>>>> >>> Just some comments: >>>>>>>>> >> Thanks for your comments on the sprintf/strlen API conversion. >>>>>>>>> >> >>>>>>>>> >>> The current call statement is available in all functions >>>>>>>>> that take >>>>>>>>> >>> a directive argument, as dir->info.callstmt. There should >>>>>>>>> be no need >>>>>>>>> >>> to also add it as a new argument to the functions that now >>>>>>>>> need it. >>>>>>>>> >> Fixed. >>>>>>>>> >> >>>>>>>>> >>> The change adds code along these lines in a bunch of places: >>>>>>>>> >>> >>>>>>>>> >>> + value_range vr; >>>>>>>>> >>> + if (!query->range_of_expr (vr, arg, stmt)) >>>>>>>>> >>> + vr.set_varying (TREE_TYPE (arg)); >>>>>>>>> >>> >>>>>>>>> >>> I thought under the new Ranger APIs when a range couldn't be >>>>>>>>> >>> determined it would be automatically set to the maximum for >>>>>>>>> >>> the type. I like that and have been moving in that direction >>>>>>>>> >>> with my code myself (rather than having an API fail, have it >>>>>>>>> >>> set the max range and succeed). >>>>>>>>> >> I went through all the above idioms and noticed all are >>>>>>>>> being used on >>>>>>>>> >> supported types (integers or pointers). So range_of_expr >>>>>>>>> will always >>>>>>>>> >> return true. I've removed the if() and the set_varying. >>>>>>>>> >> >>>>>>>>> >>> Since that isn't so in this case, I think it would still >>>>>>>>> be nice >>>>>>>>> >>> if the added code could be written as if the range were >>>>>>>>> set to >>>>>>>>> >>> varying in this case and (ideally) reduced to just >>>>>>>>> initialization: >>>>>>>>> >>> >>>>>>>>> >>> value_range vr = some-function (query, stmt, arg); >>>>>>>>> >>> >>>>>>>>> >>> some-function could be an inline helper defined just for >>>>>>>>> the sprintf >>>>>>>>> >>> pass (and maybe also strlen which also seems to use the >>>>>>>>> same pattern), >>>>>>>>> >>> or it could be a value_range AKA irange ctor, or it could >>>>>>>>> be a member >>>>>>>>> >>> of range_query, whatever is the most appropriate. >>>>>>>>> >>> >>>>>>>>> >>> (If assigning/copying a value_range is thought to be too >>>>>>>>> expensive, >>>>>>>>> >>> declaring it first and then passing it to that helper to >>>>>>>>> set it >>>>>>>>> >>> would work too). >>>>>>>>> >>> >>>>>>>>> >>> In strlen, is the removed comment no longer relevant? >>>>>>>>> (I.e., does >>>>>>>>> >>> the ranger solve the problem?) >>>>>>>>> >>> >>>>>>>>> >>> - /* The range below may be "inaccurate" if a >>>>>>>>> constant has been >>>>>>>>> >>> - substituted earlier for VAL by this pass that >>>>>>>>> hasn't been >>>>>>>>> >>> - propagated through the CFG. This shoud be fixed >>>>>>>>> by the new >>>>>>>>> >>> - on-demand VRP if/when it becomes available >>>>>>>>> (hopefully in >>>>>>>>> >>> - GCC 11). */ >>>>>>>>> >> It should. >>>>>>>>> >> >>>>>>>>> >>> I'm wondering about the comment added to >>>>>>>>> get_range_strlen_dynamic >>>>>>>>> >>> and other places: >>>>>>>>> >>> >>>>>>>>> >>> + // FIXME: Use range_query instead of global ranges. >>>>>>>>> >>> >>>>>>>>> >>> Is that something you're planning to do in a followup or >>>>>>>>> should >>>>>>>>> >>> I remember to do it at some point? >>>>>>>>> >> I'm not planning on doing it. It's just a reminder that it >>>>>>>>> would be >>>>>>>>> >> beneficial to do so. >>>>>>>>> >> >>>>>>>>> >>> Otherwise I have no concern with the changes. >>>>>>>>> >> It's not cleared whether Andrew approved all 3 parts of the >>>>>>>>> patchset >>>>>>>>> >> or just the valuation part. I'll wait for his nod before >>>>>>>>> committing >>>>>>>>> >> this chunk. >>>>>>>>> >> >>>>>>>>> >> Aldy >>>>>>>>> >> >>>>>>>>> > I have no issue with it, so OK. >>>>>>>>> >>>>>>>>> Pushed all 3 patches. >>>>>>>>> >>>>>>>>> > >>>>>>>>> > Just an observation that should be pointed out, I believe >>>>>>>>> Aldy has all >>>>>>>>> > the code for converting to a ranger, but we have not pursued >>>>>>>>> that any >>>>>>>>> > further yet since there is a regression due to our lack of >>>>>>>>> equivalence >>>>>>>>> > processing I think? That should be resolved in the coming >>>>>>>>> month, but at >>>>>>>>> > the moment is a holdback/concern for converting these >>>>>>>>> passes... iirc. >>>>>>>>> >>>>>>>>> Yes. Martin, the take away here is that the strlen/sprintf >>>>>>>>> pass has been converted to the new API, but ranger is still not >>>>>>>>> up and running on it (even on the branch). >>>>>>>>> >>>>>>>>> With the new API, all you have to do is remove all instances of >>>>>>>>> evrp_range_analyzer and replace them with a ranger. That's it. >>>>>>>>> Below is an untested patch that would convert you to a ranger >>>>>>>>> once it's contributed. >>>>>>>>> >>>>>>>>> IIRC when I enabled the ranger for your pass a while back, >>>>>>>>> there was one or two regressions due to missing equivalences, >>>>>>>>> and the rest were because the tests were expecting an actual >>>>>>>>> specific range, and the ranger returned a slightly >>>>>>>>> different/better one. You'll need to adjust your tests. >>>>>>>> >>>>>>>> Ack. I'll be on the lookout for the ranger commit (if you hppen >>>>>>>> to remember and CC me on it just in case I might miss it that would >>>>>>>> be great). >>>>>>> >>>>>>> I have applied the patch and ran some tests. There are quite >>>>>>> a few failures (see the list below). I have only looked at >>>>>>> a couple. The one in in gcc.dg/tree-ssa/builtin-sprintf-warn-3.c >>>>>>> boils down to the following test case. There should be no warning >>>>>>> for either sprintf call. The one in h() is a false positive and >>>>>>> the reason for at least some of the regressions. Somehow, >>>>>>> the conversions between int and char are causing Ranger to lose >>>>>>> the range. >>>>>>> >>>>>>> $ cat t.c && gcc -O2 -S -Wall t.c >>>>>>> char a[2]; >>>>>>> >>>>>>> extern int x; >>>>>>> >>>>>>> signed char f (int min, int max) >>>>>>> { >>>>>>> signed char i = x; >>>>>>> return i < min || max < i ? min : i; >>>>>>> } >>>>>>> >>>>>>> void ff (signed char i) >>>>>>> { >>>>>>> __builtin_sprintf (a, "%i", f (0, 9)); // okay >>>>>>> } >>>>>>> >>>>>>> signed char g (signed char min, signed char max) >>>>>>> { >>>>>>> signed char i = x; >>>>>>> return i < min || max < i ? min : i; >>>>>>> } >>>>>>> >>>>>>> void gg (void) >>>>>>> { >>>>>>> __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>>>>> } >>>>>>> >>>>>>> t.c: In function ‘gg’: >>>>>>> t.c:24:26: warning: ‘%i’ directive writing between 1 and 4 bytes >>>>>>> into a region of size 2 [-Wformat-overflow=] >>>>>>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>>>>> | ^~ >>>>>>> t.c:24:25: note: directive argument in the range [-128, 127] >>>>>>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>>>>> | ^~~~ >>>>>>> t.c:24:3: note: ‘__builtin_sprintf’ output between 2 and 5 bytes >>>>>>> into a destination of size 2 >>>>>>> 24 | __builtin_sprintf (a, "%i", g (0, 9)); // bogus warning >>>>>>> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>>>> >>>>>>> >>>>>>> Another failure (the one in builtin-sprintf-warn-22.c) is this where >>>>>>> the false positive is due to the strlen pass somehow having lost >>>>>>> the upper bound on the sum of the two string lengths. >>>>>> >>>>>> I'm guessing this might have something to do with the strlen pass >>>>>> calling set_range_info() on the nonconstant results of strlen calls >>>>>> it can determine the range for. How would I go about doing it with >>>>>> ranger? I see ranger_cache::set_global_range() and the class ctor >>>>>> takes a gimple_ranger as argument. Would calling the function be >>>>>> the right way to do it? (I couldn't find anything on the Wiki.) >>>>> >>>>> I haven't had time to write a new modern wiki yet... probably won't >>>>> until well after stage 1 closes either. >>>>> >>>>> And no, thats the internal cache of the ranger, which you dont have >>>>> access to. >>>>> >>>>> At the moment, there isnt a way to inject a "new" global range out >>>>> of the blue. It hasnt been a requirement before, but wouldn't be >>>>> particularly difficult. >>>>> >>>>> Short answer, there is no current equivalent of set_range_info() >>>>> because we dont have a persistent global cache in the same way, and >>>>> the ranger doesnt set the RANGE_INFO field at this point.. we >>>>> havent gotten to the point of reworking that yet. >>>>> >>>>> So the question is, what exactly are you trying to do? I presume >>>>> you are analyzing a strlen statement, and are making range >>>>> conclusions about either the result or some of the operands are >>>>> that are not otherwise exposed in the code? >>>>> >>>>> And you are looking to have those values injected into the rangers >>>>> calculations as refinements? >>>> >>>> Yes, exactly. This is used both for optimization and for warnings >>>> (both to enable them and to suppress false positives). >>>> >>>> Besides folding strlen and sprintf calls into constants, the strlen >>>> pass sets the range of the results of the calls when the lengths of >>>> the arguments/output are known to be at least some number of bytes. >>>> >>>> Besides exposing optimization opportunities downstream, the pass >>>> uses this information to either issue or suppress warnings like in >>>> the sprintf case above. -Wformat-overflow issues warnings either >>>> based on string lengths (or ranges) if they're known, or based on >>>> the sizes of the arrays the strings are stored in when nothing is >>>> known about the lengths. The array size is used as the worst case >>>> estimate, which can cause false positives. >>>> >>>> Here's an example of a downstream optimization made possible by >>>> this. Because i's range is known, the range of the snprintf result >>>> is also known (the range is computed for all snprintf arguments). >>>> The strlen pass sets the range and subsequent passes eliminate >>>> the unreachable code. >>>> >>>> void f (int i) >>>> { >>>> if (i < 100 || 9999 < i) >>>> i = 100; >>>> >>>> int n = __builtin_snprintf (0, 0, "%i", i); >>>> if (n < 3 || 4 < n) >>>> __builtin_abort (); >>>> } >>>> >>>> Martin >>>> >>> That seems like somethings that the range code for builtins should >>> take care of rather than be calculated externally and injected >>> >>> they are just like any other builtins that if the range of certain >>> arguments are known, you can produce a result? >> >> The strlen optimization is more involved in that it keeps track of >> the state of the strings as they're being created by calls to string >> functions like strcpy (or single and multi-byte stores). So the only >> way for Ranger to figure out the range of the result of a strlen call >> is to query the strlen pass. That might work via a callback but it >> would of course only work while the strlen pass is running. The same >> goes for the sprintf results (which are based on the strlen data). >> It might be good enough as long as the ranges can also be exported >> to downstream passes. >> >> It might be possible to compute this on demand too (and introduce >> something like a strlen_range_query) but it would need reworking >> the strlen pass. I like the on-demand design but in the strlen >> case I'm not sure the benefits would justify the costs. >> >>> I gather the range of i being known to be [100,9999] means we know >>> something about n. The the ranger would be able to do that anywhere >>> and everywhere, not just within this pass. One of the ranger goals >>> is to centralize all range tracking >>> >>> and the warning pass can continue to check the ranges and issue >>> warnings when the ranges arent what it likes. >> >> In this simple case, yes. But consider a more involved example: >> >> char a[8]; >> >> void f (int i) >> { >> if (i < 100 || 9999 < i) i = 100; >> memcpy (a, "123", 3); // strlen(a) is in [3, 7] >> int n = snprintf (0, 0, "%s %i", a, i); // n is in [7, 12] >> if (n < 7 || 12 < n) // folded to false >> abort (); >> } >> >> To compute n's range you need to know not only that of i but also >> the length of a (or its range in this case). That's what the strlen >> optimization enables. >> > Thats why we need precise examples :-) > > Ive attached a thouroughly untested patch which you can apply to the > latest trunk stuff I've checked in and try. It gives you an entry > point into the ranger: > > bool import_global_range (tree name, irange &r); > NAME is the ssa_name you are updating the global value for, and > R is the new range you wish to add. Note it is not a replacement, but > rather an intergration. ie, and extra source of information the ranger > can't see. > > you can call it with your newly calculated value,and it will enhance the > value the ranger already has with the extra information.. so above, you > would call it with > > ranger->import_global_range (n_6, [7,12]) > > this will set the range for n_6 to the INTERSECTION of [7,12] and > whatever the ranger already knows. If its varying, then it'll set it to > [7,12]. If the ranger had already figured it was [9,20] for some reason, > then the result will be [9,12] aftewr this call, and it will reurn FALSE > indicating the new range is different than the range you passed in. if > you care. > > It should also push the range of n_6 forward into pother blocks, so on > entry to the block with the abort(), it should have a range of [], > indicating it can't eb reached. > > There are a couple of caveats. > > 1) It may or may not affect other dependent value at this point > depending on the order you visit things since the propagation of > dependency chains is not complete yet. ie if the abort() block has > h_4 = n_6 + 2 > g_2 = h_4 + 6 , g_2 may not be reevaluated with the new range... It > might.. but I cant guarantee it just now. > > 2) If you wish this value to be exported into SSA_NAME_RANGE_INFO, you > will have to continue doing that manually. The ranger cache is not > currently ever exported the RANGE_INFO fields. We've run into numerous > issues with "differences": that occur when a multi-range is converted to > a value_range and although the range is correct, it may be different > that the orignal and produce different results. We are waiting until > we replace the SSA_NAME_RANGE_INFO mechanism. > > Anyway, let me known if this works. I havent had a chance to test it > yet. If it works we can add it to the full API Thanks a lot! I'll try to get to it sometime this week if I can. Martin
From 75e48c3b6d0d744e511edfa075a50f635c879aa8 Mon Sep 17 00:00:00 2001 From: Aldy Hernandez <aldyh@redhat.com> Date: Thu, 17 Sep 2020 09:34:03 +0200 Subject: [PATCH 3/3] Convert sprintf/strlen passes to value query class. --- gcc/builtins.c | 29 +++--- gcc/builtins.h | 5 +- gcc/gimple-ssa-sprintf.c | 136 +++++++++++++++------------ gcc/tree-ssa-strlen.c | 193 ++++++++++++++++++++------------------- gcc/tree-ssa-strlen.h | 9 +- 5 files changed, 196 insertions(+), 176 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index 97f1a184dc6..fb94c8e0ac0 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -183,8 +183,9 @@ static void maybe_emit_chk_warning (tree, enum built_in_function); static void maybe_emit_sprintf_chk_warning (tree, enum built_in_function); static void maybe_emit_free_warning (tree); static tree fold_builtin_object_size (tree, tree); -static tree compute_objsize (tree, int, access_ref *, const vr_values * = NULL); -static bool get_range (tree, signop, offset_int[2], const vr_values * = NULL); +static tree compute_objsize (tree, int, access_ref *, range_query * = NULL); +static bool get_range (tree, gimple *, signop, offset_int[2], + range_query * = NULL); static bool check_read_access (tree, tree, tree = NULL_TREE, int = 1); unsigned HOST_WIDE_INT target_newline; @@ -4070,7 +4071,7 @@ check_read_access (tree exp, tree src, tree bound /* = NULL_TREE */, tree gimple_call_alloc_size (gimple *stmt, wide_int rng1[2] /* = NULL */, - const vr_values *rvals /* = NULL */) + range_query *rvals /* = NULL */) { if (!stmt) return NULL_TREE; @@ -4122,7 +4123,7 @@ gimple_call_alloc_size (gimple *stmt, wide_int rng1[2] /* = NULL */, if (!rng1) rng1 = rng1_buf; - if (!get_range (size, rng1, rvals)) + if (!get_range (size, stmt, rng1, rvals)) return NULL_TREE; if (argidx2 > nargs && TREE_CODE (size) == INTEGER_CST) @@ -4132,7 +4133,7 @@ gimple_call_alloc_size (gimple *stmt, wide_int rng1[2] /* = NULL */, of the upper bounds as a constant. Ignore anti-ranges. */ tree n = argidx2 < nargs ? gimple_call_arg (stmt, argidx2) : integer_one_node; wide_int rng2[2]; - if (!get_range (n, rng2, rvals)) + if (!get_range (n, stmt, rng2, rvals)) return NULL_TREE; /* Extend to the maximum precision to avoid overflow. */ @@ -4160,11 +4161,11 @@ gimple_call_alloc_size (gimple *stmt, wide_int rng1[2] /* = NULL */, result but accepts offset_int instead. */ static bool -get_range (tree x, signop sgn, offset_int r[2], - const vr_values *rvals /* = NULL */) +get_range (tree x, gimple *stmt, signop sgn, offset_int r[2], + range_query *rvals /* = NULL */) { wide_int wr[2]; - if (!get_range (x, wr, rvals)) + if (!get_range (x, stmt, wr, rvals)) return false; r[0] = offset_int::from (wr[0], sgn); @@ -4188,7 +4189,7 @@ get_range (tree x, signop sgn, offset_int r[2], static bool compute_objsize (tree ptr, int ostype, access_ref *pref, - bitmap *visited, const vr_values *rvals /* = NULL */) + bitmap *visited, range_query *rvals /* = NULL */) { const bool addr = TREE_CODE (ptr) == ADDR_EXPR; if (addr) @@ -4286,7 +4287,7 @@ compute_objsize (tree ptr, int ostype, access_ref *pref, offset_int orng[2]; tree off = TREE_OPERAND (ptr, 1); - if (!get_range (off, SIGNED, orng, rvals)) + if (!get_range (off, NULL, SIGNED, orng, rvals)) /* Fail unless the size of the object is zero. */ return pref->sizrng[0] == 0 && pref->sizrng[0] == pref->sizrng[1]; @@ -4367,7 +4368,7 @@ compute_objsize (tree ptr, int ostype, access_ref *pref, offset to the maximum. */ offset_int orng[2]; tree off = gimple_assign_rhs2 (stmt); - if (!get_range (off, SIGNED, orng, rvals)) + if (!get_range (off, stmt, SIGNED, orng, rvals)) { orng[0] = wi::to_offset (TYPE_MIN_VALUE (ptrdiff_type_node)); orng[1] = wi::to_offset (TYPE_MAX_VALUE (ptrdiff_type_node)); @@ -4397,7 +4398,7 @@ compute_objsize (tree ptr, int ostype, access_ref *pref, && !array_at_struct_end_p (ptr)) { if (tree size = TYPE_SIZE_UNIT (type)) - return get_range (size, UNSIGNED, pref->sizrng, rvals); + return get_range (size, NULL, UNSIGNED, pref->sizrng, rvals); } return false; @@ -4407,7 +4408,7 @@ compute_objsize (tree ptr, int ostype, access_ref *pref, static tree compute_objsize (tree ptr, int ostype, access_ref *pref, - const vr_values *rvals /* = NULL */) + range_query *rvals /* = NULL */) { bitmap visited = NULL; @@ -4439,7 +4440,7 @@ compute_objsize (tree ptr, int ostype, access_ref *pref, tree compute_objsize (tree ptr, int ostype, tree *pdecl /* = NULL */, - tree *poff /* = NULL */, const vr_values *rvals /* = NULL */) + tree *poff /* = NULL */, class range_query *rvals /* = NULL */) { /* Set the initial offsets to zero and size to negative to indicate none has been computed yet. */ diff --git a/gcc/builtins.h b/gcc/builtins.h index 94ff96b1292..5ebde7930cb 100644 --- a/gcc/builtins.h +++ b/gcc/builtins.h @@ -134,11 +134,10 @@ extern void set_builtin_user_assembler_name (tree decl, const char *asmspec); extern bool is_simple_builtin (tree); extern bool is_inexpensive_builtin (tree); -class vr_values; tree gimple_call_alloc_size (gimple *, wide_int[2] = NULL, - const vr_values * = NULL); + class range_query * = NULL); extern tree compute_objsize (tree, int, tree * = NULL, tree * = NULL, - const vr_values * = NULL); + class range_query * = NULL); extern bool readonly_data_expr (tree exp); extern bool init_target_chars (void); diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c index 70b031fe7b9..8b173987532 100644 --- a/gcc/gimple-ssa-sprintf.c +++ b/gcc/gimple-ssa-sprintf.c @@ -546,8 +546,8 @@ fmtresult::type_max_digits (tree type, int base) } static bool -get_int_range (tree, HOST_WIDE_INT *, HOST_WIDE_INT *, bool, HOST_WIDE_INT, - const vr_values *); +get_int_range (tree, gimple *, HOST_WIDE_INT *, HOST_WIDE_INT *, + bool, HOST_WIDE_INT, range_query *); struct call_info; @@ -597,7 +597,7 @@ struct directive /* Format conversion function that given a directive and an argument returns the formatting result. */ - fmtresult (*fmtfunc) (const directive &, tree, const vr_values *); + fmtresult (*fmtfunc) (gimple *, const directive &, tree, range_query *); /* Return True when the format flag CHR has been used. */ bool get_flag (char chr) const @@ -634,10 +634,7 @@ struct directive or 0, whichever is greater. For a non-constant ARG in some range set width to its range adjusting each bound to -1 if it's less. For an indeterminate ARG set width to [0, INT_MAX]. */ - void set_width (tree arg, const vr_values *vr) - { - get_int_range (arg, width, width + 1, true, 0, vr); - } + void set_width (tree arg, range_query *); /* Set both bounds of the precision range to VAL. */ void set_precision (HOST_WIDE_INT val) @@ -650,10 +647,7 @@ struct directive or -1 whichever is greater. For a non-constant ARG in some range set precision to its range adjusting each bound to -1 if it's less. For an indeterminate ARG set precision to [-1, INT_MAX]. */ - void set_precision (tree arg, const vr_values *vr) - { - get_int_range (arg, prec, prec + 1, false, -1, vr); - } + void set_precision (tree arg, range_query *query); /* Return true if both width and precision are known to be either constant or in some range, false otherwise. */ @@ -956,10 +950,22 @@ struct call_info } }; +void +directive::set_width (tree arg, range_query *query) +{ + get_int_range (arg, info->callstmt, width, width + 1, true, 0, query); +} + +void +directive::set_precision (tree arg, range_query *query) +{ + get_int_range (arg, info->callstmt, prec, prec + 1, false, -1, query); +} + /* Return the result of formatting a no-op directive (such as '%n'). */ static fmtresult -format_none (const directive &, tree, const vr_values *) +format_none (gimple *, const directive &, tree, range_query *) { fmtresult res (0); return res; @@ -968,7 +974,7 @@ format_none (const directive &, tree, const vr_values *) /* Return the result of formatting the '%%' directive. */ static fmtresult -format_percent (const directive &, tree, const vr_values *) +format_percent (gimple *, const directive &, tree, range_query *) { fmtresult res (1); return res; @@ -1026,9 +1032,10 @@ build_intmax_type_nodes (tree *pintmax, tree *puintmax) the determined range are replaced with NEGBOUND. */ static bool -get_int_range (tree arg, HOST_WIDE_INT *pmin, HOST_WIDE_INT *pmax, +get_int_range (tree arg, gimple *stmt, + HOST_WIDE_INT *pmin, HOST_WIDE_INT *pmax, bool absolute, HOST_WIDE_INT negbound, - const class vr_values *vr_values) + range_query *query) { /* The type of the result. */ const_tree type = integer_type_node; @@ -1067,10 +1074,11 @@ get_int_range (tree arg, HOST_WIDE_INT *pmin, HOST_WIDE_INT *pmax, && TYPE_PRECISION (argtype) <= TYPE_PRECISION (type)) { /* Try to determine the range of values of the integer argument. */ - const value_range_equiv *vr - = CONST_CAST (class vr_values *, vr_values)->get_value_range (arg); + value_range vr; + if (!query->range_of_expr (vr, arg, stmt)) + vr.set_varying (TREE_TYPE (arg)); - if (!vr->undefined_p () && !vr->varying_p () && !vr->symbolic_p ()) + if (!vr.undefined_p () && !vr.varying_p ()) { HOST_WIDE_INT type_min = (TYPE_UNSIGNED (argtype) @@ -1080,8 +1088,8 @@ get_int_range (tree arg, HOST_WIDE_INT *pmin, HOST_WIDE_INT *pmax, HOST_WIDE_INT type_max = tree_to_uhwi (TYPE_MAX_VALUE (argtype)); tree type = TREE_TYPE (arg); - tree tmin = wide_int_to_tree (type, vr->lower_bound ()); - tree tmax = wide_int_to_tree (type, vr->upper_bound ()); + tree tmin = wide_int_to_tree (type, vr.lower_bound ()); + tree tmax = wide_int_to_tree (type, vr.upper_bound ()); *pmin = TREE_INT_CST_LOW (tmin); *pmax = TREE_INT_CST_LOW (tmax); @@ -1103,8 +1111,8 @@ get_int_range (tree arg, HOST_WIDE_INT *pmin, HOST_WIDE_INT *pmax, /* Handle an argument with an unknown range as if none had been provided. */ if (unknown) - return get_int_range (NULL_TREE, pmin, pmax, absolute, - negbound, vr_values); + return get_int_range (NULL_TREE, NULL, pmin, pmax, absolute, + negbound, query); } /* Adjust each bound as specified by ABSOLUTE and NEGBOUND. */ @@ -1189,7 +1197,8 @@ adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax) used when the directive argument or its value isn't known. */ static fmtresult -format_integer (const directive &dir, tree arg, const vr_values *vr_values) +format_integer (gimple *stmt, const directive &dir, tree arg, + range_query *query) { tree intmax_type_node; tree uintmax_type_node; @@ -1372,13 +1381,14 @@ format_integer (const directive &dir, tree arg, const vr_values *vr_values) { /* Try to determine the range of values of the integer argument (range information is not available for pointers). */ - const value_range_equiv *vr - = CONST_CAST (class vr_values *, vr_values)->get_value_range (arg); + value_range vr; + if (!query->range_of_expr (vr, arg, stmt)) + vr.set_varying (TREE_TYPE (arg)); - if (!vr->varying_p () && !vr->undefined_p () && !vr->symbolic_p ()) + if (!vr.varying_p () && !vr.undefined_p ()) { - argmin = wide_int_to_tree (TREE_TYPE (arg), vr->lower_bound ()); - argmax = wide_int_to_tree (TREE_TYPE (arg), vr->upper_bound ()); + argmin = wide_int_to_tree (TREE_TYPE (arg), vr.lower_bound ()); + argmax = wide_int_to_tree (TREE_TYPE (arg), vr.upper_bound ()); /* Set KNOWNRANGE if the argument is in a known subrange of the directive's type and neither width nor precision @@ -1404,7 +1414,7 @@ format_integer (const directive &dir, tree arg, const vr_values *vr_values) if (code == INTEGER_CST) { arg = gimple_assign_rhs1 (def); - return format_integer (dir, arg, vr_values); + return format_integer (def, dir, arg, query); } if (code == NOP_EXPR) @@ -1449,16 +1459,16 @@ format_integer (const directive &dir, tree arg, const vr_values *vr_values) /* For unsigned conversions/directives or signed when the minimum is positive, use the minimum and maximum to compute the shortest and longest output, respectively. */ - res.range.min = format_integer (dir, argmin, vr_values).range.min; - res.range.max = format_integer (dir, argmax, vr_values).range.max; + res.range.min = format_integer (stmt, dir, argmin, query).range.min; + res.range.max = format_integer (stmt, dir, argmax, query).range.max; } else if (tree_int_cst_sgn (argmax) < 0) { /* For signed conversions/directives if maximum is negative, use the minimum as the longest output and maximum as the shortest output. */ - res.range.min = format_integer (dir, argmax, vr_values).range.min; - res.range.max = format_integer (dir, argmin, vr_values).range.max; + res.range.min = format_integer (stmt, dir, argmax, query).range.min; + res.range.max = format_integer (stmt, dir, argmin, query).range.max; } else { @@ -1467,11 +1477,11 @@ format_integer (const directive &dir, tree arg, const vr_values *vr_values) length of the output of both minimum and maximum and pick the longer. */ unsigned HOST_WIDE_INT max1 - = format_integer (dir, argmin, vr_values).range.max; + = format_integer (stmt, dir, argmin, query).range.max; unsigned HOST_WIDE_INT max2 - = format_integer (dir, argmax, vr_values).range.max; + = format_integer (stmt, dir, argmax, query).range.max; res.range.min - = format_integer (dir, integer_zero_node, vr_values).range.min; + = format_integer (stmt, dir, integer_zero_node, query).range.min; res.range.max = MAX (max1, max2); } @@ -1820,7 +1830,7 @@ format_floating (const directive &dir, const HOST_WIDE_INT prec[2]) ARG. */ static fmtresult -format_floating (const directive &dir, tree arg, const vr_values *) +format_floating (gimple *, const directive &dir, tree arg, range_query *) { HOST_WIDE_INT prec[] = { dir.prec[0], dir.prec[1] }; tree type = (dir.modifier == FMT_LEN_L || dir.modifier == FMT_LEN_ll @@ -2014,7 +2024,8 @@ format_floating (const directive &dir, tree arg, const vr_values *) Used by the format_string function below. */ static fmtresult -get_string_length (tree str, unsigned eltsize, const vr_values *vr) +get_string_length (tree str, gimple *stmt, unsigned eltsize, + range_query *query) { if (!str) return fmtresult (); @@ -2025,7 +2036,7 @@ get_string_length (tree str, unsigned eltsize, const vr_values *vr) c_strlen_data lendata = { }; lendata.maxbound = str; if (eltsize == 1) - get_range_strlen_dynamic (str, &lendata, vr); + get_range_strlen_dynamic (str, stmt, &lendata, query); else { /* Determine the length of the shortest and longest string referenced @@ -2122,7 +2133,8 @@ get_string_length (tree str, unsigned eltsize, const vr_values *vr) vsprinf). */ static fmtresult -format_character (const directive &dir, tree arg, const vr_values *vr_values) +format_character (gimple *stmt, const directive &dir, tree arg, + range_query *query) { fmtresult res; @@ -2135,7 +2147,7 @@ format_character (const directive &dir, tree arg, const vr_values *vr_values) res.range.min = 0; HOST_WIDE_INT min, max; - if (get_int_range (arg, &min, &max, false, 0, vr_values)) + if (get_int_range (arg, stmt, &min, &max, false, 0, query)) { if (min == 0 && max == 0) { @@ -2433,7 +2445,8 @@ alias_offset (tree arg, tree dst, HOST_WIDE_INT dst_fld) vsprinf). */ static fmtresult -format_string (const directive &dir, tree arg, const vr_values *vr_values) +format_string (gimple *stmt, const directive &dir, tree arg, + range_query *query) { fmtresult res; @@ -2462,7 +2475,7 @@ format_string (const directive &dir, tree arg, const vr_values *vr_values) gcc_checking_assert (count_by == 2 || count_by == 4); } - fmtresult slen = get_string_length (arg, count_by, vr_values); + fmtresult slen = get_string_length (arg, stmt, count_by, query); if (slen.range.min == slen.range.max && slen.range.min < HOST_WIDE_INT_MAX) { @@ -2634,7 +2647,7 @@ format_string (const directive &dir, tree arg, const vr_values *vr_values) /* Format plain string (part of the format string itself). */ static fmtresult -format_plain (const directive &dir, tree, const vr_values *) +format_plain (gimple *, const directive &dir, tree, range_query *) { fmtresult res (dir.len); return res; @@ -3030,7 +3043,7 @@ bytes_remaining (unsigned HOST_WIDE_INT navail, const format_result &res) static bool format_directive (const call_info &info, format_result *res, const directive &dir, - const class vr_values *vr_values) + range_query *query) { /* Offset of the beginning of the directive from the beginning of the format string. */ @@ -3055,7 +3068,7 @@ format_directive (const call_info &info, return false; /* Compute the range of lengths of the formatted output. */ - fmtresult fmtres = dir.fmtfunc (dir, dir.arg, vr_values); + fmtresult fmtres = dir.fmtfunc (info.callstmt, dir, dir.arg, query); /* Record whether the output of all directives is known to be bounded by some maximum, implying that their arguments are @@ -3386,7 +3399,7 @@ static size_t parse_directive (call_info &info, directive &dir, format_result *res, const char *str, unsigned *argno, - const vr_values *vr_values) + range_query *query) { const char *pcnt = strchr (str, target_percent); dir.beg = str; @@ -3711,7 +3724,7 @@ parse_directive (call_info &info, if (star_width) { if (INTEGRAL_TYPE_P (TREE_TYPE (star_width))) - dir.set_width (star_width, vr_values); + dir.set_width (star_width, query); else { /* Width specified by a va_list takes on the range [0, -INT_MIN] @@ -3744,7 +3757,7 @@ parse_directive (call_info &info, if (star_precision) { if (INTEGRAL_TYPE_P (TREE_TYPE (star_precision))) - dir.set_precision (star_precision, vr_values); + dir.set_precision (star_precision, query); else { /* Precision specified by a va_list takes on the range [-1, INT_MAX] @@ -3958,7 +3971,8 @@ maybe_warn_overlap (call_info &info, format_result *res) that caused the processing to be terminated early). */ static bool -compute_format_length (call_info &info, format_result *res, const vr_values *vr) +compute_format_length (call_info &info, format_result *res, + range_query *query) { if (dump_file) { @@ -3995,10 +4009,10 @@ compute_format_length (call_info &info, format_result *res, const vr_values *vr) { directive dir (&info, dirno); - size_t n = parse_directive (info, dir, res, pf, &argno, vr); + size_t n = parse_directive (info, dir, res, pf, &argno, query); /* Return failure if the format function fails. */ - if (!format_directive (info, res, dir, vr)) + if (!format_directive (info, res, dir, query)) return false; /* Return success when the directive is zero bytes long and it's @@ -4288,13 +4302,14 @@ get_user_idx_format (tree fndecl, unsigned *idx_args) gsi_next should not be performed in the caller. */ bool -handle_printf_call (gimple_stmt_iterator *gsi, const vr_values *vr_values) +handle_printf_call (gimple_stmt_iterator *gsi, range_query *query) { init_target_to_host_charmap (); call_info info = call_info (); - info.callstmt = gsi_stmt (*gsi); + gimple *stmt = gsi_stmt (*gsi); + info.callstmt = stmt; info.func = gimple_call_fndecl (info.callstmt); if (!info.func) return false; @@ -4557,14 +4572,15 @@ handle_printf_call (gimple_stmt_iterator *gsi, const vr_values *vr_values) /* Try to determine the range of values of the argument and use the greater of the two at level 1 and the smaller of them at level 2. */ - const value_range_equiv *vr - = CONST_CAST (class vr_values *, vr_values)->get_value_range (size); + value_range vr; + if (!query->range_of_expr (vr, size, stmt)) + vr.set_varying (TREE_TYPE (size)); - if (!vr->undefined_p () && !vr->symbolic_p ()) + if (!vr.undefined_p ()) { tree type = TREE_TYPE (size); - tree tmin = wide_int_to_tree (type, vr->lower_bound ()); - tree tmax = wide_int_to_tree (type, vr->upper_bound ()); + tree tmin = wide_int_to_tree (type, vr.lower_bound ()); + tree tmax = wide_int_to_tree (type, vr.upper_bound ()); unsigned HOST_WIDE_INT minsize = TREE_INT_CST_LOW (tmin); unsigned HOST_WIDE_INT maxsize = TREE_INT_CST_LOW (tmax); dstsize = warn_level < 2 ? maxsize : minsize; @@ -4675,7 +4691,7 @@ handle_printf_call (gimple_stmt_iterator *gsi, const vr_values *vr_values) never set to true again). */ res.posunder4k = posunder4k && dstptr; - bool success = compute_format_length (info, &res, vr_values); + bool success = compute_format_length (info, &res, query); if (res.warned) gimple_set_no_warning (info.callstmt, true); diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c index 9907cc0c824..8f2413de3ff 100644 --- a/gcc/tree-ssa-strlen.c +++ b/gcc/tree-ssa-strlen.c @@ -200,7 +200,8 @@ static void handle_builtin_stxncpy_strncat (bool, gimple_stmt_iterator *); to determine the range, otherwise get_range_info. */ tree -get_range (tree val, wide_int minmax[2], const vr_values *rvals /* = NULL */) +get_range (tree val, gimple *stmt, wide_int minmax[2], + range_query *rvals /* = NULL */) { if (TREE_CODE (val) == INTEGER_CST) { @@ -211,21 +212,17 @@ get_range (tree val, wide_int minmax[2], const vr_values *rvals /* = NULL */) if (TREE_CODE (val) != SSA_NAME) return NULL_TREE; - if (rvals) - { - /* The range below may be "inaccurate" if a constant has been - substituted earlier for VAL by this pass that hasn't been - propagated through the CFG. This shoud be fixed by the new - on-demand VRP if/when it becomes available (hopefully in - GCC 11). */ - const value_range *vr - = (CONST_CAST (class vr_values *, rvals)->get_value_range (val)); - value_range_kind rng = vr->kind (); - if (rng != VR_RANGE || !range_int_cst_p (vr)) + if (rvals && stmt) + { + value_range vr; + if (!rvals->range_of_expr (vr, val, stmt)) + return NULL_TREE; + value_range_kind rng = vr.kind (); + if (rng != VR_RANGE) return NULL_TREE; - minmax[0] = wi::to_wide (vr->min ()); - minmax[1] = wi::to_wide (vr->max ()); + minmax[0] = wi::to_wide (vr.min ()); + minmax[1] = wi::to_wide (vr.max ()); return val; } @@ -263,7 +260,7 @@ compare_nonzero_chars (strinfo *si, unsigned HOST_WIDE_INT off) static int compare_nonzero_chars (strinfo *si, unsigned HOST_WIDE_INT off, - const vr_values *rvals) + range_query *rvals) { if (!si->nonzero_chars) return -1; @@ -274,20 +271,19 @@ compare_nonzero_chars (strinfo *si, unsigned HOST_WIDE_INT off, if (!rvals || TREE_CODE (si->nonzero_chars) != SSA_NAME) return -1; - const value_range_equiv *vr - = (CONST_CAST (class vr_values *, rvals) - ->get_value_range (si->nonzero_chars)); - - value_range_kind rng = vr->kind (); - if (rng != VR_RANGE || !range_int_cst_p (vr)) + value_range vr; + if (!rvals->range_of_expr (vr, si->nonzero_chars, si->stmt)) + return -1; + value_range_kind rng = vr.kind (); + if (rng != VR_RANGE) return -1; /* If the offset is less than the minimum length or if the bounds of the length range are equal return the result of the comparison same as in the constant case. Otherwise return a conservative result. */ - int cmpmin = compare_tree_int (vr->min (), off); - if (cmpmin > 0 || tree_int_cst_equal (vr->min (), vr->max ())) + int cmpmin = compare_tree_int (vr.min (), off); + if (cmpmin > 0 || tree_int_cst_equal (vr.min (), vr.max ())) return cmpmin; return -1; @@ -332,7 +328,7 @@ get_next_strinfo (strinfo *si) static int get_addr_stridx (tree exp, tree ptr, unsigned HOST_WIDE_INT *offset_out, - const vr_values *rvals = NULL) + range_query *rvals = NULL) { HOST_WIDE_INT off; struct stridxlist *list, *last = NULL; @@ -392,7 +388,7 @@ get_addr_stridx (tree exp, tree ptr, unsigned HOST_WIDE_INT *offset_out, When nonnull, uses RVALS to determine range information. */ static int -get_stridx (tree exp, wide_int offrng[2] = NULL, const vr_values *rvals = NULL) +get_stridx (tree exp, wide_int offrng[2] = NULL, range_query *rvals = NULL) { if (offrng) offrng[0] = offrng[1] = wi::zero (TYPE_PRECISION (ptrdiff_type_node)); @@ -474,7 +470,7 @@ get_stridx (tree exp, wide_int offrng[2] = NULL, const vr_values *rvals = NULL) return the index corresponding to the SSA_NAME. Do this irrespective of the whether the offset is known. */ - if (get_range (off, offrng, rvals)) + if (get_range (off, def_stmt, offrng, rvals)) { /* When the offset range is known, increment it it by the constant offset computed in prior @@ -864,11 +860,11 @@ get_string_length (strinfo *si) } /* Dump strlen data to FP for statement STMT. When non-null, RVALS - points to EVRP info and is used to dump strlen range for non-constant - results. */ + points to the valuation engine used to calculate ranges, and is + used to dump strlen range for non-constant results. */ DEBUG_FUNCTION void -dump_strlen_info (FILE *fp, gimple *stmt, const vr_values *rvals) +dump_strlen_info (FILE *fp, gimple *stmt, range_query *rvals) { if (stmt) { @@ -909,14 +905,15 @@ dump_strlen_info (FILE *fp, gimple *stmt, const vr_values *rvals) wide_int min, max; if (rvals) { - const value_range *vr - = CONST_CAST (class vr_values *, rvals) - ->get_value_range (si->nonzero_chars); - rng = vr->kind (); - if (range_int_cst_p (vr)) + value_range vr; + if (!rvals->range_of_expr (vr, si->nonzero_chars, + si->stmt)) + vr.set_varying (TREE_TYPE (si->nonzero_chars)); + rng = vr.kind (); + if (range_int_cst_p (&vr)) { - min = wi::to_wide (vr->min ()); - max = wi::to_wide (vr->max ()); + min = wi::to_wide (vr.min ()); + max = wi::to_wide (vr.max ()); } else rng = VR_UNDEFINED; @@ -1004,13 +1001,14 @@ dump_strlen_info (FILE *fp, gimple *stmt, const vr_values *rvals) /* Attempt to determine the length of the string SRC. On success, store the length in *PDATA and return true. Otherwise, return false. - VISITED is a bitmap of visited PHI nodes. RVALS points to EVRP info - and PSSA_DEF_MAX to an SSA_NAME assignment limit used to prevent runaway - recursion. */ + VISITED is a bitmap of visited PHI nodes. RVALS points to the valuation + engine used to calculate ranges. PSSA_DEF_MAX to an SSA_NAME + assignment limit used to prevent runaway recursion. */ static bool -get_range_strlen_dynamic (tree src, c_strlen_data *pdata, bitmap *visited, - const vr_values *rvals, unsigned *pssa_def_max) +get_range_strlen_dynamic (tree src, gimple *stmt, + c_strlen_data *pdata, bitmap *visited, + range_query *rvals, unsigned *pssa_def_max) { int idx = get_stridx (src); if (!idx) @@ -1042,8 +1040,8 @@ get_range_strlen_dynamic (tree src, c_strlen_data *pdata, bitmap *visited, continue; c_strlen_data argdata = { }; - if (get_range_strlen_dynamic (arg, &argdata, visited, rvals, - pssa_def_max)) + if (get_range_strlen_dynamic (arg, phi, &argdata, visited, + rvals, pssa_def_max)) { /* Set the DECL of an unterminated array this argument refers to if one hasn't been found yet. */ @@ -1110,14 +1108,13 @@ get_range_strlen_dynamic (tree src, c_strlen_data *pdata, bitmap *visited, pdata->minlen = si->nonzero_chars; else if (TREE_CODE (si->nonzero_chars) == SSA_NAME) { - const value_range_equiv *vr - = CONST_CAST (class vr_values *, rvals) - ->get_value_range (si->nonzero_chars); - if (vr->kind () == VR_RANGE - && range_int_cst_p (vr)) + value_range vr; + if (!rvals->range_of_expr (vr, si->nonzero_chars, si->stmt)) + vr.set_varying (TREE_TYPE (si->nonzero_chars)); + if (range_int_cst_p (&vr)) { - pdata->minlen = vr->min (); - pdata->maxlen = vr->max (); + pdata->minlen = vr.min (); + pdata->maxlen = vr.max (); } else pdata->minlen = build_zero_cst (size_type_node); @@ -1156,14 +1153,13 @@ get_range_strlen_dynamic (tree src, c_strlen_data *pdata, bitmap *visited, } else if (pdata->minlen && TREE_CODE (pdata->minlen) == SSA_NAME) { - const value_range_equiv *vr - = CONST_CAST (class vr_values *, rvals) - ->get_value_range (si->nonzero_chars); - if (vr->kind () == VR_RANGE - && range_int_cst_p (vr)) + value_range vr; + if (!rvals->range_of_expr (vr, si->nonzero_chars, stmt)) + vr.set_varying (TREE_TYPE (si->nonzero_chars)); + if (range_int_cst_p (&vr)) { - pdata->minlen = vr->min (); - pdata->maxlen = vr->max (); + pdata->minlen = vr.min (); + pdata->maxlen = vr.max (); pdata->maxbound = pdata->maxlen; } else @@ -1198,17 +1194,17 @@ get_range_strlen_dynamic (tree src, c_strlen_data *pdata, bitmap *visited, Try to obtain the range of the lengths of the string(s) referenced by SRC, or the size of the largest array SRC refers to if the range of lengths cannot be determined, and store all in *PDATA. RVALS - points to EVRP info. */ + points to the valuation engine used to calculate ranges. */ void -get_range_strlen_dynamic (tree src, c_strlen_data *pdata, - const vr_values *rvals) +get_range_strlen_dynamic (tree src, gimple *stmt, c_strlen_data *pdata, + range_query *rvals) { bitmap visited = NULL; tree maxbound = pdata->maxbound; unsigned limit = param_ssa_name_def_chain_limit; - if (!get_range_strlen_dynamic (src, pdata, &visited, rvals, &limit)) + if (!get_range_strlen_dynamic (src, stmt, pdata, &visited, rvals, &limit)) { /* On failure extend the length range to an impossible maximum (a valid MAXLEN must be less than PTRDIFF_MAX - 1). Other @@ -1803,6 +1799,7 @@ set_strlen_range (tree lhs, wide_int min, wide_int max, else if (TREE_CODE (bound) == SSA_NAME) { wide_int minbound, maxbound; + // FIXME: Use range_query instead of global ranges. value_range_kind rng = get_range_info (bound, &minbound, &maxbound); if (rng == VR_RANGE) { @@ -1907,7 +1904,7 @@ maybe_set_strlen_range (tree lhs, tree src, tree bound) static void maybe_warn_overflow (gimple *stmt, tree len, - const vr_values *rvals = NULL, + range_query *rvals = NULL, strinfo *si = NULL, bool plus_one = false, bool rawmem = false) { @@ -1959,7 +1956,7 @@ maybe_warn_overflow (gimple *stmt, tree len, tree off = TREE_OPERAND (ref, 1); ref = TREE_OPERAND (ref, 0); wide_int rng[2]; - if (get_range (off, rng, rvals)) + if (get_range (off, stmt, rng, rvals)) { /* Convert offsets to the maximum precision. */ offrng[0] = widest_int::from (rng[0], SIGNED); @@ -1977,7 +1974,7 @@ maybe_warn_overflow (gimple *stmt, tree len, tree mem_off = TREE_OPERAND (ref, 1); ref = TREE_OPERAND (ref, 0); wide_int rng[2]; - if (get_range (mem_off, rng, rvals)) + if (get_range (mem_off, stmt, rng, rvals)) { offrng[0] += widest_int::from (rng[0], SIGNED); offrng[1] += widest_int::from (rng[1], SIGNED); @@ -2049,7 +2046,7 @@ maybe_warn_overflow (gimple *stmt, tree len, } wide_int rng[2]; - if (get_range (destsize, rng, rvals)) + if (get_range (destsize, stmt, rng, rvals)) { sizrng[0] = widest_int::from (rng[0], UNSIGNED); sizrng[1] = widest_int::from (rng[1], UNSIGNED); @@ -2080,7 +2077,7 @@ maybe_warn_overflow (gimple *stmt, tree len, return; wide_int rng[2]; - if (!get_range (len, rng, rvals)) + if (!get_range (len, stmt, rng, rvals)) return; widest_int lenrng[2] = @@ -2231,7 +2228,7 @@ maybe_warn_overflow (gimple *stmt, tree len, if (destoff) { wide_int rng[2]; - if (get_range (destoff, rng)) + if (get_range (destoff, stmt, rng)) { offrng[0] = widest_int::from (rng[0], SIGNED); offrng[1] = widest_int::from (rng[1], SIGNED); @@ -2339,7 +2336,7 @@ maybe_warn_overflow (gimple *stmt, tree len, static inline void maybe_warn_overflow (gimple *stmt, unsigned HOST_WIDE_INT len, - const vr_values *rvals = NULL, strinfo *si = NULL, + range_query *rvals = NULL, strinfo *si = NULL, bool plus_one = false, bool rawmem = false) { maybe_warn_overflow (stmt, build_int_cst (size_type_node, len), rvals, @@ -2642,7 +2639,7 @@ handle_builtin_strchr (gimple_stmt_iterator *gsi) static void handle_builtin_strcpy (enum built_in_function bcode, gimple_stmt_iterator *gsi, - const vr_values *rvals) + range_query *rvals) { int idx, didx; tree src, dst, srclen, len, lhs, type, fn, oldlen; @@ -3036,6 +3033,7 @@ maybe_diag_stxncpy_trunc (gimple_stmt_iterator gsi, tree src, tree cnt) cntrange[0] = cntrange[1] = wi::to_wide (cnt); else if (TREE_CODE (cnt) == SSA_NAME) { + // FIXME: Use range_query instead of global ranges. enum value_range_kind rng = get_range_info (cnt, cntrange, cntrange + 1); if (rng == VR_RANGE) ; @@ -3444,7 +3442,7 @@ handle_builtin_stxncpy_strncat (bool append_p, gimple_stmt_iterator *gsi) static void handle_builtin_memcpy (enum built_in_function bcode, gimple_stmt_iterator *gsi, - const vr_values *rvals) + range_query *rvals) { tree lhs, oldlen, newlen; gimple *stmt = gsi_stmt (*gsi); @@ -3909,7 +3907,7 @@ handle_alloc_call (enum built_in_function bcode, gimple_stmt_iterator *gsi) static bool handle_builtin_memset (gimple_stmt_iterator *gsi, bool *zero_write, - const vr_values *rvals) + range_query *rvals) { gimple *memset_stmt = gsi_stmt (*gsi); tree ptr = gimple_call_arg (memset_stmt, 0); @@ -4103,9 +4101,10 @@ handle_builtin_memcmp (gimple_stmt_iterator *gsi) determine range information. Returns true on success. */ static bool -get_len_or_size (tree arg, int idx, unsigned HOST_WIDE_INT lenrng[2], +get_len_or_size (gimple *stmt, tree arg, int idx, + unsigned HOST_WIDE_INT lenrng[2], unsigned HOST_WIDE_INT *size, bool *nulterm, - const vr_values *rvals) + range_query *rvals) { /* Invalidate. */ *size = HOST_WIDE_INT_M1U; @@ -4140,6 +4139,7 @@ get_len_or_size (tree arg, int idx, unsigned HOST_WIDE_INT lenrng[2], else if (TREE_CODE (si->nonzero_chars) == SSA_NAME) { wide_int min, max; + // FIXME: Use range_query instead of global ranges. value_range_kind rng = get_range_info (si->nonzero_chars, &min, &max); if (rng == VR_RANGE) { @@ -4158,7 +4158,7 @@ get_len_or_size (tree arg, int idx, unsigned HOST_WIDE_INT lenrng[2], /* Set MAXBOUND to an arbitrary non-null non-integer node as a request to have it set to the length of the longest string in a PHI. */ lendata.maxbound = arg; - get_range_strlen_dynamic (arg, &lendata, rvals); + get_range_strlen_dynamic (arg, stmt, &lendata, rvals); unsigned HOST_WIDE_INT maxbound = HOST_WIDE_INT_M1U; if (tree_fits_uhwi_p (lendata.maxbound) @@ -4216,17 +4216,17 @@ get_len_or_size (tree arg, int idx, unsigned HOST_WIDE_INT lenrng[2], Otherwise return null. */ static tree -strxcmp_eqz_result (tree arg1, int idx1, tree arg2, int idx2, +strxcmp_eqz_result (gimple *stmt, tree arg1, int idx1, tree arg2, int idx2, unsigned HOST_WIDE_INT bound, unsigned HOST_WIDE_INT len[2], - unsigned HOST_WIDE_INT *psize, const vr_values *rvals) + unsigned HOST_WIDE_INT *psize, range_query *rvals) { /* Determine the range the length of each string is in and whether it's known to be nul-terminated, or the size of the array it's stored in. */ bool nul1, nul2; unsigned HOST_WIDE_INT siz1, siz2; unsigned HOST_WIDE_INT len1rng[2], len2rng[2]; - if (!get_len_or_size (arg1, idx1, len1rng, &siz1, &nul1, rvals) - || !get_len_or_size (arg2, idx2, len2rng, &siz2, &nul2, rvals)) + if (!get_len_or_size (stmt, arg1, idx1, len1rng, &siz1, &nul1, rvals) + || !get_len_or_size (stmt, arg2, idx2, len2rng, &siz2, &nul2, rvals)) return NULL_TREE; /* BOUND is set to HWI_M1U for strcmp and less to strncmp, and LENiRNG @@ -4375,7 +4375,7 @@ maybe_warn_pointless_strcmp (gimple *stmt, HOST_WIDE_INT bound, another and false otherwise. */ static bool -handle_builtin_string_cmp (gimple_stmt_iterator *gsi, const vr_values *rvals) +handle_builtin_string_cmp (gimple_stmt_iterator *gsi, range_query *rvals) { gcall *stmt = as_a <gcall *> (gsi_stmt (*gsi)); tree lhs = gimple_call_lhs (stmt); @@ -4420,7 +4420,7 @@ handle_builtin_string_cmp (gimple_stmt_iterator *gsi, const vr_values *rvals) /* Try to determine if the two strings are either definitely equal or definitely unequal and if so, either fold the result to zero (when equal) or set the range of the result to ~[0, 0] otherwise. */ - if (tree eqz = strxcmp_eqz_result (arg1, idx1, arg2, idx2, bound, + if (tree eqz = strxcmp_eqz_result (stmt, arg1, idx1, arg2, idx2, bound, len, &siz, rvals)) { if (integer_zerop (eqz)) @@ -4457,8 +4457,9 @@ handle_builtin_string_cmp (gimple_stmt_iterator *gsi, const vr_values *rvals) unsigned HOST_WIDE_INT arsz1, arsz2; bool nulterm[2]; - if (!get_len_or_size (arg1, idx1, len1rng, &arsz1, nulterm, rvals) - || !get_len_or_size (arg2, idx2, len2rng, &arsz2, nulterm + 1, rvals)) + if (!get_len_or_size (stmt, arg1, idx1, len1rng, &arsz1, nulterm, rvals) + || !get_len_or_size (stmt, arg2, idx2, len2rng, &arsz2, nulterm + 1, + rvals)) return false; if (len1rng[0] == len1rng[1] && len1rng[0] < HOST_WIDE_INT_MAX) @@ -4623,7 +4624,7 @@ int ssa_name_limit_t::next_ssa_name (tree ssa_name) static bool count_nonzero_bytes_addr (tree, unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, unsigned [3], bool *, bool *, bool *, - const vr_values *, ssa_name_limit_t &); + range_query *, ssa_name_limit_t &); /* Determines the minimum and maximum number of leading non-zero bytes in the representation of EXP and set LENRANGE[0] and LENRANGE[1] @@ -4644,7 +4645,7 @@ static bool count_nonzero_bytes (tree exp, unsigned HOST_WIDE_INT offset, unsigned HOST_WIDE_INT nbytes, unsigned lenrange[3], bool *nulterm, - bool *allnul, bool *allnonnul, const vr_values *rvals, + bool *allnul, bool *allnonnul, range_query *rvals, ssa_name_limit_t &snlim) { if (TREE_CODE (exp) == SSA_NAME) @@ -4836,7 +4837,7 @@ count_nonzero_bytes_addr (tree exp, unsigned HOST_WIDE_INT offset, unsigned HOST_WIDE_INT nbytes, unsigned lenrange[3], bool *nulterm, bool *allnul, bool *allnonnul, - const vr_values *rvals, ssa_name_limit_t &snlim) + range_query *rvals, ssa_name_limit_t &snlim) { int idx = get_stridx (exp); if (idx > 0) @@ -4853,13 +4854,14 @@ count_nonzero_bytes_addr (tree exp, unsigned HOST_WIDE_INT offset, else if (si->nonzero_chars && TREE_CODE (si->nonzero_chars) == SSA_NAME) { - vr_values *v = CONST_CAST (vr_values *, rvals); - const value_range_equiv *vr = v->get_value_range (si->nonzero_chars); - if (vr->kind () != VR_RANGE || !range_int_cst_p (vr)) + value_range vr; + if (!rvals->range_of_expr (vr, si->nonzero_chars, si->stmt)) + vr.set_varying (TREE_TYPE (si->nonzero_chars)); + if (vr.kind () != VR_RANGE) return false; - minlen = tree_to_uhwi (vr->min ()); - maxlen = tree_to_uhwi (vr->max ()); + minlen = tree_to_uhwi (vr.min ()); + maxlen = tree_to_uhwi (vr.max ()); } else return false; @@ -4948,7 +4950,7 @@ count_nonzero_bytes_addr (tree exp, unsigned HOST_WIDE_INT offset, static bool count_nonzero_bytes (tree exp, unsigned lenrange[3], bool *nulterm, - bool *allnul, bool *allnonnul, const vr_values *rvals) + bool *allnul, bool *allnonnul, range_query *rvals) { /* Set to optimistic values so the caller doesn't have to worry about initializing these and to what. On success, the function will clear @@ -4972,7 +4974,7 @@ count_nonzero_bytes (tree exp, unsigned lenrange[3], bool *nulterm, static bool handle_store (gimple_stmt_iterator *gsi, bool *zero_write, - const vr_values *rvals) + range_query *rvals) { int idx = -1; strinfo *si = NULL; @@ -5382,7 +5384,7 @@ is_char_type (tree type) static bool strlen_check_and_optimize_call (gimple_stmt_iterator *gsi, bool *zero_write, - const vr_values *rvals) + range_query *rvals) { gimple *stmt = gsi_stmt (*gsi); @@ -5473,7 +5475,7 @@ strlen_check_and_optimize_call (gimple_stmt_iterator *gsi, bool *zero_write, static void handle_integral_assign (gimple_stmt_iterator *gsi, bool *cleanup_eh, - const vr_values *rvals) + range_query *rvals) { gimple *stmt = gsi_stmt (*gsi); tree lhs = gimple_assign_lhs (stmt); @@ -5565,6 +5567,7 @@ handle_integral_assign (gimple_stmt_iterator *gsi, bool *cleanup_eh, wide_int min, max; signop sign = TYPE_SIGN (lhs_type); int prec = TYPE_PRECISION (lhs_type); + // FIXME: Use range_query instead of global ranges. value_range_kind vr = get_range_info (lhs, &min, &max); if (vr == VR_VARYING || (vr == VR_RANGE @@ -5617,7 +5620,7 @@ handle_integral_assign (gimple_stmt_iterator *gsi, bool *cleanup_eh, static bool check_and_optimize_stmt (gimple_stmt_iterator *gsi, bool *cleanup_eh, - const vr_values *rvals) + range_query *rvals) { gimple *stmt = gsi_stmt (*gsi); diff --git a/gcc/tree-ssa-strlen.h b/gcc/tree-ssa-strlen.h index a11c4d579a1..225f64b1630 100644 --- a/gcc/tree-ssa-strlen.h +++ b/gcc/tree-ssa-strlen.h @@ -25,13 +25,14 @@ extern bool is_strlen_related_p (tree, tree); extern bool maybe_diag_stxncpy_trunc (gimple_stmt_iterator, tree, tree); extern tree set_strlen_range (tree, wide_int, wide_int, tree = NULL_TREE); -class vr_values; -extern tree get_range (tree, wide_int[2], const vr_values * = NULL); +extern tree get_range (tree, gimple *, wide_int[2], + class range_query * = NULL); struct c_strlen_data; -extern void get_range_strlen_dynamic (tree , c_strlen_data *, const vr_values *); +extern void get_range_strlen_dynamic (tree, gimple *, c_strlen_data *, + class range_query *); /* APIs internal to strlen pass. Defined in gimple-ssa-sprintf.c. */ -extern bool handle_printf_call (gimple_stmt_iterator *, const vr_values *); +extern bool handle_printf_call (gimple_stmt_iterator *, class range_query *); #endif // GCC_TREE_SSA_STRLEN_H -- 2.26.2