Message ID | ZDpVL1KlzxWJKDzy@tucnak |
---|---|
State | New |
Headers | show |
Series | if-conv: Small improvement for expansion of complex PHIs [PR109154] | expand |
> Am 15.04.2023 um 10:30 schrieb Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org>: > > Hi! > > The following patch is just a dumb improvement, gets rid of 2 unnecessary > instructions on both the PR's original testcase and on the two reduced ones, > both on -mcpu=neoverse-v1 and -mavx512f. > > The thing is, if we have args_len (args_len >= 2) unique PHI arguments, > we need only args_len - 1 COND_EXPRs to expand the PHI, because first > COND_EXPR can merge 2 unique arguments and all the following ones merge > another unique argument with the previously merged arguments, > while the code for mysterious reasons was always emitting args_len > COND_EXPRs, where the first COND_EXPR merged the first and second unique > arguments, the second COND_EXPR merged the second unique argument with > result of merging the first and second unique arguments and the rest was > already expectable, nth COND_EXPR for n > 2 merged the nth unique argument > with result of merging the previous unique arguments. > Now, in my understanding, the bb_predicate for bb's predecessor need to > form a disjunct set which together creates the successor's bb_predicate, > so I don't see why we'd need to check all the bb_predicates, if we check > all but one then when all those other ones are false the last bb_predicate > is necessarily true. Given that the code attempts to sort argument with > most occurrences (so likely most complex combined predicate) last, I chose > not to test that last argument's predicate. > So e.g. on the testcase from comment 47 in the PR: > void > foo (int *f, int d, int e) > { > for (int i = 0; i < 1024; i++) > { > int a = f[i]; > int t; > if (a < 0) > t = 1; > else if (a < e) > t = 1 - a * d; > else > t = 0; > f[i] = t; > } > } > we used to emit: > _7 = a_10 < 0; > _21 = a_10 >= 0; > _22 = a_10 < e_11(D); > _23 = _21 & _22; > _26 = a_10 >= e_11(D); > _27 = _21 & _26; > _ifc__42 = _7 ? 1 : t_13; > _ifc__43 = _23 ? t_13 : _ifc__42; > t_6 = _27 ? 0 : _ifc__43; > while the following patch changes it to: > _7 = a_10 < 0; > _21 = a_10 >= 0; > _22 = a_10 < e_11(D); > _23 = _21 & _22; > _ifc__42 = _23 ? t_13 : 0; > t_6 = _7 ? 1 : _ifc__42; > which I believe should be sufficient for a PHI <1, t_13, 0>. > > I've gathered some statistics and on x86_64-linux and i686-linux > bootstraps/regtests, this code triggers: > 92 4 4 > 112 2 4 > 141 3 4 > 4046 3 3 > (where 2nd number is args_len and 3rd argument EDGE_COUNT (bb->preds) > and first argument count of those from sort | uniq -c | sort -n). > In all these cases the patch should squeze one extra COND_EXPR and > its associated predicate (the latter only if it wasn't used elsewhere). > > Incrementally, I think we should try to perform some analysis on which > predicates depend on inverses of other predicates and if possible try > to sort the arguments better and omit testing unnecessary predicates. > So essentially for the above testcase deconstruct it back to: > _7 = a_10 < 0; > _22 = a_10 < e_11(D); > _ifc__42 = _22 ? t_13 : 0; > t_6 = _7 ? 1 : _ifc__42; > which is like what this patch produces, but with the & a_10 >= 0 part > removed, because the last predicate is a_10 < 0 and so testing a_10 >= 0 > on what appears on the false branch doesn't make sense. > But I'm afraid that will take more work than is doable in stage4 right now. Agreed. > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Yes - thanks for spotting this obvious improvement. Richard > 2023-04-15 Jakub Jelinek <jakub@redhat.com> > > PR tree-optimization/109154 > * tree-if-conv.cc (predicate_scalar_phi): For complex PHIs, emit just > args_len - 1 COND_EXPRs rather than args_len. Formatting fix. > > --- gcc/tree-if-conv.cc.jj 2023-04-12 08:53:58.264496474 +0200 > +++ gcc/tree-if-conv.cc 2023-04-14 21:02:42.403826690 +0200 > @@ -2071,7 +2071,7 @@ predicate_scalar_phi (gphi *phi, gimple_ > } > > /* Put element with max number of occurences to the end of ARGS. */ > - if (max_ind != -1 && max_ind +1 != (int) args_len) > + if (max_ind != -1 && max_ind + 1 != (int) args_len) > std::swap (args[args_len - 1], args[max_ind]); > > /* Handle one special case when number of arguments with different values > @@ -2116,12 +2116,12 @@ predicate_scalar_phi (gphi *phi, gimple_ > vec<int> *indexes; > tree type = TREE_TYPE (gimple_phi_result (phi)); > tree lhs; > - arg1 = args[1]; > - for (i = 0; i < args_len; i++) > + arg1 = args[args_len - 1]; > + for (i = args_len - 1; i > 0; i--) > { > - arg0 = args[i]; > - indexes = phi_arg_map.get (args[i]); > - if (i != args_len - 1) > + arg0 = args[i - 1]; > + indexes = phi_arg_map.get (args[i - 1]); > + if (i != 1) > lhs = make_temp_ssa_name (type, NULL, "_ifc_"); > else > lhs = res; > > Jakub >
--- gcc/tree-if-conv.cc.jj 2023-04-12 08:53:58.264496474 +0200 +++ gcc/tree-if-conv.cc 2023-04-14 21:02:42.403826690 +0200 @@ -2071,7 +2071,7 @@ predicate_scalar_phi (gphi *phi, gimple_ } /* Put element with max number of occurences to the end of ARGS. */ - if (max_ind != -1 && max_ind +1 != (int) args_len) + if (max_ind != -1 && max_ind + 1 != (int) args_len) std::swap (args[args_len - 1], args[max_ind]); /* Handle one special case when number of arguments with different values @@ -2116,12 +2116,12 @@ predicate_scalar_phi (gphi *phi, gimple_ vec<int> *indexes; tree type = TREE_TYPE (gimple_phi_result (phi)); tree lhs; - arg1 = args[1]; - for (i = 0; i < args_len; i++) + arg1 = args[args_len - 1]; + for (i = args_len - 1; i > 0; i--) { - arg0 = args[i]; - indexes = phi_arg_map.get (args[i]); - if (i != args_len - 1) + arg0 = args[i - 1]; + indexes = phi_arg_map.get (args[i - 1]); + if (i != 1) lhs = make_temp_ssa_name (type, NULL, "_ifc_"); else lhs = res;