Message ID | 5e79620e-8adb-7768-a34e-32fd286995f0@redhat.com |
---|---|
State | New |
Headers | show |
On 12/16/2016 07:41 AM, Aldy Hernandez wrote: > > BTW, I don't understand why we don't have auto_bitmap's, as we already > have auto_sbitmap's. I've implemented the former based on > auto_sbitmap's code we already have. Trevor poked at it a bit. bitmaps are a bit more complex than sbitmaps in terms of implementation details. https://gcc.gnu.org/ml/gcc/2016-04/msg00013.html But his suggestion was to first create auto_bitmap, then look to convert to using that as opposed to his other approaches. > > The attached patch fixes the bug without introducing any regressions. > > I also tested the patch by compiling 242 .ii files with -O3. These were > gathered from a stage1 build with -save-temps. There is a slight time > degradation of 4 seconds within 27 minutes of user time: > > tainted: 26:52 > orig: 26:48 > > This was the average aggregate time of two runs compiling all 242 .ii > files. IMO, this looks reasonable. It is after all, -O3. Is it > acceptable? > > Aldy > > curr > > > commit 2310bcd0e2552a40ca1de354faf005ed3e9daf4e > Author: Aldy Hernandez <aldyh@redhat.com> > Date: Fri Dec 16 03:44:52 2016 -0500 > > PR tree-optimization/71691 > * bitmap.h (class auto_bitmap): New. > * tree-ssa-loop-unswitch.c (ssa_maybe_undefined_value_p): New. > (tree_may_unswitch_on): Call ssa_maybe_undefined_value_p. So I'm going to defer to Richi since he was reviewing my original attempt in this space. It probably doesn't matter in practice (when I looked at this I couldn't get into the code in question with a -O3 bootstrap or with the testsuite, just with the testcase in the BZ) but you might consider handling an already visited node slightly differently. If the the node was visited and resolved as undefined, then we would have already exited the loop. If the node was visited and resolved as defined, then we could just keep processing other items on the the worklist. The case where you want to conservatively return false is when you're actively processing the name in question. Jeff
On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez <aldyh@redhat.com> wrote: > Hi folks. > > This is a follow-up on Jeff and Richi's interaction on the aforementioned PR > here: > > https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html > > I decided to explore the idea of analyzing may-undefness on-demand, which > actually looks rather cheap. > > BTW, I don't understand why we don't have auto_bitmap's, as we already have > auto_sbitmap's. I've implemented the former based on auto_sbitmap's code we > already have. > > The attached patch fixes the bug without introducing any regressions. > > I also tested the patch by compiling 242 .ii files with -O3. These were > gathered from a stage1 build with -save-temps. There is a slight time > degradation of 4 seconds within 27 minutes of user time: > > tainted: 26:52 > orig: 26:48 > > This was the average aggregate time of two runs compiling all 242 .ii files. > IMO, this looks reasonable. It is after all, -O3. Is it acceptable? + while (!worklist.is_empty ()) + { + name = worklist.pop (); + gcc_assert (TREE_CODE (name) == SSA_NAME); + + if (ssa_undefined_value_p (name, true)) + return true; + + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); it should be already set as we use visited_ssa as "was it ever on the worklist", so maybe renaming it would be a good thing as well. + if (TREE_CODE (name) == SSA_NAME) + { + /* If an SSA has already been seen, this may be a loop. + Fail conservatively. */ + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name))) + return false; so to me "conservative" is returning true, not false. + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); + worklist.safe_push (name); but for loops we can just continue and ignore this use. And bitmap_set_bit returns whether it set a bit, thus if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name))) worklist.safe_push (name); should work? + /* Check that any SSA names used to define NAME is also fully + defined. */ + use_operand_p use_p; + ssa_op_iter iter; + FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE) + { + name = USE_FROM_PTR (use_p); + if (TREE_CODE (name) == SSA_NAME) always true. + { + /* If an SSA has already been seen, this may be a loop. + Fail conservatively. */ + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name))) + return false; + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); + worklist.safe_push (name); See above. In principle the thing is sound but I'd like to be able to pass in a bitmap of known maybe-undefined/must-defined SSA names to have a cache for multiple invocations from within a pass (like this unswitching case). Also once you hit defs that are in a post-dominated region of the loop entry you can treat them as not undefined (as their use invokes undefined behavior). This is also how you treat function parameters (well, ssa_undefined_value_p does), where the call site invokes undefined behavior when passing in undefined values. So we need an extra parameter specifying the post-dominance region. You do not handle memory or calls conservatively which means the existing testcase only needs some obfuscation to become a problem again. To fix that before /* Check that any SSA names used to define NAME is also fully defined. */ bail out conservatively, like if (! is_gimple_assign (def) || gimple_assign_single_p (def)) return true; Only unswitching on conditions that post-dominate the loop entry might be a simpler solution for the PR in question. OTOH this may disable too much unswitching in practice. Richard. > Aldy
On 12/20/2016 07:16 AM, Richard Biener wrote: > > it should be already set as we use visited_ssa as "was it ever on the worklist", > so maybe renaming it would be a good thing as well. > > + if (TREE_CODE (name) == SSA_NAME) > + { > + /* If an SSA has already been seen, this may be a loop. > + Fail conservatively. */ > + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name))) > + return false; > > so to me "conservative" is returning true, not false. Agreed. > > + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); > + worklist.safe_push (name); > > but for loops we can just continue and ignore this use. And bitmap_set_bit > returns whether it set a bit, thus > > if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name))) > worklist.safe_push (name); > > should work? What if we're in an irreducible region? > > In principle the thing is sound but I'd like to be able to pass in a bitmap of > known maybe-undefined/must-defined SSA names to have a cache for > multiple invocations from within a pass (like this unswitching case). So that means keeping another bitmap for things positively identified as defined, then saving it for later invocations. So maybe some of my work + some of his melded together. (mine computes once for the entire FN and saves the result, so converting it to on-demand saving the result shouldn't be terrible). > > Only unswitching on conditions that post-dominate the loop entry might be a > simpler solution for the PR in question. OTOH this may disable too much > unswitching in practice. It might be worth experimentation. Jeff
On December 20, 2016 4:50:25 PM GMT+01:00, Jeff Law <law@redhat.com> wrote: >On 12/20/2016 07:16 AM, Richard Biener wrote: >> >> it should be already set as we use visited_ssa as "was it ever on the >worklist", >> so maybe renaming it would be a good thing as well. >> >> + if (TREE_CODE (name) == SSA_NAME) >> + { >> + /* If an SSA has already been seen, this may be a >loop. >> + Fail conservatively. */ >> + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION >(name))) >> + return false; >> >> so to me "conservative" is returning true, not false. >Agreed. > >> >> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION >(name)); >> + worklist.safe_push (name); >> >> but for loops we can just continue and ignore this use. And >bitmap_set_bit >> returns whether it set a bit, thus >> >> if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION >(name))) >> worklist.safe_push (name); >> >> should work? >What if we're in an irreducible region? Handling all back edges by deferring to their defs should work. At least I can't see how it would not. > >> >> In principle the thing is sound but I'd like to be able to pass in a >bitmap of >> known maybe-undefined/must-defined SSA names to have a cache for >> multiple invocations from within a pass (like this unswitching case). >So that means keeping another bitmap for things positively identified >as >defined, then saving it for later invocations. We eventually need a tristate here for maximum caching. And as the result depends on the dominating BB of the postdom region the savings might be questionable. > So maybe some of my >work >+ some of his melded together. (mine computes once for the entire FN >and saves the result, so converting it to on-demand saving the result >shouldn't be terrible). > > >> >> Only unswitching on conditions that post-dominate the loop entry >might be a >> simpler solution for the PR in question. OTOH this may disable too >much >> unswitching in practice. >It might be worth experimentation. > >Jeff
On 12/20/2016 10:33 AM, Richard Biener wrote: >>> >>> but for loops we can just continue and ignore this use. And >> bitmap_set_bit >>> returns whether it set a bit, thus >>> >>> if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION >> (name))) >>> worklist.safe_push (name); >>> >>> should work? >> What if we're in an irreducible region? > > Handling all back edges by deferring to their defs should work. At least I can't see how it would not. I'll take your word for it -- I haven't thought deeply about this. > >> >>> >>> In principle the thing is sound but I'd like to be able to pass in a >> bitmap of >>> known maybe-undefined/must-defined SSA names to have a cache for >>> multiple invocations from within a pass (like this unswitching case). >> So that means keeping another bitmap for things positively identified >> as >> defined, then saving it for later invocations. > > We eventually need a tristate here for maximum caching. And as the result depends on the dominating BB of the postdom region the savings might be questionable. Possibly. Jeff
On 12/20/2016 09:16 AM, Richard Biener wrote: > You do not handle memory or calls conservatively which means the existing > testcase only needs some obfuscation to become a problem again. To fix > that before /* Check that any SSA names used to define NAME is also fully > defined. */ bail out conservatively, like > > if (! is_gimple_assign (def) > || gimple_assign_single_p (def)) > return true; I understand the !is_gimple_assign, which will skip over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def) == true"?? Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't we want to follow up the def chain precisely on these? Thanks. Aldy
On Wed, Dec 21, 2016 at 7:43 PM, Aldy Hernandez <aldyh@redhat.com> wrote: > On 12/20/2016 09:16 AM, Richard Biener wrote: > >> You do not handle memory or calls conservatively which means the existing >> testcase only needs some obfuscation to become a problem again. To fix >> that before /* Check that any SSA names used to define NAME is also fully >> defined. */ bail out conservatively, like >> >> if (! is_gimple_assign (def) >> || gimple_assign_single_p (def)) >> return true; > > > I understand the !is_gimple_assign, which will skip over GIMPLE_CALLs > setting a value, but the "gimple_assign_single_p(def) == true"?? > > Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't we > want to follow up the def chain precisely on these? For the first (= e) we can't, for the second, yes, the predicate is too strict. But it's conservative ;) Likewise for e_2 = VIEW_CONVERT_EXPR <u_88> btw, or REAL/IMAGPART_EXPR. Note both VIEW_CONVERT_EXPR and REAL/IMAGPART_EXPR will also appear in RHSs we can't handle. Richard. > > Thanks. > Aldy
On Fri, Dec 16, 2016 at 09:41:33AM -0500, Aldy Hernandez wrote: > Hi folks. > > This is a follow-up on Jeff and Richi's interaction on the aforementioned PR > here: > > https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html > > I decided to explore the idea of analyzing may-undefness on-demand, which > actually looks rather cheap. > > BTW, I don't understand why we don't have auto_bitmap's, as we already have > auto_sbitmap's. I've implemented the former based on auto_sbitmap's code we > already have. No good reason, it just hasn't been done yet. I'm tempted to fit both this and auto_sbitmap into a unique_ptr with custom deletor, but that would make it hard to do small object optimizations, so maybe it isn't actually a great idea. > +class auto_bitmap > +{ > + public: > + auto_bitmap () { bits = BITMAP_ALLOC (NULL); } > + ~auto_bitmap () { BITMAP_FREE (bits); } We could make the member a bitmap_head, and then use bitmap_initialize and bitmap_clear in the ctor and dtor, that'd save a alloc / dealloc. You might also want to use the CXX_MEM_STAT macros on the ctor to get better attribution of memory usage. > + // Prevent making a copy that references our bitmap. > + auto_bitmap (const auto_bitmap &); > + auto_bitmap &operator = (const auto_bitmap &); > +#if __cplusplus >= 201103L > + auto_bitmap (auto_bitmap &&); > + auto_bitmap &operator = (auto_bitmap &&); We could actually support moving bitmaps pretty easily, though we would need to do the auto_ptr style hacks to keep building as c++98. I'll probably do that work eventually for unique_ptr, but its kind of late for gcc 8 unless we just use it to fix memory leaks. > +#endif > + > + bitmap bits; style guide would say that should be m_bits I believe. Trev sorry about the lag here.
diff --git a/gcc/bitmap.h b/gcc/bitmap.h index 1b2c8e0..bc32a21 100644 --- a/gcc/bitmap.h +++ b/gcc/bitmap.h @@ -802,4 +802,25 @@ bmp_iter_and_compl (bitmap_iterator *bi, unsigned *bit_no) bmp_iter_and_compl (&(ITER), &(BITNUM)); \ bmp_iter_next (&(ITER), &(BITNUM))) +/* A class that ties the lifetime of a bitmap to its scope. */ +class auto_bitmap +{ + public: + auto_bitmap () { bits = BITMAP_ALLOC (NULL); } + ~auto_bitmap () { BITMAP_FREE (bits); } + // Allow calling bitmap functions on our bitmap. + operator bitmap () { return bits; } + + private: + // Prevent making a copy that references our bitmap. + auto_bitmap (const auto_bitmap &); + auto_bitmap &operator = (const auto_bitmap &); +#if __cplusplus >= 201103L + auto_bitmap (auto_bitmap &&); + auto_bitmap &operator = (auto_bitmap &&); +#endif + + bitmap bits; +}; + #endif /* GCC_BITMAP_H */ diff --git a/gcc/testsuite/gcc.c-torture/execute/pr71691.c b/gcc/testsuite/gcc.c-torture/execute/pr71691.c new file mode 100644 index 0000000..1ffd1a2 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr71691.c @@ -0,0 +1,47 @@ +/* { dg-options "-fno-tree-vrp" } */ + +char b; +short f; +unsigned e; +int g = 20; + +void +foo () +{ + int l, h; + for (l = 0; l <= 7; l++) + { + int j = 38; + if (g) + h = 0; + for (; h <= 7; h++) + { + int i, k = b % (j % 4); + g = f; + for (;;) + { + j = 6 || b; + if (e) + { + for (; j; --j) + if (k) + __builtin_printf ("%d", 9); + if (i) + __builtin_printf ("%d", j); + } + if (l) + continue; + break; + } + i = f || b; + } + } +} + +int +main () +{ + foo (); + return 0; +} + diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c index 40905af..bc34395 100644 --- a/gcc/tree-ssa-loop-unswitch.c +++ b/gcc/tree-ssa-loop-unswitch.c @@ -109,6 +109,71 @@ tree_ssa_unswitch_loops (void) return 0; } +/* Return TRUE if an SSA_NAME may be undefined. */ + +static bool +ssa_maybe_undefined_value_p (tree name) +{ + /* If it's obviously undefined, avoid further computations. */ + if (ssa_undefined_value_p (name, true)) + return true; + + auto_bitmap visited_ssa; + auto_vec<tree> worklist; + worklist.safe_push (name); + while (!worklist.is_empty ()) + { + name = worklist.pop (); + gcc_assert (TREE_CODE (name) == SSA_NAME); + + if (ssa_undefined_value_p (name, true)) + return true; + + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); + + /* Check that all the PHI args are fully defined. */ + gimple *def = SSA_NAME_DEF_STMT (name); + if (gphi *phi = dyn_cast <gphi *> (def)) + { + if (virtual_operand_p (PHI_RESULT (phi))) + continue; + for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i) + { + name = gimple_phi_arg_def (phi, i); + if (TREE_CODE (name) == SSA_NAME) + { + /* If an SSA has already been seen, this may be a loop. + Fail conservatively. */ + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name))) + return false; + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); + worklist.safe_push (name); + } + } + continue; + } + + /* Check that any SSA names used to define NAME is also fully + defined. */ + use_operand_p use_p; + ssa_op_iter iter; + FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE) + { + name = USE_FROM_PTR (use_p); + if (TREE_CODE (name) == SSA_NAME) + { + /* If an SSA has already been seen, this may be a loop. + Fail conservatively. */ + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name))) + return false; + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); + worklist.safe_push (name); + } + } + } + return false; +} + /* Checks whether we can unswitch LOOP on condition at end of BB -- one of its basic blocks (for what it means see comments below). */ @@ -138,7 +203,7 @@ tree_may_unswitch_on (basic_block bb, struct loop *loop) { /* Unswitching on undefined values would introduce undefined behavior that the original program might never exercise. */ - if (ssa_undefined_value_p (use, true)) + if (ssa_maybe_undefined_value_p (use)) return NULL_TREE; def = SSA_NAME_DEF_STMT (use); def_bb = gimple_bb (def);