Message ID | 20200630184602.1228644-1-iii@linux.ibm.com
---|---
State | New
Series | [v2] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p () before IPA.
On 6/30/20 12:46 PM, Ilya Leoshkevich wrote:
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
>
> This is the implementation of Jakub's suggestion: allow
> __builtin_constant_p () after IPA, but fold it into 0.  Smoke test
> passed on s390x-redhat-linux, full regtest and bootstrap are running on
> x86_64-redhat-linux.
>
> ---
>
> Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c) build
> with GCC 10 fails on s390 with "impossible constraint".
>
> The problem is that jump threading makes __builtin_constant_p () lie
> when it splits a path containing a non-constant expression in a way
> that on each of the resulting paths this expression is constant.
>
> Fix by disallowing __builtin_constant_p () on threading paths before
> IPA and folding it into 0 after IPA.
>
> gcc/ChangeLog:
>
> 2020-06-30  Ilya Leoshkevich  <iii@linux.ibm.com>
>
> 	* tree-ssa-threadbackward.c (thread_jumps::m_allow_bcp_p): New
> 	member.
> 	(thread_jumps::profitable_jump_thread_path): Do not allow
> 	__builtin_constant_p () on threading paths unless m_allow_bcp_p
> 	is set.
> 	(thread_jumps::find_jump_threads_backwards): Set m_allow_bcp_p.
> 	(pass_thread_jumps::execute): Allow __builtin_constant_p () on
> 	threading paths after IPA.
> 	(pass_early_thread_jumps::execute): Do not allow
> 	__builtin_constant_p () on threading paths before IPA.
> 	* tree-ssa-threadupdate.c (duplicate_thread_path): Fold
> 	__builtin_constant_p () on threading paths into 0.
>
> gcc/testsuite/ChangeLog:
>
> 2020-06-30  Ilya Leoshkevich  <iii@linux.ibm.com>
>
> 	* gcc.target/s390/builtin-constant-p-threading.c: New test.

So I'm finally getting back to this.  Thanks for your patience.

It's a nasty little problem, and I suspect there are actually some deeper
issues here.  While I'd like to claim it's a bad use of b_c_p, I don't
think I can reasonably make that argument.

So what we have is a b_c_p at the start of an if-else chain.  Subsequent
tests on the "true" arm of the b_c_p test may throw us off the constant
path (because the constants are out of range).  Once all the tests are
passed (it's constant and the constant is in range), the true arm's
terminal block has a special asm that requires a constant argument.  In
the case where we get to the terminal block on the true arm, the argument
to the b_c_p is used as the constant argument to the special asm.

At first glance jump threading seems to be doing the right thing.  Except
that we end up with two paths to that terminal block with the special asm,
one for each of the two constant arguments to the b_c_p call.  Naturally,
since that same value is used in the asm, we have to introduce a PHI to
select between them at the head of the terminal block.  Now the argument
in the asm is no longer constant, and boom, we fail.

I briefly pondered whether we should only throttle when the argument to
the b_c_p is not used elsewhere.  But I think that just hides the problem,
and with a little work I could probably extend the testcase to still fail
in that scenario.

I also briefly pondered whether we should isolate the terminal block as
well (essentially creating one for each unique PHI argument).  We'd likely
only need to do that when there's an asm in the terminal block, but that
likely just papers over the problem as well, since the asm could be in a
successor of the terminal block.

I haven't thought real deeply about it, but I wouldn't be surprised if
there are other passes that can trigger similar problems.  Aggressive
cross-jumping would be the most obvious, but some of the hoisting/sinking
of operations past PHIs would seem potentially problematic as well.
Jakub's suggestion might be the best one in this space.  I don't have
anything better right now.

The deeper questions about other passes setting up similar scenarios can
probably be punted; I'd expect threading to be far and away the most
common way for this to happen, and I'd be comfortable faulting in
investigation of other cases if/when they happen.

So I retract my initial objections.  Let's go with the V2 patch.

jeff
On Fri, 2020-11-20 at 12:14 -0700, Jeff Law wrote:
> [...]
>
> So I retract my initial objections.  Let's go with the V2 patch.
>
> jeff

Hi Jeff,

Thanks for having another look!

I did x86_64 builds of SPEC and vmlinux, and it seems that in practice
v2 does not have any benefit over v1.

What do you think about going with the v1, which is less complex?

Best regards,
Ilya
On 11/23/20 7:36 AM, Ilya Leoshkevich wrote:
> [...]
>
> I did x86_64 builds of SPEC and vmlinux, and it seems that in practice
> v2 does not have any benefit over v1.
>
> What do you think about going with the v1, which is less complex?

No strong opinions.  I think whichever is less invasive in terms of code
quality is probably the way to go.

What we want to avoid is suppressing threading unnecessarily, as that
often leads to false positives from middle-end based warnings.
Suppressing threading can also lead to build failures in the kernel due
to the way they use b_c_p.

jeff
diff --git a/gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c b/gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c
new file mode 100644
index 00000000000..5f0acdce0b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=z196 -mzarch" } */
+
+typedef struct
+{
+  int counter;
+} atomic_t;
+
+static inline __attribute__ ((__gnu_inline__)) int
+__atomic_add (int val, int *ptr)
+{
+  int old;
+  asm volatile("laa %[old],%[val],%[ptr]\n"
+	       : [old] "=d" (old), [ptr] "+Q"(*ptr)
+	       : [val] "d" (val)
+	       : "cc", "memory");
+  return old;
+}
+
+static inline __attribute__ ((__gnu_inline__)) void
+__atomic_add_const (int val, int *ptr)
+{
+  asm volatile("asi %[ptr],%[val]\n"
+	       : [ptr] "+Q" (*ptr)
+	       : [val] "i" (val)
+	       : "cc", "memory");
+}
+
+static inline __attribute__ ((__gnu_inline__)) void
+atomic_add (int i, atomic_t *v)
+{
+  if (__builtin_constant_p (i) && (i > -129) && (i < 128))
+    {
+      __atomic_add_const (i, &v->counter);
+      return;
+    }
+  __atomic_add (i, &v->counter);
+}
+
+static atomic_t num_active_cpus = { (0) };
+
+void
+ledtrig_cpu (_Bool is_active)
+{
+  atomic_add (is_active ? 1 : -1, &num_active_cpus);
+}
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 327628f1662..446cb1b2e21 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -40,7 +40,9 @@ along with GCC; see the file COPYING3.  If not see
 class thread_jumps
 {
 public:
-  void find_jump_threads_backwards (basic_block bb, bool speed_p);
+  void find_jump_threads_backwards (basic_block bb, bool speed_p,
+				    bool allow_bcp_p);
+
 private:
   edge profitable_jump_thread_path (basic_block bbi, tree name, tree arg,
				     bool *creates_irreducible_loop);
@@ -65,6 +67,8 @@ class thread_jumps
   /* Indicate that we could increase code size to improve the
      code path.  */
   bool m_speed_p;
+  /* Whether to allow __builtin_constant_p () on threading paths.  */
+  bool m_allow_bcp_p;
 };
 
 /* Simple helper to get the last statement from BB, which is assumed
@@ -260,7 +264,9 @@ thread_jumps::profitable_jump_thread_path (basic_block bbi, tree name,
	 gsi_next_nondebug (&gsi))
       {
	 gimple *stmt = gsi_stmt (gsi);
-	 if (gimple_call_internal_p (stmt, IFN_UNIQUE))
+	 if (gimple_call_internal_p (stmt, IFN_UNIQUE)
+	     || (!m_allow_bcp_p
+		 && gimple_call_builtin_p (stmt, BUILT_IN_CONSTANT_P)))
	   {
	     m_path.pop ();
	     return NULL;
@@ -737,10 +743,13 @@ thread_jumps::fsm_find_control_statement_thread_paths (tree name)
    It is assumed that BB ends with a control statement and that by
    finding a path where NAME is a constant, we can thread the path.
    SPEED_P indicates that we could increase code size to improve the
-   code path.  */
+   code path.  ALLOW_BCP_P specifies whether to allow __builtin_constant_p ()
+   on threading paths and fold it into 0, or whether to disallow it
+   altogether.  */
 
 void
-thread_jumps::find_jump_threads_backwards (basic_block bb, bool speed_p)
+thread_jumps::find_jump_threads_backwards (basic_block bb, bool speed_p,
+					   bool allow_bcp_p)
 {
   gimple *stmt = get_gimple_control_stmt (bb);
   if (!stmt)
@@ -770,6 +779,7 @@ thread_jumps::find_jump_threads_backwards (basic_block bb, bool speed_p)
   m_visited_bbs.empty ();
   m_seen_loop_phi = false;
   m_speed_p = speed_p;
+  m_allow_bcp_p = allow_bcp_p;
   m_max_threaded_paths = param_max_fsm_thread_paths;
   fsm_find_control_statement_thread_paths (name);
 
@@ -820,7 +830,7 @@ pass_thread_jumps::execute (function *fun)
   FOR_EACH_BB_FN (bb, fun)
     {
       if (EDGE_COUNT (bb->succs) > 1)
-	threader.find_jump_threads_backwards (bb, true);
+	threader.find_jump_threads_backwards (bb, true, true);
     }
   bool changed = thread_through_all_blocks (true);
@@ -881,7 +891,7 @@ pass_early_thread_jumps::execute (function *fun)
   FOR_EACH_BB_FN (bb, fun)
     {
       if (EDGE_COUNT (bb->succs) > 1)
-	threader.find_jump_threads_backwards (bb, false);
+	threader.find_jump_threads_backwards (bb, false, false);
     }
   thread_through_all_blocks (true);
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 5abecf6c295..4b39e6d76e5 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2365,6 +2365,24 @@ duplicate_thread_path (edge entry, edge exit, basic_block *region,
       edge_iterator ei;
       basic_block bb = region_copy[i];
 
+      /* Fold __builtin_constant_p () calls into 0.  We act so conservatively,
+	 because threading could have split a non-constant expression in a way
+	 that on each of the resulting paths this expression is constant, and
+	 we don't know whether corresponding paths converge before all
+	 statements that depends on __builtin_constant_p () result.  */
+      for (gimple_stmt_iterator gsi = gsi_after_labels (bb); !gsi_end_p (gsi);
+	   gsi_next_nondebug (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  if (gimple_call_builtin_p (stmt, BUILT_IN_CONSTANT_P))
+	    {
+	      tree lhs = gimple_call_lhs (stmt);
+	      tree zero = build_int_cst (TREE_TYPE (lhs), 0);
+	      gimple *repl = gimple_build_assign (lhs, zero);
+	      gsi_replace (&gsi, repl, false);
+	    }
+	}
+
       /* Watch inconsistent profile.  */
       if (curr_count > region[i]->count)
	 curr_count = region[i]->count;