diff mbox series

[v2] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p () before IPA.

Message ID 20200630184602.1228644-1-iii@linux.ibm.com
State New
Headers show
Series [v2] tree-ssa-threadbackward.c (profitable_jump_thread_path): Do not allow __builtin_constant_p () before IPA. | expand

Commit Message

Ilya Leoshkevich June 30, 2020, 6:46 p.m. UTC
v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html

This is the implementation of Jakub's suggestion: allow
__builtin_constant_p () after IPA, but fold it into 0.  Smoke test
passed on s390x-redhat-linux, full regtest and bootstrap are running on
x86_64-redhat-linux.

---

Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c) build
with GCC 10 fails on s390 with "impossible constraint".

The problem is that jump threading makes __builtin_constant_p () lie
when it splits a path containing a non-constant expression in a way
that on each of the resulting paths this expression is constant.

Fix by disallowing __builtin_constant_p () on threading paths before
IPA and fold it into 0 after IPA.

gcc/ChangeLog:

2020-06-30  Ilya Leoshkevich  <iii@linux.ibm.com>

	* tree-ssa-threadbackward.c (thread_jumps::m_allow_bcp_p): New
	member.
	(thread_jumps::profitable_jump_thread_path): Do not allow
	__builtin_constant_p () on threading paths unless m_allow_bcp_p
	is set.
	(thread_jumps::find_jump_threads_backwards): Set m_allow_bcp_p.
	(pass_thread_jumps::execute): Allow __builtin_constant_p () on
	threading paths after IPA.
	(pass_early_thread_jumps::execute): Do not allow
	__builtin_constant_p () on threading paths before IPA.
	* tree-ssa-threadupdate.c (duplicate_thread_path): Fold
	__builtin_constant_p () on threading paths into 0.

gcc/testsuite/ChangeLog:

2020-06-30  Ilya Leoshkevich  <iii@linux.ibm.com>

	* gcc.target/s390/builtin-constant-p-threading.c: New test.
---
 .../s390/builtin-constant-p-threading.c       | 46 +++++++++++++++++++
 gcc/tree-ssa-threadbackward.c                 | 22 ++++++---
 gcc/tree-ssa-threadupdate.c                   | 18 ++++++++
 3 files changed, 80 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c

Comments

Jeff Law Nov. 20, 2020, 7:14 p.m. UTC | #1
On 6/30/20 12:46 PM, Ilya Leoshkevich wrote:
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
>
> This is the implementation of Jakub's suggestion: allow
> __builtin_constant_p () after IPA, but fold it into 0.  Smoke test
> passed on s390x-redhat-linux, full regtest and bootstrap are running on
> x86_64-redhat-linux.
>
> ---
>
> Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c) build
> with GCC 10 fails on s390 with "impossible constraint".
>
> The problem is that jump threading makes __builtin_constant_p () lie
> when it splits a path containing a non-constant expression in a way
> that on each of the resulting paths this expression is constant.
>
> Fix by disallowing __builtin_constant_p () on threading paths before
> IPA and fold it into 0 after IPA.
>
> gcc/ChangeLog:
>
> 2020-06-30  Ilya Leoshkevich  <iii@linux.ibm.com>
>
> 	* tree-ssa-threadbackward.c (thread_jumps::m_allow_bcp_p): New
> 	member.
> 	(thread_jumps::profitable_jump_thread_path): Do not allow
> 	__builtin_constant_p () on threading paths unless m_allow_bcp_p
> 	is set.
> 	(thread_jumps::find_jump_threads_backwards): Set m_allow_bcp_p.
> 	(pass_thread_jumps::execute): Allow __builtin_constant_p () on
> 	threading paths after IPA.
> 	(pass_early_thread_jumps::execute): Do not allow
> 	__builtin_constant_p () on threading paths before IPA.
> 	* tree-ssa-threadupdate.c (duplicate_thread_path): Fold
> 	__builtin_constant_p () on threading paths into 0.
>
> gcc/testsuite/ChangeLog:
>
> 2020-06-30  Ilya Leoshkevich  <iii@linux.ibm.com>
>
> 	* gcc.target/s390/builtin-constant-p-threading.c: New test.
So I'm finally getting back to this.  Thanks for your patience.

It's a nasty little problem, and I suspect there's actually some deeper
issues here.  While I'd like to claim its a bad use of b_c_p, I don't
think I can reasonably make that argument.

So what we have is a b_c_p at the start of an if-else chain.  Subsequent
tests on the "true" arm of the the b_c_p test may throw us off the
constant path (because the constants are out of range).  Once all the
tests are passed (it's constant and the constant is in range) the true
arm's terminal block has a special asm that requires a constant
argument.   In the case where we get to the terminal block on the true
arm, the argument to the b_c_p is used as the constant argument to the
special asm.

At first glace jump threading seems to be doing the right thing.  Except
that we end up with two paths to that terminal block with the special
asm, one for each of the two constant arguments to the b_c_p call. 
Naturally since that same value is used in the asm, we have to introduce
a PHI to select between them at the head of the terminal block.   Now
the argument in the asm is no longer constant and boom we fail.

I briefly pondered if we should only throttle when the argument to the
b_c_p is not used elsewhere.  But I think that just hides the problem
and with a little work I could probably extend the testcase to still
fail in that scenario.

I also briefly pondered if we should isolate the terminal block as well
(essentially creating one for each unique PHI argument).  We'd likely
only need to do that when there's an ASM in the terminal block, but that
likely just papers over the problem as well since the ASM could be in a
successor of the terminal block.

I haven't thought real deeply about it, but I wouldn't be surprised if
there's other passes that can trigger similar problems.  Aggressive
cross-jumping would be the most obvious, but some of the hosting/sinking
of operations past PHIs would seem potentially problematical as well.

Jakub suggestion might be the best one in this space.   I don't have
anything better right now.  The deeper questions about other passes
setting up similar scenarios can probably be punted, I'd expect
threading to be far and above the most common way for this to happen and
I'd be comfortable faulting in investigation of other cases if/when they
happen.

So I retract my initial objections.  Let's go with the V2 patch.


jeff
Ilya Leoshkevich Nov. 23, 2020, 2:36 p.m. UTC | #2
On Fri, 2020-11-20 at 12:14 -0700, Jeff Law wrote:
> 
> On 6/30/20 12:46 PM, Ilya Leoshkevich wrote:
> > v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> > 
> > This is the implementation of Jakub's suggestion: allow
> > __builtin_constant_p () after IPA, but fold it into 0.  Smoke test
> > passed on s390x-redhat-linux, full regtest and bootstrap are
> > running on
> > x86_64-redhat-linux.
> > 
> > ---
> > 
> > Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c)
> > build
> > with GCC 10 fails on s390 with "impossible constraint".
> > 
> > The problem is that jump threading makes __builtin_constant_p ()
> > lie
> > when it splits a path containing a non-constant expression in a way
> > that on each of the resulting paths this expression is constant.
> > 
> > Fix by disallowing __builtin_constant_p () on threading paths
> > before
> > IPA and fold it into 0 after IPA.
> > 
> > gcc/ChangeLog:
> > 
> > 2020-06-30  Ilya Leoshkevich  <iii@linux.ibm.com>
> > 
> > 	* tree-ssa-threadbackward.c (thread_jumps::m_allow_bcp_p): New
> > 	member.
> > 	(thread_jumps::profitable_jump_thread_path): Do not allow
> > 	__builtin_constant_p () on threading paths unless m_allow_bcp_p
> > 	is set.
> > 	(thread_jumps::find_jump_threads_backwards): Set m_allow_bcp_p.
> > 	(pass_thread_jumps::execute): Allow __builtin_constant_p () on
> > 	threading paths after IPA.
> > 	(pass_early_thread_jumps::execute): Do not allow
> > 	__builtin_constant_p () on threading paths before IPA.
> > 	* tree-ssa-threadupdate.c (duplicate_thread_path): Fold
> > 	__builtin_constant_p () on threading paths into 0.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > 2020-06-30  Ilya Leoshkevich  <iii@linux.ibm.com>
> > 
> > 	* gcc.target/s390/builtin-constant-p-threading.c: New test.
> So I'm finally getting back to this.  Thanks for your patience.
> 
> It's a nasty little problem, and I suspect there's actually some
> deeper
> issues here.  While I'd like to claim its a bad use of b_c_p, I don't
> think I can reasonably make that argument.
> 
> So what we have is a b_c_p at the start of an if-else chain. 
> Subsequent
> tests on the "true" arm of the the b_c_p test may throw us off the
> constant path (because the constants are out of range).  Once all the
> tests are passed (it's constant and the constant is in range) the
> true
> arm's terminal block has a special asm that requires a constant
> argument.   In the case where we get to the terminal block on the
> true
> arm, the argument to the b_c_p is used as the constant argument to
> the
> special asm.
> 
> At first glace jump threading seems to be doing the right thing. 
> Except
> that we end up with two paths to that terminal block with the special
> asm, one for each of the two constant arguments to the b_c_p call. 
> Naturally since that same value is used in the asm, we have to
> introduce
> a PHI to select between them at the head of the terminal block.   Now
> the argument in the asm is no longer constant and boom we fail.
> 
> I briefly pondered if we should only throttle when the argument to
> the
> b_c_p is not used elsewhere.  But I think that just hides the problem
> and with a little work I could probably extend the testcase to still
> fail in that scenario.
> 
> I also briefly pondered if we should isolate the terminal block as
> well
> (essentially creating one for each unique PHI argument).  We'd likely
> only need to do that when there's an ASM in the terminal block, but
> that
> likely just papers over the problem as well since the ASM could be in
> a
> successor of the terminal block.
> 
> I haven't thought real deeply about it, but I wouldn't be surprised
> if
> there's other passes that can trigger similar problems.  Aggressive
> cross-jumping would be the most obvious, but some of the
> hosting/sinking
> of operations past PHIs would seem potentially problematical as well.
> 
> Jakub suggestion might be the best one in this space.   I don't have
> anything better right now.  The deeper questions about other passes
> setting up similar scenarios can probably be punted, I'd expect
> threading to be far and above the most common way for this to happen
> and
> I'd be comfortable faulting in investigation of other cases if/when
> they
> happen.
> 
> So I retract my initial objections.  Let's go with the V2 patch.
> 
> 
> jeff

Hi Jeff,

Thanks for having another look!

I did x86_64 builds of SPEC and vmlinux, and it seems that in practice
v2 does not have any benefit over v1.

What do you think about going with the v1, which is less complex?

Best regards,
Ilya
Jeff Law Dec. 1, 2020, 10:34 p.m. UTC | #3
On 11/23/20 7:36 AM, Ilya Leoshkevich wrote:
> On Fri, 2020-11-20 at 12:14 -0700, Jeff Law wrote:
>> On 6/30/20 12:46 PM, Ilya Leoshkevich wrote:
>>> v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
>>>
>>> This is the implementation of Jakub's suggestion: allow
>>> __builtin_constant_p () after IPA, but fold it into 0.  Smoke test
>>> passed on s390x-redhat-linux, full regtest and bootstrap are
>>> running on
>>> x86_64-redhat-linux.
>>>
>>> ---
>>>
>>> Linux Kernel (specifically, drivers/leds/trigger/ledtrig-cpu.c)
>>> build
>>> with GCC 10 fails on s390 with "impossible constraint".
>>>
>>> The problem is that jump threading makes __builtin_constant_p ()
>>> lie
>>> when it splits a path containing a non-constant expression in a way
>>> that on each of the resulting paths this expression is constant.
>>>
>>> Fix by disallowing __builtin_constant_p () on threading paths
>>> before
>>> IPA and fold it into 0 after IPA.
>>>
>>> gcc/ChangeLog:
>>>
>>> 2020-06-30  Ilya Leoshkevich  <iii@linux.ibm.com>
>>>
>>> 	* tree-ssa-threadbackward.c (thread_jumps::m_allow_bcp_p): New
>>> 	member.
>>> 	(thread_jumps::profitable_jump_thread_path): Do not allow
>>> 	__builtin_constant_p () on threading paths unless m_allow_bcp_p
>>> 	is set.
>>> 	(thread_jumps::find_jump_threads_backwards): Set m_allow_bcp_p.
>>> 	(pass_thread_jumps::execute): Allow __builtin_constant_p () on
>>> 	threading paths after IPA.
>>> 	(pass_early_thread_jumps::execute): Do not allow
>>> 	__builtin_constant_p () on threading paths before IPA.
>>> 	* tree-ssa-threadupdate.c (duplicate_thread_path): Fold
>>> 	__builtin_constant_p () on threading paths into 0.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2020-06-30  Ilya Leoshkevich  <iii@linux.ibm.com>
>>>
>>> 	* gcc.target/s390/builtin-constant-p-threading.c: New test.
>> So I'm finally getting back to this.  Thanks for your patience.
>>
>> It's a nasty little problem, and I suspect there's actually some
>> deeper
>> issues here.  While I'd like to claim its a bad use of b_c_p, I don't
>> think I can reasonably make that argument.
>>
>> So what we have is a b_c_p at the start of an if-else chain. 
>> Subsequent
>> tests on the "true" arm of the the b_c_p test may throw us off the
>> constant path (because the constants are out of range).  Once all the
>> tests are passed (it's constant and the constant is in range) the
>> true
>> arm's terminal block has a special asm that requires a constant
>> argument.   In the case where we get to the terminal block on the
>> true
>> arm, the argument to the b_c_p is used as the constant argument to
>> the
>> special asm.
>>
>> At first glace jump threading seems to be doing the right thing. 
>> Except
>> that we end up with two paths to that terminal block with the special
>> asm, one for each of the two constant arguments to the b_c_p call. 
>> Naturally since that same value is used in the asm, we have to
>> introduce
>> a PHI to select between them at the head of the terminal block.   Now
>> the argument in the asm is no longer constant and boom we fail.
>>
>> I briefly pondered if we should only throttle when the argument to
>> the
>> b_c_p is not used elsewhere.  But I think that just hides the problem
>> and with a little work I could probably extend the testcase to still
>> fail in that scenario.
>>
>> I also briefly pondered if we should isolate the terminal block as
>> well
>> (essentially creating one for each unique PHI argument).  We'd likely
>> only need to do that when there's an ASM in the terminal block, but
>> that
>> likely just papers over the problem as well since the ASM could be in
>> a
>> successor of the terminal block.
>>
>> I haven't thought real deeply about it, but I wouldn't be surprised
>> if
>> there's other passes that can trigger similar problems.  Aggressive
>> cross-jumping would be the most obvious, but some of the
>> hosting/sinking
>> of operations past PHIs would seem potentially problematical as well.
>>
>> Jakub suggestion might be the best one in this space.   I don't have
>> anything better right now.  The deeper questions about other passes
>> setting up similar scenarios can probably be punted, I'd expect
>> threading to be far and above the most common way for this to happen
>> and
>> I'd be comfortable faulting in investigation of other cases if/when
>> they
>> happen.
>>
>> So I retract my initial objections.  Let's go with the V2 patch.
>>
>>
>> jeff
> Hi Jeff,
>
> Thanks for having another look!
>
> I did x86_64 builds of SPEC and vmlinux, and it seems that in practice
> v2 does not have any benefit over v1.
>
> What do you think about going with the v1, which is less complex?
No strong opinions.  I think whichever is less invasive in terms of code
quality is probably the way to go.  What we want to avoid is suppressing
threading unnecessarily as that often leads to false positives from
middle-end based warnings.  Suppressing threading can also lead to build
failures in the kernel due to the way they use b_c_p.

jeff
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c b/gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c
new file mode 100644
index 00000000000..5f0acdce0b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/builtin-constant-p-threading.c
@@ -0,0 +1,46 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=z196 -mzarch" } */
+
+typedef struct
+{
+  int counter;
+} atomic_t;
+
+static inline __attribute__ ((__gnu_inline__)) int
+__atomic_add (int val, int *ptr)
+{
+  int old;
+  asm volatile("laa %[old],%[val],%[ptr]\n"
+	       : [old] "=d" (old), [ptr] "+Q"(*ptr)
+	       : [val] "d" (val)
+	       : "cc", "memory");
+  return old;
+}
+
+static inline __attribute__ ((__gnu_inline__)) void
+__atomic_add_const (int val, int *ptr)
+{
+  asm volatile("asi %[ptr],%[val]\n"
+	       : [ptr] "+Q" (*ptr)
+	       : [val] "i" (val)
+	       : "cc", "memory");
+}
+
+static inline __attribute__ ((__gnu_inline__)) void
+atomic_add (int i, atomic_t *v)
+{
+  if (__builtin_constant_p (i) && (i > -129) && (i < 128))
+    {
+      __atomic_add_const (i, &v->counter);
+      return;
+    }
+  __atomic_add (i, &v->counter);
+}
+
+static atomic_t num_active_cpus = { (0) };
+
+void
+ledtrig_cpu (_Bool is_active)
+{
+  atomic_add (is_active ? 1 : -1, &num_active_cpus);
+}
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 327628f1662..446cb1b2e21 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -40,7 +40,9 @@  along with GCC; see the file COPYING3.  If not see
 class thread_jumps
 {
  public:
-  void find_jump_threads_backwards (basic_block bb, bool speed_p);
+   void find_jump_threads_backwards (basic_block bb, bool speed_p,
+				     bool allow_bcp_p);
+
  private:
   edge profitable_jump_thread_path (basic_block bbi, tree name, tree arg,
 				    bool *creates_irreducible_loop);
@@ -65,6 +67,8 @@  class thread_jumps
   /* Indicate that we could increase code size to improve the
      code path.  */
   bool m_speed_p;
+  /* Whether to allow __builtin_constant_p () on threading paths.  */
+  bool m_allow_bcp_p;
 };
 
 /* Simple helper to get the last statement from BB, which is assumed
@@ -260,7 +264,9 @@  thread_jumps::profitable_jump_thread_path (basic_block bbi, tree name,
 	       gsi_next_nondebug (&gsi))
 	    {
 	      gimple *stmt = gsi_stmt (gsi);
-	      if (gimple_call_internal_p (stmt, IFN_UNIQUE))
+	      if (gimple_call_internal_p (stmt, IFN_UNIQUE)
+		  || (!m_allow_bcp_p
+		      && gimple_call_builtin_p (stmt, BUILT_IN_CONSTANT_P)))
 		{
 		  m_path.pop ();
 		  return NULL;
@@ -737,10 +743,13 @@  thread_jumps::fsm_find_control_statement_thread_paths (tree name)
    It is assumed that BB ends with a control statement and that by
    finding a path where NAME is a constant, we can thread the path.
    SPEED_P indicates that we could increase code size to improve the
-   code path.  */
+   code path.  ALLOW_BCP_P specifies whether to allow __builtin_constant_p ()
+   on threading paths and fold it into 0, or whether to disallow it
+   altogether.  */
 
 void
-thread_jumps::find_jump_threads_backwards (basic_block bb, bool speed_p)
+thread_jumps::find_jump_threads_backwards (basic_block bb, bool speed_p,
+					   bool allow_bcp_p)
 {     
   gimple *stmt = get_gimple_control_stmt (bb);
   if (!stmt)
@@ -770,6 +779,7 @@  thread_jumps::find_jump_threads_backwards (basic_block bb, bool speed_p)
   m_visited_bbs.empty ();
   m_seen_loop_phi = false;
   m_speed_p = speed_p;
+  m_allow_bcp_p = allow_bcp_p;
   m_max_threaded_paths = param_max_fsm_thread_paths;
 
   fsm_find_control_statement_thread_paths (name);
@@ -820,7 +830,7 @@  pass_thread_jumps::execute (function *fun)
   FOR_EACH_BB_FN (bb, fun)
     {
       if (EDGE_COUNT (bb->succs) > 1)
-	threader.find_jump_threads_backwards (bb, true);
+	threader.find_jump_threads_backwards (bb, true, true);
     }
   bool changed = thread_through_all_blocks (true);
 
@@ -881,7 +891,7 @@  pass_early_thread_jumps::execute (function *fun)
   FOR_EACH_BB_FN (bb, fun)
     {
       if (EDGE_COUNT (bb->succs) > 1)
-	threader.find_jump_threads_backwards (bb, false);
+	threader.find_jump_threads_backwards (bb, false, false);
     }
   thread_through_all_blocks (true);
 
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 5abecf6c295..4b39e6d76e5 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2365,6 +2365,24 @@  duplicate_thread_path (edge entry, edge exit, basic_block *region,
       edge_iterator ei;
       basic_block bb = region_copy[i];
 
+      /* Fold __builtin_constant_p () calls into 0.  We act so conservatively,
+	 because threading could have split a non-constant expression into
+	 multiple constant ones, and we don't know whether corresponding paths
+	 converge before all statements that depends on __builtin_constant_p ()
+	 result.  */
+      for (gimple_stmt_iterator gsi = gsi_after_labels (bb); !gsi_end_p (gsi);
+	   gsi_next_nondebug (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  if (gimple_call_builtin_p (stmt, BUILT_IN_CONSTANT_P))
+	    {
+	      tree lhs = gimple_call_lhs (stmt);
+	      tree zero = build_int_cst (TREE_TYPE (lhs), 0);
+	      gimple *repl = gimple_build_assign (lhs, zero);
+	      gsi_replace (&gsi, repl, false);
+	    }
+	}
+
       /* Watch inconsistent profile.  */
       if (curr_count > region[i]->count)
 	curr_count = region[i]->count;