diff mbox

Enable jump threading on maths meeting hot paths

Message ID 20170126100425.GA60443@kam.mff.cuni.cz
State New
Headers show

Commit Message

Jan Hubicka Jan. 26, 2017, 10:04 a.m. UTC
> > +      if (!contains_hot_bb && speed_p)
> > +       contains_hot_bb |= optimize_bb_for_speed_p (bb);
> > +
> 
> Hmm, but you are also counting the destination of the threading here
> which we will
> not duplicate.  Shouldn't this be under the if (j < path_length - 1)
> conditional so we
> look for hot BBs we are actually duplicating only (similar restrictions apply to
> path[0]?).

Aha, you are right.  I am re-testing updated patch (it also solves the PR)

Honza

Comments

Richard Biener Jan. 26, 2017, 10:16 a.m. UTC | #1
On Thu, Jan 26, 2017 at 11:04 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> > +      if (!contains_hot_bb && speed_p)
>> > +       contains_hot_bb |= optimize_bb_for_speed_p (bb);
>> > +
>>
>> Hmm, but you are also counting the destination of the threading here
>> which we will
>> not duplicate.  Shouldn't this be under the if (j < path_length - 1)
>> conditional so we
>> look for hot BBs we are actually duplicating only (similar restrictions apply to
>> path[0]?).
>
> Aha, you are right.  I am re-testing updated patch (it also solves the PR)

What about j == 0?  We don't duplicate that block either.

> Honza
>
> Index: tree-ssa-threadbackward.c
> ===================================================================
> --- tree-ssa-threadbackward.c   (revision 244732)
> +++ tree-ssa-threadbackward.c   (working copy)
> @@ -159,6 +159,7 @@ profitable_jump_thread_path (vec<basic_b
>    bool threaded_through_latch = false;
>    bool multiway_branch_in_path = false;
>    bool threaded_multiway_branch = false;
> +  bool contains_hot_bb = false;
>
>    /* Count the number of instructions on the path: as these instructions
>       will have to be duplicated, we will not record the path if there
> @@ -168,6 +169,9 @@ profitable_jump_thread_path (vec<basic_b
>      {
>        basic_block bb = (*path)[j];
>
> +      if (!contains_hot_bb && speed_p && j < path_length - 1)
> +       contains_hot_bb |= optimize_bb_for_speed_p (bb);
> +
>        /* Remember, blocks in the path are stored in opposite order
>          in the PATH array.  The last entry in the array represents
>          the block with an outgoing edge that we will redirect to the
> @@ -311,7 +315,11 @@ profitable_jump_thread_path (vec<basic_b
>        return NULL;
>      }
>
> -  if (speed_p && optimize_edge_for_speed_p (taken_edge))
> +  /* Threading is profitable if the path duplicated is hot but also
> +     in a case we separate cold path from hot path and permit optimization
> +     of the hot path later.  Be on the agressive side here. In some testcases,
> +     as in PR 78407 this leads to noticeable improvements.  */
> +  if (speed_p && (optimize_edge_for_speed_p (taken_edge) || contains_hot_bb))
>      {
>        if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
>         {
Jan Hubicka Jan. 26, 2017, 10:39 a.m. UTC | #2
> On Thu, Jan 26, 2017 at 11:04 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> > +      if (!contains_hot_bb && speed_p)
> >> > +       contains_hot_bb |= optimize_bb_for_speed_p (bb);
> >> > +
> >>
> >> Hmm, but you are also counting the destination of the threading here
> >> which we will
> >> not duplicate.  Shouldn't this be under the if (j < path_length - 1)
> >> conditional so we
> >> look for hot BBs we are actually duplicating only (similar restrictions apply to
> >> path[0]?).
> >
> > Aha, you are right.  I am re-testing updated patch (it also solves the PR)
> 
> What about j == 0?  We don't duplicate that block either.

We do duplicate it, just eliminate the control flow at the end of bb
(j==0 is the last basic block of path containing the control flow being threaded)

I am updating patch actually print into details dump file what paths are considered
and what insn counts are counted.

Honza
Jeff Law Jan. 27, 2017, 8:48 p.m. UTC | #3
On 01/26/2017 03:39 AM, Jan Hubicka wrote:
>> On Thu, Jan 26, 2017 at 11:04 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>>>>> +      if (!contains_hot_bb && speed_p)
>>>>> +       contains_hot_bb |= optimize_bb_for_speed_p (bb);
>>>>> +
>>>>
>>>> Hmm, but you are also counting the destination of the threading here
>>>> which we will
>>>> not duplicate.  Shouldn't this be under the if (j < path_length - 1)
>>>> conditional so we
>>>> look for hot BBs we are actually duplicating only (similar restrictions apply to
>>>> path[0]?).
>>>
>>> Aha, you are right.  I am re-testing updated patch (it also solves the PR)
>>
>> What about j == 0?  We don't duplicate that block either.
>
> We do duplicate it, just eliminate the control flow at the end of bb
> (j==0 is the last basic block of path containing the control flow being threaded)
Right.  And that can (of course) allow other parts of the block to be 
eliminated as dead code -- we've never tried to model any of that.

>
> I am updating patch actually print into details dump file what paths are considered
> and what insn counts are counted.
Probably wise.

jeff
diff mbox

Patch

Index: tree-ssa-threadbackward.c
===================================================================
--- tree-ssa-threadbackward.c	(revision 244732)
+++ tree-ssa-threadbackward.c	(working copy)
@@ -159,6 +159,7 @@  profitable_jump_thread_path (vec<basic_b
   bool threaded_through_latch = false;
   bool multiway_branch_in_path = false;
   bool threaded_multiway_branch = false;
+  bool contains_hot_bb = false;
 
   /* Count the number of instructions on the path: as these instructions
      will have to be duplicated, we will not record the path if there
@@ -168,6 +169,9 @@  profitable_jump_thread_path (vec<basic_b
     {
       basic_block bb = (*path)[j];
 
+      if (!contains_hot_bb && speed_p && j < path_length - 1)
+	contains_hot_bb |= optimize_bb_for_speed_p (bb);
+
       /* Remember, blocks in the path are stored in opposite order
 	 in the PATH array.  The last entry in the array represents
 	 the block with an outgoing edge that we will redirect to the
@@ -311,7 +315,11 @@  profitable_jump_thread_path (vec<basic_b
       return NULL;
     }
 
-  if (speed_p && optimize_edge_for_speed_p (taken_edge))
+  /* Threading is profitable if the path duplicated is hot but also
+     in a case we separate cold path from hot path and permit optimization
+     of the hot path later.  Be on the agressive side here. In some testcases,
+     as in PR 78407 this leads to noticeable improvements.  */
+  if (speed_p && (optimize_edge_for_speed_p (taken_edge) || contains_hot_bb))
     {
       if (n_insns >= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
 	{