diff mbox series

tree-optimization: Add register pressure heuristics

Message ID ed1ab120-cb87-4f1f-a219-0e0c6e8929e2@linux.ibm.com
State New
Headers show
Series tree-optimization: Add register pressure heuristics | expand

Commit Message

Ajit Agarwal Nov. 2, 2023, 8:50 p.m. UTC
Hello All:

Currently code sinking heuristics are based on profile data like
basic block count and sink frequency threshold. We have removed
such heuristics and added register pressure heuristics based on
live-in and live-out of early blocks and immediate dominator of
use blocks of the same loop nesting depth.

Such heuristics reduces register pressure when code sinking is 
done with same loop nesting depth.

High register pressure region is the region where there are live-in of
early blocks that has been modified by the early block. If there are
modification of the variables in best block that are live-in in early
block that are live-out of best block.

Bootstrapped and regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit

tree-optimization: Add register pressure heuristics

Currently code sinking heuristics are based on profile data like
basic block count and sink frequency threshold. We have removed
such heuristics to add register pressure heuristics based on
live-in and live-out of early blocks and immediate dominator of
use blocks.

High register pressure region is the region where there are live-in of
early blocks that has been modified by the early block. If there are
modification of the variables in best block that are live-in in early
block that are live-out of best block.

2023-11-03  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>

gcc/ChangeLog:

        * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
        as paramters.
        (sink_code_in_bb): Ditto.
        (select_best_block): Add register pressure heuristics to select
        the best blocks in the immediate dominator for same loop nest depth.
        (execute): Add live range analysis.
        (additional_var_map): New function.
        * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
        tests on ssa_names.
        (verify_live_on_entry): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
        * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++++
 gcc/tree-ssa-live.cc                        | 11 ++-
 gcc/tree-ssa-sink.cc                        | 93 ++++++++++++++-------
 4 files changed, 104 insertions(+), 34 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c

Comments

Richard Biener Nov. 3, 2023, 7:21 a.m. UTC | #1
On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>
> Hello All:
>
> Currently code sinking heuristics are based on profile data like
> basic block count and sink frequency threshold. We have removed
> such heuristics and added register pressure heuristics based on
> live-in and live-out of early blocks and immediate dominator of
> use blocks of the same loop nesting depth.
>
> Such heuristics reduces register pressure when code sinking is
> done with same loop nesting depth.
>
> High register pressure region is the region where there are live-in of
> early blocks that has been modified by the early block. If there are
> modification of the variables in best block that are live-in in early
> block that are live-out of best block.

?!  Parse error.

> Bootstrapped and regtested on powerpc64-linux-gnu.

What's the effect on code generation?

Note that live is a quadratic problem while sinking was not.  You
are effectively making the pass unfit for -O1.

You are computing "liveness" on GIMPLE where within EBBs there
isn't really any particular order of stmts, so it's kind of a garbage
heuristic.  Likewise you are not computing the effect that moving
a stmt has on liveness as far as I can see but you are just identifying
some odd metrics (I don't really understand them) to rank blocks,
not even taking the register file size into account.

You are replacing the hot/cold heuristic.

IMHO the sinking pass is the totally wrong place to do anything
about register pressure.  You are trying to solve a scheduling
problem by just looking at a single stmt.

Richard.

> Thanks & Regards
> Ajit
>
> tree-optimization: Add register pressure heuristics
>
> Currently code sinking heuristics are based on profile data like
> basic block count and sink frequency threshold. We have removed
> such heuristics to add register pressure heuristics based on
> live-in and live-out of early blocks and immediate dominator of
> use blocks.
>
> High register pressure region is the region where there are live-in of
> early blocks that has been modified by the early block. If there are
> modification of the variables in best block that are live-in in early
> block that are live-out of best block.
>
> 2023-11-03  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
>
> gcc/ChangeLog:
>
>         * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
>         as paramters.
>         (sink_code_in_bb): Ditto.
>         (select_best_block): Add register pressure heuristics to select
>         the best blocks in the immediate dominator for same loop nest depth.
>         (execute): Add live range analysis.
>         (additional_var_map): New function.
>         * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
>         tests on ssa_names.
>         (verify_live_on_entry): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
>         * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++++
>  gcc/tree-ssa-live.cc                        | 11 ++-
>  gcc/tree-ssa-sink.cc                        | 93 ++++++++++++++-------
>  4 files changed, 104 insertions(+), 34 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> new file mode 100644
> index 00000000000..d3b79ca5803
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +    {
> +      bar();
> +      j = l;
> +    }
> +}
> +/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> new file mode 100644
> index 00000000000..84e7938c54f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> +void bar();
> +int j, x;
> +void foo(int a, int b, int c, int d, int e, int f)
> +{
> +  int l;
> +  l = a + b + c + d +e + f;
> +  if (a != 5)
> +    {
> +      bar();
> +      if (b != 3)
> +        x = 3;
> +      else
> +        x = 5;
> +      j = l;
> +    }
> +}
> +/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
> diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
> index f06daf23035..998fe588278 100644
> --- a/gcc/tree-ssa-live.cc
> +++ b/gcc/tree-ssa-live.cc
> @@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p live)
>      def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
>
>    /* An undefined local variable does not need to be very alive.  */
> -  if (ssa_undefined_value_p (ssa_name, false))
> +  if (virtual_operand_p (ssa_name)
> +      || ssa_undefined_value_p (ssa_name, false))
>      return;
>
>    /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
> @@ -1540,7 +1541,6 @@ debug (tree_live_info_d *ptr)
>
>
>  /* Verify that the info in LIVE matches the current cfg.  */
> -
>  static void
>  verify_live_on_entry (tree_live_info_p live)
>  {
> @@ -1569,11 +1569,13 @@ verify_live_on_entry (tree_live_info_p live)
>           tree d = NULL_TREE;
>           bitmap loe;
>           var = partition_to_var (map, i);
> +         if (var == NULL_TREE)
> +           continue;
>           stmt = SSA_NAME_DEF_STMT (var);
>           tmp = gimple_bb (stmt);
> +
>           if (SSA_NAME_VAR (var))
>             d = ssa_default_def (cfun, SSA_NAME_VAR (var));
> -
>           loe = live_on_entry (live, e->dest);
>           if (loe && bitmap_bit_p (loe, i))
>             {
> @@ -1614,7 +1616,8 @@ verify_live_on_entry (tree_live_info_p live)
>               {
>                 /* An undefined local variable does not need to be very
>                    alive.  */
> -               if (ssa_undefined_value_p (var, false))
> +               if (virtual_operand_p (var)
> +                   || ssa_undefined_value_p (var, false))
>                   continue;
>
>                 /* The only way this var shouldn't be marked live on entry is
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index a360c5cdd6e..d0c9ef1ab86 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>     tree, return the best basic block between them (inclusive) to place
>     statements.
>
> +   The best basic block should be an immediate dominator of
> +   best basic block if we've moved to same loop nest.
> +
>     We want the most control dependent block in the shallowest loop nest.
>
>     If the resulting block is in a shallower loop nest, then use it.  Else
> @@ -191,24 +194,23 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>  static basic_block
>  select_best_block (basic_block early_bb,
>                    basic_block late_bb,
> -                  gimple *stmt)
> +                  gimple *stmt,
> +                  tree_live_info_p &live)
>  {
>    basic_block best_bb = late_bb;
>    basic_block temp_bb = late_bb;
> -  int threshold;
>
>    while (temp_bb != early_bb)
>      {
>        /* If we've moved into a lower loop nest, then that becomes
>          our best block.  */
> -      if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
> +      if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb))
>         best_bb = temp_bb;
>
>        /* Walk up the dominator tree, hopefully we'll find a shallower
>          loop nest.  */
>        temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>      }
> -
>    /* Placing a statement before a setjmp-like function would be invalid
>       (it cannot be reevaluated when execution follows an abnormal edge).
>       If we selected a block with abnormal predecessors, just punt.  */
> @@ -233,24 +235,36 @@ select_best_block (basic_block early_bb,
>        && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
>      return early_bb;
>
> -  /* Get the sinking threshold.  If the statement to be moved has memory
> -     operands, then increase the threshold by 7% as those are even more
> -     profitable to avoid, clamping at 100%.  */
> -  threshold = param_sink_frequency_threshold;
> -  if (gimple_vuse (stmt) || gimple_vdef (stmt))
> -    {
> -      threshold += 7;
> -      if (threshold > 100)
> -       threshold = 100;
> -    }
> +  int best_bb_liveout_cnt
> +    = bitmap_count_bits (&live->liveout[best_bb->index]);
> +  int early_bb_liveout_cnt
> +     = bitmap_count_bits (&live->liveout[early_bb->index]);
> +  int early_livein_cnt
> +     = bitmap_count_bits (&live->livein[early_bb->index]);
> +
> +  /* High register pressure region is the region where there are live-in of
> +     early blocks that has been modified by the early block. If there are
> +     modification of the variables in best block that are live-in in early
> +     block that are live-out of best block.  */
> +  bool live_in_rgn = (early_livein_cnt != 0
> +                     && early_bb_liveout_cnt <= early_livein_cnt);
> +
> +  bool high_reg_pressure_rgn
> +    = ((best_bb_liveout_cnt != 0 || live_in_rgn)
> +       && best_bb_liveout_cnt <= early_bb_liveout_cnt);
>
>    /* If BEST_BB is at the same nesting level, then require it to have
> -     significantly lower execution frequency to avoid gratuitous movement.  */
> +     significantly high register pressure region to avoid gratuitous
> +      movement.  */
>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
> -      /* If result of comparsion is unknown, prefer EARLY_BB.
> -        Thus use !(...>=..) rather than (...<...)  */
> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
> -    return best_bb;
> +     && high_reg_pressure_rgn)
> +    {
> +     /* Avoid sinking to immediate dominator if the statement to be moved
> +       has memory operand and same loop nest.  */
> +      if (best_bb != late_bb && gimple_vuse (stmt))
> +       return late_bb;
> +      return best_bb;
> +    }
>
>    /* No better block found, so return EARLY_BB, which happens to be the
>       statement's original block.  */
> @@ -265,7 +279,8 @@ select_best_block (basic_block early_bb,
>  static bool
>  statement_sink_location (gimple *stmt, basic_block frombb,
>                          gimple_stmt_iterator *togsi, bool *zero_uses_p,
> -                        virtual_operand_live &vop_live)
> +                        virtual_operand_live &vop_live,
> +                        tree_live_info_p &live)
>  {
>    gimple *use;
>    use_operand_p one_use = NULL_USE_OPERAND_P;
> @@ -413,7 +428,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>        if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
>         return false;
>
> -      commondom = select_best_block (frombb, commondom, stmt);
> +      commondom = select_best_block (frombb, commondom, stmt, live);
>
>        if (commondom == frombb)
>         return false;
> @@ -430,19 +445,17 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>             continue;
>           break;
>         }
> +
>        use = USE_STMT (one_use);
>
>        if (gimple_code (use) != GIMPLE_PHI)
>         {
> -         sinkbb = select_best_block (frombb, gimple_bb (use), stmt);
> +         sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live);
>
>           if (sinkbb == frombb)
>             return false;
>
> -         if (sinkbb == gimple_bb (use))
> -           *togsi = gsi_for_stmt (use);
> -         else
> -           *togsi = gsi_after_labels (sinkbb);
> +         *togsi = gsi_after_labels (sinkbb);
>
>           return true;
>         }
> @@ -454,7 +467,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>    if (!sinkbb)
>      return false;
>
> -  sinkbb = select_best_block (frombb, sinkbb, stmt);
> +  sinkbb = select_best_block (frombb, sinkbb, stmt, live);
>    if (!sinkbb || sinkbb == frombb)
>      return false;
>
> @@ -643,7 +656,8 @@ sink_common_stores_to_bb (basic_block bb)
>  /* Perform code sinking on BB */
>
>  static unsigned
> -sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
> +sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live,
> +               tree_live_info_p &live)
>  {
>    gimple_stmt_iterator gsi;
>    edge_iterator ei;
> @@ -670,7 +684,8 @@ sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
>        gimple_stmt_iterator togsi;
>        bool zero_uses_p;
>
> -      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, vop_live))
> +      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p,
> +                                   vop_live, live))
>         {
>           gimple_stmt_iterator saved = gsi;
>           if (!gsi_end_p (gsi))
> @@ -814,6 +829,15 @@ private:
>    bool unsplit_edges;
>  }; // class pass_sink_code
>
> +void additional_var_map (var_map &map)
> +{
> +  map->bmp_bbs = BITMAP_ALLOC (NULL);
> +  map->outofssa_p = false;
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, cfun)
> +    bitmap_set_bit (map->bmp_bbs, bb->index);
> +}
> +
>  unsigned int
>  pass_sink_code::execute (function *fun)
>  {
> @@ -827,12 +851,21 @@ pass_sink_code::execute (function *fun)
>    calculate_dominance_info (CDI_DOMINATORS);
>
>    virtual_operand_live vop_live;
> +  var_map map = init_var_map (num_ssa_names);
> +  additional_var_map (map);
> +  tree_live_info_p live = calculate_live_ranges (map, true);
>
>    int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>    int n = inverted_rev_post_order_compute (fun, rpo);
> +
>    for (int i = 0; i < n; ++i)
> -    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live);
> +    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, live);
>    free (rpo);
> +  if (live)
> +    delete_tree_live_info (live);
> +
> +  if (map)
> +    delete_var_map (map);
>
>    statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
>    statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
> --
> 2.39.3
>
>
Ajit Agarwal Nov. 3, 2023, 10:20 a.m. UTC | #2
Hello Richard:

On 03/11/23 12:51 pm, Richard Biener wrote:
> On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>>
>> Hello All:
>>
>> Currently code sinking heuristics are based on profile data like
>> basic block count and sink frequency threshold. We have removed
>> such heuristics and added register pressure heuristics based on
>> live-in and live-out of early blocks and immediate dominator of
>> use blocks of the same loop nesting depth.
>>
>> Such heuristics reduces register pressure when code sinking is
>> done with same loop nesting depth.
>>
>> High register pressure region is the region where there are live-in of
>> early blocks that has been modified by the early block. If there are
>> modification of the variables in best block that are live-in in early
>> block that are live-out of best block.
> 
> ?!  Parse error.
> 

I didnt understand what you meant here. Please suggest.

>> Bootstrapped and regtested on powerpc64-linux-gnu.
> 
> What's the effect on code generation?
> 
> Note that live is a quadratic problem while sinking was not.  You
> are effectively making the pass unfit for -O1.
> 
> You are computing "liveness" on GIMPLE where within EBBs there
> isn't really any particular order of stmts, so it's kind of a garbage
> heuristic.  Likewise you are not computing the effect that moving
> a stmt has on liveness as far as I can see but you are just identifying
> some odd metrics (I don't really understand them) to rank blocks,
> not even taking the register file size into account.


if the live out of best_bb  <= live out of early_bb, that shows
that there are modification in best_bb. Then it's 
safer to move statements in best_bb as there are lesser interfering
live variables in best_bb.

if there are lesser live out in best_bb, there is lesser chance 
of interfering live ranges and hence moving statements in best_bb
will not increase register pressure.

If the liveout of best_bb is greater than live-out of early_bb, 
moving statements in best_bb will increase chances of more interfering
live ranges and hence increase in register pressure.

This is how the heuristics is defined.


> 
> You are replacing the hot/cold heuristic.

> 
> IMHO the sinking pass is the totally wrong place to do anything
> about register pressure.  You are trying to solve a scheduling
> problem by just looking at a single stmt.
> 

bb->count from profile.cc are prone to errors as you have 
mentioned in previous mails. Main bottlenecks with code 
motion is increase in register pressure as that counts to 
spills in later phases of the compiler backend.

Calculation of best_bb based of immediate dominator should
consider register pressure instead of hold cold regions as that
would effect code generation.

If there is increase in register pressure with code motion and if
we are moving into colder regions, that wont improve code generations.

Hold/cold should be criteria but not the improved criteria with 
code motion.

We should consider register pressure with code motion than hot/cold
regions.

Thanks & Regards
Ajit

> Richard.
> 
>> Thanks & Regards


>> Ajit
>>
>> tree-optimization: Add register pressure heuristics
>>
>> Currently code sinking heuristics are based on profile data like
>> basic block count and sink frequency threshold. We have removed
>> such heuristics to add register pressure heuristics based on
>> live-in and live-out of early blocks and immediate dominator of
>> use blocks.
>>
>> High register pressure region is the region where there are live-in of
>> early blocks that has been modified by the early block. If there are
>> modification of the variables in best block that are live-in in early
>> block that are live-out of best block.
>>
>> 2023-11-03  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
>>
>> gcc/ChangeLog:
>>
>>         * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
>>         as paramters.
>>         (sink_code_in_bb): Ditto.
>>         (select_best_block): Add register pressure heuristics to select
>>         the best blocks in the immediate dominator for same loop nest depth.
>>         (execute): Add live range analysis.
>>         (additional_var_map): New function.
>>         * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
>>         tests on ssa_names.
>>         (verify_live_on_entry): Ditto.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
>>         * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
>> ---
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++++
>>  gcc/tree-ssa-live.cc                        | 11 ++-
>>  gcc/tree-ssa-sink.cc                        | 93 ++++++++++++++-------
>>  4 files changed, 104 insertions(+), 34 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>> new file mode 100644
>> index 00000000000..d3b79ca5803
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
>> +void bar();
>> +int j;
>> +void foo(int a, int b, int c, int d, int e, int f)
>> +{
>> +  int l;
>> +  l = a + b + c + d +e + f;
>> +  if (a != 5)
>> +    {
>> +      bar();
>> +      j = l;
>> +    }
>> +}
>> +/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>> new file mode 100644
>> index 00000000000..84e7938c54f
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
>> +void bar();
>> +int j, x;
>> +void foo(int a, int b, int c, int d, int e, int f)
>> +{
>> +  int l;
>> +  l = a + b + c + d +e + f;
>> +  if (a != 5)
>> +    {
>> +      bar();
>> +      if (b != 3)
>> +        x = 3;
>> +      else
>> +        x = 5;
>> +      j = l;
>> +    }
>> +}
>> +/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
>> diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
>> index f06daf23035..998fe588278 100644
>> --- a/gcc/tree-ssa-live.cc
>> +++ b/gcc/tree-ssa-live.cc
>> @@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p live)
>>      def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
>>
>>    /* An undefined local variable does not need to be very alive.  */
>> -  if (ssa_undefined_value_p (ssa_name, false))
>> +  if (virtual_operand_p (ssa_name)
>> +      || ssa_undefined_value_p (ssa_name, false))
>>      return;
>>
>>    /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
>> @@ -1540,7 +1541,6 @@ debug (tree_live_info_d *ptr)
>>
>>
>>  /* Verify that the info in LIVE matches the current cfg.  */
>> -
>>  static void
>>  verify_live_on_entry (tree_live_info_p live)
>>  {
>> @@ -1569,11 +1569,13 @@ verify_live_on_entry (tree_live_info_p live)
>>           tree d = NULL_TREE;
>>           bitmap loe;
>>           var = partition_to_var (map, i);
>> +         if (var == NULL_TREE)
>> +           continue;
>>           stmt = SSA_NAME_DEF_STMT (var);
>>           tmp = gimple_bb (stmt);
>> +
>>           if (SSA_NAME_VAR (var))
>>             d = ssa_default_def (cfun, SSA_NAME_VAR (var));
>> -
>>           loe = live_on_entry (live, e->dest);
>>           if (loe && bitmap_bit_p (loe, i))
>>             {
>> @@ -1614,7 +1616,8 @@ verify_live_on_entry (tree_live_info_p live)
>>               {
>>                 /* An undefined local variable does not need to be very
>>                    alive.  */
>> -               if (ssa_undefined_value_p (var, false))
>> +               if (virtual_operand_p (var)
>> +                   || ssa_undefined_value_p (var, false))
>>                   continue;
>>
>>                 /* The only way this var shouldn't be marked live on entry is
>> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
>> index a360c5cdd6e..d0c9ef1ab86 100644
>> --- a/gcc/tree-ssa-sink.cc
>> +++ b/gcc/tree-ssa-sink.cc
>> @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>>     tree, return the best basic block between them (inclusive) to place
>>     statements.
>>
>> +   The best basic block should be an immediate dominator of
>> +   best basic block if we've moved to same loop nest.
>> +
>>     We want the most control dependent block in the shallowest loop nest.
>>
>>     If the resulting block is in a shallower loop nest, then use it.  Else
>> @@ -191,24 +194,23 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>>  static basic_block
>>  select_best_block (basic_block early_bb,
>>                    basic_block late_bb,
>> -                  gimple *stmt)
>> +                  gimple *stmt,
>> +                  tree_live_info_p &live)
>>  {
>>    basic_block best_bb = late_bb;
>>    basic_block temp_bb = late_bb;
>> -  int threshold;
>>
>>    while (temp_bb != early_bb)
>>      {
>>        /* If we've moved into a lower loop nest, then that becomes
>>          our best block.  */
>> -      if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
>> +      if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb))
>>         best_bb = temp_bb;
>>
>>        /* Walk up the dominator tree, hopefully we'll find a shallower
>>          loop nest.  */
>>        temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>>      }
>> -
>>    /* Placing a statement before a setjmp-like function would be invalid
>>       (it cannot be reevaluated when execution follows an abnormal edge).
>>       If we selected a block with abnormal predecessors, just punt.  */
>> @@ -233,24 +235,36 @@ select_best_block (basic_block early_bb,
>>        && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
>>      return early_bb;
>>
>> -  /* Get the sinking threshold.  If the statement to be moved has memory
>> -     operands, then increase the threshold by 7% as those are even more
>> -     profitable to avoid, clamping at 100%.  */
>> -  threshold = param_sink_frequency_threshold;
>> -  if (gimple_vuse (stmt) || gimple_vdef (stmt))
>> -    {
>> -      threshold += 7;
>> -      if (threshold > 100)
>> -       threshold = 100;
>> -    }
>> +  int best_bb_liveout_cnt
>> +    = bitmap_count_bits (&live->liveout[best_bb->index]);
>> +  int early_bb_liveout_cnt
>> +     = bitmap_count_bits (&live->liveout[early_bb->index]);
>> +  int early_livein_cnt
>> +     = bitmap_count_bits (&live->livein[early_bb->index]);
>> +
>> +  /* High register pressure region is the region where there are live-in of
>> +     early blocks that has been modified by the early block. If there are
>> +     modification of the variables in best block that are live-in in early
>> +     block that are live-out of best block.  */
>> +  bool live_in_rgn = (early_livein_cnt != 0
>> +                     && early_bb_liveout_cnt <= early_livein_cnt);
>> +
>> +  bool high_reg_pressure_rgn
>> +    = ((best_bb_liveout_cnt != 0 || live_in_rgn)
>> +       && best_bb_liveout_cnt <= early_bb_liveout_cnt);
>>
>>    /* If BEST_BB is at the same nesting level, then require it to have
>> -     significantly lower execution frequency to avoid gratuitous movement.  */
>> +     significantly high register pressure region to avoid gratuitous
>> +      movement.  */
>>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>> -      /* If result of comparsion is unknown, prefer EARLY_BB.
>> -        Thus use !(...>=..) rather than (...<...)  */
>> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
>> -    return best_bb;
>> +     && high_reg_pressure_rgn)
>> +    {
>> +     /* Avoid sinking to immediate dominator if the statement to be moved
>> +       has memory operand and same loop nest.  */
>> +      if (best_bb != late_bb && gimple_vuse (stmt))
>> +       return late_bb;
>> +      return best_bb;
>> +    }
>>
>>    /* No better block found, so return EARLY_BB, which happens to be the
>>       statement's original block.  */
>> @@ -265,7 +279,8 @@ select_best_block (basic_block early_bb,
>>  static bool
>>  statement_sink_location (gimple *stmt, basic_block frombb,
>>                          gimple_stmt_iterator *togsi, bool *zero_uses_p,
>> -                        virtual_operand_live &vop_live)
>> +                        virtual_operand_live &vop_live,
>> +                        tree_live_info_p &live)
>>  {
>>    gimple *use;
>>    use_operand_p one_use = NULL_USE_OPERAND_P;
>> @@ -413,7 +428,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>        if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
>>         return false;
>>
>> -      commondom = select_best_block (frombb, commondom, stmt);
>> +      commondom = select_best_block (frombb, commondom, stmt, live);
>>
>>        if (commondom == frombb)
>>         return false;
>> @@ -430,19 +445,17 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>             continue;
>>           break;
>>         }
>> +
>>        use = USE_STMT (one_use);
>>
>>        if (gimple_code (use) != GIMPLE_PHI)
>>         {
>> -         sinkbb = select_best_block (frombb, gimple_bb (use), stmt);
>> +         sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live);
>>
>>           if (sinkbb == frombb)
>>             return false;
>>
>> -         if (sinkbb == gimple_bb (use))
>> -           *togsi = gsi_for_stmt (use);
>> -         else
>> -           *togsi = gsi_after_labels (sinkbb);
>> +         *togsi = gsi_after_labels (sinkbb);
>>
>>           return true;
>>         }
>> @@ -454,7 +467,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>    if (!sinkbb)
>>      return false;
>>
>> -  sinkbb = select_best_block (frombb, sinkbb, stmt);
>> +  sinkbb = select_best_block (frombb, sinkbb, stmt, live);
>>    if (!sinkbb || sinkbb == frombb)
>>      return false;
>>
>> @@ -643,7 +656,8 @@ sink_common_stores_to_bb (basic_block bb)
>>  /* Perform code sinking on BB */
>>
>>  static unsigned
>> -sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
>> +sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live,
>> +               tree_live_info_p &live)
>>  {
>>    gimple_stmt_iterator gsi;
>>    edge_iterator ei;
>> @@ -670,7 +684,8 @@ sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
>>        gimple_stmt_iterator togsi;
>>        bool zero_uses_p;
>>
>> -      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, vop_live))
>> +      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p,
>> +                                   vop_live, live))
>>         {
>>           gimple_stmt_iterator saved = gsi;
>>           if (!gsi_end_p (gsi))
>> @@ -814,6 +829,15 @@ private:
>>    bool unsplit_edges;
>>  }; // class pass_sink_code
>>
>> +void additional_var_map (var_map &map)
>> +{
>> +  map->bmp_bbs = BITMAP_ALLOC (NULL);
>> +  map->outofssa_p = false;
>> +  basic_block bb;
>> +  FOR_EACH_BB_FN (bb, cfun)
>> +    bitmap_set_bit (map->bmp_bbs, bb->index);
>> +}
>> +
>>  unsigned int
>>  pass_sink_code::execute (function *fun)
>>  {
>> @@ -827,12 +851,21 @@ pass_sink_code::execute (function *fun)
>>    calculate_dominance_info (CDI_DOMINATORS);
>>
>>    virtual_operand_live vop_live;
>> +  var_map map = init_var_map (num_ssa_names);
>> +  additional_var_map (map);
>> +  tree_live_info_p live = calculate_live_ranges (map, true);
>>
>>    int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>>    int n = inverted_rev_post_order_compute (fun, rpo);
>> +
>>    for (int i = 0; i < n; ++i)
>> -    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live);
>> +    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, live);
>>    free (rpo);
>> +  if (live)
>> +    delete_tree_live_info (live);
>> +
>> +  if (map)
>> +    delete_var_map (map);
>>
>>    statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
>>    statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
>> --
>> 2.39.3
>>
>>
Richard Biener Nov. 3, 2023, 1:36 p.m. UTC | #3
On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>
> Hello Richard:
>
> On 03/11/23 12:51 pm, Richard Biener wrote:
> > On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
> >>
> >> Hello All:
> >>
[...]
> >>
> >> High register pressure region is the region where there are live-in of
> >> early blocks that has been modified by the early block. If there are
> >> modification of the variables in best block that are live-in in early
> >> block that are live-out of best block.
> >
> > ?!  Parse error.
> >
>
> I didnt understand what you meant here. Please suggest.

I can't even guess what that paragraph means.  It fails at a
parsing level already, I can't even start to reason about what
the sentences mean.

> >> Bootstrapped and regtested on powerpc64-linux-gnu.
> >
> > What's the effect on code generation?
> >
> > Note that live is a quadratic problem while sinking was not.  You
> > are effectively making the pass unfit for -O1.
> >
> > You are computing "liveness" on GIMPLE where within EBBs there
> > isn't really any particular order of stmts, so it's kind of a garbage
> > heuristic.  Likewise you are not computing the effect that moving
> > a stmt has on liveness as far as I can see but you are just identifying
> > some odd metrics (I don't really understand them) to rank blocks,
> > not even taking the register file size into account.
>
>
> if the live out of best_bb  <= live out of early_bb, that shows
> that there are modification in best_bb.

Hm?  Do you maybe want to say that if live_out (bb) < live_in (bb)
then some variables die during the execution of bb?  Otherwise,
if live_out (early) > live_out (best) then somewhere on the path
from early to best some variables die.

> Then it's
> safer to move statements in best_bb as there are lesser interfering
> live variables in best_bb.

consider a stmt

 a = b + c;

where b and c die at the definition of a.  Then moving the stmt
down from early_bb means you increase live_out (early_bb) by
one.  So why's that "safer" then?  Of course live_out (best_bb)
also increases by two then.

> if there are lesser live out in best_bb, there is lesser chance
> of interfering live ranges and hence moving statements in best_bb
> will not increase register pressure.
>
> If the liveout of best_bb is greater than live-out of early_bb,
> moving statements in best_bb will increase chances of more interfering
> live ranges and hence increase in register pressure.
>
> This is how the heuristics is defined.

I don't think they will work.  Do you have any numbers?

>
> >
> > You are replacing the hot/cold heuristic.
>
> >
> > IMHO the sinking pass is the totally wrong place to do anything
> > about register pressure.  You are trying to solve a scheduling
> > problem by just looking at a single stmt.
> >
>
> bb->count from profile.cc are prone to errors as you have
> mentioned in previous mails. Main bottlenecks with code
> motion is increase in register pressure as that counts to
> spills in later phases of the compiler backend.
>
> Calculation of best_bb based of immediate dominator should
> consider register pressure instead of hold cold regions as that
> would effect code generation.
>
> If there is increase in register pressure with code motion and if
> we are moving into colder regions, that wont improve code generations.
>
> Hold/cold should be criteria but not the improved criteria with
> code motion.
>
> We should consider register pressure with code motion than hot/cold
> regions.
>
> Thanks & Regards
> Ajit
>
> > Richard.
> >
> >> Thanks & Regards
>
>
> >> Ajit
> >>
> >> tree-optimization: Add register pressure heuristics
> >>
> >> Currently code sinking heuristics are based on profile data like
> >> basic block count and sink frequency threshold. We have removed
> >> such heuristics to add register pressure heuristics based on
> >> live-in and live-out of early blocks and immediate dominator of
> >> use blocks.
> >>
> >> High register pressure region is the region where there are live-in of
> >> early blocks that has been modified by the early block. If there are
> >> modification of the variables in best block that are live-in in early
> >> block that are live-out of best block.
> >>
> >> 2023-11-03  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
> >>
> >> gcc/ChangeLog:
> >>
> >>         * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
> >>         as paramters.
> >>         (sink_code_in_bb): Ditto.
> >>         (select_best_block): Add register pressure heuristics to select
> >>         the best blocks in the immediate dominator for same loop nest depth.
> >>         (execute): Add live range analysis.
> >>         (additional_var_map): New function.
> >>         * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
> >>         tests on ssa_names.
> >>         (verify_live_on_entry): Ditto.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>         * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
> >>         * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
> >> ---
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++
> >>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++++
> >>  gcc/tree-ssa-live.cc                        | 11 ++-
> >>  gcc/tree-ssa-sink.cc                        | 93 ++++++++++++++-------
> >>  4 files changed, 104 insertions(+), 34 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> >>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> >> new file mode 100644
> >> index 00000000000..d3b79ca5803
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> >> @@ -0,0 +1,15 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> >> +void bar();
> >> +int j;
> >> +void foo(int a, int b, int c, int d, int e, int f)
> >> +{
> >> +  int l;
> >> +  l = a + b + c + d +e + f;
> >> +  if (a != 5)
> >> +    {
> >> +      bar();
> >> +      j = l;
> >> +    }
> >> +}
> >> +/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> >> new file mode 100644
> >> index 00000000000..84e7938c54f
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> >> @@ -0,0 +1,19 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> >> +void bar();
> >> +int j, x;
> >> +void foo(int a, int b, int c, int d, int e, int f)
> >> +{
> >> +  int l;
> >> +  l = a + b + c + d +e + f;
> >> +  if (a != 5)
> >> +    {
> >> +      bar();
> >> +      if (b != 3)
> >> +        x = 3;
> >> +      else
> >> +        x = 5;
> >> +      j = l;
> >> +    }
> >> +}
> >> +/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
> >> diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
> >> index f06daf23035..998fe588278 100644
> >> --- a/gcc/tree-ssa-live.cc
> >> +++ b/gcc/tree-ssa-live.cc
> >> @@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p live)
> >>      def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
> >>
> >>    /* An undefined local variable does not need to be very alive.  */
> >> -  if (ssa_undefined_value_p (ssa_name, false))
> >> +  if (virtual_operand_p (ssa_name)
> >> +      || ssa_undefined_value_p (ssa_name, false))
> >>      return;
> >>
> >>    /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
> >> @@ -1540,7 +1541,6 @@ debug (tree_live_info_d *ptr)
> >>
> >>
> >>  /* Verify that the info in LIVE matches the current cfg.  */
> >> -
> >>  static void
> >>  verify_live_on_entry (tree_live_info_p live)
> >>  {
> >> @@ -1569,11 +1569,13 @@ verify_live_on_entry (tree_live_info_p live)
> >>           tree d = NULL_TREE;
> >>           bitmap loe;
> >>           var = partition_to_var (map, i);
> >> +         if (var == NULL_TREE)
> >> +           continue;
> >>           stmt = SSA_NAME_DEF_STMT (var);
> >>           tmp = gimple_bb (stmt);
> >> +
> >>           if (SSA_NAME_VAR (var))
> >>             d = ssa_default_def (cfun, SSA_NAME_VAR (var));
> >> -
> >>           loe = live_on_entry (live, e->dest);
> >>           if (loe && bitmap_bit_p (loe, i))
> >>             {
> >> @@ -1614,7 +1616,8 @@ verify_live_on_entry (tree_live_info_p live)
> >>               {
> >>                 /* An undefined local variable does not need to be very
> >>                    alive.  */
> >> -               if (ssa_undefined_value_p (var, false))
> >> +               if (virtual_operand_p (var)
> >> +                   || ssa_undefined_value_p (var, false))
> >>                   continue;
> >>
> >>                 /* The only way this var shouldn't be marked live on entry is
> >> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> >> index a360c5cdd6e..d0c9ef1ab86 100644
> >> --- a/gcc/tree-ssa-sink.cc
> >> +++ b/gcc/tree-ssa-sink.cc
> >> @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
> >>     tree, return the best basic block between them (inclusive) to place
> >>     statements.
> >>
> >> +   The best basic block should be an immediate dominator of
> >> +   best basic block if we've moved to same loop nest.
> >> +
> >>     We want the most control dependent block in the shallowest loop nest.
> >>
> >>     If the resulting block is in a shallower loop nest, then use it.  Else
> >> @@ -191,24 +194,23 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
> >>  static basic_block
> >>  select_best_block (basic_block early_bb,
> >>                    basic_block late_bb,
> >> -                  gimple *stmt)
> >> +                  gimple *stmt,
> >> +                  tree_live_info_p &live)
> >>  {
> >>    basic_block best_bb = late_bb;
> >>    basic_block temp_bb = late_bb;
> >> -  int threshold;
> >>
> >>    while (temp_bb != early_bb)
> >>      {
> >>        /* If we've moved into a lower loop nest, then that becomes
> >>          our best block.  */
> >> -      if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
> >> +      if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb))
> >>         best_bb = temp_bb;
> >>
> >>        /* Walk up the dominator tree, hopefully we'll find a shallower
> >>          loop nest.  */
> >>        temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
> >>      }
> >> -
> >>    /* Placing a statement before a setjmp-like function would be invalid
> >>       (it cannot be reevaluated when execution follows an abnormal edge).
> >>       If we selected a block with abnormal predecessors, just punt.  */
> >> @@ -233,24 +235,36 @@ select_best_block (basic_block early_bb,
> >>        && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
> >>      return early_bb;
> >>
> >> -  /* Get the sinking threshold.  If the statement to be moved has memory
> >> -     operands, then increase the threshold by 7% as those are even more
> >> -     profitable to avoid, clamping at 100%.  */
> >> -  threshold = param_sink_frequency_threshold;
> >> -  if (gimple_vuse (stmt) || gimple_vdef (stmt))
> >> -    {
> >> -      threshold += 7;
> >> -      if (threshold > 100)
> >> -       threshold = 100;
> >> -    }
> >> +  int best_bb_liveout_cnt
> >> +    = bitmap_count_bits (&live->liveout[best_bb->index]);
> >> +  int early_bb_liveout_cnt
> >> +     = bitmap_count_bits (&live->liveout[early_bb->index]);
> >> +  int early_livein_cnt
> >> +     = bitmap_count_bits (&live->livein[early_bb->index]);
> >> +
> >> +  /* High register pressure region is the region where there are live-in of
> >> +     early blocks that has been modified by the early block. If there are
> >> +     modification of the variables in best block that are live-in in early
> >> +     block that are live-out of best block.  */
> >> +  bool live_in_rgn = (early_livein_cnt != 0
> >> +                     && early_bb_liveout_cnt <= early_livein_cnt);
> >> +
> >> +  bool high_reg_pressure_rgn
> >> +    = ((best_bb_liveout_cnt != 0 || live_in_rgn)
> >> +       && best_bb_liveout_cnt <= early_bb_liveout_cnt);
> >>
> >>    /* If BEST_BB is at the same nesting level, then require it to have
> >> -     significantly lower execution frequency to avoid gratuitous movement.  */
> >> +     significantly high register pressure region to avoid gratuitous
> >> +      movement.  */
> >>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
> >> -      /* If result of comparsion is unknown, prefer EARLY_BB.
> >> -        Thus use !(...>=..) rather than (...<...)  */
> >> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
> >> -    return best_bb;
> >> +     && high_reg_pressure_rgn)
> >> +    {
> >> +     /* Avoid sinking to immediate dominator if the statement to be moved
> >> +       has memory operand and same loop nest.  */
> >> +      if (best_bb != late_bb && gimple_vuse (stmt))
> >> +       return late_bb;
> >> +      return best_bb;
> >> +    }
> >>
> >>    /* No better block found, so return EARLY_BB, which happens to be the
> >>       statement's original block.  */
> >> @@ -265,7 +279,8 @@ select_best_block (basic_block early_bb,
> >>  static bool
> >>  statement_sink_location (gimple *stmt, basic_block frombb,
> >>                          gimple_stmt_iterator *togsi, bool *zero_uses_p,
> >> -                        virtual_operand_live &vop_live)
> >> +                        virtual_operand_live &vop_live,
> >> +                        tree_live_info_p &live)
> >>  {
> >>    gimple *use;
> >>    use_operand_p one_use = NULL_USE_OPERAND_P;
> >> @@ -413,7 +428,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
> >>        if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
> >>         return false;
> >>
> >> -      commondom = select_best_block (frombb, commondom, stmt);
> >> +      commondom = select_best_block (frombb, commondom, stmt, live);
> >>
> >>        if (commondom == frombb)
> >>         return false;
> >> @@ -430,19 +445,17 @@ statement_sink_location (gimple *stmt, basic_block frombb,
> >>             continue;
> >>           break;
> >>         }
> >> +
> >>        use = USE_STMT (one_use);
> >>
> >>        if (gimple_code (use) != GIMPLE_PHI)
> >>         {
> >> -         sinkbb = select_best_block (frombb, gimple_bb (use), stmt);
> >> +         sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live);
> >>
> >>           if (sinkbb == frombb)
> >>             return false;
> >>
> >> -         if (sinkbb == gimple_bb (use))
> >> -           *togsi = gsi_for_stmt (use);
> >> -         else
> >> -           *togsi = gsi_after_labels (sinkbb);
> >> +         *togsi = gsi_after_labels (sinkbb);
> >>
> >>           return true;
> >>         }
> >> @@ -454,7 +467,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
> >>    if (!sinkbb)
> >>      return false;
> >>
> >> -  sinkbb = select_best_block (frombb, sinkbb, stmt);
> >> +  sinkbb = select_best_block (frombb, sinkbb, stmt, live);
> >>    if (!sinkbb || sinkbb == frombb)
> >>      return false;
> >>
> >> @@ -643,7 +656,8 @@ sink_common_stores_to_bb (basic_block bb)
> >>  /* Perform code sinking on BB */
> >>
> >>  static unsigned
> >> -sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
> >> +sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live,
> >> +               tree_live_info_p &live)
> >>  {
> >>    gimple_stmt_iterator gsi;
> >>    edge_iterator ei;
> >> @@ -670,7 +684,8 @@ sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
> >>        gimple_stmt_iterator togsi;
> >>        bool zero_uses_p;
> >>
> >> -      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, vop_live))
> >> +      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p,
> >> +                                   vop_live, live))
> >>         {
> >>           gimple_stmt_iterator saved = gsi;
> >>           if (!gsi_end_p (gsi))
> >> @@ -814,6 +829,15 @@ private:
> >>    bool unsplit_edges;
> >>  }; // class pass_sink_code
> >>
> >> +void additional_var_map (var_map &map)
> >> +{
> >> +  map->bmp_bbs = BITMAP_ALLOC (NULL);
> >> +  map->outofssa_p = false;
> >> +  basic_block bb;
> >> +  FOR_EACH_BB_FN (bb, cfun)
> >> +    bitmap_set_bit (map->bmp_bbs, bb->index);
> >> +}
> >> +
> >>  unsigned int
> >>  pass_sink_code::execute (function *fun)
> >>  {
> >> @@ -827,12 +851,21 @@ pass_sink_code::execute (function *fun)
> >>    calculate_dominance_info (CDI_DOMINATORS);
> >>
> >>    virtual_operand_live vop_live;
> >> +  var_map map = init_var_map (num_ssa_names);
> >> +  additional_var_map (map);
> >> +  tree_live_info_p live = calculate_live_ranges (map, true);
> >>
> >>    int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> >>    int n = inverted_rev_post_order_compute (fun, rpo);
> >> +
> >>    for (int i = 0; i < n; ++i)
> >> -    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live);
> >> +    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, live);
> >>    free (rpo);
> >> +  if (live)
> >> +    delete_tree_live_info (live);
> >> +
> >> +  if (map)
> >> +    delete_var_map (map);
> >>
> >>    statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
> >>    statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
> >> --
> >> 2.39.3
> >>
> >>
Ajit Agarwal Nov. 3, 2023, 2:54 p.m. UTC | #4
Hello Richard:


On 03/11/23 7:06 pm, Richard Biener wrote:
> On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>>
>> Hello Richard:
>>
>> On 03/11/23 12:51 pm, Richard Biener wrote:
>>> On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>>>>
>>>> Hello All:
>>>>
> [...]
>>>>
>>>> High register pressure region is the region where there are live-in of
>>>> early blocks that has been modified by the early block. If there are
>>>> modification of the variables in best block that are live-in in early
>>>> block that are live-out of best block.
>>>
>>> ?!  Parse error.
>>>
>>
>> I didnt understand what you meant here. Please suggest.
> 
> I can't even guess what that paragraph means.  It fails at a
> parsing level already, I can't even start to reason about what
> the sentences mean.

Sorry for that I will modify.

> 
>>>> Bootstrapped and regtested on powerpc64-linux-gnu.
>>>
>>> What's the effect on code generation?
>>>
>>> Note that live is a quadratic problem while sinking was not.  You
>>> are effectively making the pass unfit for -O1.
>>>
>>> You are computing "liveness" on GIMPLE where within EBBs there
>>> isn't really any particular order of stmts, so it's kind of a garbage
>>> heuristic.  Likewise you are not computing the effect that moving
>>> a stmt has on liveness as far as I can see but you are just identifying
>>> some odd metrics (I don't really understand them) to rank blocks,
>>> not even taking the register file size into account.
>>
>>
>> if the live out of best_bb  <= live out of early_bb, that shows
>> that there are modification in best_bb.
> 
> Hm?  Do you maybe want to say that if live_out (bb) < live_in (bb)
> then some variables die during the execution of bb?

live_out (bb) < live_in(bb) means in bb there may be KILL (Variables)
and there are more GEN (Variables).

  Otherwise,
> if live_out (early) > live_out (best) then somewhere on the path
> from early to best some variables die.
> 

If live_out (early) > live_out (best) means there are more GEN (Variables)
between path from early to best.


>> Then it's
>> safer to move statements in best_bb as there are lesser interfering
>> live variables in best_bb.
> 
> consider a stmt
> 
>  a = b + c;
> 
> where b and c die at the definition of a.  Then moving the stmt
> down from early_bb means you increase live_out (early_bb) by
> one.  So why's that "safer" then?  Of course live_out (best_bb)
> also increases by two then.
> 

If b and c die at the definition of a and generates a live_in(early_bb)
would be live_out(early_bb) - 2 + 1.

the moving the stmt from early_bb down to best_bb increases live_out(early_bb)
by one and live_out (best_bb) depends on the LIVEIN(for all successors of best_bb)
which may be same even if we move down.

There are chances that live_out (best_bb) greater if for all successors of 
best_bb there are more GEN ( variables). If live_out (best_bb) is less
means there more KILL (Variables) in successors of best_bb.

With my heuristics live_out (best_bb ) > live_out (early_bb) then we dont do
code motion as there are chances of more interfering live ranges. If liveout(best_bb)
<= liveout (early_bb) then we do code motion as there is there are more KILL(for
all successors of best_bb) and there is less chance of interfering live ranges.

With moving down above stmt from early_bb to best_bb increases live_out(early_bb)
by one but live_out(best_bb) may be remains. If live_out (early_bb) increase by 1
but if it becomes > live_out(best_bb) then we dont do code motion if we have more GEN (Variables) in best_bb otherewise its safer to do 
code motion.

for above statement a = b + c dies b and c and generates a in early_bb then
liveout(early_bb) increases by 1. If before moving if liveout (best_bb) is 10
and then liveout (early_bb) becomes > 10 then we dont do code motion otherwise
we do code motion.





>> if there are lesser live out in best_bb, there is lesser chance
>> of interfering live ranges and hence moving statements in best_bb
>> will not increase register pressure.
>>
>> If the liveout of best_bb is greater than live-out of early_bb,
>> moving statements in best_bb will increase chances of more interfering
>> live ranges and hence increase in register pressure.
>>
>> This is how the heuristics is defined.
> 
> I don't think they will work.  Do you have any numbers?
>

My heuristics will work as mentioned above. I will  run the spec benchmarks
and will able to give performance numbers.

Thanks & Regards
Ajit
 
>>
>>>
>>> You are replacing the hot/cold heuristic.
>>
>>>
>>> IMHO the sinking pass is the totally wrong place to do anything
>>> about register pressure.  You are trying to solve a scheduling
>>> problem by just looking at a single stmt.
>>>
>>
>> bb->count from profile.cc are prone to errors as you have
>> mentioned in previous mails. Main bottlenecks with code
>> motion is increase in register pressure as that counts to
>> spills in later phases of the compiler backend.
>>
>> Calculation of best_bb based of immediate dominator should
>> consider register pressure instead of hold cold regions as that
>> would effect code generation.
>>
>> If there is increase in register pressure with code motion and if
>> we are moving into colder regions, that wont improve code generations.
>>
>> Hold/cold should be criteria but not the improved criteria with
>> code motion.
>>
>> We should consider register pressure with code motion than hot/cold
>> regions.
>>
>> Thanks & Regards
>> Ajit
>>
>>> Richard.
>>>
>>>> Thanks & Regards
>>
>>
>>>> Ajit
>>>>
>>>> tree-optimization: Add register pressure heuristics
>>>>
>>>> Currently code sinking heuristics are based on profile data like
>>>> basic block count and sink frequency threshold. We have removed
>>>> such heuristics to add register pressure heuristics based on
>>>> live-in and live-out of early blocks and immediate dominator of
>>>> use blocks.
>>>>
>>>> High register pressure region is the region where there are live-in of
>>>> early blocks that has been modified by the early block. If there are
>>>> modification of the variables in best block that are live-in in early
>>>> block that are live-out of best block.
>>>>
>>>> 2023-11-03  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>         * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
>>>>         as paramters.
>>>>         (sink_code_in_bb): Ditto.
>>>>         (select_best_block): Add register pressure heuristics to select
>>>>         the best blocks in the immediate dominator for same loop nest depth.
>>>>         (execute): Add live range analysis.
>>>>         (additional_var_map): New function.
>>>>         * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
>>>>         tests on ssa_names.
>>>>         (verify_live_on_entry): Ditto.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>         * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
>>>>         * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
>>>> ---
>>>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++
>>>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++++
>>>>  gcc/tree-ssa-live.cc                        | 11 ++-
>>>>  gcc/tree-ssa-sink.cc                        | 93 ++++++++++++++-------
>>>>  4 files changed, 104 insertions(+), 34 deletions(-)
>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>>>
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>>> new file mode 100644
>>>> index 00000000000..d3b79ca5803
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>>> @@ -0,0 +1,15 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
>>>> +void bar();
>>>> +int j;
>>>> +void foo(int a, int b, int c, int d, int e, int f)
>>>> +{
>>>> +  int l;
>>>> +  l = a + b + c + d +e + f;
>>>> +  if (a != 5)
>>>> +    {
>>>> +      bar();
>>>> +      j = l;
>>>> +    }
>>>> +}
>>>> +/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>>> new file mode 100644
>>>> index 00000000000..84e7938c54f
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>>> @@ -0,0 +1,19 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
>>>> +void bar();
>>>> +int j, x;
>>>> +void foo(int a, int b, int c, int d, int e, int f)
>>>> +{
>>>> +  int l;
>>>> +  l = a + b + c + d +e + f;
>>>> +  if (a != 5)
>>>> +    {
>>>> +      bar();
>>>> +      if (b != 3)
>>>> +        x = 3;
>>>> +      else
>>>> +        x = 5;
>>>> +      j = l;
>>>> +    }
>>>> +}
>>>> +/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
>>>> diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
>>>> index f06daf23035..998fe588278 100644
>>>> --- a/gcc/tree-ssa-live.cc
>>>> +++ b/gcc/tree-ssa-live.cc
>>>> @@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p live)
>>>>      def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
>>>>
>>>>    /* An undefined local variable does not need to be very alive.  */
>>>> -  if (ssa_undefined_value_p (ssa_name, false))
>>>> +  if (virtual_operand_p (ssa_name)
>>>> +      || ssa_undefined_value_p (ssa_name, false))
>>>>      return;
>>>>
>>>>    /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
>>>> @@ -1540,7 +1541,6 @@ debug (tree_live_info_d *ptr)
>>>>
>>>>
>>>>  /* Verify that the info in LIVE matches the current cfg.  */
>>>> -
>>>>  static void
>>>>  verify_live_on_entry (tree_live_info_p live)
>>>>  {
>>>> @@ -1569,11 +1569,13 @@ verify_live_on_entry (tree_live_info_p live)
>>>>           tree d = NULL_TREE;
>>>>           bitmap loe;
>>>>           var = partition_to_var (map, i);
>>>> +         if (var == NULL_TREE)
>>>> +           continue;
>>>>           stmt = SSA_NAME_DEF_STMT (var);
>>>>           tmp = gimple_bb (stmt);
>>>> +
>>>>           if (SSA_NAME_VAR (var))
>>>>             d = ssa_default_def (cfun, SSA_NAME_VAR (var));
>>>> -
>>>>           loe = live_on_entry (live, e->dest);
>>>>           if (loe && bitmap_bit_p (loe, i))
>>>>             {
>>>> @@ -1614,7 +1616,8 @@ verify_live_on_entry (tree_live_info_p live)
>>>>               {
>>>>                 /* An undefined local variable does not need to be very
>>>>                    alive.  */
>>>> -               if (ssa_undefined_value_p (var, false))
>>>> +               if (virtual_operand_p (var)
>>>> +                   || ssa_undefined_value_p (var, false))
>>>>                   continue;
>>>>
>>>>                 /* The only way this var shouldn't be marked live on entry is
>>>> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
>>>> index a360c5cdd6e..d0c9ef1ab86 100644
>>>> --- a/gcc/tree-ssa-sink.cc
>>>> +++ b/gcc/tree-ssa-sink.cc
>>>> @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>>>>     tree, return the best basic block between them (inclusive) to place
>>>>     statements.
>>>>
>>>> +   The best basic block should be an immediate dominator of
>>>> +   best basic block if we've moved to same loop nest.
>>>> +
>>>>     We want the most control dependent block in the shallowest loop nest.
>>>>
>>>>     If the resulting block is in a shallower loop nest, then use it.  Else
>>>> @@ -191,24 +194,23 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>>>>  static basic_block
>>>>  select_best_block (basic_block early_bb,
>>>>                    basic_block late_bb,
>>>> -                  gimple *stmt)
>>>> +                  gimple *stmt,
>>>> +                  tree_live_info_p &live)
>>>>  {
>>>>    basic_block best_bb = late_bb;
>>>>    basic_block temp_bb = late_bb;
>>>> -  int threshold;
>>>>
>>>>    while (temp_bb != early_bb)
>>>>      {
>>>>        /* If we've moved into a lower loop nest, then that becomes
>>>>          our best block.  */
>>>> -      if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
>>>> +      if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb))
>>>>         best_bb = temp_bb;
>>>>
>>>>        /* Walk up the dominator tree, hopefully we'll find a shallower
>>>>          loop nest.  */
>>>>        temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>>>>      }
>>>> -
>>>>    /* Placing a statement before a setjmp-like function would be invalid
>>>>       (it cannot be reevaluated when execution follows an abnormal edge).
>>>>       If we selected a block with abnormal predecessors, just punt.  */
>>>> @@ -233,24 +235,36 @@ select_best_block (basic_block early_bb,
>>>>        && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
>>>>      return early_bb;
>>>>
>>>> -  /* Get the sinking threshold.  If the statement to be moved has memory
>>>> -     operands, then increase the threshold by 7% as those are even more
>>>> -     profitable to avoid, clamping at 100%.  */
>>>> -  threshold = param_sink_frequency_threshold;
>>>> -  if (gimple_vuse (stmt) || gimple_vdef (stmt))
>>>> -    {
>>>> -      threshold += 7;
>>>> -      if (threshold > 100)
>>>> -       threshold = 100;
>>>> -    }
>>>> +  int best_bb_liveout_cnt
>>>> +    = bitmap_count_bits (&live->liveout[best_bb->index]);
>>>> +  int early_bb_liveout_cnt
>>>> +     = bitmap_count_bits (&live->liveout[early_bb->index]);
>>>> +  int early_livein_cnt
>>>> +     = bitmap_count_bits (&live->livein[early_bb->index]);
>>>> +
>>>> +  /* High register pressure region is the region where there are live-in of
>>>> +     early blocks that has been modified by the early block. If there are
>>>> +     modification of the variables in best block that are live-in in early
>>>> +     block that are live-out of best block.  */
>>>> +  bool live_in_rgn = (early_livein_cnt != 0
>>>> +                     && early_bb_liveout_cnt <= early_livein_cnt);
>>>> +
>>>> +  bool high_reg_pressure_rgn
>>>> +    = ((best_bb_liveout_cnt != 0 || live_in_rgn)
>>>> +       && best_bb_liveout_cnt <= early_bb_liveout_cnt);
>>>>
>>>>    /* If BEST_BB is at the same nesting level, then require it to have
>>>> -     significantly lower execution frequency to avoid gratuitous movement.  */
>>>> +     significantly high register pressure region to avoid gratuitous
>>>> +      movement.  */
>>>>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>>>> -      /* If result of comparsion is unknown, prefer EARLY_BB.
>>>> -        Thus use !(...>=..) rather than (...<...)  */
>>>> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
>>>> -    return best_bb;
>>>> +     && high_reg_pressure_rgn)
>>>> +    {
>>>> +     /* Avoid sinking to immediate dominator if the statement to be moved
>>>> +       has memory operand and same loop nest.  */
>>>> +      if (best_bb != late_bb && gimple_vuse (stmt))
>>>> +       return late_bb;
>>>> +      return best_bb;
>>>> +    }
>>>>
>>>>    /* No better block found, so return EARLY_BB, which happens to be the
>>>>       statement's original block.  */
>>>> @@ -265,7 +279,8 @@ select_best_block (basic_block early_bb,
>>>>  static bool
>>>>  statement_sink_location (gimple *stmt, basic_block frombb,
>>>>                          gimple_stmt_iterator *togsi, bool *zero_uses_p,
>>>> -                        virtual_operand_live &vop_live)
>>>> +                        virtual_operand_live &vop_live,
>>>> +                        tree_live_info_p &live)
>>>>  {
>>>>    gimple *use;
>>>>    use_operand_p one_use = NULL_USE_OPERAND_P;
>>>> @@ -413,7 +428,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>>>        if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
>>>>         return false;
>>>>
>>>> -      commondom = select_best_block (frombb, commondom, stmt);
>>>> +      commondom = select_best_block (frombb, commondom, stmt, live);
>>>>
>>>>        if (commondom == frombb)
>>>>         return false;
>>>> @@ -430,19 +445,17 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>>>             continue;
>>>>           break;
>>>>         }
>>>> +
>>>>        use = USE_STMT (one_use);
>>>>
>>>>        if (gimple_code (use) != GIMPLE_PHI)
>>>>         {
>>>> -         sinkbb = select_best_block (frombb, gimple_bb (use), stmt);
>>>> +         sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live);
>>>>
>>>>           if (sinkbb == frombb)
>>>>             return false;
>>>>
>>>> -         if (sinkbb == gimple_bb (use))
>>>> -           *togsi = gsi_for_stmt (use);
>>>> -         else
>>>> -           *togsi = gsi_after_labels (sinkbb);
>>>> +         *togsi = gsi_after_labels (sinkbb);
>>>>
>>>>           return true;
>>>>         }
>>>> @@ -454,7 +467,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>>>    if (!sinkbb)
>>>>      return false;
>>>>
>>>> -  sinkbb = select_best_block (frombb, sinkbb, stmt);
>>>> +  sinkbb = select_best_block (frombb, sinkbb, stmt, live);
>>>>    if (!sinkbb || sinkbb == frombb)
>>>>      return false;
>>>>
>>>> @@ -643,7 +656,8 @@ sink_common_stores_to_bb (basic_block bb)
>>>>  /* Perform code sinking on BB */
>>>>
>>>>  static unsigned
>>>> -sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
>>>> +sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live,
>>>> +               tree_live_info_p &live)
>>>>  {
>>>>    gimple_stmt_iterator gsi;
>>>>    edge_iterator ei;
>>>> @@ -670,7 +684,8 @@ sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
>>>>        gimple_stmt_iterator togsi;
>>>>        bool zero_uses_p;
>>>>
>>>> -      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, vop_live))
>>>> +      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p,
>>>> +                                   vop_live, live))
>>>>         {
>>>>           gimple_stmt_iterator saved = gsi;
>>>>           if (!gsi_end_p (gsi))
>>>> @@ -814,6 +829,15 @@ private:
>>>>    bool unsplit_edges;
>>>>  }; // class pass_sink_code
>>>>
>>>> +void additional_var_map (var_map &map)
>>>> +{
>>>> +  map->bmp_bbs = BITMAP_ALLOC (NULL);
>>>> +  map->outofssa_p = false;
>>>> +  basic_block bb;
>>>> +  FOR_EACH_BB_FN (bb, cfun)
>>>> +    bitmap_set_bit (map->bmp_bbs, bb->index);
>>>> +}
>>>> +
>>>>  unsigned int
>>>>  pass_sink_code::execute (function *fun)
>>>>  {
>>>> @@ -827,12 +851,21 @@ pass_sink_code::execute (function *fun)
>>>>    calculate_dominance_info (CDI_DOMINATORS);
>>>>
>>>>    virtual_operand_live vop_live;
>>>> +  var_map map = init_var_map (num_ssa_names);
>>>> +  additional_var_map (map);
>>>> +  tree_live_info_p live = calculate_live_ranges (map, true);
>>>>
>>>>    int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>>>>    int n = inverted_rev_post_order_compute (fun, rpo);
>>>> +
>>>>    for (int i = 0; i < n; ++i)
>>>> -    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live);
>>>> +    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, live);
>>>>    free (rpo);
>>>> +  if (live)
>>>> +    delete_tree_live_info (live);
>>>> +
>>>> +  if (map)
>>>> +    delete_var_map (map);
>>>>
>>>>    statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
>>>>    statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
>>>> --
>>>> 2.39.3
>>>>
>>>>
Ajit Agarwal Nov. 4, 2023, 5:49 p.m. UTC | #5
Hello Richard:

Below are the performance numbers on CPU 2017 benchmarks with and without register pressure
changes for code sinking.

INT Benchmarks:

With register pressure code sinking changes:


                       Estimated                       Estimated
                 Base     Base        Base        Peak     Peak        Peak
Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate 
--------------- -------  ---------  ---------    -------  ---------  ---------

500.perlbench_r       1        363       4.39  *
502.gcc_r             1        225       6.30  *
505.mcf_r             1        289       5.59  *
520.omnetpp_r         1        315       4.17  *
523.xalancbmk_r       1        238       4.43  *
525.x264_r            1        180       9.75  *
531.deepsjeng_r       1        291       3.94  *
541.leela_r           1        463       3.58  *
548.exchange2_r       1        222      11.8   *
557.xz_r              1        323       3.34  *
 Est. SPECrate(R)2017_int_base           5.24

Trunk without any register pressure code sinking changes:

500.perlbench_r       1        358       4.44  *
502.gcc_r             1        225       6.28  *
505.mcf_r             1        286       5.64  *
520.omnetpp_r         1        310       4.24  *
523.xalancbmk_r       1        235       4.50  *
525.x264_r            1        181       9.69  *
531.deepsjeng_r       1        291       3.94  *
541.leela_r           1        465       3.56  *
548.exchange2_r       1        219      11.9   *
557.xz_r              1        325       3.32  *
 Est. SPECrate(R)2017_int_base           5.26

FP benchmarks:

With register pressure code sinking changes:

503.bwaves_r          1        187      53.5   * 
507.cactuBSSN_r       1        235       5.39  *
508.namd_r            1        216       4.40  *
510.parest_r          1        340       7.69  *
511.povray_r          1        488       4.78  *
519.lbm_r             1        128       8.21  *
521.wrf_r             1        269       8.34  *
526.blender_r         1        311       4.89  *
527.cam4_r            1        314       5.56  *
538.imagick_r         1        228      10.9   *
544.nab_r             1        293       5.74  *
549.fotonik3d_r       1        237      16.4   *
554.roms_r            1        269       5.90  *
 Est. SPECrate(R)2017_fp_base            7.97

Trunk Without register pressure changes for code sinking:

503.bwaves_r          1        188      53.4   *
507.cactuBSSN_r       1        242       5.24  *
508.namd_r            1        215       4.42  *
510.parest_r          1        333       7.86  *
511.povray_r          1        481       4.85  *
519.lbm_r             1        128       8.22  *
521.wrf_r             1        269       8.34  *
526.blender_r         1        309       4.93  *
527.cam4_r            1        313       5.58  *
538.imagick_r         1        227      11.0   *
544.nab_r             1        291       5.79  *
549.fotonik3d_r       1        235      16.6   *
554.roms_r            1        268       5.92  *
 Est. SPECrate(R)2017_fp_base            8.00


Thanks & Regards
Ajit

On 03/11/23 8:24 pm, Ajit Agarwal wrote:
> Hello Richard:
> 
> 
> On 03/11/23 7:06 pm, Richard Biener wrote:
>> On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>>>
>>> Hello Richard:
>>>
>>> On 03/11/23 12:51 pm, Richard Biener wrote:
>>>> On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>>>>>
>>>>> Hello All:
>>>>>
>> [...]
>>>>>
>>>>> High register pressure region is the region where there are live-in of
>>>>> early blocks that has been modified by the early block. If there are
>>>>> modification of the variables in best block that are live-in in early
>>>>> block that are live-out of best block.
>>>>
>>>> ?!  Parse error.
>>>>
>>>
>>> I didnt understand what you meant here. Please suggest.
>>
>> I can't even guess what that paragraph means.  It fails at a
>> parsing level already, I can't even start to reason about what
>> the sentences mean.
> 
> Sorry for that I will modify.
> 
>>
>>>>> Bootstrapped and regtested on powerpc64-linux-gnu.
>>>>
>>>> What's the effect on code generation?
>>>>
>>>> Note that live is a quadratic problem while sinking was not.  You
>>>> are effectively making the pass unfit for -O1.
>>>>
>>>> You are computing "liveness" on GIMPLE where within EBBs there
>>>> isn't really any particular order of stmts, so it's kind of a garbage
>>>> heuristic.  Likewise you are not computing the effect that moving
>>>> a stmt has on liveness as far as I can see but you are just identifying
>>>> some odd metrics (I don't really understand them) to rank blocks,
>>>> not even taking the register file size into account.
>>>
>>>
>>> if the live out of best_bb  <= live out of early_bb, that shows
>>> that there are modification in best_bb.
>>
>> Hm?  Do you maybe want to say that if live_out (bb) < live_in (bb)
>> then some variables die during the execution of bb?
> 
> live_out (bb) < live_in(bb) means in bb there may be KILL (Variables)
> and there are more GEN (Variables).
> 
>   Otherwise,
>> if live_out (early) > live_out (best) then somewhere on the path
>> from early to best some variables die.
>>
> 
> If live_out (early) > live_out (best) means there are more GEN (Variables)
> between path from early to best.
> 
> 
>>> Then it's
>>> safer to move statements in best_bb as there are lesser interfering
>>> live variables in best_bb.
>>
>> consider a stmt
>>
>>  a = b + c;
>>
>> where b and c die at the definition of a.  Then moving the stmt
>> down from early_bb means you increase live_out (early_bb) by
>> one.  So why's that "safer" then?  Of course live_out (best_bb)
>> also increases by two then.
>>
> 
> If b and c die at the definition of a and generates a live_in(early_bb)
> would be live_out(early_bb) - 2 + 1.
> 
> the moving the stmt from early_bb down to best_bb increases live_out(early_bb)
> by one and live_out (best_bb) depends on the LIVEIN(for all successors of best_bb)
> which may be same even if we move down.
> 
> There are chances that live_out (best_bb) greater if for all successors of 
> best_bb there are more GEN ( variables). If live_out (best_bb) is less
> means there more KILL (Variables) in successors of best_bb.
> 
> With my heuristics live_out (best_bb ) > live_out (early_bb) then we dont do
> code motion as there are chances of more interfering live ranges. If liveout(best_bb)
> <= liveout (early_bb) then we do code motion as there is there are more KILL(for
> all successors of best_bb) and there is less chance of interfering live ranges.
> 
> With moving down above stmt from early_bb to best_bb increases live_out(early_bb)
> by one but live_out(best_bb) may be remains. If live_out (early_bb) increase by 1
> but if it becomes > live_out(best_bb) then we dont do code motion if we have more GEN (Variables) in best_bb otherewise its safer to do 
> code motion.
> 
> for above statement a = b + c dies b and c and generates a in early_bb then
> liveout(early_bb) increases by 1. If before moving if liveout (best_bb) is 10
> and then liveout (early_bb) becomes > 10 then we dont do code motion otherwise
> we do code motion.
> 
> 
> 
> 
> 
>>> if there are lesser live out in best_bb, there is lesser chance
>>> of interfering live ranges and hence moving statements in best_bb
>>> will not increase register pressure.
>>>
>>> If the liveout of best_bb is greater than live-out of early_bb,
>>> moving statements in best_bb will increase chances of more interfering
>>> live ranges and hence increase in register pressure.
>>>
>>> This is how the heuristics is defined.
>>
>> I don't think they will work.  Do you have any numbers?
>>
> 
> My heuristics will work as mentioned above. I will  run the spec benchmarks
> and will able to give performance numbers.
> 
> Thanks & Regards
> Ajit
>  
>>>
>>>>
>>>> You are replacing the hot/cold heuristic.
>>>
>>>>
>>>> IMHO the sinking pass is the totally wrong place to do anything
>>>> about register pressure.  You are trying to solve a scheduling
>>>> problem by just looking at a single stmt.
>>>>
>>>
>>> bb->count from profile.cc are prone to errors as you have
>>> mentioned in previous mails. Main bottlenecks with code
>>> motion is increase in register pressure as that counts to
>>> spills in later phases of the compiler backend.
>>>
>>> Calculation of best_bb based of immediate dominator should
>>> consider register pressure instead of hold cold regions as that
>>> would effect code generation.
>>>
>>> If there is increase in register pressure with code motion and if
>>> we are moving into colder regions, that wont improve code generations.
>>>
>>> Hold/cold should be criteria but not the improved criteria with
>>> code motion.
>>>
>>> We should consider register pressure with code motion than hot/cold
>>> regions.
>>>
>>> Thanks & Regards
>>> Ajit
>>>
>>>> Richard.
>>>>
>>>>> Thanks & Regards
>>>
>>>
>>>>> Ajit
>>>>>
>>>>> tree-optimization: Add register pressure heuristics
>>>>>
>>>>> Currently code sinking heuristics are based on profile data like
>>>>> basic block count and sink frequency threshold. We have removed
>>>>> such heuristics to add register pressure heuristics based on
>>>>> live-in and live-out of early blocks and immediate dominator of
>>>>> use blocks.
>>>>>
>>>>> High register pressure region is the region where there are live-in of
>>>>> early blocks that has been modified by the early block. If there are
>>>>> modification of the variables in best block that are live-in in early
>>>>> block that are live-out of best block.
>>>>>
>>>>> 2023-11-03  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>         * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
>>>>>         as paramters.
>>>>>         (sink_code_in_bb): Ditto.
>>>>>         (select_best_block): Add register pressure heuristics to select
>>>>>         the best blocks in the immediate dominator for same loop nest depth.
>>>>>         (execute): Add live range analysis.
>>>>>         (additional_var_map): New function.
>>>>>         * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
>>>>>         tests on ssa_names.
>>>>>         (verify_live_on_entry): Ditto.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>         * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
>>>>>         * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
>>>>> ---
>>>>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++
>>>>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++++
>>>>>  gcc/tree-ssa-live.cc                        | 11 ++-
>>>>>  gcc/tree-ssa-sink.cc                        | 93 ++++++++++++++-------
>>>>>  4 files changed, 104 insertions(+), 34 deletions(-)
>>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>>>>
>>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>>>> new file mode 100644
>>>>> index 00000000000..d3b79ca5803
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>>>> @@ -0,0 +1,15 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
>>>>> +void bar();
>>>>> +int j;
>>>>> +void foo(int a, int b, int c, int d, int e, int f)
>>>>> +{
>>>>> +  int l;
>>>>> +  l = a + b + c + d +e + f;
>>>>> +  if (a != 5)
>>>>> +    {
>>>>> +      bar();
>>>>> +      j = l;
>>>>> +    }
>>>>> +}
>>>>> +/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
>>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>>>> new file mode 100644
>>>>> index 00000000000..84e7938c54f
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>>>> @@ -0,0 +1,19 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
>>>>> +void bar();
>>>>> +int j, x;
>>>>> +void foo(int a, int b, int c, int d, int e, int f)
>>>>> +{
>>>>> +  int l;
>>>>> +  l = a + b + c + d +e + f;
>>>>> +  if (a != 5)
>>>>> +    {
>>>>> +      bar();
>>>>> +      if (b != 3)
>>>>> +        x = 3;
>>>>> +      else
>>>>> +        x = 5;
>>>>> +      j = l;
>>>>> +    }
>>>>> +}
>>>>> +/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
>>>>> diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
>>>>> index f06daf23035..998fe588278 100644
>>>>> --- a/gcc/tree-ssa-live.cc
>>>>> +++ b/gcc/tree-ssa-live.cc
>>>>> @@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p live)
>>>>>      def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
>>>>>
>>>>>    /* An undefined local variable does not need to be very alive.  */
>>>>> -  if (ssa_undefined_value_p (ssa_name, false))
>>>>> +  if (virtual_operand_p (ssa_name)
>>>>> +      || ssa_undefined_value_p (ssa_name, false))
>>>>>      return;
>>>>>
>>>>>    /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
>>>>> @@ -1540,7 +1541,6 @@ debug (tree_live_info_d *ptr)
>>>>>
>>>>>
>>>>>  /* Verify that the info in LIVE matches the current cfg.  */
>>>>> -
>>>>>  static void
>>>>>  verify_live_on_entry (tree_live_info_p live)
>>>>>  {
>>>>> @@ -1569,11 +1569,13 @@ verify_live_on_entry (tree_live_info_p live)
>>>>>           tree d = NULL_TREE;
>>>>>           bitmap loe;
>>>>>           var = partition_to_var (map, i);
>>>>> +         if (var == NULL_TREE)
>>>>> +           continue;
>>>>>           stmt = SSA_NAME_DEF_STMT (var);
>>>>>           tmp = gimple_bb (stmt);
>>>>> +
>>>>>           if (SSA_NAME_VAR (var))
>>>>>             d = ssa_default_def (cfun, SSA_NAME_VAR (var));
>>>>> -
>>>>>           loe = live_on_entry (live, e->dest);
>>>>>           if (loe && bitmap_bit_p (loe, i))
>>>>>             {
>>>>> @@ -1614,7 +1616,8 @@ verify_live_on_entry (tree_live_info_p live)
>>>>>               {
>>>>>                 /* An undefined local variable does not need to be very
>>>>>                    alive.  */
>>>>> -               if (ssa_undefined_value_p (var, false))
>>>>> +               if (virtual_operand_p (var)
>>>>> +                   || ssa_undefined_value_p (var, false))
>>>>>                   continue;
>>>>>
>>>>>                 /* The only way this var shouldn't be marked live on entry is
>>>>> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
>>>>> index a360c5cdd6e..d0c9ef1ab86 100644
>>>>> --- a/gcc/tree-ssa-sink.cc
>>>>> +++ b/gcc/tree-ssa-sink.cc
>>>>> @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>>>>>     tree, return the best basic block between them (inclusive) to place
>>>>>     statements.
>>>>>
>>>>> +   The best basic block should be an immediate dominator of
>>>>> +   best basic block if we've moved to same loop nest.
>>>>> +
>>>>>     We want the most control dependent block in the shallowest loop nest.
>>>>>
>>>>>     If the resulting block is in a shallower loop nest, then use it.  Else
>>>>> @@ -191,24 +194,23 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>>>>>  static basic_block
>>>>>  select_best_block (basic_block early_bb,
>>>>>                    basic_block late_bb,
>>>>> -                  gimple *stmt)
>>>>> +                  gimple *stmt,
>>>>> +                  tree_live_info_p &live)
>>>>>  {
>>>>>    basic_block best_bb = late_bb;
>>>>>    basic_block temp_bb = late_bb;
>>>>> -  int threshold;
>>>>>
>>>>>    while (temp_bb != early_bb)
>>>>>      {
>>>>>        /* If we've moved into a lower loop nest, then that becomes
>>>>>          our best block.  */
>>>>> -      if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
>>>>> +      if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb))
>>>>>         best_bb = temp_bb;
>>>>>
>>>>>        /* Walk up the dominator tree, hopefully we'll find a shallower
>>>>>          loop nest.  */
>>>>>        temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>>>>>      }
>>>>> -
>>>>>    /* Placing a statement before a setjmp-like function would be invalid
>>>>>       (it cannot be reevaluated when execution follows an abnormal edge).
>>>>>       If we selected a block with abnormal predecessors, just punt.  */
>>>>> @@ -233,24 +235,36 @@ select_best_block (basic_block early_bb,
>>>>>        && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
>>>>>      return early_bb;
>>>>>
>>>>> -  /* Get the sinking threshold.  If the statement to be moved has memory
>>>>> -     operands, then increase the threshold by 7% as those are even more
>>>>> -     profitable to avoid, clamping at 100%.  */
>>>>> -  threshold = param_sink_frequency_threshold;
>>>>> -  if (gimple_vuse (stmt) || gimple_vdef (stmt))
>>>>> -    {
>>>>> -      threshold += 7;
>>>>> -      if (threshold > 100)
>>>>> -       threshold = 100;
>>>>> -    }
>>>>> +  int best_bb_liveout_cnt
>>>>> +    = bitmap_count_bits (&live->liveout[best_bb->index]);
>>>>> +  int early_bb_liveout_cnt
>>>>> +     = bitmap_count_bits (&live->liveout[early_bb->index]);
>>>>> +  int early_livein_cnt
>>>>> +     = bitmap_count_bits (&live->livein[early_bb->index]);
>>>>> +
>>>>> +  /* High register pressure region is the region where there are live-in of
>>>>> +     early blocks that has been modified by the early block. If there are
>>>>> +     modification of the variables in best block that are live-in in early
>>>>> +     block that are live-out of best block.  */
>>>>> +  bool live_in_rgn = (early_livein_cnt != 0
>>>>> +                     && early_bb_liveout_cnt <= early_livein_cnt);
>>>>> +
>>>>> +  bool high_reg_pressure_rgn
>>>>> +    = ((best_bb_liveout_cnt != 0 || live_in_rgn)
>>>>> +       && best_bb_liveout_cnt <= early_bb_liveout_cnt);
>>>>>
>>>>>    /* If BEST_BB is at the same nesting level, then require it to have
>>>>> -     significantly lower execution frequency to avoid gratuitous movement.  */
>>>>> +     significantly high register pressure region to avoid gratuitous
>>>>> +      movement.  */
>>>>>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>>>>> -      /* If result of comparsion is unknown, prefer EARLY_BB.
>>>>> -        Thus use !(...>=..) rather than (...<...)  */
>>>>> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
>>>>> -    return best_bb;
>>>>> +     && high_reg_pressure_rgn)
>>>>> +    {
>>>>> +     /* Avoid sinking to immediate dominator if the statement to be moved
>>>>> +       has memory operand and same loop nest.  */
>>>>> +      if (best_bb != late_bb && gimple_vuse (stmt))
>>>>> +       return late_bb;
>>>>> +      return best_bb;
>>>>> +    }
>>>>>
>>>>>    /* No better block found, so return EARLY_BB, which happens to be the
>>>>>       statement's original block.  */
>>>>> @@ -265,7 +279,8 @@ select_best_block (basic_block early_bb,
>>>>>  static bool
>>>>>  statement_sink_location (gimple *stmt, basic_block frombb,
>>>>>                          gimple_stmt_iterator *togsi, bool *zero_uses_p,
>>>>> -                        virtual_operand_live &vop_live)
>>>>> +                        virtual_operand_live &vop_live,
>>>>> +                        tree_live_info_p &live)
>>>>>  {
>>>>>    gimple *use;
>>>>>    use_operand_p one_use = NULL_USE_OPERAND_P;
>>>>> @@ -413,7 +428,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>>>>        if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
>>>>>         return false;
>>>>>
>>>>> -      commondom = select_best_block (frombb, commondom, stmt);
>>>>> +      commondom = select_best_block (frombb, commondom, stmt, live);
>>>>>
>>>>>        if (commondom == frombb)
>>>>>         return false;
>>>>> @@ -430,19 +445,17 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>>>>             continue;
>>>>>           break;
>>>>>         }
>>>>> +
>>>>>        use = USE_STMT (one_use);
>>>>>
>>>>>        if (gimple_code (use) != GIMPLE_PHI)
>>>>>         {
>>>>> -         sinkbb = select_best_block (frombb, gimple_bb (use), stmt);
>>>>> +         sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live);
>>>>>
>>>>>           if (sinkbb == frombb)
>>>>>             return false;
>>>>>
>>>>> -         if (sinkbb == gimple_bb (use))
>>>>> -           *togsi = gsi_for_stmt (use);
>>>>> -         else
>>>>> -           *togsi = gsi_after_labels (sinkbb);
>>>>> +         *togsi = gsi_after_labels (sinkbb);
>>>>>
>>>>>           return true;
>>>>>         }
>>>>> @@ -454,7 +467,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>>>>    if (!sinkbb)
>>>>>      return false;
>>>>>
>>>>> -  sinkbb = select_best_block (frombb, sinkbb, stmt);
>>>>> +  sinkbb = select_best_block (frombb, sinkbb, stmt, live);
>>>>>    if (!sinkbb || sinkbb == frombb)
>>>>>      return false;
>>>>>
>>>>> @@ -643,7 +656,8 @@ sink_common_stores_to_bb (basic_block bb)
>>>>>  /* Perform code sinking on BB */
>>>>>
>>>>>  static unsigned
>>>>> -sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
>>>>> +sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live,
>>>>> +               tree_live_info_p &live)
>>>>>  {
>>>>>    gimple_stmt_iterator gsi;
>>>>>    edge_iterator ei;
>>>>> @@ -670,7 +684,8 @@ sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
>>>>>        gimple_stmt_iterator togsi;
>>>>>        bool zero_uses_p;
>>>>>
>>>>> -      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, vop_live))
>>>>> +      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p,
>>>>> +                                   vop_live, live))
>>>>>         {
>>>>>           gimple_stmt_iterator saved = gsi;
>>>>>           if (!gsi_end_p (gsi))
>>>>> @@ -814,6 +829,15 @@ private:
>>>>>    bool unsplit_edges;
>>>>>  }; // class pass_sink_code
>>>>>
>>>>> +void additional_var_map (var_map &map)
>>>>> +{
>>>>> +  map->bmp_bbs = BITMAP_ALLOC (NULL);
>>>>> +  map->outofssa_p = false;
>>>>> +  basic_block bb;
>>>>> +  FOR_EACH_BB_FN (bb, cfun)
>>>>> +    bitmap_set_bit (map->bmp_bbs, bb->index);
>>>>> +}
>>>>> +
>>>>>  unsigned int
>>>>>  pass_sink_code::execute (function *fun)
>>>>>  {
>>>>> @@ -827,12 +851,21 @@ pass_sink_code::execute (function *fun)
>>>>>    calculate_dominance_info (CDI_DOMINATORS);
>>>>>
>>>>>    virtual_operand_live vop_live;
>>>>> +  var_map map = init_var_map (num_ssa_names);
>>>>> +  additional_var_map (map);
>>>>> +  tree_live_info_p live = calculate_live_ranges (map, true);
>>>>>
>>>>>    int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>>>>>    int n = inverted_rev_post_order_compute (fun, rpo);
>>>>> +
>>>>>    for (int i = 0; i < n; ++i)
>>>>> -    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live);
>>>>> +    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, live);
>>>>>    free (rpo);
>>>>> +  if (live)
>>>>> +    delete_tree_live_info (live);
>>>>> +
>>>>> +  if (map)
>>>>> +    delete_var_map (map);
>>>>>
>>>>>    statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
>>>>>    statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
>>>>> --
>>>>> 2.39.3
>>>>>
>>>>>
Ajit Agarwal Nov. 16, 2023, 6:12 a.m. UTC | #6
Hello Richard:

With the below decison making I get the performance at par with trunk
changes and better than trunk for FP and INT SPEC 2017 benchmarks.

int best_bb_liveout_cnt
    = bitmap_count_bits (&live->liveout[best_bb->index]);
int early_bb_liveout_cnt
    = bitmap_count_bits (&live->liveout[early_bb->index]);
int early_livein_cnt
    = bitmap_count_bits (&live->livein[early_bb->index]);
  
    /* High register pressure region is the region where there are live-in of
     early blocks that has been modified by the early block. If there are
     modification of the variables in best block that are live-in in early
     block that are live-out of best block.  */
  bool live_in_rgn = (early_livein_cnt != 0
                      && early_bb_liveout_cnt <= early_livein_cnt);

  bool high_reg_pressure_rgn = false;
    
  if (live_in_rgn)
    high_reg_pressure_rgn
      = (best_bb_liveout_cnt <= early_bb_liveout_cnt);
  else
    high_reg_pressure_rgn
      = (best_bb_liveout_cnt != 0 && best_bb_liveout_cnt <= early_bb_liveout_cnt);

  high_reg_pressure_rgn
    = (high_reg_pressure_rgn) && !(best_bb->count >= early_bb->count));

I have included profile data without any sinking threshold or multiplying with 100.
This will fixes the error prone code as you have mentioned.

This will add the register pressure and profile data both in better way.

Please let me know if this is okay to submit.

I will send the patch accordingly.

Thanks & Regards
Ajit

On 03/11/23 8:24 pm, Ajit Agarwal wrote:
> Hello Richard:
> 
> 
> On 03/11/23 7:06 pm, Richard Biener wrote:
>> On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>>>
>>> Hello Richard:
>>>
>>> On 03/11/23 12:51 pm, Richard Biener wrote:
>>>> On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>>>>>
>>>>> Hello All:
>>>>>
>> [...]
>>>>>
>>>>> High register pressure region is the region where there are live-in of
>>>>> early blocks that has been modified by the early block. If there are
>>>>> modification of the variables in best block that are live-in in early
>>>>> block that are live-out of best block.
>>>>
>>>> ?!  Parse error.
>>>>
>>>
>>> I didnt understand what you meant here. Please suggest.
>>
>> I can't even guess what that paragraph means.  It fails at a
>> parsing level already, I can't even start to reason about what
>> the sentences mean.
> 
> Sorry for that I will modify.
> 
>>
>>>>> Bootstrapped and regtested on powerpc64-linux-gnu.
>>>>
>>>> What's the effect on code generation?
>>>>
>>>> Note that live is a quadratic problem while sinking was not.  You
>>>> are effectively making the pass unfit for -O1.
>>>>
>>>> You are computing "liveness" on GIMPLE where within EBBs there
>>>> isn't really any particular order of stmts, so it's kind of a garbage
>>>> heuristic.  Likewise you are not computing the effect that moving
>>>> a stmt has on liveness as far as I can see but you are just identifying
>>>> some odd metrics (I don't really understand them) to rank blocks,
>>>> not even taking the register file size into account.
>>>
>>>
>>> if the live out of best_bb  <= live out of early_bb, that shows
>>> that there are modification in best_bb.
>>
>> Hm?  Do you maybe want to say that if live_out (bb) < live_in (bb)
>> then some variables die during the execution of bb?
> 
> live_out (bb) < live_in(bb) means in bb there may be KILL (Variables)
> and there are more GEN (Variables).
> 
>   Otherwise,
>> if live_out (early) > live_out (best) then somewhere on the path
>> from early to best some variables die.
>>
> 
> If live_out (early) > live_out (best) means there are more GEN (Variables)
> between path from early to best.
> 
> 
>>> Then it's
>>> safer to move statements in best_bb as there are lesser interfering
>>> live variables in best_bb.
>>
>> consider a stmt
>>
>>  a = b + c;
>>
>> where b and c die at the definition of a.  Then moving the stmt
>> down from early_bb means you increase live_out (early_bb) by
>> one.  So why's that "safer" then?  Of course live_out (best_bb)
>> also increases by two then.
>>
> 
> If b and c die at the definition of a and generates a live_in(early_bb)
> would be live_out(early_bb) - 2 + 1.
> 
> the moving the stmt from early_bb down to best_bb increases live_out(early_bb)
> by one and live_out (best_bb) depends on the LIVEIN(for all successors of best_bb)
> which may be same even if we move down.
> 
> There are chances that live_out (best_bb) greater if for all successors of 
> best_bb there are more GEN ( variables). If live_out (best_bb) is less
> means there more KILL (Variables) in successors of best_bb.
> 
> With my heuristics live_out (best_bb ) > live_out (early_bb) then we dont do
> code motion as there are chances of more interfering live ranges. If liveout(best_bb)
> <= liveout (early_bb) then we do code motion as there is there are more KILL(for
> all successors of best_bb) and there is less chance of interfering live ranges.
> 
> With moving down above stmt from early_bb to best_bb increases live_out(early_bb)
> by one but live_out(best_bb) may be remains. If live_out (early_bb) increase by 1
> but if it becomes > live_out(best_bb) then we dont do code motion if we have more GEN (Variables) in best_bb otherewise its safer to do 
> code motion.
> 
> for above statement a = b + c dies b and c and generates a in early_bb then
> liveout(early_bb) increases by 1. If before moving if liveout (best_bb) is 10
> and then liveout (early_bb) becomes > 10 then we dont do code motion otherwise
> we do code motion.
> 
> 
> 
> 
> 
>>> if there are lesser live out in best_bb, there is lesser chance
>>> of interfering live ranges and hence moving statements in best_bb
>>> will not increase register pressure.
>>>
>>> If the liveout of best_bb is greater than live-out of early_bb,
>>> moving statements in best_bb will increase chances of more interfering
>>> live ranges and hence increase in register pressure.
>>>
>>> This is how the heuristics is defined.
>>
>> I don't think they will work.  Do you have any numbers?
>>
> 
> My heuristics will work as mentioned above. I will  run the spec benchmarks
> and will able to give performance numbers.
> 
> Thanks & Regards
> Ajit
>  
>>>
>>>>
>>>> You are replacing the hot/cold heuristic.
>>>
>>>>
>>>> IMHO the sinking pass is the totally wrong place to do anything
>>>> about register pressure.  You are trying to solve a scheduling
>>>> problem by just looking at a single stmt.
>>>>
>>>
>>> bb->count from profile.cc are prone to errors as you have
>>> mentioned in previous mails. Main bottlenecks with code
>>> motion is increase in register pressure as that counts to
>>> spills in later phases of the compiler backend.
>>>
>>> Calculation of best_bb based of immediate dominator should
>>> consider register pressure instead of hold cold regions as that
>>> would effect code generation.
>>>
>>> If there is increase in register pressure with code motion and if
>>> we are moving into colder regions, that wont improve code generations.
>>>
>>> Hold/cold should be criteria but not the improved criteria with
>>> code motion.
>>>
>>> We should consider register pressure with code motion than hot/cold
>>> regions.
>>>
>>> Thanks & Regards
>>> Ajit
>>>
>>>> Richard.
>>>>
>>>>> Thanks & Regards
>>>
>>>
>>>>> Ajit
>>>>>
>>>>> tree-optimization: Add register pressure heuristics
>>>>>
>>>>> Currently code sinking heuristics are based on profile data like
>>>>> basic block count and sink frequency threshold. We have removed
>>>>> such heuristics to add register pressure heuristics based on
>>>>> live-in and live-out of early blocks and immediate dominator of
>>>>> use blocks.
>>>>>
>>>>> High register pressure region is the region where there are live-in of
>>>>> early blocks that has been modified by the early block. If there are
>>>>> modification of the variables in best block that are live-in in early
>>>>> block that are live-out of best block.
>>>>>
>>>>> 2023-11-03  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>>         * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
>>>>>         as paramters.
>>>>>         (sink_code_in_bb): Ditto.
>>>>>         (select_best_block): Add register pressure heuristics to select
>>>>>         the best blocks in the immediate dominator for same loop nest depth.
>>>>>         (execute): Add live range analysis.
>>>>>         (additional_var_map): New function.
>>>>>         * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
>>>>>         tests on ssa_names.
>>>>>         (verify_live_on_entry): Ditto.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>         * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
>>>>>         * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
>>>>> ---
>>>>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++
>>>>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++++
>>>>>  gcc/tree-ssa-live.cc                        | 11 ++-
>>>>>  gcc/tree-ssa-sink.cc                        | 93 ++++++++++++++-------
>>>>>  4 files changed, 104 insertions(+), 34 deletions(-)
>>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>>>>
>>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>>>> new file mode 100644
>>>>> index 00000000000..d3b79ca5803
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
>>>>> @@ -0,0 +1,15 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
>>>>> +void bar();
>>>>> +int j;
>>>>> +void foo(int a, int b, int c, int d, int e, int f)
>>>>> +{
>>>>> +  int l;
>>>>> +  l = a + b + c + d +e + f;
>>>>> +  if (a != 5)
>>>>> +    {
>>>>> +      bar();
>>>>> +      j = l;
>>>>> +    }
>>>>> +}
>>>>> +/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
>>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>>>> new file mode 100644
>>>>> index 00000000000..84e7938c54f
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
>>>>> @@ -0,0 +1,19 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
>>>>> +void bar();
>>>>> +int j, x;
>>>>> +void foo(int a, int b, int c, int d, int e, int f)
>>>>> +{
>>>>> +  int l;
>>>>> +  l = a + b + c + d +e + f;
>>>>> +  if (a != 5)
>>>>> +    {
>>>>> +      bar();
>>>>> +      if (b != 3)
>>>>> +        x = 3;
>>>>> +      else
>>>>> +        x = 5;
>>>>> +      j = l;
>>>>> +    }
>>>>> +}
>>>>> +/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
>>>>> diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
>>>>> index f06daf23035..998fe588278 100644
>>>>> --- a/gcc/tree-ssa-live.cc
>>>>> +++ b/gcc/tree-ssa-live.cc
>>>>> @@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p live)
>>>>>      def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
>>>>>
>>>>>    /* An undefined local variable does not need to be very alive.  */
>>>>> -  if (ssa_undefined_value_p (ssa_name, false))
>>>>> +  if (virtual_operand_p (ssa_name)
>>>>> +      || ssa_undefined_value_p (ssa_name, false))
>>>>>      return;
>>>>>
>>>>>    /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
>>>>> @@ -1540,7 +1541,6 @@ debug (tree_live_info_d *ptr)
>>>>>
>>>>>
>>>>>  /* Verify that the info in LIVE matches the current cfg.  */
>>>>> -
>>>>>  static void
>>>>>  verify_live_on_entry (tree_live_info_p live)
>>>>>  {
>>>>> @@ -1569,11 +1569,13 @@ verify_live_on_entry (tree_live_info_p live)
>>>>>           tree d = NULL_TREE;
>>>>>           bitmap loe;
>>>>>           var = partition_to_var (map, i);
>>>>> +         if (var == NULL_TREE)
>>>>> +           continue;
>>>>>           stmt = SSA_NAME_DEF_STMT (var);
>>>>>           tmp = gimple_bb (stmt);
>>>>> +
>>>>>           if (SSA_NAME_VAR (var))
>>>>>             d = ssa_default_def (cfun, SSA_NAME_VAR (var));
>>>>> -
>>>>>           loe = live_on_entry (live, e->dest);
>>>>>           if (loe && bitmap_bit_p (loe, i))
>>>>>             {
>>>>> @@ -1614,7 +1616,8 @@ verify_live_on_entry (tree_live_info_p live)
>>>>>               {
>>>>>                 /* An undefined local variable does not need to be very
>>>>>                    alive.  */
>>>>> -               if (ssa_undefined_value_p (var, false))
>>>>> +               if (virtual_operand_p (var)
>>>>> +                   || ssa_undefined_value_p (var, false))
>>>>>                   continue;
>>>>>
>>>>>                 /* The only way this var shouldn't be marked live on entry is
>>>>> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
>>>>> index a360c5cdd6e..d0c9ef1ab86 100644
>>>>> --- a/gcc/tree-ssa-sink.cc
>>>>> +++ b/gcc/tree-ssa-sink.cc
>>>>> @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>>>>>     tree, return the best basic block between them (inclusive) to place
>>>>>     statements.
>>>>>
>>>>> +   The best basic block should be an immediate dominator of
>>>>> +   best basic block if we've moved to same loop nest.
>>>>> +
>>>>>     We want the most control dependent block in the shallowest loop nest.
>>>>>
>>>>>     If the resulting block is in a shallower loop nest, then use it.  Else
>>>>> @@ -191,24 +194,23 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
>>>>>  static basic_block
>>>>>  select_best_block (basic_block early_bb,
>>>>>                    basic_block late_bb,
>>>>> -                  gimple *stmt)
>>>>> +                  gimple *stmt,
>>>>> +                  tree_live_info_p &live)
>>>>>  {
>>>>>    basic_block best_bb = late_bb;
>>>>>    basic_block temp_bb = late_bb;
>>>>> -  int threshold;
>>>>>
>>>>>    while (temp_bb != early_bb)
>>>>>      {
>>>>>        /* If we've moved into a lower loop nest, then that becomes
>>>>>          our best block.  */
>>>>> -      if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
>>>>> +      if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb))
>>>>>         best_bb = temp_bb;
>>>>>
>>>>>        /* Walk up the dominator tree, hopefully we'll find a shallower
>>>>>          loop nest.  */
>>>>>        temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
>>>>>      }
>>>>> -
>>>>>    /* Placing a statement before a setjmp-like function would be invalid
>>>>>       (it cannot be reevaluated when execution follows an abnormal edge).
>>>>>       If we selected a block with abnormal predecessors, just punt.  */
>>>>> @@ -233,24 +235,36 @@ select_best_block (basic_block early_bb,
>>>>>        && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
>>>>>      return early_bb;
>>>>>
>>>>> -  /* Get the sinking threshold.  If the statement to be moved has memory
>>>>> -     operands, then increase the threshold by 7% as those are even more
>>>>> -     profitable to avoid, clamping at 100%.  */
>>>>> -  threshold = param_sink_frequency_threshold;
>>>>> -  if (gimple_vuse (stmt) || gimple_vdef (stmt))
>>>>> -    {
>>>>> -      threshold += 7;
>>>>> -      if (threshold > 100)
>>>>> -       threshold = 100;
>>>>> -    }
>>>>> +  int best_bb_liveout_cnt
>>>>> +    = bitmap_count_bits (&live->liveout[best_bb->index]);
>>>>> +  int early_bb_liveout_cnt
>>>>> +     = bitmap_count_bits (&live->liveout[early_bb->index]);
>>>>> +  int early_livein_cnt
>>>>> +     = bitmap_count_bits (&live->livein[early_bb->index]);
>>>>> +
>>>>> +  /* High register pressure region is the region where there are live-in of
>>>>> +     early blocks that has been modified by the early block. If there are
>>>>> +     modification of the variables in best block that are live-in in early
>>>>> +     block that are live-out of best block.  */
>>>>> +  bool live_in_rgn = (early_livein_cnt != 0
>>>>> +                     && early_bb_liveout_cnt <= early_livein_cnt);
>>>>> +
>>>>> +  bool high_reg_pressure_rgn
>>>>> +    = ((best_bb_liveout_cnt != 0 || live_in_rgn)
>>>>> +       && best_bb_liveout_cnt <= early_bb_liveout_cnt);
>>>>>
>>>>>    /* If BEST_BB is at the same nesting level, then require it to have
>>>>> -     significantly lower execution frequency to avoid gratuitous movement.  */
>>>>> +     significantly high register pressure region to avoid gratuitous
>>>>> +      movement.  */
>>>>>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>>>>> -      /* If result of comparsion is unknown, prefer EARLY_BB.
>>>>> -        Thus use !(...>=..) rather than (...<...)  */
>>>>> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
>>>>> -    return best_bb;
>>>>> +     && high_reg_pressure_rgn)
>>>>> +    {
>>>>> +     /* Avoid sinking to immediate dominator if the statement to be moved
>>>>> +       has memory operand and same loop nest.  */
>>>>> +      if (best_bb != late_bb && gimple_vuse (stmt))
>>>>> +       return late_bb;
>>>>> +      return best_bb;
>>>>> +    }
>>>>>
>>>>>    /* No better block found, so return EARLY_BB, which happens to be the
>>>>>       statement's original block.  */
>>>>> @@ -265,7 +279,8 @@ select_best_block (basic_block early_bb,
>>>>>  static bool
>>>>>  statement_sink_location (gimple *stmt, basic_block frombb,
>>>>>                          gimple_stmt_iterator *togsi, bool *zero_uses_p,
>>>>> -                        virtual_operand_live &vop_live)
>>>>> +                        virtual_operand_live &vop_live,
>>>>> +                        tree_live_info_p &live)
>>>>>  {
>>>>>    gimple *use;
>>>>>    use_operand_p one_use = NULL_USE_OPERAND_P;
>>>>> @@ -413,7 +428,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>>>>        if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
>>>>>         return false;
>>>>>
>>>>> -      commondom = select_best_block (frombb, commondom, stmt);
>>>>> +      commondom = select_best_block (frombb, commondom, stmt, live);
>>>>>
>>>>>        if (commondom == frombb)
>>>>>         return false;
>>>>> @@ -430,19 +445,17 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>>>>             continue;
>>>>>           break;
>>>>>         }
>>>>> +
>>>>>        use = USE_STMT (one_use);
>>>>>
>>>>>        if (gimple_code (use) != GIMPLE_PHI)
>>>>>         {
>>>>> -         sinkbb = select_best_block (frombb, gimple_bb (use), stmt);
>>>>> +         sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live);
>>>>>
>>>>>           if (sinkbb == frombb)
>>>>>             return false;
>>>>>
>>>>> -         if (sinkbb == gimple_bb (use))
>>>>> -           *togsi = gsi_for_stmt (use);
>>>>> -         else
>>>>> -           *togsi = gsi_after_labels (sinkbb);
>>>>> +         *togsi = gsi_after_labels (sinkbb);
>>>>>
>>>>>           return true;
>>>>>         }
>>>>> @@ -454,7 +467,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
>>>>>    if (!sinkbb)
>>>>>      return false;
>>>>>
>>>>> -  sinkbb = select_best_block (frombb, sinkbb, stmt);
>>>>> +  sinkbb = select_best_block (frombb, sinkbb, stmt, live);
>>>>>    if (!sinkbb || sinkbb == frombb)
>>>>>      return false;
>>>>>
>>>>> @@ -643,7 +656,8 @@ sink_common_stores_to_bb (basic_block bb)
>>>>>  /* Perform code sinking on BB */
>>>>>
>>>>>  static unsigned
>>>>> -sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
>>>>> +sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live,
>>>>> +               tree_live_info_p &live)
>>>>>  {
>>>>>    gimple_stmt_iterator gsi;
>>>>>    edge_iterator ei;
>>>>> @@ -670,7 +684,8 @@ sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
>>>>>        gimple_stmt_iterator togsi;
>>>>>        bool zero_uses_p;
>>>>>
>>>>> -      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, vop_live))
>>>>> +      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p,
>>>>> +                                   vop_live, live))
>>>>>         {
>>>>>           gimple_stmt_iterator saved = gsi;
>>>>>           if (!gsi_end_p (gsi))
>>>>> @@ -814,6 +829,15 @@ private:
>>>>>    bool unsplit_edges;
>>>>>  }; // class pass_sink_code
>>>>>
>>>>> +void additional_var_map (var_map &map)
>>>>> +{
>>>>> +  map->bmp_bbs = BITMAP_ALLOC (NULL);
>>>>> +  map->outofssa_p = false;
>>>>> +  basic_block bb;
>>>>> +  FOR_EACH_BB_FN (bb, cfun)
>>>>> +    bitmap_set_bit (map->bmp_bbs, bb->index);
>>>>> +}
>>>>> +
>>>>>  unsigned int
>>>>>  pass_sink_code::execute (function *fun)
>>>>>  {
>>>>> @@ -827,12 +851,21 @@ pass_sink_code::execute (function *fun)
>>>>>    calculate_dominance_info (CDI_DOMINATORS);
>>>>>
>>>>>    virtual_operand_live vop_live;
>>>>> +  var_map map = init_var_map (num_ssa_names);
>>>>> +  additional_var_map (map);
>>>>> +  tree_live_info_p live = calculate_live_ranges (map, true);
>>>>>
>>>>>    int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>>>>>    int n = inverted_rev_post_order_compute (fun, rpo);
>>>>> +
>>>>>    for (int i = 0; i < n; ++i)
>>>>> -    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live);
>>>>> +    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, live);
>>>>>    free (rpo);
>>>>> +  if (live)
>>>>> +    delete_tree_live_info (live);
>>>>> +
>>>>> +  if (map)
>>>>> +    delete_var_map (map);
>>>>>
>>>>>    statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
>>>>>    statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
>>>>> --
>>>>> 2.39.3
>>>>>
>>>>>
Richard Biener Nov. 16, 2023, 7:16 a.m. UTC | #7
On Thu, Nov 16, 2023 at 7:12 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
>
> Hello Richard:
>
> With the below decison making I get the performance at par with trunk
> changes and better than trunk for FP and INT SPEC 2017 benchmarks.
>
> int best_bb_liveout_cnt
>     = bitmap_count_bits (&live->liveout[best_bb->index]);
> int early_bb_liveout_cnt
>     = bitmap_count_bits (&live->liveout[early_bb->index]);
> int early_livein_cnt
>     = bitmap_count_bits (&live->livein[early_bb->index]);
>
>     /* High register pressure region is the region where there are live-in of
>      early blocks that has been modified by the early block. If there are
>      modification of the variables in best block that are live-in in early
>      block that are live-out of best block.  */
>   bool live_in_rgn = (early_livein_cnt != 0
>                       && early_bb_liveout_cnt <= early_livein_cnt);
>
>   bool high_reg_pressure_rgn = false;
>
>   if (live_in_rgn)
>     high_reg_pressure_rgn
>       = (best_bb_liveout_cnt <= early_bb_liveout_cnt);
>   else
>     high_reg_pressure_rgn
>       = (best_bb_liveout_cnt != 0 && best_bb_liveout_cnt <= early_bb_liveout_cnt);
>
>   high_reg_pressure_rgn
>     = (high_reg_pressure_rgn) && !(best_bb->count >= early_bb->count));
>
> I have included profile data without any sinking threshold or multiplying with 100.
> This will fixes the error prone code as you have mentioned.
>
> This will add the register pressure and profile data both in better way.
>
> Please let me know if this is okay to submit.
>
> I will send the patch accordingly.

Please send an updated patch.

Richard.

> Thanks & Regards
> Ajit
>
> On 03/11/23 8:24 pm, Ajit Agarwal wrote:
> > Hello Richard:
> >
> >
> > On 03/11/23 7:06 pm, Richard Biener wrote:
> >> On Fri, Nov 3, 2023 at 11:20 AM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
> >>>
> >>> Hello Richard:
> >>>
> >>> On 03/11/23 12:51 pm, Richard Biener wrote:
> >>>> On Thu, Nov 2, 2023 at 9:50 PM Ajit Agarwal <aagarwa1@linux.ibm.com> wrote:
> >>>>>
> >>>>> Hello All:
> >>>>>
> >> [...]
> >>>>>
> >>>>> High register pressure region is the region where there are live-in of
> >>>>> early blocks that has been modified by the early block. If there are
> >>>>> modification of the variables in best block that are live-in in early
> >>>>> block that are live-out of best block.
> >>>>
> >>>> ?!  Parse error.
> >>>>
> >>>
> >>> I didnt understand what you meant here. Please suggest.
> >>
> >> I can't even guess what that paragraph means.  It fails at a
> >> parsing level already, I can't even start to reason about what
> >> the sentences mean.
> >
> > Sorry for that I will modify.
> >
> >>
> >>>>> Bootstrapped and regtested on powerpc64-linux-gnu.
> >>>>
> >>>> What's the effect on code generation?
> >>>>
> >>>> Note that live is a quadratic problem while sinking was not.  You
> >>>> are effectively making the pass unfit for -O1.
> >>>>
> >>>> You are computing "liveness" on GIMPLE where within EBBs there
> >>>> isn't really any particular order of stmts, so it's kind of a garbage
> >>>> heuristic.  Likewise you are not computing the effect that moving
> >>>> a stmt has on liveness as far as I can see but you are just identifying
> >>>> some odd metrics (I don't really understand them) to rank blocks,
> >>>> not even taking the register file size into account.
> >>>
> >>>
> >>> if the live out of best_bb  <= live out of early_bb, that shows
> >>> that there are modification in best_bb.
> >>
> >> Hm?  Do you maybe want to say that if live_out (bb) < live_in (bb)
> >> then some variables die during the execution of bb?
> >
> > live_out (bb) < live_in(bb) means in bb there may be KILL (Variables)
> > and there are more GEN (Variables).
> >
> >   Otherwise,
> >> if live_out (early) > live_out (best) then somewhere on the path
> >> from early to best some variables die.
> >>
> >
> > If live_out (early) > live_out (best) means there are more GEN (Variables)
> > between path from early to best.
> >
> >
> >>> Then it's
> >>> safer to move statements in best_bb as there are lesser interfering
> >>> live variables in best_bb.
> >>
> >> consider a stmt
> >>
> >>  a = b + c;
> >>
> >> where b and c die at the definition of a.  Then moving the stmt
> >> down from early_bb means you increase live_out (early_bb) by
> >> one.  So why's that "safer" then?  Of course live_out (best_bb)
> >> also increases by two then.
> >>
> >
> > If b and c die at the definition of a and generates a live_in(early_bb)
> > would be live_out(early_bb) - 2 + 1.
> >
> > the moving the stmt from early_bb down to best_bb increases live_out(early_bb)
> > by one and live_out (best_bb) depends on the LIVEIN(for all successors of best_bb)
> > which may be same even if we move down.
> >
> > There are chances that live_out (best_bb) greater if for all successors of
> > best_bb there are more GEN ( variables). If live_out (best_bb) is less
> > means there more KILL (Variables) in successors of best_bb.
> >
> > With my heuristics live_out (best_bb ) > live_out (early_bb) then we dont do
> > code motion as there are chances of more interfering live ranges. If liveout(best_bb)
> > <= liveout (early_bb) then we do code motion as there is there are more KILL(for
> > all successors of best_bb) and there is less chance of interfering live ranges.
> >
> > With moving down above stmt from early_bb to best_bb increases live_out(early_bb)
> > by one but live_out(best_bb) may be remains. If live_out (early_bb) increase by 1
> > but if it becomes > live_out(best_bb) then we dont do code motion if we have more GEN (Variables) in best_bb otherewise its safer to do
> > code motion.
> >
> > for above statement a = b + c dies b and c and generates a in early_bb then
> > liveout(early_bb) increases by 1. If before moving if liveout (best_bb) is 10
> > and then liveout (early_bb) becomes > 10 then we dont do code motion otherwise
> > we do code motion.
> >
> >
> >
> >
> >
> >>> if there are lesser live out in best_bb, there is lesser chance
> >>> of interfering live ranges and hence moving statements in best_bb
> >>> will not increase register pressure.
> >>>
> >>> If the liveout of best_bb is greater than live-out of early_bb,
> >>> moving statements in best_bb will increase chances of more interfering
> >>> live ranges and hence increase in register pressure.
> >>>
> >>> This is how the heuristics is defined.
> >>
> >> I don't think they will work.  Do you have any numbers?
> >>
> >
> > My heuristics will work as mentioned above. I will  run the spec benchmarks
> > and will able to give performance numbers.
> >
> > Thanks & Regards
> > Ajit
> >
> >>>
> >>>>
> >>>> You are replacing the hot/cold heuristic.
> >>>
> >>>>
> >>>> IMHO the sinking pass is the totally wrong place to do anything
> >>>> about register pressure.  You are trying to solve a scheduling
> >>>> problem by just looking at a single stmt.
> >>>>
> >>>
> >>> bb->count from profile.cc are prone to errors as you have
> >>> mentioned in previous mails. Main bottlenecks with code
> >>> motion is increase in register pressure as that counts to
> >>> spills in later phases of the compiler backend.
> >>>
> >>> Calculation of best_bb based of immediate dominator should
> >>> consider register pressure instead of hold cold regions as that
> >>> would effect code generation.
> >>>
> >>> If there is increase in register pressure with code motion and if
> >>> we are moving into colder regions, that wont improve code generations.
> >>>
> >>> Hold/cold should be criteria but not the improved criteria with
> >>> code motion.
> >>>
> >>> We should consider register pressure with code motion than hot/cold
> >>> regions.
> >>>
> >>> Thanks & Regards
> >>> Ajit
> >>>
> >>>> Richard.
> >>>>
> >>>>> Thanks & Regards
> >>>
> >>>
> >>>>> Ajit
> >>>>>
> >>>>> tree-optimization: Add register pressure heuristics
> >>>>>
> >>>>> Currently code sinking heuristics are based on profile data like
> >>>>> basic block count and sink frequency threshold. We have removed
> >>>>> such heuristics to add register pressure heuristics based on
> >>>>> live-in and live-out of early blocks and immediate dominator of
> >>>>> use blocks.
> >>>>>
> >>>>> High register pressure region is the region where there are live-in of
> >>>>> early blocks that has been modified by the early block. If there are
> >>>>> modification of the variables in best block that are live-in in early
> >>>>> block that are live-out of best block.
> >>>>>
> >>>>> 2023-11-03  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>
> >>>>>
> >>>>> gcc/ChangeLog:
> >>>>>
> >>>>>         * tree-ssa-sink.cc (statement_sink_location): Add tree_live_info_p
> >>>>>         as paramters.
> >>>>>         (sink_code_in_bb): Ditto.
> >>>>>         (select_best_block): Add register pressure heuristics to select
> >>>>>         the best blocks in the immediate dominator for same loop nest depth.
> >>>>>         (execute): Add live range analysis.
> >>>>>         (additional_var_map): New function.
> >>>>>         * tree-ssa-live.cc (set_var_live_on_entry): Add virtual operand
> >>>>>         tests on ssa_names.
> >>>>>         (verify_live_on_entry): Ditto.
> >>>>>
> >>>>> gcc/testsuite/ChangeLog:
> >>>>>
> >>>>>         * gcc.dg/tree-ssa/ssa-sink-21.c: New test.
> >>>>>         * gcc.dg/tree-ssa/ssa-sink-22.c: New test.
> >>>>> ---
> >>>>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 15 ++++
> >>>>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c | 19 +++++
> >>>>>  gcc/tree-ssa-live.cc                        | 11 ++-
> >>>>>  gcc/tree-ssa-sink.cc                        | 93 ++++++++++++++-------
> >>>>>  4 files changed, 104 insertions(+), 34 deletions(-)
> >>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> >>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> >>>>>
> >>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..d3b79ca5803
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> >>>>> @@ -0,0 +1,15 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> >>>>> +void bar();
> >>>>> +int j;
> >>>>> +void foo(int a, int b, int c, int d, int e, int f)
> >>>>> +{
> >>>>> +  int l;
> >>>>> +  l = a + b + c + d +e + f;
> >>>>> +  if (a != 5)
> >>>>> +    {
> >>>>> +      bar();
> >>>>> +      j = l;
> >>>>> +    }
> >>>>> +}
> >>>>> +/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
> >>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> >>>>> new file mode 100644
> >>>>> index 00000000000..84e7938c54f
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
> >>>>> @@ -0,0 +1,19 @@
> >>>>> +/* { dg-do compile } */
> >>>>> +/* { dg-options "-O2 -fdump-tree-sink-stats" } */
> >>>>> +void bar();
> >>>>> +int j, x;
> >>>>> +void foo(int a, int b, int c, int d, int e, int f)
> >>>>> +{
> >>>>> +  int l;
> >>>>> +  l = a + b + c + d +e + f;
> >>>>> +  if (a != 5)
> >>>>> +    {
> >>>>> +      bar();
> >>>>> +      if (b != 3)
> >>>>> +        x = 3;
> >>>>> +      else
> >>>>> +        x = 5;
> >>>>> +      j = l;
> >>>>> +    }
> >>>>> +}
> >>>>> +/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
> >>>>> diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
> >>>>> index f06daf23035..998fe588278 100644
> >>>>> --- a/gcc/tree-ssa-live.cc
> >>>>> +++ b/gcc/tree-ssa-live.cc
> >>>>> @@ -1141,7 +1141,8 @@ set_var_live_on_entry (tree ssa_name, tree_live_info_p live)
> >>>>>      def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
> >>>>>
> >>>>>    /* An undefined local variable does not need to be very alive.  */
> >>>>> -  if (ssa_undefined_value_p (ssa_name, false))
> >>>>> +  if (virtual_operand_p (ssa_name)
> >>>>> +      || ssa_undefined_value_p (ssa_name, false))
> >>>>>      return;
> >>>>>
> >>>>>    /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
> >>>>> @@ -1540,7 +1541,6 @@ debug (tree_live_info_d *ptr)
> >>>>>
> >>>>>
> >>>>>  /* Verify that the info in LIVE matches the current cfg.  */
> >>>>> -
> >>>>>  static void
> >>>>>  verify_live_on_entry (tree_live_info_p live)
> >>>>>  {
> >>>>> @@ -1569,11 +1569,13 @@ verify_live_on_entry (tree_live_info_p live)
> >>>>>           tree d = NULL_TREE;
> >>>>>           bitmap loe;
> >>>>>           var = partition_to_var (map, i);
> >>>>> +         if (var == NULL_TREE)
> >>>>> +           continue;
> >>>>>           stmt = SSA_NAME_DEF_STMT (var);
> >>>>>           tmp = gimple_bb (stmt);
> >>>>> +
> >>>>>           if (SSA_NAME_VAR (var))
> >>>>>             d = ssa_default_def (cfun, SSA_NAME_VAR (var));
> >>>>> -
> >>>>>           loe = live_on_entry (live, e->dest);
> >>>>>           if (loe && bitmap_bit_p (loe, i))
> >>>>>             {
> >>>>> @@ -1614,7 +1616,8 @@ verify_live_on_entry (tree_live_info_p live)
> >>>>>               {
> >>>>>                 /* An undefined local variable does not need to be very
> >>>>>                    alive.  */
> >>>>> -               if (ssa_undefined_value_p (var, false))
> >>>>> +               if (virtual_operand_p (var)
> >>>>> +                   || ssa_undefined_value_p (var, false))
> >>>>>                   continue;
> >>>>>
> >>>>>                 /* The only way this var shouldn't be marked live on entry is
> >>>>> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> >>>>> index a360c5cdd6e..d0c9ef1ab86 100644
> >>>>> --- a/gcc/tree-ssa-sink.cc
> >>>>> +++ b/gcc/tree-ssa-sink.cc
> >>>>> @@ -176,6 +176,9 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
> >>>>>     tree, return the best basic block between them (inclusive) to place
> >>>>>     statements.
> >>>>>
> >>>>> +   The best basic block should be an immediate dominator of
> >>>>> +   best basic block if we've moved to same loop nest.
> >>>>> +
> >>>>>     We want the most control dependent block in the shallowest loop nest.
> >>>>>
> >>>>>     If the resulting block is in a shallower loop nest, then use it.  Else
> >>>>> @@ -191,24 +194,23 @@ nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
> >>>>>  static basic_block
> >>>>>  select_best_block (basic_block early_bb,
> >>>>>                    basic_block late_bb,
> >>>>> -                  gimple *stmt)
> >>>>> +                  gimple *stmt,
> >>>>> +                  tree_live_info_p &live)
> >>>>>  {
> >>>>>    basic_block best_bb = late_bb;
> >>>>>    basic_block temp_bb = late_bb;
> >>>>> -  int threshold;
> >>>>>
> >>>>>    while (temp_bb != early_bb)
> >>>>>      {
> >>>>>        /* If we've moved into a lower loop nest, then that becomes
> >>>>>          our best block.  */
> >>>>> -      if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
> >>>>> +      if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb))
> >>>>>         best_bb = temp_bb;
> >>>>>
> >>>>>        /* Walk up the dominator tree, hopefully we'll find a shallower
> >>>>>          loop nest.  */
> >>>>>        temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
> >>>>>      }
> >>>>> -
> >>>>>    /* Placing a statement before a setjmp-like function would be invalid
> >>>>>       (it cannot be reevaluated when execution follows an abnormal edge).
> >>>>>       If we selected a block with abnormal predecessors, just punt.  */
> >>>>> @@ -233,24 +235,36 @@ select_best_block (basic_block early_bb,
> >>>>>        && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
> >>>>>      return early_bb;
> >>>>>
> >>>>> -  /* Get the sinking threshold.  If the statement to be moved has memory
> >>>>> -     operands, then increase the threshold by 7% as those are even more
> >>>>> -     profitable to avoid, clamping at 100%.  */
> >>>>> -  threshold = param_sink_frequency_threshold;
> >>>>> -  if (gimple_vuse (stmt) || gimple_vdef (stmt))
> >>>>> -    {
> >>>>> -      threshold += 7;
> >>>>> -      if (threshold > 100)
> >>>>> -       threshold = 100;
> >>>>> -    }
> >>>>> +  int best_bb_liveout_cnt
> >>>>> +    = bitmap_count_bits (&live->liveout[best_bb->index]);
> >>>>> +  int early_bb_liveout_cnt
> >>>>> +     = bitmap_count_bits (&live->liveout[early_bb->index]);
> >>>>> +  int early_livein_cnt
> >>>>> +     = bitmap_count_bits (&live->livein[early_bb->index]);
> >>>>> +
> >>>>> +  /* High register pressure region is the region where there are live-in of
> >>>>> +     early blocks that has been modified by the early block. If there are
> >>>>> +     modification of the variables in best block that are live-in in early
> >>>>> +     block that are live-out of best block.  */
> >>>>> +  bool live_in_rgn = (early_livein_cnt != 0
> >>>>> +                     && early_bb_liveout_cnt <= early_livein_cnt);
> >>>>> +
> >>>>> +  bool high_reg_pressure_rgn
> >>>>> +    = ((best_bb_liveout_cnt != 0 || live_in_rgn)
> >>>>> +       && best_bb_liveout_cnt <= early_bb_liveout_cnt);
> >>>>>
> >>>>>    /* If BEST_BB is at the same nesting level, then require it to have
> >>>>> -     significantly lower execution frequency to avoid gratuitous movement.  */
> >>>>> +     significantly high register pressure region to avoid gratuitous
> >>>>> +      movement.  */
> >>>>>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
> >>>>> -      /* If result of comparsion is unknown, prefer EARLY_BB.
> >>>>> -        Thus use !(...>=..) rather than (...<...)  */
> >>>>> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
> >>>>> -    return best_bb;
> >>>>> +     && high_reg_pressure_rgn)
> >>>>> +    {
> >>>>> +     /* Avoid sinking to immediate dominator if the statement to be moved
> >>>>> +       has memory operand and same loop nest.  */
> >>>>> +      if (best_bb != late_bb && gimple_vuse (stmt))
> >>>>> +       return late_bb;
> >>>>> +      return best_bb;
> >>>>> +    }
> >>>>>
> >>>>>    /* No better block found, so return EARLY_BB, which happens to be the
> >>>>>       statement's original block.  */
> >>>>> @@ -265,7 +279,8 @@ select_best_block (basic_block early_bb,
> >>>>>  static bool
> >>>>>  statement_sink_location (gimple *stmt, basic_block frombb,
> >>>>>                          gimple_stmt_iterator *togsi, bool *zero_uses_p,
> >>>>> -                        virtual_operand_live &vop_live)
> >>>>> +                        virtual_operand_live &vop_live,
> >>>>> +                        tree_live_info_p &live)
> >>>>>  {
> >>>>>    gimple *use;
> >>>>>    use_operand_p one_use = NULL_USE_OPERAND_P;
> >>>>> @@ -413,7 +428,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
> >>>>>        if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
> >>>>>         return false;
> >>>>>
> >>>>> -      commondom = select_best_block (frombb, commondom, stmt);
> >>>>> +      commondom = select_best_block (frombb, commondom, stmt, live);
> >>>>>
> >>>>>        if (commondom == frombb)
> >>>>>         return false;
> >>>>> @@ -430,19 +445,17 @@ statement_sink_location (gimple *stmt, basic_block frombb,
> >>>>>             continue;
> >>>>>           break;
> >>>>>         }
> >>>>> +
> >>>>>        use = USE_STMT (one_use);
> >>>>>
> >>>>>        if (gimple_code (use) != GIMPLE_PHI)
> >>>>>         {
> >>>>> -         sinkbb = select_best_block (frombb, gimple_bb (use), stmt);
> >>>>> +         sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live);
> >>>>>
> >>>>>           if (sinkbb == frombb)
> >>>>>             return false;
> >>>>>
> >>>>> -         if (sinkbb == gimple_bb (use))
> >>>>> -           *togsi = gsi_for_stmt (use);
> >>>>> -         else
> >>>>> -           *togsi = gsi_after_labels (sinkbb);
> >>>>> +         *togsi = gsi_after_labels (sinkbb);
> >>>>>
> >>>>>           return true;
> >>>>>         }
> >>>>> @@ -454,7 +467,7 @@ statement_sink_location (gimple *stmt, basic_block frombb,
> >>>>>    if (!sinkbb)
> >>>>>      return false;
> >>>>>
> >>>>> -  sinkbb = select_best_block (frombb, sinkbb, stmt);
> >>>>> +  sinkbb = select_best_block (frombb, sinkbb, stmt, live);
> >>>>>    if (!sinkbb || sinkbb == frombb)
> >>>>>      return false;
> >>>>>
> >>>>> @@ -643,7 +656,8 @@ sink_common_stores_to_bb (basic_block bb)
> >>>>>  /* Perform code sinking on BB */
> >>>>>
> >>>>>  static unsigned
> >>>>> -sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
> >>>>> +sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live,
> >>>>> +               tree_live_info_p &live)
> >>>>>  {
> >>>>>    gimple_stmt_iterator gsi;
> >>>>>    edge_iterator ei;
> >>>>> @@ -670,7 +684,8 @@ sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
> >>>>>        gimple_stmt_iterator togsi;
> >>>>>        bool zero_uses_p;
> >>>>>
> >>>>> -      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, vop_live))
> >>>>> +      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p,
> >>>>> +                                   vop_live, live))
> >>>>>         {
> >>>>>           gimple_stmt_iterator saved = gsi;
> >>>>>           if (!gsi_end_p (gsi))
> >>>>> @@ -814,6 +829,15 @@ private:
> >>>>>    bool unsplit_edges;
> >>>>>  }; // class pass_sink_code
> >>>>>
> >>>>> +void additional_var_map (var_map &map)
> >>>>> +{
> >>>>> +  map->bmp_bbs = BITMAP_ALLOC (NULL);
> >>>>> +  map->outofssa_p = false;
> >>>>> +  basic_block bb;
> >>>>> +  FOR_EACH_BB_FN (bb, cfun)
> >>>>> +    bitmap_set_bit (map->bmp_bbs, bb->index);
> >>>>> +}
> >>>>> +
> >>>>>  unsigned int
> >>>>>  pass_sink_code::execute (function *fun)
> >>>>>  {
> >>>>> @@ -827,12 +851,21 @@ pass_sink_code::execute (function *fun)
> >>>>>    calculate_dominance_info (CDI_DOMINATORS);
> >>>>>
> >>>>>    virtual_operand_live vop_live;
> >>>>> +  var_map map = init_var_map (num_ssa_names);
> >>>>> +  additional_var_map (map);
> >>>>> +  tree_live_info_p live = calculate_live_ranges (map, true);
> >>>>>
> >>>>>    int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> >>>>>    int n = inverted_rev_post_order_compute (fun, rpo);
> >>>>> +
> >>>>>    for (int i = 0; i < n; ++i)
> >>>>> -    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live);
> >>>>> +    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, live);
> >>>>>    free (rpo);
> >>>>> +  if (live)
> >>>>> +    delete_tree_live_info (live);
> >>>>> +
> >>>>> +  if (map)
> >>>>> +    delete_var_map (map);
> >>>>>
> >>>>>    statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
> >>>>>    statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);
> >>>>> --
> >>>>> 2.39.3
> >>>>>
> >>>>>
diff mbox series

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 00000000000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+    {
+      bar();
+      j = l;
+    }
+}
+/* { dg-final { scan-tree-dump {l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
new file mode 100644
index 00000000000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-22.c
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+    {
+      bar();
+      if (b != 3)
+        x = 3;
+      else
+        x = 5;
+      j = l;
+    }
+}
+/* { dg-final { scan-tree-dump {l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-live.cc b/gcc/tree-ssa-live.cc
index f06daf23035..998fe588278 100644
--- a/gcc/tree-ssa-live.cc
+++ b/gcc/tree-ssa-live.cc
@@ -1141,7 +1141,8 @@  set_var_live_on_entry (tree ssa_name, tree_live_info_p live)
     def_bb = ENTRY_BLOCK_PTR_FOR_FN (cfun);
 
   /* An undefined local variable does not need to be very alive.  */
-  if (ssa_undefined_value_p (ssa_name, false))
+  if (virtual_operand_p (ssa_name)
+      || ssa_undefined_value_p (ssa_name, false))
     return;
 
   /* Visit each use of SSA_NAME and if it isn't in the same block as the def,
@@ -1540,7 +1541,6 @@  debug (tree_live_info_d *ptr)
 
 
 /* Verify that the info in LIVE matches the current cfg.  */
-
 static void
 verify_live_on_entry (tree_live_info_p live)
 {
@@ -1569,11 +1569,13 @@  verify_live_on_entry (tree_live_info_p live)
 	  tree d = NULL_TREE;
 	  bitmap loe;
 	  var = partition_to_var (map, i);
+	  if (var == NULL_TREE)
+	    continue;
 	  stmt = SSA_NAME_DEF_STMT (var);
 	  tmp = gimple_bb (stmt);
+
 	  if (SSA_NAME_VAR (var))
 	    d = ssa_default_def (cfun, SSA_NAME_VAR (var));
-
 	  loe = live_on_entry (live, e->dest);
 	  if (loe && bitmap_bit_p (loe, i))
 	    {
@@ -1614,7 +1616,8 @@  verify_live_on_entry (tree_live_info_p live)
 	      {
 		/* An undefined local variable does not need to be very
 		   alive.  */
-		if (ssa_undefined_value_p (var, false))
+		if (virtual_operand_p (var)
+		    || ssa_undefined_value_p (var, false))
 		  continue;
 
 		/* The only way this var shouldn't be marked live on entry is
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index a360c5cdd6e..d0c9ef1ab86 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -176,6 +176,9 @@  nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
    tree, return the best basic block between them (inclusive) to place
    statements.
 
+   The best basic block should be an immediate dominator of
+   best basic block if we've moved to same loop nest.
+
    We want the most control dependent block in the shallowest loop nest.
 
    If the resulting block is in a shallower loop nest, then use it.  Else
@@ -191,24 +194,23 @@  nearest_common_dominator_of_uses (def_operand_p def_p, bool *debug_stmts)
 static basic_block
 select_best_block (basic_block early_bb,
 		   basic_block late_bb,
-		   gimple *stmt)
+		   gimple *stmt,
+		   tree_live_info_p &live)
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
-  int threshold;
 
   while (temp_bb != early_bb)
     {
       /* If we've moved into a lower loop nest, then that becomes
 	 our best block.  */
-      if (bb_loop_depth (temp_bb) < bb_loop_depth (best_bb))
+      if (bb_loop_depth (temp_bb) <= bb_loop_depth (best_bb))
 	best_bb = temp_bb;
 
       /* Walk up the dominator tree, hopefully we'll find a shallower
  	 loop nest.  */
       temp_bb = get_immediate_dominator (CDI_DOMINATORS, temp_bb);
     }
-
   /* Placing a statement before a setjmp-like function would be invalid
      (it cannot be reevaluated when execution follows an abnormal edge).
      If we selected a block with abnormal predecessors, just punt.  */
@@ -233,24 +235,36 @@  select_best_block (basic_block early_bb,
       && !dominated_by_p (CDI_DOMINATORS, best_bb->loop_father->latch, best_bb))
     return early_bb;
 
-  /* Get the sinking threshold.  If the statement to be moved has memory
-     operands, then increase the threshold by 7% as those are even more
-     profitable to avoid, clamping at 100%.  */
-  threshold = param_sink_frequency_threshold;
-  if (gimple_vuse (stmt) || gimple_vdef (stmt))
-    {
-      threshold += 7;
-      if (threshold > 100)
-	threshold = 100;
-    }
+  int best_bb_liveout_cnt
+    = bitmap_count_bits (&live->liveout[best_bb->index]);
+  int early_bb_liveout_cnt
+     = bitmap_count_bits (&live->liveout[early_bb->index]);
+  int early_livein_cnt
+     = bitmap_count_bits (&live->livein[early_bb->index]);
+
+  /* High register pressure region is the region where there are live-in of
+     early blocks that has been modified by the early block. If there are
+     modification of the variables in best block that are live-in in early
+     block that are live-out of best block.  */
+  bool live_in_rgn = (early_livein_cnt != 0
+		      && early_bb_liveout_cnt <= early_livein_cnt);
+
+  bool high_reg_pressure_rgn
+    = ((best_bb_liveout_cnt != 0 || live_in_rgn)
+	&& best_bb_liveout_cnt <= early_bb_liveout_cnt);
 
   /* If BEST_BB is at the same nesting level, then require it to have
-     significantly lower execution frequency to avoid gratuitous movement.  */
+     significantly high register pressure region to avoid gratuitous
+      movement.  */
   if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
-      /* If result of comparsion is unknown, prefer EARLY_BB.
-	 Thus use !(...>=..) rather than (...<...)  */
-      && !(best_bb->count * 100 >= early_bb->count * threshold))
-    return best_bb;
+     && high_reg_pressure_rgn)
+    {
+     /* Avoid sinking to immediate dominator if the statement to be moved
+	has memory operand and same loop nest.  */
+      if (best_bb != late_bb && gimple_vuse (stmt))
+	return late_bb;
+      return best_bb;
+    }
 
   /* No better block found, so return EARLY_BB, which happens to be the
      statement's original block.  */
@@ -265,7 +279,8 @@  select_best_block (basic_block early_bb,
 static bool
 statement_sink_location (gimple *stmt, basic_block frombb,
 			 gimple_stmt_iterator *togsi, bool *zero_uses_p,
-			 virtual_operand_live &vop_live)
+			 virtual_operand_live &vop_live,
+			 tree_live_info_p &live)
 {
   gimple *use;
   use_operand_p one_use = NULL_USE_OPERAND_P;
@@ -413,7 +428,7 @@  statement_sink_location (gimple *stmt, basic_block frombb,
       if (!dominated_by_p (CDI_DOMINATORS, commondom, frombb))
 	return false;
 
-      commondom = select_best_block (frombb, commondom, stmt);
+      commondom = select_best_block (frombb, commondom, stmt, live);
 
       if (commondom == frombb)
 	return false;	
@@ -430,19 +445,17 @@  statement_sink_location (gimple *stmt, basic_block frombb,
 	    continue;
 	  break;
 	}
+
       use = USE_STMT (one_use);
 
       if (gimple_code (use) != GIMPLE_PHI)
 	{
-	  sinkbb = select_best_block (frombb, gimple_bb (use), stmt);
+	  sinkbb = select_best_block (frombb, gimple_bb (use), stmt, live);
 
 	  if (sinkbb == frombb)
 	    return false;
 
-	  if (sinkbb == gimple_bb (use))
-	    *togsi = gsi_for_stmt (use);
-	  else
-	    *togsi = gsi_after_labels (sinkbb);
+	  *togsi = gsi_after_labels (sinkbb);
 
 	  return true;
 	}
@@ -454,7 +467,7 @@  statement_sink_location (gimple *stmt, basic_block frombb,
   if (!sinkbb)
     return false;
   
-  sinkbb = select_best_block (frombb, sinkbb, stmt);
+  sinkbb = select_best_block (frombb, sinkbb, stmt, live);
   if (!sinkbb || sinkbb == frombb)
     return false;
 
@@ -643,7 +656,8 @@  sink_common_stores_to_bb (basic_block bb)
 /* Perform code sinking on BB */
 
 static unsigned
-sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
+sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live,
+		tree_live_info_p &live)
 {
   gimple_stmt_iterator gsi;
   edge_iterator ei;
@@ -670,7 +684,8 @@  sink_code_in_bb (basic_block bb, virtual_operand_live &vop_live)
       gimple_stmt_iterator togsi;
       bool zero_uses_p;
 
-      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p, vop_live))
+      if (!statement_sink_location (stmt, bb, &togsi, &zero_uses_p,
+				    vop_live, live))
 	{
 	  gimple_stmt_iterator saved = gsi;
 	  if (!gsi_end_p (gsi))
@@ -814,6 +829,15 @@  private:
   bool unsplit_edges;
 }; // class pass_sink_code
 
+void additional_var_map (var_map &map)
+{
+  map->bmp_bbs = BITMAP_ALLOC (NULL);
+  map->outofssa_p = false;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    bitmap_set_bit (map->bmp_bbs, bb->index);
+}
+
 unsigned int
 pass_sink_code::execute (function *fun)
 {
@@ -827,12 +851,21 @@  pass_sink_code::execute (function *fun)
   calculate_dominance_info (CDI_DOMINATORS);
 
   virtual_operand_live vop_live;
+  var_map map = init_var_map (num_ssa_names);
+  additional_var_map (map);
+  tree_live_info_p live = calculate_live_ranges (map, true);
 
   int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
   int n = inverted_rev_post_order_compute (fun, rpo);
+
   for (int i = 0; i < n; ++i)
-    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live);
+    todo |= sink_code_in_bb (BASIC_BLOCK_FOR_FN (fun, rpo[i]), vop_live, live);
   free (rpo);
+  if (live)
+    delete_tree_live_info (live);
+
+  if (map)
+    delete_var_map (map);
 
   statistics_counter_event (fun, "Sunk statements", sink_stats.sunk);
   statistics_counter_event (fun, "Commoned stores", sink_stats.commoned);