diff mbox series

[PATCH-1,rs6000] Put constant into pseudo at expand when it needs two insns [PR86106]

Message ID 22e83da3-a81f-dd61-c04b-a39b459a965f@linux.ibm.com
State New
Headers show
Series [PATCH-1,rs6000] Put constant into pseudo at expand when it needs two insns [PR86106] | expand

Commit Message

HAO CHEN GUI March 16, 2023, 5:34 a.m. UTC
Hi,
  Currently, rs6000 directly expands to 2 insns if an integer constant is the
second operand and it needs two insns. For example, addi/addis and ori/oris.
It may not benefit when the constant is used for more than 2 times in an
extended basic block, just like the case in PR shows.

  One possible solution is to force the constant in pseudo at expand and let
propagation pass and combine pass decide if the pseudo should be replaced
with the constant or not by comparing the rtx/insn cost.

  It generates a constant move if the constant is forced to a pseudo. There
is one constant move if it's used only once. The combine pass can combine
the constant move and add/ior/xor insn and eliminate the move as the insn
cost reduces. There are multiple moves if the constant is used for several
times. In an extended basic block, these constant moves are merged to one by
propagation pass. The combine pass can't replace the pseudo with the constant
as it is no cost saving.

  In an extreme case, the constant is used twice in an extended basic block.
The cost(latency) is unchanged between putting constant in pseudo and
generating 2 insns. The dependence of instructions reduces but one more
register is used. In other case, it should be always optimal to put constant
in a pseudo.

  This patch changes the expander of integer add and force constant to a
pseudo when it needs 2 insn. Also a combine and split pattern is defined.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
2023-03-14  Haochen Gui <guihaoc@linux.ibm.com>

gcc/
	* config/rs6000/predicates.md (add_2insn_cint_operand): New predicate
	which returns true when op is a 32-bit but not a 16-bit signed
	integer constant.
	* config/rs6000/rs6000.md (add<mode>3): Put the second operand into
	register when it's a constant and need 2 add insns.
	(*add<mode>_2insn): New insn_and_split for 2-insn add.


patch.diff

Comments

Richard Biener March 16, 2023, 7:57 a.m. UTC | #1
On Thu, Mar 16, 2023 at 6:35 AM HAO CHEN GUI via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
>   Currently, rs6000 directly expands to 2 insns if an integer constant is the
> second operand and it needs two insns. For example, addi/addis and ori/oris.
> It may not benefit when the constant is used for more than 2 times in an
> extended basic block, just like the case in PR shows.
>
>   One possible solution is to force the constant in pseudo at expand and let
> propagation pass and combine pass decide if the pseudo should be replaced
> with the constant or not by comparing the rtx/insn cost.
>
>   It generates a constant move if the constant is forced to a pseudo. There
> is one constant move if it's used only once. The combine pass can combine
> the constant move and add/ior/xor insn and eliminate the move as the insn
> cost reduces. There are multiple moves if the constant is used for several
> times. In an extended basic block, these constant moves are merged to one by
> propagation pass. The combine pass can't replace the pseudo with the constant
> as it is no cost saving.
>
>   In an extreme case, the constant is used twice in an extended basic block.
> The cost(latency) is unchanged between putting constant in pseudo and
> generating 2 insns. The dependence of instructions reduces but one more
> register is used. In other case, it should be always optimal to put constant
> in a pseudo.
>
>   This patch changes the expander of integer add and force constant to a
> pseudo when it needs 2 insn. Also a combine and split pattern is defined.

So this is one way around the lack of CSE/PRE of constant operands.  I'd
argue that a better spot for this _might_ be LRA (split the constant out if
there's a free register available), postreload-[g]cse (CSE the constants) and
then maybe cprop_hardreg to combine back single-use constants?

I'm not sure if careful constraints massaging like adding magic letters to
alternatives with constants to pessimize them for LRA, making them
more expensive than spilling the constant to a register but avoid
secondary reloads with spilling a register to the stack to make room
for the constant, is possible - but in theory a special constraint modifier
for this purpose could be invented.

Richard.

>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
>
> Thanks
> Gui Haochen
>
> ChangeLog
> 2023-03-14  Haochen Gui <guihaoc@linux.ibm.com>
>
> gcc/
>         * config/rs6000/predicates.md (add_2insn_cint_operand): New predicate
>         which returns true when op is a 32-bit but not a 16-bit signed
>         integer constant.
>         * config/rs6000/rs6000.md (add<mode>3): Put the second operand into
>         register when it's a constant and need 2 add insns.
>         (*add<mode>_2insn): New insn_and_split for 2-insn add.
>
>
> patch.diff
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index a1764018545..09e59a48cd3 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -282,6 +282,13 @@ (define_predicate "s32bit_cint_operand"
>    (and (match_code "const_int")
>         (match_test "(0x80000000 + UINTVAL (op)) >> 32 == 0")))
>
> +;; Return 1 if op is a 32-bit but not 16-bit constant signed integer
> +(define_predicate "add_2insn_cint_operand"
> +  (and (match_code "const_int")
> +       (and (match_operand 0 "s32bit_cint_operand")
> +           (and (not (match_operand 0 "short_cint_operand"))
> +                (not (match_operand 0 "upper16_cint_operand"))))))
> +
>  ;; Return 1 if op is a constant 32-bit unsigned
>  (define_predicate "c32bit_cint_operand"
>    (and (match_code "const_int")
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 6011f5bf76a..dba41e3df90 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -1796,12 +1796,44 @@ (define_expand "add<mode>3"
>        /* The ordering here is important for the prolog expander.
>          When space is allocated from the stack, adding 'low' first may
>          produce a temporary deallocation (which would be bad).  */
> -      emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest)));
> -      emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low)));
> -      DONE;
> +      if (!can_create_pseudo_p ())
> +       {
> +         emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest)));
> +         emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low)));
> +         DONE;
> +       }
> +
> +      operands[2] = force_reg (<MODE>mode, operands[2]);
>      }
>  })
>
> +/* The ordering here is important for the prolog expander.
> +   When space is allocated from the stack, adding 'low' first may
> +   produce a temporary deallocation (which would be bad).  */
> +
> +(define_insn_and_split "*add<mode>_2insn"
> +  [(set (match_operand:GPR 0 "gpc_reg_operand" "=b")
> +       (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%b")
> +                 (match_operand:GPR 2 "add_2insn_cint_operand" "n")))]
> +  "!TARGET_PREFIXED"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +       (plus:GPR (match_dup 1)
> +                 (match_dup 3)))
> +   (set (match_dup 0)
> +       (plus:GPR (match_dup 0)
> +                 (match_dup 4)))]
> +{
> +  HOST_WIDE_INT val = INTVAL (operands[2]);
> +  HOST_WIDE_INT low = sext_hwi (val, 16);
> +  HOST_WIDE_INT rest = trunc_int_for_mode (val - low, <MODE>mode);
> +
> +  operands[3] = GEN_INT (rest);
> +  operands[4] = GEN_INT (low);
> +}
> +  [(set_attr "length" "8")])
> +
>  (define_insn "*add<mode>3"
>    [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r")
>         (plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b")
HAO CHEN GUI March 16, 2023, 9:04 a.m. UTC | #2
Hi Richard,

在 2023/3/16 15:57, Richard Biener 写道:
> So this is one way around the lack of CSE/PRE of constant operands.  I'd
> argue that a better spot for this _might_ be LRA (split the constant out if
> there's a free register available), postreload-[g]cse (CSE the constants) and
> then maybe cprop_hardreg to combine back single-use constants?
> 
> I'm not sure if careful constraints massaging like adding magic letters to
> alternatives with constants to pessimize them for LRA, making them
> more expensive than spilling the constant to a register but avoid
> secondary reloads with spilling a register to the stack to make room
> for the constant, is possible - but in theory a special constraint modifier
> for this purpose could be invented.

Thanks so much for your advice.

cse/gcse doesn't take cost of constant set (the def insn of the constant) into
consideration. So it won't replace the register with a constant as it costs 1
insn with the register and costs 2 insn with the constant. Finally, the single-
use constants can't be back to 2 insn. Not sure if I understand it correctly.
Looking forward to your advice.

Thanks
Gui Haochen
Richard Biener March 16, 2023, 10:36 a.m. UTC | #3
On Thu, Mar 16, 2023 at 10:04 AM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
>
> Hi Richard,
>
> 在 2023/3/16 15:57, Richard Biener 写道:
> > So this is one way around the lack of CSE/PRE of constant operands.  I'd
> > argue that a better spot for this _might_ be LRA (split the constant out if
> > there's a free register available), postreload-[g]cse (CSE the constants) and
> > then maybe cprop_hardreg to combine back single-use constants?
> >
> > I'm not sure if careful constraints massaging like adding magic letters to
> > alternatives with constants to pessimize them for LRA, making them
> > more expensive than spilling the constant to a register but avoid
> > secondary reloads with spilling a register to the stack to make room
> > for the constant, is possible - but in theory a special constraint modifier
> > for this purpose could be invented.
>
> Thanks so much for your advice.
>
> cse/gcse doesn't take cost of constant set (the def insn of the constant) into
> consideration. So it won't replace the register with a constant as it costs 1
> insn with the register and costs 2 insn with the constant.

I think it does (and should) cost the constant set (IIRC we had some
improvements
there, or at least proposed, during this stage1).  But sure - this is why your
"trick" works.

> Finally, the single-
> use constants can't be back to 2 insn.

And that's because of the issue you point out above?

> Not sure if I understand it correctly.
> Looking forward to your advice.

My main point is that CSEing constants has impacts on register pressure
and thus should probably be done after or within register allocation.  RTL
expansion itself is probably a bad time to pro-actively split out constants
even more if, as you say, nothing puts them back.

Richard.

> Thanks
> Gui Haochen
HAO CHEN GUI March 17, 2023, 2:48 a.m. UTC | #4
Hi Richard,

在 2023/3/16 18:36, Richard Biener 写道:
> On Thu, Mar 16, 2023 at 10:04 AM HAO CHEN GUI <guihaoc@linux.ibm.com> wrote:
>>
>> Hi Richard,
>>
>> 在 2023/3/16 15:57, Richard Biener 写道:
>>> So this is one way around the lack of CSE/PRE of constant operands.  I'd
>>> argue that a better spot for this _might_ be LRA (split the constant out if
>>> there's a free register available), postreload-[g]cse (CSE the constants) and
>>> then maybe cprop_hardreg to combine back single-use constants?
>>>
>>> I'm not sure if careful constraints massaging like adding magic letters to
>>> alternatives with constants to pessimize them for LRA, making them
>>> more expensive than spilling the constant to a register but avoid
>>> secondary reloads with spilling a register to the stack to make room
>>> for the constant, is possible - but in theory a special constraint modifier
>>> for this purpose could be invented.
>>
>> Thanks so much for your advice.
>>
>> cse/gcse doesn't take cost of constant set (the def insn of the constant) into
>> consideration. So it won't replace the register with a constant as it costs 1
>> insn with the register and costs 2 insn with the constant.
> 
> I think it does (and should) cost the constant set (IIRC we had some
> improvements
> there, or at least proposed, during this stage1).  But sure - this is why your
> "trick" works.
> 
It's doable if post-reload gsc costs the constant set. I will draft a patch to
test it.

>> Finally, the single-
>> use constants can't be back to 2 insn.
> 
> And that's because of the issue you point out above?
No. my original concern is the constant can't be back. If post-reload gsc doen't
cost the constant set, the insn with a register always cost less than two insns
with immediates. Commonly the constant set itself costs 2 insn also.
> 
>> Not sure if I understand it correctly.
>> Looking forward to your advice.
> 
> My main point is that CSEing constants has impacts on register pressure
> and thus should probably be done after or within register allocation.  RTL
> expansion itself is probably a bad time to pro-actively split out constants
> even more if, as you say, nothing puts them back.
> 
I agree. Thank a lot.
> Richard.
> 
>> Thanks
>> Gui Haochen

Gui Haochen
HAO CHEN GUI March 22, 2023, 3:08 a.m. UTC | #5
Hi Richard,

在 2023/3/16 15:57, Richard Biener 写道:
> I'm not sure if careful constraints massaging like adding magic letters to
> alternatives with constants to pessimize them for LRA, making them
> more expensive than spilling the constant to a register but avoid
> secondary reloads with spilling a register to the stack to make room
> for the constant, is possible - but in theory a special constraint modifier
> for this purpose could be invented.

I have made some tests on constraint modifiers. They all seems not work.
By checking the code, I found that the no reloading is always better than
reloading in LRA. So there is no way to spill the constant to register in
LRA.

      /* If this alternative can be made to work by reloading, and it
         needs less reloading than the others checked so far, record
         it as the chosen goal for reloading.  */
      if ((best_losers != 0 && losers == 0)
          || (((best_losers == 0 && losers == 0)
               || (best_losers != 0 && losers != 0))
              && (best_overall > overall
                  || (best_overall == overall
	 ... // set goal_alt

Looking forward to your advice.

Thanks
Gui Haochen
diff mbox series

Patch

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index a1764018545..09e59a48cd3 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -282,6 +282,13 @@  (define_predicate "s32bit_cint_operand"
   (and (match_code "const_int")
        (match_test "(0x80000000 + UINTVAL (op)) >> 32 == 0")))

+;; Return 1 if op is a 32-bit but not 16-bit constant signed integer
+(define_predicate "add_2insn_cint_operand"
+  (and (match_code "const_int")
+       (and (match_operand 0 "s32bit_cint_operand")
+	    (and (not (match_operand 0 "short_cint_operand"))
+		 (not (match_operand 0 "upper16_cint_operand"))))))
+
 ;; Return 1 if op is a constant 32-bit unsigned
 (define_predicate "c32bit_cint_operand"
   (and (match_code "const_int")
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6011f5bf76a..dba41e3df90 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -1796,12 +1796,44 @@  (define_expand "add<mode>3"
       /* The ordering here is important for the prolog expander.
 	 When space is allocated from the stack, adding 'low' first may
 	 produce a temporary deallocation (which would be bad).  */
-      emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest)));
-      emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low)));
-      DONE;
+      if (!can_create_pseudo_p ())
+	{
+	  emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (rest)));
+	  emit_insn (gen_add<mode>3 (operands[0], tmp, GEN_INT (low)));
+	  DONE;
+	}
+
+      operands[2] = force_reg (<MODE>mode, operands[2]);
     }
 })

+/* The ordering here is important for the prolog expander.
+   When space is allocated from the stack, adding 'low' first may
+   produce a temporary deallocation (which would be bad).  */
+
+(define_insn_and_split "*add<mode>_2insn"
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=b")
+	(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%b")
+		  (match_operand:GPR 2 "add_2insn_cint_operand" "n")))]
+  "!TARGET_PREFIXED"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(plus:GPR (match_dup 1)
+		  (match_dup 3)))
+   (set (match_dup 0)
+	(plus:GPR (match_dup 0)
+		  (match_dup 4)))]
+{
+  HOST_WIDE_INT val = INTVAL (operands[2]);
+  HOST_WIDE_INT low = sext_hwi (val, 16);
+  HOST_WIDE_INT rest = trunc_int_for_mode (val - low, <MODE>mode);
+
+  operands[3] = GEN_INT (rest);
+  operands[4] = GEN_INT (low);
+}
+  [(set_attr "length" "8")])
+
 (define_insn "*add<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r,r,r,r")
 	(plus:GPR (match_operand:GPR 1 "gpc_reg_operand" "%r,b,b,b")