diff mbox series

[V4,1/2] split complicate 64bit constant to memory

Message ID 20240611083727.2642461-1-guojiufu@linux.ibm.com
State New
Headers show
Series [V4,1/2] split complicate 64bit constant to memory | expand

Commit Message

Jiufu Guo June 11, 2024, 8:37 a.m. UTC
Hi,

Sometimes, a complicated constant is built via 3(or more)
instructions.  Generally speaking, it would not be as fast
as loading it from the constant pool (as the discussions in
PR63281):
"ld" is one instruction.  If consider "address/toc" adjust,
we may count it as 2 instructions. And "pld" may need fewer
cycles.

As testing(SPEC2017), it could get better/stable runtime
if set the threshold as "> 2" (compare with "> 3").

As known, because the constant is load from memory by this
patch,  so this functionality may affect the cache missing.
Also there may be possible side effects on icach. While,
IMHO, this patch would be still do the right thing.

Compare with the previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639493.html
This version adds safe guard for 64bit, and adds one more test case.
And a following patch is drafted to support this functionality for
32bit(-m32) with -mpowerpc64.

Boostrap & regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff (Jiufu Guo)

	PR target/63281

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split
	complicate constant to memory.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/const_anchors.c: Update to test final-rtl.
	* gcc.target/powerpc/parall_5insn_const.c: Update to keep original test
	point.
	* gcc.target/powerpc/pr106550.c: Likewise..
	* gcc.target/powerpc/pr106550_1.c: Likewise.
	* gcc.target/powerpc/pr87870.c: Update according to latest behavior.
	* gcc.target/powerpc/pr93012.c: Likewise.
	* gcc.target/powerpc/pr63281.c: New test.
---
 gcc/config/rs6000/rs6000.cc                    | 15 +++++++++++++++
 .../gcc.target/powerpc/const_anchors.c         |  5 +++--
 .../gcc.target/powerpc/parall_5insn_const.c    | 11 +++++++++--
 gcc/testsuite/gcc.target/powerpc/pr106550.c    | 12 +++++++++---
 gcc/testsuite/gcc.target/powerpc/pr106550_1.c  | 18 ++++++++++--------
 gcc/testsuite/gcc.target/powerpc/pr63281.c     | 11 +++++++++++
 gcc/testsuite/gcc.target/powerpc/pr87870.c     |  5 ++++-
 gcc/testsuite/gcc.target/powerpc/pr93012.c     |  6 +++++-
 8 files changed, 66 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c

Comments

Segher Boessenkool June 11, 2024, 4:59 p.m. UTC | #1
Hi!

On Tue, Jun 11, 2024 at 04:37:25PM +0800, Jiufu Guo wrote:
> Sometimes, a complicated constant is built via 3(or more)
> instructions.  Generally speaking, it would not be as fast
> as loading it from the constant pool (as the discussions in
> PR63281):
> "ld" is one instruction.  If consider "address/toc" adjust,
> we may count it as 2 instructions. And "pld" may need fewer
> cycles.

Yeah, three insns to build up a constant always was about as fast/slow
as loading a constant from memory.  When you need two related constants
loading from memory is slower, so we preferred building them up.

> As testing(SPEC2017), it could get better/stable runtime
> if set the threshold as "> 2" (compare with "> 3").

Interesting!  About how much speedup did you see?  0.1%?

> As known, because the constant is load from memory by this
> patch,  so this functionality may affect the cache missing.
> Also there may be possible side effects on icach. While,
> IMHO, this patch would be still do the right thing.

Yeah, but almost every codegen patch has *some* icache effect.  Your
benchmarks show a net positive effect, that is all that matters in the
end.

> gcc/ChangeLog:
> 
> 	* config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split
> 	complicate constant to memory.

(complicated)

And don't write "update", every patch is an update :-)

There are more conditions here, mention those as well?

But, why does base_reg_operand matter?  And, why for TARGET_ELF only?
That is the only place you tested I'm sure, but  make an educated guess
for the rest, why would it not be useful for other binary formats?

> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index e4dc629ddcc..f448df289a0 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10240,6 +10240,21 @@ rs6000_emit_set_const (rtx dest, rtx source)
>  	  c = sext_hwi (c, 32);
>  	  emit_move_insn (lo, GEN_INT (c));
>  	}
> +
> +      else if (base_reg_operand (dest, mode) && TARGET_64BIT
> +	       && TARGET_ELF && num_insns_constant (source, mode) > 2)
> +	{
> +	  rtx sym = force_const_mem (mode, source);
> +	  if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
> +	      && use_toc_relative_ref (XEXP (sym, 0), mode))
> +	    {
> +	      rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest));
> +	      sym = gen_const_mem (mode, toc);
> +	      set_mem_alias_set (sym, get_TOC_alias_set ());
> +	    }
> +
> +	  emit_move_insn (dest, sym);
> +	}
>        else
>  	rs6000_emit_set_long_const (dest, c);
>        break;

Is there no utility function to put constants like this in memory?
Like, "output_toc" for example?

> diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> index 542e2674b12..f33c9a83f5e 100644
> --- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target has_arch_ppc64 } } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -fdump-rtl-final" } */
>  
>  #define C1 0x2351847027482577ULL
>  #define C2 0x2351847027482578ULL
> @@ -17,4 +17,5 @@ void __attribute__ ((noinline)) foo1 (long long *a, long long b)
>      *a++ = C2;
>  }
>  
> -/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
> +/* Checking "final" instead checking "asm" output to avoid noise.  */
> +/* { dg-final { scan-rtl-dump-times {\madddi3\M} 2 "final" } } */

So, in the RTL dump it finds a named pattern adddi3, while in the asm
you find random other addi insns?  Hardly worth a comment in the
testcase itself, but heh.  The point is that you check for the RTL
pattern name instead of the machine insn.  The compiler pass you check
in isn't changed even!

> --- a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
> @@ -6,11 +6,18 @@
>  /* { dg-final { scan-assembler-times {\mori\M} 4 } } */
>  /* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
>  
> +/* The below macro helps to avoid loading constant from memory.  */
> +#define CONST_AVOID_BASE_REG(DEST, CST)                                        \
> +  {                                                                            \
> +    register long long d asm ("r0") = CST;                                     \
> +    asm volatile ("std %1, %0" : : "m"(DEST), "r"(d));                         \
> +  }

This needs an actual changelog, not just "update".  Something as simple
as "New macro to force constant into a reg" is fine, but more than
"update" :-)

> +/* The below macro helps to avoid loading constant from memory.  */
> +#define CONST_AVOID_BASE_REG(DEST, CST)                                        \
> +  {                                                                            \
> +    register long long d asm ("r0") = CST;                                     \
> +    asm volatile ("std %1, %0" : : "m"(DEST), "r"(d));                         \
> +  }

"Forces the constant into a register", yes.

> --- a/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
> @@ -4,17 +4,19 @@
>  /* { dg-options "-O2 -mdejagnu-cpu=power10 -fdisable-rtl-split1" } */
>  /* force the constant splitter run after RA: -fdisable-rtl-split1.  */
>  
> +/* The below marco helps to avoid using paddi and avoid loading from memory. */

(macro)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c
> @@ -0,0 +1,11 @@
> +/* Check loading constant from memory pool.  */
> +/* { dg-options "-O2 -mpowerpc64" } */
> +
> +void
> +foo (unsigned long long *a)
> +{
> +  *a++ = 0x2351847027482577ULL;
> +}
> +
> +/* { dg-final { scan-assembler-times {\mp?ld\M} 1 { target { lp64 } } } } */

Why target lp64 only?  You have -mpowerpc64 already, does that not make
us use ld here always?


Segher
Jiufu Guo June 12, 2024, 3:14 a.m. UTC | #2
Hi Segher,

Thanks a lot for your review!

Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi!
>
> On Tue, Jun 11, 2024 at 04:37:25PM +0800, Jiufu Guo wrote:
>> Sometimes, a complicated constant is built via 3(or more)
>> instructions.  Generally speaking, it would not be as fast
>> as loading it from the constant pool (as the discussions in
>> PR63281):
>> "ld" is one instruction.  If consider "address/toc" adjust,
>> we may count it as 2 instructions. And "pld" may need fewer
>> cycles.
>
> Yeah, three insns to build up a constant always was about as fast/slow
Yeah.
> as loading a constant from memory.  When you need two related constants
> loading from memory is slower, so we preferred building them up.
You may talking about the functionality "const_anchor" which
is enabled in cse1 for two related constants.
Like the test case: const_anchors.c, the first one would be
loaded from memory if it is complicated, and the second would
be computed based on the first one. 

>
>> As testing(SPEC2017), it could get better/stable runtime
>> if set the threshold as "> 2" (compare with "> 3").
>
> Interesting!  About how much speedup did you see?  0.1%?
I did not see the proven speedup from SPEC2017.
On some benchs(including perlbench), I saw some runs with
speedup >1%.  But with deeper checking, it would not directly
benefit of this functionality, since it was not hit by the hot
functions; and the changed constants do not contribute the
runtime visibly.

Saying "better/stable", because with different bases(interval
weeks/months), ">2" introduce less waving; and with different
options (-O2, -O3, -Ofast), ">2" seems more stable.

>
>> As known, because the constant is load from memory by this
>> patch,  so this functionality may affect the cache missing.
>> Also there may be possible side effects on icach. While,
>> IMHO, this patch would be still do the right thing.
>
> Yeah, but almost every codegen patch has *some* icache effect.  Your
> benchmarks show a net positive effect, that is all that matters in the
> end.
Yes. :)
As using the small bench (like the cases mentioned in comment20
of PR63281), the result shows the prefer choise would be
buiding by "2 insns" > "loading from rodata/toc" > "3 insns",
and building by "4/5insn" would be the last choise.

I mentioned this here, because in one recent run, for one benchmark,
I saw 'caching missing' raising, and 'cache missing' was waving
with size of the code (not directly related to this patch.)

>
>> gcc/ChangeLog:
>> 
>> 	* config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split
>> 	complicate constant to memory.
>
> (complicated)
>
> And don't write "update", every patch is an update :-)
>
> There are more conditions here, mention those as well?

Thanks! This patch only split complicated constants(>2insns)
to memory for -m64 under TARGET_ELF.
"Split complicate constant to memory for -m64."

>
> But, why does base_reg_operand matter?  And, why for TARGET_ELF only?
> That is the only place you tested I'm sure, but  make an educated guess
> for the rest, why would it not be useful for other binary formats?
Thanks for these very insightful questions!

"base_reg_operand" is used to avoid to instruction like "r0=[OFF(r0)]"
for the case like "r0=0xXXXXX" (I would have a recheck to see if
this is really needed before RA.)

Yes, I only tested with ELF. I add ELF checking, just because not
sure how profitable it is on other targets.
In theory, this functionality would be useful for other targets,
at least for the 4/5 insns (witout the aspect of parallel
execution in the binary).

There is another heuristic code in rs6000_emit_move, it is using:
```(TARGET_CMODEL != CMODEL_SMALL ? 3 : 2)```
This does not check ELF. Levearging this code, it would be ok to
accept all targets.

This heuristic exist a long time ago, I guess it would be before
ppc64le (even more history), it may more for -m32. For -m64, as
I tested, ">2" would be suitable.

>
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index e4dc629ddcc..f448df289a0 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -10240,6 +10240,21 @@ rs6000_emit_set_const (rtx dest, rtx source)
>>  	  c = sext_hwi (c, 32);
>>  	  emit_move_insn (lo, GEN_INT (c));
>>  	}
>> +
>> +      else if (base_reg_operand (dest, mode) && TARGET_64BIT
>> +	       && TARGET_ELF && num_insns_constant (source, mode) > 2)
>> +	{
>> +	  rtx sym = force_const_mem (mode, source);
>> +	  if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
>> +	      && use_toc_relative_ref (XEXP (sym, 0), mode))
>> +	    {
>> +	      rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest));
>> +	      sym = gen_const_mem (mode, toc);
>> +	      set_mem_alias_set (sym, get_TOC_alias_set ());
>> +	    }
>> +
>> +	  emit_move_insn (dest, sym);
>> +	}
>>        else
>>  	rs6000_emit_set_long_const (dest, c);
>>        break;
>
> Is there no utility function to put constants like this in memory?
> Like, "output_toc" for example?
I guess you mean "force_const_mem" which puts the constant value to
constant pool and create "LC" label for it.

With "emit_move_insn (dest, force_const_mem (mode, source));" could
implement the part of this functionality.   While adding code
"if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))..." would save one
more instructions.  So, the code is drafted like this patch.

>
>> diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
>> index 542e2674b12..f33c9a83f5e 100644
>> --- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
>> @@ -1,5 +1,5 @@
>>  /* { dg-do compile { target has_arch_ppc64 } } */
>> -/* { dg-options "-O2" } */
>> +/* { dg-options "-O2 -fdump-rtl-final" } */
>>  
>>  #define C1 0x2351847027482577ULL
>>  #define C2 0x2351847027482578ULL
>> @@ -17,4 +17,5 @@ void __attribute__ ((noinline)) foo1 (long long *a, long long b)
>>      *a++ = C2;
>>  }
>>  
>> -/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
>> +/* Checking "final" instead checking "asm" output to avoid noise.  */
>> +/* { dg-final { scan-rtl-dump-times {\madddi3\M} 2 "final" } } */
>
> So, in the RTL dump it finds a named pattern adddi3, while in the asm
> you find random other addi insns?  Hardly worth a comment in the
Yes, other "addi"s for memory address occur in asm.
Thanks for your kind review.  I would remove this no-go comment.

> testcase itself, but heh.  The point is that you check for the RTL
> pattern name instead of the machine insn.  The compiler pass you check
> in isn't changed even!
>
>> --- a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
>> @@ -6,11 +6,18 @@
>>  /* { dg-final { scan-assembler-times {\mori\M} 4 } } */
>>  /* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
>>  
>> +/* The below macro helps to avoid loading constant from memory.  */
>> +#define CONST_AVOID_BASE_REG(DEST, CST)                                        \
>> +  {                                                                            \
>> +    register long long d asm ("r0") = CST;                                     \
>> +    asm volatile ("std %1, %0" : : "m"(DEST), "r"(d));                         \
>> +  }
>
> This needs an actual changelog, not just "update".  Something as simple
> as "New macro to force constant into a reg" is fine, but more than
> "update" :-)
Thanks a lot for point this out!
>
>> +/* The below macro helps to avoid loading constant from memory.  */
>> +#define CONST_AVOID_BASE_REG(DEST, CST)                                        \
>> +  {                                                                            \
>> +    register long long d asm ("r0") = CST;                                     \
>> +    asm volatile ("std %1, %0" : : "m"(DEST), "r"(d));                         \
>> +  }
>
> "Forces the constant into a register", yes.
OK, thanks.
>
>> --- a/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
>> @@ -4,17 +4,19 @@
>>  /* { dg-options "-O2 -mdejagnu-cpu=power10 -fdisable-rtl-split1" } */
>>  /* force the constant splitter run after RA: -fdisable-rtl-split1.  */
>>  
>> +/* The below marco helps to avoid using paddi and avoid loading from memory. */
>
> (macro)
OK.
>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c
>> @@ -0,0 +1,11 @@
>> +/* Check loading constant from memory pool.  */
>> +/* { dg-options "-O2 -mpowerpc64" } */
>> +
>> +void
>> +foo (unsigned long long *a)
>> +{
>> +  *a++ = 0x2351847027482577ULL;
>> +}
>> +
>> +/* { dg-final { scan-assembler-times {\mp?ld\M} 1 { target { lp64 } } } } */
>
> Why target lp64 only?  You have -mpowerpc64 already, does that not make
> us use ld here always?
For -m32, we need other code to support it.  And the patch for
"-m32 -mpowerpc" is the second patch of this series.  Here 'lp64'
is safe guard for "-m64", in case there is concerns for "-m32" patch.

Thanks again for your review!!

BR,
Jeff(Jiufu) Guo.

>
>
> Segher
diff mbox series

Patch

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index e4dc629ddcc..f448df289a0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -10240,6 +10240,21 @@  rs6000_emit_set_const (rtx dest, rtx source)
 	  c = sext_hwi (c, 32);
 	  emit_move_insn (lo, GEN_INT (c));
 	}
+
+      else if (base_reg_operand (dest, mode) && TARGET_64BIT
+	       && TARGET_ELF && num_insns_constant (source, mode) > 2)
+	{
+	  rtx sym = force_const_mem (mode, source);
+	  if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
+	      && use_toc_relative_ref (XEXP (sym, 0), mode))
+	    {
+	      rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest));
+	      sym = gen_const_mem (mode, toc);
+	      set_mem_alias_set (sym, get_TOC_alias_set ());
+	    }
+
+	  emit_move_insn (dest, sym);
+	}
       else
 	rs6000_emit_set_long_const (dest, c);
       break;
diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
index 542e2674b12..f33c9a83f5e 100644
--- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c
+++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
@@ -1,5 +1,5 @@ 
 /* { dg-do compile { target has_arch_ppc64 } } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -fdump-rtl-final" } */
 
 #define C1 0x2351847027482577ULL
 #define C2 0x2351847027482578ULL
@@ -17,4 +17,5 @@  void __attribute__ ((noinline)) foo1 (long long *a, long long b)
     *a++ = C2;
 }
 
-/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
+/* Checking "final" instead checking "asm" output to avoid noise.  */
+/* { dg-final { scan-rtl-dump-times {\madddi3\M} 2 "final" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
index e3a9a7264cf..ff745c730f3 100644
--- a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
+++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
@@ -6,11 +6,18 @@ 
 /* { dg-final { scan-assembler-times {\mori\M} 4 } } */
 /* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
 
+/* The below macro helps to avoid loading constant from memory.  */
+#define CONST_AVOID_BASE_REG(DEST, CST)                                        \
+  {                                                                            \
+    register long long d asm ("r0") = CST;                                     \
+    asm volatile ("std %1, %0" : : "m"(DEST), "r"(d));                         \
+  }
+
 void __attribute__ ((noinline)) foo (unsigned long long *a)
 {
   /* 2 lis + 2 ori + 1 rldimi for each constant.  */
-  *a++ = 0x800aabcdc167fa16ULL;
-  *a++ = 0x7543a876867f616ULL;
+  CONST_AVOID_BASE_REG(*a++, 0x800aabcdc167fa16ULL);
+  CONST_AVOID_BASE_REG(*a++, 0x7543a876867f616ULL);  
 }
 
 long long A[] = {0x800aabcdc167fa16ULL, 0x7543a876867f616ULL};
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550.c b/gcc/testsuite/gcc.target/powerpc/pr106550.c
index 92b76ac8811..e5e38b4ea82 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr106550.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr106550.c
@@ -2,13 +2,19 @@ 
 /* { dg-options "-O2 -mdejagnu-cpu=power10" } */
 /* { dg-require-effective-target has_arch_ppc64 } */
 
+/* The below macro helps to avoid loading constant from memory.  */
+#define CONST_AVOID_BASE_REG(DEST, CST)                                        \
+  {                                                                            \
+    register long long d asm ("r0") = CST;                                     \
+    asm volatile ("std %1, %0" : : "m"(DEST), "r"(d));                         \
+  }
+
 void
 foo (unsigned long long *a)
 {
-  *a++ = 0x020805006106003; /* pli+pli+rldimi */
-  *a++ = 0x2351847027482577;/* pli+pli+rldimi */  
+  CONST_AVOID_BASE_REG (*a++, 0x020805006106003ULL);  /* pli+pli+rldimi */
+  CONST_AVOID_BASE_REG (*a++, 0x2351847027482577ULL); /* pli+pli+rldimi */
 }
 
 /* { dg-final { scan-assembler-times {\mpli\M} 4 } } */
 /* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
-
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106550_1.c b/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
index 5ab40d71a56..66539ee7cf0 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr106550_1.c
@@ -4,17 +4,19 @@ 
 /* { dg-options "-O2 -mdejagnu-cpu=power10 -fdisable-rtl-split1" } */
 /* force the constant splitter run after RA: -fdisable-rtl-split1.  */
 
+/* The below marco helps to avoid using paddi and avoid loading from memory. */
+#define CONST_AVOID_BASE_REG(DEST, CST)                                        \
+  {                                                                            \
+    register long long d asm ("r0") = CST;                                     \
+    asm volatile ("std %1, %0" : : "m"(DEST), "r"(d));                         \
+  }
+
 void
 foo (unsigned long long *a)
 {
-  /* Test oris/ori is used where paddi does not work with 'r0'. */
-  register long long d asm("r0") = 0x1245abcef9240dec; /* pli+sldi+oris+ori */
-  long long n;
-  asm("cntlzd %0, %1" : "=r"(n) : "r"(d));
-  *a++ = n;
-
-  *a++ = 0x235a8470a7480000ULL; /* pli+sldi+oris */
-  *a++ = 0x23a184700000b677ULL; /* pli+sldi+ori */
+  CONST_AVOID_BASE_REG (*a++, 0x1245abcef9240dec); /* pli+sldi+oris+ori */
+  CONST_AVOID_BASE_REG (*a++, 0x235a8470a7480000ULL); /* pli+sldi+oris */
+  CONST_AVOID_BASE_REG (*a++, 0x23a184700000b677ULL); /* pli+sldi+ori */
 }
 
 /* { dg-final { scan-assembler-times {\mpli\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c b/gcc/testsuite/gcc.target/powerpc/pr63281.c
new file mode 100644
index 00000000000..9763a7181fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c
@@ -0,0 +1,11 @@ 
+/* Check loading constant from memory pool.  */
+/* { dg-options "-O2 -mpowerpc64" } */
+
+void
+foo (unsigned long long *a)
+{
+  *a++ = 0x2351847027482577ULL;
+}
+
+/* { dg-final { scan-assembler-times {\mp?ld\M} 1 { target { lp64 } } } } */
+
diff --git a/gcc/testsuite/gcc.target/powerpc/pr87870.c b/gcc/testsuite/gcc.target/powerpc/pr87870.c
index d2108ac3386..09b2e8de901 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr87870.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr87870.c
@@ -25,4 +25,7 @@  test3 (void)
   return ((__int128)0xdeadbeefcafebabe << 64) | 0xfacefeedbaaaaaad;
 }
 
-/* { dg-final { scan-assembler-not {\mld\M} } } */
+/* test3 is using "ld" to load the value to r3 and r4. So there are 2 'ld's
+   test0, test1 and test2 are using "li", then check 6 'li's.  */
+/* { dg-final { scan-assembler-times {\mp?ld\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mli\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr93012.c b/gcc/testsuite/gcc.target/powerpc/pr93012.c
index 4f764d0576f..660fb0dddfa 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr93012.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c
@@ -10,4 +10,8 @@  unsigned long long mskh1() { return 0xffff9234ffff9234ULL; }
 unsigned long long mskl1() { return 0x2bcdffff2bcdffffULL; }
 unsigned long long mskse() { return 0xffff1234ffff1234ULL; }
 
-/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */
+/* { dg-final { scan-assembler-times {\mrldimi\M} 7 { target has_arch_pwr10 } } } */
+
+/* 4 complicated constants can be loaded from pool.  */
+/* { dg-final { scan-assembler-times {\mrldimi\M} 3 { target { ! has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mld\M} 4 { target { ! has_arch_pwr10 } } } } */