diff mbox

, Tweak PowerPC movdi constraints

Message ID 20161118203838.GA16377@ibm-tiger.the-meissners.org
State New
Headers show

Commit Message

Michael Meissner Nov. 18, 2016, 8:38 p.m. UTC
This patch tweaks the movdi constraints for the PowerPC to use "^" or "$"
constraints instead of "?*".  This allows the register allocator to more often
allocate DImode to a floating point/vector register when it is desirable to do
so.

I did a full Spec 2006 run with this patch installed, and most of the
benchmarks were neutral in terms of performance.  The 482.sphinx3 benchmark had
a 2.5% performance boost with these patches.  There were no benchmarks that
regressed with this patch.

I built bootstrap compilers and did make check with no regressions on:
    1)	Little endian power8, --with-cpu=power8
    2)	Big endian power8, --with-cpu=power8 (no 32-bit support)
    3)	Big endian power7, --with-cpu=power7 (both 32/64-bit support)

Can I check this patch into the trunk?

[gcc]
2016-11-18  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.md (movdi_internal32): Change FPR/VSX "?*"
	load/store constraints to "^" if the instruction allows d-form
	addressing or "$" if it only allows x-form addressing.  Change
	FPR/VSX move constraints to "^".
	(movdi_internal64): Likewise.

[gcc/testsuite]
2016-11-18  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/ppc-round2.c: Allow XSCVDPSXWS and XSCVDPUXWS
	to be generated instead of FCTIWUZ or FCTIWZ.

Comments

Segher Boessenkool Nov. 18, 2016, 10:43 p.m. UTC | #1
On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote:
> This patch tweaks the movdi constraints for the PowerPC to use "^" or "$"
> constraints instead of "?*".  This allows the register allocator to more often
> allocate DImode to a floating point/vector register when it is desirable to do
> so.

It also changes some plain "?" to "^" or "$" or even "*" (which cannot
work for multi-character constraints, it skips one character, not one
constraint).  Wrong version of the patch?

> I built bootstrap compilers and did make check with no regressions on:
>     1)	Little endian power8, --with-cpu=power8
>     2)	Big endian power8, --with-cpu=power8 (no 32-bit support)
>     3)	Big endian power7, --with-cpu=power7 (both 32/64-bit support)

Could you also test with reload please?  Just LE is enough I guess.
We'd like to keep reload working for GCC 7 at least, and these cost
prefixes tend to break mov patterns :-/


Segher
Michael Meissner Nov. 18, 2016, 10:52 p.m. UTC | #2
On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote:
> On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote:
> > This patch tweaks the movdi constraints for the PowerPC to use "^" or "$"
> > constraints instead of "?*".  This allows the register allocator to more often
> > allocate DImode to a floating point/vector register when it is desirable to do
> > so.
> 
> It also changes some plain "?" to "^" or "$" or even "*" (which cannot
> work for multi-character constraints, it skips one character, not one
> constraint).  Wrong version of the patch?

Note, if '*' does not work with multi-character prefixes, that is a bug.  All
of rs6000.md assumes that ?*wa means that the register allocator will not
consider VSX vector registers for when calculating register preferences.

> > I built bootstrap compilers and did make check with no regressions on:
> >     1)	Little endian power8, --with-cpu=power8
> >     2)	Big endian power8, --with-cpu=power8 (no 32-bit support)
> >     3)	Big endian power7, --with-cpu=power7 (both 32/64-bit support)
> 
> Could you also test with reload please?  Just LE is enough I guess.
> We'd like to keep reload working for GCC 7 at least, and these cost
> prefixes tend to break mov patterns :-/

Argh, I guess you are right, but then if reload doesn't work, I will likely
submit the patch where there are three different movdi's (one for 32-bit
without the change, one for 64-bit with reload, and one for 64-bit with lra).
I would prefer not to do that.
Segher Boessenkool Nov. 18, 2016, 11:07 p.m. UTC | #3
On Fri, Nov 18, 2016 at 05:52:12PM -0500, Michael Meissner wrote:
> On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote:
> > On Fri, Nov 18, 2016 at 03:38:38PM -0500, Michael Meissner wrote:
> > > This patch tweaks the movdi constraints for the PowerPC to use "^" or "$"
> > > constraints instead of "?*".  This allows the register allocator to more often
> > > allocate DImode to a floating point/vector register when it is desirable to do
> > > so.
> > 
> > It also changes some plain "?" to "^" or "$" or even "*" (which cannot
> > work for multi-character constraints, it skips one character, not one
> > constraint).  Wrong version of the patch?
> 
> Note, if '*' does not work with multi-character prefixes, that is a bug.  All
> of rs6000.md assumes that ?*wa means that the register allocator will not
> consider VSX vector registers for when calculating register preferences.

The documentation is out of date.

From ira-costs.c:

          /* Scan all the constraint letters.  See if the operand
             matches any of the constraints.  Collect the valid
             register classes and see if this operand accepts
             memory.  */
          while ((c = *p))
            {
              switch (c)
                {
                case '*':
                  /* Ignore the next letter for this pass.  */
                  c = *++p;
                  break;

and then 83 lines later:

              p += CONSTRAINT_LEN (c, p);
              if (c == ',')
                break;
            }

so it does in fact work.

Neither the patch description nor the changelog says you are doing these
changes though.

> > > I built bootstrap compilers and did make check with no regressions on:
> > >     1)	Little endian power8, --with-cpu=power8
> > >     2)	Big endian power8, --with-cpu=power8 (no 32-bit support)
> > >     3)	Big endian power7, --with-cpu=power7 (both 32/64-bit support)
> > 
> > Could you also test with reload please?  Just LE is enough I guess.
> > We'd like to keep reload working for GCC 7 at least, and these cost
> > prefixes tend to break mov patterns :-/
> 
> Argh, I guess you are right, but then if reload doesn't work, I will likely
> submit the patch where there are three different movdi's (one for 32-bit
> without the change, one for 64-bit with reload, and one for 64-bit with lra).
> I would prefer not to do that.

Let's hope it just works :-)


Segher
Michael Meissner Nov. 21, 2016, 6:27 p.m. UTC | #4
On Fri, Nov 18, 2016 at 05:07:21PM -0600, Segher Boessenkool wrote:
> On Fri, Nov 18, 2016 at 05:52:12PM -0500, Michael Meissner wrote:
> > On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote:
> > > Could you also test with reload please?  Just LE is enough I guess.
> > > We'd like to keep reload working for GCC 7 at least, and these cost
> > > prefixes tend to break mov patterns :-/
> > 
> > Argh, I guess you are right, but then if reload doesn't work, I will likely
> > submit the patch where there are three different movdi's (one for 32-bit
> > without the change, one for 64-bit with reload, and one for 64-bit with lra).
> > I would prefer not to do that.
> 
> Let's hope it just works :-)

I did test it over the weekend.

29 of the 30 spec 2006 benchmarks currently build with reload (gamess fails).
The same 29 build and run with the new patch.  Like the patch under LRA, there
are no regressions in performance, and one FP benchmark is faster.

Under LRA, sphinx3 is 2.5% faster (compared to LRA without the patch).

Under reload, sphinx3 is roughly the same performance, but calculix is 3.8%
faster.

Can I apply the patch?
Segher Boessenkool Nov. 21, 2016, 6:51 p.m. UTC | #5
On Mon, Nov 21, 2016 at 01:27:59PM -0500, Michael Meissner wrote:
> On Fri, Nov 18, 2016 at 05:07:21PM -0600, Segher Boessenkool wrote:
> > On Fri, Nov 18, 2016 at 05:52:12PM -0500, Michael Meissner wrote:
> > > On Fri, Nov 18, 2016 at 04:43:40PM -0600, Segher Boessenkool wrote:
> > > > Could you also test with reload please?  Just LE is enough I guess.
> > > > We'd like to keep reload working for GCC 7 at least, and these cost
> > > > prefixes tend to break mov patterns :-/
> > > 
> > > Argh, I guess you are right, but then if reload doesn't work, I will likely
> > > submit the patch where there are three different movdi's (one for 32-bit
> > > without the change, one for 64-bit with reload, and one for 64-bit with lra).
> > > I would prefer not to do that.
> > 
> > Let's hope it just works :-)
> 
> I did test it over the weekend.
> 
> 29 of the 30 spec 2006 benchmarks currently build with reload (gamess fails).
> The same 29 build and run with the new patch.  Like the patch under LRA, there
> are no regressions in performance, and one FP benchmark is faster.
> 
> Under LRA, sphinx3 is 2.5% faster (compared to LRA without the patch).
> 
> Under reload, sphinx3 is roughly the same performance, but calculix is 3.8%
> faster.

Great, thanks for testing.

> Can I apply the patch?

Okay, if you change the changelog to say what the patch actually does ;-)
And please watch for fallout.


Segher
Michael Meissner Nov. 21, 2016, 10:04 p.m. UTC | #6
On Mon, Nov 21, 2016 at 12:51:38PM -0600, Segher Boessenkool wrote:
> 
> Okay, if you change the changelog to say what the patch actually does ;-)
> And please watch for fallout.

This is the ChangeLog entry I checked in.

2016-11-21  Michael Meissner  <meissner@linux.vnet.ibm.com>
 
	* config/rs6000/rs6000.md (movdi_internal32): Change constraints
	so that DImode can be allocated to FP/vector registers in more
	cases, and we can avoid direct move operations.  If the register
	needs reloading, prefer GPRs over FP/vector registers.  In the
	case of FPR vs. Altivec registers, prefer FPR registers unless we
	have the ISA 3.0 reg+offset scalar instructions.
	(movdi_internal64): Likewise.
diff mbox

Patch

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 242556)
+++ gcc/config/rs6000/rs6000.md	(.../gcc/config/rs6000)	(working copy)
@@ -8118,10 +8118,10 @@  (define_insn "p8_mfvsrd_4_disf"
 
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "rs6000_nonimmediate_operand"
-         "=Y,        r,         r,         ?m,        ?*d,        ?*d,
-          r,         ?wY,       ?Z,        ?*wb,      ?*wv,       ?wi,
-          ?wo,       ?wo,       ?wv,       ?wi,       ?wi,        ?wv,
-          ?wv")
+         "=Y,        r,         r,         ^m,        ^d,         ^d,
+          r,         ^wY,       $Z,        ^wb,       $wv,        ^wi,
+          *wo,       *wo,       *wv,       *wi,       *wi,        *wv,
+          *wv")
 
 	(match_operand:DI 1 "input_operand"
           "r,        Y,         r,         d,         m,          d,
@@ -8195,9 +8195,9 @@  (define_split
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
                "=Y,        r,         r,         r,         r,          r,
-                ?m,        ?*d,       ?*d,       ?wY,       ?Z,         ?*wb,
-                ?*wv,      ?wi,       ?wo,       ?wo,       ?wv,        ?wi,
-                ?wi,       ?wv,       ?wv,       r,         *h,         *h,
+                ^m,        ^d,        ^d,        ^Y,        $Z,         $wb,
+                $wv,       ^wi,       *wo,       *wo,       *wv,        *wi,
+                *wi,       *wv,       *wv,       r,         *h,         *h,
                 ?*r,       ?*wg,      ?*r,       ?*wj")
 
 	(match_operand:DI 1 "input_operand"
Index: gcc/testsuite/gcc.target/powerpc/ppc-round2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-round2.c	(.../svn+ssh://meissner@gcc.gnu.org/svn/gcc/trunk/gcc/testsuite/gcc.target/powerpc)	(revision 242556)
+++ gcc/testsuite/gcc.target/powerpc/ppc-round2.c	(.../gcc/testsuite/gcc.target/powerpc)	(working copy)
@@ -5,8 +5,8 @@ 
 /* { dg-options "-O2 -mcpu=power8" } */
 /* { dg-final { scan-assembler-times "fcfid "      2 } } */
 /* { dg-final { scan-assembler-times "fcfids "     2 } } */
-/* { dg-final { scan-assembler-times "fctiwuz "    2 } } */
-/* { dg-final { scan-assembler-times "fctiwz "     2 } } */
+/* { dg-final { scan-assembler-times "fctiwuz \|xscvdpuxws " 2 } } */
+/* { dg-final { scan-assembler-times "fctiwz \|xscvdpsxws "  2 } } */
 /* { dg-final { scan-assembler-times "mfvsrd "     4 } } */
 /* { dg-final { scan-assembler-times "mtvsrwa "    2 } } */
 /* { dg-final { scan-assembler-times "mtvsrwz "    2 } } */