diff mbox series

PowerPC PR target/84154, fix floating point to small integer conversion regression

Message ID 20180201193116.GA15164@ibm-tiger.the-meissners.org
State New
Headers show
Series PowerPC PR target/84154, fix floating point to small integer conversion regression | expand

Commit Message

Michael Meissner Feb. 1, 2018, 7:31 p.m. UTC
This patch fixes the optimization regression that occurred on GCC 7 where
conversions from the various floating point types to small integers would at
times generate a store and a load.

For example, converting from double to unsigned char generated the following
code on GCC 6 for -mcpu=power8:

        fctiwuz 1,1
        mfvsrd 3,1
        rlwinm 3,3,0,0xff

on GCC 7 and 8 it generates:

        fctiwuz 0,1
        mfvsrwz 9,0
        stw 9,-16(1)
        ori 2,2,0
        lbz 3,-16(1)

The insns before register allocation are:

	(insn 7 8 13 2 (set (subreg:SI (reg:QI 157) 0)
			    (unsigned_fix:SI (reg:SF 33)))

	(insn 13 7 14 2 (set (reg/i:DI 3 3)
			     (zero_extend:DI (reg:QI 157))))

After reload, the insns are:

	(insn 7 8 19 2 (set (reg:SI 32 0 [160])
			    (unsigned_fix:SI (reg:SF 33))))

	(insn 19 7 18 2 (set (reg:SI 9 9 [160])
			     (reg:SI 32 0 [160])))

	(insn 18 19 13 2 (set (mem/c:SI (plus:DI (reg/f:DI 1 1)
						 (const_int -16))
			      (reg:SI 9))))

	(insn 13 18 14 2 (set (reg/i:DI 3 3)
			      (zero_extend:DI (mem/c:HI (plus:DI (reg/f:DI 1 1)
								 (const_int -16))))))

ISA 3.0 (Power9) did not have this problem, because it already had a
fixuns_truncdfqi2 pattern, since QI/HImode values are allowed in vector
registers.  Previous versions of the ISA did not allow QI/HImode into vector
registers, because there wasn't load or store byte/half-word operations.

I extended ISA 3.0 conversion patterns to handle ISA 2.07, using splitters to
move the 32-bit int parts back to the GPR to do sign/zero extension or stores.

I also moved the optimization to prevent the register allocator from doing a
direct move on ISA 3.0 to do an offsettable store via the GPR register to a
separate insn, like I had previously done for SImode.  The rationale for this
is to prevent some places where the register allocator decided to do change a
store into a move (and then later store).

I have tested this patch on a little endian power8 system (64-bit) and a big
endian power8 system (both 32-bit and 64-bit executables).  There were no
regressions in the test suite and the compiler bootstrapped fine.  I added some
tests, and verified they ran in all 3 environments.  Can I check this into the
trunk?  Given this is a regression in GCC 7 as well, can I check the patch if
it applies cleanly into GCC 7 after a burn-in period.

[gcc]
2018-02-01  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/84154
	* config/rs6000/rs6000.md (fix_trunc<SFDF:mode><QHI:mode>2):
	Convert from define_expand to be define_insn_and_split.  Rework
	float/double/_Float128 conversions to QI/HI/SImode to work with
	both ISA 2.07 (power8) or ISA 3.0 (power9).  Fix regression where
	conversions to QI/HImode types did a store and then a load to
	truncate the value.  For conversions to VSX registers, don't split
	the insn, instead emit the code directly.
	(fixuns_trunc<SFDF:mode><QHI:mode>2): Likewise.
	(fix_trunc<IEEE128:mode><QHI:mode>2): Likewise.
	(fix_trunc<SFDF:mode><QHI:mode>2_internal): Delete, no longer
	used.
	(fixuns_trunc<SFDF:mode><QHI:mode>2_internal): Likewise.
	(fix<uns>_<mode>_mem): Likewise.
	(fix_trunc<SFDF:mode><QHI:mode>2_mem): On ISA 3.0, prevent the
	register allocator from doing a direct move to the GPRs to do a
	store, and instead use the ISA 3.0 store byte/half-word from
	vector register instruction.  For IEEE 128-bit floating point,
	also optimize stores of 32-bit ints.
	(fixuns_trunc<SFDF:mode><QHI:mode>2_mem): Likewise.
	(fix_trunc<IEEE128:mode><QHSI:mode>2_mem): Likewise.
	(fixuns_trunc<IEEE128:mode><QHSI:mode>2_mem): Likewise.

[gcc/testsuite]
2018-02-01  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/84154
	* gcc.target/powerpc/pr84154-1.c: New tests.
	* gcc.target/powerpc/pr84154-2.c: Likewise.
	* gcc.target/powerpc/pr84154-3.c: Likewise.

Comments

Segher Boessenkool Feb. 5, 2018, 11:57 a.m. UTC | #1
On Thu, Feb 01, 2018 at 02:31:17PM -0500, Michael Meissner wrote:
> This patch fixes the optimization regression that occurred on GCC 7 where
> conversions from the various floating point types to small integers would at
> times generate a store and a load.

[ snip big explanation; thanks for that! ]

Could you merge the signed and unsigned patterns, using any_fix?  Or is
there a reason that cannot work (other than that <su> unsigned_fix seems
buggy, it should say "u")?

Okay for trunk even without that (but please try).  Also okay for 7 after
looking for fallout.

Thanks!


Segher
Michael Meissner Feb. 5, 2018, 12:54 p.m. UTC | #2
On Mon, Feb 05, 2018 at 05:57:25AM -0600, Segher Boessenkool wrote:
> On Thu, Feb 01, 2018 at 02:31:17PM -0500, Michael Meissner wrote:
> > This patch fixes the optimization regression that occurred on GCC 7 where
> > conversions from the various floating point types to small integers would at
> > times generate a store and a load.
> 
> [ snip big explanation; thanks for that! ]
> 
> Could you merge the signed and unsigned patterns, using any_fix?  Or is
> there a reason that cannot work (other than that <su> unsigned_fix seems
> buggy, it should say "u")?

Well that's the way the instructions are.  For traditional FPR instructions we
have FCTIWZ vs. FCTIWUZ, while on the VSX side we have XSCVDPSXWS
vs. XSCVDPUXWS.  If you mean the name of the insn, I can change it if desired,
but originally it was based on the FPR insn name.

> Okay for trunk even without that (but please try).  Also okay for 7 after
> looking for fallout.

In the past, I have found that combining code iterators with two mode iterators
has a bug where it would use the wrong code iterator, so I just avoided doing
that.  I'll see if it is still a bug.
Segher Boessenkool Feb. 5, 2018, 2:01 p.m. UTC | #3
On Mon, Feb 05, 2018 at 07:54:58AM -0500, Michael Meissner wrote:
> On Mon, Feb 05, 2018 at 05:57:25AM -0600, Segher Boessenkool wrote:
> > On Thu, Feb 01, 2018 at 02:31:17PM -0500, Michael Meissner wrote:
> > > This patch fixes the optimization regression that occurred on GCC 7 where
> > > conversions from the various floating point types to small integers would at
> > > times generate a store and a load.
> > 
> > [ snip big explanation; thanks for that! ]
> > 
> > Could you merge the signed and unsigned patterns, using any_fix?  Or is
> > there a reason that cannot work (other than that <su> unsigned_fix seems
> > buggy, it should say "u")?
> 
> Well that's the way the instructions are.  For traditional FPR instructions we
> have FCTIWZ vs. FCTIWUZ, while on the VSX side we have XSCVDPSXWS
> vs. XSCVDPUXWS.  If you mean the name of the insn, I can change it if desired,
> but originally it was based on the FPR insn name.

We have

(define_code_attr su [(sign_extend      "s")
                      (zero_extend      "u")
                      (fix              "s")
                      (unsigned_fix     "s")
                      (float            "s")
                      (unsigned_float   "u")])

and "s" for unsigned_fix seems like it should be "u".  Very surprising
otherwise (if this is needed in some case, it should just write "s" there
instead of "<su>").

> > Okay for trunk even without that (but please try).  Also okay for 7 after
> > looking for fallout.
> 
> In the past, I have found that combining code iterators with two mode iterators
> has a bug where it would use the wrong code iterator, so I just avoided doing
> that.  I'll see if it is still a bug.

Hrm.  If you have multiple iterators you often need to use ":" syntax,
and you might want that anyway because the precedence rules are non-obvious;
but you are hitting something else?  Please open a PR if so :-)


Segher
Michael Meissner Feb. 5, 2018, 10:57 p.m. UTC | #4
On Mon, Feb 05, 2018 at 08:01:32AM -0600, Segher Boessenkool wrote:
> On Mon, Feb 05, 2018 at 07:54:58AM -0500, Michael Meissner wrote:
> > On Mon, Feb 05, 2018 at 05:57:25AM -0600, Segher Boessenkool wrote:
> > > On Thu, Feb 01, 2018 at 02:31:17PM -0500, Michael Meissner wrote:
> > > > This patch fixes the optimization regression that occurred on GCC 7 where
> > > > conversions from the various floating point types to small integers would at
> > > > times generate a store and a load.
> > > 
> > > [ snip big explanation; thanks for that! ]
> > > 
> > > Could you merge the signed and unsigned patterns, using any_fix?  Or is
> > > there a reason that cannot work (other than that <su> unsigned_fix seems
> > > buggy, it should say "u")?
> > 
> > Well that's the way the instructions are.  For traditional FPR instructions we
> > have FCTIWZ vs. FCTIWUZ, while on the VSX side we have XSCVDPSXWS
> > vs. XSCVDPUXWS.  If you mean the name of the insn, I can change it if desired,
> > but originally it was based on the FPR insn name.
> 
> We have
> 
> (define_code_attr su [(sign_extend      "s")
>                       (zero_extend      "u")
>                       (fix              "s")
>                       (unsigned_fix     "s")
>                       (float            "s")
>                       (unsigned_float   "u")])
> 
> and "s" for unsigned_fix seems like it should be "u".  Very surprising
> otherwise (if this is needed in some case, it should just write "s" there
> instead of "<su>").
> 
> > > Okay for trunk even without that (but please try).  Also okay for 7 after
> > > looking for fallout.
> > 
> > In the past, I have found that combining code iterators with two mode iterators
> > has a bug where it would use the wrong code iterator, so I just avoided doing
> > that.  I'll see if it is still a bug.
> 
> Hrm.  If you have multiple iterators you often need to use ":" syntax,
> and you might want that anyway because the precedence rules are non-obvious;
> but you are hitting something else?  Please open a PR if so :-)

I already do use the : syntax.

I found as long as I avoid putting the <su> or <u> in the output template
(i.e. use an output statement instead of a template) it works.  It only
seems to fail in the IEEE128 case, and not in the SFDF case.  I will submit a
bug report  on it after this gets checked in, as it will be simpler to provide
a patch that people can test.

This patch cleans up all of the {,unsigned_}fix patterns that convert to QImode
or HImode.  It adds the memory combiner to include SImode to the mix.  While I
was at it, I combined the IEEE insns to use any_fix.

I have checked this patch on both little endian and big endian power8 systems,
and I have have hand checked the code for power7 and power9 systems.  There
were no regressions in the test suite, the new tests were verified to run, and
I did bootstrap builds on both systems.  The big endian power8 system also
32-bit tests as well as 64-bit tests.  Can I apply this patch to the trunk and
after a waiting period, apply the patch to the GCC 7 branch?

[gcc]
2018-02-05  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/84154
	* config/rs6000/rs6000.md (fix_trunc<SFDF:mode><QHI:mode>2):
	Convert from define_expand to be define_insn_and_split.  Rework
	float/double/_Float128 conversions to QI/HI/SImode to work with
	both ISA 2.07 (power8) or ISA 3.0 (power9).  Fix regression where
	conversions to QI/HImode types did a store and then a load to
	truncate the value.  For conversions to VSX registers, don't split
	the insn, instead emit the code directly.  Use the code iterator
	any_fix to combine signed and unsigned conversions.
	(fix<uns>_trunc<SFDF:mode>si2_p8): Likewise.
	(fixuns_trunc<SFDF:mode><QHI:mode>2): Likewise.
	(fix_trunc<IEEE128:mode><QHI:mode>2): Likewise.
	(fix<uns>_trunc<SFDF:mode><QHI:mode>2): Likewise.
	(fix_<mode>di2_hw): Likewise.
	(fixuns_<mode>di2_hw): Likewise.
	(fix_<mode>si2_hw): Likewise.
	(fixuns_<mode>si2_hw): Likewise.
	(fix<uns>_<IEEE128:mode><SDI:mode>2_hw): Likewise.
	(fix<uns>_trunc<IEEE128:mode><QHI:mode>2): Likewise.
	(fctiw<u>z_<mode>_smallint): Rename fctiw<u>z_<mode>_smallint to
	fix<uns>_trunc<SFDF:mode>si2_p8.
	(fix_trunc<SFDF:mode><QHI:mode>2_internal): Delete, no longer
	used.
	(fixuns_trunc<SFDF:mode><QHI:mode>2_internal): Likewise.
	(fix<uns>_<mode>_mem): Likewise.
	(fctiw<u>z_<mode>_mem): Likewise.
	(fix<uns>_<mode>_mem): Likewise.
	(fix<uns>_trunc<SFDF:mode><QHSI:mode>2_mem): On ISA 3.0, prevent
	the register allocator from doing a direct move to the GPRs to do
	a store, and instead use the ISA 3.0 store byte/half-word from
	vector register instruction.  For IEEE 128-bit floating point,
	also optimize stores of 32-bit ints.
	(fix<uns>_trunc<IEEE128:mode><QHSI:mode>2_mem): Likewise.

[gcc/testsuite]
2018-02-05  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/84154
	* gcc.target/powerpc/pr84154-1.c: New tests.
	* gcc.target/powerpc/pr84154-2.c: Likewise.
	* gcc.target/powerpc/pr84154-3.c: Likewise.
Segher Boessenkool Feb. 6, 2018, 5:15 p.m. UTC | #5
Hi!

On Mon, Feb 05, 2018 at 05:57:51PM -0500, Michael Meissner wrote:
> > We have
> > 
> > (define_code_attr su [(sign_extend      "s")
> >                       (zero_extend      "u")
> >                       (fix              "s")
> >                       (unsigned_fix     "s")
> >                       (float            "s")
> >                       (unsigned_float   "u")])
> > 
> > and "s" for unsigned_fix seems like it should be "u".  Very surprising
> > otherwise (if this is needed in some case, it should just write "s" there
> > instead of "<su>").

What about this?

> I found as long as I avoid putting the <su> or <u> in the output template
> (i.e. use an output statement instead of a template) it works.  It only
> seems to fail in the IEEE128 case, and not in the SFDF case.  I will submit a
> bug report  on it after this gets checked in, as it will be simpler to provide
> a patch that people can test.

Thanks.

> +(define_insn_and_split "fix<uns>_trunc<SFDF:mode><QHI:mode>2"
> +  [(set (match_operand:<QHI:MODE> 0 "gpc_reg_operand" "=wJ,wJwK,r")
> +	(any_fix:QHI (match_operand:SFDF 1 "gpc_reg_operand" "wJ,wJwK,wa")))
> +   (clobber (match_scratch:SI 2 "=X,X,wi"))]
> +  "TARGET_DIRECT_MOVE"
> +  "@
> +   fctiw<u>z %0,%1
> +   xscvdp<su>xws %x0,%x1

This one cannot work with the <su> definition above, for example.

> +(define_insn "fix<uns>_<IEEE128:mode><SDI:mode>2_hw"
> +  [(set (match_operand:SDI 0 "altivec_register_operand" "=v")
> +	(any_fix:SDI (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
> +  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<IEEE128:MODE>mode)"
> +{
> +  return (<CODE> == UNSIGNED_FIX) ? "xscvqpu<wd>z %0,%1" : "xscvqps<wd>z %0,%1";
> +}

And it may well be why you need this.

Rest looks good :-)


Segher
Michael Meissner Feb. 6, 2018, 8:11 p.m. UTC | #6
On Tue, Feb 06, 2018 at 11:15:21AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Feb 05, 2018 at 05:57:51PM -0500, Michael Meissner wrote:
> > > We have
> > > 
> > > (define_code_attr su [(sign_extend      "s")
> > >                       (zero_extend      "u")
> > >                       (fix              "s")
> > >                       (unsigned_fix     "s")
> > >                       (float            "s")
> > >                       (unsigned_float   "u")])
> > > 
> > > and "s" for unsigned_fix seems like it should be "u".  Very surprising
> > > otherwise (if this is needed in some case, it should just write "s" there
> > > instead of "<su>").
> 
> What about this?

You were right.  I kept missing this.  Sorry.

Here is the patch for just this fix.  As we've discussed, I'll apply this patch
directly as being obvious, while waiting for the other builds.

[gcc]
2018-02-06  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/84154
	* config/rs6000/rs6000.md (su code attribute): Use "u" for
	unsigned_fix, not "s".

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 257269)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -550,7 +550,7 @@ (define_code_attr u  [(sign_extend	"")
 (define_code_attr su [(sign_extend	"s")
 		      (zero_extend	"u")
 		      (fix		"s")
-		      (unsigned_fix	"s")
+		      (unsigned_fix	"u")
 		      (float		"s")
 		      (unsigned_float	"u")])
Michael Meissner Feb. 6, 2018, 9:34 p.m. UTC | #7
On Tue, Feb 06, 2018 at 11:15:21AM -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Feb 05, 2018 at 05:57:51PM -0500, Michael Meissner wrote:
> > > We have
> > > 
> > > (define_code_attr su [(sign_extend      "s")
> > >                       (zero_extend      "u")
> > >                       (fix              "s")
> > >                       (unsigned_fix     "s")
> > >                       (float            "s")
> > >                       (unsigned_float   "u")])
> > > 
> > > and "s" for unsigned_fix seems like it should be "u".  Very surprising
> > > otherwise (if this is needed in some case, it should just write "s" there
> > > instead of "<su>").
> 
> What about this?

As we discussed before, this was a bug, and I've already committed the fix
separately.

> > I found as long as I avoid putting the <su> or <u> in the output template
> > (i.e. use an output statement instead of a template) it works.  It only
> > seems to fail in the IEEE128 case, and not in the SFDF case.  I will submit a
> > bug report  on it after this gets checked in, as it will be simpler to provide
> > a patch that people can test.
> 
> Thanks.
> 
> > +(define_insn_and_split "fix<uns>_trunc<SFDF:mode><QHI:mode>2"
> > +  [(set (match_operand:<QHI:MODE> 0 "gpc_reg_operand" "=wJ,wJwK,r")
> > +	(any_fix:QHI (match_operand:SFDF 1 "gpc_reg_operand" "wJ,wJwK,wa")))
> > +   (clobber (match_scratch:SI 2 "=X,X,wi"))]
> > +  "TARGET_DIRECT_MOVE"
> > +  "@
> > +   fctiw<u>z %0,%1
> > +   xscvdp<su>xws %x0,%x1
> 
> This one cannot work with the <su> definition above, for example.
> 
> > +(define_insn "fix<uns>_<IEEE128:mode><SDI:mode>2_hw"
> > +  [(set (match_operand:SDI 0 "altivec_register_operand" "=v")
> > +	(any_fix:SDI (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
> > +  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<IEEE128:MODE>mode)"
> > +{
> > +  return (<CODE> == UNSIGNED_FIX) ? "xscvqpu<wd>z %0,%1" : "xscvqps<wd>z %0,%1";
> > +}
> 
> And it may well be why you need this.
> 
> Rest looks good :-)

Here is the patch reworked.  It bootstraps on both little/big endian power8,
and all of the tests run.  Can I install this into trunk now, and into GCC 7
after a soak period (along with the previous patch)?

[gcc]
2018-02-06  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/84154
	* config/rs6000/rs6000.md (fix_trunc<SFDF:mode><QHI:mode>2):
	Convert from define_expand to be define_insn_and_split.  Rework
	float/double/_Float128 conversions to QI/HI/SImode to work with
	both ISA 2.07 (power8) or ISA 3.0 (power9).  Fix regression where
	conversions to QI/HImode types did a store and then a load to
	truncate the value.  For conversions to VSX registers, don't split
	the insn, instead emit the code directly.  Use the code iterator
	any_fix to combine signed and unsigned conversions.
	(fix<uns>_trunc<SFDF:mode>si2_p8): Likewise.
	(fixuns_trunc<SFDF:mode><QHI:mode>2): Likewise.
	(fix_trunc<IEEE128:mode><QHI:mode>2): Likewise.
	(fix<uns>_trunc<SFDF:mode><QHI:mode>2): Likewise.
	(fix_<mode>di2_hw): Likewise.
	(fixuns_<mode>di2_hw): Likewise.
	(fix_<mode>si2_hw): Likewise.
	(fixuns_<mode>si2_hw): Likewise.
	(fix<uns>_<IEEE128:mode><SDI:mode>2_hw): Likewise.
	(fix<uns>_trunc<IEEE128:mode><QHI:mode>2): Likewise.
	(fctiw<u>z_<mode>_smallint): Rename fctiw<u>z_<mode>_smallint to
	fix<uns>_trunc<SFDF:mode>si2_p8.
	(fix_trunc<SFDF:mode><QHI:mode>2_internal): Delete, no longer
	used.
	(fixuns_trunc<SFDF:mode><QHI:mode>2_internal): Likewise.
	(fix<uns>_<mode>_mem): Likewise.
	(fctiw<u>z_<mode>_mem): Likewise.
	(fix<uns>_<mode>_mem): Likewise.
	(fix<uns>_trunc<SFDF:mode><QHSI:mode>2_mem): On ISA 3.0, prevent
	the register allocator from doing a direct move to the GPRs to do
	a store, and instead use the ISA 3.0 store byte/half-word from
	vector register instruction.  For IEEE 128-bit floating point,
	also optimize stores of 32-bit ints.
	(fix<uns>_trunc<IEEE128:mode><QHSI:mode>2_mem): Likewise.

[gcc/testsuite]
2018-02-06  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/84154
	* gcc.target/powerpc/pr84154-1.c: New tests.
	* gcc.target/powerpc/pr84154-2.c: Likewise.
	* gcc.target/powerpc/pr84154-3.c: Likewise.
Segher Boessenkool Feb. 7, 2018, 10:41 p.m. UTC | #8
Hi Mike,

On Tue, Feb 06, 2018 at 04:34:08PM -0500, Michael Meissner wrote:
> Here is the patch reworked.  It bootstraps on both little/big endian power8,
> and all of the tests run.  Can I install this into trunk now, and into GCC 7
> after a soak period (along with the previous patch)?

> +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR

"If we have"?

> +  [(set (match_operand:QHSI 0 "memory_operand" "=Z")
> +	(any_fix:QHSI (match_operand:SFDF 1 "gpc_reg_operand" "wa")))
> +   (clobber (match_scratch:SI 2 "=wa"))]
> +  "((<QHSI:MODE>mode == SImode && TARGET_P8_VECTOR)
> +    || (<QHSI:MODE>mode != SImode && TARGET_P9_VECTOR))"

This is the same as

  "(<QHSI:MODE>mode == SImode && TARGET_P8_VECTOR) || TARGET_P9_VECTOR"

which is a bit easier to understand I think.

Okay (for trunk as well as 7) with those trivialities improved.  Thanks!


Segher
Hans-Peter Nilsson Feb. 8, 2018, 11:10 p.m. UTC | #9
On Wed, 7 Feb 2018, Segher Boessenkool wrote:
> Hi Mike,
>
> On Tue, Feb 06, 2018 at 04:34:08PM -0500, Michael Meissner wrote:
> > Here is the patch reworked.  It bootstraps on both little/big endian power8,
> > and all of the tests run.  Can I install this into trunk now, and into GCC 7
> > after a soak period (along with the previous patch)?
>
> > +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR
>
> "If we have"?
>
> > +  [(set (match_operand:QHSI 0 "memory_operand" "=Z")
> > +	(any_fix:QHSI (match_operand:SFDF 1 "gpc_reg_operand" "wa")))
> > +   (clobber (match_scratch:SI 2 "=wa"))]
> > +  "((<QHSI:MODE>mode == SImode && TARGET_P8_VECTOR)
> > +    || (<QHSI:MODE>mode != SImode && TARGET_P9_VECTOR))"
>
> This is the same as
>
>   "(<QHSI:MODE>mode == SImode && TARGET_P8_VECTOR) || TARGET_P9_VECTOR"

Umm, sorry for chiming in here with zero rs6000 knowledge and I
might be missing something trivial but...what wouldn't that misfire for
 "<QHSI:MODE>mode == SImode && ! TARGET_P8_VECTOR && TARGET_P9_VECTOR" ?

(Is that invalid or not applicable or don't care or something?)

brgds, H-P
Michael Meissner Feb. 8, 2018, 11:58 p.m. UTC | #10
On Thu, Feb 08, 2018 at 06:10:31PM -0500, Hans-Peter Nilsson wrote:
> On Wed, 7 Feb 2018, Segher Boessenkool wrote:
> > Hi Mike,
> >
> > On Tue, Feb 06, 2018 at 04:34:08PM -0500, Michael Meissner wrote:
> > > Here is the patch reworked.  It bootstraps on both little/big endian power8,
> > > and all of the tests run.  Can I install this into trunk now, and into GCC 7
> > > after a soak period (along with the previous patch)?
> >
> > > +;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR
> >
> > "If we have"?
> >
> > > +  [(set (match_operand:QHSI 0 "memory_operand" "=Z")
> > > +	(any_fix:QHSI (match_operand:SFDF 1 "gpc_reg_operand" "wa")))
> > > +   (clobber (match_scratch:SI 2 "=wa"))]
> > > +  "((<QHSI:MODE>mode == SImode && TARGET_P8_VECTOR)
> > > +    || (<QHSI:MODE>mode != SImode && TARGET_P9_VECTOR))"
> >
> > This is the same as
> >
> >   "(<QHSI:MODE>mode == SImode && TARGET_P8_VECTOR) || TARGET_P9_VECTOR"
> 
> Umm, sorry for chiming in here with zero rs6000 knowledge and I
> might be missing something trivial but...what wouldn't that misfire for
>  "<QHSI:MODE>mode == SImode && ! TARGET_P8_VECTOR && TARGET_P9_VECTOR" ?
> 
> (Is that invalid or not applicable or don't care or something?)

TARGET_P9_VECTOR requires TARGET_P8_VECTOR.

Basically when we are converting SF/DFmode to SImode, we want to allow it on
ISA 2.07 (-mcpu=power8).  If we are converting to SF/DFmode to HI/QImode, we
require ISA 3.0 (-mcpu=power9).

The reason is that we don't have the instructions to do 32-bit integer store
and 32-bit integer sign/zero extended load instructions to all of the vector
and floating point registers until ISA 2.07.  Because of that, we don't allow
SImode in the vector and floating point registers until ISA 2.07.  In
processors before power8, we had to do a store 64-bit integer on the stack and
then load up the 32-bit value into the GPR registers.

However, ISA 2.07 does not have instructions to store or load 8/16-bit values
that can be conveniently used.  ISA 3.0 added 8/16-bit store, 8/16-bit zero
extended load, and 8/16-bit sign extend instructions.  So in ISA 3.0, we allow
QI/HImode to go in vector registers.

In ISA 2.06 (-mcpu=power7) we had to use UNSPECs to hide the convert floating
point scalar to 32-bit signed/unsigned instruction instructions because we
didn't allow the base type into the register.
diff mbox series

Patch

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 257269)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -5700,43 +5700,48 @@  (define_insn "*fix_trunc<mode>di2_fctidz
    xscvdpsxds %x0,%x1"
   [(set_attr "type" "fp")])
 
-(define_expand "fix_trunc<SFDF:mode><QHI:mode>2"
-  [(parallel [(set (match_operand:<QHI:MODE> 0 "nonimmediate_operand")
-		   (fix:QHI (match_operand:SFDF 1 "gpc_reg_operand")))
-	      (clobber (match_scratch:DI 2))])]
-  "TARGET_P9_VECTOR && TARGET_DIRECT_MOVE_64BIT"
+;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR
+;; registers.  If we have ISA 2.07, we don't allow QI/HImode values in the
+;; vector registers, so we need to do direct moves to the GPRs, but SImode
+;; values can go in VSX registers.  Keeping the direct move part through
+;; register allocation prevents the register allocator from doing a direct of
+;; the SImode value to a GPR, and then a store/load.
+(define_insn_and_split "fix_trunc<SFDF:mode><QHI:mode>2"
+  [(set (match_operand:<QHI:MODE> 0 "gpc_reg_operand" "=wJ,wJwK,r")
+	(fix:QHI (match_operand:SFDF 1 "gpc_reg_operand" "wJ,wJwK,wa")))
+   (clobber (match_scratch:SI 2 "=X,X,wi"))]
+  "TARGET_DIRECT_MOVE"
+  "@
+   fctiwz %0,%1
+   xscvdpsxws %x0,%x1
+   #"
+  "&& reload_completed && int_reg_operand (operands[0], <QHI:MODE>mode)"
+  [(set (match_dup 2)
+	(fix:SI (match_dup 1)))
+   (set (match_dup 3)
+	(match_dup 2))]
 {
-  if (MEM_P (operands[0]))
-    operands[0] = rs6000_address_for_fpconvert (operands[0]);
-})
+  operands[3] = gen_rtx_REG (SImode, REGNO (operands[0]));
+}
+  [(set_attr "length" "4,4,8")
+   (set_attr "type" "fp")])
 
-(define_insn_and_split "*fix_trunc<SFDF:mode><QHI:mode>2_internal"
-  [(set (match_operand:<QHI:MODE> 0 "reg_or_indexed_operand" "=wIwJ,rZ")
-	(fix:QHI
-	 (match_operand:SFDF 1 "gpc_reg_operand" "<SFDF:Fv>,<SFDF:Fv>")))
-   (clobber (match_scratch:DI 2 "=X,wi"))]
-  "TARGET_P9_VECTOR && TARGET_DIRECT_MOVE_64BIT"
+;; Keep the convert and store together through register allocation to prevent
+;; the register allocator from getting clever and doing a direct move to a GPR
+;; and then store for reg+offset stores.
+(define_insn_and_split "*fix_trunc<SFDF:mode><QHI:mode>2_mem"
+  [(set (match_operand:QHI 0 "memory_operand" "=Z")
+	(fix:QHI (match_operand:SFDF 1 "gpc_reg_operand" "wa")))
+   (clobber (match_scratch:SI 2 "=wa"))]
+  "TARGET_P9_VECTOR"
   "#"
   "&& reload_completed"
-  [(const_int 0)]
+  [(set (match_dup 2)
+	(fix:SI (match_dup 1)))
+   (set (match_dup 0)
+	(match_dup 3))]
 {
-  rtx dest = operands[0];
-  rtx src = operands[1];
-
-  if (vsx_register_operand (dest, <QHI:MODE>mode))
-    {
-      rtx di_dest = gen_rtx_REG (DImode, REGNO (dest));
-      emit_insn (gen_fix_trunc<SFDF:mode>di2 (di_dest, src));
-    }
-  else
-    {
-      rtx tmp = operands[2];
-      rtx tmp2 = gen_rtx_REG (<QHI:MODE>mode, REGNO (tmp));
-
-      emit_insn (gen_fix_trunc<SFDF:mode>di2 (tmp, src));
-      emit_move_insn (dest, tmp2);
-    }
-  DONE;
+  operands[3] = gen_rtx_REG (<QHI:MODE>mode, REGNO (operands[2]));
 })
 
 (define_expand "fixuns_trunc<mode>si2"
@@ -5803,48 +5808,51 @@  (define_insn "fixuns_trunc<mode>di2"
    xscvdpuxds %x0,%x1"
   [(set_attr "type" "fp")])
 
-(define_expand "fixuns_trunc<SFDF:mode><QHI:mode>2"
-  [(parallel [(set (match_operand:<QHI:MODE> 0 "nonimmediate_operand")
-		   (unsigned_fix:QHI (match_operand:SFDF 1 "gpc_reg_operand")))
-	      (clobber (match_scratch:DI 2))])]
-  "TARGET_P9_VECTOR && TARGET_DIRECT_MOVE_64BIT"
+;; If have ISA 3.0, QI/HImode values can go in both VSX registers and GPR
+;; registers.  If we have ISA 2.07, we don't allow QI/HImode values in the
+;; vector registers, so we need to do direct moves to the GPRs, but SImode
+;; values can go in VSX registers.  Keeping the direct move part through
+;; register allocation prevents the register allocator from doing a direct of
+;; the SImode value to a GPR, and then a store/load.
+(define_insn_and_split "fixuns_trunc<SFDF:mode><QHI:mode>2"
+  [(set (match_operand:<QHI:MODE> 0 "gpc_reg_operand" "=wJ,wJwK,r")
+	(unsigned_fix:QHI
+	 (match_operand:SFDF 1 "gpc_reg_operand" "wJ,wJwK,wa")))
+   (clobber (match_scratch:SI 2 "=X,X,wi"))]
+  "TARGET_DIRECT_MOVE"
+  "@
+   fctiwuz %0,%1
+   xscvdpuxws %x0,%x1
+   #"
+  "&& reload_completed && int_reg_operand (operands[0], <QHI:MODE>mode)"
+  [(set (match_dup 2)
+	(unsigned_fix:SI (match_dup 1)))
+   (set (match_dup 3)
+	(match_dup 2))]
 {
-  if (MEM_P (operands[0]))
-    operands[0] = rs6000_address_for_fpconvert (operands[0]);
-})
+  operands[3] = gen_rtx_REG (SImode, REGNO (operands[0]));
+}
+  [(set_attr "length" "4,4,8")
+   (set_attr "type" "fp")])
 
-(define_insn_and_split "*fixuns_trunc<SFDF:mode><QHI:mode>2_internal"
-  [(set (match_operand:<QHI:MODE> 0 "reg_or_indexed_operand" "=wIwJ,rZ")
-	(unsigned_fix:QHI
-	 (match_operand:SFDF 1 "gpc_reg_operand" "<SFDF:Fv>,<SFDF:Fv>")))
-   (clobber (match_scratch:DI 2 "=X,wi"))]
-  "TARGET_P9_VECTOR && TARGET_DIRECT_MOVE_64BIT"
+;; Keep the convert and store together through register allocation to prevent
+;; the register allocator from getting clever and doing a direct move to a GPR
+;; and then store for reg+offset stores.
+(define_insn_and_split "*fixuns_trunc<SFDF:mode><QHI:mode>2_mem"
+  [(set (match_operand:QHI 0 "memory_operand" "=Z")
+	(unsigned_fix:QHI (match_operand:SFDF 1 "gpc_reg_operand" "wa")))
+   (clobber (match_scratch:SI 2 "=wa"))]
+  "TARGET_P9_VECTOR"
   "#"
   "&& reload_completed"
-  [(const_int 0)]
+  [(set (match_dup 2)
+	(unsigned_fix:SI (match_dup 1)))
+   (set (match_dup 0)
+	(match_dup 3))]
 {
-  rtx dest = operands[0];
-  rtx src = operands[1];
-
-  if (vsx_register_operand (dest, <QHI:MODE>mode))
-    {
-      rtx di_dest = gen_rtx_REG (DImode, REGNO (dest));
-      emit_insn (gen_fixuns_trunc<SFDF:mode>di2 (di_dest, src));
-    }
-  else
-    {
-      rtx tmp = operands[2];
-      rtx tmp2 = gen_rtx_REG (<QHI:MODE>mode, REGNO (tmp));
-
-      emit_insn (gen_fixuns_trunc<SFDF:mode>di2 (tmp, src));
-      emit_move_insn (dest, tmp2);
-    }
-  DONE;
+  operands[3] = gen_rtx_REG (<QHI:MODE>mode, REGNO (operands[2]));
 })
 
-;; If -mvsx-small-integer, we can represent the FIX operation directly.  On
-;; older machines, we have to use an UNSPEC to produce a SImode and move it
-;; to another location, since SImode is not allowed in vector registers.
 (define_insn "*fctiw<u>z_<mode>_smallint"
   [(set (match_operand:SI 0 "vsx_register_operand" "=d,wi")
 	(any_fix:SI (match_operand:SFDF 1 "gpc_reg_operand" "<Ff>,<Fv>")))]
@@ -14386,6 +14394,15 @@  (define_insn "fix_<mode>si2_hw"
   [(set_attr "type" "vecfloat")
    (set_attr "size" "128")])
 
+(define_insn "fix_trunc<IEEE128:mode><QHI:mode>2"
+  [(set (match_operand:<QHI:MODE> 0 "altivec_register_operand" "=v")
+	(fix:QHI
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<IEEE128:MODE>mode)"
+  "xscvqpswz %0,%1"
+  [(set_attr "type" "vecfloat")
+   (set_attr "size" "128")])
+
 (define_insn "fixuns_<mode>si2_hw"
   [(set (match_operand:SI 0 "altivec_register_operand" "=v")
 	(unsigned_fix:SI (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
@@ -14394,17 +14411,40 @@  (define_insn "fixuns_<mode>si2_hw"
   [(set_attr "type" "vecfloat")
    (set_attr "size" "128")])
 
-;; Combiner pattern to prevent moving the result of converting an IEEE 128-bit
-;; floating point value to 32-bit integer to GPR in order to save it.
-(define_insn_and_split "*fix<uns>_<mode>_mem"
-  [(set (match_operand:SI 0 "memory_operand" "=Z")
-	(any_fix:SI (match_operand:IEEE128 1 "altivec_register_operand" "v")))
-   (clobber (match_scratch:SI 2 "=v"))]
+(define_insn "fixuns_trunc<IEEE128:mode><QHI:mode>2"
+  [(set (match_operand:<QHI:MODE> 0 "altivec_register_operand" "=v")
+	(unsigned_fix:QHI
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<IEEE128:MODE>mode)"
+  "xscvqpuwz %0,%1"
+  [(set_attr "type" "vecfloat")
+   (set_attr "size" "128")])
+
+;; Combiner patterns to prevent moving the result of converting an IEEE 128-bit
+;; floating point value to 8/16/32-bit integer to GPR in order to save it.
+(define_insn_and_split "*fix_trunc<IEEE128:mode><QHSI:mode>2_mem"
+  [(set (match_operand:QHSI 0 "memory_operand" "=Z")
+	(fix:QHSI
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))
+   (clobber (match_scratch:QHSI 2 "=v"))]
   "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
   "#"
   "&& reload_completed"
   [(set (match_dup 2)
-	(any_fix:SI (match_dup 1)))
+	(fix:QHSI (match_dup 1)))
+   (set (match_dup 0)
+	(match_dup 2))])
+
+(define_insn_and_split "*fixuns_trunc<IEEE128:mode><QHSI:mode>2_mem"
+  [(set (match_operand:QHSI 0 "memory_operand" "=Z")
+	(unsigned_fix:QHSI
+	 (match_operand:IEEE128 1 "altivec_register_operand" "v")))
+   (clobber (match_scratch:QHSI 2 "=v"))]
+  "TARGET_FLOAT128_HW && FLOAT128_IEEE_P (<MODE>mode)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 2)
+	(unsigned_fix:QHSI (match_dup 1)))
    (set (match_dup 0)
 	(match_dup 2))])
 
Index: gcc/testsuite/gcc.target/powerpc/pr84154-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr84154-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr84154-1.c	(revision 0)
@@ -0,0 +1,55 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mpower8-vector -O2" } */
+
+/* PR target/84154.  Make sure conversion to char/short does not generate a
+   store and a load on ISA 2.07 and newer systems.  */
+
+unsigned char
+double_to_uc (double x)
+{
+  return x;
+}
+
+signed char
+double_to_sc (double x)
+{
+  return x;
+}
+
+unsigned short
+double_to_us (double x)
+{
+  return x;
+}
+
+short
+double_to_ss (double x)
+{
+  return x;
+}
+
+unsigned int
+double_to_ui (double x)
+{
+  return x;
+}
+
+int
+double_to_si (double x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-times {\mextsb\M}                  1 } } */
+/* { dg-final { scan-assembler-times {\mextsh\M}                  1 } } */
+/* { dg-final { scan-assembler-times {\mfctiwuz\M|\mxscvdpuxws\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mfctiwz\M|\mxscvdpsxws\M}  3 } } */
+/* { dg-final { scan-assembler-times {\mmfvsrwz\M}                6 } } */
+/* { dg-final { scan-assembler-times {\mrlwinm\M}                 2 } } */
+/* { dg-final { scan-assembler-not   {\mlbz\M}                      } } */
+/* { dg-final { scan-assembler-not   {\mlhz\M}                      } } */
+/* { dg-final { scan-assembler-not   {\mlha\M}                      } } */
+/* { dg-final { scan-assembler-not   {\mmfvsrd\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mstw\M}                      } } */
Index: gcc/testsuite/gcc.target/powerpc/pr84154-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr84154-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr84154-2.c	(revision 0)
@@ -0,0 +1,58 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O2" } */
+
+/* PR target/84154.  Make sure on ISA 2.07 (power8) that we store the result of
+   a conversion to char/short using an offsettable address does not generate
+   direct moves for storing 32-bit integers, but does do a direct move for
+   8/16-bit integers.  */
+
+void
+double_to_uc (double x, unsigned char *p)
+{
+  p[3] = x;
+}
+
+void
+double_to_sc (double x, signed char *p)
+{
+  p[3] = x;
+}
+
+void
+double_to_us (double x, unsigned short *p)
+{
+  p[3] = x;
+}
+
+void
+double_to_ss (double x, short *p)
+{
+  p[3] = x;
+}
+
+void
+double_to_ui (double x, unsigned int *p)
+{
+  p[3] = x;
+}
+
+void
+double_to_si (double x, int *p)
+{
+  p[3] = x;
+}
+
+/* { dg-final { scan-assembler-times {\mfctiwuz\M|\mxscvdpuxws\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mfctiwz\M|\mxscvdpsxws\M}  3 } } */
+/* { dg-final { scan-assembler-times {\mmfvsrwz\M}                4 } } */
+/* { dg-final { scan-assembler-times {\mstfiwx\M|\mstxsiwx\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mstb\M}                    2 } } */
+/* { dg-final { scan-assembler-times {\msth\M}                    2 } } */
+/* { dg-final { scan-assembler-not   {\mlbz\M}                      } } */
+/* { dg-final { scan-assembler-not   {\mlhz\M}                      } } */
+/* { dg-final { scan-assembler-not   {\mlha\M}                      } } */
+/* { dg-final { scan-assembler-not   {\mmfvsrd\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mstw\M}                      } } */
Index: gcc/testsuite/gcc.target/powerpc/pr84154-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr84154-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr84154-3.c	(revision 0)
@@ -0,0 +1,60 @@ 
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+/* PR target/84154.  Make sure on ISA 3.0 we store the result of a conversion
+   to char/short using an offsettable address does not generate direct moves
+   for storing 8/16/32-bit integers.  */
+
+void
+double_to_uc (double x, unsigned char *p)
+{
+  p[3] = x;
+}
+
+void
+double_to_sc (double x, signed char *p)
+{
+  p[3] = x;
+}
+
+void
+double_to_us (double x, unsigned short *p)
+{
+  p[3] = x;
+}
+
+void
+double_to_ss (double x, short *p)
+{
+  p[3] = x;
+}
+
+void
+double_to_ui (double x, unsigned int *p)
+{
+  p[3] = x;
+}
+
+void
+double_to_si (double x, int *p)
+{
+  p[3] = x;
+}
+
+/* { dg-final { scan-assembler-times {\maddi\M}                   6 } } */
+/* { dg-final { scan-assembler-times {\mfctiwuz\M|\mxscvdpuxws\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mfctiwz\M|\mxscvdpsxws\M}  3 } } */
+/* { dg-final { scan-assembler-times {\mstfiwx\M|\mstxsiwx\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mstxsibx\M}                2 } } */
+/* { dg-final { scan-assembler-times {\mstxsihx\M}                2 } } */
+/* { dg-final { scan-assembler-not   {\mlbz\M}                      } } */
+/* { dg-final { scan-assembler-not   {\mlhz\M}                      } } */
+/* { dg-final { scan-assembler-not   {\mlha\M}                      } } */
+/* { dg-final { scan-assembler-not   {\mmfvsrwz\M}                  } } */
+/* { dg-final { scan-assembler-not   {\mmfvsrd\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mstw\M}                      } } */
+/* { dg-final { scan-assembler-not   {\mstb\M}                      } } */
+/* { dg-final { scan-assembler-not   {\msth\M}                      } } */