Message ID | 20170919191636.GA4037@ibm-tiger.the-meissners.org
State      | New
Series     | Improve moving SFmode to GPR on PowerPC
Off list, Segher asked that I break the patch eliminating a shift right when transferring SFmode from a vector register to a GPR down into smaller chunks.  The power7 and power8 instructions that convert values in the double precision format to single precision actually duplicate the 32-bit result in the first and second words (the ISA says the second word is undefined).  We are in the process of issuing an update to ISA 3.0 to clarify that this will be the required behavior going forward.

I have broken the patches down into 8 chunks.  Some of the patches are just cosmetic changes for things I noticed while doing the main patch.  One patch eliminates the shift.  Another fixes up the peephole2 that optimizes putting a SFmode into a union and then doing masking on the value.  And the final patch updates the tests that need to be changed.

I have verified that each of these sub-patches builds, and after all 8 patches were applied, I did the full bootstrap and regression test; like the previous combination patch, there were no regressions.  If only some of the patches are applied, there will be 3 regressions until the remaining patches are applied.

This is patch #1.  Can I check this into the trunk?

I noticed that without this patch, the register allocator would sometimes do a store and then a load to move a SImode value from a vector register to a GPR and sign extend it, instead of doing the move and then the sign extension.

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.md (extendsi<mode>2): Add a splitter to do
	sign extension from a vector register to a GPR by doing a 32-bit
	direct move and then an EXTSW.
	(extendsi<mode>2 splitter): Likewise.
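To illustrate the kind of source that reaches this path, here is a minimal C sketch (mine, not from the patch; the function name is made up).  Reinterpreting a float's bits as a signed 32-bit integer and widening it to 64 bits is exactly a vector-register-to-GPR move followed by a sign extension:

```c
#include <string.h>
#include <stdint.h>

/* Reinterpret the bits of a float as int32_t, then widen to int64_t.
   On power8 the float argument arrives in a VSX register, so the
   compiler must move the 32-bit pattern to a GPR and sign extend it,
   ideally with a direct move plus EXTSW rather than a store/load.  */
int64_t
float_bits_sext (float x)
{
  int32_t bits;
  memcpy (&bits, &x, sizeof (bits));  /* bit-for-bit copy, no conversion */
  return bits;                        /* implicit sign extension to 64 bits */
}
```

The memcpy idiom avoids undefined behavior from pointer type punning; GCC compiles it to a plain register move.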
This is patch #2.  Can I check this into the trunk?

Compared to the previous patch, I simplified this to use zero_extendsidi2 instead of an UNSPEC, and I deleted the UNSPEC.

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.md (movsi_from_sf): Adjust code to
	eliminate doing a 32-bit shift right or vector extract after doing
	XSCVDPSPN.  Use zero_extendsidi2 instead of p8_mfvsrd_4_disf to
	move the value to the GPRs.
	(movdi_from_sf_zero_ext): Likewise.
	(reload_gpr_from_vsxsf): Likewise.
	(p8_mfvsrd_4_disf): Delete, no longer used.
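For comparison, a sketch (again mine, not from the patch) of the zero-extension case that *movdi_from_sf_zero_ext handles; since XSCVDPSPN leaves the 32-bit image in place, zero_extendsidi2 can finish the move without the old 32-bit shift right:

```c
#include <string.h>
#include <stdint.h>

/* Reinterpret a float's bits as uint32_t and widen to uint64_t.
   The upper 32 bits of the result must be zero, which is the
   zero_extendsidi2 half of the movdi_from_sf_zero_ext pattern.  */
uint64_t
float_bits_zext (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof (bits));  /* copy the SP bit pattern */
  return bits;                        /* zero extended to 64 bits */
}
```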
This is patch #3.  Can I check this into the trunk?

I noticed that sometimes you want to convert a DFmode value to SFmode and then move it to a GPR.  This patch adds a combiner insn to use the XSCVDPSP instruction (which does the proper rounding and handles out-of-bounds values) instead of the XSCVDPSPN instruction (which assumes the value has already been rounded, and does not raise an exception in the case of a NaN).

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/rs6000.md (movsi_from_df): Optimize converting a
	DFmode to a SFmode, and then needing to move the SFmode to a GPR,
	to use the XSCVDPSP instruction instead of FRSP and XSCVDPSPN.
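The rounding/out-of-range distinction can be seen at the source level with this sketch (my example, not the patch's testcase): a double too large for SFmode must become infinity, which is the behavior a real rounding conversion (XSCVDPSP) provides and the no-round XSCVDPSPN does not guarantee.

```c
#include <string.h>
#include <stdint.h>

/* Round a double to single precision and return the SP bit image in
   an integer.  Values too large for SFmode become +/-infinity, which
   is why the DF->SF step needs a genuine rounding conversion.  */
uint32_t
double_to_sp_bits (double d)
{
  float f = (float) d;               /* DFmode -> SFmode with rounding */
  uint32_t bits;
  memcpy (&bits, &f, sizeof (bits));
  return bits;
}
```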
This is patch #4.  Can I check this into the trunk?

This is a cosmetic change: I noticed the two insns providing XSCVSPDP were separated by another insn, so this moves the second insn to be adjacent to the first.

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_xscvspdp_scalar2): Move insn so that
	it is adjacent to the other XSCVSPDP insns.
This is patch #5.  Can I check this into the trunk?

In working on the patch, I noticed that the XSCVDPSP insn used "f" to limit the register to the traditional FPR registers.  This insn was written for power7, which did not support SFmode in Altivec registers.  Now that power8 supports SFmode in Altivec registers, this patch uses the proper constraint ("ww") so we can avoid a move instruction.

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_xscvdpsp_scalar): Use "ww" constraint
	instead of "f" to allow SFmode to be in traditional Altivec
	registers.
This is patch #6.  Can I check this into the trunk?

When I first did the power7 port years ago, I decorated a lot of the insns with two alternatives: one that used the type-specific constraint (i.e. "wf" for V4SF, "wd" for V2DF, etc.) and then the all-VSX constraint ("wa") as an alternative with '?'.  The theory was that we might want to favor Altivec or FPR registers for a given vector type.  However, we've never done this.  Furthermore, with the change to use LRA instead of reload, this isn't as useful.  So, as I encounter these dual alternatives, I have been eliminating them.

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (vsx_xscvdpspn): Eliminate useless
	alternative constraint.
	(vsx_xscvspdpn): Likewise.
	(vsx_xscvspdpn_scalar): Likewise.
This is patch #7.  Can I check this into the trunk?

This patch fixes the peephole2 to optimize code like the following (which shows up in the math library):

	float value;
	unsigned int u2;
	union {
	  float f;
	  unsigned int ui;
	} u;

	u.f = value;
	u2 = u.ui & 0x80000000;

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* config/rs6000/vsx.md (peephole for optimizing move SF to GPR):
	Adjust code to eliminate needing to do the shift right 32-bits
	operation after XSCVDPSPN.
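Written out as a compilable function, the idiom the peephole2 targets looks like this (a sketch of the pattern, not the actual libm code):

```c
#include <stdint.h>

/* Extract the IEEE sign bit of a float via the union idiom: the SF
   value's bits are viewed as an integer, then masked.  This is the
   SF-to-GPR move plus boolean-mask sequence the peephole2 rewrites.  */
uint32_t
sign_bit (float value)
{
  union { float f; uint32_t ui; } u;
  u.f = value;
  return u.ui & 0x80000000u;
}
```

Union type punning is well defined in C (and supported as a GNU extension in C++), so this compiles to the exact insn sequence the peephole2 is matching.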
This is patch #8.  Can I check this into the trunk?

It fixes the two tests (pr71977-1.c and direct-move-float1.c) that need to be adjusted with the previous patches applied.  It also adds a new test for combining a round from DFmode to SFmode with a move to a GPR.

2017-09-25  Michael Meissner  <meissner@linux.vnet.ibm.com>

	* gcc.target/powerpc/pr71977-1.c: Update test to know that we
	don't generate a 32-bit shift after doing XSCVDPSPN.
	* gcc.target/powerpc/direct-move-float1.c: Likewise.
	* gcc.target/powerpc/direct-move-float3.c: New test.
Hi!

On Tue, Sep 26, 2017 at 10:30:03AM -0400, Michael Meissner wrote:
> I have broken the patches down to 8 chunks.

Thanks for doing this.

> +(define_split
> +  [(set (match_operand:DI 0 "int_reg_operand")
> +	(sign_extend:DI (match_operand:SI 1 "vsx_register_operand")))]

Should be EXTSI instead of DI, for clarity.

> +  "TARGET_DIRECT_MOVE_64BIT && reload_completed"
> +  [(set (match_dup 2)
> +	(match_dup 1))
> +   (set (match_dup 0)
> +	(sign_extend:DI (match_dup 2)))]
> +{
> +  operands[2] = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
> +})

Okay for trunk with that.  Thanks!


Segher
Hi,

On Tue, Sep 26, 2017 at 10:32:03AM -0400, Michael Meissner wrote:
> 	* config/rs6000/rs6000.md (movsi_from_sf): Adjust code to
> 	eliminate doing a 32-bit shift right or vector extract after doing
> 	XSCVDPSPN.  Use zero_extendsidi2 instead of p8_mfvsrd_4_disf to
> 	move the value to the GPRs.
> 	(movdi_from_sf_zero_ext): Likewise.
> 	(reload_gpr_from_vsxsf): Likewise.

> --- gcc/config/rs6000/rs6000.md	(revision 253169)
> +++ gcc/config/rs6000/rs6000.md	(working copy)
> @@ -6806,25 +6806,25 @@ (define_insn "*movsi_internal1_single"
>  ;; needed.
>
>  ;; MR           LWZ          LFIWZX       LXSIWZX      STW
> -;; STFS         STXSSP       STXSSPX      VSX->GPR     MTVSRWZ
> -;; VSX->VSX
> +;; STFS         STXSSP       STXSSPX      VSX->GPR     VSX->VSX,
> +;; MTVSRWZ

(Typo: comma at end of line.)

> (define_insn_and_split "movsi_from_sf"
>   [(set (match_operand:SI 0 "nonimmediate_operand"
> 		"=r, r, ?*wI, ?*wH, m,
> -		 m, wY, Z, r, wIwH,
> -		 ?wK")
> +		 m, wY, Z, r, ?*wIwH,
> +		 wIwH")
>
>	(unspec:SI [(match_operand:SF 1 "input_operand"
> 		"r, m, Z, Z, r,
> -		 f, wb, wu, wIwH, r,
> -		 wK")]
> +		 f, wb, wu, wIwH, wIwH,
> +		 r")]
> 	       UNSPEC_SI_FROM_SF))
>
>    (clobber (match_scratch:V4SF 2
> 		"=X, X, X, X, X,
> -		 X, X, X, wa, X,
> -		 wa"))]
> +		 X, X, X, wIwH, X,
> +		 X"))]
>
>   "TARGET_NO_SF_SUBREG
>    && (register_operand (operands[0], SImode)
> @@ -6839,10 +6839,10 @@ (define_insn_and_split "movsi_from_sf"
>    stxssp %1,%0
>    stxsspx %x1,%y0
>    #
> -  mtvsrwz %x0,%1
> -  #"
> +  xscvdpspn %x0,%x1
> +  mtvsrwz %x0,%1"
>   "&& reload_completed
> -   && register_operand (operands[0], SImode)
> +   && int_reg_operand (operands[0], SImode)
>    && vsx_reg_sfsubreg_ok (operands[1], SFmode)"
>   [(const_int 0)]
> {

So you swap the last two alternatives.  Hrm okay.

> @@ -6850,52 +6850,41 @@ (define_insn_and_split "movsi_from_sf"
>   rtx op1 = operands[1];
>   rtx op2 = operands[2];
>   rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
> + rtx op2_si = gen_rtx_REG (SImode, REGNO (op2));
>

Does this work, btw?  I would expect you need reg_or_subregno, for op0
that is (the new op2 might be fine, not sure).  You do use it in most
places; please check.

> ;; movsi_from_sf with zero extension
> ;;
> ;; RLDICL       LWZ          LFIWZX       LXSIWZX      VSX->GPR
> -;; MTVSRWZ      VSX->VSX
> +;; VSX->VSX     MTVSRWZ
>
> (define_insn_and_split "*movdi_from_sf_zero_ext"
>   [(set (match_operand:DI 0 "gpc_reg_operand"
> 		"=r, r, ?*wI, ?*wH, r,
> -		 wIwH, ?wK")
> +		 wK, wIwH")

This loses the "?", is that on purpose?

>   [(set_attr "type"
> 	"*, load, fpload, fpload, mftgpr,
> -	 mffgpr, veclogical")
> +	 vecexts, mffgpr")

vecsimple or vecfloat I guess, not vecexts.  We have no way of describing
it exactly, of course.  Maybe just "two".

Okay for trunk with those things taken care of.  Thanks!


Segher
Hi!

On Tue, Sep 26, 2017 at 10:34:44AM -0400, Michael Meissner wrote:
> 	* config/rs6000/rs6000.md (movsi_from_df): Optimize converting a
> 	DFmode to a SFmode, and then needing to move the SFmode to a GPR
> 	to use the XSCVDPSP instruction instead of FRSP and XSCVDPSPN.

> --- gcc/config/rs6000/rs6000.md	(revision 253170)
> +++ gcc/config/rs6000/rs6000.md	(working copy)
> @@ -6919,6 +6919,26 @@ (define_insn_and_split "*movdi_from_sf_z
> 	"4, 4, 4, 4, 8,
> 	 8, 4")])
>
> +;; Like movsi_from_sf, but combine a convert from DFmode to SFmode before
> +;; moving it to SImode.  We can do a SFmode store without having to do the
> +;; conversion explicitly.  If we are doing a register->register conversion, use
> +;; XSCVDPSP instead of XSCVDPSPN, since the former handles cases where the
> +;; input will not fit in a SFmode, and the later assumes the value has already
> +;; been rounded.
> +(define_insn "*movsi_from_df"
> +  [(set (match_operand:SI 0 "nonimmediate_operand" "=wa,m,wY,Z")
> +	(unspec:SI [(float_truncate:SF
> +		      (match_operand:DF 1 "gpc_reg_operand" "wa, f,wb,wa"))]
> +		   UNSPEC_SI_FROM_SF))]

(The indentation is a bit broken here -- the DF line is indented a space
too many, and the constraint strings do not line up.)

Other than that: looks fine, thanks!


Segher
On Tue, Sep 26, 2017 at 10:36:34AM -0400, Michael Meissner wrote:
> 	* config/rs6000/vsx.md (vsx_xscvspdp_scalar2): Move insn so that
> 	it is adjacent to the other XSCVSPDP insns.

Okay for trunk.  Thanks,


Segher
Hi!

On Tue, Sep 26, 2017 at 10:39:06AM -0400, Michael Meissner wrote:
> 	* config/rs6000/vsx.md (vsx_xscvdpsp_scalar): Use "ww" constraint
> 	instead of "f" to allow SFmode to be in traditional Altivec
> 	registers.

Okay.  Thanks,


Segher
On Tue, Sep 26, 2017 at 10:44:24AM -0400, Michael Meissner wrote:
> 	* config/rs6000/vsx.md (vsx_xscvdpspn): Eliminate useless
> 	alternative constraint.
> 	(vsx_xscvspdpn): Likewise.
> 	(vsx_xscvspdpn_scalar): Likewise.

Okay, nice cleanup!  Thanks,


Segher
On Tue, Sep 26, 2017 at 10:50:14AM -0400, Michael Meissner wrote:
> 	* gcc.target/powerpc/pr71977-1.c: Update test to know that we
> 	don't generate a 32-bit shift after doing XSCVDPSPN.
> 	* gcc.target/powerpc/direct-move-float1.c: Likewise.
> 	* gcc.target/powerpc/direct-move-float3.c: New test.

> --- gcc/testsuite/gcc.target/powerpc/pr71977-1.c	(revision 253176)
> +++ gcc/testsuite/gcc.target/powerpc/pr71977-1.c	(working copy)
> @@ -23,9 +23,9 @@ mask_and_float_var (float f, uint32_t ma
>    return u.value;
>  }
>
> -/* { dg-final { scan-assembler "\[ \t\]xxland " } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]and " } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]mfvsrd " } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]stxv" } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]lxv" } } */
> -/* { dg-final { scan-assembler-not "\[ \t\]srdi " } } */
> +/* { dg-final { scan-assembler {\mxxland\M} } } */
> +/* { dg-final { scan-assembler-not {\mand\M} } } */
> +/* { dg-final { scan-assembler-not {\mmfvsrd\M} } } */
> +/* { dg-final { scan-assembler-not {\mstxv\M} } } */
> +/* { dg-final { scan-assembler-not {\mlxv\M} } } */
> +/* { dg-final { scan-assembler-not {\msrdi\M} } } */

Careful, you still want to disallow lxvx and stxvx -- so just remove the
\M from those patterns, I'd say (if that works :-) )

Okay for trunk with that fixed.  Thanks,


Segher
On Tue, Sep 26, 2017 at 11:06:09AM -0500, Segher Boessenkool wrote:
> > @@ -6850,52 +6850,41 @@ (define_insn_and_split "movsi_from_sf"
> >   rtx op1 = operands[1];
> >   rtx op2 = operands[2];
> >   rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
> > + rtx op2_si = gen_rtx_REG (SImode, REGNO (op2));
> >
>
> Does this work, btw?  I would expect you need reg_or_subregno, for op0
> that is (the new op2 might be fine, not sure).  You do use it in most
> places; please check.

It does work, but using reg_or_subregno is better.  I've changed it.

> > ;; movsi_from_sf with zero extension
> > ;;
> > ;; RLDICL       LWZ          LFIWZX       LXSIWZX      VSX->GPR
> > -;; MTVSRWZ      VSX->VSX
> > +;; VSX->VSX     MTVSRWZ
> >
> > (define_insn_and_split "*movdi_from_sf_zero_ext"
> >   [(set (match_operand:DI 0 "gpc_reg_operand"
> > 		"=r, r, ?*wI, ?*wH, r,
> > -		 wIwH, ?wK")
> > +		 wK, wIwH")
>
> This loses the "?", is that on purpose?

No.  I'll put it back (but I do think it is harmless, since the direct
move occurs before it).

In my original patches, I just left the value in a vector register, and
let the register allocator generate the direct move.  However, when I
updated the peephole2 (patch #7), it was a lot easier to write if I kept
in the direct move option.  But I missed putting the '?' back in.
Thanks.

> > [(set_attr "type"
> > 	"*, load, fpload, fpload, mftgpr,
> > -	 mffgpr, veclogical")
> > +	 vecexts, mffgpr")
>
> vecsimple or vecfloat I guess, not vecexts.  We have no way of describing
> it exactly, of course.  Maybe just "two".

Ok.

> Okay for trunk with those things taken care of.  Thanks!
On Tue, Sep 26, 2017 at 11:36:14AM -0500, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Sep 26, 2017 at 10:34:44AM -0400, Michael Meissner wrote:
> > 	* config/rs6000/rs6000.md (movsi_from_df): Optimize converting a
> > 	DFmode to a SFmode, and then needing to move the SFmode to a GPR
> > 	to use the XSCVDPSP instruction instead of FRSP and XSCVDPSPN.
>
> > --- gcc/config/rs6000/rs6000.md	(revision 253170)
> > +++ gcc/config/rs6000/rs6000.md	(working copy)
> > @@ -6919,6 +6919,26 @@ (define_insn_and_split "*movdi_from_sf_z
> > 	"4, 4, 4, 4, 8,
> > 	 8, 4")])
> >
> > +;; Like movsi_from_sf, but combine a convert from DFmode to SFmode before
> > +;; moving it to SImode.  We can do a SFmode store without having to do the
> > +;; conversion explicitly.  If we are doing a register->register conversion, use
> > +;; XSCVDPSP instead of XSCVDPSPN, since the former handles cases where the
> > +;; input will not fit in a SFmode, and the later assumes the value has already
> > +;; been rounded.
> > +(define_insn "*movsi_from_df"
> > +  [(set (match_operand:SI 0 "nonimmediate_operand" "=wa,m,wY,Z")
> > +	(unspec:SI [(float_truncate:SF
> > +		      (match_operand:DF 1 "gpc_reg_operand" "wa, f,wb,wa"))]
> > +		   UNSPEC_SI_FROM_SF))]
>
> (The indentation is a bit broken here -- DF line is indented a space too
> many, and the constraint strings do not line up).

That must be something funky with patches and tabs.  It looks ok after I
apply the patch (the match_operand:DF is indented one space under
float_truncate:SF).
On Tue, Sep 26, 2017 at 10:48:29AM -0400, Michael Meissner wrote:
> 	* config/rs6000/vsx.md (peephole for optimizing move SF to GPR):
> 	Adjust code to eliminate needing to do the shift right 32-bits
> 	operation after XSCVDPSPN.

After staring at this way too long...  Looks correct.  What a monster :-)

Okay for trunk.  Thanks!


Segher
On Tue, Sep 26, 2017 at 04:56:54PM -0500, Segher Boessenkool wrote:
> On Tue, Sep 26, 2017 at 10:48:29AM -0400, Michael Meissner wrote:
> > 	* config/rs6000/vsx.md (peephole for optimizing move SF to GPR):
> > 	Adjust code to eliminate needing to do the shift right 32-bits
> > 	operation after XSCVDPSPN.
>
> After staring at this way too long...  Looks correct.  What a monster :-)
>
> Okay for trunk.  Thanks!

Thanks for taking the time to verify it.

Yeah, it is a monster to get right.  It would be nice to put this off to a
separate MD pass, instead of abusing peephole2's.
On Tue, Sep 26, 2017 at 06:37:17PM -0400, Michael Meissner wrote:
> On Tue, Sep 26, 2017 at 04:56:54PM -0500, Segher Boessenkool wrote:
> > On Tue, Sep 26, 2017 at 10:48:29AM -0400, Michael Meissner wrote:
> > > 	* config/rs6000/vsx.md (peephole for optimizing move SF to GPR):
> > > 	Adjust code to eliminate needing to do the shift right 32-bits
> > > 	operation after XSCVDPSPN.
> >
> > After staring at this way too long...  Looks correct.  What a monster :-)
> >
> > Okay for trunk.  Thanks!
>
> Thanks for taking the time to verify it.
>
> Yeah, it is a monster to get right.  It would be nice to put this off to a
> separate MD pass, instead of abusing peephole2's.

Or maybe we can teach LRA to help us out here.  LRA dearly needs some
target hooks so the target can tell it what code sequences are good for
it.


Segher
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 252844)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -1781,6 +1781,15 @@ (define_insn "vsx_xscvspdp"
   "xscvspdp %x0,%x1"
   [(set_attr "type" "fp")])
 
+;; Same as vsx_xscvspdp, but use SF as the type
+(define_insn "vsx_xscvspdp_scalar2"
+  [(set (match_operand:SF 0 "vsx_register_operand" "=ww")
+	(unspec:SF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
+		   UNSPEC_VSX_CVSPDP))]
+  "VECTOR_UNIT_VSX_P (V4SFmode)"
+  "xscvspdp %x0,%x1"
+  [(set_attr "type" "fp")])
+
 ;; Generate xvcvhpsp instruction
 (define_insn "vsx_xvcvhpsp"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
@@ -1794,41 +1803,32 @@ (define_insn "vsx_xvcvhpsp"
 ;; format of scalars is actually DF.
 (define_insn "vsx_xscvdpsp_scalar"
   [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "f")]
+	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww")]
		     UNSPEC_VSX_CVSPDP))]
   "VECTOR_UNIT_VSX_P (V4SFmode)"
   "xscvdpsp %x0,%x1"
   [(set_attr "type" "fp")])
 
-;; Same as vsx_xscvspdp, but use SF as the type
-(define_insn "vsx_xscvspdp_scalar2"
-  [(set (match_operand:SF 0 "vsx_register_operand" "=ww")
-	(unspec:SF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
-		   UNSPEC_VSX_CVSPDP))]
-  "VECTOR_UNIT_VSX_P (V4SFmode)"
-  "xscvspdp %x0,%x1"
-  [(set_attr "type" "fp")])
-
 ;; ISA 2.07 xscvdpspn/xscvspdpn that does not raise an error on signalling NaNs
 (define_insn "vsx_xscvdpspn"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ww,?ww")
-	(unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "wd,wa")]
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=ww")
+	(unspec:V4SF [(match_operand:DF 1 "vsx_register_operand" "ws")]
		     UNSPEC_VSX_CVDPSPN))]
   "TARGET_XSCVDPSPN"
   "xscvdpspn %x0,%x1"
   [(set_attr "type" "fp")])
 
 (define_insn "vsx_xscvspdpn"
-  [(set (match_operand:DF 0 "vsx_register_operand" "=ws,?ws")
-	(unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wf,wa")]
+  [(set (match_operand:DF 0 "vsx_register_operand" "=ws")
+	(unspec:DF [(match_operand:V4SF 1 "vsx_register_operand" "wa")]
		   UNSPEC_VSX_CVSPDPN))]
   "TARGET_XSCVSPDPN"
   "xscvspdpn %x0,%x1"
   [(set_attr "type" "fp")])
 
 (define_insn "vsx_xscvdpspn_scalar"
-  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wf,?wa")
-	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww,ww")]
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa")
+	(unspec:V4SF [(match_operand:SF 1 "vsx_register_operand" "ww")]
		     UNSPEC_VSX_CVDPSPN))]
   "TARGET_XSCVDPSPN"
   "xscvdpspn %x0,%x1"
@@ -4773,15 +4773,13 @@ (define_constants
 ;;
 ;; (set (reg:DI reg3) (unspec:DI [(reg:V4SF reg2)] UNSPEC_P8V_RELOAD_FROM_VSX))
 ;;
-;; (set (reg:DI reg3) (lshiftrt:DI (reg:DI reg3) (const_int 32)))
+;; (set (reg:DI reg4) (and:DI (reg:DI reg3) (reg:DI reg3)))
 ;;
-;; (set (reg:DI reg5) (and:DI (reg:DI reg3) (reg:DI reg4)))
+;; (set (reg:DI reg5) (ashift:DI (reg:DI reg4) (const_int 32)))
 ;;
-;; (set (reg:DI reg6) (ashift:DI (reg:DI reg5) (const_int 32)))
+;; (set (reg:SF reg6) (unspec:SF [(reg:DI reg5)] UNSPEC_P8V_MTVSRD))
 ;;
-;; (set (reg:SF reg7) (unspec:SF [(reg:DI reg6)] UNSPEC_P8V_MTVSRD))
-;;
-;; (set (reg:SF reg7) (unspec:SF [(reg:SF reg7)] UNSPEC_VSX_CVSPDPN))
+;; (set (reg:SF reg6) (unspec:SF [(reg:SF reg6)] UNSPEC_VSX_CVSPDPN))
 
 (define_peephole2
   [(match_scratch:DI SFBOOL_TMP_GPR "r")
@@ -4792,11 +4790,6 @@ (define_peephole2
	(unspec:DI [(match_operand:V4SF SFBOOL_MFVSR_A "vsx_register_operand")]
		   UNSPEC_P8V_RELOAD_FROM_VSX))
 
-   ;; SRDI
-   (set (match_dup SFBOOL_MFVSR_D)
-	(lshiftrt:DI (match_dup SFBOOL_MFVSR_D)
-		     (const_int 32)))
-
    ;; AND/IOR/XOR operation on int
    (set (match_operand:SI SFBOOL_BOOL_D "int_reg_operand")
	(and_ior_xor:SI (match_operand:SI SFBOOL_BOOL_A1 "int_reg_operand")
@@ -4820,15 +4813,15 @@ (define_peephole2
    && (REG_P (operands[SFBOOL_BOOL_A2])
        || CONST_INT_P (operands[SFBOOL_BOOL_A2]))
    && (REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_MFVSR_D])
-       || peep2_reg_dead_p (3, operands[SFBOOL_MFVSR_D]))
+       || peep2_reg_dead_p (2, operands[SFBOOL_MFVSR_D]))
    && (REGNO (operands[SFBOOL_MFVSR_D]) == REGNO (operands[SFBOOL_BOOL_A1])
        || (REG_P (operands[SFBOOL_BOOL_A2])
	   && REGNO (operands[SFBOOL_MFVSR_D]) == REGNO (operands[SFBOOL_BOOL_A2])))
    && REGNO (operands[SFBOOL_BOOL_D]) == REGNO (operands[SFBOOL_SHL_A])
    && (REGNO (operands[SFBOOL_SHL_D]) == REGNO (operands[SFBOOL_BOOL_D])
-       || peep2_reg_dead_p (4, operands[SFBOOL_BOOL_D]))
-   && peep2_reg_dead_p (5, operands[SFBOOL_SHL_D])"
+       || peep2_reg_dead_p (3, operands[SFBOOL_BOOL_D]))
+   && peep2_reg_dead_p (4, operands[SFBOOL_SHL_D])"
   [(set (match_dup SFBOOL_TMP_GPR)
	(ashift:DI (match_dup SFBOOL_BOOL_A_DI)
		   (const_int 32)))
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 252844)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -986,8 +986,11 @@ (define_insn_and_split "*extendhi<mode>2
 
 (define_insn "extendsi<mode>2"
-  [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,wl,wu,wj,wK,wH")
-	(sign_extend:EXTSI (match_operand:SI 1 "lwa_operand" "Y,r,Z,Z,r,wK,wH")))]
+  [(set (match_operand:EXTSI 0 "gpc_reg_operand"
+	     "=r, r, wl, wu, wj, wK, wH, wr")
+
+	(sign_extend:EXTSI (match_operand:SI 1 "lwa_operand"
+	     "Y, r, Z, Z, r, wK, wH, ?wIwH")))]
   ""
   "@
    lwa%U1%X1 %0,%1
@@ -996,10 +999,23 @@ (define_insn "extendsi<mode>2"
    lxsiwax %x0,%y1
    mtvsrwa %x0,%1
    vextsw2d %0,%1
+   #
    #"
-  [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts,vecperm")
+  [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts,vecperm,mftgpr")
   (set_attr "sign_extend" "yes")
-  (set_attr "length" "4,4,4,4,4,4,8")])
+  (set_attr "length" "4,4,4,4,4,4,8,8")])
+
+(define_split
+  [(set (match_operand:DI 0 "int_reg_operand")
+	(sign_extend:DI (match_operand:SI 1 "vsx_register_operand")))]
+  "TARGET_DIRECT_MOVE_64BIT && reload_completed"
+  [(set (match_dup 2)
+	(match_dup 1))
+   (set (match_dup 0)
+	(sign_extend:DI (match_dup 2)))]
+{
+  operands[2] = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+})
 
 (define_split
   [(set (match_operand:DI 0 "altivec_register_operand")
@@ -6790,25 +6806,25 @@ (define_insn "*movsi_internal1_single"
 ;; needed.
 
 ;; MR           LWZ          LFIWZX       LXSIWZX      STW
-;; STFS         STXSSP       STXSSPX      VSX->GPR     MTVSRWZ
-;; VSX->VSX
+;; STFS         STXSSP       STXSSPX      VSX->GPR     VSX->VSX,
+;; MTVSRWZ
 
 (define_insn_and_split "movsi_from_sf"
   [(set (match_operand:SI 0 "nonimmediate_operand"
		"=r, r, ?*wI, ?*wH, m,
-		 m, wY, Z, r, wIwH,
-		 ?wK")
+		 m, wY, Z, r, ?*wIwH,
+		 wIwH")
 
	(unspec:SI [(match_operand:SF 1 "input_operand"
		"r, m, Z, Z, r,
-		 f, wb, wu, wIwH, r,
-		 wK")]
+		 f, wb, wu, wIwH, wIwH,
+		 r")]
		   UNSPEC_SI_FROM_SF))
 
    (clobber (match_scratch:V4SF 2
		"=X, X, X, X, X,
		 X, X, X, wa, X,
-		 wa"))]
+		 X"))]
 
  "TARGET_NO_SF_SUBREG
   && (register_operand (operands[0], SImode)
@@ -6823,10 +6839,10 @@ (define_insn_and_split "movsi_from_sf"
   stxssp %1,%0
   stxsspx %x1,%y0
   #
-  mtvsrwz %x0,%1
-  #"
+  xscvdpspn %x0,%x1
+  mtvsrwz %x0,%1"
  "&& reload_completed
-   && register_operand (operands[0], SImode)
+   && int_reg_operand (operands[0], SImode)
   && vsx_reg_sfsubreg_ok (operands[1], SFmode)"
  [(const_int 0)]
{
@@ -6836,50 +6852,38 @@ (define_insn_and_split "movsi_from_sf"
  rtx op1 = operands[1];
  rtx op2 = operands[2];
  rtx op0_di = gen_rtx_REG (DImode, REGNO (op0));
 
  emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
-
-  if (int_reg_operand (op0, SImode))
-    {
-      emit_insn (gen_p8_mfvsrd_4_disf (op0_di, op2));
-      emit_insn (gen_lshrdi3 (op0_di, op0_di, GEN_INT (32)));
-    }
-  else
-    {
-      rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1));
-      rtx byte_off = VECTOR_ELT_ORDER_BIG ? const0_rtx : GEN_INT (12);
-      emit_insn (gen_vextract4b (op0_di, op1_v16qi, byte_off));
-    }
-
+  emit_insn (gen_p8_mfvsrwz_disf (op0_di, op2));
  DONE;
}
  [(set_attr "type"
		"*, load, fpload, fpload, store,
-		 fpstore, fpstore, fpstore, mftgpr, mffgpr,
-		 veclogical")
+		 fpstore, fpstore, fpstore, mftgpr, fp,
+		 mffgpr")
   (set_attr "length"
		"4, 4, 4, 4, 4,
-		 4, 4, 4, 12, 4,
-		 8")])
+		 4, 4, 4, 8, 4,
+		 4")])
 
 ;; movsi_from_sf with zero extension
 ;;
 ;; RLDICL       LWZ          LFIWZX       LXSIWZX      VSX->GPR
-;; MTVSRWZ      VSX->VSX
+;; VSX->VSX     MTVSRWZ
 
 (define_insn_and_split "*movdi_from_sf_zero_ext"
   [(set (match_operand:DI 0 "gpc_reg_operand"
		"=r, r, ?*wI, ?*wH, r,
-		 wIwH, ?wK")
+		 wK, wIwH")
 
	(zero_extend:DI
	 (unspec:SI [(match_operand:SF 1 "input_operand"
		"r, m, Z, Z, wIwH,
-		 r, wK")]
+		 wIwH, r")]
		    UNSPEC_SI_FROM_SF)))
 
    (clobber (match_scratch:V4SF 2
		"=X, X, X, X, wa,
-		 X, wa"))]
+		 X, X"))]
 
  "TARGET_DIRECT_MOVE_64BIT
   && (register_operand (operands[0], DImode)
@@ -6890,9 +6894,10 @@ (define_insn_and_split "*movdi_from_sf_z
   lfiwzx %0,%y1
   lxsiwzx %x0,%y1
   #
-  mtvsrwz %x0,%1
-  #"
+  #
+  mtvsrwz %x0,%1"
  "&& reload_completed
+   && register_operand (operands[0], DImode)
   && vsx_reg_sfsubreg_ok (operands[1], SFmode)"
  [(const_int 0)]
{
@@ -6901,29 +6906,43 @@ (define_insn_and_split "*movdi_from_sf_z
  rtx op2 = operands[2];
 
  emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
-
  if (int_reg_operand (op0, DImode))
-    {
-      emit_insn (gen_p8_mfvsrd_4_disf (op0, op2));
-      emit_insn (gen_lshrdi3 (op0, op0, GEN_INT (32)));
-    }
+    emit_insn (gen_p8_mfvsrwz_disf (op0, op2));
  else
    {
-      rtx op0_si = gen_rtx_REG (SImode, REGNO (op0));
-      rtx op1_v16qi = gen_rtx_REG (V16QImode, REGNO (op1));
-      rtx byte_off = VECTOR_ELT_ORDER_BIG ? const0_rtx : GEN_INT (12);
-      emit_insn (gen_vextract4b (op0_si, op1_v16qi, byte_off));
+      rtx op2_si = gen_rtx_REG (SImode, reg_or_subregno (op2));
+      emit_insn (gen_zero_extendsidi2 (op0, op2_si));
    }
 
  DONE;
}
  [(set_attr "type"
		"*, load, fpload, fpload, mftgpr,
-		 mffgpr, veclogical")
+		 vecexts, mffgpr")
   (set_attr "length"
-		"4, 4, 4, 4, 12,
-		 4, 8")])
+		"4, 4, 4, 4, 8,
+		 8, 4")])
+
+;; Like movsi_from_sf, but combine a convert from DFmode to SFmode before
+;; moving it to SImode.  We can do a SFmode store without having to do the
+;; conversion explicitly.  If we are doing a register->register conversion, use
+;; XSCVDPSP instead of XSCVDPSPN, since the former handles cases where the
+;; input will not fit in a SFmode, and the later assumes the value has already
+;; been rounded.
+(define_insn "*movsi_from_df"
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=wa,m,wY,Z")
+	(unspec:SI [(float_truncate:SF
		      (match_operand:DF 1 "gpc_reg_operand" "wa, f,wb,wa"))]
+		   UNSPEC_SI_FROM_SF))]
+
+  "TARGET_NO_SF_SUBREG"
+  "@
+   xscvdpsp %x0,%x1
+   stfs%U0%X0 %1,%0
+   stxssp %1,%0
+   stxsspx %x1,%y0"
+  [(set_attr "type" "fp,fpstore,fpstore,fpstore")])
 
 ;; Split a load of a large constant into the appropriate two-insn
 ;; sequence.
@@ -8437,19 +8456,20 @@ (define_insn_and_split "reload_gpr_from_
   rtx diop0 = simplify_gen_subreg (DImode, op0, SFmode, 0);
 
   emit_insn (gen_vsx_xscvdpspn_scalar (op2, op1));
-  emit_insn (gen_p8_mfvsrd_4_disf (diop0, op2));
-  emit_insn (gen_lshrdi3 (diop0, diop0, GEN_INT (32)));
+  emit_insn (gen_p8_mfvsrwz_disf (diop0, op2));
   DONE;
 }
   [(set_attr "length" "12")
    (set_attr "type" "three")])
 
-(define_insn "p8_mfvsrd_4_disf"
+;; XSCVDPSPN puts the 32-bit value in both the first and second words, so we do
+;; not need to do a shift to extract the value.
+(define_insn "p8_mfvsrwz_disf" [(set (match_operand:DI 0 "register_operand" "=r") (unspec:DI [(match_operand:V4SF 1 "register_operand" "wa")] UNSPEC_P8V_RELOAD_FROM_VSX))] "TARGET_POWERPC64 && TARGET_DIRECT_MOVE" - "mfvsrd %0,%x1" + "mfvsrwz %0,%x1" [(set_attr "type" "mftgpr")]) Index: gcc/testsuite/gcc.target/powerpc/pr71977-1.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/pr71977-1.c (revision 252844) +++ gcc/testsuite/gcc.target/powerpc/pr71977-1.c (working copy) @@ -23,9 +23,9 @@ mask_and_float_var (float f, uint32_t ma return u.value; } -/* { dg-final { scan-assembler "\[ \t\]xxland " } } */ -/* { dg-final { scan-assembler-not "\[ \t\]and " } } */ -/* { dg-final { scan-assembler-not "\[ \t\]mfvsrd " } } */ -/* { dg-final { scan-assembler-not "\[ \t\]stxv" } } */ -/* { dg-final { scan-assembler-not "\[ \t\]lxv" } } */ -/* { dg-final { scan-assembler-not "\[ \t\]srdi " } } */ +/* { dg-final { scan-assembler {\mxxland\M} } } */ +/* { dg-final { scan-assembler-not {\mand\M} } } */ +/* { dg-final { scan-assembler-not {\mmfvsrwz\M} } } */ +/* { dg-final { scan-assembler-not {\mstxv\M} } } */ +/* { dg-final { scan-assembler-not {\mlxv\M} } } */ +/* { dg-final { scan-assembler-not {\msrdi\M} } } */ Index: gcc/testsuite/gcc.target/powerpc/direct-move-float1.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/direct-move-float1.c (revision 252844) +++ gcc/testsuite/gcc.target/powerpc/direct-move-float1.c (working copy) @@ -5,7 +5,7 @@ /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */ /* { dg-options "-mcpu=power8 -O2" } */ /* { dg-final { scan-assembler "mtvsrd" } } */ -/* { dg-final { scan-assembler "mfvsrd" } } */ +/* { dg-final { scan-assembler "mfvsrwz" } } */ /* { dg-final { scan-assembler "xscvdpspn" } } */ /* { dg-final { scan-assembler "xscvspdpn" } } */ Index: 
gcc/testsuite/gcc.target/powerpc/direct-move-float3.c =================================================================== --- gcc/testsuite/gcc.target/powerpc/direct-move-float3.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/direct-move-float3.c (revision 0) @@ -0,0 +1,28 @@ +/* { dg-do compile { target { powerpc*-*-linux* && lp64 } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } } */ +/* { dg-skip-if "" { powerpc*-*-*spe* } } */ +/* { dg-require-effective-target powerpc_p8vector_ok } */ +/* { dg-options "-mpower8-vector -O2" } */ + +/* Test that we generate XSCVDPSP instead of FRSP and XSCVDPSPN when we combine + a round from double to float and moving the float value to a GPR. */ + +union u { + float f; + unsigned int ui; + int si; +}; + +unsigned int +ui_d (double d) +{ + union u x; + x.f = d; + return x.ui; +} + +/* { dg-final { scan-assembler {\mmfvsrwz\M} } } */ +/* { dg-final { scan-assembler {\mxscvdpsp\M} } } */ +/* { dg-final { scan-assembler-not {\mmtvsrd\M} } } */ +/* { dg-final { scan-assembler-not {\mxscvdpspn\M} } } */ +/* { dg-final { scan-assembler-not {\msrdi\M} } } */