Message ID | 20200228053206.GA25306@ibm-toto.the-meissners.org |
---|---|
State | New |
Headers | show |
Series | , PR target/93937, Fix variable vec_extract insn that will never match | expand |
Hi! On Fri, Feb 28, 2020 at 12:32:06AM -0500, Michael Meissner wrote: > As part of my work in adding support for -mcpu=future, I noticed an insn that > would never match. > It will never match, because the zero_extend result is the same mode as the > input, so the machine independent parts of the compiler would never insert the > zero extend. It's not valid RTL, even: @findex zero_extend @item (zero_extend:@var{m} @var{x}) Represents the result of zero-extending the value @var{x} to machine mode @var{m}. @var{m} must be a fixed-point mode and @var{x} a fixed-point value of a mode narrower than @var{m}. > There is a wider issue to optimize all cases of vec_extract to do the sign, > zero, and float extension automatically when we are loading from memory, which > is PR target/93230. I have patches for all of the cases for 93230, but they > will need to wait until GCC 11 opens up. If you don't use reload_completed in the split condition you do not have this problem (in the normal case). Please work on that? > But it irks me to have this pattern that mostly was correct, but it would never > match. As I see it, there are 4 options: > > 1) Delete the insn completely, since it doesn't match, and then put in code > later to cover the case when PR target/93230 is done. Such a patch is pre-approved. > 3) Patch the existing insns so that they do match, but don't add all of the > other options that could be added (adding sign extension, adding the ability to > load the value into vector registers with ISA 2.07, optimizing vec_extract > being cnverted to floating point to avoid direct moves, etc.). > > 4) Do all of the possibilities now. The trouble is due to the number of > special cases, this can add up to a number of new insns (for example, dealing > with HImode/QImode also being zero extended to SImode in addition to DImode, > dealing with QImode not having a sign extending load, dealing with SImode that > can load into the vector registers at ISA 2.05/2.06 which HI/QI modes need > 2.07, etc.). > > Given we are in stage 4, I think #4 is not appropriate (but if you want, I can > do the patches). That goes for option 3 as well. Thanks, Segher
On Fri, Feb 28, 2020 at 06:45:25AM -0600, Segher Boessenkool wrote: > Hi! > > On Fri, Feb 28, 2020 at 12:32:06AM -0500, Michael Meissner wrote: > > As part of my work in adding support for -mcpu=future, I noticed an insn that > > would never match. > > > It will never match, because the zero_extend result is the same mode as the > > input, so the machine independent parts of the compiler would never insert the > > zero extend. > > It's not valid RTL, even: > @findex zero_extend > @item (zero_extend:@var{m} @var{x}) > Represents the result of zero-extending the value @var{x} > to machine mode @var{m}. @var{m} must be a fixed-point mode > and @var{x} a fixed-point value of a mode narrower than @var{m}. > > > There is a wider issue to optimize all cases of vec_extract to do the sign, > > zero, and float extension automatically when we are loading from memory, which > > is PR target/93230. I have patches for all of the cases for 93230, but they > > will need to wait until GCC 11 opens up. > > If you don't use reload_completed in the split condition you do not have > this problem (in the normal case). Please work on that? No. I tend to think that if we do the split before reload, that it will cause some regressions, because the register allocator will take the opportunity to change loads to vector registers to be loads to GPRs and direct moves. One of the original motivations for some of these patches is to avoid direct moves. I also worry that things like having to use SUBREG's before RA (instead of just changing the mode and/or the register number that we can do after reload) will not work because generally vectors and scalars aren't tieable.
On Mon, Mar 02, 2020 at 07:41:42PM -0500, Michael Meissner wrote: > On Fri, Feb 28, 2020 at 06:45:25AM -0600, Segher Boessenkool wrote: > > On Fri, Feb 28, 2020 at 12:32:06AM -0500, Michael Meissner wrote: > > > There is a wider issue to optimize all cases of vec_extract to do the sign, > > > zero, and float extension automatically when we are loading from memory, which > > > is PR target/93230. I have patches for all of the cases for 93230, but they > > > will need to wait until GCC 11 opens up. > > > > If you don't use reload_completed in the split condition you do not have > > this problem (in the normal case). Please work on that? > > No. I tend to think that if we do the split before reload, that it will cause > some regressions, because the register allocator will take the opportunity to > change loads to vector registers to be loads to GPRs and direct moves. It will cause better optimisations, yes. Earlier optimisations. The compiler will use a different register set if it thinks that is cheaper to do. We need to get that right *anyway*, because it is used from many more places. > One of > the original motivations for some of these patches is to avoid direct moves. You only need to do all of this manually because you split after reload. That is only a good thing to do if you *have* to, usually because some datum can end up in memory or in a register, and those are significantly differently for the machine code you need. This is not true here, because you do *not* allow both regs and mem (in the normal case). If you split earlier, you do not have to do all the main rtl optimisations (say cprop, fwprop, combine, insn selection in general) manually, as you do have to do to not get terrible code if you split after reload. > I also worry that things like having to use SUBREG's before RA (instead of just > changing the mode and/or the register number that we can do after reload) will > not work because generally vectors and scalars aren't tieable. You have pseudos before reload. Subregs work fine. If they weren't tieable you could not do this on hard regs either, so I don't see your point at all here? Segher
--- /tmp/DHFA3L_vsx.md 2020-02-27 23:48:34.871654766 -0500 +++ gcc/config/rs6000/vsx.md 2020-02-27 16:26:51.538724961 -0500 @@ -3749,15 +3749,16 @@ (define_insn_and_split "*vsx_extract_<mo } [(set_attr "type" "load")]) -(define_insn_and_split "*vsx_extract_<mode>_<VS_scalar>mode_var" - [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r,r") - (zero_extend:<VS_scalar> - (unspec:<VSX_EXTRACT_I:VS_scalar> - [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,Q") - (match_operand:DI 2 "gpc_reg_operand" "r,r,r")] +;; Variable V16QI/V8HI/V4SI extract from a register and zero extend to DImode. +(define_insn_and_split "*vsx_extract_<mode>_uns_di_var" + [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r") + (zero_extend:DI + (unspec:<VS_scalar> + [(match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,v") + (match_operand:DI 2 "gpc_reg_operand" "r,r")] UNSPEC_VSX_EXTRACT))) - (clobber (match_scratch:DI 3 "=r,r,&b")) - (clobber (match_scratch:V2DI 4 "=X,&v,X"))] + (clobber (match_scratch:DI 3 "=r,r")) + (clobber (match_scratch:V2DI 4 "=X,&v"))] "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I:MODE>mode) && TARGET_DIRECT_MOVE_64BIT" "#" "&& reload_completed" @@ -3769,7 +3770,29 @@ (define_insn_and_split "*vsx_extract_<mo operands[3], operands[4]); DONE; } - [(set_attr "isa" "p9v,*,*")]) + [(set_attr "isa" "p9v,*")]) + +;; Variable V16QI/V8HI/V4SI extract from memory and zero extend to DImode. +(define_insn_and_split "*vsx_extract_<mode>_uns_di_var_load" + [(set (match_operand:DI 0 "gpc_reg_operand" "=r") + (zero_extend:DI + (unspec:<VS_scalar> + [(match_operand:VSX_EXTRACT_I 1 "memory_operand" "Q") + (match_operand:DI 2 "gpc_reg_operand" "r")] + UNSPEC_VSX_EXTRACT))) + (clobber (match_scratch:DI 3 "=&b"))] + "VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT" + "#" + "&& reload_completed" + [(set (match_dup 0) + (zero_extend:DI (match_dup 4)))] +{ + machine_mode smode = <VS_scalar>mode; + rtx reg_small = gen_rtx_REG (smode, reg_or_subregno (operands[0])); + operands[4] = rs6000_adjust_vec_address (reg_small, operands[1], operands[2], + operands[3], <VS_scalar>mode); +} + [(set_attr "type" "load")]) ;; VSX_EXTRACT optimizations ;; Optimize double d = (double) vec_extract (vi, <n>) --- /tmp/4yG7Ho_fold-vec-extract-char.p8.c 2020-02-27 23:48:34.882654847 -0500 +++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c 2020-02-27 19:55:47.011685661 -0500 @@ -6,9 +6,9 @@ /* { dg-options "-mdejagnu-cpu=power8 -O2" } */ // six tests total. Targeting P8LE / P8BE. -// P8 LE variable offset: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, rlwinm, (extsb) +// P8 LE variable offset: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (extsb) // P8 LE constant offset: vspltb, mfvsrd, rlwinm, (extsb) -// P8 BE variable offset: sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, rlwinm, (extsb) +// P8 BE variable offset: sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (extsb) // P8 BE constant offset: vspltb, mfvsrd, rlwinm, (extsb) /* { dg-final { scan-assembler-times {\mrldicl\M} 3 { target { le } } } } */ @@ -21,7 +21,7 @@ /* { dg-final { scan-assembler-times {\msrdi\M} 3 { target lp64 } } } */ /* { dg-final { scan-assembler-times "extsb" 2 } } */ /* { dg-final { scan-assembler-times {\mvspltb\M} 3 { target lp64 } } } */ -/* { dg-final { scan-assembler-times {\mrlwinm\M} 4 { target lp64 } } } */ +/* { dg-final { scan-assembler-times {\mrlwinm\M} 2 { target lp64 } } } */ /* multiple codegen variations for -m32. */ /* { dg-final { scan-assembler-times {\mrlwinm\M} 3 { target ilp32 } } } */ --- /tmp/JCT8n1_fold-vec-extract-int.p8.c 2020-02-27 23:48:34.889654898 -0500 +++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c 2020-02-27 23:45:39.924371415 -0500 @@ -7,14 +7,14 @@ // Targeting P8 (LE) and (BE). 6 tests total. // P8 LE constant: vspltw, mfvsrwz, (1:extsw/2:rldicl) -// P8 LE variables: subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (1:extsw/5:rldicl)) +// P8 LE variables: subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (1:extsw/3:rldicl)) // P8 BE constant: vspltw, mfvsrwz, (1:extsw/2:rldicl) // P8 BE variables: sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (1:extsw/2:rldicl)) /* { dg-final { scan-assembler-times {\mvspltw\M} 3 { target lp64 } } } */ /* { dg-final { scan-assembler-times {\mmfvsrwz\M} 3 { target lp64 } } } */ -/* { dg-final { scan-assembler-times {\mrldicl\M} 7 { target { le } } } } */ -/* { dg-final { scan-assembler-times {\mrldicl\M} 4 { target { lp64 && be } } } } */ +/* { dg-final { scan-assembler-times {\mrldicl\M} 5 { target { le } } } } */ +/* { dg-final { scan-assembler-times {\mrldicl\M} 2 { target { lp64 && be } } } } */ /* { dg-final { scan-assembler-times {\msubfic\M} 3 { target { le } } } } */ /* { dg-final { scan-assembler-times {\msldi\M} 3 { target lp64 } } } */ /* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 { target lp64 } } } */ --- /tmp/bVbx5D_fold-vec-extract-short.p8.c 2020-02-27 23:48:34.896654950 -0500 +++ gcc/testsuite/gcc.target/powerpc/fold-vec-extract-short.p8.c 2020-02-27 23:44:15.883754923 -0500 @@ -6,10 +6,10 @@ /* { dg-options "-mdejagnu-cpu=power8 -O2" } */ // six tests total. Targeting P8, both LE and BE. -// p8 (le) variable offset: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, srdi, (1:extsh/2:rlwinm) -// p8 (le) const offset: mtvsrd, (1:extsh/2:rlwinm) -// p8 (be) var offset: sldi, mtvsrd, xxpermdi, vslo, mfvsrd, srdi, (1:extsh:2:rlwinm) -// p8 (be) const offset: vsplth, mfvsrd, (1:extsh/2:rlwinm) +// p8 (le) variable offset: rldicl, subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, srdi, (1:extsh/1:rlwinm) +// p8 (le) const offset: mtvsrd, (1:extsh/1:rlwinm) +// p8 (be) var offset: sldi, mtvsrd, xxpermdi, vslo, mfvsrd, srdi, (1:extsh:1:rlwinm) +// p8 (be) const offset: vsplth, mfvsrd, (1:extsh/1:rlwinm) // * - each of the above will have an extsh if the argument is signed. // * - bool and unsigned tests also have an rlwinm. @@ -24,7 +24,7 @@ /* { dg-final { scan-assembler-times "mfvsrd" 6 { target lp64 } } } */ /* { dg-final { scan-assembler-times "srdi" 3 { target lp64 } } } */ /* { dg-final { scan-assembler-times "extsh" 2 { target lp64 } } } */ -/* { dg-final { scan-assembler-times "rlwinm" 4 { target lp64 } } } */ +/* { dg-final { scan-assembler-times "rlwinm" 2 { target lp64 } } } */ /* -m32 codegen tests. */ /* { dg-final { scan-assembler-times {\mli\M} 6 { target ilp32 } } } */