Message ID: 577B7212.6080809@foss.arm.com
State: New
On Tue, 5 Jul 2016, Kyrill Tkachov wrote:

> On 01/07/16 13:02, Richard Biener wrote:
> > On Thu, 30 Jun 2016, Kyrill Tkachov wrote:
> > 
> > > On 28/06/16 08:54, Richard Biener wrote:
> > > > On Thu, 16 Jun 2016, Kyrill Tkachov wrote:
> > > > 
> > > > > On 15/06/16 22:53, Marc Glisse wrote:
> > > > > > On Wed, 15 Jun 2016, Kyrill Tkachov wrote:
> > > > > > 
> > > > > > > This is a respin of
> > > > > > > https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html
> > > > > > > following feedback.
> > > > > > > I've changed the code to cast the operand to an unsigned type
> > > > > > > before applying the multiplication algorithm and cast it back
> > > > > > > to the signed type at the end.
> > > > > > > Whether to perform the cast is now determined by the function
> > > > > > > cast_mult_synth_to_unsigned in which I've implemented the cases
> > > > > > > that Marc mentioned in [1]. Please do let me know if there are
> > > > > > > any other cases that need to be handled.
> > > > > > 
> > > > > > Ah, I never meant those cases as an exhaustive list, I was just
> > > > > > looking for examples showing that the transformation was unsafe,
> > > > > > and those 2 came to mind:
> > > > > > 
> > > > > > - x*15 -> x*16-x: the second one can overflow even when the first
> > > > > >   one doesn't.
> > > > > > 
> > > > > > - x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe
> > > > > >   that's redundant with the negate_variant check?)
> > > > > > 
> > > > > > On the other hand, as long as we remain in the 'positive'
> > > > > > operations, turning x*3 to x<<1+x seems perfectly safe. And even
> > > > > > x*30 to (x*16-x)*2 cannot cause spurious overflows. But I didn't
> > > > > > look at the algorithm closely enough to characterize the safe
> > > > > > cases. Now if you have done it, that's good :-) Otherwise, we
> > > > > > might want to err on the side of caution.
> > > > > 
> > > > > I'll be honest, I didn't give it much thought beyond convincing
> > > > > myself that the two cases you listed are legitimate.
> > > > > Looking at expand_mult_const in expmed.c can be helpful (where it
> > > > > updates val_so_far for checking purposes) to see the different
> > > > > algorithm cases. I think the only steps that could cause overflow
> > > > > are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor, or when the
> > > > > final step is negate_variant, which are what you listed (and
> > > > > covered in this patch).
> > > > > 
> > > > > richi is away on PTO for the time being though, so we have some
> > > > > time to convince ourselves :)
> > > > 
> > > > ;) I think the easiest way would be to always use unsigned arithmetic.
> > > > 
> > > > While VRP doesn't do anything on vector operations we still have some
> > > > match.pd patterns that rely on correct overflow behavior and those
> > > > may be enabled for vector types as well.
> > > 
> > > That's fine with me.
> > > Here's the patch that performs the casts to unsigned and back when the
> > > input type doesn't wrap on overflow.
> > > 
> > > Bootstrapped and tested on arm, aarch64, x86_64.
> > > 
> > > How's this?
> > 
> > +static bool
> > +target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
> > +                                tree scaltype)
> > +{
> > ...
> > +  tree vectype = get_vectype_for_scalar_type (scaltype);
> > +
> > +  if (!vectype)
> > +    return false;
> > +
> > +  /* All synthesis algorithms require shifts, so bail out early if
> > +     target cannot vectorize them.
> > +     TODO: If the target doesn't support vector shifts but supports vector
> > +     addition we could synthesize shifts that way.
> > +     vect_synth_mult_by_constant would need to be updated to do that.  */
> > +  if (!vect_supportable_shift (LSHIFT_EXPR, scaltype))
> > +    return false;
> > 
> > I think these tests should be done in the caller before calling
> > synth_mult (and vectype be passed down accordingly).  Also I wonder
> > if synth_mult for a * 2 produces a << 1 or a + a - the case of
> > a * 2 -> a + a was what Marc's patch handled.  Guarding off LSHIFT_EXPR
> > support that early will make that fail on targets w/o vector shift.
> 
> I believe it depends on the relative rtx costs (which of course are not
> relevant at vector gimple level). Anyway, I've moved the check outside.
> 
> I've also added code to synthesise the shifts by additions when vector
> shift is not available (the new function is synth_lshift_by_additions).
> 
> > +  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
> > +  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
> > +
> > +  if (var == negate_variant
> > +      && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
> > +    return false;
> > 
> > I think we don't have any targets that support PLUS but not MINUS
> > or targets that do not support NEGATE.  OTOH double-checking doesn't
> > matter.
> > 
> > +apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
> > +                             stmt_vec_info stmt_vinfo)
> > +{
> > +  if (TREE_INT_CST_LOW (op2) == 0
> > 
> > integer_zerop (op2)
> 
> Ok
> 
> > +      && (code == LSHIFT_EXPR
> > +          || code == PLUS_EXPR))
> > 
> > +  tree itype = TREE_TYPE (op2);
> > 
> > I think it's dangerous to use the type of op2 given you synthesize
> > shifts as well.  Why not use the type of op1?
> 
> Yeah, we should be taking the type of op1. Fixed.
> 
> > +      /* TODO: Targets that don't support vector shifts could synthesize
> > +         them using vector additions.  target_supports_mult_synth_alg
> > +         would need to be updated to allow this.  */
> > +      switch (alg.op[i])
> > +        {
> > 
> > so I suppose one could at least special-case << 1 and always use
> > PLUS for that.
> 
> As said above, I added code to synthesize all constant shifts using
> additions if the target doesn't support vector shifts. Note that on
> targets that do support vector shifts we will still generate a shift.
> I think that's fair enough, but if you really want to special-case this
> to always generate an addition I suppose I can do that.
> 
> > Otherwise looks ok to me.  There is a PR with a testcase for
> > the x * 2 issue so please add that if it is fixed or amend the fix
> > if not ;)
> 
> Thanks for the review. I've added the * 2 and * 4 case.
> As for testing I've bootstrapped and tested the patch on aarch64 and
> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
> always true to exercise the paths that synthesize the shift by
> additions. Marc, could you test this on the sparc targets you care
> about?

Looks good.

Thanks,
Richard.

> Thanks,
> Kyrill
> 
> 2016-07-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>       PR target/65951
>       * tree-vect-patterns.c: Include mult-synthesis.h.
>       (target_supports_mult_synth_alg): New function.
>       (synth_lshift_by_additions): Likewise.
>       (apply_binop_and_append_stmt): Likewise.
>       (vect_synth_mult_by_constant): Likewise.
>       (target_has_vecop_for_code): Likewise.
>       (vect_recog_mult_pattern): Use above functions to synthesize vector
>       multiplication by integer constants.
> 
> 2016-07-05  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>       PR target/65951
>       * gcc.dg/vect/vect-mult-const-pattern-1.c: New test.
>       * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
>       * gcc.dg/vect/pr65951.c: Likewise.
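As a concrete scalar sketch of the overflow hazard discussed above (an
editorial illustration, not code from the patch; the function name
mult_by_15 is invented, and the patch itself emits GIMPLE pattern
statements rather than C): for x == INT_MAX / 15, x * 15 is well
defined, yet the synthesized x * 16 - x overflows at the shift step.
Doing the arithmetic in the corresponding unsigned type and casting
back, as the patch does, keeps every intermediate step defined:

/* x * 15 via the 16 - 1 decomposition.  The signed form (x << 4) - x is
   undefined behaviour for x == INT_MAX / 15 even though x * 15 is fine;
   the unsigned detour merely wraps.  */
int
mult_by_15 (int x)
{
  unsigned int ux = (unsigned int) x;   /* cast to the unsigned type */
  unsigned int t = (ux << 4) - ux;      /* every intermediate step wraps */
  return (int) t;                       /* cast back to the signed type */
}

GCC defines the conversion back to the signed type as reduction modulo
2^N, so the result equals x * 15 whenever the original multiplication
would not itself have overflowed.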
On Tue, 5 Jul 2016, Kyrill Tkachov wrote:

> As for testing I've bootstrapped and tested the patch on aarch64 and
> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
> always true to exercise the paths that synthesize the shift by
> additions. Marc, could you test this on the sparc targets you care
> about?

I don't have access to Sparc, you want Rainer here (added in Cc:).

(thanks for completing the patch!)
Marc Glisse <marc.glisse@inria.fr> writes:

> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>
>> As for testing I've bootstrapped and tested the patch on aarch64 and
>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>> always true to exercise the paths that synthesize the shift by
>> additions. Marc, could you test this on the sparc targets you care about?
>
> I don't have access to Sparc, you want Rainer here (added in Cc:).

As is, the patch doesn't even build on current mainline:

/vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 'mult_variant' has not been declared
 target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
                                                        ^

enum mult_variant is only declared in expmed.c.

	Rainer
On 05/07/16 12:24, Rainer Orth wrote:
> Marc Glisse <marc.glisse@inria.fr> writes:
>
>> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>>
>>> As for testing I've bootstrapped and tested the patch on aarch64 and
>>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>>> always true to exercise the paths that synthesize the shift by
>>> additions. Marc, could you test this on the sparc targets you care about?
>> I don't have access to Sparc, you want Rainer here (added in Cc:).
> As is, the patch doesn't even build on current mainline:
>
> /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 'mult_variant' has not been declared
>  target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
>                                                         ^
>
> enum mult_variant is only declared in expmed.c.

Ah, sorry I forgot to mention that this is patch 2/2.
It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
which is already approved, but I was planning to commit it together
with this one.
Can you please try applying
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html as well as this?

Thanks,
Kyrill

> Rainer
Hi Kyrill,

> On 05/07/16 12:24, Rainer Orth wrote:
>> Marc Glisse <marc.glisse@inria.fr> writes:
>>
>>> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>>>
>>>> As for testing I've bootstrapped and tested the patch on aarch64 and
>>>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>>>> always true to exercise the paths that synthesize the shift by
>>>> additions. Marc, could you test this on the sparc targets you care about?
>>> I don't have access to Sparc, you want Rainer here (added in Cc:).
>> As is, the patch doesn't even build on current mainline:
>>
>> /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 'mult_variant' has not been declared
>>  target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
>>                                                         ^
>>
>> enum mult_variant is only declared in expmed.c.
>
> Ah, sorry I forgot to mention that this is patch 2/2.
> It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
> which is already approved, but I was planning to commit it together
> with this one.
> Can you please try applying
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html as well as this?

sure, that did the trick.  A sparc-sun-solaris2.12 bootstrap revealed
that the patch fixes PR tree-optimization/70923 (you should mention that
in the ChangeLog or close it as a duplicate), with the same caveat as
about Marc's latest patch for that:

+FAIL: gcc.dg/vect/vect-iv-9.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1
+FAIL: gcc.dg/vect/vect-iv-9.c scan-tree-dump-times vect "vectorized 1 loops" 1

The message appears twice, not once, on sparc, so the testcase should be
updated to accommodate that.

Besides, the new testcase FAILs:

+FAIL: gcc.dg/vect/pr65951.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 2
+FAIL: gcc.dg/vect/pr65951.c scan-tree-dump-times vect "vectorized 1 loops" 2

The dump contains

gcc.dg/vect/pr65951.c:14:3: note: not vectorized: no vectype for stmt: _4 = *_3;
gcc.dg/vect/pr65951.c:12:1: note: vectorized 0 loops in function.
gcc.dg/vect/pr65951.c:21:3: note: not vectorized: no vectype for stmt: _4 = *_3;
gcc.dg/vect/pr65951.c:19:1: note: vectorized 0 loops in function.
gcc.dg/vect/pr65951.c:55:15: note: not vectorized: control flow in loop.
gcc.dg/vect/pr65951.c:46:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
gcc.dg/vect/pr65951.c:41:15: note: not vectorized: control flow in loop.
gcc.dg/vect/pr65951.c:32:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
gcc.dg/vect/pr65951.c:26:1: note: vectorized 0 loops in function.

	Rainer
On 06/07/16 13:31, Rainer Orth wrote:
> Hi Kyrill,
>
>> On 05/07/16 12:24, Rainer Orth wrote:
>>> Marc Glisse <marc.glisse@inria.fr> writes:
>>>
>>>> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>>>>
>>>>> As for testing I've bootstrapped and tested the patch on aarch64 and
>>>>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>>>>> always true to exercise the paths that synthesize the shift by
>>>>> additions. Marc, could you test this on the sparc targets you care about?
>>>> I don't have access to Sparc, you want Rainer here (added in Cc:).
>>> As is, the patch doesn't even build on current mainline:
>>>
>>> /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 'mult_variant' has not been declared
>>>  target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
>>>                                                         ^
>>>
>>> enum mult_variant is only declared in expmed.c.
>> Ah, sorry I forgot to mention that this is patch 2/2.
>> It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
>> which is already approved, but I was planning to commit it together
>> with this one.
>> Can you please try applying
>> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html as well as this?
> sure, that did the trick.  A sparc-sun-solaris2.12 bootstrap revealed
> that the patch fixes PR tree-optimization/70923 (you should mention that
> in the ChangeLog or close it as a duplicate), with the same caveat as
> about Marc's latest patch for that:

Thanks! Much appreciated.

> +FAIL: gcc.dg/vect/vect-iv-9.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 1
> +FAIL: gcc.dg/vect/vect-iv-9.c scan-tree-dump-times vect "vectorized 1 loops" 1
>
> The message appears twice, not once, on sparc, so the testcase should be
> updated to accommodate that.

Ok.

> Besides, the new testcase FAILs:
>
> +FAIL: gcc.dg/vect/pr65951.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 2
> +FAIL: gcc.dg/vect/pr65951.c scan-tree-dump-times vect "vectorized 1 loops" 2
>
> The dump contains
>
> gcc.dg/vect/pr65951.c:14:3: note: not vectorized: no vectype for stmt: _4 = *_3;
> gcc.dg/vect/pr65951.c:12:1: note: vectorized 0 loops in function.
> gcc.dg/vect/pr65951.c:21:3: note: not vectorized: no vectype for stmt: _4 = *_3;
> gcc.dg/vect/pr65951.c:19:1: note: vectorized 0 loops in function.
> gcc.dg/vect/pr65951.c:55:15: note: not vectorized: control flow in loop.
> gcc.dg/vect/pr65951.c:46:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
> gcc.dg/vect/pr65951.c:41:15: note: not vectorized: control flow in loop.
> gcc.dg/vect/pr65951.c:32:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed
> gcc.dg/vect/pr65951.c:26:1: note: vectorized 0 loops in function.

I see. I suppose SPARC doesn't have vector shifts operating on 64-bit
data?  I believe the testcase should be updated to use just "int" arrays
rather than "long long".  I'll respin the testcases.

Thanks again,
Kyrill

> Rainer
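Since synth_lshift_by_additions comes up repeatedly above, here is a
scalar model of the statement chain it builds (the function name
lshift_by_additions below is invented for illustration): a shift by
amnt becomes amnt self-additions, which is how x << n can be vectorized
on a target, like SPARC here, that lacks the corresponding vector shift.

/* Scalar model of synth_lshift_by_additions: doubling amnt times
   computes x << amnt using only additions.  Unsigned, so the wrapping
   semantics match the unsigned type the patch operates on.  */
unsigned long long
lshift_by_additions (unsigned long long x, int amnt)
{
  for (int i = 0; i < amnt; i++)
    x = x + x;  /* one vector PLUS per shift bit */
  return x;
}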
diff --git a/gcc/testsuite/gcc.dg/vect/pr65951.c b/gcc/testsuite/gcc.dg/vect/pr65951.c
new file mode 100644
index 0000000000000000000000000000000000000000..97ec1661d3f1220158dbc30bc2617e127dde47ae
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr65951.c
@@ -0,0 +1,63 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 512
+
+/* These multiplications should be vectorizable with additions when
+   no vector shift is available.  */
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= 2;
+}
+
+__attribute__ ((noinline)) void
+foo2 (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= 4;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / 2 != i)
+	__builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo2 (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / 4 != i)
+	__builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e5dba82d7fa955a6a37a0eabf980127e464ac77b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_shift } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= 123;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / 123 != i)
+	__builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect" { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..c5beabaa97425cc1e644d37a69eba65036eeaf4a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c
@@ -0,0 +1,40 @@
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 256
+
+__attribute__ ((noinline)) void
+foo (long long *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] *= -19594LL;
+}
+
+int
+main (void)
+{
+  check_vect ();
+  long long data[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    {
+      data[i] = i;
+      __asm__ volatile ("");
+    }
+
+  foo (data);
+  for (i = 0; i < N; i++)
+    {
+      if (data[i] / -19594LL != i)
+	__builtin_abort ();
+      __asm__ volatile ("");
+    }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect" { target aarch64*-*-* } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index f0c515daaa22f59e4bcefe1fcf3e44bf29fe9a40..3be1b89d5842955a4ef56f4a0e5e6f47046c5150 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -2131,32 +2131,313 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
   return pattern_stmt;
 }
 
-/* Detect multiplication by constant which are postive or negatives of power 2,
-   and convert them to shift patterns.
+/* Return true iff the target has a vector optab implementing the operation
+   CODE on type VECTYPE.  */
 
-   Mult with constants that are postive power of two.
-   type a_t;
-   type b_t
-   S1: b_t = a_t * n
+static bool
+target_has_vecop_for_code (tree_code code, tree vectype)
+{
+  optab voptab = optab_for_tree_code (code, vectype, optab_vector);
+  return voptab
+	 && optab_handler (voptab, TYPE_MODE (vectype)) != CODE_FOR_nothing;
+}
 
-   or
+/* Verify that the target has optabs of VECTYPE to perform all the steps
+   needed by the multiplication-by-immediate synthesis algorithm described by
+   ALG and VAR.  If SYNTH_SHIFT_P is true ensure that vector addition is
+   present.  Return true iff the target supports all the steps.  */
+
+static bool
+target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
+				tree vectype, bool synth_shift_p)
+{
+  if (alg->op[0] != alg_zero && alg->op[0] != alg_m)
+    return false;
+
+  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
+  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
+
+  if (var == negate_variant
+      && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
+    return false;
+
+  /* If we must synthesize shifts with additions make sure that vector
+     addition is available.  */
+  if ((var == add_variant || synth_shift_p) && !supports_vplus)
+    return false;
+
+  for (int i = 1; i < alg->ops; i++)
+    {
+      switch (alg->op[i])
+	{
+	case alg_shift:
+	  break;
+	case alg_add_t_m2:
+	case alg_add_t2_m:
+	case alg_add_factor:
+	  if (!supports_vplus)
+	    return false;
+	  break;
+	case alg_sub_t_m2:
+	case alg_sub_t2_m:
+	case alg_sub_factor:
+	  if (!supports_vminus)
+	    return false;
+	  break;
+	case alg_unknown:
+	case alg_m:
+	case alg_zero:
+	case alg_impossible:
+	  return false;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+
+  return true;
+}
+
+/* Synthesize a left shift of OP by AMNT bits using a series of additions and
+   putting the final result in DEST.  Append all statements but the last into
+   VINFO.  Return the last statement.  */
+
+static gimple *
+synth_lshift_by_additions (tree dest, tree op, HOST_WIDE_INT amnt,
+			   stmt_vec_info vinfo)
+{
+  HOST_WIDE_INT i;
+  tree itype = TREE_TYPE (op);
+  tree prev_res = op;
+  gcc_assert (amnt >= 0);
+  for (i = 0; i < amnt; i++)
+    {
+      tree tmp_var = (i < amnt - 1) ? vect_recog_temp_ssa_var (itype, NULL)
+				    : dest;
+      gimple *stmt
+	= gimple_build_assign (tmp_var, PLUS_EXPR, prev_res, prev_res);
+      prev_res = tmp_var;
+      if (i < amnt - 1)
+	append_pattern_def_seq (vinfo, stmt);
+      else
+	return stmt;
+    }
+  gcc_unreachable ();
+  return NULL;
+}
+
+/* Helper for vect_synth_mult_by_constant.  Apply a binary operation
+   CODE to operands OP1 and OP2, creating a new temporary SSA var in
+   the process if necessary.  Append the resulting assignment statements
+   to the sequence in STMT_VINFO.  Return the SSA variable that holds the
+   result of the binary operation.  If SYNTH_SHIFT_P is true synthesize
+   left shifts using additions.  */
+
+static tree
+apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
+			     stmt_vec_info stmt_vinfo, bool synth_shift_p)
+{
+  if (integer_zerop (op2)
+      && (code == LSHIFT_EXPR
+	  || code == PLUS_EXPR))
+    {
+      gcc_assert (TREE_CODE (op1) == SSA_NAME);
+      return op1;
+    }
+
+  gimple *stmt;
+  tree itype = TREE_TYPE (op1);
+  tree tmp_var = vect_recog_temp_ssa_var (itype, NULL);
+
+  if (code == LSHIFT_EXPR
+      && synth_shift_p)
+    {
+      stmt = synth_lshift_by_additions (tmp_var, op1, TREE_INT_CST_LOW (op2),
+					stmt_vinfo);
+      append_pattern_def_seq (stmt_vinfo, stmt);
+      return tmp_var;
+    }
+
+  stmt = gimple_build_assign (tmp_var, code, op1, op2);
+  append_pattern_def_seq (stmt_vinfo, stmt);
+  return tmp_var;
+}
+
+/* Synthesize a multiplication of OP by an INTEGER_CST VAL using shifts
+   and simple arithmetic operations to be vectorized.  Record the statements
+   produced in STMT_VINFO and return the last statement in the sequence or
+   NULL if it's not possible to synthesize such a multiplication.
+   This function mirrors the behavior of expand_mult_const in expmed.c but
+   works on tree-ssa form.  */
+
+static gimple *
+vect_synth_mult_by_constant (tree op, tree val,
+			     stmt_vec_info stmt_vinfo)
+{
+  tree itype = TREE_TYPE (op);
+  machine_mode mode = TYPE_MODE (itype);
+  struct algorithm alg;
+  mult_variant variant;
+  if (!tree_fits_shwi_p (val))
+    return NULL;
+
+  /* Multiplication synthesis by shifts, adds and subs can introduce
+     signed overflow where the original operation didn't.  Perform the
+     operations on an unsigned type and cast back to avoid this.
+     In the future we may want to relax this for synthesis algorithms
+     that we can prove do not cause unexpected overflow.  */
+  bool cast_to_unsigned_p = !TYPE_OVERFLOW_WRAPS (itype);
+
+  tree multtype = cast_to_unsigned_p ? unsigned_type_for (itype) : itype;
+
+  /* Targets that don't support vector shifts but support vector additions
+     can synthesize shifts that way.  */
+  bool synth_shift_p = !vect_supportable_shift (LSHIFT_EXPR, multtype);
+
+  HOST_WIDE_INT hwval = tree_to_shwi (val);
+  /* Use MAX_COST here as we don't want to limit the sequence on rtx costs.
+     The vectorizer's benefit analysis will decide whether it's beneficial
+     to do this.  */
+  bool possible = choose_mult_variant (mode, hwval, &alg,
+				       &variant, MAX_COST);
+  if (!possible)
+    return NULL;
 
-   Mult with constants that are negative power of two.
-   S2: b_t = a_t * -n
+  tree vectype = get_vectype_for_scalar_type (multtype);
+
+  if (!vectype
+      || !target_supports_mult_synth_alg (&alg, variant,
+					  vectype, synth_shift_p))
+    return NULL;
+
+  tree accumulator;
+
+  /* Clear out the sequence of statements so we can populate it below.  */
+  STMT_VINFO_PATTERN_DEF_SEQ (stmt_vinfo) = NULL;
+  gimple *stmt = NULL;
+
+  if (cast_to_unsigned_p)
+    {
+      tree tmp_op = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (tmp_op, CONVERT_EXPR, op);
+      append_pattern_def_seq (stmt_vinfo, stmt);
+      op = tmp_op;
+    }
+
+  if (alg.op[0] == alg_zero)
+    accumulator = build_int_cst (multtype, 0);
+  else
+    accumulator = op;
+
+  bool needs_fixup = (variant == negate_variant)
+		      || (variant == add_variant);
+
+  for (int i = 1; i < alg.ops; i++)
+    {
+      tree shft_log = build_int_cst (multtype, alg.log[i]);
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      tree tmp_var = NULL_TREE;
+
+      switch (alg.op[i])
+	{
+	case alg_shift:
+	  if (synth_shift_p)
+	    stmt
+	      = synth_lshift_by_additions (accum_tmp, accumulator, alg.log[i],
+					   stmt_vinfo);
+	  else
+	    stmt = gimple_build_assign (accum_tmp, LSHIFT_EXPR, accumulator,
+					shft_log);
+	  break;
+	case alg_add_t_m2:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, op, shft_log,
+					   stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
+				      tmp_var);
+	  break;
+	case alg_sub_t_m2:
+	  tmp_var = apply_binop_and_append_stmt (LSHIFT_EXPR, op,
+						 shft_log, stmt_vinfo,
+						 synth_shift_p);
+	  /* In some algorithms the first step involves zeroing the
+	     accumulator.  If subtracting from such an accumulator
+	     just emit the negation directly.  */
+	  if (integer_zerop (accumulator))
+	    stmt = gimple_build_assign (accum_tmp, NEGATE_EXPR, tmp_var);
+	  else
+	    stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, accumulator,
+					tmp_var);
+	  break;
+	case alg_add_t2_m:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
+					   stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, tmp_var, op);
+	  break;
+	case alg_sub_t2_m:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
+					   stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var, op);
+	  break;
+	case alg_add_factor:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
+					   stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator,
+				      tmp_var);
+	  break;
+	case alg_sub_factor:
+	  tmp_var
+	    = apply_binop_and_append_stmt (LSHIFT_EXPR, accumulator, shft_log,
+					   stmt_vinfo, synth_shift_p);
+	  stmt = gimple_build_assign (accum_tmp, MINUS_EXPR, tmp_var,
+				      accumulator);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      /* We don't want to append the last stmt in the sequence to stmt_vinfo
+	 but rather return it directly.  */
+
+      if ((i < alg.ops - 1) || needs_fixup || cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+      accumulator = accum_tmp;
+    }
+  if (variant == negate_variant)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (accum_tmp, NEGATE_EXPR, accumulator);
+      accumulator = accum_tmp;
+      if (cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+    }
+  else if (variant == add_variant)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (multtype, NULL);
+      stmt = gimple_build_assign (accum_tmp, PLUS_EXPR, accumulator, op);
+      accumulator = accum_tmp;
+      if (cast_to_unsigned_p)
+	append_pattern_def_seq (stmt_vinfo, stmt);
+    }
+  /* Move back to a signed if needed.  */
+  if (cast_to_unsigned_p)
+    {
+      tree accum_tmp = vect_recog_temp_ssa_var (itype, NULL);
+      stmt = gimple_build_assign (accum_tmp, CONVERT_EXPR, accumulator);
+    }
+
+  return stmt;
+}
+
+/* Detect multiplication by constant and convert it into a sequence of
+   shifts and additions, subtractions, negations.  We reuse the
+   choose_mult_variant algorithms from expmed.c
 
    Input/Output:
 
    STMTS: Contains a stmt from which the pattern search begins,
-   i.e. the mult stmt.  Convert the mult operation to LSHIFT if
-   constant operand is a power of 2.
-   type a_t, b_t
-   S1': b_t = a_t << log2 (n)
-
-   Convert the mult operation to LSHIFT and followed by a NEGATE
-   if constant operand is a negative power of 2.
-   type a_t, b_t, res_T;
-   S2': b_t = a_t << log2 (n)
-   S3': res_T  = - (b_t)
+   i.e. the mult stmt.
 
  Output:
@@ -2164,8 +2445,8 @@ vect_recog_vector_vector_shift_pattern (vec<gimple *> *stmts,
 
   * TYPE_OUT: The type of the output of this pattern.
 
-  * Return value: A new stmt that will be used to replace the multiplication
-    S1 or S2 stmt.  */
+  * Return value: A new stmt that will be used to replace
+    the multiplication.  */
 
 static gimple *
 vect_recog_mult_pattern (vec<gimple *> *stmts,
@@ -2173,11 +2454,8 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
 {
   gimple *last_stmt = stmts->pop ();
   tree oprnd0, oprnd1, vectype, itype;
-  gimple *pattern_stmt, *def_stmt;
-  optab optab;
+  gimple *pattern_stmt;
   stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
-  int power2_val, power2_neg_val;
-  tree shift;
 
   if (!is_gimple_assign (last_stmt))
     return NULL;
@@ -2201,52 +2479,17 @@ vect_recog_mult_pattern (vec<gimple *> *stmts,
 
   /* If the target can handle vectorized multiplication natively,
      don't attempt to optimize this.  */
-  optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
-  if (optab != unknown_optab)
+  optab mul_optab = optab_for_tree_code (MULT_EXPR, vectype, optab_default);
+  if (mul_optab != unknown_optab)
     {
       machine_mode vec_mode = TYPE_MODE (vectype);
-      int icode = (int) optab_handler (optab, vec_mode);
+      int icode = (int) optab_handler (mul_optab, vec_mode);
       if (icode != CODE_FOR_nothing)
-	return NULL;
+       return NULL;
     }
 
-  /* If target cannot handle vector left shift then we cannot
-     optimize and bail out.  */
-  optab = optab_for_tree_code (LSHIFT_EXPR, vectype, optab_vector);
-  if (!optab
-      || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
-    return NULL;
-
-  power2_val = wi::exact_log2 (oprnd1);
-  power2_neg_val = wi::exact_log2 (wi::neg (oprnd1));
-
-  /* Handle constant operands that are postive or negative powers of 2.  */
-  if (power2_val != -1)
-    {
-      shift = build_int_cst (itype, power2_val);
-      pattern_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       LSHIFT_EXPR, oprnd0, shift);
-    }
-  else if (power2_neg_val != -1)
-    {
-      /* If the target cannot handle vector NEGATE then we cannot
-	 do the optimization.  */
-      optab = optab_for_tree_code (NEGATE_EXPR, vectype, optab_vector);
-      if (!optab
-	  || optab_handler (optab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
-	return NULL;
-
-      shift = build_int_cst (itype, power2_neg_val);
-      def_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       LSHIFT_EXPR, oprnd0, shift);
-      new_pattern_def_seq (stmt_vinfo, def_stmt);
-      pattern_stmt
-	= gimple_build_assign (vect_recog_temp_ssa_var (itype, NULL),
-			       NEGATE_EXPR, gimple_assign_lhs (def_stmt));
-    }
-  else
+  pattern_stmt = vect_synth_mult_by_constant (oprnd0, oprnd1, stmt_vinfo);
+  if (!pattern_stmt)
     return NULL;
 
   /* Pattern detected.  */
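As a closing illustration of what the synthesized pattern computes,
consider Marc's x * -2 example from the start of the thread. Assuming
choose_mult_variant picks a multiply-by-2 algorithm with the
negate_variant fixup (a plausible trace, not verified against GCC's
output; the function name mult_by_minus_2_synth is invented), the
emitted sequence corresponds to:

/* Assumed scalar trace of the pattern for res = x * -2: the positive
   multiplication (one addition, since synth_shift_p may replace x << 1
   with a PLUS) followed by the negate_variant fixup, all unsigned.
   For x == 1 << 30, x * -2 == INT_MIN is representable, but the signed
   intermediate x * 2 would overflow; the unsigned detour does not.  */
int
mult_by_minus_2_synth (int x)
{
  unsigned int ux = (unsigned int) x;   /* cast_to_unsigned_p step */
  unsigned int t1 = ux + ux;            /* the positive x * 2, a single PLUS */
  unsigned int t2 = -t1;                /* negate_variant fixup */
  return (int) t2;                      /* cast back to the signed itype */
}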