Message ID | 5AE75555.6060303@foss.arm.com |
---|---|
State | New |
Headers | show |
Series | [tree-complex.c] PR tree-optimization/70291: Inline floating-point complex multiplication more aggressively | expand |
On Mon, Apr 30, 2018 at 7:41 PM, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote: > Hi all, > > We can improve the performance of complex floating-point multiplications by > inlining the expansion a bit more aggressively. > We can inline complex x = a * b as: > x = (ar*br - ai*bi) + i(ar*bi + br*ai); > if (isunordered (__real__ x, __imag__ x)) > x = __muldc3 (a, b); //Or __mulsc3 for single-precision > > That way the common case where no NaNs are produced we can avoid the libgcc > call and fall back to the > NaN handling stuff in libgcc if either components of the expansion are NaN. > > The implementation is done in expand_complex_multiplication in > tree-complex.c and the above expansion > will be done when optimising for -O1 and greater and when not optimising for > size. > At -O0 and -Os the single call to libgcc will be emitted. > > For the code: > __complex double > foo (__complex double a, __complex double b) > { > return a * b; > } > > We will now emit at -O2 for aarch64: > foo: > fmul d16, d1, d3 > fmul d6, d1, d2 > fnmsub d5, d0, d2, d16 > fmadd d4, d0, d3, d6 > fcmp d5, d4 > bvs .L8 > fmov d1, d4 > fmov d0, d5 > ret > .L8: > stp x29, x30, [sp, -16]! > mov x29, sp > bl __muldc3 > ldp x29, x30, [sp], 16 > ret > > Instead of just a branch to __muldc3. > > Bootstrapped and tested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf, > x86_64-unknown-linux-gnu. > > Ok for trunk? (GCC 9) + /* If optimizing for size or not at all just do a libcall. */ + if (optimize == 0 || optimize_function_for_size_p (cfun)) + { + expand_complex_libcall (gsi, ar, ai, br, bi, MULT_EXPR); + return; + } use optimize_bb_for_size_p instead please (get BB from the mult stmt). /* Expand a complex multiplication or division to a libcall to the c99 + compliant routines. Unlike expand_complex_libcall create and insert + the call, assign it to an output variable and return that rather than + modifying existing statements in place. */ + +static tree +insert_complex_mult_libcall (gimple_stmt_iterator *gsi, tree type, tree ar, + tree ai, tree br, tree bi) +{ can you please try merging both functions instead? Also it shows a possible issue if with -fnon-call-exceptions the original multiplication has EH edges. I think you want to side-step that by doing the libcall-only way in that case as well (stmt_can_throw_internal). + tree isunordered_decl = builtin_decl_explicit (BUILT_IN_ISUNORDERED); + tree isunordered_res = create_tmp_var (integer_type_node); + gimple *tmpr_unord_check + = gimple_build_call (isunordered_decl, 2, tmpr, tmpi); + gimple_call_set_lhs (tmpr_unord_check, isunordered_res); + + gsi_insert_before (gsi, tmpr_unord_check, GSI_SAME_STMT); + gimple *check + = gimple_build_cond (NE_EXPR, isunordered_res, integer_zero_node, + NULL_TREE, NULL_TREE); why use BUILT_IN_ISUNORDERED but not a GIMPLE_COND with UNORDERED_EXPR? Note again that might trap/throw with -fsignalling-nans so better avoid this transform for flag_signalling_nans as well... + /* We have a conditional block with some assignments in cond_bb. + Wire up the PHIs to wrap up. */ + if (gimple_in_ssa_p (cfun)) + { we are always in SSA form(?) (probably tree-complex.c can use some TLC here). + /* If we are not worrying about NaNs expand to + (ar*br - ai*bi) + i(ar*bi + br*ai) directly. */ + expand_complex_multiplication_limited_range (gsi, inner_type, ar, ai, + br, bi, &rr, &ri); I think the function is badly worded - this isn't about limited ranges, no? Which also means that we can dispatch to this simple variant not only for flag_complex_method != 2 but for !HONOR_NANS && !HONOR_INFINITIES? Maybe that should be done as followup. Richard. > Thanks, > Kyrill > > 2018-04-30 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > PR tree-optimization/70291 > * tree-complex.c (insert_complex_mult_libcall): New function. > (expand_complex_multiplication_limited_range): Likewise. > (expand_complex_multiplication): Expand floating-point complex > multiplication using the above. > > 2018-04-30 Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > PR tree-optimization/70291 > * gcc.dg/complex-6.c: New test. > * gcc.dg/complex-7.c: Likewise.
Hi Richard, On 02/05/18 11:15, Richard Biener wrote: > On Mon, Apr 30, 2018 at 7:41 PM, Kyrill Tkachov > <kyrylo.tkachov@foss.arm.com> wrote: >> Hi all, >> >> We can improve the performance of complex floating-point multiplications by >> inlining the expansion a bit more aggressively. >> We can inline complex x = a * b as: >> x = (ar*br - ai*bi) + i(ar*bi + br*ai); >> if (isunordered (__real__ x, __imag__ x)) >> x = __muldc3 (a, b); //Or __mulsc3 for single-precision >> >> That way the common case where no NaNs are produced we can avoid the libgcc >> call and fall back to the >> NaN handling stuff in libgcc if either components of the expansion are NaN. >> >> The implementation is done in expand_complex_multiplication in >> tree-complex.c and the above expansion >> will be done when optimising for -O1 and greater and when not optimising for >> size. >> At -O0 and -Os the single call to libgcc will be emitted. >> >> For the code: >> __complex double >> foo (__complex double a, __complex double b) >> { >> return a * b; >> } >> >> We will now emit at -O2 for aarch64: >> foo: >> fmul d16, d1, d3 >> fmul d6, d1, d2 >> fnmsub d5, d0, d2, d16 >> fmadd d4, d0, d3, d6 >> fcmp d5, d4 >> bvs .L8 >> fmov d1, d4 >> fmov d0, d5 >> ret >> .L8: >> stp x29, x30, [sp, -16]! >> mov x29, sp >> bl __muldc3 >> ldp x29, x30, [sp], 16 >> ret >> >> Instead of just a branch to __muldc3. >> >> Bootstrapped and tested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf, >> x86_64-unknown-linux-gnu. >> >> Ok for trunk? (GCC 9) > > + /* If optimizing for size or not at all just do a libcall. */ > + if (optimize == 0 || optimize_function_for_size_p (cfun)) > + { > + expand_complex_libcall (gsi, ar, ai, br, bi, MULT_EXPR); > + return; > + } > > use optimize_bb_for_size_p instead please (get BB from the mult stmt). > Done. > /* Expand a complex multiplication or division to a libcall to the c99 > + compliant routines. Unlike expand_complex_libcall create and insert > + the call, assign it to an output variable and return that rather than > + modifying existing statements in place. */ > + > +static tree > +insert_complex_mult_libcall (gimple_stmt_iterator *gsi, tree type, tree ar, > + tree ai, tree br, tree bi) > +{ > > can you please try merging both functions instead? > I tried that initially, it looked a bit awkward because we sometimes want to modify an existing assignment in-place, other times we want to insert it as a new statement. This latest version keeps only one function and adds an inplace_p parameter. > Also it shows a possible issue if with -fnon-call-exceptions the original > multiplication has EH edges. I think you want to side-step that > by doing the libcall-only way in that case as well (stmt_can_throw_internal). Ok, done. > > + tree isunordered_decl = builtin_decl_explicit (BUILT_IN_ISUNORDERED); > + tree isunordered_res = create_tmp_var (integer_type_node); > + gimple *tmpr_unord_check > + = gimple_build_call (isunordered_decl, 2, tmpr, tmpi); > + gimple_call_set_lhs (tmpr_unord_check, isunordered_res); > + > + gsi_insert_before (gsi, tmpr_unord_check, GSI_SAME_STMT); > + gimple *check > + = gimple_build_cond (NE_EXPR, isunordered_res, integer_zero_node, > + NULL_TREE, NULL_TREE); > > why use BUILT_IN_ISUNORDERED but not a GIMPLE_COND with > UNORDERED_EXPR? Note again that might trap/throw with -fsignalling-nans > so better avoid this transform for flag_signalling_nans as well... No reason, I was just not familiar enough with the tree-level expressions. Using UNORDERED_EXPR does look much better. > > + /* We have a conditional block with some assignments in cond_bb. > + Wire up the PHIs to wrap up. */ > + if (gimple_in_ssa_p (cfun)) > + { > > we are always in SSA form(?) (probably tree-complex.c can use some TLC here). I think we're always in SSA form. I suspected this was an artifact from the past. I've removed these conditionals in this latest version and testing doesn't show any problems. > > + /* If we are not worrying about NaNs expand to > + (ar*br - ai*bi) + i(ar*bi + br*ai) directly. */ > + expand_complex_multiplication_limited_range (gsi, inner_type, ar, ai, > + br, bi, &rr, &ri); > > I think the function is badly worded - this isn't about limited > ranges, no? Which > also means that we can dispatch to this simple variant not only for > flag_complex_method != 2 but for !HONOR_NANS && !HONOR_INFINITIES? > Maybe that should be done as followup. > Hmm, I've struggled a bit to come up with a good names. I've renamed it to expand_complex_multiplication_components. Is that more descriptive? I'm happy to enable this expansion path for !HONOR_NANS && !HONOR_INFINITIES as a separate patch. Bootstrapped and tested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf, x86_64-unknown-linux-gnu. Thanks, Kyrill 2018-05-02 Kyrylo Tkachov <kyrylo.tkachov@arm.com> PR tree-optimization/70291 * tree-complex.c (expand_complex_libcall): Add type, inplace_p arguments. Change return type to tree. Emit libcall as a new statement rather than replacing existing one when inplace_p is true. (expand_complex_multiplication_components): New function. (expand_complex_multiplication): Expand floating-point complex multiplication using the above. (expand_complex_division): Rename inner_type parameter to type. Update expand_complex_libcall call-site. (expand_complex_operations_1): Update expand_complex_multiplication and expand_complex_division call-sites. 2018-05-02 Kyrylo Tkachov <kyrylo.tkachov@arm.com> PR tree-optimization/70291 * gcc.dg/complex-6.c: New test. * gcc.dg/complex-7.c: Likewise. diff --git a/gcc/testsuite/gcc.dg/complex-6.c b/gcc/testsuite/gcc.dg/complex-6.c new file mode 100644 index 0000000000000000000000000000000000000000..e70322bf6f378d1d6947de4c10f671bc0a7ded49 --- /dev/null +++ b/gcc/testsuite/gcc.dg/complex-6.c @@ -0,0 +1,13 @@ +/* PR tree-optimization/70291. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-cplxlower" } */ + +__complex float +foo (__complex float a, __complex float b) +{ + return a * b; +} + +/* { dg-final { scan-tree-dump-times "unord" 1 "cplxlower1" } } */ +/* { dg-final { scan-tree-dump-times "__mulsc3" 1 "cplxlower1" } } */ diff --git a/gcc/testsuite/gcc.dg/complex-7.c b/gcc/testsuite/gcc.dg/complex-7.c new file mode 100644 index 0000000000000000000000000000000000000000..78f1a290e34af0c092a00639d979bc32b332b1da --- /dev/null +++ b/gcc/testsuite/gcc.dg/complex-7.c @@ -0,0 +1,13 @@ +/* PR tree-optimization/70291. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-cplxlower" } */ + +__complex double +foo (__complex double a, __complex double b) +{ + return a * b; +} + +/* { dg-final { scan-tree-dump-times "unord" 1 "cplxlower1" } } */ +/* { dg-final { scan-tree-dump-times "__muldc3" 1 "cplxlower1" } } */ diff --git a/gcc/tree-complex.c b/gcc/tree-complex.c index 622b8696399b9e9d8bddcc6340d2f8d8ca852637..d194493ff859783678036ca3527ea7942eecc869 100644 --- a/gcc/tree-complex.c +++ b/gcc/tree-complex.c @@ -978,22 +978,22 @@ expand_complex_addition (gimple_stmt_iterator *gsi, tree inner_type, } /* Expand a complex multiplication or division to a libcall to the c99 - compliant routines. */ + compliant routines. TYPE is the complex type of the operation. + If INPLACE_P replace the statement at GSI with + the libcall and return NULL_TREE. Else insert the call, assign its + result to an output variable and return that variable. If INPLACE_P + is true then the statement being replaced should be an assignment + statement. */ -static void -expand_complex_libcall (gimple_stmt_iterator *gsi, tree ar, tree ai, - tree br, tree bi, enum tree_code code) +static tree +expand_complex_libcall (gimple_stmt_iterator *gsi, tree type, tree ar, tree ai, + tree br, tree bi, enum tree_code code, bool inplace_p) { machine_mode mode; enum built_in_function bcode; - tree fn, type, lhs; - gimple *old_stmt; + tree fn, lhs; gcall *stmt; - old_stmt = gsi_stmt (*gsi); - lhs = gimple_assign_lhs (old_stmt); - type = TREE_TYPE (lhs); - mode = TYPE_MODE (type); gcc_assert (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT); @@ -1008,21 +1008,65 @@ expand_complex_libcall (gimple_stmt_iterator *gsi, tree ar, tree ai, fn = builtin_decl_explicit (bcode); stmt = gimple_build_call (fn, 4, ar, ai, br, bi); - gimple_call_set_lhs (stmt, lhs); - update_stmt (stmt); - gsi_replace (gsi, stmt, false); - if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)) - gimple_purge_dead_eh_edges (gsi_bb (*gsi)); - if (gimple_in_ssa_p (cfun)) + if (inplace_p) { + gimple *old_stmt = gsi_stmt (*gsi); + lhs = gimple_assign_lhs (old_stmt); + gimple_call_set_lhs (stmt, lhs); + update_stmt (stmt); + gsi_replace (gsi, stmt, false); + + if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)) + gimple_purge_dead_eh_edges (gsi_bb (*gsi)); + type = TREE_TYPE (type); update_complex_components (gsi, stmt, - build1 (REALPART_EXPR, type, lhs), - build1 (IMAGPART_EXPR, type, lhs)); + build1 (REALPART_EXPR, type, lhs), + build1 (IMAGPART_EXPR, type, lhs)); SSA_NAME_DEF_STMT (lhs) = stmt; + return NULL_TREE; } + + lhs = create_tmp_var (type); + gimple_call_set_lhs (stmt, lhs); + + lhs = make_ssa_name (lhs, stmt); + gimple_call_set_lhs (stmt, lhs); + + update_stmt (stmt); + gsi_insert_before (gsi, stmt, GSI_SAME_STMT); + return lhs; +} + +/* Perform a complex multiplication on two complex constants A, B represented + by AR, AI, BR, BI of type TYPE. + The operation we want is: a * b = (ar*br - ai*bi) + i(ar*bi + br*ai). + Insert the GIMPLE statements into GSI. Store the real and imaginary + components of the result into RR and RI. */ + +static void +expand_complex_multiplication_components (gimple_stmt_iterator *gsi, + tree type, tree ar, tree ai, + tree br, tree bi, + tree *rr, tree *ri) +{ + tree t1, t2, t3, t4; + + t1 = gimplify_build2 (gsi, MULT_EXPR, type, ar, br); + t2 = gimplify_build2 (gsi, MULT_EXPR, type, ai, bi); + t3 = gimplify_build2 (gsi, MULT_EXPR, type, ar, bi); + + /* Avoid expanding redundant multiplication for the common + case of squaring a complex number. */ + if (ar == br && ai == bi) + t4 = t3; + else + t4 = gimplify_build2 (gsi, MULT_EXPR, type, ai, br); + + *rr = gimplify_build2 (gsi, MINUS_EXPR, type, t1, t2); + *ri = gimplify_build2 (gsi, PLUS_EXPR, type, t3, t4); } /* Expand complex multiplication to scalars: @@ -1030,11 +1074,12 @@ expand_complex_libcall (gimple_stmt_iterator *gsi, tree ar, tree ai, */ static void -expand_complex_multiplication (gimple_stmt_iterator *gsi, tree inner_type, +expand_complex_multiplication (gimple_stmt_iterator *gsi, tree type, tree ar, tree ai, tree br, tree bi, complex_lattice_t al, complex_lattice_t bl) { tree rr, ri; + tree inner_type = TREE_TYPE (type); if (al < bl) { @@ -1080,27 +1125,79 @@ expand_complex_multiplication (gimple_stmt_iterator *gsi, tree inner_type, case PAIR (VARYING, VARYING): if (flag_complex_method == 2 && SCALAR_FLOAT_TYPE_P (inner_type)) { - expand_complex_libcall (gsi, ar, ai, br, bi, MULT_EXPR); - return; - } - else - { - tree t1, t2, t3, t4; - - t1 = gimplify_build2 (gsi, MULT_EXPR, inner_type, ar, br); - t2 = gimplify_build2 (gsi, MULT_EXPR, inner_type, ai, bi); - t3 = gimplify_build2 (gsi, MULT_EXPR, inner_type, ar, bi); - - /* Avoid expanding redundant multiplication for the common - case of squaring a complex number. */ - if (ar == br && ai == bi) - t4 = t3; - else - t4 = gimplify_build2 (gsi, MULT_EXPR, inner_type, ai, br); + /* If optimizing for size or not at all just do a libcall. + Same if there are exception-handling edges or signaling NaNs. */ + if (optimize == 0 || optimize_bb_for_size_p (gsi_bb (*gsi)) + || stmt_can_throw_internal (gsi_stmt (*gsi)) + || flag_signaling_nans) + { + expand_complex_libcall (gsi, type, ar, ai, br, bi, + MULT_EXPR, true); + return; + } - rr = gimplify_build2 (gsi, MINUS_EXPR, inner_type, t1, t2); - ri = gimplify_build2 (gsi, PLUS_EXPR, inner_type, t3, t4); + /* Else, expand x = a * b into + x = (ar*br - ai*bi) + i(ar*bi + br*ai); + if (isunordered (__real__ x, __imag__ x)) + x = __muldc3 (a, b); */ + + tree tmpr, tmpi; + expand_complex_multiplication_components (gsi, inner_type, ar, ai, + br, bi, &tmpr, &tmpi); + + gimple *check + = gimple_build_cond (UNORDERED_EXPR, tmpr, tmpi, + NULL_TREE, NULL_TREE); + + basic_block orig_bb = gsi_bb (*gsi); + /* We want to keep track of the original complex multiplication + statement as we're going to modify it later in + update_complex_assignment. Make sure that insert_cond_bb leaves + that statement in the join block. */ + gsi_prev (gsi); + basic_block cond_bb + = insert_cond_bb (gsi_bb (*gsi), gsi_stmt (*gsi), check, + profile_probability::very_unlikely ()); + + + gimple_stmt_iterator cond_bb_gsi = gsi_last_bb (cond_bb); + gsi_insert_after (&cond_bb_gsi, gimple_build_nop (), GSI_NEW_STMT); + + tree libcall_res + = expand_complex_libcall (&cond_bb_gsi, type, ar, ai, br, + bi, MULT_EXPR, false); + tree cond_real = gimplify_build1 (&cond_bb_gsi, REALPART_EXPR, + inner_type, libcall_res); + tree cond_imag = gimplify_build1 (&cond_bb_gsi, IMAGPART_EXPR, + inner_type, libcall_res); + + basic_block join_bb = single_succ_edge (cond_bb)->dest; + *gsi = gsi_start_nondebug_after_labels_bb (join_bb); + + /* We have a conditional block with some assignments in cond_bb. + Wire up the PHIs to wrap up. */ + rr = create_tmp_var (inner_type); + rr = make_ssa_name (rr); + ri = create_tmp_var (inner_type); + ri = make_ssa_name (ri); + edge cond_to_join = single_succ_edge (cond_bb); + edge orig_to_join = find_edge (orig_bb, join_bb); + + gphi *real_phi = create_phi_node (rr, gsi_bb (*gsi)); + add_phi_arg (real_phi, cond_real, cond_to_join, + UNKNOWN_LOCATION); + add_phi_arg (real_phi, tmpr, orig_to_join, UNKNOWN_LOCATION); + + gphi *imag_phi = create_phi_node (ri, gsi_bb (*gsi)); + add_phi_arg (imag_phi, cond_imag, cond_to_join, + UNKNOWN_LOCATION); + add_phi_arg (imag_phi, tmpi, orig_to_join, UNKNOWN_LOCATION); } + else + /* If we are not worrying about NaNs expand to + (ar*br - ai*bi) + i(ar*bi + br*ai) directly. */ + expand_complex_multiplication_components (gsi, inner_type, ar, ai, + br, bi, &rr, &ri); break; default: @@ -1308,13 +1405,14 @@ expand_complex_div_wide (gimple_stmt_iterator *gsi, tree inner_type, /* Expand complex division to scalars. */ static void -expand_complex_division (gimple_stmt_iterator *gsi, tree inner_type, +expand_complex_division (gimple_stmt_iterator *gsi, tree type, tree ar, tree ai, tree br, tree bi, enum tree_code code, complex_lattice_t al, complex_lattice_t bl) { tree rr, ri; + tree inner_type = TREE_TYPE (type); switch (PAIR (al, bl)) { case PAIR (ONLY_REAL, ONLY_REAL): @@ -1362,7 +1460,7 @@ expand_complex_division (gimple_stmt_iterator *gsi, tree inner_type, case 2: if (SCALAR_FLOAT_TYPE_P (inner_type)) { - expand_complex_libcall (gsi, ar, ai, br, bi, code); + expand_complex_libcall (gsi, type, ar, ai, br, bi, code, true); break; } /* FALLTHRU */ @@ -1630,7 +1728,7 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi) break; case MULT_EXPR: - expand_complex_multiplication (gsi, inner_type, ar, ai, br, bi, al, bl); + expand_complex_multiplication (gsi, type, ar, ai, br, bi, al, bl); break; case TRUNC_DIV_EXPR: @@ -1638,7 +1736,7 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi) case FLOOR_DIV_EXPR: case ROUND_DIV_EXPR: case RDIV_EXPR: - expand_complex_division (gsi, inner_type, ar, ai, br, bi, code, al, bl); + expand_complex_division (gsi, type, ar, ai, br, bi, code, al, bl); break; case NEGATE_EXPR:
diff --git a/gcc/testsuite/gcc.dg/complex-6.c b/gcc/testsuite/gcc.dg/complex-6.c new file mode 100644 index 0000000000000000000000000000000000000000..123b2a8206f098e7140792375830ff5f01f30cf6 --- /dev/null +++ b/gcc/testsuite/gcc.dg/complex-6.c @@ -0,0 +1,13 @@ +/* PR tree-optimization/70291. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-cplxlower" } */ + +__complex float +foo (__complex float a, __complex float b) +{ + return a * b; +} + +/* { dg-final { scan-tree-dump-times "__builtin_isunordered" 1 "cplxlower1" } } */ +/* { dg-final { scan-tree-dump-times "__mulsc3" 1 "cplxlower1" } } */ diff --git a/gcc/testsuite/gcc.dg/complex-7.c b/gcc/testsuite/gcc.dg/complex-7.c new file mode 100644 index 0000000000000000000000000000000000000000..7d5ba3aefb3e007824b778d716ef7f21a48c58f8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/complex-7.c @@ -0,0 +1,13 @@ +/* PR tree-optimization/70291. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-cplxlower" } */ + +__complex double +foo (__complex double a, __complex double b) +{ + return a * b; +} + +/* { dg-final { scan-tree-dump-times "__builtin_isunordered" 1 "cplxlower1" } } */ +/* { dg-final { scan-tree-dump-times "__muldc3" 1 "cplxlower1" } } */ diff --git a/gcc/tree-complex.c b/gcc/tree-complex.c index 622b8696399b9e9d8bddcc6340d2f8d8ca852637..319c302526483ca80ecafe7e55289c1850ad6a11 100644 --- a/gcc/tree-complex.c +++ b/gcc/tree-complex.c @@ -978,6 +978,43 @@ expand_complex_addition (gimple_stmt_iterator *gsi, tree inner_type, } /* Expand a complex multiplication or division to a libcall to the c99 + compliant routines. Unlike expand_complex_libcall create and insert + the call, assign it to an output variable and return that rather than + modifying existing statements in place. */ + +static tree +insert_complex_mult_libcall (gimple_stmt_iterator *gsi, tree type, tree ar, + tree ai, tree br, tree bi) +{ + machine_mode mode; + built_in_function bcode; + tree fn, lhs; + gcall *stmt; + + + mode = TYPE_MODE (type); + gcc_assert (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT); + + bcode = ((built_in_function) + (BUILT_IN_COMPLEX_MUL_MIN + mode - MIN_MODE_COMPLEX_FLOAT)); + + fn = builtin_decl_explicit (bcode); + + stmt = gimple_build_call (fn, 4, ar, ai, br, bi); + lhs = create_tmp_var (type); + gimple_call_set_lhs (stmt, lhs); + if (gimple_in_ssa_p (cfun)) + { + lhs = make_ssa_name (lhs, stmt); + gimple_call_set_lhs (stmt, lhs); + } + update_stmt (stmt); + gsi_insert_before (gsi, stmt, GSI_SAME_STMT); + + return lhs; +} + +/* Expand a complex multiplication or division to a libcall to the c99 compliant routines. */ static void @@ -1025,6 +1062,35 @@ expand_complex_libcall (gimple_stmt_iterator *gsi, tree ar, tree ai, } } +/* Perform a complex multiplication assuming limited range on two + complex constants A, B represented by AR, AI, BR, BI of type TYPE. + The operation we want is: a * b = (ar*br - ai*bi) + i(ar*bi + br*ai). + Insert the GIMPLE statements into GSI. Store the real and imaginary + components of the result into RR and RI. */ + +static void +expand_complex_multiplication_limited_range (gimple_stmt_iterator *gsi, + tree type, tree ar, tree ai, + tree br, tree bi, + tree *rr, tree *ri) +{ + tree t1, t2, t3, t4; + + t1 = gimplify_build2 (gsi, MULT_EXPR, type, ar, br); + t2 = gimplify_build2 (gsi, MULT_EXPR, type, ai, bi); + t3 = gimplify_build2 (gsi, MULT_EXPR, type, ar, bi); + + /* Avoid expanding redundant multiplication for the common + case of squaring a complex number. */ + if (ar == br && ai == bi) + t4 = t3; + else + t4 = gimplify_build2 (gsi, MULT_EXPR, type, ai, br); + + *rr = gimplify_build2 (gsi, MINUS_EXPR, type, t1, t2); + *ri = gimplify_build2 (gsi, PLUS_EXPR, type, t3, t4); +} + /* Expand complex multiplication to scalars: a * b = (ar*br - ai*bi) + i(ar*bi + br*ai) */ @@ -1080,27 +1146,98 @@ expand_complex_multiplication (gimple_stmt_iterator *gsi, tree inner_type, case PAIR (VARYING, VARYING): if (flag_complex_method == 2 && SCALAR_FLOAT_TYPE_P (inner_type)) { - expand_complex_libcall (gsi, ar, ai, br, bi, MULT_EXPR); - return; - } - else - { - tree t1, t2, t3, t4; - - t1 = gimplify_build2 (gsi, MULT_EXPR, inner_type, ar, br); - t2 = gimplify_build2 (gsi, MULT_EXPR, inner_type, ai, bi); - t3 = gimplify_build2 (gsi, MULT_EXPR, inner_type, ar, bi); + /* If optimizing for size or not at all just do a libcall. */ + if (optimize == 0 || optimize_function_for_size_p (cfun)) + { + expand_complex_libcall (gsi, ar, ai, br, bi, MULT_EXPR); + return; + } - /* Avoid expanding redundant multiplication for the common - case of squaring a complex number. */ - if (ar == br && ai == bi) - t4 = t3; + /* Else, expand x = a * b into + x = (ar*br - ai*bi) + i(ar*bi + br*ai); + if (isunordered (__real__ x, __imag__ x)) + x = __muldc3 (a, b); */ + + /* Get the top-level complex type. We'll need it soon. */ + tree top_type = TREE_TYPE (gimple_assign_lhs (gsi_stmt (*gsi))); + + tree tmpr, tmpi; + expand_complex_multiplication_limited_range (gsi, inner_type, ar, + ai, br, bi, &tmpr, + &tmpi); + + tree isunordered_decl = builtin_decl_explicit (BUILT_IN_ISUNORDERED); + tree isunordered_res = create_tmp_var (integer_type_node); + gimple *tmpr_unord_check + = gimple_build_call (isunordered_decl, 2, tmpr, tmpi); + gimple_call_set_lhs (tmpr_unord_check, isunordered_res); + + gsi_insert_before (gsi, tmpr_unord_check, GSI_SAME_STMT); + gimple *check + = gimple_build_cond (NE_EXPR, isunordered_res, integer_zero_node, + NULL_TREE, NULL_TREE); + + basic_block orig_bb = gsi_bb (*gsi); + /* We want to keep track of the original complex multiplication + statement as we're going to modify it later in + update_complex_assignment. Make sure that insert_cond_bb leaves + that statement in the join block. */ + gsi_prev (gsi); + basic_block cond_bb + = insert_cond_bb (gsi_bb (*gsi), gsi_stmt (*gsi), check, + profile_probability::very_unlikely ()); + + + gimple_stmt_iterator cond_bb_gsi = gsi_last_bb (cond_bb); + gsi_insert_after (&cond_bb_gsi, gimple_build_nop (), GSI_NEW_STMT); + + tree libcall_res + = insert_complex_mult_libcall (&cond_bb_gsi, top_type, ar, ai, br, + bi); + tree cond_real = gimplify_build1 (&cond_bb_gsi, REALPART_EXPR, + inner_type, libcall_res); + tree cond_imag = gimplify_build1 (&cond_bb_gsi, IMAGPART_EXPR, + inner_type, libcall_res); + + basic_block join_bb = single_succ_edge (cond_bb)->dest; + *gsi = gsi_start_nondebug_after_labels_bb (join_bb); + + /* We have a conditional block with some assignments in cond_bb. + Wire up the PHIs to wrap up. */ + if (gimple_in_ssa_p (cfun)) + { + rr = create_tmp_var (inner_type); + rr = make_ssa_name (rr); + ri = create_tmp_var (inner_type); + ri = make_ssa_name (ri); + edge cond_to_join = single_succ_edge (cond_bb); + edge orig_to_join = find_edge (orig_bb, join_bb); + + gphi *real_phi = create_phi_node (rr, gsi_bb (*gsi)); + add_phi_arg (real_phi, cond_real, cond_to_join, + UNKNOWN_LOCATION); + add_phi_arg (real_phi, tmpr, orig_to_join, UNKNOWN_LOCATION); + + gphi *imag_phi = create_phi_node (ri, gsi_bb (*gsi)); + add_phi_arg (imag_phi, cond_imag, cond_to_join, + UNKNOWN_LOCATION); + add_phi_arg (imag_phi, tmpi, orig_to_join, UNKNOWN_LOCATION); + } else - t4 = gimplify_build2 (gsi, MULT_EXPR, inner_type, ai, br); - - rr = gimplify_build2 (gsi, MINUS_EXPR, inner_type, t1, t2); - ri = gimplify_build2 (gsi, PLUS_EXPR, inner_type, t3, t4); + { + gimple *stmt = gimple_build_assign (tmpr, cond_real); + gsi_insert_before (gsi, stmt, GSI_SAME_STMT); + stmt = gimple_build_assign (tmpi, cond_imag); + gsi_insert_before (gsi, stmt, GSI_SAME_STMT); + rr = tmpr; + ri = tmpi; + } } + else + /* If we are not worrying about NaNs expand to + (ar*br - ai*bi) + i(ar*bi + br*ai) directly. */ + expand_complex_multiplication_limited_range (gsi, inner_type, ar, ai, + br, bi, &rr, &ri); break; default: