Message ID | 1619791790-628-6-git-send-email-christophe.lyon@linaro.org |
---|---|
State | New |
Series | [1/9] arm: MVE: Convert vcmp[eq|ne]* in arm_mve.h to use only 's' builtin version |

Hi Christophe,

On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> Since MVE has a different set of vector comparison operators from
> Neon, we have to update the expansion to take into account the new
> ones, for instance 'NE' for which MVE does not require to use 'EQ'
> with the inverted condition.
>
> Conversely, Neon supports comparisons with #0, MVE does not.
>
> For:
> typedef long int vs32 __attribute__((vector_size(16)));
> vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
>
> we now generate:
> cmp_eq_vs32_reg:
> vldr.64 d4, .L123 @ 8 [c=8 l=4] *mve_movv4si/8
> vldr.64 d5, .L123+8
> vldr.64 d6, .L123+16 @ 9 [c=8 l=4] *mve_movv4si/8
> vldr.64 d7, .L123+24
> vcmp.i32 eq, q0, q1 @ 7 [c=16 l=4] mve_vcmpeqq_v4si
> vpsel q0, q3, q2 @ 15 [c=8 l=4] mve_vpselq_sv4si
> bx lr @ 26 [c=8 l=4] *thumb2_return
> .L124:
> .align 3
> .L123:
> .word 0
> .word 0
> .word 0
> .word 0
> .word 1
> .word 1
> .word 1
> .word 1
>
> For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode)) produces
> a pair of vldr instead of vmov.i32, qX, #0

I think ideally we would even want:
vpte eq, q0, q1
vmovt.i32 q0, #0
vmove.i32 q0, #1

But we don't have a way to generate VPT blocks with multiple
instructions yet unfortunately so I guess VPSEL will have to do for now.

>
> 2021-03-01 Christophe Lyon <christophe.lyon@linaro.org>
>
> gcc/
> * config/arm/arm-protos.h (arm_expand_vector_compare): Update
> prototype.
> * config/arm/arm.c (arm_expand_vector_compare): Add support for
> MVE.
> (arm_expand_vcond): Likewise.
> * config/arm/iterators.md (supf): Remove VCMPNEQ_S, VCMPEQQ_S,
> VCMPEQQ_N_S, VCMPNEQ_N_S.
> (VCMPNEQ, VCMPEQQ, VCMPEQQ_N, VCMPNEQ_N): Remove.
> * config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>): Add '@' prefix.
> (@mve_vcmp<mve_cmp_op>q_f<mode>): Likewise.
> (@mve_vcmp<mve_cmp_op>q_n_f<mode>): Likewise.
> (@mve_vpselq_<supf><mode>): Likewise.
> (@mve_vpselq_f<mode>"): Likewise.
> * config/arm/neon.md (vec_cmp<mode><v_cmp_result): Enable for MVE
> and move to vec-common.md.
> (vec_cmpu<mode><mode>): Likewise.
> (vcond<mode><mode>): Likewise.
> (vcond<V_cvtto><mode>): Likewise.
> (vcondu<mode><v_cmp_result>): Likewise.
> (vcond_mask_<mode><v_cmp_result>): Likewise.
> * config/arm/unspecs.md (VCMPNEQ_U, VCMPNEQ_S, VCMPEQQ_S)
> (VCMPEQQ_N_S, VCMPNEQ_N_S, VCMPEQQ_U, CMPEQQ_N_U, VCMPNEQ_N_U)
> (VCMPGEQ_N_S, VCMPGEQ_S, VCMPGTQ_N_S, VCMPGTQ_S, VCMPLEQ_N_S)
> (VCMPLEQ_S, VCMPLTQ_N_S, VCMPLTQ_S, VCMPCSQ_N_U, VCMPCSQ_U)
> (VCMPHIQ_N_U, VCMPHIQ_U): Remove.
> * config/arm/vec-common.md (vec_cmp<mode><v_cmp_result): Moved
> from neon.md.
> (vec_cmpu<mode><mode>): Likewise.
> (vcond<mode><mode>): Likewise.
> (vcond<V_cvtto><mode>): Likewise.
> (vcondu<mode><v_cmp_result>): Likewise.
> (vcond_mask_<mode><v_cmp_result>): Likewise.
>
> gcc/testsuite
> * gcc.target/arm/simd/mve-compare-1.c: New test with GCC vectors.
> * gcc.target/arm/simd/mve-compare-2.c: New test with GCC vectors.
> * gcc.target/arm/simd/mve-compare-scalar-1.c: New test with GCC
> vectors.
> * gcc.target/arm/simd/mve-vcmp-f32.c: New test for
> auto-vectorization.
> * gcc.target/arm/simd/mve-vcmp.c: New test for auto-vectorization.
> > add gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > --- > gcc/config/arm/arm-protos.h | 2 +- > gcc/config/arm/arm.c | 211 ++++++++++++++++----- > gcc/config/arm/iterators.md | 9 +- > gcc/config/arm/mve.md | 10 +- > gcc/config/arm/neon.md | 87 --------- > gcc/config/arm/unspecs.md | 20 -- > gcc/config/arm/vec-common.md | 107 +++++++++++ > gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c | 80 ++++++++ > gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c | 38 ++++ > .../gcc.target/arm/simd/mve-compare-scalar-1.c | 69 +++++++ > gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c | 30 +++ > gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c | 50 +++++ > 12 files changed, 547 insertions(+), 166 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > index 2521541..ffccaa7 100644 > --- a/gcc/config/arm/arm-protos.h > +++ b/gcc/config/arm/arm-protos.h > @@ -373,7 +373,7 @@ extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx, > extern bool arm_fusion_enabled_p (tune_params::fuse_ops); > extern bool arm_valid_symbolic_address_p (rtx); > extern bool arm_validize_comparison (rtx *, rtx *, rtx *); > -extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool); > +extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool, bool); > #endif /* RTX_CODE */ > > extern bool arm_gen_setmem (rtx *); > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index 0371d98..80e28ef 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -30933,66 +30933,114 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, > 
and return true if TARGET contains the inverse. If !CAN_INVERT, > always store the result in TARGET, never its inverse. > > + If VCOND_MVE, do not emit the vpsel instruction here, let arm_expand_vcond do > + it with the right destination type to avoid emiting two vpsel, one here and > + one in arm_expand_vcond. > + > Note that the handling of floating-point comparisons is not > IEEE compliant. */ > > bool > arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > - bool can_invert) > + bool can_invert, bool vcond_mve) > { > machine_mode cmp_result_mode = GET_MODE (target); > machine_mode cmp_mode = GET_MODE (op0); > > bool inverted; > - switch (code) > - { > - /* For these we need to compute the inverse of the requested > - comparison. */ > - case UNORDERED: > - case UNLT: > - case UNLE: > - case UNGT: > - case UNGE: > - case UNEQ: > - case NE: > - code = reverse_condition_maybe_unordered (code); > - if (!can_invert) > - { > - /* Recursively emit the inverted comparison into a temporary > - and then store its inverse in TARGET. This avoids reusing > - TARGET (which for integer NE could be one of the inputs). */ > - rtx tmp = gen_reg_rtx (cmp_result_mode); > - if (arm_expand_vector_compare (tmp, code, op0, op1, true)) > - gcc_unreachable (); > - emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp))); > - return false; > - } > - inverted = true; > - break; > > - default: > + /* MVE supports more comparisons than Neon. */ > + if (TARGET_HAVE_MVE) > inverted = false; > - break; > - } > + else > + switch (code) > + { > + /* For these we need to compute the inverse of the requested > + comparison. */ > + case UNORDERED: > + case UNLT: > + case UNLE: > + case UNGT: > + case UNGE: > + case UNEQ: > + case NE: > + code = reverse_condition_maybe_unordered (code); > + if (!can_invert) > + { > + /* Recursively emit the inverted comparison into a temporary > + and then store its inverse in TARGET. 
This avoids reusing > + TARGET (which for integer NE could be one of the inputs). */ > + rtx tmp = gen_reg_rtx (cmp_result_mode); > + if (arm_expand_vector_compare (tmp, code, op0, op1, true, vcond_mve)) > + gcc_unreachable (); > + emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp))); > + return false; > + } > + inverted = true; > + break; > + > + default: > + inverted = false; > + break; > + } > > switch (code) > { > - /* These are natively supported for zero comparisons, but otherwise > - require the operands to be swapped. */ > + /* These are natively supported by Neon for zero comparisons, but otherwise > + require the operands to be swapped. For MVE, we can only compare > + registers. */ > case LE: > case LT: > - if (op1 != CONST0_RTX (cmp_mode)) > - { > - code = swap_condition (code); > - std::swap (op0, op1); > - } > + if (!TARGET_HAVE_MVE) > + if (op1 != CONST0_RTX (cmp_mode)) > + { > + code = swap_condition (code); > + std::swap (op0, op1); > + } > /* Fall through. */ > > - /* These are natively supported for both register and zero operands. */ > + /* These are natively supported by Neon for both register and zero > + operands. MVE supports registers only. */ > case EQ: > case GE: > case GT: > - emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1)); > + case NE: > + if (TARGET_HAVE_MVE) { > + rtx vpr_p0; > + if (vcond_mve) > + vpr_p0 = target; > + else > + vpr_p0 = gen_reg_rtx (HImode); > + > + switch (cmp_mode) > + { > + case E_V16QImode: > + case E_V8HImode: > + case E_V4SImode: > + emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); > + break; > + case E_V8HFmode: > + case E_V4SFmode: > + if (TARGET_HAVE_MVE_FLOAT) > + emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); > + else > + gcc_unreachable (); > + break; > + default: > + gcc_unreachable (); > + } > + > + /* If we are not expanding a vcond, build the result here. 
*/ > + if (!vcond_mve) { > + rtx zero = gen_reg_rtx (cmp_result_mode); > + rtx one = gen_reg_rtx (cmp_result_mode); > + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); > + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); > + } > + } > + else > + emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1)); > return inverted; > > /* These are natively supported for register operands only. > @@ -31000,16 +31048,50 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > or canonicalized by target-independent code. */ > case GEU: > case GTU: > - emit_insn (gen_neon_vc (code, cmp_mode, target, > - op0, force_reg (cmp_mode, op1))); > + if (TARGET_HAVE_MVE) { > + rtx vpr_p0; > + if (vcond_mve) > + vpr_p0 = target; > + else > + vpr_p0 = gen_reg_rtx (HImode); > + > + emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); > + if (!vcond_mve) { > + rtx zero = gen_reg_rtx (cmp_result_mode); > + rtx one = gen_reg_rtx (cmp_result_mode); > + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); > + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); > + } > + } > + else > + emit_insn (gen_neon_vc (code, cmp_mode, target, > + op0, force_reg (cmp_mode, op1))); > return inverted; > > /* These require the operands to be swapped and likewise do not > support comparisons with zero. 
*/ > case LEU: > case LTU: > - emit_insn (gen_neon_vc (swap_condition (code), cmp_mode, > - target, force_reg (cmp_mode, op1), op0)); > + if (TARGET_HAVE_MVE) { > + rtx vpr_p0; > + if (vcond_mve) > + vpr_p0 = target; > + else > + vpr_p0 = gen_reg_rtx (HImode); > + > + emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, force_reg (cmp_mode, op1), op0)); > + if (!vcond_mve) { > + rtx zero = gen_reg_rtx (cmp_result_mode); > + rtx one = gen_reg_rtx (cmp_result_mode); > + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); > + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); > + } > + } > + else > + emit_insn (gen_neon_vc (swap_condition (code), cmp_mode, > + target, force_reg (cmp_mode, op1), op0)); > return inverted; > > /* These need a combination of two comparisons. */ > @@ -31021,8 +31103,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > rtx gt_res = gen_reg_rtx (cmp_result_mode); > rtx alt_res = gen_reg_rtx (cmp_result_mode); > rtx_code alt_code = (code == LTGT ? LT : LE); > - if (arm_expand_vector_compare (gt_res, GT, op0, op1, true) > - || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true)) > + if (arm_expand_vector_compare (gt_res, GT, op0, op1, true, vcond_mve) > + || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true, vcond_mve)) > gcc_unreachable (); > emit_insn (gen_rtx_SET (target, gen_rtx_IOR (cmp_result_mode, > gt_res, alt_res))); > @@ -31040,13 +31122,50 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > void > arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode) > { > - rtx mask = gen_reg_rtx (cmp_result_mode); > + /* When expanding for MVE, we do not want to emit a (useless) vpsel in > + arm_expand_vector_compare, and another one here. 
*/ > + bool vcond_mve=false; > + rtx mask; > + > + if (TARGET_HAVE_MVE) > + { > + vcond_mve=true; > + mask = gen_reg_rtx (HImode); > + } > + else > + mask = gen_reg_rtx (cmp_result_mode); > + > bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]), > - operands[4], operands[5], true); > + operands[4], operands[5], true, vcond_mve); > if (inverted) > std::swap (operands[1], operands[2]); > + if (TARGET_NEON) > emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0], > mask, operands[1], operands[2])); > + else > + { > + machine_mode cmp_mode = GET_MODE (operands[4]); > + rtx vpr_p0 = mask; > + rtx zero = gen_reg_rtx (cmp_mode); > + rtx one = gen_reg_rtx (cmp_mode); > + emit_move_insn (zero, CONST0_RTX (cmp_mode)); > + emit_move_insn (one, CONST1_RTX (cmp_mode)); > + switch (cmp_mode) > + { > + case E_V16QImode: > + case E_V8HImode: > + case E_V4SImode: > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0)); > + break; > + case E_V8HFmode: > + case E_V4SFmode: > + if (TARGET_HAVE_MVE_FLOAT) > + emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0)); > + break; > + default: > + gcc_unreachable (); > + } > + } > } > > #define MAX_VECT_LEN 16 > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md > index 95df8bd..a128465 100644 > --- a/gcc/config/arm/iterators.md > +++ b/gcc/config/arm/iterators.md > @@ -1288,12 +1288,11 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s") > (VCREATEQ_U "u") (VCREATEQ_S "s") (VSHRQ_N_S "s") > (VSHRQ_N_U "u") (VCVTQ_N_FROM_F_S "s") (VSHLQ_U "u") > (VCVTQ_N_FROM_F_U "u") (VADDLVQ_P_S "s") (VSHLQ_S "s") > - (VADDLVQ_P_U "u") (VCMPNEQ_S "s") > + (VADDLVQ_P_U "u") > (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s") > (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u") > (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBRSRQ_N_S "s") > - (VBRSRQ_N_U "u") (VCMPEQQ_S "s") > - (VCMPEQQ_N_S "s") (VCMPNEQ_N_S "s") > + (VBRSRQ_N_U "u") > (VHADDQ_N_S 
"s") (VHADDQ_N_U "u") (VHADDQ_S "s") > (VHADDQ_U "u") (VHSUBQ_N_S "s") (VHSUBQ_N_U "u") > (VHSUBQ_S "s") (VMAXQ_S "s") (VMAXQ_U "u") (VHSUBQ_U "u") > @@ -1549,16 +1548,12 @@ (define_int_iterator VCREATEQ [VCREATEQ_U VCREATEQ_S]) > (define_int_iterator VSHRQ_N [VSHRQ_N_S VSHRQ_N_U]) > (define_int_iterator VCVTQ_N_FROM_F [VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U]) > (define_int_iterator VADDLVQ_P [VADDLVQ_P_S VADDLVQ_P_U]) > -(define_int_iterator VCMPNEQ [VCMPNEQ_S]) > (define_int_iterator VSHLQ [VSHLQ_S VSHLQ_U]) > (define_int_iterator VABDQ [VABDQ_S VABDQ_U]) > (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U]) > (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U]) > (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S]) > (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S]) > -(define_int_iterator VCMPEQQ [VCMPEQQ_S]) > -(define_int_iterator VCMPEQQ_N [VCMPEQQ_N_S]) > -(define_int_iterator VCMPNEQ_N [VCMPNEQ_N_S]) > (define_int_iterator VHADDQ [VHADDQ_S VHADDQ_U]) > (define_int_iterator VHADDQ_N [VHADDQ_N_U VHADDQ_N_S]) > (define_int_iterator VHSUBQ [VHSUBQ_S VHSUBQ_U]) > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > index 7c846a4..97f0a87 100644 > --- a/gcc/config/arm/mve.md > +++ b/gcc/config/arm/mve.md > @@ -838,7 +838,7 @@ (define_insn "mve_vaddlvq_p_<supf>v4si" > ;; > ;; [vcmpneq_, vcmpcsq_, vcmpeqq_, vcmpgeq_, vcmpgtq_, vcmphiq_, vcmpleq_, vcmpltq_]) > ;; > -(define_insn "mve_vcmp<mve_cmp_op>q_<mode>" > +(define_insn "@mve_vcmp<mve_cmp_op>q_<mode>" > [ > (set (match_operand:HI 0 "vpr_register_operand" "=Up") > (MVE_COMPARISONS:HI (match_operand:MVE_2 1 "s_register_operand" "w") > @@ -1928,7 +1928,7 @@ (define_insn "mve_vcaddq<mve_rot><mode>" > ;; > ;; [vcmpeqq_f, vcmpgeq_f, vcmpgtq_f, vcmpleq_f, vcmpltq_f, vcmpneq_f]) > ;; > -(define_insn "mve_vcmp<mve_cmp_op>q_f<mode>" > +(define_insn "@mve_vcmp<mve_cmp_op>q_f<mode>" > [ > (set (match_operand:HI 0 "vpr_register_operand" "=Up") > (MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" 
"w") > @@ -1942,7 +1942,7 @@ (define_insn "mve_vcmp<mve_cmp_op>q_f<mode>" > ;; > ;; [vcmpeqq_n_f, vcmpgeq_n_f, vcmpgtq_n_f, vcmpleq_n_f, vcmpltq_n_f, vcmpneq_n_f]) > ;; > -(define_insn "mve_vcmp<mve_cmp_op>q_n_f<mode>" > +(define_insn "@mve_vcmp<mve_cmp_op>q_n_f<mode>" > [ > (set (match_operand:HI 0 "vpr_register_operand" "=Up") > (MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w") > @@ -3307,7 +3307,7 @@ (define_insn "mve_vnegq_m_s<mode>" > ;; > ;; [vpselq_u, vpselq_s]) > ;; > -(define_insn "mve_vpselq_<supf><mode>" > +(define_insn "@mve_vpselq_<supf><mode>" > [ > (set (match_operand:MVE_1 0 "s_register_operand" "=w") > (unspec:MVE_1 [(match_operand:MVE_1 1 "s_register_operand" "w") > @@ -4402,7 +4402,7 @@ (define_insn "mve_vorrq_m_n_<supf><mode>" > ;; > ;; [vpselq_f]) > ;; > -(define_insn "mve_vpselq_f<mode>" > +(define_insn "@mve_vpselq_f<mode>" > [ > (set (match_operand:MVE_0 0 "s_register_operand" "=w") > (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w") > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md > index fec2cc9..6660846 100644 > --- a/gcc/config/arm/neon.md > +++ b/gcc/config/arm/neon.md > @@ -1416,93 +1416,6 @@ (define_insn "*us_sub<mode>_neon" > [(set_attr "type" "neon_qsub<q>")] > ) > > -(define_expand "vec_cmp<mode><v_cmp_result>" > - [(set (match_operand:<V_cmp_result> 0 "s_register_operand") > - (match_operator:<V_cmp_result> 1 "comparison_operator" > - [(match_operand:VDQW 2 "s_register_operand") > - (match_operand:VDQW 3 "reg_or_zero_operand")]))] > - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > -{ > - arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > - operands[2], operands[3], false); > - DONE; > -}) > - > -(define_expand "vec_cmpu<mode><mode>" > - [(set (match_operand:VDQIW 0 "s_register_operand") > - (match_operator:VDQIW 1 "comparison_operator" > - [(match_operand:VDQIW 2 "s_register_operand") > - (match_operand:VDQIW 3 "reg_or_zero_operand")]))] > - 
"TARGET_NEON" > -{ > - arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > - operands[2], operands[3], false); > - DONE; > -}) > - > -;; Conditional instructions. These are comparisons with conditional moves for > -;; vectors. They perform the assignment: > -;; > -;; Vop0 = (Vop4 <op3> Vop5) ? Vop1 : Vop2; > -;; > -;; where op3 is <, <=, ==, !=, >= or >. Operations are performed > -;; element-wise. > - > -(define_expand "vcond<mode><mode>" > - [(set (match_operand:VDQW 0 "s_register_operand") > - (if_then_else:VDQW > - (match_operator 3 "comparison_operator" > - [(match_operand:VDQW 4 "s_register_operand") > - (match_operand:VDQW 5 "reg_or_zero_operand")]) > - (match_operand:VDQW 1 "s_register_operand") > - (match_operand:VDQW 2 "s_register_operand")))] > - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > -{ > - arm_expand_vcond (operands, <V_cmp_result>mode); > - DONE; > -}) > - > -(define_expand "vcond<V_cvtto><mode>" > - [(set (match_operand:<V_CVTTO> 0 "s_register_operand") > - (if_then_else:<V_CVTTO> > - (match_operator 3 "comparison_operator" > - [(match_operand:V32 4 "s_register_operand") > - (match_operand:V32 5 "reg_or_zero_operand")]) > - (match_operand:<V_CVTTO> 1 "s_register_operand") > - (match_operand:<V_CVTTO> 2 "s_register_operand")))] > - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > -{ > - arm_expand_vcond (operands, <V_cmp_result>mode); > - DONE; > -}) > - > -(define_expand "vcondu<mode><v_cmp_result>" > - [(set (match_operand:VDQW 0 "s_register_operand") > - (if_then_else:VDQW > - (match_operator 3 "arm_comparison_operator" > - [(match_operand:<V_cmp_result> 4 "s_register_operand") > - (match_operand:<V_cmp_result> 5 "reg_or_zero_operand")]) > - (match_operand:VDQW 1 "s_register_operand") > - (match_operand:VDQW 2 "s_register_operand")))] > - "TARGET_NEON" > -{ > - arm_expand_vcond (operands, <V_cmp_result>mode); > - DONE; > -}) > - > -(define_expand "vcond_mask_<mode><v_cmp_result>" 
> - [(set (match_operand:VDQW 0 "s_register_operand") > - (if_then_else:VDQW > - (match_operand:<V_cmp_result> 3 "s_register_operand") > - (match_operand:VDQW 1 "s_register_operand") > - (match_operand:VDQW 2 "s_register_operand")))] > - "TARGET_NEON" > -{ > - emit_insn (gen_neon_vbsl<mode> (operands[0], operands[3], operands[1], > - operands[2])); > - DONE; > -}) > - > ;; Patterns for builtins. > > ; good for plain vadd, vaddq. > diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md > index 07ca53b..0778db1 100644 > --- a/gcc/config/arm/unspecs.md > +++ b/gcc/config/arm/unspecs.md > @@ -596,8 +596,6 @@ (define_c_enum "unspec" [ > VCVTQ_N_FROM_F_U > VADDLVQ_P_S > VADDLVQ_P_U > - VCMPNEQ_U > - VCMPNEQ_S > VSHLQ_S > VSHLQ_U > VABDQ_S > @@ -605,9 +603,6 @@ (define_c_enum "unspec" [ > VADDVAQ_S > VADDVQ_P_S > VBRSRQ_N_S > - VCMPEQQ_S > - VCMPEQQ_N_S > - VCMPNEQ_N_S > VHADDQ_S > VHADDQ_N_S > VHSUBQ_S > @@ -645,9 +640,6 @@ (define_c_enum "unspec" [ > VADDVAQ_U > VADDVQ_P_U > VBRSRQ_N_U > - VCMPEQQ_U > - VCMPEQQ_N_U > - VCMPNEQ_N_U > VHADDQ_U > VHADDQ_N_U > VHSUBQ_U > @@ -680,14 +672,6 @@ (define_c_enum "unspec" [ > VSHLQ_R_U > VSUBQ_U > VSUBQ_N_U > - VCMPGEQ_N_S > - VCMPGEQ_S > - VCMPGTQ_N_S > - VCMPGTQ_S > - VCMPLEQ_N_S > - VCMPLEQ_S > - VCMPLTQ_N_S > - VCMPLTQ_S > VHCADDQ_ROT270_S > VHCADDQ_ROT90_S > VMAXAQ_S > @@ -702,10 +686,6 @@ (define_c_enum "unspec" [ > VQRDMULHQ_N_S > VQRDMULHQ_S > VQSHLUQ_N_S > - VCMPCSQ_N_U > - VCMPCSQ_U > - VCMPHIQ_N_U > - VCMPHIQ_U > VABDQ_M_S > VABDQ_M_U > VABDQ_F > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md > index 0b2b3b1..034b48b 100644 > --- a/gcc/config/arm/vec-common.md > +++ b/gcc/config/arm/vec-common.md > @@ -362,3 +362,110 @@ (define_expand "vlshr<mode>3" > DONE; > } > }) > + > +(define_expand "vec_cmp<mode><v_cmp_result>" > + [(set (match_operand:<V_cmp_result> 0 "s_register_operand") > + (match_operator:<V_cmp_result> 1 "comparison_operator" > + [(match_operand:VDQW 2 
"s_register_operand") > + (match_operand:VDQW 3 "reg_or_zero_operand")]))] > + "ARM_HAVE_<MODE>_ARITH > + && !TARGET_REALLY_IWMMXT > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > +{ > + arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > + operands[2], operands[3], false, false); > + DONE; > +}) > + > +(define_expand "vec_cmpu<mode><mode>" > + [(set (match_operand:VDQIW 0 "s_register_operand") > + (match_operator:VDQIW 1 "comparison_operator" > + [(match_operand:VDQIW 2 "s_register_operand") > + (match_operand:VDQIW 3 "reg_or_zero_operand")]))] > + "ARM_HAVE_<MODE>_ARITH > + && !TARGET_REALLY_IWMMXT" > +{ > + arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > + operands[2], operands[3], false, false); > + DONE; > +}) > + > +;; Conditional instructions. These are comparisons with conditional moves for > +;; vectors. They perform the assignment: > +;; > +;; Vop0 = (Vop4 <op3> Vop5) ? Vop1 : Vop2; > +;; > +;; where op3 is <, <=, ==, !=, >= or >. Operations are performed > +;; element-wise. 
> + > +(define_expand "vcond<mode><mode>" > + [(set (match_operand:VDQW 0 "s_register_operand") > + (if_then_else:VDQW > + (match_operator 3 "comparison_operator" > + [(match_operand:VDQW 4 "s_register_operand") > + (match_operand:VDQW 5 "reg_or_zero_operand")]) > + (match_operand:VDQW 1 "s_register_operand") > + (match_operand:VDQW 2 "s_register_operand")))] > + "ARM_HAVE_<MODE>_ARITH > + && !TARGET_REALLY_IWMMXT > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > +{ > + arm_expand_vcond (operands, <V_cmp_result>mode); > + DONE; > +}) > + > +(define_expand "vcond<V_cvtto><mode>" > + [(set (match_operand:<V_CVTTO> 0 "s_register_operand") > + (if_then_else:<V_CVTTO> > + (match_operator 3 "comparison_operator" > + [(match_operand:V32 4 "s_register_operand") > + (match_operand:V32 5 "reg_or_zero_operand")]) > + (match_operand:<V_CVTTO> 1 "s_register_operand") > + (match_operand:<V_CVTTO> 2 "s_register_operand")))] > + "ARM_HAVE_<MODE>_ARITH > + && !TARGET_REALLY_IWMMXT > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > +{ > + arm_expand_vcond (operands, <V_cmp_result>mode); > + DONE; > +}) > + > +(define_expand "vcondu<mode><v_cmp_result>" > + [(set (match_operand:VDQW 0 "s_register_operand") > + (if_then_else:VDQW > + (match_operator 3 "arm_comparison_operator" > + [(match_operand:<V_cmp_result> 4 "s_register_operand") > + (match_operand:<V_cmp_result> 5 "reg_or_zero_operand")]) > + (match_operand:VDQW 1 "s_register_operand") > + (match_operand:VDQW 2 "s_register_operand")))] > + "ARM_HAVE_<MODE>_ARITH > + && !TARGET_REALLY_IWMMXT" > +{ > + arm_expand_vcond (operands, <V_cmp_result>mode); > + DONE; > +}) > + > +(define_expand "vcond_mask_<mode><v_cmp_result>" > + [(set (match_operand:VDQW 0 "s_register_operand") > + (if_then_else:VDQW > + (match_operand:<V_cmp_result> 3 "s_register_operand") > + (match_operand:VDQW 1 "s_register_operand") > + (match_operand:VDQW 2 "s_register_operand")))] > + "ARM_HAVE_<MODE>_ARITH > + && !TARGET_REALLY_IWMMXT" 
> +{ > + if (TARGET_NEON) > + { > + emit_insn (gen_neon_vbsl (<MODE>mode, operands[0], operands[3], > + operands[1], operands[2])); > + } > + else if (TARGET_HAVE_MVE) > + { > + emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0], > + operands[1], operands[2], operands[3])); > + } > + else > + gcc_unreachable (); > + > + DONE; > +}) > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c > new file mode 100644 > index 0000000..029c931 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c > @@ -0,0 +1,80 @@ > +/* { dg-do assemble } */ > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > +/* { dg-add-options arm_v8_1m_mve } */ > +/* { dg-additional-options "-O3" } */ > + > +/* Integer tests. */ > + > +#define COMPARE_REG(NAME, OP, TYPE) \ > + TYPE \ > + cmp_##NAME##_##TYPE##_reg (TYPE a, TYPE b) \ > + { \ > + return a OP b; \ > + } > + > +#define COMPARE_REG_AND_ZERO(NAME, OP, TYPE) \ > + COMPARE_REG (NAME, OP, TYPE) \ > + \ > + TYPE \ > + cmp_##NAME##_##TYPE##_zero (TYPE a) \ > + { \ > + return a OP (TYPE) {}; \ > + } > + > +#define COMPARE_TYPE(TYPE, COMPARE_ORDERED) \ > + COMPARE_REG_AND_ZERO (eq, ==, TYPE) \ > + COMPARE_REG_AND_ZERO (ne, !=, TYPE) \ > + COMPARE_ORDERED (lt, <, TYPE) \ > + COMPARE_ORDERED (le, <=, TYPE) \ > + COMPARE_ORDERED (gt, >, TYPE) \ > + COMPARE_ORDERED (ge, >=, TYPE) > + > +#define TEST_TYPE(NAME, ELEM, COMPARE_ORDERED, SIZE) \ > + typedef ELEM NAME##SIZE __attribute__((vector_size(SIZE))); \ > + COMPARE_TYPE (NAME##SIZE, COMPARE_ORDERED) > + > +/* 64-bits vectors, not vectorized. */ > +TEST_TYPE (vs8, __INT8_TYPE__, COMPARE_REG_AND_ZERO, 8) > +TEST_TYPE (vu8, __UINT8_TYPE__, COMPARE_REG, 8) > +TEST_TYPE (vs16, __INT16_TYPE__, COMPARE_REG_AND_ZERO, 8) > +TEST_TYPE (vu16, __UINT16_TYPE__, COMPARE_REG, 8) > +TEST_TYPE (vs32, __INT32_TYPE__, COMPARE_REG_AND_ZERO, 8) > +TEST_TYPE (vu32, __UINT32_TYPE__, COMPARE_REG, 8) > + > +/* 128-bits vectors. 
*/ > +TEST_TYPE (vs8, __INT8_TYPE__, COMPARE_REG_AND_ZERO, 16) > +TEST_TYPE (vu8, __UINT8_TYPE__, COMPARE_REG, 16) > +TEST_TYPE (vs16, __INT16_TYPE__, COMPARE_REG_AND_ZERO, 16) > +TEST_TYPE (vu16, __UINT16_TYPE__, COMPARE_REG, 16) > +TEST_TYPE (vs32, __INT32_TYPE__, COMPARE_REG_AND_ZERO, 16) > +TEST_TYPE (vu32, __UINT32_TYPE__, COMPARE_REG, 16) > + > +/* { 8 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > +/* { dg-final { scan-assembler-times {\tvcmp.i8 eq, q[0-9]+, q[0-9]+\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.i8 ne, q[0-9]+, q[0-9]+\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s8 lt, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s8 le, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s8 gt, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s8 ge, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.u8 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.u8 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > + > +/* { 16 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > +/* { dg-final { scan-assembler-times {\tvcmp.i16 eq, q[0-9]+, q[0-9]+\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.i16 ne, q[0-9]+, q[0-9]+\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s16 lt, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s16 le, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s16 gt, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s16 ge, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.u16 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.u16 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > + > +/* { 32 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. 
> +/* { dg-final { scan-assembler-times {\tvcmp.i32 eq, q[0-9]+, q[0-9]+\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.i32 ne, q[0-9]+, q[0-9]+\n} 4 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s32 lt, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s32 le, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s32 gt, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s32 ge, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.u32 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.u32 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c > new file mode 100644 > index 0000000..8515195 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c > @@ -0,0 +1,38 @@ > +/* { dg-do assemble } */ > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ > +/* { dg-add-options arm_v8_1m_mve_fp } */ > +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */ > + > +/* float 32 tests. */ > + > +#ifndef ELEM_TYPE > +#define ELEM_TYPE float > +#endif > +#ifndef INT_ELEM_TYPE > +#define INT_ELEM_TYPE __INT32_TYPE__ > +#endif > + > +#define COMPARE(NAME, OP) \ > + int_vec \ > + cmp_##NAME##_reg (vec a, vec b) \ > + { \ > + return a OP b; \ > + } > + > +typedef INT_ELEM_TYPE int_vec __attribute__((vector_size(16))); > +typedef ELEM_TYPE vec __attribute__((vector_size(16))); > + > +COMPARE (eq, ==) > +COMPARE (ne, !=) > +COMPARE (lt, <) > +COMPARE (le, <=) > +COMPARE (gt, >) > +COMPARE (ge, >=) > + > +/* eq, ne, lt, le, gt, ge. 
> +/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > new file mode 100644 > index 0000000..7774972 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > @@ -0,0 +1,69 @@ > +/* { dg-do assemble } */ > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > +/* { dg-add-options arm_v8_1m_mve } */ > +/* { dg-additional-options "-O3" } */ > + > +#define COMPARE_REG(NAME, OP, TYPE, SCALAR) \ > + TYPE \ > + cmp_##NAME##_##TYPE##_scalar (TYPE a, SCALAR b) \ > + { \ > + return a OP b; \ > + } > + > +#define COMPARE_TYPE(SCALAR, TYPE) \ > + COMPARE_REG (eq, ==, TYPE, SCALAR) \ > + COMPARE_REG (ne, !=, TYPE, SCALAR) \ > + COMPARE_REG (lt, <, TYPE, SCALAR) \ > + COMPARE_REG (le, <=, TYPE, SCALAR) \ > + COMPARE_REG (gt, >, TYPE, SCALAR) \ > + COMPARE_REG (ge, >=, TYPE, SCALAR) > + > +#define TEST_TYPE(NAME, ELEM, SIZE) \ > + typedef ELEM NAME##SIZE __attribute__((vector_size(SIZE))); \ > + COMPARE_TYPE (ELEM, NAME##SIZE) > + > +/* 64-bits vectors, not vectorized. */ > +TEST_TYPE (vs8, __INT8_TYPE__, 8) > +TEST_TYPE (vu8, __UINT8_TYPE__, 8) > +TEST_TYPE (vs16, __INT16_TYPE__, 8) > +TEST_TYPE (vu16, __UINT16_TYPE__, 8) > +TEST_TYPE (vs32, __INT32_TYPE__, 8) > +TEST_TYPE (vu32, __UINT32_TYPE__, 8) > + > +/* 128-bits vectors. 
*/ > +TEST_TYPE (vs8, __INT8_TYPE__, 16) > +TEST_TYPE (vu8, __UINT8_TYPE__, 16) > +TEST_TYPE (vs16, __INT16_TYPE__, 16) > +TEST_TYPE (vu16, __UINT16_TYPE__, 16) > +TEST_TYPE (vs32, __INT32_TYPE__, 16) > +TEST_TYPE (vu32, __UINT32_TYPE__, 16) > + > +/* { 8 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > +/* { dg-final { scan-assembler-times {\tvcmp.i8 eq, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.i8 ne, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s8 lt, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s8 le, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s8 gt, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s8 ge, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.u8 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.u8 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > + > +/* { 16 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > +/* { dg-final { scan-assembler-times {\tvcmp.i16 eq, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.i16 ne, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s16 lt, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s16 le, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s16 gt, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s16 ge, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.u16 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.u16 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > + > +/* { 32 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. 
> +/* { dg-final { scan-assembler-times {\tvcmp.i32 eq, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.i32 ne, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s32 lt, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s32 le, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s32 gt, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.s32 ge, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.u32 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.u32 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c > new file mode 100644 > index 0000000..4ed449e > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c > @@ -0,0 +1,30 @@ > +/* { dg-do assemble } */ > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ > +/* { dg-add-options arm_v8_1m_mve_fp } */ > +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */ > + > +#include <stdint.h> > + > +#define NB 4 > + > +#define FUNC(OP, NAME) \ > + void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \ > + int i; \ > + for (i=0; i<NB; i++) { \ > + dest[i] = a[i] OP b[i]; \ > + } \ > + } > + > +FUNC(==, vcmpeq) > +FUNC(!=, vcmpne) > +FUNC(<, vcmplt) > +FUNC(<=, vcmple) > +FUNC(>, vcmpgt) > +FUNC(>=, vcmpge) > + > +/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, 
q[0-9]+, q[0-9]+\n} 1 } } */ > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c > new file mode 100644 > index 0000000..8da15e7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c > @@ -0,0 +1,50 @@ > +/* { dg-do assemble } */ > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > +/* { dg-add-options arm_v8_1m_mve } */ > +/* { dg-additional-options "-O3" } */ > + > +#include <stdint.h> > + > +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME) \ > + void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \ > + int i; \ > + for (i=0; i<NB; i++) { \ > + dest[i] = a[i] OP b[i]; \ > + } \ > +} > + > +#define ALL_FUNCS(OP, NAME) \ > + FUNC(s, int, 32, 2, OP, NAME) \ > + FUNC(u, uint, 32, 2, OP, NAME) \ > + FUNC(s, int, 16, 4, OP, NAME) \ > + FUNC(u, uint, 16, 4, OP, NAME) \ > + FUNC(s, int, 8, 8, OP, NAME) \ > + FUNC(u, uint, 8, 8, OP, NAME) \ > + FUNC(s, int, 32, 4, OP, NAME) \ > + FUNC(u, uint, 32, 4, OP, NAME) \ > + FUNC(s, int, 16, 8, OP, NAME) \ > + FUNC(u, uint, 16, 8, OP, NAME) \ > + FUNC(s, int, 8, 16, OP, NAME) \ > + FUNC(u, uint, 8, 16, OP, NAME) > + > +ALL_FUNCS(==, vcmpeq) > +ALL_FUNCS(!=, vcmpne) > +ALL_FUNCS(<, vcmplt) > +ALL_FUNCS(<=, vcmple) > +ALL_FUNCS(>, vcmpgt) > +ALL_FUNCS(>=, vcmpge) > + > +/* MVE has only 128-bit vectors, so we can vectorize only half of the > + functions above. */ > +/* { dg-final { scan-assembler-times {\tvcmp.i[0-9]+ eq, q[0-9]+, q[0-9]+\n} 6 } } */ > +/* { dg-final { scan-assembler-times {\tvcmp.i[0-9]+ ne, q[0-9]+, q[0-9]+\n} 6 } } */ > + > +/* lt, le, gt, ge apply to signed types, cs and hi to unsigned types. */ > +/* lt and le with unsigned types are replaced with the opposite condition, hence > + the double number of matches for cs and hi. 
*/
> +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ lt, q[0-9]+, q[0-9]+\n} 3 } } */
> +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ le, q[0-9]+, q[0-9]+\n} 3 } } */
> +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ gt, q[0-9]+, q[0-9]+\n} 3 } } */
> +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ ge, q[0-9]+, q[0-9]+\n} 3 } } */
> +/* { dg-final { scan-assembler-times {\tvcmp.u[0-9]+ cs, q[0-9]+, q[0-9]+\n} 6 } } */
> +/* { dg-final { scan-assembler-times {\tvcmp.u[0-9]+ hi, q[0-9]+, q[0-9]+\n} 6 } } */
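A side note on the match counts in mve-vcmp.c: the test comment says that lt and le on unsigned types are replaced with the opposite condition, which is why cs and hi are expected 6 times each while the signed conditions are expected 3 times. The sketch below models one lane in plain scalar C to show why the swap is safe. The helper names (lane_hi, lane_cs and so on) are made up for illustration and are not GCC or ACLE functions, and the count helper only restates the arithmetic from the test comment under the assumption that exactly the three 128-bit unsigned shapes get vectorized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar stand-ins for one lane of an unsigned vcmp.
   "hi" is unsigned higher, "cs" is unsigned higher-or-same.  */
static bool lane_hi (uint32_t a, uint32_t b) { return a > b; }
static bool lane_cs (uint32_t a, uint32_t b) { return a >= b; }

/* a < b is evaluated as "b hi a": swap both the operands and the condition.  */
static bool lane_lt_via_hi (uint32_t a, uint32_t b) { return lane_hi (b, a); }

/* a <= b is evaluated as "b cs a".  */
static bool lane_le_via_cs (uint32_t a, uint32_t b) { return lane_cs (b, a); }

/* Assumed reading of the test comment: three 128-bit unsigned shapes
   (32x4, 16x8, 8x16), each matched once for '>' and once for the
   swapped '<', so 3 * 2 matches of "hi" (and likewise for "cs").  */
enum { UNSIGNED_128BIT_SHAPES = 3, OPS_MAPPED_PER_CONDITION = 2 };

static int expected_unsigned_matches (void)
{
  return UNSIGNED_128BIT_SHAPES * OPS_MAPPED_PER_CONDITION;
}

/* Check the swapped forms against the direct operators on a few
   boundary values.  */
static void check_swaps (void)
{
  const uint32_t v[] = { 0u, 1u, 2u, 0x7fffffffu, 0x80000000u, 0xffffffffu };
  const int n = (int) (sizeof v / sizeof v[0]);
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      {
        assert (lane_lt_via_hi (v[i], v[j]) == (v[i] < v[j]));
        assert (lane_le_via_cs (v[i], v[j]) == (v[i] <= v[j]));
      }
}
```

Calling check_swaps () from a test driver exercises every pair of the boundary values. The signed conditions need no such doubling in the test because lt, le, gt and ge are each matched directly.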
On Tue, 4 May 2021 at 13:29, Andre Vieira (lists) <andre.simoesdiasvieira@arm.com> wrote:
>
> Hi Christophe,
>
> On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > Since MVE has a different set of vector comparison operators from
> > Neon, we have to update the expansion to take into account the new
> > ones, for instance 'NE' for which MVE does not require to use 'EQ'
> > with the inverted condition.
> >
> > Conversely, Neon supports comparisons with #0, MVE does not.
> >
> > For:
> > typedef long int vs32 __attribute__((vector_size(16)));
> > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> >
> > we now generate:
> > cmp_eq_vs32_reg:
> > vldr.64 d4, .L123 @ 8 [c=8 l=4] *mve_movv4si/8
> > vldr.64 d5, .L123+8
> > vldr.64 d6, .L123+16 @ 9 [c=8 l=4] *mve_movv4si/8
> > vldr.64 d7, .L123+24
> > vcmp.i32 eq, q0, q1 @ 7 [c=16 l=4] mve_vcmpeqq_v4si
> > vpsel q0, q3, q2 @ 15 [c=8 l=4] mve_vpselq_sv4si
> > bx lr @ 26 [c=8 l=4] *thumb2_return
> > .L124:
> > .align 3
> > .L123:
> > .word 0
> > .word 0
> > .word 0
> > .word 0
> > .word 1
> > .word 1
> > .word 1
> > .word 1
> >
> > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode)) produces
> > a pair of vldr instead of vmov.i32, qX, #0
> I think ideally we would even want:
> vpte eq, q0, q1
> vmovt.i32 q0, #0
> vmove.i32 q0, #1
>
> But we don't have a way to generate VPT blocks with multiple
> instructions yet unfortunately so I guess VPSEL will have to do for now.

TBH, I looked at what LLVM generates currently ;-)

>
> >
> > 2021-03-01 Christophe Lyon <christophe.lyon@linaro.org>
> >
> > gcc/
> > * config/arm/arm-protos.h (arm_expand_vector_compare): Update
> > prototype.
> > * config/arm/arm.c (arm_expand_vector_compare): Add support for
> > MVE.
> > (arm_expand_vcond): Likewise.
> > * config/arm/iterators.md (supf): Remove VCMPNEQ_S, VCMPEQQ_S,
> > VCMPEQQ_N_S, VCMPNEQ_N_S.
> > (VCMPNEQ, VCMPEQQ, VCMPEQQ_N, VCMPNEQ_N): Remove.
> > * config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>): Add '@' prefix. > > (@mve_vcmp<mve_cmp_op>q_f<mode>): Likewise. > > (@mve_vcmp<mve_cmp_op>q_n_f<mode>): Likewise. > > (@mve_vpselq_<supf><mode>): Likewise. > > (@mve_vpselq_f<mode>"): Likewise. > > * config/arm/neon.md (vec_cmp<mode><v_cmp_result): Enable for MVE > > and move to vec-common.md. > > (vec_cmpu<mode><mode>): Likewise. > > (vcond<mode><mode>): Likewise. > > (vcond<V_cvtto><mode>): Likewise. > > (vcondu<mode><v_cmp_result>): Likewise. > > (vcond_mask_<mode><v_cmp_result>): Likewise. > > * config/arm/unspecs.md (VCMPNEQ_U, VCMPNEQ_S, VCMPEQQ_S) > > (VCMPEQQ_N_S, VCMPNEQ_N_S, VCMPEQQ_U, CMPEQQ_N_U, VCMPNEQ_N_U) > > (VCMPGEQ_N_S, VCMPGEQ_S, VCMPGTQ_N_S, VCMPGTQ_S, VCMPLEQ_N_S) > > (VCMPLEQ_S, VCMPLTQ_N_S, VCMPLTQ_S, VCMPCSQ_N_U, VCMPCSQ_U) > > (VCMPHIQ_N_U, VCMPHIQ_U): Remove. > > * config/arm/vec-common.md (vec_cmp<mode><v_cmp_result): Moved > > from neon.md. > > (vec_cmpu<mode><mode>): Likewise. > > (vcond<mode><mode>): Likewise. > > (vcond<V_cvtto><mode>): Likewise. > > (vcondu<mode><v_cmp_result>): Likewise. > > (vcond_mask_<mode><v_cmp_result>): Likewise. > > > > gcc/testsuite > > * gcc.target/arm/simd/mve-compare-1.c: New test with GCC vectors. > > * gcc.target/arm/simd/mve-compare-2.c: New test with GCC vectors. > > * gcc.target/arm/simd/mve-compare-scalar-1.c: New test with GCC > > vectors. > > * gcc.target/arm/simd/mve-vcmp-f32.c: New test for > > auto-vectorization. > > * gcc.target/arm/simd/mve-vcmp.c: New test for auto-vectorization. 
> > > > add gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > > --- > > gcc/config/arm/arm-protos.h | 2 +- > > gcc/config/arm/arm.c | 211 ++++++++++++++++----- > > gcc/config/arm/iterators.md | 9 +- > > gcc/config/arm/mve.md | 10 +- > > gcc/config/arm/neon.md | 87 --------- > > gcc/config/arm/unspecs.md | 20 -- > > gcc/config/arm/vec-common.md | 107 +++++++++++ > > gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c | 80 ++++++++ > > gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c | 38 ++++ > > .../gcc.target/arm/simd/mve-compare-scalar-1.c | 69 +++++++ > > gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c | 30 +++ > > gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c | 50 +++++ > > 12 files changed, 547 insertions(+), 166 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c > > > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > > index 2521541..ffccaa7 100644 > > --- a/gcc/config/arm/arm-protos.h > > +++ b/gcc/config/arm/arm-protos.h > > @@ -373,7 +373,7 @@ extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx, > > extern bool arm_fusion_enabled_p (tune_params::fuse_ops); > > extern bool arm_valid_symbolic_address_p (rtx); > > extern bool arm_validize_comparison (rtx *, rtx *, rtx *); > > -extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool); > > +extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool, bool); > > #endif /* RTX_CODE */ > > > > extern bool arm_gen_setmem (rtx *); > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > > index 0371d98..80e28ef 100644 > > --- a/gcc/config/arm/arm.c > > +++ b/gcc/config/arm/arm.c > > @@ -30933,66 +30933,114 @@ 
arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, > > and return true if TARGET contains the inverse. If !CAN_INVERT, > > always store the result in TARGET, never its inverse. > > > > + If VCOND_MVE, do not emit the vpsel instruction here, let arm_expand_vcond do > > + it with the right destination type to avoid emiting two vpsel, one here and > > + one in arm_expand_vcond. > > + > > Note that the handling of floating-point comparisons is not > > IEEE compliant. */ > > > > bool > > arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > > - bool can_invert) > > + bool can_invert, bool vcond_mve) > > { > > machine_mode cmp_result_mode = GET_MODE (target); > > machine_mode cmp_mode = GET_MODE (op0); > > > > bool inverted; > > - switch (code) > > - { > > - /* For these we need to compute the inverse of the requested > > - comparison. */ > > - case UNORDERED: > > - case UNLT: > > - case UNLE: > > - case UNGT: > > - case UNGE: > > - case UNEQ: > > - case NE: > > - code = reverse_condition_maybe_unordered (code); > > - if (!can_invert) > > - { > > - /* Recursively emit the inverted comparison into a temporary > > - and then store its inverse in TARGET. This avoids reusing > > - TARGET (which for integer NE could be one of the inputs). */ > > - rtx tmp = gen_reg_rtx (cmp_result_mode); > > - if (arm_expand_vector_compare (tmp, code, op0, op1, true)) > > - gcc_unreachable (); > > - emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp))); > > - return false; > > - } > > - inverted = true; > > - break; > > > > - default: > > + /* MVE supports more comparisons than Neon. */ > > + if (TARGET_HAVE_MVE) > > inverted = false; > > - break; > > - } > > + else > > + switch (code) > > + { > > + /* For these we need to compute the inverse of the requested > > + comparison. 
*/ > > + case UNORDERED: > > + case UNLT: > > + case UNLE: > > + case UNGT: > > + case UNGE: > > + case UNEQ: > > + case NE: > > + code = reverse_condition_maybe_unordered (code); > > + if (!can_invert) > > + { > > + /* Recursively emit the inverted comparison into a temporary > > + and then store its inverse in TARGET. This avoids reusing > > + TARGET (which for integer NE could be one of the inputs). */ > > + rtx tmp = gen_reg_rtx (cmp_result_mode); > > + if (arm_expand_vector_compare (tmp, code, op0, op1, true, vcond_mve)) > > + gcc_unreachable (); > > + emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp))); > > + return false; > > + } > > + inverted = true; > > + break; > > + > > + default: > > + inverted = false; > > + break; > > + } > > > > switch (code) > > { > > - /* These are natively supported for zero comparisons, but otherwise > > - require the operands to be swapped. */ > > + /* These are natively supported by Neon for zero comparisons, but otherwise > > + require the operands to be swapped. For MVE, we can only compare > > + registers. */ > > case LE: > > case LT: > > - if (op1 != CONST0_RTX (cmp_mode)) > > - { > > - code = swap_condition (code); > > - std::swap (op0, op1); > > - } > > + if (!TARGET_HAVE_MVE) > > + if (op1 != CONST0_RTX (cmp_mode)) > > + { > > + code = swap_condition (code); > > + std::swap (op0, op1); > > + } > > /* Fall through. */ > > > > - /* These are natively supported for both register and zero operands. */ > > + /* These are natively supported by Neon for both register and zero > > + operands. MVE supports registers only. 
*/ > > case EQ: > > case GE: > > case GT: > > - emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1)); > > + case NE: > > + if (TARGET_HAVE_MVE) { > > + rtx vpr_p0; > > + if (vcond_mve) > > + vpr_p0 = target; > > + else > > + vpr_p0 = gen_reg_rtx (HImode); > > + > > + switch (cmp_mode) > > + { > > + case E_V16QImode: > > + case E_V8HImode: > > + case E_V4SImode: > > + emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); > > + break; > > + case E_V8HFmode: > > + case E_V4SFmode: > > + if (TARGET_HAVE_MVE_FLOAT) > > + emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); > > + else > > + gcc_unreachable (); > > + break; > > + default: > > + gcc_unreachable (); > > + } > > + > > + /* If we are not expanding a vcond, build the result here. */ > > + if (!vcond_mve) { > > + rtx zero = gen_reg_rtx (cmp_result_mode); > > + rtx one = gen_reg_rtx (cmp_result_mode); > > + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); > > + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); > > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); > > + } > > + } > > + else > > + emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1)); > > return inverted; > > > > /* These are natively supported for register operands only. > > @@ -31000,16 +31048,50 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > > or canonicalized by target-independent code. 
*/ > > case GEU: > > case GTU: > > - emit_insn (gen_neon_vc (code, cmp_mode, target, > > - op0, force_reg (cmp_mode, op1))); > > + if (TARGET_HAVE_MVE) { > > + rtx vpr_p0; > > + if (vcond_mve) > > + vpr_p0 = target; > > + else > > + vpr_p0 = gen_reg_rtx (HImode); > > + > > + emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); > > + if (!vcond_mve) { > > + rtx zero = gen_reg_rtx (cmp_result_mode); > > + rtx one = gen_reg_rtx (cmp_result_mode); > > + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); > > + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); > > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); > > + } > > + } > > + else > > + emit_insn (gen_neon_vc (code, cmp_mode, target, > > + op0, force_reg (cmp_mode, op1))); > > return inverted; > > > > /* These require the operands to be swapped and likewise do not > > support comparisons with zero. */ > > case LEU: > > case LTU: > > - emit_insn (gen_neon_vc (swap_condition (code), cmp_mode, > > - target, force_reg (cmp_mode, op1), op0)); > > + if (TARGET_HAVE_MVE) { > > + rtx vpr_p0; > > + if (vcond_mve) > > + vpr_p0 = target; > > + else > > + vpr_p0 = gen_reg_rtx (HImode); > > + > > + emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, force_reg (cmp_mode, op1), op0)); > > + if (!vcond_mve) { > > + rtx zero = gen_reg_rtx (cmp_result_mode); > > + rtx one = gen_reg_rtx (cmp_result_mode); > > + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); > > + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); > > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); > > + } > > + } > > + else > > + emit_insn (gen_neon_vc (swap_condition (code), cmp_mode, > > + target, force_reg (cmp_mode, op1), op0)); > > return inverted; > > > > /* These need a combination of two comparisons. 
*/ > > @@ -31021,8 +31103,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > > rtx gt_res = gen_reg_rtx (cmp_result_mode); > > rtx alt_res = gen_reg_rtx (cmp_result_mode); > > rtx_code alt_code = (code == LTGT ? LT : LE); > > - if (arm_expand_vector_compare (gt_res, GT, op0, op1, true) > > - || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true)) > > + if (arm_expand_vector_compare (gt_res, GT, op0, op1, true, vcond_mve) > > + || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true, vcond_mve)) > > gcc_unreachable (); > > emit_insn (gen_rtx_SET (target, gen_rtx_IOR (cmp_result_mode, > > gt_res, alt_res))); > > @@ -31040,13 +31122,50 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > > void > > arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode) > > { > > - rtx mask = gen_reg_rtx (cmp_result_mode); > > + /* When expanding for MVE, we do not want to emit a (useless) vpsel in > > + arm_expand_vector_compare, and another one here. 
*/ > > + bool vcond_mve=false; > > + rtx mask; > > + > > + if (TARGET_HAVE_MVE) > > + { > > + vcond_mve=true; > > + mask = gen_reg_rtx (HImode); > > + } > > + else > > + mask = gen_reg_rtx (cmp_result_mode); > > + > > bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]), > > - operands[4], operands[5], true); > > + operands[4], operands[5], true, vcond_mve); > > if (inverted) > > std::swap (operands[1], operands[2]); > > + if (TARGET_NEON) > > emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0], > > mask, operands[1], operands[2])); > > + else > > + { > > + machine_mode cmp_mode = GET_MODE (operands[4]); > > + rtx vpr_p0 = mask; > > + rtx zero = gen_reg_rtx (cmp_mode); > > + rtx one = gen_reg_rtx (cmp_mode); > > + emit_move_insn (zero, CONST0_RTX (cmp_mode)); > > + emit_move_insn (one, CONST1_RTX (cmp_mode)); > > + switch (cmp_mode) > > + { > > + case E_V16QImode: > > + case E_V8HImode: > > + case E_V4SImode: > > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0)); > > + break; > > + case E_V8HFmode: > > + case E_V4SFmode: > > + if (TARGET_HAVE_MVE_FLOAT) > > + emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0)); > > + break; > > + default: > > + gcc_unreachable (); > > + } > > + } > > } > > > > #define MAX_VECT_LEN 16 > > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md > > index 95df8bd..a128465 100644 > > --- a/gcc/config/arm/iterators.md > > +++ b/gcc/config/arm/iterators.md > > @@ -1288,12 +1288,11 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s") > > (VCREATEQ_U "u") (VCREATEQ_S "s") (VSHRQ_N_S "s") > > (VSHRQ_N_U "u") (VCVTQ_N_FROM_F_S "s") (VSHLQ_U "u") > > (VCVTQ_N_FROM_F_U "u") (VADDLVQ_P_S "s") (VSHLQ_S "s") > > - (VADDLVQ_P_U "u") (VCMPNEQ_S "s") > > + (VADDLVQ_P_U "u") > > (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s") > > (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u") > > (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBRSRQ_N_S 
"s") > > - (VBRSRQ_N_U "u") (VCMPEQQ_S "s") > > - (VCMPEQQ_N_S "s") (VCMPNEQ_N_S "s") > > + (VBRSRQ_N_U "u") > > (VHADDQ_N_S "s") (VHADDQ_N_U "u") (VHADDQ_S "s") > > (VHADDQ_U "u") (VHSUBQ_N_S "s") (VHSUBQ_N_U "u") > > (VHSUBQ_S "s") (VMAXQ_S "s") (VMAXQ_U "u") (VHSUBQ_U "u") > > @@ -1549,16 +1548,12 @@ (define_int_iterator VCREATEQ [VCREATEQ_U VCREATEQ_S]) > > (define_int_iterator VSHRQ_N [VSHRQ_N_S VSHRQ_N_U]) > > (define_int_iterator VCVTQ_N_FROM_F [VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U]) > > (define_int_iterator VADDLVQ_P [VADDLVQ_P_S VADDLVQ_P_U]) > > -(define_int_iterator VCMPNEQ [VCMPNEQ_S]) > > (define_int_iterator VSHLQ [VSHLQ_S VSHLQ_U]) > > (define_int_iterator VABDQ [VABDQ_S VABDQ_U]) > > (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U]) > > (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U]) > > (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S]) > > (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S]) > > -(define_int_iterator VCMPEQQ [VCMPEQQ_S]) > > -(define_int_iterator VCMPEQQ_N [VCMPEQQ_N_S]) > > -(define_int_iterator VCMPNEQ_N [VCMPNEQ_N_S]) > > (define_int_iterator VHADDQ [VHADDQ_S VHADDQ_U]) > > (define_int_iterator VHADDQ_N [VHADDQ_N_U VHADDQ_N_S]) > > (define_int_iterator VHSUBQ [VHSUBQ_S VHSUBQ_U]) > > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > > index 7c846a4..97f0a87 100644 > > --- a/gcc/config/arm/mve.md > > +++ b/gcc/config/arm/mve.md > > @@ -838,7 +838,7 @@ (define_insn "mve_vaddlvq_p_<supf>v4si" > > ;; > > ;; [vcmpneq_, vcmpcsq_, vcmpeqq_, vcmpgeq_, vcmpgtq_, vcmphiq_, vcmpleq_, vcmpltq_]) > > ;; > > -(define_insn "mve_vcmp<mve_cmp_op>q_<mode>" > > +(define_insn "@mve_vcmp<mve_cmp_op>q_<mode>" > > [ > > (set (match_operand:HI 0 "vpr_register_operand" "=Up") > > (MVE_COMPARISONS:HI (match_operand:MVE_2 1 "s_register_operand" "w") > > @@ -1928,7 +1928,7 @@ (define_insn "mve_vcaddq<mve_rot><mode>" > > ;; > > ;; [vcmpeqq_f, vcmpgeq_f, vcmpgtq_f, vcmpleq_f, vcmpltq_f, vcmpneq_f]) > > ;; > > -(define_insn 
"mve_vcmp<mve_cmp_op>q_f<mode>" > > +(define_insn "@mve_vcmp<mve_cmp_op>q_f<mode>" > > [ > > (set (match_operand:HI 0 "vpr_register_operand" "=Up") > > (MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w") > > @@ -1942,7 +1942,7 @@ (define_insn "mve_vcmp<mve_cmp_op>q_f<mode>" > > ;; > > ;; [vcmpeqq_n_f, vcmpgeq_n_f, vcmpgtq_n_f, vcmpleq_n_f, vcmpltq_n_f, vcmpneq_n_f]) > > ;; > > -(define_insn "mve_vcmp<mve_cmp_op>q_n_f<mode>" > > +(define_insn "@mve_vcmp<mve_cmp_op>q_n_f<mode>" > > [ > > (set (match_operand:HI 0 "vpr_register_operand" "=Up") > > (MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w") > > @@ -3307,7 +3307,7 @@ (define_insn "mve_vnegq_m_s<mode>" > > ;; > > ;; [vpselq_u, vpselq_s]) > > ;; > > -(define_insn "mve_vpselq_<supf><mode>" > > +(define_insn "@mve_vpselq_<supf><mode>" > > [ > > (set (match_operand:MVE_1 0 "s_register_operand" "=w") > > (unspec:MVE_1 [(match_operand:MVE_1 1 "s_register_operand" "w") > > @@ -4402,7 +4402,7 @@ (define_insn "mve_vorrq_m_n_<supf><mode>" > > ;; > > ;; [vpselq_f]) > > ;; > > -(define_insn "mve_vpselq_f<mode>" > > +(define_insn "@mve_vpselq_f<mode>" > > [ > > (set (match_operand:MVE_0 0 "s_register_operand" "=w") > > (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w") > > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md > > index fec2cc9..6660846 100644 > > --- a/gcc/config/arm/neon.md > > +++ b/gcc/config/arm/neon.md > > @@ -1416,93 +1416,6 @@ (define_insn "*us_sub<mode>_neon" > > [(set_attr "type" "neon_qsub<q>")] > > ) > > > > -(define_expand "vec_cmp<mode><v_cmp_result>" > > - [(set (match_operand:<V_cmp_result> 0 "s_register_operand") > > - (match_operator:<V_cmp_result> 1 "comparison_operator" > > - [(match_operand:VDQW 2 "s_register_operand") > > - (match_operand:VDQW 3 "reg_or_zero_operand")]))] > > - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > -{ > > - arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > > - 
operands[2], operands[3], false); > > - DONE; > > -}) > > - > > -(define_expand "vec_cmpu<mode><mode>" > > - [(set (match_operand:VDQIW 0 "s_register_operand") > > - (match_operator:VDQIW 1 "comparison_operator" > > - [(match_operand:VDQIW 2 "s_register_operand") > > - (match_operand:VDQIW 3 "reg_or_zero_operand")]))] > > - "TARGET_NEON" > > -{ > > - arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > > - operands[2], operands[3], false); > > - DONE; > > -}) > > - > > -;; Conditional instructions. These are comparisons with conditional moves for > > -;; vectors. They perform the assignment: > > -;; > > -;; Vop0 = (Vop4 <op3> Vop5) ? Vop1 : Vop2; > > -;; > > -;; where op3 is <, <=, ==, !=, >= or >. Operations are performed > > -;; element-wise. > > - > > -(define_expand "vcond<mode><mode>" > > - [(set (match_operand:VDQW 0 "s_register_operand") > > - (if_then_else:VDQW > > - (match_operator 3 "comparison_operator" > > - [(match_operand:VDQW 4 "s_register_operand") > > - (match_operand:VDQW 5 "reg_or_zero_operand")]) > > - (match_operand:VDQW 1 "s_register_operand") > > - (match_operand:VDQW 2 "s_register_operand")))] > > - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > -{ > > - arm_expand_vcond (operands, <V_cmp_result>mode); > > - DONE; > > -}) > > - > > -(define_expand "vcond<V_cvtto><mode>" > > - [(set (match_operand:<V_CVTTO> 0 "s_register_operand") > > - (if_then_else:<V_CVTTO> > > - (match_operator 3 "comparison_operator" > > - [(match_operand:V32 4 "s_register_operand") > > - (match_operand:V32 5 "reg_or_zero_operand")]) > > - (match_operand:<V_CVTTO> 1 "s_register_operand") > > - (match_operand:<V_CVTTO> 2 "s_register_operand")))] > > - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > -{ > > - arm_expand_vcond (operands, <V_cmp_result>mode); > > - DONE; > > -}) > > - > > -(define_expand "vcondu<mode><v_cmp_result>" > > - [(set (match_operand:VDQW 0 "s_register_operand") > > - 
(if_then_else:VDQW > > - (match_operator 3 "arm_comparison_operator" > > - [(match_operand:<V_cmp_result> 4 "s_register_operand") > > - (match_operand:<V_cmp_result> 5 "reg_or_zero_operand")]) > > - (match_operand:VDQW 1 "s_register_operand") > > - (match_operand:VDQW 2 "s_register_operand")))] > > - "TARGET_NEON" > > -{ > > - arm_expand_vcond (operands, <V_cmp_result>mode); > > - DONE; > > -}) > > - > > -(define_expand "vcond_mask_<mode><v_cmp_result>" > > - [(set (match_operand:VDQW 0 "s_register_operand") > > - (if_then_else:VDQW > > - (match_operand:<V_cmp_result> 3 "s_register_operand") > > - (match_operand:VDQW 1 "s_register_operand") > > - (match_operand:VDQW 2 "s_register_operand")))] > > - "TARGET_NEON" > > -{ > > - emit_insn (gen_neon_vbsl<mode> (operands[0], operands[3], operands[1], > > - operands[2])); > > - DONE; > > -}) > > - > > ;; Patterns for builtins. > > > > ; good for plain vadd, vaddq. > > diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md > > index 07ca53b..0778db1 100644 > > --- a/gcc/config/arm/unspecs.md > > +++ b/gcc/config/arm/unspecs.md > > @@ -596,8 +596,6 @@ (define_c_enum "unspec" [ > > VCVTQ_N_FROM_F_U > > VADDLVQ_P_S > > VADDLVQ_P_U > > - VCMPNEQ_U > > - VCMPNEQ_S > > VSHLQ_S > > VSHLQ_U > > VABDQ_S > > @@ -605,9 +603,6 @@ (define_c_enum "unspec" [ > > VADDVAQ_S > > VADDVQ_P_S > > VBRSRQ_N_S > > - VCMPEQQ_S > > - VCMPEQQ_N_S > > - VCMPNEQ_N_S > > VHADDQ_S > > VHADDQ_N_S > > VHSUBQ_S > > @@ -645,9 +640,6 @@ (define_c_enum "unspec" [ > > VADDVAQ_U > > VADDVQ_P_U > > VBRSRQ_N_U > > - VCMPEQQ_U > > - VCMPEQQ_N_U > > - VCMPNEQ_N_U > > VHADDQ_U > > VHADDQ_N_U > > VHSUBQ_U > > @@ -680,14 +672,6 @@ (define_c_enum "unspec" [ > > VSHLQ_R_U > > VSUBQ_U > > VSUBQ_N_U > > - VCMPGEQ_N_S > > - VCMPGEQ_S > > - VCMPGTQ_N_S > > - VCMPGTQ_S > > - VCMPLEQ_N_S > > - VCMPLEQ_S > > - VCMPLTQ_N_S > > - VCMPLTQ_S > > VHCADDQ_ROT270_S > > VHCADDQ_ROT90_S > > VMAXAQ_S > > @@ -702,10 +686,6 @@ (define_c_enum "unspec" [ > > VQRDMULHQ_N_S > > 
VQRDMULHQ_S > > VQSHLUQ_N_S > > - VCMPCSQ_N_U > > - VCMPCSQ_U > > - VCMPHIQ_N_U > > - VCMPHIQ_U > > VABDQ_M_S > > VABDQ_M_U > > VABDQ_F > > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md > > index 0b2b3b1..034b48b 100644 > > --- a/gcc/config/arm/vec-common.md > > +++ b/gcc/config/arm/vec-common.md > > @@ -362,3 +362,110 @@ (define_expand "vlshr<mode>3" > > DONE; > > } > > }) > > + > > +(define_expand "vec_cmp<mode><v_cmp_result>" > > + [(set (match_operand:<V_cmp_result> 0 "s_register_operand") > > + (match_operator:<V_cmp_result> 1 "comparison_operator" > > + [(match_operand:VDQW 2 "s_register_operand") > > + (match_operand:VDQW 3 "reg_or_zero_operand")]))] > > + "ARM_HAVE_<MODE>_ARITH > > + && !TARGET_REALLY_IWMMXT > > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > +{ > > + arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > > + operands[2], operands[3], false, false); > > + DONE; > > +}) > > + > > +(define_expand "vec_cmpu<mode><mode>" > > + [(set (match_operand:VDQIW 0 "s_register_operand") > > + (match_operator:VDQIW 1 "comparison_operator" > > + [(match_operand:VDQIW 2 "s_register_operand") > > + (match_operand:VDQIW 3 "reg_or_zero_operand")]))] > > + "ARM_HAVE_<MODE>_ARITH > > + && !TARGET_REALLY_IWMMXT" > > +{ > > + arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > > + operands[2], operands[3], false, false); > > + DONE; > > +}) > > + > > +;; Conditional instructions. These are comparisons with conditional moves for > > +;; vectors. They perform the assignment: > > +;; > > +;; Vop0 = (Vop4 <op3> Vop5) ? Vop1 : Vop2; > > +;; > > +;; where op3 is <, <=, ==, !=, >= or >. Operations are performed > > +;; element-wise. 
> > + > > +(define_expand "vcond<mode><mode>" > > + [(set (match_operand:VDQW 0 "s_register_operand") > > + (if_then_else:VDQW > > + (match_operator 3 "comparison_operator" > > + [(match_operand:VDQW 4 "s_register_operand") > > + (match_operand:VDQW 5 "reg_or_zero_operand")]) > > + (match_operand:VDQW 1 "s_register_operand") > > + (match_operand:VDQW 2 "s_register_operand")))] > > + "ARM_HAVE_<MODE>_ARITH > > + && !TARGET_REALLY_IWMMXT > > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > +{ > > + arm_expand_vcond (operands, <V_cmp_result>mode); > > + DONE; > > +}) > > + > > +(define_expand "vcond<V_cvtto><mode>" > > + [(set (match_operand:<V_CVTTO> 0 "s_register_operand") > > + (if_then_else:<V_CVTTO> > > + (match_operator 3 "comparison_operator" > > + [(match_operand:V32 4 "s_register_operand") > > + (match_operand:V32 5 "reg_or_zero_operand")]) > > + (match_operand:<V_CVTTO> 1 "s_register_operand") > > + (match_operand:<V_CVTTO> 2 "s_register_operand")))] > > + "ARM_HAVE_<MODE>_ARITH > > + && !TARGET_REALLY_IWMMXT > > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > +{ > > + arm_expand_vcond (operands, <V_cmp_result>mode); > > + DONE; > > +}) > > + > > +(define_expand "vcondu<mode><v_cmp_result>" > > + [(set (match_operand:VDQW 0 "s_register_operand") > > + (if_then_else:VDQW > > + (match_operator 3 "arm_comparison_operator" > > + [(match_operand:<V_cmp_result> 4 "s_register_operand") > > + (match_operand:<V_cmp_result> 5 "reg_or_zero_operand")]) > > + (match_operand:VDQW 1 "s_register_operand") > > + (match_operand:VDQW 2 "s_register_operand")))] > > + "ARM_HAVE_<MODE>_ARITH > > + && !TARGET_REALLY_IWMMXT" > > +{ > > + arm_expand_vcond (operands, <V_cmp_result>mode); > > + DONE; > > +}) > > + > > +(define_expand "vcond_mask_<mode><v_cmp_result>" > > + [(set (match_operand:VDQW 0 "s_register_operand") > > + (if_then_else:VDQW > > + (match_operand:<V_cmp_result> 3 "s_register_operand") > > + (match_operand:VDQW 1 "s_register_operand") > 
> + (match_operand:VDQW 2 "s_register_operand")))] > > + "ARM_HAVE_<MODE>_ARITH > > + && !TARGET_REALLY_IWMMXT" > > +{ > > + if (TARGET_NEON) > > + { > > + emit_insn (gen_neon_vbsl (<MODE>mode, operands[0], operands[3], > > + operands[1], operands[2])); > > + } > > + else if (TARGET_HAVE_MVE) > > + { > > + emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0], > > + operands[1], operands[2], operands[3])); > > + } > > + else > > + gcc_unreachable (); > > + > > + DONE; > > +}) > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c > > new file mode 100644 > > index 0000000..029c931 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c > > @@ -0,0 +1,80 @@ > > +/* { dg-do assemble } */ > > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > > +/* { dg-add-options arm_v8_1m_mve } */ > > +/* { dg-additional-options "-O3" } */ > > + > > +/* Integer tests. */ > > + > > +#define COMPARE_REG(NAME, OP, TYPE) \ > > + TYPE \ > > + cmp_##NAME##_##TYPE##_reg (TYPE a, TYPE b) \ > > + { \ > > + return a OP b; \ > > + } > > + > > +#define COMPARE_REG_AND_ZERO(NAME, OP, TYPE) \ > > + COMPARE_REG (NAME, OP, TYPE) \ > > + \ > > + TYPE \ > > + cmp_##NAME##_##TYPE##_zero (TYPE a) \ > > + { \ > > + return a OP (TYPE) {}; \ > > + } > > + > > +#define COMPARE_TYPE(TYPE, COMPARE_ORDERED) \ > > + COMPARE_REG_AND_ZERO (eq, ==, TYPE) \ > > + COMPARE_REG_AND_ZERO (ne, !=, TYPE) \ > > + COMPARE_ORDERED (lt, <, TYPE) \ > > + COMPARE_ORDERED (le, <=, TYPE) \ > > + COMPARE_ORDERED (gt, >, TYPE) \ > > + COMPARE_ORDERED (ge, >=, TYPE) > > + > > +#define TEST_TYPE(NAME, ELEM, COMPARE_ORDERED, SIZE) \ > > + typedef ELEM NAME##SIZE __attribute__((vector_size(SIZE))); \ > > + COMPARE_TYPE (NAME##SIZE, COMPARE_ORDERED) > > + > > +/* 64-bits vectors, not vectorized. 
*/ > > +TEST_TYPE (vs8, __INT8_TYPE__, COMPARE_REG_AND_ZERO, 8) > > +TEST_TYPE (vu8, __UINT8_TYPE__, COMPARE_REG, 8) > > +TEST_TYPE (vs16, __INT16_TYPE__, COMPARE_REG_AND_ZERO, 8) > > +TEST_TYPE (vu16, __UINT16_TYPE__, COMPARE_REG, 8) > > +TEST_TYPE (vs32, __INT32_TYPE__, COMPARE_REG_AND_ZERO, 8) > > +TEST_TYPE (vu32, __UINT32_TYPE__, COMPARE_REG, 8) > > + > > +/* 128-bits vectors. */ > > +TEST_TYPE (vs8, __INT8_TYPE__, COMPARE_REG_AND_ZERO, 16) > > +TEST_TYPE (vu8, __UINT8_TYPE__, COMPARE_REG, 16) > > +TEST_TYPE (vs16, __INT16_TYPE__, COMPARE_REG_AND_ZERO, 16) > > +TEST_TYPE (vu16, __UINT16_TYPE__, COMPARE_REG, 16) > > +TEST_TYPE (vs32, __INT32_TYPE__, COMPARE_REG_AND_ZERO, 16) > > +TEST_TYPE (vu32, __UINT32_TYPE__, COMPARE_REG, 16) > > + > > +/* { 8 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > > +/* { dg-final { scan-assembler-times {\tvcmp.i8 eq, q[0-9]+, q[0-9]+\n} 4 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.i8 ne, q[0-9]+, q[0-9]+\n} 4 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 lt, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 le, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 gt, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 ge, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u8 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u8 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > + > > +/* { 16 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. 
> > +/* { dg-final { scan-assembler-times {\tvcmp.i16 eq, q[0-9]+, q[0-9]+\n} 4 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.i16 ne, q[0-9]+, q[0-9]+\n} 4 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 lt, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 le, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 gt, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 ge, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u16 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u16 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > + > > +/* { 32 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > > +/* { dg-final { scan-assembler-times {\tvcmp.i32 eq, q[0-9]+, q[0-9]+\n} 4 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.i32 ne, q[0-9]+, q[0-9]+\n} 4 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 lt, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 le, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 gt, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 ge, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u32 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u32 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c > > new file mode 100644 > > index 0000000..8515195 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c > > @@ -0,0 +1,38 @@ > > +/* { dg-do assemble } */ > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */ > > + > > +/* float 32 tests. 
*/ > > + > > +#ifndef ELEM_TYPE > > +#define ELEM_TYPE float > > +#endif > > +#ifndef INT_ELEM_TYPE > > +#define INT_ELEM_TYPE __INT32_TYPE__ > > +#endif > > + > > +#define COMPARE(NAME, OP) \ > > + int_vec \ > > + cmp_##NAME##_reg (vec a, vec b) \ > > + { \ > > + return a OP b; \ > > + } > > + > > +typedef INT_ELEM_TYPE int_vec __attribute__((vector_size(16))); > > +typedef ELEM_TYPE vec __attribute__((vector_size(16))); > > + > > +COMPARE (eq, ==) > > +COMPARE (ne, !=) > > +COMPARE (lt, <) > > +COMPARE (le, <=) > > +COMPARE (gt, >) > > +COMPARE (ge, >=) > > + > > +/* eq, ne, lt, le, gt, ge. > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */ > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > > new file mode 100644 > > index 0000000..7774972 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > > @@ -0,0 +1,69 @@ > > +/* { dg-do assemble } */ > > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > > +/* { dg-add-options arm_v8_1m_mve } */ > > +/* { dg-additional-options "-O3" } */ > > + > > +#define COMPARE_REG(NAME, OP, TYPE, SCALAR) \ > > + TYPE \ > > + cmp_##NAME##_##TYPE##_scalar (TYPE a, SCALAR b) \ > > + { \ > > + return a OP b; \ > > + } > > + > > +#define COMPARE_TYPE(SCALAR, TYPE) \ > > + COMPARE_REG (eq, ==, TYPE, SCALAR) \ > > + COMPARE_REG (ne, !=, TYPE, SCALAR) \ > > + COMPARE_REG (lt, <, TYPE, SCALAR) \ > > + COMPARE_REG (le, <=, TYPE, SCALAR) \ > > + COMPARE_REG 
(gt, >, TYPE, SCALAR) \ > > + COMPARE_REG (ge, >=, TYPE, SCALAR) > > + > > +#define TEST_TYPE(NAME, ELEM, SIZE) \ > > + typedef ELEM NAME##SIZE __attribute__((vector_size(SIZE))); \ > > + COMPARE_TYPE (ELEM, NAME##SIZE) > > + > > +/* 64-bits vectors, not vectorized. */ > > +TEST_TYPE (vs8, __INT8_TYPE__, 8) > > +TEST_TYPE (vu8, __UINT8_TYPE__, 8) > > +TEST_TYPE (vs16, __INT16_TYPE__, 8) > > +TEST_TYPE (vu16, __UINT16_TYPE__, 8) > > +TEST_TYPE (vs32, __INT32_TYPE__, 8) > > +TEST_TYPE (vu32, __UINT32_TYPE__, 8) > > + > > +/* 128-bits vectors. */ > > +TEST_TYPE (vs8, __INT8_TYPE__, 16) > > +TEST_TYPE (vu8, __UINT8_TYPE__, 16) > > +TEST_TYPE (vs16, __INT16_TYPE__, 16) > > +TEST_TYPE (vu16, __UINT16_TYPE__, 16) > > +TEST_TYPE (vs32, __INT32_TYPE__, 16) > > +TEST_TYPE (vu32, __UINT32_TYPE__, 16) > > + > > +/* { 8 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > > +/* { dg-final { scan-assembler-times {\tvcmp.i8 eq, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.i8 ne, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 lt, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 le, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 gt, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 ge, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u8 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u8 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > + > > +/* { 16 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. 
> > +/* { dg-final { scan-assembler-times {\tvcmp.i16 eq, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.i16 ne, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 lt, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 le, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 gt, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 ge, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u16 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u16 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > + > > +/* { 32 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > > +/* { dg-final { scan-assembler-times {\tvcmp.i32 eq, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.i32 ne, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 lt, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 le, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 gt, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 ge, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u32 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u32 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c > > new file mode 100644 > > index 0000000..4ed449e > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c > > @@ -0,0 +1,30 @@ > > +/* { dg-do assemble } */ > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */ > > + > > +#include <stdint.h> > > + > > +#define NB 4 > > + > > 
+#define FUNC(OP, NAME) \ > > + void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \ > > + int i; \ > > + for (i=0; i<NB; i++) { \ > > + dest[i] = a[i] OP b[i]; \ > > + } \ > > + } > > + > > +FUNC(==, vcmpeq) > > +FUNC(!=, vcmpne) > > +FUNC(<, vcmplt) > > +FUNC(<=, vcmple) > > +FUNC(>, vcmpgt) > > +FUNC(>=, vcmpge) > > + > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */ > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c > > new file mode 100644 > > index 0000000..8da15e7 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c > > @@ -0,0 +1,50 @@ > > +/* { dg-do assemble } */ > > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > > +/* { dg-add-options arm_v8_1m_mve } */ > > +/* { dg-additional-options "-O3" } */ > > + > > +#include <stdint.h> > > + > > +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME) \ > > + void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \ > > + int i; \ > > + for (i=0; i<NB; i++) { \ > > + dest[i] = a[i] OP b[i]; \ > > + } \ > > +} > > + > > +#define ALL_FUNCS(OP, NAME) \ > > + FUNC(s, int, 32, 2, OP, NAME) \ > > + FUNC(u, uint, 32, 2, OP, NAME) \ > > + FUNC(s, int, 16, 4, OP, NAME) \ > > + FUNC(u, uint, 16, 4, OP, NAME) \ > > + FUNC(s, int, 8, 8, OP, NAME) \ > > + FUNC(u, uint, 8, 8, OP, NAME) \ > > + FUNC(s, int, 32, 4, OP, NAME) \ > > + FUNC(u, uint, 32, 4, OP, NAME) \ > > + FUNC(s, int, 16, 8, 
OP, NAME) \ > > + FUNC(u, uint, 16, 8, OP, NAME) \ > > + FUNC(s, int, 8, 16, OP, NAME) \ > > + FUNC(u, uint, 8, 16, OP, NAME) > > + > > +ALL_FUNCS(==, vcmpeq) > > +ALL_FUNCS(!=, vcmpne) > > +ALL_FUNCS(<, vcmplt) > > +ALL_FUNCS(<=, vcmple) > > +ALL_FUNCS(>, vcmpgt) > > +ALL_FUNCS(>=, vcmpge) > > + > > +/* MVE has only 128-bit vectors, so we can vectorize only half of the > > + functions above. */ > > +/* { dg-final { scan-assembler-times {\tvcmp.i[0-9]+ eq, q[0-9]+, q[0-9]+\n} 6 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.i[0-9]+ ne, q[0-9]+, q[0-9]+\n} 6 } } */ > > + > > +/* lt, le, gt, ge apply to signed types, cs and hi to unsigned types. */ > > +/* lt and le with unsigned types are replaced with the opposite condition, hence > > + the double number of matches for cs and hi. */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ lt, q[0-9]+, q[0-9]+\n} 3 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ le, q[0-9]+, q[0-9]+\n} 3 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ gt, q[0-9]+, q[0-9]+\n} 3 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ ge, q[0-9]+, q[0-9]+\n} 3 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u[0-9]+ cs, q[0-9]+, q[0-9]+\n} 6 } } */ > > +/* { dg-final { scan-assembler-times {\tvcmp.u[0-9]+ hi, q[0-9]+, q[0-9]+\n} 6 } } */
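A note on the unsigned counts above: the comment in mve-vcmp.c says that lt and le on unsigned types are replaced with the opposite condition, and the LEU/LTU case of arm_expand_vector_compare does this with swap_condition and swapped operands. What follows is only a minimal scalar sketch of that identity in plain C, not code from the patch; the helper names (cond_hi, cond_cs, ltu_via_hi, leu_via_cs, self_check) are hypothetical, and each function models a single lane rather than a vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar models of the unsigned conditions:
   "hi" is unsigned >, "cs" is unsigned >=.  */
static bool cond_hi (uint32_t a, uint32_t b) { return a > b; }
static bool cond_cs (uint32_t a, uint32_t b) { return a >= b; }

/* a < b (LTU) expressed as b hi a: swap the condition and the operands.  */
static bool ltu_via_hi (uint32_t a, uint32_t b) { return cond_hi (b, a); }

/* a <= b (LEU) expressed as b cs a.  */
static bool leu_via_cs (uint32_t a, uint32_t b) { return cond_cs (b, a); }

/* Check the swapped forms against the direct comparisons on a few values.  */
static void self_check (void)
{
  const uint32_t v[] = { 0u, 1u, 2u, 0x7fffffffu, 0x80000000u, UINT32_MAX };
  const unsigned n = sizeof v / sizeof v[0];
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      {
        assert (ltu_via_hi (v[i], v[j]) == (v[i] < v[j]));
        assert (leu_via_cs (v[i], v[j]) == (v[i] <= v[j]));
      }
}
```

Under this reading, each unsigned lt/le function contributes one extra hi/cs match, which is consistent with the test expecting twice as many cs and hi matches as signed lt, le, gt or ge matches.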
On Tue, 4 May 2021 at 15:41, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>
> On Tue, 4 May 2021 at 13:29, Andre Vieira (lists)
> <andre.simoesdiasvieira@arm.com> wrote:
> >
> > Hi Christophe,
> >
> > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > Since MVE has a different set of vector comparison operators from
> > > Neon, we have to update the expansion to take into account the new
> > > ones, for instance 'NE' for which MVE does not require to use 'EQ'
> > > with the inverted condition.
> > >
> > > Conversely, Neon supports comparisons with #0, MVE does not.
> > >
> > > For:
> > > typedef long int vs32 __attribute__((vector_size(16)));
> > > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> > >
> > > we now generate:
> > > cmp_eq_vs32_reg:
> > > vldr.64 d4, .L123 @ 8 [c=8 l=4] *mve_movv4si/8
> > > vldr.64 d5, .L123+8
> > > vldr.64 d6, .L123+16 @ 9 [c=8 l=4] *mve_movv4si/8
> > > vldr.64 d7, .L123+24
> > > vcmp.i32 eq, q0, q1 @ 7 [c=16 l=4] mve_vcmpeqq_v4si
> > > vpsel q0, q3, q2 @ 15 [c=8 l=4] mve_vpselq_sv4si
> > > bx lr @ 26 [c=8 l=4] *thumb2_return
> > > .L124:
> > > .align 3
> > > .L123:
> > > .word 0
> > > .word 0
> > > .word 0
> > > .word 0
> > > .word 1
> > > .word 1
> > > .word 1
> > > .word 1
> > >
> > > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode)) produces
> > > a pair of vldr instead of vmov.i32, qX, #0
> > I think ideally we would even want:
> > vpte eq, q0, q1
> > vmovt.i32 q0, #0
> > vmove.i32 q0, #1
> >
> > But we don't have a way to generate VPT blocks with multiple
> > instructions yet unfortunately so I guess VPSEL will have to do for now.
>
> TBH, I looked at what LLVM generates currently ;-)
>

Here is an updated version, which adds
&& (!<Is_float_mode> || flag_unsafe_math_optimizations)
to vcond_mask_

This condition was not present in the neon.md version I moved to vec-common.md, but since the VDQW iterator includes V2SF and V4SF, it should take floating-point flags into account.
Christophe > > > > > > > > 2021-03-01 Christophe Lyon <christophe.lyon@linaro.org> > > > > > > gcc/ > > > * config/arm/arm-protos.h (arm_expand_vector_compare): Update > > > prototype. > > > * config/arm/arm.c (arm_expand_vector_compare): Add support for > > > MVE. > > > (arm_expand_vcond): Likewise. > > > * config/arm/iterators.md (supf): Remove VCMPNEQ_S, VCMPEQQ_S, > > > VCMPEQQ_N_S, VCMPNEQ_N_S. > > > (VCMPNEQ, VCMPEQQ, VCMPEQQ_N, VCMPNEQ_N): Remove. > > > * config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>): Add '@' prefix. > > > (@mve_vcmp<mve_cmp_op>q_f<mode>): Likewise. > > > (@mve_vcmp<mve_cmp_op>q_n_f<mode>): Likewise. > > > (@mve_vpselq_<supf><mode>): Likewise. > > > (@mve_vpselq_f<mode>"): Likewise. > > > * config/arm/neon.md (vec_cmp<mode><v_cmp_result): Enable for MVE > > > and move to vec-common.md. > > > (vec_cmpu<mode><mode>): Likewise. > > > (vcond<mode><mode>): Likewise. > > > (vcond<V_cvtto><mode>): Likewise. > > > (vcondu<mode><v_cmp_result>): Likewise. > > > (vcond_mask_<mode><v_cmp_result>): Likewise. > > > * config/arm/unspecs.md (VCMPNEQ_U, VCMPNEQ_S, VCMPEQQ_S) > > > (VCMPEQQ_N_S, VCMPNEQ_N_S, VCMPEQQ_U, CMPEQQ_N_U, VCMPNEQ_N_U) > > > (VCMPGEQ_N_S, VCMPGEQ_S, VCMPGTQ_N_S, VCMPGTQ_S, VCMPLEQ_N_S) > > > (VCMPLEQ_S, VCMPLTQ_N_S, VCMPLTQ_S, VCMPCSQ_N_U, VCMPCSQ_U) > > > (VCMPHIQ_N_U, VCMPHIQ_U): Remove. > > > * config/arm/vec-common.md (vec_cmp<mode><v_cmp_result): Moved > > > from neon.md. > > > (vec_cmpu<mode><mode>): Likewise. > > > (vcond<mode><mode>): Likewise. > > > (vcond<V_cvtto><mode>): Likewise. > > > (vcondu<mode><v_cmp_result>): Likewise. > > > (vcond_mask_<mode><v_cmp_result>): Likewise. > > > > > > gcc/testsuite > > > * gcc.target/arm/simd/mve-compare-1.c: New test with GCC vectors. > > > * gcc.target/arm/simd/mve-compare-2.c: New test with GCC vectors. > > > * gcc.target/arm/simd/mve-compare-scalar-1.c: New test with GCC > > > vectors. > > > * gcc.target/arm/simd/mve-vcmp-f32.c: New test for > > > auto-vectorization. 
> > > * gcc.target/arm/simd/mve-vcmp.c: New test for auto-vectorization. > > > > > > add gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > > > --- > > > gcc/config/arm/arm-protos.h | 2 +- > > > gcc/config/arm/arm.c | 211 ++++++++++++++++----- > > > gcc/config/arm/iterators.md | 9 +- > > > gcc/config/arm/mve.md | 10 +- > > > gcc/config/arm/neon.md | 87 --------- > > > gcc/config/arm/unspecs.md | 20 -- > > > gcc/config/arm/vec-common.md | 107 +++++++++++ > > > gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c | 80 ++++++++ > > > gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c | 38 ++++ > > > .../gcc.target/arm/simd/mve-compare-scalar-1.c | 69 +++++++ > > > gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c | 30 +++ > > > gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c | 50 +++++ > > > 12 files changed, 547 insertions(+), 166 deletions(-) > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c > > > > > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h > > > index 2521541..ffccaa7 100644 > > > --- a/gcc/config/arm/arm-protos.h > > > +++ b/gcc/config/arm/arm-protos.h > > > @@ -373,7 +373,7 @@ extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx, > > > extern bool arm_fusion_enabled_p (tune_params::fuse_ops); > > > extern bool arm_valid_symbolic_address_p (rtx); > > > extern bool arm_validize_comparison (rtx *, rtx *, rtx *); > > > -extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool); > > > +extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool, bool); > > > #endif /* RTX_CODE */ > > > > > > extern bool arm_gen_setmem (rtx *); > > > diff --git a/gcc/config/arm/arm.c 
b/gcc/config/arm/arm.c > > > index 0371d98..80e28ef 100644 > > > --- a/gcc/config/arm/arm.c > > > +++ b/gcc/config/arm/arm.c > > > @@ -30933,66 +30933,114 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, > > > and return true if TARGET contains the inverse. If !CAN_INVERT, > > > always store the result in TARGET, never its inverse. > > > > > > + If VCOND_MVE, do not emit the vpsel instruction here, let arm_expand_vcond do > > > + it with the right destination type to avoid emiting two vpsel, one here and > > > + one in arm_expand_vcond. > > > + > > > Note that the handling of floating-point comparisons is not > > > IEEE compliant. */ > > > > > > bool > > > arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > > > - bool can_invert) > > > + bool can_invert, bool vcond_mve) > > > { > > > machine_mode cmp_result_mode = GET_MODE (target); > > > machine_mode cmp_mode = GET_MODE (op0); > > > > > > bool inverted; > > > - switch (code) > > > - { > > > - /* For these we need to compute the inverse of the requested > > > - comparison. */ > > > - case UNORDERED: > > > - case UNLT: > > > - case UNLE: > > > - case UNGT: > > > - case UNGE: > > > - case UNEQ: > > > - case NE: > > > - code = reverse_condition_maybe_unordered (code); > > > - if (!can_invert) > > > - { > > > - /* Recursively emit the inverted comparison into a temporary > > > - and then store its inverse in TARGET. This avoids reusing > > > - TARGET (which for integer NE could be one of the inputs). */ > > > - rtx tmp = gen_reg_rtx (cmp_result_mode); > > > - if (arm_expand_vector_compare (tmp, code, op0, op1, true)) > > > - gcc_unreachable (); > > > - emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp))); > > > - return false; > > > - } > > > - inverted = true; > > > - break; > > > > > > - default: > > > + /* MVE supports more comparisons than Neon. 
*/ > > > + if (TARGET_HAVE_MVE) > > > inverted = false; > > > - break; > > > - } > > > + else > > > + switch (code) > > > + { > > > + /* For these we need to compute the inverse of the requested > > > + comparison. */ > > > + case UNORDERED: > > > + case UNLT: > > > + case UNLE: > > > + case UNGT: > > > + case UNGE: > > > + case UNEQ: > > > + case NE: > > > + code = reverse_condition_maybe_unordered (code); > > > + if (!can_invert) > > > + { > > > + /* Recursively emit the inverted comparison into a temporary > > > + and then store its inverse in TARGET. This avoids reusing > > > + TARGET (which for integer NE could be one of the inputs). */ > > > + rtx tmp = gen_reg_rtx (cmp_result_mode); > > > + if (arm_expand_vector_compare (tmp, code, op0, op1, true, vcond_mve)) > > > + gcc_unreachable (); > > > + emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp))); > > > + return false; > > > + } > > > + inverted = true; > > > + break; > > > + > > > + default: > > > + inverted = false; > > > + break; > > > + } > > > > > > switch (code) > > > { > > > - /* These are natively supported for zero comparisons, but otherwise > > > - require the operands to be swapped. */ > > > + /* These are natively supported by Neon for zero comparisons, but otherwise > > > + require the operands to be swapped. For MVE, we can only compare > > > + registers. */ > > > case LE: > > > case LT: > > > - if (op1 != CONST0_RTX (cmp_mode)) > > > - { > > > - code = swap_condition (code); > > > - std::swap (op0, op1); > > > - } > > > + if (!TARGET_HAVE_MVE) > > > + if (op1 != CONST0_RTX (cmp_mode)) > > > + { > > > + code = swap_condition (code); > > > + std::swap (op0, op1); > > > + } > > > /* Fall through. */ > > > > > > - /* These are natively supported for both register and zero operands. */ > > > + /* These are natively supported by Neon for both register and zero > > > + operands. MVE supports registers only. 
*/ > > > case EQ: > > > case GE: > > > case GT: > > > - emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1)); > > > + case NE: > > > + if (TARGET_HAVE_MVE) { > > > + rtx vpr_p0; > > > + if (vcond_mve) > > > + vpr_p0 = target; > > > + else > > > + vpr_p0 = gen_reg_rtx (HImode); > > > + > > > + switch (cmp_mode) > > > + { > > > + case E_V16QImode: > > > + case E_V8HImode: > > > + case E_V4SImode: > > > + emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); > > > + break; > > > + case E_V8HFmode: > > > + case E_V4SFmode: > > > + if (TARGET_HAVE_MVE_FLOAT) > > > + emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); > > > + else > > > + gcc_unreachable (); > > > + break; > > > + default: > > > + gcc_unreachable (); > > > + } > > > + > > > + /* If we are not expanding a vcond, build the result here. */ > > > + if (!vcond_mve) { > > > + rtx zero = gen_reg_rtx (cmp_result_mode); > > > + rtx one = gen_reg_rtx (cmp_result_mode); > > > + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); > > > + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); > > > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); > > > + } > > > + } > > > + else > > > + emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1)); > > > return inverted; > > > > > > /* These are natively supported for register operands only. > > > @@ -31000,16 +31048,50 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > > > or canonicalized by target-independent code. 
*/ > > > case GEU: > > > case GTU: > > > - emit_insn (gen_neon_vc (code, cmp_mode, target, > > > - op0, force_reg (cmp_mode, op1))); > > > + if (TARGET_HAVE_MVE) { > > > + rtx vpr_p0; > > > + if (vcond_mve) > > > + vpr_p0 = target; > > > + else > > > + vpr_p0 = gen_reg_rtx (HImode); > > > + > > > + emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); > > > + if (!vcond_mve) { > > > + rtx zero = gen_reg_rtx (cmp_result_mode); > > > + rtx one = gen_reg_rtx (cmp_result_mode); > > > + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); > > > + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); > > > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); > > > + } > > > + } > > > + else > > > + emit_insn (gen_neon_vc (code, cmp_mode, target, > > > + op0, force_reg (cmp_mode, op1))); > > > return inverted; > > > > > > /* These require the operands to be swapped and likewise do not > > > support comparisons with zero. */ > > > case LEU: > > > case LTU: > > > - emit_insn (gen_neon_vc (swap_condition (code), cmp_mode, > > > - target, force_reg (cmp_mode, op1), op0)); > > > + if (TARGET_HAVE_MVE) { > > > + rtx vpr_p0; > > > + if (vcond_mve) > > > + vpr_p0 = target; > > > + else > > > + vpr_p0 = gen_reg_rtx (HImode); > > > + > > > + emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, force_reg (cmp_mode, op1), op0)); > > > + if (!vcond_mve) { > > > + rtx zero = gen_reg_rtx (cmp_result_mode); > > > + rtx one = gen_reg_rtx (cmp_result_mode); > > > + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); > > > + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); > > > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); > > > + } > > > + } > > > + else > > > + emit_insn (gen_neon_vc (swap_condition (code), cmp_mode, > > > + target, force_reg (cmp_mode, op1), op0)); > > > return inverted; > > > > > > /* These need a combination of two comparisons. 
*/ > > > @@ -31021,8 +31103,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > > > rtx gt_res = gen_reg_rtx (cmp_result_mode); > > > rtx alt_res = gen_reg_rtx (cmp_result_mode); > > > rtx_code alt_code = (code == LTGT ? LT : LE); > > > - if (arm_expand_vector_compare (gt_res, GT, op0, op1, true) > > > - || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true)) > > > + if (arm_expand_vector_compare (gt_res, GT, op0, op1, true, vcond_mve) > > > + || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true, vcond_mve)) > > > gcc_unreachable (); > > > emit_insn (gen_rtx_SET (target, gen_rtx_IOR (cmp_result_mode, > > > gt_res, alt_res))); > > > @@ -31040,13 +31122,50 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > > > void > > > arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode) > > > { > > > - rtx mask = gen_reg_rtx (cmp_result_mode); > > > + /* When expanding for MVE, we do not want to emit a (useless) vpsel in > > > + arm_expand_vector_compare, and another one here. 
*/ > > > + bool vcond_mve=false; > > > + rtx mask; > > > + > > > + if (TARGET_HAVE_MVE) > > > + { > > > + vcond_mve=true; > > > + mask = gen_reg_rtx (HImode); > > > + } > > > + else > > > + mask = gen_reg_rtx (cmp_result_mode); > > > + > > > bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]), > > > - operands[4], operands[5], true); > > > + operands[4], operands[5], true, vcond_mve); > > > if (inverted) > > > std::swap (operands[1], operands[2]); > > > + if (TARGET_NEON) > > > emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0], > > > mask, operands[1], operands[2])); > > > + else > > > + { > > > + machine_mode cmp_mode = GET_MODE (operands[4]); > > > + rtx vpr_p0 = mask; > > > + rtx zero = gen_reg_rtx (cmp_mode); > > > + rtx one = gen_reg_rtx (cmp_mode); > > > + emit_move_insn (zero, CONST0_RTX (cmp_mode)); > > > + emit_move_insn (one, CONST1_RTX (cmp_mode)); > > > + switch (cmp_mode) > > > + { > > > + case E_V16QImode: > > > + case E_V8HImode: > > > + case E_V4SImode: > > > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0)); > > > + break; > > > + case E_V8HFmode: > > > + case E_V4SFmode: > > > + if (TARGET_HAVE_MVE_FLOAT) > > > + emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0)); > > > + break; > > > + default: > > > + gcc_unreachable (); > > > + } > > > + } > > > } > > > > > > #define MAX_VECT_LEN 16 > > > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md > > > index 95df8bd..a128465 100644 > > > --- a/gcc/config/arm/iterators.md > > > +++ b/gcc/config/arm/iterators.md > > > @@ -1288,12 +1288,11 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s") > > > (VCREATEQ_U "u") (VCREATEQ_S "s") (VSHRQ_N_S "s") > > > (VSHRQ_N_U "u") (VCVTQ_N_FROM_F_S "s") (VSHLQ_U "u") > > > (VCVTQ_N_FROM_F_U "u") (VADDLVQ_P_S "s") (VSHLQ_S "s") > > > - (VADDLVQ_P_U "u") (VCMPNEQ_S "s") > > > + (VADDLVQ_P_U "u") > > > (VABDQ_M_S "s") (VABDQ_M_U "u") 
(VABDQ_S "s") > > > (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u") > > > (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBRSRQ_N_S "s") > > > - (VBRSRQ_N_U "u") (VCMPEQQ_S "s") > > > - (VCMPEQQ_N_S "s") (VCMPNEQ_N_S "s") > > > + (VBRSRQ_N_U "u") > > > (VHADDQ_N_S "s") (VHADDQ_N_U "u") (VHADDQ_S "s") > > > (VHADDQ_U "u") (VHSUBQ_N_S "s") (VHSUBQ_N_U "u") > > > (VHSUBQ_S "s") (VMAXQ_S "s") (VMAXQ_U "u") (VHSUBQ_U "u") > > > @@ -1549,16 +1548,12 @@ (define_int_iterator VCREATEQ [VCREATEQ_U VCREATEQ_S]) > > > (define_int_iterator VSHRQ_N [VSHRQ_N_S VSHRQ_N_U]) > > > (define_int_iterator VCVTQ_N_FROM_F [VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U]) > > > (define_int_iterator VADDLVQ_P [VADDLVQ_P_S VADDLVQ_P_U]) > > > -(define_int_iterator VCMPNEQ [VCMPNEQ_S]) > > > (define_int_iterator VSHLQ [VSHLQ_S VSHLQ_U]) > > > (define_int_iterator VABDQ [VABDQ_S VABDQ_U]) > > > (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U]) > > > (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U]) > > > (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S]) > > > (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S]) > > > -(define_int_iterator VCMPEQQ [VCMPEQQ_S]) > > > -(define_int_iterator VCMPEQQ_N [VCMPEQQ_N_S]) > > > -(define_int_iterator VCMPNEQ_N [VCMPNEQ_N_S]) > > > (define_int_iterator VHADDQ [VHADDQ_S VHADDQ_U]) > > > (define_int_iterator VHADDQ_N [VHADDQ_N_U VHADDQ_N_S]) > > > (define_int_iterator VHSUBQ [VHSUBQ_S VHSUBQ_U]) > > > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > > > index 7c846a4..97f0a87 100644 > > > --- a/gcc/config/arm/mve.md > > > +++ b/gcc/config/arm/mve.md > > > @@ -838,7 +838,7 @@ (define_insn "mve_vaddlvq_p_<supf>v4si" > > > ;; > > > ;; [vcmpneq_, vcmpcsq_, vcmpeqq_, vcmpgeq_, vcmpgtq_, vcmphiq_, vcmpleq_, vcmpltq_]) > > > ;; > > > -(define_insn "mve_vcmp<mve_cmp_op>q_<mode>" > > > +(define_insn "@mve_vcmp<mve_cmp_op>q_<mode>" > > > [ > > > (set (match_operand:HI 0 "vpr_register_operand" "=Up") > > > (MVE_COMPARISONS:HI (match_operand:MVE_2 1 "s_register_operand" 
"w") > > > @@ -1928,7 +1928,7 @@ (define_insn "mve_vcaddq<mve_rot><mode>" > > > ;; > > > ;; [vcmpeqq_f, vcmpgeq_f, vcmpgtq_f, vcmpleq_f, vcmpltq_f, vcmpneq_f]) > > > ;; > > > -(define_insn "mve_vcmp<mve_cmp_op>q_f<mode>" > > > +(define_insn "@mve_vcmp<mve_cmp_op>q_f<mode>" > > > [ > > > (set (match_operand:HI 0 "vpr_register_operand" "=Up") > > > (MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w") > > > @@ -1942,7 +1942,7 @@ (define_insn "mve_vcmp<mve_cmp_op>q_f<mode>" > > > ;; > > > ;; [vcmpeqq_n_f, vcmpgeq_n_f, vcmpgtq_n_f, vcmpleq_n_f, vcmpltq_n_f, vcmpneq_n_f]) > > > ;; > > > -(define_insn "mve_vcmp<mve_cmp_op>q_n_f<mode>" > > > +(define_insn "@mve_vcmp<mve_cmp_op>q_n_f<mode>" > > > [ > > > (set (match_operand:HI 0 "vpr_register_operand" "=Up") > > > (MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w") > > > @@ -3307,7 +3307,7 @@ (define_insn "mve_vnegq_m_s<mode>" > > > ;; > > > ;; [vpselq_u, vpselq_s]) > > > ;; > > > -(define_insn "mve_vpselq_<supf><mode>" > > > +(define_insn "@mve_vpselq_<supf><mode>" > > > [ > > > (set (match_operand:MVE_1 0 "s_register_operand" "=w") > > > (unspec:MVE_1 [(match_operand:MVE_1 1 "s_register_operand" "w") > > > @@ -4402,7 +4402,7 @@ (define_insn "mve_vorrq_m_n_<supf><mode>" > > > ;; > > > ;; [vpselq_f]) > > > ;; > > > -(define_insn "mve_vpselq_f<mode>" > > > +(define_insn "@mve_vpselq_f<mode>" > > > [ > > > (set (match_operand:MVE_0 0 "s_register_operand" "=w") > > > (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w") > > > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md > > > index fec2cc9..6660846 100644 > > > --- a/gcc/config/arm/neon.md > > > +++ b/gcc/config/arm/neon.md > > > @@ -1416,93 +1416,6 @@ (define_insn "*us_sub<mode>_neon" > > > [(set_attr "type" "neon_qsub<q>")] > > > ) > > > > > > -(define_expand "vec_cmp<mode><v_cmp_result>" > > > - [(set (match_operand:<V_cmp_result> 0 "s_register_operand") > > > - (match_operator:<V_cmp_result> 1 
"comparison_operator" > > > - [(match_operand:VDQW 2 "s_register_operand") > > > - (match_operand:VDQW 3 "reg_or_zero_operand")]))] > > > - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > > -{ > > > - arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > > > - operands[2], operands[3], false); > > > - DONE; > > > -}) > > > - > > > -(define_expand "vec_cmpu<mode><mode>" > > > - [(set (match_operand:VDQIW 0 "s_register_operand") > > > - (match_operator:VDQIW 1 "comparison_operator" > > > - [(match_operand:VDQIW 2 "s_register_operand") > > > - (match_operand:VDQIW 3 "reg_or_zero_operand")]))] > > > - "TARGET_NEON" > > > -{ > > > - arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > > > - operands[2], operands[3], false); > > > - DONE; > > > -}) > > > - > > > -;; Conditional instructions. These are comparisons with conditional moves for > > > -;; vectors. They perform the assignment: > > > -;; > > > -;; Vop0 = (Vop4 <op3> Vop5) ? Vop1 : Vop2; > > > -;; > > > -;; where op3 is <, <=, ==, !=, >= or >. Operations are performed > > > -;; element-wise. 
> > > - > > > -(define_expand "vcond<mode><mode>" > > > - [(set (match_operand:VDQW 0 "s_register_operand") > > > - (if_then_else:VDQW > > > - (match_operator 3 "comparison_operator" > > > - [(match_operand:VDQW 4 "s_register_operand") > > > - (match_operand:VDQW 5 "reg_or_zero_operand")]) > > > - (match_operand:VDQW 1 "s_register_operand") > > > - (match_operand:VDQW 2 "s_register_operand")))] > > > - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > > -{ > > > - arm_expand_vcond (operands, <V_cmp_result>mode); > > > - DONE; > > > -}) > > > - > > > -(define_expand "vcond<V_cvtto><mode>" > > > - [(set (match_operand:<V_CVTTO> 0 "s_register_operand") > > > - (if_then_else:<V_CVTTO> > > > - (match_operator 3 "comparison_operator" > > > - [(match_operand:V32 4 "s_register_operand") > > > - (match_operand:V32 5 "reg_or_zero_operand")]) > > > - (match_operand:<V_CVTTO> 1 "s_register_operand") > > > - (match_operand:<V_CVTTO> 2 "s_register_operand")))] > > > - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > > -{ > > > - arm_expand_vcond (operands, <V_cmp_result>mode); > > > - DONE; > > > -}) > > > - > > > -(define_expand "vcondu<mode><v_cmp_result>" > > > - [(set (match_operand:VDQW 0 "s_register_operand") > > > - (if_then_else:VDQW > > > - (match_operator 3 "arm_comparison_operator" > > > - [(match_operand:<V_cmp_result> 4 "s_register_operand") > > > - (match_operand:<V_cmp_result> 5 "reg_or_zero_operand")]) > > > - (match_operand:VDQW 1 "s_register_operand") > > > - (match_operand:VDQW 2 "s_register_operand")))] > > > - "TARGET_NEON" > > > -{ > > > - arm_expand_vcond (operands, <V_cmp_result>mode); > > > - DONE; > > > -}) > > > - > > > -(define_expand "vcond_mask_<mode><v_cmp_result>" > > > - [(set (match_operand:VDQW 0 "s_register_operand") > > > - (if_then_else:VDQW > > > - (match_operand:<V_cmp_result> 3 "s_register_operand") > > > - (match_operand:VDQW 1 "s_register_operand") > > > - (match_operand:VDQW 2 
"s_register_operand")))] > > > - "TARGET_NEON" > > > -{ > > > - emit_insn (gen_neon_vbsl<mode> (operands[0], operands[3], operands[1], > > > - operands[2])); > > > - DONE; > > > -}) > > > - > > > ;; Patterns for builtins. > > > > > > ; good for plain vadd, vaddq. > > > diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md > > > index 07ca53b..0778db1 100644 > > > --- a/gcc/config/arm/unspecs.md > > > +++ b/gcc/config/arm/unspecs.md > > > @@ -596,8 +596,6 @@ (define_c_enum "unspec" [ > > > VCVTQ_N_FROM_F_U > > > VADDLVQ_P_S > > > VADDLVQ_P_U > > > - VCMPNEQ_U > > > - VCMPNEQ_S > > > VSHLQ_S > > > VSHLQ_U > > > VABDQ_S > > > @@ -605,9 +603,6 @@ (define_c_enum "unspec" [ > > > VADDVAQ_S > > > VADDVQ_P_S > > > VBRSRQ_N_S > > > - VCMPEQQ_S > > > - VCMPEQQ_N_S > > > - VCMPNEQ_N_S > > > VHADDQ_S > > > VHADDQ_N_S > > > VHSUBQ_S > > > @@ -645,9 +640,6 @@ (define_c_enum "unspec" [ > > > VADDVAQ_U > > > VADDVQ_P_U > > > VBRSRQ_N_U > > > - VCMPEQQ_U > > > - VCMPEQQ_N_U > > > - VCMPNEQ_N_U > > > VHADDQ_U > > > VHADDQ_N_U > > > VHSUBQ_U > > > @@ -680,14 +672,6 @@ (define_c_enum "unspec" [ > > > VSHLQ_R_U > > > VSUBQ_U > > > VSUBQ_N_U > > > - VCMPGEQ_N_S > > > - VCMPGEQ_S > > > - VCMPGTQ_N_S > > > - VCMPGTQ_S > > > - VCMPLEQ_N_S > > > - VCMPLEQ_S > > > - VCMPLTQ_N_S > > > - VCMPLTQ_S > > > VHCADDQ_ROT270_S > > > VHCADDQ_ROT90_S > > > VMAXAQ_S > > > @@ -702,10 +686,6 @@ (define_c_enum "unspec" [ > > > VQRDMULHQ_N_S > > > VQRDMULHQ_S > > > VQSHLUQ_N_S > > > - VCMPCSQ_N_U > > > - VCMPCSQ_U > > > - VCMPHIQ_N_U > > > - VCMPHIQ_U > > > VABDQ_M_S > > > VABDQ_M_U > > > VABDQ_F > > > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md > > > index 0b2b3b1..034b48b 100644 > > > --- a/gcc/config/arm/vec-common.md > > > +++ b/gcc/config/arm/vec-common.md > > > @@ -362,3 +362,110 @@ (define_expand "vlshr<mode>3" > > > DONE; > > > } > > > }) > > > + > > > +(define_expand "vec_cmp<mode><v_cmp_result>" > > > + [(set (match_operand:<V_cmp_result> 0 
"s_register_operand") > > > + (match_operator:<V_cmp_result> 1 "comparison_operator" > > > + [(match_operand:VDQW 2 "s_register_operand") > > > + (match_operand:VDQW 3 "reg_or_zero_operand")]))] > > > + "ARM_HAVE_<MODE>_ARITH > > > + && !TARGET_REALLY_IWMMXT > > > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > > +{ > > > + arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > > > + operands[2], operands[3], false, false); > > > + DONE; > > > +}) > > > + > > > +(define_expand "vec_cmpu<mode><mode>" > > > + [(set (match_operand:VDQIW 0 "s_register_operand") > > > + (match_operator:VDQIW 1 "comparison_operator" > > > + [(match_operand:VDQIW 2 "s_register_operand") > > > + (match_operand:VDQIW 3 "reg_or_zero_operand")]))] > > > + "ARM_HAVE_<MODE>_ARITH > > > + && !TARGET_REALLY_IWMMXT" > > > +{ > > > + arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > > > + operands[2], operands[3], false, false); > > > + DONE; > > > +}) > > > + > > > +;; Conditional instructions. These are comparisons with conditional moves for > > > +;; vectors. They perform the assignment: > > > +;; > > > +;; Vop0 = (Vop4 <op3> Vop5) ? Vop1 : Vop2; > > > +;; > > > +;; where op3 is <, <=, ==, !=, >= or >. Operations are performed > > > +;; element-wise. 
> > > + > > > +(define_expand "vcond<mode><mode>" > > > + [(set (match_operand:VDQW 0 "s_register_operand") > > > + (if_then_else:VDQW > > > + (match_operator 3 "comparison_operator" > > > + [(match_operand:VDQW 4 "s_register_operand") > > > + (match_operand:VDQW 5 "reg_or_zero_operand")]) > > > + (match_operand:VDQW 1 "s_register_operand") > > > + (match_operand:VDQW 2 "s_register_operand")))] > > > + "ARM_HAVE_<MODE>_ARITH > > > + && !TARGET_REALLY_IWMMXT > > > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > > +{ > > > + arm_expand_vcond (operands, <V_cmp_result>mode); > > > + DONE; > > > +}) > > > + > > > +(define_expand "vcond<V_cvtto><mode>" > > > + [(set (match_operand:<V_CVTTO> 0 "s_register_operand") > > > + (if_then_else:<V_CVTTO> > > > + (match_operator 3 "comparison_operator" > > > + [(match_operand:V32 4 "s_register_operand") > > > + (match_operand:V32 5 "reg_or_zero_operand")]) > > > + (match_operand:<V_CVTTO> 1 "s_register_operand") > > > + (match_operand:<V_CVTTO> 2 "s_register_operand")))] > > > + "ARM_HAVE_<MODE>_ARITH > > > + && !TARGET_REALLY_IWMMXT > > > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > > +{ > > > + arm_expand_vcond (operands, <V_cmp_result>mode); > > > + DONE; > > > +}) > > > + > > > +(define_expand "vcondu<mode><v_cmp_result>" > > > + [(set (match_operand:VDQW 0 "s_register_operand") > > > + (if_then_else:VDQW > > > + (match_operator 3 "arm_comparison_operator" > > > + [(match_operand:<V_cmp_result> 4 "s_register_operand") > > > + (match_operand:<V_cmp_result> 5 "reg_or_zero_operand")]) > > > + (match_operand:VDQW 1 "s_register_operand") > > > + (match_operand:VDQW 2 "s_register_operand")))] > > > + "ARM_HAVE_<MODE>_ARITH > > > + && !TARGET_REALLY_IWMMXT" > > > +{ > > > + arm_expand_vcond (operands, <V_cmp_result>mode); > > > + DONE; > > > +}) > > > + > > > +(define_expand "vcond_mask_<mode><v_cmp_result>" > > > + [(set (match_operand:VDQW 0 "s_register_operand") > > > + (if_then_else:VDQW > > > + 
(match_operand:<V_cmp_result> 3 "s_register_operand") > > > + (match_operand:VDQW 1 "s_register_operand") > > > + (match_operand:VDQW 2 "s_register_operand")))] > > > + "ARM_HAVE_<MODE>_ARITH > > > + && !TARGET_REALLY_IWMMXT" > > > +{ > > > + if (TARGET_NEON) > > > + { > > > + emit_insn (gen_neon_vbsl (<MODE>mode, operands[0], operands[3], > > > + operands[1], operands[2])); > > > + } > > > + else if (TARGET_HAVE_MVE) > > > + { > > > + emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0], > > > + operands[1], operands[2], operands[3])); > > > + } > > > + else > > > + gcc_unreachable (); > > > + > > > + DONE; > > > +}) > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c > > > new file mode 100644 > > > index 0000000..029c931 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c > > > @@ -0,0 +1,80 @@ > > > +/* { dg-do assemble } */ > > > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > > > +/* { dg-add-options arm_v8_1m_mve } */ > > > +/* { dg-additional-options "-O3" } */ > > > + > > > +/* Integer tests. 
*/ > > > + > > > +#define COMPARE_REG(NAME, OP, TYPE) \ > > > + TYPE \ > > > + cmp_##NAME##_##TYPE##_reg (TYPE a, TYPE b) \ > > > + { \ > > > + return a OP b; \ > > > + } > > > + > > > +#define COMPARE_REG_AND_ZERO(NAME, OP, TYPE) \ > > > + COMPARE_REG (NAME, OP, TYPE) \ > > > + \ > > > + TYPE \ > > > + cmp_##NAME##_##TYPE##_zero (TYPE a) \ > > > + { \ > > > + return a OP (TYPE) {}; \ > > > + } > > > + > > > +#define COMPARE_TYPE(TYPE, COMPARE_ORDERED) \ > > > + COMPARE_REG_AND_ZERO (eq, ==, TYPE) \ > > > + COMPARE_REG_AND_ZERO (ne, !=, TYPE) \ > > > + COMPARE_ORDERED (lt, <, TYPE) \ > > > + COMPARE_ORDERED (le, <=, TYPE) \ > > > + COMPARE_ORDERED (gt, >, TYPE) \ > > > + COMPARE_ORDERED (ge, >=, TYPE) > > > + > > > +#define TEST_TYPE(NAME, ELEM, COMPARE_ORDERED, SIZE) \ > > > + typedef ELEM NAME##SIZE __attribute__((vector_size(SIZE))); \ > > > + COMPARE_TYPE (NAME##SIZE, COMPARE_ORDERED) > > > + > > > +/* 64-bits vectors, not vectorized. */ > > > +TEST_TYPE (vs8, __INT8_TYPE__, COMPARE_REG_AND_ZERO, 8) > > > +TEST_TYPE (vu8, __UINT8_TYPE__, COMPARE_REG, 8) > > > +TEST_TYPE (vs16, __INT16_TYPE__, COMPARE_REG_AND_ZERO, 8) > > > +TEST_TYPE (vu16, __UINT16_TYPE__, COMPARE_REG, 8) > > > +TEST_TYPE (vs32, __INT32_TYPE__, COMPARE_REG_AND_ZERO, 8) > > > +TEST_TYPE (vu32, __UINT32_TYPE__, COMPARE_REG, 8) > > > + > > > +/* 128-bits vectors. */ > > > +TEST_TYPE (vs8, __INT8_TYPE__, COMPARE_REG_AND_ZERO, 16) > > > +TEST_TYPE (vu8, __UINT8_TYPE__, COMPARE_REG, 16) > > > +TEST_TYPE (vs16, __INT16_TYPE__, COMPARE_REG_AND_ZERO, 16) > > > +TEST_TYPE (vu16, __UINT16_TYPE__, COMPARE_REG, 16) > > > +TEST_TYPE (vs32, __INT32_TYPE__, COMPARE_REG_AND_ZERO, 16) > > > +TEST_TYPE (vu32, __UINT32_TYPE__, COMPARE_REG, 16) > > > + > > > +/* { 8 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. 
> > > +/* { dg-final { scan-assembler-times {\tvcmp.i8 eq, q[0-9]+, q[0-9]+\n} 4 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.i8 ne, q[0-9]+, q[0-9]+\n} 4 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 lt, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 le, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 gt, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 ge, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u8 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u8 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > > + > > > +/* { 16 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > > > +/* { dg-final { scan-assembler-times {\tvcmp.i16 eq, q[0-9]+, q[0-9]+\n} 4 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.i16 ne, q[0-9]+, q[0-9]+\n} 4 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 lt, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 le, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 gt, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 ge, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u16 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u16 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > > + > > > +/* { 32 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. 
> > > +/* { dg-final { scan-assembler-times {\tvcmp.i32 eq, q[0-9]+, q[0-9]+\n} 4 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.i32 ne, q[0-9]+, q[0-9]+\n} 4 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 lt, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 le, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 gt, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 ge, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u32 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u32 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c > > > new file mode 100644 > > > index 0000000..8515195 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c > > > @@ -0,0 +1,38 @@ > > > +/* { dg-do assemble } */ > > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */ > > > + > > > +/* float 32 tests. */ > > > + > > > +#ifndef ELEM_TYPE > > > +#define ELEM_TYPE float > > > +#endif > > > +#ifndef INT_ELEM_TYPE > > > +#define INT_ELEM_TYPE __INT32_TYPE__ > > > +#endif > > > + > > > +#define COMPARE(NAME, OP) \ > > > + int_vec \ > > > + cmp_##NAME##_reg (vec a, vec b) \ > > > + { \ > > > + return a OP b; \ > > > + } > > > + > > > +typedef INT_ELEM_TYPE int_vec __attribute__((vector_size(16))); > > > +typedef ELEM_TYPE vec __attribute__((vector_size(16))); > > > + > > > +COMPARE (eq, ==) > > > +COMPARE (ne, !=) > > > +COMPARE (lt, <) > > > +COMPARE (le, <=) > > > +COMPARE (gt, >) > > > +COMPARE (ge, >=) > > > + > > > +/* eq, ne, lt, le, gt, ge. 
> > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */ > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > > > new file mode 100644 > > > index 0000000..7774972 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > > > @@ -0,0 +1,69 @@ > > > +/* { dg-do assemble } */ > > > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > > > +/* { dg-add-options arm_v8_1m_mve } */ > > > +/* { dg-additional-options "-O3" } */ > > > + > > > +#define COMPARE_REG(NAME, OP, TYPE, SCALAR) \ > > > + TYPE \ > > > + cmp_##NAME##_##TYPE##_scalar (TYPE a, SCALAR b) \ > > > + { \ > > > + return a OP b; \ > > > + } > > > + > > > +#define COMPARE_TYPE(SCALAR, TYPE) \ > > > + COMPARE_REG (eq, ==, TYPE, SCALAR) \ > > > + COMPARE_REG (ne, !=, TYPE, SCALAR) \ > > > + COMPARE_REG (lt, <, TYPE, SCALAR) \ > > > + COMPARE_REG (le, <=, TYPE, SCALAR) \ > > > + COMPARE_REG (gt, >, TYPE, SCALAR) \ > > > + COMPARE_REG (ge, >=, TYPE, SCALAR) > > > + > > > +#define TEST_TYPE(NAME, ELEM, SIZE) \ > > > + typedef ELEM NAME##SIZE __attribute__((vector_size(SIZE))); \ > > > + COMPARE_TYPE (ELEM, NAME##SIZE) > > > + > > > +/* 64-bits vectors, not vectorized. 
*/ > > > +TEST_TYPE (vs8, __INT8_TYPE__, 8) > > > +TEST_TYPE (vu8, __UINT8_TYPE__, 8) > > > +TEST_TYPE (vs16, __INT16_TYPE__, 8) > > > +TEST_TYPE (vu16, __UINT16_TYPE__, 8) > > > +TEST_TYPE (vs32, __INT32_TYPE__, 8) > > > +TEST_TYPE (vu32, __UINT32_TYPE__, 8) > > > + > > > +/* 128-bits vectors. */ > > > +TEST_TYPE (vs8, __INT8_TYPE__, 16) > > > +TEST_TYPE (vu8, __UINT8_TYPE__, 16) > > > +TEST_TYPE (vs16, __INT16_TYPE__, 16) > > > +TEST_TYPE (vu16, __UINT16_TYPE__, 16) > > > +TEST_TYPE (vs32, __INT32_TYPE__, 16) > > > +TEST_TYPE (vu32, __UINT32_TYPE__, 16) > > > + > > > +/* { 8 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > > > +/* { dg-final { scan-assembler-times {\tvcmp.i8 eq, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.i8 ne, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 lt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 le, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 gt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 ge, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u8 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u8 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > > + > > > +/* { 16 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. 
> > > +/* { dg-final { scan-assembler-times {\tvcmp.i16 eq, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.i16 ne, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 lt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 le, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 gt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 ge, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u16 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u16 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > > + > > > +/* { 32 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > > > +/* { dg-final { scan-assembler-times {\tvcmp.i32 eq, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.i32 ne, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 lt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 le, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 gt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 ge, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u32 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u32 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c > > > new file mode 100644 > > > index 0000000..4ed449e > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c > > > @@ -0,0 +1,30 @@ > > > +/* { dg-do assemble } */ > > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */ > > > + > 
> > +#include <stdint.h> > > > + > > > +#define NB 4 > > > + > > > +#define FUNC(OP, NAME) \ > > > + void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \ > > > + int i; \ > > > + for (i=0; i<NB; i++) { \ > > > + dest[i] = a[i] OP b[i]; \ > > > + } \ > > > + } > > > + > > > +FUNC(==, vcmpeq) > > > +FUNC(!=, vcmpne) > > > +FUNC(<, vcmplt) > > > +FUNC(<=, vcmple) > > > +FUNC(>, vcmpgt) > > > +FUNC(>=, vcmpge) > > > + > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */ > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c > > > new file mode 100644 > > > index 0000000..8da15e7 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c > > > @@ -0,0 +1,50 @@ > > > +/* { dg-do assemble } */ > > > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > > > +/* { dg-add-options arm_v8_1m_mve } */ > > > +/* { dg-additional-options "-O3" } */ > > > + > > > +#include <stdint.h> > > > + > > > +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME) \ > > > + void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \ > > > + int i; \ > > > + for (i=0; i<NB; i++) { \ > > > + dest[i] = a[i] OP b[i]; \ > > > + } \ > > > +} > > > + > > > +#define ALL_FUNCS(OP, NAME) \ > > > + FUNC(s, int, 32, 2, OP, NAME) \ > > > + FUNC(u, uint, 32, 2, OP, NAME) \ > > > + FUNC(s, int, 16, 4, OP, NAME) \ > > > + FUNC(u, uint, 16, 4, OP, NAME) \ > > > + FUNC(s, int, 
8, 8, OP, NAME) \ > > > + FUNC(u, uint, 8, 8, OP, NAME) \ > > > + FUNC(s, int, 32, 4, OP, NAME) \ > > > + FUNC(u, uint, 32, 4, OP, NAME) \ > > > + FUNC(s, int, 16, 8, OP, NAME) \ > > > + FUNC(u, uint, 16, 8, OP, NAME) \ > > > + FUNC(s, int, 8, 16, OP, NAME) \ > > > + FUNC(u, uint, 8, 16, OP, NAME) > > > + > > > +ALL_FUNCS(==, vcmpeq) > > > +ALL_FUNCS(!=, vcmpne) > > > +ALL_FUNCS(<, vcmplt) > > > +ALL_FUNCS(<=, vcmple) > > > +ALL_FUNCS(>, vcmpgt) > > > +ALL_FUNCS(>=, vcmpge) > > > + > > > +/* MVE has only 128-bit vectors, so we can vectorize only half of the > > > + functions above. */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.i[0-9]+ eq, q[0-9]+, q[0-9]+\n} 6 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.i[0-9]+ ne, q[0-9]+, q[0-9]+\n} 6 } } */ > > > + > > > +/* lt, le, gt, ge apply to signed types, cs and hi to unsigned types. */ > > > +/* lt and le with unsigned types are replaced with the opposite condition, hence > > > + the double number of matches for cs and hi. */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ lt, q[0-9]+, q[0-9]+\n} 3 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ le, q[0-9]+, q[0-9]+\n} 3 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ gt, q[0-9]+, q[0-9]+\n} 3 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ ge, q[0-9]+, q[0-9]+\n} 3 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u[0-9]+ cs, q[0-9]+, q[0-9]+\n} 6 } } */ > > > +/* { dg-final { scan-assembler-times {\tvcmp.u[0-9]+ hi, q[0-9]+, q[0-9]+\n} 6 } } */
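For readers following the thread, the MVE sequence discussed here (a vcmp that writes the VPR.P0 predicate, followed by a vpsel) can be modelled in plain C. This is only an illustrative sketch, not code from the patch: the names model_vcmp_eq_i32 and model_vpsel are hypothetical and are neither GCC nor ACLE functions. It assumes a 128-bit vector of four 32-bit lanes and a 16-bit predicate with one bit per byte, which is consistent with the patch allocating the MVE mask in HImode.

```c
#include <stdint.h>
#include <string.h>

#define LANES 4   /* four 32-bit lanes in one 128-bit Q register */

/* Hypothetical model of "vcmp.i32 eq, qA, qB".  The result stands in for
   VPR.P0: 16 bits, one per byte, so an equal 32-bit lane sets 4 bits.  */
static uint16_t
model_vcmp_eq_i32 (const int32_t a[LANES], const int32_t b[LANES])
{
  uint16_t pred = 0;
  for (int i = 0; i < LANES; i++)
    if (a[i] == b[i])
      pred |= (uint16_t) (0xFu << (4 * i));
  return pred;
}

/* Hypothetical model of "vpsel qD, qT, qF": for each of the 16 bytes,
   take the byte from T when its predicate bit is set, else from F.  */
static void
model_vpsel (int32_t dst[LANES], const int32_t t[LANES],
             const int32_t f[LANES], uint16_t pred)
{
  uint8_t tb[16], fb[16], out[16];
  memcpy (tb, t, 16);
  memcpy (fb, f, 16);
  for (int i = 0; i < 16; i++)
    out[i] = ((pred >> i) & 1) ? tb[i] : fb[i];
  memcpy (dst, out, 16);   /* copy last, so DST may alias T or F */
}
```

With a = {1, 2, 3, 4} and b = {1, 0, 3, 0}, the model yields the predicate 0x0F0F, and selecting between {1, 1, 1, 1} and {0, 0, 0, 0} gives {1, 0, 1, 0}, which mirrors the one/zero operands that arm_expand_vcond passes to vpsel in the patch.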
ping?

On Wed, 5 May 2021 at 16:08, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>
> On Tue, 4 May 2021 at 15:41, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> >
> > On Tue, 4 May 2021 at 13:29, Andre Vieira (lists)
> > <andre.simoesdiasvieira@arm.com> wrote:
> > >
> > > Hi Christophe,
> > >
> > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > Since MVE has a different set of vector comparison operators from
> > > > Neon, we have to update the expansion to take into account the new
> > > > ones, for instance 'NE' for which MVE does not require to use 'EQ'
> > > > with the inverted condition.
> > > >
> > > > Conversely, Neon supports comparisons with #0, MVE does not.
> > > >
> > > > For:
> > > > typedef long int vs32 __attribute__((vector_size(16)));
> > > > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> > > >
> > > > we now generate:
> > > > cmp_eq_vs32_reg:
> > > > vldr.64 d4, .L123 @ 8 [c=8 l=4] *mve_movv4si/8
> > > > vldr.64 d5, .L123+8
> > > > vldr.64 d6, .L123+16 @ 9 [c=8 l=4] *mve_movv4si/8
> > > > vldr.64 d7, .L123+24
> > > > vcmp.i32 eq, q0, q1 @ 7 [c=16 l=4] mve_vcmpeqq_v4si
> > > > vpsel q0, q3, q2 @ 15 [c=8 l=4] mve_vpselq_sv4si
> > > > bx lr @ 26 [c=8 l=4] *thumb2_return
> > > > .L124:
> > > > .align 3
> > > > .L123:
> > > > .word 0
> > > > .word 0
> > > > .word 0
> > > > .word 0
> > > > .word 1
> > > > .word 1
> > > > .word 1
> > > > .word 1
> > > >
> > > > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode)) produces
> > > > a pair of vldr instead of vmov.i32, qX, #0
> > > I think ideally we would even want:
> > > vpte eq, q0, q1
> > > vmovt.i32 q0, #0
> > > vmove.i32 q0, #1
> > >
> > > But we don't have a way to generate VPT blocks with multiple
> > > instructions yet unfortunately so I guess VPSEL will have to do for now.
> >
> > TBH, I looked at what LLVM generates currently ;-)
> >
>
> Here is an updated version, which adds
> && (!<Is_float_mode> || flag_unsafe_math_optimizations)
> to vcond_mask_
>
> This condition was not present in the neon.md version I moved to vec-common.md,
> but since the VDQW iterator includes V2SF and V4SF, it should take
> floating-point flags into account.
>
> Christophe
>
> > >
> > > >
> > > > 2021-03-01 Christophe Lyon <christophe.lyon@linaro.org>
> > > >
> > > > gcc/
> > > > * config/arm/arm-protos.h (arm_expand_vector_compare): Update
> > > > prototype.
> > > > * config/arm/arm.c (arm_expand_vector_compare): Add support for
> > > > MVE.
> > > > (arm_expand_vcond): Likewise.
> > > > * config/arm/iterators.md (supf): Remove VCMPNEQ_S, VCMPEQQ_S,
> > > > VCMPEQQ_N_S, VCMPNEQ_N_S.
> > > > (VCMPNEQ, VCMPEQQ, VCMPEQQ_N, VCMPNEQ_N): Remove.
> > > > * config/arm/mve.md (@mve_vcmp<mve_cmp_op>q_<mode>): Add '@' prefix.
> > > > (@mve_vcmp<mve_cmp_op>q_f<mode>): Likewise.
> > > > (@mve_vcmp<mve_cmp_op>q_n_f<mode>): Likewise.
> > > > (@mve_vpselq_<supf><mode>): Likewise.
> > > > (@mve_vpselq_f<mode>): Likewise.
> > > > * config/arm/neon.md (vec_cmp<mode><v_cmp_result>): Enable for MVE
> > > > and move to vec-common.md.
> > > > (vec_cmpu<mode><mode>): Likewise.
> > > > (vcond<mode><mode>): Likewise.
> > > > (vcond<V_cvtto><mode>): Likewise.
> > > > (vcondu<mode><v_cmp_result>): Likewise.
> > > > (vcond_mask_<mode><v_cmp_result>): Likewise.
> > > > * config/arm/unspecs.md (VCMPNEQ_U, VCMPNEQ_S, VCMPEQQ_S)
> > > > (VCMPEQQ_N_S, VCMPNEQ_N_S, VCMPEQQ_U, VCMPEQQ_N_U, VCMPNEQ_N_U)
> > > > (VCMPGEQ_N_S, VCMPGEQ_S, VCMPGTQ_N_S, VCMPGTQ_S, VCMPLEQ_N_S)
> > > > (VCMPLEQ_S, VCMPLTQ_N_S, VCMPLTQ_S, VCMPCSQ_N_U, VCMPCSQ_U)
> > > > (VCMPHIQ_N_U, VCMPHIQ_U): Remove.
> > > > * config/arm/vec-common.md (vec_cmp<mode><v_cmp_result>): Moved
> > > > from neon.md.
> > > > (vec_cmpu<mode><mode>): Likewise.
> > > > (vcond<mode><mode>): Likewise.
> > > > (vcond<V_cvtto><mode>): Likewise.
> > > > (vcondu<mode><v_cmp_result>): Likewise.
> > > > (vcond_mask_<mode><v_cmp_result>): Likewise.
> > > >
> > > > gcc/testsuite
> > > > * gcc.target/arm/simd/mve-compare-1.c: New test with GCC vectors.
> > > > * gcc.target/arm/simd/mve-compare-2.c: New test with GCC vectors.
> > > > * gcc.target/arm/simd/mve-compare-scalar-1.c: New test with GCC
> > > > vectors.
> > > > * gcc.target/arm/simd/mve-vcmp-f32.c: New test for
> > > > auto-vectorization.
> > > > * gcc.target/arm/simd/mve-vcmp.c: New test for auto-vectorization.
> > > >
> > > > add gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c
> > > > ---
> > > > gcc/config/arm/arm-protos.h | 2 +-
> > > > gcc/config/arm/arm.c | 211 ++++++++++++++++-----
> > > > gcc/config/arm/iterators.md | 9 +-
> > > > gcc/config/arm/mve.md | 10 +-
> > > > gcc/config/arm/neon.md | 87 ---------
> > > > gcc/config/arm/unspecs.md | 20 --
> > > > gcc/config/arm/vec-common.md | 107 +++++++++++
> > > > gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c | 80 ++++++++
> > > > gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c | 38 ++++
> > > > .../gcc.target/arm/simd/mve-compare-scalar-1.c | 69 +++++++
> > > > gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c | 30 +++
> > > > gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c | 50 +++++
> > > > 12 files changed, 547 insertions(+), 166 deletions(-)
> > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c
> > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c
> > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c
> > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c
> > > > create mode 100644 gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c
> > > >
> > > > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> > > > index 2521541..ffccaa7 100644
> > > > --- a/gcc/config/arm/arm-protos.h
> > > > +++ b/gcc/config/arm/arm-protos.h
> > > > @@ -373,7 +373,7 @@ extern void
arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx, > > > > extern bool arm_fusion_enabled_p (tune_params::fuse_ops); > > > > extern bool arm_valid_symbolic_address_p (rtx); > > > > extern bool arm_validize_comparison (rtx *, rtx *, rtx *); > > > > -extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool); > > > > +extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool, bool); > > > > #endif /* RTX_CODE */ > > > > > > > > extern bool arm_gen_setmem (rtx *); > > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > > > > index 0371d98..80e28ef 100644 > > > > --- a/gcc/config/arm/arm.c > > > > +++ b/gcc/config/arm/arm.c > > > > @@ -30933,66 +30933,114 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, > > > > and return true if TARGET contains the inverse. If !CAN_INVERT, > > > > always store the result in TARGET, never its inverse. > > > > > > > > + If VCOND_MVE, do not emit the vpsel instruction here, let arm_expand_vcond do > > > > + it with the right destination type to avoid emiting two vpsel, one here and > > > > + one in arm_expand_vcond. > > > > + > > > > Note that the handling of floating-point comparisons is not > > > > IEEE compliant. */ > > > > > > > > bool > > > > arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > > > > - bool can_invert) > > > > + bool can_invert, bool vcond_mve) > > > > { > > > > machine_mode cmp_result_mode = GET_MODE (target); > > > > machine_mode cmp_mode = GET_MODE (op0); > > > > > > > > bool inverted; > > > > - switch (code) > > > > - { > > > > - /* For these we need to compute the inverse of the requested > > > > - comparison. 
*/ > > > > - case UNORDERED: > > > > - case UNLT: > > > > - case UNLE: > > > > - case UNGT: > > > > - case UNGE: > > > > - case UNEQ: > > > > - case NE: > > > > - code = reverse_condition_maybe_unordered (code); > > > > - if (!can_invert) > > > > - { > > > > - /* Recursively emit the inverted comparison into a temporary > > > > - and then store its inverse in TARGET. This avoids reusing > > > > - TARGET (which for integer NE could be one of the inputs). */ > > > > - rtx tmp = gen_reg_rtx (cmp_result_mode); > > > > - if (arm_expand_vector_compare (tmp, code, op0, op1, true)) > > > > - gcc_unreachable (); > > > > - emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp))); > > > > - return false; > > > > - } > > > > - inverted = true; > > > > - break; > > > > > > > > - default: > > > > + /* MVE supports more comparisons than Neon. */ > > > > + if (TARGET_HAVE_MVE) > > > > inverted = false; > > > > - break; > > > > - } > > > > + else > > > > + switch (code) > > > > + { > > > > + /* For these we need to compute the inverse of the requested > > > > + comparison. */ > > > > + case UNORDERED: > > > > + case UNLT: > > > > + case UNLE: > > > > + case UNGT: > > > > + case UNGE: > > > > + case UNEQ: > > > > + case NE: > > > > + code = reverse_condition_maybe_unordered (code); > > > > + if (!can_invert) > > > > + { > > > > + /* Recursively emit the inverted comparison into a temporary > > > > + and then store its inverse in TARGET. This avoids reusing > > > > + TARGET (which for integer NE could be one of the inputs). 
*/ > > > > + rtx tmp = gen_reg_rtx (cmp_result_mode); > > > > + if (arm_expand_vector_compare (tmp, code, op0, op1, true, vcond_mve)) > > > > + gcc_unreachable (); > > > > + emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp))); > > > > + return false; > > > > + } > > > > + inverted = true; > > > > + break; > > > > + > > > > + default: > > > > + inverted = false; > > > > + break; > > > > + } > > > > > > > > switch (code) > > > > { > > > > - /* These are natively supported for zero comparisons, but otherwise > > > > - require the operands to be swapped. */ > > > > + /* These are natively supported by Neon for zero comparisons, but otherwise > > > > + require the operands to be swapped. For MVE, we can only compare > > > > + registers. */ > > > > case LE: > > > > case LT: > > > > - if (op1 != CONST0_RTX (cmp_mode)) > > > > - { > > > > - code = swap_condition (code); > > > > - std::swap (op0, op1); > > > > - } > > > > + if (!TARGET_HAVE_MVE) > > > > + if (op1 != CONST0_RTX (cmp_mode)) > > > > + { > > > > + code = swap_condition (code); > > > > + std::swap (op0, op1); > > > > + } > > > > /* Fall through. */ > > > > > > > > - /* These are natively supported for both register and zero operands. */ > > > > + /* These are natively supported by Neon for both register and zero > > > > + operands. MVE supports registers only. 
*/ > > > > case EQ: > > > > case GE: > > > > case GT: > > > > - emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1)); > > > > + case NE: > > > > + if (TARGET_HAVE_MVE) { > > > > + rtx vpr_p0; > > > > + if (vcond_mve) > > > > + vpr_p0 = target; > > > > + else > > > > + vpr_p0 = gen_reg_rtx (HImode); > > > > + > > > > + switch (cmp_mode) > > > > + { > > > > + case E_V16QImode: > > > > + case E_V8HImode: > > > > + case E_V4SImode: > > > > + emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); > > > > + break; > > > > + case E_V8HFmode: > > > > + case E_V4SFmode: > > > > + if (TARGET_HAVE_MVE_FLOAT) > > > > + emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); > > > > + else > > > > + gcc_unreachable (); > > > > + break; > > > > + default: > > > > + gcc_unreachable (); > > > > + } > > > > + > > > > + /* If we are not expanding a vcond, build the result here. */ > > > > + if (!vcond_mve) { > > > > + rtx zero = gen_reg_rtx (cmp_result_mode); > > > > + rtx one = gen_reg_rtx (cmp_result_mode); > > > > + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); > > > > + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); > > > > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); > > > > + } > > > > + } > > > > + else > > > > + emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1)); > > > > return inverted; > > > > > > > > /* These are natively supported for register operands only. > > > > @@ -31000,16 +31048,50 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > > > > or canonicalized by target-independent code. 
*/ > > > > case GEU: > > > > case GTU: > > > > - emit_insn (gen_neon_vc (code, cmp_mode, target, > > > > - op0, force_reg (cmp_mode, op1))); > > > > + if (TARGET_HAVE_MVE) { > > > > + rtx vpr_p0; > > > > + if (vcond_mve) > > > > + vpr_p0 = target; > > > > + else > > > > + vpr_p0 = gen_reg_rtx (HImode); > > > > + > > > > + emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); > > > > + if (!vcond_mve) { > > > > + rtx zero = gen_reg_rtx (cmp_result_mode); > > > > + rtx one = gen_reg_rtx (cmp_result_mode); > > > > + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); > > > > + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); > > > > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); > > > > + } > > > > + } > > > > + else > > > > + emit_insn (gen_neon_vc (code, cmp_mode, target, > > > > + op0, force_reg (cmp_mode, op1))); > > > > return inverted; > > > > > > > > /* These require the operands to be swapped and likewise do not > > > > support comparisons with zero. 
*/ > > > > case LEU: > > > > case LTU: > > > > - emit_insn (gen_neon_vc (swap_condition (code), cmp_mode, > > > > - target, force_reg (cmp_mode, op1), op0)); > > > > + if (TARGET_HAVE_MVE) { > > > > + rtx vpr_p0; > > > > + if (vcond_mve) > > > > + vpr_p0 = target; > > > > + else > > > > + vpr_p0 = gen_reg_rtx (HImode); > > > > + > > > > + emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, force_reg (cmp_mode, op1), op0)); > > > > + if (!vcond_mve) { > > > > + rtx zero = gen_reg_rtx (cmp_result_mode); > > > > + rtx one = gen_reg_rtx (cmp_result_mode); > > > > + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); > > > > + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); > > > > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); > > > > + } > > > > + } > > > > + else > > > > + emit_insn (gen_neon_vc (swap_condition (code), cmp_mode, > > > > + target, force_reg (cmp_mode, op1), op0)); > > > > return inverted; > > > > > > > > /* These need a combination of two comparisons. */ > > > > @@ -31021,8 +31103,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > > > > rtx gt_res = gen_reg_rtx (cmp_result_mode); > > > > rtx alt_res = gen_reg_rtx (cmp_result_mode); > > > > rtx_code alt_code = (code == LTGT ? 
LT : LE); > > > > - if (arm_expand_vector_compare (gt_res, GT, op0, op1, true) > > > > - || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true)) > > > > + if (arm_expand_vector_compare (gt_res, GT, op0, op1, true, vcond_mve) > > > > + || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true, vcond_mve)) > > > > gcc_unreachable (); > > > > emit_insn (gen_rtx_SET (target, gen_rtx_IOR (cmp_result_mode, > > > > gt_res, alt_res))); > > > > @@ -31040,13 +31122,50 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, > > > > void > > > > arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode) > > > > { > > > > - rtx mask = gen_reg_rtx (cmp_result_mode); > > > > + /* When expanding for MVE, we do not want to emit a (useless) vpsel in > > > > + arm_expand_vector_compare, and another one here. */ > > > > + bool vcond_mve=false; > > > > + rtx mask; > > > > + > > > > + if (TARGET_HAVE_MVE) > > > > + { > > > > + vcond_mve=true; > > > > + mask = gen_reg_rtx (HImode); > > > > + } > > > > + else > > > > + mask = gen_reg_rtx (cmp_result_mode); > > > > + > > > > bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]), > > > > - operands[4], operands[5], true); > > > > + operands[4], operands[5], true, vcond_mve); > > > > if (inverted) > > > > std::swap (operands[1], operands[2]); > > > > + if (TARGET_NEON) > > > > emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0], > > > > mask, operands[1], operands[2])); > > > > + else > > > > + { > > > > + machine_mode cmp_mode = GET_MODE (operands[4]); > > > > + rtx vpr_p0 = mask; > > > > + rtx zero = gen_reg_rtx (cmp_mode); > > > > + rtx one = gen_reg_rtx (cmp_mode); > > > > + emit_move_insn (zero, CONST0_RTX (cmp_mode)); > > > > + emit_move_insn (one, CONST1_RTX (cmp_mode)); > > > > + switch (cmp_mode) > > > > + { > > > > + case E_V16QImode: > > > > + case E_V8HImode: > > > > + case E_V4SImode: > > > > + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], 
one, zero, vpr_p0)); > > > > + break; > > > > + case E_V8HFmode: > > > > + case E_V4SFmode: > > > > + if (TARGET_HAVE_MVE_FLOAT) > > > > + emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0)); > > > > + break; > > > > + default: > > > > + gcc_unreachable (); > > > > + } > > > > + } > > > > } > > > > > > > > #define MAX_VECT_LEN 16 > > > > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md > > > > index 95df8bd..a128465 100644 > > > > --- a/gcc/config/arm/iterators.md > > > > +++ b/gcc/config/arm/iterators.md > > > > @@ -1288,12 +1288,11 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s") > > > > (VCREATEQ_U "u") (VCREATEQ_S "s") (VSHRQ_N_S "s") > > > > (VSHRQ_N_U "u") (VCVTQ_N_FROM_F_S "s") (VSHLQ_U "u") > > > > (VCVTQ_N_FROM_F_U "u") (VADDLVQ_P_S "s") (VSHLQ_S "s") > > > > - (VADDLVQ_P_U "u") (VCMPNEQ_S "s") > > > > + (VADDLVQ_P_U "u") > > > > (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s") > > > > (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u") > > > > (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBRSRQ_N_S "s") > > > > - (VBRSRQ_N_U "u") (VCMPEQQ_S "s") > > > > - (VCMPEQQ_N_S "s") (VCMPNEQ_N_S "s") > > > > + (VBRSRQ_N_U "u") > > > > (VHADDQ_N_S "s") (VHADDQ_N_U "u") (VHADDQ_S "s") > > > > (VHADDQ_U "u") (VHSUBQ_N_S "s") (VHSUBQ_N_U "u") > > > > (VHSUBQ_S "s") (VMAXQ_S "s") (VMAXQ_U "u") (VHSUBQ_U "u") > > > > @@ -1549,16 +1548,12 @@ (define_int_iterator VCREATEQ [VCREATEQ_U VCREATEQ_S]) > > > > (define_int_iterator VSHRQ_N [VSHRQ_N_S VSHRQ_N_U]) > > > > (define_int_iterator VCVTQ_N_FROM_F [VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U]) > > > > (define_int_iterator VADDLVQ_P [VADDLVQ_P_S VADDLVQ_P_U]) > > > > -(define_int_iterator VCMPNEQ [VCMPNEQ_S]) > > > > (define_int_iterator VSHLQ [VSHLQ_S VSHLQ_U]) > > > > (define_int_iterator VABDQ [VABDQ_S VABDQ_U]) > > > > (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U]) > > > > (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U]) > > > > (define_int_iterator VADDVQ_P 
[VADDVQ_P_U VADDVQ_P_S]) > > > > (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S]) > > > > -(define_int_iterator VCMPEQQ [VCMPEQQ_S]) > > > > -(define_int_iterator VCMPEQQ_N [VCMPEQQ_N_S]) > > > > -(define_int_iterator VCMPNEQ_N [VCMPNEQ_N_S]) > > > > (define_int_iterator VHADDQ [VHADDQ_S VHADDQ_U]) > > > > (define_int_iterator VHADDQ_N [VHADDQ_N_U VHADDQ_N_S]) > > > > (define_int_iterator VHSUBQ [VHSUBQ_S VHSUBQ_U]) > > > > diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md > > > > index 7c846a4..97f0a87 100644 > > > > --- a/gcc/config/arm/mve.md > > > > +++ b/gcc/config/arm/mve.md > > > > @@ -838,7 +838,7 @@ (define_insn "mve_vaddlvq_p_<supf>v4si" > > > > ;; > > > > ;; [vcmpneq_, vcmpcsq_, vcmpeqq_, vcmpgeq_, vcmpgtq_, vcmphiq_, vcmpleq_, vcmpltq_]) > > > > ;; > > > > -(define_insn "mve_vcmp<mve_cmp_op>q_<mode>" > > > > +(define_insn "@mve_vcmp<mve_cmp_op>q_<mode>" > > > > [ > > > > (set (match_operand:HI 0 "vpr_register_operand" "=Up") > > > > (MVE_COMPARISONS:HI (match_operand:MVE_2 1 "s_register_operand" "w") > > > > @@ -1928,7 +1928,7 @@ (define_insn "mve_vcaddq<mve_rot><mode>" > > > > ;; > > > > ;; [vcmpeqq_f, vcmpgeq_f, vcmpgtq_f, vcmpleq_f, vcmpltq_f, vcmpneq_f]) > > > > ;; > > > > -(define_insn "mve_vcmp<mve_cmp_op>q_f<mode>" > > > > +(define_insn "@mve_vcmp<mve_cmp_op>q_f<mode>" > > > > [ > > > > (set (match_operand:HI 0 "vpr_register_operand" "=Up") > > > > (MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w") > > > > @@ -1942,7 +1942,7 @@ (define_insn "mve_vcmp<mve_cmp_op>q_f<mode>" > > > > ;; > > > > ;; [vcmpeqq_n_f, vcmpgeq_n_f, vcmpgtq_n_f, vcmpleq_n_f, vcmpltq_n_f, vcmpneq_n_f]) > > > > ;; > > > > -(define_insn "mve_vcmp<mve_cmp_op>q_n_f<mode>" > > > > +(define_insn "@mve_vcmp<mve_cmp_op>q_n_f<mode>" > > > > [ > > > > (set (match_operand:HI 0 "vpr_register_operand" "=Up") > > > > (MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w") > > > > @@ -3307,7 +3307,7 @@ (define_insn "mve_vnegq_m_s<mode>" > > 
> > ;; > > > > ;; [vpselq_u, vpselq_s]) > > > > ;; > > > > -(define_insn "mve_vpselq_<supf><mode>" > > > > +(define_insn "@mve_vpselq_<supf><mode>" > > > > [ > > > > (set (match_operand:MVE_1 0 "s_register_operand" "=w") > > > > (unspec:MVE_1 [(match_operand:MVE_1 1 "s_register_operand" "w") > > > > @@ -4402,7 +4402,7 @@ (define_insn "mve_vorrq_m_n_<supf><mode>" > > > > ;; > > > > ;; [vpselq_f]) > > > > ;; > > > > -(define_insn "mve_vpselq_f<mode>" > > > > +(define_insn "@mve_vpselq_f<mode>" > > > > [ > > > > (set (match_operand:MVE_0 0 "s_register_operand" "=w") > > > > (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w") > > > > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md > > > > index fec2cc9..6660846 100644 > > > > --- a/gcc/config/arm/neon.md > > > > +++ b/gcc/config/arm/neon.md > > > > @@ -1416,93 +1416,6 @@ (define_insn "*us_sub<mode>_neon" > > > > [(set_attr "type" "neon_qsub<q>")] > > > > ) > > > > > > > > -(define_expand "vec_cmp<mode><v_cmp_result>" > > > > - [(set (match_operand:<V_cmp_result> 0 "s_register_operand") > > > > - (match_operator:<V_cmp_result> 1 "comparison_operator" > > > > - [(match_operand:VDQW 2 "s_register_operand") > > > > - (match_operand:VDQW 3 "reg_or_zero_operand")]))] > > > > - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > > > -{ > > > > - arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > > > > - operands[2], operands[3], false); > > > > - DONE; > > > > -}) > > > > - > > > > -(define_expand "vec_cmpu<mode><mode>" > > > > - [(set (match_operand:VDQIW 0 "s_register_operand") > > > > - (match_operator:VDQIW 1 "comparison_operator" > > > > - [(match_operand:VDQIW 2 "s_register_operand") > > > > - (match_operand:VDQIW 3 "reg_or_zero_operand")]))] > > > > - "TARGET_NEON" > > > > -{ > > > > - arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > > > > - operands[2], operands[3], false); > > > > - DONE; > > > > -}) > > > > - > > > > -;; Conditional 
instructions. These are comparisons with conditional moves for > > > > -;; vectors. They perform the assignment: > > > > -;; > > > > -;; Vop0 = (Vop4 <op3> Vop5) ? Vop1 : Vop2; > > > > -;; > > > > -;; where op3 is <, <=, ==, !=, >= or >. Operations are performed > > > > -;; element-wise. > > > > - > > > > -(define_expand "vcond<mode><mode>" > > > > - [(set (match_operand:VDQW 0 "s_register_operand") > > > > - (if_then_else:VDQW > > > > - (match_operator 3 "comparison_operator" > > > > - [(match_operand:VDQW 4 "s_register_operand") > > > > - (match_operand:VDQW 5 "reg_or_zero_operand")]) > > > > - (match_operand:VDQW 1 "s_register_operand") > > > > - (match_operand:VDQW 2 "s_register_operand")))] > > > > - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > > > -{ > > > > - arm_expand_vcond (operands, <V_cmp_result>mode); > > > > - DONE; > > > > -}) > > > > - > > > > -(define_expand "vcond<V_cvtto><mode>" > > > > - [(set (match_operand:<V_CVTTO> 0 "s_register_operand") > > > > - (if_then_else:<V_CVTTO> > > > > - (match_operator 3 "comparison_operator" > > > > - [(match_operand:V32 4 "s_register_operand") > > > > - (match_operand:V32 5 "reg_or_zero_operand")]) > > > > - (match_operand:<V_CVTTO> 1 "s_register_operand") > > > > - (match_operand:<V_CVTTO> 2 "s_register_operand")))] > > > > - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > > > -{ > > > > - arm_expand_vcond (operands, <V_cmp_result>mode); > > > > - DONE; > > > > -}) > > > > - > > > > -(define_expand "vcondu<mode><v_cmp_result>" > > > > - [(set (match_operand:VDQW 0 "s_register_operand") > > > > - (if_then_else:VDQW > > > > - (match_operator 3 "arm_comparison_operator" > > > > - [(match_operand:<V_cmp_result> 4 "s_register_operand") > > > > - (match_operand:<V_cmp_result> 5 "reg_or_zero_operand")]) > > > > - (match_operand:VDQW 1 "s_register_operand") > > > > - (match_operand:VDQW 2 "s_register_operand")))] > > > > - "TARGET_NEON" > > > > -{ > > > > - 
arm_expand_vcond (operands, <V_cmp_result>mode); > > > > - DONE; > > > > -}) > > > > - > > > > -(define_expand "vcond_mask_<mode><v_cmp_result>" > > > > - [(set (match_operand:VDQW 0 "s_register_operand") > > > > - (if_then_else:VDQW > > > > - (match_operand:<V_cmp_result> 3 "s_register_operand") > > > > - (match_operand:VDQW 1 "s_register_operand") > > > > - (match_operand:VDQW 2 "s_register_operand")))] > > > > - "TARGET_NEON" > > > > -{ > > > > - emit_insn (gen_neon_vbsl<mode> (operands[0], operands[3], operands[1], > > > > - operands[2])); > > > > - DONE; > > > > -}) > > > > - > > > > ;; Patterns for builtins. > > > > > > > > ; good for plain vadd, vaddq. > > > > diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md > > > > index 07ca53b..0778db1 100644 > > > > --- a/gcc/config/arm/unspecs.md > > > > +++ b/gcc/config/arm/unspecs.md > > > > @@ -596,8 +596,6 @@ (define_c_enum "unspec" [ > > > > VCVTQ_N_FROM_F_U > > > > VADDLVQ_P_S > > > > VADDLVQ_P_U > > > > - VCMPNEQ_U > > > > - VCMPNEQ_S > > > > VSHLQ_S > > > > VSHLQ_U > > > > VABDQ_S > > > > @@ -605,9 +603,6 @@ (define_c_enum "unspec" [ > > > > VADDVAQ_S > > > > VADDVQ_P_S > > > > VBRSRQ_N_S > > > > - VCMPEQQ_S > > > > - VCMPEQQ_N_S > > > > - VCMPNEQ_N_S > > > > VHADDQ_S > > > > VHADDQ_N_S > > > > VHSUBQ_S > > > > @@ -645,9 +640,6 @@ (define_c_enum "unspec" [ > > > > VADDVAQ_U > > > > VADDVQ_P_U > > > > VBRSRQ_N_U > > > > - VCMPEQQ_U > > > > - VCMPEQQ_N_U > > > > - VCMPNEQ_N_U > > > > VHADDQ_U > > > > VHADDQ_N_U > > > > VHSUBQ_U > > > > @@ -680,14 +672,6 @@ (define_c_enum "unspec" [ > > > > VSHLQ_R_U > > > > VSUBQ_U > > > > VSUBQ_N_U > > > > - VCMPGEQ_N_S > > > > - VCMPGEQ_S > > > > - VCMPGTQ_N_S > > > > - VCMPGTQ_S > > > > - VCMPLEQ_N_S > > > > - VCMPLEQ_S > > > > - VCMPLTQ_N_S > > > > - VCMPLTQ_S > > > > VHCADDQ_ROT270_S > > > > VHCADDQ_ROT90_S > > > > VMAXAQ_S > > > > @@ -702,10 +686,6 @@ (define_c_enum "unspec" [ > > > > VQRDMULHQ_N_S > > > > VQRDMULHQ_S > > > > VQSHLUQ_N_S > > > > - 
VCMPCSQ_N_U > > > > - VCMPCSQ_U > > > > - VCMPHIQ_N_U > > > > - VCMPHIQ_U > > > > VABDQ_M_S > > > > VABDQ_M_U > > > > VABDQ_F > > > > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md > > > > index 0b2b3b1..034b48b 100644 > > > > --- a/gcc/config/arm/vec-common.md > > > > +++ b/gcc/config/arm/vec-common.md > > > > @@ -362,3 +362,110 @@ (define_expand "vlshr<mode>3" > > > > DONE; > > > > } > > > > }) > > > > + > > > > +(define_expand "vec_cmp<mode><v_cmp_result>" > > > > + [(set (match_operand:<V_cmp_result> 0 "s_register_operand") > > > > + (match_operator:<V_cmp_result> 1 "comparison_operator" > > > > + [(match_operand:VDQW 2 "s_register_operand") > > > > + (match_operand:VDQW 3 "reg_or_zero_operand")]))] > > > > + "ARM_HAVE_<MODE>_ARITH > > > > + && !TARGET_REALLY_IWMMXT > > > > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > > > +{ > > > > + arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > > > > + operands[2], operands[3], false, false); > > > > + DONE; > > > > +}) > > > > + > > > > +(define_expand "vec_cmpu<mode><mode>" > > > > + [(set (match_operand:VDQIW 0 "s_register_operand") > > > > + (match_operator:VDQIW 1 "comparison_operator" > > > > + [(match_operand:VDQIW 2 "s_register_operand") > > > > + (match_operand:VDQIW 3 "reg_or_zero_operand")]))] > > > > + "ARM_HAVE_<MODE>_ARITH > > > > + && !TARGET_REALLY_IWMMXT" > > > > +{ > > > > + arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), > > > > + operands[2], operands[3], false, false); > > > > + DONE; > > > > +}) > > > > + > > > > +;; Conditional instructions. These are comparisons with conditional moves for > > > > +;; vectors. They perform the assignment: > > > > +;; > > > > +;; Vop0 = (Vop4 <op3> Vop5) ? Vop1 : Vop2; > > > > +;; > > > > +;; where op3 is <, <=, ==, !=, >= or >. Operations are performed > > > > +;; element-wise. 
> > > > + > > > > +(define_expand "vcond<mode><mode>" > > > > + [(set (match_operand:VDQW 0 "s_register_operand") > > > > + (if_then_else:VDQW > > > > + (match_operator 3 "comparison_operator" > > > > + [(match_operand:VDQW 4 "s_register_operand") > > > > + (match_operand:VDQW 5 "reg_or_zero_operand")]) > > > > + (match_operand:VDQW 1 "s_register_operand") > > > > + (match_operand:VDQW 2 "s_register_operand")))] > > > > + "ARM_HAVE_<MODE>_ARITH > > > > + && !TARGET_REALLY_IWMMXT > > > > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > > > +{ > > > > + arm_expand_vcond (operands, <V_cmp_result>mode); > > > > + DONE; > > > > +}) > > > > + > > > > +(define_expand "vcond<V_cvtto><mode>" > > > > + [(set (match_operand:<V_CVTTO> 0 "s_register_operand") > > > > + (if_then_else:<V_CVTTO> > > > > + (match_operator 3 "comparison_operator" > > > > + [(match_operand:V32 4 "s_register_operand") > > > > + (match_operand:V32 5 "reg_or_zero_operand")]) > > > > + (match_operand:<V_CVTTO> 1 "s_register_operand") > > > > + (match_operand:<V_CVTTO> 2 "s_register_operand")))] > > > > + "ARM_HAVE_<MODE>_ARITH > > > > + && !TARGET_REALLY_IWMMXT > > > > + && (!<Is_float_mode> || flag_unsafe_math_optimizations)" > > > > +{ > > > > + arm_expand_vcond (operands, <V_cmp_result>mode); > > > > + DONE; > > > > +}) > > > > + > > > > +(define_expand "vcondu<mode><v_cmp_result>" > > > > + [(set (match_operand:VDQW 0 "s_register_operand") > > > > + (if_then_else:VDQW > > > > + (match_operator 3 "arm_comparison_operator" > > > > + [(match_operand:<V_cmp_result> 4 "s_register_operand") > > > > + (match_operand:<V_cmp_result> 5 "reg_or_zero_operand")]) > > > > + (match_operand:VDQW 1 "s_register_operand") > > > > + (match_operand:VDQW 2 "s_register_operand")))] > > > > + "ARM_HAVE_<MODE>_ARITH > > > > + && !TARGET_REALLY_IWMMXT" > > > > +{ > > > > + arm_expand_vcond (operands, <V_cmp_result>mode); > > > > + DONE; > > > > +}) > > > > + > > > > +(define_expand 
"vcond_mask_<mode><v_cmp_result>" > > > > + [(set (match_operand:VDQW 0 "s_register_operand") > > > > + (if_then_else:VDQW > > > > + (match_operand:<V_cmp_result> 3 "s_register_operand") > > > > + (match_operand:VDQW 1 "s_register_operand") > > > > + (match_operand:VDQW 2 "s_register_operand")))] > > > > + "ARM_HAVE_<MODE>_ARITH > > > > + && !TARGET_REALLY_IWMMXT" > > > > +{ > > > > + if (TARGET_NEON) > > > > + { > > > > + emit_insn (gen_neon_vbsl (<MODE>mode, operands[0], operands[3], > > > > + operands[1], operands[2])); > > > > + } > > > > + else if (TARGET_HAVE_MVE) > > > > + { > > > > + emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0], > > > > + operands[1], operands[2], operands[3])); > > > > + } > > > > + else > > > > + gcc_unreachable (); > > > > + > > > > + DONE; > > > > +}) > > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c > > > > new file mode 100644 > > > > index 0000000..029c931 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c > > > > @@ -0,0 +1,80 @@ > > > > +/* { dg-do assemble } */ > > > > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > > > > +/* { dg-add-options arm_v8_1m_mve } */ > > > > +/* { dg-additional-options "-O3" } */ > > > > + > > > > +/* Integer tests. 
*/ > > > > + > > > > +#define COMPARE_REG(NAME, OP, TYPE) \ > > > > + TYPE \ > > > > + cmp_##NAME##_##TYPE##_reg (TYPE a, TYPE b) \ > > > > + { \ > > > > + return a OP b; \ > > > > + } > > > > + > > > > +#define COMPARE_REG_AND_ZERO(NAME, OP, TYPE) \ > > > > + COMPARE_REG (NAME, OP, TYPE) \ > > > > + \ > > > > + TYPE \ > > > > + cmp_##NAME##_##TYPE##_zero (TYPE a) \ > > > > + { \ > > > > + return a OP (TYPE) {}; \ > > > > + } > > > > + > > > > +#define COMPARE_TYPE(TYPE, COMPARE_ORDERED) \ > > > > + COMPARE_REG_AND_ZERO (eq, ==, TYPE) \ > > > > + COMPARE_REG_AND_ZERO (ne, !=, TYPE) \ > > > > + COMPARE_ORDERED (lt, <, TYPE) \ > > > > + COMPARE_ORDERED (le, <=, TYPE) \ > > > > + COMPARE_ORDERED (gt, >, TYPE) \ > > > > + COMPARE_ORDERED (ge, >=, TYPE) > > > > + > > > > +#define TEST_TYPE(NAME, ELEM, COMPARE_ORDERED, SIZE) \ > > > > + typedef ELEM NAME##SIZE __attribute__((vector_size(SIZE))); \ > > > > + COMPARE_TYPE (NAME##SIZE, COMPARE_ORDERED) > > > > + > > > > +/* 64-bits vectors, not vectorized. */ > > > > +TEST_TYPE (vs8, __INT8_TYPE__, COMPARE_REG_AND_ZERO, 8) > > > > +TEST_TYPE (vu8, __UINT8_TYPE__, COMPARE_REG, 8) > > > > +TEST_TYPE (vs16, __INT16_TYPE__, COMPARE_REG_AND_ZERO, 8) > > > > +TEST_TYPE (vu16, __UINT16_TYPE__, COMPARE_REG, 8) > > > > +TEST_TYPE (vs32, __INT32_TYPE__, COMPARE_REG_AND_ZERO, 8) > > > > +TEST_TYPE (vu32, __UINT32_TYPE__, COMPARE_REG, 8) > > > > + > > > > +/* 128-bits vectors. */ > > > > +TEST_TYPE (vs8, __INT8_TYPE__, COMPARE_REG_AND_ZERO, 16) > > > > +TEST_TYPE (vu8, __UINT8_TYPE__, COMPARE_REG, 16) > > > > +TEST_TYPE (vs16, __INT16_TYPE__, COMPARE_REG_AND_ZERO, 16) > > > > +TEST_TYPE (vu16, __UINT16_TYPE__, COMPARE_REG, 16) > > > > +TEST_TYPE (vs32, __INT32_TYPE__, COMPARE_REG_AND_ZERO, 16) > > > > +TEST_TYPE (vu32, __UINT32_TYPE__, COMPARE_REG, 16) > > > > + > > > > +/* { 8 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. 
> > > > +/* { dg-final { scan-assembler-times {\tvcmp.i8 eq, q[0-9]+, q[0-9]+\n} 4 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.i8 ne, q[0-9]+, q[0-9]+\n} 4 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 lt, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 le, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 gt, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 ge, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u8 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u8 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > + > > > > +/* { 16 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > > > > +/* { dg-final { scan-assembler-times {\tvcmp.i16 eq, q[0-9]+, q[0-9]+\n} 4 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.i16 ne, q[0-9]+, q[0-9]+\n} 4 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 lt, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 le, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 gt, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 ge, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u16 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u16 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > + > > > > +/* { 32 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. 
> > > > +/* { dg-final { scan-assembler-times {\tvcmp.i32 eq, q[0-9]+, q[0-9]+\n} 4 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.i32 ne, q[0-9]+, q[0-9]+\n} 4 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 lt, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 le, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 gt, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 ge, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u32 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u32 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c > > > > new file mode 100644 > > > > index 0000000..8515195 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c > > > > @@ -0,0 +1,38 @@ > > > > +/* { dg-do assemble } */ > > > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > +/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */ > > > > + > > > > +/* float 32 tests. */ > > > > + > > > > +#ifndef ELEM_TYPE > > > > +#define ELEM_TYPE float > > > > +#endif > > > > +#ifndef INT_ELEM_TYPE > > > > +#define INT_ELEM_TYPE __INT32_TYPE__ > > > > +#endif > > > > + > > > > +#define COMPARE(NAME, OP) \ > > > > + int_vec \ > > > > + cmp_##NAME##_reg (vec a, vec b) \ > > > > + { \ > > > > + return a OP b; \ > > > > + } > > > > + > > > > +typedef INT_ELEM_TYPE int_vec __attribute__((vector_size(16))); > > > > +typedef ELEM_TYPE vec __attribute__((vector_size(16))); > > > > + > > > > +COMPARE (eq, ==) > > > > +COMPARE (ne, !=) > > > > +COMPARE (lt, <) > > > > +COMPARE (le, <=) > > > > +COMPARE (gt, >) > > > > +COMPARE (ge, >=) > > > > + > > > > +/* eq, ne, lt, le, gt, ge. 
> > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > > > > new file mode 100644 > > > > index 0000000..7774972 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c > > > > @@ -0,0 +1,69 @@ > > > > +/* { dg-do assemble } */ > > > > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > > > > +/* { dg-add-options arm_v8_1m_mve } */ > > > > +/* { dg-additional-options "-O3" } */ > > > > + > > > > +#define COMPARE_REG(NAME, OP, TYPE, SCALAR) \ > > > > + TYPE \ > > > > + cmp_##NAME##_##TYPE##_scalar (TYPE a, SCALAR b) \ > > > > + { \ > > > > + return a OP b; \ > > > > + } > > > > + > > > > +#define COMPARE_TYPE(SCALAR, TYPE) \ > > > > + COMPARE_REG (eq, ==, TYPE, SCALAR) \ > > > > + COMPARE_REG (ne, !=, TYPE, SCALAR) \ > > > > + COMPARE_REG (lt, <, TYPE, SCALAR) \ > > > > + COMPARE_REG (le, <=, TYPE, SCALAR) \ > > > > + COMPARE_REG (gt, >, TYPE, SCALAR) \ > > > > + COMPARE_REG (ge, >=, TYPE, SCALAR) > > > > + > > > > +#define TEST_TYPE(NAME, ELEM, SIZE) \ > > > > + typedef ELEM NAME##SIZE __attribute__((vector_size(SIZE))); \ > > > > + COMPARE_TYPE (ELEM, NAME##SIZE) > > > > + > > > > +/* 64-bits vectors, not vectorized. 
*/ > > > > +TEST_TYPE (vs8, __INT8_TYPE__, 8) > > > > +TEST_TYPE (vu8, __UINT8_TYPE__, 8) > > > > +TEST_TYPE (vs16, __INT16_TYPE__, 8) > > > > +TEST_TYPE (vu16, __UINT16_TYPE__, 8) > > > > +TEST_TYPE (vs32, __INT32_TYPE__, 8) > > > > +TEST_TYPE (vu32, __UINT32_TYPE__, 8) > > > > + > > > > +/* 128-bits vectors. */ > > > > +TEST_TYPE (vs8, __INT8_TYPE__, 16) > > > > +TEST_TYPE (vu8, __UINT8_TYPE__, 16) > > > > +TEST_TYPE (vs16, __INT16_TYPE__, 16) > > > > +TEST_TYPE (vu16, __UINT16_TYPE__, 16) > > > > +TEST_TYPE (vs32, __INT32_TYPE__, 16) > > > > +TEST_TYPE (vu32, __UINT32_TYPE__, 16) > > > > + > > > > +/* { 8 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > > > > +/* { dg-final { scan-assembler-times {\tvcmp.i8 eq, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.i8 ne, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 lt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 le, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 gt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s8 ge, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u8 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u8 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > + > > > > +/* { 16 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. 
> > > > +/* { dg-final { scan-assembler-times {\tvcmp.i16 eq, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.i16 ne, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 lt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 le, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 gt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s16 ge, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u16 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u16 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > + > > > > +/* { 32 bits } x { eq, ne, lt, le, gt, ge, hi, cs }. > > > > +/* { dg-final { scan-assembler-times {\tvcmp.i32 eq, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.i32 ne, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 lt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 le, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 gt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s32 ge, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u32 hi, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u32 cs, q[0-9]+, q[0-9]+\n} 2 } } */ > > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c > > > > new file mode 100644 > > > > index 0000000..4ed449e > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c > > > > @@ -0,0 +1,30 @@ > > > > +/* { dg-do assemble } */ > > > > +/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */ > > > > +/* { dg-add-options arm_v8_1m_mve_fp } */ > > > > +/* { 
dg-additional-options "-O3 -funsafe-math-optimizations" } */ > > > > + > > > > +#include <stdint.h> > > > > + > > > > +#define NB 4 > > > > + > > > > +#define FUNC(OP, NAME) \ > > > > + void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \ > > > > + int i; \ > > > > + for (i=0; i<NB; i++) { \ > > > > + dest[i] = a[i] OP b[i]; \ > > > > + } \ > > > > + } > > > > + > > > > +FUNC(==, vcmpeq) > > > > +FUNC(!=, vcmpne) > > > > +FUNC(<, vcmplt) > > > > +FUNC(<=, vcmple) > > > > +FUNC(>, vcmpgt) > > > > +FUNC(>=, vcmpge) > > > > + > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */ > > > > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c > > > > new file mode 100644 > > > > index 0000000..8da15e7 > > > > --- /dev/null > > > > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c > > > > @@ -0,0 +1,50 @@ > > > > +/* { dg-do assemble } */ > > > > +/* { dg-require-effective-target arm_v8_1m_mve_ok } */ > > > > +/* { dg-add-options arm_v8_1m_mve } */ > > > > +/* { dg-additional-options "-O3" } */ > > > > + > > > > +#include <stdint.h> > > > > + > > > > +#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME) \ > > > > + void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \ > > > > + int i; \ > > > > + for (i=0; i<NB; i++) { \ > > > > + dest[i] = a[i] OP b[i]; \ > > > > + } \ > > > > +} > > > > + > > > > +#define ALL_FUNCS(OP, NAME) \ > > > > + 
FUNC(s, int, 32, 2, OP, NAME) \ > > > > + FUNC(u, uint, 32, 2, OP, NAME) \ > > > > + FUNC(s, int, 16, 4, OP, NAME) \ > > > > + FUNC(u, uint, 16, 4, OP, NAME) \ > > > > + FUNC(s, int, 8, 8, OP, NAME) \ > > > > + FUNC(u, uint, 8, 8, OP, NAME) \ > > > > + FUNC(s, int, 32, 4, OP, NAME) \ > > > > + FUNC(u, uint, 32, 4, OP, NAME) \ > > > > + FUNC(s, int, 16, 8, OP, NAME) \ > > > > + FUNC(u, uint, 16, 8, OP, NAME) \ > > > > + FUNC(s, int, 8, 16, OP, NAME) \ > > > > + FUNC(u, uint, 8, 16, OP, NAME) > > > > + > > > > +ALL_FUNCS(==, vcmpeq) > > > > +ALL_FUNCS(!=, vcmpne) > > > > +ALL_FUNCS(<, vcmplt) > > > > +ALL_FUNCS(<=, vcmple) > > > > +ALL_FUNCS(>, vcmpgt) > > > > +ALL_FUNCS(>=, vcmpge) > > > > + > > > > +/* MVE has only 128-bit vectors, so we can vectorize only half of the > > > > + functions above. */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.i[0-9]+ eq, q[0-9]+, q[0-9]+\n} 6 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.i[0-9]+ ne, q[0-9]+, q[0-9]+\n} 6 } } */ > > > > + > > > > +/* lt, le, gt, ge apply to signed types, cs and hi to unsigned types. */ > > > > +/* lt and le with unsigned types are replaced with the opposite condition, hence > > > > + the double number of matches for cs and hi. */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ lt, q[0-9]+, q[0-9]+\n} 3 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ le, q[0-9]+, q[0-9]+\n} 3 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ gt, q[0-9]+, q[0-9]+\n} 3 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ ge, q[0-9]+, q[0-9]+\n} 3 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u[0-9]+ cs, q[0-9]+, q[0-9]+\n} 6 } } */ > > > > +/* { dg-final { scan-assembler-times {\tvcmp.u[0-9]+ hi, q[0-9]+, q[0-9]+\n} 6 } } */
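To make the FUNC macro from the quoted mve-vcmp.c test concrete, here is a minimal sketch that reuses the macro shape and instantiates it for two of the 128-bit cases. Only the macro comes from the test. The choice of these two instantiations and the input values in the note are illustrative assumptions.

```c
#include <stdint.h>

/* Same shape as FUNC in the quoted mve-vcmp.c test: one comparison loop
   per (signedness, element width, element count) combination.  */
#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME)                              \
  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (                     \
      TYPE##BITS##_t * __restrict__ dest,                                 \
      TYPE##BITS##_t *a, TYPE##BITS##_t *b) {                             \
    int i;                                                                \
    for (i = 0; i < NB; i++) {                                            \
      dest[i] = a[i] OP b[i];                                             \
    }                                                                     \
  }

/* Expands to:
   void test_vcmplt_s32x4 (int32_t *dest, int32_t *a, int32_t *b).  */
FUNC(s, int, 32, 4, <, vcmplt)

/* Expands to:
   void test_vcmpge_u8x16 (uint8_t *dest, uint8_t *a, uint8_t *b).  */
FUNC(u, uint, 8, 16, >=, vcmpge)
```

Each element of dest receives the C truth value of the comparison, so with a = {1, 5, -3, 7} and b = {2, 5, -4, 9}, test_vcmplt_s32x4 stores {1, 0, 0, 1}. Both instantiations cover 128 bits, which are the cases the test comment says MVE can vectorize.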
> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces@gcc.gnu.org> On Behalf Of
> Christophe Lyon via Gcc-patches
> Sent: 05 May 2021 15:08
> To: Andre Simoes Dias Vieira <Andre.SimoesDiasVieira@arm.com>
> Cc: gcc Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH 6/9] arm: Auto-vectorization for MVE: vcmp
>
> On Tue, 4 May 2021 at 15:41, Christophe Lyon <christophe.lyon@linaro.org>
> wrote:
> >
> > On Tue, 4 May 2021 at 13:29, Andre Vieira (lists)
> > <andre.simoesdiasvieira@arm.com> wrote:
> > >
> > > Hi Christophe,
> > >
> > > On 30/04/2021 15:09, Christophe Lyon via Gcc-patches wrote:
> > > > Since MVE has a different set of vector comparison operators from
> > > > Neon, we have to update the expansion to take into account the new
> > > > ones, for instance 'NE', for which MVE does not require using 'EQ'
> > > > with the inverted condition.
> > > >
> > > > Conversely, Neon supports comparisons with #0, MVE does not.
> > > >
> > > > For:
> > > > typedef long int vs32 __attribute__((vector_size(16)));
> > > > vs32 cmp_eq_vs32_reg (vs32 a, vs32 b) { return a == b; }
> > > >
> > > > we now generate:
> > > > cmp_eq_vs32_reg:
> > > >     vldr.64 d4, .L123 @ 8 [c=8 l=4] *mve_movv4si/8
> > > >     vldr.64 d5, .L123+8
> > > >     vldr.64 d6, .L123+16 @ 9 [c=8 l=4] *mve_movv4si/8
> > > >     vldr.64 d7, .L123+24
> > > >     vcmp.i32 eq, q0, q1 @ 7 [c=16 l=4] mve_vcmpeqq_v4si
> > > >     vpsel q0, q3, q2 @ 15 [c=8 l=4] mve_vpselq_sv4si
> > > >     bx lr @ 26 [c=8 l=4] *thumb2_return
> > > > .L124:
> > > >     .align 3
> > > > .L123:
> > > >     .word 0
> > > >     .word 0
> > > >     .word 0
> > > >     .word 0
> > > >     .word 1
> > > >     .word 1
> > > >     .word 1
> > > >     .word 1
> > > >
> > > > For some reason emit_move_insn (zero, CONST0_RTX (cmp_mode)) produces
> > > > a pair of vldr instead of vmov.i32 qX, #0
> > > I think ideally we would even want:
> > > vpte eq, q0, q1
> > > vmovt.i32 q0, #0
> > > vmove.i32 q0, #1
> > >
> > > But we don't have a way to generate VPT blocks with multiple
> > > instructions yet unfortunately so I guess VPSEL will have to do for now.
> >
> > TBH, I looked at what LLVM generates currently ;-)
> >
>
> Here is an updated version, which adds
> && (!<Is_float_mode> || flag_unsafe_math_optimizations)
> to vcond_mask_
>
> This condition was not present in the neon.md version I moved to
> vec-common.md, but since the VDQW iterator includes V2SF and V4SF, it
> should take floating-point flags into account.
>

-      emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1));
+    case NE:
+      if (TARGET_HAVE_MVE) {
+        rtx vpr_p0;

GNU style wants the '{' on a new line. This appears a few other times in the patch.

+        if (vcond_mve)
+          vpr_p0 = target;
+        else
+          vpr_p0 = gen_reg_rtx (HImode);
+
+        switch (cmp_mode)
+          {
+          case E_V16QImode:
+          case E_V8HImode:
+          case E_V4SImode:
+            emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
+            break;
+          case E_V8HFmode:
+          case E_V4SFmode:
+            if (TARGET_HAVE_MVE_FLOAT)
+              emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1)));
+            else
+              gcc_unreachable ();
+            break;
+          default:
+            gcc_unreachable ();
+          }

Hmm, I think we can just check GET_MODE_CLASS (cmp_mode) for MODE_VECTOR_INT or MODE_VECTOR_FLOAT here rather than have this switch statement.

+
+        /* If we are not expanding a vcond, build the result here.  */
+        if (!vcond_mve) {
+          rtx zero = gen_reg_rtx (cmp_result_mode);
+          rtx one = gen_reg_rtx (cmp_result_mode);
+          emit_move_insn (zero, CONST0_RTX (cmp_result_mode));
+          emit_move_insn (one, CONST1_RTX (cmp_result_mode));
+          emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0));
+        }
+      }
+      else

...

   bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]),
-                                             operands[4], operands[5], true);
+                                             operands[4], operands[5], true, vcond_mve);
   if (inverted)
     std::swap (operands[1], operands[2]);
+  if (TARGET_NEON)
   emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0],
                             mask, operands[1], operands[2]));
+  else
+    {
+      machine_mode cmp_mode = GET_MODE (operands[4]);
+      rtx vpr_p0 = mask;
+      rtx zero = gen_reg_rtx (cmp_mode);
+      rtx one = gen_reg_rtx (cmp_mode);
+      emit_move_insn (zero, CONST0_RTX (cmp_mode));
+      emit_move_insn (one, CONST1_RTX (cmp_mode));
+      switch (cmp_mode)
+        {
+        case E_V16QImode:
+        case E_V8HImode:
+        case E_V4SImode:
+          emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0));
+          break;
+        case E_V8HFmode:
+        case E_V4SFmode:
+          if (TARGET_HAVE_MVE_FLOAT)
+            emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0));
+          break;
+        default:
+          gcc_unreachable ();
+        }

Similarly here.
Ok with those changes.

Thanks,
Kyrill
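The suggestion to test the mode class instead of switching over individual modes can be sketched outside GCC as follows. Every name prefixed with mock_ or MOCK_, and the two pick_* helpers, are hypothetical stand-ins invented for this illustration. They are not GCC's machine_mode, GET_MODE_CLASS, or generator functions, and this is not the code that was eventually committed.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's machine modes and mode classes.
   These are NOT the real definitions from GCC.  */
enum mock_mode { MOCK_V16QI, MOCK_V8HI, MOCK_V4SI, MOCK_V8HF, MOCK_V4SF, MOCK_DI };
enum mock_class { MOCK_VECTOR_INT, MOCK_VECTOR_FLOAT, MOCK_OTHER };

/* What the expander would do; MOCK_UNREACHABLE models gcc_unreachable ().  */
enum mock_action { MOCK_EMIT_VCMPQ, MOCK_EMIT_VCMPQ_F, MOCK_UNREACHABLE };

/* Mock of a mode-class lookup for the modes listed above.  */
enum mock_class
mock_mode_class (enum mock_mode mode)
{
  switch (mode)
    {
    case MOCK_V16QI:
    case MOCK_V8HI:
    case MOCK_V4SI:
      return MOCK_VECTOR_INT;
    case MOCK_V8HF:
    case MOCK_V4SF:
      return MOCK_VECTOR_FLOAT;
    default:
      return MOCK_OTHER;
    }
}

/* Dispatch shaped like the switch in the posted patch.  */
enum mock_action
pick_by_switch (enum mock_mode mode, bool have_mve_float)
{
  switch (mode)
    {
    case MOCK_V16QI:
    case MOCK_V8HI:
    case MOCK_V4SI:
      return MOCK_EMIT_VCMPQ;
    case MOCK_V8HF:
    case MOCK_V4SF:
      return have_mve_float ? MOCK_EMIT_VCMPQ_F : MOCK_UNREACHABLE;
    default:
      return MOCK_UNREACHABLE;
    }
}

/* Dispatch shaped like the review suggestion: test the class instead.  */
enum mock_action
pick_by_class (enum mock_mode mode, bool have_mve_float)
{
  enum mock_class cls = mock_mode_class (mode);

  if (cls == MOCK_VECTOR_INT)
    return MOCK_EMIT_VCMPQ;
  if (cls == MOCK_VECTOR_FLOAT && have_mve_float)
    return MOCK_EMIT_VCMPQ_F;
  return MOCK_UNREACHABLE;
}
```

For the six mock modes the two helpers agree for either value of have_mve_float. In GCC itself the class test would also accept any other vector mode of the same class, so the two forms are only interchangeable if no such mode can reach this code. That is an assumption of the sketch, not something the thread states.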
On Mon, 17 May 2021 at 12:35, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> wrote:
>
> [...]
>
> GNU style wants the '{' on the new line. This appears a few other times in the patch.
>
> [...]
>
> Hmm, I think we can just check GET_MODE_CLASS (cmp_mode) for MODE_VECTOR_INT or MODE_VECTOR_FLOAT here rather than have this switch statement.
>
> [...]
>
> Similarly here.
> Ok with those changes.

Thanks, committed after testing.

Christophe

> Thanks,
> Kyrill
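The vcmp/vpsel sequence discussed in this thread can be modelled in plain C. This is a minimal sketch, assuming the MVE predicate layout of one VPR.P0 bit per byte of the 128-bit vector, which is consistent with the patch allocating the predicate as a 16-bit HImode register. The model_* function names are made up for illustration, and only the 32-bit 'eq' case is covered.

```c
#include <stdint.h>
#include <string.h>

/* Model of "vcmp.i32 eq": each 32-bit lane that compares equal sets the
   four predicate bits covering its four bytes.  */
uint16_t
model_vcmp_i32_eq (const int32_t a[4], const int32_t b[4])
{
  uint16_t p0 = 0;

  for (int lane = 0; lane < 4; lane++)
    if (a[lane] == b[lane])
      p0 |= (uint16_t) (0xFu << (4 * lane));
  return p0;
}

/* Model of "vpsel": per byte, take the first source when the predicate
   bit is set and the second source otherwise.  */
void
model_vpsel (uint8_t dest[16], const uint8_t n[16], const uint8_t m[16],
             uint16_t p0)
{
  for (int byte = 0; byte < 16; byte++)
    dest[byte] = ((p0 >> byte) & 1) ? n[byte] : m[byte];
}

/* Model of the non-vcond path: compare, then select between a vector of
   ones and a vector of zeros, as in the generated code quoted earlier.  */
void
model_cmp_eq_v4si (int32_t dest[4], const int32_t a[4], const int32_t b[4])
{
  const int32_t one[4] = { 1, 1, 1, 1 };
  const int32_t zero[4] = { 0, 0, 0, 0 };
  uint8_t n[16], m[16], d[16];

  memcpy (n, one, sizeof n);
  memcpy (m, zero, sizeof m);
  model_vpsel (d, n, m, model_vcmp_i32_eq (a, b));
  memcpy (dest, d, sizeof d);
}
```

With a = {1, 2, 3, 4} and b = {1, 0, 3, 0}, the model predicate is 0x0F0F and the result is {1, 0, 1, 0}. When a vcond is being expanded, the same predicate can feed a single vpsel that selects between the user's two operands directly, which is why the patch skips the ones/zeros vpsel in that case.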
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 2521541..ffccaa7 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -373,7 +373,7 @@ extern void arm_emit_coreregs_64bit_shift (enum rtx_code, rtx, rtx, rtx, rtx, extern bool arm_fusion_enabled_p (tune_params::fuse_ops); extern bool arm_valid_symbolic_address_p (rtx); extern bool arm_validize_comparison (rtx *, rtx *, rtx *); -extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool); +extern bool arm_expand_vector_compare (rtx, rtx_code, rtx, rtx, bool, bool); #endif /* RTX_CODE */ extern bool arm_gen_setmem (rtx *); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 0371d98..80e28ef 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -30933,66 +30933,114 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, and return true if TARGET contains the inverse. If !CAN_INVERT, always store the result in TARGET, never its inverse. + If VCOND_MVE, do not emit the vpsel instruction here, let arm_expand_vcond do + it with the right destination type to avoid emiting two vpsel, one here and + one in arm_expand_vcond. + Note that the handling of floating-point comparisons is not IEEE compliant. */ bool arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, - bool can_invert) + bool can_invert, bool vcond_mve) { machine_mode cmp_result_mode = GET_MODE (target); machine_mode cmp_mode = GET_MODE (op0); bool inverted; - switch (code) - { - /* For these we need to compute the inverse of the requested - comparison. */ - case UNORDERED: - case UNLT: - case UNLE: - case UNGT: - case UNGE: - case UNEQ: - case NE: - code = reverse_condition_maybe_unordered (code); - if (!can_invert) - { - /* Recursively emit the inverted comparison into a temporary - and then store its inverse in TARGET. This avoids reusing - TARGET (which for integer NE could be one of the inputs). 
*/ - rtx tmp = gen_reg_rtx (cmp_result_mode); - if (arm_expand_vector_compare (tmp, code, op0, op1, true)) - gcc_unreachable (); - emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp))); - return false; - } - inverted = true; - break; - default: + /* MVE supports more comparisons than Neon. */ + if (TARGET_HAVE_MVE) inverted = false; - break; - } + else + switch (code) + { + /* For these we need to compute the inverse of the requested + comparison. */ + case UNORDERED: + case UNLT: + case UNLE: + case UNGT: + case UNGE: + case UNEQ: + case NE: + code = reverse_condition_maybe_unordered (code); + if (!can_invert) + { + /* Recursively emit the inverted comparison into a temporary + and then store its inverse in TARGET. This avoids reusing + TARGET (which for integer NE could be one of the inputs). */ + rtx tmp = gen_reg_rtx (cmp_result_mode); + if (arm_expand_vector_compare (tmp, code, op0, op1, true, vcond_mve)) + gcc_unreachable (); + emit_insn (gen_rtx_SET (target, gen_rtx_NOT (cmp_result_mode, tmp))); + return false; + } + inverted = true; + break; + + default: + inverted = false; + break; + } switch (code) { - /* These are natively supported for zero comparisons, but otherwise - require the operands to be swapped. */ + /* These are natively supported by Neon for zero comparisons, but otherwise + require the operands to be swapped. For MVE, we can only compare + registers. */ case LE: case LT: - if (op1 != CONST0_RTX (cmp_mode)) - { - code = swap_condition (code); - std::swap (op0, op1); - } + if (!TARGET_HAVE_MVE) + if (op1 != CONST0_RTX (cmp_mode)) + { + code = swap_condition (code); + std::swap (op0, op1); + } /* Fall through. */ - /* These are natively supported for both register and zero operands. */ + /* These are natively supported by Neon for both register and zero + operands. MVE supports registers only. 
*/ case EQ: case GE: case GT: - emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1)); + case NE: + if (TARGET_HAVE_MVE) { + rtx vpr_p0; + if (vcond_mve) + vpr_p0 = target; + else + vpr_p0 = gen_reg_rtx (HImode); + + switch (cmp_mode) + { + case E_V16QImode: + case E_V8HImode: + case E_V4SImode: + emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); + break; + case E_V8HFmode: + case E_V4SFmode: + if (TARGET_HAVE_MVE_FLOAT) + emit_insn (gen_mve_vcmpq_f (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); + else + gcc_unreachable (); + break; + default: + gcc_unreachable (); + } + + /* If we are not expanding a vcond, build the result here. */ + if (!vcond_mve) { + rtx zero = gen_reg_rtx (cmp_result_mode); + rtx one = gen_reg_rtx (cmp_result_mode); + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); + } + } + else + emit_insn (gen_neon_vc (code, cmp_mode, target, op0, op1)); return inverted; /* These are natively supported for register operands only. @@ -31000,16 +31048,50 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, or canonicalized by target-independent code. 
*/ case GEU: case GTU: - emit_insn (gen_neon_vc (code, cmp_mode, target, - op0, force_reg (cmp_mode, op1))); + if (TARGET_HAVE_MVE) { + rtx vpr_p0; + if (vcond_mve) + vpr_p0 = target; + else + vpr_p0 = gen_reg_rtx (HImode); + + emit_insn (gen_mve_vcmpq (code, cmp_mode, vpr_p0, op0, force_reg (cmp_mode, op1))); + if (!vcond_mve) { + rtx zero = gen_reg_rtx (cmp_result_mode); + rtx one = gen_reg_rtx (cmp_result_mode); + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); + } + } + else + emit_insn (gen_neon_vc (code, cmp_mode, target, + op0, force_reg (cmp_mode, op1))); return inverted; /* These require the operands to be swapped and likewise do not support comparisons with zero. */ case LEU: case LTU: - emit_insn (gen_neon_vc (swap_condition (code), cmp_mode, - target, force_reg (cmp_mode, op1), op0)); + if (TARGET_HAVE_MVE) { + rtx vpr_p0; + if (vcond_mve) + vpr_p0 = target; + else + vpr_p0 = gen_reg_rtx (HImode); + + emit_insn (gen_mve_vcmpq (swap_condition (code), cmp_mode, vpr_p0, force_reg (cmp_mode, op1), op0)); + if (!vcond_mve) { + rtx zero = gen_reg_rtx (cmp_result_mode); + rtx one = gen_reg_rtx (cmp_result_mode); + emit_move_insn (zero, CONST0_RTX (cmp_result_mode)); + emit_move_insn (one, CONST1_RTX (cmp_result_mode)); + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, target, one, zero, vpr_p0)); + } + } + else + emit_insn (gen_neon_vc (swap_condition (code), cmp_mode, + target, force_reg (cmp_mode, op1), op0)); return inverted; /* These need a combination of two comparisons. */ @@ -31021,8 +31103,8 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, rtx gt_res = gen_reg_rtx (cmp_result_mode); rtx alt_res = gen_reg_rtx (cmp_result_mode); rtx_code alt_code = (code == LTGT ? 
LT : LE); - if (arm_expand_vector_compare (gt_res, GT, op0, op1, true) - || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true)) + if (arm_expand_vector_compare (gt_res, GT, op0, op1, true, vcond_mve) + || arm_expand_vector_compare (alt_res, alt_code, op0, op1, true, vcond_mve)) gcc_unreachable (); emit_insn (gen_rtx_SET (target, gen_rtx_IOR (cmp_result_mode, gt_res, alt_res))); @@ -31040,13 +31122,50 @@ arm_expand_vector_compare (rtx target, rtx_code code, rtx op0, rtx op1, void arm_expand_vcond (rtx *operands, machine_mode cmp_result_mode) { - rtx mask = gen_reg_rtx (cmp_result_mode); + /* When expanding for MVE, we do not want to emit a (useless) vpsel in + arm_expand_vector_compare, and another one here. */ + bool vcond_mve=false; + rtx mask; + + if (TARGET_HAVE_MVE) + { + vcond_mve=true; + mask = gen_reg_rtx (HImode); + } + else + mask = gen_reg_rtx (cmp_result_mode); + bool inverted = arm_expand_vector_compare (mask, GET_CODE (operands[3]), - operands[4], operands[5], true); + operands[4], operands[5], true, vcond_mve); if (inverted) std::swap (operands[1], operands[2]); + if (TARGET_NEON) emit_insn (gen_neon_vbsl (GET_MODE (operands[0]), operands[0], mask, operands[1], operands[2])); + else + { + machine_mode cmp_mode = GET_MODE (operands[4]); + rtx vpr_p0 = mask; + rtx zero = gen_reg_rtx (cmp_mode); + rtx one = gen_reg_rtx (cmp_mode); + emit_move_insn (zero, CONST0_RTX (cmp_mode)); + emit_move_insn (one, CONST1_RTX (cmp_mode)); + switch (cmp_mode) + { + case E_V16QImode: + case E_V8HImode: + case E_V4SImode: + emit_insn (gen_mve_vpselq (VPSELQ_S, cmp_result_mode, operands[0], one, zero, vpr_p0)); + break; + case E_V8HFmode: + case E_V4SFmode: + if (TARGET_HAVE_MVE_FLOAT) + emit_insn (gen_mve_vpselq_f (cmp_mode, operands[0], one, zero, vpr_p0)); + break; + default: + gcc_unreachable (); + } + } } #define MAX_VECT_LEN 16 diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index 95df8bd..a128465 100644 --- 
a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -1288,12 +1288,11 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s") (VCREATEQ_U "u") (VCREATEQ_S "s") (VSHRQ_N_S "s") (VSHRQ_N_U "u") (VCVTQ_N_FROM_F_S "s") (VSHLQ_U "u") (VCVTQ_N_FROM_F_U "u") (VADDLVQ_P_S "s") (VSHLQ_S "s") - (VADDLVQ_P_U "u") (VCMPNEQ_S "s") + (VADDLVQ_P_U "u") (VABDQ_M_S "s") (VABDQ_M_U "u") (VABDQ_S "s") (VABDQ_U "u") (VADDQ_N_S "s") (VADDQ_N_U "u") (VADDVQ_P_S "s") (VADDVQ_P_U "u") (VBRSRQ_N_S "s") - (VBRSRQ_N_U "u") (VCMPEQQ_S "s") - (VCMPEQQ_N_S "s") (VCMPNEQ_N_S "s") + (VBRSRQ_N_U "u") (VHADDQ_N_S "s") (VHADDQ_N_U "u") (VHADDQ_S "s") (VHADDQ_U "u") (VHSUBQ_N_S "s") (VHSUBQ_N_U "u") (VHSUBQ_S "s") (VMAXQ_S "s") (VMAXQ_U "u") (VHSUBQ_U "u") @@ -1549,16 +1548,12 @@ (define_int_iterator VCREATEQ [VCREATEQ_U VCREATEQ_S]) (define_int_iterator VSHRQ_N [VSHRQ_N_S VSHRQ_N_U]) (define_int_iterator VCVTQ_N_FROM_F [VCVTQ_N_FROM_F_S VCVTQ_N_FROM_F_U]) (define_int_iterator VADDLVQ_P [VADDLVQ_P_S VADDLVQ_P_U]) -(define_int_iterator VCMPNEQ [VCMPNEQ_S]) (define_int_iterator VSHLQ [VSHLQ_S VSHLQ_U]) (define_int_iterator VABDQ [VABDQ_S VABDQ_U]) (define_int_iterator VADDQ_N [VADDQ_N_S VADDQ_N_U]) (define_int_iterator VADDVAQ [VADDVAQ_S VADDVAQ_U]) (define_int_iterator VADDVQ_P [VADDVQ_P_U VADDVQ_P_S]) (define_int_iterator VBRSRQ_N [VBRSRQ_N_U VBRSRQ_N_S]) -(define_int_iterator VCMPEQQ [VCMPEQQ_S]) -(define_int_iterator VCMPEQQ_N [VCMPEQQ_N_S]) -(define_int_iterator VCMPNEQ_N [VCMPNEQ_N_S]) (define_int_iterator VHADDQ [VHADDQ_S VHADDQ_U]) (define_int_iterator VHADDQ_N [VHADDQ_N_U VHADDQ_N_S]) (define_int_iterator VHSUBQ [VHSUBQ_S VHSUBQ_U]) diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md index 7c846a4..97f0a87 100644 --- a/gcc/config/arm/mve.md +++ b/gcc/config/arm/mve.md @@ -838,7 +838,7 @@ (define_insn "mve_vaddlvq_p_<supf>v4si" ;; ;; [vcmpneq_, vcmpcsq_, vcmpeqq_, vcmpgeq_, vcmpgtq_, vcmphiq_, vcmpleq_, vcmpltq_]) ;; -(define_insn 
"mve_vcmp<mve_cmp_op>q_<mode>" +(define_insn "@mve_vcmp<mve_cmp_op>q_<mode>" [ (set (match_operand:HI 0 "vpr_register_operand" "=Up") (MVE_COMPARISONS:HI (match_operand:MVE_2 1 "s_register_operand" "w") @@ -1928,7 +1928,7 @@ (define_insn "mve_vcaddq<mve_rot><mode>" ;; ;; [vcmpeqq_f, vcmpgeq_f, vcmpgtq_f, vcmpleq_f, vcmpltq_f, vcmpneq_f]) ;; -(define_insn "mve_vcmp<mve_cmp_op>q_f<mode>" +(define_insn "@mve_vcmp<mve_cmp_op>q_f<mode>" [ (set (match_operand:HI 0 "vpr_register_operand" "=Up") (MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w") @@ -1942,7 +1942,7 @@ (define_insn "mve_vcmp<mve_cmp_op>q_f<mode>" ;; ;; [vcmpeqq_n_f, vcmpgeq_n_f, vcmpgtq_n_f, vcmpleq_n_f, vcmpltq_n_f, vcmpneq_n_f]) ;; -(define_insn "mve_vcmp<mve_cmp_op>q_n_f<mode>" +(define_insn "@mve_vcmp<mve_cmp_op>q_n_f<mode>" [ (set (match_operand:HI 0 "vpr_register_operand" "=Up") (MVE_FP_COMPARISONS:HI (match_operand:MVE_0 1 "s_register_operand" "w") @@ -3307,7 +3307,7 @@ (define_insn "mve_vnegq_m_s<mode>" ;; ;; [vpselq_u, vpselq_s]) ;; -(define_insn "mve_vpselq_<supf><mode>" +(define_insn "@mve_vpselq_<supf><mode>" [ (set (match_operand:MVE_1 0 "s_register_operand" "=w") (unspec:MVE_1 [(match_operand:MVE_1 1 "s_register_operand" "w") @@ -4402,7 +4402,7 @@ (define_insn "mve_vorrq_m_n_<supf><mode>" ;; ;; [vpselq_f]) ;; -(define_insn "mve_vpselq_f<mode>" +(define_insn "@mve_vpselq_f<mode>" [ (set (match_operand:MVE_0 0 "s_register_operand" "=w") (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "w") diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index fec2cc9..6660846 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -1416,93 +1416,6 @@ (define_insn "*us_sub<mode>_neon" [(set_attr "type" "neon_qsub<q>")] ) -(define_expand "vec_cmp<mode><v_cmp_result>" - [(set (match_operand:<V_cmp_result> 0 "s_register_operand") - (match_operator:<V_cmp_result> 1 "comparison_operator" - [(match_operand:VDQW 2 "s_register_operand") - (match_operand:VDQW 3 
"reg_or_zero_operand")]))] - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" -{ - arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), - operands[2], operands[3], false); - DONE; -}) - -(define_expand "vec_cmpu<mode><mode>" - [(set (match_operand:VDQIW 0 "s_register_operand") - (match_operator:VDQIW 1 "comparison_operator" - [(match_operand:VDQIW 2 "s_register_operand") - (match_operand:VDQIW 3 "reg_or_zero_operand")]))] - "TARGET_NEON" -{ - arm_expand_vector_compare (operands[0], GET_CODE (operands[1]), - operands[2], operands[3], false); - DONE; -}) - -;; Conditional instructions. These are comparisons with conditional moves for -;; vectors. They perform the assignment: -;; -;; Vop0 = (Vop4 <op3> Vop5) ? Vop1 : Vop2; -;; -;; where op3 is <, <=, ==, !=, >= or >. Operations are performed -;; element-wise. - -(define_expand "vcond<mode><mode>" - [(set (match_operand:VDQW 0 "s_register_operand") - (if_then_else:VDQW - (match_operator 3 "comparison_operator" - [(match_operand:VDQW 4 "s_register_operand") - (match_operand:VDQW 5 "reg_or_zero_operand")]) - (match_operand:VDQW 1 "s_register_operand") - (match_operand:VDQW 2 "s_register_operand")))] - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" -{ - arm_expand_vcond (operands, <V_cmp_result>mode); - DONE; -}) - -(define_expand "vcond<V_cvtto><mode>" - [(set (match_operand:<V_CVTTO> 0 "s_register_operand") - (if_then_else:<V_CVTTO> - (match_operator 3 "comparison_operator" - [(match_operand:V32 4 "s_register_operand") - (match_operand:V32 5 "reg_or_zero_operand")]) - (match_operand:<V_CVTTO> 1 "s_register_operand") - (match_operand:<V_CVTTO> 2 "s_register_operand")))] - "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)" -{ - arm_expand_vcond (operands, <V_cmp_result>mode); - DONE; -}) - -(define_expand "vcondu<mode><v_cmp_result>" - [(set (match_operand:VDQW 0 "s_register_operand") - (if_then_else:VDQW - (match_operator 3 
"arm_comparison_operator" - [(match_operand:<V_cmp_result> 4 "s_register_operand") - (match_operand:<V_cmp_result> 5 "reg_or_zero_operand")]) - (match_operand:VDQW 1 "s_register_operand") - (match_operand:VDQW 2 "s_register_operand")))] - "TARGET_NEON" -{ - arm_expand_vcond (operands, <V_cmp_result>mode); - DONE; -}) - -(define_expand "vcond_mask_<mode><v_cmp_result>" - [(set (match_operand:VDQW 0 "s_register_operand") - (if_then_else:VDQW - (match_operand:<V_cmp_result> 3 "s_register_operand") - (match_operand:VDQW 1 "s_register_operand") - (match_operand:VDQW 2 "s_register_operand")))] - "TARGET_NEON" -{ - emit_insn (gen_neon_vbsl<mode> (operands[0], operands[3], operands[1], - operands[2])); - DONE; -}) - ;; Patterns for builtins. ; good for plain vadd, vaddq. diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md index 07ca53b..0778db1 100644 --- a/gcc/config/arm/unspecs.md +++ b/gcc/config/arm/unspecs.md @@ -596,8 +596,6 @@ (define_c_enum "unspec" [ VCVTQ_N_FROM_F_U VADDLVQ_P_S VADDLVQ_P_U - VCMPNEQ_U - VCMPNEQ_S VSHLQ_S VSHLQ_U VABDQ_S @@ -605,9 +603,6 @@ (define_c_enum "unspec" [ VADDVAQ_S VADDVQ_P_S VBRSRQ_N_S - VCMPEQQ_S - VCMPEQQ_N_S - VCMPNEQ_N_S VHADDQ_S VHADDQ_N_S VHSUBQ_S @@ -645,9 +640,6 @@ (define_c_enum "unspec" [ VADDVAQ_U VADDVQ_P_U VBRSRQ_N_U - VCMPEQQ_U - VCMPEQQ_N_U - VCMPNEQ_N_U VHADDQ_U VHADDQ_N_U VHSUBQ_U @@ -680,14 +672,6 @@ (define_c_enum "unspec" [ VSHLQ_R_U VSUBQ_U VSUBQ_N_U - VCMPGEQ_N_S - VCMPGEQ_S - VCMPGTQ_N_S - VCMPGTQ_S - VCMPLEQ_N_S - VCMPLEQ_S - VCMPLTQ_N_S - VCMPLTQ_S VHCADDQ_ROT270_S VHCADDQ_ROT90_S VMAXAQ_S @@ -702,10 +686,6 @@ (define_c_enum "unspec" [ VQRDMULHQ_N_S VQRDMULHQ_S VQSHLUQ_N_S - VCMPCSQ_N_U - VCMPCSQ_U - VCMPHIQ_N_U - VCMPHIQ_U VABDQ_M_S VABDQ_M_U VABDQ_F diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md index 0b2b3b1..034b48b 100644 --- a/gcc/config/arm/vec-common.md +++ b/gcc/config/arm/vec-common.md @@ -362,3 +362,110 @@ (define_expand "vlshr<mode>3" DONE; } }) + 
+(define_expand "vec_cmp<mode><v_cmp_result>"
+  [(set (match_operand:<V_cmp_result> 0 "s_register_operand")
+        (match_operator:<V_cmp_result> 1 "comparison_operator"
+          [(match_operand:VDQW 2 "s_register_operand")
+           (match_operand:VDQW 3 "reg_or_zero_operand")]))]
+  "ARM_HAVE_<MODE>_ARITH
+   && !TARGET_REALLY_IWMMXT
+   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+{
+  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
+                             operands[2], operands[3], false, false);
+  DONE;
+})
+
+(define_expand "vec_cmpu<mode><mode>"
+  [(set (match_operand:VDQIW 0 "s_register_operand")
+        (match_operator:VDQIW 1 "comparison_operator"
+          [(match_operand:VDQIW 2 "s_register_operand")
+           (match_operand:VDQIW 3 "reg_or_zero_operand")]))]
+  "ARM_HAVE_<MODE>_ARITH
+   && !TARGET_REALLY_IWMMXT"
+{
+  arm_expand_vector_compare (operands[0], GET_CODE (operands[1]),
+                             operands[2], operands[3], false, false);
+  DONE;
+})
+
+;; Conditional instructions. These are comparisons with conditional moves for
+;; vectors. They perform the assignment:
+;;
+;; Vop0 = (Vop4 <op3> Vop5) ? Vop1 : Vop2;
+;;
+;; where op3 is <, <=, ==, !=, >= or >. Operations are performed
+;; element-wise.
+
+(define_expand "vcond<mode><mode>"
+  [(set (match_operand:VDQW 0 "s_register_operand")
+        (if_then_else:VDQW
+          (match_operator 3 "comparison_operator"
+            [(match_operand:VDQW 4 "s_register_operand")
+             (match_operand:VDQW 5 "reg_or_zero_operand")])
+          (match_operand:VDQW 1 "s_register_operand")
+          (match_operand:VDQW 2 "s_register_operand")))]
+  "ARM_HAVE_<MODE>_ARITH
+   && !TARGET_REALLY_IWMMXT
+   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+{
+  arm_expand_vcond (operands, <V_cmp_result>mode);
+  DONE;
+})
+
+(define_expand "vcond<V_cvtto><mode>"
+  [(set (match_operand:<V_CVTTO> 0 "s_register_operand")
+        (if_then_else:<V_CVTTO>
+          (match_operator 3 "comparison_operator"
+            [(match_operand:V32 4 "s_register_operand")
+             (match_operand:V32 5 "reg_or_zero_operand")])
+          (match_operand:<V_CVTTO> 1 "s_register_operand")
+          (match_operand:<V_CVTTO> 2 "s_register_operand")))]
+  "ARM_HAVE_<MODE>_ARITH
+   && !TARGET_REALLY_IWMMXT
+   && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+{
+  arm_expand_vcond (operands, <V_cmp_result>mode);
+  DONE;
+})
+
+(define_expand "vcondu<mode><v_cmp_result>"
+  [(set (match_operand:VDQW 0 "s_register_operand")
+        (if_then_else:VDQW
+          (match_operator 3 "arm_comparison_operator"
+            [(match_operand:<V_cmp_result> 4 "s_register_operand")
+             (match_operand:<V_cmp_result> 5 "reg_or_zero_operand")])
+          (match_operand:VDQW 1 "s_register_operand")
+          (match_operand:VDQW 2 "s_register_operand")))]
+  "ARM_HAVE_<MODE>_ARITH
+   && !TARGET_REALLY_IWMMXT"
+{
+  arm_expand_vcond (operands, <V_cmp_result>mode);
+  DONE;
+})
+
+(define_expand "vcond_mask_<mode><v_cmp_result>"
+  [(set (match_operand:VDQW 0 "s_register_operand")
+        (if_then_else:VDQW
+          (match_operand:<V_cmp_result> 3 "s_register_operand")
+          (match_operand:VDQW 1 "s_register_operand")
+          (match_operand:VDQW 2 "s_register_operand")))]
+  "ARM_HAVE_<MODE>_ARITH
+   && !TARGET_REALLY_IWMMXT"
+{
+  if (TARGET_NEON)
+    {
+      emit_insn (gen_neon_vbsl (<MODE>mode, operands[0], operands[3],
+                                operands[1],
operands[2]));
+    }
+  else if (TARGET_HAVE_MVE)
+    {
+      emit_insn (gen_mve_vpselq (VPSELQ_S, <MODE>mode, operands[0],
+                                 operands[1], operands[2], operands[3]));
+    }
+  else
+    gcc_unreachable ();
+
+  DONE;
+})
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c
new file mode 100644
index 0000000..029c931
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-1.c
@@ -0,0 +1,80 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+/* Integer tests. */
+
+#define COMPARE_REG(NAME, OP, TYPE) \
+  TYPE \
+  cmp_##NAME##_##TYPE##_reg (TYPE a, TYPE b) \
+  { \
+    return a OP b; \
+  }
+
+#define COMPARE_REG_AND_ZERO(NAME, OP, TYPE) \
+  COMPARE_REG (NAME, OP, TYPE) \
+  \
+  TYPE \
+  cmp_##NAME##_##TYPE##_zero (TYPE a) \
+  { \
+    return a OP (TYPE) {}; \
+  }
+
+#define COMPARE_TYPE(TYPE, COMPARE_ORDERED) \
+  COMPARE_REG_AND_ZERO (eq, ==, TYPE) \
+  COMPARE_REG_AND_ZERO (ne, !=, TYPE) \
+  COMPARE_ORDERED (lt, <, TYPE) \
+  COMPARE_ORDERED (le, <=, TYPE) \
+  COMPARE_ORDERED (gt, >, TYPE) \
+  COMPARE_ORDERED (ge, >=, TYPE)
+
+#define TEST_TYPE(NAME, ELEM, COMPARE_ORDERED, SIZE) \
+  typedef ELEM NAME##SIZE __attribute__((vector_size(SIZE))); \
+  COMPARE_TYPE (NAME##SIZE, COMPARE_ORDERED)
+
+/* 64-bits vectors, not vectorized. */
+TEST_TYPE (vs8, __INT8_TYPE__, COMPARE_REG_AND_ZERO, 8)
+TEST_TYPE (vu8, __UINT8_TYPE__, COMPARE_REG, 8)
+TEST_TYPE (vs16, __INT16_TYPE__, COMPARE_REG_AND_ZERO, 8)
+TEST_TYPE (vu16, __UINT16_TYPE__, COMPARE_REG, 8)
+TEST_TYPE (vs32, __INT32_TYPE__, COMPARE_REG_AND_ZERO, 8)
+TEST_TYPE (vu32, __UINT32_TYPE__, COMPARE_REG, 8)
+
+/* 128-bits vectors.
*/
+TEST_TYPE (vs8, __INT8_TYPE__, COMPARE_REG_AND_ZERO, 16)
+TEST_TYPE (vu8, __UINT8_TYPE__, COMPARE_REG, 16)
+TEST_TYPE (vs16, __INT16_TYPE__, COMPARE_REG_AND_ZERO, 16)
+TEST_TYPE (vu16, __UINT16_TYPE__, COMPARE_REG, 16)
+TEST_TYPE (vs32, __INT32_TYPE__, COMPARE_REG_AND_ZERO, 16)
+TEST_TYPE (vu32, __UINT32_TYPE__, COMPARE_REG, 16)
+
+/* { 8 bits } x { eq, ne, lt, le, gt, ge, hi, cs }.
+/* { dg-final { scan-assembler-times {\tvcmp.i8 eq, q[0-9]+, q[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.i8 ne, q[0-9]+, q[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s8 lt, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s8 le, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s8 gt, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s8 ge, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u8 hi, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u8 cs, q[0-9]+, q[0-9]+\n} 2 } } */
+
+/* { 16 bits } x { eq, ne, lt, le, gt, ge, hi, cs }.
+/* { dg-final { scan-assembler-times {\tvcmp.i16 eq, q[0-9]+, q[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.i16 ne, q[0-9]+, q[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s16 lt, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s16 le, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s16 gt, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s16 ge, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u16 hi, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u16 cs, q[0-9]+, q[0-9]+\n} 2 } } */
+
+/* { 32 bits } x { eq, ne, lt, le, gt, ge, hi, cs }.
+/* { dg-final { scan-assembler-times {\tvcmp.i32 eq, q[0-9]+, q[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.i32 ne, q[0-9]+, q[0-9]+\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s32 lt, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s32 le, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s32 gt, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s32 ge, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u32 hi, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u32 cs, q[0-9]+, q[0-9]+\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c
new file mode 100644
index 0000000..8515195
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-2.c
@@ -0,0 +1,38 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+/* float 32 tests. */
+
+#ifndef ELEM_TYPE
+#define ELEM_TYPE float
+#endif
+#ifndef INT_ELEM_TYPE
+#define INT_ELEM_TYPE __INT32_TYPE__
+#endif
+
+#define COMPARE(NAME, OP) \
+  int_vec \
+  cmp_##NAME##_reg (vec a, vec b) \
+  { \
+    return a OP b; \
+  }
+
+typedef INT_ELEM_TYPE int_vec __attribute__((vector_size(16)));
+typedef ELEM_TYPE vec __attribute__((vector_size(16)));
+
+COMPARE (eq, ==)
+COMPARE (ne, !=)
+COMPARE (lt, <)
+COMPARE (le, <=)
+COMPARE (gt, >)
+COMPARE (ge, >=)
+
+/* eq, ne, lt, le, gt, ge.
+/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c b/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c
new file mode 100644
index 0000000..7774972
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-compare-scalar-1.c
@@ -0,0 +1,69 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#define COMPARE_REG(NAME, OP, TYPE, SCALAR) \
+  TYPE \
+  cmp_##NAME##_##TYPE##_scalar (TYPE a, SCALAR b) \
+  { \
+    return a OP b; \
+  }
+
+#define COMPARE_TYPE(SCALAR, TYPE) \
+  COMPARE_REG (eq, ==, TYPE, SCALAR) \
+  COMPARE_REG (ne, !=, TYPE, SCALAR) \
+  COMPARE_REG (lt, <, TYPE, SCALAR) \
+  COMPARE_REG (le, <=, TYPE, SCALAR) \
+  COMPARE_REG (gt, >, TYPE, SCALAR) \
+  COMPARE_REG (ge, >=, TYPE, SCALAR)
+
+#define TEST_TYPE(NAME, ELEM, SIZE) \
+  typedef ELEM NAME##SIZE __attribute__((vector_size(SIZE))); \
+  COMPARE_TYPE (ELEM, NAME##SIZE)
+
+/* 64-bits vectors, not vectorized. */
+TEST_TYPE (vs8, __INT8_TYPE__, 8)
+TEST_TYPE (vu8, __UINT8_TYPE__, 8)
+TEST_TYPE (vs16, __INT16_TYPE__, 8)
+TEST_TYPE (vu16, __UINT16_TYPE__, 8)
+TEST_TYPE (vs32, __INT32_TYPE__, 8)
+TEST_TYPE (vu32, __UINT32_TYPE__, 8)
+
+/* 128-bits vectors.
*/
+TEST_TYPE (vs8, __INT8_TYPE__, 16)
+TEST_TYPE (vu8, __UINT8_TYPE__, 16)
+TEST_TYPE (vs16, __INT16_TYPE__, 16)
+TEST_TYPE (vu16, __UINT16_TYPE__, 16)
+TEST_TYPE (vs32, __INT32_TYPE__, 16)
+TEST_TYPE (vu32, __UINT32_TYPE__, 16)
+
+/* { 8 bits } x { eq, ne, lt, le, gt, ge, hi, cs }.
+/* { dg-final { scan-assembler-times {\tvcmp.i8 eq, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.i8 ne, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s8 lt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s8 le, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s8 gt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s8 ge, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u8 hi, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u8 cs, q[0-9]+, q[0-9]+\n} 2 } } */
+
+/* { 16 bits } x { eq, ne, lt, le, gt, ge, hi, cs }.
+/* { dg-final { scan-assembler-times {\tvcmp.i16 eq, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.i16 ne, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s16 lt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s16 le, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s16 gt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s16 ge, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u16 hi, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u16 cs, q[0-9]+, q[0-9]+\n} 2 } } */
+
+/* { 32 bits } x { eq, ne, lt, le, gt, ge, hi, cs }.
+/* { dg-final { scan-assembler-times {\tvcmp.i32 eq, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.i32 ne, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s32 lt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s32 le, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s32 gt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s32 ge, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u32 hi, q[0-9]+, q[0-9]+\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u32 cs, q[0-9]+, q[0-9]+\n} 2 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c
new file mode 100644
index 0000000..4ed449e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32.c
@@ -0,0 +1,30 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include <stdint.h>
+
+#define NB 4
+
+#define FUNC(OP, NAME) \
+  void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \
+    int i; \
+    for (i=0; i<NB; i++) { \
+      dest[i] = a[i] OP b[i]; \
+    } \
+  }
+
+FUNC(==, vcmpeq)
+FUNC(!=, vcmpne)
+FUNC(<, vcmplt)
+FUNC(<=, vcmple)
+FUNC(>, vcmpgt)
+FUNC(>=, vcmpge)
+
+/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } } */
diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c
b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c
new file mode 100644
index 0000000..8da15e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp.c
@@ -0,0 +1,50 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include <stdint.h>
+
+#define FUNC(SIGN, TYPE, BITS, NB, OP, NAME) \
+  void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
+    int i; \
+    for (i=0; i<NB; i++) { \
+      dest[i] = a[i] OP b[i]; \
+    } \
+}
+
+#define ALL_FUNCS(OP, NAME) \
+  FUNC(s, int, 32, 2, OP, NAME) \
+  FUNC(u, uint, 32, 2, OP, NAME) \
+  FUNC(s, int, 16, 4, OP, NAME) \
+  FUNC(u, uint, 16, 4, OP, NAME) \
+  FUNC(s, int, 8, 8, OP, NAME) \
+  FUNC(u, uint, 8, 8, OP, NAME) \
+  FUNC(s, int, 32, 4, OP, NAME) \
+  FUNC(u, uint, 32, 4, OP, NAME) \
+  FUNC(s, int, 16, 8, OP, NAME) \
+  FUNC(u, uint, 16, 8, OP, NAME) \
+  FUNC(s, int, 8, 16, OP, NAME) \
+  FUNC(u, uint, 8, 16, OP, NAME)
+
+ALL_FUNCS(==, vcmpeq)
+ALL_FUNCS(!=, vcmpne)
+ALL_FUNCS(<, vcmplt)
+ALL_FUNCS(<=, vcmple)
+ALL_FUNCS(>, vcmpgt)
+ALL_FUNCS(>=, vcmpge)
+
+/* MVE has only 128-bit vectors, so we can vectorize only half of the
+   functions above. */
+/* { dg-final { scan-assembler-times {\tvcmp.i[0-9]+ eq, q[0-9]+, q[0-9]+\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.i[0-9]+ ne, q[0-9]+, q[0-9]+\n} 6 } } */
+
+/* lt, le, gt, ge apply to signed types, cs and hi to unsigned types. */
+/* lt and le with unsigned types are replaced with the opposite condition, hence
+   the double number of matches for cs and hi.
*/
+/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ lt, q[0-9]+, q[0-9]+\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ le, q[0-9]+, q[0-9]+\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ gt, q[0-9]+, q[0-9]+\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.s[0-9]+ ge, q[0-9]+, q[0-9]+\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u[0-9]+ cs, q[0-9]+, q[0-9]+\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tvcmp.u[0-9]+ hi, q[0-9]+, q[0-9]+\n} 6 } } */