From patchwork Fri Apr 8 08:30:51 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Claudiu Zissulescu
X-Patchwork-Id: 607928
Delivered-To: mailing list gcc-patches@gcc.gnu.org
From: Claudiu Zissulescu
Subject: [PATCH] [ARC] Add SIMD extensions
 for ARC HS
Date: Fri, 8 Apr 2016 10:30:51 +0200
Message-ID: <1460104251-30514-1-git-send-email-claziss@synopsys.com>
MIME-Version: 1.0

This patch adds support for the new SIMD operations added to the ARC HS
CPU class.  The patch does not aim for performance; it provides support
for the newly added operations and for autovectorization.

The patch was tested using dg.exp, compile.exp, and execute.exp for both
arc700 and archs, with and without SIMD support enabled.

OK to apply?
Claudiu

gcc/
2016-03-14  Claudiu Zissulescu

	* config/arc/arc.c (arc_vector_mode_supported_p): Add support for
	the new ARC HS SIMD instructions.
	(arc_preferred_simd_mode): New function.
	(arc_autovectorize_vector_sizes): Likewise.
	(TARGET_VECTORIZE_PREFERRED_SIMD_MODE)
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Define.
	(arc_init_reg_tables): Accept new ARC HS SIMD modes.
	(arc_init_builtins): Add new SIMD builtin types.
	(arc_split_move): Handle 64 bit vector moves.
	* config/arc/arc.h (TARGET_PLUS_DMPY, TARGET_PLUS_MACD)
	(TARGET_PLUS_QMACW): Define.
	* config/arc/builtins.def (QMACH, QMACHU, QMPYH, QMPYHU, DMACH)
	(DMACHU, DMPYH, DMPYHU, DMACWH, DMACWHU, VMAC2H, VMAC2HU, VMPY2H)
	(VMPY2HU, VADDSUB2H, VSUBADD2H, VADDSUB, VSUBADD, VADDSUB4H)
	(VSUBADD4H): New builtins.
	* config/arc/simdext.md: Add new ARC HS SIMD instructions.
	* testsuite/gcc.target/arc/builtin_simdarc.c: New file.
---
 gcc/config/arc/arc.c                           | 112 ++++-
 gcc/config/arc/arc.h                           |   6 +
 gcc/config/arc/builtins.def                    |  27 ++
 gcc/config/arc/simdext.md                      | 571 +++++++++++++++++++++++++
 gcc/testsuite/gcc.target/arc/builtin_simdarc.c |  38 ++
 5 files changed, 747 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arc/builtin_simdarc.c

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index d60db50..d120946 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -247,16 +247,47 @@ static bool arc_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT,
 static bool
 arc_vector_mode_supported_p (machine_mode mode)
 {
-  if (!TARGET_SIMD_SET)
-    return false;
+  switch (mode)
+    {
+    case V2HImode:
+      return TARGET_PLUS_DMPY;
+    case V4HImode:
+    case V2SImode:
+      return TARGET_PLUS_QMACW;
+    case V4SImode:
+    case V8HImode:
+      return TARGET_SIMD_SET;

-  if ((mode == V4SImode)
-      || (mode == V8HImode))
-    return true;
+    default:
+      return false;
+    }
+}

-  return false;
+/* Implements target hook TARGET_VECTORIZE_PREFERRED_SIMD_MODE.  */
+
+static enum machine_mode
+arc_preferred_simd_mode (enum machine_mode mode)
+{
+  switch (mode)
+    {
+    case HImode:
+      return TARGET_PLUS_QMACW ? V4HImode : V2HImode;
+    case SImode:
+      return V2SImode;
+
+    default:
+      return word_mode;
+    }
 }

+/* Implements target hook
+   TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES.  */
+
+static unsigned int
+arc_autovectorize_vector_sizes (void)
+{
+  return TARGET_PLUS_QMACW ? (8 | 4) : 0;
+}
 /* TARGET_PRESERVE_RELOAD_P is still awaiting patch re-evaluation / review.  */
 static bool arc_preserve_reload_p (rtx in) ATTRIBUTE_UNUSED;

@@ -345,6 +376,12 @@ static void arc_finalize_pic (void);
 #undef TARGET_VECTOR_MODE_SUPPORTED_P
 #define TARGET_VECTOR_MODE_SUPPORTED_P arc_vector_mode_supported_p

+#undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE
+#define TARGET_VECTORIZE_PREFERRED_SIMD_MODE arc_preferred_simd_mode
+
+#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES
+#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES arc_autovectorize_vector_sizes
+
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P arc_can_use_doloop_p

@@ -1214,7 +1251,12 @@ arc_init_reg_tables (void)
           arc_mode_class[i] = 0;
           break;
         case MODE_VECTOR_INT:
-          arc_mode_class [i] = (1<< (int) V_MODE);
+          if (GET_MODE_SIZE (m) == 4)
+            arc_mode_class[i] = (1 << (int) S_MODE);
+          else if (GET_MODE_SIZE (m) == 8)
+            arc_mode_class[i] = (1 << (int) D_MODE);
+          else
+            arc_mode_class[i] = (1 << (int) V_MODE);
           break;
         case MODE_CC:
         default:
@@ -5277,6 +5319,15 @@ arc_builtin_decl (unsigned id, bool initialize_p ATTRIBUTE_UNUSED)
 static void
 arc_init_builtins (void)
 {
+  tree V4HI_type_node;
+  tree V2SI_type_node;
+  tree V2HI_type_node;
+
+  /* Vector types based on HS SIMD elements.  */
+  V4HI_type_node = build_vector_type_for_mode (intHI_type_node, V4HImode);
+  V2SI_type_node = build_vector_type_for_mode (intSI_type_node, V2SImode);
+  V2HI_type_node = build_vector_type_for_mode (intHI_type_node, V2HImode);
+
   tree pcvoid_type_node
     = build_pointer_type (build_qualified_type (void_type_node,
                                                 TYPE_QUAL_CONST));
@@ -5341,6 +5392,28 @@ arc_init_builtins (void)
   tree v8hi_ftype_v8hi
     = build_function_type_list (V8HI_type_node, V8HI_type_node,
                                 NULL_TREE);
+  /* ARCv2 SIMD types.  */
+  tree long_ftype_v4hi_v4hi
+    = build_function_type_list (long_long_integer_type_node,
+                                V4HI_type_node, V4HI_type_node, NULL_TREE);
+  tree int_ftype_v2hi_v2hi
+    = build_function_type_list (integer_type_node,
+                                V2HI_type_node, V2HI_type_node, NULL_TREE);
+  tree v2si_ftype_v2hi_v2hi
+    = build_function_type_list (V2SI_type_node,
+                                V2HI_type_node, V2HI_type_node, NULL_TREE);
+  tree v2hi_ftype_v2hi_v2hi
+    = build_function_type_list (V2HI_type_node,
+                                V2HI_type_node, V2HI_type_node, NULL_TREE);
+  tree v2si_ftype_v2si_v2si
+    = build_function_type_list (V2SI_type_node,
+                                V2SI_type_node, V2SI_type_node, NULL_TREE);
+  tree v4hi_ftype_v4hi_v4hi
+    = build_function_type_list (V4HI_type_node,
+                                V4HI_type_node, V4HI_type_node, NULL_TREE);
+  tree long_ftype_v2si_v2hi
+    = build_function_type_list (long_long_integer_type_node,
+                                V2SI_type_node, V2HI_type_node, NULL_TREE);

   /* Add the builtins.  */
 #define DEF_BUILTIN(NAME, N_ARGS, TYPE, ICODE, MASK) \
@@ -8706,6 +8779,31 @@ arc_split_move (rtx *operands)
       return;
     }

+  if (TARGET_PLUS_QMACW
+      && GET_CODE (operands[1]) == CONST_VECTOR)
+    {
+      HOST_WIDE_INT intval0, intval1;
+      if (GET_MODE (operands[1]) == V2SImode)
+        {
+          intval0 = INTVAL (XVECEXP (operands[1], 0, 0));
+          intval1 = INTVAL (XVECEXP (operands[1], 0, 1));
+        }
+      else
+        {
+          intval1 = INTVAL (XVECEXP (operands[1], 0, 3)) << 16;
+          intval1 |= INTVAL (XVECEXP (operands[1], 0, 2)) & 0xFFFF;
+          intval0 = INTVAL (XVECEXP (operands[1], 0, 1)) << 16;
+          intval0 |= INTVAL (XVECEXP (operands[1], 0, 0)) & 0xFFFF;
+        }
+      xop[0] = gen_rtx_REG (SImode, REGNO (operands[0]));
+      xop[3] = gen_rtx_REG (SImode, REGNO (operands[0]) + 1);
+      xop[2] = GEN_INT (trunc_int_for_mode (intval0, SImode));
+      xop[1] = GEN_INT (trunc_int_for_mode (intval1, SImode));
+      emit_move_insn (xop[0], xop[2]);
+      emit_move_insn (xop[3], xop[1]);
+      return;
+    }
+
   for (i = 0; i < 2; i++)
     {
       if (MEM_P (operands[i]) && auto_inc_p (XEXP (operands[i], 0)))
diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 21c049f..7fc465b 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -1723,6 +1723,12 @@ enum
 /* Any multiplication feature macro.  */
 #define TARGET_ANY_MPY \
   (TARGET_MPY || TARGET_MUL64_SET || TARGET_MULMAC_32BY16_SET)
+/* PLUS_DMPY feature macro.  */
+#define TARGET_PLUS_DMPY ((arc_mpy_option > 6) && TARGET_HS)
+/* PLUS_MACD feature macro.  */
+#define TARGET_PLUS_MACD ((arc_mpy_option > 7) && TARGET_HS)
+/* PLUS_QMACW feature macro.  */
+#define TARGET_PLUS_QMACW ((arc_mpy_option > 8) && TARGET_HS)

 /* ARC600 and ARC601 feature macro.  */
 #define TARGET_ARC600_FAMILY (TARGET_ARC600 || TARGET_ARC601)
diff --git a/gcc/config/arc/builtins.def b/gcc/config/arc/builtins.def
index 19be1d2..8c71d30 100644
--- a/gcc/config/arc/builtins.def
+++ b/gcc/config/arc/builtins.def
@@ -193,3 +193,30 @@ DEF_BUILTIN (VINTI, 1, void_ftype_int, vinti_insn, TARGET_SIMD_SET)

 /* END SIMD marker.  */
 DEF_BUILTIN (SIMD_END, 0, void_ftype_void, nothing, 0)
+
+/* ARCv2 SIMD instructions that use/clobber the accumulator reg.  */
+DEF_BUILTIN (QMACH, 2, long_ftype_v4hi_v4hi, qmach, TARGET_PLUS_QMACW)
+DEF_BUILTIN (QMACHU, 2, long_ftype_v4hi_v4hi, qmachu, TARGET_PLUS_QMACW)
+DEF_BUILTIN (QMPYH, 2, long_ftype_v4hi_v4hi, qmpyh, TARGET_PLUS_QMACW)
+DEF_BUILTIN (QMPYHU, 2, long_ftype_v4hi_v4hi, qmpyhu, TARGET_PLUS_QMACW)
+
+DEF_BUILTIN (DMACH, 2, int_ftype_v2hi_v2hi, dmach, TARGET_PLUS_DMPY)
+DEF_BUILTIN (DMACHU, 2, int_ftype_v2hi_v2hi, dmachu, TARGET_PLUS_DMPY)
+DEF_BUILTIN (DMPYH, 2, int_ftype_v2hi_v2hi, dmpyh, TARGET_PLUS_DMPY)
+DEF_BUILTIN (DMPYHU, 2, int_ftype_v2hi_v2hi, dmpyhu, TARGET_PLUS_DMPY)
+
+DEF_BUILTIN (DMACWH, 2, long_ftype_v2si_v2hi, dmacwh, TARGET_PLUS_QMACW)
+DEF_BUILTIN (DMACWHU, 2, long_ftype_v2si_v2hi, dmacwhu, TARGET_PLUS_QMACW)
+
+DEF_BUILTIN (VMAC2H, 2, v2si_ftype_v2hi_v2hi, vmac2h, TARGET_PLUS_MACD)
+DEF_BUILTIN (VMAC2HU, 2, v2si_ftype_v2hi_v2hi, vmac2hu, TARGET_PLUS_MACD)
+DEF_BUILTIN (VMPY2H, 2, v2si_ftype_v2hi_v2hi, vmpy2h, TARGET_PLUS_MACD)
+DEF_BUILTIN (VMPY2HU, 2, v2si_ftype_v2hi_v2hi, vmpy2hu, TARGET_PLUS_MACD)
+
+/* Combined add/sub HS SIMD instructions.  */
+DEF_BUILTIN (VADDSUB2H, 2, v2hi_ftype_v2hi_v2hi, addsubv2hi3, TARGET_PLUS_DMPY)
+DEF_BUILTIN (VSUBADD2H, 2, v2hi_ftype_v2hi_v2hi, subaddv2hi3, TARGET_PLUS_DMPY)
+DEF_BUILTIN (VADDSUB, 2, v2si_ftype_v2si_v2si, addsubv2si3, TARGET_PLUS_QMACW)
+DEF_BUILTIN (VSUBADD, 2, v2si_ftype_v2si_v2si, subaddv2si3, TARGET_PLUS_QMACW)
+DEF_BUILTIN (VADDSUB4H, 2, v4hi_ftype_v4hi_v4hi, addsubv4hi3, TARGET_PLUS_QMACW)
+DEF_BUILTIN (VSUBADD4H, 2, v4hi_ftype_v4hi_v4hi, subaddv4hi3, TARGET_PLUS_QMACW)
diff --git a/gcc/config/arc/simdext.md b/gcc/config/arc/simdext.md
index 9fd9d62..51869e3 100644
--- a/gcc/config/arc/simdext.md
+++ b/gcc/config/arc/simdext.md
@@ -1288,3 +1288,574 @@
   [(set_attr "type" "simd_vcontrol")
    (set_attr "length" "4")
    (set_attr "cond" "nocond")])
+
+;; New ARCv2 SIMD extensions
+
+;;64-bit vectors of halwords and words
+(define_mode_iterator VWH [V4HI V2SI])
+
+;;double element vectors
+(define_mode_iterator VDV [V2HI V2SI])
+(define_mode_attr V_addsub [(V2HI "HI") (V2SI "SI")])
+(define_mode_attr V_addsub_suffix [(V2HI "2h") (V2SI "")])
+
+;;all vectors
+(define_mode_iterator VCT [V2HI V4HI V2SI])
+(define_mode_attr V_suffix [(V2HI "2h") (V4HI "4h") (V2SI "2")])
+
+;; Widening operations.
+(define_code_iterator SE [sign_extend zero_extend])
+(define_code_attr V_US [(sign_extend "s") (zero_extend "u")])
+(define_code_attr V_US_suffix [(sign_extend "") (zero_extend "u")])
+
+
+;; Move patterns
+(define_expand "movv2hi"
+  [(set (match_operand:V2HI 0 "move_dest_operand" "")
+        (match_operand:V2HI 1 "general_operand" ""))]
+  ""
+  "{
+    if (prepare_move_operands (operands, V2HImode))
+      DONE;
+   }")
+
+(define_insn_and_split "*movv2hi_insn"
+  [(set (match_operand:V2HI 0 "nonimmediate_operand" "=r,r,r,m")
+        (match_operand:V2HI 1 "general_operand" "i,r,m,r"))]
+  "(register_operand (operands[0], V2HImode)
+    || register_operand (operands[1], V2HImode))"
+  "@
+    #
+    mov%? %0, %1
+    ld%U1%V1 %0,%1
+    st%U0%V0 %1,%0"
+  "reload_completed && GET_CODE (operands[1]) == CONST_VECTOR"
+  [(set (match_dup 0) (match_dup 2))]
+  {
+   HOST_WIDE_INT intval = INTVAL (XVECEXP (operands[1], 0, 1)) << 16;
+   intval |= INTVAL (XVECEXP (operands[1], 0, 0)) & 0xFFFF;
+
+   operands[0] = gen_rtx_REG (SImode, REGNO (operands[0]));
+   operands[2] = GEN_INT (trunc_int_for_mode (intval, SImode));
+  }
+  [(set_attr "type" "move,move,load,store")
+   (set_attr "predicable" "yes,yes,no,no")
+   (set_attr "iscompact" "false,false,false,false")
+   ])
+
+(define_expand "movmisalignv2hi"
+  [(set (match_operand:V2HI 0 "general_operand" "")
+        (match_operand:V2HI 1 "general_operand" ""))]
+  ""
+{
+  if (!register_operand (operands[0], V2HImode)
+      && !register_operand (operands[1], V2HImode))
+    operands[1] = force_reg (V2HImode, operands[1]);
+})
+
+(define_expand "mov<mode>"
+  [(set (match_operand:VWH 0 "move_dest_operand" "")
+        (match_operand:VWH 1 "general_operand" ""))]
+  ""
+  "{
+    if (GET_CODE (operands[0]) == MEM)
+     operands[1] = force_reg (<MODE>mode, operands[1]);
+   }")
+
+(define_insn_and_split "*mov<mode>_insn"
+  [(set (match_operand:VWH 0 "move_dest_operand" "=r,r,r,m")
+        (match_operand:VWH 1 "general_operand" "i,r,m,r"))]
+  "TARGET_PLUS_QMACW
+   && (register_operand (operands[0], <MODE>mode)
+       || register_operand (operands[1], <MODE>mode))"
+  "*
+{
+  switch (which_alternative)
+    {
+    default:
+      return \"#\";
+
+    case 1:
+      return \"vadd2 %0, %1, 0\";
+
+    case 2:
+      if (TARGET_LL64)
+        return \"ldd%U1%V1 %0,%1\";
+      return \"#\";
+
+    case 3:
+      if (TARGET_LL64)
+        return \"std%U0%V0 %1,%0\";
+      return \"#\";
+    }
+}"
+  "reload_completed"
+  [(const_int 0)]
+  {
+   arc_split_move (operands);
+   DONE;
+  }
+  [(set_attr "type" "move,move,load,store")
+   (set_attr "predicable" "yes,no,no,no")
+   (set_attr "iscompact" "false,false,false,false")
+   ])
+
+(define_expand "movmisalign<mode>"
+  [(set (match_operand:VWH 0 "general_operand" "")
+        (match_operand:VWH 1 "general_operand" ""))]
+  ""
+{
+  if (!register_operand (operands[0], <MODE>mode)
+      && !register_operand (operands[1], <MODE>mode))
+    operands[1] = force_reg (<MODE>mode, operands[1]);
+})
+
+(define_insn "bswapv2hi2"
+  [(set (match_operand:V2HI 0 "register_operand" "=r,r")
+        (bswap:V2HI (match_operand:V2HI 1 "nonmemory_operand" "r,i")))]
+  "TARGET_V2 && TARGET_SWAP"
+  "swape %0, %1"
+  [(set_attr "length" "4,8")
+   (set_attr "type" "two_cycle_core")])
+
+;; Simple arithmetic insns
+(define_insn "add<mode>3"
+  [(set (match_operand:VCT 0 "register_operand" "=r,r")
+        (plus:VCT (match_operand:VCT 1 "register_operand" "0,r")
+                  (match_operand:VCT 2 "register_operand" "r,r")))]
+  "TARGET_PLUS_DMPY"
+  "vadd<V_suffix>%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "sub<mode>3"
+  [(set (match_operand:VCT 0 "register_operand" "=r,r")
+        (minus:VCT (match_operand:VCT 1 "register_operand" "0,r")
+                   (match_operand:VCT 2 "register_operand" "r,r")))]
+  "TARGET_PLUS_DMPY"
+  "vsub<V_suffix>%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+;; Combined arithmetic ops
+(define_insn "addsub<mode>3"
+  [(set (match_operand:VDV 0 "register_operand" "=r,r")
+        (vec_concat:VDV
+         (plus:<V_addsub> (vec_select:<V_addsub> (match_operand:VDV 1 "register_operand" "0,r")
+                                                  (parallel [(const_int 0)]))
+                          (vec_select:<V_addsub> (match_operand:VDV 2 "register_operand" "r,r")
+                                                  (parallel [(const_int 0)])))
+         (minus:<V_addsub> (vec_select:<V_addsub> (match_dup 1) (parallel [(const_int 1)]))
+                           (vec_select:<V_addsub> (match_dup 2) (parallel [(const_int 1)])))))]
+  "TARGET_PLUS_DMPY"
+  "vaddsub<V_addsub_suffix>%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "subadd<mode>3"
+  [(set (match_operand:VDV 0 "register_operand" "=r,r")
+        (vec_concat:VDV
+         (minus:<V_addsub> (vec_select:<V_addsub> (match_operand:VDV 1 "register_operand" "0,r")
+                                                   (parallel [(const_int 0)]))
+                           (vec_select:<V_addsub> (match_operand:VDV 2 "register_operand" "r,r")
+                                                   (parallel [(const_int 0)])))
+         (plus:<V_addsub> (vec_select:<V_addsub> (match_dup 1) (parallel [(const_int 1)]))
+                          (vec_select:<V_addsub> (match_dup 2) (parallel [(const_int 1)])))))]
+  "TARGET_PLUS_DMPY"
+  "vsubadd<V_addsub_suffix>%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "addsubv4hi3"
+  [(set (match_operand:V4HI 0 "even_register_operand" "=r,r")
+        (vec_concat:V4HI
+         (vec_concat:V2HI
+          (plus:HI (vec_select:HI (match_operand:V4HI 1 "even_register_operand" "0,r")
+                                  (parallel [(const_int 0)]))
+                   (vec_select:HI (match_operand:V4HI 2 "even_register_operand" "r,r")
+                                  (parallel [(const_int 0)])))
+          (minus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))
+                    (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))))
+         (vec_concat:V2HI
+          (plus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 2)]))
+                   (vec_select:HI (match_dup 2) (parallel [(const_int 2)])))
+          (minus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 3)]))
+                    (vec_select:HI (match_dup 2) (parallel [(const_int 3)]))))
+         ))]
+  "TARGET_PLUS_QMACW"
+  "vaddsub4h%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "subaddv4hi3"
+  [(set (match_operand:V4HI 0 "even_register_operand" "=r,r")
+        (vec_concat:V4HI
+         (vec_concat:V2HI
+          (minus:HI (vec_select:HI (match_operand:V4HI 1 "even_register_operand" "0,r")
+                                   (parallel [(const_int 0)]))
+                    (vec_select:HI (match_operand:V4HI 2 "even_register_operand" "r,r")
+                                   (parallel [(const_int 0)])))
+          (plus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 1)]))
+                   (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))))
+         (vec_concat:V2HI
+          (minus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 2)]))
+                    (vec_select:HI (match_dup 2) (parallel [(const_int 2)])))
+          (plus:HI (vec_select:HI (match_dup 1) (parallel [(const_int 3)]))
+                   (vec_select:HI (match_dup 2) (parallel [(const_int 3)]))))
+         ))]
+  "TARGET_PLUS_QMACW"
+  "vsubadd4h%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+;; Multiplication
+(define_insn "dmpyh<V_US_suffix>"
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
+        (plus:SI
+         (mult:SI
+          (SE:SI
+           (vec_select:HI (match_operand:V2HI 1 "register_operand" "0,r")
+                          (parallel [(const_int 0)])))
+          (SE:SI
+           (vec_select:HI (match_operand:V2HI 2 "register_operand" "r,r")
+                          (parallel [(const_int 0)]))))
+         (mult:SI
+          (SE:SI (vec_select:HI (match_dup 1) (parallel [(const_int 1)])))
+          (SE:SI (vec_select:HI (match_dup 2) (parallel [(const_int 1)]))))))
+   (set (reg:DI ARCV2_ACC)
+        (zero_extend:DI
+         (plus:SI
+          (mult:SI
+           (SE:SI (vec_select:HI (match_dup 1) (parallel [(const_int 0)])))
+           (SE:SI (vec_select:HI (match_dup 2) (parallel [(const_int 0)]))))
+          (mult:SI
+           (SE:SI (vec_select:HI (match_dup 1) (parallel [(const_int 1)])))
+           (SE:SI (vec_select:HI (match_dup 2) (parallel [(const_int 1)])))))))]
+  "TARGET_PLUS_DMPY"
+  "dmpy<V_US_suffix>%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+;; We can use dmac as well here.  To be investigated which version
+;; brings more.
+(define_expand "sdot_prodv2hi"
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:V2HI 1 "register_operand" "")
+   (match_operand:V2HI 2 "register_operand" "")
+   (match_operand:SI 3 "register_operand" "")]
+  "TARGET_PLUS_DMPY"
+{
+ rtx t = gen_reg_rtx (SImode);
+ emit_insn (gen_dmpyh (t, operands[1], operands[2]));
+ emit_insn (gen_addsi3 (operands[0], operands[3], t));
+ DONE;
+})
+
+(define_expand "udot_prodv2hi"
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:V2HI 1 "register_operand" "")
+   (match_operand:V2HI 2 "register_operand" "")
+   (match_operand:SI 3 "register_operand" "")]
+  "TARGET_PLUS_DMPY"
+{
+ rtx t = gen_reg_rtx (SImode);
+ emit_insn (gen_dmpyhu (t, operands[1], operands[2]));
+ emit_insn (gen_addsi3 (operands[0], operands[3], t));
+ DONE;
+})
+
+(define_insn "arc_vec<V_US>mult_lo_v4hi"
+  [(set (match_operand:V2SI 0 "even_register_operand" "=r,r")
+        (mult:V2SI (SE:V2SI (vec_select:V2HI
+                             (match_operand:V4HI 1 "even_register_operand" "0,r")
+                             (parallel [(const_int 0) (const_int 1)])))
+                   (SE:V2SI (vec_select:V2HI
+                             (match_operand:V4HI 2 "even_register_operand" "r,r")
+                             (parallel [(const_int 0) (const_int 1)])))))
+   (set (reg:V2SI ARCV2_ACC)
+        (mult:V2SI (SE:V2SI (vec_select:V2HI (match_dup 1)
+                                             (parallel [(const_int 0) (const_int 1)])))
+                   (SE:V2SI (vec_select:V2HI (match_dup 2)
+                                             (parallel [(const_int 0) (const_int 1)])))))
+   ]
+  "TARGET_PLUS_MACD"
+  "vmpy2h<V_US_suffix>%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "arc_vec<V_US>multacc_lo_v4hi"
+  [(set (reg:V2SI ARCV2_ACC)
+        (mult:V2SI (SE:V2SI (vec_select:V2HI
+                             (match_operand:V4HI 0 "even_register_operand" "r")
+                             (parallel [(const_int 0) (const_int 1)])))
+                   (SE:V2SI (vec_select:V2HI
+                             (match_operand:V4HI 1 "even_register_operand" "r")
+                             (parallel [(const_int 0) (const_int 1)])))))
+   ]
+  "TARGET_PLUS_MACD"
+  "vmpy2h<V_US_suffix>%? 0, %0, %1"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "no")
+   (set_attr "cond" "nocond")])
+
+(define_expand "vec_widen_<V_US>mult_lo_v4hi"
+  [(set (match_operand:V2SI 0 "even_register_operand" "")
+        (mult:V2SI (SE:V2SI (vec_select:V2HI
+                             (match_operand:V4HI 1 "even_register_operand" "")
+                             (parallel [(const_int 0) (const_int 1)])))
+                   (SE:V2SI (vec_select:V2HI
+                             (match_operand:V4HI 2 "even_register_operand" "")
+                             (parallel [(const_int 0) (const_int 1)])))))]
+  "TARGET_PLUS_QMACW"
+  {
+     emit_insn (gen_arc_vec<V_US>mult_lo_v4hi (operands[0],
+                                               operands[1],
+                                               operands[2]));
+     DONE;
+  }
+)
+
+(define_insn "arc_vec<V_US>mult_hi_v4hi"
+  [(set (match_operand:V2SI 0 "even_register_operand" "=r,r")
+        (mult:V2SI (SE:V2SI (vec_select:V2HI
+                             (match_operand:V4HI 1 "even_register_operand" "0,r")
+                             (parallel [(const_int 2) (const_int 3)])))
+                   (SE:V2SI (vec_select:V2HI
+                             (match_operand:V4HI 2 "even_register_operand" "r,r")
+                             (parallel [(const_int 2) (const_int 3)])))))
+   (set (reg:V2SI ARCV2_ACC)
+        (mult:V2SI (SE:V2SI (vec_select:V2HI (match_dup 1)
+                                             (parallel [(const_int 2) (const_int 3)])))
+                   (SE:V2SI (vec_select:V2HI (match_dup 2)
+                                             (parallel [(const_int 2) (const_int 3)])))))
+   ]
+  "TARGET_PLUS_QMACW"
+  "vmpy2h<V_US_suffix>%? %0, %R1, %R2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_expand "vec_widen_<V_US>mult_hi_v4hi"
+  [(set (match_operand:V2SI 0 "even_register_operand" "")
+        (mult:V2SI (SE:V2SI (vec_select:V2HI
+                             (match_operand:V4HI 1 "even_register_operand" "")
+                             (parallel [(const_int 2) (const_int 3)])))
+                   (SE:V2SI (vec_select:V2HI
+                             (match_operand:V4HI 2 "even_register_operand" "")
+                             (parallel [(const_int 2) (const_int 3)])))))]
+  "TARGET_PLUS_MACD"
+  {
+     emit_insn (gen_arc_vec<V_US>mult_hi_v4hi (operands[0],
+                                               operands[1],
+                                               operands[2]));
+     DONE;
+  }
+)
+
+(define_insn "arc_vec<V_US>mac_hi_v4hi"
+  [(set (match_operand:V2SI 0 "even_register_operand" "=r,r")
+        (plus:V2SI
+         (reg:V2SI ARCV2_ACC)
+         (mult:V2SI (SE:V2SI (vec_select:V2HI
+                              (match_operand:V4HI 1 "even_register_operand" "0,r")
+                              (parallel [(const_int 2) (const_int 3)])))
+                    (SE:V2SI (vec_select:V2HI
+                              (match_operand:V4HI 2 "even_register_operand" "r,r")
+                              (parallel [(const_int 2) (const_int 3)]))))))
+   (set (reg:V2SI ARCV2_ACC)
+        (plus:V2SI
+         (reg:V2SI ARCV2_ACC)
+         (mult:V2SI (SE:V2SI (vec_select:V2HI (match_dup 1)
+                                              (parallel [(const_int 2) (const_int 3)])))
+                    (SE:V2SI (vec_select:V2HI (match_dup 2)
+                                              (parallel [(const_int 2) (const_int 3)]))))))
+   ]
+  "TARGET_PLUS_MACD"
+  "vmac2h<V_US_suffix>%? %0, %R1, %R2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+;; Builtins
+(define_insn "dmach"
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
+        (unspec:SI [(match_operand:V2HI 1 "register_operand" "0,r")
+                    (match_operand:V2HI 2 "register_operand" "r,r")
+                    (reg:DI ARCV2_ACC)]
+                   UNSPEC_ARC_DMACH))
+   (clobber (reg:DI ARCV2_ACC))]
+  "TARGET_PLUS_DMPY"
+  "dmach%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "dmachu"
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
+        (unspec:SI [(match_operand:V2HI 1 "register_operand" "0,r")
+                    (match_operand:V2HI 2 "register_operand" "r,r")
+                    (reg:DI ARCV2_ACC)]
+                   UNSPEC_ARC_DMACHU))
+   (clobber (reg:DI ARCV2_ACC))]
+  "TARGET_PLUS_DMPY"
+  "dmachu%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "dmacwh"
+  [(set (match_operand:DI 0 "even_register_operand" "=r,r")
+        (unspec:DI [(match_operand:V2SI 1 "even_register_operand" "0,r")
+                    (match_operand:V2HI 2 "register_operand" "r,r")
+                    (reg:DI ARCV2_ACC)]
+                   UNSPEC_ARC_DMACWH))
+   (clobber (reg:DI ARCV2_ACC))]
+  "TARGET_PLUS_QMACW"
+  "dmacwh%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "dmacwhu"
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
+        (unspec:DI [(match_operand:V2SI 1 "even_register_operand" "0,r")
+                    (match_operand:V2HI 2 "register_operand" "r,r")
+                    (reg:DI ARCV2_ACC)]
+                   UNSPEC_ARC_DMACWHU))
+   (clobber (reg:DI ARCV2_ACC))]
+  "TARGET_PLUS_QMACW"
+  "dmacwhu%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "vmac2h"
+  [(set (match_operand:V2SI 0 "even_register_operand" "=r,r")
+        (unspec:V2SI [(match_operand:V2HI 1 "register_operand" "0,r")
+                      (match_operand:V2HI 2 "register_operand" "r,r")
+                      (reg:DI ARCV2_ACC)]
+                     UNSPEC_ARC_VMAC2H))
+   (clobber (reg:DI ARCV2_ACC))]
+  "TARGET_PLUS_MACD"
+  "vmac2h%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "vmac2hu"
+  [(set (match_operand:V2SI 0 "even_register_operand" "=r,r")
+        (unspec:V2SI [(match_operand:V2HI 1 "register_operand" "0,r")
+                      (match_operand:V2HI 2 "register_operand" "r,r")
+                      (reg:DI ARCV2_ACC)]
+                     UNSPEC_ARC_VMAC2HU))
+   (clobber (reg:DI ARCV2_ACC))]
+  "TARGET_PLUS_MACD"
+  "vmac2hu%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "vmpy2h"
+  [(set (match_operand:V2SI 0 "even_register_operand" "=r,r")
+        (unspec:V2SI [(match_operand:V2HI 1 "register_operand" "0,r")
+                      (match_operand:V2HI 2 "register_operand" "r,r")]
+                     UNSPEC_ARC_VMPY2H))
+   (clobber (reg:DI ARCV2_ACC))]
+  "TARGET_PLUS_MACD"
+  "vmpy2h%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "vmpy2hu"
+  [(set (match_operand:V2SI 0 "even_register_operand" "=r,r")
+        (unspec:V2SI [(match_operand:V2HI 1 "register_operand" "0,r")
+                      (match_operand:V2HI 2 "register_operand" "r,r")]
+                     UNSPEC_ARC_VMPY2HU))
+   (clobber (reg:DI ARCV2_ACC))]
+  "TARGET_PLUS_MACD"
+  "vmpy2hu%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "qmach"
+  [(set (match_operand:DI 0 "even_register_operand" "=r,r")
+        (unspec:DI [(match_operand:V4HI 1 "even_register_operand" "0,r")
+                    (match_operand:V4HI 2 "even_register_operand" "r,r")
+                    (reg:DI ARCV2_ACC)]
+                   UNSPEC_ARC_QMACH))
+   (clobber (reg:DI ARCV2_ACC))]
+  "TARGET_PLUS_QMACW"
+  "qmach%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "qmachu"
+  [(set (match_operand:DI 0 "even_register_operand" "=r,r")
+        (unspec:DI [(match_operand:V4HI 1 "even_register_operand" "0,r")
+                    (match_operand:V4HI 2 "even_register_operand" "r,r")
+                    (reg:DI ARCV2_ACC)]
+                   UNSPEC_ARC_QMACHU))
+   (clobber (reg:DI ARCV2_ACC))]
+  "TARGET_PLUS_QMACW"
+  "qmachu%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "qmpyh"
+  [(set (match_operand:DI 0 "even_register_operand" "=r,r")
+        (unspec:DI [(match_operand:V4HI 1 "even_register_operand" "0,r")
+                    (match_operand:V4HI 2 "even_register_operand" "r,r")]
+                   UNSPEC_ARC_QMPYH))
+   (clobber (reg:DI ARCV2_ACC))]
+  "TARGET_PLUS_QMACW"
+  "qmpyh%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
+
+(define_insn "qmpyhu"
+  [(set (match_operand:DI 0 "even_register_operand" "=r,r")
+        (unspec:DI [(match_operand:V4HI 1 "even_register_operand" "0,r")
+                    (match_operand:V4HI 2 "even_register_operand" "r,r")]
+                   UNSPEC_ARC_QMPYHU))
+   (clobber (reg:DI ARCV2_ACC))]
+  "TARGET_PLUS_QMACW"
+  "qmpyhu%? %0, %1, %2"
+  [(set_attr "length" "4")
+   (set_attr "type" "multi")
+   (set_attr "predicable" "yes,no")
+   (set_attr "cond" "canuse,nocond")])
diff --git a/gcc/testsuite/gcc.target/arc/builtin_simdarc.c b/gcc/testsuite/gcc.target/arc/builtin_simdarc.c
new file mode 100644
index 0000000..68aae40
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arc/builtin_simdarc.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=archs -O2 -Werror-implicit-function-declaration -mmpy-option=9" } */
+
+#define STEST(name, rettype, op1type, op2type) \
+  rettype test_ ## name \
+  (op1type a, op2type b) \
+  { \
+    return __builtin_arc_ ## name (a, b); \
+  }
+
+typedef short v2hi __attribute__ ((vector_size (4)));
+typedef short v4hi __attribute__ ((vector_size (8)));
+typedef int v2si __attribute__ ((vector_size (8)));
+
+STEST (qmach, long long, v4hi, v4hi)
+STEST (qmachu, long long, v4hi, v4hi)
+STEST (qmpyh, long long, v4hi, v4hi)
+STEST (qmpyhu, long long, v4hi, v4hi)
+
+STEST (dmach, int, v2hi, v2hi)
+STEST (dmachu, int, v2hi, v2hi)
+STEST (dmpyh, int, v2hi, v2hi)
+STEST (dmpyhu, int, v2hi, v2hi)
+
+STEST (dmacwh, long, v2si, v2hi)
+STEST (dmacwhu, long, v2si, v2hi)
+
+STEST (vmac2h, v2si, v2hi, v2hi)
+STEST (vmac2hu, v2si, v2hi, v2hi)
+STEST (vmpy2h, v2si, v2hi, v2hi)
+STEST (vmpy2hu, v2si, v2hi, v2hi)
+
+STEST (vaddsub2h, v2hi, v2hi, v2hi)
+STEST (vsubadd2h, v2hi, v2hi, v2hi)
+STEST (vaddsub, v2si, v2si, v2si)
+STEST (vsubadd, v2si, v2si, v2si)
+STEST (vaddsub4h, v4hi, v4hi, v4hi)
+STEST (vsubadd4h, v4hi, v4hi, v4hi)
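
For readers who want to sanity-check the lane semantics without an ARC
toolchain, here is a minimal host-side sketch of what the addsubv2hi3,
subaddv2hi3 and signed dmpyh RTL patterns in simdext.md describe.  It is
an illustration only: the model_* function names are hypothetical, the
code uses GCC's generic vector extension (the same v2hi typedef as in
builtin_simdarc.c) rather than the __builtin_arc_* builtins, and it does
not model the accumulator side effects.

```c
/* Same 2 x 16-bit vector type as in builtin_simdarc.c.  */
typedef short v2hi __attribute__ ((vector_size (4)));

/* Hypothetical reference model of the addsubv2hi3 pattern:
   element 0 is a sum, element 1 is a difference.  */
static v2hi
model_vaddsub2h (v2hi a, v2hi b)
{
  v2hi r;
  r[0] = (short) (a[0] + b[0]);
  r[1] = (short) (a[1] - b[1]);
  return r;
}

/* Hypothetical reference model of the subaddv2hi3 pattern:
   element 0 is a difference, element 1 is a sum.  */
static v2hi
model_vsubadd2h (v2hi a, v2hi b)
{
  v2hi r;
  r[0] = (short) (a[0] - b[0]);
  r[1] = (short) (a[1] + b[1]);
  return r;
}

/* Hypothetical reference model of the signed dmpyh pattern: both
   element pairs are sign-extended to 32 bits, multiplied, and the
   two products are summed.  */
static int
model_dmpyh (v2hi a, v2hi b)
{
  return (int) a[0] * (int) b[0] + (int) a[1] * (int) b[1];
}
```

With a = {7, 5} and b = {3, 2}, the models give {10, 3} for the add/sub
form, {4, 7} for the sub/add form, and 7*3 + 5*2 = 31 for the dual
multiply, which is the value sdot_prodv2hi then adds to its fourth
operand.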