From patchwork Fri Jun 14 13:55:01 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vidya Praveen
X-Patchwork-Id: 251434
Message-ID: <51BB20B5.7050808@arm.com>
Date: Fri, 14 Jun 2013 14:55:01 +0100
From: Vidya Praveen
To: "gcc-patches@gcc.gnu.org"
CC: marcus.shawcroft@arm.com
Subject: [AArch64] Support for SMLAL/SMLSL/UMLAL/UMLSL

Hello,

This patch adds support for the SMLAL/SMLSL/UMLAL/UMLSL instructions and
adds tests for them.

Regression tested on aarch64-none-elf with no regressions.

OK?

~VP

---

gcc/ChangeLog

2013-06-14  Vidya Praveen

	* config/aarch64/aarch64-simd.md (*aarch64_mlal_lo): New pattern to
	support SMLAL,UMLAL instructions.
	* config/aarch64/aarch64-simd.md (*aarch64_mlal_hi): New pattern to
	support SMLAL2,UMLAL2 instructions.
	* config/aarch64/aarch64-simd.md (*aarch64_mlsl_lo): New pattern to
	support SMLSL,UMLSL instructions.
	* config/aarch64/aarch64-simd.md (*aarch64_mlsl_hi): New pattern to
	support SMLSL2,UMLSL2 instructions.
	* config/aarch64/aarch64-simd.md (*aarch64_mlal): New pattern to
	support SMLAL/UMLAL instructions for 64 bit vector modes.
	* config/aarch64/aarch64-simd.md (*aarch64_mlsl): New pattern to
	support SMLSL/UMLSL instructions for 64 bit vector modes.

gcc/testsuite/ChangeLog

2013-06-14  Vidya Praveen

	* gcc.target/aarch64/vect_smlal_1.c: New file.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index e5990d4..8589476 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1190,6 +1190,104 @@
 ;; Widening arithmetic.
+(define_insn "*aarch64_mlal_lo"
+  [(set (match_operand: 0 "register_operand" "=w")
+        (plus:
+          (mult:
+              (ANY_EXTEND: (vec_select:
+                 (match_operand:VQW 2 "register_operand" "w")
+                 (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
+              (ANY_EXTEND: (vec_select:
+                 (match_operand:VQW 4 "register_operand" "w")
+                 (match_dup 3))))
+          (match_operand: 1 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "mlal\t%0., %2., %4."
+  [(set_attr "simd_type" "simd_mlal")
+   (set_attr "simd_mode" "")]
+)
+
+(define_insn "*aarch64_mlal_hi"
+  [(set (match_operand: 0 "register_operand" "=w")
+        (plus:
+          (mult:
+              (ANY_EXTEND: (vec_select:
+                 (match_operand:VQW 2 "register_operand" "w")
+                 (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
+              (ANY_EXTEND: (vec_select:
+                 (match_operand:VQW 4 "register_operand" "w")
+                 (match_dup 3))))
+          (match_operand: 1 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "mlal2\t%0., %2., %4."
+  [(set_attr "simd_type" "simd_mlal")
+   (set_attr "simd_mode" "")]
+)
+
+(define_insn "*aarch64_mlsl_lo"
+  [(set (match_operand: 0 "register_operand" "=w")
+        (minus:
+          (match_operand: 1 "register_operand" "0")
+          (mult:
+              (ANY_EXTEND: (vec_select:
+                 (match_operand:VQW 2 "register_operand" "w")
+                 (match_operand:VQW 3 "vect_par_cnst_lo_half" "")))
+              (ANY_EXTEND: (vec_select:
+                 (match_operand:VQW 4 "register_operand" "w")
+                 (match_dup 3))))))]
+  "TARGET_SIMD"
+  "mlsl\t%0., %2., %4."
+  [(set_attr "simd_type" "simd_mlal")
+   (set_attr "simd_mode" "")]
+)
+
+(define_insn "*aarch64_mlsl_hi"
+  [(set (match_operand: 0 "register_operand" "=w")
+        (minus:
+          (match_operand: 1 "register_operand" "0")
+          (mult:
+              (ANY_EXTEND: (vec_select:
+                 (match_operand:VQW 2 "register_operand" "w")
+                 (match_operand:VQW 3 "vect_par_cnst_hi_half" "")))
+              (ANY_EXTEND: (vec_select:
+                 (match_operand:VQW 4 "register_operand" "w")
+                 (match_dup 3))))))]
+  "TARGET_SIMD"
+  "mlsl2\t%0., %2., %4."
+  [(set_attr "simd_type" "simd_mlal")
+   (set_attr "simd_mode" "")]
+)
+
+(define_insn "*aarch64_mlal"
+  [(set (match_operand: 0 "register_operand" "=w")
+        (plus:
+          (mult:
+            (ANY_EXTEND:
+              (match_operand:VDW 1 "register_operand" "w"))
+            (ANY_EXTEND:
+              (match_operand:VDW 2 "register_operand" "w")))
+          (match_operand: 3 "register_operand" "0")))]
+  "TARGET_SIMD"
+  "mlal\t%0., %1., %2."
+  [(set_attr "simd_type" "simd_mlal")
+   (set_attr "simd_mode" "")]
+)
+
+(define_insn "*aarch64_mlsl"
+  [(set (match_operand: 0 "register_operand" "=w")
+        (minus:
+          (match_operand: 1 "register_operand" "0")
+          (mult:
+            (ANY_EXTEND:
+              (match_operand:VDW 2 "register_operand" "w"))
+            (ANY_EXTEND:
+              (match_operand:VDW 3 "register_operand" "w")))))]
+  "TARGET_SIMD"
+  "mlsl\t%0., %2., %3."
+  [(set_attr "simd_type" "simd_mlal")
+   (set_attr "simd_mode" "")]
+)
+
 (define_insn "aarch64_simd_vec_mult_lo_"
   [(set (match_operand: 0 "register_operand" "=w")
         (mult: (ANY_EXTEND: (vec_select:
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_smlal_1.c b/gcc/testsuite/gcc.target/aarch64/vect_smlal_1.c
new file mode 100644
index 0000000..1f86eae
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect_smlal_1.c
@@ -0,0 +1,325 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -fno-inline -save-temps -fno-vect-cost-model" } */
+
+typedef signed char S8_t;
+typedef signed short S16_t;
+typedef signed int S32_t;
+typedef signed long S64_t;
+typedef signed char *__restrict__ pS8_t;
+typedef signed short *__restrict__ pS16_t;
+typedef signed int *__restrict__ pS32_t;
+typedef signed long *__restrict__ pS64_t;
+typedef unsigned char U8_t;
+typedef unsigned short U16_t;
+typedef unsigned int U32_t;
+typedef unsigned long U64_t;
+typedef unsigned char *__restrict__ pU8_t;
+typedef unsigned short *__restrict__ pU16_t;
+typedef unsigned int *__restrict__ pU32_t;
+typedef unsigned long *__restrict__ pU64_t;
+
+extern void abort ();
+
+void
+test_addS64_tS32_t4 (pS64_t a, pS32_t b, pS32_t c)
+{
+  int i;
+  for (i = 0; i < 4; i++)
+    a[i] += (S64_t) b[i] * (S64_t) c[i];
+}
+
+/* { dg-final { scan-assembler "smlal\tv\[0-9\]+\.2d" } } */
+/* { dg-final { scan-assembler "smlal2\tv\[0-9\]+\.2d" } } */
+
+void
+test_addS32_tS16_t8 (pS32_t a, pS16_t b, pS16_t c)
+{
+  int i;
+  for (i = 0; i < 8; i++)
+    a[i] += (S32_t) b[i] * (S32_t) c[i];
+}
+
+/* { dg-final { scan-assembler "smlal\tv\[0-9\]+\.4s" } } */
+/* { dg-final { scan-assembler "smlal2\tv\[0-9\]+\.4s" } } */
+
+void
+test_addS16_tS8_t16 (pS16_t a, pS8_t b, pS8_t c)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    a[i] += (S16_t) b[i] * (S16_t) c[i];
+}
+
+void
+test_addS16_tS8_t16_neg0 (pS16_t a, pS8_t b, pS8_t c)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    a[i] += (S16_t) -b[i] * (S16_t) -c[i];
+}
+
+void
+test_addS16_tS8_t16_neg1 (pS16_t a, pS8_t b, pS8_t c)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    a[i] -= (S16_t) b[i] * (S16_t) -c[i];
+}
+
+void
+test_addS16_tS8_t16_neg2 (pS16_t a, pS8_t b, pS8_t c)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    a[i] -= (S16_t) -b[i] * (S16_t) c[i];
+}
+
+/* { dg-final { scan-assembler-times "smlal\tv\[0-9\]+\.8h" 4 } } */
+/* { dg-final { scan-assembler-times "smlal2\tv\[0-9\]+\.8h" 4 } } */
+
+void
+test_subS64_tS32_t4 (pS64_t a, pS32_t b, pS32_t c)
+{
+  int i;
+  for (i = 0; i < 4; i++)
+    a[i] -= (S64_t) b[i] * (S64_t) c[i];
+}
+
+/* { dg-final { scan-assembler "smlsl\tv\[0-9\]+\.2d" } } */
+/* { dg-final { scan-assembler "smlsl2\tv\[0-9\]+\.2d" } } */
+
+void
+test_subS32_tS16_t8 (pS32_t a, pS16_t b, pS16_t c)
+{
+  int i;
+  for (i = 0; i < 8; i++)
+    a[i] -= (S32_t) b[i] * (S32_t) c[i];
+}
+
+/* { dg-final { scan-assembler "smlsl\tv\[0-9\]+\.4s" } } */
+/* { dg-final { scan-assembler "smlsl2\tv\[0-9\]+\.4s" } } */
+
+void
+test_subS16_tS8_t16 (pS16_t a, pS8_t b, pS8_t c)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    a[i] -= (S16_t) b[i] * (S16_t) c[i];
+}
+
+void
+test_subS16_tS8_t16_neg0 (pS16_t a, pS8_t b, pS8_t c)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    a[i] += (S16_t) -b[i] * (S16_t) c[i];
+}
+
+void
+test_subS16_tS8_t16_neg1 (pS16_t a, pS8_t b, pS8_t c)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    a[i] += (S16_t) b[i] * (S16_t) -c[i];
+}
+
+void
+test_subS16_tS8_t16_neg2 (pS16_t a, pS8_t b, pS8_t c)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    a[i] += -((S16_t) b[i] * (S16_t) c[i]);
+}
+
+void
+test_subS16_tS8_t16_neg3 (pS16_t a, pS8_t b, pS8_t c)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    a[i] -= (S16_t) -b[i] * (S16_t) -c[i];
+}
+
+/* { dg-final { scan-assembler-times "smlsl\tv\[0-9\]+\.8h" 5 } } */
+/* { dg-final { scan-assembler-times "smlsl2\tv\[0-9\]+\.8h" 5 } } */
+
+void
+test_addU64_tU32_t4 (pU64_t a, pU32_t b, pU32_t c)
+{
+  int i;
+  for (i = 0; i < 4; i++)
+    a[i] += (U64_t) b[i] * (U64_t) c[i];
+}
+
+/* { dg-final { scan-assembler "umlal\tv\[0-9\]+\.2d" } } */
+/* { dg-final { scan-assembler "umlal2\tv\[0-9\]+\.2d" } } */
+
+void
+test_addU32_tU16_t8 (pU32_t a, pU16_t b, pU16_t c)
+{
+  int i;
+  for (i = 0; i < 8; i++)
+    a[i] += (U32_t) b[i] * (U32_t) c[i];
+}
+
+/* { dg-final { scan-assembler "umlal\tv\[0-9\]+\.4s" } } */
+/* { dg-final { scan-assembler "umlal2\tv\[0-9\]+\.4s" } } */
+
+void
+test_addU16_tU8_t16 (pU16_t a, pU8_t b, pU8_t c)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    a[i] += (U16_t) b[i] * (U16_t) c[i];
+}
+
+/* { dg-final { scan-assembler "umlal\tv\[0-9\]+\.8h" } } */
+/* { dg-final { scan-assembler "umlal2\tv\[0-9\]+\.8h" } } */
+
+void
+test_subU64_tU32_t4 (pU64_t a, pU32_t b, pU32_t c)
+{
+  int i;
+  for (i = 0; i < 4; i++)
+    a[i] -= (U64_t) b[i] * (U64_t) c[i];
+}
+
+/* { dg-final { scan-assembler "umlsl\tv\[0-9\]+\.2d" } } */
+/* { dg-final { scan-assembler "umlsl2\tv\[0-9\]+\.2d" } } */
+
+void
+test_subU32_tU16_t8 (pU32_t a, pU16_t b, pU16_t c)
+{
+  int i;
+  for (i = 0; i < 8; i++)
+    a[i] -= (U32_t) b[i] * (U32_t) c[i];
+}
+
+/* { dg-final { scan-assembler "umlsl\tv\[0-9\]+\.4s" } } */
+/* { dg-final { scan-assembler "umlsl2\tv\[0-9\]+\.4s" } } */
+
+void
+test_subU16_tU8_t16 (pU16_t a, pU8_t b, pU8_t c)
+{
+  int i;
+  for (i = 0; i < 16; i++)
+    a[i] -= (U16_t) b[i] * (U16_t) c[i];
+}
+
+/* { dg-final { scan-assembler "umlsl\tv\[0-9\]+\.8h" } } */
+/* { dg-final { scan-assembler "umlsl2\tv\[0-9\]+\.8h" } } */
+
+
+S64_t add_rS64[4] = { 6, 7, -4, -3 };
+S32_t add_rS32[8] = { 6, 7, -4, -3, 10, 11, 0, 1 };
+S16_t add_rS16[16] =
+  { 6, 7, -4, -3, 10, 11, 0, 1, 14, 15, 4, 5, 18, 19, 8, 9 };
+
+S64_t sub_rS64[4] = { 0, 1, 2, 3 };
+S32_t sub_rS32[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
+S16_t sub_rS16[16] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
+
+U64_t add_rU64[4] = { 0x6, 0x7, 0x2fffffffc, 0x2fffffffd };
+
+U32_t add_rU32[8] =
+{
+  0x6, 0x7, 0x2fffc, 0x2fffd,
+  0xa, 0xb, 0x30000, 0x30001
+};
+
+U16_t add_rU16[16] =
+{
+  0x6, 0x7, 0x2fc, 0x2fd, 0xa, 0xb, 0x300, 0x301,
+  0xe, 0xf, 0x304, 0x305, 0x12, 0x13, 0x308, 0x309
+};
+
+U64_t sub_rU64[4] = { 0, 1, 2, 3 };
+U32_t sub_rU32[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
+U16_t sub_rU16[16] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
+
+S8_t neg_r[16] = { -6, -5, 8, 9, -2, -1, 12, 13, 2, 3, 16, 17, 6, 7, 20, 21 };
+
+S64_t S64_ta[16] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
+S32_t S32_tb[16] = { 2, 2, -2, -2, 2, 2, -2, -2, 2, 2, -2, -2, 2, 2, -2, -2 };
+S32_t S32_tc[16] = { 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3 };
+
+S32_t S32_ta[16] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
+S16_t S16_tb[16] = { 2, 2, -2, -2, 2, 2, -2, -2, 2, 2, -2, -2, 2, 2, -2, -2 };
+S16_t S16_tc[16] = { 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3 };
+
+S16_t S16_ta[16] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
+S8_t S8_tb[16] = { 2, 2, -2, -2, 2, 2, -2, -2, 2, 2, -2, -2, 2, 2, -2, -2 };
+S8_t S8_tc[16] = { 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3 };
+
+
+#define CHECK(T,N,AS,US) \
+do \
+  { \
+    for (i = 0; i < N; i++) \
+      if (S##T##_ta[i] != AS##_r##US##T[i]) \
+        abort (); \
+  } \
+while (0)
+
+#define SCHECK(T,N,AS) CHECK(T,N,AS,S)
+#define UCHECK(T,N,AS) CHECK(T,N,AS,U)
+
+#define NCHECK(RES) \
+do \
+  { \
+    for (i = 0; i < 16; i++) \
+      if (S16_ta[i] != RES[i]) \
+        abort (); \
+  } \
+while (0)
+
+
+int
+main ()
+{
+  int i;
+
+  test_addS64_tS32_t4 (S64_ta, S32_tb, S32_tc);
+  SCHECK (64, 4, add);
+  test_addS32_tS16_t8 (S32_ta, S16_tb, S16_tc);
+  SCHECK (32, 8, add);
+  test_addS16_tS8_t16 (S16_ta, S8_tb, S8_tc);
+  SCHECK (16, 16, add);
+  test_subS64_tS32_t4 (S64_ta, S32_tb, S32_tc);
+  SCHECK (64, 4, sub);
+  test_subS32_tS16_t8 (S32_ta, S16_tb, S16_tc);
+  SCHECK (32, 8, sub);
+  test_subS16_tS8_t16 (S16_ta, S8_tb, S8_tc);
+  SCHECK (16, 16, sub);
+
+  test_addU64_tU32_t4 (S64_ta, S32_tb, S32_tc);
+  UCHECK (64, 4, add);
+  test_addU32_tU16_t8 (S32_ta, S16_tb, S16_tc);
+  UCHECK (32, 8, add);
+  test_addU16_tU8_t16 (S16_ta, S8_tb, S8_tc);
+  UCHECK (16, 16, add);
+  test_subU64_tU32_t4 (S64_ta, S32_tb, S32_tc);
+  UCHECK (64, 4, sub);
+  test_subU32_tU16_t8 (S32_ta, S16_tb, S16_tc);
+  UCHECK (32, 8, sub);
+  test_subU16_tU8_t16 (S16_ta, S8_tb, S8_tc);
+  UCHECK (16, 16, sub);
+
+  test_addS16_tS8_t16_neg0 (S16_ta, S8_tb, S8_tc);
+  NCHECK (add_rS16);
+  test_subS16_tS8_t16_neg0 (S16_ta, S8_tb, S8_tc);
+  NCHECK (sub_rS16);
+  test_addS16_tS8_t16_neg1 (S16_ta, S8_tb, S8_tc);
+  NCHECK (add_rS16);
+  test_subS16_tS8_t16_neg1 (S16_ta, S8_tb, S8_tc);
+  NCHECK (sub_rS16);
+  test_addS16_tS8_t16_neg2 (S16_ta, S8_tb, S8_tc);
+  NCHECK (add_rS16);
+  test_subS16_tS8_t16_neg2 (S16_ta, S8_tb, S8_tc);
+  NCHECK (sub_rS16);
+  test_subS16_tS8_t16_neg3 (S16_ta, S8_tb, S8_tc);
+  NCHECK (neg_r);
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */