From patchwork Fri Nov 14 16:38:09 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jiong Wang
X-Patchwork-Id: 410907
Message-ID: <54662FF1.1030503@arm.com>
Date: Fri, 14 Nov 2014 16:38:09 +0000
From: Jiong Wang
To: "gcc-patches@gcc.gnu.org"
Subject: [PATCH][AArch64] Add vector pattern for __builtin_ctz

This patch adds a vector pattern for __builtin_ctz. As with
__builtin_clz, only the 32-bit version of ctz is supported.

For the scalar version of ctz, we expand it into:

  rbit
  clz

That is, we reverse the bits first, which turns counting trailing zeros
into counting leading zeros.

The vector version of rbit, however, only supports byte granularity
(.8B and .16B); there are no half-word or word forms. So we first
reverse the bytes within each word, then reverse the bits within each
byte. For the following source:

void
count_tz_v4si (unsigned *__restrict a, int *__restrict b)
{
  int i;
  for (i = 0; i < 4; i++)
    b[i] = __builtin_ctz (a[i]);
}

void
count_tz_v2si (unsigned *__restrict a, int *__restrict b)
{
  int i;
  for (i = 0; i < 2; i++)
    b[i] = __builtin_ctz (a[i]);
}

the generated instruction sequences are:

count_tz_v4si:
        ldr     q0, [x0]
        rev32   v0.16b, v0.16b
        rbit    v0.16b, v0.16b
        clz     v0.4s, v0.4s
        str     q0, [x1]
        ret

count_tz_v2si:
        ldr     d0, [x0]
        rev32   v0.8b, v0.8b
        rbit    v0.8b, v0.8b
        clz     v0.2s, v0.2s
        str     d0, [x1]
        ret

No regressions on the aarch64-none-gnu-linux qemu test run.

OK for trunk?

Thanks.

gcc/

	* config/aarch64/iterators.md (VS): New mode iterator.
	(vsi2qi): New mode attribute.
	(VSI2QI): Likewise.
	* config/aarch64/aarch64-simd-builtins.def: New entry for ctz.
	* config/aarch64/aarch64-simd.md (ctz<mode>2): New pattern for ctz.
	* config/aarch64/aarch64-builtins.c (aarch64_builtin_vectorized_function):
	Support BUILT_IN_CTZ.

gcc/testsuite/

	* gcc.target/aarch64/vect_ctz_1.c: New testcase.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 527445c..3250f3c 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1097,6 +1097,14 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
             return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_clzv4si];
           return NULL_TREE;
         }
+      case BUILT_IN_CTZ:
+        {
+          if (AARCH64_CHECK_BUILTIN_MODE (2, S))
+            return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv2si];
+          else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
+            return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv4si];
+          return NULL_TREE;
+        }
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##Imode && out_n == C \
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 62b7f33..c611b5c 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -46,6 +46,7 @@
   BUILTIN_VD_BHSI (BINOP, addp, 0)
   VAR1 (UNOP, addp, 0, di)
   BUILTIN_VDQ_BHSI (UNOP, clz, 2)
+  BUILTIN_VS (UNOP, ctz, 2)
 
   BUILTIN_VALL (GETLANE, be_checked_get_lane, 0)
 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ef196e4..5ee960f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -303,6 +303,20 @@
   [(set_attr "type" "neon_rbit")]
 )
 
+(define_expand "ctz<mode>2"
+  [(set (match_operand:VS 0 "register_operand")
+        (ctz:VS (match_operand:VS 1 "register_operand")))]
+  "TARGET_SIMD"
+  {
+     emit_insn (gen_bswap<mode> (operands[0], operands[1]));
+     rtx op0_castsi2qi = simplify_gen_subreg(<VSI2QI>mode, operands[0],
+                                             <MODE>mode, 0);
+     emit_insn (gen_aarch64_rbit<vsi2qi> (op0_castsi2qi, op0_castsi2qi));
+     emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
+     DONE;
+  }
+)
+
 (define_insn "*aarch64_mul3_elt<mode>"
   [(set (match_operand:VMUL 0 "register_operand" "=w")
     (mult:VMUL
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 9935167..b416e6a 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -183,6 +183,9 @@
 ;; All byte modes.
 (define_mode_iterator VB [V8QI V16QI])
 
+;; 2 and 4 lane SI modes.
+(define_mode_iterator VS [V2SI V4SI])
+
 (define_mode_iterator TX [TI TF])
 
 ;; Opaque structure modes.
@@ -670,6 +673,9 @@
                           (V2DI "p") (V2DF "p")
                           (V2SF "p") (V4SF "v")])
 
+(define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
+(define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
+
 ;; -------------------------------------------------------------------
 ;; Code Iterators
 ;; -------------------------------------------------------------------
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_ctz_1.c b/gcc/testsuite/gcc.target/aarch64/vect_ctz_1.c
new file mode 100644
index 0000000..40823b0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect_ctz_1.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O3 -save-temps -fno-inline" } */
+
+extern void abort ();
+
+#define TEST(name, subname, count) \
+void \
+count_tz_##name (unsigned *__restrict a, int *__restrict b) \
+{ \
+  int i; \
+  for (i = 0; i < count; i++) \
+    b[i] = __builtin_##subname (a[i]); \
+}
+
+#define CHECK(name, count, input, output) \
+  count_tz_##name (input, output); \
+  for (i = 0; i < count; i++) \
+    { \
+      if (output[i] != r[i]) \
+        abort (); \
+    }
+
+TEST (v4si, ctz, 4)
+TEST (v2si, ctz, 2)
+/* { dg-final { scan-assembler "clz\tv\[0-9\]+\.4s" } } */
+/* { dg-final { scan-assembler "clz\tv\[0-9\]+\.2s" } } */
+
+int
+main ()
+{
+  unsigned int x4[4] = { 0x0, 0xFF80, 0x1FFFF, 0xFF000000 };
+  int r[4] = { 32, 7, 0, 24 };
+  int d[4], i;
+
+  CHECK (v4si, 4, x4, d);
+  CHECK (v2si, 2, x4, d);
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
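
Note (not part of the patch): the rev32 + rbit + clz sequence described in the cover letter can be sanity-checked on plain scalars, without AArch64 hardware. The following is a minimal sketch in portable C that models a single 32-bit lane. All helper names (swap_bytes32, reverse_bits_in_bytes32, clz32, ctz32_via_rev_rbit_clz, check_patch_vectors) are hypothetical and chosen for illustration only; they do not exist in GCC. The clz model is assumed to return 32 for a zero input, which is what the expected values in the testcase above rely on.

```c
#include <assert.h>
#include <stdint.h>

/* Model of rev32 on one lane: reverse the four bytes of a 32-bit word. */
static inline uint32_t swap_bytes32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Model of byte-granular rbit: reverse the bits inside each byte,
   leaving the byte order untouched. */
static inline uint32_t reverse_bits_in_bytes32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    return x;
}

/* Model of clz on one lane: count leading zero bits, 32 for zero. */
static inline int clz32(uint32_t x)
{
    int n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* Byte swap followed by per-byte bit reversal is a full 32-bit bit
   reversal, so the leading-zero count of the result is the
   trailing-zero count of the input. */
static inline int ctz32_via_rev_rbit_clz(uint32_t x)
{
    return clz32(reverse_bits_in_bytes32(swap_bytes32(x)));
}

/* Replays the input/expected pairs used by vect_ctz_1.c. */
static inline void check_patch_vectors(void)
{
    const uint32_t in[4] = { 0x0u, 0xFF80u, 0x1FFFFu, 0xFF000000u };
    const int expected[4] = { 32, 7, 0, 24 };
    for (int i = 0; i < 4; i++)
        assert(ctz32_via_rev_rbit_clz(in[i]) == expected[i]);
}
```

Calling check_patch_vectors () from a small main is enough to exercise it. The clz model uses a loop rather than __builtin_clz because the builtin is undefined for a zero argument, while the sketch needs a defined result of 32 there.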