From patchwork Tue Sep 22 23:52:25 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Collison X-Patchwork-Id: 521490 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id B20A914027C for ; Wed, 23 Sep 2015 09:52:41 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b=Jdiu9wuf; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:subject:content-type; q= dns; s=default; b=ENGo+o2+i7xhD3ykVxzH3ew+F/8/22Il2zXv/M3fkbyCOY DK6/AFhXifwfHkk56549edpsz62iFrsTPRkKK2Jmd7bODAJK3KIgZ7tJqreKhmwG 9z3h5kWzgq3kwkLVntu9sA1r2zw5uk2h4Na2cK6yWp04XC7iarG9PtMKtg6ik= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:subject:content-type; s= default; bh=Ul+mka3bQETAjLzbfW1QblX78XI=; b=Jdiu9wufc6+Ehes2zIeQ Cdc0XvEeo6RdOZgSJsoPht6w2QhtnYYB4mBrZaws+70/ReHb6eNEtp9hFvSptS7t JytZt7Lj0889zbZbzbqSHoHc2F9GShX0vw7RjnbfQuqOZkqM+aOAkYw5MCnw1c8J CdyiR8qhPVuqj21ChyZ47cc= Received: (qmail 83698 invoked by alias); 22 Sep 2015 23:52:32 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 83686 invoked by uid 89); 22 Sep 2015 23:52:31 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-pa0-f42.google.com Received: from mail-pa0-f42.google.com (HELO mail-pa0-f42.google.com) (209.85.220.42) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 22 Sep 2015 23:52:30 +0000 Received: by padhy16 with SMTP id hy16so22913286pad.1 for ; Tue, 22 Sep 2015 16:52:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:organization:user-agent :mime-version:to:subject:content-type; bh=vYliEOYXGA+v1zttfJdJRUUaAM0w7bEoP7Y6iM38wQA=; b=e7+/lp52UG2hy1cfhUVataDz40wOtwRtw1bPCA6jigliTYZEliqqQnk1WVrwCiHIKw RPR0GkvvJyGQPFhmI8PCjDQm/Iz+HUOqjcm7Yy7GPuXfdb3+XRD0Zf4CDZh0FMEnGVwR e/tecoJzCSGdIsQ9Kvd/TIyg4ImUh444lA62X6M00gH3JIq0nVIzLZXhKFjQazjro9bW iIgK5DzQGhYOkHHHmD9AHGS5X0qzi2Pqwi9NhQZoTvm4FEfeV5pxNWxK94lmFrMg4Vcr C4DOAYBCFOwTke6HuNat/UIaSipvahm3ZPVH96N8MC+QnHBQtqBO+C7xHIPRW+VG8C0D K2YQ== X-Gm-Message-State: ALoCoQnAAHBCvAVptXi9C5p1VdT0BgRw6mJuVVm0hxKadqsfPdllB2mE6b2MKJ2YUF5PvsOzYecB X-Received: by 10.68.96.197 with SMTP id du5mr33660508pbb.32.1442965947862; Tue, 22 Sep 2015 16:52:27 -0700 (PDT) Received: from [172.16.37.73] ([70.35.39.2]) by smtp.googlemail.com with ESMTPSA id wj12sm4381455pac.9.2015.09.22.16.52.26 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 22 Sep 2015 16:52:27 -0700 (PDT) Message-ID: <5601E9B9.5060600@linaro.org> Date: Tue, 22 Sep 2015 16:52:25 -0700 From: Michael Collison User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0 MIME-Version: 1.0 To: GCC Patches , Ramana.Radhakrishnan@arm.com Subject: Re: [ARM] Use vector wide add for mixed-mode adds This is a modified version of the previous patch that removes the documentation and read-md.c fixes. These patches have been submitted separately and approved. This patch is designed to address code that was not being vectorized due to missing widening patterns in the ARM backend. Code such as: int t6(int len, void * dummy, short * __restrict x) { len = len & ~31; int result = 0; __asm volatile (""); for (int i = 0; i < len; i++) result += x[i]; return result; } Validated on arm-none-eabi, arm-none-linux-gnueabi, arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf. 2015-09-22 Michael Collison * config/arm/neon.md (widen_sum): New patterns where mode is VQI to improve mixed mode add vectorization. diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 654d9d5..54623fe 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -1174,6 +1174,57 @@ ;; Widening operations +(define_expand "widen_ssum3" + [(set (match_operand: 0 "s_register_operand" "") + (plus: (sign_extend: (match_operand:VQI 1 "s_register_operand" "")) + (match_operand: 2 "s_register_operand" "")))] + "TARGET_NEON" + { + int i; + int half_elem = /2; + rtvec v1 = rtvec_alloc (half_elem); + rtvec v2 = rtvec_alloc (half_elem); + rtx p1, p2; + + for (i = 0; i < half_elem; i++) + RTVEC_ELT (v1, i) = GEN_INT (i); + p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1); + + for (i = half_elem; i < ; i++) + RTVEC_ELT (v2, i - half_elem) = GEN_INT (i); + p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2); + + if (operands[0] != operands[2]) + emit_move_insn (operands[0], operands[2]); + + emit_insn (gen_vec_sel_widen_ssum_lo3 (operands[0], operands[1], p1, operands[0])); + emit_insn (gen_vec_sel_widen_ssum_hi3 (operands[0], operands[1], p2, operands[0])); + DONE; + } +) + +(define_insn "vec_sel_widen_ssum_lo3" + [(set (match_operand: 0 "s_register_operand" "=w") + (plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w") + (match_operand:VQI 2 "vect_par_constant_low" ""))) + (match_operand: 3 "s_register_operand" "0")))] + "TARGET_NEON" + "vaddw.\t%q0, %q3, %e1" + [(set_attr "type" "neon_add_widen") + (set_attr "length" "8")] +) + +(define_insn "vec_sel_widen_ssum_hi3" + [(set (match_operand: 0 "s_register_operand" "=w") + (plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w") + (match_operand:VQI 2 "vect_par_constant_high" ""))) + (match_operand: 3 "s_register_operand" "0")))] + "TARGET_NEON" + "vaddw.\t%q0, %q3, %f1" + [(set_attr "type" "neon_add_widen") + (set_attr "length" "8")] +) + (define_insn "widen_ssum3" [(set (match_operand: 0 "s_register_operand" "=w") (plus: (sign_extend: @@ -1184,4 +1235,55 @@ [(set_attr "type" "neon_add_widen")] ) +(define_expand "widen_usum3" + [(set (match_operand: 0 "s_register_operand" "") + (plus: (zero_extend: (match_operand:VQI 1 "s_register_operand" "")) + (match_operand: 2 "s_register_operand" "")))] + "TARGET_NEON" + { + int i; + int half_elem = /2; + rtvec v1 = rtvec_alloc (half_elem); + rtvec v2 = rtvec_alloc (half_elem); + rtx p1, p2; + + for (i = 0; i < half_elem; i++) + RTVEC_ELT (v1, i) = GEN_INT (i); + p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1); + + for (i = half_elem; i < ; i++) + RTVEC_ELT (v2, i - half_elem) = GEN_INT (i); + p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2); + + if (operands[0] != operands[2]) + emit_move_insn (operands[0], operands[2]); + + emit_insn (gen_vec_sel_widen_usum_lo3 (operands[0], operands[1], p1, operands[0])); + emit_insn (gen_vec_sel_widen_usum_hi3 (operands[0], operands[1], p2, operands[0])); + DONE; + } +) + +(define_insn "vec_sel_widen_usum_lo3" + [(set (match_operand: 0 "s_register_operand" "=w") + (plus: (zero_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w") + (match_operand:VQI 2 "vect_par_constant_low" ""))) + (match_operand: 3 "s_register_operand" "0")))] + "TARGET_NEON" + "vaddw.\t%q0, %q3, %e1" + [(set_attr "type" "neon_add_widen") + (set_attr "length" "8")] +) + +(define_insn "vec_sel_widen_usum_hi3" + [(set (match_operand: 0 "s_register_operand" "=w") + (plus: (zero_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w") + (match_operand:VQI 2 "vect_par_constant_high" ""))) + (match_operand: 3 "s_register_operand" "0")))] + "TARGET_NEON" + "vaddw.\t%q0, %q3, %f1" + [(set_attr "type" "neon_add_widen") + (set_attr "length" "8")] +) + diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c new file mode 100644 index 0000000..ed10669 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_hw } */ +/* { dg-add-options arm_neon_ok } */ +/* { dg-options "-O3" } */ + + +int +t6(int len, void * dummy, short * __restrict x) +{ + len = len & ~31; + int result = 0; + __asm volatile (""); + for (int i = 0; i < len; i++) + result += x[i]; + return result; +} + +/* { dg-final { scan-assembler "vaddw\.s16" } } */ + + + diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws32.c b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c new file mode 100644 index 0000000..94bf0c9 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_hw } */ +/* { dg-add-options arm_neon_ok } */ +/* { dg-options "-O3" } */ + +int +t6(int len, void * dummy, int * __restrict x) +{ + len = len & ~31; + long long result = 0; + __asm volatile (""); + for (int i = 0; i < len; i++) + result += x[i]; + return result; +} + +/* { dg-final { scan-assembler "vaddw\.s32" } } */ + + diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c new file mode 100644 index 0000000..98f8768 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_hw } */ +/* { dg-add-options arm_neon_ok } */ +/* { dg-options "-O3" } */ + + +int +t6(int len, void * dummy, unsigned short * __restrict x) +{ + len = len & ~31; + unsigned int result = 0; + __asm volatile (""); + for (int i = 0; i < len; i++) + result += x[i]; + return result; +} + +/* { dg-final { scan-assembler "vaddw.u16" } } */ diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c new file mode 100644 index 0000000..2e9af56 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_hw } */ +/* { dg-add-options arm_neon_ok } */ +/* { dg-options "-O3" } */ + +int +t6(int len, void * dummy, unsigned int * __restrict x) +{ + len = len & ~31; + unsigned long long result = 0; + __asm volatile (""); + for (int i = 0; i < len; i++) + result += x[i]; + return result; +} + +/* { dg-final { scan-assembler "vaddw\.u32" } } */ + diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c new file mode 100644 index 0000000..de2ad8a --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm_neon_hw } */ +/* { dg-add-options arm_neon_ok } */ +/* { dg-options "-O3" } */ + + +int +t6(int len, void * dummy, char * __restrict x) +{ + len = len & ~31; + unsigned short result = 0; + __asm volatile (""); + for (int i = 0; i < len; i++) + result += x[i]; + return result; +} + +/* { dg-final { scan-assembler "vaddw\.u8" } } */ + + + -- 1.9.1