From patchwork Wed Oct 30 18:01:54 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cong Hou
X-Patchwork-Id: 287315
Date: Wed, 30 Oct 2013 11:01:54 -0700
Subject: Re: [PATCH] Vectorizing abs(char/short/int) on x86.
From: Cong Hou
To: Uros Bizjak
Cc: "gcc-patches@gcc.gnu.org", Richard Biener, "Joseph S. Myers"

On Wed, Oct 30, 2013 at 10:22 AM, Uros Bizjak wrote:
> On Wed, Oct 30, 2013 at 6:01 PM, Cong Hou wrote:
>> I found my problem: I put DONE outside of if not inside. You are
>> right. I have updated my patch.
>
> OK, great that we put things in order ;)
>
> Does this patch need some extra middle-end functionality?
> I was not able to vectorize char and short part of your patch.

In the original patch, I converted abs() on short and char values to
their own types by removing the type casts. That is, the frontend
originally converts

    char_val1 = abs(char_val2)

to

    char_val1 = (char) abs((int) char_val2)

and I wanted to convert it back to char_val1 = abs(char_val2). After
several discussions, it seems this conversion has some problems, such
as overflow concerns, so I removed that part.

You should still be able to vectorize abs(char) and abs(short), but
with packing and unpacking. Later I will consider writing a pattern
recognizer for abs(char) and abs(short); the expansion of
abs(char)/abs(short) in this patch will then be used during
vectorization.

> Regarding the testcase - please put it to gcc.target/i386/ directory.
> There is nothing generic in the test, as confirmed by target-dependent
> scan test. You will find plenty of examples in the mentioned
> directory. I'd suggest to split the testcase in three files, and to
> simplify it to something like the testcase with global variables I
> used earlier.

I have done it. The test case is now split into three files, for
s8/s16/s32, in gcc.target/i386.

Thank you!


Cong

>
> Modulo testcase, the patch is OK otherwise, but middle-end parts
> should be committed first.
>
> Thanks,
> Uros.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 8a38316..84c7ab5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2013-10-22  Cong Hou
+
+        PR target/58762
+        * config/i386/i386-protos.h (ix86_expand_sse2_abs): New function.
+        * config/i386/i386.c (ix86_expand_sse2_abs): New function.
+        * config/i386/sse.md: Add SSE2 support to abs (8/16/32-bit-int).
+
 2013-10-14  David Malcolm
 
         * dumpfile.h (gcc::dump_manager): New class, to hold state
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 3ab2f3a..ca31224 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -238,6 +238,7 @@ extern void ix86_expand_mul_widen_evenodd (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
+extern void ix86_expand_sse2_abs (rtx, rtx);
 
 /* In i386-c.c  */
 extern void ix86_target_macros (void);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 02cbbbd..71905fc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41696,6 +41696,53 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
                        gen_rtx_MULT (mode, op1, op2));
 }
 
+void
+ix86_expand_sse2_abs (rtx op0, rtx op1)
+{
+  enum machine_mode mode = GET_MODE (op0);
+  rtx tmp0, tmp1;
+
+  switch (mode)
+    {
+      /* For 32-bit signed integer X, the best way to calculate the absolute
+         value of X is (((signed) X >> (W-1)) ^ X) - ((signed) X >> (W-1)).  */
+      case V4SImode:
+        tmp0 = expand_simple_binop (mode, ASHIFTRT, op1,
+                                    GEN_INT (GET_MODE_BITSIZE
+                                               (GET_MODE_INNER (mode)) - 1),
+                                    NULL, 0, OPTAB_DIRECT);
+        if (tmp0)
+          tmp1 = expand_simple_binop (mode, XOR, op1, tmp0,
+                                      NULL, 0, OPTAB_DIRECT);
+        if (tmp0 && tmp1)
+          expand_simple_binop (mode, MINUS, tmp1, tmp0,
+                               op0, 0, OPTAB_DIRECT);
+        break;
+
+      /* For 16-bit signed integer X, the best way to calculate the absolute
+         value of X is max (X, -X), as SSE2 provides the PMAXSW insn.  */
+      case V8HImode:
+        tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0);
+        if (tmp0)
+          expand_simple_binop (mode, SMAX, op1, tmp0, op0, 0,
+                               OPTAB_DIRECT);
+        break;
+
+      /* For 8-bit signed integer X, the best way to calculate the absolute
+         value of X is min ((unsigned char) X, (unsigned char) (-X)),
+         as SSE2 provides the PMINUB insn.  */
+      case V16QImode:
+        tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0);
+        if (tmp0)
+          expand_simple_binop (V16QImode, UMIN, op1, tmp0, op0, 0,
+                               OPTAB_DIRECT);
+        break;
+
+      default:
+        break;
+    }
+}
+
 /* Expand an insert into a vector register through pinsr insn.
    Return true if successful.  */
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c3f6c94..46e1df4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8721,7 +8721,7 @@
    (set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
    (set_attr "mode" "DI")])
 
-(define_insn "abs<mode>2"
+(define_insn "*abs<mode>2"
   [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand" "=v")
         (abs:VI124_AVX2_48_AVX512F
           (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand" "vm")))]
@@ -8733,6 +8733,19 @@
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "")])
 
+(define_expand "abs<mode>2"
+  [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand")
+        (abs:VI124_AVX2_48_AVX512F
+          (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand")))]
+  "TARGET_SSE2"
+{
+  if (!TARGET_SSSE3)
+    {
+      ix86_expand_sse2_abs (operands[0], operands[1]);
+      DONE;
+    }
+})
+
 (define_insn "abs<mode>2"
   [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
         (abs:MMXMODEI
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 075d071..c6a20c7 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2013-10-22  Cong Hou
+
+        * gcc.target/i386/vect-abs-s8.c: New test.
+        * gcc.target/i386/vect-abs-s16.c: New test.
+        * gcc.target/i386/vect-abs-s32.c: New test.
+
 2013-10-14  Tobias Burnus
 
         PR fortran/58658
diff --git a/gcc/testsuite/gcc.target/i386/vect-abs-s16.c b/gcc/testsuite/gcc.target/i386/vect-abs-s16.c
new file mode 100644
index 0000000..191ae34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-abs-s16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2 -mno-sse3 -fdump-tree-vect-details" } */
+
+
+void test (short* a, short* b)
+{
+  int i;
+  for (i = 0; i < 10000; ++i)
+    a[i] = abs (b[i]);
+}
+
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-abs-s32.c b/gcc/testsuite/gcc.target/i386/vect-abs-s32.c
new file mode 100644
index 0000000..575e8ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-abs-s32.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2 -mno-sse3 -fdump-tree-vect-details" } */
+
+
+void test (int* a, int* b)
+{
+  int i;
+  for (i = 0; i < 10000; ++i)
+    a[i] = abs (b[i]);
+}
+
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-abs-s8.c b/gcc/testsuite/gcc.target/i386/vect-abs-s8.c
new file mode 100644
index 0000000..3f3f3fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-abs-s8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2 -mno-sse3 -fdump-tree-vect-details" } */
+
+
+void test (char* a, char* b)
+{
+  int i;
+  for (i = 0; i < 10000; ++i)
+    a[i] = abs (b[i]);
+}
+
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
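
The reply above says the C front end rewrites abs() on a char into an int computation followed by a narrowing cast. A minimal sketch of that promoted form follows. The helper name abs_via_int is hypothetical and not part of the patch, and the sketch assumes GCC's documented modulo behaviour for the out-of-range narrowing conversion. The input -128 is offered as one plausible instance of the overflow concern mentioned in the reply; that is an assumption, since the objections raised in the earlier discussion are not quoted in this mail.

```c
#include <stdlib.h>

/* Hypothetical helper (not in the patch): abs() on a signed char in the
   form the front end produces.  The operand is promoted to int, abs()
   runs on the int, and the result is narrowed back.  */
signed char
abs_via_int (signed char c)
{
  /* Cannot overflow: the magnitude of c is at most 128, which fits in int.  */
  int wide = abs ((int) c);

  /* Narrowing 128 to signed char is implementation-defined in C.
     Assumption: GCC reduces the value modulo 2^8, so 128 becomes -128.  */
  return (signed char) wide;
}
```

With this form, abs_via_int (-5) is 5, while abs_via_int (-128) comes back as -128 under the stated assumption, because 128 has no signed char representation. An abs carried out directly in the narrow type has no wider intermediate to hold that value.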
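
The three identities named in the comments of ix86_expand_sse2_abs can be checked one lane at a time in scalar C. The sketch below is only a model of the per-element arithmetic, not the RTL the patch emits. The function names are hypothetical, and it assumes that >> on a negative signed value is an arithmetic shift and that out-of-range narrowing wraps, both of which GCC documents.

```c
#include <stdint.h>

/* 32-bit lanes: ((X >> 31) ^ X) - (X >> 31).
   Assumption: >> on a negative int32_t is an arithmetic shift.  */
int32_t
abs_s32_model (int32_t x)
{
  int32_t sign = x >> 31;                    /* 0, or -1 (all ones) if x < 0 */
  uint32_t flipped = (uint32_t) (x ^ sign);  /* ~x when x < 0, else x */

  /* Subtract in unsigned arithmetic so INT32_MIN wraps around instead of
     invoking undefined behaviour.  */
  return (int32_t) (flipped - (uint32_t) sign);
}

/* 16-bit lanes: signed max (X, -X), with -X wrapping in 16 bits.  */
int16_t
abs_s16_model (int16_t x)
{
  int16_t neg = (int16_t) (uint16_t) (0u - (uint16_t) x);
  return x > neg ? x : neg;
}

/* 8-bit lanes: unsigned min (X, -X), with the result read back as signed.  */
int8_t
abs_s8_model (int8_t x)
{
  uint8_t pos = (uint8_t) x;
  uint8_t neg = (uint8_t) (0u - pos);
  return (int8_t) (pos < neg ? pos : neg);
}
```

In each model the most negative input maps to itself, the usual wrap-around result for a two's-complement abs. The 8-bit case uses an unsigned minimum rather than a signed maximum because, as the patch comment says, SSE2 provides PMINUB; a signed byte maximum is not part of SSE2.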