From patchwork Mon Aug 9 02:53:00 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 1514897
Subject: [PATCH v2] rs6000: Add vec_unpacku_{hi,lo}_v4si
To: wschmidt@linux.ibm.com
References: <0068d8dd-0e30-e78f-8893-dd24f0f5250a@linux.ibm.com> <48675437-01a4-af04-3cb0-8f68c675b9eb@linux.ibm.com>
In-Reply-To: <48675437-01a4-af04-3cb0-8f68c675b9eb@linux.ibm.com>
Date: Mon, 9 Aug 2021 10:53:00 +0800
List-Id: Gcc-patches mailing list
From: "Kewen.Lin"
Reply-To: "Kewen.Lin"
Cc: GCC Patches, David Edelsohn, Segher Boessenkool

Hi Bill,

Thanks for the comments!

on 2021/8/6 9:10 PM, Bill Schmidt wrote:
> Hi Kewen,
>
> On 8/4/21 9:06 PM, Kewen.Lin wrote:
>> Hi,
>>
>> The existing vec_unpacku_{hi,lo} supports emulated unsigned
>> unpacking for short and char but misses the support for int.
>> This patch adds the support for vec_unpacku_{hi,lo}_v4si.
>>
>> Meanwhile, the current implementation uses vector permutation
>> way, which requires one extra customized constant vector as
>> the permutation control vector.  It's better to use vector
>> merge high/low with zero constant vector, to save the space
>> in constant area as well as the cost to initialize pcv in
>> prologue.  This patch updates it with vector merging and
>> simplify it with iterators.
>>
>> Bootstrapped & regtested on powerpc64le-linux-gnu P9 and
>> powerpc64-linux-gnu P8.
>>
>> btw, the loop in unpack-vectorize-2.c doesn't get vectorized
>> without this patch, unpack-vectorize-[13]* is to verify
>> the vector merging and simplification works expectedly.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -----
...
>> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
>> index d70c17e6bc2..0e8b66cd6a5 100644
>> --- a/gcc/config/rs6000/altivec.md
>> +++ b/gcc/config/rs6000/altivec.md
>> @@ -134,10 +134,8 @@ (define_c_enum "unspec"
>>     UNSPEC_VMULWLUH
>>     UNSPEC_VMULWHSH
>>     UNSPEC_VMULWLSH
>> -   UNSPEC_VUPKHUB
>> -   UNSPEC_VUPKHUH
>> -   UNSPEC_VUPKLUB
>> -   UNSPEC_VUPKLUH
>> +   UNSPEC_VUPKHUBHW
>> +   UNSPEC_VUPKLUBHW
>
>
> Up to you, but... maybe just UNSPEC_VUPKHU and UNSPEC_VUPKLU, in case we extend this later to other types.  Fine either way.
>

Good point!  Fixed.

>>     UNSPEC_VPERMSI
>>     UNSPEC_VPERMHI
>>     UNSPEC_INTERHI
>> @@ -3885,143 +3883,45 @@ (define_insn "xxeval"
>>     [(set_attr "type" "vecsimple")
>>      (set_attr "prefixed" "yes")])
>>
...
>> diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
>> new file mode 100644
>> index 00000000000..2621d753baa
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
>> @@ -0,0 +1,18 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target powerpc_altivec_ok } */
>
>
> I guess powerpc_altivec_ok is fine.  I was initially concerned since unpack-vectorize.h mentions vector long long, but the types aren't actually used here.  OK.
>

Yeah, I think it's fine since unpack-vectorize.h only typedefs long long
and doesn't even use the type vector long long.

>> +/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model -fdump-tree-vect-details" } */
>> +
>> +/* Test if unpack vectorization succeeds for type signed/unsigned
>> +   short and char.  */
>> +
>> +#include "unpack-vectorize-1.h"
>> +
>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
>> +/* { dg-final { scan-assembler {\mvupkhsb\M} } } */
>> +/* { dg-final { scan-assembler {\mvupklsb\M} } } */
>> +/* { dg-final { scan-assembler {\mvupkhsh\M} } } */
>> +/* { dg-final { scan-assembler {\mvupklsh\M} } } */
>> +/* { dg-final { scan-assembler {\mvmrghb\M} } } */
>> +/* { dg-final { scan-assembler {\mvmrglb\M} } } */
>> +/* { dg-final { scan-assembler {\mvmrghh\M} } } */
>> +/* { dg-final { scan-assembler {\mvmrglh\M} } } */
>
>
> Suggest that you consider scan-assembler-times 1 to make the tests more robust, here and for other tests.
>

Updated, thanks!  I was worried that possible future unrolling tweaks
could make the hardcoded times fragile, and thought it might be trivial
to check the times.  "-fno-unroll-loops" has been added as well to
disable unrolling explicitly.

Re-tested on BE and LE, the test results look fine.

BR,
Kewen
-----
gcc/ChangeLog:

	* config/rs6000/altivec.md (vec_unpacku_hi_v16qi): Remove.
	(vec_unpacku_hi_v8hi): Likewise.
	(vec_unpacku_lo_v16qi): Likewise.
	(vec_unpacku_lo_v8hi): Likewise.
	(vec_unpacku_hi_): New define_expand.
	(vec_unpacku_lo_): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/unpack-vectorize-1.c: New test.
	* gcc.target/powerpc/unpack-vectorize-1.h: New test.
	* gcc.target/powerpc/unpack-vectorize-2.c: New test.
	* gcc.target/powerpc/unpack-vectorize-2.h: New test.
	* gcc.target/powerpc/unpack-vectorize-3.c: New test.
	* gcc.target/powerpc/unpack-vectorize-3.h: New test.
	* gcc.target/powerpc/unpack-vectorize-run-1.c: New test.
	* gcc.target/powerpc/unpack-vectorize-run-2.c: New test.
	* gcc.target/powerpc/unpack-vectorize-run-3.c: New test.
	* gcc.target/powerpc/unpack-vectorize.h: New test.
---
 gcc/config/rs6000/altivec.md                  | 158 ++++------------------
 .../gcc.target/powerpc/unpack-vectorize-1.c   |  18 ++
 .../gcc.target/powerpc/unpack-vectorize-1.h   |  14 ++
 .../gcc.target/powerpc/unpack-vectorize-2.c   |  12 ++
 .../gcc.target/powerpc/unpack-vectorize-2.h   |   7 +
 .../gcc.target/powerpc/unpack-vectorize-3.c   |  11 ++
 .../gcc.target/powerpc/unpack-vectorize-3.h   |   7 +
 .../powerpc/unpack-vectorize-run-1.c          |  24 +++
 .../powerpc/unpack-vectorize-run-2.c          |  16 ++
 .../powerpc/unpack-vectorize-run-3.c          |  16 ++
 .../gcc.target/powerpc/unpack-vectorize.h     |  42 +++++
 11 files changed, 196 insertions(+), 129 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index d70c17e6bc2..5a4a824804b 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -134,10 +134,8 @@ (define_c_enum "unspec"
    UNSPEC_VMULWLUH
    UNSPEC_VMULWHSH
    UNSPEC_VMULWLSH
-   UNSPEC_VUPKHUB
-   UNSPEC_VUPKHUH
-   UNSPEC_VUPKLUB
-   UNSPEC_VUPKLUH
+   UNSPEC_VUPKHU
+   UNSPEC_VUPKLU
    UNSPEC_VPERMSI
    UNSPEC_VPERMHI
    UNSPEC_INTERHI
@@ -3885,143 +3883,45 @@ (define_insn "xxeval"
    [(set_attr "type" "vecsimple")
     (set_attr "prefixed" "yes")])
 
-(define_expand "vec_unpacku_hi_v16qi"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
-                     UNSPEC_VUPKHUB))]
-  "TARGET_ALTIVEC"
-{
-  rtx vzero = gen_reg_rtx (V8HImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
-
-  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
-
-  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7);
-  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 0 : 16);
-  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 6);
-  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16);
-  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5);
-  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 2 : 16);
-  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 4);
-  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16);
-  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3);
-  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 4 : 16);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 2);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 6 : 16);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 0);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16);
-
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
-  DONE;
-})
-
-(define_expand "vec_unpacku_hi_v8hi"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
-                     UNSPEC_VUPKHUH))]
+(define_expand "vec_unpacku_hi_"
+  [(set (match_operand:VP 0 "register_operand" "=v")
+        (unspec:VP [(match_operand: 1 "register_operand" "v")]
+         UNSPEC_VUPKHU))]
   "TARGET_ALTIVEC"
 {
-  rtx vzero = gen_reg_rtx (V4SImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
+  rtx vzero = gen_reg_rtx (mode);
+  emit_insn (gen_altivec_vspltis (vzero, const0_rtx));
 
-  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
-
-  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 7);
-  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 6);
-  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 0 : 17);
-  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 1 : 16);
-  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 5);
-  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 4);
-  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 2 : 17);
-  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 3 : 16);
-  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 3);
-  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 2);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 4 : 17);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 5 : 16);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 1);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 0);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 6 : 17);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 7 : 16);
-
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
-  DONE;
-})
+  rtx res = gen_reg_rtx (mode);
+  rtx op1 = operands[1];
 
-(define_expand "vec_unpacku_lo_v16qi"
-  [(set (match_operand:V8HI 0 "register_operand" "=v")
-        (unspec:V8HI [(match_operand:V16QI 1 "register_operand" "v")]
-                     UNSPEC_VUPKLUB))]
-  "TARGET_ALTIVEC"
-{
-  rtx vzero = gen_reg_rtx (V8HImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
-
-  emit_insn (gen_altivec_vspltish (vzero, const0_rtx));
-
-  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
-  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 8 : 16);
-  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 16 : 14);
-  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16);
-  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
-  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 10 : 16);
-  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 16 : 12);
-  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
-  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
-  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 12 : 16);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 16 : 10);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 14 : 16);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 16 : 8);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrgh (res, vzero, op1));
+  else
+    emit_insn (gen_altivec_vmrgl (res, op1, vzero));
 
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_vperm_v16qiv8hi (operands[0], operands[1], vzero, mask));
+  emit_insn (gen_move_insn (operands[0], gen_lowpart (mode, res)));
   DONE;
 })
 
-(define_expand "vec_unpacku_lo_v8hi"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-        (unspec:V4SI [(match_operand:V8HI 1 "register_operand" "v")]
-                     UNSPEC_VUPKLUH))]
+(define_expand "vec_unpacku_lo_"
+  [(set (match_operand:VP 0 "register_operand" "=v")
+        (unspec:VP [(match_operand: 1 "register_operand" "v")]
+         UNSPEC_VUPKLU))]
   "TARGET_ALTIVEC"
 {
-  rtx vzero = gen_reg_rtx (V4SImode);
-  rtx mask = gen_reg_rtx (V16QImode);
-  rtvec v = rtvec_alloc (16);
-  bool be = BYTES_BIG_ENDIAN;
+  rtx vzero = gen_reg_rtx (mode);
+  emit_insn (gen_altivec_vspltis (vzero, const0_rtx));
 
-  emit_insn (gen_altivec_vspltisw (vzero, const0_rtx));
-
-  RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (QImode, be ? 16 : 15);
-  RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (QImode, be ? 17 : 14);
-  RTVEC_ELT (v, 2) = gen_rtx_CONST_INT (QImode, be ? 8 : 17);
-  RTVEC_ELT (v, 3) = gen_rtx_CONST_INT (QImode, be ? 9 : 16);
-  RTVEC_ELT (v, 4) = gen_rtx_CONST_INT (QImode, be ? 16 : 13);
-  RTVEC_ELT (v, 5) = gen_rtx_CONST_INT (QImode, be ? 17 : 12);
-  RTVEC_ELT (v, 6) = gen_rtx_CONST_INT (QImode, be ? 10 : 17);
-  RTVEC_ELT (v, 7) = gen_rtx_CONST_INT (QImode, be ? 11 : 16);
-  RTVEC_ELT (v, 8) = gen_rtx_CONST_INT (QImode, be ? 16 : 11);
-  RTVEC_ELT (v, 9) = gen_rtx_CONST_INT (QImode, be ? 17 : 10);
-  RTVEC_ELT (v, 10) = gen_rtx_CONST_INT (QImode, be ? 12 : 17);
-  RTVEC_ELT (v, 11) = gen_rtx_CONST_INT (QImode, be ? 13 : 16);
-  RTVEC_ELT (v, 12) = gen_rtx_CONST_INT (QImode, be ? 16 : 9);
-  RTVEC_ELT (v, 13) = gen_rtx_CONST_INT (QImode, be ? 17 : 8);
-  RTVEC_ELT (v, 14) = gen_rtx_CONST_INT (QImode, be ? 14 : 17);
-  RTVEC_ELT (v, 15) = gen_rtx_CONST_INT (QImode, be ? 15 : 16);
+  rtx res = gen_reg_rtx (mode);
+  rtx op1 = operands[1];
 
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_vperm_v8hiv4si (operands[0], operands[1], vzero, mask));
+  if (BYTES_BIG_ENDIAN)
+    emit_insn (gen_altivec_vmrgl (res, vzero, op1));
+  else
+    emit_insn (gen_altivec_vmrgh (res, op1, vzero));
+
+  emit_insn (gen_move_insn (operands[0], gen_lowpart (mode, res)));
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
new file mode 100644
index 00000000000..dceb5b89bd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test if unpack vectorization succeeds for type signed/unsigned
+   short and char.  */
+
+#include "unpack-vectorize-1.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" } } */
+/* { dg-final { scan-assembler-times {\mvupkhsb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvupklsb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvupkhsh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvupklsh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmrghb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmrglb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmrghh\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvmrglh\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
new file mode 100644
index 00000000000..1cb89aba392
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-1.h
@@ -0,0 +1,14 @@
+#include "unpack-vectorize.h"
+
+DEF_ARR (si)
+DEF_ARR (ui)
+DEF_ARR (sh)
+DEF_ARR (uh)
+DEF_ARR (sc)
+DEF_ARR (uc)
+
+TEST1 (sh, si)
+TEST1 (uh, ui)
+TEST1 (sc, sh)
+TEST1 (uc, uh)
+
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
new file mode 100644
index 00000000000..4f2e6ebb07b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-mdejagnu-cpu=power7 -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test if unsigned int unpack vectorization succeeds.  V2DImode is
+   supported since Power7 so guard it under Power7 and up.  */
+
+#include "unpack-vectorize-2.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-assembler-times {\mxxmrghw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxmrglw\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
new file mode 100644
index 00000000000..e199229e6f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-2.h
@@ -0,0 +1,7 @@
+#include "unpack-vectorize.h"
+
+DEF_ARR (ui)
+DEF_ARR (ull)
+
+TEST1 (ui, ull)
+
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
new file mode 100644
index 00000000000..520a279ac1c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fno-vect-cost-model -fno-unroll-loops -fdump-tree-vect-details" } */
+
+/* Test if signed int unpack vectorization succeeds.  */
+
+#include "unpack-vectorize-3.h"
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { scan-assembler-times {\mvupkhsw\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvupklsw\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
new file mode 100644
index 00000000000..6a5191d28a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-3.h
@@ -0,0 +1,7 @@
+#include "unpack-vectorize.h"
+
+DEF_ARR (si)
+DEF_ARR (sll)
+
+TEST1 (si, sll)
+
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
new file mode 100644
index 00000000000..51f0e67524f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-1.c
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-maltivec -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "unpack-vectorize-1.h"
+
+/* Test if unpack vectorization cases on signed/unsigned short and char
+   run successfully.  */
+
+CHECK1 (sh, si)
+CHECK1 (uh, ui)
+CHECK1 (sc, sh)
+CHECK1 (uc, uh)
+
+int
+main ()
+{
+  check1_sh_si ();
+  check1_uh_ui ();
+  check1_sc_sh ();
+  check1_uc_uh ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
new file mode 100644
index 00000000000..6d243602bbf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-mdejagnu-cpu=power7 -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "unpack-vectorize-2.h"
+
+/* Test if unpack vectorization cases on unsigned int run successfully.  */
+
+CHECK1 (ui, ull)
+
+int
+main ()
+{
+  check1_ui_ull ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
new file mode 100644
index 00000000000..fec33c46abc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize-run-3.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include "unpack-vectorize-3.h"
+
+/* Test if unpack vectorization cases on signed int run successfully.  */
+
+CHECK1 (si, sll)
+
+int
+main ()
+{
+  check1_si_sll ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h
new file mode 100644
index 00000000000..11fa7d4aa6f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/unpack-vectorize.h
@@ -0,0 +1,42 @@
+typedef signed long long sll;
+typedef unsigned long long ull;
+typedef signed int si;
+typedef unsigned int ui;
+typedef signed short sh;
+typedef unsigned short uh;
+typedef signed char sc;
+typedef unsigned char uc;
+
+#ifndef ALIGN
+#define ALIGN 32
+#endif
+
+#define ALIGN_ATTR __attribute__((__aligned__(ALIGN)))
+
+#define N 128
+
+#define DEF_ARR(TYPE) \
+  TYPE TYPE##_a[N] ALIGN_ATTR; \
+  TYPE TYPE##_b[N] ALIGN_ATTR; \
+  TYPE TYPE##_c[N] ALIGN_ATTR;
+
+#define TEST1(NTYPE, WTYPE) \
+  __attribute__((noipa)) void test1_##NTYPE##_##WTYPE() { \
+    for (int i = 0; i < N; i++) \
+      WTYPE##_c[i] = NTYPE##_a[i] + NTYPE##_b[i]; \
+  }
+
+#define CHECK1(NTYPE, WTYPE) \
+  __attribute__((noipa, optimize(0))) void check1_##NTYPE##_##WTYPE() { \
+    for (int i = 0; i < N; i++) { \
+      NTYPE##_a[i] = 2 * i * sizeof(NTYPE) + 10; \
+      NTYPE##_b[i] = 7 * i * sizeof(NTYPE) / 5 - 10; \
+    } \
+    test1_##NTYPE##_##WTYPE(); \
+    for (int i = 0; i < N; i++) { \
+      WTYPE exp = NTYPE##_a[i] + NTYPE##_b[i]; \
+      if (WTYPE##_c[i] != exp) \
+        __builtin_abort(); \
+    } \
+  }
+
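
P.S. To see why merging with a zero vector gives the same result as an unsigned unpack, here is a minimal host-side sketch in plain C. It is not part of the patch and uses no AltiVec instructions; the names host_is_big_endian, merge_with_zero, unpacku_model and unpacku_model_ok are hypothetical, chosen only for illustration. It assumes the merge can be modelled as interleaving each narrow element with a zero byte: zero first on big endian, data first on little endian, mirroring the (vzero, op1) versus (op1, vzero) operand order in the expanders above. It says nothing about which half of the source vector the hardware merge selects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { HALF = 8 };  /* Narrow elements in one half of a 16-byte vector.  */

/* Hypothetical helper: report the byte order of the host.  */
bool
host_is_big_endian (void)
{
  const uint16_t probe = 1;
  uint8_t first;
  memcpy (&first, &probe, 1);
  return first == 0;
}

/* Interleave N source bytes with zero bytes into DST (2 * N bytes).
   Big endian places the zero byte first, little endian places the
   data byte first.  */
void
merge_with_zero (const uint8_t *src, size_t n, uint8_t *dst, bool big_endian)
{
  for (size_t i = 0; i < n; i++)
    {
      dst[2 * i] = big_endian ? 0 : src[i];
      dst[2 * i + 1] = big_endian ? src[i] : 0;
    }
}

/* Model of the widening: merge with zero, then reinterpret the bytes
   as 16-bit elements (the role gen_lowpart plays in the expander).  */
void
unpacku_model (const uint8_t src[HALF], uint16_t out[HALF])
{
  uint8_t merged[2 * HALF];
  merge_with_zero (src, HALF, merged, host_is_big_endian ());
  memcpy (out, merged, sizeof merged);
}

/* Return 1 if the model matches plain zero extension, 0 otherwise.  */
int
unpacku_model_ok (void)
{
  const uint8_t src[HALF] = { 0, 1, 127, 128, 200, 254, 255, 42 };
  uint16_t out[HALF];

  unpacku_model (src, out);
  for (size_t i = 0; i < HALF; i++)
    if (out[i] != (uint16_t) src[i])
      return 0;
  return 1;
}
```

Calling unpacku_model_ok () from any test harness should return 1 on either byte order, since the position of the zero byte follows the host byte order in the same way the expanders swap operands on BYTES_BIG_ENDIAN.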