From: Dmitrij Pochepko
Date: Thu, 11 Jun 2020 14:21:58 +0300
To: gcc-patches@gcc.gnu.org
Subject: [PATCH][RFC] __builtin_shuffle sometimes should produce zip1 rather than TBL (PR82199)
Message-ID: <20200611112158.GA23841@NOTE>
X-Patchwork-Submitter: Dmitrij Pochepko
X-Patchwork-Id: 1307463

The following patch enables optimization of vector permutations by
re-encoding them with a wider vector element size when applicable.
This allows simpler instructions to be used in those cases.

Example:

#define vector __attribute__((vector_size(16)))

vector float f(vector float a, vector float b)
{
  return __builtin_shuffle (a, b, (vector int){0, 1, 4, 5});
}

was compiled into:
...
	adrp    x0, .LC0
	ldr     q2, [x0, #:lo12:.LC0]
	tbl     v0.16b, {v0.16b - v1.16b}, v2.16b
...

and after the patch:
...
	zip1    v0.2d, v0.2d, v1.2d
...

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

This patch was originally written by Andrew Pinski; I became involved later.
(I have no write access to the repository.)

Thanks,
Dmitrij

gcc/ChangeLog:

2020-06-11  Andrew Pinski

	PR gcc/82199
	* gcc/config/aarch64/aarch64.c (aarch64_evpc_reencode): New function

gcc/testsuite/ChangeLog:

2020-06-11  Andrew Pinski

	PR gcc/82199
	* gcc.target/aarch64/vdup_n_3.c: New test
	* gcc.target/aarch64/vzip_1.c: New test
	* gcc.target/aarch64/vzip_2.c: New test
	* gcc.target/aarch64/vzip_3.c: New test
	* gcc.target/aarch64/vzip_4.c: New test

Co-Authored-By: Dmitrij Pochepko

From 3c9f3fe834811386223755fc58e2ab4a612eefcf Mon Sep 17 00:00:00 2001
From: Dmitrij Pochepko
Date: Thu, 11 Jun 2020 14:13:35 +0300
Subject: [PATCH] __builtin_shuffle sometimes should produce zip1 rather than TBL (PR82199)

The following patch enables optimization of vector permutations by
re-encoding them with a wider vector element size when applicable.
This allows simpler instructions to be used in those cases.

Example:

vector float f(vector float a, vector float b)
{
  return __builtin_shuffle (a, b, (vector int){0, 1, 4, 5});
}

was compiled into:
...
	adrp    x0, .LC0
	ldr     q2, [x0, #:lo12:.LC0]
	tbl     v0.16b, {v0.16b - v1.16b}, v2.16b
...

and after the patch:
...
	zip1    v0.2d, v0.2d, v1.2d
...

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

2020-06-11  Andrew Pinski

	PR gcc/82199
	* gcc/config/aarch64/aarch64.c (aarch64_evpc_reencode): New function

gcc/testsuite/ChangeLog:

2020-06-11  Andrew Pinski

	PR gcc/82199
	* gcc.target/aarch64/vdup_n_3.c: New test
	* gcc.target/aarch64/vzip_1.c: New test
	* gcc.target/aarch64/vzip_2.c: New test
	* gcc.target/aarch64/vzip_3.c: New test
	* gcc.target/aarch64/vzip_4.c: New test

Co-Authored-By: Dmitrij Pochepko
---
 gcc/config/aarch64/aarch64.c                | 81 +++++++++++++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/vdup_n_3.c | 16 ++++++
 gcc/testsuite/gcc.target/aarch64/vzip_1.c   | 11 ++++
 gcc/testsuite/gcc.target/aarch64/vzip_2.c   | 12 +++++
 gcc/testsuite/gcc.target/aarch64/vzip_3.c   | 12 +++++
 gcc/testsuite/gcc.target/aarch64/vzip_4.c   | 12 +++++
 6 files changed, 144 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vdup_n_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vzip_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vzip_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vzip_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vzip_4.c

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 973c65a..ab7b39e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -19889,6 +19889,8 @@ struct expand_vec_perm_d
   bool testing_p;
 };
 
+static bool aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d);
+
 /* Generate a variable permutation.  */
 
 static void
@@ -20074,6 +20076,83 @@ aarch64_evpc_trn (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* Try to re-encode the PERM constant so that it uses a bigger element size.
+   This rewrites constants such as {0, 1, 4, 5}/V4SF to {0, 2}/V2DI.
+   We then retry this new constant with the full suite of patterns.  */
+static bool
+aarch64_evpc_reencode (struct expand_vec_perm_d *d)
+{
+  expand_vec_perm_d newd;
+  unsigned HOST_WIDE_INT nelt;
+
+  if (d->vec_flags != VEC_ADVSIMD)
+    return false;
+
+  unsigned int encoded_nelts = d->perm.encoding ().encoded_nelts ();
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
+    if (!d->perm[i].is_constant ())
+      return false;
+
+  /* to_constant is safe since this routine is specific to Advanced SIMD
+     vectors.  */
+  nelt = d->perm.length ().to_constant ();
+
+  /* Get the new mode.  Always twice the size of the inner
+     and half the elements.  */
+  machine_mode new_mode;
+  switch (d->vmode)
+    {
+    /* 128bit vectors.  */
+    case E_V4SFmode:
+    case E_V4SImode:
+      new_mode = V2DImode;
+      break;
+    case E_V8BFmode:
+    case E_V8HFmode:
+    case E_V8HImode:
+      new_mode = V4SImode;
+      break;
+    case E_V16QImode:
+      new_mode = V8HImode;
+      break;
+    /* 64bit vectors.  */
+    case E_V4BFmode:
+    case E_V4HFmode:
+    case E_V4HImode:
+      new_mode = V2SImode;
+      break;
+    case E_V8QImode:
+      new_mode = V4HImode;
+      break;
+    default:
+      return false;
+    }
+
+  newd.vmode = new_mode;
+  newd.vec_flags = VEC_ADVSIMD;
+  newd.target = d->target ? gen_lowpart (new_mode, d->target) : NULL;
+  newd.op0 = d->op0 ? gen_lowpart (new_mode, d->op0) : NULL;
+  newd.op1 = d->op1 ? gen_lowpart (new_mode, d->op1) : NULL;
+  newd.testing_p = d->testing_p;
+  newd.one_vector_p = d->one_vector_p;
+  vec_perm_builder newpermconst;
+  newpermconst.new_vector (nelt / 2, nelt / 2, 1);
+
+  /* Convert the perm constant if we can.  Require even, odd as the pairs.  */
+  for (unsigned int i = 0; i < nelt; i += 2)
+    {
+      unsigned int elt0 = d->perm[i].to_constant ();
+      unsigned int elt1 = d->perm[i+1].to_constant ();
+      if ((elt0 & 1) != 0 || elt0 + 1 != elt1)
+	return false;
+      newpermconst.quick_push (elt0 / 2);
+    }
+  newpermconst.finalize ();
+
+  newd.perm.new_vector (newpermconst, newd.one_vector_p ? 1 : 2, nelt / 2);
+  return aarch64_expand_vec_perm_const_1 (&newd);
+}
+
 /* Recognize patterns suitable for the UZP instructions.  */
 static bool
 aarch64_evpc_uzp (struct expand_vec_perm_d *d)
@@ -20471,6 +20550,8 @@ aarch64_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
 	return true;
       else if (aarch64_evpc_sel (d))
 	return true;
+      else if (aarch64_evpc_reencode (d))
+	return true;
       if (d->vec_flags == VEC_SVE_DATA)
 	return aarch64_evpc_sve_tbl (d);
       else if (d->vec_flags == VEC_ADVSIMD)
diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_n_3.c b/gcc/testsuite/gcc.target/aarch64/vdup_n_3.c
new file mode 100644
index 0000000..289604d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vdup_n_3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(4*sizeof(float))))
+
+/* These are both dups.  */
+vector float f(vector float a, vector float b)
+{
+  return __builtin_shuffle (a, a, (vector int){0, 1, 0, 1});
+}
+vector float f1(vector float a, vector float b)
+{
+  return __builtin_shuffle (a, a, (vector int){2, 3, 2, 3});
+}
+
+/* { dg-final { scan-assembler-times "\[ \t\]*dup\[ \t\]+v\[0-9\]+\.2d" 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_1.c b/gcc/testsuite/gcc.target/aarch64/vzip_1.c
new file mode 100644
index 0000000..65a9d97
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vzip_1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(2*sizeof(float))))
+
+vector float f(vector float a, vector float b)
+{
+  return __builtin_shuffle (a, b, (vector int){0, 2});
+}
+
+/* { dg-final { scan-assembler-times "\[ \t\]*zip1\[ \t\]+v\[0-9\]+\.2s" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_2.c b/gcc/testsuite/gcc.target/aarch64/vzip_2.c
new file mode 100644
index 0000000..a60b90f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vzip_2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(4*sizeof(float))))
+
+vector float f(vector float a, vector float b)
+{
+  /* This is the same as zip1 v.2d as {0, 1, 4, 5} can be converted to {0, 2}.  */
+  return __builtin_shuffle (a, b, (vector int){0, 1, 4, 5});
+}
+
+/* { dg-final { scan-assembler-times "\[ \t\]*zip1\[ \t\]+v\[0-9\]+\.2d" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_3.c b/gcc/testsuite/gcc.target/aarch64/vzip_3.c
new file mode 100644
index 0000000..0446d1f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vzip_3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(4*sizeof(float))))
+
+vector float f(vector float a, vector float b)
+{
+  /* This is the same as zip1 v.2d as {4, 5, 0, 1} can be converted to {2, 0}.  */
+  return __builtin_shuffle (a, b, (vector int){4, 5, 0, 1});
+}
+
+/* { dg-final { scan-assembler-times "\[ \t\]*zip1\[ \t\]+v\[0-9\]+\.2d" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vzip_4.c b/gcc/testsuite/gcc.target/aarch64/vzip_4.c
new file mode 100644
index 0000000..b21d8cf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vzip_4.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define vector __attribute__((vector_size(4*sizeof(float))))
+
+vector float f(vector float a, vector float b)
+{
+  /* This is the same as zip2 v.2d as {2, 3, 6, 7} can be converted to {1, 3}.  */
+  return __builtin_shuffle (a, b, (vector int){2, 3, 6, 7});
+}
+
+/* { dg-final { scan-assembler-times "\[ \t\]*zip2\[ \t\]+v\[0-9\]+\.2d" 1 } } */
-- 
2.7.4
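
For anyone reading the pairing loop in aarch64_evpc_reencode, the index
check can be tried in isolation. The following is only a rough standalone
sketch in plain C, not GCC code: reencode_perm and its parameters are
hypothetical names made up for illustration, and plain arrays stand in for
the vec_perm_indices and vec_perm_builder types used in the patch. It
assumes all indices are compile-time constants, as the patch checks first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of GCC): try to rewrite a permutation
   over NELT narrow elements as one over NELT / 2 elements of twice the
   width.  Each adjacent index pair must be (even, even + 1), i.e. it
   must select both halves of one wide element in order.  On success the
   NELT / 2 wide indices are stored in WIDE and true is returned.  */
bool
reencode_perm (const unsigned *perm, size_t nelt, unsigned *wide)
{
  if (nelt % 2 != 0)
    return false;

  for (size_t i = 0; i < nelt; i += 2)
    {
      unsigned elt0 = perm[i];
      unsigned elt1 = perm[i + 1];

      /* Same condition as in the patch: the first index is even and
	 the second one immediately follows it.  */
      if ((elt0 & 1) != 0 || elt0 + 1 != elt1)
	return false;

      wide[i / 2] = elt0 / 2;
    }
  return true;
}
```

With this sketch, {0, 1, 4, 5} re-encodes to {0, 2} and {2, 3, 6, 7} to
{1, 3}, matching the vzip_2.c and vzip_4.c comments, while a selector such
as {1, 0, 3, 2} is rejected because its pairs do not start on an even index.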