From patchwork Thu Dec 4 09:49:59 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ilya Tocar
X-Patchwork-Id: 417692
Date: Thu, 4 Dec 2014 12:49:59 +0300
From: Ilya Tocar
To: Uros Bizjak, Jakub Jelinek
Cc: GCC Patches
Subject: [PATCH x86] Enable v64qi permutations.
Message-ID: <20141204094959.GA67582@msticlxl7.ims.intel.com>

Hi,

As discussed in https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00473.html,
this patch enables v64qi permutations.
I ran the vshuf* tests from dg-torture.exp with avx512* options on SDE,
and the generated permutations are correct.

OK for trunk?

---
 gcc/config/i386/i386.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/config/i386/sse.md |  4 +--
 2 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index eafc15a..f29f8ce 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21831,6 +21831,10 @@ ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1,
       if (TARGET_AVX512VL && TARGET_AVX512BW)
         gen = gen_avx512vl_vpermi2varv16hi3;
       break;
+    case V64QImode:
+      if (TARGET_AVX512VBMI)
+        gen = gen_avx512bw_vpermi2varv64qi3;
+      break;
     case V32HImode:
       if (TARGET_AVX512BW)
         gen = gen_avx512bw_vpermi2varv32hi3;
@@ -48872,6 +48876,7 @@ expand_vec_perm_broadcast_1 (struct expand_vec_perm_d *d)
       emit_move_insn (d->target, gen_lowpart (d->vmode, dest));
       return true;

+    case V64QImode:
     case V32QImode:
     case V16HImode:
     case V8SImode:
@@ -48905,6 +48910,78 @@ expand_vec_perm_broadcast (struct expand_vec_perm_d *d)
   return expand_vec_perm_broadcast_1 (d);
 }

+/* Implement arbitrary permutations of two V64QImode operands
+   with 2 vpermi2w, 2 vpshufb and one vpor instruction.  */
+static bool
+expand_vec_perm_vpermi2_vpshub2 (struct expand_vec_perm_d *d)
+{
+  if (!TARGET_AVX512BW || !(d->vmode == V64QImode))
+    return false;
+
+  if (d->testing_p)
+    return true;
+
+  struct expand_vec_perm_d ds[2];
+  rtx rperm[128], vperm, target0, target1;
+  unsigned int i, nelt;
+  machine_mode vmode;
+
+  nelt = d->nelt;
+  vmode = V64QImode;
+
+  for (i = 0; i < 2; i++)
+    {
+      ds[i] = *d;
+      ds[i].vmode = V32HImode;
+      ds[i].nelt = 32;
+      ds[i].target = gen_reg_rtx (V32HImode);
+      ds[i].op0 = gen_lowpart (V32HImode, d->op0);
+      ds[i].op1 = gen_lowpart (V32HImode, d->op1);
+    }
+
+  /* Prepare permutations such that the first one takes care of
+     putting the even bytes into the right positions or one higher
+     positions (ds[0]) and the second one takes care of
+     putting the odd bytes into the right positions or one below
+     (ds[1]).  */
+
+  for (i = 0; i < nelt; i++)
+    {
+      ds[i & 1].perm[i / 2] = d->perm[i] / 2;
+      if (i & 1)
+        {
+          rperm[i] = constm1_rtx;
+          rperm[i + 64] = GEN_INT ((i & 14) + (d->perm[i] & 1));
+        }
+      else
+        {
+          rperm[i] = GEN_INT ((i & 14) + (d->perm[i] & 1));
+          rperm[i + 64] = constm1_rtx;
+        }
+    }
+
+  bool ok = expand_vec_perm_1 (&ds[0]);
+  gcc_assert (ok);
+  ds[0].target = gen_lowpart (V64QImode, ds[0].target);
+
+  ok = expand_vec_perm_1 (&ds[1]);
+  gcc_assert (ok);
+  ds[1].target = gen_lowpart (V64QImode, ds[1].target);
+
+  vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm));
+  vperm = force_reg (vmode, vperm);
+  target0 = gen_reg_rtx (V64QImode);
+  emit_insn (gen_avx512bw_pshufbv64qi3 (target0, ds[0].target, vperm));
+
+  vperm = gen_rtx_CONST_VECTOR (V64QImode, gen_rtvec_v (64, rperm + 64));
+  vperm = force_reg (vmode, vperm);
+  target1 = gen_reg_rtx (V64QImode);
+  emit_insn (gen_avx512bw_pshufbv64qi3 (target1, ds[1].target, vperm));
+
+  emit_insn (gen_iorv64qi3 (d->target, target0, target1));
+  return true;
+}
+
 /* Implement arbitrary permutation of two V32QImode and V16QImode operands
    with 4 vpshufb insns, 2 vpermq and 3 vpor.  We should have already failed
    all the shorter instruction sequences.  */
@@ -49079,6 +49156,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_vpshufb2_vpermq_even_odd (d))
     return true;

+  if (expand_vec_perm_vpermi2_vpshub2 (d))
+    return true;
+
   /* ??? Look for narrow permutations whose element orderings would
      allow the promotion to a wider mode.  */

@@ -49223,6 +49303,11 @@ ix86_vectorize_vec_perm_const_ok (machine_mode vmode,
         /* All implementable with a single vpermi2 insn.  */
         return true;
       break;
+    case V64QImode:
+      if (TARGET_AVX512BW)
+        /* Implementable with 2 vpermi2, 2 vpshufb and 1 or insn.  */
+        return true;
+      break;
     case V8SImode:
     case V8SFmode:
     case V4DFmode:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ca5d720..6252e7e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10678,7 +10678,7 @@
    (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
    (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
    (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
-   (V32HI "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])

 (define_expand "vec_perm<mode>"
   [(match_operand:VEC_PERM_AVX2 0 "register_operand")
@@ -10700,7 +10700,7 @@
    (V32QI "TARGET_AVX2") (V16HI "TARGET_AVX2")
    (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
    (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
-   (V32HI "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512BW")])

 (define_expand "vec_perm_const<mode>"
   [(match_operand:VEC_PERM_CONST 0 "register_operand")