From patchwork Tue Apr 29 17:13:10 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Evgeny Stupachenko
X-Patchwork-Id: 343951
In-Reply-To: <535FBC20.1000400@redhat.com>
References: <535FBC20.1000400@redhat.com>
Date: Tue, 29 Apr 2014 21:13:10 +0400
Subject: Re: [PATCH 2/2, x86] Add palignr support for AVX2.
From: Evgeny Stupachenko
To: Richard Henderson
Cc: GCC Patches, Richard Biener, Uros Bizjak, "H.J. Lu"

Thanks. The patch is fixed according to your comments:

On Tue, Apr 29, 2014 at 6:50 PM, Richard Henderson wrote:
> On 04/29/2014 06:50 AM, Evgeny Stupachenko wrote:
>> +  if (d->one_operand_p != true)
>> +    return false;
>
> This looks odd.  Better as !d->one_operand_p.
>
>> +
>> +  /* For an in order permutation with one operand like: {5 6 7 0 1 2 3 4}
>> +     PALIGNR is better than PSHUFB.  Check for an order in permutation.  */
>
> FWIW, "in order permutation" sounds like a contradiction in terms.
> Better to describe this as a rotate.
>
>> +  in_order_length = 0;
>> +  in_order_length_max = 0;
>> +  if (d->one_operand_p == true)
>
> You've just tested one_operand above.
>
>> +    for (i = 0; i < 2 * nelt; ++i)
>
> Which means that 2*nelt is doing twice as much work as needed.
>
>> +      {
>> +        if ((d->perm[(i + 1) & (nelt - 1)] -
>> +             d->perm[i & (nelt - 1)]) != 1)
>
> Surely we can avoid re-reading the comparison value...
>
>   next = (d->perm[0] + 1) & (nelt - 1);
>   for (i = 1; i < nelt; ++i)
>     {
>       if (d->perm[i] != next)
>         return false;
>       next = (next + 1) & (nelt - 1);
>     }
>
>> +          {
>> +            if (in_order_length > in_order_length_max)
>> +              in_order_length_max = in_order_length;
>> +            in_order_length = 0;
>> +          }
>> +        else
>> +          in_order_length++;
>> +      }
>> +
>> +  /* If not an ordered permutation then try something else.  */
>> +  if (in_order_length_max != nelt - 1)
>> +    return false;
>
> I don't understand what this length and max stuff is trying to accomplish.
>
>> +
>> +  min = d->perm[0];
>> +
>> +  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
>> +  shift1 = GEN_INT ((min - nelt / 2) *
>> +                    GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
>> +
>> +  if (GET_MODE_SIZE (d->vmode) != 32)
>
> Positive tests are almost always clearer: == 16.
>
>
> r~

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 002d295..aa6372a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42807,6 +42807,79 @@ expand_vec_perm_pshufb (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* A subroutine of ix86_expand_vec_perm_1.  Try to use just palignr
+   instruction for one operand permutation.  This is better than pshufb
+   as does not require to pass big constant and faster on some x86
+   architectures.  */
+
+static bool
+expand_vec_perm_palignr_one_operand (struct expand_vec_perm_d *d)
+{
+  unsigned i, nelt = d->nelt;
+  unsigned min;
+  rtx shift, shift1, target, tmp;
+
+  /* PALIGNR of 2 128-bits registers takes only 1 instruction.
+     Requires SSSE3.  */
+  if (GET_MODE_SIZE (d->vmode) == 16)
+    {
+      if (!TARGET_SSSE3)
+        return false;
+    }
+  /* PALIGNR of 2 256-bits registers on AVX2 costs only 2 instructions:
+     PERM and PALIGNR.  It is more profitable than 2 PSHUFB and PERM.  */
+  else if (GET_MODE_SIZE (d->vmode) == 32)
+    {
+      if (!TARGET_AVX2)
+        return false;
+    }
+  else
+    return false;
+
+  if (!d->one_operand_p)
+    return false;
+
+  /* For a rotation permutation with one operand like: {5 6 7 0 1 2 3 4}
+     PALIGNR is better than PSHUFB.  Check for a rotation in permutation.  */
+  for (i = 0; i < nelt; ++i)
+    if ((((d->perm[(i + 1) & (nelt - 1)] - d->perm[i])) & (nelt - 1)) != 1)
+      return false;
+
+  min = d->perm[0];
+  shift = GEN_INT (min * GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+  shift1 = GEN_INT ((min - nelt / 2) *
+            GET_MODE_BITSIZE (GET_MODE_INNER (d->vmode)));
+
+  if (GET_MODE_SIZE (d->vmode) == 16)
+    {
+      target = gen_reg_rtx (TImode);
+      emit_insn (gen_ssse3_palignrti (target, gen_lowpart (TImode, d->op1),
+                                      gen_lowpart (TImode, d->op0), shift));
+    }
+  else
+    {
+      target = gen_reg_rtx (V2TImode);
+      tmp = gen_reg_rtx (V4DImode);
+      emit_insn (gen_avx2_permv2ti (tmp,
+                                    gen_lowpart (V4DImode, d->op0),
+                                    gen_lowpart (V4DImode, d->op1),
+                                    GEN_INT (33)));
+      if (min < nelt / 2)
+        emit_insn (gen_avx2_palignrv2ti (target,
+                                         gen_lowpart (V2TImode, tmp),
+                                         gen_lowpart (V2TImode, d->op0),
+                                         shift));
+      else
+        emit_insn (gen_avx2_palignrv2ti (target,
+                                         gen_lowpart (V2TImode, d->op1),
+                                         gen_lowpart (V2TImode, tmp),
+                                         shift1));
+    }
+  emit_move_insn (d->target, gen_lowpart (d->vmode, target));
+
+  return true;
+}
+
 static bool expand_vec_perm_vpshufb2_vpermq (struct expand_vec_perm_d *d);
 
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to instantiate D
@@ -42943,6 +43016,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
   if (expand_vec_perm_vpermil (d))
     return true;
 
+  /* Try palignr on one operand.  */
+  if (expand_vec_perm_palignr_one_operand (d))
+    return true;
+
   /* Try the SSSE3 pshufb or XOP vpperm or AVX2 vperm2i128,
      vpshufb, vpermd, vpermps or vpermq variable permutation.  */
   if (expand_vec_perm_pshufb (d))
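
For anyone who wants to try the rotation test outside of GCC, here is a minimal standalone sketch that follows the "next" loop suggested in the review. The function name perm_is_rotation and its signature are hypothetical and are not part of i386.c. Like the patch, it assumes nelt is a power of two and that every index refers to a single operand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not part of i386.c.  Return true if PERM, a
   one-operand permutation of NELT elements (NELT a power of two), is a
   rotation such as {5 6 7 0 1 2 3 4}.  On success store the index of
   the first selected element in *ROT; this is the value the patch
   calls "min".  */
static bool
perm_is_rotation (const unsigned char *perm, unsigned nelt, unsigned *rot)
{
  unsigned i, next;

  /* The masking below only works for a power-of-two element count.  */
  assert (nelt != 0 && (nelt & (nelt - 1)) == 0);

  /* A one-operand permutation only indexes elements 0 .. NELT-1.  */
  if (perm[0] >= nelt)
    return false;

  /* Each element must be its predecessor plus one, modulo NELT.  */
  next = (perm[0] + 1u) & (nelt - 1);
  for (i = 1; i < nelt; ++i)
    {
      if (perm[i] != next)
        return false;
      next = (next + 1) & (nelt - 1);
    }

  *rot = perm[0];
  return true;
}
```

With {5 6 7 0 1 2 3 4} and nelt == 8 the helper accepts the permutation and sets *rot to 5. In the patch, the corresponding shift operand is that value multiplied by the element size in bits. The identity permutation is also accepted, with *rot == 0.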