From patchwork Fri Dec 2 19:18:25 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 128939
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
 by ozlabs.org (Postfix) with SMTP id 62480B6F62
 for ; Sat, 3 Dec 2011 06:18:48 +1100 (EST)
Received: (qmail 27088 invoked by alias); 2 Dec 2011 19:18:45 -0000
Received: (qmail 27077 invoked by uid 22791); 2 Dec 2011 19:18:44 -0000
X-SWARE-Spam-Status: No, hits=-7.2 required=5.0 tests=AWL, BAYES_00,
 RCVD_IN_DNSWL_HI, RP_MATCHES_RCVD, SPF_HELO_PASS, TW_VP
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)
 by sourceware.org (qpsmtpd/0.43rc1) with ESMTP;
 Fri, 02 Dec 2011 19:18:27 +0000
Received: from int-mx09.intmail.prod.int.phx2.redhat.com
 (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22])
 by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id pB2JIRQc031351
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
 Fri, 2 Dec 2011 14:18:27 -0500
Received: from tyan-ft48-01.lab.bos.redhat.com
 (tyan-ft48-01.lab.bos.redhat.com [10.16.42.4])
 by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP
 id pB2JIQaq028736
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Fri, 2 Dec 2011 14:18:27 -0500
Received: from tyan-ft48-01.lab.bos.redhat.com
 (tyan-ft48-01.lab.bos.redhat.com [127.0.0.1])
 by tyan-ft48-01.lab.bos.redhat.com (8.14.4/8.14.4) with ESMTP
 id pB2JIQcC016647; Fri, 2 Dec 2011 20:18:26 +0100
Received: (from jakub@localhost)
 by tyan-ft48-01.lab.bos.redhat.com (8.14.4/8.14.4/Submit)
 id pB2JIPJa016645; Fri, 2 Dec 2011 20:18:25 +0100
Date: Fri, 2 Dec 2011 20:18:25 +0100
From: Jakub Jelinek
To: Uros Bizjak , Richard Henderson
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] Fix AVX2 mulv32qi expander (PR target/51387)
Message-ID: <20111202191825.GF27242@tyan-ft48-01.lab.bos.redhat.com>
Reply-To: Jakub Jelinek
MIME-Version: 1.0
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Unsubscribe:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

Hi!

As reported by Michael, the vect-116.c testcase fails with -mavx2.  The
problem is that the mulv32qi pattern computes a wrong result: the second
and third quarters of the vector are swapped compared to what it should
contain.  This is because we can't use vec_extract_even_odd for V32QI
when we prepared the vpmullw arguments using vpunpck[hl]bw, because those
insns interleave only within each 128-bit lane.  Therefore we want to
finalize the result using the
{ 0,2,..,14,32,34,..,46,16,18,..,30,48,50,..,62 }
permutation instead of the current one
{ 0,2,..,14,16,18,..,30,32,34,..,46,48,50,..,62 }
The new permutation is even shorter (2 vpshufb + vpor) compared to the
extract even (2 vpshufb + vpor + vpermq).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-12-02  Jakub Jelinek

	PR target/51387
	* config/i386/sse.md (mul<mode>3 with VI1_AVX2 iterator): For
	V32QImode use { 0,2,..,14,32,34,..,46,16,18,..,30,48,50,..,62 }
	permutation instead of extract even permutation.

	Jakub

--- gcc/config/i386/sse.md.jj	2011-12-01 11:44:58.000000000 +0100
+++ gcc/config/i386/sse.md	2011-12-02 12:18:42.657795749 +0100
@@ -5066,7 +5066,24 @@ (define_insn_and_split "mul<mode>3"
 				gen_lowpart (mulmode, t[3]))));
 
   /* Extract the even bytes and merge them back together.  */
-  ix86_expand_vec_extract_even_odd (operands[0], t[5], t[4], 0);
+  if (<MODE>mode == V16QImode)
+    ix86_expand_vec_extract_even_odd (operands[0], t[5], t[4], 0);
+  else
+    {
+      /* Since avx2_interleave_{low,high}v32qi used above aren't cross-lane,
+	 this can't be normal even extraction, but one where additionally
+	 the second and third quarter are swapped.  That is even one insn
+	 shorter than even extraction.  */
+      rtvec v = rtvec_alloc (32);
+      for (i = 0; i < 32; ++i)
+	RTVEC_ELT (v, i)
+	  = GEN_INT (i * 2 + ((i & 24) == 8 ? 16 : (i & 24) == 16 ? -16 : 0));
+      t[0] = operands[0];
+      t[1] = t[5];
+      t[2] = t[4];
+      t[3] = gen_rtx_CONST_VECTOR (<MODE>mode, v);
+      ix86_expand_vec_perm_const (t);
+    }
 
   set_unique_reg_note (get_last_insn (), REG_EQUAL,
 		       gen_rtx_MULT (<MODE>mode, operands[1], operands[2]));
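
For anyone who wants to check the index arithmetic without an AVX2 machine, here is a rough scalar model in plain C.  It is only a sketch, not GCC code: the helper names (unpack_in_lane, mullw, build_selector, mul_bytes, count_mismatches) are made up for illustration, and it assumes each operand is interleaved with itself before the word multiply.  The selector expression is the one from the patch; with fixed == 0 it falls back to the plain even extraction.

```c
#include <assert.h>
#include <stdint.h>

enum { NBYTES = 32, LANE = 16 };

/* Model of vpunpcklbw/vpunpckhbw with both inputs equal to x (an assumption):
   each source byte is duplicated into a 16-bit word, separately within each
   128-bit lane.  */
static void unpack_in_lane(const uint8_t x[NBYTES], int high, uint8_t out[NBYTES])
{
    for (int lane = 0; lane < 2; ++lane)
        for (int j = 0; j < 8; ++j) {
            uint8_t v = x[lane * LANE + (high ? 8 : 0) + j];
            out[lane * LANE + 2 * j] = v;
            out[lane * LANE + 2 * j + 1] = v;
        }
}

/* Model of vpmullw: low 16 bits of each little-endian word product.  */
static void mullw(const uint8_t x[NBYTES], const uint8_t y[NBYTES], uint8_t out[NBYTES])
{
    for (int k = 0; k < NBYTES / 2; ++k) {
        unsigned wx = x[2 * k] | ((unsigned) x[2 * k + 1] << 8);
        unsigned wy = y[2 * k] | ((unsigned) y[2 * k + 1] << 8);
        unsigned p = (wx * wy) & 0xffffu;
        out[2 * k] = (uint8_t) (p & 0xff);
        out[2 * k + 1] = (uint8_t) (p >> 8);
    }
}

/* fixed != 0: selector from the patch (quarters 2 and 3 swapped).
   fixed == 0: plain even extraction { 0, 2, .., 62 }.  */
void build_selector(int fixed, int sel[NBYTES])
{
    for (int i = 0; i < NBYTES; ++i) {
        int adj = 0;
        if (fixed)
            adj = (i & 24) == 8 ? 16 : (i & 24) == 16 ? -16 : 0;
        sel[i] = i * 2 + adj;
        assert(sel[i] >= 0 && sel[i] < 2 * NBYTES);
    }
}

/* Byte-wise multiply built from the modelled insns plus the final permute.  */
void mul_bytes(const uint8_t a[NBYTES], const uint8_t b[NBYTES], int fixed,
               uint8_t out[NBYTES])
{
    uint8_t al[NBYTES], bl[NBYTES], ah[NBYTES], bh[NBYTES];
    uint8_t cat[2 * NBYTES];
    int sel[NBYTES];

    unpack_in_lane(a, 0, al);
    unpack_in_lane(b, 0, bl);
    unpack_in_lane(a, 1, ah);
    unpack_in_lane(b, 1, bh);
    mullw(al, bl, cat);           /* low products, in the role of t[5] */
    mullw(ah, bh, cat + NBYTES);  /* high products, in the role of t[4] */
    build_selector(fixed, sel);
    for (int i = 0; i < NBYTES; ++i)
        out[i] = cat[sel[i]];
}

/* Number of result bytes that differ from (uint8_t) (a[i] * b[i]).  */
int count_mismatches(int fixed)
{
    uint8_t a[NBYTES], b[NBYTES], out[NBYTES];
    int bad = 0;

    for (int i = 0; i < NBYTES; ++i) {
        a[i] = (uint8_t) (3 * i + 1);
        b[i] = (uint8_t) (7 * i + 5);
    }
    mul_bytes(a, b, fixed, out);
    for (int i = 0; i < NBYTES; ++i)
        bad += out[i] != (uint8_t) (a[i] * b[i]);
    return bad;
}
```

In this model count_mismatches (1) reports no differences, while count_mismatches (0) does, because the in-lane unpacks leave source bytes 16..23 in the upper half of the low product and bytes 8..15 in the lower half of the high product.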