From patchwork Wed Oct 12 16:24:45 2011
From: Jakub Jelinek <jakub@redhat.com>
To: Uros Bizjak, Richard Henderson, Kirill Yukhin
Cc: gcc-patches@gcc.gnu.org
Date: Wed, 12 Oct 2011 18:24:45 +0200
Subject: [PATCH] Add mulv32qi3 support
Message-ID: <20111012162445.GD2210@tyan-ft48-01.lab.bos.redhat.com>

Hi!

On

long long a[1024], c[1024];
char b[1024];

void
foo (void)
{
  int i;
  for (i = 0; i < 1024; i++)
    b[i] = a[i] + 3 * c[i];
}

I've noticed that while the i?86 backend supports mulv16qi3, it doesn't
support mulv32qi3 even with AVX2.  The following patch implements it
similarly to how mulv16qi3 is implemented.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

BTW, I wonder whether vector multiply expansion, when one argument is a
VECTOR_CST with all elements the same, shouldn't use something similar
to what expand_mult does; I'm not sure whether that belongs in the
generic code or at least in the backends.  Testing the costs will be
harder; maybe it could just try fewer algorithms and count the number of
instructions or something similar.  But certainly e.g. v32qi
multiplication by 3 is quite costly (4 interleaves, 2 v16hi
multiplications, and 4 insns to select the even bytes from the two
products), while two vector additions (tmp = x + x; result = x + tmp;)
would do the job.
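As an illustrative aside (not part of the patch): with the GNU C vector
extension, the two-addition alternative for multiplying a byte vector by
3 can be written roughly as below.  The typedef and function name are
made up for the example; with -mavx2 this should compile to two vpaddb
instructions.

typedef char v32qi __attribute__ ((vector_size (32)));

/* Multiply each byte by 3 with two additions instead of a full
   v32qi multiply.  */
v32qi
mul3_by_add (v32qi x)
{
  v32qi tmp = x + x;	/* x * 2 */
  return x + tmp;	/* x * 3 */
}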
2011-10-12  Jakub Jelinek  <jakub@redhat.com>

	* config/i386/sse.md (vec_avx2): New mode_attr.
	(mulv16qi3): Macroize to cover also mulv32qi3 for TARGET_AVX2
	into ...
	(mul<mode>3): ... this.

	Jakub

--- gcc/config/i386/sse.md.jj	2011-10-12 09:23:37.000000000 +0200
+++ gcc/config/i386/sse.md	2011-10-12 12:16:39.000000000 +0200
@@ -163,6 +163,12 @@ (define_mode_attr avx_avx2
    (V4SI "avx2") (V2DI "avx2")
    (V8SI "avx2") (V4DI "avx2")])
 
+(define_mode_attr vec_avx2
+  [(V16QI "vec") (V32QI "avx2")
+   (V8HI "vec") (V16HI "avx2")
+   (V4SI "vec") (V8SI "avx2")
+   (V2DI "vec") (V4DI "avx2")])
+
 ;; Mapping of logic-shift operators
 (define_code_iterator lshift [lshiftrt ashift])
 
@@ -4838,10 +4844,10 @@ (define_insn "*<sse2_avx2>_<plusminus_insn><mode>3"
-(define_insn_and_split "mulv16qi3"
-  [(set (match_operand:V16QI 0 "register_operand" "")
-	(mult:V16QI (match_operand:V16QI 1 "register_operand" "")
-		    (match_operand:V16QI 2 "register_operand" "")))]
+(define_insn_and_split "mul<mode>3"
+  [(set (match_operand:VI1_AVX2 0 "register_operand" "")
+	(mult:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand" "")
+		       (match_operand:VI1_AVX2 2 "register_operand" "")))]
   "TARGET_SSE2 && can_create_pseudo_p ()"
   "#"
@@ -4850,34 +4856,41 @@ (define_insn_and_split "mulv16qi3"
 {
   rtx t[6];
   int i;
+  enum machine_mode mulmode = <sseunpackmode>mode;
 
   for (i = 0; i < 6; ++i)
-    t[i] = gen_reg_rtx (V16QImode);
+    t[i] = gen_reg_rtx (<MODE>mode);
 
   /* Unpack data such that we've got a source byte in each low byte of
      each word.  We don't care what goes into the high byte of each word.
      Rather than trying to get zero in there, most convenient is to let
      it be a copy of the low byte.  */
-  emit_insn (gen_vec_interleave_highv16qi (t[0], operands[1], operands[1]));
-  emit_insn (gen_vec_interleave_highv16qi (t[1], operands[2], operands[2]));
-  emit_insn (gen_vec_interleave_lowv16qi (t[2], operands[1], operands[1]));
-  emit_insn (gen_vec_interleave_lowv16qi (t[3], operands[2], operands[2]));
+  emit_insn (gen_<vec_avx2>_interleave_high<mode> (t[0], operands[1],
+						   operands[1]));
+  emit_insn (gen_<vec_avx2>_interleave_high<mode> (t[1], operands[2],
+						   operands[2]));
+  emit_insn (gen_<vec_avx2>_interleave_low<mode> (t[2], operands[1],
+						  operands[1]));
+  emit_insn (gen_<vec_avx2>_interleave_low<mode> (t[3], operands[2],
+						  operands[2]));
 
   /* Multiply words.  The end-of-line annotations here give a picture of
      what the output of that instruction looks like.  Dot means don't care;
      the letters are the bytes of the result with A being the most
      significant.  */
-  emit_insn (gen_mulv8hi3 (gen_lowpart (V8HImode, t[4]), /* .A.B.C.D.E.F.G.H */
-			   gen_lowpart (V8HImode, t[0]),
-			   gen_lowpart (V8HImode, t[1])));
-  emit_insn (gen_mulv8hi3 (gen_lowpart (V8HImode, t[5]), /* .I.J.K.L.M.N.O.P */
-			   gen_lowpart (V8HImode, t[2]),
-			   gen_lowpart (V8HImode, t[3])));
+  emit_insn (gen_rtx_SET (VOIDmode, gen_lowpart (mulmode, t[4]),
+			  gen_rtx_MULT (mulmode,	/* .A.B.C.D.E.F.G.H */
+					gen_lowpart (mulmode, t[0]),
+					gen_lowpart (mulmode, t[1]))));
+  emit_insn (gen_rtx_SET (VOIDmode, gen_lowpart (mulmode, t[5]),
+			  gen_rtx_MULT (mulmode,	/* .I.J.K.L.M.N.O.P */
+					gen_lowpart (mulmode, t[2]),
+					gen_lowpart (mulmode, t[3]))));
 
   /* Extract the even bytes and merge them back together.  */
   ix86_expand_vec_extract_even_odd (operands[0], t[5], t[4], 0);
 
   set_unique_reg_note (get_last_insn (), REG_EQUAL,
-		       gen_rtx_MULT (V16QImode, operands[1], operands[2]));
+		       gen_rtx_MULT (<MODE>mode, operands[1], operands[2]));
   DONE;
 })
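For reference, a rough SSE2 intrinsics sketch (illustration only, not
part of the patch) of the sequence the new mul<mode>3 splitter generates
for the V16QI case: unpack bytes to words, multiply the words, then keep
the low byte of each word.  The mask-and-pack at the end is just one way
of doing the even-byte selection; it is not necessarily the exact
sequence ix86_expand_vec_extract_even_odd emits.

#include <emmintrin.h>

/* Byte multiply via word multiplies: each unpack puts a source byte in
   the low byte of a word (the high byte is a don't-care copy of it),
   pmullw leaves the byte product in the low byte of each word, and the
   final mask+pack gathers those bytes into one vector.  */
static __m128i
mulv16qi (__m128i a, __m128i b)
{
  __m128i al = _mm_unpacklo_epi8 (a, a);
  __m128i bl = _mm_unpacklo_epi8 (b, b);
  __m128i ah = _mm_unpackhi_epi8 (a, a);
  __m128i bh = _mm_unpackhi_epi8 (b, b);
  __m128i pl = _mm_mullo_epi16 (al, bl);	/* pmullw */
  __m128i ph = _mm_mullo_epi16 (ah, bh);
  __m128i mask = _mm_set1_epi16 (0x00ff);
  return _mm_packus_epi16 (_mm_and_si128 (pl, mask),
			   _mm_and_si128 (ph, mask));
}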