From patchwork Wed Nov 16 18:35:37 2011
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 126009
Subject: Re: [PATCH, i386]: Optimize v2df (x2) -> v4sf, v4si conversion
 sequences for AVX.
From: Uros Bizjak
To: gcc-patches@gcc.gnu.org
Date: Wed, 16 Nov 2011 19:35:37 +0100

On Tue, Nov 15, 2011 at 8:23 PM, Uros Bizjak wrote:

> Attached patch optimizes v2df (x2) -> v4sf, v4si conversion sequences
> for AVX from:
>
>        vroundpd        $1, 32(%rsp), %xmm1
>        vroundpd        $1, 48(%rsp), %xmm0
>        vcvttpd2dqx     %xmm1, %xmm1
>        vcvttpd2dqx     %xmm0, %xmm0
>        vpunpcklqdq     %xmm0, %xmm1, %xmm0
>        vmovdqa         %xmm0, 16(%rsp)
>
> to
>
>        vroundpd        $1, 64(%rsp), %xmm1
>        vroundpd        $1, 80(%rsp), %xmm0
>        vinsertf128     $0x1, %xmm0, %ymm1, %ymm0
>        vcvttpd2dqy     %ymm0, %xmm0
>        vmovdqa         %xmm0, 32(%rsp)
>
> Ideally, this would be just "vcvtpd2psy 64(%rsp), %xmm0" or "vroundpd
> $1, 64(%rsp), %ymm1", but the vectorizer does not (yet) support mixed
> vectorization factors.

The attached patch optimizes the above code a step further, generating:

        vmovapd         64(%rsp), %xmm0
        vinsertf128     $0x1, 80(%rsp), %ymm0, %ymm0
        vroundpd        $1, %ymm0, %ymm0
        vcvttpd2dqy     %ymm0, %xmm0
        vmovdqa         %xmm0, 32(%rsp)

2011-11-16  Uros Bizjak  <ubizjak@gmail.com>

	* config/i386/sse.md (round<mode>2_vec_pack_sfix): Optimize V2DFmode
	sequence for AVX.
	(<sse4_1>_round<ssemodesuffix>_vec_pack_sfix<avxsizesuffix>): Ditto.

Tested on x86_64-pc-linux-gnu {,-m32} AVX target, committed to mainline SVN.

Uros.
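For context, a loop like the following can be vectorized into the
round-and-pack sequence shown above. This is only an illustrative sketch,
not the testcase from the patch; the array names and the exact flags
(something like -O2 -mavx -ffast-math) are assumptions, and the generated
code varies with GCC version and with -mtune/-mprefer-avx128 settings.

    /* Hypothetical reproducer: a double -> int conversion through
       floor() that the vectorizer can turn into vroundpd ($1, i.e.
       round toward -inf) followed by vcvttpd2dq.  */
    #include <math.h>

    #define N 1024

    double in[N];
    int out[N];

    void
    convert (void)
    {
      int i;

      for (i = 0; i < N; i++)
        out[i] = (int) floor (in[i]);
    }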
Index: sse.md
===================================================================
--- sse.md	(revision 181402)
+++ sse.md	(working copy)
@@ -9962,17 +9962,32 @@
 {
   rtx tmp0, tmp1;
 
-  tmp0 = gen_reg_rtx (<MODE>mode);
-  tmp1 = gen_reg_rtx (<MODE>mode);
+  if (<MODE>mode == V2DFmode
+      && TARGET_AVX && !TARGET_PREFER_AVX128)
+    {
+      rtx tmp2 = gen_reg_rtx (V4DFmode);
 
-  emit_insn
-   (gen_<sse4_1>_round<ssemodesuffix><avxsizesuffix> (tmp0, operands[1],
-						       operands[3]));
-  emit_insn
-   (gen_<sse4_1>_round<ssemodesuffix><avxsizesuffix> (tmp1, operands[2],
-						       operands[3]));
-  emit_insn
-   (gen_vec_pack_sfix_trunc_<mode> (operands[0], tmp0, tmp1));
+      tmp0 = gen_reg_rtx (V4DFmode);
+      tmp1 = force_reg (V2DFmode, operands[1]);
+
+      emit_insn (gen_avx_vec_concatv4df (tmp0, tmp1, operands[2]));
+      emit_insn (gen_avx_roundpd256 (tmp2, tmp0, operands[3]));
+      emit_insn (gen_fix_truncv4dfv4si2 (operands[0], tmp2));
+    }
+  else
+    {
+      tmp0 = gen_reg_rtx (<MODE>mode);
+      tmp1 = gen_reg_rtx (<MODE>mode);
+
+      emit_insn
+	(gen_<sse4_1>_round<ssemodesuffix><avxsizesuffix> (tmp0, operands[1],
+							   operands[3]));
+      emit_insn
+	(gen_<sse4_1>_round<ssemodesuffix><avxsizesuffix> (tmp1, operands[2],
+							   operands[3]));
+      emit_insn
+	(gen_vec_pack_sfix_trunc_<mode> (operands[0], tmp0, tmp1));
+    }
   DONE;
 })
 
@@ -10053,14 +10068,29 @@
 {
   rtx tmp0, tmp1;
 
-  tmp0 = gen_reg_rtx (<MODE>mode);
-  tmp1 = gen_reg_rtx (<MODE>mode);
+  if (<MODE>mode == V2DFmode
+      && TARGET_AVX && !TARGET_PREFER_AVX128)
+    {
+      rtx tmp2 = gen_reg_rtx (V4DFmode);
 
-  emit_insn (gen_round<mode>2 (tmp0, operands[1]));
-  emit_insn (gen_round<mode>2 (tmp1, operands[2]));
+      tmp0 = gen_reg_rtx (V4DFmode);
+      tmp1 = force_reg (V2DFmode, operands[1]);
 
-  emit_insn
-   (gen_vec_pack_sfix_trunc_<mode> (operands[0], tmp0, tmp1));
+      emit_insn (gen_avx_vec_concatv4df (tmp0, tmp1, operands[2]));
+      emit_insn (gen_roundv4df2 (tmp2, tmp0));
+      emit_insn (gen_fix_truncv4dfv4si2 (operands[0], tmp2));
+    }
+  else
+    {
+      tmp0 = gen_reg_rtx (<MODE>mode);
+      tmp1 = gen_reg_rtx (<MODE>mode);
+
+      emit_insn (gen_round<mode>2 (tmp0, operands[1]));
+      emit_insn (gen_round<mode>2 (tmp1, operands[2]));
+
+      emit_insn
+	(gen_vec_pack_sfix_trunc_<mode> (operands[0], tmp0, tmp1));
+    }
   DONE;
 })
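As a reading aid, the new V2DFmode path in both expanders corresponds
roughly to the intrinsics sequence below. This is an illustrative sketch
under assumptions, not code from the patch: the function name is made up,
and _MM_FROUND_FLOOR merely stands in for the rounding-mode immediate
(operands[3] in the first expander, the round<mode>2 semantics in the
second).

    #include <immintrin.h>

    /* Concatenate the two 128-bit halves into one 256-bit vector
       (vinsertf128), round it once (vroundpd), then truncate-convert
       the four doubles to packed 32-bit ints (vcvttpd2dqy).  */
    static __m128i
    round_pack_sfix_v2df (__m128d lo, __m128d hi)
    {
      __m256d t = _mm256_insertf128_pd (_mm256_castpd128_pd256 (lo), hi, 1);

      t = _mm256_round_pd (t, _MM_FROUND_FLOOR);    /* vroundpd $1  */
      return _mm256_cvttpd_epi32 (t);               /* vcvttpd2dqy  */
    }

Rounding the concatenated 256-bit vector once avoids rounding each 128-bit
half separately and packing afterwards, which is the saving visible in the
before/after assembly above.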