From patchwork Wed Oct 30 17:41:56 2013
Date: Wed, 30 Oct 2013 18:41:56 +0100
From: Jakub Jelinek
To: Richard Henderson
Cc: Uros Bizjak, Kirill Yukhin, gcc-patches@gcc.gnu.org
Subject: Re: [RFC PATCH] For TARGET_AVX use *mov<mode>_internal for misaligned loads
Message-ID: <20131030174156.GJ27813@tucnak.zalov.cz>
References: <20131030094713.GC27813@tucnak.zalov.cz> <52713100.4080706@redhat.com>
In-Reply-To: <52713100.4080706@redhat.com>

On Wed, Oct 30, 2013 at 09:17:04AM -0700, Richard Henderson wrote:
> On 10/30/2013 02:47 AM, Jakub Jelinek wrote:
> > 2013-10-30  Jakub Jelinek
> >
> > 	* config/i386/i386.c (ix86_avx256_split_vector_move_misalign): If
> > 	op1 is misaligned_operand, just use *mov<mode>_internal insn
> > 	rather than UNSPEC_LOADU load.
> > 	(ix86_expand_vector_move_misalign): Likewise (for TARGET_AVX only).
> > 	Avoid gen_lowpart on op0 if it isn't MEM.
>
> Ok.
Testing revealed some testsuite failures, caused either by matching insn
names in the -dp dump or by counting specific FMA insns; with the patch
there are changes like:

-	vmovupd	0(%r13,%rax), %ymm0
-	vfmadd231pd	%ymm1, %ymm2, %ymm0
+	vmovapd	%ymm2, %ymm0
+	vfmadd213pd	0(%r13,%rax), %ymm1, %ymm0

So, here is the updated patch with those testsuite changes and a PR line
added to the ChangeLog.  I'll wait for Uros' test results.

2013-10-30  Jakub Jelinek

	PR target/47754
	* config/i386/i386.c (ix86_avx256_split_vector_move_misalign): If
	op1 is misaligned_operand, just use *mov<mode>_internal insn
	rather than UNSPEC_LOADU load.
	(ix86_expand_vector_move_misalign): Likewise (for TARGET_AVX only).
	Avoid gen_lowpart on op0 if it isn't MEM.

	* gcc.target/i386/avx256-unaligned-load-1.c: Adjust scan-assembler
	and scan-assembler-not regexps.
	* gcc.target/i386/avx256-unaligned-load-2.c: Likewise.
	* gcc.target/i386/avx256-unaligned-load-3.c: Likewise.
	* gcc.target/i386/avx256-unaligned-load-4.c: Likewise.
	* gcc.target/i386/l_fma_float_1.c: Expect vf{,n}m{add,sub}213*p*
	instead of vf{,n}m{add,sub}231*p*.
	* gcc.target/i386/l_fma_float_3.c: Likewise.
	* gcc.target/i386/l_fma_double_1.c: Likewise.
	* gcc.target/i386/l_fma_double_3.c: Likewise.

	Jakub

--- gcc/config/i386/i386.c.jj	2013-10-30 08:15:38.000000000 +0100
+++ gcc/config/i386/i386.c	2013-10-30 10:20:22.684708729 +0100
@@ -16560,6 +16560,12 @@ ix86_avx256_split_vector_move_misalign (
 	  r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m);
 	  emit_move_insn (op0, r);
 	}
+      /* Normal *mov<mode>_internal pattern will handle
+	 unaligned loads just fine if misaligned_operand
+	 is true, and without the UNSPEC it can be combined
+	 with arithmetic instructions.  */
+      else if (misaligned_operand (op1, GET_MODE (op1)))
+	emit_insn (gen_rtx_SET (VOIDmode, op0, op1));
       else
 	emit_insn (load_unaligned (op0, op1));
     }
@@ -16634,7 +16640,7 @@ ix86_avx256_split_vector_move_misalign (
 void
 ix86_expand_vector_move_misalign (enum machine_mode mode, rtx operands[])
 {
-  rtx op0, op1, m;
+  rtx op0, op1, orig_op0 = NULL_RTX, m;
   rtx (*load_unaligned) (rtx, rtx);
   rtx (*store_unaligned) (rtx, rtx);
 
@@ -16647,7 +16653,16 @@ ix86_expand_vector_move_misalign (enum m
     {
     case MODE_VECTOR_INT:
     case MODE_INT:
-      op0 = gen_lowpart (V16SImode, op0);
+      if (GET_MODE (op0) != V16SImode)
+	{
+	  if (!MEM_P (op0))
+	    {
+	      orig_op0 = op0;
+	      op0 = gen_reg_rtx (V16SImode);
+	    }
+	  else
+	    op0 = gen_lowpart (V16SImode, op0);
+	}
       op1 = gen_lowpart (V16SImode, op1);
       /* FALLTHRU */
 
@@ -16676,6 +16691,8 @@ ix86_expand_vector_move_misalign (enum m
 	emit_insn (store_unaligned (op0, op1));
       else
 	gcc_unreachable ();
+      if (orig_op0)
+	emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0));
       break;
 
     default:
@@ -16692,12 +16709,23 @@ ix86_expand_vector_move_misalign (enum m
     {
     case MODE_VECTOR_INT:
     case MODE_INT:
-      op0 = gen_lowpart (V32QImode, op0);
+      if (GET_MODE (op0) != V32QImode)
+	{
+	  if (!MEM_P (op0))
+	    {
+	      orig_op0 = op0;
+	      op0 = gen_reg_rtx (V32QImode);
+	    }
+	  else
+	    op0 = gen_lowpart (V32QImode, op0);
+	}
       op1 = gen_lowpart (V32QImode, op1);
       /* FALLTHRU */
 
     case MODE_VECTOR_FLOAT:
       ix86_avx256_split_vector_move_misalign (op0, op1);
+      if (orig_op0)
+	emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0));
       break;
 
     default:
@@ -16709,15 +16737,30 @@ ix86_expand_vector_move_misalign (enum m
 
   if (MEM_P (op1))
     {
+      /* Normal *mov<mode>_internal pattern will handle
+	 unaligned loads just fine if misaligned_operand
+	 is true, and without the UNSPEC it can be combined
+	 with arithmetic instructions.  */
+      if (TARGET_AVX
+	  && (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+	      || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
+	  && misaligned_operand (op1, GET_MODE (op1)))
+	emit_insn (gen_rtx_SET (VOIDmode, op0, op1));
       /* ??? If we have typed data, then it would appear that using
 	 movdqu is the only way to get unaligned data loaded with
 	 integer type.  */
-      if (TARGET_SSE2 && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+      else if (TARGET_SSE2 && GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
 	{
-	  op0 = gen_lowpart (V16QImode, op0);
+	  if (GET_MODE (op0) != V16QImode)
+	    {
+	      orig_op0 = op0;
+	      op0 = gen_reg_rtx (V16QImode);
+	    }
 	  op1 = gen_lowpart (V16QImode, op1);
 	  /* We will eventually emit movups based on insn attributes.  */
 	  emit_insn (gen_sse2_loaddquv16qi (op0, op1));
+	  if (orig_op0)
+	    emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0));
 	}
       else if (TARGET_SSE2 && mode == V2DFmode)
 	{
@@ -16765,9 +16808,16 @@ ix86_expand_vector_move_misalign (enum m
 	  || TARGET_SSE_PACKED_SINGLE_INSN_OPTIMAL
 	  || optimize_insn_for_size_p ())
 	{
-	  op0 = gen_lowpart (V4SFmode, op0);
+	  if (GET_MODE (op0) != V4SFmode)
+	    {
+	      orig_op0 = op0;
+	      op0 = gen_reg_rtx (V4SFmode);
+	    }
 	  op1 = gen_lowpart (V4SFmode, op1);
 	  emit_insn (gen_sse_loadups (op0, op1));
+	  if (orig_op0)
+	    emit_move_insn (orig_op0,
+			    gen_lowpart (GET_MODE (orig_op0), op0));
 	  return;
 	}
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-load-1.c.jj	2012-10-16 13:15:44.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/avx256-unaligned-load-1.c	2013-10-30 17:58:30.312180662 +0100
@@ -14,6 +14,6 @@ avx_test (void)
     c[i] = a[i] * b[i+3];
 }
 
-/* { dg-final { scan-assembler-not "avx_loadups256" } } */
-/* { dg-final { scan-assembler "sse_loadups" } } */
+/* { dg-final { scan-assembler-not "(avx_loadups256|vmovups\[^\n\r]*movv8sf_internal)" } } */
+/* { dg-final { scan-assembler "(sse_loadups|movv4sf_internal)" } } */
 /* { dg-final { scan-assembler "vinsertf128" } } */
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c.jj	2013-05-10 10:36:29.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/avx256-unaligned-load-2.c	2013-10-30 18:00:19.700628673 +0100
@@ -10,6 +10,6 @@ avx_test (char **cp, char **ep)
     *ap++ = *cp++;
 }
 
-/* { dg-final { scan-assembler-not "avx_loaddqu256" } } */
-/* { dg-final { scan-assembler "sse2_loaddqu" } } */
+/* { dg-final { scan-assembler-not "(avx_loaddqu256|vmovdqu\[^\n\r]*movv32qi_internal)" } } */
+/* { dg-final { scan-assembler "(sse2_loaddqu|vmovdqu\[^\n\r]*movv16qi_internal)" } } */
 /* { dg-final { scan-assembler "vinsert.128" } } */
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-load-3.c.jj	2012-10-16 13:15:44.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/avx256-unaligned-load-3.c	2013-10-30 18:01:02.900409927 +0100
@@ -14,6 +14,6 @@ avx_test (void)
     c[i] = a[i] * b[i+3];
 }
 
-/* { dg-final { scan-assembler-not "avx_loadupd256" } } */
-/* { dg-final { scan-assembler "sse2_loadupd" } } */
+/* { dg-final { scan-assembler-not "(avx_loadupd256|vmovupd\[^\n\r]*movv4df_internal)" } } */
+/* { dg-final { scan-assembler "(sse2_loadupd|vmovupd\[^\n\r]*movv2df_internal)" } } */
 /* { dg-final { scan-assembler "vinsertf128" } } */
--- gcc/testsuite/gcc.target/i386/avx256-unaligned-load-4.c.jj	2013-06-10 18:16:38.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/avx256-unaligned-load-4.c	2013-10-30 18:01:28.121281630 +0100
@@ -14,6 +14,6 @@ avx_test (void)
     b[i] = a[i+3] * 2;
 }
 
-/* { dg-final { scan-assembler "avx_loadups256" } } */
-/* { dg-final { scan-assembler-not "sse_loadups" } } */
+/* { dg-final { scan-assembler "(avx_loadups256|vmovups\[^\n\r]*movv8sf_internal)" } } */
+/* { dg-final { scan-assembler-not "(sse_loadups|vmovups\[^\n\r]*movv4sf_internal)" } } */
 /* { dg-final { scan-assembler-not "vinsertf128" } } */
--- gcc/testsuite/gcc.target/i386/l_fma_float_1.c.jj	2013-08-13 12:20:13.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/l_fma_float_1.c	2013-10-30 18:09:20.083894747 +0100
@@ -9,13 +9,13 @@
 #include "l_fma_1.h"
 
 /* { dg-final { scan-assembler-times "vfmadd132ps" 4 } } */
-/* { dg-final { scan-assembler-times "vfmadd231ps" 4 } } */
+/* { dg-final { scan-assembler-times "vfmadd213ps" 4 } } */
 /* { dg-final { scan-assembler-times "vfmsub132ps" 4 } } */
-/* { dg-final { scan-assembler-times "vfmsub231ps" 4 } } */
+/* { dg-final { scan-assembler-times "vfmsub213ps" 4 } } */
 /* { dg-final { scan-assembler-times "vfnmadd132ps" 4 } } */
-/* { dg-final { scan-assembler-times "vfnmadd231ps" 4 } } */
+/* { dg-final { scan-assembler-times "vfnmadd213ps" 4 } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 4 } } */
-/* { dg-final { scan-assembler-times "vfnmsub231ps" 4 } } */
+/* { dg-final { scan-assembler-times "vfnmsub213ps" 4 } } */
 /* { dg-final { scan-assembler-times "vfmadd132ss" 60 } } */
 /* { dg-final { scan-assembler-times "vfmadd213ss" 60 } } */
 /* { dg-final { scan-assembler-times "vfmsub132ss" 60 } } */
--- gcc/testsuite/gcc.target/i386/l_fma_float_3.c.jj	2013-08-13 12:20:13.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/l_fma_float_3.c	2013-10-30 18:09:37.204811080 +0100
@@ -9,13 +9,13 @@
 #include "l_fma_3.h"
 
 /* { dg-final { scan-assembler-times "vfmadd132ps" 4 } } */
-/* { dg-final { scan-assembler-times "vfmadd231ps" 4 } } */
+/* { dg-final { scan-assembler-times "vfmadd213ps" 4 } } */
 /* { dg-final { scan-assembler-times "vfmsub132ps" 4 } } */
-/* { dg-final { scan-assembler-times "vfmsub231ps" 4 } } */
+/* { dg-final { scan-assembler-times "vfmsub213ps" 4 } } */
 /* { dg-final { scan-assembler-times "vfnmadd132ps" 4 } } */
-/* { dg-final { scan-assembler-times "vfnmadd231ps" 4 } } */
+/* { dg-final { scan-assembler-times "vfnmadd213ps" 4 } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 4 } } */
-/* { dg-final { scan-assembler-times "vfnmsub231ps" 4 } } */
+/* { dg-final { scan-assembler-times "vfnmsub213ps" 4 } } */
 /* { dg-final { scan-assembler-times "vfmadd132ss" 60 } } */
 /* { dg-final { scan-assembler-times "vfmadd213ss" 60 } } */
 /* { dg-final { scan-assembler-times "vfmsub132ss" 60 } } */
--- gcc/testsuite/gcc.target/i386/l_fma_double_1.c.jj	2013-08-13 12:20:13.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/l_fma_double_1.c	2013-10-30 18:08:44.504073698 +0100
@@ -10,13 +10,13 @@ typedef double adouble __attribute__((al
 #include "l_fma_1.h"
 
 /* { dg-final { scan-assembler-times "vfmadd132pd" 4 } } */
-/* { dg-final { scan-assembler-times "vfmadd231pd" 4 } } */
+/* { dg-final { scan-assembler-times "vfmadd213pd" 4 } } */
 /* { dg-final { scan-assembler-times "vfmsub132pd" 4 } } */
-/* { dg-final { scan-assembler-times "vfmsub231pd" 4 } } */
+/* { dg-final { scan-assembler-times "vfmsub213pd" 4 } } */
 /* { dg-final { scan-assembler-times "vfnmadd132pd" 4 } } */
-/* { dg-final { scan-assembler-times "vfnmadd231pd" 4 } } */
+/* { dg-final { scan-assembler-times "vfnmadd213pd" 4 } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 4 } } */
-/* { dg-final { scan-assembler-times "vfnmsub231pd" 4 } } */
+/* { dg-final { scan-assembler-times "vfnmsub213pd" 4 } } */
 /* { dg-final { scan-assembler-times "vfmadd132sd" 28 } } */
 /* { dg-final { scan-assembler-times "vfmadd213sd" 28 } } */
 /* { dg-final { scan-assembler-times "vfmsub132sd" 28 } } */
--- gcc/testsuite/gcc.target/i386/l_fma_double_3.c.jj	2013-08-13 12:20:13.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/l_fma_double_3.c	2013-10-30 18:09:02.270986352 +0100
@@ -10,13 +10,13 @@ typedef double adouble __attribute__((al
 #include "l_fma_3.h"
 
 /* { dg-final { scan-assembler-times "vfmadd132pd" 4 } } */
-/* { dg-final { scan-assembler-times "vfmadd231pd" 4 } } */
+/* { dg-final { scan-assembler-times "vfmadd213pd" 4 } } */
 /* { dg-final { scan-assembler-times "vfmsub132pd" 4 } } */
-/* { dg-final { scan-assembler-times "vfmsub231pd" 4 } } */
+/* { dg-final { scan-assembler-times "vfmsub213pd" 4 } } */
 /* { dg-final { scan-assembler-times "vfnmadd132pd" 4 } } */
-/* { dg-final { scan-assembler-times "vfnmadd231pd" 4 } } */
+/* { dg-final { scan-assembler-times "vfnmadd213pd" 4 } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 4 } } */
-/* { dg-final { scan-assembler-times "vfnmsub231pd" 4 } } */
+/* { dg-final { scan-assembler-times "vfnmsub213pd" 4 } } */
 /* { dg-final { scan-assembler-times "vfmadd132sd" 28 } } */
 /* { dg-final { scan-assembler-times "vfmadd213sd" 28 } } */
 /* { dg-final { scan-assembler-times "vfmsub132sd" 28 } } */
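
[Editorial note: the following illustration is not part of the patch. It is a
hypothetical sketch of the kind of source loop affected by the change: with
unaligned double arrays, the vectorized load used to be emitted as an
UNSPEC_LOADU (vmovupd) that combine could not fold, whereas a plain move can
be merged into the FMA, giving a 213-form insn with a memory operand as in the
assembly diff quoted above.]

```c
#include <stddef.h>

/* A multiply-accumulate loop over arrays the compiler cannot prove
   aligned.  When vectorized with AVX+FMA, each iteration's load of
   b[i] may be folded into the FMA itself once the load is a normal
   move rather than an UNSPEC.  */
void
fma_loop (double *restrict c, const double *restrict a,
          const double *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    c[i] += a[i] * b[i];
}
```

The scalar semantics are unchanged either way; only the instruction selection differs.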