From patchwork Fri Aug 10 22:24:27 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 176641
In-Reply-To: <502568C4.10002@redhat.com>
References: <20120808113105.21153.11115.sendpatchset@adcelk01.amd.com> <502568C4.10002@redhat.com>
Date: Sat, 11 Aug 2012 00:24:27 +0200
Subject: Re: [PATCH,i386] fma,fma4 and xop flags
From: Uros Bizjak
To: Richard Henderson
Cc: "Gopalasubramanian, Ganesh", "gcc-patches@gcc.gnu.org"

On
Fri, Aug 10, 2012 at 10:02 PM, Richard Henderson wrote:
> On 2012-08-10 12:59, Uros Bizjak wrote:
>> Actually, this is the problem you are trying to solve. The fma4
>> patterns are defined before fma3, so gcc prefers these.
>
> The Real Problem is that they should not be separate patterns.
> They should be a single pattern that selects alternatives via
> the enabled isa.

2012-08-11  Uros Bizjak

	* config/i386/i386.md (isa): Add fma and fma4.
	(enabled): Handle fma and fma4.
	* config/i386/sse.md (*fma_fmadd_): Merge *fma4_fmadd_.
	(*fma_fmsub_): Merge *fma4_fmsub_.
	(*fma_fnmadd_): Merge *fma4_fnmadd_.
	(*fma_fnmsub_): Merge *fma4_fnmsub_.
	(*fma_fmaddsub_): Merge *fma4_fmaddsub_.
	(*fma_fmsubadd_): Merge *fma4_fmsubadd_.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN.

I will wait a couple of days before backporting patches to 4.7, so
please Ganesh, test mainline if everything is OK.

BTW: With this patch, we can enable PTA_FMA4 for bdver2 target.

Uros.

Index: config/i386/i386.md
===================================================================
--- config/i386/i386.md (revision 190301)
+++ config/i386/i386.md (working copy)
@@ -641,7 +641,8 @@
 (define_attr "movu" "0,1" (const_string "0"))

 ;; Used to control the "enabled" attribute on a per-instruction basis.
-(define_attr "isa" "base,sse2,sse2_noavx,sse3,sse4,sse4_noavx,noavx,avx,avx2,noavx2,bmi2"
+(define_attr "isa" "base,sse2,sse2_noavx,sse3,sse4,sse4_noavx,noavx,avx,
+                    avx2,noavx2,bmi2,fma,fma4"
   (const_string "base"))

 (define_attr "enabled" ""
@@ -657,6 +658,9 @@
          (eq_attr "isa" "avx2") (symbol_ref "TARGET_AVX2")
          (eq_attr "isa" "noavx2") (symbol_ref "!TARGET_AVX2")
          (eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2")
+         (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA")
+         (eq_attr "isa" "fma4")
+           (symbol_ref "TARGET_FMA4 && !TARGET_FMA")
         ]
         (const_int 1)))

Index: config/i386/sse.md
===================================================================
--- config/i386/sse.md (revision 190304)
+++ config/i386/sse.md (working copy)
@@ -1891,21 +1891,6 @@
 (define_mode_iterator FMAMODE [SF DF V4SF V2DF V8SF V4DF])

-;; In order to match (*a * *b) + *c, particularly when vectorizing, allow
-;; combine to generate a multiply/add with two memory references. We then
-;; split this insn, into loading up the destination register with one of the
-;; memory operations. If we don't manage to split the insn, reload will
-;; generate the appropriate moves. The reason this is needed, is that combine
-;; has already folded one of the memory references into both the multiply and
-;; add insns, and it can't generate a new pseudo. I.e.:
-;; (set (reg1) (mem (addr1)))
-;; (set (reg2) (mult (reg1) (mem (addr2))))
-;; (set (reg3) (plus (reg2) (mem (addr3))))
-;;
-;; ??? This is historic, pre-dating the gimple fma transformation.
-;; We could now properly represent that only one memory operand is
-;; allowed and not be penalized during optimization.
-
 ;; The standard names for fma is only available with SSE math enabled.
 (define_expand "fma4"
   [(set (match_operand:FMAMODE 0 "register_operand")
@@ -1948,118 +1933,78 @@
           (match_operand:FMAMODE 3 "nonimmediate_operand")))]
   "TARGET_FMA || TARGET_FMA4")

-;; FMA3 version
-
 (define_insn "*fma_fmadd_"
-  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x,x,x")
         (fma:FMAMODE
-          (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x")
-          (match_operand:FMAMODE 2 "nonimmediate_operand" "xm, x,xm")
-          (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0")))]
-  "TARGET_FMA"
+          (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x, x,x")
+          (match_operand:FMAMODE 2 "nonimmediate_operand" "xm, x,xm,x,m")
+          (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0,xm,x")))]
+  "TARGET_FMA || TARGET_FMA4"
   "@
    vfmadd132\t{%2, %3, %0|%0, %3, %2}
    vfmadd213\t{%3, %2, %0|%0, %2, %3}
-   vfmadd231\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssemuladd")
+   vfmadd231\t{%2, %1, %0|%0, %1, %2}
+   vfmadd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vfmadd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "isa" "fma,fma,fma,fma4,fma4")
+   (set_attr "type" "ssemuladd")
    (set_attr "mode" "")])

 (define_insn "*fma_fmsub_"
-  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x,x,x")
         (fma:FMAMODE
-          (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x")
-          (match_operand:FMAMODE 2 "nonimmediate_operand" "xm, x,xm")
+          (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x, x,x")
+          (match_operand:FMAMODE 2 "nonimmediate_operand" "xm, x,xm,x,m")
           (neg:FMAMODE
-            (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0"))))]
-  "TARGET_FMA"
+            (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0,xm,x"))))]
+  "TARGET_FMA || TARGET_FMA4"
   "@
    vfmsub132\t{%2, %3, %0|%0, %3, %2}
    vfmsub213\t{%3, %2, %0|%0, %2, %3}
-   vfmsub231\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssemuladd")
+   vfmsub231\t{%2, %1, %0|%0, %1, %2}
+   vfmsub\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vfmsub\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "isa" "fma,fma,fma,fma4,fma4")
+   (set_attr "type" "ssemuladd")
    (set_attr "mode" "")])

 (define_insn "*fma_fnmadd_"
-  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x,x,x")
         (fma:FMAMODE
           (neg:FMAMODE
-            (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x"))
-          (match_operand:FMAMODE 2 "nonimmediate_operand" "xm, x,xm")
-          (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0")))]
-  "TARGET_FMA"
+            (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x, x,x"))
+          (match_operand:FMAMODE 2 "nonimmediate_operand" "xm, x,xm,x,m")
+          (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0,xm,x")))]
+  "TARGET_FMA || TARGET_FMA4"
   "@
    vfnmadd132\t{%2, %3, %0|%0, %3, %2}
    vfnmadd213\t{%3, %2, %0|%0, %2, %3}
-   vfnmadd231\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssemuladd")
+   vfnmadd231\t{%2, %1, %0|%0, %1, %2}
+   vfnmadd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vfnmadd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "isa" "fma,fma,fma,fma4,fma4")
+   (set_attr "type" "ssemuladd")
    (set_attr "mode" "")])

 (define_insn "*fma_fnmsub_"
-  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x")
+  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x,x,x,x")
         (fma:FMAMODE
           (neg:FMAMODE
-            (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x"))
-          (match_operand:FMAMODE 2 "nonimmediate_operand" "xm, x,xm")
+            (match_operand:FMAMODE 1 "nonimmediate_operand" "%0, 0,x, x,x"))
+          (match_operand:FMAMODE 2 "nonimmediate_operand" "xm, x,xm,x,m")
           (neg:FMAMODE
-            (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0"))))]
-  "TARGET_FMA"
+            (match_operand:FMAMODE 3 "nonimmediate_operand" " x,xm,0,xm,x"))))]
+  "TARGET_FMA || TARGET_FMA4"
   "@
    vfnmsub132\t{%2, %3, %0|%0, %3, %2}
    vfnmsub213\t{%3, %2, %0|%0, %2, %3}
-   vfnmsub231\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssemuladd")
+   vfnmsub231\t{%2, %1, %0|%0, %1, %2}
+   vfnmsub\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vfnmsub\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "isa" "fma,fma,fma,fma4,fma4")
+   (set_attr "type" "ssemuladd")
    (set_attr "mode" "")])

-;; FMA4 version
-
-(define_insn "*fma4_fmadd_"
-  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x")
-        (fma:FMAMODE
-          (match_operand:FMAMODE 1 "nonimmediate_operand" "%x,x")
-          (match_operand:FMAMODE 2 "nonimmediate_operand" " x,m")
-          (match_operand:FMAMODE 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4"
-  "vfmadd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "")])
-
-(define_insn "*fma4_fmsub_"
-  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x")
-        (fma:FMAMODE
-          (match_operand:FMAMODE 1 "nonimmediate_operand" "%x,x")
-          (match_operand:FMAMODE 2 "nonimmediate_operand" " x,m")
-          (neg:FMAMODE
-            (match_operand:FMAMODE 3 "nonimmediate_operand" "xm,x"))))]
-  "TARGET_FMA4"
-  "vfmsub\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "")])
-
-(define_insn "*fma4_fnmadd_"
-  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x")
-        (fma:FMAMODE
-          (neg:FMAMODE
-            (match_operand:FMAMODE 1 "nonimmediate_operand" "%x,x"))
-          (match_operand:FMAMODE 2 "nonimmediate_operand" " x,m")
-          (match_operand:FMAMODE 3 "nonimmediate_operand" "xm,x")))]
-  "TARGET_FMA4"
-  "vfnmadd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "")])
-
-(define_insn "*fma4_fnmsub_"
-  [(set (match_operand:FMAMODE 0 "register_operand" "=x,x")
-        (fma:FMAMODE
-          (neg:FMAMODE
-            (match_operand:FMAMODE 1 "nonimmediate_operand" "%x,x"))
-          (match_operand:FMAMODE 2 "nonimmediate_operand" " x,m")
-          (neg:FMAMODE
-            (match_operand:FMAMODE 3 "nonimmediate_operand" "xm,x"))))]
-  "TARGET_FMA4"
-  "vfnmsub\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "")])
-
 ;; FMA parallel floating point multiply addsub and subadd operations.

 ;; It would be possible to represent these without the UNSPEC as
@@ -2080,66 +2025,43 @@
           UNSPEC_FMADDSUB))]
   "TARGET_FMA || TARGET_FMA4")

-;; FMA3 version
-
 (define_insn "*fma_fmaddsub_"
-  [(set (match_operand:VF 0 "register_operand" "=x,x,x")
+  [(set (match_operand:VF 0 "register_operand" "=x,x,x,x,x")
         (unspec:VF
-          [(match_operand:VF 1 "nonimmediate_operand" "%0, 0,x")
-           (match_operand:VF 2 "nonimmediate_operand" "xm, x,xm")
-           (match_operand:VF 3 "nonimmediate_operand" " x,xm,0")]
+          [(match_operand:VF 1 "nonimmediate_operand" "%0, 0,x, x,x")
+           (match_operand:VF 2 "nonimmediate_operand" "xm, x,xm,x,m")
+           (match_operand:VF 3 "nonimmediate_operand" " x,xm,0,xm,x")]
           UNSPEC_FMADDSUB))]
-  "TARGET_FMA"
+  "TARGET_FMA || TARGET_FMA4"
   "@
    vfmaddsub132\t{%2, %3, %0|%0, %3, %2}
    vfmaddsub213\t{%3, %2, %0|%0, %2, %3}
-   vfmaddsub231\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssemuladd")
+   vfmaddsub231\t{%2, %1, %0|%0, %1, %2}
+   vfmaddsub\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vfmaddsub\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "isa" "fma,fma,fma,fma4,fma4")
+   (set_attr "type" "ssemuladd")
    (set_attr "mode" "")])

 (define_insn "*fma_fmsubadd_"
-  [(set (match_operand:VF 0 "register_operand" "=x,x,x")
+  [(set (match_operand:VF 0 "register_operand" "=x,x,x,x,x")
         (unspec:VF
-          [(match_operand:VF 1 "nonimmediate_operand" "%0, 0,x")
-           (match_operand:VF 2 "nonimmediate_operand" "xm, x,xm")
+          [(match_operand:VF 1 "nonimmediate_operand" "%0, 0,x, x,x")
+           (match_operand:VF 2 "nonimmediate_operand" "xm, x,xm,x,m")
            (neg:VF
-             (match_operand:VF 3 "nonimmediate_operand" " x,xm,0"))]
+             (match_operand:VF 3 "nonimmediate_operand" " x,xm,0,xm,x"))]
           UNSPEC_FMADDSUB))]
-  "TARGET_FMA"
+  "TARGET_FMA || TARGET_FMA4"
   "@
    vfmsubadd132\t{%2, %3, %0|%0, %3, %2}
    vfmsubadd213\t{%3, %2, %0|%0, %2, %3}
-   vfmsubadd231\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "ssemuladd")
+   vfmsubadd231\t{%2, %1, %0|%0, %1, %2}
+   vfmsubadd\t{%3, %2, %1, %0|%0, %1, %2, %3}
+   vfmsubadd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "isa" "fma,fma,fma,fma4,fma4")
+   (set_attr "type" "ssemuladd")
    (set_attr "mode" "")])

-;; FMA4 version
-
-(define_insn "*fma4_fmaddsub_"
-  [(set (match_operand:VF 0 "register_operand" "=x,x")
-        (unspec:VF
-          [(match_operand:VF 1 "nonimmediate_operand" "%x,x")
-           (match_operand:VF 2 "nonimmediate_operand" " x,m")
-           (match_operand:VF 3 "nonimmediate_operand" "xm,x")]
-          UNSPEC_FMADDSUB))]
-  "TARGET_FMA4"
-  "vfmaddsub\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "")])
-
-(define_insn "*fma4_fmsubadd_"
-  [(set (match_operand:VF 0 "register_operand" "=x,x")
-        (unspec:VF
-          [(match_operand:VF 1 "nonimmediate_operand" "%x,x")
-           (match_operand:VF 2 "nonimmediate_operand" " x,m")
-           (neg:VF
-             (match_operand:VF 3 "nonimmediate_operand" "xm,x"))]
-          UNSPEC_FMADDSUB))]
-  "TARGET_FMA4"
-  "vfmsubadd\t{%3, %2, %1, %0|%0, %1, %2, %3}"
-  [(set_attr "type" "ssemuladd")
-   (set_attr "mode" "")])
-
 ;; FMA3 floating point scalar intrinsics. These merge result with
 ;; high-order elements from the destination register.
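
As a reading aid for the patch above, here is a minimal Python sketch of how the new "fma" and "fma4" values of the "isa" attribute would gate the five alternatives of the merged fmadd pattern. It is a toy model, not GCC code: the names ALTERNATIVES, enabled and enabled_alternatives are hypothetical, the mnemonics are copied as they appear in the diff, and only the two conditions added to the "enabled" attribute are modelled.

```python
# Toy model of the "enabled" attribute for the merged fmadd pattern.
# All names here are illustrative assumptions; none of them are GCC APIs.

# One entry per alternative, in pattern order: (isa attribute, mnemonic).
ALTERNATIVES = [
    ("fma", "vfmadd132"),
    ("fma", "vfmadd213"),
    ("fma", "vfmadd231"),
    ("fma4", "vfmadd"),
    ("fma4", "vfmadd"),
]


def enabled(isa, target_fma, target_fma4):
    """Mirror the two conditions the patch adds to the "enabled" attribute."""
    if isa == "fma":
        return target_fma
    if isa == "fma4":
        # FMA4 alternatives are only enabled when FMA3 is not available.
        return target_fma4 and not target_fma
    return True  # other isa values are outside this sketch


def enabled_alternatives(target_fma, target_fma4):
    """Return the indices of the alternatives that stay enabled."""
    return [
        index
        for index, (isa, _mnemonic) in enumerate(ALTERNATIVES)
        if enabled(isa, target_fma, target_fma4)
    ]


if __name__ == "__main__":
    for fma, fma4 in [(True, False), (False, True), (True, True)]:
        print(f"TARGET_FMA={fma} TARGET_FMA4={fma4}:",
              enabled_alternatives(fma, fma4))
```

Under this model, a target with both TARGET_FMA and TARGET_FMA4 set keeps only the three FMA3 alternatives, so which form is chosen no longer depends on the order in which the patterns are defined.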