From patchwork Fri Nov 9 03:39:04 2012
X-Patchwork-Submitter: "Gopalasubramanian, Ganesh"
X-Patchwork-Id: 197922
From: "Gopalasubramanian, Ganesh"
To: "gcc-patches@gcc.gnu.org"
CC: "Uros Bizjak (ubizjak@gmail.com)"
Subject: [PATCH, i386]: AMD bdver3 enablement
Date: Fri, 9 Nov 2012 03:39:04 +0000

Hi,

The changes requested in the review comments have been made. The "sseshuf" type attribute is no longer set conditionally. Instead, a new type attribute is added and is included in the other attribute calculations.

The patch is attached (difflog.txt). The new file (bdver3.md) describing the pipelines is also attached.

Bootstrapping and "make -k check" pass.

OK for upstream?

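For anyone who wants to check the result from user code, here is a minimal sketch that probes the predefined macros added by the i386-c.c hunk of this patch (__bdver3__ in the architecture switch, __tune_bdver3__ in the tuning switch). The function names are hypothetical and only for illustration, and the mapping to -march=bdver3 / -mtune=bdver3 is my assumption from how the existing bdver2 macros behave. On a compiler without this patch both functions simply report that bdver3 is not selected.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the patch): returns 1 if this
   translation unit was compiled with bdver3 as the target architecture,
   which is assumed to correspond to -march=bdver3.  */
int built_for_bdver3 (void)
{
#if defined (__bdver3__)
  return 1;
#else
  return 0;
#endif
}

/* Hypothetical helper: names the tuning target, using the
   __tune_bdver3__ macro from the patch and the existing
   __tune_bdver2__ macro; everything else is reported as "other".  */
const char *tuning_name (void)
{
#if defined (__tune_bdver3__)
  return "bdver3";
#elif defined (__tune_bdver2__)
  return "bdver2";
#else
  return "other";
#endif
}

/* Prints both results on one line.  */
void print_bdver3_status (void)
{
  printf ("arch is bdver3: %d, tuned for: %s\n",
          built_for_bdver3 (), tuning_name ());
}
```

Calling print_bdver3_status () from main and comparing builds with and without the bdver3 options should show whether the new macros are visible.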
2012-11-09  Ganesh Gopalasubramanian

        bdver3 Enablement
        * gcc/doc/extend.texi: Add details about bdver3.
        * gcc/doc/invoke.texi: Add details about bdver3.
        * config.gcc (i[34567]86-*-linux* | ...): Add bdver3.
        (case ${target}): Add bdver3.
        * config/i386/i386.h (TARGET_BDVER3): New definition.
        * config/i386/i386.md (define_attr "cpu"): Add bdver3.
        * config/i386/sse.md (sseshuf): New type attribute.
        * config/i386/athlon.md (sseshuf): Likewise.
        * config/i386/atom.md (sseshuf): Likewise.
        * config/i386/ppro.md (sseshuf): Likewise.
        * config/i386/bdver1.md (sseshuf): Likewise.
        * config/i386/i386.opt (flag_dispatch_scheduler): Add bdver3.
        * config/i386/i386-c.c (ix86_target_macros_internal): Add bdver3
        def_or_undef.
        * config/i386/driver-i386.c (host_detect_local_cpu): Let
        -march=native recognize bdver3 processors.
        * config/i386/i386.c (struct processor_costs bdver3_cost): New.
        (m_BDVER3): New definition.
        (m_AMD_MULTIPLE): Includes m_BDVER3.
        (initial_ix86_tune_features): Add bdver3 tune.
        (processor_target_table): Add bdver3 entry.
        (static const char *const cpu_names): Add bdver3 entry.
        (software_prefetching_beneficial_p): Add bdver3.
        (ix86_option_override_internal): Add bdver3 instruction sets.
        (ix86_option_override_internal): Remove XSAVEOPT for bdver1 and bdver2.
        (ix86_issue_rate): Add bdver3.
        (ix86_adjust_cost): Add bdver3.
        (enum target_cpu_default): Add TARGET_CPU_DEFAULT_bdver3.
        (enum processor_type): Add PROCESSOR_BDVER3.
        * config/i386/bdver3.md: New file describing bdver3 pipelines.

Regards
Ganesh

-----Original Message-----
From: Uros Bizjak [mailto:ubizjak@gmail.com]
Sent: Monday, November 05, 2012 1:37 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, i386]: AMD bdver3 enablement

On Mon, Nov 5, 2012 at 8:33 AM, Gopalasubramanian, Ganesh wrote:

> Couple of changes done with respect to the review comments.
>
> 1. sseshuf type attribute is handled in unit attribute calculation.
> 2. sseadd1 instruction attribute is handled in the new scheduler descriptions.
>
> The patch is attached as (patch.txt).
> The new file (bdver3.md) describing the pipelines is also attached.

-  [(set_attr "type" "sselog")
+  [(set (attr "type")
+      (if_then_else (eq_attr "cpu" "bdver3")
+                    (const_string "sseshuf")
+                    (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "vex")
    (set_attr "mode" "V8SF")])
@@ -3911,7 +3914,10 @@
     }
 }
   [(set_attr "isa" "noavx,avx")
-   (set_attr "type" "sselog")
+   (set (attr "type")
+      (if_then_else (eq_attr "cpu" "bdver3")
+                    (const_string "sseshuf")
+                    (const_string "sselog")))
    (set_attr "length_immediate" "1")
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "V4SF")])
@@ -4018,7 +4024,27 @@
    vmovlps\t{%2, %1, %0|%0, %1, %2}
    %vmovlps\t{%2, %0|%0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
-   (set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov")
+   (set (attr "type")
+      (cond [(and (eq_attr "cpu" "bdver3")
+                  (eq_attr "alternative" "0"))
+               (const_string "sseshuf")
+             (and (eq_attr "cpu" "bdver3")
+                  (eq_attr "alternative" "1"))
+               (const_string "sseshuf")
+             (eq_attr "alternative" "2")
+               (const_string "ssemov")
+             (eq_attr "alternative" "3")
+               (const_string "ssemov")
+             (eq_attr "alternative" "4")
+               (const_string "ssemov")
+             (and (not (eq_attr "cpu" "bdver3"))
+                  (eq_attr "alternative" "0"))
+               (const_string "sselog")
+             (and (not (eq_attr "cpu" "bdver3"))
+                  (eq_attr "alternative" "1"))
+               (const_string "sselog")
+            ]
+            (const_string "*" )))
    (set_attr "length_immediate" "1,1,*,*,*")
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
@@ -4072,7 +4098,23 @@
    vbroadcastss\t{%1, %0|%0, %1}
    shufps\t{$0, %0, %0|%0, %0, 0}"
   [(set_attr "isa" "avx,avx,noavx")
-   (set_attr "type" "sselog1,ssemov,sselog1")
+   (set (attr "type")
+      (cond [(and (eq_attr "cpu" "bdver3")
+                  (eq_attr "alternative" "0"))
+               (const_string "sseshuf")
+             (and (eq_attr "cpu" "bdver3")
+                  (eq_attr "alternative" "2"))
+               (const_string "sseshuf")
+             (eq_attr "alternative" "1")
+               (const_string "ssemov")
+             (and (not (eq_attr "cpu" "bdver3"))
+                  (eq_attr "alternative" "0"))
+               (const_string "sselog1")
+             (and (not (eq_attr "cpu" "bdver3"))
+                  (eq_attr "alternative" "2"))
+               (const_string "sselog1")
+            ]
+            (const_string "*" )))

Please don't conditionally change type attribute. Change sselog{,1}
attribute unconditionally to sseshuf{,1} and handle them in the same
way as sselog{,1}. In other words, add new attributes to all places
where original attributes are handled.

Otherwise, the patch looks good.

Uros.

Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi (revision 193132)
+++ gcc/doc/extend.texi (working copy)
@@ -9608,6 +9608,9 @@
 @item bdver2
 AMD family 15h Bulldozer version 2.
 
+@item bdver3
+AMD family 15h Bulldozer version 3.
+
 @item btver2
 AMD family 16h CPU.
 @end table
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi (revision 193132)
+++ gcc/doc/invoke.texi (working copy)
@@ -13678,6 +13678,11 @@
 supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
 SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set
 extensions.)
+@item bdver3
+AMD Family 15h core based CPUs with x86-64 instruction set support. (This
+supersets BMI, TBM, F16C, FMA, AVX, XOP, LWP, AES, PCL_MUL, CX16, MMX, SSE,
+SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM and 64-bit instruction set
+extensions.)
 @item btver1
 CPUs based on AMD Family 14h cores with x86-64 instruction set support.
(This Index: gcc/config.gcc =================================================================== --- gcc/config.gcc (revision 193132) +++ gcc/config.gcc (working copy) @@ -1269,7 +1269,7 @@ TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 's/^,//'` need_64bit_isa=yes case X"${with_cpu}" in - Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) + Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) ;; X) if test x$with_cpu_64 = x; then @@ -1278,7 +1278,7 @@ ;; *) echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2 - echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2 + echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2 exit 1 ;; esac @@ -1390,7 +1390,7 @@ tmake_file="$tmake_file i386/t-sol2-64" need_64bit_isa=yes case X"${with_cpu}" in - Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) + Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) ;; X) if test x$with_cpu_64 = x; then @@ -1399,7 +1399,7 @@ ;; *) echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2 - echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2 + echo 
"generic atom core2 corei7 corei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2 exit 1 ;; esac @@ -1456,7 +1456,7 @@ if test x$enable_targets = xall; then tm_defines="${tm_defines} TARGET_BI_ARCH=1" case X"${with_cpu}" in - Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) + Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver3|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) ;; X) if test x$with_cpu_64 = x; then @@ -1465,7 +1465,7 @@ ;; *) echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2 - echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2 + echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver3 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2 exit 1 ;; esac @@ -2706,6 +2706,10 @@ ;; i686-*-* | i786-*-*) case ${target_noncanonical} in + bdver3-*) + arch=bdver3 + cpu=bdver3 + ;; bdver2-*) arch=bdver2 cpu=bdver2 @@ -2807,6 +2811,10 @@ ;; x86_64-*-*) case ${target_noncanonical} in + bdver3-*) + arch=bdver3 + cpu=bdver3 + ;; bdver2-*) arch=bdver2 cpu=bdver2 @@ -3344,8 +3352,8 @@ ;; "" | x86-64 | generic | native \ | k8 | k8-sse3 | athlon64 | athlon64-sse3 | opteron \ - | opteron-sse3 | athlon-fx | bdver2 | bdver1 | btver2 | btver1 \ - | amdfam10 | barcelona | nocona | core2 | corei7 \ + | opteron-sse3 | athlon-fx | bdver3 | bdver2 | bdver1 | btver2 \ + | btver1 | amdfam10 | barcelona | nocona | core2 | corei7 \ | corei7-avx | core-avx-i | core-avx2 | atom) # OK ;; Index: gcc/config/i386/i386.h 
=================================================================== --- gcc/config/i386/i386.h (revision 193132) +++ gcc/config/i386/i386.h (working copy) @@ -254,6 +254,7 @@ #define TARGET_AMDFAM10 (ix86_tune == PROCESSOR_AMDFAM10) #define TARGET_BDVER1 (ix86_tune == PROCESSOR_BDVER1) #define TARGET_BDVER2 (ix86_tune == PROCESSOR_BDVER2) +#define TARGET_BDVER3 (ix86_tune == PROCESSOR_BDVER3) #define TARGET_BTVER1 (ix86_tune == PROCESSOR_BTVER1) #define TARGET_BTVER2 (ix86_tune == PROCESSOR_BTVER2) #define TARGET_ATOM (ix86_tune == PROCESSOR_ATOM) @@ -616,6 +617,7 @@ TARGET_CPU_DEFAULT_amdfam10, TARGET_CPU_DEFAULT_bdver1, TARGET_CPU_DEFAULT_bdver2, + TARGET_CPU_DEFAULT_bdver3, TARGET_CPU_DEFAULT_btver1, TARGET_CPU_DEFAULT_btver2, @@ -2098,6 +2100,7 @@ PROCESSOR_AMDFAM10, PROCESSOR_BDVER1, PROCESSOR_BDVER2, + PROCESSOR_BDVER3, PROCESSOR_BTVER1, PROCESSOR_BTVER2, PROCESSOR_ATOM, Index: gcc/config/i386/i386.md =================================================================== --- gcc/config/i386/i386.md (revision 193132) +++ gcc/config/i386/i386.md (working copy) @@ -323,7 +323,7 @@ ;; Processor type. (define_attr "cpu" "none,pentium,pentiumpro,geode,k6,athlon,k8,core2,corei7, - atom,generic64,amdfam10,bdver1,bdver2,btver1,btver2" + atom,generic64,amdfam10,bdver1,bdver2,bdver3,btver1,btver2" (const (symbol_ref "ix86_schedule"))) ;; A basic instruction type. 
Refinements due to arguments to be @@ -336,9 +336,9 @@ push,pop,call,callv,leave, str,bitmanip, fmov,fop,fsgn,fmul,fdiv,fpspc,fcmov,fcmp,fxch,fistp,fisttp,frndint, - sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul, - sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt, - ssediv,sseins,ssemuladd,sse4arg,lwp, + sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul,sse, + ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt, + sseshuf,sseshuf1,ssediv,sseins,ssemuladd,sse4arg,lwp, mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft" (const_string "other")) @@ -353,7 +353,7 @@ (const_string "i387") (eq_attr "type" "sselog,sselog1,sseiadd,sseiadd1,sseishft,sseishft1,sseimul, sse,ssemov,sseadd,sseadd1,ssemul,ssecmp,ssecomi,ssecvt, - ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg") + sseshuf,sseshuf1,ssecvt1,sseicvt,ssediv,sseins,ssemuladd,sse4arg") (const_string "sse") (eq_attr "type" "mmx,mmxmov,mmxadd,mmxmul,mmxcmp,mmxcvt,mmxshft") (const_string "mmx") @@ -594,7 +594,7 @@ (if_then_else (match_operand 1 "constant_call_address_operand") (const_string "none") (const_string "load")) - (and (eq_attr "type" "alu1,negnot,ishift1,sselog1") + (and (eq_attr "type" "alu1,negnot,ishift1,sselog1,sseshuf1") (match_operand 1 "memory_operand")) (const_string "both") (and (match_operand 0 "memory_operand") @@ -609,7 +609,7 @@ imov,imovx,icmp,test,bitmanip, fmov,fcmp,fsgn, sse,ssemov,ssecmp,ssecomi,ssecvt,ssecvt1,sseicvt,sselog1, - sseadd1,sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt") + sseshuf1,sseadd1,sseiadd1,mmx,mmxmov,mmxcmp,mmxcvt") (match_operand 2 "memory_operand")) (const_string "load") (and (eq_attr "type" "icmov,ssemuladd,sse4arg") @@ -947,6 +947,7 @@ (include "k6.md") (include "athlon.md") (include "bdver1.md") +(include "bdver3.md") (include "geode.md") (include "atom.md") (include "core2.md") Index: gcc/config/i386/athlon.md =================================================================== --- gcc/config/i386/athlon.md (revision 193132) 
+++ gcc/config/i386/athlon.md (working copy) @@ -736,6 +736,36 @@ (eq_attr "type" "sselog,sselog1")) "athlon-direct,athlon-fpsched,(athlon-fadd|athlon-fmul)") +;;SSE shuffle operations +(define_insn_reservation "athlon_sseshuf_load" 3 + (and (eq_attr "cpu" "athlon") + (and (eq_attr "type" "sseshuf,sseshuf1") + (eq_attr "memory" "load"))) + "athlon-vector,athlon-fpload2,(athlon-fmul*2)") +(define_insn_reservation "athlon_sseshuf_load_k8" 5 + (and (eq_attr "cpu" "k8,generic64") + (and (eq_attr "type" "sseshuf,sseshuf1") + (eq_attr "memory" "load"))) + "athlon-double,athlon-fpload2k8,(athlon-fmul*2)") +(define_insn_reservation "athlon_sseshuf_load_amdfam10" 4 + (and (eq_attr "cpu" "amdfam10") + (and (eq_attr "type" "sseshuf,sseshuf1") + (eq_attr "memory" "load"))) + "athlon-direct,athlon-fploadk8,(athlon-fadd|athlon-fmul)") + +(define_insn_reservation "athlon_sseshuf" 3 + (and (eq_attr "cpu" "athlon") + (eq_attr "type" "sseshuf,sseshuf1")) + "athlon-vector,athlon-fpsched,athlon-fmul*2") +(define_insn_reservation "athlon_sseshuf_k8" 3 + (and (eq_attr "cpu" "k8,generic64") + (eq_attr "type" "sseshuf,sseshuf1")) + "athlon-double,athlon-fpsched,athlon-fmul") +(define_insn_reservation "athlon_sseshuf_amdfam10" 2 + (and (eq_attr "cpu" "amdfam10") + (eq_attr "type" "sseshuf,sseshuf1")) + "athlon-direct,athlon-fpsched,(athlon-fadd|athlon-fmul)") + ;; ??? pcmp executes in addmul, probably not worthwhile to bother about that. 
(define_insn_reservation "athlon_ssecmp_load" 2 (and (eq_attr "cpu" "athlon") Index: gcc/config/i386/atom.md =================================================================== --- gcc/config/i386/atom.md (revision 193132) +++ gcc/config/i386/atom.md (working copy) @@ -455,6 +455,30 @@ (eq_attr "memory" "!none"))) "atom-simple-0") +(define_insn_reservation "atom_sseshuf" 1 + (and (eq_attr "cpu" "atom") + (and (eq_attr "type" "sseshuf") + (eq_attr "memory" "none"))) + "atom-simple-either") + +(define_insn_reservation "atom_sseshuf_mem" 1 + (and (eq_attr "cpu" "atom") + (and (eq_attr "type" "sseshuf") + (eq_attr "memory" "!none"))) + "atom-simple-either") + +(define_insn_reservation "atom_sseshuf1" 1 + (and (eq_attr "cpu" "atom") + (and (eq_attr "type" "sseshuf1") + (eq_attr "memory" "none"))) + "atom-simple-0") + +(define_insn_reservation "atom_sseshuf1_mem" 1 + (and (eq_attr "cpu" "atom") + (and (eq_attr "type" "sseshuf1") + (eq_attr "memory" "!none"))) + "atom-simple-0") + ;; not pmad, not psad (define_insn_reservation "atom_sseiadd" 1 (and (eq_attr "cpu" "atom") @@ -743,8 +767,8 @@ atom_imul_mem, atom_icmp_mem, atom_test_mem, atom_icmov_mem, atom_sselog_mem, atom_sselog1_mem, atom_fmov_mem, atom_sseadd_mem, - atom_ishift_mem, atom_ishift1_mem, - atom_rotate_mem, atom_rotate1_mem" + atom_ishift_mem, atom_ishift1_mem, atom_sseshuf_mem, + atom_sseshuf1_mem, atom_rotate_mem, atom_rotate1_mem" "ix86_agi_dependent") ;; Stall from imul to lea is 8 cycles. 
@@ -757,7 +781,8 @@ atom_ishift_mem, atom_ishift1_mem, atom_rotate_mem, atom_rotate1_mem, atom_imul_mem, atom_icmp_mem, atom_test_mem, atom_icmov_mem, atom_sselog_mem, - atom_sselog1_mem, atom_fmov_mem, atom_sseadd_mem" + atom_sselog1_mem, atom_fmov_mem, atom_sseadd_mem, + atom_sseshuf_mem, atom_sseshuf1_mem" "ix86_agi_dependent") ;; There will be 0 cycle stall from cmp/test to jcc Index: gcc/config/i386/ppro.md =================================================================== --- gcc/config/i386/ppro.md (revision 193132) +++ gcc/config/i386/ppro.md (working copy) @@ -700,6 +700,20 @@ (eq_attr "type" "sselog,sselog1")))) "decoder0,(p2+p1)") +(define_insn_reservation "ppro_sse_shuf_V4SF" 2 + (and (eq_attr "cpu" "pentiumpro") + (and (eq_attr "memory" "none") + (and (eq_attr "mode" "V4SF") + (eq_attr "type" "sseshuf,sseshuf1")))) + "decodern,p1") + +(define_insn_reservation "ppro_sse_shuf_V4SF_load" 2 + (and (eq_attr "cpu" "pentiumpro") + (and (eq_attr "memory" "load") + (and (eq_attr "mode" "V4SF") + (eq_attr "type" "sseshuf,sseshuf1")))) + "decoder0,(p2+p1)") + (define_insn_reservation "ppro_sse_mov_V4SF" 1 (and (eq_attr "cpu" "pentiumpro") (and (eq_attr "memory" "none") Index: gcc/config/i386/sse.md =================================================================== --- gcc/config/i386/sse.md (revision 193132) +++ gcc/config/i386/sse.md (working copy) @@ -3860,7 +3860,7 @@ return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}"; } - [(set_attr "type" "sselog") + [(set_attr "type" "sseshuf") (set_attr "length_immediate" "1") (set_attr "prefix" "vex") (set_attr "mode" "V8SF")]) @@ -3911,7 +3911,7 @@ } } [(set_attr "isa" "noavx,avx") - (set_attr "type" "sselog") + (set_attr "type" "sseshuf") (set_attr "length_immediate" "1") (set_attr "prefix" "orig,vex") (set_attr "mode" "V4SF")]) @@ -4018,7 +4018,7 @@ vmovlps\t{%2, %1, %0|%0, %1, %2} %vmovlps\t{%2, %0|%0, %2}" [(set_attr "isa" "noavx,avx,noavx,avx,*") - (set_attr "type" "sselog,sselog,ssemov,ssemov,ssemov") + (set_attr 
"type" "sseshuf,sseshuf,ssemov,ssemov,ssemov") (set_attr "length_immediate" "1,1,*,*,*") (set_attr "prefix" "orig,vex,orig,vex,maybe_vex") (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")]) @@ -4072,7 +4072,7 @@ vbroadcastss\t{%1, %0|%0, %1} shufps\t{$0, %0, %0|%0, %0, 0}" [(set_attr "isa" "avx,avx,noavx") - (set_attr "type" "sselog1,ssemov,sselog1") + (set_attr "type" "sseshuf1,ssemov,sseshuf1") (set_attr "length_immediate" "1,0,1") (set_attr "prefix_extra" "0,1,*") (set_attr "prefix" "vex,vex,orig") @@ -4802,7 +4802,7 @@ return "vshufpd\t{%3, %2, %1, %0|%0, %1, %2, %3}"; } - [(set_attr "type" "sselog") + [(set_attr "type" "sseshuf") (set_attr "length_immediate" "1") (set_attr "prefix" "vex") (set_attr "mode" "V4DF")]) @@ -4916,7 +4916,7 @@ } } [(set_attr "isa" "noavx,avx") - (set_attr "type" "sselog") + (set_attr "type" "sseshuf") (set_attr "length_immediate" "1") (set_attr "prefix" "orig,vex") (set_attr "mode" "V2DF")]) Index: gcc/config/i386/i386-c.c =================================================================== --- gcc/config/i386/i386-c.c (revision 193132) +++ gcc/config/i386/i386-c.c (working copy) @@ -114,6 +114,10 @@ def_or_undef (parse_in, "__bdver2"); def_or_undef (parse_in, "__bdver2__"); break; + case PROCESSOR_BDVER3: + def_or_undef (parse_in, "__bdver3"); + def_or_undef (parse_in, "__bdver3__"); + break; case PROCESSOR_BTVER1: def_or_undef (parse_in, "__btver1"); def_or_undef (parse_in, "__btver1__"); @@ -209,7 +213,10 @@ case PROCESSOR_BDVER2: def_or_undef (parse_in, "__tune_bdver2__"); break; - case PROCESSOR_BTVER1: + case PROCESSOR_BDVER3: + def_or_undef (parse_in, "__tune_bdver3__"); + break; + case PROCESSOR_BTVER1: def_or_undef (parse_in, "__tune_btver1__"); break; case PROCESSOR_BTVER2: Index: gcc/config/i386/i386.opt =================================================================== --- gcc/config/i386/i386.opt (revision 193132) +++ gcc/config/i386/i386.opt (working copy) @@ -419,7 +419,7 @@ mdispatch-scheduler Target RejectNegative 
Var(flag_dispatch_scheduler) -Do dispatch scheduling if processor is bdver1 or bdver2 and Haifa scheduling +Do dispatch scheduling if processor is bdver1 or bdver2 or bdver3 and Haifa scheduling is selected. mprefer-avx128 Index: gcc/config/i386/bdver1.md =================================================================== --- gcc/config/i386/bdver1.md (revision 193132) +++ gcc/config/i386/bdver1.md (working copy) @@ -501,6 +501,28 @@ (eq_attr "type" "sselog,sselog1")) "bdver1-direct,bdver1-fpsched,bdver1-fxbar") +;; SSE shuffles +(define_insn_reservation "bdver1_sseshuf_load_256" 7 + (and (eq_attr "cpu" "bdver1,bdver2") + (and (eq_attr "type" "sseshuf,sseshuf1") + (and (eq_attr "mode" "V8SF") + (eq_attr "memory" "load")))) + "bdver1-double,bdver1-fpload,bdver1-fmal") +(define_insn_reservation "bdver1_sseshuf_load" 6 + (and (eq_attr "cpu" "bdver1,bdver2") + (and (eq_attr "type" "sseshuf,sseshuf1") + (eq_attr "memory" "load"))) + "bdver1-direct,bdver1-fpload,bdver1-fxbar") +(define_insn_reservation "bdver1_sseshuf_256" 3 + (and (eq_attr "cpu" "bdver1,bdver2") + (and (eq_attr "type" "sseshuf,sseshuf1") + (eq_attr "mode" "V8SF"))) + "bdver1-double,bdver1-fpsched,bdver1-fmal") +(define_insn_reservation "bdver1_sseshuf" 2 + (and (eq_attr "cpu" "bdver1,bdver2") + (eq_attr "type" "sseshuf,sseshuf1")) + "bdver1-direct,bdver1-fpsched,bdver1-fxbar") + ;; PCMP actually executes in FMAL. 
(define_insn_reservation "bdver1_ssecmp_load" 6 (and (eq_attr "cpu" "bdver1,bdver2") Index: gcc/config/i386/driver-i386.c =================================================================== --- gcc/config/i386/driver-i386.c (revision 193132) +++ gcc/config/i386/driver-i386.c (working copy) @@ -542,6 +542,8 @@ processor = PROCESSOR_GEODE; else if (has_movbe) processor = PROCESSOR_BTVER2; + else if (has_xsaveopt) + processor = PROCESSOR_BDVER3; else if (has_bmi) processor = PROCESSOR_BDVER2; else if (has_xop) @@ -712,6 +714,9 @@ case PROCESSOR_BDVER2: cpu = "bdver2"; break; + case PROCESSOR_BDVER3: + cpu = "bdver3"; + break; case PROCESSOR_BTVER1: cpu = "btver1"; break; Index: gcc/config/i386/i386.c =================================================================== --- gcc/config/i386/i386.c (revision 193132) +++ gcc/config/i386/i386.c (working copy) @@ -1427,6 +1427,85 @@ 1, /* cond_not_taken_branch_cost. */ }; +struct processor_costs bdver3_cost = { + COSTS_N_INSNS (1), /* cost of an add instruction */ + COSTS_N_INSNS (1), /* cost of a lea instruction */ + COSTS_N_INSNS (1), /* variable shift costs */ + COSTS_N_INSNS (1), /* constant shift costs */ + {COSTS_N_INSNS (4), /* cost of starting multiply for QI */ + COSTS_N_INSNS (4), /* HI */ + COSTS_N_INSNS (4), /* SI */ + COSTS_N_INSNS (6), /* DI */ + COSTS_N_INSNS (6)}, /* other */ + 0, /* cost of multiply per each bit set */ + {COSTS_N_INSNS (19), /* cost of a divide/mod for QI */ + COSTS_N_INSNS (35), /* HI */ + COSTS_N_INSNS (51), /* SI */ + COSTS_N_INSNS (83), /* DI */ + COSTS_N_INSNS (83)}, /* other */ + COSTS_N_INSNS (1), /* cost of movsx */ + COSTS_N_INSNS (1), /* cost of movzx */ + 8, /* "large" insn */ + 9, /* MOVE_RATIO */ + 4, /* cost for loading QImode using movzbl */ + {5, 5, 4}, /* cost of loading integer registers + in QImode, HImode and SImode. + Relative to reg-reg move (2). 
*/ + {4, 4, 4}, /* cost of storing integer registers */ + 2, /* cost of reg,reg fld/fst */ + {5, 5, 12}, /* cost of loading fp registers + in SFmode, DFmode and XFmode */ + {4, 4, 8}, /* cost of storing fp registers + in SFmode, DFmode and XFmode */ + 2, /* cost of moving MMX register */ + {4, 4}, /* cost of loading MMX registers + in SImode and DImode */ + {4, 4}, /* cost of storing MMX registers + in SImode and DImode */ + 2, /* cost of moving SSE register */ + {4, 4, 4}, /* cost of loading SSE registers + in SImode, DImode and TImode */ + {4, 4, 4}, /* cost of storing SSE registers + in SImode, DImode and TImode */ + 2, /* MMX or SSE register to integer */ + 16, /* size of l1 cache. */ + 2048, /* size of l2 cache. */ + 64, /* size of prefetch block */ + /* New AMD processors never drop prefetches; if they cannot be performed + immediately, they are queued. We set number of simultaneous prefetches + to a large constant to reflect this (it probably is not a good idea not + to limit number of prefetches at all, as their execution also takes some + time). */ + 100, /* number of parallel prefetches */ + 2, /* Branch cost */ + COSTS_N_INSNS (6), /* cost of FADD and FSUB insns. */ + COSTS_N_INSNS (6), /* cost of FMUL instruction. */ + COSTS_N_INSNS (42), /* cost of FDIV instruction. */ + COSTS_N_INSNS (2), /* cost of FABS instruction. */ + COSTS_N_INSNS (2), /* cost of FCHS instruction. */ + COSTS_N_INSNS (52), /* cost of FSQRT instruction. */ + + /* BDVER3 has optimized REP instruction for medium sized blocks, but for + very small blocks it is better to use loop. For large blocks, libcall + can do nontemporary accesses and beat inline considerably. 
*/ + {{libcall, {{6, loop}, {14, unrolled_loop}, {-1, rep_prefix_4_byte}}}, + {libcall, {{16, loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}}, + {{libcall, {{8, loop}, {24, unrolled_loop}, + {2048, rep_prefix_4_byte}, {-1, libcall}}}, + {libcall, {{48, unrolled_loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}}, + 6, /* scalar_stmt_cost. */ + 4, /* scalar load_cost. */ + 4, /* scalar_store_cost. */ + 6, /* vec_stmt_cost. */ + 0, /* vec_to_scalar_cost. */ + 2, /* scalar_to_vec_cost. */ + 4, /* vec_align_load_cost. */ + 4, /* vec_unalign_load_cost. */ + 4, /* vec_store_cost. */ + 2, /* cond_taken_branch_cost. */ + 1, /* cond_not_taken_branch_cost. */ +}; + struct processor_costs btver1_cost = { COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (2), /* cost of a lea instruction */ @@ -1987,7 +2066,8 @@ #define m_AMDFAM10 (1<