From patchwork Mon Jul 11 18:38:26 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: harsha.jagasia@amd.com
X-Patchwork-Id: 104268
Message-ID: <20110711183826.4035.38768.sendpatchset@gccpike1.amd.com>
Subject: AMD bdver2 enablement.
Date: Mon, 11 Jul 2011 13:38:26 -0500
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

Hi,

This patch provides basic enablement for AMD's upcoming bdver2 processor.
It defines -march=bdver2 and -mtune=bdver2, and lets -march=native
correctly recognize bdver2. At the moment the tuning is mostly a copy of
bdver1.

The patch passed bootstrap and the x86 tests.

Is it OK to commit to trunk?

Thanks,
Harsha

2011-07-11  Harsha Jagasia

	AMD bdver2 Enablement
	* config.gcc (i[34567]86-*-linux* | ...): Add bdver2.
	(case ${target}): Add bdver2.
	* config/i386/driver-i386.c (host_detect_local_cpu): Let
	-march=native recognize bdver2 processors.
	* config/i386/i386-c.c (ix86_target_macros_internal): Add
	bdver2 def_or_undef.
	* config/i386/i386.c (struct processor_costs bdver2_cost): New
	bdver2 cost table.
	(m_BDVER2): New definition.
	(m_AMD_MULTIPLE): Includes m_BDVER2.
(initial_ix86_tune_features): Add bdver2 tuning. (processor_target_table): Add bdver2 entry. (static const char *const cpu_names): Add bdver2 entry. (ix86_option_override_internal): Add bdver2 instruction sets. (ix86_issue_rate): Add bdver2. (ix86_adjust_cost): Add bdver2. (has_dispatch): Add bdver2. * config/i386/i386.h (TARGET_BDVER2): New definition. (enum target_cpu_default): Add TARGET_CPU_DEFAULT_bdver2. (enum processor_type): Add PROCESSOR_BDVER2. * config/i386/i386.md (define_attr "cpu"): Add bdver2. * config/i386/i386.opt ( mdispatch-scheduler): Add bdver2 to description. Index: gcc/config.gcc =================================================================== --- gcc/config.gcc (revision 175929) +++ gcc/config.gcc (working copy) @@ -1283,7 +1283,7 @@ i[34567]86-*-linux* | i[34567]86-*-kfree need_64bit_hwint=yes need_64bit_isa=yes case X"${with_cpu}" in - Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver1|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) + Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) ;; X) if test x$with_cpu_64 = x; then @@ -1292,7 +1292,7 @@ i[34567]86-*-linux* | i[34567]86-*-kfree ;; *) echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2 - echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver1 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2 + echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2 exit 1 ;; esac @@ -1392,7 +1392,7 @@ i[34567]86-*-solaris2*) need_64bit_hwint=yes need_64bit_isa=yes case X"${with_cpu}" in - 
Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver1|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) + Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) ;; X) if test x$with_cpu_64 = x; then @@ -1401,7 +1401,7 @@ i[34567]86-*-solaris2*) ;; *) echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2 - echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver1 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2 + echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2 exit 1 ;; esac @@ -1471,7 +1471,7 @@ i[34567]86-*-mingw* | x86_64-*-mingw*) if test x$enable_targets = xall; then tm_defines="${tm_defines} TARGET_BI_ARCH=1" case X"${with_cpu}" in - Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver1|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) + Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3) ;; X) if test x$with_cpu_64 = x; then @@ -1480,7 +1480,7 @@ i[34567]86-*-mingw* | x86_64-*-mingw*) ;; *) echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2 - echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver1 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2 + echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver2 bdver1 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2 exit 1 ;; esac @@ -2718,6 +2718,10 @@ case ${target} in ;; i686-*-* | i786-*-*) case ${target_noncanonical} 
in + bdver2-*) + arch=bdver2 + cpu=bdver2 + ;; bdver1-*) arch=bdver1 cpu=bdver1 @@ -2811,6 +2815,10 @@ case ${target} in ;; x86_64-*-*) case ${target_noncanonical} in + bdver2-*) + arch=bdver2 + cpu=bdver2 + ;; bdver1-*) arch=bdver1 cpu=bdver1 @@ -3246,8 +3254,9 @@ case "${target}" in ;; "" | x86-64 | generic | native \ | k8 | k8-sse3 | athlon64 | athlon64-sse3 | opteron \ - | opteron-sse3 | athlon-fx | bdver1 | btver1 | amdfam10 \ - | barcelona | nocona | core2 | corei7 | corei7-avx | atom) + | opteron-sse3 | athlon-fx | bdver2 | bdver1 | btver1 \ + | amdfam10 | barcelona | nocona | core2 | corei7 \ + | corei7-avx | atom) # OK ;; *) Index: gcc/config/i386/i386.h =================================================================== --- gcc/config/i386/i386.h (revision 175929) +++ gcc/config/i386/i386.h (working copy) @@ -240,6 +240,7 @@ extern const struct processor_costs ix86 #define TARGET_GENERIC (TARGET_GENERIC32 || TARGET_GENERIC64) #define TARGET_AMDFAM10 (ix86_tune == PROCESSOR_AMDFAM10) #define TARGET_BDVER1 (ix86_tune == PROCESSOR_BDVER1) +#define TARGET_BDVER2 (ix86_tune == PROCESSOR_BDVER2) #define TARGET_BTVER1 (ix86_tune == PROCESSOR_BTVER1) #define TARGET_ATOM (ix86_tune == PROCESSOR_ATOM) @@ -583,6 +584,7 @@ enum target_cpu_default TARGET_CPU_DEFAULT_k8, TARGET_CPU_DEFAULT_amdfam10, TARGET_CPU_DEFAULT_bdver1, + TARGET_CPU_DEFAULT_bdver2, TARGET_CPU_DEFAULT_btver1, TARGET_CPU_DEFAULT_max @@ -2020,6 +2022,7 @@ enum processor_type PROCESSOR_GENERIC64, PROCESSOR_AMDFAM10, PROCESSOR_BDVER1, + PROCESSOR_BDVER2, PROCESSOR_BTVER1, PROCESSOR_ATOM, PROCESSOR_max Index: gcc/config/i386/i386.md =================================================================== --- gcc/config/i386/i386.md (revision 175929) +++ gcc/config/i386/i386.md (working copy) @@ -369,7 +369,7 @@ (define_constants ;; Processor type. 
(define_attr "cpu" "none,pentium,pentiumpro,geode,k6,athlon,k8,core2,corei7, - atom,generic64,amdfam10,bdver1,btver1" + atom,generic64,amdfam10,bdver1,bdver2,btver1" (const (symbol_ref "ix86_schedule"))) ;; A basic instruction type. Refinements due to arguments to be Index: gcc/config/i386/i386.opt =================================================================== --- gcc/config/i386/i386.opt (revision 175929) +++ gcc/config/i386/i386.opt (working copy) @@ -384,7 +384,7 @@ the function. mdispatch-scheduler Target RejectNegative Var(flag_dispatch_scheduler) -Do dispatch scheduling if processor is bdver1 and Haifa scheduling +Do dispatch scheduling if processor is bdver1 or bdver2 and Haifa scheduling is selected. mprefer-avx128 Index: gcc/config/i386/i386-c.c =================================================================== --- gcc/config/i386/i386-c.c (revision 175929) +++ gcc/config/i386/i386-c.c (working copy) @@ -110,6 +110,10 @@ ix86_target_macros_internal (int isa_fla def_or_undef (parse_in, "__bdver1"); def_or_undef (parse_in, "__bdver1__"); break; + case PROCESSOR_BDVER2: + def_or_undef (parse_in, "__bdver2"); + def_or_undef (parse_in, "__bdver2__"); + break; case PROCESSOR_BTVER1: def_or_undef (parse_in, "__btver1"); def_or_undef (parse_in, "__btver1__"); @@ -198,6 +202,9 @@ ix86_target_macros_internal (int isa_fla case PROCESSOR_BDVER1: def_or_undef (parse_in, "__tune_bdver1__"); break; + case PROCESSOR_BDVER2: + def_or_undef (parse_in, "__tune_bdver2__"); + break; case PROCESSOR_BTVER1: def_or_undef (parse_in, "__tune_btver1__"); break; Index: gcc/config/i386/driver-i386.c =================================================================== --- gcc/config/i386/driver-i386.c (revision 175929) +++ gcc/config/i386/driver-i386.c (working copy) @@ -499,6 +499,8 @@ const char *host_detect_local_cpu (int a if (name == SIG_GEODE) processor = PROCESSOR_GEODE; + else if (has_bmi) + processor = PROCESSOR_BDVER2; else if (has_xop) processor = PROCESSOR_BDVER1; else 
if (has_sse4a && has_ssse3) @@ -664,6 +666,9 @@ const char *host_detect_local_cpu (int a case PROCESSOR_BDVER1: cpu = "bdver1"; break; + case PROCESSOR_BDVER2: + cpu = "bdver2"; + break; case PROCESSOR_BTVER1: cpu = "btver1"; break; Index: gcc/config/i386/i386.c =================================================================== --- gcc/config/i386/i386.c (revision 175929) +++ gcc/config/i386/i386.c (working copy) @@ -1338,6 +1338,93 @@ struct processor_costs bdver1_cost = { 1, /* cond_not_taken_branch_cost. */ }; +struct processor_costs bdver2_cost = { + COSTS_N_INSNS (1), /* cost of an add instruction */ + COSTS_N_INSNS (1), /* cost of a lea instruction */ + COSTS_N_INSNS (1), /* variable shift costs */ + COSTS_N_INSNS (1), /* constant shift costs */ + {COSTS_N_INSNS (4), /* cost of starting multiply for QI */ + COSTS_N_INSNS (4), /* HI */ + COSTS_N_INSNS (4), /* SI */ + COSTS_N_INSNS (6), /* DI */ + COSTS_N_INSNS (6)}, /* other */ + 0, /* cost of multiply per each bit set */ + {COSTS_N_INSNS (19), /* cost of a divide/mod for QI */ + COSTS_N_INSNS (35), /* HI */ + COSTS_N_INSNS (51), /* SI */ + COSTS_N_INSNS (83), /* DI */ + COSTS_N_INSNS (83)}, /* other */ + COSTS_N_INSNS (1), /* cost of movsx */ + COSTS_N_INSNS (1), /* cost of movzx */ + 8, /* "large" insn */ + 9, /* MOVE_RATIO */ + 4, /* cost for loading QImode using movzbl */ + {5, 5, 4}, /* cost of loading integer registers + in QImode, HImode and SImode. + Relative to reg-reg move (2). 
*/ + {4, 4, 4}, /* cost of storing integer registers */ + 2, /* cost of reg,reg fld/fst */ + {5, 5, 12}, /* cost of loading fp registers + in SFmode, DFmode and XFmode */ + {4, 4, 8}, /* cost of storing fp registers + in SFmode, DFmode and XFmode */ + 2, /* cost of moving MMX register */ + {4, 4}, /* cost of loading MMX registers + in SImode and DImode */ + {4, 4}, /* cost of storing MMX registers + in SImode and DImode */ + 2, /* cost of moving SSE register */ + {4, 4, 4}, /* cost of loading SSE registers + in SImode, DImode and TImode */ + {4, 4, 4}, /* cost of storing SSE registers + in SImode, DImode and TImode */ + 2, /* MMX or SSE register to integer */ + /* On K8: + MOVD reg64, xmmreg Double FSTORE 4 + MOVD reg32, xmmreg Double FSTORE 4 + On AMDFAM10: + MOVD reg64, xmmreg Double FADD 3 + 1/1 1/1 + MOVD reg32, xmmreg Double FADD 3 + 1/1 1/1 */ + 16, /* size of l1 cache. */ + 2048, /* size of l2 cache. */ + 64, /* size of prefetch block */ + /* New AMD processors never drop prefetches; if they cannot be performed + immediately, they are queued. We set number of simultaneous prefetches + to a large constant to reflect this (it probably is not a good idea not + to limit number of prefetches at all, as their execution also takes some + time). */ + 100, /* number of parallel prefetches */ + 2, /* Branch cost */ + COSTS_N_INSNS (6), /* cost of FADD and FSUB insns. */ + COSTS_N_INSNS (6), /* cost of FMUL instruction. */ + COSTS_N_INSNS (42), /* cost of FDIV instruction. */ + COSTS_N_INSNS (2), /* cost of FABS instruction. */ + COSTS_N_INSNS (2), /* cost of FCHS instruction. */ + COSTS_N_INSNS (52), /* cost of FSQRT instruction. */ + + /* BDVER2 has optimized REP instruction for medium sized blocks, but for + very small blocks it is better to use loop. For large blocks, libcall + can do nontemporary accesses and beat inline considerably. 
*/ + {{libcall, {{6, loop}, {14, unrolled_loop}, {-1, rep_prefix_4_byte}}}, + {libcall, {{16, loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}}, + {{libcall, {{8, loop}, {24, unrolled_loop}, + {2048, rep_prefix_4_byte}, {-1, libcall}}}, + {libcall, {{48, unrolled_loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}}, + 6, /* scalar_stmt_cost. */ + 4, /* scalar load_cost. */ + 4, /* scalar_store_cost. */ + 6, /* vec_stmt_cost. */ + 0, /* vec_to_scalar_cost. */ + 2, /* scalar_to_vec_cost. */ + 4, /* vec_align_load_cost. */ + 4, /* vec_unalign_load_cost. */ + 4, /* vec_store_cost. */ + 2, /* cond_taken_branch_cost. */ + 1, /* cond_not_taken_branch_cost. */ +}; + struct processor_costs btver1_cost = { COSTS_N_INSNS (1), /* cost of an add instruction */ COSTS_N_INSNS (2), /* cost of a lea instruction */ @@ -1813,8 +1900,10 @@ const struct processor_costs *ix86_cost #define m_ATHLON_K8 (m_K8 | m_ATHLON) #define m_AMDFAM10 (1<