Patchwork [i386] : AMD btver2 enablement

Submitter venkataramanan.kumar@amd.com
Date July 20, 2012, 4:39 p.m.
Message ID <20120720163932.29552.15178.sendpatchset@adcelk01.amd.com>
Permalink /patch/172310/
State New

Comments

venkataramanan.kumar@amd.com - July 20, 2012, 4:39 p.m.
Hi Maintainers,

The patch below provides basic enablement for the next-generation AMD low-power btver2 core.
It defines -march=btver2 and -mtune=btver2, and lets -march=native correctly
recognize btver2. At the moment the tuning is mostly a copy of btver1.
The patch passed bootstrap and the x86 tests.

Is it OK to commit to trunk?

Also can I modify doc/invoke.texi now?

regards,
Venkat.
Jakub Jelinek - July 20, 2012, 5:01 p.m.
On Fri, Jul 20, 2012 at 11:39:32AM -0500, venkataramanan.kumar@amd.com wrote:
> Below patch does the basic enablement for next generation AMD low power btver2 core.
> It defines -march=btver2 and -mtune=btver2, and lets -march=native correctly
> recognizes btver2. At the moment the tuning is mostly a copy of btver1.
> The patch passed bootstrap and the x86 tests.
> 
> Is it OK to commit to trunk?
> 
> Also can I modify doc/invoke.texi now?

It seems the only difference from btver1 is the cache sizes, right?  That
looks like a very expensive way of adding another cache size; we can only have
32 different schedulings, as PROCESSOR_* is used as a bitmask.
For -march=native, either no changes or just some small driver-i386.c
tweaks are needed to make sure that for this CPU -mtune=btver1 is used
together with --param l2-cache-size=whatever. Or perhaps -mtune=btver2
could be handled just as an alias of -mtune=btver1 plus --param
l2-cache-size=512?  Having lots of copy&paste tuning in the compiler
doesn't look like a good idea to me.

Unless, of course, you plan significant scheduling tweaks for the CPU in the
near future.

	Jakub
venkataramanan.kumar@amd.com - July 20, 2012, 6:03 p.m.
Hi Jakub,

Thanks for reviewing the patch.

> Seems the only difference from btver1 are the cache sizes, right?  That
> looks very expensive way of adding another cache size, we can only have
> 32 different schedulings as PROCESSOR_* is used as a bitmask.
> For -march=native, it is either no changes or just some small driver-i386.c
> tweaks needed to make sure that for this CPU -mtune=btver1 would be used
> together with --param l2-cache-size=whatever, or perhaps -mtune=btver2
> could be handled just as an alias of -mtune=btver1 plus --param
> l2-cache-size=512?  Having lots of copy&paste tuning in the compiler
> doesn't look like a good idea to me.

We are expecting some changes to the tunings and costs, and will update them in the near future.
There are ISA changes as well: for example, btver2 supports AVX and BMI.

This patch is a first placeholder; additional changes are still to be applied.

> 
> Of course unless you plan significant scheduling tweaks for the CPU in the
> near future.
>

Yes, we want to add scheduler descriptions and latency information in the near future.

 
Regards,
Venkat.


Patch

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 189510)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,28 @@ 
+2012-07-18  Venkataramanan Kumar  <venkataramanan.kumar@amd.com>
+
+	Jaguar Enablement
+	* config.gcc (i[34567]86-*-linux* | ...): Add btver2.
+	(case ${target}): Add btver2.
+	* config/i386/driver-i386.c (host_detect_local_cpu): Let
+	-march=native recognize btver2 processors.
+	* config/i386/i386-c.c (ix86_target_macros_internal): Add
+	btver2 def_or_undef.
+	* config/i386/i386.c (struct processor_costs btver2_cost): New
+	btver2 cost table.
+	(m_BTVER, m_BTVER2): New definitions.
+	(m_AMD_MULTIPLE): Include m_BTVER.
+	(initial_ix86_tune_features): Add btver2 tune.
+	(processor_target_table): Add btver2 entry.
+	(cpu_names): Add btver2 entry.
+	(software_prefetching_beneficial_p): Add btver2.
+	(ix86_option_override_internal): Add btver2 instruction sets.
+	(ix86_issue_rate): Add btver2.
+	(ix86_adjust_cost): Add btver2.
+	* config/i386/i386.h (TARGET_BTVER2): New definition.
+	(enum target_cpu_default): Add TARGET_CPU_DEFAULT_btver2.
+	(enum processor_type): Add PROCESSOR_BTVER2.
+	* config/i386/i386.md (define_attr "cpu"): Add btver2.
+
 2012-07-16  Hans-Peter Nilsson  <hp@axis.com>
 
 	* config/cris/cris-protos.h (cris_legitimate_address_p): Declare.
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 189510)
+++ gcc/config.gcc	(working copy)
@@ -1214,7 +1214,7 @@ 
 			TM_MULTILIB_CONFIG=`echo $TM_MULTILIB_CONFIG | sed 's/^,//'`
 			need_64bit_isa=yes
 			case X"${with_cpu}" in
-			Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+			Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 				;;
 			X)
 				if test x$with_cpu_64 = x; then
@@ -1223,7 +1223,7 @@ 
 				;;
 			*)
 				echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-				echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+				echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 				exit 1
 				;;
 			esac
@@ -1335,7 +1335,7 @@ 
 		tmake_file="$tmake_file i386/t-sol2-64"
 		need_64bit_isa=yes
 		case X"${with_cpu}" in
-		Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+		Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 			;;
 		X)
 			if test x$with_cpu_64 = x; then
@@ -1344,7 +1344,7 @@ 
 			;;
 		*)
 			echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-			echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+			echo "generic atom core2 corei7 corei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 			exit 1
 			;;
 		esac
@@ -1401,7 +1401,7 @@ 
 			if test x$enable_targets = xall; then
 				tm_defines="${tm_defines} TARGET_BI_ARCH=1"
 				case X"${with_cpu}" in
-				Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
+				Xgeneric|Xatom|Xcore2|Xcorei7|Xcorei7-avx|Xnocona|Xx86-64|Xbdver2|Xbdver1|Xbtver2|Xbtver1|Xamdfam10|Xbarcelona|Xk8|Xopteron|Xathlon64|Xathlon-fx|Xathlon64-sse3|Xk8-sse3|Xopteron-sse3)
 					;;
 				X)
 					if test x$with_cpu_64 = x; then
@@ -1410,7 +1410,7 @@ 
 					;;
 				*)
 					echo "Unsupported CPU used in --with-cpu=$with_cpu, supported values:" 1>&2
-					echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver2 bdver1 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
+					echo "generic atom core2 corei7 Xcorei7-avx nocona x86-64 bdver2 bdver1 btver2 btver1 amdfam10 barcelona k8 opteron athlon64 athlon-fx athlon64-sse3 k8-sse3 opteron-sse3" 1>&2
 					exit 1
 					;;
 				esac
@@ -2630,6 +2630,10 @@ 
 	arch=btver1
 	cpu=btver1
 	;;
+      btver2-*)
+	arch=btver2
+	cpu=btver2
+	;;
       amdfam10-*|barcelona-*)
 	arch=amdfam10
 	cpu=amdfam10
@@ -2727,6 +2731,10 @@ 
 	arch=btver1
 	cpu=btver1
 	;;
+      btver2-*)
+	arch=btver2
+	cpu=btver2
+	;;
       amdfam10-*|barcelona-*)
 	arch=amdfam10
 	cpu=amdfam10
@@ -3161,7 +3169,7 @@ 
 				;;
 			"" | x86-64 | generic | native \
 			| k8 | k8-sse3 | athlon64 | athlon64-sse3 | opteron \
-			| opteron-sse3 | athlon-fx | bdver2 | bdver1 | btver1 \
+			| opteron-sse3 | athlon-fx | bdver2 | bdver1 | btver2 | btver1 \
 			| amdfam10 | barcelona | nocona | core2 | corei7 \
 			| corei7-avx | core-avx-i | core-avx2 | atom)
 				# OK
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	(revision 189510)
+++ gcc/config/i386/i386.h	(working copy)
@@ -249,6 +249,7 @@ 
 #define TARGET_BDVER1 (ix86_tune == PROCESSOR_BDVER1)
 #define TARGET_BDVER2 (ix86_tune == PROCESSOR_BDVER2)
 #define TARGET_BTVER1 (ix86_tune == PROCESSOR_BTVER1)
+#define TARGET_BTVER2 (ix86_tune == PROCESSOR_BTVER2)
 #define TARGET_ATOM (ix86_tune == PROCESSOR_ATOM)
 
 /* Feature tests against the various tunings.  */
@@ -608,6 +609,7 @@ 
   TARGET_CPU_DEFAULT_bdver1,
   TARGET_CPU_DEFAULT_bdver2,
   TARGET_CPU_DEFAULT_btver1,
+  TARGET_CPU_DEFAULT_btver2,
 
   TARGET_CPU_DEFAULT_max
 };
@@ -2067,6 +2069,7 @@ 
   PROCESSOR_BDVER1,
   PROCESSOR_BDVER2,
   PROCESSOR_BTVER1,
+  PROCESSOR_BTVER2,
   PROCESSOR_ATOM,
   PROCESSOR_max
 };
Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md	(revision 189510)
+++ gcc/config/i386/i386.md	(working copy)
@@ -310,7 +310,7 @@ 
 
 ;; Processor type.
 (define_attr "cpu" "none,pentium,pentiumpro,geode,k6,athlon,k8,core2,corei7,
-		    atom,generic64,amdfam10,bdver1,bdver2,btver1"
+		    atom,generic64,amdfam10,bdver1,bdver2,btver1,btver2"
   (const (symbol_ref "ix86_schedule")))
 
 ;; A basic instruction type.  Refinements due to arguments to be
Index: gcc/config/i386/i386-c.c
===================================================================
--- gcc/config/i386/i386-c.c	(revision 189510)
+++ gcc/config/i386/i386-c.c	(working copy)
@@ -118,6 +118,10 @@ 
       def_or_undef (parse_in, "__btver1");
       def_or_undef (parse_in, "__btver1__");
       break;
+    case PROCESSOR_BTVER2:
+      def_or_undef (parse_in, "__btver2");
+      def_or_undef (parse_in, "__btver2__");
+      break;
     case PROCESSOR_PENTIUM4:
       def_or_undef (parse_in, "__pentium4");
       def_or_undef (parse_in, "__pentium4__");
@@ -208,6 +212,9 @@ 
    case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__tune_btver1__");
       break;
+    case PROCESSOR_BTVER2:
+      def_or_undef (parse_in, "__tune_btver2__");
+      break;
     case PROCESSOR_PENTIUM4:
       def_or_undef (parse_in, "__tune_pentium4__");
       break;
Index: gcc/config/i386/driver-i386.c
===================================================================
--- gcc/config/i386/driver-i386.c	(revision 189510)
+++ gcc/config/i386/driver-i386.c	(working copy)
@@ -514,6 +514,8 @@ 
 
       if (name == SIG_GEODE)
 	processor = PROCESSOR_GEODE;
+      else if (has_movbe)
+	processor = PROCESSOR_BTVER2;
       else if (has_bmi)
         processor = PROCESSOR_BDVER2;
       else if (has_xop)
@@ -687,6 +689,9 @@ 
     case PROCESSOR_BTVER1:
       cpu = "btver1";
       break;
+    case PROCESSOR_BTVER2:
+      cpu = "btver2";
+      break;
 
     default:
       /* Use something reasonable.  */
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 189510)
+++ gcc/config/i386/i386.c	(working copy)
@@ -1508,6 +1508,85 @@ 
   1,					/* cond_not_taken_branch_cost.  */
 };
 
+struct processor_costs btver2_cost = {
+  COSTS_N_INSNS (1),			/* cost of an add instruction */
+  COSTS_N_INSNS (2),			/* cost of a lea instruction */
+  COSTS_N_INSNS (1),			/* variable shift costs */
+  COSTS_N_INSNS (1),			/* constant shift costs */
+  {COSTS_N_INSNS (3),			/* cost of starting multiply for QI */
+   COSTS_N_INSNS (4),			/*				 HI */
+   COSTS_N_INSNS (3),			/*				 SI */
+   COSTS_N_INSNS (4),			/*				 DI */
+   COSTS_N_INSNS (5)},			/*			      other */
+  0,					/* cost of multiply per each bit set */
+  {COSTS_N_INSNS (19),			/* cost of a divide/mod for QI */
+   COSTS_N_INSNS (35),			/*			    HI */
+   COSTS_N_INSNS (51),			/*			    SI */
+   COSTS_N_INSNS (83),			/*			    DI */
+   COSTS_N_INSNS (83)},			/*			    other */
+  COSTS_N_INSNS (1),			/* cost of movsx */
+  COSTS_N_INSNS (1),			/* cost of movzx */
+  8,					/* "large" insn */
+  9,					/* MOVE_RATIO */
+  4,				     /* cost for loading QImode using movzbl */
+  {3, 4, 3},				/* cost of loading integer registers
+					   in QImode, HImode and SImode.
+					   Relative to reg-reg move (2).  */
+  {3, 4, 3},				/* cost of storing integer registers */
+  4,					/* cost of reg,reg fld/fst */
+  {4, 4, 12},				/* cost of loading fp registers
+					   in SFmode, DFmode and XFmode */
+  {6, 6, 8},				/* cost of storing fp registers
+					   in SFmode, DFmode and XFmode */
+  2,					/* cost of moving MMX register */
+  {3, 3},				/* cost of loading MMX registers
+					   in SImode and DImode */
+  {4, 4},				/* cost of storing MMX registers
+					   in SImode and DImode */
+  2,					/* cost of moving SSE register */
+  {4, 4, 3},				/* cost of loading SSE registers
+					   in SImode, DImode and TImode */
+  {4, 4, 5},				/* cost of storing SSE registers
+					   in SImode, DImode and TImode */
+  3,					/* MMX or SSE register to integer */
+					/* On K8:
+					   MOVD reg64, xmmreg Double FSTORE 4
+					   MOVD reg32, xmmreg Double FSTORE 4
+					   On AMDFAM10:
+					   MOVD reg64, xmmreg Double FADD 3
+							       1/1  1/1
+					    MOVD reg32, xmmreg Double FADD 3
+							       1/1  1/1 */
+  32,					/* size of l1 cache.  */
+  2048,					/* size of l2 cache.  */
+  64,					/* size of prefetch block */
+  100,					/* number of parallel prefetches */
+  2,					/* Branch cost */
+  COSTS_N_INSNS (4),			/* cost of FADD and FSUB insns.  */
+  COSTS_N_INSNS (4),			/* cost of FMUL instruction.  */
+  COSTS_N_INSNS (19),			/* cost of FDIV instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FABS instruction.  */
+  COSTS_N_INSNS (2),			/* cost of FCHS instruction.  */
+  COSTS_N_INSNS (35),			/* cost of FSQRT instruction.  */
+
+  {{libcall, {{6, loop}, {14, unrolled_loop}, {-1, rep_prefix_4_byte}}},
+   {libcall, {{16, loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  {{libcall, {{8, loop}, {24, unrolled_loop},
+	      {2048, rep_prefix_4_byte}, {-1, libcall}}},
+   {libcall, {{48, unrolled_loop}, {8192, rep_prefix_8_byte}, {-1, libcall}}}},
+  4,					/* scalar_stmt_cost.  */
+  2,					/* scalar load_cost.  */
+  2,					/* scalar_store_cost.  */
+  6,					/* vec_stmt_cost.  */
+  0,					/* vec_to_scalar_cost.  */
+  2,					/* scalar_to_vec_cost.  */
+  2,					/* vec_align_load_cost.  */
+  2,					/* vec_unalign_load_cost.  */
+  2,					/* vec_store_cost.  */
+  2,					/* cond_taken_branch_cost.  */
+  1,					/* cond_not_taken_branch_cost.  */
+};
+
 static const
 struct processor_costs pentium4_cost = {
   COSTS_N_INSNS (1),			/* cost of an add instruction */
@@ -1908,8 +1987,10 @@ 
 #define m_BDVER1 (1<<PROCESSOR_BDVER1)
 #define m_BDVER2 (1<<PROCESSOR_BDVER2)
 #define m_BDVER	(m_BDVER1 | m_BDVER2)
+#define m_BTVER (m_BTVER1 | m_BTVER2)
 #define m_BTVER1 (1<<PROCESSOR_BTVER1)
-#define m_AMD_MULTIPLE (m_ATHLON_K8 | m_AMDFAM10 | m_BDVER | m_BTVER1)
+#define m_BTVER2 (1<<PROCESSOR_BTVER2)
+#define m_AMD_MULTIPLE (m_ATHLON_K8 | m_AMDFAM10 | m_BDVER | m_BTVER)
 
 #define m_GENERIC32 (1<<PROCESSOR_GENERIC32)
 #define m_GENERIC64 (1<<PROCESSOR_GENERIC64)
@@ -1949,7 +2030,7 @@ 
   ~m_386,
 
   /* X86_TUNE_USE_SAHF */
-  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER1 | m_GENERIC,
+  m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC,
 
   /* X86_TUNE_MOVX: Enable to zero extend integer registers to avoid
      partial dependencies.  */
@@ -2055,7 +2136,7 @@ 
   m_PPRO | m_P4_NOCONA | m_CORE2I7 | m_ATOM  | m_AMDFAM10 | m_BDVER | m_GENERIC,
 
   /* X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL */
-  m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER1,
+  m_COREI7 | m_AMDFAM10 | m_BDVER | m_BTVER,
 
   /* X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL */
   m_COREI7 | m_BDVER,
@@ -2130,11 +2211,11 @@ 
 
   /* X86_TUNE_SLOW_IMUL_IMM32_MEM: Imul of 32-bit constant and memory is
      vector path on AMD machines.  */
-  m_CORE2I7_64 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER1 | m_GENERIC64,
+  m_CORE2I7_64 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
 
   /* X86_TUNE_SLOW_IMUL_IMM8: Imul of 8-bit constant is vector path on AMD
      machines.  */
-  m_CORE2I7_64 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER1 | m_GENERIC64,
+  m_CORE2I7_64 | m_K8 | m_AMDFAM10 | m_BDVER | m_BTVER | m_GENERIC64,
 
   /* X86_TUNE_MOVE_M1_VIA_OR: On pentiums, it is faster to load -1 via OR
      than a MOV.  */
@@ -2605,6 +2686,7 @@ 
   {&bdver1_cost, 32, 24, 32, 7, 32},
   {&bdver2_cost, 32, 24, 32, 7, 32},
   {&btver1_cost, 32, 24, 32, 7, 32},
+  {&btver2_cost, 32, 24, 32, 7, 32},
   {&atom_cost, 16, 15, 16, 7, 16}
 };
 
@@ -2635,7 +2717,8 @@ 
   "amdfam10",
   "bdver1",
   "bdver2",
-  "btver1"
+  "btver1",
+  "btver2"
 };
 
 /* Return true if a red-zone is in use.  */
@@ -3081,6 +3164,11 @@ 
         | PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16},
       {"generic32", PROCESSOR_GENERIC32, CPU_PENTIUMPRO,
 	PTA_HLE /* flags are only used for -march switch.  */ },
+      {"btver2", PROCESSOR_BTVER2, CPU_GENERIC64,
+	PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
+	| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_SSE4_1
+	| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
+	| PTA_BMI | PTA_F16C | PTA_MOVBE},
       {"generic64", PROCESSOR_GENERIC64, CPU_GENERIC64,
 	PTA_64BIT
         | PTA_HLE /* flags are only used for -march switch.  */ },
@@ -23642,6 +23730,7 @@ 
     case PROCESSOR_PENTIUM:
     case PROCESSOR_ATOM:
     case PROCESSOR_K6:
+    case PROCESSOR_BTVER2:
       return 2;
 
     case PROCESSOR_PENTIUMPRO:
@@ -23848,6 +23937,7 @@ 
     case PROCESSOR_BDVER1:
     case PROCESSOR_BDVER2:
     case PROCESSOR_BTVER1:
+    case PROCESSOR_BTVER2:
     case PROCESSOR_ATOM:
     case PROCESSOR_GENERIC32:
     case PROCESSOR_GENERIC64: