
[x86_64]: AMD znver2 enablement

Message ID CY4PR12MB14621BCA9F3D91DEA0F745328FCD0@CY4PR12MB1462.namprd12.prod.outlook.com
State New

Commit Message

Kumar, Venkataramanan Oct. 31, 2018, 5:25 a.m. UTC
Hi Maintainers,

Please find attached the patch that enables support for the next-generation AMD Zen CPU via -march=znver2.
For now, znver2 uses the same costs and scheduler descriptions that were written for znver1.

We will update the scheduler descriptions and costs for znver2 later, as more information becomes available.

Ok for trunk?

Regards,
Venkat.

ChangeLog gcc:
        * common/config/i386/i386-common.c (processor_alias_table): Add znver2 entry.
        * config.gcc (i[34567]86-*-linux* | ...): Add znver2.
        (case ${target}): Add znver2.
        * config/i386/driver-i386.c (host_detect_local_cpu): Let
        -march=native recognize znver2 processors.
        * config/i386/i386-c.c (ix86_target_macros_internal): Add znver2.
        * config/i386/i386.c (m_ZNVER2): New definition.
        (m_ZNVER): New definition.
        (m_AMD_MULTIPLE): Include m_ZNVER2.
        (processor_cost_table): Add znver2 entry.
        (processor_target_table): Add znver2 entry.
        (get_builtin_code_for_version): Set priority for
        PROCESSOR_ZNVER2.
        (processor_model): Add M_AMDFAM17H_ZNVER2.
        (arch_names_table): Ditto.
        (ix86_reassociation_width): Include znver2.
        * config/i386/i386.h (TARGET_ZNVER2): New definition.
        (struct ix86_size_cost): Add TARGET_ZNVER2.
        (enum processor_type): Add PROCESSOR_ZNVER2.
        * config/i386/i386.md (define_attr "cpu"): Add znver2.
        * config/i386/x86-tune-costs.h (processor_costs): Add znver2 costs.
        * config/i386/x86-tune-sched.c (ix86_issue_rate): Add znver2.
        (ix86_adjust_cost): Add znver2.
        * config/i386/x86-tune.def: Replace m_ZNVER1 by m_ZNVER.
        * gcc/doc/extend.texi: Add details about znver2.
        * gcc/doc/invoke.texi: Add details about znver2.

ChangeLog libgcc:
        * config/i386/cpuinfo.c (get_amd_cpu): Add znver2.
        (processor_subtypes): Ditto.
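
As an illustration of what the enablement exposes to user code, the sketch below relies only on names that appear in the patch: the __znver2__ macro defined in i386-c.c and the "znver2" string added to arch_names_table, which __builtin_cpu_is accepts. The helper function names are hypothetical, and the compiler-version guard is an assumption (it takes the patch to first ship in GCC 9).

```c
/* Hypothetical helper: reports which -march target this translation unit
   was compiled for, using the macros the patch defines in i386-c.c.  */
static const char *
compiled_arch (void)
{
#if defined (__znver2__)
  return "znver2";
#elif defined (__znver1__)
  return "znver1";
#else
  return "other";
#endif
}

/* Hypothetical helper: returns 1 when the running CPU is reported as
   znver2, 0 otherwise.  The guard is an assumption: the "znver2" name is
   only accepted by compilers that contain this patch (taken here to be
   GCC 9 or later on x86); elsewhere the function conservatively
   returns 0.  */
static int
running_on_znver2 (void)
{
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 9 \
    && (defined (__x86_64__) || defined (__i386__))
  __builtin_cpu_init ();
  return __builtin_cpu_is ("znver2") != 0;
#else
  return 0;
#endif
}
```

Building with -march=znver2 should make compiled_arch () return "znver2", while the result of running_on_znver2 () depends on the host CPU and on how libgcc classifies it.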

Comments

Uros Bizjak Nov. 2, 2018, 3:36 p.m. UTC | #1

diff --git a/libgcc/config/i386/cpuinfo.h b/libgcc/config/i386/cpuinfo.h
index 0aa887b..86cb4ea 100644
--- a/libgcc/config/i386/cpuinfo.h
+++ b/libgcc/config/i386/cpuinfo.h
@@ -67,6 +67,7 @@ enum processor_subtypes
   AMDFAM15H_BDVER3,
   AMDFAM15H_BDVER4,
   AMDFAM17H_ZNVER1,
+  AMDFAM17H_ZNVER2,
   INTEL_COREI7_IVYBRIDGE,
   INTEL_COREI7_HASWELL,
   INTEL_COREI7_BROADWELL,

As the comment above these enums says:

/* Any new types or subtypes have to be inserted at the end. */

So please add the new entry at the end of enum processor_subtypes.
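
To illustrate why the position matters, here is a minimal sketch with made-up, shortened enums (none of these names exist in GCC). It assumes, as the comment in cpuinfo.h implies, that enumerator values end up in already-compiled code and are compared with what libgcc stores at run time, so existing values must not move.

```c
/* Hypothetical miniature of the subtype list before the patch.  */
enum subtypes_before
{
  BEFORE_ZNVER1,
  BEFORE_IVYBRIDGE,
  BEFORE_HASWELL,
  BEFORE_MAX
};

/* New entry inserted in the middle: every later value shifts by one.  */
enum subtypes_mid_insert
{
  MID_ZNVER1,
  MID_ZNVER2,
  MID_IVYBRIDGE,
  MID_HASWELL,
  MID_MAX
};

/* New entry appended just before the MAX sentinel: old values are kept.  */
enum subtypes_appended
{
  APP_ZNVER1,
  APP_IVYBRIDGE,
  APP_HASWELL,
  APP_ZNVER2,
  APP_MAX
};

/* Returns 1 if every pre-existing enumerator kept its value after a
   mid-list insertion, 0 otherwise.  */
static int
mid_insert_preserves_values (void)
{
  return (int) MID_ZNVER1 == (int) BEFORE_ZNVER1
	 && (int) MID_IVYBRIDGE == (int) BEFORE_IVYBRIDGE
	 && (int) MID_HASWELL == (int) BEFORE_HASWELL;
}

/* Same check for the appended layout.  */
static int
append_preserves_values (void)
{
  return (int) APP_ZNVER1 == (int) BEFORE_ZNVER1
	 && (int) APP_IVYBRIDGE == (int) BEFORE_IVYBRIDGE
	 && (int) APP_HASWELL == (int) BEFORE_HASWELL;
}
```

In this sketch only the appended layout keeps the old numbering, which is the property the cpuinfo.h comment asks for.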

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 963c7fc..bbe3bb3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -32269,6 +32276,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
     M_AMDFAM15H_BDVER3,
     M_AMDFAM15H_BDVER4,
     M_AMDFAM17H_ZNVER1,
+    M_AMDFAM17H_ZNVER2,
     M_INTEL_COREI7_IVYBRIDGE,
     M_INTEL_COREI7_HASWELL,
     M_INTEL_COREI7_BROADWELL,

The above also has to be kept in sync with enum processor_subtypes.
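
A sketch of how such a mismatch could be caught at compile time, assuming both lists were visible in one translation unit. In GCC they live in separate trees (i386.c and libgcc's cpuinfo.h), so this is illustrative only. All names below are hypothetical, including the M_SUBTYPE_START marker and the assumption that compiler-side values are offset from the run-time ones by that marker.

```c
/* Hypothetical run-time list (stands in for enum processor_subtypes).  */
enum rt_subtype
{
  RT_ZNVER1 = 1,
  RT_ICELAKE_SERVER,
  RT_ZNVER2,		/* Appended at the end.  */
  RT_SUBTYPE_MAX
};

/* Hypothetical compiler-side list (stands in for the M_* enumerators).  */
enum cc_model
{
  M_SUBTYPE_START = 100,
  M_ZNVER1,
  M_ICELAKE_SERVER,
  M_ZNVER2		/* Must sit in the same relative position.  */
};

/* Fails the build if a compiler-side entry, minus the start marker, does
   not equal its run-time counterpart.  Requires C11 _Static_assert.  */
#define CHECK_IN_SYNC(cc, rt) \
  _Static_assert ((int) (cc) - (int) M_SUBTYPE_START == (int) (rt), \
		  #cc " is out of sync with " #rt)

CHECK_IN_SYNC (M_ZNVER1, RT_ZNVER1);
CHECK_IN_SYNC (M_ICELAKE_SERVER, RT_ICELAKE_SERVER);
CHECK_IN_SYNC (M_ZNVER2, RT_ZNVER2);

/* Maps a compiler-side value to the number the run-time side would store.  */
static int
cc_to_rt (enum cc_model m)
{
  return (int) m - (int) M_SUBTYPE_START;
}
```

Moving M_ZNVER2 above M_ICELAKE_SERVER in this sketch, without doing the same on the run-time side, would trip two of the assertions.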

Otherwise LGTM.

Uros.
Kumar, Venkataramanan Nov. 3, 2018, 6:51 p.m. UTC | #2
Hi Uros,

I have updated the patch as per your review comments. Thank you; I will commit the attached patch.

Regards,
Venkat.

ChangeLog:
	* common/config/i386/i386-common.c (processor_alias_table): Add znver2 entry.
	* config.gcc (i[34567]86-*-linux* | ...): Add znver2.
	(case ${target}): Add znver2.
	* config/i386/driver-i386.c (host_detect_local_cpu): Let
	-march=native recognize znver2 processors.
	* config/i386/i386-c.c (ix86_target_macros_internal): Add znver2.
	* config/i386/i386.c (m_ZNVER2): New definition.
	(m_ZNVER): New definition.
	(m_AMD_MULTIPLE): Include m_ZNVER2.
	(processor_cost_table): Add znver2 entry.
	(processor_target_table): Add znver2 entry.
	(get_builtin_code_for_version): Set priority for
	PROCESSOR_ZNVER2.
	(processor_model): Add M_AMDFAM17H_ZNVER2.
	(arch_names_table): Ditto.
	(ix86_reassociation_width): Include znver2.
	* config/i386/i386.h (TARGET_ZNVER2): New definition.
	(struct ix86_size_cost): Add TARGET_ZNVER2.
	(enum processor_type): Add PROCESSOR_ZNVER2.
	* config/i386/i386.md (define_attr "cpu"): Add znver2.
	* config/i386/x86-tune-costs.h (processor_costs): Add znver2 costs.
	* config/i386/x86-tune-sched.c (ix86_issue_rate): Add znver2.
	(ix86_adjust_cost): Add znver2.
	* config/i386/x86-tune.def: Replace m_ZNVER1 by m_ZNVER.
	* gcc/doc/extend.texi: Add details about znver2.
	* gcc/doc/invoke.texi: Add details about znver2.

ChangeLog libgcc:
	* config/i386/cpuinfo.c (get_amd_cpu): Add znver2.
	(processor_subtypes): Ditto.
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index f12806e..ff13ea5 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -1677,6 +1677,16 @@ const pta processor_alias_table[] =
       | PTA_RDRND | PTA_MOVBE | PTA_MWAITX | PTA_ADX | PTA_RDSEED
       | PTA_CLZERO | PTA_CLFLUSHOPT | PTA_XSAVEC | PTA_XSAVES
       | PTA_SHA | PTA_LZCNT | PTA_POPCNT},
+  {"znver2", PROCESSOR_ZNVER2, CPU_ZNVER1,
+    PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+      | PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
+      | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_AVX2
+      | PTA_BMI | PTA_BMI2 | PTA_F16C | PTA_FMA | PTA_PRFCHW
+      | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT | PTA_FSGSBASE
+      | PTA_RDRND | PTA_MOVBE | PTA_MWAITX | PTA_ADX | PTA_RDSEED
+      | PTA_CLZERO | PTA_CLFLUSHOPT | PTA_XSAVEC | PTA_XSAVES
+      | PTA_SHA | PTA_LZCNT | PTA_POPCNT | PTA_CLWB | PTA_RDPID
+      | PTA_WBNOINVD},
   {"btver1", PROCESSOR_BTVER1, CPU_GENERIC,
     PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
       | PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_PRFCHW
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 93dc297..a47e6c3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -664,11 +664,11 @@ pentium4 pentium4m pentiumpro prescott lakemont"
 # 64-bit x86 processors supported by --with-arch=.  Each processor
 # MUST be separated by exactly one space.
 x86_64_archs="amdfam10 athlon64 athlon64-sse3 barcelona bdver1 bdver2 \
-bdver3 bdver4 znver1 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona \
-core2 corei7 corei7-avx core-avx-i core-avx2 atom slm nehalem westmere \
-sandybridge ivybridge haswell broadwell bonnell silvermont knl knm \
-skylake-avx512 cannonlake icelake-client icelake-server skylake goldmont \
-goldmont-plus tremont x86-64 native"
+bdver3 bdver4 znver1 znver2 btver1 btver2 k8 k8-sse3 opteron \
+opteron-sse3 nocona core2 corei7 corei7-avx core-avx-i core-avx2 atom \
+slm nehalem westmere sandybridge ivybridge haswell broadwell bonnell \
+silvermont knl knm skylake-avx512 cannonlake icelake-client icelake-server \
+skylake goldmont goldmont-plus tremont x86-64 native"
 
 # Additional x86 processors supported by --with-cpu=.  Each processor
 # MUST be separated by exactly one space.
@@ -3336,6 +3336,10 @@ case ${target} in
 	arch=znver1
 	cpu=znver1
 	;;
+      znver2-*)
+	arch=znver2
+	cpu=znver2
+	;;
       bdver4-*)
         arch=bdver4
         cpu=bdver4
@@ -3453,6 +3457,10 @@ case ${target} in
 	arch=znver1
 	cpu=znver1
 	;;
+      znver2-*)
+	arch=znver2
+	cpu=znver2
+	;;
       bdver4-*)
         arch=bdver4
         cpu=bdver4
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 8c830bd..95ba393 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -649,6 +649,8 @@ const char *host_detect_local_cpu (int argc, const char **argv)
 	processor = PROCESSOR_GEODE;
       else if (has_movbe && family == 22)
 	processor = PROCESSOR_BTVER2;
+      else if (has_clwb)
+	processor = PROCESSOR_ZNVER2;
       else if (has_clzero)
 	processor = PROCESSOR_ZNVER1;
       else if (has_avx2)
@@ -1012,6 +1014,9 @@ const char *host_detect_local_cpu (int argc, const char **argv)
     case PROCESSOR_ZNVER1:
       cpu = "znver1";
       break;
+    case PROCESSOR_ZNVER2:
+      cpu = "znver2";
+      break;
     case PROCESSOR_BTVER1:
       cpu = "btver1";
       break;
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 005e1a5..a11be6f 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -124,6 +124,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__znver1");
       def_or_undef (parse_in, "__znver1__");
       break;
+    case PROCESSOR_ZNVER2:
+      def_or_undef (parse_in, "__znver2");
+      def_or_undef (parse_in, "__znver2__");
+      break;
     case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__btver1");
       def_or_undef (parse_in, "__btver1__");
@@ -288,6 +292,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     case PROCESSOR_ZNVER1:
       def_or_undef (parse_in, "__tune_znver1__");
       break;
+    case PROCESSOR_ZNVER2:
+      def_or_undef (parse_in, "__tune_znver2__");
+      break;
     case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__tune_btver1__");
       break;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 963c7fc..f9ef0b4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -169,12 +169,14 @@ const struct processor_costs *ix86_cost = NULL;
 #define m_BDVER3 (HOST_WIDE_INT_1U<<PROCESSOR_BDVER3)
 #define m_BDVER4 (HOST_WIDE_INT_1U<<PROCESSOR_BDVER4)
 #define m_ZNVER1 (HOST_WIDE_INT_1U<<PROCESSOR_ZNVER1)
+#define m_ZNVER2 (HOST_WIDE_INT_1U<<PROCESSOR_ZNVER2)
 #define m_BTVER1 (HOST_WIDE_INT_1U<<PROCESSOR_BTVER1)
 #define m_BTVER2 (HOST_WIDE_INT_1U<<PROCESSOR_BTVER2)
 #define m_BDVER	(m_BDVER1 | m_BDVER2 | m_BDVER3 | m_BDVER4)
 #define m_BTVER (m_BTVER1 | m_BTVER2)
+#define m_ZNVER	(m_ZNVER1 | m_ZNVER2)
 #define m_AMD_MULTIPLE (m_ATHLON_K8 | m_AMDFAM10 | m_BDVER | m_BTVER \
-			| m_ZNVER1)
+			| m_ZNVER)
 
 #define m_GENERIC (HOST_WIDE_INT_1U<<PROCESSOR_GENERIC)
 
@@ -868,6 +870,7 @@ static const struct processor_costs *processor_cost_table[PROCESSOR_max] =
   &btver1_cost,
   &btver2_cost,
   &znver1_cost,
+  &znver2_cost
 };
 
 static unsigned int
@@ -31601,6 +31604,10 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
 	      arg_str = "znver1";
 	      priority = P_PROC_AVX2;
 	      break;
+	    case PROCESSOR_ZNVER2:
+	      arg_str = "znver2";
+	      priority = P_PROC_AVX2;
+	      break;
 	    }
 	}
 
@@ -32276,7 +32283,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
     M_INTEL_COREI7_SKYLAKE_AVX512,
     M_INTEL_COREI7_CANNONLAKE,
     M_INTEL_COREI7_ICELAKE_CLIENT,
-    M_INTEL_COREI7_ICELAKE_SERVER
+    M_INTEL_COREI7_ICELAKE_SERVER,
+    M_AMDFAM17H_ZNVER2
   };
 
   static struct _arch_names_table
@@ -32323,6 +32331,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
       {"btver2", M_AMD_BTVER2},
       {"amdfam17h", M_AMDFAM17H},
       {"znver1", M_AMDFAM17H_ZNVER1},
+      {"znver2", M_AMDFAM17H_ZNVER2},
     };
 
   static struct _isa_names_table
@@ -49200,8 +49209,8 @@ ix86_reassociation_width (unsigned int op, machine_mode mode)
 
       /* Integer vector instructions execute in FP unit
 	 and can execute 3 additions and one multiplication per cycle.  */
-      if (ix86_tune == PROCESSOR_ZNVER1 && INTEGRAL_MODE_P (mode)
-	  && op != PLUS && op != MINUS)
+      if ((ix86_tune == PROCESSOR_ZNVER1 || ix86_tune == PROCESSOR_ZNVER2)
+	   && INTEGRAL_MODE_P (mode) && op != PLUS && op != MINUS)
 	return 1;
 
       /* Account for targets that splits wide vectors into multiple parts.  */
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 01d49a7..58caab2 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -415,6 +415,7 @@ extern const struct processor_costs ix86_size_cost;
 #define TARGET_BTVER1 (ix86_tune == PROCESSOR_BTVER1)
 #define TARGET_BTVER2 (ix86_tune == PROCESSOR_BTVER2)
 #define TARGET_ZNVER1 (ix86_tune == PROCESSOR_ZNVER1)
+#define TARGET_ZNVER2 (ix86_tune == PROCESSOR_ZNVER2)
 
 /* Feature tests against the various tunings.  */
 enum ix86_tune_indices {
@@ -2272,6 +2273,7 @@ enum processor_type
   PROCESSOR_BTVER1,
   PROCESSOR_BTVER2,
   PROCESSOR_ZNVER1,
+  PROCESSOR_ZNVER2,
   PROCESSOR_max
 };
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7fb2b14..8061a23 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -430,7 +430,7 @@
 ;; Processor type.
 (define_attr "cpu" "none,pentium,pentiumpro,geode,k6,athlon,k8,core2,nehalem,
 		    atom,slm,glm,haswell,generic,amdfam10,bdver1,bdver2,bdver3,
-		    bdver4,btver2,znver1"
+		    bdver4,btver2,znver1,znver2"
   (const (symbol_ref "ix86_schedule")))
 
 ;; A basic instruction type.  Refinements due to arguments to be
diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 50ecb35..a47b92f 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1273,6 +1273,133 @@ struct processor_costs znver1_cost = {
   "16",					/* Func alignment.  */
 };
 
+/*  ZNVER2 has an optimized REP instruction for medium-sized blocks, but for
+    very small blocks it is better to use a loop.  For large blocks, libcall
+    can do non-temporal accesses and beat inline considerably.  */
+static stringop_algs znver2_memcpy[2] = {
+  {libcall, {{6, loop, false}, {14, unrolled_loop, false},
+	     {-1, rep_prefix_4_byte, false}}},
+  {libcall, {{16, loop, false}, {8192, rep_prefix_8_byte, false},
+	     {-1, libcall, false}}}};
+static stringop_algs znver2_memset[2] = {
+  {libcall, {{8, loop, false}, {24, unrolled_loop, false},
+	     {2048, rep_prefix_4_byte, false}, {-1, libcall, false}}},
+  {libcall, {{48, unrolled_loop, false}, {8192, rep_prefix_8_byte, false},
+	     {-1, libcall, false}}}};
+
+struct processor_costs znver2_cost = {
+  COSTS_N_INSNS (1),			/* cost of an add instruction.  */
+  COSTS_N_INSNS (1),			/* cost of a lea instruction.  */
+  COSTS_N_INSNS (1),			/* variable shift costs.  */
+  COSTS_N_INSNS (1),			/* constant shift costs.  */
+  {COSTS_N_INSNS (3),			/* cost of starting multiply for QI.  */
+   COSTS_N_INSNS (3),			/* 				 HI.  */
+   COSTS_N_INSNS (3),			/*				 SI.  */
+   COSTS_N_INSNS (3),			/*				 DI.  */
+   COSTS_N_INSNS (3)},			/*			other.  */
+  0,					/* cost of multiply per each bit
+					   set.  */
+   /* Depending on parameters, idiv can get faster on ryzen.  This is upper
+      bound.  */
+  {COSTS_N_INSNS (16),			/* cost of a divide/mod for QI.  */
+   COSTS_N_INSNS (22),			/* 			    HI.  */
+   COSTS_N_INSNS (30),			/*			    SI.  */
+   COSTS_N_INSNS (45),			/*			    DI.  */
+   COSTS_N_INSNS (45)},			/*			    other.  */
+  COSTS_N_INSNS (1),			/* cost of movsx.  */
+  COSTS_N_INSNS (1),			/* cost of movzx.  */
+  8,					/* "large" insn.  */
+  9,					/* MOVE_RATIO.  */
+
+  /* All move costs are relative to integer->integer move times 2 and thus
+     they are latency*2.  */
+
+  /* reg-reg moves are done by renaming and thus they are even cheaper than
+     1 cycle.  Because reg-reg move cost is 2 and following tables correspond
+     to doubles of latencies, we do not model this correctly.  It does not
+     seem to make practical difference to bump prices up even more.  */
+  6,					/* cost for loading QImode using
+					   movzbl.  */
+  {6, 6, 6},				/* cost of loading integer registers
+					   in QImode, HImode and SImode.
+					   Relative to reg-reg move (2).  */
+  {8, 8, 8},				/* cost of storing integer
+					   registers.  */
+  2,					/* cost of reg,reg fld/fst.  */
+  {6, 6, 16},				/* cost of loading fp registers
+					   in SFmode, DFmode and XFmode.  */
+  {8, 8, 16},				/* cost of storing fp registers
+					   in SFmode, DFmode and XFmode.  */
+  2,					/* cost of moving MMX register.  */
+  {6, 6},				/* cost of loading MMX registers
+					   in SImode and DImode.  */
+  {8, 8},				/* cost of storing MMX registers
+					   in SImode and DImode.  */
+  2, 3, 6,				/* cost of moving XMM,YMM,ZMM
+					   register.  */
+  {6, 6, 6, 10, 20},			/* cost of loading SSE registers
+					   in 32,64,128,256 and 512-bit.  */
+  {6, 6, 6, 10, 20},			/* cost of unaligned loads.  */
+  {8, 8, 8, 8, 16},			/* cost of storing SSE registers
+					   in 32,64,128,256 and 512-bit.  */
+  {8, 8, 8, 8, 16},			/* cost of unaligned stores.  */
+  6, 6,					/* SSE->integer and integer->SSE
+					   moves.  */
+  /* VGATHERDPD is 23 uops and throughput is 9, VGATHERDPD is 35 uops,
+     throughput 12.  Approx 9 uops do not depend on vector size and every load
+     is 7 uops.  */
+  18, 8,				/* Gather load static, per_elt.  */
+  18, 10,				/* Gather store static, per_elt.  */
+  32,					/* size of l1 cache.  */
+  512,					/* size of l2 cache.  */
+  64,					/* size of prefetch block.  */
+  /* New AMD processors never drop prefetches; if they cannot be performed
+     immediately, they are queued.  We set number of simultaneous prefetches
+     to a large constant to reflect this (it probably is not a good idea not
+     to limit number of prefetches at all, as their execution also takes some
+     time).  */
+  100,					/* number of parallel prefetches.  */
+  3,					/* Branch cost.  */
+  COSTS_N_INSNS (5),			/* cost of FADD and FSUB insns.  */
+  COSTS_N_INSNS (5),			/* cost of FMUL instruction.  */
+  /* Latency of fdiv is 8-15.  */
+  COSTS_N_INSNS (15),			/* cost of FDIV instruction.  */
+  COSTS_N_INSNS (1),			/* cost of FABS instruction.  */
+  COSTS_N_INSNS (1),			/* cost of FCHS instruction.  */
+  /* Latency of fsqrt is 4-10.  */
+  COSTS_N_INSNS (10),			/* cost of FSQRT instruction.  */
+
+  COSTS_N_INSNS (1),			/* cost of cheap SSE instruction.  */
+  COSTS_N_INSNS (3),			/* cost of ADDSS/SD SUBSS/SD insns.  */
+  COSTS_N_INSNS (3),			/* cost of MULSS instruction.  */
+  COSTS_N_INSNS (4),			/* cost of MULSD instruction.  */
+  COSTS_N_INSNS (5),			/* cost of FMA SS instruction.  */
+  COSTS_N_INSNS (5),			/* cost of FMA SD instruction.  */
+  COSTS_N_INSNS (10),			/* cost of DIVSS instruction.  */
+  /* 9-13.  */
+  COSTS_N_INSNS (13),			/* cost of DIVSD instruction.  */
+  COSTS_N_INSNS (10),			/* cost of SQRTSS instruction.  */
+  COSTS_N_INSNS (15),			/* cost of SQRTSD instruction.  */
+  /* Zen can execute 4 integer operations per cycle.  FP operations
+     take 3 cycles and it can execute 2 integer additions and 2
+     multiplications thus reassociation may make sense up to width of 6.
+     SPEC2k6 benchmarks suggest
+     that 4 works better than 6 probably due to register pressure.
+
+     Integer vector operations are taken by FP unit and execute 3 vector
+     plus/minus operations per cycle but only one multiply.  This is adjusted
+     in ix86_reassociation_width.  */
+  4, 4, 3, 6,				/* reassoc int, fp, vec_int, vec_fp.  */
+  znver2_memcpy,
+  znver2_memset,
+  COSTS_N_INSNS (4),			/* cond_taken_branch_cost.  */
+  COSTS_N_INSNS (2),			/* cond_not_taken_branch_cost.  */
+  "16",					/* Loop alignment.  */
+  "16",					/* Jump alignment.  */
+  "0:0:8",				/* Label alignment.  */
+  "16",					/* Func alignment.  */
+};
+
 /* skylake_cost should produce code tuned for Skylake familly of CPUs.  */
 static stringop_algs skylake_memcpy[2] =   {
   {libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}},
diff --git a/gcc/config/i386/x86-tune-sched.c b/gcc/config/i386/x86-tune-sched.c
index d403a2f..a7fad4a 100644
--- a/gcc/config/i386/x86-tune-sched.c
+++ b/gcc/config/i386/x86-tune-sched.c
@@ -64,6 +64,7 @@ ix86_issue_rate (void)
     case PROCESSOR_BDVER3:
     case PROCESSOR_BDVER4:
     case PROCESSOR_ZNVER1:
+    case PROCESSOR_ZNVER2:
     case PROCESSOR_CORE2:
     case PROCESSOR_NEHALEM:
     case PROCESSOR_SANDYBRIDGE:
@@ -393,6 +394,7 @@ ix86_adjust_cost (rtx_insn *insn, int dep_type, rtx_insn *dep_insn, int cost,
       break;
 
     case PROCESSOR_ZNVER1:
+    case PROCESSOR_ZNVER2:
       /* Stack engine allows to execute push&pop instructions in parall.  */
       if ((insn_type == TYPE_PUSH || insn_type == TYPE_POP)
 	  && (dep_insn_type == TYPE_PUSH || dep_insn_type == TYPE_POP))
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index a46450a..b91dca1 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -62,7 +62,7 @@ DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",
    that can be partly masked by careful scheduling of moves.  */
 DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, "sse_partial_reg_dependency",
           m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10
-	  | m_BDVER | m_ZNVER1 | m_GENERIC)
+	  | m_BDVER | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies
    are resolved on SSE register parts instead of whole registers, so we may
@@ -100,18 +100,20 @@ DEF_TUNE (X86_TUNE_MEMORY_MISMATCH_STALL, "memory_mismatch_stall",
 /* X86_TUNE_FUSE_CMP_AND_BRANCH_32: Fuse compare with a subsequent
    conditional jump instruction for 32 bit TARGET.  */
 DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_32, "fuse_cmp_and_branch_32",
-	  m_CORE_ALL | m_BDVER | m_ZNVER1 | m_GENERIC)
+	  m_CORE_ALL | m_BDVER | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_FUSE_CMP_AND_BRANCH_64: Fuse compare with a subsequent
    conditional jump instruction for TARGET_64BIT.  */
 DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_64, "fuse_cmp_and_branch_64",
-	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_BDVER | m_ZNVER1 | m_GENERIC)
+	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_BDVER
+	  | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS: Fuse compare with a
    subsequent conditional jump instruction when the condition jump
    check sign flag (SF) or overflow flag (OF).  */
 DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS, "fuse_cmp_and_branch_soflags",
-	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_BDVER | m_ZNVER1 | m_GENERIC)
+	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_BDVER
+	  | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_FUSE_ALU_AND_BRANCH: Fuse alu with a subsequent conditional
    jump instruction when the alu instruction produces the CCFLAG consumed by
@@ -280,7 +282,7 @@ DEF_TUNE (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES,
 DEF_TUNE (X86_TUNE_USE_SAHF, "use_sahf",
           m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT
 	  | m_KNL | m_KNM | m_INTEL | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER
-	  | m_BTVER | m_ZNVER1 | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT
+	  | m_BTVER | m_ZNVER | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT
 	  | m_GENERIC)
 
 /* X86_TUNE_USE_CLTD: Controls use of CLTD and CTQO instructions.  */
@@ -351,19 +353,19 @@ DEF_TUNE (X86_TUNE_GENERAL_REGS_SSE_SPILL, "general_regs_sse_spill",
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL, "sse_unaligned_load_optimal",
 	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_SILVERMONT | m_KNL | m_KNM
 	  | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS
-	  | m_TREMONT | m_AMDFAM10 | m_BDVER | m_BTVER | m_ZNVER1 | m_GENERIC)
+	  | m_TREMONT | m_AMDFAM10 | m_BDVER | m_BTVER | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL: Use movups for misaligned stores instead
    of a sequence loading registers by parts.  */
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL, "sse_unaligned_store_optimal",
 	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_SILVERMONT | m_KNL | m_KNM
 	  | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS
-	  | m_TREMONT | m_BDVER | m_ZNVER1 | m_GENERIC)
+	  | m_TREMONT | m_BDVER | m_ZNVER | m_GENERIC)
 
 /* Use packed single precision instructions where posisble.  I.e. movups instead
    of movupd.  */
 DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL, "sse_packed_single_insn_optimal",
-	  m_BDVER | m_ZNVER1)
+	  m_BDVER | m_ZNVER)
 
 /* X86_TUNE_SSE_TYPELESS_STORES: Always movaps/movups for 128bit stores.   */
 DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores",
@@ -372,7 +374,7 @@ DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores",
 /* X86_TUNE_SSE_LOAD0_BY_PXOR: Always use pxor to load0 as opposed to
    xorps/xorpd and other variants.  */
 DEF_TUNE (X86_TUNE_SSE_LOAD0_BY_PXOR, "sse_load0_by_pxor",
-	  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BDVER | m_BTVER | m_ZNVER1
+	  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BDVER | m_BTVER | m_ZNVER
 	  | m_GENERIC)
 
 /* X86_TUNE_INTER_UNIT_MOVES_TO_VEC: Enable moves in from integer
@@ -419,11 +421,11 @@ DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
 
 /* X86_TUNE_USE_GATHER: Use gather instructions.  */
 DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather",
-          ~(m_ZNVER1 | m_GENERIC))
+	  ~(m_ZNVER | m_GENERIC))
 
 /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
    smaller FMA chain.  */
-DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1)
+DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER)
 
 /*****************************************************************************/
 /* AVX instruction selection tuning (some of SSE flags affects AVX, too)     */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7aeb4fd..53063d9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20375,6 +20375,9 @@ AMD Family 17h CPU.
 
 @item znver1
 AMD Family 17h Zen version 1.
+
+@item znver2
+AMD Family 17h Zen version 2.
 @end table
 
 Here is an example:
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 055e8c4..1973d9e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -27261,6 +27261,13 @@ supersets BMI, BMI2, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED, MWAITX,
 SHA, CLZERO, AES, PCL_MUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3,
 SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, and 64-bit
 instruction set extensions.
+@item znver2
+AMD Family 17h core based CPUs with x86-64 instruction set support. (This
+supersets BMI, BMI2, CLWB, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED,
+MWAITX, SHA, CLZERO, AES, PCL_MUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A,
+SSSE3, SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, and 64-bit
+instruction set extensions.)
+
 
 @item btver1
 CPUs based on AMD Family 14h cores with x86-64 instruction set support.  (This
diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
index a7bb9da..09f4d6f 100644
--- a/libgcc/config/i386/cpuinfo.c
+++ b/libgcc/config/i386/cpuinfo.c
@@ -108,6 +108,8 @@ get_amd_cpu (unsigned int family, unsigned int model)
       /* AMD family 17h version 1.  */
       if (model <= 0x1f)
 	__cpu_model.__cpu_subtype = AMDFAM17H_ZNVER1;
+      if (model >= 0x30)
+	 __cpu_model.__cpu_subtype = AMDFAM17H_ZNVER2;
       break;
     default:
       break;
diff --git a/libgcc/config/i386/cpuinfo.h b/libgcc/config/i386/cpuinfo.h
index 0aa887b..ac9c348 100644
--- a/libgcc/config/i386/cpuinfo.h
+++ b/libgcc/config/i386/cpuinfo.h
@@ -75,6 +75,7 @@ enum processor_subtypes
   INTEL_COREI7_CANNONLAKE,
   INTEL_COREI7_ICELAKE_CLIENT,
   INTEL_COREI7_ICELAKE_SERVER,
+  AMDFAM17H_ZNVER2,
   CPU_SUBTYPE_MAX
 };
Kumar, Venkataramanan Nov. 4, 2018, 11:22 a.m. UTC | #3
Hi Uros and Honza,

I have committed the znver2 patch.
Ref:https://gcc.gnu.org/viewcvs/gcc?limit_changes=0&view=revision&revision=265775

Thank you.

regards,
Venkat.

> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org <gcc-patches-owner@gcc.gnu.org>
> On Behalf Of Kumar, Venkataramanan
> Sent: Sunday, November 4, 2018 12:21 AM
> To: Uros Bizjak <ubizjak@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; Jan Hubicka <hubicka@ucw.cz>
> Subject: RE: [patch][x86_64]: AMD znver2 enablement
> 
> Hi Uros,
> 
> > -----Original Message-----
> > From: Uros Bizjak <ubizjak@gmail.com>
> > Sent: Friday, November 2, 2018 9:06 PM
> > To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
> > Cc: gcc-patches@gcc.gnu.org; Jan Hubicka <hubicka@ucw.cz>
> > Subject: Re: [patch][x86_64]: AMD znver2 enablement
> >
> > On Wed, Oct 31, 2018 at 6:25 AM Kumar, Venkataramanan
> > <Venkataramanan.Kumar@amd.com> wrote:
> > >
> > > Hi Maintainers,
> > >
> > > PFA, the patch that enables support for the next generation AMD  Zen
> > > CPU
> > via -march=znver2.
> > > As of now,  znver2 is using the same costs and scheduler
> > > descriptions
> > written for znver1.
> > >
> > > We will update scheduler descriptions and costing for znver2 later
> > > as we
> > get more information.
> > >
> > > Ok for trunk?
> > >
> > > Regards,
> > > Venkat.
> > >
> > > ChangeLog gcc:
> > >         * common/config/i386/i386-common.c (processor_alias_table):
> > > Add
> > znver2 entry.
> > >               * config.gcc (i[34567]86-*-linux* | ...): Add znver2.
> > >               (case ${target}): Add znver2.
> > >               * config/i386/driver-i386.c: (host_detect_local_cpu): Let
> > >               -march=native recognize znver2 processors.
> > >               * config/i386/i386-c.c (ix86_target_macros_internal): Add znver2.
> > >               * config/i386/i386.c (m_znver2): New definition.
> > >               (m_ZNVER): New definition.
> > >               (m_AMD_MULTIPLE): Includes m_znver2.
> > >               (processor_cost_table): Add znver2 entry.
> > >               (processor_target_table): Add znver2 entry.
> > >               (get_builtin_code_for_version): Set priority for
> > >          PROCESSOR_ZNVER2.
> > >         (processor_model): Add M_AMDFAM17H_ZNVER2.
> > >         (arch_names_table): Ditto.
> > >         (ix86_reassociation_width): Include znver2.
> > >         * config/i386/i386.h (TARGET_znver2): New definition.
> > >               (struct ix86_size_cost): Add TARGET_ZNVER2.
> > >               (enum processor_type): Add PROCESSOR_ZNVER2.
> > >               * config/i386/i386.md (define_attr "cpu"): Add znver2.
> > >         * config/i386/x86-tune-costs.h: (processor_costs) Add znver2 costs.
> > >         * config/i386/x86-tune-sched.c: (ix86_issue_rate): Add znver2.
> > >         (ix86_adjust_cost): Add znver2.
> > >               * config/i386/x86-tune.def:  Replace m_ZNVER1 by m_ZNVER
> > >               * gcc/doc/extend.texi: Add details about znver2.
> > >               * gcc/doc/invoke.texi: Add details about znver2.
> > >
> > > ChangeLog libgcc
> > >          * config/i386/cpuinfo.c: (get_amd_cpu): Add znver2.
> > >          (processor_subtypes): Ditto.
> >
> >
> > diff --git a/libgcc/config/i386/cpuinfo.h
> > b/libgcc/config/i386/cpuinfo.h index 0aa887b..86cb4ea 100644
> > --- a/libgcc/config/i386/cpuinfo.h
> > +++ b/libgcc/config/i386/cpuinfo.h
> > @@ -67,6 +67,7 @@ enum processor_subtypes
> >    AMDFAM15H_BDVER3,
> >    AMDFAM15H_BDVER4,
> >    AMDFAM17H_ZNVER1,
> > +  AMDFAM17H_ZNVER2,
> >    INTEL_COREI7_IVYBRIDGE,
> >    INTEL_COREI7_HASWELL,
> >    INTEL_COREI7_BROADWELL,
> >
> > As the comment above these enums says:
> >
> > /* Any new types or subtypes have to be inserted at the end. */
> >
> > So, please add the new entry at the end of enum processor_subtypes.
> >
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index
> > 963c7fc..bbe3bb3 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -32269,6 +32276,7 @@ fold_builtin_cpu (tree fndecl, tree *args)
> >      M_AMDFAM15H_BDVER3,
> >      M_AMDFAM15H_BDVER4,
> >      M_AMDFAM17H_ZNVER1,
> > +    M_AMDFAM17H_ZNVER2,
> >      M_INTEL_COREI7_IVYBRIDGE,
> >      M_INTEL_COREI7_HASWELL,
> >      M_INTEL_COREI7_BROADWELL,
> >
> > The above also has to be in sync with enum processor_subtypes.
> >
> > Otherwise LGTM.
> >
> > Uros.
> 
> I have updated the patch as per your review comments.  Thank you,  I will
> commit the attached patch.
> 
> Regards,
> Venkat.
> 
> ChangeLog:
>         * common/config/i386/i386-common.c (processor_alias_table): Add
> znver2 entry.
> 	* config.gcc (i[34567]86-*-linux* | ...): Add znver2.
> 	(case ${target}): Add znver2.
> 	* config/i386/driver-i386.c: (host_detect_local_cpu): Let
> 	-march=native recognize znver2 processors.
> 	* config/i386/i386-c.c (ix86_target_macros_internal): Add znver2.
> 	* config/i386/i386.c (m_znver2): New definition.
> 	(m_ZNVER): New definition.
> 	(m_AMD_MULTIPLE): Includes m_znver2.
> 	(processor_cost_table): Add znver2 entry.
> 	(processor_target_table): Add znver2 entry.
> 	(get_builtin_code_for_version): Set priority for
>          PROCESSOR_ZNVER2.
>         (processor_model): Add M_AMDFAM17H_ZNVER2.
>         (arch_names_table): Ditto.
>         (ix86_reassociation_width): Include znver2.
>         * config/i386/i386.h (TARGET_znver2): New definition.
> 	(struct ix86_size_cost): Add TARGET_ZNVER2.
> 	(enum processor_type): Add PROCESSOR_ZNVER2.
> 	* config/i386/i386.md (define_attr "cpu"): Add znver2.
>         * config/i386/x86-tune-costs.h: (processor_costs) Add znver2 costs.
>         * config/i386/x86-tune-sched.c: (ix86_issue_rate): Add znver2.
>         (ix86_adjust_cost): Add znver2.
> 	* config/i386/x86-tune.def:  Replace m_ZNVER1 by m_ZNVER
> 	* gcc/doc/extend.texi: Add details about znver2.
> 	* gcc/doc/invoke.texi: Add details about znver2.
> 
> ChangeLog libgcc
>          * config/i386/cpuinfo.c (get_amd_cpu): Add znver2.
>          * config/i386/cpuinfo.h (processor_subtypes): Ditto.
Patch

diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c
index f12806e..ff13ea5 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -1677,6 +1677,16 @@  const pta processor_alias_table[] =
       | PTA_RDRND | PTA_MOVBE | PTA_MWAITX | PTA_ADX | PTA_RDSEED
       | PTA_CLZERO | PTA_CLFLUSHOPT | PTA_XSAVEC | PTA_XSAVES
       | PTA_SHA | PTA_LZCNT | PTA_POPCNT},
+  {"znver2", PROCESSOR_ZNVER2, CPU_ZNVER1,
+    PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
+      | PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
+      | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_AVX2
+      | PTA_BMI | PTA_BMI2 | PTA_F16C | PTA_FMA | PTA_PRFCHW
+      | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT | PTA_FSGSBASE
+      | PTA_RDRND | PTA_MOVBE | PTA_MWAITX | PTA_ADX | PTA_RDSEED
+      | PTA_CLZERO | PTA_CLFLUSHOPT | PTA_XSAVEC | PTA_XSAVES
+      | PTA_SHA | PTA_LZCNT | PTA_POPCNT | PTA_CLWB | PTA_RDPID
+      | PTA_WBNOINVD},
   {"btver1", PROCESSOR_BTVER1, CPU_GENERIC,
     PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
       | PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_PRFCHW
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 93dc297..a47e6c3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -664,11 +664,11 @@  pentium4 pentium4m pentiumpro prescott lakemont"
 # 64-bit x86 processors supported by --with-arch=.  Each processor
 # MUST be separated by exactly one space.
 x86_64_archs="amdfam10 athlon64 athlon64-sse3 barcelona bdver1 bdver2 \
-bdver3 bdver4 znver1 btver1 btver2 k8 k8-sse3 opteron opteron-sse3 nocona \
-core2 corei7 corei7-avx core-avx-i core-avx2 atom slm nehalem westmere \
-sandybridge ivybridge haswell broadwell bonnell silvermont knl knm \
-skylake-avx512 cannonlake icelake-client icelake-server skylake goldmont \
-goldmont-plus tremont x86-64 native"
+bdver3 bdver4 znver1 znver2 btver1 btver2 k8 k8-sse3 opteron \
+opteron-sse3 nocona core2 corei7 corei7-avx core-avx-i core-avx2 atom \
+slm nehalem westmere sandybridge ivybridge haswell broadwell bonnell \
+silvermont knl knm skylake-avx512 cannonlake icelake-client icelake-server \
+skylake goldmont goldmont-plus tremont x86-64 native"
 
 # Additional x86 processors supported by --with-cpu=.  Each processor
 # MUST be separated by exactly one space.
@@ -3336,6 +3336,10 @@  case ${target} in
 	arch=znver1
 	cpu=znver1
 	;;
+      znver2-*)
+	arch=znver2
+	cpu=znver2
+	;;
       bdver4-*)
         arch=bdver4
         cpu=bdver4
@@ -3453,6 +3457,10 @@  case ${target} in
 	arch=znver1
 	cpu=znver1
 	;;
+      znver2-*)
+	arch=znver2
+	cpu=znver2
+	;;
       bdver4-*)
         arch=bdver4
         cpu=bdver4
diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 8c830bd..95ba393 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -649,6 +649,8 @@  const char *host_detect_local_cpu (int argc, const char **argv)
 	processor = PROCESSOR_GEODE;
       else if (has_movbe && family == 22)
 	processor = PROCESSOR_BTVER2;
+      else if (has_clwb)
+	processor = PROCESSOR_ZNVER2;
       else if (has_clzero)
 	processor = PROCESSOR_ZNVER1;
       else if (has_avx2)
@@ -1012,6 +1014,9 @@  const char *host_detect_local_cpu (int argc, const char **argv)
     case PROCESSOR_ZNVER1:
       cpu = "znver1";
       break;
+    case PROCESSOR_ZNVER2:
+      cpu = "znver2";
+      break;
     case PROCESSOR_BTVER1:
       cpu = "btver1";
       break;
diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 005e1a5..a11be6f 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -124,6 +124,10 @@  ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__znver1");
       def_or_undef (parse_in, "__znver1__");
       break;
+    case PROCESSOR_ZNVER2:
+      def_or_undef (parse_in, "__znver2");
+      def_or_undef (parse_in, "__znver2__");
+      break;
     case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__btver1");
       def_or_undef (parse_in, "__btver1__");
@@ -288,6 +292,9 @@  ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     case PROCESSOR_ZNVER1:
       def_or_undef (parse_in, "__tune_znver1__");
       break;
+    case PROCESSOR_ZNVER2:
+      def_or_undef (parse_in, "__tune_znver2__");
+      break;
     case PROCESSOR_BTVER1:
       def_or_undef (parse_in, "__tune_btver1__");
       break;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 963c7fc..bbe3bb3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -169,12 +169,14 @@  const struct processor_costs *ix86_cost = NULL;
 #define m_BDVER3 (HOST_WIDE_INT_1U<<PROCESSOR_BDVER3)
 #define m_BDVER4 (HOST_WIDE_INT_1U<<PROCESSOR_BDVER4)
 #define m_ZNVER1 (HOST_WIDE_INT_1U<<PROCESSOR_ZNVER1)
+#define m_ZNVER2 (HOST_WIDE_INT_1U<<PROCESSOR_ZNVER2)
 #define m_BTVER1 (HOST_WIDE_INT_1U<<PROCESSOR_BTVER1)
 #define m_BTVER2 (HOST_WIDE_INT_1U<<PROCESSOR_BTVER2)
 #define m_BDVER	(m_BDVER1 | m_BDVER2 | m_BDVER3 | m_BDVER4)
 #define m_BTVER (m_BTVER1 | m_BTVER2)
+#define m_ZNVER	(m_ZNVER1 | m_ZNVER2)
 #define m_AMD_MULTIPLE (m_ATHLON_K8 | m_AMDFAM10 | m_BDVER | m_BTVER \
-			| m_ZNVER1)
+			| m_ZNVER)
 
 #define m_GENERIC (HOST_WIDE_INT_1U<<PROCESSOR_GENERIC)
 
@@ -868,6 +870,7 @@  static const struct processor_costs *processor_cost_table[PROCESSOR_max] =
   &btver1_cost,
   &btver2_cost,
   &znver1_cost,
+  &znver2_cost
 };
 
 static unsigned int
@@ -31601,6 +31604,10 @@  get_builtin_code_for_version (tree decl, tree *predicate_list)
 	      arg_str = "znver1";
 	      priority = P_PROC_AVX2;
 	      break;
+	    case PROCESSOR_ZNVER2:
+	      arg_str = "znver2";
+	      priority = P_PROC_AVX2;
+	      break;
 	    }
 	}
 
@@ -32269,6 +32276,7 @@  fold_builtin_cpu (tree fndecl, tree *args)
     M_AMDFAM15H_BDVER3,
     M_AMDFAM15H_BDVER4,
     M_AMDFAM17H_ZNVER1,
+    M_AMDFAM17H_ZNVER2,
     M_INTEL_COREI7_IVYBRIDGE,
     M_INTEL_COREI7_HASWELL,
     M_INTEL_COREI7_BROADWELL,
@@ -32323,6 +32331,7 @@  fold_builtin_cpu (tree fndecl, tree *args)
       {"btver2", M_AMD_BTVER2},
       {"amdfam17h", M_AMDFAM17H},
       {"znver1", M_AMDFAM17H_ZNVER1},
+      {"znver2", M_AMDFAM17H_ZNVER2},
     };
 
   static struct _isa_names_table
@@ -49200,8 +49209,8 @@  ix86_reassociation_width (unsigned int op, machine_mode mode)
 
       /* Integer vector instructions execute in FP unit
 	 and can execute 3 additions and one multiplication per cycle.  */
-      if (ix86_tune == PROCESSOR_ZNVER1 && INTEGRAL_MODE_P (mode)
-	  && op != PLUS && op != MINUS)
+      if ((ix86_tune == PROCESSOR_ZNVER1 || ix86_tune == PROCESSOR_ZNVER2)
+	   && INTEGRAL_MODE_P (mode) && op != PLUS && op != MINUS)
 	return 1;
 
       /* Account for targets that splits wide vectors into multiple parts.  */
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 01d49a7..58caab2 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -415,6 +415,7 @@  extern const struct processor_costs ix86_size_cost;
 #define TARGET_BTVER1 (ix86_tune == PROCESSOR_BTVER1)
 #define TARGET_BTVER2 (ix86_tune == PROCESSOR_BTVER2)
 #define TARGET_ZNVER1 (ix86_tune == PROCESSOR_ZNVER1)
+#define TARGET_ZNVER2 (ix86_tune == PROCESSOR_ZNVER2)
 
 /* Feature tests against the various tunings.  */
 enum ix86_tune_indices {
@@ -2272,6 +2273,7 @@  enum processor_type
   PROCESSOR_BTVER1,
   PROCESSOR_BTVER2,
   PROCESSOR_ZNVER1,
+  PROCESSOR_ZNVER2,
   PROCESSOR_max
 };
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7fb2b14..8061a23 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -430,7 +430,7 @@ 
 ;; Processor type.
 (define_attr "cpu" "none,pentium,pentiumpro,geode,k6,athlon,k8,core2,nehalem,
 		    atom,slm,glm,haswell,generic,amdfam10,bdver1,bdver2,bdver3,
-		    bdver4,btver2,znver1"
+		    bdver4,btver2,znver1,znver2"
   (const (symbol_ref "ix86_schedule")))
 
 ;; A basic instruction type.  Refinements due to arguments to be
diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index 50ecb35..a47b92f 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -1273,6 +1273,133 @@  struct processor_costs znver1_cost = {
   "16",					/* Func alignment.  */
 };
 
+/*  ZNVER2 has optimized REP instruction for medium sized blocks, but for
+    very small blocks it is better to use loop.  For large blocks, libcall
+    can do nontemporal accesses and beat inline considerably.  */
+static stringop_algs znver2_memcpy[2] = {
+  {libcall, {{6, loop, false}, {14, unrolled_loop, false},
+	     {-1, rep_prefix_4_byte, false}}},
+  {libcall, {{16, loop, false}, {8192, rep_prefix_8_byte, false},
+	     {-1, libcall, false}}}};
+static stringop_algs znver2_memset[2] = {
+  {libcall, {{8, loop, false}, {24, unrolled_loop, false},
+	     {2048, rep_prefix_4_byte, false}, {-1, libcall, false}}},
+  {libcall, {{48, unrolled_loop, false}, {8192, rep_prefix_8_byte, false},
+	     {-1, libcall, false}}}};
+
+struct processor_costs znver2_cost = {
+  COSTS_N_INSNS (1),			/* cost of an add instruction.  */
+  COSTS_N_INSNS (1),			/* cost of a lea instruction.  */
+  COSTS_N_INSNS (1),			/* variable shift costs.  */
+  COSTS_N_INSNS (1),			/* constant shift costs.  */
+  {COSTS_N_INSNS (3),			/* cost of starting multiply for QI.  */
+   COSTS_N_INSNS (3),			/* 				 HI.  */
+   COSTS_N_INSNS (3),			/*				 SI.  */
+   COSTS_N_INSNS (3),			/*				 DI.  */
+   COSTS_N_INSNS (3)},			/*			other.  */
+  0,					/* cost of multiply per each bit
+					   set.  */
+   /* Depending on parameters, idiv can get faster on ryzen.  This is upper
+      bound.  */
+  {COSTS_N_INSNS (16),			/* cost of a divide/mod for QI.  */
+   COSTS_N_INSNS (22),			/* 			    HI.  */
+   COSTS_N_INSNS (30),			/*			    SI.  */
+   COSTS_N_INSNS (45),			/*			    DI.  */
+   COSTS_N_INSNS (45)},			/*			    other.  */
+  COSTS_N_INSNS (1),			/* cost of movsx.  */
+  COSTS_N_INSNS (1),			/* cost of movzx.  */
+  8,					/* "large" insn.  */
+  9,					/* MOVE_RATIO.  */
+
+  /* All move costs are relative to integer->integer move times 2 and thus
+     they are latency*2.  */
+
+  /* reg-reg moves are done by renaming and thus they are even cheaper than
+     1 cycle.  Because reg-reg move cost is 2 and following tables correspond
+     to doubles of latencies, we do not model this correctly.  It does not
+     seem to make practical difference to bump prices up even more.  */
+  6,					/* cost for loading QImode using
+					   movzbl.  */
+  {6, 6, 6},				/* cost of loading integer registers
+					   in QImode, HImode and SImode.
+					   Relative to reg-reg move (2).  */
+  {8, 8, 8},				/* cost of storing integer
+					   registers.  */
+  2,					/* cost of reg,reg fld/fst.  */
+  {6, 6, 16},				/* cost of loading fp registers
+					   in SFmode, DFmode and XFmode.  */
+  {8, 8, 16},				/* cost of storing fp registers
+					   in SFmode, DFmode and XFmode.  */
+  2,					/* cost of moving MMX register.  */
+  {6, 6},				/* cost of loading MMX registers
+					   in SImode and DImode.  */
+  {8, 8},				/* cost of storing MMX registers
+					   in SImode and DImode.  */
+  2, 3, 6,				/* cost of moving XMM,YMM,ZMM
+					   register.  */
+  {6, 6, 6, 10, 20},			/* cost of loading SSE registers
+					   in 32,64,128,256 and 512-bit.  */
+  {6, 6, 6, 10, 20},			/* cost of unaligned loads.  */
+  {8, 8, 8, 8, 16},			/* cost of storing SSE registers
+					   in 32,64,128,256 and 512-bit.  */
+  {8, 8, 8, 8, 16},			/* cost of unaligned stores.  */
+  6, 6,					/* SSE->integer and integer->SSE
+					   moves.  */
+  /* VGATHERDPD is 23 uops and throughput is 9, VGATHERDPD is 35 uops,
+     throughput 12.  Approx 9 uops do not depend on vector size and every load
+     is 7 uops.  */
+  18, 8,				/* Gather load static, per_elt.  */
+  18, 10,				/* Gather store static, per_elt.  */
+  32,					/* size of l1 cache.  */
+  512,					/* size of l2 cache.  */
+  64,					/* size of prefetch block.  */
+  /* New AMD processors never drop prefetches; if they cannot be performed
+     immediately, they are queued.  We set number of simultaneous prefetches
+     to a large constant to reflect this (it probably is not a good idea not
+     to limit number of prefetches at all, as their execution also takes some
+     time).  */
+  100,					/* number of parallel prefetches.  */
+  3,					/* Branch cost.  */
+  COSTS_N_INSNS (5),			/* cost of FADD and FSUB insns.  */
+  COSTS_N_INSNS (5),			/* cost of FMUL instruction.  */
+  /* Latency of fdiv is 8-15.  */
+  COSTS_N_INSNS (15),			/* cost of FDIV instruction.  */
+  COSTS_N_INSNS (1),			/* cost of FABS instruction.  */
+  COSTS_N_INSNS (1),			/* cost of FCHS instruction.  */
+  /* Latency of fsqrt is 4-10.  */
+  COSTS_N_INSNS (10),			/* cost of FSQRT instruction.  */
+
+  COSTS_N_INSNS (1),			/* cost of cheap SSE instruction.  */
+  COSTS_N_INSNS (3),			/* cost of ADDSS/SD SUBSS/SD insns.  */
+  COSTS_N_INSNS (3),			/* cost of MULSS instruction.  */
+  COSTS_N_INSNS (4),			/* cost of MULSD instruction.  */
+  COSTS_N_INSNS (5),			/* cost of FMA SS instruction.  */
+  COSTS_N_INSNS (5),			/* cost of FMA SD instruction.  */
+  COSTS_N_INSNS (10),			/* cost of DIVSS instruction.  */
+  /* 9-13.  */
+  COSTS_N_INSNS (13),			/* cost of DIVSD instruction.  */
+  COSTS_N_INSNS (10),			/* cost of SQRTSS instruction.  */
+  COSTS_N_INSNS (15),			/* cost of SQRTSD instruction.  */
+  /* Zen can execute 4 integer operations per cycle.  FP operations
+     take 3 cycles and it can execute 2 integer additions and 2
+     multiplications thus reassociation may make sense up to width of 6.
+     SPEC2k6 benchmarks suggest
+     that 4 works better than 6 probably due to register pressure.
+
+     Integer vector operations are taken by FP unit and execute 3 vector
+     plus/minus operations per cycle but only one multiply.  This is adjusted
+     in ix86_reassociation_width.  */
+  4, 4, 3, 6,				/* reassoc int, fp, vec_int, vec_fp.  */
+  znver2_memcpy,
+  znver2_memset,
+  COSTS_N_INSNS (4),			/* cond_taken_branch_cost.  */
+  COSTS_N_INSNS (2),			/* cond_not_taken_branch_cost.  */
+  "16",					/* Loop alignment.  */
+  "16",					/* Jump alignment.  */
+  "0:0:8",				/* Label alignment.  */
+  "16",					/* Func alignment.  */
+};
+
 /* skylake_cost should produce code tuned for Skylake familly of CPUs.  */
 static stringop_algs skylake_memcpy[2] =   {
   {libcall, {{1024, rep_prefix_4_byte, true}, {-1, libcall, false}}},
diff --git a/gcc/config/i386/x86-tune-sched.c b/gcc/config/i386/x86-tune-sched.c
index d403a2f..a7fad4a 100644
--- a/gcc/config/i386/x86-tune-sched.c
+++ b/gcc/config/i386/x86-tune-sched.c
@@ -64,6 +64,7 @@  ix86_issue_rate (void)
     case PROCESSOR_BDVER3:
     case PROCESSOR_BDVER4:
     case PROCESSOR_ZNVER1:
+    case PROCESSOR_ZNVER2:
     case PROCESSOR_CORE2:
     case PROCESSOR_NEHALEM:
     case PROCESSOR_SANDYBRIDGE:
@@ -393,6 +394,7 @@  ix86_adjust_cost (rtx_insn *insn, int dep_type, rtx_insn *dep_insn, int cost,
       break;
 
     case PROCESSOR_ZNVER1:
+    case PROCESSOR_ZNVER2:
       /* Stack engine allows to execute push&pop instructions in parall.  */
       if ((insn_type == TYPE_PUSH || insn_type == TYPE_POP)
 	  && (dep_insn_type == TYPE_PUSH || dep_insn_type == TYPE_POP))
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index a46450a..b91dca1 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -62,7 +62,7 @@  DEF_TUNE (X86_TUNE_PARTIAL_REG_DEPENDENCY, "partial_reg_dependency",
    that can be partly masked by careful scheduling of moves.  */
 DEF_TUNE (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY, "sse_partial_reg_dependency",
           m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_AMDFAM10
-	  | m_BDVER | m_ZNVER1 | m_GENERIC)
+	  | m_BDVER | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_SSE_SPLIT_REGS: Set for machines where the type and dependencies
    are resolved on SSE register parts instead of whole registers, so we may
@@ -100,18 +100,20 @@  DEF_TUNE (X86_TUNE_MEMORY_MISMATCH_STALL, "memory_mismatch_stall",
 /* X86_TUNE_FUSE_CMP_AND_BRANCH_32: Fuse compare with a subsequent
    conditional jump instruction for 32 bit TARGET.  */
 DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_32, "fuse_cmp_and_branch_32",
-	  m_CORE_ALL | m_BDVER | m_ZNVER1 | m_GENERIC)
+	  m_CORE_ALL | m_BDVER | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_FUSE_CMP_AND_BRANCH_64: Fuse compare with a subsequent
    conditional jump instruction for TARGET_64BIT.  */
 DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_64, "fuse_cmp_and_branch_64",
-	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_BDVER | m_ZNVER1 | m_GENERIC)
+	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_BDVER
+	  | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS: Fuse compare with a
    subsequent conditional jump instruction when the condition jump
    check sign flag (SF) or overflow flag (OF).  */
 DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS, "fuse_cmp_and_branch_soflags",
-	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_BDVER | m_ZNVER1 | m_GENERIC)
+	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_BDVER
+	  | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_FUSE_ALU_AND_BRANCH: Fuse alu with a subsequent conditional
    jump instruction when the alu instruction produces the CCFLAG consumed by
@@ -280,7 +282,7 @@  DEF_TUNE (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES,
 DEF_TUNE (X86_TUNE_USE_SAHF, "use_sahf",
           m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BONNELL | m_SILVERMONT
 	  | m_KNL | m_KNM | m_INTEL | m_K6_GEODE | m_K8 | m_AMDFAM10 | m_BDVER
-	  | m_BTVER | m_ZNVER1 | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT
+	  | m_BTVER | m_ZNVER | m_GOLDMONT | m_GOLDMONT_PLUS | m_TREMONT
 	  | m_GENERIC)
 
 /* X86_TUNE_USE_CLTD: Controls use of CLTD and CTQO instructions.  */
@@ -351,19 +353,19 @@  DEF_TUNE (X86_TUNE_GENERAL_REGS_SSE_SPILL, "general_regs_sse_spill",
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL, "sse_unaligned_load_optimal",
 	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_SILVERMONT | m_KNL | m_KNM
 	  | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS
-	  | m_TREMONT | m_AMDFAM10 | m_BDVER | m_BTVER | m_ZNVER1 | m_GENERIC)
+	  | m_TREMONT | m_AMDFAM10 | m_BDVER | m_BTVER | m_ZNVER | m_GENERIC)
 
 /* X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL: Use movups for misaligned stores instead
    of a sequence loading registers by parts.  */
 DEF_TUNE (X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL, "sse_unaligned_store_optimal",
 	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_SILVERMONT | m_KNL | m_KNM
 	  | m_INTEL | m_GOLDMONT | m_GOLDMONT_PLUS
-	  | m_TREMONT | m_BDVER | m_ZNVER1 | m_GENERIC)
+	  | m_TREMONT | m_BDVER | m_ZNVER | m_GENERIC)
 
 /* Use packed single precision instructions where posisble.  I.e. movups instead
    of movupd.  */
 DEF_TUNE (X86_TUNE_SSE_PACKED_SINGLE_INSN_OPTIMAL, "sse_packed_single_insn_optimal",
-	  m_BDVER | m_ZNVER1)
+	  m_BDVER | m_ZNVER)
 
 /* X86_TUNE_SSE_TYPELESS_STORES: Always movaps/movups for 128bit stores.   */
 DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores",
@@ -372,7 +374,7 @@  DEF_TUNE (X86_TUNE_SSE_TYPELESS_STORES, "sse_typeless_stores",
 /* X86_TUNE_SSE_LOAD0_BY_PXOR: Always use pxor to load0 as opposed to
    xorps/xorpd and other variants.  */
 DEF_TUNE (X86_TUNE_SSE_LOAD0_BY_PXOR, "sse_load0_by_pxor",
-	  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BDVER | m_BTVER | m_ZNVER1
+	  m_PPRO | m_P4_NOCONA | m_CORE_ALL | m_BDVER | m_BTVER | m_ZNVER
 	  | m_GENERIC)
 
 /* X86_TUNE_INTER_UNIT_MOVES_TO_VEC: Enable moves in from integer
@@ -419,11 +421,11 @@  DEF_TUNE (X86_TUNE_AVOID_4BYTE_PREFIXES, "avoid_4byte_prefixes",
 
 /* X86_TUNE_USE_GATHER: Use gather instructions.  */
 DEF_TUNE (X86_TUNE_USE_GATHER, "use_gather",
-          ~(m_ZNVER1 | m_GENERIC))
+	  ~(m_ZNVER | m_GENERIC))
 
 /* X86_TUNE_AVOID_128FMA_CHAINS: Avoid creating loops with tight 128bit or
    smaller FMA chain.  */
-DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER1)
+DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER)
 
 /*****************************************************************************/
 /* AVX instruction selection tuning (some of SSE flags affects AVX, too)     */
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7aeb4fd..53063d9 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -20375,6 +20375,9 @@  AMD Family 17h CPU.
 
 @item znver1
 AMD Family 17h Zen version 1.
+
+@item znver2
+AMD Family 17h Zen version 2.
 @end table
 
 Here is an example:
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 055e8c4..1973d9e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -27261,6 +27261,13 @@  supersets BMI, BMI2, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED, MWAITX,
 SHA, CLZERO, AES, PCL_MUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3,
 SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, and 64-bit
 instruction set extensions.
+@item znver2
+AMD Family 17h core based CPUs with x86-64 instruction set support. (This
+supersets BMI, BMI2, CLWB, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED,
+MWAITX, SHA, CLZERO, AES, PCL_MUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A,
+SSSE3, SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, and 64-bit
+instruction set extensions.)
+
 
 @item btver1
 CPUs based on AMD Family 14h cores with x86-64 instruction set support.  (This
diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
index a7bb9da..09f4d6f 100644
--- a/libgcc/config/i386/cpuinfo.c
+++ b/libgcc/config/i386/cpuinfo.c
@@ -108,6 +108,8 @@  get_amd_cpu (unsigned int family, unsigned int model)
       /* AMD family 17h version 1.  */
       if (model <= 0x1f)
 	__cpu_model.__cpu_subtype = AMDFAM17H_ZNVER1;
+      if (model >= 0x30)
+	 __cpu_model.__cpu_subtype = AMDFAM17H_ZNVER2;
       break;
     default:
       break;
diff --git a/libgcc/config/i386/cpuinfo.h b/libgcc/config/i386/cpuinfo.h
index 0aa887b..86cb4ea 100644
--- a/libgcc/config/i386/cpuinfo.h
+++ b/libgcc/config/i386/cpuinfo.h
@@ -67,6 +67,7 @@  enum processor_subtypes
   AMDFAM15H_BDVER3,
   AMDFAM15H_BDVER4,
   AMDFAM17H_ZNVER1,
+  AMDFAM17H_ZNVER2,
   INTEL_COREI7_IVYBRIDGE,
   INTEL_COREI7_HASWELL,
   INTEL_COREI7_BROADWELL,