Message ID | 201312151954.38590.linux@carewolf.com |
---|---|
State | New |
On Sun, Dec 15, 2013 at 7:54 PM, Allan Sandfeld Jensen <carewolf@gmail.com> wrote:

> Hi again
> On Wednesday 11 December 2013, Uros Bizjak wrote:
>> Hello!
>>
>> > PR gcc/59422
>> >
>> > This patch extends the supported targets for function multi versiong to
>> > also include Haswell, Silvermont, and the most recent AMD models. It
>> > also prioritizes AVX2 versions over AMD specific pre-AVX2 versions.
>>
>> Please add a ChangeLog entry and attach the complete patch. Please
>> also state how you tested the patch, as outlined in the instructions
>> [1].
>>
>> [1] http://gcc.gnu.org/contribute.html
>>
> Updated patch for better CPU model detection and added ChangeLog.
>
> The patch has been tested with the attached test.cpp. Verified that it doesn't
> build before the patch, and that it builds after, and verified it selects
> correct versions at runtime based on either CPU model or supported ISA (tested
> on 3 machines: SandyBridge, IvyBridge and Phenom II).
>
> Btw, I couldn't find anything that corresponds to gcc's btver2 arch. Is that
> an old term for what has become the Jaguar architecture?

Thanks for the patch!

However, you should not change the existing order of enums in
cpuinfo.c (enum processor_vendor, enum processor_types, enum
processor_subtypes, enum processor_features), but new entries should
be added at the end (before *_MAX entry, if exists) of the enum. The
enums (enum processor_features and enum processor_model) in
config/i386/i386.c should mirror these changes. Please see [1].

Probably, we should document this in the source...

-      {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
+      {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},

Huh... Thanks for catching this. -march=sandybridge is not recognized...

I have also CC'd maintainers from Intel and AMD for their comments.

[1] http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00034.html

Uros.
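To illustrate the enum-ordering request: the compiler side and the libgcc side have to agree on the numeric value of each enumerator, so inserting an entry in the middle silently renumbers everything after it, while appending before the *_MAX entry leaves existing values alone. A minimal sketch with made-up enumerator and function names (these are not the real GCC identifiers):

```c
#include <assert.h>

/* Hypothetical "already released" layout.  The numeric values are what
   existing binaries were built against.  */
enum old_subtype {
  OLD_NEHALEM = 1, OLD_WESTMERE, OLD_SANDYBRIDGE,
  OLD_BARCELONA, OLD_SUBTYPE_MAX
};

/* Inserting in the middle shifts every later enumerator by one.  */
enum inserted_subtype {
  INS_NEHALEM = 1, INS_WESTMERE, INS_SANDYBRIDGE,
  INS_IVYBRIDGE, INS_BARCELONA, INS_SUBTYPE_MAX
};

/* Appending just before *_MAX keeps all existing values stable.  */
enum appended_subtype {
  APP_NEHALEM = 1, APP_WESTMERE, APP_SANDYBRIDGE,
  APP_BARCELONA, APP_IVYBRIDGE, APP_SUBTYPE_MAX
};

/* Return 1 if "Barcelona" keeps its old numeric value, 0 otherwise.  */
int barcelona_stable_after_insert (void)
{
  return (int) OLD_BARCELONA == (int) INS_BARCELONA;
}

int barcelona_stable_after_append (void)
{
  return (int) OLD_BARCELONA == (int) APP_BARCELONA;
}
```

With the inserted layout, code compiled against the old numbering would read a Barcelona value as Ivy Bridge; with the appended layout nothing changes for existing entries.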
On Mon, Dec 16, 2013 at 10:34 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Dec 15, 2013 at 7:54 PM, Allan Sandfeld Jensen
> <carewolf@gmail.com> wrote:
>
>> Hi again
>> On Wednesday 11 December 2013, Uros Bizjak wrote:
>>> Hello!
>>>
>>> > PR gcc/59422
>>> >
>>> > This patch extends the supported targets for function multi versiong to
>>> > also include Haswell, Silvermont, and the most recent AMD models. It
>>> > also prioritizes AVX2 versions over AMD specific pre-AVX2 versions.
>>>
>>> Please add a ChangeLog entry and attach the complete patch. Please
>>> also state how you tested the patch, as outlined in the instructions
>>> [1].
>>>
>>> [1] http://gcc.gnu.org/contribute.html
>>>
>> Updated patch for better CPU model detection and added ChangeLog.
>>
>> The patch has been tested with the attached test.cpp. Verified that it doesn't
>> build before the patch, and that it builds after, and verified it selects
>> correct versions at runtime based on either CPU model or supported ISA (tested
>> on 3 machines: SandyBridge, IvyBridge and Phenom II).
>>
>> Btw, I couldn't find anything that corresponds to gcc's btver2 arch. Is that
>> an old term for what has become the Jaguar architecture?
>
> Thanks for the patch!
>
> However, you should not change the existing order of enums in
> cpuinfo.c (enum processor_vendor, enum processor_types, enum
> processor_subtypes, enum processor_features), but new entries should
> be added at the end (before *_MAX entry, if exists) of the enum. The
> enums (enum processor_features and enum processor_model) in
> config/i386/i386.c should mirror these changes. Please see [1].
>
> Probably, we should document this in the source...
>
> -      {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
> +      {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},
>
> Huh... Thanks for catching this. -march=sandybridge is not recognized...

-      {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
+      {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},
+      {"core-avx-i", M_INTEL_COREI7_IVYBRIDGE},
+      {"core-avx2", M_INTEL_COREI7_HASWELL},

Ah, no. These names are not intended to be used in -march. We can
follow the tradition and use sandybridge, ivybridge and haswell here.

+      {"btver1", M_AMDFAM14H_BTVER1},
+      {"btver2", M_AMDFAM14H_BTVER2},
       {"amdfam15h", M_AMDFAM15H},
       {"bdver1", M_AMDFAM15H_BDVER1},
       {"bdver2", M_AMDFAM15H_BDVER2},

Maybe AMD wants bobcat, piledriver and steamroller here instead of
btverX / bdverX?

Uros.
> Btw, I couldn't find anything that corresponds to gcc's btver2 arch. Is that an old term for what has become the Jaguar architecture?

Yes, "btver2" = "jaguar". We have the name as per its family name (i.e., bobcat family) in GCC. Similarly we have the names "bdver2" = "piledriver", "bdver3" = "steamroller" as per their family (bulldozer) name.

Regards
Ganesh

-----Original Message-----
From: Allan Sandfeld Jensen [mailto:carewolf@gmail.com]
Sent: Monday, December 16, 2013 12:25 AM
To: Uros Bizjak
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

Hi again

On Wednesday 11 December 2013, Uros Bizjak wrote:
> Hello!
>
> > PR gcc/59422
> >
> > This patch extends the supported targets for function multi versiong
> > to also include Haswell, Silvermont, and the most recent AMD models.
> > It also prioritizes AVX2 versions over AMD specific pre-AVX2 versions.
>
> Please add a ChangeLog entry and attach the complete patch. Please
> also state how you tested the patch, as outlined in the instructions
> [1].
>
> [1] http://gcc.gnu.org/contribute.html
>
Updated patch for better CPU model detection and added ChangeLog.

The patch has been tested with the attached test.cpp. Verified that it doesn't build before the patch, and that it builds after, and verified it selects correct versions at runtime based on either CPU model or supported ISA (tested on 3 machines: SandyBridge, IvyBridge and Phenom II).

Btw, I couldn't find anything that corresponds to gcc's btver2 arch. Is that an old term for what has become the Jaguar architecture?

`Allan
On Monday 16 December 2013, Gopalasubramanian, Ganesh wrote:
> > Btw, I couldn't find anything that corresponds to gcc's btver2 arch. Is
> > that an old term for what has become the Jaguar architecture?
>
> Yes, "btver2" = "jaguar". We have the name as per its family name (i.e.,
> bobcat family) in GCC. Similarly we have the names "bdver2" =
> "piledriver", "bdver3" = "steamroller" as per their family (bulldozer)
> name.
>
Yes, I figured that was the original idea behind it, but the final family of the jaguar processors seems to have become 16h instead of 14h (bobcat) at some point.

Regards
`Allan
> Yes, I figured that was the original idea behind it, but the final family of the jaguar processors seems to have become 16h instead of 14h (bobcat) at some point.
Yes. It is amdfam16h. I was supposed to pass on some comments on the patch.
1. Amdfam16h for Jaguar.
2. For Jaguar, the priority needs to be AVX (AVX got included into the Jaguar ISA).
I have a doubt! What would be done if priority is set to "F_FMA4" instead of "F_XOP" for bdver1?
Regards
Ganesh
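The family and model numbers discussed here can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, the family 14h and 15h model ranges are taken from the posted patch, and mapping family 16h to "btver2" regardless of model is an assumption based on the comment above, not something the posted patch does.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: map an AMD (family, model) pair to the GCC-style
   name used in this thread.  */
const char *amd_arch_name (unsigned int family, unsigned int model)
{
  switch (family)
    {
    case 0x14:                      /* Bobcat family.  */
      return model <= 0x0f ? "btver1" : "amdfam14h";
    case 0x15:                      /* Bulldozer family.  */
      if (model <= 0x0f)
        return "bdver1";            /* Bulldozer.  */
      if (model <= 0x2f)
        return "bdver2";            /* Piledriver.  */
      if (model <= 0x4f)
        return "bdver3";            /* Steamroller.  */
      return "amdfam15h";
    case 0x16:                      /* Jaguar (assumed: any model).  */
      return "btver2";
    default:
      return "unknown";
    }
}
```

For example, `amd_arch_name (0x15, 0x13)` falls in the 0x10..0x2f range and yields "bdver2".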
On Monday 16 December 2013, Uros Bizjak wrote:
> On Mon, Dec 16, 2013 at 10:34 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > On Sun, Dec 15, 2013 at 7:54 PM, Allan Sandfeld Jensen
> >
> > <carewolf@gmail.com> wrote:
> >> Hi again
> >>
> >> On Wednesday 11 December 2013, Uros Bizjak wrote:
> >>> Hello!
> >>>
> >>> > PR gcc/59422
> >>> >
> >>> > This patch extends the supported targets for function multi versiong
> >>> > to also include Haswell, Silvermont, and the most recent AMD models.
> >>> > It also prioritizes AVX2 versions over AMD specific pre-AVX2
> >>> > versions.
> >>>
> >>> Please add a ChangeLog entry and attach the complete patch. Please
> >>> also state how you tested the patch, as outlined in the instructions
> >>> [1].
> >>>
> >>> [1] http://gcc.gnu.org/contribute.html
> >>
> >> Updated patch for better CPU model detection and added ChangeLog.
> >>
> >> The patch has been tested with the attached test.cpp. Verified that it
> >> doesn't build before the patch, and that it builds after, and verified
> >> it selects correct versions at runtime based on either CPU model or
> >> supported ISA (tested on 3 machines: SandyBridge, IvyBridge and Phenom
> >> II).
> >>
> >> Btw, I couldn't find anything that corresponds to gcc's btver2 arch. Is
> >> that an old term for what has become the Jaguar architecture?
> >
> > Thanks for the patch!
> >
> > However, you should not change the existing order of enums in
> > cpuinfo.c (enum processor_vendor, enum processor_types, enum
> > processor_subtypes, enum processor_features), but new entries should
> > be added at the end (before *_MAX entry, if exists) of the enum. The
> > enums (enum processor_features and enum processor_model) in
> > config/i386/i386.c should mirror these changes. Please see [1].
> >
> > Probably, we should document this in the source...
> >
> > -      {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
> > +      {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},
> >
> > Huh... Thanks for catching this. -march=sandybridge is not recognized...
>
> -      {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
> +      {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},
> +      {"core-avx-i", M_INTEL_COREI7_IVYBRIDGE},
> +      {"core-avx2", M_INTEL_COREI7_HASWELL},
>
> Ah, no. These names are not intended to be used in -march. We can
> follow the tradition and use sandybridge, ivybridge and haswell here.
>
I had the problem that "arch=corei7-avx" was not recognized as a valid property argument until I made that change. I thought it was the intent to merge this list of models with the canonical names, but perhaps it is an error in the new parameter validation?

Note that similarly "arch=sandybridge" is accepted as a valid property argument but then fails as an invalid argument for march.

Regards
`Allan
On Tuesday 17 December 2013, Allan Sandfeld Jensen wrote:
> On Monday 16 December 2013, Uros Bizjak wrote:
> > On Mon, Dec 16, 2013 at 10:34 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > > On Sun, Dec 15, 2013 at 7:54 PM, Allan Sandfeld Jensen
> > >
> > > <carewolf@gmail.com> wrote:
> > >> Hi again
> > >>
> > >> On Wednesday 11 December 2013, Uros Bizjak wrote:
> > >>> Hello!
> > >>>
> > >>> > PR gcc/59422
> > >>> >
> > >>> > This patch extends the supported targets for function multi
> > >>> > versiong to also include Haswell, Silvermont, and the most recent
> > >>> > AMD models. It also prioritizes AVX2 versions over AMD specific
> > >>> > pre-AVX2 versions.
> > >>>
> > >>> Please add a ChangeLog entry and attach the complete patch. Please
> > >>> also state how you tested the patch, as outlined in the instructions
> > >>> [1].
> > >>>
> > >>> [1] http://gcc.gnu.org/contribute.html
> > >>
> > >> Updated patch for better CPU model detection and added ChangeLog.
> > >>
> > >> The patch has been tested with the attached test.cpp. Verified that it
> > >> doesn't build before the patch, and that it builds after, and verified
> > >> it selects correct versions at runtime based on either CPU model or
> > >> supported ISA (tested on 3 machines: SandyBridge, IvyBridge and Phenom
> > >> II).
> > >>
> > >> Btw, I couldn't find anything that corresponds to gcc's btver2 arch.
> > >> Is that an old term for what has become the Jaguar architecture?
> > >
> > > Thanks for the patch!
> > >
> > > However, you should not change the existing order of enums in
> > > cpuinfo.c (enum processor_vendor, enum processor_types, enum
> > > processor_subtypes, enum processor_features), but new entries should
> > > be added at the end (before *_MAX entry, if exists) of the enum. The
> > > enums (enum processor_features and enum processor_model) in
> > > config/i386/i386.c should mirror these changes. Please see [1].
> > >
> > > Probably, we should document this in the source...
> > >
> > > -      {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
> > > +      {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},
> > >
> > > Huh... Thanks for catching this. -march=sandybridge is not
> > > recognized...
> >
> > -      {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
> > +      {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},
> > +      {"core-avx-i", M_INTEL_COREI7_IVYBRIDGE},
> > +      {"core-avx2", M_INTEL_COREI7_HASWELL},
> >
> > Ah, no. These names are not intended to be used in -march. We can
> > follow the tradition and use sandybridge, ivybridge and haswell here.
>
> I had the problem that "arch=corei7-avx" was not recognized as a valid
> property argument until I made that change. I thought it was the intent to
> merge this list of models with the canonical names, but perhaps it is an
> error in the new parameter validation?
>
Ah, sorry. I think I misremembered the problem. After reviewing the code again, I think the only problem is with target("arch=core-avx-i"), because it is not in the list of architectures (presumably because it is treated as the same architecture as corei7-avx). I will revert the sandybridge name change in the next patch, and make the new names match.

`Allan
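The behaviour described here, where a name is rejected simply because it has no entry in the name table, can be sketched as a plain table lookup. The enum, struct and function names below are hypothetical and the table is deliberately tiny; it is not the real table from i386.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model identifiers; M_NONE means "name not recognized".  */
enum model { M_NONE = 0, M_SANDYBRIDGE, M_IVYBRIDGE, M_HASWELL };

struct arch_name
{
  const char *name;
  enum model model;
};

/* Illustrative table using the traditional names suggested in the review.  */
static const struct arch_name arch_names[] = {
  {"sandybridge", M_SANDYBRIDGE},
  {"ivybridge", M_IVYBRIDGE},
  {"haswell", M_HASWELL},
};

/* Linear search: a name that has no table entry is reported as M_NONE.  */
enum model lookup_arch (const char *name)
{
  size_t i;
  for (i = 0; i < sizeof arch_names / sizeof arch_names[0]; i++)
    if (strcmp (arch_names[i].name, name) == 0)
      return arch_names[i].model;
  return M_NONE;
}
```

With this table, `lookup_arch ("haswell")` succeeds while `lookup_arch ("core-avx-i")` returns `M_NONE`, which mirrors why an alias without its own entry is not accepted.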
Ping!
"Gopalasubramanian, Ganesh" <Ganesh.Gopalasubramanian@amd.com> wrote:
> Yes, I figured that was the original idea behind it, but the final family of the jaguar processors seems to have become 16h instead of 14h (bobcat) at some point.
Yes. It is amdfam16h. I was supposed to pass on some comments on the patch.
1. Amdfam16h for Jaguar.
2. For Jaguar, the priority needs to be AVX (AVX got included into the Jaguar ISA).
I have a doubt! What would be done if priority is set to "F_FMA4" instead of "F_XOP" for bdver1?
Regards
Ganesh
On Wed, Dec 18, 2013 at 4:38 PM, Gopalasubramanian, Ganesh <Ganesh.Gopalasubramanian@amd.com> wrote:
>
> Ping!
>
> "Gopalasubramanian, Ganesh" <Ganesh.Gopalasubramanian@amd.com> wrote:
>
>> Yes, I figured that was the original idea behind it, but the final family of the jaguar processors seems to have become 16h instead of 14h (bobcat) at some point.
>
> Yes. It is amdfam16h. I was supposed to pass on some comments on the patch.
> 1. Amdfam16h for Jaguar.
> 2. For Jaguar, the priority needs to be AVX (AVX got included into the Jaguar ISA).
>
> I have a doubt! What would be done if priority is set to "F_FMA4" instead of "F_XOP" for bdver1?

XOP enables FMA4, so it should be better to set priority to P_PROC_XOP. From config/i386/i386-common.c:

#define OPTION_MASK_ISA_XOP_SET \
  (OPTION_MASK_ISA_XOP | OPTION_MASK_ISA_FMA4_SET)

Looking at processor_dispatch_table, bdver1 doesn't support F_FMA, so the proposed patch fixes an error here.

Uros.
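The "XOP enables FMA4" relationship follows directly from how the *_SET mask is composed. A minimal sketch with hypothetical bit assignments and helper names (the real OPTION_MASK_ISA_* values differ, and the real FMA4 set pulls in further ISAs that are left out here):

```c
#include <assert.h>

/* Hypothetical single-bit ISA flags.  */
#define MASK_FMA4      (1u << 0)
#define MASK_XOP       (1u << 1)

/* "Set" masks: everything that gets switched on together with an ISA.
   Same shape as the OPTION_MASK_ISA_XOP_SET definition quoted above.  */
#define MASK_FMA4_SET  (MASK_FMA4)
#define MASK_XOP_SET   (MASK_XOP | MASK_FMA4_SET)

/* Enabling XOP ORs in its whole set, which includes FMA4.  */
unsigned int enable_xop (unsigned int isa_flags)
{
  return isa_flags | MASK_XOP_SET;
}

int has_fma4 (unsigned int isa_flags)
{
  return (isa_flags & MASK_FMA4) != 0;
}

int has_xop (unsigned int isa_flags)
{
  return (isa_flags & MASK_XOP) != 0;
}
```

Because the implication only runs one way (XOP implies FMA4, not the reverse), ranking a bdver1 version by XOP is the more specific choice.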
On Wednesday 18 December 2013, Gopalasubramanian, Ganesh wrote:
> Ping!
>
> "Gopalasubramanian, Ganesh" <Ganesh.Gopalasubramanian@amd.com> wrote:
> > Yes, I figured that was the original idea behind it, but the final family
> > of the jaguar processors seems to have become 16h instead of 14h
> > (bobcat) at some point.
>
> Yes. It is amdfam16h. I was supposed to pass on some comments on the patch.
> 1. Amdfam16h for Jaguar.
> 2. For Jaguar, the priority needs to be AVX (AVX got included into the
> Jaguar ISA).
>
Yes, I changed that in the last patch, though I consider it momentarily problematic because you do not yet enable AVX with march=btver2 (AVX versions would currently be better than btver2 versions for a btver2 arch), but expect march=btver2 will be fixed soon.

Regards
'Allan
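Why the priority assigned to an arch matters can be sketched as follows, assuming the dispatcher simply prefers, among the versions usable on the running CPU, the one with the highest priority. The struct and function names are hypothetical, and only a few priority levels are shown, in the same relative order as in the posted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Subset of priority levels, lowest first (order as in the patch).  */
enum feature_priority { P_ZERO = 0, P_PROC_SSE4_2, P_AVX, P_PROC_AVX };

/* Hypothetical description of one function version.  */
struct version
{
  const char *label;               /* e.g. "avx" or "arch=btver2" */
  enum feature_priority priority;
  int usable;                      /* nonzero if the CPU can run it */
};

/* Return the usable version with the highest priority, or NULL.  */
const struct version *pick_version (const struct version *v, size_t n)
{
  const struct version *best = NULL;
  size_t i;
  for (i = 0; i < n; i++)
    if (v[i].usable && (best == NULL || v[i].priority > best->priority))
      best = &v[i];
  return best;
}
```

On a btver2 machine where both an "avx" version and an "arch=btver2" version are usable, ranking btver2 at P_PROC_SSE4_2 makes the generic AVX version win, while ranking it at P_PROC_AVX makes the arch-specific version win.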
> Yes, I changed that in the last patch, though I consider it momentarily problematic because you do not yet enable AVX with march=btver2 (AVX versions would currently be better than btver2 versions for a btver2 arch), but expect march=btver2 will be fixed soon.

The "processor_alias_table" entry for "btver2" in i386.c enables AVX.
<snip>
{"btver2", PROCESSOR_BTVER2, CPU_BTVER2,
PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
| PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_SSE4_1
| PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
| PTA_BMI | PTA_F16C | PTA_MOVBE | PTA_PRFCHW
| PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
</snip>
The assembly listing for a simple test (compiled with -march=btver2) also has -mavx enabled. So, can you please enable AVX for btver2?
Regards
Ganesh
On Thursday 19 December 2013, Gopalasubramanian, Ganesh wrote:
> > Yes, I changed that in the last patch, though I consider it momentarily
> > problematic because you do not yet enable AVX with march=btver2 (AVX
> > versions would currently be better than btver2 versions for a btver2
> > arch), but expect
> > march=btver2 will be fixed soon.
>
> The " processor_alias_table" entry for "btver2" in i386.c enables AVX.
>
> <snip>
> {"btver2", PROCESSOR_BTVER2, CPU_BTVER2,
> PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
>
> | PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_SSE4_1
> | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
> | PTA_BMI | PTA_F16C | PTA_MOVBE | PTA_PRFCHW
> | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
>
> </snip>
>
> The assembly listing for a simple test (compiled with -march=btver2) also
> has -mavx enabled. So, can you please enable AVX for btver2?
>
Sorry, I must have been looking at an older version, but as I said I already did enable it in the latest patch.
(see http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01577.html )

`Allan
Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 205984)
+++ gcc/ChangeLog (working copy)
@@ -1,3 +1,9 @@
+2013-12-14 Allan Sandfeld Jensen <sandfeld@kde.org>
+
+    PR gcc/59422
+    * config/i386/i386.c: Extend function multiversioning
+    to better support recent Intel and AMD models.
+
 2013-12-14 Marek Polacek <polacek@redhat.com>
 
     PR sanitizer/59503
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c (revision 205984)
+++ gcc/config/i386/i386.c (working copy)
@@ -29962,9 +29962,14 @@
     P_PROC_SSE4_2,
     P_POPCNT,
     P_AVX,
+    P_PROC_AVX,
+    P_FMA4,
+    P_XOP,
+    P_PROC_XOP,
+    P_FMA,
+    P_PROC_FMA,
     P_AVX2,
-    P_FMA,
-    P_PROC_FMA
+    P_PROC_AVX2
   };
 
   enum feature_priority priority = P_ZERO;
@@ -29983,11 +29988,15 @@
       {"sse", P_SSE},
       {"sse2", P_SSE2},
       {"sse3", P_SSE3},
+      {"sse4a", P_SSE4_a},
       {"ssse3", P_SSSE3},
       {"sse4.1", P_SSE4_1},
       {"sse4.2", P_SSE4_2},
       {"popcnt", P_POPCNT},
       {"avx", P_AVX},
+      {"fma4", P_FMA4},
+      {"xop", P_XOP},
+      {"fma", P_FMA},
       {"avx2", P_AVX2}
     };
 
@@ -30041,25 +30050,49 @@
               break;
             case PROCESSOR_COREI7_AVX:
               arg_str = "corei7-avx";
-              priority = P_PROC_SSE4_2;
+              priority = P_PROC_AVX;
               break;
+            case PROCESSOR_HASWELL:
+              arg_str = "core-avx2";
+              priority = P_PROC_AVX2;
+              break;
             case PROCESSOR_ATOM:
               arg_str = "atom";
               priority = P_PROC_SSSE3;
               break;
+            case PROCESSOR_SLM:
+              arg_str = "slm";
+              priority = P_PROC_SSE4_2;
+              break;
             case PROCESSOR_AMDFAM10:
               arg_str = "amdfam10h";
               priority = P_PROC_SSE4_a;
               break;
+            case PROCESSOR_BTVER1:
+              arg_str = "btver1";
+              priority = P_PROC_SSE4_a;
+              break;
+            case PROCESSOR_BTVER2:
+              arg_str = "btver2";
+              priority = P_PROC_SSE4_2;
+              break;
             case PROCESSOR_BDVER1:
               arg_str = "bdver1";
-              priority = P_PROC_FMA;
+              priority = P_PROC_XOP;
               break;
             case PROCESSOR_BDVER2:
               arg_str = "bdver2";
               priority = P_PROC_FMA;
               break;
-            }
+            case PROCESSOR_BDVER3:
+              arg_str = "bdver3";
+              priority = P_PROC_FMA;
+              break;
+            case PROCESSOR_BDVER4:
+              arg_str = "bdver4";
+              priority = P_PROC_AVX2;
+              break;
+            }
         }
 
       cl_target_option_restore (&global_options, &cur_target);
@@ -30919,9 +30952,13 @@
   F_SSE2,
   F_SSE3,
   F_SSSE3,
+  F_SSE4_a,
   F_SSE4_1,
   F_SSE4_2,
   F_AVX,
+  F_FMA4,
+  F_XOP,
+  F_FMA,
   F_AVX2,
   F_MAX
 };
@@ -30938,15 +30975,20 @@
   M_INTEL_CORE2,
   M_INTEL_COREI7,
   M_AMDFAM10H,
+  M_AMDFAM14H,
   M_AMDFAM15H,
   M_INTEL_SLM,
   M_CPU_SUBTYPE_START,
   M_INTEL_COREI7_NEHALEM,
   M_INTEL_COREI7_WESTMERE,
   M_INTEL_COREI7_SANDYBRIDGE,
+  M_INTEL_COREI7_IVYBRIDGE,
+  M_INTEL_COREI7_HASWELL,
   M_AMDFAM10H_BARCELONA,
   M_AMDFAM10H_SHANGHAI,
   M_AMDFAM10H_ISTANBUL,
+  M_AMDFAM14H_BTVER1,
+  M_AMDFAM14H_BTVER2,
   M_AMDFAM15H_BDVER1,
   M_AMDFAM15H_BDVER2,
   M_AMDFAM15H_BDVER3,
@@ -30968,11 +31010,16 @@
       {"corei7", M_INTEL_COREI7},
       {"nehalem", M_INTEL_COREI7_NEHALEM},
       {"westmere", M_INTEL_COREI7_WESTMERE},
-      {"sandybridge", M_INTEL_COREI7_SANDYBRIDGE},
+      {"corei7-avx", M_INTEL_COREI7_SANDYBRIDGE},
+      {"core-avx-i", M_INTEL_COREI7_IVYBRIDGE},
+      {"core-avx2", M_INTEL_COREI7_HASWELL},
       {"amdfam10h", M_AMDFAM10H},
       {"barcelona", M_AMDFAM10H_BARCELONA},
       {"shanghai", M_AMDFAM10H_SHANGHAI},
       {"istanbul", M_AMDFAM10H_ISTANBUL},
+      {"amdfam14h", M_AMDFAM14H},
+      {"btver1", M_AMDFAM14H_BTVER1},
+      {"btver2", M_AMDFAM14H_BTVER2},
       {"amdfam15h", M_AMDFAM15H},
       {"bdver1", M_AMDFAM15H_BDVER1},
       {"bdver2", M_AMDFAM15H_BDVER2},
@@ -30994,9 +31041,13 @@
       {"sse2", F_SSE2},
       {"sse3", F_SSE3},
       {"ssse3", F_SSSE3},
+      {"sse4a", F_SSE4_a},
       {"sse4.1", F_SSE4_1},
       {"sse4.2", F_SSE4_2},
       {"avx", F_AVX},
+      {"fma4", F_FMA4},
+      {"xop", F_XOP},
+      {"fma", F_FMA},
       {"avx2", F_AVX2}
     };
 
Index: libgcc/ChangeLog
===================================================================
--- libgcc/ChangeLog (revision 205984)
+++ libgcc/ChangeLog (working copy)
@@ -1,3 +1,9 @@
+2013-12-14 Allan Sandfeld Jensen <sandfeld@kde.org>
+
+    PR gcc/59422
+    * config/i386/cpuinfo.c: Detect sse4a, fma4, xop and fma
+    ISAs and recent Intel and AMD models.
+
 2013-12-12 Zhenqiang Chen <zhenqiang.chen@arm.com>
 
     * config.host (arm*-*-uclinux*): Move t-arm before t-bpabi.
Index: libgcc/config/i386/cpuinfo.c
===================================================================
--- libgcc/config/i386/cpuinfo.c (revision 205984)
+++ libgcc/config/i386/cpuinfo.c (working copy)
@@ -60,6 +60,7 @@
   INTEL_CORE2,
   INTEL_COREI7,
   AMDFAM10H,
+  AMDFAM14H,
   AMDFAM15H,
   INTEL_SLM,
   CPU_TYPE_MAX
@@ -70,11 +71,17 @@
   INTEL_COREI7_NEHALEM = 1,
   INTEL_COREI7_WESTMERE,
   INTEL_COREI7_SANDYBRIDGE,
+  INTEL_COREI7_IVYBRIDGE,
+  INTEL_COREI7_HASWELL,
   AMDFAM10H_BARCELONA,
   AMDFAM10H_SHANGHAI,
   AMDFAM10H_ISTANBUL,
+  AMDFAM14H_BTVER1,
+  AMDFAM14H_BTVER2,
   AMDFAM15H_BDVER1,
   AMDFAM15H_BDVER2,
+  AMDFAM15H_BDVER3,
+  AMDFAM15H_BDVER4,
   CPU_SUBTYPE_MAX
 };
 
@@ -89,9 +96,13 @@
   FEATURE_SSE2,
   FEATURE_SSE3,
   FEATURE_SSSE3,
+  FEATURE_SSE4_a,
   FEATURE_SSE4_1,
   FEATURE_SSE4_2,
   FEATURE_AVX,
+  FEATURE_FMA4,
+  FEATURE_XOP,
+  FEATURE_FMA,
   FEATURE_AVX2
 };
 
@@ -113,36 +124,43 @@
     {
     /* AMD Family 10h. */
     case 0x10:
+      __cpu_model.__cpu_type = AMDFAM10H;
       switch (model)
         {
         case 0x2:
           /* Barcelona. */
-          __cpu_model.__cpu_type = AMDFAM10H;
           __cpu_model.__cpu_subtype = AMDFAM10H_BARCELONA;
           break;
         case 0x4:
           /* Shanghai. */
-          __cpu_model.__cpu_type = AMDFAM10H;
           __cpu_model.__cpu_subtype = AMDFAM10H_SHANGHAI;
           break;
         case 0x8:
           /* Istanbul. */
-          __cpu_model.__cpu_type = AMDFAM10H;
           __cpu_model.__cpu_subtype = AMDFAM10H_ISTANBUL;
           break;
         default:
           break;
         }
       break;
-    /* AMD Family 15h. */
+    /* AMD Family 14h "Bobcat". */
+    case 0x14:
+      __cpu_model.__cpu_type = AMDFAM14H;
+      if ( model <= 0xf)
+        __cpu_model.__cpu_subtype = AMDFAM14H_BTVER1;
+      break;
+    /* AMD Family 15h "Bulldozer". */
     case 0x15:
       __cpu_model.__cpu_type = AMDFAM15H;
       /* Bulldozer version 1. */
       if ( model <= 0xf)
         __cpu_model.__cpu_subtype = AMDFAM15H_BDVER1;
       /* Bulldozer version 2. */
-      if (model >= 0x10 && model <= 0x1f)
-        __cpu_model.__cpu_subtype = AMDFAM15H_BDVER2;
+      /* Bulldozer version 2 "Piledriver" */
+      if (model >= 0x10 && model <= 0x2f)
+        __cpu_model.__cpu_subtype = AMDFAM15H_BDVER2;
+      /* Bulldozer version 3 "Steamroller" */
+      if (model >= 0x30 && model <= 0x4f)
+        __cpu_model.__cpu_subtype = AMDFAM15H_BDVER3;
       break;
     default:
       break;
@@ -196,6 +214,18 @@
               __cpu_model.__cpu_type = INTEL_COREI7;
               __cpu_model.__cpu_subtype = INTEL_COREI7_SANDYBRIDGE;
               break;
+            case 0x3a:
+            case 0x3e:
+              /* Ivy Bridge. */
+              __cpu_model.__cpu_type = INTEL_COREI7;
+              __cpu_model.__cpu_subtype = INTEL_COREI7_IVYBRIDGE;
+            case 0x3c:
+            case 0x3f:
+            case 0x45:
+            case 0x46:
+              /* Haswell. */
+              __cpu_model.__cpu_type = INTEL_COREI7;
+              __cpu_model.__cpu_subtype = INTEL_COREI7_HASWELL;
             case 0x17:
             case 0x1d:
               /* Penryn. */
@@ -242,6 +272,8 @@
     features |= (1 << FEATURE_SSE4_2);
   if (ecx & bit_AVX)
     features |= (1 << FEATURE_AVX);
+  if (ecx & bit_FMA)
+    features |= (1 << FEATURE_FMA);
 
   /* Get Advanced Features at level 7 (eax = 7, ecx = 0). */
   if (max_cpuid_level >= 7)
@@ -252,6 +284,23 @@
         features |= (1 << FEATURE_AVX2);
     }
 
+  unsigned int ext_level;
+  unsigned int eax, ebx;
+  /* Check cpuid level of extended features. */
+  __cpuid (0x80000000, ext_level, ebx, ecx, edx);
+
+  if (ext_level > 0x80000000)
+    {
+      __cpuid (0x80000001, eax, ebx, ecx, edx);
+
+      if (ecx & bit_SSE4a)
+        features |= (1 << FEATURE_SSE4_a);
+      if (ecx & bit_FMA4)
+        features |= (1 << FEATURE_FMA4);
+      if (ecx & bit_XOP)
+        features |= (1 << FEATURE_XOP);
+    }
+
   __cpu_model.__cpu_features[0] = features;
 }
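One detail in the Intel model hunk as posted: the new Ivy Bridge and Haswell cases have no break before the next case label, so an Ivy Bridge model number would fall through into the Haswell assignments and then into the Penryn case. A minimal sketch of the same classification with explicit breaks, using hypothetical enum and function names and only the model numbers that appear in the hunk:

```c
#include <assert.h>

/* Hypothetical result type for this sketch.  */
enum intel_kind { KIND_UNKNOWN = 0, KIND_IVYBRIDGE, KIND_HASWELL, KIND_PENRYN };

/* Classify an Intel family-6 model number.  Each group ends with a break
   so that control cannot fall into the next group.  */
enum intel_kind classify_intel_model (unsigned int model)
{
  enum intel_kind kind = KIND_UNKNOWN;

  switch (model)
    {
    case 0x3a:
    case 0x3e:
      /* Ivy Bridge.  */
      kind = KIND_IVYBRIDGE;
      break;
    case 0x3c:
    case 0x3f:
    case 0x45:
    case 0x46:
      /* Haswell.  */
      kind = KIND_HASWELL;
      break;
    case 0x17:
    case 0x1d:
      /* Penryn.  */
      kind = KIND_PENRYN;
      break;
    default:
      break;
    }
  return kind;
}
```

Without the first two breaks, `classify_intel_model (0x3a)` would end up reporting Penryn rather than Ivy Bridge.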