[0/3] RFC: Platform Support for AMD Zen and AVX2/AVX

Message ID 20200317044646.29707-1-PMallappa@amd.com

Message

Prem Mallappa via Libc-alpha March 17, 2020, 4:46 a.m. UTC
From: Prem Mallappa <Premachandra.Mallappa@amd.com>

Hello Glibc Community,

== (cross posting to libc-alpha, apologies for the spam) ==

This is in response to

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=24979
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=24080
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=23249

It is clear that there is no panacea here. However,
here is an attempt to address them in parts.

From [1]: enable customers who already have
"haswell" libs and have seen performance benefits
by loading them on AMD Zen.
(Libraries are loaded by placing them in LD_LIBRARY_PATH/zen
or via a symbolic link zen->haswell.)
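Concretely, the symlink scheme sketched above might look like this (paths hypothetical):

```shell
# Hypothetical layout: haswell-optimized libraries already exist under
# /tmp/optlibs/haswell, and the patched loader would search a "zen"
# subdirectory of each LD_LIBRARY_PATH entry on AMD Zen CPUs.
mkdir -p /tmp/optlibs/haswell
ln -sfn haswell /tmp/optlibs/zen   # reuse the haswell build on Zen
export LD_LIBRARY_PATH=/tmp/optlibs
readlink /tmp/optlibs/zen          # -> haswell
```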

From [2] and [3]: additionally, forward-looking
generic-avx2/generic-avx libs would enable OS vendors
to supply an optimized set. And since haswell/zen are
really supersets of these, keeping them made sense.

With this we would like to open it up for discussion.
The haswell/zen names could be intel/amd
(or any other name), with ifunc-based loading
supplied internally.

Prem Mallappa (3):
  x86: Refactor platform support in cpu_features
  x86: Add AMD Zen and AVX2/AVX platform support
  x86: test to load from PLATFORM path

 sysdeps/x86/cpu-features.c | 113 ++++++++++++++++++++++---------------
 sysdeps/x86_64/Makefile    |   3 +-
 2 files changed, 69 insertions(+), 47 deletions(-)

Comments

Florian Weimer March 17, 2020, 9:02 a.m. UTC | #1
* Prem Mallappa via Libc-alpha:

> From: Prem Mallappa <Premachandra.Mallappa@amd.com>
>
> Hello Glibc Community,
>
> == (cross posting to libc-alpha, apologies for the spam) ==
>
> This is in response to
>
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=24979
> [2] https://sourceware.org/bugzilla/show_bug.cgi?id=24080
> [3] https://sourceware.org/bugzilla/show_bug.cgi?id=23249
>
> It is clear that there is no panacea here. However,
> here is an attempt to address them in parts.
>
> From [1]: enable customers who already have
> "haswell" libs and have seen performance benefits
> by loading them on AMD Zen.
> (Libraries are loaded by placing them in LD_LIBRARY_PATH/zen
> or via a symbolic link zen->haswell.)
>
> From [2] and [3]: additionally, forward-looking
> generic-avx2/generic-avx libs would enable OS vendors
> to supply an optimized set. And since haswell/zen are
> really supersets of these, keeping them made sense.
>
> With this we would like to open it up for discussion.
> The haswell/zen names could be intel/amd
> (or any other name), with ifunc-based loading
> supplied internally.

I think we cannot use the platform subdirectory for that because there
is just a single one.  If we want an Intel/AMD split, we need to
enhance the dynamic loader to try the CPU vendor directory first, and
then fall back to a shared subdirectory.  Most distributions do not
want to test and ship binaries specific to Intel or AMD CPUs.

That's a generic loader change which will need some time to implement,
but we can work on something else in the meantime:

We need to check for *all* relevant CPU flags such code can use, and
only enable a subdirectory if they are all present.  This is necessary
because virtualization and microcode updates can disable individual
CPU features.

For the new shared subdirectory, I think we should not restrict
ourselves just to AVX2, but we should also include useful extensions
that are in practice always implemented in silicon along with AVX2,
but can be separately tweaked.

This seems to be a reasonable list of CPU feature flags to start with:

  3DNOW
  3DNOWEXT
  3DNOWPREFETCH
  ABM
  ADX
  AES
  AVX
  AVX2
  BMI
  BMI2
  CET
  CLFLUSH
  CLFLUSHOPT
  CLWB
  CLZERO
  CMPXCHG16B
  ERMS
  F16C
  FMA
  FMA4
  FSGSBASE
  FSRM
  FXSR
  HLE
  LAHF
  LZCNT
  MOVBE
  MWAITX
  PCLMUL
  PCOMMIT
  PKU
  POPCNT
  PREFETCHW
  RDPID
  RDRAND
  RDSEED
  RDTSCP
  RTM
  SHA
  SSE3
  SSE4.1
  SSE4.2
  SSE4A
  SSSE3
  TSC
  XGETBV
  XSAVE
  XSAVEC
  XSAVEOPT
  XSAVES

You (as in AMD) need to go through this list and come back with the
subset that you think should be enabled for current and future CPUs,
based on your internal roadmap and known errata for existing CPUs.  We
do not need a rationale for how you filter down the list, merely the
outcome.

(I already have the trimmed-down list from Intel.)
Carlos O'Donell via Libc-alpha March 17, 2020, 1:17 p.m. UTC | #2
On 3/17/20 5:02 AM, Florian Weimer wrote:
> * Prem Mallappa via Libc-alpha:
> 
>> [... original message snipped ...]
> 
> I think we cannot use the platform subdirectory for that because there
> is just a single one.  If we want an Intel/AMD split, we need to
> enhance the dynamic loader to try the CPU vendor directory first, and
> then fall back to a shared subdirectory.  Most distributions do not
> want to test and ship binaries specific to Intel or AMD CPUs.

I agree. The additional burden of testing, maintaining, and supporting
distinct libraries is not feasible.

> That's a generic loader change which will need some time to implement,
> but we can work on something else in the meantime:
> 
> We need to check for *all* relevant CPU flags such code can use, and
> only enable a subdirectory if they are all present.  This is necessary
> because virtualization and microcode updates can disable individual
> CPU features.

Agreed. This is the only sensible plan. The platform directories already
imply some of this, but it's not well structured.

> For the new shared subdirectory, I think we should not restrict
> ourselves just to AVX2, but we should also include useful extensions
> that are in practice always implemented in silicon along with AVX2,
> but can be separately tweaked.

Agreed.

> This seems to be a reasonable list of CPU feature flags to start with:
> 
>   [... CPU feature flag list snipped ...]
> 
> You (as in AMD) need to go through this list and come back with the
> subset that you think should be enabled for current and future CPUs,
> based on your internal roadmap and known errata for existing CPUs.  We
> do not need a rationale for how you filter down the list, merely the
> outcome.

And this is the hard part that we can't solve without AMD's help.

Even if you ignore "future CPUs", it would be useful to get this list
for all currently released CPUs, taking into account your
architectural knowledge, errata, and other factors such as microcode.

> (I already have the trimmed-down list from Intel.)
>
Adhemerval Zanella via Libc-alpha March 17, 2020, 7:27 p.m. UTC | #3
On 17/03/2020 10:17, Carlos O'Donell via Libc-alpha wrote:
> On 3/17/20 5:02 AM, Florian Weimer wrote:
>> * Prem Mallappa via Libc-alpha:
>>
>>> [... original message snipped ...]
>>
>> I think we cannot use the platform subdirectory for that because there
>> is just a single one.  If we want an Intel/AMD split, we need to
>> enhance the dynamic loader to try the CPU vendor directory first, and
>> then fall back to a shared subdirectory.  Most distributions do not
>> want to test and ship binaries specific to Intel or AMD CPUs.
> 
> I agree. The additional burden of testing, maintaining, and supporting
> distinct libraries is not feasible.
> 
>> That's a generic loader change which will need some time to implement,
>> but we can work on something else in the meantime:
>>
>> We need to check for *all* relevant CPU flags such code can use, and
>> only enable a subdirectory if they are all present.  This is necessary
>> because virtualization and microcode updates can disable individual
>> CPU features.
> 
> Agreed. This is the only sensible plan. The platform directories already
> imply some of this, but it's not well structured.

What should our policy be regarding platform names across releases?
Should names set in a previous release be supported for
compatibility, or should they be unconstrained (as with tunables)
and subject to change?

If the former, with a defined subset of the CPU feature flags it
might be possible to share a folder across x86 chips without using
chip release names.

> 
>> For the new shared subdirectory, I think we should not restrict
>> ourselves just to AVX2, but we should also include useful extensions
>> that are in practice always implemented in silicon along with AVX2,
>> but can be separately tweaked.
> 
> Agreed.
> 
>> This seems to be a reasonable list of CPU feature flags to start with:
>>
>>   [... CPU feature flag list snipped ...]
>>
>> You (as in AMD) need to go through this list and come back with the
>> subset that you think should be enabled for current and future CPUs,
>> based on your internal roadmap and known errata for existing CPUs.  We
>> do not need a rationale for how you filter down the list, merely the
>> outcome.
> 
> And this is the hard part that we can't solve without AMD's help.
> 
> Even if you ignore "future CPUs", it would be useful to get this list
> for all currently released CPUs, taking into account your
> architectural knowledge, errata, and other factors such as microcode.

So the question is how we should move forward: let each chip vendor
(Intel, AMD, etc.) define its own naming scheme based on its own
chip roadmap, or create subsets of features to define a common
folder name (as suggested by Richard Biener in BZ#24080 [1])?

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=24080
Carlos O'Donell via Libc-alpha March 17, 2020, 9:37 p.m. UTC | #4
On 3/17/20 3:27 PM, Adhemerval Zanella via Libc-alpha wrote:
> On 17/03/2020 10:17, Carlos O'Donell via Libc-alpha wrote:
>> Agreed. This is the only sensible plan. The platform directories already
>> imply some of this, but it's not well structured.
> 
> What should our policy be regarding platform names across releases?
> Should names set in a previous release be supported for
> compatibility, or should they be unconstrained (as with tunables)
> and subject to change?

It should be subject to change just like tunables.

It should be an optimization, not a requirement, and applications
should always provide a fallback implementation so that the
application can load.

Failure to load the libraries *may* result in a failure to start the
application, and we need to be OK with that. That is to say, a
particular package may ship only an optimized library (going against
the recommendation).

We should verify that downstream distributions can use /etc/ld.so.conf
to add directories back into the search path, alongside the existing
additional multilib search directories, e.g. add back /lib64/haswell
for a few years.
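A sketch of such a downstream shim (drop-in file name hypothetical, written under a scratch root rather than the real /etc):

```shell
# Re-add the legacy platform directory via an ld.so.conf drop-in.
root=/tmp/demo-root
mkdir -p "$root/etc/ld.so.conf.d"
printf '/lib64/haswell\n' > "$root/etc/ld.so.conf.d/compat-haswell.conf"
cat "$root/etc/ld.so.conf.d/compat-haswell.conf"
# On a real system: install under /etc/ld.so.conf.d/ and run ldconfig.
```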

Does that answer your question?
 
> If the former, with a defined subset of the CPU feature flags it
> might be possible to share a folder across x86 chips without using
> chip release names.

Yes.

>> And this is the hard part that we can't solve without AMD's help.
>>
>> Even if you ignore "future CPUs", it would be useful to get this
>> list for all currently released CPUs, taking into account your
>> architectural knowledge, errata, and other factors such as microcode.
> 
> So the question is how we should move forward: let each chip vendor
> (Intel, AMD, etc.) define its own naming scheme based on its own
> chip roadmap, or create subsets of features to define a common
> folder name (as suggested by Richard Biener in BZ#24080 [1])?

In the end I think we'll want:

(a) Try CPU vendor directories first.
- Each vendor should name their directories and the explicit
  compiler options to target them (printed by LD_DEBUG).

(b) Try shared directories second.
- Based on a common set of identified features.
  - Compiler options to target the shared set should be explicitly
    stated (printed by LD_DEBUG).
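Since LD_DEBUG already prints the search, the proposed vendor-then-shared order would be directly observable, e.g.:

```shell
# Show which directories ld.so tries when resolving libraries for a
# trivial dynamically linked program; under the proposal, vendor and
# shared subdirectories would appear here ahead of the base paths.
LD_DEBUG=libs /bin/true 2>&1 | head -n 10
```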

My understanding is that Florian is asking for help with (b)
to identify what things should be enabled for current CPUs, and
that we'll compare that list to the Intel list and make a common
shared directory that the downstream distributions can use
for the most optimized library we can have in common.

What we can do:
- Cleanup the generic code to allow a CPU vendor split?
  - We have a mix of hwcap bit handling and list handling
    for platforms. This is also a bit messy.
  - Allow vendors to drop in their CPU vendor search list
    ahead of the shared list, again based on feature presence.
- Prepare the code for a shared common directory based on
  some shared subset of features, and enable it only if
  those features are present. Today doing this is a bit
  messy in cpu-features.c.


> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=24080
Florian Weimer March 27, 2020, 2:26 p.m. UTC | #5
* Carlos O'Donell via Libc-alpha:

> On 3/17/20 3:27 PM, Adhemerval Zanella via Libc-alpha wrote:
>> On 17/03/2020 10:17, Carlos O'Donell via Libc-alpha wrote:
>>> Agreed. This is the only sensible plan. The platform directories already
>>> imply some of this, but it's not well structured.
>> 
>> What should our policy be regarding platform names across releases?
>> Should names set in a previous release be supported for
>> compatibility, or should they be unconstrained (as with tunables)
>> and subject to change?
>
> It should be subject to change just like tunables.

I disagree; for a subset of the directories, we should guarantee
stability.

> It should be an optimization, not a requirement, and applications
> should always provide a fallback implementation so that the
> application can load.

Agreed.  Programmers need to assume that future glibc versions may
stop selecting certain subdirectories.  However, I'm not sure if we
can suddenly start selecting directories on systems where we did not
do so before.

> We should verify that downstream distributions can use /etc/ld.so.conf
> to add directories back into the search path, alongside the existing
> additional multilib search directories, e.g. add back /lib64/haswell
> for a few years.

I don't think that works.

> In the end I think we'll want:
>
> (a) Try CPU vendor directories first.
> - Each vendor should name their directories and the explicit
>   compiler options to target them (printed by LD_DEBUG).
>
> (b) Try shared directories second.
> - Based on a common set of identified features.
>   - Compiler options to target the shared set should be explicitly
>     stated (printed by LD_DEBUG).
>
> My understanding is that Florian is asking for help with (b)
> to identify what things should be enabled for current CPUs, and
> that we'll compare that list to the Intel list and make a common
> shared directory that the downstream distributions can use
> for the most optimized library we can have in common.

The results for (b) also feed into (a) to some extent: if research
for (b) reveals that certain CPU features have been disabled by
microcode updates, we probably do not want them for (a), either.