[v2,00/10] x86-64: Avoid RTM abort inside a RTM region

Message ID 20210315142520.1661407-1-hjl.tools@gmail.com

Message

H.J. Lu March 15, 2021, 2:25 p.m. UTC
Changes in v2:

1. Don't use YMM2 in EVEX strcpy/strcat.
2. Correct EVEX mempcpy listing.
3. Use ZMM16-ZMM31 in AVX512 memmove/memset family functions.

---
Since VZEROUPPER triggers an RTM abort inside a transactionally executing
RTM region, avoid VZEROUPPER inside an RTM region in string/memory
functions:

1. Turn on Prefer_No_VZEROUPPER for processors with RTM.
2. Select functions optimized with 256-bit EVEX instructions using
YMM16-YMM31 registers, which don't need VZEROUPPER at function exit.
3. Select AVX optimized string/memory functions with

	xtest
	jz	1f
	vzeroall
	ret
1:
	vzeroupper
	ret

at function exit on processors with RTM, but without 256-bit EVEX
instructions.
4. To compare two 32-byte strings, 256-bit EVEX strcmp requires 2 loads,
3 VPCMPs and 2 KORDs (7 instructions), while AVX2 strcmp requires 1 load,
2 VPCMPEQs, 1 VPMINU and 1 VPMOVMSKB (5 instructions), so AVX2 strcmp is
faster than EVEX strcmp.  Add Prefer_AVX2_STRCMP to prefer AVX2 strcmp
family functions.
5. Add tests to verify that string/memory functions do not cause an RTM
abort inside an RTM region.
6. Use ZMM16-ZMM31 in AVX512 memmove/memset family functions.

H.J. Lu (10):
  x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP
  x86-64: Add ifunc-avx2.h functions with 256-bit EVEX
  x86-64: Add strcpy family functions with 256-bit EVEX
  x86-64: Add memmove family functions with 256-bit EVEX
  x86-64: Add memset family functions with 256-bit EVEX
  x86-64: Add memcmp family functions with 256-bit EVEX
  x86-64: Add AVX optimized string/memory functions for RTM
  x86: Add string/memory function tests in RTM region
  x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions
  x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions

 sysdeps/x86/Makefile                          |   23 +
 sysdeps/x86/cpu-features.c                    |   20 +-
 sysdeps/x86/cpu-tunables.c                    |    2 +
 ...cpu-features-preferred_feature_index_1.def |    1 +
 sysdeps/x86/tst-memchr-rtm.c                  |   54 +
 sysdeps/x86/tst-memcmp-rtm.c                  |   52 +
 sysdeps/x86/tst-memmove-rtm.c                 |   53 +
 sysdeps/x86/tst-memrchr-rtm.c                 |   54 +
 sysdeps/x86/tst-memset-rtm.c                  |   45 +
 sysdeps/x86/tst-strchr-rtm.c                  |   54 +
 sysdeps/x86/tst-strcpy-rtm.c                  |   53 +
 sysdeps/x86/tst-string-rtm.h                  |   72 ++
 sysdeps/x86/tst-strlen-rtm.c                  |   53 +
 sysdeps/x86/tst-strncmp-rtm.c                 |   52 +
 sysdeps/x86/tst-strrchr-rtm.c                 |   53 +
 sysdeps/x86_64/multiarch/Makefile             |   58 +-
 sysdeps/x86_64/multiarch/ifunc-avx2.h         |   18 +-
 sysdeps/x86_64/multiarch/ifunc-impl-list.c    |  381 +++++-
 sysdeps/x86_64/multiarch/ifunc-memcmp.h       |   17 +-
 sysdeps/x86_64/multiarch/ifunc-memmove.h      |   45 +-
 sysdeps/x86_64/multiarch/ifunc-memset.h       |   49 +-
 sysdeps/x86_64/multiarch/ifunc-strcpy.h       |   17 +-
 sysdeps/x86_64/multiarch/ifunc-wmemset.h      |   22 +-
 sysdeps/x86_64/multiarch/memchr-avx2-rtm.S    |   12 +
 sysdeps/x86_64/multiarch/memchr-avx2.S        |   45 +-
 sysdeps/x86_64/multiarch/memchr-evex.S        |  381 ++++++
 .../x86_64/multiarch/memcmp-avx2-movbe-rtm.S  |   12 +
 sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S  |   28 +-
 sysdeps/x86_64/multiarch/memcmp-evex-movbe.S  |  440 +++++++
 .../memmove-avx-unaligned-erms-rtm.S          |   17 +
 .../multiarch/memmove-avx512-unaligned-erms.S |   25 +-
 .../multiarch/memmove-evex-unaligned-erms.S   |   33 +
 .../multiarch/memmove-vec-unaligned-erms.S    |   57 +-
 sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S   |   12 +
 sysdeps/x86_64/multiarch/memrchr-avx2.S       |   53 +-
 sysdeps/x86_64/multiarch/memrchr-evex.S       |  337 ++++++
 .../memset-avx2-unaligned-erms-rtm.S          |   10 +
 .../multiarch/memset-avx2-unaligned-erms.S    |   12 +-
 .../multiarch/memset-avx512-unaligned-erms.S  |   16 +-
 .../multiarch/memset-evex-unaligned-erms.S    |   24 +
 .../multiarch/memset-vec-unaligned-erms.S     |   61 +-
 sysdeps/x86_64/multiarch/rawmemchr-avx2-rtm.S |    4 +
 sysdeps/x86_64/multiarch/rawmemchr-evex.S     |    4 +
 sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S    |    3 +
 sysdeps/x86_64/multiarch/stpcpy-evex.S        |    3 +
 sysdeps/x86_64/multiarch/stpncpy-avx2-rtm.S   |    4 +
 sysdeps/x86_64/multiarch/stpncpy-evex.S       |    4 +
 sysdeps/x86_64/multiarch/strcat-avx2-rtm.S    |   12 +
 sysdeps/x86_64/multiarch/strcat-avx2.S        |    6 +-
 sysdeps/x86_64/multiarch/strcat-evex.S        |  283 +++++
 sysdeps/x86_64/multiarch/strchr-avx2-rtm.S    |   12 +
 sysdeps/x86_64/multiarch/strchr-avx2.S        |   28 +-
 sysdeps/x86_64/multiarch/strchr-evex.S        |  335 ++++++
 sysdeps/x86_64/multiarch/strchr.c             |   17 +-
 sysdeps/x86_64/multiarch/strchrnul-avx2-rtm.S |    3 +
 sysdeps/x86_64/multiarch/strchrnul-evex.S     |    3 +
 sysdeps/x86_64/multiarch/strcmp-avx2-rtm.S    |   12 +
 sysdeps/x86_64/multiarch/strcmp-avx2.S        |   55 +-
 sysdeps/x86_64/multiarch/strcmp-evex.S        | 1043 +++++++++++++++++
 sysdeps/x86_64/multiarch/strcmp.c             |   19 +-
 sysdeps/x86_64/multiarch/strcpy-avx2-rtm.S    |   12 +
 sysdeps/x86_64/multiarch/strcpy-avx2.S        |   85 +-
 sysdeps/x86_64/multiarch/strcpy-evex.S        | 1003 ++++++++++++++++
 sysdeps/x86_64/multiarch/strlen-avx2-rtm.S    |   12 +
 sysdeps/x86_64/multiarch/strlen-avx2.S        |   43 +-
 sysdeps/x86_64/multiarch/strlen-evex.S        |  436 +++++++
 sysdeps/x86_64/multiarch/strncat-avx2-rtm.S   |    3 +
 sysdeps/x86_64/multiarch/strncat-evex.S       |    3 +
 sysdeps/x86_64/multiarch/strncmp-avx2-rtm.S   |    3 +
 sysdeps/x86_64/multiarch/strncmp-evex.S       |    3 +
 sysdeps/x86_64/multiarch/strncmp.c            |   19 +-
 sysdeps/x86_64/multiarch/strncpy-avx2-rtm.S   |    3 +
 sysdeps/x86_64/multiarch/strncpy-evex.S       |    3 +
 sysdeps/x86_64/multiarch/strnlen-avx2-rtm.S   |    4 +
 sysdeps/x86_64/multiarch/strnlen-evex.S       |    4 +
 sysdeps/x86_64/multiarch/strrchr-avx2-rtm.S   |   12 +
 sysdeps/x86_64/multiarch/strrchr-avx2.S       |   19 +-
 sysdeps/x86_64/multiarch/strrchr-evex.S       |  265 +++++
 sysdeps/x86_64/multiarch/wcschr-avx2-rtm.S    |    3 +
 sysdeps/x86_64/multiarch/wcschr-evex.S        |    3 +
 sysdeps/x86_64/multiarch/wcscmp-avx2-rtm.S    |    4 +
 sysdeps/x86_64/multiarch/wcscmp-evex.S        |    4 +
 sysdeps/x86_64/multiarch/wcslen-avx2-rtm.S    |    4 +
 sysdeps/x86_64/multiarch/wcslen-evex.S        |    4 +
 sysdeps/x86_64/multiarch/wcsncmp-avx2-rtm.S   |    5 +
 sysdeps/x86_64/multiarch/wcsncmp-evex.S       |    5 +
 sysdeps/x86_64/multiarch/wcsnlen-avx2-rtm.S   |    5 +
 sysdeps/x86_64/multiarch/wcsnlen-evex.S       |    5 +
 sysdeps/x86_64/multiarch/wcsnlen.c            |   18 +-
 sysdeps/x86_64/multiarch/wcsrchr-avx2-rtm.S   |    3 +
 sysdeps/x86_64/multiarch/wcsrchr-evex.S       |    3 +
 sysdeps/x86_64/multiarch/wmemchr-avx2-rtm.S   |    4 +
 sysdeps/x86_64/multiarch/wmemchr-evex.S       |    4 +
 .../x86_64/multiarch/wmemcmp-avx2-movbe-rtm.S |    4 +
 sysdeps/x86_64/multiarch/wmemcmp-evex-movbe.S |    4 +
 sysdeps/x86_64/sysdep.h                       |   22 +
 96 files changed, 6372 insertions(+), 337 deletions(-)
 create mode 100644 sysdeps/x86/tst-memchr-rtm.c
 create mode 100644 sysdeps/x86/tst-memcmp-rtm.c
 create mode 100644 sysdeps/x86/tst-memmove-rtm.c
 create mode 100644 sysdeps/x86/tst-memrchr-rtm.c
 create mode 100644 sysdeps/x86/tst-memset-rtm.c
 create mode 100644 sysdeps/x86/tst-strchr-rtm.c
 create mode 100644 sysdeps/x86/tst-strcpy-rtm.c
 create mode 100644 sysdeps/x86/tst-string-rtm.h
 create mode 100644 sysdeps/x86/tst-strlen-rtm.c
 create mode 100644 sysdeps/x86/tst-strncmp-rtm.c
 create mode 100644 sysdeps/x86/tst-strrchr-rtm.c
 create mode 100644 sysdeps/x86_64/multiarch/memchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/memchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/memcmp-avx2-movbe-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/memcmp-evex-movbe.S
 create mode 100644 sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/memmove-evex-unaligned-erms.S
 create mode 100644 sysdeps/x86_64/multiarch/memrchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/memrchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/memset-evex-unaligned-erms.S
 create mode 100644 sysdeps/x86_64/multiarch/rawmemchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/rawmemchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/stpcpy-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/stpcpy-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/stpncpy-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/stpncpy-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strcat-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strcat-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strchrnul-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strchrnul-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strcmp-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strcmp-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strcpy-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strcpy-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strlen-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strlen-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strncat-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strncat-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strncmp-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strncmp-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strncpy-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strncpy-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strnlen-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strnlen-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/strrchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/strrchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wcschr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcschr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wcscmp-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcscmp-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wcslen-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcslen-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsncmp-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsncmp-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsrchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wcsrchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wmemchr-avx2-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wmemchr-evex.S
 create mode 100644 sysdeps/x86_64/multiarch/wmemcmp-avx2-movbe-rtm.S
 create mode 100644 sysdeps/x86_64/multiarch/wmemcmp-evex-movbe.S

Comments

H.J. Lu March 24, 2021, 6:03 p.m. UTC | #1
These patches have been tested internally and externally for more than 2
weeks.  I have been running the patched system glibc on AVX and AVX512
machines.  If there are no objections or comments, I will check them in
next Tuesday.
H.J. Lu March 29, 2021, 11:06 p.m. UTC | #2
On Wed, Mar 24, 2021 at 11:03 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Mon, Mar 15, 2021 at 7:25 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> >  create mode 100644 sysdeps/x86_64/multiarch/strncat-evex.S
> >  create mode 100644 sysdeps/x86_64/multiarch/strncmp-avx2-rtm.S
> >  create mode 100644 sysdeps/x86_64/multiarch/strncmp-evex.S
> >  create mode 100644 sysdeps/x86_64/multiarch/strncpy-avx2-rtm.S
> >  create mode 100644 sysdeps/x86_64/multiarch/strncpy-evex.S
> >  create mode 100644 sysdeps/x86_64/multiarch/strnlen-avx2-rtm.S
> >  create mode 100644 sysdeps/x86_64/multiarch/strnlen-evex.S
> >  create mode 100644 sysdeps/x86_64/multiarch/strrchr-avx2-rtm.S
> >  create mode 100644 sysdeps/x86_64/multiarch/strrchr-evex.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wcschr-avx2-rtm.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wcschr-evex.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wcscmp-avx2-rtm.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wcscmp-evex.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wcslen-avx2-rtm.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wcslen-evex.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wcsncmp-avx2-rtm.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wcsncmp-evex.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-avx2-rtm.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wcsnlen-evex.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wcsrchr-avx2-rtm.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wcsrchr-evex.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wmemchr-avx2-rtm.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wmemchr-evex.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wmemcmp-avx2-movbe-rtm.S
> >  create mode 100644 sysdeps/x86_64/multiarch/wmemcmp-evex-movbe.S
> >
> > --
> > 2.30.2
> >
>
> These patches have been tested internally and externally for more than 2
> weeks.  I have been running the patched system glibc on AVX and AVX512
> machines.  If there are no objections or comments, I will check them in
> next Tuesday.
>

I checked all 10 patches into the master branch.  Here are backports for
the 2.33 through 2.28 release branches:

https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.33
https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.32
https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.31
https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.30
https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.29
https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.28
H.J. Lu Jan. 27, 2022, 5:13 p.m. UTC | #3
On Mon, Mar 29, 2021 at 4:06 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> I checked all 10 patches into the master branch.  Here are backports for
> the 2.33 through 2.28 release branches:
>
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.33
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.32
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.31
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.30
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.29
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/pr27457/2.28
>

I am backporting these to release branches.