
[00/10] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move

Message ID 20200215152628.32068-1-hjl.tools@gmail.com
Series i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move

Message

H.J. Lu Feb. 15, 2020, 3:26 p.m. UTC
This patch set was originally submitted in Feb 2019:

https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01841.html

I broke it into 10 smaller patches for easier review.

On x86, when AVX and AVX512 are enabled, vector move instructions can
be encoded with either a 2-byte/3-byte VEX prefix (AVX) or a 4-byte EVEX
prefix (AVX512):

   0:	c5 f9 6f d1          	vmovdqa %xmm1,%xmm2
   4:	62 f1 fd 08 6f d1    	vmovdqa64 %xmm1,%xmm2
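As a rough illustration (a sketch, not code from this patch set), the
size difference in the two encodings above comes entirely from the
prefix.  The helper names below are hypothetical, and the sketch assumes
64-bit mode, where 0xC5, 0xC4 and 0x62 always start a 2-byte VEX, 3-byte
VEX and 4-byte EVEX prefix.  A register-to-register move adds just one
opcode byte (0x6f) and one ModRM byte:

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the VEX/EVEX prefix at the start of BYTES, or 0
   if the first byte is not such a prefix.  Assumes 64-bit mode:
     0xC5 -> 2-byte VEX, 0xC4 -> 3-byte VEX, 0x62 -> 4-byte EVEX.  */
static size_t
vector_prefix_length (const unsigned char *bytes, size_t len)
{
  assert (bytes != NULL && len > 0);
  switch (bytes[0])
    {
    case 0xc5:
      return 2;
    case 0xc4:
      return 3;
    case 0x62:
      return 4;
    default:
      return 0;
    }
}

/* Total length of a register-to-register move such as vmovdqa:
   prefix + 1 opcode byte + 1 ModRM byte.  No SIB, displacement or
   immediate is present for a register-to-register move.  */
static size_t
reg_reg_move_length (const unsigned char *bytes, size_t len)
{
  size_t prefix = vector_prefix_length (bytes, len);
  return prefix == 0 ? 0 : prefix + 2;
}
```

Applied to "c5 f9 6f d1" it gives 4 bytes, and to "62 f1 fd 08 6f d1"
it gives 6 bytes, so each EVEX-encoded move costs 2 extra bytes.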

We prefer VEX encoding over EVEX since VEX is shorter.  Also, AVX512F
alone only supports 512-bit EVEX vector moves; AVX512F + AVX512VL
supports 128-bit and 256-bit vector moves.  Mode attributes on x86
vector move patterns indicate the target's preferred vector move
encoding.  For vector register to vector register moves, we can use
512-bit vector move instructions to move 128-bit/256-bit vectors if
AVX512VL isn't available.  With AVX512F and AVX512VL, we should use VEX
encoding for 128-bit/256-bit vector moves if the upper 16 vector
registers (xmm16-xmm31/ymm16-ymm31) aren't used.  This patch adds a
function, ix86_output_ssemov, to generate vector moves:

1. If zmm registers are used, use EVEX encoding.
2. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE or VEX encoding
will be generated.
3. If xmm16-xmm31/ymm16-ymm31 registers are used:
   a. With AVX512VL, AVX512VL vector moves will be generated.
   b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register-to-register
      moves will be done with zmm register moves.
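The three rules can be summarized as a small decision function.  This is
a simplified sketch, not the actual ix86_output_ssemov (which works on
RTL operands and emits assembler templates); the enum, the function
names and the integer register numbers are hypothetical, and only
register-to-register moves are modeled:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding choices for a register-to-register move.  */
enum ssemov_encoding
{
  ENC_SSE,      /* Legacy SSE, e.g. movdqa.  */
  ENC_VEX,      /* 2-/3-byte VEX, e.g. vmovdqa.  */
  ENC_EVEX,     /* EVEX at the original width, e.g. vmovdqa64.  */
  ENC_EVEX_ZMM  /* EVEX widened to a 512-bit zmm register move.  */
};

/* True for xmm16-xmm31/ymm16-ymm31, which only EVEX can encode.  */
static bool
upper_reg_p (int regno)
{
  assert (regno >= 0 && regno <= 31);
  return regno >= 16;
}

static enum ssemov_encoding
choose_ssemov_encoding (int vec_bits, int dest, int src,
			bool avx, bool avx512vl)
{
  assert (vec_bits == 128 || vec_bits == 256 || vec_bits == 512);

  /* Rule 1: zmm registers are used, so EVEX is required.  */
  if (vec_bits == 512)
    return ENC_EVEX;

  /* Rule 2: no upper registers, so the shorter SSE/VEX form works.  */
  if (!upper_reg_p (dest) && !upper_reg_p (src))
    return avx ? ENC_VEX : ENC_SSE;

  /* Rule 3a: AVX512VL provides 128-bit/256-bit EVEX moves.  */
  if (avx512vl)
    return ENC_EVEX;

  /* Rule 3b: without AVX512VL, move the whole zmm register.  */
  return ENC_EVEX_ZMM;
}
```

For example, a 256-bit move from ymm1 to ymm17 without AVX512VL yields
ENC_EVEX_ZMM, while a 128-bit move from xmm1 to xmm2 with AVX yields
ENC_VEX even when AVX512VL is enabled.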

Tested on AVX2 and AVX512 with and without --with-arch=native.

H.J. Lu (10):
  i386: Properly encode vector registers in vector move
  i386: Use ix86_output_ssemov for XImode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for OImode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for TImode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for DImode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for SImode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for TFmode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for DFmode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for SFmode TYPE_SSEMOV
  i386: Use ix86_output_ssemov for MMX TYPE_SSEMOV

 gcc/config/i386/i386-protos.h                 |   2 +
 gcc/config/i386/i386.c                        | 274 ++++++++++++++++++
 gcc/config/i386/i386.md                       | 212 +-------------
 gcc/config/i386/mmx.md                        |  29 +-
 gcc/config/i386/predicates.md                 |   5 -
 gcc/config/i386/sse.md                        |  98 +------
 .../gcc.target/i386/avx512vl-vmovdqa64-1.c    |   7 +-
 gcc/testsuite/gcc.target/i386/pr89229-2a.c    |  15 +
 gcc/testsuite/gcc.target/i386/pr89229-2b.c    |  13 +
 gcc/testsuite/gcc.target/i386/pr89229-2c.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89229-3a.c    |  17 ++
 gcc/testsuite/gcc.target/i386/pr89229-3b.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89229-3c.c    |   7 +
 gcc/testsuite/gcc.target/i386/pr89229-4a.c    |  17 ++
 gcc/testsuite/gcc.target/i386/pr89229-4b.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89229-4c.c    |   7 +
 gcc/testsuite/gcc.target/i386/pr89229-5a.c    |  16 +
 gcc/testsuite/gcc.target/i386/pr89229-5b.c    |  12 +
 gcc/testsuite/gcc.target/i386/pr89229-5c.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89229-6a.c    |  16 +
 gcc/testsuite/gcc.target/i386/pr89229-6b.c    |   7 +
 gcc/testsuite/gcc.target/i386/pr89229-6c.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89229-7a.c    |  16 +
 gcc/testsuite/gcc.target/i386/pr89229-7b.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89229-7c.c    |   6 +
 gcc/testsuite/gcc.target/i386/pr89346.c       |  15 +
 26 files changed, 497 insertions(+), 330 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-2c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-3c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-4c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-5c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-6c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89229-7c.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89346.c

Comments

H.J. Lu Feb. 24, 2020, 12:54 p.m. UTC | #1
On Sat, Feb 15, 2020 at 7:26 AM H.J. Lu <hjl.tools@gmail.com> wrote:

PING:

https://gcc.gnu.org/ml/gcc-patches/2020-02/msg00906.html