
[00/46] Implement MMX intrinsics with SSE

Message ID 20190201211809.963-1-hjl.tools@gmail.com
Series Implement MMX intrinsics with SSE

Message

H.J. Lu Feb. 1, 2019, 9:17 p.m. UTC
On x86-64, since __m64 is returned and passed in XMM registers, we can
implement MMX intrinsics with SSE instructions.  To support it, we disable
MMX by default in 64-bit mode so that MMX registers won't be available
on x86-64.  Most MMX instructions have equivalent SSE versions, and the
results of some SSE versions need to be reshuffled into the right order
for MMX.  There are a couple of tricky cases:

1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
mask operand.  A warning is issued since an invalid memory access may
happen when bits 64:127 at the memory location are unmapped:

xmmintrin.h:1168:3: note: Emulate MMX maskmovq with SSE2 maskmovdqu may result in invalid memory access
 1168 |   __builtin_ia32_maskmovq ((__v8qi)__A, (__v8qi)__N, __P);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

2. MMX movntq is emulated with SSE2 DImode movnti, which is available
in 64-bit mode.

3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
The SSE emulation must clear bit 3 (the extra index bit) in each byte of
the shuffle control mask, as sketched below.

4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
the upper 64 bits of the destination XMM register, as also sketched below.
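
For illustration, here are minimal sketches of cases 3 and 4 in terms of
the intrinsics API rather than the actual RTL expansion; the helper names
are made up and the code only shows the required mask clearing and
upper-half preservation:

#include <tmmintrin.h>

/* Case 3 sketch: clear bit 3 of every shuffle-control byte so that the
   128-bit pshufb only indexes the low 8 bytes, matching the MMX
   semantics; bit 7 (the zeroing bit) is kept intact.  */
static inline __m64
shuffle_pi8_sketch (__m64 a, __m64 mask)
{
  __m128i xa = _mm_movpi64_epi64 (a);
  __m128i xm = _mm_and_si128 (_mm_movpi64_epi64 (mask),
                              _mm_set1_epi8 ((char) 0xf7));
  return _mm_movepi64_pi64 (_mm_shuffle_epi8 (xa, xm));
}

/* Case 4 sketch: convert the two ints with cvtdq2ps, then merge only
   the low two floats back so that the upper 64 bits of the destination
   are preserved.  */
static inline __m128
cvtpi32_ps_sketch (__m128 a, __m64 b)
{
  __m128 lo = _mm_cvtepi32_ps (_mm_movpi64_epi64 (b));
  return _mm_castpd_ps (_mm_move_sd (_mm_castps_pd (a),
                                     _mm_castps_pd (lo)));
}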

Tests are also added to check each SSE emulation of MMX intrinsics.

With MMX disabled in 64-bit mode, the 8-byte vectorizer is enabled with SSE2.

There are no regressions on i686 and x86-64.  For x86-64, GCC is also
tested with

--with-arch=native --with-cpu=native

on AVX2 and AVX512F machines.

H.J. Lu (46):
  i386: Add TARGET_MMX_INSNS and TARGET_MMX_WITH_SSE
  libitm: Support _ITM_TYPE_M64 with SSE2 in 64-bit mode
  i386: Allow 64-bit vector modes in SSE registers
  i386: Allow UNSPECV_EMMS with SSE2 in 64-bit mode
  i386: Emulate MMX packsswb/packssdw/packuswb with SSE2
  i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX
  i386: Emulate MMX plusminus/sat_plusminus with SSE
  i386: Emulate MMX mulv4hi3 with SSE
  i386: Emulate MMX smulv4hi3_highpart with SSE
  i386: Emulate MMX mmx_pmaddwd with SSE
  i386: Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE
  i386: Emulate MMX <any_logic><mode>3 with SSE
  i386: Emulate MMX mmx_andnot<mode>3 with SSE
  i386: Emulate MMX mmx_eq/mmx_gt<mode>3 with SSE
  i386: Emulate MMX vec_dupv2si with SSE
  i386: Emulate MMX pshufw with SSE
  i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE
  i386: Emulate MMX sse_cvtpi2ps with SSE
  i386: Emulate MMX mmx_pextrw with SSE
  i386: Emulate MMX mmx_pinsrw with SSE
  i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE
  i386: Emulate MMX mmx_pmovmskb with SSE
  i386: Emulate MMX mmx_umulv4hi3_highpart with SSE
  i386: Emulate MMX maskmovq with SSE2 maskmovdqu
  i386: Emulate MMX mmx_uavgv8qi3 with SSE
  i386: Emulate MMX mmx_uavgv4hi3 with SSE
  i386: Emulate MMX mmx_psadbw with SSE
  i386: Emulate MMX movntq with SSE2 movntidi
  i386: Emulate MMX umulv1siv1di3 with SSE2
  i386: Emulate MMX ssse3_ph<plusminus_mnemonic>wv4hi3 with SSE
  i386: Emulate MMX ssse3_ph<plusminus_mnemonic>dv2si3 with SSE
  i386: Emulate MMX ssse3_pmaddubsw with SSE
  i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE
  i386: Emulate MMX pshufb with SSE version
  i386: Emulate MMX ssse3_psign<mode>3 with SSE
  i386: Emulate MMX ssse3_palignrdi with SSE
  i386: Emulate MMX abs<mode>2 with SSE
  i386: Allow MMXMODE moves without MMX
  i386: Allow MMX vector expanders with SSE
  i386: Don't enable MMX in 64-bit mode by default
  i386: Add tests for MMX intrinsic emulations with SSE
  i386: Also enable SSSE3 __m64 tests without MMX
  i386: Enable 8-byte vectorizer for TARGET_MMX_WITH_SSE
  i386: Implement V2SF add/sub/mul with SSE
  i386: Implement V2SF <-> V2SI conversions with SSE
  i386: Implement V2SF comparisons with SSE

 gcc/config/i386/constraints.md                |  10 +
 gcc/config/i386/driver-i386.c                 |   4 +-
 gcc/config/i386/i386-builtin.def              | 126 +--
 gcc/config/i386/i386-protos.h                 |   4 +
 gcc/config/i386/i386.c                        | 205 ++++-
 gcc/config/i386/i386.h                        |  22 +-
 gcc/config/i386/i386.md                       |   3 +-
 gcc/config/i386/i386.opt                      |   4 +
 gcc/config/i386/mmintrin.h                    |  10 +-
 gcc/config/i386/mmx.md                        | 824 ++++++++++++------
 gcc/config/i386/sse.md                        | 412 +++++++--
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c       |   2 +-
 gcc/testsuite/gcc.target/i386/mmx-vals.h      |  77 ++
 gcc/testsuite/gcc.target/i386/pr82483-1.c     |   2 +-
 gcc/testsuite/gcc.target/i386/pr82483-2.c     |   2 +-
 gcc/testsuite/gcc.target/i386/pr89028-1.c     |  10 +
 gcc/testsuite/gcc.target/i386/pr89028-10.c    |  39 +
 gcc/testsuite/gcc.target/i386/pr89028-11.c    |  39 +
 gcc/testsuite/gcc.target/i386/pr89028-12.c    |  39 +
 gcc/testsuite/gcc.target/i386/pr89028-13.c    |  39 +
 gcc/testsuite/gcc.target/i386/pr89028-2.c     |  11 +
 gcc/testsuite/gcc.target/i386/pr89028-3.c     |  14 +
 gcc/testsuite/gcc.target/i386/pr89028-4.c     |  14 +
 gcc/testsuite/gcc.target/i386/pr89028-5.c     |  11 +
 gcc/testsuite/gcc.target/i386/pr89028-6.c     |  14 +
 gcc/testsuite/gcc.target/i386/pr89028-7.c     |  14 +
 gcc/testsuite/gcc.target/i386/pr89028-8.c     |  12 +
 gcc/testsuite/gcc.target/i386/pr89028-9.c     |  12 +
 gcc/testsuite/gcc.target/i386/sse-mmx-1.c     |  12 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-10.c   |  43 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-11.c   |  40 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-12.c   |  42 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-13.c   |  41 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-14.c   |  31 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-15.c   |  36 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-16.c   |  40 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-17.c   |  51 ++
 gcc/testsuite/gcc.target/i386/sse2-mmx-18.c   |  13 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-19.c   |  11 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-2.c    |  12 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-20.c   |  11 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-21.c   |  13 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-3.c    |  12 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-4.c    |   4 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-5.c    |  12 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-6.c    |  12 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-7.c    |  12 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-8.c    |   4 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-9.c    |  80 ++
 .../gcc.target/i386/sse2-mmx-cvtpi2ps.c       |  43 +
 .../gcc.target/i386/sse2-mmx-cvtps2pi.c       |  36 +
 .../gcc.target/i386/sse2-mmx-cvttps2pi.c      |  36 +
 .../gcc.target/i386/sse2-mmx-maskmovq.c       |  52 ++
 .../gcc.target/i386/sse2-mmx-packssdw.c       |  52 ++
 .../gcc.target/i386/sse2-mmx-packsswb.c       |  52 ++
 .../gcc.target/i386/sse2-mmx-packuswb.c       |  52 ++
 .../gcc.target/i386/sse2-mmx-paddb.c          |  48 +
 .../gcc.target/i386/sse2-mmx-paddd.c          |  48 +
 .../gcc.target/i386/sse2-mmx-paddq.c          |  43 +
 .../gcc.target/i386/sse2-mmx-paddsb.c         |  48 +
 .../gcc.target/i386/sse2-mmx-paddsw.c         |  48 +
 .../gcc.target/i386/sse2-mmx-paddusb.c        |  48 +
 .../gcc.target/i386/sse2-mmx-paddusw.c        |  48 +
 .../gcc.target/i386/sse2-mmx-paddw.c          |  48 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-pand.c |  44 +
 .../gcc.target/i386/sse2-mmx-pandn.c          |  44 +
 .../gcc.target/i386/sse2-mmx-pavgb.c          |  52 ++
 .../gcc.target/i386/sse2-mmx-pavgw.c          |  52 ++
 .../gcc.target/i386/sse2-mmx-pcmpeqb.c        |  48 +
 .../gcc.target/i386/sse2-mmx-pcmpeqd.c        |  48 +
 .../gcc.target/i386/sse2-mmx-pcmpeqw.c        |  48 +
 .../gcc.target/i386/sse2-mmx-pcmpgtb.c        |  48 +
 .../gcc.target/i386/sse2-mmx-pcmpgtd.c        |  48 +
 .../gcc.target/i386/sse2-mmx-pcmpgtw.c        |  48 +
 .../gcc.target/i386/sse2-mmx-pextrw.c         |  59 ++
 .../gcc.target/i386/sse2-mmx-pinsrw.c         |  61 ++
 .../gcc.target/i386/sse2-mmx-pmaddwd.c        |  47 +
 .../gcc.target/i386/sse2-mmx-pmaxsw.c         |  48 +
 .../gcc.target/i386/sse2-mmx-pmaxub.c         |  48 +
 .../gcc.target/i386/sse2-mmx-pminsw.c         |  48 +
 .../gcc.target/i386/sse2-mmx-pminub.c         |  48 +
 .../gcc.target/i386/sse2-mmx-pmovmskb.c       |  46 +
 .../gcc.target/i386/sse2-mmx-pmulhuw.c        |  51 ++
 .../gcc.target/i386/sse2-mmx-pmulhw.c         |  53 ++
 .../gcc.target/i386/sse2-mmx-pmullw.c         |  52 ++
 .../gcc.target/i386/sse2-mmx-pmuludq.c        |  47 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-por.c  |  44 +
 .../gcc.target/i386/sse2-mmx-psadbw.c         |  58 ++
 .../gcc.target/i386/sse2-mmx-pshufw.c         | 248 ++++++
 .../gcc.target/i386/sse2-mmx-pslld.c          |  52 ++
 .../gcc.target/i386/sse2-mmx-pslldi.c         | 153 ++++
 .../gcc.target/i386/sse2-mmx-psllq.c          |  47 +
 .../gcc.target/i386/sse2-mmx-psllqi.c         | 245 ++++++
 .../gcc.target/i386/sse2-mmx-psllw.c          |  52 ++
 .../gcc.target/i386/sse2-mmx-psllwi.c         | 105 +++
 .../gcc.target/i386/sse2-mmx-psrad.c          |  52 ++
 .../gcc.target/i386/sse2-mmx-psradi.c         | 153 ++++
 .../gcc.target/i386/sse2-mmx-psraw.c          |  52 ++
 .../gcc.target/i386/sse2-mmx-psrawi.c         | 105 +++
 .../gcc.target/i386/sse2-mmx-psrld.c          |  52 ++
 .../gcc.target/i386/sse2-mmx-psrldi.c         | 153 ++++
 .../gcc.target/i386/sse2-mmx-psrlq.c          |  47 +
 .../gcc.target/i386/sse2-mmx-psrlqi.c         | 245 ++++++
 .../gcc.target/i386/sse2-mmx-psrlw.c          |  52 ++
 .../gcc.target/i386/sse2-mmx-psrlwi.c         | 105 +++
 .../gcc.target/i386/sse2-mmx-psubb.c          |  48 +
 .../gcc.target/i386/sse2-mmx-psubd.c          |  48 +
 .../gcc.target/i386/sse2-mmx-psubq.c          |  43 +
 .../gcc.target/i386/sse2-mmx-psubusb.c        |  48 +
 .../gcc.target/i386/sse2-mmx-psubusw.c        |  48 +
 .../gcc.target/i386/sse2-mmx-psubw.c          |  48 +
 .../gcc.target/i386/sse2-mmx-punpckhbw.c      |  53 ++
 .../gcc.target/i386/sse2-mmx-punpckhdq.c      |  47 +
 .../gcc.target/i386/sse2-mmx-punpckhwd.c      |  49 ++
 .../gcc.target/i386/sse2-mmx-punpcklbw.c      |  53 ++
 .../gcc.target/i386/sse2-mmx-punpckldq.c      |  47 +
 .../gcc.target/i386/sse2-mmx-punpcklwd.c      |  49 ++
 gcc/testsuite/gcc.target/i386/sse2-mmx-pxor.c |  44 +
 gcc/testsuite/gcc.target/i386/ssse3-pabsb.c   |   4 +-
 gcc/testsuite/gcc.target/i386/ssse3-pabsd.c   |   4 +-
 gcc/testsuite/gcc.target/i386/ssse3-pabsw.c   |   4 +-
 gcc/testsuite/gcc.target/i386/ssse3-palignr.c |   6 +-
 gcc/testsuite/gcc.target/i386/ssse3-phaddd.c  |   4 +-
 gcc/testsuite/gcc.target/i386/ssse3-phaddsw.c |   4 +-
 gcc/testsuite/gcc.target/i386/ssse3-phaddw.c  |   4 +-
 gcc/testsuite/gcc.target/i386/ssse3-phsubd.c  |   4 +-
 gcc/testsuite/gcc.target/i386/ssse3-phsubsw.c |   4 +-
 gcc/testsuite/gcc.target/i386/ssse3-phsubw.c  |   4 +-
 .../gcc.target/i386/ssse3-pmaddubsw.c         |   4 +-
 .../gcc.target/i386/ssse3-pmulhrsw.c          |   4 +-
 gcc/testsuite/gcc.target/i386/ssse3-pshufb.c  |   6 +-
 gcc/testsuite/gcc.target/i386/ssse3-psignb.c  |   4 +-
 gcc/testsuite/gcc.target/i386/ssse3-psignd.c  |   4 +-
 gcc/testsuite/gcc.target/i386/ssse3-psignw.c  |   4 +-
 .../objc.dg/gnu-encoding/struct-layout-1.h    |   2 +-
 libitm/libitm.h                               |   2 +-
 136 files changed, 6575 insertions(+), 439 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/mmx-vals.h
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-mmx-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvtpi2ps.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvtps2pi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvttps2pi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-maskmovq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packssdw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packsswb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packuswb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddsb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddsw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddusb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddusw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pand.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pandn.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pavgb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pavgw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pextrw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pinsrw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaddwd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaxsw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaxub.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pminsw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pminub.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmovmskb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmulhuw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmulhw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmullw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmuludq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-por.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psadbw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pshufw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pslld.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pslldi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllqi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllwi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrad.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psradi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psraw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrawi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrld.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrldi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlqi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlwi.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubusb.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubusw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhbw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhdq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhwd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpcklbw.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckldq.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpcklwd.c
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pxor.c

Comments

Andi Kleen Feb. 2, 2019, 12:50 a.m. UTC | #1
"H.J. Lu" <hjl.tools@gmail.com> writes:

> To support it, we disable
> MMX by default in 64-bit mode so that MMX registers won't be available

Wouldn't that break inline assembler that references MMX register clobbers?

-Andi
H.J. Lu Feb. 2, 2019, 2:46 a.m. UTC | #2
On Fri, Feb 1, 2019 at 4:50 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> "H.J. Lu" <hjl.tools@gmail.com> writes:
>
> > To support it, we disable
> > MMX by default in 64-bit mode so that MMX registers won't be available
>
> Wouldn't that break inline assembler that references MMX register clobbers?

Yes.  You need to use -mmmx explicitly to enable MMX.
Florian Weimer Feb. 2, 2019, 5:05 p.m. UTC | #3
* H. J. Lu:

> 1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
> maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
> mask operand.  A warning is issued since invalid memory access may
> happen when bits 64:127 at memory location are unmapped:
>
> xmmintrin.h:1168:3: note: Emulate MMX maskmovq with SSE2 maskmovdqu may result in invalid memory access
>  1168 |   __builtin_ia32_maskmovq ((__v8qi)__A, (__v8qi)__N, __P);
>       |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Would it be possible to shift the mask according to the misalignment in
the address?  I think this should allow avoiding crossing a page
boundary if the original 64-bit load would not.
H.J. Lu Feb. 2, 2019, 5:12 p.m. UTC | #4
On Sat, Feb 2, 2019 at 9:07 AM Florian Weimer <fw@deneb.enyo.de> wrote:
>
> * H. J. Lu:
>
> > 1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
> > maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
> > mask operand.  A warning is issued since invalid memory access may
> > happen when bits 64:127 at memory location are unmapped:
> >
> > xmmintrin.h:1168:3: note: Emulate MMX maskmovq with SSE2 maskmovdqu may result in invalid memory access
> >  1168 |   __builtin_ia32_maskmovq ((__v8qi)__A, (__v8qi)__N, __P);
> >       |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Would it be possible to shift the mask according to the misalignment in
> the address?  I think this should allow avoiding crossing a page
> boundary if the original 64-bit load would not.

I guess it is possible.  But it may be quite a bit complex for no
apparent gains, since we also need to shift the implicit memory address.
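
For illustration, Florian's idea could look roughly like this in
intrinsics terms (assuming 4 KiB pages; the helper name is made up and
this is only a sketch, not what the compiler would actually generate):

#include <emmintrin.h>
#include <stdint.h>

/* Both data and mask hold the 64-bit MMX operands in their low halves,
   with the upper 64 bits of the mask already zeroed.  When the 16-byte
   maskmovdqu window would cross a page boundary that the original
   8-byte maskmovq access would not, shift the data and mask up by
   8 bytes and store at p - 8 instead, so the access stays within the
   same page(s) as the 8-byte store.  */
static inline void
maskmovq_sketch (__m128i data, __m128i mask, char *p)
{
  if (((uintptr_t) p & 4095) <= 4096 - 16)
    _mm_maskmoveu_si128 (data, mask, p);
  else
    _mm_maskmoveu_si128 (_mm_slli_si128 (data, 8),
                         _mm_slli_si128 (mask, 8), p - 8);
}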
H.J. Lu Feb. 3, 2019, 4:16 p.m. UTC | #5
On Sat, Feb 02, 2019 at 09:12:12AM -0800, H.J. Lu wrote:
> On Sat, Feb 2, 2019 at 9:07 AM Florian Weimer <fw@deneb.enyo.de> wrote:
> >
> > * H. J. Lu:
> >
> > > 1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
> > > maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
> > > mask operand.  A warning is issued since invalid memory access may
> > > happen when bits 64:127 at memory location are unmapped:
> > >
> > > xmmintrin.h:1168:3: note: Emulate MMX maskmovq with SSE2 maskmovdqu may result in invalid memory access
> > >  1168 |   __builtin_ia32_maskmovq ((__v8qi)__A, (__v8qi)__N, __P);
> > >       |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Would it be possible to shift the mask according to the misalignment in
> > the address?  I think this should allow avoiding crossing a page
> > boundary if the original 64-bit load would not.
> 
> I guess it is possible.  But it may be quite a bit complex for no
> apparent gains
> since we also need to shift the implicit memory address.
> 

I updated the MMX maskmovq emulation to handle it:

https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00139.html

and added tests to verify that unmapped bits 64:127 at the memory address
are properly handled:

https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00140.html


H.J.
Andi Kleen Feb. 4, 2019, 4:24 a.m. UTC | #6
On Fri, Feb 01, 2019 at 06:46:53PM -0800, H.J. Lu wrote:
> On Fri, Feb 1, 2019 at 4:50 PM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > "H.J. Lu" <hjl.tools@gmail.com> writes:
> >
> > > To support it, we disable
> > > MMX by default in 64-bit mode so that MMX registers won't be available
> >
> > Wouldn't that break inline assembler that references MMX register clobbers?
> 
> Yes.  You need to use -mmmx explicitly to enable MMX.

Such a breaking change needs to be clearly spelled out in the documentation /
NEWS.

-Andi
Uros Bizjak Feb. 4, 2019, 9:10 a.m. UTC | #7
On Fri, Feb 1, 2019 at 10:18 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On x86-64, since __m64 is returned and passed in XMM registers, we can
> implement MMX intrinsics with SSE instructions. To support it, we disable
> MMX by default in 64-bit mode so that MMX registers won't be available
> with x86-64.  Most of MMX instructions have equivalent SSE versions and
> results of some SSE versions need to be reshuffled to the right order
> for MMX.  There are a couple of tricky cases:

I don't think we have to disable MMX registers, but we have to tune
register allocation preferences to not allocate MMX registers unless
really necessary.  In practice, this means changing the y constraints to
*y when TARGET_MMX_WITH_SSE is active (probably using the enable
attribute).  This would solve the problem with assembler clobbers that
Andi exposed.

Uros.
Richard Biener Feb. 4, 2019, 11:04 a.m. UTC | #8
On Mon, Feb 4, 2019 at 10:10 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Fri, Feb 1, 2019 at 10:18 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On x86-64, since __m64 is returned and passed in XMM registers, we can
> > implement MMX intrinsics with SSE instructions. To support it, we disable
> > MMX by default in 64-bit mode so that MMX registers won't be available
> > with x86-64.  Most of MMX instructions have equivalent SSE versions and
> > results of some SSE versions need to be reshuffled to the right order
> > for MMX.  There are a couple of tricky cases:
>
> I don't think we have to disable MMX registers, but we have to tune
> register allocation preferences to not allocate MMX register unless
> really necessary. In practice, this means to change y constraints to
> *y when TARGET_MMX_WITH_SSE is active (probably using enable
> attribute). This would solve problem with assembler clobbers that Andi
> exposed.

But is "unless really necessary" good enough to not have it wrongly
under any circumstance?  I actually like HJs patch (not looked at the
details though).  I'd have gone a more aggressive way of simply defaulting
to -mno-mmx without any emulation or whatnot though.

Richard.

>
> Uros.
Jakub Jelinek Feb. 4, 2019, 11:08 a.m. UTC | #9
On Mon, Feb 04, 2019 at 12:04:04PM +0100, Richard Biener wrote:
> On Mon, Feb 4, 2019 at 10:10 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Fri, Feb 1, 2019 at 10:18 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On x86-64, since __m64 is returned and passed in XMM registers, we can
> > > implement MMX intrinsics with SSE instructions. To support it, we disable
> > > MMX by default in 64-bit mode so that MMX registers won't be available
> > > with x86-64.  Most of MMX instructions have equivalent SSE versions and
> > > results of some SSE versions need to be reshuffled to the right order
> > > for MMX.  There are a couple of tricky cases:
> >
> > I don't think we have to disable MMX registers, but we have to tune
> > register allocation preferences to not allocate MMX register unless
> > really necessary. In practice, this means to change y constraints to
> > *y when TARGET_MMX_WITH_SSE is active (probably using enable
> > attribute). This would solve problem with assembler clobbers that Andi
> > exposed.
> 
> But is "unless really necessary" good enough to not have it wrongly
> under any circumstance?  I actually like HJs patch (not looked at the

Or we could disable MMX registers unless they are referenced in inline asm
(clobbers or constraints).

Anyway, is the patch set meant for GCC9 or GCC10?  I'd say it would be quite
dangerous to change this in GCC9.

	Jakub
Uros Bizjak Feb. 4, 2019, 11:23 a.m. UTC | #10
On Mon, Feb 4, 2019 at 12:04 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Feb 4, 2019 at 10:10 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Fri, Feb 1, 2019 at 10:18 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On x86-64, since __m64 is returned and passed in XMM registers, we can
> > > implement MMX intrinsics with SSE instructions. To support it, we disable
> > > MMX by default in 64-bit mode so that MMX registers won't be available
> > > with x86-64.  Most of MMX instructions have equivalent SSE versions and
> > > results of some SSE versions need to be reshuffled to the right order
> > > for MMX.  There are a couple of tricky cases:
> >
> > I don't think we have to disable MMX registers, but we have to tune
> > register allocation preferences to not allocate MMX register unless
> > really necessary. In practice, this means to change y constraints to
> > *y when TARGET_MMX_WITH_SSE is active (probably using enable
> > attribute). This would solve problem with assembler clobbers that Andi
> > exposed.
>
> But is "unless really necessary" good enough to not have it wrongly
> under any circumstance?  I actually like HJs patch (not looked at the
> details though).  I'd have gone a more aggressive way of simply defaulting
> to -mno-mmx without any emulation or whatnot though.

Please see the attached *prototype* patch that enables vectorization for

void
foo (char *restrict r, char *restrict a, char *restrict b)
{
  int i;

  for (i = 0; i < 8; i++)
    r[i] = a[i] + b[i];
}

with and without -mmmx. The pattern is defined as:

(define_insn "*mmx_<plusminus_insn><mode>3"
  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y,x,v")
        (plusminus:MMXMODEI8
      (match_operand:MMXMODEI8 1 "nonimmediate_operand" "<comm>0,0,v")
      (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym,x,v")))]
  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
  "@
   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
   vp<plusminus_mnemonic><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
  [(set_attr "type" "mmxadd")
   (set_attr "mode" "DI")
   (set (attr "enabled")
     (cond [(eq_attr "alternative" "1")
          (symbol_ref "TARGET_MMX_WITH_SSE")
        (eq_attr "alternative" "2")
            (symbol_ref "TARGET_AVX && TARGET_MMX_WITH_SSE")
         ]
         (symbol_ref ("!TARGET_MMX_WITH_SSE"))))])

so there is no way an MMX register gets allocated with
TARGET_MMX_WITH_SSE.  We have had MMX registers enabled in move insns
for years, and there were no problems with the current register
preferences.  So, I'm pretty confident that the above is enough to
prevent unwanted MMX moves, while still allowing MMX registers.

With the above approach, we can enable TARGET_MMX_WITH_SSE
unconditionally for 64-bit SSE2 targets, since we will still allow MMX
regs.  Please note that there is no requirement to use MMX instructions
for MMX intrinsics, so we can emit _all_ MMX intrinsics using HJ's
conversion unconditionally.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4e67abe87646..3bf7d33f840d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -44052,7 +44052,8 @@ ix86_vector_mode_supported_p (machine_mode mode)
 {
   if (TARGET_SSE && VALID_SSE_REG_MODE (mode))
     return true;
-  if (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode))
+  if ((TARGET_SSE2 && VALID_SSE2_REG_MODE (mode))
+      || (TARGET_MMX_WITH_SSE && VALID_MMX_REG_MODE (mode)))
     return true;
   if (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
     return true;
@@ -50050,6 +50051,9 @@ ix86_autovectorize_vector_sizes (vector_sizes *sizes)
       sizes->safe_push (32);
       sizes->safe_push (16);
     }
+
+  if (TARGET_MMX_WITH_SSE)
+    sizes->safe_push (8);
 }
 
 /* Implemenation of targetm.vectorize.get_mask_mode.  */
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 83b025e0cf5d..3c9e77ba7c2e 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -585,6 +585,8 @@ extern unsigned char ix86_arch_features[X86_ARCH_LAST];
 
 #define TARGET_FISTTP		(TARGET_SSE3 && TARGET_80387)
 
+#define TARGET_MMX_WITH_SSE	(TARGET_64BIT && TARGET_SSE2)
+
 extern unsigned char x86_prefetch_sse;
 #define TARGET_PREFETCH_SSE	x86_prefetch_sse
 
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index c1e0f2c411e6..304f711d2b27 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -45,7 +45,7 @@
 
 ;; 8 byte integral modes handled by MMX (and by extension, SSE)
 (define_mode_iterator MMXMODEI [V8QI V4HI V2SI])
-(define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI V1DI])
+(define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI (V1DI "TARGET_SSE2")])
 
 ;; All 8-byte vector modes handled by MMX
 (define_mode_iterator MMXMODE [V8QI V4HI V2SI V1DI V2SF])
@@ -70,7 +70,7 @@
 (define_expand "mov<mode>"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
 	(match_operand:MMXMODE 1 "nonimmediate_operand"))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_move (<MODE>mode, operands);
   DONE;
@@ -81,7 +81,7 @@
     "=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v,v,m,r,v,!y,*x")
 	(match_operand:MMXMODE 1 "nonimm_or_0_operand"
     "rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
-  "TARGET_MMX
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -690,19 +690,37 @@
 	(plusminus:MMXMODEI8
 	  (match_operand:MMXMODEI8 1 "nonimmediate_operand")
 	  (match_operand:MMXMODEI8 2 "nonimmediate_operand")))]
-  "TARGET_MMX || (TARGET_SSE2 && <MODE>mode == V1DImode)"
+  "TARGET_MMX"
+  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
+
+(define_expand "<plusminus_insn><mode>3"
+  [(set (match_operand:MMXMODEI8 0 "register_operand")
+	(plusminus:MMXMODEI8
+	  (match_operand:MMXMODEI8 1 "nonimmediate_operand")
+	  (match_operand:MMXMODEI8 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
 (define_insn "*mmx_<plusminus_insn><mode>3"
-  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y,x,v")
         (plusminus:MMXMODEI8
-	  (match_operand:MMXMODEI8 1 "nonimmediate_operand" "<comm>0")
-	  (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_MMX || (TARGET_SSE2 && <MODE>mode == V1DImode))
+	  (match_operand:MMXMODEI8 1 "nonimmediate_operand" "<comm>0,0,v")
+	  (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym,x,v")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}"
+  "@
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   vp<plusminus_mnemonic><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI")
+   (set (attr "enabled")
+     (cond [(eq_attr "alternative" "1")
+	      (symbol_ref "TARGET_MMX_WITH_SSE")
+	    (eq_attr "alternative" "2")
+	        (symbol_ref "TARGET_AVX && TARGET_MMX_WITH_SSE")
+	     ]
+	     (symbol_ref ("!TARGET_MMX_WITH_SSE"))))])
 
 (define_expand "mmx_<plusminus_insn><mode>3"
   [(set (match_operand:MMXMODE12 0 "register_operand")
Uros Bizjak Feb. 4, 2019, 11:24 a.m. UTC | #11
On Mon, Feb 4, 2019 at 12:08 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Feb 04, 2019 at 12:04:04PM +0100, Richard Biener wrote:
> > On Mon, Feb 4, 2019 at 10:10 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Fri, Feb 1, 2019 at 10:18 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > On x86-64, since __m64 is returned and passed in XMM registers, we can
> > > > implement MMX intrinsics with SSE instructions. To support it, we disable
> > > > MMX by default in 64-bit mode so that MMX registers won't be available
> > > > with x86-64.  Most of MMX instructions have equivalent SSE versions and
> > > > results of some SSE versions need to be reshuffled to the right order
> > > > for MMX.  There are a couple of tricky cases:
> > >
> > > I don't think we have to disable MMX registers, but we have to tune
> > > register allocation preferences to not allocate MMX register unless
> > > really necessary. In practice, this means to change y constraints to
> > > *y when TARGET_MMX_WITH_SSE is active (probably using enable
> > > attribute). This would solve problem with assembler clobbers that Andi
> > > exposed.
> >
> > But is "unless really necessary" good enough to not have it wrongly
> > under any circumstance?  I actually like HJs patch (not looked at the
>
> Or we could disable MMX registers unless they are referenced in inline asm
> (clobbers or constraints).
>
> Anyway, is the patch set meant for GCC9 or GCC10?  I'd say it would be quite
> dangerous to change this in GCC9.

No, this relatively invasive patchset is definitely meant for GCC10.

Uros.
Uros Bizjak Feb. 4, 2019, 1 p.m. UTC | #12
On Mon, Feb 4, 2019 at 12:23 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Feb 4, 2019 at 12:04 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Feb 4, 2019 at 10:10 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Fri, Feb 1, 2019 at 10:18 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > On x86-64, since __m64 is returned and passed in XMM registers, we can
> > > > implement MMX intrinsics with SSE instructions. To support it, we disable
> > > > MMX by default in 64-bit mode so that MMX registers won't be available
> > > > with x86-64.  Most of MMX instructions have equivalent SSE versions and
> > > > results of some SSE versions need to be reshuffled to the right order
> > > > for MMX.  There are a couple of tricky cases:
> > >
> > > I don't think we have to disable MMX registers, but we have to tune
> > > register allocation preferences to not allocate MMX register unless
> > > really necessary. In practice, this means to change y constraints to
> > > *y when TARGET_MMX_WITH_SSE is active (probably using enable
> > > attribute). This would solve problem with assembler clobbers that Andi
> > > exposed.
> >
> > But is "unless really necessary" good enough to not have it wrongly
> > under any circumstance?  I actually like HJs patch (not looked at the
> > details though).  I'd have gone a more aggressive way of simply defaulting
> > to -mno-mmx without any emulation or whatnot though.
>
> Please see attached *prototype* patch that enables vectorization for
>
> void
> foo (char *restrict r, char *restrict a, char *restrict b)
> {
>   int i;
>
>   for (i = 0; i < 8; i++)
>     r[i] = a[i] + b[i];
> }
>
> with and without -mmmx. The pattern is defined as:
>
> (define_insn "*mmx_<plusminus_insn><mode>3"
>   [(set (match_operand:MMXMODEI8 0 "register_operand" "=y,x,v")
>         (plusminus:MMXMODEI8
>       (match_operand:MMXMODEI8 1 "nonimmediate_operand" "<comm>0,0,v")
>       (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym,x,v")))]
>   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
>    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
>   "@
>    p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
>    p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
>    vp<plusminus_mnemonic><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
>   [(set_attr "type" "mmxadd")
>    (set_attr "mode" "DI")
>    (set (attr "enabled")
>      (cond [(eq_attr "alternative" "1")
>           (symbol_ref "TARGET_MMX_WITH_SSE")
>         (eq_attr "alternative" "2")
>             (symbol_ref "TARGET_AVX && TARGET_MMX_WITH_SSE")
>          ]
>          (symbol_ref ("!TARGET_MMX_WITH_SSE"))))])
>
> so, there is no way mmx register gets allocated with
> TARGET_MMX_WITH_SSE. We have had MMX registers enabled in move insns
> for years, and there were no problems with current register
> preferences. So, I'm pretty confident that the above is enough to
> prevent unwanted MMX moves, while still allowing MMX registers.
>
> With the above approach, we can enable TARGET_MMX_WITH_SSE
> unconditionally for 64bit SSE2 targets, since we will still allow MMX
> regs. Please note that there is no requirement to use MMX instructions
> for MMX intrinsics, so we can emit _all_ MMX intrinsics using HJ's
> conversion unconditionally.

Attached is the patch that enables alternatives in a much more convenient way:

(define_insn "*mmx_<plusminus_insn><mode>3"
  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y,x,v")
        (plusminus:MMXMODEI8
      (match_operand:MMXMODEI8 1 "nonimmediate_operand" "<comm>0,0,v")
      (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym,x,v")))]
  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
  "@
   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
   vp<plusminus_mnemonic><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
   (set_attr "type" "mmxadd")
   (set_attr "mode" "DI")])

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4e67abe87646..62ad919d2596 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -50050,6 +50050,9 @@ ix86_autovectorize_vector_sizes (vector_sizes *sizes)
       sizes->safe_push (32);
       sizes->safe_push (16);
     }
+
+  if (TARGET_MMX_WITH_SSE)
+    sizes->safe_push (8);
 }
 
 /* Implemenation of targetm.vectorize.get_mask_mode.  */
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 83b025e0cf5d..30e05e9703d4 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -585,6 +585,8 @@ extern unsigned char ix86_arch_features[X86_ARCH_LAST];
 
 #define TARGET_FISTTP		(TARGET_SSE3 && TARGET_80387)
 
+#define TARGET_MMX_WITH_SSE	(TARGET_64BIT && TARGET_SSE2)
+
 extern unsigned char x86_prefetch_sse;
 #define TARGET_PREFETCH_SSE	x86_prefetch_sse
 
@@ -1145,7 +1147,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 #define VALID_SSE2_REG_MODE(MODE)					\
   ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode	\
-   || (MODE) == V2DImode || (MODE) == DFmode)
+   || (MODE) == V2DImode || (MODE) == DFmode				\
+   || (TARGET_MMX_WITH_SSE && VALID_MMX_REG_MODE (mode)))
 
 #define VALID_SSE_REG_MODE(MODE)					\
   ((MODE) == V1TImode || (MODE) == TImode				\
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 744f155fca6f..dd6c04db5e63 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -792,6 +792,9 @@
 		    avx512vl,noavx512vl,x64_avx512dq,x64_avx512bw"
   (const_string "base"))
 
+;; Define instruction set of MMX instructions
+(define_attr "mmx_isa" "base,native,x64_noavx,x64_avx" (const_string "base"))
+
 (define_attr "enabled" ""
   (cond [(eq_attr "isa" "x64") (symbol_ref "TARGET_64BIT")
 	 (eq_attr "isa" "x64_sse2")
@@ -830,6 +833,13 @@
 	 (eq_attr "isa" "noavx512dq") (symbol_ref "!TARGET_AVX512DQ")
 	 (eq_attr "isa" "avx512vl") (symbol_ref "TARGET_AVX512VL")
 	 (eq_attr "isa" "noavx512vl") (symbol_ref "!TARGET_AVX512VL")
+
+	 (eq_attr "mmx_isa" "native")
+	   (symbol_ref "!TARGET_MMX_WITH_SSE")
+	 (eq_attr "mmx_isa" "x64_avx")
+	   (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
+	 (eq_attr "mmx_isa" "x64_noavx")
+	   (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")
 	]
 	(const_int 1)))
 
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index c1e0f2c411e6..125b2bf5b373 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -45,7 +45,7 @@
 
 ;; 8 byte integral modes handled by MMX (and by extension, SSE)
 (define_mode_iterator MMXMODEI [V8QI V4HI V2SI])
-(define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI V1DI])
+(define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI (V1DI "TARGET_SSE2")])
 
 ;; All 8-byte vector modes handled by MMX
 (define_mode_iterator MMXMODE [V8QI V4HI V2SI V1DI V2SF])
@@ -70,7 +70,7 @@
 (define_expand "mov<mode>"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
 	(match_operand:MMXMODE 1 "nonimmediate_operand"))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_move (<MODE>mode, operands);
   DONE;
@@ -81,7 +81,7 @@
     "=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v,v,m,r,v,!y,*x")
 	(match_operand:MMXMODE 1 "nonimm_or_0_operand"
     "rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
-  "TARGET_MMX
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -690,18 +690,30 @@
 	(plusminus:MMXMODEI8
 	  (match_operand:MMXMODEI8 1 "nonimmediate_operand")
 	  (match_operand:MMXMODEI8 2 "nonimmediate_operand")))]
-  "TARGET_MMX || (TARGET_SSE2 && <MODE>mode == V1DImode)"
+  "TARGET_MMX"
+  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
+
+(define_expand "<plusminus_insn><mode>3"
+  [(set (match_operand:MMXMODEI8 0 "register_operand")
+	(plusminus:MMXMODEI8
+	  (match_operand:MMXMODEI8 1 "nonimmediate_operand")
+	  (match_operand:MMXMODEI8 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
 (define_insn "*mmx_<plusminus_insn><mode>3"
-  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y,x,v")
         (plusminus:MMXMODEI8
-	  (match_operand:MMXMODEI8 1 "nonimmediate_operand" "<comm>0")
-	  (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_MMX || (TARGET_SSE2 && <MODE>mode == V1DImode))
+	  (match_operand:MMXMODEI8 1 "nonimmediate_operand" "<comm>0,0,v")
+	  (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym,x,v")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
+  "@
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   vp<plusminus_mnemonic><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd")
    (set_attr "mode" "DI")])
 
 (define_expand "mmx_<plusminus_insn><mode>3"