From patchwork Sat Sep 29 22:04:25 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 976723
From: "H.J. Lu"
To: gcc-patches@gcc.gnu.org
Cc: Uros Bizjak
Subject: [PATCH] x86: Add pmovzx/pmovsx patterns with SI/DI operands
Date: Sat, 29 Sep 2018 15:04:25 -0700
Message-Id: <20180929220425.4714-1-hjl.tools@gmail.com>

Add pmovzx/pmovsx patterns with SI and DI operands for pmovzx/pmovsx
instructions which only read the low 4 or 8 bytes from the source.

gcc/

	PR target/87317
	* config/i386/sse.md (*sse4_1_v8qiv8hi2): New pattern.
	(*sse4_1_v4qiv4si2): Likewise.
	(*sse4_1_v4hiv4si2): Likewise.
	(*sse4_1_v2hiv2di2): Likewise.
	(*sse4_1_v2siv2di2): Likewise.

gcc/testsuite/

	PR target/87317
	* gcc.target/i386/pr87317-1.c: New file.
	* gcc.target/i386/pr87317-2.c: Likewise.
	* gcc.target/i386/pr87317-3.c: Likewise.
	* gcc.target/i386/pr87317-4.c: Likewise.
	* gcc.target/i386/pr87317-5.c: Likewise.
	* gcc.target/i386/pr87317-6.c: Likewise.
	* gcc.target/i386/pr87317-7.c: Likewise.
	* gcc.target/i386/pr87317-8.c: Likewise.
	* gcc.target/i386/pr87317-9.c: Likewise.
	* gcc.target/i386/pr87317-10.c: Likewise.
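To illustrate the claim above that these instructions only read the low
4 or 8 bytes of the source, here is a minimal scalar sketch in plain C.
It is not taken from the patch or from GCC; the names model_loadl_epi64,
model_pmovzxbw, model_pmovsxbd and fold_is_equivalent are hypothetical
helpers invented for this note. It models the zero-extending byte-to-word
form and the sign-extending byte-to-dword form, and checks that loading
8 bytes into a zeroed 16-byte register image before extending gives the
same result as extending straight from memory, which is presumably why
the separate load is redundant.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of movq / _mm_loadl_epi64: copy 8 bytes from
   memory into the low half of a 16-byte register image, zero the rest.  */
void model_loadl_epi64(uint8_t reg[16], const uint8_t *mem)
{
    memset(reg, 0, 16);
    memcpy(reg, mem, 8);
}

/* Hypothetical scalar model of pmovzxbw: zero-extend source bytes 0..7
   into eight 16-bit lanes.  Source bytes 8..15 are never read.  */
void model_pmovzxbw(uint16_t dst[8], const uint8_t *src)
{
    for (int i = 0; i < 8; i++)
        dst[i] = src[i];
}

/* Hypothetical scalar model of pmovsxbd: sign-extend source bytes 0..3
   into four 32-bit lanes.  The cast to int8_t assumes the usual
   two's-complement wraparound for values above 127.  */
void model_pmovsxbd(int32_t dst[4], const uint8_t *src)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (int8_t)src[i];
}

/* Returns 1 when "load 8 bytes, then extend" matches "extend directly
   from memory" for the 8 bytes at mem.  */
int fold_is_equivalent(const uint8_t *mem)
{
    uint8_t reg[16];
    uint16_t via_reg[8], direct[8];

    assert(mem != NULL);
    model_loadl_epi64(reg, mem);
    model_pmovzxbw(via_reg, reg);
    model_pmovzxbw(direct, mem);
    return memcmp(via_reg, direct, sizeof direct) == 0;
}
```

Calling fold_is_equivalent on any 8-byte buffer should return 1 under
this model; the real instructions operate on XMM registers or memory
operands, so this is only a way to reason about the data flow.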
---
 gcc/config/i386/sse.md                     | 98 ++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr87317-1.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-10.c | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-2.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-3.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-4.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-5.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-6.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-7.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-8.c  | 13 +++
 gcc/testsuite/gcc.target/i386/pr87317-9.c  | 13 +++
 11 files changed, 228 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87317-9.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d2722fdfcd0..c8ff35b125c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15521,6 +15521,26 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_v8qiv8hi2"
+  [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,v")
+        (any_extend:V8HI
+          (vec_select:V8QI
+            (subreg:V16QI
+              (vec_concat:V2DI
+                (match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
+                (const_int 0)) 0)
+            (parallel [(const_int 0) (const_int 1)
+                       (const_int 2) (const_int 3)
+                       (const_int 4) (const_int 5)
+                       (const_int 6) (const_int 7)]))))]
+  "TARGET_SSE4_1 && && "
+  "%vpmovbw\t{%1, %0|%0, %q1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "avx512f_v16qiv16si2"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
         (any_extend:V16SI
@@ -15562,6 +15582,28 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_v4qiv4si2"
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
+        (any_extend:V4SI
+          (vec_select:V4QI
+            (subreg:V16QI
+              (vec_merge:V4SI
+                (vec_duplicate:V4SI
+                  (match_operand:SI 1 "nonimmediate_operand" "m,*m,m"))
+                (const_vector:V4SI
+                  [(const_int 0) (const_int 0)
+                   (const_int 0) (const_int 0)])
+                (const_int 1)) 0)
+            (parallel [(const_int 0) (const_int 1)
+                       (const_int 2) (const_int 3)]))))]
+  "TARGET_SSE4_1 && "
+  "%vpmovbd\t{%1, %0|%0, %k1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "avx512f_v16hiv16si2"
   [(set (match_operand:V16SI 0 "register_operand" "=v")
         (any_extend:V16SI
@@ -15598,6 +15640,24 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_v4hiv4si2"
+  [(set (match_operand:V4SI 0 "register_operand" "=Yr,*x,v")
+        (any_extend:V4SI
+          (vec_select:V4HI
+            (subreg:V8HI
+              (vec_concat:V2DI
+                (match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
+                (const_int 0)) 0)
+            (parallel [(const_int 0) (const_int 1)
+                       (const_int 2) (const_int 3)]))))]
+  "TARGET_SSE4_1 && "
+  "%vpmovwd\t{%1, %0|%0, %q1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "avx512f_v8qiv8di2"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
         (any_extend:V8DI
@@ -15679,6 +15739,27 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_v2hiv2di2"
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
+        (any_extend:V2DI
+          (vec_select:V2HI
+            (subreg:V8HI
+              (vec_merge:V4SI
+                (vec_duplicate:V4SI
+                  (match_operand:SI 1 "nonimmediate_operand" "m,*m,m"))
+                (const_vector:V4SI
+                  [(const_int 0) (const_int 0)
+                   (const_int 0) (const_int 0)])
+                (const_int 1)) 0)
+            (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_SSE4_1 && "
+  "%vpmovwq\t{%1, %0|%0, %k1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_insn "avx512f_v8siv8di2"
   [(set (match_operand:V8DI 0 "register_operand" "=v")
         (any_extend:V8DI
@@ -15714,6 +15795,23 @@
    (set_attr "prefix" "orig,orig,maybe_evex")
    (set_attr "mode" "TI")])
 
+(define_insn "*sse4_1_v2siv2di2"
+  [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,v")
+        (any_extend:V2DI
+          (vec_select:V2SI
+            (subreg:V4SI
+              (vec_concat:V2DI
+                (match_operand:DI 1 "nonimmediate_operand" "Yrm,*xm,vm")
+                (const_int 0)) 0)
+            (parallel [(const_int 0) (const_int 1)]))))]
+  "TARGET_SSE4_1 && "
+  "%vpmovdq\t{%1, %0|%0, %q1}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,maybe_evex")
+   (set_attr "mode" "TI")])
+
 ;; ptestps/ptestpd are very similar to comiss and ucomiss when
 ;; setting FLAGS_REG. But it is not a really compare instruction.
 (define_insn "avx_vtest"
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-1.c b/gcc/testsuite/gcc.target/i386/pr87317-1.c
new file mode 100644
index 00000000000..91f00368293
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_loadl_epi64((__m128i *)ptr);
+  data = _mm_cvtepu8_epi16(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-10.c b/gcc/testsuite/gcc.target/i386/pr87317-10.c
new file mode 100644
index 00000000000..99c657e9df8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-10.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_cvtsi32_si128(*(int*)ptr);
+  data = _mm_cvtepu8_epi32(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-2.c b/gcc/testsuite/gcc.target/i386/pr87317-2.c
new file mode 100644
index 00000000000..e21f00334e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_loadl_epi64((__m128i *)ptr);
+  data = _mm_cvtepi16_epi32(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-3.c b/gcc/testsuite/gcc.target/i386/pr87317-3.c
new file mode 100644
index 00000000000..d4483f9c134
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-3.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_loadl_epi64((__m128i *)ptr);
+  data = _mm_cvtepi32_epi64(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-4.c b/gcc/testsuite/gcc.target/i386/pr87317-4.c
new file mode 100644
index 00000000000..dff24a9d657
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-4.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include
+
+void
+f (void *dst, __m64 x)
+{
+  __m128i y = _mm_movpi64_epi64(x);
+  __m128i z = _mm_cvtepu8_epi16(y);
+  _mm_storeu_si128((__m128i*)dst, z);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-5.c b/gcc/testsuite/gcc.target/i386/pr87317-5.c
new file mode 100644
index 00000000000..574395894b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-5.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include
+
+void
+f (void *dst, __m64 x)
+{
+  __m128i y = _mm_movpi64_epi64(x);
+  __m128i z = _mm_cvtepi16_epi32(y);
+  _mm_storeu_si128((__m128i*)dst, z);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-6.c b/gcc/testsuite/gcc.target/i386/pr87317-6.c
new file mode 100644
index 00000000000..9d27648d433
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-6.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include
+
+void
+f (void *dst, __m64 x)
+{
+  __m128i y = _mm_movpi64_epi64(x);
+  __m128i z = _mm_cvtepi32_epi64 (y);
+  _mm_storeu_si128((__m128i*)dst, z);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-7.c b/gcc/testsuite/gcc.target/i386/pr87317-7.c
new file mode 100644
index 00000000000..99c657e9df8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_cvtsi32_si128(*(int*)ptr);
+  data = _mm_cvtepu8_epi32(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-8.c b/gcc/testsuite/gcc.target/i386/pr87317-8.c
new file mode 100644
index 00000000000..c688e3e6d08
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-8.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmov(d|q)" } } */
+
+#include
+
+void
+f (void *dst, void *ptr)
+{
+  __m128i data = _mm_cvtsi32_si128(*(int*)ptr);
+  data = _mm_cvtepu16_epi64(data);
+  _mm_storeu_si128((__m128i*)dst, data);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr87317-9.c b/gcc/testsuite/gcc.target/i386/pr87317-9.c
new file mode 100644
index 00000000000..4311ed3ceb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr87317-9.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=haswell" } */
+/* { dg-final { scan-assembler-not "vmovq" } } */
+
+#include
+
+int
+f (void *ptr)
+{
+  __m128i data = _mm_loadl_epi64((__m128i *)ptr);
+  data = _mm_cvtepu8_epi16(data);
+  return _mm_cvtsi128_si32(data);
+}