From patchwork Wed Sep 2 09:34:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Hongtao Liu X-Patchwork-Id: 1355624 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=gcc.gnu.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=E0eTNuBn; dkim-atps=neutral Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BhJfF4cQHz9sTv for ; Wed, 2 Sep 2020 19:33:41 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 22A223870854; Wed, 2 Sep 2020 09:33:39 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 22A223870854 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1599039219; bh=g46VXAADlS5Hixb9rtGPFIhBcFZAJJbmsBitj1XUvM0=; h=Date:Subject:To:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=E0eTNuBnB6PWnwwcNaYwLZ0h5mHtq1bTTzjj8Nz+b4Bpc9dKMwge72EkH8zskgm+Y mXgyU42MbYUQNTTdsQQe+4shvtvVXk0rKtjwJ1+DM/Fl/B06kseWxUBdm8oxV3c3qi 9fV3n7oO/8TPj/+JuGeXO2kRiCV4js10nWs9cOqE= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-ua1-x929.google.com (mail-ua1-x929.google.com [IPv6:2607:f8b0:4864:20::929]) by sourceware.org (Postfix) with ESMTPS id B14063851C07 for ; Wed, 2 Sep 2020 09:33:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org B14063851C07 Received: by mail-ua1-x929.google.com with SMTP id l1so1360604uai.3 for ; Wed, 02 Sep 2020 02:33:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=g46VXAADlS5Hixb9rtGPFIhBcFZAJJbmsBitj1XUvM0=; b=HnAKBL2z4rEqVz2XHcvdpbQNj1Y6hiWDM0ogQVgkxu4DVb8Ecdg/PgE7zks9ZOhgCQ VCbkibNDC7KEk4AdazJBOyHTm70iieQEXGJfO+7iDsHUQl+fEDH4Ry1fzs+rnSS5+yCc utA9r4KUDTRIhjY+HVH6iFAoyTdyfDJt/chaSlootDj6k0VN2/hqLYV5DJlBaLzTJsEI Cq8l8UHrhVOucXaFC4213cuwCsL2UgT4ysR66FLJ9Zp7JsTTNI5mNZ4TSY+BMfrXLjvY HnL9dAgYwbjSvBETKshG9vWTAvex957GDodbYNHTkL7BfJu0UOLyIMSgYMzEFp88cPN6 ZTjA== X-Gm-Message-State: AOAM533rdMr5/T+U0dLF3WDk7ohU5Brj31Ty+iwUvwohb4xewR2nArk7 AIaumEI8g4MaXvBDDPKkmpQqhOVaUP7lMu9pxYI2Os8Ua9I= X-Google-Smtp-Source: ABdhPJwTWmVlIxPyhANjd8ypp3WPlIj9x14EylHE/qSLslfKTzJ2n3onwypxQgYmFck7umsXEG1579TZT+uet2B9w8s= X-Received: by 2002:ab0:71cd:: with SMTP id n13mr4381290uao.24.1599039215983; Wed, 02 Sep 2020 02:33:35 -0700 (PDT) MIME-Version: 1.0 Date: Wed, 2 Sep 2020 17:34:27 +0800 Message-ID: Subject: [PATCH][AVX512]Lower AVX512 vector compare to AVX version when dest is vector To: GCC Patches X-Spam-Status: No, score=-9.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Hongtao Liu via Gcc-patches From: Hongtao Liu Reply-To: Hongtao Liu Errors-To: gcc-patches-bounces@gcc.gnu.org Sender: "Gcc-patches" Hi: Add define_peephole2 to eliminate potential redundant conversion from mask to vector. Bootstrap is ok, regression test is ok for i386/x86-64 backend. Ok for trunk? gcc/ChangeLog: PR target/96891 * config/i386/sse.md (VI_128_256): New mode iterator. (define_peephole2): Lower avx512 vector compare to avx version when dest is vector. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512bw-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-2.c: New test. From ba76432c08f47e4ecc1f355c0dfdea8908aaf9f4 Mon Sep 17 00:00:00 2001 From: liuhongt Date: Wed, 2 Sep 2020 17:14:39 +0800 Subject: [PATCH] Lower AVX512 vector compare to AVX version when dest is vector. gcc/ChangeLog: PR target/96891 * config/i386/sse.md (VI_128_256): New mode iterator. (define_peephole2): Lower avx512 vector compare to avx version when dest is vector. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512bw-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-1.c: New test. * gcc.target/i386/avx512f-pr96891-2.c: New test. --- gcc/config/i386/sse.md | 93 +++++++++++++++++++ .../gcc.target/i386/avx512bw-pr96891-1.c | 36 +++++++ .../gcc.target/i386/avx512f-pr96891-1.c | 40 ++++++++ .../gcc.target/i386/avx512f-pr96891-2.c | 30 ++++++ 4 files changed, 199 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 8250325e1a3..31e0dc2a600 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -629,6 +629,9 @@ (define_mode_iterator VI_128 [V16QI V8HI V4SI V2DI]) ;; All 256bit vector integer modes (define_mode_iterator VI_256 [V32QI V16HI V8SI V4DI]) +;; All 128 and 256bit vector integer modes +(define_mode_iterator VI_128_256 [V16QI V8HI V4SI V2DI V32QI V16HI V8SI V4DI]) + ;; Various 128bit vector integer mode combinations (define_mode_iterator VI12_128 [V16QI V8HI]) (define_mode_iterator VI14_128 [V16QI V4SI]) @@ -6703,6 +6706,96 @@ (define_insn "*_cvtmask2" (set_attr "prefix" "evex") (set_attr "mode" "")]) +/* Lower avx512 parallel floating compare to avx compare when dst is vector. */ +(define_peephole2 + [(set (match_operand: 0 "register_operand") + (unspec: + [(match_operand:VF_128_256 1 "register_operand") + (match_operand:VF_128_256 2 "nonimmediate_operand") + (match_operand:SI 3 "const_0_to_31_operand")] + UNSPEC_PCMP)) + (set (match_operand: 4 "register_operand") + (vec_merge: + (match_operand: 5 "vector_all_ones_operand") + (match_operand: 6 "const0_operand") + (match_dup 0)))] + "!EXT_REX_SSE_REGNO_P (REGNO (operands[4])) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[1])) + && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2]))) + && peep2_reg_dead_p (2, operands[0])" + [(set (match_dup 7) + (unspec:VF_128_256 + [(match_dup 1) + (match_dup 2) + (match_dup 3)] UNSPEC_PCMP))] + "operands[7] = gen_rtx_REG (mode, REGNO (operands[4]));") + +/* Lower avx512 parallel integral compare to avx compare when dst is vector. */ +(define_peephole2 + [(set (match_operand: 0 "register_operand") + (unspec: + [(match_operand:VI_128_256 1 "register_operand") + (match_operand:VI_128_256 2 "nonimmediate_operand")] + UNSPEC_MASKED_EQ)) + (set (match_operand:VI_128_256 4 "register_operand") + (vec_merge:VI_128_256 + (match_operand:VI_128_256 5 "vector_all_ones_operand") + (match_operand:VI_128_256 6 "const0_operand") + (match_dup 0)))] + "!EXT_REX_SSE_REGNO_P (REGNO (operands[4])) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[1])) + && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2]))) + && peep2_reg_dead_p (2, operands[0])" + [(set (match_dup 4) + (eq:VI_128_256 + (match_dup 1) + (match_dup 2)))]) + +(define_peephole2 + [(set (match_operand: 0 "register_operand") + (unspec: + [(match_operand:VI_128_256 1 "register_operand") + (match_operand:VI_128_256 2 "nonimmediate_operand")] + UNSPEC_MASKED_GT)) + (set (match_operand:VI_128_256 4 "register_operand") + (vec_merge:VI_128_256 + (match_operand:VI_128_256 5 "vector_all_ones_operand") + (match_operand:VI_128_256 6 "const0_operand") + (match_dup 0)))] + "!EXT_REX_SSE_REGNO_P (REGNO (operands[4])) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[1])) + && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2]))) + && peep2_reg_dead_p (2, operands[0])" + [(set (match_dup 4) + (gt:VI_128_256 + (match_dup 1) + (match_dup 2)))]) + +(define_peephole2 + [(set (match_operand: 0 "register_operand") + (unspec: + [(match_operand:VI_128_256 1 "register_operand") + (match_operand:VI_128_256 2 "nonimmediate_operand") + (match_operand:SI 3 "const_0_to_7_operand")] + UNSPEC_PCMP)) + (set (match_operand:VI_128_256 4 "register_operand") + (vec_merge:VI_128_256 + (match_operand:VI_128_256 5 "vector_all_ones_operand") + (match_operand:VI_128_256 6 "const0_operand") + (match_dup 0)))] + "(INTVAL (operands[3]) == 0 || INTVAL (operands[3]) == 6) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[4])) + && !EXT_REX_SSE_REGNO_P (REGNO (operands[1])) + && !(REG_P (operands[2]) && EXT_REX_SSE_REGNO_P (REGNO (operands[2]))) + && peep2_reg_dead_p (2, operands[0])" + [(const_int 0)] +{ + enum rtx_code code = INTVAL (operands[3]) ? GT : EQ; + emit_move_insn (operands[4], gen_rtx_fmt_ee (code, mode, + operands[1], operands[2])); + DONE; +}) + (define_insn "sse2_cvtps2pd" [(set (match_operand:V2DF 0 "register_operand" "=v") (float_extend:V2DF diff --git a/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c b/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c new file mode 100644 index 00000000000..45efff4e0f0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512bw-pr96891-1.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512bw -mavx512vl -O2" } */ +/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */ + +typedef char v16qi __attribute__ ((vector_size (16))); +typedef char v32qi __attribute__ ((vector_size (32))); +typedef short v8hi __attribute__ ((vector_size (16))); +typedef short v16hi __attribute__ ((vector_size (32))); +typedef int v4si __attribute__ ((vector_size (16))); +typedef int v8si __attribute__ ((vector_size (32))); +typedef long long v2di __attribute__ ((vector_size (16))); +typedef long long v4di __attribute__ ((vector_size (32))); + +#define FOO(VTYPE, OPNAME, OP) \ + VTYPE \ + foo_##VTYPE##_##OPNAME (VTYPE a, VTYPE b) \ + { \ + return a OP b; \ + } \ + +FOO (v16qi, eq, ==) +FOO (v16qi, gt, >) +FOO (v32qi, eq, ==) +FOO (v32qi, gt, >) +FOO (v8hi, eq, ==) +FOO (v8hi, gt, >) +FOO (v16hi, eq, ==) +FOO (v16hi, gt, >) +FOO (v4si, eq, ==) +FOO (v4si, gt, >) +FOO (v8si, eq, ==) +FOO (v8si, gt, >) +FOO (v2di, eq, ==) +FOO (v2di, gt, >) +FOO (v4di, eq, ==) +FOO (v4di, gt, >) diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c new file mode 100644 index 00000000000..48ba943e151 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-1.c @@ -0,0 +1,40 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -O2" } */ +/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */ + +typedef float v4sf __attribute__ ((vector_size (16))); +typedef float v8sf __attribute__ ((vector_size (32))); +typedef double v2df __attribute__ ((vector_size (16))); +typedef double v4df __attribute__ ((vector_size (32))); + +#define FOO(VTYPE, OPNAME, OP) \ + VTYPE \ + foo_##VTYPE##_##OPNAME (VTYPE a, VTYPE b) \ + { \ + return a OP b; \ + } \ + +FOO (v4sf, eq, ==) +FOO (v4sf, neq, !=) +FOO (v4sf, gt, >) +FOO (v4sf, ge, >=) +FOO (v4sf, lt, <) +FOO (v4sf, le, <=) +FOO (v8sf, eq, ==) +FOO (v8sf, neq, !=) +FOO (v8sf, gt, >) +FOO (v8sf, ge, >=) +FOO (v8sf, lt, <) +FOO (v8sf, le, <=) +FOO (v2df, eq, ==) +FOO (v2df, neq, !=) +FOO (v2df, gt, >) +FOO (v2df, ge, >=) +FOO (v2df, lt, <) +FOO (v2df, le, <=) +FOO (v4df, eq, ==) +FOO (v4df, neq, !=) +FOO (v4df, gt, >) +FOO (v4df, ge, >=) +FOO (v4df, lt, <) +FOO (v4df, le, <=) diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c new file mode 100644 index 00000000000..5192a00e0f4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512f-pr96891-2.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512vl -mavx512bw -mavx512dq -O2" } */ +/* { dg-final { scan-assembler-not "%k\[0-7\]" } } */ + +#include + +#define FOO(VTYPE,PREFIX,SUFFIX,OPNAME,MASK,LEN) \ + VTYPE \ + foo_##LEN##_##SUFFIX##_##OPNAME (VTYPE a, VTYPE b) \ + { \ + MASK m = _mm##PREFIX##_cmp##OPNAME##_##SUFFIX##_mask (a, b); \ + return _mm##PREFIX##_movm_##SUFFIX (m); \ + } \ + +FOO (__m128i,, epi8, eq, __mmask16, 128); +FOO (__m128i,, epi16, eq, __mmask8, 128); +FOO (__m128i,, epi32, eq, __mmask8, 128); +FOO (__m128i,, epi64, eq, __mmask8, 128); +FOO (__m128i,, epi8, gt, __mmask16, 128); +FOO (__m128i,, epi16, gt, __mmask8, 128); +FOO (__m128i,, epi32, gt, __mmask8, 128); +FOO (__m128i,, epi64, gt, __mmask8, 128); +FOO (__m256i, 256, epi8, eq, __mmask32, 256); +FOO (__m256i, 256, epi16, eq, __mmask16, 256); +FOO (__m256i, 256, epi32, eq, __mmask8, 256); +FOO (__m256i, 256, epi64, eq, __mmask8, 256); +FOO (__m256i, 256, epi8, gt, __mmask32, 256); +FOO (__m256i, 256, epi16, gt, __mmask16, 256); +FOO (__m256i, 256, epi32, gt, __mmask8, 256); +FOO (__m256i, 256, epi64, gt, __mmask8, 256); -- 2.18.1