From patchwork Wed Sep 8 10:02:03 2021
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 1525737
From: liuhongt
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Optimize vec_extract for 256/512-bit vector when index exceeds the lower 128 bits.
Date: Wed, 8 Sep 2021 18:02:03 +0800
Message-Id: <20210908100203.791504-1-hongtao.liu@intel.com>
List-Id: Gcc-patches mailing list

Hi:
  As described in the PR, valign{d,q} can be used to extract a single
element from a vector. For elements located in the lower 128 bits, a single
instruction already suffices, so this patch only optimizes extraction of
elements located above the lower 128 bits.
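To illustrate why a single valign is enough, here is a rough scalar model in C. It is only a sketch: the names model_valignd_512 and index_above_low_128 are made up for this note and are not part of the patch, and the model assumes the 512-bit, 32-bit-element form with both source operands equal, in which case the instruction behaves as an element rotation that brings element imm down to position 0.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model (illustrative only) of "valignd $imm, src, src, dst" on a
   512-bit vector of 16 dwords.  The instruction concatenates its two
   sources and shifts right by imm elements; with both sources equal the
   result is a rotation, so dst[0] == src[imm].  */
static void
model_valignd_512 (const int32_t src[16], unsigned imm, int32_t dst[16])
{
  for (unsigned i = 0; i < 16; i++)
    dst[i] = src[(i + imm) % 16];
}

/* Hypothetical helper mirroring the index test used in the insn
   condition: an element lies above the lower 128 bits when its index is
   at least 16 bytes divided by the element size in bytes.  */
static int
index_above_low_128 (unsigned idx, unsigned elt_size)
{
  return idx >= 16 / elt_size;
}
```

After the rotation the wanted element sits in the low lane, so a plain vmovd (or no further instruction at all for a floating-point result) finishes the extraction.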
The optimization is like:

-       vextracti32x8   $0x1, %zmm0, %ymm0
-       vmovd   %xmm0, %eax
+       valignd $8, %zmm0, %zmm0, %zmm1
+       vmovd   %xmm1, %eax

-       vextracti32x8   $0x1, %zmm0, %ymm0
-       vextracti128    $0x1, %ymm0, %xmm0
-       vpextrd $3, %xmm0, %eax
+       valignd $15, %zmm0, %zmm0, %zmm1
+       vmovd   %xmm1, %eax

-       vextractf64x2   $0x1, %ymm0, %xmm0
+       valignq $2, %ymm0, %ymm0, %ymm0

-       vextractf64x4   $0x1, %zmm0, %ymm0
-       vextractf64x2   $0x1, %ymm0, %xmm0
-       vunpckhpd       %xmm0, %xmm0, %xmm0
+       valignq $7, %zmm0, %zmm0, %zmm0

Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.

gcc/ChangeLog:

        PR target/91103
        * config/i386/sse.md (*vec_extract_valign): New define_insn.

gcc/testsuite/ChangeLog:

        PR target/91103
        * gcc.target/i386/pr91103-1.c: New test.
        * gcc.target/i386/pr91103-2.c: New test.
---
 gcc/config/i386/sse.md                    | 32 +++++++++
 gcc/testsuite/gcc.target/i386/pr91103-1.c | 37 +++++++++++
 gcc/testsuite/gcc.target/i386/pr91103-2.c | 81 +++++++++++++++++++++++
 3 files changed, 150 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr91103-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr91103-2.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5785e73241c..57c736ff44a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -232,6 +232,12 @@ (define_mode_iterator V48_AVX512VL
    V16SF (V8SF "TARGET_AVX512VL") (V4SF "TARGET_AVX512VL")
    V8DF  (V4DF "TARGET_AVX512VL") (V2DF "TARGET_AVX512VL")])
 
+(define_mode_iterator V48_256_512_AVX512VL
+  [V16SI (V8SI "TARGET_AVX512VL")
+   V8DI (V4DI "TARGET_AVX512VL")
+   V16SF (V8SF "TARGET_AVX512VL")
+   V8DF (V4DF "TARGET_AVX512VL")])
+
 ;; 1,2 byte AVX-512{BW,VL} vector modes. Supposed TARGET_AVX512BW baseline.
 (define_mode_iterator VI12_AVX512VL
   [V64QI (V16QI "TARGET_AVX512VL") (V32QI "TARGET_AVX512VL")
@@ -786,6 +792,15 @@ (define_mode_attr sseinsnmode
    (V4SF "V4SF") (V2DF "V2DF")
    (TI "TI")])
 
+(define_mode_attr sseintvecinsnmode
+  [(V64QI "XI") (V32HI "XI") (V16SI "XI") (V8DI "XI") (V4TI "XI")
+   (V32QI "OI") (V16HI "OI") (V8SI "OI") (V4DI "OI") (V2TI "OI")
+   (V16QI "TI") (V8HI "TI") (V4SI "TI") (V2DI "TI") (V1TI "TI")
+   (V16SF "XI") (V8DF "XI")
+   (V8SF "OI") (V4DF "OI")
+   (V4SF "TI") (V2DF "TI")
+   (TI "TI")])
+
 ;; SSE constant -1 constraint
 (define_mode_attr sseconstm1
   [(V64QI "BC") (V32HI "BC") (V16SI "BC") (V8DI "BC") (V4TI "BC")
@@ -10326,6 +10341,23 @@ (define_insn "_align"
   [(set_attr "prefix" "evex")
    (set_attr "mode" "")])
 
+(define_mode_attr vec_extract_imm_predicate
+  [(V16SF "const_0_to_15_operand") (V8SF "const_0_to_7_operand")
+   (V16SI "const_0_to_15_operand") (V8SI "const_0_to_7_operand")
+   (V8DF "const_0_to_7_operand") (V4DF "const_0_to_3_operand")
+   (V8DI "const_0_to_7_operand") (V4DI "const_0_to_3_operand")])
+
+(define_insn "*vec_extract_valign"
+  [(set (match_operand: 0 "register_operand" "=v")
+        (vec_select:
+          (match_operand:V48_256_512_AVX512VL 1 "register_operand" "v")
+          (parallel [(match_operand 2 "")])))]
+  "TARGET_AVX512F
+   && INTVAL(operands[2]) >= 16 / GET_MODE_SIZE (mode)"
+  "valign\t{%2, %1, %1, %0|%0, %1, %1, %2}";
+  [(set_attr "prefix" "evex")
+   (set_attr "mode" "")])
+
 (define_expand "avx512f_shufps512_mask"
   [(match_operand:V16SF 0 "register_operand")
    (match_operand:V16SF 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/i386/pr91103-1.c b/gcc/testsuite/gcc.target/i386/pr91103-1.c
new file mode 100644
index 00000000000..11caaa8bd1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr91103-1.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "valign\[dq\]" 16 } } */
+
+typedef float v8sf __attribute__((vector_size(32)));
+typedef float v16sf __attribute__((vector_size(64)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef int v16si __attribute__((vector_size(64)));
+typedef double v4df __attribute__((vector_size(32)));
+typedef double v8df __attribute__((vector_size(64)));
+typedef long long v4di __attribute__((vector_size(32)));
+typedef long long v8di __attribute__((vector_size(64)));
+
+#define EXTRACT(V,S,IDX) \
+  S \
+  __attribute__((noipa)) \
+  foo_##V##_##IDX (V v) \
+  { \
+    return v[IDX]; \
+  } \
+
+EXTRACT (v8sf, float, 4);
+EXTRACT (v8sf, float, 7);
+EXTRACT (v8si, int, 4);
+EXTRACT (v8si, int, 7);
+EXTRACT (v16sf, float, 8);
+EXTRACT (v16sf, float, 15);
+EXTRACT (v16si, int, 8);
+EXTRACT (v16si, int, 15);
+EXTRACT (v4df, double, 2);
+EXTRACT (v4df, double, 3);
+EXTRACT (v4di, long long, 2);
+EXTRACT (v4di, long long, 3);
+EXTRACT (v8df, double, 4);
+EXTRACT (v8df, double, 7);
+EXTRACT (v8di, long long, 4);
+EXTRACT (v8di, long long, 7);
diff --git a/gcc/testsuite/gcc.target/i386/pr91103-2.c b/gcc/testsuite/gcc.target/i386/pr91103-2.c
new file mode 100644
index 00000000000..010e4775723
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr91103-2.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mavx512vl" } */
+/* { dg-require-effective-target avx512vl } */
+
+#define AVX512VL
+
+#ifndef CHECK
+#define CHECK "avx512f-helper.h"
+#endif
+
+#include CHECK
+#include "pr91103-1.c"
+
+#define RUNCHECK(U,V,S,IDX) \
+  do \
+    { \
+      S tmp = foo_##V##_##IDX ((V)U.x); \
+      if (tmp != U.a[IDX]) \
+        abort(); \
+    } \
+  while (0)
+
+void
+test_256 (void)
+{
+  union512i_d di1;
+  union256i_d di2;
+  union512i_q q1;
+  union256i_q q2;
+  union512 f1;
+  union256 f2;
+  union512d d1;
+  union256d d2;
+  int sign = 1;
+
+  int i = 0;
+  for (i = 0; i < 16; i++)
+    {
+      di1.a[i] = 30 * (i - 30) * sign;
+      f1.a[i] = 56.78 * (i - 30) * sign;
+      sign = -sign;
+    }
+
+  for (i = 0; i != 8; i++)
+    {
+      di2.a[i] = 15 * (i + 40) * sign;
+      f2.a[i] = 90.12 * (i + 40) * sign;
+      q1.a[i] = 15 * (i + 40) * sign;
+      d1.a[i] = 90.12 * (i + 40) * sign;
+      sign = -sign;
+    }
+
+  for (i = 0; i != 4; i++)
+    {
+      q2.a[i] = 15 * (i + 40) * sign;
+      d2.a[i] = 90.12 * (i + 40) * sign;
+      sign = -sign;
+    }
+
+RUNCHECK (f2, v8sf, float, 4);
+RUNCHECK (f2, v8sf, float, 7);
+RUNCHECK (di2, v8si, int, 4);
+RUNCHECK (di2, v8si, int, 7);
+RUNCHECK (f1, v16sf, float, 8);
+RUNCHECK (f1, v16sf, float, 15);
+RUNCHECK (di1, v16si, int, 8);
+RUNCHECK (di1, v16si, int, 15);
+RUNCHECK (d2, v4df, double, 2);
+RUNCHECK (d2, v4df, double, 3);
+RUNCHECK (q2, v4di, long long, 2);
+RUNCHECK (q2, v4di, long long, 3);
+RUNCHECK (d1, v8df, double, 4);
+RUNCHECK (d1, v8df, double, 7);
+RUNCHECK (q1, v8di, long long, 4);
+RUNCHECK (q1, v8di, long long, 7);
+}
+
+void
+test_128()
+{
+}