From patchwork Tue Apr 18 17:01:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Uros Bizjak X-Patchwork-Id: 1770401 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=gcc.gnu.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.a=rsa-sha256 header.s=default header.b=f+d6yB7m; dkim-atps=neutral Received: from sourceware.org (ip-8-43-85-97.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-384) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4Q19FG18MLz1yZy for ; Wed, 19 Apr 2023 03:01:54 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6048A3857706 for ; Tue, 18 Apr 2023 17:01:51 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6048A3857706 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1681837311; bh=N+U9elVOU8xEBYS9DYgSkDBOgFG0RRGeg9wGbtxSo9Y=; h=Date:Subject:To:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:From; b=f+d6yB7mT7QgiewFo7/qTyu/656RxfwqFpdAr30888mzL4uKVZyaql3nTQ2eH8xVz sOUo7LLWZmpoXW5PG+btbsm/scMgvfealGzaoiOJ1AZaFKZX0uiC62W15Afuy0j1K1 XvcmV7LsAi8LhFSZIQR+OcppY/uaJY/a7U85ymOc= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-qt1-x82e.google.com (mail-qt1-x82e.google.com [IPv6:2607:f8b0:4864:20::82e]) by sourceware.org (Postfix) with ESMTPS id 9DF6E3858D1E for ; Tue, 18 Apr 2023 17:01:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9DF6E3858D1E Received: by mail-qt1-x82e.google.com with SMTP id z1so19455708qti.5 for ; Tue, 18 Apr 2023 10:01:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681837287; x=1684429287; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=N+U9elVOU8xEBYS9DYgSkDBOgFG0RRGeg9wGbtxSo9Y=; b=WGHgzOT6zt1qceYNkAhxiqaTegD2ozCCNnIeqIvvr0Eo5AZFuNqSz4GuTop8UCubNb 9pM5kOBiQLHS65wTF9csJSqDFTQBlLqM4Vk4GtiFv6rdp/dJiDlI80Bx+InwbjKQbD+S olasAHGwPX8VdaurJ3jkpb8R6rHZq/2OWF4V9BmSg4gAs0X7JPCKA4K2MfSJ6cGYrLke jF7z1brUNuS7FqkWO5Y5b9dsK3cx7Xg0qZYtZqX5XmaPmfSSZrfQ2Gv9o6RblxL3/PkD V/tyuHHRG16FMHfIs+PIRd+eJlAScscYEEsoeeDkCiuoC0oLc5lAKt59IA5l6bZhhz2q S3ww== X-Gm-Message-State: AAQBX9fJRp86lST0sl0E7xYQ4oQDJNGqtycfDWDrhsMkWUD0VWpKM21v aAZ1I6Y22wFOVn/K3xrcaH+sfAurVo7jubcisyl4cXb+u5vhMQ== X-Google-Smtp-Source: AKy350Zm58rhSodU4WSJAkR0YvOh2wqdPZ3H1EGtRFTWBhG+V4hVoJtxwM5W1TTGvr4IEH1dUsBzH711KgHYAb6fJng= X-Received: by 2002:a05:622a:19a8:b0:3ef:3680:b6da with SMTP id u40-20020a05622a19a800b003ef3680b6damr941986qtc.16.1681837287453; Tue, 18 Apr 2023 10:01:27 -0700 (PDT) MIME-Version: 1.0 Date: Tue, 18 Apr 2023 19:01:16 +0200 Message-ID: Subject: [PATCH] i386: Improve permutations with INSERTPS instruction [PR94908] To: "gcc-patches@gcc.gnu.org" X-Spam-Status: No, score=-8.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Uros Bizjak via Gcc-patches From: Uros Bizjak Reply-To: Uros Bizjak Errors-To: gcc-patches-bounces+incoming=patchwork.ozlabs.org@gcc.gnu.org Sender: "Gcc-patches" INSERTPS can select any element from src and insert into any place of the dest. For SSE4.1 targets, compiler can generate e.g. insertps $64, %xmm0, %xmm1 to insert element 1 from %xmm1 to element 0 of %xmm0. gcc/ChangeLog: PR target/94908 * config/i386/i386-builtin.def (__builtin_ia32_insertps128): Use CODE_FOR_sse4_1_insertps_v4sf. * config/i386/i386-expand.cc (expand_vec_perm_insertps): New. (expand_vec_perm_1): Call expand_vec_per_insertps. * config/i386/i386.md ("unspec"): Declare UNSPEC_INSERTPS here. * config/i386/mmx.md (mmxscalarmode): New mode attribute. (@sse4_1_insertps_): New insn pattern. * config/i386/sse.md (@sse4_1_insertps_): Macroize insn pattern from sse4_1_insertps using VI4F_128 mode iterator. gcc/testsuite/ChangeLog: PR target/94908 * gcc.target/i386/pr94908.c: New test. * gcc.target/i386/sse4_1-insertps-5.c: New test. * gcc.target/i386/vperm-v4sf-2-sse4.c: New test. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Pushed to master. Uros. diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 6dae6972d81..f7cf105ae69 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -942,7 +942,7 @@ BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_blendvpd, "__builtin_ia32_blen BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, "__builtin_ia32_blendvps", IX86_BUILTIN_BLENDVPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SF) BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dppd, "__builtin_ia32_dppd", IX86_BUILTIN_DPPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT) BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_dpps, "__builtin_ia32_dpps", IX86_BUILTIN_DPPS, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT) -BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_insertps, "__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT) +BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_insertps_v4sf, "__builtin_ia32_insertps128", IX86_BUILTIN_INSERTPS128, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_INT) BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_mpsadbw, "__builtin_ia32_mpsadbw128", IX86_BUILTIN_MPSADBW128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_INT) BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_nothing, "__builtin_ia32_pblendvb128", IX86_BUILTIN_PBLENDVB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI) BDESC (OPTION_MASK_ISA_SSE4_1, 0, CODE_FOR_sse4_1_pblendw, "__builtin_ia32_pblendw128", IX86_BUILTIN_PBLENDW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_INT) diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index 0d817fc3f3b..9fa549c4c3b 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -18985,6 +18985,78 @@ expand_vec_perm_movs (struct expand_vec_perm_d *d) return true; } +/* A subroutine of ix86_expand_vec_perm_const_1. Try to implement D + using insertps. */ +static bool +expand_vec_perm_insertps (struct expand_vec_perm_d *d) +{ + machine_mode vmode = d->vmode; + unsigned i, cnt_s, nelt = d->nelt; + int cnt_d = -1; + rtx src, dst; + + if (d->one_operand_p) + return false; + + if (!(TARGET_SSE4_1 + && (vmode == V4SFmode || vmode == V4SImode + || (TARGET_MMX_WITH_SSE + && (vmode == V2SFmode || vmode == V2SImode))))) + return false; + + for (i = 0; i < nelt; ++i) + { + if (d->perm[i] == i) + continue; + if (cnt_d != -1) + { + cnt_d = -1; + break; + } + cnt_d = i; + } + + if (cnt_d == -1) + { + for (i = 0; i < nelt; ++i) + { + if (d->perm[i] == i + nelt) + continue; + if (cnt_d != -1) + return false; + cnt_d = i; + } + + if (cnt_d == -1) + return false; + } + + if (d->testing_p) + return true; + + gcc_assert (cnt_d != -1); + + cnt_s = d->perm[cnt_d]; + if (cnt_s < nelt) + { + src = d->op0; + dst = d->op1; + } + else + { + cnt_s -= nelt; + src = d->op1; + dst = d->op0; + } + gcc_assert (cnt_s < nelt); + + rtx x = gen_sse4_1_insertps (vmode, d->target, dst, src, + GEN_INT (cnt_s << 6 | cnt_d << 4)); + emit_insn (x); + + return true; +} + /* A subroutine of ix86_expand_vec_perm_const_1. Try to implement D in terms of blendp[sd] / pblendw / pblendvb / vpblendd. */ @@ -19918,6 +19990,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d) if (expand_vec_perm_movs (d)) return true; + /* Try the SSE4.1 insertps instruction. */ + if (expand_vec_perm_insertps (d)) + return true; + /* Try the fully general two operand permute. */ if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt, d->testing_p)) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index ed689b044c3..1419ea4cff3 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -120,6 +120,7 @@ (define_c_enum "unspec" [ UNSPEC_MASKMOV UNSPEC_MOVCC_MASK UNSPEC_MOVMSK + UNSPEC_INSERTPS UNSPEC_BLENDV UNSPEC_PSHUFB UNSPEC_XOP_PERMUTE diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index 18dae03ad0a..872ddbc55f2 100644 --- a/gcc/config/i386/mmx.md +++ b/gcc/config/i386/mmx.md @@ -106,6 +106,10 @@ (define_mode_attr mmxintvecmode (define_mode_attr mmxintvecmodelower [(V2SF "v2si") (V2SI "v2si") (V4HI "v4hi") (V8QI "v8qi")]) +;; Mapping of vector modes back to the scalar modes +(define_mode_attr mmxscalarmode + [(V2SI "SI") (V2SF "SF")]) + (define_mode_attr Yv_Yw [(V8QI "Yw") (V4HI "Yw") (V2SI "Yv") (V1DI "Yv") (V2SF "Yv")]) @@ -1154,6 +1158,42 @@ (define_expand "vcondv2sf" DONE; }) +(define_insn "@sse4_1_insertps_" + [(set (match_operand:V2FI 0 "register_operand" "=Yr,*x,v") + (unspec:V2FI + [(match_operand:V2FI 2 "nonimmediate_operand" "Yrm,*xm,vm") + (match_operand:V2FI 1 "register_operand" "0,0,v") + (match_operand:SI 3 "const_0_to_255_operand")] + UNSPEC_INSERTPS))] + "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE" +{ + if (MEM_P (operands[2])) + { + unsigned count_s = INTVAL (operands[3]) >> 6; + if (count_s) + operands[3] = GEN_INT (INTVAL (operands[3]) & 0x3f); + operands[2] = adjust_address_nv (operands[2], + mode, count_s * 4); + } + switch (which_alternative) + { + case 0: + case 1: + return "insertps\t{%3, %2, %0|%0, %2, %3}"; + case 2: + return "vinsertps\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + default: + gcc_unreachable (); + } +} + [(set_attr "isa" "noavx,noavx,avx") + (set_attr "type" "sselog") + (set_attr "prefix_data16" "1,1,*") + (set_attr "prefix_extra" "1") + (set_attr "length_immediate" "1") + (set_attr "prefix" "orig,orig,maybe_evex") + (set_attr "mode" "V4SF")]) + (define_insn "*mmx_blendps" [(set (match_operand:V2SF 0 "register_operand" "=Yr,*x,x") (vec_merge:V2SF diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 513960e8f33..5dca8dd1e27 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -38,7 +38,6 @@ (define_c_enum "unspec" [ UNSPEC_INSERTQ ;; For SSE4.1 support - UNSPEC_INSERTPS UNSPEC_DP UNSPEC_MOVNTDQA UNSPEC_MPSADBW @@ -10959,12 +10958,13 @@ (define_insn_and_split "*vec_setv2di_0_zero_extendsi_1" DONE; }) -(define_insn "sse4_1_insertps" - [(set (match_operand:V4SF 0 "register_operand" "=Yr,*x,v") - (unspec:V4SF [(match_operand:V4SF 2 "nonimmediate_operand" "Yrm,*xm,vm") - (match_operand:V4SF 1 "register_operand" "0,0,v") - (match_operand:SI 3 "const_0_to_255_operand")] - UNSPEC_INSERTPS))] +(define_insn "@sse4_1_insertps_" + [(set (match_operand:VI4F_128 0 "register_operand" "=Yr,*x,v") + (unspec:VI4F_128 + [(match_operand:VI4F_128 2 "nonimmediate_operand" "Yrm,*xm,vm") + (match_operand:VI4F_128 1 "register_operand" "0,0,v") + (match_operand:SI 3 "const_0_to_255_operand")] + UNSPEC_INSERTPS))] "TARGET_SSE4_1" { if (MEM_P (operands[2])) @@ -10972,7 +10972,8 @@ (define_insn "sse4_1_insertps" unsigned count_s = INTVAL (operands[3]) >> 6; if (count_s) operands[3] = GEN_INT (INTVAL (operands[3]) & 0x3f); - operands[2] = adjust_address_nv (operands[2], SFmode, count_s * 4); + operands[2] = adjust_address_nv (operands[2], + mode, count_s * 4); } switch (which_alternative) { diff --git a/gcc/testsuite/gcc.target/i386/pr94908.c b/gcc/testsuite/gcc.target/i386/pr94908.c new file mode 100644 index 00000000000..11a5f90e5dd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr94908.c @@ -0,0 +1,14 @@ +/* PR target/94908 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse4.1" } */ + +typedef float v4sf __attribute__((vector_size(16))); + +v4sf g(); + +v4sf f(v4sf a, v4sf b) +{ + return (v4sf){g()[1], a[1], a[2], a[3]}; +} + +/* { dg-final { scan-assembler "\[ \t\]v?insertps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-insertps-5.c b/gcc/testsuite/gcc.target/i386/sse4_1-insertps-5.c new file mode 100644 index 00000000000..d9c4cfc81d4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse4_1-insertps-5.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse4.1" } */ + +typedef unsigned int v4si __attribute__((vector_size(16))); +typedef float v4sf __attribute__((vector_size(16))); + +v4si foo_1(v4si x, v4si y) { return (v4si){x[0],y[3],x[2],x[3]}; } +v4si foo_2(v4si x, v4si y) { return (v4si){y[0],x[2],y[2],y[3]}; } +v4si foo_3(v4si x, v4si y) { return (v4si){x[3],y[1],y[2],y[3]}; } + +v4sf bar_1(v4sf x, v4sf y) { return (v4sf){y[0],x[3],y[2],y[3]}; } +v4sf bar_2(v4sf x, v4sf y) { return (v4sf){x[0],y[2],x[2],x[3]}; } +v4sf bar_3(v4sf x, v4sf y) { return (v4sf){y[3],x[1],x[2],x[3]}; } + +/* { dg-final { scan-assembler-times "\tv?insertps\t" 6 } } */ +/* { dg-final { scan-assembler-not "pshufd" } } */ +/* { dg-final { scan-assembler-not "pblendw" } } */ +/* { dg-final { scan-assembler-not "shufps" } } */ +/* { dg-final { scan-assembler-not "blendps" } } */ diff --git a/gcc/testsuite/gcc.target/i386/vperm-v4sf-2-sse4.c b/gcc/testsuite/gcc.target/i386/vperm-v4sf-2-sse4.c new file mode 100644 index 00000000000..ed5963efc5a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/vperm-v4sf-2-sse4.c @@ -0,0 +1,4 @@ +/* { dg-do run } */ +/* { dg-require-effective-target sse4 } */ +/* { dg-options "-O -msse4" } */ +#include "vperm-v4sf-2.c"