From patchwork Sun Mar 3 14:32:30 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 1050864
From: "H.J. Lu"
To: gcc-patches@gcc.gnu.org
Cc: Richard Guenther
Subject: [PATCH] Optimize vector init constructor
Date: Sun, 3 Mar 2019 06:32:30 -0800
Message-Id: <20190303143230.19742-1-hjl.tools@gmail.com>

For a vector init constructor such as:

---
typedef float __v4sf __attribute__ ((__vector_size__ (16)));

__v4sf
foo (__v4sf x, float f)
{
  __v4sf y = { f, x[1], x[2], x[3] };
  return y;
}
---

we can optimize the vector init constructor into a vector copy or
permute followed by a single scalar insert:

  __v4sf D.1912;
  __v4sf D.1913;
  __v4sf D.1914;
  __v4sf y;

  x.0_1 = x;
  D.1912 = x.0_1;
  _2 = D.1912;
  D.1913 = _2;
  BIT_FIELD_REF = f;
  y = D.1913;
  D.1914 = y;
  return D.1914;

instead of

  __v4sf D.1962;
  __v4sf y;

  _1 = BIT_FIELD_REF ;
  _2 = BIT_FIELD_REF ;
  _3 = BIT_FIELD_REF ;
  y = {f, _1, _2, _3};
  D.1962 = y;
  return D.1962;

gcc/

	PR tree-optimization/88828
	* gimplify.c (gimplify_init_constructor): Optimize vector init
	constructor with vector copy or permute followed by a single
	scalar insert.

gcc/testsuite/

	PR tree-optimization/88828
	* gcc.target/i386/pr88828-1.c: New test.
	* gcc.target/i386/pr88828-2.c: Likewise.
	* gcc.target/i386/pr88828-3a.c: Likewise.
	* gcc.target/i386/pr88828-3b.c: Likewise.
	* gcc.target/i386/pr88828-4a.c: Likewise.
	* gcc.target/i386/pr88828-4b.c: Likewise.
	* gcc.target/i386/pr88828-5a.c: Likewise.
	* gcc.target/i386/pr88828-5b.c: Likewise.
	* gcc.target/i386/pr88828-6a.c: Likewise.
	* gcc.target/i386/pr88828-6b.c: Likewise.
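
As a rough illustration of why the rewrite described above is safe, the
following sketch compares the constructor form with the copy-plus-insert
form at the source level.  It assumes a compiler that supports the GCC
vector extension; the names v4sf, ctor_form, copy_insert_form, same_v4sf
and check_equivalence are hypothetical and are not part of the patch.

```c
#include <assert.h>

/* Same shape as the typedef used in the new tests; the name v4sf is
   chosen here only for illustration.  */
typedef float v4sf __attribute__ ((__vector_size__ (16)));

/* Constructor form: lane 0 is a scalar, lanes 1..3 come from X in
   their original positions.  */
v4sf
ctor_form (v4sf x, float f)
{
  v4sf y = { f, x[1], x[2], x[3] };
  return y;
}

/* Vector copy followed by a single scalar insert.  */
v4sf
copy_insert_form (v4sf x, float f)
{
  v4sf y = x;
  y[0] = f;
  return y;
}

/* Return 1 if all four lanes of A and B compare equal, else 0.  */
int
same_v4sf (v4sf a, v4sf b)
{
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}

/* Abort if the two forms disagree for the given inputs.  */
void
check_equivalence (v4sf x, float f)
{
  assert (same_v4sf (ctor_form (x, f), copy_insert_form (x, f)));
}
```

Calling check_equivalence from a small main with a few input vectors is
enough to exercise both forms.  The sketch says nothing about which
instructions a given target emits for either form.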
---
 gcc/gimplify.c                             | 176 +++++++++++++++++++--
 gcc/testsuite/gcc.target/i386/pr88828-1.c  |  16 ++
 gcc/testsuite/gcc.target/i386/pr88828-2.c  |  17 ++
 gcc/testsuite/gcc.target/i386/pr88828-3a.c |  16 ++
 gcc/testsuite/gcc.target/i386/pr88828-3b.c |  18 +++
 gcc/testsuite/gcc.target/i386/pr88828-4a.c |  17 ++
 gcc/testsuite/gcc.target/i386/pr88828-4b.c |  20 +++
 gcc/testsuite/gcc.target/i386/pr88828-5a.c |  16 ++
 gcc/testsuite/gcc.target/i386/pr88828-5b.c |  18 +++
 gcc/testsuite/gcc.target/i386/pr88828-6a.c |  17 ++
 gcc/testsuite/gcc.target/i386/pr88828-6b.c |  19 +++
 11 files changed, 336 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88828-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88828-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88828-3a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88828-3b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88828-4a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88828-4b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88828-5a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88828-5b.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88828-6a.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr88828-6b.c

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 983635ba21f..893a4311f9e 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -5082,22 +5082,170 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
             TREE_CONSTANT (ctor) = 0;
           }
 
-        /* Vector types use CONSTRUCTOR all the way through gimple
-           compilation as a general initializer.  */
-        FOR_EACH_VEC_SAFE_ELT (elts, ix, ce)
+        tree rhs_vector = NULL;
+        /* The vector element to replace scalar elements, which
+           will be overridden by scalar insert.  */
+        tree vector_element = NULL;
+        /* The single scalar element.  */
+        tree scalar_element = NULL;
+        unsigned int scalar_idx = 0;
+        enum { unknown, copy, permute, init } operation = unknown;
+        bool insert = false;
+
+        /* Check if we can generate vector copy or permute followed by
+           a single scalar insert.  */
+        if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
           {
-            enum gimplify_status tret;
-            tret = gimplify_expr (&ce->value, pre_p, post_p, is_gimple_val,
-                                  fb_rvalue);
-            if (tret == GS_ERROR)
-              ret = GS_ERROR;
-            else if (TREE_STATIC (ctor)
-                     && !initializer_constant_valid_p (ce->value,
-                                                       TREE_TYPE (ce->value)))
-              TREE_STATIC (ctor) = 0;
+            /* If all RHS vector elements come from the same vector,
+               we can use permute.  If all RHS vector elements come
+               from the same vector in the same order, we can use
+               copy.  */
+            unsigned int nunits
+              = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+            unsigned int nscalars = 0;
+            unsigned int nvectors = 0;
+            operation = unknown;
+            FOR_EACH_VEC_SAFE_ELT (elts, ix, ce)
+              if (TREE_CODE (ce->value) == ARRAY_REF
+                  || TREE_CODE (ce->value) == ARRAY_RANGE_REF)
+                {
+                  if (!vector_element)
+                    vector_element = ce->value;
+                  /* Get the vector index.  */
+                  tree idx = TREE_OPERAND (ce->value, 1);
+                  if (TREE_CODE (idx) == INTEGER_CST)
+                    {
+                      /* Get the RHS vector.  */
+                      tree r = ce->value;
+                      while (handled_component_p (r))
+                        r = TREE_OPERAND (r, 0);
+                      if (type == TREE_TYPE (r))
+                        {
+                          /* The RHS vector has the same type as
+                             LHS.  */
+                          if (rhs_vector == NULL)
+                            rhs_vector = r;
+
+                          /* Check if all RHS vector elements come
+                             from the same vector.  */
+                          if (rhs_vector == r)
+                            {
+                              nvectors++;
+                              if (TREE_INT_CST_LOW (idx) == ix
+                                  && (operation == unknown
+                                      || operation == copy))
+                                operation = copy;
+                              else
+                                operation = permute;
+                              continue;
+                            }
+                        }
+                    }
+
+                  /* Otherwise, use vector init.  */
+                  break;
+                }
+              else if (TREE_CODE (TYPE_SIZE (TREE_TYPE (ce->value)))
+                       == INTEGER_CST)
+                {
+                  /* Only allow one single scalar insert.  */
+                  if (nscalars != 0)
+                    break;
+                  nscalars = 1;
+                  insert = true;
+                  scalar_idx = ix;
+                  scalar_element = ce->value;
+                }
+
+            /* Allow a single scalar insert with vector copy or
+               vector permute.  Vector copy without insert is OK.  */
+            if (nunits != (nscalars + nvectors)
+                || (nscalars == 0 && operation != copy))
+              operation = unknown;
+          }
+
+        if (operation == unknown)
+          {
+            /* Default to the regular vector init constructor.  */
+            operation = init;
+            insert = false;
+          }
+
+        if (operation == copy)
+          {
+            /* Generate a vector copy.  */
+            tree var = create_tmp_var (type);
+            if (gimplify_expr (&rhs_vector, pre_p, post_p,
+                               is_gimple_val, fb_rvalue) == GS_ERROR)
+              {
+                ret = GS_ERROR;
+                break;
+              }
+            gassign *init = gimple_build_assign (var, rhs_vector);
+            gimple_seq_add_stmt (pre_p, init);
+            if (gimplify_expr (&var, pre_p, post_p, is_gimple_val,
+                               fb_rvalue) == GS_ERROR)
+              {
+                ret = GS_ERROR;
+                break;
+              }
+            /* Replace RHS with the vector copy.  */
+            if (!is_gimple_reg (TREE_OPERAND (*expr_p, 0)))
+              TREE_OPERAND (*expr_p, 1) = get_formal_tmp_var (var, pre_p);
+            else
+              TREE_OPERAND (*expr_p, 1) = var;
+          }
+        else
+          {
+            /* Prepare for vector permute by replacing the scalar
+               element with the vector one.  */
+            if (operation == permute)
+              (elts->address())[scalar_idx].value = vector_element;
+
+            /* Vector types use CONSTRUCTOR all the way through gimple
+               compilation as a general initializer.  */
+            FOR_EACH_VEC_SAFE_ELT (elts, ix, ce)
+              {
+                enum gimplify_status tret;
+                tret = gimplify_expr (&ce->value, pre_p, post_p,
+                                      is_gimple_val,
+                                      fb_rvalue);
+                if (tret == GS_ERROR)
+                  ret = GS_ERROR;
+                else if (TREE_STATIC (ctor)
+                         && !initializer_constant_valid_p (ce->value,
+                                                           TREE_TYPE (ce->value)))
+                  TREE_STATIC (ctor) = 0;
+              }
+            if (!is_gimple_reg (TREE_OPERAND (*expr_p, 0)))
+              TREE_OPERAND (*expr_p, 1) = get_formal_tmp_var (ctor, pre_p);
+          }
+
+        if (insert)
+          {
+            /* Generate a single scalar insert after vector copy or
+               permute.  */
+            tree rhs = TREE_OPERAND (*expr_p, 1);
+            tree var = create_tmp_var (type);
+            gassign *init = gimple_build_assign (var, rhs);
+            gimple_seq_add_stmt (pre_p, init);
+            if (gimplify_expr (&scalar_element, pre_p, post_p,
+                               is_gimple_val, fb_rvalue) == GS_ERROR)
+              {
+                ret = GS_ERROR;
+                break;
+              }
+            tree scalar_type = TREE_TYPE (scalar_element);
+            tree scalar_size = TYPE_SIZE (scalar_type);
+            tree bitpos = bitsize_int (scalar_idx
+                                       * TREE_INT_CST_LOW (scalar_size));
+            tree ref = build3_loc (EXPR_LOCATION (rhs), BIT_FIELD_REF,
+                                   scalar_type, var, scalar_size,
+                                   bitpos);
+            init = gimple_build_assign (ref, scalar_element);
+            gimplify_seq_add_stmt (pre_p, init);
+            TREE_OPERAND (*expr_p, 1) = var;
           }
-        if (!is_gimple_reg (TREE_OPERAND (*expr_p, 0)))
-          TREE_OPERAND (*expr_p, 1) = get_formal_tmp_var (ctor, pre_p);
       }
       break;
 
diff --git a/gcc/testsuite/gcc.target/i386/pr88828-1.c b/gcc/testsuite/gcc.target/i386/pr88828-1.c
new file mode 100644
index 00000000000..4ef1feab389
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88828-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse -mno-sse4" } */
+/* { dg-final { scan-assembler "movss" } } */
+/* { dg-final { scan-assembler-not "movaps" } } */
+/* { dg-final { scan-assembler-not "movlhps" } } */
+/* { dg-final { scan-assembler-not "unpcklps" } } */
+/* { dg-final { scan-assembler-not "shufps" } } */
+
+typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+__v4sf
+foo (__v4sf x, float f)
+{
+  __v4sf y = { f, x[1], x[2], x[3] };
+  return y;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr88828-2.c b/gcc/testsuite/gcc.target/i386/pr88828-2.c
new file mode 100644
index 00000000000..6dc482b6f4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88828-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse -mno-sse4" } */
+/* { dg-final { scan-assembler "movss" } } */
+/* { dg-final { scan-assembler-not "movaps" } } */
+/* { dg-final { scan-assembler-not "movlhps" } } */
+/* { dg-final { scan-assembler-not "unpcklps" } } */
+/* { dg-final { scan-assembler-not "shufps" } } */
+
+typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+__v4sf
+foo (__v4sf x, float f)
+{
+  __v4sf y = x;
+  y[0] = f;
+  return y;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr88828-3a.c b/gcc/testsuite/gcc.target/i386/pr88828-3a.c
new file mode 100644
index 00000000000..97eb8e7162a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88828-3a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse -mno-sse4" } */
+/* { dg-final { scan-assembler "movss" } } */
+/* { dg-final { scan-assembler-times "shufps" 1 } } */
+/* { dg-final { scan-assembler-not "movaps" } } */
+/* { dg-final { scan-assembler-not "movlhps" } } */
+/* { dg-final { scan-assembler-not "unpcklps" } } */
+
+typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+__v4sf
+foo (__v4sf x, float f)
+{
+  __v4sf y = { f, x[0], x[2], x[3] };
+  return y;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr88828-3b.c b/gcc/testsuite/gcc.target/i386/pr88828-3b.c
new file mode 100644
index 00000000000..ab2ba730716
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88828-3b.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+/* { dg-final { scan-assembler-times "vpermilps" 1 } } */
+/* { dg-final { scan-assembler-times "vmovss" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpinsrd" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "vmovss" { target ia32 } } } */
+/* { dg-final { scan-assembler-not "vmovaps" } } */
+/* { dg-final { scan-assembler-not "vmovlhps" } } */
+/* { dg-final { scan-assembler-not "vunpcklps" } } */
+
+typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+__v4sf
+foo (__v4sf x, float f)
+{
+  __v4sf y = { f, x[0], x[2], x[3] };
+  return y;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr88828-4a.c b/gcc/testsuite/gcc.target/i386/pr88828-4a.c
new file mode 100644
index 00000000000..a54689be701
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88828-4a.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse -mno-sse4" } */
+/* { dg-final { scan-assembler "movss" } } */
+/* { dg-final { scan-assembler-times "shufps" 1 } } */
+/* { dg-final { scan-assembler-not "movaps" } } */
+/* { dg-final { scan-assembler-not "movlhps" } } */
+/* { dg-final { scan-assembler-not "unpcklps" } } */
+
+typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+__v4sf
+foo (__v4sf x, float f)
+{
+  __v4sf y = { x[0], x[2], x[3], x[1] };
+  y[0] = f;
+  return y;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr88828-4b.c b/gcc/testsuite/gcc.target/i386/pr88828-4b.c
new file mode 100644
index 00000000000..0c3a1024d93
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88828-4b.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+/* { dg-final { scan-assembler-times "vpermilps" 1 } } */
+/* { dg-final { scan-assembler-times "vmovss" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpinsrd" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-not "vmovss" { target ia32 } } } */
+/* { dg-final { scan-assembler-not "vshufps" } } */
+/* { dg-final { scan-assembler-not "vmovaps" } } */
+/* { dg-final { scan-assembler-not "vmovlhps" } } */
+/* { dg-final { scan-assembler-not "vunpcklps" } } */
+
+typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+__v4sf
+foo (__v4sf x, float f)
+{
+  __v4sf y = { x[0], x[2], x[3], x[1] };
+  y[0] = f;
+  return y;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr88828-5a.c b/gcc/testsuite/gcc.target/i386/pr88828-5a.c
new file mode 100644
index 00000000000..534808d3cd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88828-5a.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse -mno-sse4" } */
+/* { dg-final { scan-assembler "movss" } } */
+/* { dg-final { scan-assembler-times "shufps" 2 } } */
+/* { dg-final { scan-assembler-times "movaps" 1 } } */
+/* { dg-final { scan-assembler-not "movlhps" } } */
+/* { dg-final { scan-assembler-not "unpcklps" } } */
+
+typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+__v4sf
+foo (__v4sf x, float f)
+{
+  __v4sf y = { x[0], x[2], x[3], f };
+  return y;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr88828-5b.c b/gcc/testsuite/gcc.target/i386/pr88828-5b.c
new file mode 100644
index 00000000000..aebea790979
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88828-5b.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+/* { dg-final { scan-assembler-times "vpermilps" 1 } } */
+/* { dg-final { scan-assembler-times "vinsertps" 1 } } */
+/* { dg-final { scan-assembler-not "vmovss" } } */
+/* { dg-final { scan-assembler-not "vshufps" } } */
+/* { dg-final { scan-assembler-not "vmovaps" } } */
+/* { dg-final { scan-assembler-not "vmovlhps" } } */
+/* { dg-final { scan-assembler-not "vunpcklps" } } */
+
+typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+__v4sf
+foo (__v4sf x, float f)
+{
+  __v4sf y = { x[0], x[2], x[3], f };
+  return y;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr88828-6a.c b/gcc/testsuite/gcc.target/i386/pr88828-6a.c
new file mode 100644
index 00000000000..d43a36d9137
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88828-6a.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse -mno-sse4" } */
+/* { dg-final { scan-assembler "movss" } } */
+/* { dg-final { scan-assembler-times "shufps" 2 } } */
+/* { dg-final { scan-assembler-times "movaps" 1 } } */
+/* { dg-final { scan-assembler-not "movlhps" } } */
+/* { dg-final { scan-assembler-not "unpcklps" } } */
+
+typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+__v4sf
+foo (__v4sf x, float f)
+{
+  __v4sf y = { x[0], x[2], x[3], x[0] };
+  y[3] = f;
+  return y;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr88828-6b.c b/gcc/testsuite/gcc.target/i386/pr88828-6b.c
new file mode 100644
index 00000000000..6856fe6500e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr88828-6b.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx" } */
+/* { dg-final { scan-assembler-times "vpermilps" 1 } } */
+/* { dg-final { scan-assembler-times "vinsertps" 1 } } */
+/* { dg-final { scan-assembler-not "vshufps" } } */
+/* { dg-final { scan-assembler-not "vmovss" } } */
+/* { dg-final { scan-assembler-not "vmovaps" } } */
+/* { dg-final { scan-assembler-not "vmovlhps" } } */
+/* { dg-final { scan-assembler-not "vunpcklps" } } */
+
+typedef float __v4sf __attribute__ ((__vector_size__ (16)));
+
+__v4sf
+foo (__v4sf x, float f)
+{
+  __v4sf y = { x[0], x[2], x[3], x[0] };
+  y[3] = f;
+  return y;
+}
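
The element classification in the gimplify.c hunk above can be hard to
follow in diff form, so here is a simplified, standalone model of the
decision it appears to make.  This is only a sketch under stated
assumptions: every non-scalar element is taken to be a constant-index
lane of one source vector of the same type as the result, and the names
ctor_op, ctor_plan, SCALAR_LANE and classify are hypothetical and do not
exist in GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the patch's local enum { unknown, copy, permute, init }.  */
enum ctor_op { OP_UNKNOWN, OP_COPY, OP_PERMUTE, OP_INIT };

/* Hypothetical marker: this constructor element is a plain scalar
   rather than a lane of the source vector.  */
#define SCALAR_LANE (-1)

struct ctor_plan
{
  enum ctor_op op;    /* How the vector part would be built.  */
  int insert;         /* Nonzero if one scalar insert follows.  */
  size_t scalar_idx;  /* Lane of that insert when insert is set.  */
};

/* LANES[ix] is either SCALAR_LANE or the source lane read by element
   IX of the constructor (assumed to be in range).  */
struct ctor_plan
classify (const int *lanes, size_t nunits)
{
  struct ctor_plan plan = { OP_UNKNOWN, 0, 0 };
  size_t nscalars = 0, nvectors = 0;

  assert (lanes != NULL);
  for (size_t ix = 0; ix < nunits; ix++)
    {
      if (lanes[ix] == SCALAR_LANE)
        {
          /* Only one scalar element is allowed.  */
          if (nscalars != 0)
            break;
          nscalars = 1;
          plan.insert = 1;
          plan.scalar_idx = ix;
        }
      else
        {
          nvectors++;
          /* Stay a copy only while every lane is in its own place.  */
          if ((size_t) lanes[ix] == ix
              && (plan.op == OP_UNKNOWN || plan.op == OP_COPY))
            plan.op = OP_COPY;
          else
            plan.op = OP_PERMUTE;
        }
    }

  /* Every element must be accounted for, and without a scalar only a
     plain copy qualifies.  */
  if (nunits != nscalars + nvectors
      || (nscalars == 0 && plan.op != OP_COPY))
    plan.op = OP_UNKNOWN;

  if (plan.op == OP_UNKNOWN)
    {
      /* Fall back to the regular vector init constructor.  */
      plan.op = OP_INIT;
      plan.insert = 0;
    }
  return plan;
}
```

Under this model, { f, x[1], x[2], x[3] } classifies as copy plus insert
at lane 0, { f, x[0], x[2], x[3] } as permute plus insert at lane 0, and
a pure permute such as { x[0], x[2], x[3], x[1] } falls back to the
regular constructor.  The real code additionally checks tree codes,
types and element sizes, which the model leaves out.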