From patchwork Wed Oct 20 05:30:26 2021
X-Patchwork-Id: 1543728
From: dianhong xu <dianhong.xu@intel.com>
Reply-To: dianhong.xu@intel.com
To: hjl.tools@gmail.com
Cc: dianhong7@gmail.com, hongtao.liu@intel.com, gcc-patches@gcc.gnu.org,
 dianhong.xu@intel.com
Subject: [PATCH] X86: Add an option -muse-unaligned-vector-move
Date: Wed, 20 Oct 2021 13:30:26 +0800
Message-Id: <20211020053026.67998-1-dianhong.xu@intel.com>

Add the -muse-unaligned-vector-move option to emit unaligned vector move
instructions.

gcc/ChangeLog:

        * config/i386/i386-options.c (ix86_target_string): Add
        -muse-unaligned-vector-move.
        * config/i386/i386.c (ix86_get_ssemov): Emit unaligned vector
        moves if the new option is used.
        * config/i386/i386.opt (muse-unaligned-vector-move): New.
        * config/i386/sse.md: Emit unaligned vector moves if the new
        option is used.
        * doc/invoke.texi: Document -muse-unaligned-vector-move.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/avx2-vector-unaligned-load-store-1.c: New test.
        * gcc.target/i386/avx2-vector-unaligned-load-store-2.c: New test.
        * gcc.target/i386/avx2-vector-unaligned-load-store-3.c: New test.
        * gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c: New test.
---
 gcc/config/i386/i386-options.c                |   3 +-
 gcc/config/i386/i386.c                        |  41 +++----
 gcc/config/i386/i386.opt                      |   4 +
 gcc/config/i386/sse.md                        |  30 +++--
 gcc/doc/invoke.texi                           |   7 ++
 .../i386/avx2-vector-unaligned-load-store-1.c | 102 +++++++++++++++++
 .../i386/avx2-vector-unaligned-load-store-2.c | 107 ++++++++++++++++++
 .../i386/avx2-vector-unaligned-load-store-3.c |  11 ++
 .../avx512vl-vector-unaligned-load-store-1.c  |  13 +++
 9 files changed, 287 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c
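For illustration (editorial note, not part of the patch): the intended
effect on code generation.  Assuming a trivial kernel such as

    /* t.c -- hypothetical example.  */
    float a[1024], b[1024], c[1024];

    void
    mul (void)
    {
      for (int i = 0; i < 1024; i++)
        c[i] = a[i] * b[i];
    }

    $ gcc -O3 -mavx2 -S t.c
        # loads/stores of the aligned globals use vmovaps
    $ gcc -O3 -mavx2 -muse-unaligned-vector-move -S t.c
        # the same accesses use vmovups instead

The unaligned forms do not fault on misaligned addresses and, on recent
hardware, are expected to perform the same as the aligned forms when the
address happens to be aligned.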
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index c9523b26f49..eacbd0f5451 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -397,7 +397,8 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2,
     { "-mstv",                          MASK_STV },
     { "-mavx256-split-unaligned-load",  MASK_AVX256_SPLIT_UNALIGNED_LOAD },
     { "-mavx256-split-unaligned-store", MASK_AVX256_SPLIT_UNALIGNED_STORE },
-    { "-mcall-ms2sysv-xlogues",         MASK_CALL_MS2SYSV_XLOGUES }
+    { "-mcall-ms2sysv-xlogues",         MASK_CALL_MS2SYSV_XLOGUES },
+    { "-muse-unaligned-vector-move",    MASK_USE_UNALIGNED_VECTOR_MOVE }
   };
 
   /* Additional flag options.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f111411e599..7581e854021 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5323,8 +5323,9 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 		 enum attr_mode insn_mode, machine_mode mode)
 {
   char buf[128];
-  bool misaligned_p = (misaligned_operand (operands[0], mode)
-		       || misaligned_operand (operands[1], mode));
+  bool need_unaligned_p = (TARGET_USE_UNALIGNED_VECTOR_MOVE
+			   || misaligned_operand (operands[0], mode)
+			   || misaligned_operand (operands[1], mode));
   bool evex_reg_p = (size == 64
 		     || EXT_REX_SSE_REG_P (operands[0])
 		     || EXT_REX_SSE_REG_P (operands[1]));
@@ -5380,17 +5381,17 @@
 	{
 	case opcode_int:
 	  if (scalar_mode == E_HFmode)
-	    opcode = (misaligned_p
+	    opcode = (need_unaligned_p
 		      ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
 		      : "vmovdqa64");
 	  else
-	    opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	    opcode = need_unaligned_p ? "vmovdqu32" : "vmovdqa32";
 	  break;
 	case opcode_float:
-	  opcode = misaligned_p ? "vmovups" : "vmovaps";
+	  opcode = need_unaligned_p ? "vmovups" : "vmovaps";
 	  break;
 	case opcode_double:
-	  opcode = misaligned_p ? "vmovupd" : "vmovapd";
+	  opcode = need_unaligned_p ? "vmovupd" : "vmovapd";
 	  break;
 	}
     }
@@ -5399,21 +5400,21 @@
       switch (scalar_mode)
 	{
 	case E_HFmode:
-	  opcode = (misaligned_p
+	  opcode = (need_unaligned_p
 		    ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
 		    : "vmovdqa64");
 	  break;
 	case E_SFmode:
-	  opcode = misaligned_p ? "%vmovups" : "%vmovaps";
+	  opcode = need_unaligned_p ? "%vmovups" : "%vmovaps";
 	  break;
 	case E_DFmode:
-	  opcode = misaligned_p ? "%vmovupd" : "%vmovapd";
+	  opcode = need_unaligned_p ? "%vmovupd" : "%vmovapd";
 	  break;
 	case E_TFmode:
 	  if (evex_reg_p)
-	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	    opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64";
 	  else
-	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
+	    opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa";
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -5425,13 +5426,13 @@
     {
     case E_QImode:
       if (evex_reg_p)
-	opcode = (misaligned_p
+	opcode = (need_unaligned_p
 		  ? (TARGET_AVX512BW
 		     ? "vmovdqu8"
 		     : "vmovdqu64")
 		  : "vmovdqa64");
       else
-	opcode = (misaligned_p
+	opcode = (need_unaligned_p
 		  ? (TARGET_AVX512BW
 		     ? "vmovdqu8"
 		     : "%vmovdqu")
@@ -5439,13 +5440,13 @@
       break;
     case E_HImode:
       if (evex_reg_p)
-	opcode = (misaligned_p
+	opcode = (need_unaligned_p
 		  ? (TARGET_AVX512BW
 		     ? "vmovdqu16"
 		     : "vmovdqu64")
 		  : "vmovdqa64");
       else
-	opcode = (misaligned_p
+	opcode = (need_unaligned_p
 		  ? (TARGET_AVX512BW
 		     ? "vmovdqu16"
 		     : "%vmovdqu")
@@ -5453,20 +5454,20 @@
       break;
     case E_SImode:
      if (evex_reg_p)
-	opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	opcode = need_unaligned_p ? "vmovdqu32" : "vmovdqa32";
      else
-	opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
+	opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa";
      break;
    case E_DImode:
    case E_TImode:
    case E_OImode:
      if (evex_reg_p)
-	opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64";
      else
-	opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
+	opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa";
      break;
    case E_XImode:
-      opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+      opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64";
      break;
    default:
      gcc_unreachable ();
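To restate the core of the change for review (an illustrative sketch,
not code from the patch): every opcode choice above that previously
depended only on operand misalignment now also takes the option into
account, following this pattern:

    /* Illustrative sketch only -- mirrors the pattern used throughout
       ix86_get_ssemov above.  */
    static const char *
    pick_sse_float_move (bool option_enabled, bool op0_misaligned,
                         bool op1_misaligned)
    {
      bool need_unaligned_p
        = option_enabled || op0_misaligned || op1_misaligned;
      return need_unaligned_p ? "vmovups" : "vmovaps";
    }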
"vmovups" : "vmovaps"; + opcode = need_unaligned_p ? "vmovups" : "vmovaps"; break; case opcode_double: - opcode = misaligned_p ? "vmovupd" : "vmovapd"; + opcode = need_unaligned_p ? "vmovupd" : "vmovapd"; break; } } @@ -5399,21 +5400,21 @@ ix86_get_ssemov (rtx *operands, unsigned size, switch (scalar_mode) { case E_HFmode: - opcode = (misaligned_p + opcode = (need_unaligned_p ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64") : "vmovdqa64"); break; case E_SFmode: - opcode = misaligned_p ? "%vmovups" : "%vmovaps"; + opcode = need_unaligned_p ? "%vmovups" : "%vmovaps"; break; case E_DFmode: - opcode = misaligned_p ? "%vmovupd" : "%vmovapd"; + opcode = need_unaligned_p ? "%vmovupd" : "%vmovapd"; break; case E_TFmode: if (evex_reg_p) - opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64"; + opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64"; else - opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa"; + opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa"; break; default: gcc_unreachable (); @@ -5425,13 +5426,13 @@ ix86_get_ssemov (rtx *operands, unsigned size, { case E_QImode: if (evex_reg_p) - opcode = (misaligned_p + opcode = (need_unaligned_p ? (TARGET_AVX512BW ? "vmovdqu8" : "vmovdqu64") : "vmovdqa64"); else - opcode = (misaligned_p + opcode = (need_unaligned_p ? (TARGET_AVX512BW ? "vmovdqu8" : "%vmovdqu") @@ -5439,13 +5440,13 @@ ix86_get_ssemov (rtx *operands, unsigned size, break; case E_HImode: if (evex_reg_p) - opcode = (misaligned_p + opcode = (need_unaligned_p ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64") : "vmovdqa64"); else - opcode = (misaligned_p + opcode = (need_unaligned_p ? (TARGET_AVX512BW ? "vmovdqu16" : "%vmovdqu") @@ -5453,20 +5454,20 @@ ix86_get_ssemov (rtx *operands, unsigned size, break; case E_SImode: if (evex_reg_p) - opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32"; + opcode = need_unaligned_p ? "vmovdqu32" : "vmovdqa32"; else - opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa"; + opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa"; break; case E_DImode: case E_TImode: case E_OImode: if (evex_reg_p) - opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64"; + opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64"; else - opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa"; + opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa"; break; case E_XImode: - opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64"; + opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64"; break; default: gcc_unreachable (); diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index ad366974b5b..2162d10925a 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1170,3 +1170,7 @@ Support MWAIT and MONITOR built-in functions and code generation. mavx512fp16 Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX512FP16 built-in functions and code generation. + +muse-unaligned-vector-move +Target Mask(USE_UNALIGNED_VECTOR_MOVE) Save +Emit unaligned vector move instructions. 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index fbf056bf9e6..dc99597f195 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17059,24 +17059,28 @@
   switch (<MODE>mode)
     {
     case E_V8DFmode:
-      if (misaligned_operand (operands[2], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[2], <MODE>mode))
 	return "vmovupd\t{%2, %x0|%x0, %2}";
       else
 	return "vmovapd\t{%2, %x0|%x0, %2}";
     case E_V16SFmode:
-      if (misaligned_operand (operands[2], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[2], <MODE>mode))
 	return "vmovups\t{%2, %x0|%x0, %2}";
       else
 	return "vmovaps\t{%2, %x0|%x0, %2}";
     case E_V8DImode:
-      if (misaligned_operand (operands[2], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[2], <MODE>mode))
 	return which_alternative == 2 ? "vmovdqu64\t{%2, %x0|%x0, %2}"
 				      : "vmovdqu\t{%2, %x0|%x0, %2}";
       else
 	return which_alternative == 2 ? "vmovdqa64\t{%2, %x0|%x0, %2}"
 				      : "vmovdqa\t{%2, %x0|%x0, %2}";
     case E_V16SImode:
-      if (misaligned_operand (operands[2], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[2], <MODE>mode))
 	return which_alternative == 2 ? "vmovdqu32\t{%2, %x0|%x0, %2}"
 				      : "vmovdqu\t{%2, %x0|%x0, %2}";
       else
@@ -25238,27 +25242,32 @@
   switch (get_attr_mode (insn))
     {
     case MODE_V16SF:
-      if (misaligned_operand (operands[1], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[1], <MODE>mode))
 	return "vmovups\t{%1, %t0|%t0, %1}";
       else
 	return "vmovaps\t{%1, %t0|%t0, %1}";
     case MODE_V8DF:
-      if (misaligned_operand (operands[1], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[1], <MODE>mode))
 	return "vmovupd\t{%1, %t0|%t0, %1}";
       else
 	return "vmovapd\t{%1, %t0|%t0, %1}";
     case MODE_V8SF:
-      if (misaligned_operand (operands[1], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[1], <MODE>mode))
 	return "vmovups\t{%1, %x0|%x0, %1}";
       else
 	return "vmovaps\t{%1, %x0|%x0, %1}";
     case MODE_V4DF:
-      if (misaligned_operand (operands[1], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[1], <MODE>mode))
 	return "vmovupd\t{%1, %x0|%x0, %1}";
       else
 	return "vmovapd\t{%1, %x0|%x0, %1}";
     case MODE_XI:
-      if (misaligned_operand (operands[1], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[1], <MODE>mode))
 	{
 	  if (which_alternative == 2)
 	    return "vmovdqu\t{%1, %t0|%t0, %1}";
@@ -25277,7 +25286,8 @@
 	    return "vmovdqa32\t{%1, %t0|%t0, %1}";
 	}
     case MODE_OI:
-      if (misaligned_operand (operands[1], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[1], <MODE>mode))
 	{
 	  if (which_alternative == 2)
 	    return "vmovdqu\t{%1, %x0|%x0, %1}";
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0cc8a8edd05..13777d62437 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1418,6 +1418,7 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard-offset=@var{offset} @gol
 -mstack-protector-guard-symbol=@var{symbol} @gol
 -mgeneral-regs-only -mcall-ms2sysv-xlogues @gol
+-muse-unaligned-vector-move @gol
 -mindirect-branch=@var{choice} -mfunction-return=@var{choice} @gol
 -mindirect-branch-register -mneeded}
 
@@ -31808,6 +31809,12 @@ resulting in fairly lengthy prologues and epilogues.  Using
 use stubs in the static portion of libgcc to perform these saves and restores,
 thus reducing function size at the cost of a few extra instructions.
 
+@item -muse-unaligned-vector-move
+@opindex muse-unaligned-vector-move
+@opindex mno-use-unaligned-vector-move
+Emit unaligned vector move instructions such as @code{vmovdqu},
+@code{vmovups} and @code{vmovupd} even for aligned operands.
+
 @item -mtls-dialect=@var{type}
 @opindex mtls-dialect
 Generate code to access thread-local storage using the @samp{gnu} or
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-1.c b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-1.c
new file mode 100644
index 00000000000..d21eee562ac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-1.c
@@ -0,0 +1,102 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -muse-unaligned-vector-move" } */
+
+#define N 1024
+
+char **cp;
+char **ep;
+char **fp;
+
+void
+test_char ()
+{
+  int i;
+  char **ap = __builtin_assume_aligned (ep, 32);
+  char **zp;
+  for (i = 128; i > 0; i--)
+    {
+      *ap++ = *cp++;
+      *zp++ = *fp++;
+    }
+}
+
+float f1[N], f2[N], f3[N];
+
+void
+test_float (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      f3[i] = f1[i] * f2[i];
+    }
+}
+
+double d1[N], d2[N], d3[N];
+
+void
+test_double_load (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      d3[i] = d1[i] * d2[i];
+
+    }
+}
+
+unsigned char uc1[N], uc2[N], uc3[N];
+void
+test_unchar ()
+{
+  for (int i = 0; i < N; i++)
+    uc3[i] = uc1[i] + uc2[i];
+}
+
+/* { dg-final { scan-assembler-not "vmovaps" } } */
+/* { dg-final { scan-assembler-not "vmovapd" } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
+/* { dg-final { scan-assembler "vmovups" } } */
+/* { dg-final { scan-assembler "vmovupd" } } */
+/* { dg-final { scan-assembler "vmovdqu" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-2.c b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-2.c
new file mode 100644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-2.c
@@ -0,0 +1,107 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -muse-unaligned-vector-move" } */
+
+#include <immintrin.h>
+
+__m128 value128;
+char src128[16];
+
+__m256 value256;
+float src256[8];
+
+void add128(__m128* pointer) {
+  value128 = _mm_add_ps(value128, *pointer);
+}
+
+void add256(__m256* pointer) {
+  value256 = _mm256_add_ps(value256, *pointer);
+}
+
+__m128d value128d;
+__m128d aux128d;
+float src128f[4];
+float res128f[4];
+double src128d[2];
+double res128d[2];
+
+void add128d(__m128d* pointer, __m128d aux, __m128d* res128d) {
+  value128d = _mm_add_pd(value128d, *pointer);
+  __m128d s1 = _mm_add_pd(aux, *pointer);
+  *res128d = _mm_add_pd(s1, value128d);
+}
+
+__m256d value256d;
+__m256d aux256d;
+float src256f[8];
+float res256f[8];
+double src256d[4];
+double res256d[4];
+
+void add256d(__m256d* pointer, __m256d aux, __m256d* res) {
+  value256d = _mm256_add_pd(value256d, *pointer);
+  __m256d s1 = _mm256_add_pd(aux, *pointer);
+  *res = _mm256_add_pd(s1, value256d);
+}
+
+__m256i value256i;
+__m256i aux256i;
+char src256c[32];
+char res256c[32];
+short src256s[16];
+short res256s[16];
+int src256i[8];
+int res256i[8];
+long long src256l[4];
+long long res256l[4];
+
+void add256i(__m256i* pointer, __m256i aux, __m256i* res) {
+  value256i = _mm256_add_epi32(value256i, *pointer);
+  __m256i s1 = _mm256_add_epi32(aux, *pointer);
+  *res = _mm256_add_epi32(s1, value256i);
+}
+
+void foo1() {
+  add128((__m128*)src128);
+}
+
+void foo2() {
+  add256((__m256*)src256);
+}
+
+void foo3() {
+  add128d((__m128d*)src128d, aux128d, (__m128d*)res128d);
+}
+
+void foo4() {
+  add128d((__m128d*)src128f, aux128d, (__m128d*)res128f);
+}
+
+void foo5() {
+  add256d((__m256d*)src256f, aux256d, (__m256d*)res256f);
+}
+
+void foo6() {
+  add256d((__m256d*)src256d, aux256d, (__m256d*)res256d);
+}
+
+void foo7() {
+  add256i((__m256i*)src256c, aux256i, (__m256i*)res256c);
+}
+
+void foo8() {
+  add256i((__m256i*)src256s, aux256i, (__m256i*)res256s);
+}
+
+void foo9() {
+  add256i((__m256i*)src256i, aux256i, (__m256i*)res256i);
+}
+
+void foo11() {
+  add256i((__m256i*)src256l, aux256i, (__m256i*)res256l);
+}
+
+/* { dg-final { scan-assembler-not "vmovaps" } } */
+/* { dg-final { scan-assembler-not "vmovapd" } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
+/* { dg-final { scan-assembler "vmovups" } } */
+/* { dg-final { scan-assembler "vmovupd" } } */
+/* { dg-final { scan-assembler "vmovdqu" } } */
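For illustration (editorial note, not part of the patch): in miniature,
what the test above expects.  With a translation unit like the following
compiled at -O3 -mavx2 -muse-unaligned-vector-move, the load and store
of the 32-byte-aligned global should use vmovups rather than vmovaps:

    /* Hypothetical miniature of the -2.c test above.  */
    #include <immintrin.h>

    __m256 acc;  /* 32-byte aligned, so vmovaps would normally be used.  */

    void
    step (__m256 *p)
    {
      /* Expected with the new option: vmovups for both the load and
         the store of acc, never vmovaps.  */
      acc = _mm256_add_ps (acc, *p);
    }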
scan-assembler "vmovupd" } } */ +/* { dg-final { scan-assembler "vmovdqu" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-3.c b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-3.c new file mode 100644 index 00000000000..59924304bae --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-3.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx2 -mno-use-unaligned-vector-move" } */ + +#include "avx2-vector-unaligned-load-store-2.c" + +/* { dg-final { scan-assembler-not "vmovups" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "vmovupd" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "vmovdqu" } } */ +/* { dg-final { scan-assembler "vmovaps" } } */ +/* { dg-final { scan-assembler "vmovapd" } } */ +/* { dg-final { scan-assembler "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c new file mode 100644 index 00000000000..3759fd9f2f4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512vl -muse-unaligned-vector-move" } */ + +#include "avx2-vector-unaligned-load-store-1.c" + +/* { dg-final { scan-assembler-not "vmovdqa32" } } */ +/* { dg-final { scan-assembler-not "vmovdqa64" } } */ +/* { dg-final { scan-assembler-not "vmovaps" } } */ +/* { dg-final { scan-assembler-not "vmovapd" } } */ +/* { dg-final { scan-assembler "vmovdqu32" } } */ +/* { dg-final { scan-assembler "vmovdqu64" } } */ +/* { dg-final { scan-assembler "vmovups" } } */ +/* { dg-final { scan-assembler "vmovupd" } } */