From patchwork Wed Oct 20 05:30:26 2021
X-Patchwork-Id: 1543728
From: dianhong xu <dianhong.xu@intel.com>
Reply-To: dianhong.xu@intel.com
To: hjl.tools@gmail.com
Cc: dianhong7@gmail.com, hongtao.liu@intel.com, gcc-patches@gcc.gnu.org,
 dianhong.xu@intel.com
Subject: [PATCH] X86: Add an option -muse-unaligned-vector-move
Date: Wed, 20 Oct 2021 13:30:26 +0800
Message-Id: <20211020053026.67998-1-dianhong.xu@intel.com>

Add the -muse-unaligned-vector-move option to emit unaligned vector move
instructions.

gcc/ChangeLog:

        * config/i386/i386-options.c (ix86_target_string): Add
        -muse-unaligned-vector-move.
        * config/i386/i386.c (ix86_get_ssemov): Emit unaligned vector
        moves if the new option is used.
        * config/i386/i386.opt (muse-unaligned-vector-move): New.
        * config/i386/sse.md: Emit unaligned vector moves if the new
        option is used.
        * doc/invoke.texi: Document -muse-unaligned-vector-move.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/avx2-vector-unaligned-load-store-1.c: New test.
        * gcc.target/i386/avx2-vector-unaligned-load-store-2.c: New test.
        * gcc.target/i386/avx2-vector-unaligned-load-store-3.c: New test.
        * gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c: New test.
---
 gcc/config/i386/i386-options.c                |   3 +-
 gcc/config/i386/i386.c                        |  41 +++----
 gcc/config/i386/i386.opt                      |   4 +
 gcc/config/i386/sse.md                        |  30 +++--
 gcc/doc/invoke.texi                           |   7 ++
 .../i386/avx2-vector-unaligned-load-store-1.c | 102 +++++++++++++++++
 .../i386/avx2-vector-unaligned-load-store-2.c | 107 ++++++++++++++++++
 .../i386/avx2-vector-unaligned-load-store-3.c |  11 ++
 .../avx512vl-vector-unaligned-load-store-1.c  |  13 +++
 9 files changed, 287 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c
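For illustration (editorial note, not part of the patch): the intended
effect on code generation.  Assuming a trivial kernel such as

    /* t.c -- hypothetical example.  */
    float a[1024], b[1024], c[1024];

    void
    mul (void)
    {
      for (int i = 0; i < 1024; i++)
        c[i] = a[i] * b[i];
    }

    $ gcc -O3 -mavx2 -S t.c
        # loads/stores of the aligned globals use vmovaps
    $ gcc -O3 -mavx2 -muse-unaligned-vector-move -S t.c
        # the same accesses use vmovups instead

The unaligned forms do not fault on misaligned addresses and, on recent
hardware, are expected to perform the same as the aligned forms when the
address happens to be aligned.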
diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index c9523b26f49..eacbd0f5451 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -397,7 +397,8 @@ ix86_target_string (HOST_WIDE_INT isa, HOST_WIDE_INT isa2,
     { "-mstv",                          MASK_STV },
     { "-mavx256-split-unaligned-load",  MASK_AVX256_SPLIT_UNALIGNED_LOAD },
     { "-mavx256-split-unaligned-store", MASK_AVX256_SPLIT_UNALIGNED_STORE },
-    { "-mcall-ms2sysv-xlogues",         MASK_CALL_MS2SYSV_XLOGUES }
+    { "-mcall-ms2sysv-xlogues",         MASK_CALL_MS2SYSV_XLOGUES },
+    { "-muse-unaligned-vector-move",    MASK_USE_UNALIGNED_VECTOR_MOVE }
   };
 
   /* Additional flag options.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f111411e599..7581e854021 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5323,8 +5323,9 @@ ix86_get_ssemov (rtx *operands, unsigned size,
 		 enum attr_mode insn_mode, machine_mode mode)
 {
   char buf[128];
-  bool misaligned_p = (misaligned_operand (operands[0], mode)
-		       || misaligned_operand (operands[1], mode));
+  bool need_unaligned_p = (TARGET_USE_UNALIGNED_VECTOR_MOVE
+			   || misaligned_operand (operands[0], mode)
+			   || misaligned_operand (operands[1], mode));
   bool evex_reg_p = (size == 64
 		     || EXT_REX_SSE_REG_P (operands[0])
 		     || EXT_REX_SSE_REG_P (operands[1]));
@@ -5380,17 +5381,17 @@
 	{
 	case opcode_int:
 	  if (scalar_mode == E_HFmode)
-	    opcode = (misaligned_p
+	    opcode = (need_unaligned_p
 		      ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
 		      : "vmovdqa64");
 	  else
-	    opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	    opcode = need_unaligned_p ? "vmovdqu32" : "vmovdqa32";
 	  break;
 	case opcode_float:
-	  opcode = misaligned_p ? "vmovups" : "vmovaps";
+	  opcode = need_unaligned_p ? "vmovups" : "vmovaps";
 	  break;
 	case opcode_double:
-	  opcode = misaligned_p ? "vmovupd" : "vmovapd";
+	  opcode = need_unaligned_p ? "vmovupd" : "vmovapd";
 	  break;
 	}
     }
@@ -5399,21 +5400,21 @@
       switch (scalar_mode)
 	{
 	case E_HFmode:
-	  opcode = (misaligned_p
+	  opcode = (need_unaligned_p
 		    ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64")
 		    : "vmovdqa64");
 	  break;
 	case E_SFmode:
-	  opcode = misaligned_p ? "%vmovups" : "%vmovaps";
+	  opcode = need_unaligned_p ? "%vmovups" : "%vmovaps";
 	  break;
 	case E_DFmode:
-	  opcode = misaligned_p ? "%vmovupd" : "%vmovapd";
+	  opcode = need_unaligned_p ? "%vmovupd" : "%vmovapd";
 	  break;
 	case E_TFmode:
 	  if (evex_reg_p)
-	    opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	    opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64";
 	  else
-	    opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
+	    opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa";
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -5425,13 +5426,13 @@
     {
     case E_QImode:
       if (evex_reg_p)
-	opcode = (misaligned_p
+	opcode = (need_unaligned_p
 		  ? (TARGET_AVX512BW
 		     ? "vmovdqu8"
 		     : "vmovdqu64")
 		  : "vmovdqa64");
       else
-	opcode = (misaligned_p
+	opcode = (need_unaligned_p
 		  ? (TARGET_AVX512BW
 		     ? "vmovdqu8"
 		     : "%vmovdqu")
@@ -5439,13 +5440,13 @@
       break;
     case E_HImode:
       if (evex_reg_p)
-	opcode = (misaligned_p
+	opcode = (need_unaligned_p
 		  ? (TARGET_AVX512BW
 		     ? "vmovdqu16"
 		     : "vmovdqu64")
 		  : "vmovdqa64");
       else
-	opcode = (misaligned_p
+	opcode = (need_unaligned_p
 		  ? (TARGET_AVX512BW
 		     ? "vmovdqu16"
 		     : "%vmovdqu")
@@ -5453,20 +5454,20 @@
       break;
     case E_SImode:
      if (evex_reg_p)
-	opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+	opcode = need_unaligned_p ? "vmovdqu32" : "vmovdqa32";
      else
-	opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
+	opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa";
      break;
    case E_DImode:
    case E_TImode:
    case E_OImode:
      if (evex_reg_p)
-	opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+	opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64";
      else
-	opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
+	opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa";
      break;
    case E_XImode:
-      opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+      opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64";
      break;
    default:
      gcc_unreachable ();
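To restate the core of the change for review (an illustrative sketch,
not code from the patch): every opcode choice above that previously
depended only on operand misalignment now also takes the option into
account, following this pattern:

    /* Illustrative sketch only -- mirrors the pattern used throughout
       ix86_get_ssemov above.  */
    static const char *
    pick_sse_float_move (bool option_enabled, bool op0_misaligned,
                         bool op1_misaligned)
    {
      bool need_unaligned_p
        = option_enabled || op0_misaligned || op1_misaligned;
      return need_unaligned_p ? "vmovups" : "vmovaps";
    }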
"vmovups" : "vmovaps"; + opcode = need_unaligned_p ? "vmovups" : "vmovaps"; break; case opcode_double: - opcode = misaligned_p ? "vmovupd" : "vmovapd"; + opcode = need_unaligned_p ? "vmovupd" : "vmovapd"; break; } } @@ -5399,21 +5400,21 @@ ix86_get_ssemov (rtx *operands, unsigned size, switch (scalar_mode) { case E_HFmode: - opcode = (misaligned_p + opcode = (need_unaligned_p ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64") : "vmovdqa64"); break; case E_SFmode: - opcode = misaligned_p ? "%vmovups" : "%vmovaps"; + opcode = need_unaligned_p ? "%vmovups" : "%vmovaps"; break; case E_DFmode: - opcode = misaligned_p ? "%vmovupd" : "%vmovapd"; + opcode = need_unaligned_p ? "%vmovupd" : "%vmovapd"; break; case E_TFmode: if (evex_reg_p) - opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64"; + opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64"; else - opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa"; + opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa"; break; default: gcc_unreachable (); @@ -5425,13 +5426,13 @@ ix86_get_ssemov (rtx *operands, unsigned size, { case E_QImode: if (evex_reg_p) - opcode = (misaligned_p + opcode = (need_unaligned_p ? (TARGET_AVX512BW ? "vmovdqu8" : "vmovdqu64") : "vmovdqa64"); else - opcode = (misaligned_p + opcode = (need_unaligned_p ? (TARGET_AVX512BW ? "vmovdqu8" : "%vmovdqu") @@ -5439,13 +5440,13 @@ ix86_get_ssemov (rtx *operands, unsigned size, break; case E_HImode: if (evex_reg_p) - opcode = (misaligned_p + opcode = (need_unaligned_p ? (TARGET_AVX512BW ? "vmovdqu16" : "vmovdqu64") : "vmovdqa64"); else - opcode = (misaligned_p + opcode = (need_unaligned_p ? (TARGET_AVX512BW ? "vmovdqu16" : "%vmovdqu") @@ -5453,20 +5454,20 @@ ix86_get_ssemov (rtx *operands, unsigned size, break; case E_SImode: if (evex_reg_p) - opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32"; + opcode = need_unaligned_p ? "vmovdqu32" : "vmovdqa32"; else - opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa"; + opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa"; break; case E_DImode: case E_TImode: case E_OImode: if (evex_reg_p) - opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64"; + opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64"; else - opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa"; + opcode = need_unaligned_p ? "%vmovdqu" : "%vmovdqa"; break; case E_XImode: - opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64"; + opcode = need_unaligned_p ? "vmovdqu64" : "vmovdqa64"; break; default: gcc_unreachable (); diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index ad366974b5b..2162d10925a 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1170,3 +1170,7 @@ Support MWAIT and MONITOR built-in functions and code generation. mavx512fp16 Target Mask(ISA2_AVX512FP16) Var(ix86_isa_flags2) Save Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, AVX512F and AVX512FP16 built-in functions and code generation. + +muse-unaligned-vector-move +Target Mask(USE_UNALIGNED_VECTOR_MOVE) Save +Emit unaligned vector move instructions. 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index fbf056bf9e6..dc99597f195 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17059,24 +17059,28 @@
   switch (<MODE>mode)
     {
     case E_V8DFmode:
-      if (misaligned_operand (operands[2], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[2], <MODE>mode))
 	return "vmovupd\t{%2, %x0|%x0, %2}";
       else
 	return "vmovapd\t{%2, %x0|%x0, %2}";
     case E_V16SFmode:
-      if (misaligned_operand (operands[2], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[2], <MODE>mode))
 	return "vmovups\t{%2, %x0|%x0, %2}";
       else
 	return "vmovaps\t{%2, %x0|%x0, %2}";
     case E_V8DImode:
-      if (misaligned_operand (operands[2], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[2], <MODE>mode))
 	return which_alternative == 2 ? "vmovdqu64\t{%2, %x0|%x0, %2}"
 				      : "vmovdqu\t{%2, %x0|%x0, %2}";
       else
 	return which_alternative == 2 ? "vmovdqa64\t{%2, %x0|%x0, %2}"
 				      : "vmovdqa\t{%2, %x0|%x0, %2}";
     case E_V16SImode:
-      if (misaligned_operand (operands[2], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[2], <MODE>mode))
 	return which_alternative == 2 ? "vmovdqu32\t{%2, %x0|%x0, %2}"
 				      : "vmovdqu\t{%2, %x0|%x0, %2}";
       else
@@ -25238,27 +25242,32 @@
   switch (get_attr_mode (insn))
     {
     case MODE_V16SF:
-      if (misaligned_operand (operands[1], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[1], <MODE>mode))
 	return "vmovups\t{%1, %t0|%t0, %1}";
       else
 	return "vmovaps\t{%1, %t0|%t0, %1}";
     case MODE_V8DF:
-      if (misaligned_operand (operands[1], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[1], <MODE>mode))
 	return "vmovupd\t{%1, %t0|%t0, %1}";
       else
 	return "vmovapd\t{%1, %t0|%t0, %1}";
     case MODE_V8SF:
-      if (misaligned_operand (operands[1], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[1], <MODE>mode))
 	return "vmovups\t{%1, %x0|%x0, %1}";
       else
 	return "vmovaps\t{%1, %x0|%x0, %1}";
     case MODE_V4DF:
-      if (misaligned_operand (operands[1], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[1], <MODE>mode))
 	return "vmovupd\t{%1, %x0|%x0, %1}";
       else
 	return "vmovapd\t{%1, %x0|%x0, %1}";
     case MODE_XI:
-      if (misaligned_operand (operands[1], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[1], <MODE>mode))
 	{
 	  if (which_alternative == 2)
 	    return "vmovdqu\t{%1, %t0|%t0, %1}";
@@ -25277,7 +25286,8 @@
 	    return "vmovdqa32\t{%1, %t0|%t0, %1}";
 	}
     case MODE_OI:
-      if (misaligned_operand (operands[1], <MODE>mode))
+      if (TARGET_USE_UNALIGNED_VECTOR_MOVE
+	  || misaligned_operand (operands[1], <MODE>mode))
 	{
 	  if (which_alternative == 2)
 	    return "vmovdqu\t{%1, %x0|%x0, %1}";
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0cc8a8edd05..13777d62437 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1418,6 +1418,7 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard-offset=@var{offset} @gol
 -mstack-protector-guard-symbol=@var{symbol} @gol
 -mgeneral-regs-only -mcall-ms2sysv-xlogues @gol
+-muse-unaligned-vector-move @gol
 -mindirect-branch=@var{choice} -mfunction-return=@var{choice} @gol
 -mindirect-branch-register -mneeded}
 
@@ -31808,6 +31809,12 @@ resulting in fairly lengthy prologues and epilogues.  Using
 use stubs in the static portion of libgcc to perform these saves and restores,
 thus reducing function size at the cost of a few extra instructions.
 
+@item -muse-unaligned-vector-move
+@opindex muse-unaligned-vector-move
+@opindex mno-use-unaligned-vector-move
+Emit unaligned vector move instructions such as @code{vmovdqu},
+@code{vmovups} and @code{vmovupd} even for aligned operands.
+
 @item -mtls-dialect=@var{type}
 @opindex mtls-dialect
 Generate code to access thread-local storage using the @samp{gnu} or
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-1.c b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-1.c
new file mode 100644
index 00000000000..d21eee562ac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-1.c
@@ -0,0 +1,102 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -muse-unaligned-vector-move" } */
+
+#define N 1024
+
+char **cp;
+char **ep;
+char **fp;
+
+void
+test_char ()
+{
+  int i;
+  char **ap = __builtin_assume_aligned (ep, 32);
+  char **zp;
+  for (i = 128; i > 0; i--)
+    {
+      *ap++ = *cp++;
+      *zp++ = *fp++;
+    }
+}
+
+float f1[N], f2[N], f3[N];
+
+void
+test_float (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      f3[i] = f1[i] * f2[i];
+    }
+}
+
+double d1[N], d2[N], d3[N];
+
+void
+test_double_load (void)
+{
+  for (int i = 0; i < N; i++)
+    {
+      d3[i] = d1[i] * d2[i];
+
+    }
+}
+
+unsigned char uc1[N], uc2[N], uc3[N];
+void
+test_unchar ()
+{
+  for (int i = 0; i < N; i++)
+    uc3[i] = uc1[i] + uc2[i];
+}
+
+/* { dg-final { scan-assembler-not "vmovaps" } } */
+/* { dg-final { scan-assembler-not "vmovapd" } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
+/* { dg-final { scan-assembler "vmovups" } } */
+/* { dg-final { scan-assembler "vmovupd" } } */
+/* { dg-final { scan-assembler "vmovdqu" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-2.c b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-2.c
new file mode 100644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-2.c
@@ -0,0 +1,107 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2 -muse-unaligned-vector-move" } */
+
+#include <immintrin.h>
+
+__m128 value128;
+char src128[16];
+
+__m256 value256;
+float src256[8];
+
+void add128(__m128* pointer) {
+  value128 = _mm_add_ps(value128, *pointer);
+}
+
+void add256(__m256* pointer) {
+  value256 = _mm256_add_ps(value256, *pointer);
+}
+
+__m128d value128d;
+__m128d aux128d;
+float src128f[4];
+float res128f[4];
+double src128d[2];
+double res128d[2];
+
+void add128d(__m128d* pointer, __m128d aux, __m128d* res128d) {
+  value128d = _mm_add_pd(value128d, *pointer);
+  __m128d s1 = _mm_add_pd(aux, *pointer);
+  *res128d = _mm_add_pd(s1, value128d);
+}
+
+__m256d value256d;
+__m256d aux256d;
+float src256f[8];
+float res256f[8];
+double src256d[4];
+double res256d[4];
+
+void add256d(__m256d* pointer, __m256d aux, __m256d* res) {
+  value256d = _mm256_add_pd(value256d, *pointer);
+  __m256d s1 = _mm256_add_pd(aux, *pointer);
+  *res = _mm256_add_pd(s1, value256d);
+}
+
+__m256i value256i;
+__m256i aux256i;
+char src256c[32];
+char res256c[32];
+short src256s[16];
+short res256s[16];
+int src256i[8];
+int res256i[8];
+long long src256l[4];
+long long res256l[4];
+
+void add256i(__m256i* pointer, __m256i aux, __m256i* res) {
+  value256i = _mm256_add_epi32(value256i, *pointer);
+  __m256i s1 = _mm256_add_epi32(aux, *pointer);
+  *res = _mm256_add_epi32(s1, value256i);
+}
+
+void foo1() {
+  add128((__m128*)src128);
+}
+
+void foo2() {
+  add256((__m256*)src256);
+}
+
+void foo3() {
+  add128d((__m128d*)src128d, aux128d, (__m128d*)res128d);
+}
+
+void foo4() {
+  add128d((__m128d*)src128f, aux128d, (__m128d*)res128f);
+}
+
+void foo5() {
+  add256d((__m256d*)src256f, aux256d, (__m256d*)res256f);
+}
+
+void foo6() {
+  add256d((__m256d*)src256d, aux256d, (__m256d*)res256d);
+}
+
+void foo7() {
+  add256i((__m256i*)src256c, aux256i, (__m256i*)res256c);
+}
+
+void foo8() {
+  add256i((__m256i*)src256s, aux256i, (__m256i*)res256s);
+}
+
+void foo9() {
+  add256i((__m256i*)src256i, aux256i, (__m256i*)res256i);
+}
+
+void foo11() {
+  add256i((__m256i*)src256l, aux256i, (__m256i*)res256l);
+}
+
+/* { dg-final { scan-assembler-not "vmovaps" } } */
+/* { dg-final { scan-assembler-not "vmovapd" } } */
+/* { dg-final { scan-assembler-not "vmovdqa" } } */
+/* { dg-final { scan-assembler "vmovups" } } */
+/* { dg-final { scan-assembler "vmovupd" } } */
+/* { dg-final { scan-assembler "vmovdqu" } } */
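For illustration (editorial note, not part of the patch): in miniature,
what the test above expects.  With a translation unit like the following
compiled at -O3 -mavx2 -muse-unaligned-vector-move, the load and store
of the 32-byte-aligned global should use vmovups rather than vmovaps:

    /* Hypothetical miniature of the -2.c test above.  */
    #include <immintrin.h>

    __m256 acc;  /* 32-byte aligned, so vmovaps would normally be used.  */

    void
    step (__m256 *p)
    {
      /* Expected with the new option: vmovups for both the load and
         the store of acc, never vmovaps.  */
      acc = _mm256_add_ps (acc, *p);
    }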
scan-assembler "vmovupd" } } */ +/* { dg-final { scan-assembler "vmovdqu" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-3.c b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-3.c new file mode 100644 index 00000000000..59924304bae --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx2-vector-unaligned-load-store-3.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx2 -mno-use-unaligned-vector-move" } */ + +#include "avx2-vector-unaligned-load-store-2.c" + +/* { dg-final { scan-assembler-not "vmovups" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "vmovupd" { target { ! ia32 } } } } */ +/* { dg-final { scan-assembler-not "vmovdqu" } } */ +/* { dg-final { scan-assembler "vmovaps" } } */ +/* { dg-final { scan-assembler "vmovapd" } } */ +/* { dg-final { scan-assembler "vmovdqa" } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c new file mode 100644 index 00000000000..3759fd9f2f4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vector-unaligned-load-store-1.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -mavx512vl -muse-unaligned-vector-move" } */ + +#include "avx2-vector-unaligned-load-store-1.c" + +/* { dg-final { scan-assembler-not "vmovdqa32" } } */ +/* { dg-final { scan-assembler-not "vmovdqa64" } } */ +/* { dg-final { scan-assembler-not "vmovaps" } } */ +/* { dg-final { scan-assembler-not "vmovapd" } } */ +/* { dg-final { scan-assembler "vmovdqu32" } } */ +/* { dg-final { scan-assembler "vmovdqu64" } } */ +/* { dg-final { scan-assembler "vmovups" } } */ +/* { dg-final { scan-assembler "vmovupd" } } */