[rs6000] don't use unaligned vsx for memset of less than 32 bytes

Message ID 979a1eeceb7c4c3f7b2068e9b924970760d695ff.camel@linux.ibm.com
State New
Series
  • [rs6000] don't use unaligned vsx for memset of less than 32 bytes

Commit Message

Aaron Sawdey June 25, 2018, 3:41 p.m. UTC
In gcc 8 I added support for unaligned vsx in the builtin expansion of
memset(x,0,y). Turns out that for memset of less than 32 bytes, this
doesn't really help much, and it also runs into an egregious load-hit-
store case in CPU2006 components gcc and hmmer.

This patch reverts to the previous (gcc 7) behavior for memset of 16-31
bytes, which is to use vsx stores only if the target is 16-byte
aligned. For 32 bytes or more, unaligned vsx stores will still be used.
Performance testing of the memset expansion shows that not much is
given up by using scalar stores for 16-31 bytes, and CPU2006 runs show
the performance regression is fixed.
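
As a rough illustration of the size/alignment policy described above, here is
a standalone sketch. It is not the code in rs6000-string.c: the function and
parameter names are made up for this example, the two booleans stand in for
TARGET_ALTIVEC and TARGET_EFFICIENT_UNALIGNED_VSX, and alignment is assumed
to be expressed in bits, as in the patched condition.

```c
#include <stdbool.h>

/* Hypothetical helpers, for illustration only.
   bytes:       number of bytes still to be cleared
   align_bits:  known alignment of the destination, in bits (128 = 16 bytes)
   altivec:     stand-in for TARGET_ALTIVEC
   eu_vsx:      stand-in for TARGET_EFFICIENT_UNALIGNED_VSX  */

/* gcc 8 condition before the patch: any clear of 16 bytes or more may use
   a 16-byte vector store if unaligned vsx is considered efficient.  */
static bool
use_vector_store_before (unsigned bytes, unsigned align_bits,
                         bool altivec, bool eu_vsx)
{
  return bytes >= 16 && altivec && (align_bits >= 128 || eu_vsx);
}

/* Condition after the patch: 16-31 bytes use a vector store only when the
   destination is known to be 16-byte aligned; unaligned vsx stores are
   considered only for 32 bytes or more.  */
static bool
use_vector_store_after (unsigned bytes, unsigned align_bits,
                        bool altivec, bool eu_vsx)
{
  return altivec
         && ((bytes >= 16 && align_bits >= 128)
             || (bytes >= 32 && eu_vsx));
}
```

For example, with both target flags set, a 24-byte clear of a destination with
only byte alignment (align_bits == 8) satisfies the first predicate but not
the second, so it falls through to the scalar-store paths. A 32-byte clear
with the same alignment satisfies both.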

Regstrap passes on powerpc64le, ok for trunk and backport to 8?

Thanks,
   Aaron

2018-06-25  Aaron Sawdey  <acsawdey@linux.ibm.com>

	* config/rs6000/rs6000-string.c (expand_block_clear): Don't use
	unaligned vsx for 16B memset.

Comments

Segher Boessenkool June 26, 2018, 4:01 p.m. UTC | #1
Hi!

On Mon, Jun 25, 2018 at 10:41:32AM -0500, Aaron Sawdey wrote:
> In gcc 8 I added support for unaligned vsx in the builtin expansion of
> memset(x,0,y). Turns out that for memset of less than 32 bytes, this
> doesn't really help much, and it also runs into an egregious load-hit-
> store case in CPU2006 components gcc and hmmer.
> 
> This patch reverts to the previous (gcc 7) behavior for memset of 16-31
> bytes, which is to use vsx stores only if the target is 16-byte
> aligned. For 32 bytes or more, unaligned vsx stores will still be used.
> Performance testing of the memset expansion shows that not much is
> given up by using scalar stores for 16-31 bytes, and CPU2006 runs show
> the performance regression is fixed.
> 
> Regstrap passes on powerpc64le, ok for trunk and backport to 8?

Yes, okay for both.  Thanks!


Segher


> 2018-06-25  Aaron Sawdey  <acsawdey@linux.ibm.com>
> 
> 	* config/rs6000/rs6000-string.c (expand_block_clear): Don't use
> 	unaligned vsx for 16B memset.

Patch

Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c	(revision 261808)
+++ gcc/config/rs6000/rs6000-string.c	(working copy)
@@ -90,7 +90,9 @@ 
       machine_mode mode = BLKmode;
       rtx dest;
 
-      if (bytes >= 16 && TARGET_ALTIVEC && (align >= 128 || TARGET_EFFICIENT_UNALIGNED_VSX))
+      if (TARGET_ALTIVEC
+	  && ((bytes >= 16 && align >= 128)
+	      || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
 	{
 	  clear_bytes = 16;
 	  mode = V4SImode;