Message ID | 979a1eeceb7c4c3f7b2068e9b924970760d695ff.camel@linux.ibm.com |
---|---|
State | New |
Series | [rs6000] don't use unaligned vsx for memset of less than 32 bytes |
Hi!

On Mon, Jun 25, 2018 at 10:41:32AM -0500, Aaron Sawdey wrote:
> In gcc 8 I added support for unaligned vsx in the builtin expansion of
> memset(x,0,y). Turns out that for memset of less than 32 bytes, this
> doesn't really help much, and it also runs into an egregious load-hit-
> store case in CPU2006 components gcc and hmmer.
>
> This patch reverts to the previous (gcc 7) behavior for memset of 16-31
> bytes, which is to use vsx stores only if the target is 16 byte
> aligned. For 32 bytes or more, unaligned vsx stores will still be used.
> Performance testing of the memset expansion shows that not much is
> given up by using scalar stores for 16-31 bytes, and CPU2006 runs show
> the performance regression is fixed.
>
> Regstrap passes on powerpc64le, ok for trunk and backport to 8?

Yes, okay for both. Thanks!

Segher

> 2018-06-25 Aaron Sawdey <acsawdey@linux.ibm.com>
>
> * config/rs6000/rs6000-string.c (expand_block_clear): Don't use
> unaligned vsx for 16B memset.
Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c (revision 261808)
+++ gcc/config/rs6000/rs6000-string.c (working copy)
@@ -90,7 +90,9 @@
   machine_mode mode = BLKmode;
   rtx dest;
 
-  if (bytes >= 16 && TARGET_ALTIVEC && (align >= 128 || TARGET_EFFICIENT_UNALIGNED_VSX))
+  if (TARGET_ALTIVEC
+      && ((bytes >= 16 && align >= 128)
+          || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
     {
       clear_bytes = 16;
       mode = V4SImode;
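
For readers who want to check the size and alignment cases by hand, the old and new conditions can be restated as standalone predicates. This is only a sketch outside of GCC: `struct target_flags` and the two function names are hypothetical, and the two booleans merely stand in for the `TARGET_ALTIVEC` and `TARGET_EFFICIENT_UNALIGNED_VSX` macros. As in the patch, `align` is taken to be in bits, so 128 means 16-byte alignment.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the target macros tested in the patch.
   In GCC these are macros derived from the selected CPU and options,
   not fields of a struct.  */
struct target_flags {
  bool altivec;                 /* models TARGET_ALTIVEC */
  bool efficient_unaligned_vsx; /* models TARGET_EFFICIENT_UNALIGNED_VSX */
};

/* Condition before the patch: any clear of 16 bytes or more uses
   16-byte vector stores if the destination is 16-byte aligned or the
   target handles unaligned VSX efficiently.  ALIGN is in bits.  */
static bool
use_vector_clear_old (struct target_flags t, unsigned bytes, unsigned align)
{
  return bytes >= 16 && t.altivec
         && (align >= 128 || t.efficient_unaligned_vsx);
}

/* Condition after the patch: 16-31 byte clears need 16-byte alignment;
   unaligned VSX stores are only considered from 32 bytes upward.  */
static bool
use_vector_clear_new (struct target_flags t, unsigned bytes, unsigned align)
{
  return t.altivec
         && ((bytes >= 16 && align >= 128)
             || (bytes >= 32 && t.efficient_unaligned_vsx));
}
```

With both flags set and an 8-byte-aligned destination (`align == 64`), the old predicate is true for a 16-byte clear and the new one is false. The two agree again from 32 bytes upward, and whenever the destination is 16-byte aligned.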