Message ID | 0e5a2fa3-47df-47d4-89cb-5c421a1e366b@linux.ibm.com |
---|---|
State | New |
Series | [rs6000] better use of unaligned vsx in memset() expansion |
On Mon, Nov 26, 2018 at 03:08:32PM -0600, Aaron Sawdey wrote:
> When I previously added the use of unaligned vsx stores to inline expansion
> of memset, I didn't do a good job of managing boundary conditions. The intention
> was to only use unaligned vsx if the block being cleared was more than 32 bytes.
> What it actually did was to prevent the use of unaligned vsx for the last 32
> bytes of any block being cleared. So this change puts the test up front so it
> is not affected by the decrement of bytes.

Oh wow. Yes, that isn't so great. Okay for trunk (and whatever backports).
Thanks,

Segher

> 2018-11-26 Aaron Sawdey <acsawdey@linux.ibm.com>
>
> * config/rs6000/rs6000-string.c (expand_block_clear): Change how
> we determine if unaligned vsx is ok.
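To make the boundary condition concrete, here is a minimal standalone sketch of the loop, not the GCC code itself. It assumes `TARGET_ALTIVEC` and `TARGET_EFFICIENT_UNALIGNED_VSX` are both true and that `align` is below 128, so only the unaligned path matters. The names `plan_clear`, `scalar_step`, and `struct clear_plan` are hypothetical, and the scalar fallback is reduced to a plain 8/4/2/1 chain. The `test_up_front` flag switches between testing the remaining byte count on every iteration (the behaviour being replaced) and testing the original count once, with a per-iteration check that a full 16 bytes remain.

```c
#include <assert.h>
#include <stdbool.h>

/* Summary of one simulated expansion.  */
struct clear_plan {
  int vector_stores;   /* number of 16-byte stores */
  int scalar_stores;   /* number of 8/4/2/1-byte stores */
  long bytes_cleared;  /* total bytes written */
};

/* Simplified stand-in for the scalar fallback chain (an assumption):
   pick the largest of 8, 4, 2, 1 that fits in the remaining bytes.  */
static long scalar_step(long bytes)
{
  if (bytes >= 8) return 8;
  if (bytes >= 4) return 4;
  if (bytes >= 2) return 2;
  return 1;
}

/* Model the clearing loop for an unaligned block of BYTES bytes.
   test_up_front == false: check "bytes >= 32" against the remaining
   count on every iteration.
   test_up_front == true: check it once against the original count,
   then only require that a full 16 bytes remain.  */
static struct clear_plan plan_clear(long bytes, bool test_up_front)
{
  assert(bytes >= 0);
  struct clear_plan plan = {0, 0, 0};
  bool unaligned_vsx_ok = bytes >= 32;  /* evaluated before the loop */
  long clear_bytes;

  for (; bytes > 0; bytes -= clear_bytes) {
    bool use_vector = test_up_front
                        ? (bytes >= 16 && unaligned_vsx_ok)
                        : (bytes >= 32);
    if (use_vector) {
      clear_bytes = 16;
      plan.vector_stores++;
    } else {
      clear_bytes = scalar_step(bytes);
      plan.scalar_stores++;
    }
    plan.bytes_cleared += clear_bytes;
  }
  return plan;
}
```

In this model a 64-byte block takes three vector stores plus two 8-byte stores when the test runs inside the loop, because the final 16 bytes no longer satisfy `bytes >= 32`. With the test up front it takes four vector stores. A 16-byte block uses scalar stores either way, which matches the stated intent of reserving unaligned vsx for larger blocks.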
The first version of this had a big bug and cleared past the requested bytes.
This version passes regstrap on ppc64le(power7/8/9), ppc64be(power6/7/8),
and ppc32(power8).

OK for trunk (and 8 backport after a week)?

Thanks!
   Aaron

Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c (revision 266524)
+++ gcc/config/rs6000/rs6000-string.c (working copy)
@@ -85,6 +85,8 @@
   if (! optimize_size && bytes > 8 * clear_step)
     return 0;
 
+  bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
+
   for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
     {
       machine_mode mode = BLKmode;
@@ -91,8 +93,7 @@
       rtx dest;
 
       if (TARGET_ALTIVEC
-          && ((bytes >= 16 && align >= 128)
-              || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
+          && (bytes >= 16 && ( align >= 128 || unaligned_vsx_ok)))
         {
           clear_bytes = 16;
           mode = V4SImode;

On 11/26/18 4:29 PM, Segher Boessenkool wrote:
> On Mon, Nov 26, 2018 at 03:08:32PM -0600, Aaron Sawdey wrote:
>> When I previously added the use of unaligned vsx stores to inline expansion
>> of memset, I didn't do a good job of managing boundary conditions. The intention
>> was to only use unaligned vsx if the block being cleared was more than 32 bytes.
>> What it actually did was to prevent the use of unaligned vsx for the last 32
>> bytes of any block being cleared. So this change puts the test up front so it
>> is not affected by the decrement of bytes.
>
> Oh wow. Yes, that isn't so great. Okay for trunk (and whatever backports).
> Thanks,
>
> Segher
>
>> 2018-11-26 Aaron Sawdey <acsawdey@linux.ibm.com>
>>
>> * config/rs6000/rs6000-string.c (expand_block_clear): Change how
>> we determine if unaligned vsx is ok.
On Wed, Nov 28, 2018 at 01:24:01PM -0600, Aaron Sawdey wrote:
> The first version of this had a big bug and cleared past the requested bytes.
> This version passes regstrap on ppc64le(power7/8/9), ppc64be(power6/7/8),
> and ppc32(power8).
>
> OK for trunk (and 8 backport after a week)?

> @@ -91,8 +93,7 @@
>        rtx dest;
>
>        if (TARGET_ALTIVEC
> -          && ((bytes >= 16 && align >= 128)
> -              || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
> +          && (bytes >= 16 && ( align >= 128 || unaligned_vsx_ok)))

Please remove the stray space?

Okay for trunk and later for 8, thanks!

Segher
Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c (revision 266219)
+++ gcc/config/rs6000/rs6000-string.c (working copy)
@@ -85,14 +85,14 @@
   if (! optimize_size && bytes > 8 * clear_step)
     return 0;
 
+  bool unaligned_vsx_ok = (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX);
+
   for (offset = 0; bytes > 0; offset += clear_bytes, bytes -= clear_bytes)
     {
       machine_mode mode = BLKmode;
       rtx dest;
 
-      if (TARGET_ALTIVEC
-          && ((bytes >= 16 && align >= 128)
-              || (bytes >= 32 && TARGET_EFFICIENT_UNALIGNED_VSX)))
+      if (TARGET_ALTIVEC && ((bytes >= 16 && align >= 128) || unaligned_vsx_ok))
         {
           clear_bytes = 16;
           mode = V4SImode;
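The condition in the diff above (revision 266219) selects the 16-byte store whenever `unaligned_vsx_ok` is set, with no check on how many bytes remain. That is consistent with the follow-up message's report that the first version cleared past the requested bytes. The sketch below is a hypothetical standalone model, not the GCC code. It assumes `TARGET_ALTIVEC` and `TARGET_EFFICIENT_UNALIGNED_VSX` are true and `align` is below 128, so the first-version condition reduces to `unaligned_vsx_ok` alone. The function name `bytes_written` and the simplified 8/4/2/1 scalar fallback are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the total number of bytes the modelled loop would write for an
   unaligned block of BYTES bytes.
   guard_remaining == false: vector store whenever unaligned_vsx_ok is
   set (the first-version condition with align < 128).
   guard_remaining == true: additionally require bytes >= 16 on each
   iteration (the revised condition).  */
static long bytes_written(long bytes, bool guard_remaining)
{
  assert(bytes >= 0);
  bool unaligned_vsx_ok = bytes >= 32;  /* evaluated before the loop */
  long written = 0;
  long clear_bytes;

  /* bytes is signed, so the loop still terminates after an overrun.  */
  for (; bytes > 0; bytes -= clear_bytes) {
    bool use_vector = guard_remaining
                        ? (bytes >= 16 && unaligned_vsx_ok)
                        : unaligned_vsx_ok;
    if (use_vector)
      clear_bytes = 16;          /* may exceed the remaining bytes */
    else if (bytes >= 8)
      clear_bytes = 8;
    else if (bytes >= 4)
      clear_bytes = 4;
    else if (bytes >= 2)
      clear_bytes = 2;
    else
      clear_bytes = 1;
    written += clear_bytes;
  }
  return written;
}
```

For a 40-byte block the unguarded condition issues three 16-byte stores and writes 48 bytes, 8 past the end. With the per-iteration `bytes >= 16` guard the tail falls back to an 8-byte store and exactly 40 bytes are written. Block sizes that are a multiple of 16 do not expose the problem in this model.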