Message ID | YqNi7DDZC+9Eqzsm@toto.the-meissners.org |
---|---|
State | New |
Headers | show |
Series | [V2] Disable generating load/store vector pairs for block copies. | expand |
Hi! On Fri, Jun 10, 2022 at 11:27:40AM -0400, Michael Meissner wrote: > Testing has found that using store vector pair for block copies can result > in a slow down on power10. This patch disables using the vector pair > instructions for block copies if we are tuning for power10. Load paired should be disabled as well, for the same reason. The patch seems to do that fine? Please fix the commit message. Thanks, Segher > 2022-06-09 Michael Meissner <meissner@linux.ibm.com> > > gcc/ > * config/rs6000/rs6000.cc (rs6000_option_override_internal): Do > not generate block copies with vector pair instructions if we are > tuning for power10. > --- > gcc/config/rs6000/rs6000.cc | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index 0af2085adc0..59481d9ac70 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -4141,7 +4141,10 @@ rs6000_option_override_internal (bool global_init_p) > > if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)) > { > - if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX) > + /* Do not generate lxvp and stxvp on power10 since there are some > + performance issues. */ > + if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX > + && rs6000_tune != PROCESSOR_POWER10) > rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; > else > rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
I have applied the patch to GCC 12. | From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001 | From: Michael Meissner <meissner@linux.ibm.com> | Date: Thu, 14 Jul 2022 11:16:08 -0400 | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for block copies. Testing has found that using load and store vector pair for block copies can result in a slow down on power10. This patch disables using the vector pair instructions for block copies if we are tuning for power10. 2022-06-11 Michael Meissner <meissner@linux.ibm.com> gcc/ * config/rs6000/rs6000.cc (rs6000_option_override_internal): Do not generate block copies with vector pair instructions if we are tuning for power10. Back port from master branch. --- gcc/config/rs6000/rs6000.cc | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 4030864aa1a..040487bd277 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p) if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)) { - if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX) + /* Do not generate lxvp and stxvp on power10 since there are some + performance issues. */ + if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX + && rs6000_tune != PROCESSOR_POWER10) rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; else rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
Back port patch (changing .cc to .c) from trunk to GCC 11 committed. | From 3118d0856b030fe491a170354fed2df570df199f Mon Sep 17 00:00:00 2001 | From: Michael Meissner <meissner@linux.ibm.com> | Date: Thu, 14 Jul 2022 14:03:37 -0400 | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for block copies. Testing has found that using load and store vector pair for block copies can result in a slow down on power10. This patch disables using the vector pair instructions for block copies if we are tuning for power10. 2022-06-11 Michael Meissner <meissner@linux.ibm.com> gcc/ * config/rs6000/rs6000.c (rs6000_option_override_internal): Do not generate block copies with vector pair instructions if we are tuning for power10. Back port from master branch. --- gcc/config/rs6000/rs6000.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 8c89b45922f..73f90f134f8 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4149,7 +4149,10 @@ rs6000_option_override_internal (bool global_init_p) if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)) { - if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX) + /* Do not generate lxvp and stxvp on power10 since there are some + performance issues. */ + if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX + && rs6000_tune != PROCESSOR_POWER10) rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; else rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;
On Thu, Jul 14, 2022 at 11:20:56AM -0400, Michael Meissner wrote: > I have applied the patch to GCC 12. > > | From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001 > | From: Michael Meissner <meissner@linux.ibm.com> > | Date: Thu, 14 Jul 2022 11:16:08 -0400 > | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for block copies. > > Testing has found that using load and store vector pair for block copies > can result in a slow down on power10. This patch disables using the > vector pair instructions for block copies if we are tuning for power10. > > 2022-06-11 Michael Meissner <meissner@linux.ibm.com> > > gcc/ > > * config/rs6000/rs6000.cc (rs6000_option_override_internal): Do > not generate block copies with vector pair instructions if we are > tuning for power10. Back port from master branch. You never posted the trunk version of this, so that never was approved either. > +++ b/gcc/config/rs6000/rs6000.cc > @@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p) > > if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)) > { > - if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX) > + /* Do not generate lxvp and stxvp on power10 since there are some > + performance issues. */ > + if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX > + && rs6000_tune != PROCESSOR_POWER10) > rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; > else > rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; The TARGET_MMA in that should not be there. Please fix that (that probably needs more changes). This statement does the opposite of what the comment says. Please fix this. On trunk, first. Segher
On Thu, Jul 14, 2022 at 04:12:14PM -0500, Segher Boessenkool wrote: > On Thu, Jul 14, 2022 at 11:20:56AM -0400, Michael Meissner wrote: > > I have applied the patch to GCC 12. > > > > | From 22736f3d0d4fb8ce4afb3230023f8accdb03a623 Mon Sep 17 00:00:00 2001 > > | From: Michael Meissner <meissner@linux.ibm.com> > > | Date: Thu, 14 Jul 2022 11:16:08 -0400 > > | Subject: [PATCH] [BACKPORT] Disable generating load/store vector pairs for block copies. > > > > Testing has found that using load and store vector pair for block copies > > can result in a slow down on power10. This patch disables using the > > vector pair instructions for block copies if we are tuning for power10. > > > > 2022-06-11 Michael Meissner <meissner@linux.ibm.com> > > > > gcc/ > > > > * config/rs6000/rs6000.cc (rs6000_option_override_internal): Do > > not generate block copies with vector pair instructions if we are > > tuning for power10. Back port from master branch. > > You never posted the trunk version of this, so that never was approved > either. I did post the trunk version on June 10th, and your only comment was fix the commit message, which I thought I did in the commit. > > +++ b/gcc/config/rs6000/rs6000.cc > > @@ -4151,7 +4151,10 @@ rs6000_option_override_internal (bool global_init_p) > > > > if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)) > > { > > - if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX) > > + /* Do not generate lxvp and stxvp on power10 since there are some > > + performance issues. */ > > + if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX > > + && rs6000_tune != PROCESSOR_POWER10) > > rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; > > else > > rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; > > The TARGET_MMA in that should not be there. Please fix that (that > probably needs more changes). All of the movoo and movxo support require TARGET_MMA as does the code in rs6000-string.cc that could possibly generate load/store vector pair. To remove the check here would mean also fixing all of the vector load and store pairs in mma.md. > This statement does the opposite of what the comment says. > > Please fix this. On trunk, first.
On Thu, Jul 14, 2022 at 05:49:57PM -0400, Michael Meissner wrote: > On Thu, Jul 14, 2022 at 04:12:14PM -0500, Segher Boessenkool wrote: > > You never posted the trunk version of this, so that never was approved > > either. > > I did post the trunk version on June 10th, and your only comment was fix the > commit message, which I thought I did in the commit. I did not approve the patch. Of course not, I didn't even get as far as reading it. You should have fixed it and sent again, I did not approve anything. > > > + if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX > > > + && rs6000_tune != PROCESSOR_POWER10) > > > rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; > > > else > > > rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; > > > > The TARGET_MMA in that should not be there. Please fix that (that > > probably needs more changes). > > All of the movoo and movxo support require TARGET_MMA as does the code in > rs6000-string.cc that could possibly generate load/store vector pair. And all that is wrong and should be fixed. > To > remove the check here would mean also fixing all of the vector load and store > pairs in mma.md. That is wha I said, yes,. > > This statement does the opposite of what the comment says. > > > > Please fix this. On trunk, first. This is the core problem with this patch: it is simply wrong. It is a very roundabout way of saying "only enable vector pairs if -mcpu=power10 but -mtune=somethingelse". Which is not a sensible thing to do, and not what the comment says either. Segher
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 0af2085adc0..59481d9ac70 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -4141,7 +4141,10 @@ rs6000_option_override_internal (bool global_init_p) if (!(rs6000_isa_flags_explicit & OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)) { - if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX) + /* Do not generate lxvp and stxvp on power10 since there are some + performance issues. */ + if (TARGET_MMA && TARGET_EFFICIENT_UNALIGNED_VSX + && rs6000_tune != PROCESSOR_POWER10) rs6000_isa_flags |= OPTION_MASK_BLOCK_OPS_VECTOR_PAIR; else rs6000_isa_flags &= ~OPTION_MASK_BLOCK_OPS_VECTOR_PAIR;