Message ID | 2055734.X6m1ecN8ti@polaris |
---|---|
State | New |
Headers | show |
Series | [rs6000] 64-bit integer loads/stores and FP instructions | expand |
Hi! On Wed, Feb 06, 2019 at 11:08:44PM +0100, Eric Botcazou wrote: > as reported e.g. at https://gcc.gnu.org/ml/gcc-help/2018-11/msg00038.html, the > 7 series of compilers started to use FP instructions for simple 64-bit integer > loads/stores in unexpected ways. Consider: <snip> > The difference between GCC 7 and GCC 8 comes from this change: > https://gcc.gnu.org/ml/gcc-patches/2018-06/msg00584.html > which seems to imply that the first change was problematic but which was not > backported to the 7 branch. > > So at a minimum I think that the second change should be backported to the 7 > branch; it applies (almost) cleanly and gives no regressions on PowerPC/Linux. > > Backport from mainline > 2018-06-11 Segher Boessenkool <segher@kernel.crashing.org> > > PR target/85755 > * config/rs6000/rs6000.md (*movdi_internal32): Put constraint modifiers > on the correct operand. > (*movdi_internal64): Ditto. Backporting this is okay. (It was not done because it does not affect correctness). What is the "almost", btw? > But I also think that another part of the first change is problematic, namely > the removal of the '*' modifier on the alternatives, which means that the > register preference of the instruction isn't skewed towards integer registers > any more. (In https://gcc.gnu.org/ml/gcc-patches/2016-11/txt4j9hLRLCzf.txt and you presumably mean the *movdi_internal32 hunk, and the first line of it: - "=Y, r, r, ?m, ?*d, ?*d, + "=Y, r, r, ^m, ^d, ^d, where the last three are mem<-reg, reg<-mem, reg<-reg, for fp regs). This patch was a year before we switched to LRA, and for non-LRA ports * does not do any register preferencing. > That may be OK for native targets, but that is frowned upon for > embedded targets, where people are really unhappy when the compiler emits > floating-point instructions for pure integer code for no apparent reason. For 6xx/7xx people *wanted* to use FP loads and stores whenever possible, because you get twice the bandwidth with them. If you do not want the FP registers used (or FP insns used), use -msoft-float, that's what it's for. Patches are welcome, but complicating this already very complex code by introducing an arbitrary distinction between "embedded and other" (or actually, "server and other") isn't the way to go. Sorry. Could you open a PR please? Segher
> Backporting this is okay. (It was not done because it does not affect > correctness). What is the "almost", btw? The predicate of operand #0 of movdi_internal32 is rs6000_nonimmediate_operand on the 7 branch and nonimmediate_operand on the 8 branch and later. > (In https://gcc.gnu.org/ml/gcc-patches/2016-11/txt4j9hLRLCzf.txt and you > presumably mean the *movdi_internal32 hunk, and the first line of it: > > - "=Y, r, r, ?m, ?*d, ?*d, > + "=Y, r, r, ^m, ^d, ^d, > > where the last three are mem<-reg, reg<-mem, reg<-reg, for fp regs). > > This patch was a year before we switched to LRA, and for non-LRA ports * > does not do any register preferencing. Are you sure? See record_reg_classes in ira-costs.c: case '*': /* Ignore the next letter for this pass. */ c = *++p; break; > For 6xx/7xx people *wanted* to use FP loads and stores whenever possible, > because you get twice the bandwidth with them. > > If you do not want the FP registers used (or FP insns used), use > -msoft-float, that's what it's for. No disagreement, but you can't force people to use it when they use FP code in other parts of their software. Believe me, we (and probably others) tried... > Patches are welcome, but complicating this already very complex code by > introducing an arbitrary distinction between "embedded and other" (or > actually, "server and other") isn't the way to go. Sorry. Not clear what you mean by complicating here... the approach is rather simple and precisely aimed at not disturbing the common path, i.e. server processors, while preserving the historical path for other processors. AFAIK nobody has evaluated the effects of the original change beyond POWER 7/8/9. > Could you open a PR please? Sure, but about what now? ;-)
On Fri, Feb 08, 2019 at 11:46:37AM +0100, Eric Botcazou wrote: > > Backporting this is okay. (It was not done because it does not affect > > correctness). What is the "almost", btw? > > The predicate of operand #0 of movdi_internal32 is rs6000_nonimmediate_operand > on the 7 branch and nonimmediate_operand on the 8 branch and later. Ah k, that is fine of course. > > (In https://gcc.gnu.org/ml/gcc-patches/2016-11/txt4j9hLRLCzf.txt and you > > presumably mean the *movdi_internal32 hunk, and the first line of it: > > > > - "=Y, r, r, ?m, ?*d, ?*d, > > + "=Y, r, r, ^m, ^d, ^d, > > > > where the last three are mem<-reg, reg<-mem, reg<-reg, for fp regs). > > > > This patch was a year before we switched to LRA, and for non-LRA ports * > > does not do any register preferencing. > > Are you sure? No, I checked the docs so I wouldn't spout nonsense, but then I misread. Oops. > > For 6xx/7xx people *wanted* to use FP loads and stores whenever possible, > > because you get twice the bandwidth with them. > > > > If you do not want the FP registers used (or FP insns used), use > > -msoft-float, that's what it's for. > > No disagreement, but you can't force people to use it when they use FP code in > other parts of their software. Believe me, we (and probably others) tried... Of course. And we try to use integer registers only in normal cases, these days. > > Patches are welcome, but complicating this already very complex code by > > introducing an arbitrary distinction between "embedded and other" (or > > actually, "server and other") isn't the way to go. Sorry. > > Not clear what you mean by complicating here... the approach is rather simple > and precisely aimed at not disturbing the common path, i.e. server processors, > while preserving the historical path for other processors. AFAIK nobody has > evaluated the effects of the original change beyond POWER 7/8/9. You make an arbitrary distinction between certain CPUs and duplicate patterns based on that. > > Could you open a PR please? > > Sure, but about what now? ;-) About the bug: it should behave the same as before, use GPRs only in your testcase. Segher
> You make an arbitrary distinction between certain CPUs and duplicate > patterns based on that. Sure, somewhat arbitrary, but that's a proof of concept and IMO better than treating every processor the same way. The alternative would be to complicate further the existing patterns by means of the "enabled" attribute or somesuch, but I can try it too. > About the bug: it should behave the same as before, use GPRs only in your > testcase. It's fixed by backporting the last part of PR target/85755 on the 7 branch, but I can certainly add the testcase to the gcc.target/powerpc testsuite.
On Mon, Feb 11, 2019 at 10:07:44AM +0100, Eric Botcazou wrote: > > You make an arbitrary distinction between certain CPUs and duplicate > > patterns based on that. > > Sure, somewhat arbitrary, but that's a proof of concept and IMO better than > treating every processor the same way. The alternative would be to complicate > further the existing patterns by means of the "enabled" attribute or somesuch, > but I can try it too. No, we should allow both integer and floating point insns for integer stores always. We just get the cost estimates slightly wrong now, apparently. Segher
> No, we should allow both integer and floating point insns for integer stores > always. We just get the cost estimates slightly wrong now, apparently. Note that my proof of concept patch doesn't disallow them either... So what do you suggest? Just putting back the '*' modifiers in the DI patterns? As a matter of fact there are still present in the SI pattern.
On Tue, Feb 12, 2019 at 11:55:24AM +0100, Eric Botcazou wrote: > > No, we should allow both integer and floating point insns for integer stores > > always. We just get the cost estimates slightly wrong now, apparently. > > Note that my proof of concept patch doesn't disallow them either... So what > do you suggest? Just putting back the '*' modifiers in the DI patterns? Yeah, something like that. It will need some serious testing, to make sure we don't regress (including not regressing what that patch that took them away was meant to do). I can arrange some testing, will you do the patch though? > As a matter of fact there are still present in the SI pattern. Yeah. It might not hurt at all to put them back in the DI as well. Here's hoping. Thanks, Segher
> Yeah, something like that. It will need some serious testing, to make > sure we don't regress (including not regressing what that patch that took > them away was meant to do). I can arrange some testing, will you do the > patch though? I can do the patch and also (correctness) testing for 32-bit Linux. Another issue is the extent of the patch: practically speaking, putting back the '*' modifier before all the 'd' constraints would be sufficient, but the current setting is a bit inconsistent|*] so this could also be adjusted. [*] For example, in the *movdi_internal32 pattern, 2 'wi' constraints have it but not the other 2. Likewise for "wv'.
On Thu, Feb 14, 2019 at 05:30:52PM +0100, Eric Botcazou wrote: > > Yeah, something like that. It will need some serious testing, to make > > sure we don't regress (including not regressing what that patch that took > > them away was meant to do). I can arrange some testing, will you do the > > patch though? > > I can do the patch and also (correctness) testing for 32-bit Linux. Performance testing is important here, of course. > Another issue is the extent of the patch: practically speaking, putting back > the '*' modifier before all the 'd' constraints would be sufficient, but the > current setting is a bit inconsistent|*] so this could also be adjusted. > > [*] For example, in the *movdi_internal32 pattern, 2 'wi' constraints have it > but not the other 2. Likewise for "wv'. I think we should change as little as possible for 7/8/9 here, because this is pretty fragile :-( But for 10, yes, let's get more sanity. (We'll have the "enabled" attribute for GCC 10, btw). Segher
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 7d399a2227b..d48395666fc 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4894,6 +4894,13 @@ rs6000_option_override_internal (bool global_init_p) || rs6000_tune == PROCESSOR_PPCE5500 || rs6000_tune == PROCESSOR_PPCE6500); + TARGET_AVOID_FPU_FOR_INT_MOVES = (rs6000_tune != PROCESSOR_POWER4 + && rs6000_tune != PROCESSOR_POWER5 + && rs6000_tune != PROCESSOR_POWER6 + && rs6000_tune != PROCESSOR_POWER7 + && rs6000_tune != PROCESSOR_POWER8 + && rs6000_tune != PROCESSOR_POWER9); + /* Allow debug switches to override the above settings. These are set to -1 in rs6000.opt to indicate the user hasn't directly set the switch. */ if (TARGET_ALWAYS_HINT >= 0) diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index a6929d9641b..e5b8fde8ff0 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -8586,7 +8586,8 @@ Oj, wM, OjwM, Oj, wM, wS, wB"))] - "! TARGET_POWERPC64 + "!TARGET_POWERPC64 + && !TARGET_AVOID_FPU_FOR_INT_MOVES && (gpc_reg_operand (operands[0], DImode) || gpc_reg_operand (operands[1], DImode))" "@ @@ -8616,6 +8617,25 @@ vecsimple") (set_attr "size" "64")]) +;; This is the pre-GCC7 variant for integer-oriented CPUs without vector units. +(define_insn "*movdi_internal32_basic" + [(set (match_operand:DI 0 "nonimmediate_operand" "=Y,r,r,?m,?*d,?*d,r") + (match_operand:DI 1 "input_operand" "r,Y,r,d,m,d,IJKnGHF"))] + "!TARGET_POWERPC64 + && TARGET_AVOID_FPU_FOR_INT_MOVES + && (gpc_reg_operand (operands[0], DImode) + || gpc_reg_operand (operands[1], DImode))" + "@ + # + # + # + stfd%U0%X0 %1,%0 + lfd%U1%X1 %0,%1 + fmr %0,%1 + #" + [(set_attr "type" "store,load,*,fpstore,fpload,fpsimple,*") + (set_attr "size" "64")]) + (define_split [(set (match_operand:DI 0 "gpc_reg_operand") (match_operand:DI 1 "const_int_operand"))] @@ -8664,6 +8684,7 @@ wg, r, wj, r"))] "TARGET_POWERPC64 + && !TARGET_AVOID_FPU_FOR_INT_MOVES && (gpc_reg_operand (operands[0], DImode) || gpc_reg_operand (operands[1], DImode))" "@ @@ -8710,6 +8731,33 @@ 8, 4, 4, 4, 4, 4, 4, 4, 4, 4")]) +;; This is the pre-GCC7 variant for integer-oriented CPUs without vector units. +(define_insn "*movdi_internal64_basic" + [(set (match_operand:DI 0 "nonimmediate_operand" "=Y,r,r,r,r,r,?m,?*d,?*d,r,*h,*h,r,?*wg") + (match_operand:DI 1 "input_operand" "r,Y,r,I,L,nF,d,m,d,*h,r,0,*wg,r"))] + "TARGET_POWERPC64 + && TARGET_AVOID_FPU_FOR_INT_MOVES + && (gpc_reg_operand (operands[0], DImode) + || gpc_reg_operand (operands[1], DImode))" + "@ + std%U0%X0 %1,%0 + ld%U1%X1 %0,%1 + mr %0,%1 + li %0,%1 + lis %0,%v1 + # + stfd%U0%X0 %1,%0 + lfd%U1%X1 %0,%1 + fmr %0,%1 + mf%1 %0 + mt%0 %1 + nop + mftgpr %0,%1 + mffgpr %0,%1" + [(set_attr "type" "store,load,*,*,*,*,fpstore,fpload,fpsimple,mfjmpr,mtjmpr,*,mftgpr,mffgpr") + (set_attr "size" "64") + (set_attr "length" "4,4,4,4,4,20,4,4,4,4,4,4,4,4")]) + ; Some DImode loads are best done as a load of -1 followed by a mask ; instruction. (define_split diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index ace8a477550..381a0c94788 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -106,12 +106,13 @@ unsigned int rs6000_debug ;; Whether to enable the -mfloat128 stuff without necessarily enabling the ;; __float128 keyword. -TargetSave -unsigned char x_TARGET_FLOAT128_TYPE - -Variable +TargetVariable unsigned char TARGET_FLOAT128_TYPE +;; Whether to avoid using the FPU or other units for integer moves if possible +TargetVariable +unsigned char TARGET_AVOID_FPU_FOR_INT_MOVES + ;; This option existed in the past, but now is always on. mpowerpc Target RejectNegative Undocumented Ignore