diff mbox series

WIP Move 'pass_fast_rtl_dce' from 'pass_postreload' into 'pass_late_compilation' (was: nvptx vs. [PATCH] Add a late-combine pass [PR106594])

Message ID 87bk3huy0z.fsf@euler.schwinge.ddns.net
State New
Headers show
Series WIP Move 'pass_fast_rtl_dce' from 'pass_postreload' into 'pass_late_compilation' (was: nvptx vs. [PATCH] Add a late-combine pass [PR106594]) | expand

Commit Message

Thomas Schwinge July 1, 2024, 11:55 a.m. UTC
Hi!

On 2024-06-28T00:41:54+0200, I wrote:
> On 2024-06-27T23:20:18+0200, I wrote:
>> On 2024-06-27T22:27:21+0200, I wrote:
>>> On 2024-06-27T18:49:17+0200, I wrote:
>>>> On 2023-10-24T19:49:10+0100, Richard Sandiford <richard.sandiford@arm.com> wrote:
>>>>> This patch adds a combine pass that runs late in the pipeline.
>>>
>>> [After sending, I realized I replied to a previous thread of this work.]
>>>
>>>> I've beek looking a bit through recent nvptx target code generation
>>>> changes for GCC target libraries, and thought I'd also share here my
>>>> findings for the "late-combine" changes in isolation, for nvptx target.
>>>> 
>>>> First the unexpected thing:
>>>
>>> So much for "unexpected thing" -- next level of unexpected here...
>>> Appreciated if anyone feels like helping me find my way through this, but
>>> I totally understand if you've got other things to do.
>>
>> OK, I found something already.  (Unexpectedly quickly...)  ;-)
>>
>>>> there are a few cases where we now see unused
>>>> registers get declared
>
>> But in fact, for both cases
>
> Now tested: 's%both%all'.  :-)
>
>> the unexpected difference goes away if after
>> 'pass_late_combine' I inject a 'pass_fast_rtl_dce'.

The following will be unnecessary assuming that Richard's proposed
"Give fast DCE a separate dirty flag" gets accepted, but may still be
useful if we follow through with the idea to enable (parts of)
'pass_postreload' for nvptx (as discussing with Roger), so, for later:

>> The following makes these two cases work, but evidently needs a lot more
>> analysis: a lot of other passes are enabled that may be anything between
>> beneficial and harmful for 'targetm.no_register_allocation'/nvptx.
>>
>>     --- gcc/passes.cc
>>     +++ gcc/passes.cc
>>     @@ -676,17 +676,17 @@ const pass_data pass_data_postreload =
>>      class pass_postreload : public rtl_opt_pass
>>      {
>>      public:
>>        pass_postreload (gcc::context *ctxt)
>>          : rtl_opt_pass (pass_data_postreload, ctxt)
>>        {}
>>      
>>        /* opt_pass methods: */
>>     -  bool gate (function *) final override { return reload_completed; }
>>     +  bool gate (function *) final override { return reload_completed || targetm.no_register_allocation; }
>>     --- gcc/regcprop.cc
>>     +++ gcc/regcprop.cc
>>     @@ -1305,17 +1305,17 @@ class pass_cprop_hardreg : public rtl_opt_pass
>>      public:
>>        pass_cprop_hardreg (gcc::context *ctxt)
>>          : rtl_opt_pass (pass_data_cprop_hardreg, ctxt)
>>        {}
>>      
>>        /* opt_pass methods: */
>>        bool gate (function *) final override
>>          {
>>     -      return (optimize > 0 && (flag_cprop_registers));
>>     +      return (optimize > 0 && flag_cprop_registers && !targetm.no_register_allocation);
>>          }
>
> Also, that quickly ICEs; more '[...] && !targetm.no_register_allocation'
> are needed elsewhere, at least.
>
> The following simpler thing, however, does work; move 'pass_fast_rtl_dce'
> out of 'pass_postreload':
>
>     --- gcc/passes.cc
>     +++ gcc/passes.cc
>     @@ -677,14 +677,15 @@ class pass_postreload : public rtl_opt_pass
>      {
>      public:
>        pass_postreload (gcc::context *ctxt)
>          : rtl_opt_pass (pass_data_postreload, ctxt)
>        {}
>      
>        /* opt_pass methods: */
>     +  opt_pass * clone () final override { return new pass_postreload (m_ctxt); }
>        bool gate (function *) final override { return reload_completed; }
>      
>      }; // class pass_postreload
>     --- gcc/passes.def
>     +++ gcc/passes.def
>     @@ -529,7 +529,10 @@ along with GCC; see the file COPYING3.  If not see
>               NEXT_PASS (pass_regrename);
>               NEXT_PASS (pass_fold_mem_offsets);
>               NEXT_PASS (pass_cprop_hardreg);
>     -         NEXT_PASS (pass_fast_rtl_dce);
>     +      POP_INSERT_PASSES ()
>     +      NEXT_PASS (pass_fast_rtl_dce);
>     +      NEXT_PASS (pass_postreload);
>     +      PUSH_INSERT_PASSES_WITHIN (pass_postreload)
>               NEXT_PASS (pass_reorder_blocks);
>               NEXT_PASS (pass_leaf_regs);
>               NEXT_PASS (pass_split_before_sched2);
>
> This (only) cleans up "the mess that 'pass_late_combine' created"; no
> further changes in GCC target libraries for nvptx.  (For avoidance of
> doubt: "mess" is a great exaggeration here.)

But that then disturbs non-nvptx targets; see (prerequisite)
<https://inbox.sourceware.org/87ed8htdwy.fsf@euler.schwinge.ddns.net>
"Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN'" for why.

Then, see the attached -- just for later, for now --
"WIP Move 'pass_fast_rtl_dce' from 'pass_postreload' into 'pass_late_compilation'"
for how to make this work properly.  (This also puts back
'pass_fast_rtl_dce' into 'pass_late_compilation' instead of running it
unconditionally, in order to not change any behavior in that regard.)


Grüße
 Thomas


>>> But: should we expect '-fno-late-combine-instructions' vs.
>>> '-flate-combine-instructions' to behave in the same way?  (After all,
>>> '%r22' remains unused also with '-flate-combine-instructions', and
>>> doesn't need to be emitted.)  This could, of course, also be a nvptx back
>>> end issue?
>>>
>>> I'm happy to supply any dump files etc.  Also, 'tmp-libc_a-lnumeric.i.xz'
>>> is attached if you'd like to reproduce this with your own nvptx target
>>> 'cc1':
>>>
>>>     $ [...]/configure --target=nvptx-none --enable-languages=c
>>>     $ make -j12 all-gcc
>>>     $ gcc/cc1 -fpreprocessed tmp-libc_a-lnumeric.i -quiet -dumpbase tmp-libc_a-lnumeric.c -dumpbase-ext .c -misa=sm_30 -g -O2 -fno-builtin -o tmp-libc_a-lnumeric.s -fdump-rtl-all # -fno-late-combine-instructions
>>>
>>>
>>> Grüße
>>>  Thomas
diff mbox series

Patch

From ef14e15c3255059f374e04a47d838e9c98c9da2c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <tschwinge@baylibre.com>
Date: Fri, 28 Jun 2024 00:41:54 +0200
Subject: [PATCH] WIP Move 'pass_fast_rtl_dce' from 'pass_postreload' into
 'pass_late_compilation'

id:87ed8i2ekt.fsf@euler.schwinge.ddns.net
---
 gcc/passes.cc  | 8 ++++++++
 gcc/passes.def | 6 ++++++
 2 files changed, 14 insertions(+)

diff --git a/gcc/passes.cc b/gcc/passes.cc
index e444b462113..1cdd4a77f5b 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -685,6 +685,10 @@  public:
   {}
 
   /* opt_pass methods: */
+  opt_pass *clone () final override
+  {
+    return new pass_postreload (m_ctxt);
+  }
   bool gate (function *) final override
   {
     if (reload_completed)
@@ -728,6 +732,10 @@  public:
   {}
 
   /* opt_pass methods: */
+  opt_pass *clone () final override
+  {
+    return new pass_late_compilation (m_ctxt);
+  }
   bool gate (function *) final override
   {
     return reload_completed || targetm.no_register_allocation;
diff --git a/gcc/passes.def b/gcc/passes.def
index 72198bc4c4e..cb221438a1e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -529,7 +529,13 @@  along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_regrename);
 	  NEXT_PASS (pass_fold_mem_offsets);
 	  NEXT_PASS (pass_cprop_hardreg);
+      POP_INSERT_PASSES ()
+      NEXT_PASS (pass_late_compilation);
+      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
 	  NEXT_PASS (pass_fast_rtl_dce);
+      POP_INSERT_PASSES ()
+      NEXT_PASS (pass_postreload);
+      PUSH_INSERT_PASSES_WITHIN (pass_postreload)
 	  NEXT_PASS (pass_reorder_blocks);
 	  NEXT_PASS (pass_leaf_regs);
 	  NEXT_PASS (pass_split_before_sched2);
-- 
2.34.1