[pushed,PR108500] RA: Use simple LRA for huge functions

Message ID 08b7c01c-00f1-8428-e8eb-61508843b714@redhat.com
State New

Commit Message

Vladimir Makarov Feb. 10, 2023, 4:47 p.m. UTC
The following patch is for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108500

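The patch improves compilation speed for huge functions at -O0 by
switching RA to simple LRA once the function's insn count exceeds a new
parameter, ira-simple-lra-insn-threshold.  Compilation time of the biggest
test in the PR decreases from 1235s to 709s on an Intel i7-13600K.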

The patch was successfully bootstrapped on x86-64.
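
For readers who want to check the arithmetic of the new condition outside GCC, here is a minimal standalone sketch of the predicate that the patch installs in ira ().  The struct FunctionStats and the function use_simple_lra are hypothetical names made up for illustration; they are not part of GCC.  The fields only stand in for the values that ira () reads via last_basic_block_for_fn (cfun) and get_max_uid (), and the zero-block guard is an addition for this sketch alone.

```cpp
#include <cstdint>

// Hypothetical stand-in for the values ira () reads from compiler state.
struct FunctionStats {
  unsigned num_used_regs;     // registers with at least one def or use
  unsigned num_basic_blocks;  // stands in for last_basic_block_for_fn (cfun)
  unsigned max_insn_uid;      // stands in for get_max_uid ()
};

// Sketch of the patched lra_simple_p condition.
// threshold_k is in units of 1K insns, matching the default Init(1000)
// of ira-simple-lra-insn-threshold, i.e. roughly one million insns.
bool use_simple_lra(const FunctionStats &s, bool use_lra,
                    unsigned threshold_k = 1000) {
  if (!use_lra)
    return false;
  // Sketch-only guard against division by zero; the original code has
  // no such check.
  if (s.num_basic_blocks == 0)
    return false;

  // Old criterion: the product of used registers and basic blocks
  // reaches about 2^26.
  bool too_many_regs = s.num_used_regs >= (1U << 26) / s.num_basic_blocks;

  // New criterion: the max insn uid exceeds the threshold.  The 64-bit
  // casts keep the multiplication by 1000 from overflowing.
  bool too_many_insns =
      (uint64_t) s.max_insn_uid > (uint64_t) threshold_k * 1000;

  return too_many_regs || too_many_insns;
}
```

With the default threshold, a function with few registers and blocks but two million insn uids selects simple LRA, while one with exactly one million does not, because the comparison is strict.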

Patch

commit 02371cdd755d2b53fb580d3e8209c44e0c45c337
Author: Vladimir N. Makarov <vmakarov@redhat.com>
Date:   Fri Feb 10 11:12:37 2023 -0500

    RA: Use simple LRA for huge functions
    
    The PR108500 test contains a huge function, and RA spends a lot of time
    compiling the test with -O0.  The patch decreases compilation time
    considerably for huge functions.  Compilation time for the PR test
    decreases from 1235s to 709s on an Intel i7-13600K.
    
            PR tree-optimization/108500
    
    gcc/ChangeLog:
    
            * params.opt (ira-simple-lra-insn-threshold): Add new param.
            * ira.cc (ira): Use the param to switch on simple LRA.

diff --git a/gcc/ira.cc b/gcc/ira.cc
index 6143db06c52..d0b6ea062e8 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -5624,12 +5624,16 @@  ira (FILE *f)
     if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
       num_used_regs++;
 
-  /* If there are too many pseudos and/or basic blocks (e.g. 10K
-     pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
-     use simplified and faster algorithms in LRA.  */
+  /* If there are too many pseudos and/or basic blocks (e.g. 10K pseudos and
+     10K blocks or 100K pseudos and 1K blocks) or we have too many function
+     insns, we will use simplified and faster algorithms in LRA.  */
   lra_simple_p
-    = ira_use_lra_p
-      && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
+    = (ira_use_lra_p
+       && (num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun)
+           /* max uid is a good evaluation of the number of insns as most
+              optimizations are done on tree-SSA level.  */
+           || ((uint64_t) get_max_uid ()
+	       > (uint64_t) param_ira_simple_lra_insn_threshold * 1000)));
 
   if (lra_simple_p)
     {
diff --git a/gcc/params.opt b/gcc/params.opt
index 8a128c321c9..c7913d9063a 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -302,6 +302,10 @@  The number of registers in each class kept unused by loop invariant motion.
 Common Joined UInteger Var(param_ira_max_conflict_table_size) Init(1000) Param Optimization
 Max size of conflict table in MB.
 
+-param=ira-simple-lra-insn-threshold=
+Common Joined UInteger Var(param_ira_simple_lra_insn_threshold) Init(1000) Param Optimization
+Approximate function insn number in 1K units triggering simple local RA.
+
 -param=ira-max-loops-num=
 Common Joined UInteger Var(param_ira_max_loops_num) Init(100) Param Optimization
 Max loops number for regional RA.