Patchwork [1.3] build: compile translate.o at -O1 optimization

Submitter Paolo Bonzini
Date Nov. 27, 2012, 8:36 a.m.
Message ID <1354005361-17805-1-git-send-email-pbonzini@redhat.com>
Permalink /patch/202121/
State New

Comments

Paolo Bonzini - Nov. 27, 2012, 8:36 a.m.
Some versions of GCC require insane (>2GB) amounts of memory
to compile translate.o.  As a countermeasure, compile it
with -O1.  This should fix the buildbot failure for
default_x86_64_fedora16.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.target | 2 ++
 1 file changed, 2 insertions(+)
Wenchao Xia - Nov. 27, 2012, 9:27 a.m.
> Some versions of GCC require insane (>2GB) amounts of memory
> to compile translate.o.  As a countermeasure, compile it
> with -O1.  This should fix the buildbot failure for
> default_x86_64_fedora16.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>   Makefile.target | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/Makefile.target b/Makefile.target
> index 8b658c0..3981931 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -143,6 +143,8 @@ GENERATED_HEADERS += hmp-commands.h qmp-commands-old.h
> 
>   endif # CONFIG_SOFTMMU
> 
> +%/translate.o: CFLAGS := $(patsubst -O2,-O1,$(CFLAGS))
> +
>   nested-vars += obj-y
> 
>   # This resolves all nested paths, so it must come last
> 
  In the TCG case I think translate.o will obviously influence performance.
How about adding a "fast-build" option that uses -O1 for it by default?
If you agree, I will add that after this patch goes upstream, since it
fixes the buildbot failure quickly.
Paolo Bonzini - Nov. 27, 2012, 9:37 a.m.
On 27/11/2012 10:27, Wenchao Xia wrote:
>   In the TCG case I think translate.o will obviously influence performance.
> How about adding a "fast-build" option that uses -O1 for it by default?
> If you agree, I will add that after this patch goes upstream, since it
> fixes the buildbot failure quickly.

This is not about having a fast or slow build, it's about not requiring
a ludicrous amount of memory...  Besides, translate.c is usually not too
high in the profiles.  Most of the time is spent _executing_
JIT-translated code, not translating it.

Note that there are probably one or two GCC options that can be
fine-tuned to avoid the explosion instead of just -O1 vs. -O2.  If you
have an affected machine (F18) you can help by compiling translate.c
with -O2 -ftime-report.  I planned to do this today, but I first need to
install an F18 virtual machine.

Paolo
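
For anyone wanting to reproduce the measurement suggested above, here is a
minimal sketch of a helper that builds the compiler command line and captures
the report. The function names are hypothetical, and the command is
deliberately bare: a real QEMU build passes many include paths and defines
that are not shown here, so in practice the flags would be added to the
existing build rather than invoked standalone. The one detail worth knowing is
that GCC writes the -ftime-report table to stderr, not stdout.

```python
import subprocess


def time_report_command(source, opt_level="-O2", compiler="gcc"):
    """Build a compile-only command that asks GCC for per-pass statistics."""
    # -c compiles without linking; the object file is discarded.
    return [compiler, opt_level, "-ftime-report", "-c", source, "-o", "/dev/null"]


def collect_time_report(source, **kwargs):
    """Run the compiler and return the report, which GCC prints on stderr."""
    result = subprocess.run(
        time_report_command(source, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stderr


# Only print the command here; running it needs GCC and a compilable source.
print(" ".join(time_report_command("translate.c")))
```

Comparing the output for "-O1" and "-O2" (via the opt_level argument) would
show which passes account for the difference.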
Gerd Hoffmann - Nov. 27, 2012, 12:09 p.m.
Hi,

> Note that there are probably one or two GCC options that can be
> fine-tuned to avoid the explosion instead of just -O1 vs. -O2.  If you
> have an affected machine (F18) you can help by compiling translate.c
> with -O2 -ftime-report.  I planned to do this today, but I first need to
> install an F18 virtual machine.

[x] done, see attachment

cheers,
  Gerd
CC    i386-softmmu/target-i386/translate.o

Execution times (seconds)
 phase setup             :   0.17 (100%) usr   0.06 (100%) sys   1.20 (98%) wall    2527 kB (99%) ggc
 phase finalize          :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 2%) wall       0 kB ( 0%) ggc
 TOTAL                 :   0.17             0.06             1.23               2543 kB

Execution times (seconds)
 phase setup             :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.11 ( 0%) wall    1077 kB ( 0%) ggc
 phase parsing           :   0.22 ( 1%) usr   0.41 ( 4%) sys   0.71 ( 1%) wall   13597 kB ( 1%) ggc
 phase cgraph            :  39.98 (99%) usr   9.43 (96%) sys 122.41 (99%) wall 1680634 kB (99%) ggc
 phase generate          :  39.98 (99%) usr   9.43 (96%) sys 122.42 (99%) wall 1680635 kB (99%) ggc
 phase finalize          :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 garbage collection      :   0.31 ( 1%) usr   0.37 ( 4%) sys  29.45 (24%) wall       0 kB ( 0%) ggc
 callgraph construction  :   0.05 ( 0%) usr   0.04 ( 0%) sys   0.05 ( 0%) wall    6753 kB ( 0%) ggc
 callgraph optimization  :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.04 ( 0%) wall    1401 kB ( 0%) ggc
 varpool construction    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     282 kB ( 0%) ggc
 ipa cp                  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     384 kB ( 0%) ggc
 ipa reference           :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa pure const          :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 ipa SRA                 :   0.01 ( 0%) usr   0.01 ( 0%) sys   0.06 ( 0%) wall    3613 kB ( 0%) ggc
 ipa free lang data      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall       0 kB ( 0%) ggc
 cfg construction        :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     488 kB ( 0%) ggc
 cfg cleanup             :   0.36 ( 1%) usr   0.01 ( 0%) sys   0.74 ( 1%) wall    1440 kB ( 0%) ggc
 trivially dead code     :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.17 ( 0%) wall       0 kB ( 0%) ggc
 df scan insns           :   0.09 ( 0%) usr   0.01 ( 0%) sys   0.08 ( 0%) wall      38 kB ( 0%) ggc
 df multiple defs        :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall       0 kB ( 0%) ggc
 df reaching defs        :   0.10 ( 0%) usr   0.01 ( 0%) sys   0.30 ( 0%) wall       0 kB ( 0%) ggc
 df live regs            :   0.97 ( 2%) usr   0.06 ( 1%) sys   2.96 ( 2%) wall       0 kB ( 0%) ggc
 df live&initialized regs:   0.46 ( 1%) usr   0.02 ( 0%) sys   0.69 ( 1%) wall       0 kB ( 0%) ggc
 df use-def / def-use chains:   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 df reg dead/unused notes:   0.31 ( 1%) usr   0.02 ( 0%) sys   0.43 ( 0%) wall    2796 kB ( 0%) ggc
 register information    :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 alias analysis          :   0.22 ( 1%) usr   0.02 ( 0%) sys   0.34 ( 0%) wall    7380 kB ( 0%) ggc
 alias stmt walking      :   0.14 ( 0%) usr   0.07 ( 1%) sys   0.27 ( 0%) wall    2333 kB ( 0%) ggc
 register scan           :   0.03 ( 0%) usr   0.01 ( 0%) sys   0.03 ( 0%) wall       3 kB ( 0%) ggc
 rebuild jump labels     :   0.10 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall       0 kB ( 0%) ggc
 preprocessing           :   0.07 ( 0%) usr   0.15 ( 2%) sys   0.16 ( 0%) wall    1496 kB ( 0%) ggc
 lexical analysis        :   0.01 ( 0%) usr   0.13 ( 1%) sys   0.16 ( 0%) wall       0 kB ( 0%) ggc
 parser (global)         :   0.03 ( 0%) usr   0.04 ( 0%) sys   0.11 ( 0%) wall    6890 kB ( 0%) ggc
 parser struct body      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall     486 kB ( 0%) ggc
 parser enumerator list  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     352 kB ( 0%) ggc
 parser function body    :   0.04 ( 0%) usr   0.02 ( 0%) sys   0.10 ( 0%) wall    1263 kB ( 0%) ggc
 parser inl. func. body  :   0.07 ( 0%) usr   0.07 ( 1%) sys   0.13 ( 0%) wall    3107 kB ( 0%) ggc
 inline heuristics       :   0.11 ( 0%) usr   0.02 ( 0%) sys   0.16 ( 0%) wall    3669 kB ( 0%) ggc
 integration             :   0.34 ( 1%) usr   0.31 ( 3%) sys   0.56 ( 0%) wall   41530 kB ( 2%) ggc
 tree gimplify           :   0.04 ( 0%) usr   0.02 ( 0%) sys   0.15 ( 0%) wall    3964 kB ( 0%) ggc
 tree eh                 :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       2 kB ( 0%) ggc
 tree CFG construction   :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1582 kB ( 0%) ggc
 tree CFG cleanup        :   0.14 ( 0%) usr   0.01 ( 0%) sys   0.05 ( 0%) wall     456 kB ( 0%) ggc
 tree tail merge         :   0.04 ( 0%) usr   0.01 ( 0%) sys   0.03 ( 0%) wall       6 kB ( 0%) ggc
 tree VRP                :   0.31 ( 1%) usr   0.02 ( 0%) sys   0.40 ( 0%) wall    6708 kB ( 0%) ggc
 tree copy propagation   :   0.12 ( 0%) usr   0.01 ( 0%) sys   0.18 ( 0%) wall     937 kB ( 0%) ggc
 tree find ref. vars     :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     127 kB ( 0%) ggc
 tree PTA                :   0.52 ( 1%) usr   0.28 ( 3%) sys   0.86 ( 1%) wall    2343 kB ( 0%) ggc
 tree SSA rewrite        :   0.05 ( 0%) usr   0.03 ( 0%) sys   0.14 ( 0%) wall    5755 kB ( 0%) ggc
 tree SSA other          :   0.01 ( 0%) usr   0.01 ( 0%) sys   0.02 ( 0%) wall     127 kB ( 0%) ggc
 tree SSA incremental    :   0.13 ( 0%) usr   0.02 ( 0%) sys   0.12 ( 0%) wall     680 kB ( 0%) ggc
 tree operand scan       :   0.19 ( 0%) usr   0.13 ( 1%) sys   0.37 ( 0%) wall   22560 kB ( 1%) ggc
 dominator optimization  :   0.12 ( 0%) usr   0.01 ( 0%) sys   0.14 ( 0%) wall    7169 kB ( 0%) ggc
 tree CCP                :   0.17 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall     670 kB ( 0%) ggc
 tree PHI const/copy prop:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree reassociation      :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     155 kB ( 0%) ggc
 tree PRE                :   0.18 ( 0%) usr   0.04 ( 0%) sys   0.26 ( 0%) wall    5335 kB ( 0%) ggc
 tree FRE                :   0.42 ( 1%) usr   0.10 ( 1%) sys   0.51 ( 0%) wall   12668 kB ( 1%) ggc
 tree code sinking       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      77 kB ( 0%) ggc
 tree linearize phis     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       2 kB ( 0%) ggc
 tree forward propagate  :   0.06 ( 0%) usr   0.02 ( 0%) sys   0.04 ( 0%) wall    2240 kB ( 0%) ggc
 tree conservative DCE   :   0.08 ( 0%) usr   0.05 ( 1%) sys   0.12 ( 0%) wall       4 kB ( 0%) ggc
 tree aggressive DCE     :   0.04 ( 0%) usr   0.03 ( 0%) sys   0.10 ( 0%) wall    1656 kB ( 0%) ggc
 tree buildin call DCE   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 tree DSE                :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall       0 kB ( 0%) ggc
 tree loop invariant motion:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       1 kB ( 0%) ggc
 complete unrolling      :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     182 kB ( 0%) ggc
 tree iv optimization    :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     105 kB ( 0%) ggc
 tree loop init          :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall     149 kB ( 0%) ggc
 tree SSA uncprop        :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 tree rename SSA copies  :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall       0 kB ( 0%) ggc
 tree STMT verifier      :   0.00 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall       0 kB ( 0%) ggc
 dominance computation   :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.10 ( 0%) wall       0 kB ( 0%) ggc
 out of ssa              :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall      35 kB ( 0%) ggc
 expand vars             :   0.07 ( 0%) usr   0.01 ( 0%) sys   0.07 ( 0%) wall    3675 kB ( 0%) ggc
 expand                  :   0.26 ( 1%) usr   0.07 ( 1%) sys   0.32 ( 0%) wall   23890 kB ( 1%) ggc
 post expand cleanups    :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall    1633 kB ( 0%) ggc
 varconst                :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 jump                    :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 forward prop            :   0.13 ( 0%) usr   0.03 ( 0%) sys   0.16 ( 0%) wall    2816 kB ( 0%) ggc
 CSE                     :   0.38 ( 1%) usr   0.03 ( 0%) sys   0.62 ( 1%) wall    2433 kB ( 0%) ggc
 dead code elimination   :   0.08 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall       0 kB ( 0%) ggc
 dead store elim1        :   0.15 ( 0%) usr   0.01 ( 0%) sys   0.19 ( 0%) wall    2431 kB ( 0%) ggc
 dead store elim2        :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.15 ( 0%) wall    1876 kB ( 0%) ggc
 loop analysis           :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      80 kB ( 0%) ggc
 loop invariant motion   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.16 ( 0%) wall       1 kB ( 0%) ggc
 CPROP                   :   0.38 ( 1%) usr   0.03 ( 0%) sys   0.58 ( 0%) wall    2579 kB ( 0%) ggc
 PRE                     :  26.78 (67%) usr   7.04 (71%) sys  72.36 (59%) wall 1444332 kB (85%) ggc
 CSE 2                   :   0.20 ( 0%) usr   0.01 ( 0%) sys   0.18 ( 0%) wall     653 kB ( 0%) ggc
 branch prediction       :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall     319 kB ( 0%) ggc
 combiner                :   0.35 ( 1%) usr   0.01 ( 0%) sys   0.50 ( 0%) wall    6538 kB ( 0%) ggc
 if-conversion           :   0.30 ( 1%) usr   0.00 ( 0%) sys   0.65 ( 1%) wall     496 kB ( 0%) ggc
 regmove                 :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.12 ( 0%) wall     224 kB ( 0%) ggc
 integrated RA           :   1.03 ( 3%) usr   0.02 ( 0%) sys   1.15 ( 1%) wall   26022 kB ( 2%) ggc
 reload                  :   0.51 ( 1%) usr   0.00 ( 0%) sys   0.58 ( 0%) wall    1732 kB ( 0%) ggc
 reload CSE regs         :   0.40 ( 1%) usr   0.00 ( 0%) sys   0.74 ( 1%) wall    3965 kB ( 0%) ggc
 ree                     :   0.03 ( 0%) usr   0.01 ( 0%) sys   0.05 ( 0%) wall     162 kB ( 0%) ggc
 thread pro- & epilogue  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall     822 kB ( 0%) ggc
 if-conversion 2         :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall      84 kB ( 0%) ggc
 combine stack adjustments:   0.01 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 peephole 2              :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     228 kB ( 0%) ggc
 hard reg cprop          :   0.05 ( 0%) usr   0.01 ( 0%) sys   0.10 ( 0%) wall       9 kB ( 0%) ggc
 scheduling 2            :   0.47 ( 1%) usr   0.29 ( 3%) sys   1.15 ( 1%) wall     328 kB ( 0%) ggc
 machine dep reorg       :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall      44 kB ( 0%) ggc
 reorder blocks          :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall     867 kB ( 0%) ggc
 final                   :   0.20 ( 0%) usr   0.02 ( 0%) sys   0.33 ( 0%) wall    2060 kB ( 0%) ggc
 symout                  :   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 rest of compilation     :   0.26 ( 1%) usr   0.02 ( 0%) sys   0.54 ( 0%) wall    1438 kB ( 0%) ggc
 remove unused locals    :   0.36 ( 1%) usr   0.00 ( 0%) sys   0.41 ( 0%) wall       0 kB ( 0%) ggc
 address taken           :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.08 ( 0%) wall       0 kB ( 0%) ggc
 unaccounted todo        :   0.11 ( 0%) usr   0.03 ( 0%) sys   0.12 ( 0%) wall       0 kB ( 0%) ggc
 rebuild frequencies     :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall       1 kB ( 0%) ggc
 repair loop structures  :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall      23 kB ( 0%) ggc
 TOTAL                 :  40.21             9.85           123.33            1695325 kB
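
A report of this size is easier to read once it is ranked. The sketch below is
one way to do that, assuming the row layout shown in the attachment above; the
function names and the regular expression are illustrative, not part of GCC or
QEMU. It skips the aggregate "phase ..." rows and the TOTAL line, then sorts
the remaining passes by ggc memory. The sample rows are copied from the report.

```python
import re

# One report row: name, then usr/sys/wall seconds and ggc kB, each with a percentage.
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s*:\s*"
    r"(?P<usr>[\d.]+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>[\d.]+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>[\d.]+)\s*\(\s*\d+%\)\s*wall\s+"
    r"(?P<ggc_kb>\d+)\s*kB\s*\(\s*\d+%\)\s*ggc\s*$"
)


def parse_time_report(text):
    """Return (pass name, wall seconds, ggc kB) for every per-pass row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        # "phase ..." rows are aggregates of the passes below them; TOTAL
        # has no percentages and therefore does not match the pattern.
        if m and not m["name"].startswith("phase "):
            rows.append((m["name"], float(m["wall"]), int(m["ggc_kb"])))
    return rows


def top_by_memory(rows, n=3):
    """Return the n rows with the largest ggc allocation."""
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]


SAMPLE = """\
 phase cgraph            :  39.98 (99%) usr   9.43 (96%) sys 122.41 (99%) wall 1680634 kB (99%) ggc
 integration             :   0.34 ( 1%) usr   0.31 ( 3%) sys   0.56 ( 0%) wall   41530 kB ( 2%) ggc
 PRE                     :  26.78 (67%) usr   7.04 (71%) sys  72.36 (59%) wall 1444332 kB (85%) ggc
 integrated RA           :   1.03 ( 3%) usr   0.02 ( 0%) sys   1.15 ( 1%) wall   26022 kB ( 2%) ggc
 TOTAL                 :  40.21             9.85           123.33            1695325 kB
"""

for name, wall, ggc_kb in top_by_memory(parse_time_report(SAMPLE)):
    print(f"{name:<16} {wall:8.2f} s wall {ggc_kb:>9} kB ggc")
```

On the full report the PRE row (listed separately from "tree PRE") stands out:
1444332 kB of the 1695325 kB total, or about 85% of ggc memory, and 59% of the
wall time.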

Patch

diff --git a/Makefile.target b/Makefile.target
index 8b658c0..3981931 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -143,6 +143,8 @@  GENERATED_HEADERS += hmp-commands.h qmp-commands-old.h
 
 endif # CONFIG_SOFTMMU
 
+%/translate.o: CFLAGS := $(patsubst -O2,-O1,$(CFLAGS))
+
 nested-vars += obj-y
 
 # This resolves all nested paths, so it must come last
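
The rule in the patch relies on how GNU make's patsubst treats a pattern
without a '%' wildcard: CFLAGS is split into whitespace-separated words and
only a word exactly equal to -O2 is replaced. The sketch below emulates that
behaviour in Python to make the consequence visible; the helper name and the
flag strings are hypothetical and not taken from the QEMU build.

```python
def patsubst_exact(pattern: str, replacement: str, text: str) -> str:
    """Emulate $(patsubst PATTERN,REPLACEMENT,TEXT) for a pattern without '%'.

    GNU make splits TEXT on whitespace, replaces only the words that are
    exactly equal to PATTERN, and joins the result with single spaces.
    """
    return " ".join(
        replacement if word == pattern else word for word in text.split()
    )


# Hypothetical flag sets, for illustration only.
cflags_o2 = "-O2 -g -Wall -fPIE"
cflags_o3 = "-O3 -g -Wall"

print(patsubst_exact("-O2", "-O1", cflags_o2))  # the -O2 word becomes -O1
print(patsubst_exact("-O2", "-O1", cflags_o3))  # unchanged: no word equals -O2
```

In other words, the override only takes effect when CFLAGS contains the
literal word -O2; a tree configured with a different optimization level would
compile translate.o with its flags unchanged.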