Message ID | 1354032011-32303-1-git-send-email-bonzini@gnu.org |
---|---|
State | New |
Headers | show |
On Tue, Nov 27, 2012 at 5:00 PM, Paolo Bonzini wrote: > Note that the bug is present on older branches too, but it became much > worse sometime between 4.4 and 4.7. Probably around the time we (i.e. you and me) taught gcse.c to handle REG_EQUAL expressions? > 2012-11-26 Paolo Bonzini <> > > PR rtl-optimization/55489 > * gcse.c (compute_transp): Precompute a canonical version > of XEXP (x, 0), and pass it to canon_true_dependence. > > Index: gcse.c > =================================================================== > --- gcse.c (revisione 193848) > +++ gcse.c (copia locale) > @@ -1658,7 +1658,11 @@ compute_transp (const_rtx x, int indx, sbitmap *bm > { > bitmap_iterator bi; > unsigned bb_index; > + rtx x_addr; > > + x_addr = get_addr (XEXP (x, 0)); > + x_addr = canon_rtx (x_addr); > + > /* First handle all the blocks with calls. We don't need to > do any list walking for them. */ > EXECUTE_IF_SET_IN_BITMAP (blocks_with_calls, 0, bb_index, bi) > @@ -1683,7 +1687,7 @@ > rtx dest_addr = pair->dest_addr; > > if (canon_true_dependence (dest, GET_MODE (dest), > - dest_addr, x, NULL_RTX)) > + dest_addr, x, x_addr)) > RESET_BIT (bmap[bb_index], indx); > } > } I can't approve it, but this looks OK to me. Maybe properly re-indent this block of code while at it? Ciao! Steven
Il 27/11/2012 17:40, Steven Bosscher ha scritto: > > Note that the bug is present on older branches too, but it became much > > worse sometime between 4.4 and 4.7. > > Probably around the time we (i.e. you and me) taught gcse.c to handle > REG_EQUAL expressions? Hmm, no, that was much earlier. Like 4.2 or 4.3, roughly the same time when fwprop went in. Paolo
> Bootstrap finished on x86_64-pc-linux-gnu, regtest in progress; ok for > 4.7 and trunk if it passes? > > Paolo > > 2012-11-26 Paolo Bonzini <pbonzini@redhat.com> > > PR rtl-optimization/55489 > * gcse.c (compute_transp): Precompute a canonical version > of XEXP (x, 0), and pass it to canon_true_dependence. OK, thanks.
On Tue, Nov 27, 2012 at 8:00 AM, Paolo Bonzini <bonzini@gnu.org> wrote: > Hi, > > this bug triggers in the compilation of QEMU with GCC 4.7.2. It is > latent on trunk because reg_known_value is completely broken. I'll > send a separate patch, but this one applies there too. > > The problem arises when you have -fPIE (or -fPIC) and a huge function > with a lot of references to global variables. Canonicalization of > position-independent addresses is then done over and over for the > same addresses, resulting in quadratic time and memory complexity for > GCSE's compute_transp; hundreds of megabytes of memory are allocated > in plus_constant, > > The fix is to canonicalize the addresses outside the loop, similar to > what is done by the RTL DSE pass. > > gcc 4.4.6: > PRE : 3.83 (24%) usr 0.15 (17%) sys 3.99 (24%) wall 267307 kB (33%) ggc > > gcc 4.7.2: > PRE : 7.95 (41%) usr 0.40 (40%) sys 8.31 (41%) wall 821017 kB (80%) ggc > > gcc 4.8.0: > PRE : 6.94 (26%) usr 0.02 ( 4%) sys 6.98 (26%) wall 731 kB ( 0%) ggc > > gcc 4.7.2 + patch: > PRE : 5.90 (34%) usr 0.02 ( 3%) sys 6.41 (35%) wall 1670 kB ( 1%) ggc > > Note that the bug is present on older branches too, but it became much > worse sometime between 4.4 and 4.7. > > Bootstrap finished on x86_64-pc-linux-gnu, regtest in progress; ok for > 4.7 and trunk if it passes? > > Paolo > > 2012-11-26 Paolo Bonzini <pbonzini@redhat.com> > > PR rtl-optimization/55489 > * gcse.c (compute_transp): Precompute a canonical version > of XEXP (x, 0), and pass it to canon_true_dependence. > This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55597
Il 04/12/2012 23:30, H.J. Lu ha scritto: > On Tue, Nov 27, 2012 at 8:00 AM, Paolo Bonzini <bonzini@gnu.org> wrote: >> Hi, >> >> this bug triggers in the compilation of QEMU with GCC 4.7.2. It is >> latent on trunk because reg_known_value is completely broken. I'll >> send a separate patch, but this one applies there too. >> >> The problem arises when you have -fPIE (or -fPIC) and a huge function >> with a lot of references to global variables. Canonicalization of >> position-independent addresses is then done over and over for the >> same addresses, resulting in quadratic time and memory complexity for >> GCSE's compute_transp; hundreds of megabytes of memory are allocated >> in plus_constant, >> >> The fix is to canonicalize the addresses outside the loop, similar to >> what is done by the RTL DSE pass. >> >> gcc 4.4.6: >> PRE : 3.83 (24%) usr 0.15 (17%) sys 3.99 (24%) wall 267307 kB (33%) ggc >> >> gcc 4.7.2: >> PRE : 7.95 (41%) usr 0.40 (40%) sys 8.31 (41%) wall 821017 kB (80%) ggc >> >> gcc 4.8.0: >> PRE : 6.94 (26%) usr 0.02 ( 4%) sys 6.98 (26%) wall 731 kB ( 0%) ggc >> >> gcc 4.7.2 + patch: >> PRE : 5.90 (34%) usr 0.02 ( 3%) sys 6.41 (35%) wall 1670 kB ( 1%) ggc >> >> Note that the bug is present on older branches too, but it became much >> worse sometime between 4.4 and 4.7. >> >> Bootstrap finished on x86_64-pc-linux-gnu, regtest in progress; ok for >> 4.7 and trunk if it passes? >> >> Paolo >> >> 2012-11-26 Paolo Bonzini <pbonzini@redhat.com> >> >> PR rtl-optimization/55489 >> * gcse.c (compute_transp): Precompute a canonical version >> of XEXP (x, 0), and pass it to canon_true_dependence. >> > > This caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55597 Most likely caused by http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02225.html ([PATCH] Fix allocation of reg_known_value), not by this patch. Paolo
Index: gcse.c =================================================================== --- gcse.c (revisione 193848) +++ gcse.c (copia locale) @@ -1658,7 +1658,11 @@ compute_transp (const_rtx x, int indx, sbitmap *bm { bitmap_iterator bi; unsigned bb_index; + rtx x_addr; + x_addr = get_addr (XEXP (x, 0)); + x_addr = canon_rtx (x_addr); + /* First handle all the blocks with calls. We don't need to do any list walking for them. */ EXECUTE_IF_SET_IN_BITMAP (blocks_with_calls, 0, bb_index, bi) @@ -1683,7 +1687,7 @@ rtx dest_addr = pair->dest_addr; if (canon_true_dependence (dest, GET_MODE (dest), - dest_addr, x, NULL_RTX)) + dest_addr, x, x_addr)) RESET_BIT (bmap[bb_index], indx); } }