diff mbox

Fix PR55489, memory-hog bug in GCSE

Message ID 1354032011-32303-1-git-send-email-bonzini@gnu.org
State New
Headers show

Commit Message

Paolo Bonzini Nov. 27, 2012, 4 p.m. UTC
Hi,

this bug triggers in the compilation of QEMU with GCC 4.7.2.  It is
latent on trunk because reg_known_value is completely broken.  I'll
send a separate patch, but this one applies there too.

The problem arises when you have -fPIE (or -fPIC) and a huge function
with a lot of references to global variables.  Canonicalization of
position-independent addresses is then done over and over for the
same addresses, resulting in quadratic time and memory complexity for
GCSE's compute_transp; hundreds of megabytes of memory are allocated
in plus_constant,

The fix is to canonicalize the addresses outside the loop, similar to
what is done by the RTL DSE pass.

gcc 4.4.6:
 PRE :   3.83 (24%) usr   0.15 (17%) sys   3.99 (24%) wall  267307 kB (33%) ggc

gcc 4.7.2:
 PRE :   7.95 (41%) usr   0.40 (40%) sys   8.31 (41%) wall  821017 kB (80%) ggc

gcc 4.8.0:
 PRE :   6.94 (26%) usr   0.02 ( 4%) sys   6.98 (26%) wall     731 kB ( 0%) ggc

gcc 4.7.2 + patch:
 PRE :   5.90 (34%) usr   0.02 ( 3%) sys   6.41 (35%) wall    1670 kB ( 1%) ggc

Note that the bug is present on older branches too, but it became much
worse sometime between 4.4 and 4.7.

Bootstrap finished on x86_64-pc-linux-gnu, regtest in progress; ok for
4.7 and trunk if it passes?

Paolo

2012-11-26  Paolo Bonzini  <pbonzini@redhat.com>

	PR rtl-optimization/55489
	* gcse.c (compute_transp): Precompute a canonical version
	of XEXP (x, 0), and pass it to canon_true_dependence.

Comments

Steven Bosscher Nov. 27, 2012, 4:40 p.m. UTC | #1
On Tue, Nov 27, 2012 at 5:00 PM, Paolo Bonzini wrote:
> Note that the bug is present on older branches too, but it became much
> worse sometime between 4.4 and 4.7.

Probably around the time we (i.e. you and me) taught gcse.c to handle
REG_EQUAL expressions?

> 2012-11-26  Paolo Bonzini  <>
>
>         PR rtl-optimization/55489
>         * gcse.c (compute_transp): Precompute a canonical version
>         of XEXP (x, 0), and pass it to canon_true_dependence.
>
> Index: gcse.c
> ===================================================================
> --- gcse.c      (revisione 193848)
> +++ gcse.c      (copia locale)
> @@ -1658,7 +1658,11 @@ compute_transp (const_rtx x, int indx, sbitmap *bm
>         {
>           bitmap_iterator bi;
>           unsigned bb_index;
> +         rtx x_addr;
>
> +         x_addr = get_addr (XEXP (x, 0));
> +         x_addr = canon_rtx (x_addr);
> +
>           /* First handle all the blocks with calls.  We don't need to
>              do any list walking for them.  */
>           EXECUTE_IF_SET_IN_BITMAP (blocks_with_calls, 0, bb_index, bi)
> @@ -1683,7 +1687,7 @@
>                     rtx dest_addr = pair->dest_addr;
>
>                     if (canon_true_dependence (dest, GET_MODE (dest),
> -                                              dest_addr, x, NULL_RTX))
> +                                              dest_addr, x, x_addr))
>                       RESET_BIT (bmap[bb_index], indx);
>                   }
>               }

I can't approve it, but this looks OK to me.

Maybe properly re-indent this block of code while at it?

Ciao!
Steven
Paolo Bonzini Nov. 27, 2012, 4:56 p.m. UTC | #2
Il 27/11/2012 17:40, Steven Bosscher ha scritto:
> > Note that the bug is present on older branches too, but it became much
> > worse sometime between 4.4 and 4.7.
> 
> Probably around the time we (i.e. you and me) taught gcse.c to handle
> REG_EQUAL expressions?

Hmm, no, that was much earlier.  Like 4.2 or 4.3, roughly the same time
when fwprop went in.

Paolo
Eric Botcazou Nov. 27, 2012, 5:58 p.m. UTC | #3
> Bootstrap finished on x86_64-pc-linux-gnu, regtest in progress; ok for
> 4.7 and trunk if it passes?
> 
> Paolo
> 
> 2012-11-26  Paolo Bonzini  <pbonzini@redhat.com>
> 
> 	PR rtl-optimization/55489
> 	* gcse.c (compute_transp): Precompute a canonical version
> 	of XEXP (x, 0), and pass it to canon_true_dependence.

OK, thanks.
H.J. Lu Dec. 4, 2012, 10:30 p.m. UTC | #4
On Tue, Nov 27, 2012 at 8:00 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
> Hi,
>
> this bug triggers in the compilation of QEMU with GCC 4.7.2.  It is
> latent on trunk because reg_known_value is completely broken.  I'll
> send a separate patch, but this one applies there too.
>
> The problem arises when you have -fPIE (or -fPIC) and a huge function
> with a lot of references to global variables.  Canonicalization of
> position-independent addresses is then done over and over for the
> same addresses, resulting in quadratic time and memory complexity for
> GCSE's compute_transp; hundreds of megabytes of memory are allocated
> in plus_constant,
>
> The fix is to canonicalize the addresses outside the loop, similar to
> what is done by the RTL DSE pass.
>
> gcc 4.4.6:
>  PRE :   3.83 (24%) usr   0.15 (17%) sys   3.99 (24%) wall  267307 kB (33%) ggc
>
> gcc 4.7.2:
>  PRE :   7.95 (41%) usr   0.40 (40%) sys   8.31 (41%) wall  821017 kB (80%) ggc
>
> gcc 4.8.0:
>  PRE :   6.94 (26%) usr   0.02 ( 4%) sys   6.98 (26%) wall     731 kB ( 0%) ggc
>
> gcc 4.7.2 + patch:
>  PRE :   5.90 (34%) usr   0.02 ( 3%) sys   6.41 (35%) wall    1670 kB ( 1%) ggc
>
> Note that the bug is present on older branches too, but it became much
> worse sometime between 4.4 and 4.7.
>
> Bootstrap finished on x86_64-pc-linux-gnu, regtest in progress; ok for
> 4.7 and trunk if it passes?
>
> Paolo
>
> 2012-11-26  Paolo Bonzini  <pbonzini@redhat.com>
>
>         PR rtl-optimization/55489
>         * gcse.c (compute_transp): Precompute a canonical version
>         of XEXP (x, 0), and pass it to canon_true_dependence.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55597
Paolo Bonzini Dec. 4, 2012, 10:53 p.m. UTC | #5
Il 04/12/2012 23:30, H.J. Lu ha scritto:
> On Tue, Nov 27, 2012 at 8:00 AM, Paolo Bonzini <bonzini@gnu.org> wrote:
>> Hi,
>>
>> this bug triggers in the compilation of QEMU with GCC 4.7.2.  It is
>> latent on trunk because reg_known_value is completely broken.  I'll
>> send a separate patch, but this one applies there too.
>>
>> The problem arises when you have -fPIE (or -fPIC) and a huge function
>> with a lot of references to global variables.  Canonicalization of
>> position-independent addresses is then done over and over for the
>> same addresses, resulting in quadratic time and memory complexity for
>> GCSE's compute_transp; hundreds of megabytes of memory are allocated
>> in plus_constant,
>>
>> The fix is to canonicalize the addresses outside the loop, similar to
>> what is done by the RTL DSE pass.
>>
>> gcc 4.4.6:
>>  PRE :   3.83 (24%) usr   0.15 (17%) sys   3.99 (24%) wall  267307 kB (33%) ggc
>>
>> gcc 4.7.2:
>>  PRE :   7.95 (41%) usr   0.40 (40%) sys   8.31 (41%) wall  821017 kB (80%) ggc
>>
>> gcc 4.8.0:
>>  PRE :   6.94 (26%) usr   0.02 ( 4%) sys   6.98 (26%) wall     731 kB ( 0%) ggc
>>
>> gcc 4.7.2 + patch:
>>  PRE :   5.90 (34%) usr   0.02 ( 3%) sys   6.41 (35%) wall    1670 kB ( 1%) ggc
>>
>> Note that the bug is present on older branches too, but it became much
>> worse sometime between 4.4 and 4.7.
>>
>> Bootstrap finished on x86_64-pc-linux-gnu, regtest in progress; ok for
>> 4.7 and trunk if it passes?
>>
>> Paolo
>>
>> 2012-11-26  Paolo Bonzini  <pbonzini@redhat.com>
>>
>>         PR rtl-optimization/55489
>>         * gcse.c (compute_transp): Precompute a canonical version
>>         of XEXP (x, 0), and pass it to canon_true_dependence.
>>
> 
> This caused:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55597

Most likely caused by
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02225.html ([PATCH] Fix
allocation of reg_known_value), not by this patch.

Paolo
diff mbox

Patch

Index: gcse.c
===================================================================
--- gcse.c	(revisione 193848)
+++ gcse.c	(copia locale)
@@ -1658,7 +1658,11 @@  compute_transp (const_rtx x, int indx, sbitmap *bm
 	{
 	  bitmap_iterator bi;
 	  unsigned bb_index;
+	  rtx x_addr;
 
+	  x_addr = get_addr (XEXP (x, 0));
+	  x_addr = canon_rtx (x_addr);
+
 	  /* First handle all the blocks with calls.  We don't need to
 	     do any list walking for them.  */
 	  EXECUTE_IF_SET_IN_BITMAP (blocks_with_calls, 0, bb_index, bi)
@@ -1683,7 +1687,7 @@ 
 		    rtx dest_addr = pair->dest_addr;
 
 		    if (canon_true_dependence (dest, GET_MODE (dest),
-					       dest_addr, x, NULL_RTX))
+					       dest_addr, x, x_addr))
 		      RESET_BIT (bmap[bb_index], indx);
 	          }
 	      }