Patchwork Fix handling of very long asm statements in inliner

login
register
mail settings
Submitter Andi Kleen
Date Feb. 21, 2013, 2:05 p.m.
Message ID <20130221140555.GE2928@two.firstfloor.org>
Download mbox | patch
Permalink /patch/222291/
State New
Headers show

Comments

Andi Kleen - Feb. 21, 2013, 2:05 p.m.
An auto generated program with a 6.4mio line asm statement gave
with 4.7 and 4.8:

xxx.c:6400017:1: internal compiler error: in account_size_time, at
ipa-inline-analysis.c:601

The problem is that the inliner counts the number of lines in the asm
statement and multiplies that with a weight. With the weight this
overflows 32bit signed int, and triggers an assert for negative time.

Fix this by limiting the number of lines to 1000 for asm cost
estimation. The RTL backend also does similar multiplications for
jump shortening. I haven't tried to address this, but presumably
it's less likely to result in a failure.

Passes test suite on x86_64-linux.

Ok for 4.7 and 4.8?

2013-02-17  Andi Kleen  <ak@linux.intel.com>

	* tree-inline.c (estimate_num_insns): Limit asm cost to 1000.
Richard Earnshaw - Feb. 21, 2013, 2:57 p.m.
On 21/02/13 14:05, Andi Kleen wrote:
> An auto generated program with a 6.4mio line asm statement gave
> with 4.7 and 4.8:
>
> xxx.c:6400017:1: internal compiler error: in account_size_time, at
> ipa-inline-analysis.c:601
>
> The problem is that the inliner counts the number of lines in the asm
> statement and multiplies that with a weight. With the weight this
> overflows 32bit signed int, and triggers an assert for negative time.
>
> Fix this by limiting the number of lines to 1000 for asm cost
> estimation. The RTL backend also does similar multiplications for
> jump shortening. I haven't tried to address this, but presumably
> it's less likely to result in a failure.
>
> Passes test suite on x86_64-linux.
>
> Ok for 4.7 and 4.8?
>
> 2013-02-17  Andi Kleen  <ak@linux.intel.com>
>
> 	* tree-inline.c (estimate_num_insns): Limit asm cost to 1000.
>
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index 2a1b692..7f8f2f2 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -3595,7 +3595,14 @@ estimate_num_insns (gimple stmt, eni_weights *weights)
>         return 0;
>
>       case GIMPLE_ASM:
> -      return asm_str_count (gimple_asm_string (stmt));
> +      {
> +	int count = asm_str_count (gimple_asm_string (stmt));
> +	/* 1000 means infinity. This avoids overflows later
> +	   with very long asm statements.  */
> +	if (count > 1000)
> +	  count = 1000;
> +	return count;
> +      }
>
>       case GIMPLE_RESX:
>         /* This is either going to be an external function call with one
>


That doesn't sound enough, unless there is already code out there that 
respects this count.  1000 at 4 bytes per instruction is only 4k.   More 
that small enough for the rest of the compiler to think that it could 
jump around such blocks cheaply.

I think a limit of 1M or more might be more appropriate.

R.
Andi Kleen - Feb. 21, 2013, 3:59 p.m.
> That doesn't sound enough, unless there is already code out there
> that respects this count.  1000 at 4 bytes per instruction is only
> 4k.   More that small enough for the rest of the compiler to think
> that it could jump around such blocks cheaply.
> 
> I think a limit of 1M or more might be more appropriate.

I got an overflow for 6.4M, so 1M would be dangerously near that.
100k perhaps ? 

This was not for jump shortening, but the inliner heuristics.

In the worst case we could separate the two, would be a larger
patch though.

-Andi
Andi Kleen - Feb. 21, 2013, 5:02 p.m.
> This was not for jump shortening, but the inliner heuristics.
> 
> In the worst case we could separate the two, would be a larger
> patch though.

Actually it's already separated, I don't affect the jump shortening
at all. Only the inliner code adds the limit.

So it would depend whether 1000 * weight is large enough for inlining.
Honza?

-Andi
Jan Hubicka - Feb. 21, 2013, 5:46 p.m.
> > This was not for jump shortening, but the inliner heuristics.
> > 
> > In the worst case we could separate the two, would be a larger
> > patch though.
> 
> Actually it's already separated, I don't affect the jump shortening
> at all. Only the inliner code adds the limit.
> 
> So it would depend whether 1000 * weight is large enough for inlining.
> Honza?

I think it is safely more than enough. 1000*weight makes the function safely
uninlinable. So only it cuts of precision in large function growth/large unit
growth logic.  The logic is skewed here: large function growth is there really
for compiler visible code to prevent non-linear algorithm explosion.  For large
unit growth I duno - in a way those gigantic asm statements that are not going
to be duplicated can be tought as not part of the unit for this putpose, too.

So I am fine with the cutoff.  We may need to add more overflow guards (we
already have quite few for time) that makes me wonder if all this should not be
done all in saturating arithmetic now when it can be done theoretically with one
C++ class?

Honza
> 
> -Andi
Richard Earnshaw - Feb. 21, 2013, 6:19 p.m.
On 21/02/13 15:59, Andi Kleen wrote:
>> That doesn't sound enough, unless there is already code out there
>> that respects this count.  1000 at 4 bytes per instruction is only
>> 4k.   More that small enough for the rest of the compiler to think
>> that it could jump around such blocks cheaply.
>>
>> I think a limit of 1M or more might be more appropriate.
>
> I got an overflow for 6.4M, so 1M would be dangerously near that.
> 100k perhaps ?
>
> This was not for jump shortening, but the inliner heuristics.
>

My mistake.  However, the real problem here is that you're summing a 
potentially large number of items on the expectation that the result 
will be small enough to be scaled successfully by the weight function.

A better patch would be to a apply a much larger limit at the time when 
the count of instructions is finally used, rather than picking some 
arbitrary number here in the hope that it won't cause overflow later on.

Yes, that's a bigger patch, but it would be a more robust solution than 
picking an arbitrary number here.

> In the worst case we could separate the two, would be a larger
> patch though.
>
> -Andi
>
Andi Kleen - Feb. 21, 2013, 6:19 p.m.
> So I am fine with the cutoff.  We may need to add more overflow guards (we
> already have quite few for time) that makes me wonder if all this should not be
> done all in saturating arithmetic now when it can be done theoretically with one
> C++ class?

Sounds like a good idea, although I don't know how intrusive that is. But can
try. 

Still would like to use the 1000 for the 4.7 backport. Ok?

-Andi
Andi Kleen - Feb. 22, 2013, 3:32 a.m.
On Thu, Feb 21, 2013 at 07:19:10PM +0100, Andi Kleen wrote:
> > So I am fine with the cutoff.  We may need to add more overflow guards (we
> > already have quite few for time) that makes me wonder if all this should not be
> > done all in saturating arithmetic now when it can be done theoretically with one
> > C++ class?
> 
> Sounds like a good idea, although I don't know how intrusive that is. But can
> try. 

I looked at this now and it would be very intrusive. Not only in
tree-inline,  but also various callers elsewhere. And there's no
saturating class either, so that would need to be added too.

I think I prefer the simple solution.

BTW this is a regression, old gccs compiled this.

-Andi
Andi Kleen - Sept. 8, 2013, 7:44 p.m.
Andi Kleen <andi@firstfloor.org> writes:

Ping!

This problem is still open, and afaik no better solution has been
proposed. It is also a regression.

Is it ok to commit if I rerun the tests and they pass?

> An auto generated program with a 6.4mio line asm statement gave
> with 4.7 and 4.8:
>
> xxx.c:6400017:1: internal compiler error: in account_size_time, at
> ipa-inline-analysis.c:601
>
> The problem is that the inliner counts the number of lines in the asm
> statement and multiplies that with a weight. With the weight this
> overflows 32bit signed int, and triggers an assert for negative time.
>
> Fix this by limiting the number of lines to 1000 for asm cost
> estimation. The RTL backend also does similar multiplications for
> jump shortening. I haven't tried to address this, but presumably
> it's less likely to result in a failure.
>
> Passes test suite on x86_64-linux.
>
> Ok for 4.7 and 4.8?
>
> 2013-02-17  Andi Kleen  <ak@linux.intel.com>
>
> 	* tree-inline.c (estimate_num_insns): Limit asm cost to 1000.
>
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index 2a1b692..7f8f2f2 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -3595,7 +3595,14 @@ estimate_num_insns (gimple stmt, eni_weights *weights)
>        return 0;
>  
>      case GIMPLE_ASM:
> -      return asm_str_count (gimple_asm_string (stmt));
> +      {
> +	int count = asm_str_count (gimple_asm_string (stmt));
> +	/* 1000 means infinity. This avoids overflows later
> +	   with very long asm statements.  */
> +	if (count > 1000)
> +	  count = 1000;
> +	return count;
> +      }
>  
>      case GIMPLE_RESX:
>        /* This is either going to be an external function call with one
Jan Hubicka - Sept. 8, 2013, 9:05 p.m.
> Andi Kleen <andi@firstfloor.org> writes:
> 
> Ping!
> 
> This problem is still open, and afaik no better solution has been
> proposed. It is also a regression.
> 
> Is it ok to commit if I rerun the tests and they pass?

OK.
Honza

Patch

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 2a1b692..7f8f2f2 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3595,7 +3595,14 @@  estimate_num_insns (gimple stmt, eni_weights *weights)
       return 0;
 
     case GIMPLE_ASM:
-      return asm_str_count (gimple_asm_string (stmt));
+      {
+	int count = asm_str_count (gimple_asm_string (stmt));
+	/* 1000 means infinity. This avoids overflows later
+	   with very long asm statements.  */
+	if (count > 1000)
+	  count = 1000;
+	return count;
+      }
 
     case GIMPLE_RESX:
       /* This is either going to be an external function call with one