Patchwork useless cast blocking some optimization in gcc 4.7.3

login
register
mail settings
Submitter Laurent Alfonsi
Date April 8, 2013, 7:13 p.m.
Message ID <516316EF.6080806@st.com>
Download mbox | patch
Permalink /patch/234868/
State New
Headers show

Comments

Laurent Alfonsi - April 8, 2013, 7:13 p.m.
Hello,

I have identified a big performance regression between 4.6 and 4.7. (I 
have enclosed a pathological test).

After investigation, it is because of the += statement applied on 2 
signed chars.
   - It is now type-promoted to "int" when it is written "result += 
foo()".    (since 4.7)
   - it is type promoted to "unsigned char" when it is written "result = 
result + foo()".

The "char->int->char" cast is blocking some optimizations in later phases.
Anyway, this doesn't look wrong, so I extended fold optimization in 
order to catch this case. (patch enclosed)
The patch basically transforms :
     (TypeA)  ( (TypeB) a1 + (TypeB) a2 )                /* with a1 and 
a2 of the signed type TypeA */
into :
     a1 + a2

I believe this is legal for any licit a1/a2 input values (no overflow on 
signed char).
No new failure on the two tested targets : sh-superh-elf and 
x86_64-unknown-linux-gnu.
Should I enter a bugzilla to track this ? Is it ok for trunk ?

2013-04-08  Laurent Alfonsi  <laurent.alfonsi@st.com>

        * fold-const.c (fold_unary_loc): Suppress useless type promotion.


Thanks,
Laurent
Marc Glisse - April 8, 2013, 7:39 p.m.
Hello,

On Mon, 8 Apr 2013, Laurent Alfonsi wrote:

> I have identified a big performance regression between 4.6 and 4.7. (I have 
> enclosed a pathological test).
>
> After investigation, it is because of the += statement applied on 2 signed 
> chars.
>  - It is now type-promoted to "int" when it is written "result += foo()". 
> (since 4.7)
>  - it is type promoted to "unsigned char" when it is written "result = 
> result + foo()".
>
> The "char->int->char" cast is blocking some optimizations in later phases.

Which ones?

> Anyway, this doesn't look wrong, so I extended fold optimization in order to 
> catch this case. (patch enclosed)
> The patch basically transforms :
>    (TypeA)  ( (TypeB) a1 + (TypeB) a2 )                /* with a1 and a2 of 
> the signed type TypeA */
> into :
>    a1 + a2
>
> I believe this is legal for any licit a1/a2 input values (no overflow on 
> signed char).

I don't think this is ok, please refer to the discussion around the 
PR and patch that added this conversion, it was done on purpose.

According to this (4th item)
http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html

char a=100;
a+=a;

is perfectly defined and a is -56 (assuming a signed 8 bit char and a 
strictly larger int). However, your transformation turns it into undefined 
behavior: an addition that overflows in a type with 
TYPE_OVERFLOW_UNDEFINED.
Richard Guenther - April 9, 2013, 8:50 a.m.
On Mon, Apr 8, 2013 at 9:13 PM, Laurent Alfonsi <laurent.alfonsi@st.com> wrote:
>     Hello,
>
> I have identified a big performance regression between 4.6 and 4.7. (I have
> enclosed a pathological test).
>
> After investigation, it is because of the += statement applied on 2 signed
> chars.
>   - It is now type-promoted to "int" when it is written "result += foo()".
> (since 4.7)
>   - it is type promoted to "unsigned char" when it is written "result =
> result + foo()".
>
> The "char->int->char" cast is blocking some optimizations in later phases.
> Anyway, this doesn't look wrong, so I extended fold optimization in order to
> catch this case. (patch enclosed)
> The patch basically transforms :
>     (TypeA)  ( (TypeB) a1 + (TypeB) a2 )                /* with a1 and a2 of
> the signed type TypeA */
> into :
>     a1 + a2
>
> I believe this is legal for any licit a1/a2 input values (no overflow on
> signed char).
> No new failure on the two tested targets : sh-superh-elf and
> x86_64-unknown-linux-gnu.
> Should I enter a bugzilla to track this ? Is it ok for trunk ?

Please open a bugzilla.  No, the patch is not ok, as said by Marc.
It's possible to shorten the operation by performing it using
unsigned arithmetic, that is (signed)((unsigned)a1 + (unsigned)a2).
But if really "casts" cause the issue you are seeing that will not
help you.

Richard.

> 2013-04-08  Laurent Alfonsi  <laurent.alfonsi@st.com>
>
>        * fold-const.c (fold_unary_loc): Suppress useless type promotion.
>
>
> Thanks,
> Laurent
>
Laurent Alfonsi - April 9, 2013, 3:31 p.m.
I understand, Thanks for your answer.
Looking at the standard, I was thinking the example you pointed was 
undefined.
I have created a bugzilla 
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56894) to track this big 
performance regression.

My investigations pointed on the scev_const_prop optimization.
The scalar evolution of "result" variable in the loop cannot be computed 
because of the cast :

     type <integer_type 0xf7de32a0 char sizes-gimplified public string-flag
type_6 QI[...]>
     arg 0 <polynomial_chrec 0xf7ecd180
         type <integer_type 0xf7de33c0 int sizes-gimplified public type_6
SI[...]>
         arg 0 <integer_cst 0xf7dcbe70 constant 2>
         arg 1 <integer_cst 0xf7dcba80 constant 0>
         arg 2 <integer_cst 0xf7dcba9c constant 1>>>

I don't know how to fix the case. But I believe it should be looked at, 
as it is a quite big perf regression in some testcases.
please advice.

Also, I didn't find the bugzilla that led to this change of the += 
operation. I would be interested to have a look at it, if you can find it.

Thanks
Laurent

On 09.04.2013 10:50, Richard Biener wrote:
> On Mon, Apr 8, 2013 at 9:13 PM, Laurent Alfonsi <laurent.alfonsi@st.com> wrote:
>>      Hello,
>>
>> I have identified a big performance regression between 4.6 and 4.7. (I have
>> enclosed a pathological test).
>>
>> After investigation, it is because of the += statement applied on 2 signed
>> chars.
>>    - It is now type-promoted to "int" when it is written "result += foo()".
>> (since 4.7)
>>    - it is type promoted to "unsigned char" when it is written "result =
>> result + foo()".
>>
>> The "char->int->char" cast is blocking some optimizations in later phases.
>> Anyway, this doesn't look wrong, so I extended fold optimization in order to
>> catch this case. (patch enclosed)
>> The patch basically transforms :
>>      (TypeA)  ( (TypeB) a1 + (TypeB) a2 )                /* with a1 and a2 of
>> the signed type TypeA */
>> into :
>>      a1 + a2
>>
>> I believe this is legal for any licit a1/a2 input values (no overflow on
>> signed char).
>> No new failure on the two tested targets : sh-superh-elf and
>> x86_64-unknown-linux-gnu.
>> Should I enter a bugzilla to track this ? Is it ok for trunk ?
> Please open a bugzilla.  No, the patch is not ok, as said by Marc.
> It's possible to shorten the operation by performing it using
> unsigned arithmetic, that is (signed)((unsigned)a1 + (unsigned)a2).
> But if really "casts" cause the issue you are seeing that will not
> help you.
>
> Richard.
>
>> 2013-04-08  Laurent Alfonsi  <laurent.alfonsi@st.com>
>>
>>         * fold-const.c (fold_unary_loc): Suppress useless type promotion.
>>
>>
>> Thanks,
>> Laurent
>>
> .
>

Patch

--- ./gcc.orig/gcc/fold-const.c	2013-04-08 14:09:32.000000000 +0200
+++ ./gcc/gcc/fold-const.c	2013-04-08 11:08:16.000000000 +0200
@@ -8055,6 +8055,26 @@ 
 	    }
 	}
 
+      /* Convert (T1) ((T2)X + (T2)Y) into X + Y, 
+         if X and Y already have type T1 (integral only), and T2 > T1 */
+      if (INTEGRAL_TYPE_P (type)
+          && TYPE_OVERFLOW_UNDEFINED (type)
+	  && (TREE_CODE (op0) == PLUS_EXPR || TREE_CODE (op0) == MINUS_EXPR
+	     || TREE_CODE (op0) == MULT_EXPR)
+	  && TREE_CODE (TREE_OPERAND (op0, 0)) == NOP_EXPR
+	  && TREE_CODE (TREE_OPERAND (op0, 1)) == NOP_EXPR
+	  && type == TREE_TYPE (TREE_OPERAND (TREE_OPERAND (op0, 0), 0))
+	  && type == TREE_TYPE (TREE_OPERAND (TREE_OPERAND (op0, 1), 0))
+	  && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (op0)))
+	{
+	  tem = fold_build2_loc (loc, TREE_CODE (op0), type,
+	    		     fold_convert_loc (loc, type,
+	    				       TREE_OPERAND (op0, 0)),
+	    		     fold_convert_loc (loc, type,
+	    				       TREE_OPERAND (op0, 1)));
+	  return fold_convert_loc (loc, type, tem);
+	}
+
       tem = fold_convert_const (code, type, op0);
       return tem ? tem : NULL_TREE;