Message ID | 516316EF.6080806@st.com |
---|---|
State | New |
Headers | show |
Hello, On Mon, 8 Apr 2013, Laurent Alfonsi wrote: > I have identified a big performance regression between 4.6 and 4.7. (I have > enclosed a pathological test). > > After investigation, it is because of the += statement applied on 2 signed > chars. > - It is now type-promoted to "int" when it is written "result += foo()". > (since 4.7) > - it is type promoted to "unsigned char" when it is written "result = > result + foo()". > > The "char->int->char" cast is blocking some optimizations in later phases. Which ones? > Anyway, this doesn't look wrong, so I extended fold optimization in order to > catch this case. (patch enclosed) > The patch basically transforms : > (TypeA) ( (TypeB) a1 + (TypeB) a2 ) /* with a1 and a2 of > the signed type TypeA */ > into : > a1 + a2 > > I believe this is legal for any licit a1/a2 input values (no overflow on > signed char). I don't think this is ok, please refer to the discussion around the PR and patch that added this conversion, it was done on purpose. According to this (4th item) http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html char a=100; a+=a; is perfectly defined and a is -56 (assuming a signed 8 bit char and a strictly larger int). However, your transformation turns it into undefined behavior: an addition that overflows in a type with TYPE_OVERFLOW_UNDEFINED.
On Mon, Apr 8, 2013 at 9:13 PM, Laurent Alfonsi <laurent.alfonsi@st.com> wrote: > Hello, > > I have identified a big performance regression between 4.6 and 4.7. (I have > enclosed a pathological test). > > After investigation, it is because of the += statement applied on 2 signed > chars. > - It is now type-promoted to "int" when it is written "result += foo()". > (since 4.7) > - it is type promoted to "unsigned char" when it is written "result = > result + foo()". > > The "char->int->char" cast is blocking some optimizations in later phases. > Anyway, this doesn't look wrong, so I extended fold optimization in order to > catch this case. (patch enclosed) > The patch basically transforms : > (TypeA) ( (TypeB) a1 + (TypeB) a2 ) /* with a1 and a2 of > the signed type TypeA */ > into : > a1 + a2 > > I believe this is legal for any licit a1/a2 input values (no overflow on > signed char). > No new failure on the two tested targets : sh-superh-elf and > x86_64-unknown-linux-gnu. > Should I enter a bugzilla to track this ? Is it ok for trunk ? Please open a bugzilla. No, the patch is not ok, as said by Marc. It's possible to shorten the operation by performing it using unsigned arithmetic, that is (signed)((unsigned)a1 + (unsigned)a2). But if really "casts" cause the issue you are seeing that will not help you. Richard. > 2013-04-08 Laurent Alfonsi <laurent.alfonsi@st.com> > > * fold-const.c (fold_unary_loc): Suppress useless type promotion. > > > Thanks, > Laurent >
I understand, Thanks for your answer. Looking at the standard, I was thinking the example you pointed was undefined. I have created a bugzilla (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56894) to track this big performance regression. My investigations pointed on the scev_const_prop optimization. The scalar evolution of "result" variable in the loop cannot be computed because of the cast : type <integer_type 0xf7de32a0 char sizes-gimplified public string-flag type_6 QI[...]> arg 0 <polynomial_chrec 0xf7ecd180 type <integer_type 0xf7de33c0 int sizes-gimplified public type_6 SI[...]> arg 0 <integer_cst 0xf7dcbe70 constant 2> arg 1 <integer_cst 0xf7dcba80 constant 0> arg 2 <integer_cst 0xf7dcba9c constant 1>>> I don't know how to fix the case. But I believe it should be looked at, as it is a quite big perf regression in some testcases. please advice. Also, I didn't find the bugzilla that led to this change of the += operation. I would be interested to have a look at it, if you can find it. Thanks Laurent On 09.04.2013 10:50, Richard Biener wrote: > On Mon, Apr 8, 2013 at 9:13 PM, Laurent Alfonsi <laurent.alfonsi@st.com> wrote: >> Hello, >> >> I have identified a big performance regression between 4.6 and 4.7. (I have >> enclosed a pathological test). >> >> After investigation, it is because of the += statement applied on 2 signed >> chars. >> - It is now type-promoted to "int" when it is written "result += foo()". >> (since 4.7) >> - it is type promoted to "unsigned char" when it is written "result = >> result + foo()". >> >> The "char->int->char" cast is blocking some optimizations in later phases. >> Anyway, this doesn't look wrong, so I extended fold optimization in order to >> catch this case. (patch enclosed) >> The patch basically transforms : >> (TypeA) ( (TypeB) a1 + (TypeB) a2 ) /* with a1 and a2 of >> the signed type TypeA */ >> into : >> a1 + a2 >> >> I believe this is legal for any licit a1/a2 input values (no overflow on >> signed char). >> No new failure on the two tested targets : sh-superh-elf and >> x86_64-unknown-linux-gnu. >> Should I enter a bugzilla to track this ? Is it ok for trunk ? > Please open a bugzilla. No, the patch is not ok, as said by Marc. > It's possible to shorten the operation by performing it using > unsigned arithmetic, that is (signed)((unsigned)a1 + (unsigned)a2). > But if really "casts" cause the issue you are seeing that will not > help you. > > Richard. > >> 2013-04-08 Laurent Alfonsi <laurent.alfonsi@st.com> >> >> * fold-const.c (fold_unary_loc): Suppress useless type promotion. >> >> >> Thanks, >> Laurent >> > . >
--- ./gcc.orig/gcc/fold-const.c 2013-04-08 14:09:32.000000000 +0200 +++ ./gcc/gcc/fold-const.c 2013-04-08 11:08:16.000000000 +0200 @@ -8055,6 +8055,26 @@ } } + /* Convert (T1) ((T2)X + (T2)Y) into X + Y, + if X and Y already have type T1 (integral only), and T2 > T1 */ + if (INTEGRAL_TYPE_P (type) + && TYPE_OVERFLOW_UNDEFINED (type) + && (TREE_CODE (op0) == PLUS_EXPR || TREE_CODE (op0) == MINUS_EXPR + || TREE_CODE (op0) == MULT_EXPR) + && TREE_CODE (TREE_OPERAND (op0, 0)) == NOP_EXPR + && TREE_CODE (TREE_OPERAND (op0, 1)) == NOP_EXPR + && type == TREE_TYPE (TREE_OPERAND (TREE_OPERAND (op0, 0), 0)) + && type == TREE_TYPE (TREE_OPERAND (TREE_OPERAND (op0, 1), 0)) + && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (op0))) + { + tem = fold_build2_loc (loc, TREE_CODE (op0), type, + fold_convert_loc (loc, type, + TREE_OPERAND (op0, 0)), + fold_convert_loc (loc, type, + TREE_OPERAND (op0, 1))); + return fold_convert_loc (loc, type, tem); + } + tem = fold_convert_const (code, type, op0); return tem ? tem : NULL_TREE;