Message ID | 4D00FD55.8050205@codesourcery.com |
---|---|
State | New |
Headers | show |
Andrew Stubbs <ams@codesourcery.com> writes: > The attached patch adjusts the ARM machine description to prevent GCC > emitting redundant zero-extends before 16-bit->32-bit multiply and > accumulate operations. > > The problem is that maddhisi4 pattern has the operands of plus swapped, > with respect to the (de facto?) canonical form always returned by > combine_simplify_rtx. This means that recog does not match the pattern, > and the zero-extends are not optimized away. > > The patch simply swaps the order of the operands. maddhidi4 appears to > have a similar problem, so I've corrected it there also. > > Test case: > > int footrunc (int x, int a, int b) > { > return x + (short) a * (short) b; > } > > Before, compiled with "-O2": > > mov ip, r1, asl #16 > mov r3, r2, asl #16 > mov r1, ip, lsr #16 > mov r2, r3, lsr #16 > smlabb r0, r2, r1, r0 > bx lr > > (On armv7a/thumb2 the code uses uxth, but the problem is the same.) > > After: > smlabb r0, r1, r2, r0 > bx lr > > OK for commit, after stage 1 opens again? Could this go in during stage 3? Like you say, the current maddhisi4 expander (which is known to optabs) generates "noncanonical" rtx. That seems like a bug. Richard
Richard Sandiford <richard.sandiford@linaro.org> writes: > Could this go in during stage 3? Like you say, the current maddhisi4 > expander (which is known to optabs) generates "noncanonical" rtx. > That seems like a bug. Sorry to follow up on myself, but to be more specific: as Paolo said recently, "noncanonical rtx" in this type of situation really means "invalid rtx", so having optabs generate what is effectively invalid rtx seems like a correctness bug. Richard
On Thu, 2010-12-09 at 16:01 +0000, Andrew Stubbs wrote: > The attached patch adjusts the ARM machine description to prevent GCC > emitting redundant zero-extends before 16-bit->32-bit multiply and > accumulate operations. > > The problem is that maddhisi4 pattern has the operands of plus swapped, > with respect to the (de facto?) canonical form always returned by > combine_simplify_rtx. This means that recog does not match the pattern, > and the zero-extends are not optimized away. > > The patch simply swaps the order of the operands. maddhidi4 appears to > have a similar problem, so I've corrected it there also. > > Test case: > > int footrunc (int x, int a, int b) > { > return x + (short) a * (short) b; > } > > Before, compiled with "-O2": > > mov ip, r1, asl #16 > mov r3, r2, asl #16 > mov r1, ip, lsr #16 > mov r2, r3, lsr #16 > smlabb r0, r2, r1, r0 > bx lr > > (On armv7a/thumb2 the code uses uxth, but the problem is the same.) > > After: > smlabb r0, r1, r2, r0 > bx lr > > OK for commit, after stage 1 opens again? > > Andrew This is ok to go in now. While you're modifying these patterns, the % in the register constraint does not do anything useful (and actually makes the compiler slightly slower) since the constraints and predicates are otherwise identical, so it might as well be removed. R.
2010-12-09 Andrew Stubbs <ams@codesourcery.com> gcc/ * config/arm/arm.md (maddhisi4, *maddhidi4): Use the canonical operand order for plus. --- src/gcc-mainline/gcc/config/arm/arm.md | 10 +++++----- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/gcc-mainline/gcc/config/arm/arm.md b/src/gcc-mainline/gcc/config/arm/arm.md index 8bc9926..c816126 100644 --- a/src/gcc-mainline/gcc/config/arm/arm.md +++ b/src/gcc-mainline/gcc/config/arm/arm.md @@ -1800,11 +1800,11 @@ (define_insn "maddhisi4" [(set (match_operand:SI 0 "s_register_operand" "=r") - (plus:SI (match_operand:SI 3 "s_register_operand" "r") - (mult:SI (sign_extend:SI + (plus:SI (mult:SI (sign_extend:SI (match_operand:HI 1 "s_register_operand" "%r")) (sign_extend:SI - (match_operand:HI 2 "s_register_operand" "r")))))] + (match_operand:HI 2 "s_register_operand" "r"))) + (match_operand:SI 3 "s_register_operand" "r")))] "TARGET_DSP_MULTIPLY" "smlabb%?\\t%0, %1, %2, %3" [(set_attr "insn" "smlaxy") @@ -1814,11 +1814,11 @@ (define_insn "*maddhidi4" [(set (match_operand:DI 0 "s_register_operand" "=r") (plus:DI - (match_operand:DI 3 "s_register_operand" "0") (mult:DI (sign_extend:DI (match_operand:HI 1 "s_register_operand" "%r")) (sign_extend:DI - (match_operand:HI 2 "s_register_operand" "r")))))] + (match_operand:HI 2 "s_register_operand" "r"))) + (match_operand:DI 3 "s_register_operand" "0")))] "TARGET_DSP_MULTIPLY" "smlalbb%?\\t%Q0, %R0, %1, %2" [(set_attr "insn" "smlalxy")