
[ARM] Don't generate redundant zero_extend before smlabb

Message ID 4D00FD55.8050205@codesourcery.com
State New

Commit Message

Andrew Stubbs Dec. 9, 2010, 4:01 p.m. UTC
The attached patch adjusts the ARM machine description to prevent GCC 
emitting redundant zero-extends before 16-bit->32-bit multiply and 
accumulate operations.

The problem is that the maddhisi4 pattern has the operands of plus swapped 
with respect to the (de facto?) canonical form always returned by 
combine_simplify_rtx. This means that recog does not match the pattern, 
and the zero-extends are not optimized away.
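
Roughly speaking, the insn that combine asks recog to match has this shape 
(a sketch of the RTL, not an exact dump):

    (set (reg:SI x)
         (plus:SI (mult:SI (sign_extend:SI (reg:HI a))
                           (sign_extend:SI (reg:HI b)))
                  (reg:SI acc)))

i.e. with the mult as the first operand of the plus, whereas the current 
pattern expects the bare register there, so the match fails.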

The patch simply swaps the order of the operands. maddhidi4 appears to 
have a similar problem, so I've corrected it there also.

Test case:

int footrunc (int x, int a, int b)
{
      return x + (short) a * (short) b;
}
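
(For reference, output along these lines can be reproduced with something 
like the command below; the file name and -march flag are illustrative, 
any DSP-capable target will do:)

    arm-none-eabi-gcc -O2 -march=armv5te -S footrunc.c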

Before, compiled with "-O2":

          mov     ip, r1, asl #16
          mov     r3, r2, asl #16
          mov     r1, ip, lsr #16
          mov     r2, r3, lsr #16
          smlabb  r0, r2, r1, r0
          bx      lr

(On armv7a/thumb2 the code uses uxth, but the problem is the same.)

After:
          smlabb  r0, r1, r2, r0
          bx      lr

OK for commit, after stage 1 opens again?

Andrew

Comments

Richard Sandiford Dec. 15, 2010, 9:37 a.m. UTC | #1
Andrew Stubbs <ams@codesourcery.com> writes:
> The attached patch adjusts the ARM machine description to prevent GCC 
> emitting redundant zero-extends before 16-bit->32-bit multiply and 
> accumulate operations.
>
> The problem is that the maddhisi4 pattern has the operands of plus swapped 
> with respect to the (de facto?) canonical form always returned by 
> combine_simplify_rtx. This means that recog does not match the pattern, 
> and the zero-extends are not optimized away.
>
> The patch simply swaps the order of the operands. maddhidi4 appears to 
> have a similar problem, so I've corrected it there also.
>
> Test case:
>
> int footrunc (int x, int a, int b)
> {
>       return x + (short) a * (short) b;
> }
>
> Before, compiled with "-O2":
>
>           mov     ip, r1, asl #16
>           mov     r3, r2, asl #16
>           mov     r1, ip, lsr #16
>           mov     r2, r3, lsr #16
>           smlabb  r0, r2, r1, r0
>           bx      lr
>
> (On armv7a/thumb2 the code uses uxth, but the problem is the same.)
>
> After:
>           smlabb  r0, r1, r2, r0
>           bx      lr
>
> OK for commit, after stage 1 opens again?

Could this go in during stage 3?  Like you say, the current maddhisi4
expander (which is known to optabs) generates "noncanonical" rtx.
That seems like a bug.

Richard
Richard Sandiford Dec. 15, 2010, 9:44 a.m. UTC | #2
Richard Sandiford <richard.sandiford@linaro.org> writes:
> Could this go in during stage 3?  Like you say, the current maddhisi4
> expander (which is known to optabs) generates "noncanonical" rtx.
> That seems like a bug.

Sorry to follow up on myself, but to be more specific: as Paolo said
recently, "noncanonical rtx" in this type of situation really means
"invalid rtx", so having optabs generate what is effectively invalid
rtx seems like a correctness bug.

Richard
Richard Earnshaw Dec. 16, 2010, 6:26 p.m. UTC | #3
On Thu, 2010-12-09 at 16:01 +0000, Andrew Stubbs wrote:
> The attached patch adjusts the ARM machine description to prevent GCC 
> emitting redundant zero-extends before 16-bit->32-bit multiply and 
> accumulate operations.
> 
> The problem is that the maddhisi4 pattern has the operands of plus swapped 
> with respect to the (de facto?) canonical form always returned by 
> combine_simplify_rtx. This means that recog does not match the pattern, 
> and the zero-extends are not optimized away.
> 
> The patch simply swaps the order of the operands. maddhidi4 appears to 
> have a similar problem, so I've corrected it there also.
> 
> Test case:
> 
> int footrunc (int x, int a, int b)
> {
>       return x + (short) a * (short) b;
> }
> 
> Before, compiled with "-O2":
> 
>           mov     ip, r1, asl #16
>           mov     r3, r2, asl #16
>           mov     r1, ip, lsr #16
>           mov     r2, r3, lsr #16
>           smlabb  r0, r2, r1, r0
>           bx      lr
> 
> (On armv7a/thumb2 the code uses uxth, but the problem is the same.)
> 
> After:
>           smlabb  r0, r1, r2, r0
>           bx      lr
> 
> OK for commit, after stage 1 opens again?
> 
> Andrew

This is ok to go in now.  While you're modifying these patterns, the %
in the register constraint does not do anything useful (and actually
makes the compiler slightly slower) since the constraints and predicates
are otherwise identical, so it might as well be removed.
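
Concretely, that would just mean something like this in both patterns 
(sketch only):

	(match_operand:HI 1 "s_register_operand" "r")

in place of the current "%r".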

R.

Patch

2010-12-09  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* config/arm/arm.md (maddhisi4, *maddhidi4): Use the canonical
	operand order for plus.

---
 src/gcc-mainline/gcc/config/arm/arm.md |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/gcc-mainline/gcc/config/arm/arm.md b/src/gcc-mainline/gcc/config/arm/arm.md
index 8bc9926..c816126 100644
--- a/src/gcc-mainline/gcc/config/arm/arm.md
+++ b/src/gcc-mainline/gcc/config/arm/arm.md
@@ -1800,11 +1800,11 @@ 
 
 (define_insn "maddhisi4"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
-	(plus:SI (match_operand:SI 3 "s_register_operand" "r")
-		 (mult:SI (sign_extend:SI
+	(plus:SI (mult:SI (sign_extend:SI
 			   (match_operand:HI 1 "s_register_operand" "%r"))
 			  (sign_extend:SI
-			   (match_operand:HI 2 "s_register_operand" "r")))))]
+			   (match_operand:HI 2 "s_register_operand" "r")))
+		 (match_operand:SI 3 "s_register_operand" "r")))]
   "TARGET_DSP_MULTIPLY"
   "smlabb%?\\t%0, %1, %2, %3"
   [(set_attr "insn" "smlaxy")
@@ -1814,11 +1814,11 @@ 
 (define_insn "*maddhidi4"
   [(set (match_operand:DI 0 "s_register_operand" "=r")
 	(plus:DI
-	  (match_operand:DI 3 "s_register_operand" "0")
 	  (mult:DI (sign_extend:DI
 	 	    (match_operand:HI 1 "s_register_operand" "%r"))
 		   (sign_extend:DI
-		    (match_operand:HI 2 "s_register_operand" "r")))))]
+		    (match_operand:HI 2 "s_register_operand" "r")))
+	  (match_operand:DI 3 "s_register_operand" "0")))]
   "TARGET_DSP_MULTIPLY"
   "smlalbb%?\\t%Q0, %R0, %1, %2"
   [(set_attr "insn" "smlalxy")