diff mbox

[AArch64] Split X-reg UBFX into W-reg LSR when possible

Message ID 5849294C.6060109@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov Dec. 8, 2016, 9:35 a.m. UTC
Hi all,

In this patch we split X-register UBFX instructions that extract up to the edge of a W-register into
a W-register LSR instruction. So for the example in the testcase instead of:
UBFX    X0, X0, 24, 8

we'd generate:
LSR     w0, w0, 24

An LSR is a simpler instruction and there's a higher chance that it can be combined with other instructions.

To do this the patch separates the sign_extract case from the zero_extract case in the *<optab><mode> ANY_EXTRACT
pattern and further splits the SImode/DImode patterns from the resulting zrero_extract pattern.
The DImode zero_extract pattern then becomes a define_insn_and_split that splits into a zero_extend of an lshiftrt
when the bitposition and width of the zero_extract add up to 32.

Bootstrapped and tested on aarch64-none-linux-gnu.

Since we're in stage 3 perhaps this is not for GCC 6, but it is fairly low risk.
I'm happy for it to wait for the next release if necessary.

Thanks,
Kyrill

2016-12-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md (*<optab><mode>): Split into...
     (*extv<mode>): ...This...
     (*extzvsi): ...This...
     (*extzvdi:): ... And this.  Add splitting to lshiftrt when possible.

2016-12-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/ubfx_lsr_1.c: New test.

Comments

James Greenhalgh Dec. 15, 2016, 11:53 a.m. UTC | #1
On Thu, Dec 08, 2016 at 09:35:08AM +0000, Kyrill Tkachov wrote:
> Hi all,
> 
> In this patch we split X-register UBFX instructions that extract up to the
> edge of a W-register into a W-register LSR instruction. So for the example in
> the testcase instead of:

> UBFX    X0, X0, 24, 8
> 
> we'd generate:
> LSR     w0, w0, 24
> 
> An LSR is a simpler instruction and there's a higher chance that it can be
> combined with other instructions.
> 
> To do this the patch separates the sign_extract case from the zero_extract
> case in the *<optab><mode> ANY_EXTRACT pattern and further splits the
> SImode/DImode patterns from the resulting zrero_extract pattern.
> The DImode zero_extract pattern then becomes a define_insn_and_split that
> splits into a zero_extend of an lshiftrt when the bitposition and width of
> the zero_extract add up to 32.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Since we're in stage 3 perhaps this is not for GCC 6, but it is fairly low
> risk.  I'm happy for it to wait for the next release if necessary.

I'm OK with the idea, and I'm OK taking this in for Stage 3, but I'm not
convinced by the implementation.

> 2016-12-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * config/aarch64/aarch64.md (*<optab><mode>): Split into...
>     (*extv<mode>): ...This...
>     (*extzvsi): ...This...
>     (*extzvdi:): ... And this.  Add splitting to lshiftrt when possible.

Why do we want to to it this way, rather than simply defining a single
"split" which works in the case you're trying to catch.

i.e. (untested)

(define_split
   [(set (match_operand:DI 0 "register_operand")
	(zero_extract:DI (match_operand:DI 1 "register_operand")
			 (match_operand 2
			   "aarch64_simd_shift_imm_offset_di")
			 (match_operand 3
			   "aarch64_simd_shift_imm_di")))]
  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
	     1, GET_MODE_BITSIZE (DImode) - 1)
   && (INTVAL (operands[2]) + INTVAL (operands[3]))
	== GET_MODE_BITSIZE (SImode)"
  [(set (match_dup 0)
	(zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
  {
    operands[4] = gen_lowpart (SImode, operands[1]);
  }
)

Thanks,
James

> 
> 2016-12-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * gcc.target/aarch64/ubfx_lsr_1.c: New test.

> diff --git a/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c b/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..bc083862976a88190dbef97a247be8a10b277a12
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +/* Check that an X-reg UBFX can be simplified into a W-reg LSR.  */
> +
> +int
> +f (unsigned long long x)
> +{
> +   x = (x >> 24) & 255;
> +   return x + 1;
> +}
> +
> +/* { dg-final { scan-assembler "lsr\tw" } } */
> +/* { dg-final { scan-assembler-not "ubfx\tx" } } */
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6b4d0ba633af2a549ded2f18962d9ed300f56e12..a6f659c26bb5156d652b6c1f09123e682e9ff648 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4327,16 +4327,51 @@  (define_expand "<optab>"
 )
 
 
-(define_insn "*<optab><mode>"
+(define_insn "*extv<mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand" "r")
+	(sign_extract:GPI (match_operand:GPI 1 "register_operand" "r")
 			 (match_operand 2
 			   "aarch64_simd_shift_imm_offset_<mode>" "n")
 			 (match_operand 3
 			   "aarch64_simd_shift_imm_<mode>" "n")))]
   "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
 	     1, GET_MODE_BITSIZE (<MODE>mode) - 1)"
-  "<su>bfx\\t%<w>0, %<w>1, %3, %2"
+  "sbfx\\t%<w>0, %<w>1, %3, %2"
+  [(set_attr "type" "bfx")]
+)
+
+(define_insn "*extzvsi"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(zero_extract:SI (match_operand:SI 1 "register_operand" "r")
+			 (match_operand 2
+			   "aarch64_simd_shift_imm_offset_si" "n")
+			 (match_operand 3
+			   "aarch64_simd_shift_imm_si" "n")))]
+  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
+	     1, GET_MODE_BITSIZE (SImode) - 1)"
+  "ubfx\\t%w0, %w1, %3, %2"
+  [(set_attr "type" "bfx")]
+)
+
+(define_insn_and_split "*extzvdi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(zero_extract:DI (match_operand:DI 1 "register_operand" "r")
+			 (match_operand 2
+			   "aarch64_simd_shift_imm_offset_di" "n")
+			 (match_operand 3
+			   "aarch64_simd_shift_imm_di" "n")))]
+  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
+	     1, GET_MODE_BITSIZE (DImode) - 1)"
+  "ubfx\\t%x0, %x1, %3, %2"
+  ;; When the bitposition and width add up to 32 we can use a W-reg LSR
+  ;; instruction taking advantage of the implicit zero-extension of the X-reg.
+  "&& (INTVAL (operands[2]) + INTVAL (operands[3]))
+	== GET_MODE_BITSIZE (SImode)"
+  [(set (match_dup 0)
+	(zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
+  {
+    operands[4] = gen_lowpart (SImode, operands[1]);
+  }
   [(set_attr "type" "bfx")]
 )
 
diff --git a/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c b/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..bc083862976a88190dbef97a247be8a10b277a12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Check that an X-reg UBFX can be simplified into a W-reg LSR.  */
+
+int
+f (unsigned long long x)
+{
+   x = (x >> 24) & 255;
+   return x + 1;
+}
+
+/* { dg-final { scan-assembler "lsr\tw" } } */
+/* { dg-final { scan-assembler-not "ubfx\tx" } } */