diff mbox series

aarch64: Fix several *<LOGICAL:optab>_ashl<mode>3 related regressions [PR100056]

Message ID 20210414072855.GP1179226@tucnak
State New
Headers show
Series aarch64: Fix several *<LOGICAL:optab>_ashl<mode>3 related regressions [PR100056] | expand

Commit Message

Jakub Jelinek April 14, 2021, 7:28 a.m. UTC
Hi!

Before combiner added 2 to 2 combinations, the following testcase functions
have been all compiled into 2 instructions, zero/sign extensions or and
followed by orr with lsl, e.g. for the first function
Trying 7 -> 8:
    7: r96:SI=r94:SI<<0xb
    8: r95:SI=r96:SI|r94:SI
      REG_DEAD r96:SI
      REG_DEAD r94:SI
Successfully matched this instruction:
(set (reg:SI 95)
    (ior:SI (ashift:SI (reg/v:SI 94 [ i ])
            (const_int 11 [0xb]))
        (reg/v:SI 94 [ i ])))
is the important successful try_combine and so we end up with
        and     w0, w0, 255
        orr     w0, w0, w0, lsl 11
in the body.
With 2 to 2 combination, before that can trigger, another successful
combination:
Trying 2 -> 7:
    2: r94:SI=zero_extend(x0:QI)
      REG_DEAD x0:QI
    7: r96:SI=r94:SI<<0xb
is replaced with:
(set (reg/v:SI 94 [ i ])
    (zero_extend:SI (reg:QI 0 x0 [ i ])))
and
(set (reg:SI 96)
    (and:SI (ashift:SI (reg:SI 0 x0 [ i ])
            (const_int 11 [0xb]))
        (const_int 522240 [0x7f800])))
and in the end results in 3 instructions in the body:
        and     w1, w0, 255
        ubfiz   w0, w0, 11, 8
        orr     w0, w0, w1
The following combine splitters help undo that when combiner tries to
combine 3 instructions - the zero/sign extend or and, the other insn
from the 2 to 2 combination ([us]bfiz) and the logical op, the CPUs
don't have an insn to do everything in one op, but we can split it
back into the zero/sign extend or and followed by logical with lsl.

Bootstrapped/regtested on aarch64-linux, ok for trunk?

2021-04-14  Jakub Jelinek  <jakub@redhat.com>

	PR target/100056
	* config/aarch64/aarch64.md (*<LOGICAL:optab>_<SHIFT:optab><mode>3):
	Add combine splitters for *<LOGICAL:optab>_ashl<mode>3 with
	ZERO_EXTEND, SIGN_EXTEND or AND.

	* gcc.target/aarch64/pr100056.c: New test.


	Jakub

Comments

Richard Sandiford April 14, 2021, 4:31 p.m. UTC | #1
Jakub Jelinek <jakub@redhat.com> writes:
> Hi!
>
> Before combiner added 2 to 2 combinations, the following testcase functions
> have been all compiled into 2 instructions, zero/sign extensions or and
> followed by orr with lsl, e.g. for the first function
> Trying 7 -> 8:
>     7: r96:SI=r94:SI<<0xb
>     8: r95:SI=r96:SI|r94:SI
>       REG_DEAD r96:SI
>       REG_DEAD r94:SI
> Successfully matched this instruction:
> (set (reg:SI 95)
>     (ior:SI (ashift:SI (reg/v:SI 94 [ i ])
>             (const_int 11 [0xb]))
>         (reg/v:SI 94 [ i ])))
> is the important successful try_combine and so we end up with
>         and     w0, w0, 255
>         orr     w0, w0, w0, lsl 11
> in the body.
> With 2 to 2 combination, before that can trigger, another successful
> combination:
> Trying 2 -> 7:
>     2: r94:SI=zero_extend(x0:QI)
>       REG_DEAD x0:QI
>     7: r96:SI=r94:SI<<0xb
> is replaced with:
> (set (reg/v:SI 94 [ i ])
>     (zero_extend:SI (reg:QI 0 x0 [ i ])))
> and
> (set (reg:SI 96)
>     (and:SI (ashift:SI (reg:SI 0 x0 [ i ])
>             (const_int 11 [0xb]))
>         (const_int 522240 [0x7f800])))
> and in the end results in 3 instructions in the body:
>         and     w1, w0, 255
>         ubfiz   w0, w0, 11, 8
>         orr     w0, w0, w1
> The following combine splitters help undo that when combiner tries to
> combine 3 instructions - the zero/sign extend or and, the other insn
> from the 2 to 2 combination ([us]bfiz) and the logical op, the CPUs
> don't have an insn to do everything in one op, but we can split it
> back into the zero/sign extend or and followed by logical with lsl.
>
> Bootstrapped/regtested on aarch64-linux, ok for trunk?
>
> 2021-04-14  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR target/100056
> 	* config/aarch64/aarch64.md (*<LOGICAL:optab>_<SHIFT:optab><mode>3):
> 	Add combine splitters for *<LOGICAL:optab>_ashl<mode>3 with
> 	ZERO_EXTEND, SIGN_EXTEND or AND.
>
> 	* gcc.target/aarch64/pr100056.c: New test.
>
> --- gcc/config/aarch64/aarch64.md.jj	2021-04-13 12:40:57.000000000 +0200
> +++ gcc/config/aarch64/aarch64.md	2021-04-13 19:54:17.015764651 +0200
> @@ -4431,6 +4431,59 @@ (define_insn "*<LOGICAL:optab>_<SHIFT:op
>    [(set_attr "type" "logic_shift_imm")]
>  )
>  
> +(define_split
> +  [(set (match_operand:GPI 0 "register_operand")
> +	(LOGICAL:GPI
> +	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
> +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> +		   (match_operand:GPI 4 "const_int_operand"))
> +	  (zero_extend:GPI (match_operand 3 "register_operand"))))]
> +  "can_create_pseudo_p ()
> +   && REG_P (operands[1])
> +   && REG_P (operands[3])
> +   && REGNO (operands[1]) == REGNO (operands[3])
> +   && ((unsigned HOST_WIDE_INT)
> +       trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
> +			   << INTVAL (operands[2]), <MODE>mode)
> +       == UINTVAL (operands[4]))"

IMO this would be easier to understand as:

   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
			   << INTVAL (operands[2]), <MODE>mode)
       == INTVAL (operands[4]))

(At first I thought the cast and UINTVAL were trying to escape the
sign-extension canonicalisation.)

I'm not sure about this one though.  The REGNO checks mean that this is
effectively for hard registers only.  I thought one of the reasons for
make_more_copies was to avoid combining hard registers like this, so I'm
not sure we should have a pattern that specifically targets them.

Segher, have I misunderstood?

> +  [(set (match_dup 4) (zero_extend:GPI (match_dup 3)))
> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
> +				   (match_dup 4)))]
> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
> +)
> +
> +(define_split
> +  [(set (match_operand:GPI 0 "register_operand")
> +	(LOGICAL:GPI
> +	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
> +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> +		   (match_operand:GPI 4 "const_int_operand"))
> +	  (and:GPI (match_dup 1) (match_operand:GPI 3 "const_int_operand"))))]
> +  "can_create_pseudo_p ()
> +   && pow2_or_zerop (UINTVAL (operands[3]) + 1)
> +   && ((unsigned HOST_WIDE_INT)
> +       trunc_int_for_mode (UINTVAL (operands[3])
> +			   << INTVAL (operands[2]), <MODE>mode)
> +       == UINTVAL (operands[4]))"
> +  [(set (match_dup 4) (and:GPI (match_dup 1) (match_dup 3)))
> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
> +				   (match_dup 4)))]
> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
> +)
> +
> +(define_split
> +  [(set (match_operand:GPI 0 "register_operand")
> +	(LOGICAL:GPI
> +	  (ashift:GPI (sign_extend:GPI (match_operand 1 "register_operand"))
> +		      (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> +	  (sign_extend:GPI (match_dup 1))))]
> +  "can_create_pseudo_p ()"
> +  [(set (match_dup 4) (sign_extend:GPI (match_dup 1)))
> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
> +				   (match_dup 4)))]
> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
> +)

These two look good to me apart from the cast nit.  The last one feels
like it's more general than just sign_extends though.  I guess it would
work for any duplicated operation that can be performed in a single
instruction.

Thanks,
Richard

> +
>  (define_insn "*<optab>_rol<mode>3"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
>  	(LOGICAL:GPI (rotate:GPI
> --- gcc/testsuite/gcc.target/aarch64/pr100056.c.jj	2021-04-13 14:20:53.334784184 +0200
> +++ gcc/testsuite/gcc.target/aarch64/pr100056.c	2021-04-13 19:44:09.358529648 +0200
> @@ -0,0 +1,50 @@
> +/* PR target/100056 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not {\t[us]bfiz\tw[0-9]+, w[0-9]+, 11} } } */
> +
> +int
> +or_shift_u8 (unsigned char i)
> +{
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_u3a (unsigned i)
> +{
> +  i &= 7;
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_u3b (unsigned i)
> +{
> +  i = (i << 29) >> 29;
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_s16 (signed short i)
> +{
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_s8 (signed char i)
> +{
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_s13 (int i)
> +{
> +  i = (i << 19) >> 19;
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_s3 (int i)
> +{
> +  i = (i << 29) >> 29;
> +  return i | (i << 11);
> +}
>
> 	Jakub
Jakub Jelinek April 14, 2021, 5:16 p.m. UTC | #2
On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote:
> > +(define_split
> > +  [(set (match_operand:GPI 0 "register_operand")
> > +	(LOGICAL:GPI
> > +	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
> > +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> > +		   (match_operand:GPI 4 "const_int_operand"))
> > +	  (zero_extend:GPI (match_operand 3 "register_operand"))))]
> > +  "can_create_pseudo_p ()
> > +   && REG_P (operands[1])
> > +   && REG_P (operands[3])
> > +   && REGNO (operands[1]) == REGNO (operands[3])
> > +   && ((unsigned HOST_WIDE_INT)
> > +       trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
> > +			   << INTVAL (operands[2]), <MODE>mode)
> > +       == UINTVAL (operands[4]))"
> 
> IMO this would be easier to understand as:
> 
>    && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
> 			   << INTVAL (operands[2]), <MODE>mode)
>        == INTVAL (operands[4]))
> 
> (At first I thought the cast and UINTVAL were trying to escape the
> sign-extension canonicalisation.)

It is ok to write it that way, you're right, I wrote it with
UINTVAL etc. because I initially didn't use trunc_int_for_mode
but that is wrong for SImode if the mask is shifted into bit 31.

> I'm not sure about this one though.  The REGNO checks mean that this is
> effectively for hard registers only.  I thought one of the reasons for
> make_more_copies was to avoid combining hard registers like this, so I'm
> not sure we should have a pattern that specifically targets them.
> 
> Segher, have I misunderstood?

Yes, this one works only with the hard regs, the problem is that when
the hard regs are there, combiner doesn't try anything else, so without
such splitter it punts on that.
If I add yet another testcase which doesn't have hard registers, like:
unsigned
or_shift2 (void)
{
  unsigned char i = 0;
  asm volatile ("" : "+r" (i));
  return i | (i << 11);
}
then my patch doesn't handle that case, and the only splitter that would
help would need to deal with:
(set (reg/i:SI 0 x0)
    (ior:SI (and:SI (ashift:SI (subreg:SI (reg:QI 97 [ i ]) 0)
                (const_int 11 [0xb]))
            (const_int 522240 [0x7f800]))
        (zero_extend:SI (reg:QI 97 [ i ]))))
I have added another combine splitter for this below.  But as you can
see, what combiner simplification comes with isn't really consistent
and orthogonal, different operations in there look quite differently :(.

> These two look good to me apart from the cast nit.  The last one feels
> like it's more general than just sign_extends though.  I guess it would
> work for any duplicated operation that can be performed in a single
> instruction.

True, but only very small portion of them can actually make it through,
it needs something that combine has been able to propagate into another
instruction.  So if we know about other insns that would look the same
and would actually be ever matched, we can e.g. define an operator predicate
for it, but until we have testcases for that, not sure it is worth it.

Here is an updated patch that handles also the zero extends without hard
registers and doesn't have the UHWI casts (but untested for now except
for the testcase):

2021-04-14  Jakub Jelinek  <jakub@redhat.com>

	PR target/100056
	* config/aarch64/aarch64.md (*<LOGICAL:optab>_<SHIFT:optab><mode>3):
	Add combine splitters for *<LOGICAL:optab>_ashl<mode>3 with
	ZERO_EXTEND, SIGN_EXTEND or AND.

	* gcc.target/aarch64/pr100056.c: New test.

--- gcc/config/aarch64/aarch64.md.jj	2021-04-13 20:41:45.030040848 +0200
+++ gcc/config/aarch64/aarch64.md	2021-04-14 19:07:41.641623978 +0200
@@ -4431,6 +4431,75 @@ (define_insn "*<LOGICAL:optab>_<SHIFT:op
   [(set_attr "type" "logic_shift_imm")]
 )
 
+(define_split
+  [(set (match_operand:GPI 0 "register_operand")
+	(LOGICAL:GPI
+	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
+			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
+		   (match_operand:GPI 4 "const_int_operand"))
+	  (zero_extend:GPI (match_operand 3 "register_operand"))))]
+  "can_create_pseudo_p ()
+   && REG_P (operands[1])
+   && REG_P (operands[3])
+   && REGNO (operands[1]) == REGNO (operands[3])
+   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
+			   << INTVAL (operands[2]), <MODE>mode)
+       == INTVAL (operands[4]))"
+  [(set (match_dup 4) (zero_extend:GPI (match_dup 3)))
+   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
+				   (match_dup 4)))]
+  "operands[4] = gen_reg_rtx (<MODE>mode);"
+)
+
+(define_split
+  [(set (match_operand:GPI 0 "register_operand")
+	(LOGICAL:GPI
+	  (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator"
+				 [(match_operand 1 "register_operand")])
+			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
+		   (match_operand:GPI 3 "const_int_operand"))
+	  (zero_extend:GPI (match_dup 1))))]
+  "can_create_pseudo_p ()
+   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[1]))
+			   << INTVAL (operands[2]), <MODE>mode)
+       == INTVAL (operands[3]))"
+  [(set (match_dup 4) (zero_extend:GPI (match_dup 1)))
+   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
+				   (match_dup 4)))]
+  "operands[4] = gen_reg_rtx (<MODE>mode);"
+)
+
+(define_split
+  [(set (match_operand:GPI 0 "register_operand")
+	(LOGICAL:GPI
+	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
+			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
+		   (match_operand:GPI 4 "const_int_operand"))
+	  (and:GPI (match_dup 1) (match_operand:GPI 3 "const_int_operand"))))]
+  "can_create_pseudo_p ()
+   && pow2_or_zerop (UINTVAL (operands[3]) + 1)
+   && (trunc_int_for_mode (UINTVAL (operands[3])
+			   << INTVAL (operands[2]), <MODE>mode)
+       == INTVAL (operands[4]))"
+  [(set (match_dup 4) (and:GPI (match_dup 1) (match_dup 3)))
+   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
+				   (match_dup 4)))]
+  "operands[4] = gen_reg_rtx (<MODE>mode);"
+)
+
+(define_split
+  [(set (match_operand:GPI 0 "register_operand")
+	(LOGICAL:GPI
+	  (ashift:GPI (sign_extend:GPI (match_operand 1 "register_operand"))
+		      (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
+	  (sign_extend:GPI (match_dup 1))))]
+  "can_create_pseudo_p ()"
+  [(set (match_dup 4) (sign_extend:GPI (match_dup 1)))
+   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
+				   (match_dup 4)))]
+  "operands[4] = gen_reg_rtx (<MODE>mode);"
+)
+
 (define_insn "*<optab>_rol<mode>3"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(LOGICAL:GPI (rotate:GPI
--- gcc/testsuite/gcc.target/aarch64/pr100056.c.jj	2021-04-14 18:54:25.885626705 +0200
+++ gcc/testsuite/gcc.target/aarch64/pr100056.c	2021-04-14 19:00:00.837828080 +0200
@@ -0,0 +1,58 @@
+/* PR target/100056 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not {\t[us]bfiz\tw[0-9]+, w[0-9]+, 11} } } */
+
+int
+or_shift_u8 (unsigned char i)
+{
+  return i | (i << 11);
+}
+
+int
+or_shift_u3a (unsigned i)
+{
+  i &= 7;
+  return i | (i << 11);
+}
+
+int
+or_shift_u3b (unsigned i)
+{
+  i = (i << 29) >> 29;
+  return i | (i << 11);
+}
+
+int
+or_shift_s16 (signed short i)
+{
+  return i | (i << 11);
+}
+
+int
+or_shift_s8 (signed char i)
+{
+  return i | (i << 11);
+}
+
+int
+or_shift_s13 (int i)
+{
+  i = (i << 19) >> 19;
+  return i | (i << 11);
+}
+
+int
+or_shift_s3 (int i)
+{
+  i = (i << 29) >> 29;
+  return i | (i << 11);
+}
+
+int
+or_shift_u8_asm (unsigned char x)
+{
+  unsigned char i = x;
+  asm volatile ("" : "+r" (i));
+  return i | (i << 11);
+}


	Jakub
Richard Sandiford April 14, 2021, 5:42 p.m. UTC | #3
Jakub Jelinek <jakub@redhat.com> writes:
> On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote:
>> > +(define_split
>> > +  [(set (match_operand:GPI 0 "register_operand")
>> > +	(LOGICAL:GPI
>> > +	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
>> > +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>> > +		   (match_operand:GPI 4 "const_int_operand"))
>> > +	  (zero_extend:GPI (match_operand 3 "register_operand"))))]
>> > +  "can_create_pseudo_p ()
>> > +   && REG_P (operands[1])
>> > +   && REG_P (operands[3])
>> > +   && REGNO (operands[1]) == REGNO (operands[3])
>> > +   && ((unsigned HOST_WIDE_INT)
>> > +       trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>> > +			   << INTVAL (operands[2]), <MODE>mode)
>> > +       == UINTVAL (operands[4]))"
>> 
>> IMO this would be easier to understand as:
>> 
>>    && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>> 			   << INTVAL (operands[2]), <MODE>mode)
>>        == INTVAL (operands[4]))
>> 
>> (At first I thought the cast and UINTVAL were trying to escape the
>> sign-extension canonicalisation.)
>
> It is ok to write it that way, you're right, I wrote it with
> UINTVAL etc. because I initially didn't use trunc_int_for_mode
> but that is wrong for SImode if the mask is shifted into bit 31.
>
>> I'm not sure about this one though.  The REGNO checks mean that this is
>> effectively for hard registers only.  I thought one of the reasons for
>> make_more_copies was to avoid combining hard registers like this, so I'm
>> not sure we should have a pattern that specifically targets them.
>> 
>> Segher, have I misunderstood?
>
> Yes, this one works only with the hard regs, the problem is that when
> the hard regs are there, combiner doesn't try anything else, so without
> such splitter it punts on that.
> If I add yet another testcase which doesn't have hard registers, like:
> unsigned
> or_shift2 (void)
> {
>   unsigned char i = 0;
>   asm volatile ("" : "+r" (i));
>   return i | (i << 11);
> }
> then my patch doesn't handle that case, and the only splitter that would
> help would need to deal with:
> (set (reg/i:SI 0 x0)
>     (ior:SI (and:SI (ashift:SI (subreg:SI (reg:QI 97 [ i ]) 0)
>                 (const_int 11 [0xb]))
>             (const_int 522240 [0x7f800]))
>         (zero_extend:SI (reg:QI 97 [ i ]))))
> I have added another combine splitter for this below.  But as you can
> see, what combiner simplification comes with isn't really consistent
> and orthogonal, different operations in there look quite differently :(.

Hmm, OK.  Still, the above looks reasonable on first principles.

>> These two look good to me apart from the cast nit.  The last one feels
>> like it's more general than just sign_extends though.  I guess it would
>> work for any duplicated operation that can be performed in a single
>> instruction.
>
> True, but only very small portion of them can actually make it through,
> it needs something that combine has been able to propagate into another
> instruction.  So if we know about other insns that would look the same
> and would actually be ever matched, we can e.g. define an operator predicate
> for it, but until we have testcases for that, not sure it is worth it.
>
> Here is an updated patch that handles also the zero extends without hard
> registers and doesn't have the UHWI casts (but untested for now except
> for the testcase):
>
> 2021-04-14  Jakub Jelinek  <jakub@redhat.com>
>
> 	PR target/100056
> 	* config/aarch64/aarch64.md (*<LOGICAL:optab>_<SHIFT:optab><mode>3):
> 	Add combine splitters for *<LOGICAL:optab>_ashl<mode>3 with
> 	ZERO_EXTEND, SIGN_EXTEND or AND.
>
> 	* gcc.target/aarch64/pr100056.c: New test.
>
> --- gcc/config/aarch64/aarch64.md.jj	2021-04-13 20:41:45.030040848 +0200
> +++ gcc/config/aarch64/aarch64.md	2021-04-14 19:07:41.641623978 +0200
> @@ -4431,6 +4431,75 @@ (define_insn "*<LOGICAL:optab>_<SHIFT:op
>    [(set_attr "type" "logic_shift_imm")]
>  )
>  
> +(define_split
> +  [(set (match_operand:GPI 0 "register_operand")
> +	(LOGICAL:GPI
> +	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
> +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> +		   (match_operand:GPI 4 "const_int_operand"))
> +	  (zero_extend:GPI (match_operand 3 "register_operand"))))]
> +  "can_create_pseudo_p ()
> +   && REG_P (operands[1])
> +   && REG_P (operands[3])
> +   && REGNO (operands[1]) == REGNO (operands[3])
> +   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
> +			   << INTVAL (operands[2]), <MODE>mode)
> +       == INTVAL (operands[4]))"
> +  [(set (match_dup 4) (zero_extend:GPI (match_dup 3)))
> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
> +				   (match_dup 4)))]
> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
> +)
> +
> +(define_split
> +  [(set (match_operand:GPI 0 "register_operand")
> +	(LOGICAL:GPI
> +	  (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator"
> +				 [(match_operand 1 "register_operand")])
> +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> +		   (match_operand:GPI 3 "const_int_operand"))
> +	  (zero_extend:GPI (match_dup 1))))]
> +  "can_create_pseudo_p ()
> +   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[1]))
> +			   << INTVAL (operands[2]), <MODE>mode)
> +       == INTVAL (operands[3]))"
> +  [(set (match_dup 4) (zero_extend:GPI (match_dup 1)))
> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
> +				   (match_dup 4)))]
> +  "operands[4] = gen_reg_rtx (<MODE>mode);"

Minor, but it might be better to use operand 5 for the temporary here.
Otherwise this looks good apart from the open question about whether
we should be doing complex combinations involving hard regs.  Let's see
what Segher says about that side.

Thanks,
Richard

> +)
> +
> +(define_split
> +  [(set (match_operand:GPI 0 "register_operand")
> +	(LOGICAL:GPI
> +	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
> +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> +		   (match_operand:GPI 4 "const_int_operand"))
> +	  (and:GPI (match_dup 1) (match_operand:GPI 3 "const_int_operand"))))]
> +  "can_create_pseudo_p ()
> +   && pow2_or_zerop (UINTVAL (operands[3]) + 1)
> +   && (trunc_int_for_mode (UINTVAL (operands[3])
> +			   << INTVAL (operands[2]), <MODE>mode)
> +       == INTVAL (operands[4]))"
> +  [(set (match_dup 4) (and:GPI (match_dup 1) (match_dup 3)))
> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
> +				   (match_dup 4)))]
> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
> +)
> +
> +(define_split
> +  [(set (match_operand:GPI 0 "register_operand")
> +	(LOGICAL:GPI
> +	  (ashift:GPI (sign_extend:GPI (match_operand 1 "register_operand"))
> +		      (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> +	  (sign_extend:GPI (match_dup 1))))]
> +  "can_create_pseudo_p ()"
> +  [(set (match_dup 4) (sign_extend:GPI (match_dup 1)))
> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
> +				   (match_dup 4)))]
> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
> +)
> +
>  (define_insn "*<optab>_rol<mode>3"
>    [(set (match_operand:GPI 0 "register_operand" "=r")
>  	(LOGICAL:GPI (rotate:GPI
> --- gcc/testsuite/gcc.target/aarch64/pr100056.c.jj	2021-04-14 18:54:25.885626705 +0200
> +++ gcc/testsuite/gcc.target/aarch64/pr100056.c	2021-04-14 19:00:00.837828080 +0200
> @@ -0,0 +1,58 @@
> +/* PR target/100056 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not {\t[us]bfiz\tw[0-9]+, w[0-9]+, 11} } } */
> +
> +int
> +or_shift_u8 (unsigned char i)
> +{
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_u3a (unsigned i)
> +{
> +  i &= 7;
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_u3b (unsigned i)
> +{
> +  i = (i << 29) >> 29;
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_s16 (signed short i)
> +{
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_s8 (signed char i)
> +{
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_s13 (int i)
> +{
> +  i = (i << 19) >> 19;
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_s3 (int i)
> +{
> +  i = (i << 29) >> 29;
> +  return i | (i << 11);
> +}
> +
> +int
> +or_shift_u8_asm (unsigned char x)
> +{
> +  unsigned char i = x;
> +  asm volatile ("" : "+r" (i));
> +  return i | (i << 11);
> +}
>
>
> 	Jakub
Segher Boessenkool April 14, 2021, 5:46 p.m. UTC | #4
Hi!

On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote:
> Jakub Jelinek <jakub@redhat.com> writes:
> > +(define_split
> > +  [(set (match_operand:GPI 0 "register_operand")
> > +	(LOGICAL:GPI
> > +	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
> > +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> > +		   (match_operand:GPI 4 "const_int_operand"))
> > +	  (zero_extend:GPI (match_operand 3 "register_operand"))))]
> > +  "can_create_pseudo_p ()
> > +   && REG_P (operands[1])
> > +   && REG_P (operands[3])
> > +   && REGNO (operands[1]) == REGNO (operands[3])
> > +   && ((unsigned HOST_WIDE_INT)
> > +       trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
> > +			   << INTVAL (operands[2]), <MODE>mode)
> > +       == UINTVAL (operands[4]))"
> 
> IMO this would be easier to understand as:
> 
>    && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
> 			   << INTVAL (operands[2]), <MODE>mode)
>        == INTVAL (operands[4]))
> 
> (At first I thought the cast and UINTVAL were trying to escape the
> sign-extension canonicalisation.)

Yeah.  Or just that with UINTVAL, the implicit conversion is fine (but
maybe that warns?)  But INTVAL is simpler for sure.

> I'm not sure about this one though.  The REGNO checks mean that this is
> effectively for hard registers only.  I thought one of the reasons for
> make_more_copies was to avoid combining hard registers like this, so I'm
> not sure we should have a pattern that specifically targets them.
> 
> Segher, have I misunderstood?

The REGNO checks work fine for pseudos as well.  But, why does it do
this at all, instead of using match_dup?  That should be clearer.

The point of make_more_copies is that the hard registers from function
arguments are not pushed down by combine into actual instructions.  This
can be done by RA if it thinks that is a good idea, and not done if it
thinks it is a bad idea.  Having combine usurp part of the register
allocators role is not a good idea.

There are other reasons hard regs can still end up in RTL insns in
earlier RTL passes of course, but the other changes that went together
with make_more_copies stop combine from doing that a lot (the function
itself makes sure every hard reg is copied to a new pseudo, because
combining that trivial move (from that new pseudo to the pseudo it was
copying it to already!) can still be beneficial for other reasons, all
strange and pretty unhappy, but important on many targets).


Segher
Segher Boessenkool April 14, 2021, 5:53 p.m. UTC | #5
On Wed, Apr 14, 2021 at 06:42:33PM +0100, Richard Sandiford wrote:
> Otherwise this looks good apart from the open question about whether
> we should be doing complex combinations involving hard regs.  Let's see
> what Segher says about that side.

It works find with pseudos as well, but then you need to use the same
pseudo twice in here.  Before RA you cannot know this will be the same
hard reg (and it is a bad idea in general to force that).

Disallowing forwarding hard regs (from function args) in combine helps
quite a bit on average, but there are cases like this one where the
balance swings the other way :-(


Segher
Richard Sandiford April 14, 2021, 5:55 p.m. UTC | #6
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote:
>> Jakub Jelinek <jakub@redhat.com> writes:
>> > +(define_split
>> > +  [(set (match_operand:GPI 0 "register_operand")
>> > +	(LOGICAL:GPI
>> > +	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
>> > +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>> > +		   (match_operand:GPI 4 "const_int_operand"))
>> > +	  (zero_extend:GPI (match_operand 3 "register_operand"))))]
>> > +  "can_create_pseudo_p ()
>> > +   && REG_P (operands[1])
>> > +   && REG_P (operands[3])
>> > +   && REGNO (operands[1]) == REGNO (operands[3])
>> > +   && ((unsigned HOST_WIDE_INT)
>> > +       trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>> > +			   << INTVAL (operands[2]), <MODE>mode)
>> > +       == UINTVAL (operands[4]))"
>> 
>> IMO this would be easier to understand as:
>> 
>>    && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>> 			   << INTVAL (operands[2]), <MODE>mode)
>>        == INTVAL (operands[4]))
>> 
>> (At first I thought the cast and UINTVAL were trying to escape the
>> sign-extension canonicalisation.)
>
> Yeah.  Or just that with UINTVAL, the implicit conversion is fine (but
> maybe that warns?)  But INTVAL is simpler for sure.
>
>> I'm not sure about this one though.  The REGNO checks mean that this is
>> effectively for hard registers only.  I thought one of the reasons for
>> make_more_copies was to avoid combining hard registers like this, so I'm
>> not sure we should have a pattern that specifically targets them.
>> 
>> Segher, have I misunderstood?
>
> The REGNO checks work fine for pseudos as well.  But, why does it do
> this at all, instead of using match_dup?  That should be clearer.

The register is appearing in two different modes: GPI for operand 1
and something smaller than GPI for operand 3 (otherwise the extension
would be ill-formed).  That's what makes it specific to hard registers.

> The point of make_more_copies is that the hard registers from function
> arguments are not pushed down by combine into actual instructions.  This
> can be done by RA if it thinks that is a good idea, and not done if it
> thinks it is a bad idea.  Having combine usurp part of the register
> allocators role is not a good idea.
>
> There are other reasons hard regs can still end up in RTL insns in
> earlier RTL passes of course, but the other changes that went together
> with make_more_copies stop combine from doing that a lot (the function
> itself makes sure every hard reg is copied to a new pseudo, because
> combining that trivial move (from that new pseudo to the pseudo it was
> copying it to already!) can still be beneficial for other reasons, all
> strange and pretty unhappy, but important on many targets).

What would your recommendation be for this pattern?  Is matching
hard registers a bad idea, or should we go with it?

Thanks,
Richard
Jakub Jelinek April 14, 2021, 6:01 p.m. UTC | #7
On Wed, Apr 14, 2021 at 12:46:43PM -0500, Segher Boessenkool wrote:
> The REGNO checks work fine for pseudos as well.  But, why does it do
> this at all, instead of using match_dup?  That should be clearer.

Because with the hard regs it has different modes, so match_dup
wouldn't work.

We are talking here about:
Trying 7, 2 -> 8:
    7: r98:SI=x0:SI<<0xb&0x7f800
      REG_DEAD x0:QI
    2: r96:SI=zero_extend(x0:QI)
    8: r97:SI=r98:SI|r96:SI
      REG_DEAD r98:SI
      REG_DEAD r96:SI
Failed to match this instruction:
(set (reg:SI 97)
    (ior:SI (and:SI (ashift:SI (reg:SI 0 x0 [ i ])
                (const_int 11 [0xb]))
            (const_int 522240 [0x7f800]))
        (zero_extend:SI (reg:QI 0 x0 [ i ]))))
Failed to match this instruction:
(set (reg:SI 97)
    (ior:SI (and:SI (ashift:SI (reg:SI 0 x0 [ i ])
                (const_int 11 [0xb]))
            (const_int 522240 [0x7f800]))
        (and:SI (reg:SI 0 x0)
            (const_int 255 [0xff]))))
Splitting with gen_split_28 (aarch64.md:4434)
Successfully matched this instruction:
(set (reg:SI 99)
    (zero_extend:SI (reg:QI 0 x0 [ i ])))
Successfully matched this instruction:
(set (reg:SI 97)
    (ior:SI (ashift:SI (reg:SI 99)
            (const_int 11 [0xb]))
        (reg:SI 99)))

match_dup means insn-recog.c calls rtx_equal_p and that returns false if the
mode is not the same.

Before combine the 3 insns are:
(insn 2 4 3 2 (set (reg/v:SI 96 [ i ])
        (zero_extend:SI (reg:QI 0 x0 [ i ]))) "pr100056.c":10:1 114 {*zero_extendqisi2_aarch64}
     (expr_list:REG_DEAD (reg:QI 0 x0 [ i ])
        (nil)))
(note 3 2 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 3 8 2 (set (reg:SI 98)
        (ashift:SI (reg/v:SI 96 [ i ])
            (const_int 11 [0xb]))) "pr100056.c":11:17 691 {*aarch64_ashl_sisd_or_int_si3}
     (nil))
(insn 8 7 13 2 (set (reg:SI 97)
        (ior:SI (reg:SI 98)
            (reg/v:SI 96 [ i ]))) "pr100056.c":11:12 488 {iorsi3}
     (expr_list:REG_DEAD (reg:SI 98)
        (expr_list:REG_DEAD (reg/v:SI 96 [ i ])
            (nil))))
and I must say I don't know if make_more_copies was meant to
split insn 2 into (set (reg:QI pseudo) (reg:QI 0 x0)) and
(set (reg/v:SI 96) (zero_extend:SI (reg:QI pseudo)))
or not.

> The point of make_more_copies is that the hard registers from function
> arguments are not pushed down by combine into actual instructions.  This
> can be done by RA if it thinks that is a good idea, and not done if it
> thinks it is a bad idea.  Having combine usurp part of the register
> allocators role is not a good idea.
> 
> There are other reasons hard regs can still end up in RTL insns in
> earlier RTL passes of course, but the other changes that went together
> with make_more_copies stop combine from doing that a lot (the function
> itself makes sure every hard reg is copied to a new pseudo, because
> combining that trivial move (from that new pseudo to the pseudo it was
> copying it to already!) can still be beneficial for other reasons, all
> strange and pretty unhappy, but important on many targets).

	Jakub
Segher Boessenkool April 14, 2021, 6:11 p.m. UTC | #8
On Wed, Apr 14, 2021 at 06:55:56PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > The REGNO checks work fine for pseudos as well.  But, why does it do
> > this at all, instead of using match_dup?  That should be clearer.
> 
> The register is appearing in two different modes: GPI for operand 1
> and something smaller than GPI for operand 3 (otherwise the extension
> would be ill-formed).  That's what makes it specific to hard registers.

You write your patterns with subregs for that?  That works fine for a
mode change on hard regs as well, afaik.

> > The point of make_more_copies is that the hard registers from function
> > arguments are not pushed down by combine into actual instructions.  This
> > can be done by RA if it thinks that is a good idea, and not done if it
> > thinks it is a bad idea.  Having combine usurp part of the register
> > allocators role is not a good idea.
> >
> > There are other reasons hard regs can still end up in RTL insns in
> > earlier RTL passes of course, but the other changes that went together
> > with make_more_copies stop combine from doing that a lot (the function
> > itself makes sure every hard reg is copied to a new pseudo, because
> > combining that trivial move (from that new pseudo to the pseudo it was
> > copying it to already!) can still be beneficial for other reasons, all
> > strange and pretty unhappy, but important on many targets).
> 
> What would your recommendation be for this pattern?  Is matching
> hard registers a bad idea, or should we go with it?

I would just use a subreg, and match_dup.  But maybe I am missing
something?


Segher
Segher Boessenkool April 14, 2021, 6:47 p.m. UTC | #9
On Wed, Apr 14, 2021 at 08:01:11PM +0200, Jakub Jelinek wrote:
> On Wed, Apr 14, 2021 at 12:46:43PM -0500, Segher Boessenkool wrote:
> > The REGNO checks work fine for pseudos as well.  But, why does it do
> > this at all, instead of using match_dup?  That should be clearer.
> 
> Because with the hard regs it has different modes, so match_dup
> wouldn't work.

A subreg:QI of the match_dup should match fine.  You can use a subreg
wherever GCC tries to match a reg.

> match_dup means insn-recog.c calls rtx_equal_p and that returns false if the
> mode is not the same.

Yes, but they are the same :-)

(reg:SI whatever)  and  (subreg:QI (reg:SI whatever) 0)

> Before combine the 3 insns are:
> (insn 2 4 3 2 (set (reg/v:SI 96 [ i ])
>         (zero_extend:SI (reg:QI 0 x0 [ i ]))) "pr100056.c":10:1 114 {*zero_extendqisi2_aarch64}
>      (expr_list:REG_DEAD (reg:QI 0 x0 [ i ])
>         (nil)))

> and I must say I don't know if make_more_copies was meant to
> split insn 2 into (set (reg:QI pseudo) (reg:QI 0 x0)) and
> (set (reg/v:SI 96) (zero_extend:SI (reg:QI pseudo)))
> or not.

It makes

(set (reg:QI new) (reg:QI x0))
(set (reg:SI 96) (zero_extend:SI (reg:QI new)))

The point is it keeps exactly the same form, but no hard regs anymore.


Segher
Jakub Jelinek April 14, 2021, 7:20 p.m. UTC | #10
On Wed, Apr 14, 2021 at 01:47:04PM -0500, Segher Boessenkool wrote:
> A subreg:QI of the match_dup should match fine.  You can use a subreg
> wherever GCC tries to match a reg.
> 
> > match_dup means insn-recog.c calls rtx_equal_p and that returns false if the
> > mode is not the same.
> 
> Yes, but they are the same :-)

In the end sure.
> 
> (reg:SI whatever)  and  (subreg:QI (reg:SI whatever) 0)

The question is what to write in the splitter pattern so that
they would match.
  [(set (match_operand:GPI 0 "register_operand")
        (LOGICAL:GPI
          (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator"
                                 [(match_operand 1 "register_operand")])
                               (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
                   (match_operand:GPI 3 "const_int_operand"))
          (zero_extend:GPI (match_dup 1))))]
provably doesn't (that is from the splitter I wrote for the non-hard regs),
nor
  [(set (match_operand:GPI 0 "register_operand")
        (LOGICAL:GPI
          (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
                               (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
                   (match_operand:GPI 3 "const_int_operand"))
          (zero_extend:GPI (subreg (match_dup 1) 0))))]
works (and it is unclear how I'd find out the mode of the subreg even if it
worked).

	Jakub
Jakub Jelinek April 14, 2021, 7:25 p.m. UTC | #11
On Wed, Apr 14, 2021 at 01:47:04PM -0500, Segher Boessenkool wrote:
> > and I must say I don't know if make_more_copies was meant to
> > split insn 2 into (set (reg:QI pseudo) (reg:QI 0 x0)) and
> > (set (reg/v:SI 96) (zero_extend:SI (reg:QI pseudo)))
> > or not.
> 
> It makes
> 
> (set (reg:QI new) (reg:QI x0))
> (set (reg:SI 96) (zero_extend:SI (reg:QI new)))
> 
> The point is it keeps exactly the same form, but no hard regs anymore.

It doesn't, as make_more_copies does:
          rtx dest = SET_DEST (set);
          if (!(REG_P (dest) && !HARD_REGISTER_P (dest)))
              continue;
   
          rtx src = SET_SRC (set);
          if (!(REG_P (src) && HARD_REGISTER_P (src)))
            continue;
but in this case the hard reg is wrapped into the zero_extend already
and so it will continue;

	Jakub
Segher Boessenkool April 14, 2021, 7:42 p.m. UTC | #12
On Wed, Apr 14, 2021 at 09:20:33PM +0200, Jakub Jelinek wrote:
> On Wed, Apr 14, 2021 at 01:47:04PM -0500, Segher Boessenkool wrote:
> > A subreg:QI of the match_dup should match fine.  You can use a subreg
> > wherever GCC tries to match a reg.
> > 
> > > match_dup means insn-recog.c calls rtx_equal_p and that returns false if the
> > > mode is not the same.
> > 
> > Yes, but they are the same :-)
> 
> In the end sure.
> > 
> > (reg:SI whatever)  and  (subreg:QI (reg:SI whatever) 0)
> 
> The question is what to write in the splitter pattern so that
> they would match.
>   [(set (match_operand:GPI 0 "register_operand")
>         (LOGICAL:GPI
>           (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator"
>                                  [(match_operand 1 "register_operand")])
>                                (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>                    (match_operand:GPI 3 "const_int_operand"))
>           (zero_extend:GPI (match_dup 1))))]
> provably doesn't (that is from the splitter I wrote for the non-hard regs),
> nor
>   [(set (match_operand:GPI 0 "register_operand")
>         (LOGICAL:GPI
>           (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
>                                (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>                    (match_operand:GPI 3 "const_int_operand"))
>           (zero_extend:GPI (subreg (match_dup 1) 0))))]
> works (and it is unclear how I'd find out the mode of the subreg even if it
> worked).

Just
  (subreg:QI (match_dup 1) 0)
should work?


Segher
Jakub Jelinek April 14, 2021, 7:44 p.m. UTC | #13
On Wed, Apr 14, 2021 at 09:20:35PM +0200, Jakub Jelinek wrote:
> The question is what to write in the splitter pattern so that
> they would match.
>   [(set (match_operand:GPI 0 "register_operand")
>         (LOGICAL:GPI
>           (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator"
>                                  [(match_operand 1 "register_operand")])
>                                (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>                    (match_operand:GPI 3 "const_int_operand"))
>           (zero_extend:GPI (match_dup 1))))]
> provably doesn't (that is from the splitter I wrote for the non-hard regs),
> nor
>   [(set (match_operand:GPI 0 "register_operand")
>         (LOGICAL:GPI
>           (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
>                                (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>                    (match_operand:GPI 3 "const_int_operand"))
>           (zero_extend:GPI (subreg (match_dup 1) 0))))]
> works (and it is unclear how I'd find out the mode of the subreg even if it
> worked).

What seems to work and handles both the pseudos (in which case there
is a (subreg:GPI (reg:whatever xyz) 0) and (reg:whatever xyz)
in the operands) and the hard registers (in which case there is
(reg:GPI xyz) and (reg:whatever xyz) in the operands) is:

(define_split
  [(set (match_operand:GPI 0 "register_operand")
	(LOGICAL:GPI
	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
		   (match_operand:GPI 3 "const_int_operand"))
	  (zero_extend:GPI (match_operand 4 "register_operand"))))]
  "can_create_pseudo_p ()
   && ((paradoxical_subreg_p (operands[1])
	&& rtx_equal_p (SUBREG_REG (operands[1]), operands[4]))
       || (REG_P (operands[1])
	   && REG_P (operands[4])
	   && REGNO (operands[1]) == REGNO (operands[4])))
   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[4]))
			   << INTVAL (operands[2]), <MODE>mode)
       == INTVAL (operands[3]))"
  [(set (match_dup 5) (zero_extend:GPI (match_dup 4)))
   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 5) (match_dup 2))
				   (match_dup 5)))]
  "operands[5] = gen_reg_rtx (<MODE>mode);"
)

While it is one pattern, it needs different handling for the two cases.
Is that acceptable?

	Jakub
Jakub Jelinek April 14, 2021, 7:45 p.m. UTC | #14
On Wed, Apr 14, 2021 at 02:42:54PM -0500, Segher Boessenkool wrote:
> > provably doesn't (that is from the splitter I wrote for the non-hard regs),
> > nor
> >   [(set (match_operand:GPI 0 "register_operand")
> >         (LOGICAL:GPI
> >           (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
> >                                (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> >                    (match_operand:GPI 3 "const_int_operand"))
> >           (zero_extend:GPI (subreg (match_dup 1) 0))))]
> > works (and it is unclear how I'd find out the mode of the subreg even if it
> > worked).
> 
> Just
>   (subreg:QI (match_dup 1) 0)
> should work?

That doesn't work either.

	Jakub
Segher Boessenkool April 14, 2021, 7:46 p.m. UTC | #15
On Wed, Apr 14, 2021 at 09:25:46PM +0200, Jakub Jelinek wrote:
> On Wed, Apr 14, 2021 at 01:47:04PM -0500, Segher Boessenkool wrote:
> > > and I must say I don't know if make_more_copies was meant to
> > > split insn 2 into (set (reg:QI pseudo) (reg:QI 0 x0)) and
> > > (set (reg/v:SI 96) (zero_extend:SI (reg:QI pseudo)))
> > > or not.
> > 
> > It makes
> > 
> > (set (reg:QI new) (reg:QI x0))
> > (set (reg:SI 96) (zero_extend:SI (reg:QI new)))
> > 
> > The point is it keeps exactly the same form, but no hard regs anymore.
> 
> It doesn't, as make_more_copies does:
>           rtx dest = SET_DEST (set);
>           if (!(REG_P (dest) && !HARD_REGISTER_P (dest)))
>               continue;
>    
>           rtx src = SET_SRC (set);
>           if (!(REG_P (src) && HARD_REGISTER_P (src)))
>             continue;
> but in this case the hard reg is wrapped into the zero_extend already
> and so it will continue;

Ah, I see.  That could/should be improved then.  But, GCC 12 :-)


Segher
Segher Boessenkool April 14, 2021, 7:47 p.m. UTC | #16
On Wed, Apr 14, 2021 at 09:45:35PM +0200, Jakub Jelinek wrote:
> On Wed, Apr 14, 2021 at 02:42:54PM -0500, Segher Boessenkool wrote:
> > > provably doesn't (that is from the splitter I wrote for the non-hard regs),
> > > nor
> > >   [(set (match_operand:GPI 0 "register_operand")
> > >         (LOGICAL:GPI
> > >           (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
> > >                                (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> > >                    (match_operand:GPI 3 "const_int_operand"))
> > >           (zero_extend:GPI (subreg (match_dup 1) 0))))]
> > > works (and it is unclear how I'd find out the mode of the subreg even if it
> > > worked).
> > 
> > Just
> >   (subreg:QI (match_dup 1) 0)
> > should work?
> 
> That doesn't work either.

Why not?  What goes wrong with that?


Segher
Jakub Jelinek April 14, 2021, 7:57 p.m. UTC | #17
On Wed, Apr 14, 2021 at 02:47:59PM -0500, Segher Boessenkool wrote:
> On Wed, Apr 14, 2021 at 09:45:35PM +0200, Jakub Jelinek wrote:
> > On Wed, Apr 14, 2021 at 02:42:54PM -0500, Segher Boessenkool wrote:
> > > > provably doesn't (that is from the splitter I wrote for the non-hard regs),
> > > > nor
> > > >   [(set (match_operand:GPI 0 "register_operand")
> > > >         (LOGICAL:GPI
> > > >           (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
> > > >                                (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
> > > >                    (match_operand:GPI 3 "const_int_operand"))
> > > >           (zero_extend:GPI (subreg (match_dup 1) 0))))]
> > > > works (and it is unclear how I'd find out the mode of the subreg even if it
> > > > worked).
> > > 
> > > Just
> > >   (subreg:QI (match_dup 1) 0)
> > > should work?
> > 
> > That doesn't work either.
> 
> Why not?  What goes wrong with that?

It just doesn't match and therefore doesn't split it.

	Jakub
Richard Sandiford April 14, 2021, 8:36 p.m. UTC | #18
Richard Sandiford <richard.sandiford@arm.com> writes:
> Jakub Jelinek <jakub@redhat.com> writes:
>> On Wed, Apr 14, 2021 at 05:31:23PM +0100, Richard Sandiford wrote:
>>> > +(define_split
>>> > +  [(set (match_operand:GPI 0 "register_operand")
>>> > +	(LOGICAL:GPI
>>> > +	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
>>> > +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>>> > +		   (match_operand:GPI 4 "const_int_operand"))
>>> > +	  (zero_extend:GPI (match_operand 3 "register_operand"))))]
>>> > +  "can_create_pseudo_p ()
>>> > +   && REG_P (operands[1])
>>> > +   && REG_P (operands[3])
>>> > +   && REGNO (operands[1]) == REGNO (operands[3])
>>> > +   && ((unsigned HOST_WIDE_INT)
>>> > +       trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>>> > +			   << INTVAL (operands[2]), <MODE>mode)
>>> > +       == UINTVAL (operands[4]))"
>>> 
>>> IMO this would be easier to understand as:
>>> 
>>>    && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>>> 			   << INTVAL (operands[2]), <MODE>mode)
>>>        == INTVAL (operands[4]))
>>> 
>>> (At first I thought the cast and UINTVAL were trying to escape the
>>> sign-extension canonicalisation.)
>>
>> It is ok to write it that way, you're right, I wrote it with
>> UINTVAL etc. because I initially didn't use trunc_int_for_mode
>> but that is wrong for SImode if the mask is shifted into bit 31.
>>
>>> I'm not sure about this one though.  The REGNO checks mean that this is
>>> effectively for hard registers only.  I thought one of the reasons for
>>> make_more_copies was to avoid combining hard registers like this, so I'm
>>> not sure we should have a pattern that specifically targets them.
>>> 
>>> Segher, have I misunderstood?
>>
>> Yes, this one works only with the hard regs, the problem is that when
>> the hard regs are there, combiner doesn't try anything else, so without
>> such splitter it punts on that.
>> If I add yet another testcase which doesn't have hard registers, like:
>> unsigned
>> or_shift2 (void)
>> {
>>   unsigned char i = 0;
>>   asm volatile ("" : "+r" (i));
>>   return i | (i << 11);
>> }
>> then my patch doesn't handle that case, and the only splitter that would
>> help would need to deal with:
>> (set (reg/i:SI 0 x0)
>>     (ior:SI (and:SI (ashift:SI (subreg:SI (reg:QI 97 [ i ]) 0)
>>                 (const_int 11 [0xb]))
>>             (const_int 522240 [0x7f800]))
>>         (zero_extend:SI (reg:QI 97 [ i ]))))
>> I have added another combine splitter for this below.  But as you can
>> see, what combiner simplification comes with isn't really consistent
>> and orthogonal, different operations in there look quite differently :(.
>
> Hmm, OK.  Still, the above looks reasonable on first principles.
>
>>> These two look good to me apart from the cast nit.  The last one feels
>>> like it's more general than just sign_extends though.  I guess it would
>>> work for any duplicated operation that can be performed in a single
>>> instruction.
>>
>> True, but only very small portion of them can actually make it through,
>> it needs something that combine has been able to propagate into another
>> instruction.  So if we know about other insns that would look the same
>> and would actually be ever matched, we can e.g. define an operator predicate
>> for it, but until we have testcases for that, not sure it is worth it.
>>
>> Here is an updated patch that handles also the zero extends without hard
>> registers and doesn't have the UHWI casts (but untested for now except
>> for the testcase):
>>
>> 2021-04-14  Jakub Jelinek  <jakub@redhat.com>
>>
>> 	PR target/100056
>> 	* config/aarch64/aarch64.md (*<LOGICAL:optab>_<SHIFT:optab><mode>3):
>> 	Add combine splitters for *<LOGICAL:optab>_ashl<mode>3 with
>> 	ZERO_EXTEND, SIGN_EXTEND or AND.
>>
>> 	* gcc.target/aarch64/pr100056.c: New test.
>>
>> --- gcc/config/aarch64/aarch64.md.jj	2021-04-13 20:41:45.030040848 +0200
>> +++ gcc/config/aarch64/aarch64.md	2021-04-14 19:07:41.641623978 +0200
>> @@ -4431,6 +4431,75 @@ (define_insn "*<LOGICAL:optab>_<SHIFT:op
>>    [(set_attr "type" "logic_shift_imm")]
>>  )
>>  
>> +(define_split
>> +  [(set (match_operand:GPI 0 "register_operand")
>> +	(LOGICAL:GPI
>> +	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
>> +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>> +		   (match_operand:GPI 4 "const_int_operand"))
>> +	  (zero_extend:GPI (match_operand 3 "register_operand"))))]
>> +  "can_create_pseudo_p ()
>> +   && REG_P (operands[1])
>> +   && REG_P (operands[3])
>> +   && REGNO (operands[1]) == REGNO (operands[3])
>> +   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
>> +			   << INTVAL (operands[2]), <MODE>mode)
>> +       == INTVAL (operands[4]))"
>> +  [(set (match_dup 4) (zero_extend:GPI (match_dup 3)))
>> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
>> +				   (match_dup 4)))]
>> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
>> +)
>> +
>> +(define_split
>> +  [(set (match_operand:GPI 0 "register_operand")
>> +	(LOGICAL:GPI
>> +	  (and:GPI (ashift:GPI (match_operator:GPI 4 "subreg_lowpart_operator"
>> +				 [(match_operand 1 "register_operand")])
>> +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>> +		   (match_operand:GPI 3 "const_int_operand"))
>> +	  (zero_extend:GPI (match_dup 1))))]
>> +  "can_create_pseudo_p ()
>> +   && (trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[1]))
>> +			   << INTVAL (operands[2]), <MODE>mode)
>> +       == INTVAL (operands[3]))"
>> +  [(set (match_dup 4) (zero_extend:GPI (match_dup 1)))
>> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
>> +				   (match_dup 4)))]
>> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
>
> Minor, but it might be better to use operand 5 for the temporary here.
> Otherwise this looks good apart from the open question about whether
> we should be doing complex combinations involving hard regs.  Let's see
> what Segher says about that side.

OK, so based on what Segher said about possibly extending
make_more_copies for GCC 12, let's go with this for GCC 11.
It'll then be easy to remove the hard-register version for
GCC 12 if that's appropriate.

Thanks,
Richard

>> +)
>> +
>> +(define_split
>> +  [(set (match_operand:GPI 0 "register_operand")
>> +	(LOGICAL:GPI
>> +	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
>> +			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>> +		   (match_operand:GPI 4 "const_int_operand"))
>> +	  (and:GPI (match_dup 1) (match_operand:GPI 3 "const_int_operand"))))]
>> +  "can_create_pseudo_p ()
>> +   && pow2_or_zerop (UINTVAL (operands[3]) + 1)
>> +   && (trunc_int_for_mode (UINTVAL (operands[3])
>> +			   << INTVAL (operands[2]), <MODE>mode)
>> +       == INTVAL (operands[4]))"
>> +  [(set (match_dup 4) (and:GPI (match_dup 1) (match_dup 3)))
>> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
>> +				   (match_dup 4)))]
>> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
>> +)
>> +
>> +(define_split
>> +  [(set (match_operand:GPI 0 "register_operand")
>> +	(LOGICAL:GPI
>> +	  (ashift:GPI (sign_extend:GPI (match_operand 1 "register_operand"))
>> +		      (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
>> +	  (sign_extend:GPI (match_dup 1))))]
>> +  "can_create_pseudo_p ()"
>> +  [(set (match_dup 4) (sign_extend:GPI (match_dup 1)))
>> +   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
>> +				   (match_dup 4)))]
>> +  "operands[4] = gen_reg_rtx (<MODE>mode);"
>> +)
>> +
>>  (define_insn "*<optab>_rol<mode>3"
>>    [(set (match_operand:GPI 0 "register_operand" "=r")
>>  	(LOGICAL:GPI (rotate:GPI
>> --- gcc/testsuite/gcc.target/aarch64/pr100056.c.jj	2021-04-14 18:54:25.885626705 +0200
>> +++ gcc/testsuite/gcc.target/aarch64/pr100056.c	2021-04-14 19:00:00.837828080 +0200
>> @@ -0,0 +1,58 @@
>> +/* PR target/100056 */
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +/* { dg-final { scan-assembler-not {\t[us]bfiz\tw[0-9]+, w[0-9]+, 11} } } */
>> +
>> +int
>> +or_shift_u8 (unsigned char i)
>> +{
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_u3a (unsigned i)
>> +{
>> +  i &= 7;
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_u3b (unsigned i)
>> +{
>> +  i = (i << 29) >> 29;
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_s16 (signed short i)
>> +{
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_s8 (signed char i)
>> +{
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_s13 (int i)
>> +{
>> +  i = (i << 19) >> 19;
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_s3 (int i)
>> +{
>> +  i = (i << 29) >> 29;
>> +  return i | (i << 11);
>> +}
>> +
>> +int
>> +or_shift_u8_asm (unsigned char x)
>> +{
>> +  unsigned char i = x;
>> +  asm volatile ("" : "+r" (i));
>> +  return i | (i << 11);
>> +}
>>
>>
>> 	Jakub
diff mbox series

Patch

--- gcc/config/aarch64/aarch64.md.jj	2021-04-13 12:40:57.000000000 +0200
+++ gcc/config/aarch64/aarch64.md	2021-04-13 19:54:17.015764651 +0200
@@ -4431,6 +4431,59 @@  (define_insn "*<LOGICAL:optab>_<SHIFT:op
   [(set_attr "type" "logic_shift_imm")]
 )
 
+(define_split
+  [(set (match_operand:GPI 0 "register_operand")
+	(LOGICAL:GPI
+	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
+			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
+		   (match_operand:GPI 4 "const_int_operand"))
+	  (zero_extend:GPI (match_operand 3 "register_operand"))))]
+  "can_create_pseudo_p ()
+   && REG_P (operands[1])
+   && REG_P (operands[3])
+   && REGNO (operands[1]) == REGNO (operands[3])
+   && ((unsigned HOST_WIDE_INT)
+       trunc_int_for_mode (GET_MODE_MASK (GET_MODE (operands[3]))
+			   << INTVAL (operands[2]), <MODE>mode)
+       == UINTVAL (operands[4]))"
+  [(set (match_dup 4) (zero_extend:GPI (match_dup 3)))
+   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
+				   (match_dup 4)))]
+  "operands[4] = gen_reg_rtx (<MODE>mode);"
+)
+
+(define_split
+  [(set (match_operand:GPI 0 "register_operand")
+	(LOGICAL:GPI
+	  (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand")
+			       (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
+		   (match_operand:GPI 4 "const_int_operand"))
+	  (and:GPI (match_dup 1) (match_operand:GPI 3 "const_int_operand"))))]
+  "can_create_pseudo_p ()
+   && pow2_or_zerop (UINTVAL (operands[3]) + 1)
+   && ((unsigned HOST_WIDE_INT)
+       trunc_int_for_mode (UINTVAL (operands[3])
+			   << INTVAL (operands[2]), <MODE>mode)
+       == UINTVAL (operands[4]))"
+  [(set (match_dup 4) (and:GPI (match_dup 1) (match_dup 3)))
+   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
+				   (match_dup 4)))]
+  "operands[4] = gen_reg_rtx (<MODE>mode);"
+)
+
+(define_split
+  [(set (match_operand:GPI 0 "register_operand")
+	(LOGICAL:GPI
+	  (ashift:GPI (sign_extend:GPI (match_operand 1 "register_operand"))
+		      (match_operand:QI 2 "aarch64_shift_imm_<mode>"))
+	  (sign_extend:GPI (match_dup 1))))]
+  "can_create_pseudo_p ()"
+  [(set (match_dup 4) (sign_extend:GPI (match_dup 1)))
+   (set (match_dup 0) (LOGICAL:GPI (ashift:GPI (match_dup 4) (match_dup 2))
+				   (match_dup 4)))]
+  "operands[4] = gen_reg_rtx (<MODE>mode);"
+)
+
 (define_insn "*<optab>_rol<mode>3"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(LOGICAL:GPI (rotate:GPI
--- gcc/testsuite/gcc.target/aarch64/pr100056.c.jj	2021-04-13 14:20:53.334784184 +0200
+++ gcc/testsuite/gcc.target/aarch64/pr100056.c	2021-04-13 19:44:09.358529648 +0200
@@ -0,0 +1,50 @@ 
+/* PR target/100056 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not {\t[us]bfiz\tw[0-9]+, w[0-9]+, 11} } } */
+
+int
+or_shift_u8 (unsigned char i)
+{
+  return i | (i << 11);
+}
+
+int
+or_shift_u3a (unsigned i)
+{
+  i &= 7;
+  return i | (i << 11);
+}
+
+int
+or_shift_u3b (unsigned i)
+{
+  i = (i << 29) >> 29;
+  return i | (i << 11);
+}
+
+int
+or_shift_s16 (signed short i)
+{
+  return i | (i << 11);
+}
+
+int
+or_shift_s8 (signed char i)
+{
+  return i | (i << 11);
+}
+
+int
+or_shift_s13 (int i)
+{
+  i = (i << 19) >> 19;
+  return i | (i << 11);
+}
+
+int
+or_shift_s3 (int i)
+{
+  i = (i << 29) >> 29;
+  return i | (i << 11);
+}