
[AArch64] PR target/68696 FAIL: gcc.target/aarch64/vbslq_u64_1.c scan-assembler-times bif\tv 1

Message ID 5666A119.8010206@arm.com
State New

Commit Message

Kyrylo Tkachov Dec. 8, 2015, 9:21 a.m. UTC
Hi all,

The test gcc.target/aarch64/vbslq_u64_1.c started failing recently due to some tree-level changes.
These merely exposed a deficiency in the xor-and-xor form of our vector bit-select pattern,
aarch64_simd_bsl<mode>_internal.

We now fail to match the rtx:
(set (reg:V4SI 79)
     (xor:V4SI (and:V4SI (xor:V4SI (reg:V4SI 32 v0 [ a ])
                 (reg/v:V4SI 77 [ b ]))
             (reg:V4SI 34 v2 [ mask ]))
         (reg/v:V4SI 77 [ b ])))

whereas before combine attempted:
(set (reg:V4SI 79)
     (xor:V4SI (and:V4SI (xor:V4SI (reg/v:V4SI 77 [ b ])
                 (reg:V4SI 32 v0 [ a ]))
             (reg:V4SI 34 v2 [ mask ]))
         (reg/v:V4SI 77 [ b ])))

Note that just the order of the operands of the inner XOR has changed.
This could be solved by making the second operand of the outer XOR a fourth operand
of the pattern, enforcing in the insn condition that it equal operand 2 or 3, and
performing the appropriate swapping in the output template.
However, aarch64_simd_bsl<mode>_internal is generated by several other expanders in
aarch64-simd.md, and updating all those call sites to pass a fourth operand would be
wasteful and would make them harder to understand.

Therefore this patch adds a new define_insn that uses a match_dup of operand 2 in
the outer XOR.  I also had to update the alternatives/constraints and the output
template; essentially this means swapping operands 2 and 3 in the constraints and
the output template.

The test now combines to a single vector bif instruction again.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-12-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/68696
     * config/aarch64/aarch64-simd.md (*aarch64_simd_bsl<mode>_alt):
     New pattern.
     (aarch64_simd_bsl<mode>_internal): Update comment to reflect
     the above.

Comments

Kyrill Tkachov Dec. 14, 2015, 11:51 a.m. UTC | #1
Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00826.html

Thanks,
Kyrill
On 08/12/15 09:21, Kyrill Tkachov wrote:
> [snip]
James Greenhalgh Dec. 16, 2015, 2:49 p.m. UTC | #2
On Tue, Dec 08, 2015 at 09:21:29AM +0000, Kyrill Tkachov wrote:
> [snip]
> Therefore this patch adds a new define_insn with the match_dup of operand 2 in
> the outer XOR.  I also had to update the alternatives/constraints in the pattern
> and the output template. Basically it involves swapping operands 2 and 3 around in the
> constraints and output templates.

Yuck, but OK.

Thanks,
James

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 030a1013caa8a965bcd1615c9686d0be715be921..2856f017c16380623960e21d66149d18efe176f0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2153,6 +2153,10 @@  (define_insn "aarch64_reduc_<maxmin_uns>_internal<mode>"
 ;;     bit op0, op2, mask
 ;;   if (op0 = op2) (so 0-bits in mask choose bits from op1, else op0)
 ;;     bif op0, op1, mask
+;;
+;; This pattern is expanded to by the aarch64_simd_bsl<mode> expander.
+;; Some forms of straight-line code may generate the equivalent form
+;; in *aarch64_simd_bsl<mode>_alt.
 
 (define_insn "aarch64_simd_bsl<mode>_internal"
   [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
@@ -2172,6 +2176,29 @@  (define_insn "aarch64_simd_bsl<mode>_internal"
   [(set_attr "type" "neon_bsl<q>")]
 )
 
+;; We need this form in addition to the above pattern to match the case
+;; when combine tries merging three insns such that the second operand of
+;; the outer XOR matches the second operand of the inner XOR rather than
+;; the first.  The two are equivalent but since recog doesn't try all
+;; permutations of commutative operations, we have to have a separate pattern.
+
+(define_insn "*aarch64_simd_bsl<mode>_alt"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
+	(xor:VSDQ_I_DI
+	   (and:VSDQ_I_DI
+	     (xor:VSDQ_I_DI
+	       (match_operand:VSDQ_I_DI 3 "register_operand" "w,w,0")
+	       (match_operand:VSDQ_I_DI 2 "register_operand" "w,0,w"))
+	      (match_operand:VSDQ_I_DI 1 "register_operand" "0,w,w"))
+	  (match_dup:VSDQ_I_DI 2)))]
+  "TARGET_SIMD"
+  "@
+  bsl\\t%0.<Vbtype>, %3.<Vbtype>, %2.<Vbtype>
+  bit\\t%0.<Vbtype>, %3.<Vbtype>, %1.<Vbtype>
+  bif\\t%0.<Vbtype>, %2.<Vbtype>, %1.<Vbtype>"
+  [(set_attr "type" "neon_bsl<q>")]
+)
+
 (define_expand "aarch64_simd_bsl<mode>"
   [(match_operand:VALLDIF 0 "register_operand")
    (match_operand:<V_cmp_result> 1 "register_operand")