[AArch64] Use 'x' constraint for vector HFmode multiplication by indexed element instructions

Message ID 58CA5B10.3020605@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov March 16, 2017, 9:29 a.m.
Hi all,

The advsimd-intrinsics.exp tests for the fmul and fmulx instructions that perform a multiplication by indexed element
have started generating invalid assembly in my testing. For example:
Error: register number out of range 0 to 15 at operand 3 -- `fmulx v24.8h,v23.8h,v22.h[0]'

The problem is that the indexed vector register (v22 in this case) has to be in V0-V15 when accessed as a 16-bit element.
The constraints on the pattern don't reflect this. We already have the h_con constraint that's supposed to do what we want,
but it incorrectly returns the "w" constraint for HF inner modes and it isn't applied in all the patterns that it needs to be
(it's needed for the FMLA, FMLS, FMUL, FMULx by element patterns).
This patch fixes those issues by changing h_con to return the "x" constraint for HF inner modes and applying it to all the operands
that need it in aarch64-simd.md

With this patch the advsimd-intrinsics.exp tests now generate valid assembly and don't complain, so no new regression tests are added.

Bootstrapped and tested on aarch64-none-linux-gnu.
Ok for trunk?

Thanks,
Kyrill

2017-03-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/iterators.md (h_con): Return "x" for V4HF and V8HF.
     * config/aarch64/aarch64-simd.md (*aarch64_fma4_elt_from_dup<mode>):
     Use h_con constraint for operand 1.
     (*aarch64_fnma4_elt_from_dup<mode>): Likewise.
     (*aarch64_mulx_elt_from_dup<mode>): Likewise for operand 2.

Comments

James Greenhalgh March 16, 2017, 9:41 a.m. | #1
On Thu, Mar 16, 2017 at 09:29:52AM +0000, Kyrill Tkachov wrote:
> Hi all,
> 
> The advsimd-intrinsics.exp tests for the fmul and fmulx instructions that
> perform a multiplication by indexed element have started generating invalid
> assembly in my testing. For example:
> Error: register number out of range 0 to 15 at operand 3 -- `fmulx v24.8h,v23.8h,v22.h[0]'
> 
> The problem is that the indexed vector register (v22 in this case) has to be
> in V0-V15 when accessed as a 16-bit element.  The constraints on the pattern
> don't reflect this. We already have the h_con constraint that's supposed to
> do what we want, but it incorrectly returns the "w" constraint for HF inner
> modes and it isn't applied in all the patterns that it needs to be (it's
> needed for the FMLA, FMLS, FMUL, FMULx by element patterns).  This patch
> fixes those issues by changing h_con to return the "x" constraint for HF
> inner modes and applying it to all the operands that need it in
> aarch64-simd.md
> 
> With this patch the advsimd-intrinsics.exp tests now generate valid assembly
> and don't complain, so no new regression tests are added.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?

This isn't technically a regression, but the fix is small and safe, and
without this we can generate code which can't assemble. That's critical
enough in my mind to justify the patch go in now.

OK for trunk.

Thanks,
James

> 
> Thanks,
> Kyrill
> 
> 2017-03-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
> 
>     * config/aarch64/iterators.md (h_con): Return "x" for V4HF and V8HF.
>     * config/aarch64/aarch64-simd.md (*aarch64_fma4_elt_from_dup<mode>):
>     Use h_con constraint for operand 1.
>     (*aarch64_fnma4_elt_from_dup<mode>): Likewise.
>     (*aarch64_mulx_elt_from_dup<mode>): Likewise for operand 2.
Christophe Lyon March 16, 2017, 9:54 a.m. | #2
On 16 March 2017 at 10:29, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> Hi all,
>
> The advsimd-intrinsics.exp tests for the fmul and fmulx instructions that
> perform a multiplication by indexed element
> have started generating invalid assembly in my testing. For example:
> Error: register number out of range 0 to 15 at operand 3 -- `fmulx
> v24.8h,v23.8h,v22.h[0]'
>

Are you configuring gcc in a special way? I can see no failure on trunk
in advsimd-intrinsics tests at the moment.
Should the tests be improved?

Thanks,

Christophe

> The problem is that the indexed vector register (v22 in this case) has to be
> in V0-V15 when accessed as a 16-bit element.
> The constraints on the pattern don't reflect this. We already have the h_con
> constraint that's supposed to do what we want,
> but it incorrectly returns the "w" constraint for HF inner modes and it
> isn't applied in all the patterns that it needs to be
> (it's needed for the FMLA, FMLS, FMUL, FMULx by element patterns).
> This patch fixes those issues by changing h_con to return the "x" constraint
> for HF inner modes and applying it to all the operands
> that need it in aarch64-simd.md
>
> With this patch the advsimd-intrinsics.exp tests now generate valid assembly
> and don't complain, so no new regression tests are added.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2017-03-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * config/aarch64/iterators.md (h_con): Return "x" for V4HF and V8HF.
>     * config/aarch64/aarch64-simd.md (*aarch64_fma4_elt_from_dup<mode>):
>     Use h_con constraint for operand 1.
>     (*aarch64_fnma4_elt_from_dup<mode>): Likewise.
>     (*aarch64_mulx_elt_from_dup<mode>): Likewise for operand 2.
Kyrill Tkachov March 16, 2017, 10:01 a.m. | #3
Hi Christophe,

On 16/03/17 09:54, Christophe Lyon wrote:
> On 16 March 2017 at 10:29, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>> Hi all,
>>
>> The advsimd-intrinsics.exp tests for the fmul and fmulx instructions that
>> perform a multiplication by indexed element
>> have started generating invalid assembly in my testing. For example:
>> Error: register number out of range 0 to 15 at operand 3 -- `fmulx
>> v24.8h,v23.8h,v22.h[0]'
>>
> Are you configuring gcc in a special way? I can see no failure on trunk
> in advsimd-intrinsics tests at the moment.
> Should the tests be improved?

I was configuring with --with-cpu=cortex-a57.
I also have other patches in my tree that I've been experimenting with that modify AdvSIMD codegen
so maybe they triggered this. I haven't tried reproducing it with clean trunk (though I bootstrapped
and tested the patch applied on a clean trunk).
I think the tests did what they should and the assembler caught the error, so I don't see how the tests
could be improved in this case.

Thanks,
Kyrill

> Thanks,
>
> Christophe
>
>> The problem is that the indexed vector register (v22 in this case) has to be
>> in V0-V15 when accessed as a 16-bit element.
>> The constraints on the pattern don't reflect this. We already have the h_con
>> constraint that's supposed to do what we want,
>> but it incorrectly returns the "w" constraint for HF inner modes and it
>> isn't applied in all the patterns that it needs to be
>> (it's needed for the FMLA, FMLS, FMUL, FMULx by element patterns).
>> This patch fixes those issues by changing h_con to return the "x" constraint
>> for HF inner modes and applying it to all the operands
>> that need it in aarch64-simd.md
>>
>> With this patch the advsimd-intrinsics.exp tests now generate valid assembly
>> and don't complain, so no new regression tests are added.
>>
>> Bootstrapped and tested on aarch64-none-linux-gnu.
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>> 2017-03-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      * config/aarch64/iterators.md (h_con): Return "x" for V4HF and V8HF.
>>      * config/aarch64/aarch64-simd.md (*aarch64_fma4_elt_from_dup<mode>):
>>      Use h_con constraint for operand 1.
>>      (*aarch64_fnma4_elt_from_dup<mode>): Likewise.
>>      (*aarch64_mulx_elt_from_dup<mode>): Likewise for operand 2.

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index b3db7be..bf13f07 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1660,7 +1660,7 @@  (define_insn "*aarch64_fma4_elt_from_dup<mode>"
   [(set (match_operand:VMUL 0 "register_operand" "=w")
     (fma:VMUL
       (vec_duplicate:VMUL
-	  (match_operand:<VEL> 1 "register_operand" "w"))
+	  (match_operand:<VEL> 1 "register_operand" "<h_con>"))
       (match_operand:VMUL 2 "register_operand" "w")
       (match_operand:VMUL 3 "register_operand" "0")))]
   "TARGET_SIMD"
@@ -1739,7 +1739,7 @@  (define_insn "*aarch64_fnma4_elt_from_dup<mode>"
       (neg:VMUL
         (match_operand:VMUL 2 "register_operand" "w"))
       (vec_duplicate:VMUL
-	(match_operand:<VEL> 1 "register_operand" "w"))
+	(match_operand:<VEL> 1 "register_operand" "<h_con>"))
       (match_operand:VMUL 3 "register_operand" "0")))]
   "TARGET_SIMD"
   "fmls\t%0.<Vtype>, %2.<Vtype>, %1.<Vetype>[0]"
@@ -3191,7 +3191,7 @@  (define_insn "*aarch64_mulx_elt_from_dup<mode>"
 	(unspec:VHSDF
 	 [(match_operand:VHSDF 1 "register_operand" "w")
 	  (vec_duplicate:VHSDF
-	    (match_operand:<VEL> 2 "register_operand" "w"))]
+	    (match_operand:<VEL> 2 "register_operand" "<h_con>"))]
 	 UNSPEC_FMULX))]
   "TARGET_SIMD"
   "fmulx\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[0]";
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 1ddf6ad..43be7fd 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -749,11 +749,11 @@  (define_mode_attr vswap_width_name [(V8QI "to_128") (V16QI "to_64")
 				    (DF   "to_128") (V2DF  "to_64")])
 
 ;; For certain vector-by-element multiplication instructions we must
-;; constrain the HI cases to use only V0-V15.  This is covered by
+;; constrain the 16-bit cases to use only V0-V15.  This is covered by
 ;; the 'x' constraint.  All other modes may use the 'w' constraint.
 (define_mode_attr h_con [(V2SI "w") (V4SI "w")
 			 (V4HI "x") (V8HI "x")
-			 (V4HF "w") (V8HF "w")
+			 (V4HF "x") (V8HF "x")
 			 (V2SF "w") (V4SF "w")
 			 (V2DF "w") (DF "w")])