diff mbox

[AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.

Message ID 1441787488-19661-1-git-send-email-james.greenhalgh@arm.com
State New
Headers show

Commit Message

James Greenhalgh Sept. 9, 2015, 8:31 a.m. UTC
Hi,

This patch clears up some remaining confusion in the vector lane orderings
for the two intrinsics mentioned in the title.

Bootstrapped on aarch64-none-linux-gnu and regression tested for
aarch64_be-none-elf with no issues.

OK?

Thanks,
James

---
2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64-simd.md (vec_unpacks_lo_v4sf): Rewrite
	as an expand.
	(vec_unpacks_hi_v4sf):  Likewise.
	(aarch64_float_extend_lo_v2df): Rename to...
	(aarch64_fcvtl_v4sf): This.
	(aarch64_fcvtl2_v4sf): New.
	(aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
	(aarch64_float_truncate_hi_v4sf_le): New.
	(aarch64_float_truncate_hi_v4sf_be): Likewise.

Comments

Christophe Lyon Sept. 9, 2015, 9:28 a.m. UTC | #1
On 9 September 2015 at 10:31, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> This patch clears up some remaining confusion in the vector lane orderings
> for the two intrinsics mentioned in the title.
>
> Bootstrapped on aarch64-none-linux-gnu and regression tested for
> aarch64_be-none-elf with no issues.
>

Does this actually fix an existing testcase?


> OK?
>
> Thanks,
> James
>
> ---
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-simd.md (vec_unpacks_lo_v4sf): Rewrite
>         as an expand.
>         (vec_unpacks_hi_v4sf):  Likewise.
>         (aarch64_float_extend_lo_v2df): Rename to...
>         (aarch64_fcvtl_v4sf): This.
>         (aarch64_fcvtl2_v4sf): New.
>         (aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
>         (aarch64_float_truncate_hi_v4sf_le): New.
>         (aarch64_float_truncate_hi_v4sf_be): Likewise.
>
Alan Lawrence Sept. 9, 2015, 10:31 a.m. UTC | #2
Hmmm, hang on. I'm not quite sure what the actual issue/bug is here, but is this 
the same issue as my patch 12 "with BE RTL fix"? 
(https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01482.html, explanation last at 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02365.html) I pushed this as 
r227551 last night and since this reparameterizes the patterns I don't think 
your patch will apply to current HEAD.

If my patch is wrong...well, that may be, I haven't understood the issue yet. 
But it sounds like the first thing we need is a decent testcase? (Or is the 
confusion just in the RTL representation, so a testcase would require getting 
constant-folding to happen in RTL, which I tried but failed to make that happen 
myself?)

--Alan

On 09/09/15 09:31, James Greenhalgh wrote:
>
> Hi,
>
> This patch clears up some remaining confusion in the vector lane orderings
> for the two intrinsics mentioned in the title.
>
> Bootstrapped on aarch64-none-linux-gnu and regression tested for
> aarch64_be-none-elf with no issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
> 	* config/aarch64/aarch64-simd.md (vec_unpacks_lo_v4sf): Rewrite
> 	as an expand.
> 	(vec_unpacks_hi_v4sf):  Likewise.
> 	(aarch64_float_extend_lo_v2df): Rename to...
> 	(aarch64_fcvtl_v4sf): This.
> 	(aarch64_fcvtl2_v4sf): New.
> 	(aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
> 	(aarch64_float_truncate_hi_v4sf_le): New.
> 	(aarch64_float_truncate_hi_v4sf_be): Likewise.
>
Alan Lawrence Sept. 10, 2015, 11:21 a.m. UTC | #3
On 09/09/15 11:31, Alan Lawrence wrote:
> Hmmm, hang on. I'm not quite sure what the actual issue/bug is here, but is this
> the same issue as my patch 12 "with BE RTL fix"?
> (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01482.html, explanation last at
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02365.html) I pushed this as
> r227551 last night and since this reparameterizes the patterns I don't think
> your patch will apply to current HEAD.
>
> If my patch is wrong...well, that may be, I haven't understood the issue yet.

In particular, we should expect the vec_unpacks standard pattern to have 
different behaviour (from a tree POV), as this is what I find searching for 
VEC_UNPACK in tree-vect-stmts.c:


bool
supportable_widening_operation
{
...
    switch (code)
      {
...
      CASE_CONVERT:
        c1 = VEC_UNPACK_LO_EXPR;
        c2 = VEC_UNPACK_HI_EXPR;
        break;

      case FLOAT_EXPR:
        c1 = VEC_UNPACK_FLOAT_LO_EXPR;
        c2 = VEC_UNPACK_FLOAT_HI_EXPR;
        break;

      case FIX_TRUNC_EXPR:
        /* ??? Not yet implemented due to missing VEC_UNPACK_FIX_TRUNC_HI_EXPR/
           VEC_UNPACK_FIX_TRUNC_LO_EXPR tree codes and optabs used for
           computing the operation.  */
        return false;

      default:
        gcc_unreachable ();
      }

    if (BYTES_BIG_ENDIAN && c1 != VEC_WIDEN_MULT_EVEN_EXPR)
      std::swap (c1, c2);



Yes, IIUC this goes against the principle of tree being the same regardless of 
underlying endianness.

--Alan
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 75fa0ab..c7ae956 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1691,39 +1691,65 @@ 
 
 ;; Float widening operations.
 
-(define_insn "vec_unpacks_lo_v4sf"
+(define_insn "aarch64_float_extend_lo_v2df"
   [(set (match_operand:V2DF 0 "register_operand" "=w")
 	(float_extend:V2DF
-	  (vec_select:V2SF
-	    (match_operand:V4SF 1 "register_operand" "w")
-	    (parallel [(const_int 0) (const_int 1)])
-	  )))]
+	  (match_operand:V2SF 1 "register_operand" "w")))]
   "TARGET_SIMD"
   "fcvtl\\t%0.2d, %1.2s"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-(define_insn "aarch64_float_extend_lo_v2df"
+(define_insn "aarch64_fcvtl_v4sf"
   [(set (match_operand:V2DF 0 "register_operand" "=w")
 	(float_extend:V2DF
-	  (match_operand:V2SF 1 "register_operand" "w")))]
+	  (vec_select:V2SF
+	    (match_operand:V4SF 1 "register_operand" "w")
+	    (match_operand:V4SF 2 "vect_par_cnst_lo_half" ""))))]
   "TARGET_SIMD"
   "fcvtl\\t%0.2d, %1.2s"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
-(define_insn "vec_unpacks_hi_v4sf"
+(define_insn "aarch64_fcvtl2_v4sf"
   [(set (match_operand:V2DF 0 "register_operand" "=w")
 	(float_extend:V2DF
 	  (vec_select:V2SF
 	    (match_operand:V4SF 1 "register_operand" "w")
-	    (parallel [(const_int 2) (const_int 3)])
-	  )))]
+	    (match_operand:V4SF 2 "vect_par_cnst_hi_half" ""))))]
   "TARGET_SIMD"
   "fcvtl2\\t%0.2d, %1.4s"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
+(define_expand "vec_unpacks_lo_v4sf"
+  [(match_operand:V2DF 0 "register_operand" "=w")
+   (match_operand:V4SF 1 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V4SFmode, false);
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+			     ? gen_aarch64_fcvtl2_v4sf
+			     : gen_aarch64_fcvtl_v4sf;
+  emit_insn (gen (operands[0], operands[1], p));
+  DONE;
+}
+)
+
+(define_expand "vec_unpacks_hi_v4sf"
+  [(match_operand:V2DF 0 "register_operand" "=w")
+   (match_operand:V4SF 1 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx p = aarch64_simd_vect_par_cnst_half (V4SFmode, true);
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+			     ? gen_aarch64_fcvtl_v4sf
+			     : gen_aarch64_fcvtl2_v4sf;
+  emit_insn (gen (operands[0], operands[1], p));
+  DONE;
+}
+)
+
 ;; Float narrowing operations.
 
 (define_insn "aarch64_float_truncate_lo_v2sf"
@@ -1735,17 +1761,42 @@ 
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
-(define_insn "aarch64_float_truncate_hi_v4sf"
+(define_insn "aarch64_float_truncate_hi_v4sf_le"
   [(set (match_operand:V4SF 0 "register_operand" "=w")
     (vec_concat:V4SF
       (match_operand:V2SF 1 "register_operand" "0")
       (float_truncate:V2SF
 	(match_operand:V2DF 2 "register_operand" "w"))))]
-  "TARGET_SIMD"
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "fcvtn2\\t%0.4s, %2.2d"
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
+(define_insn "aarch64_float_truncate_hi_v4sf_be"
+  [(set (match_operand:V4SF 0 "register_operand" "=w")
+    (vec_concat:V4SF
+      (float_truncate:V2SF
+	(match_operand:V2DF 2 "register_operand" "w"))
+      (match_operand:V2SF 1 "register_operand" "0")))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "fcvtn2\\t%0.4s, %2.2d"
+  [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
+)
+
+(define_expand "aarch64_float_truncate_hi_v4sf"
+  [(match_operand:V4SF 0 "register_operand" "=w")
+   (match_operand:V2SF 1 "register_operand" "0")
+   (match_operand:V2DF 2 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+			     ? gen_aarch64_float_truncate_hi_v4sf_be
+			     : gen_aarch64_float_truncate_hi_v4sf_le;
+  emit_insn (gen (operands[0], operands[1], operands[2]));
+  DONE;
+}
+)
+
 (define_expand "vec_pack_trunc_v2df"
   [(set (match_operand:V4SF 0 "register_operand")
       (vec_concat:V4SF