[ARM] Fix vec_pack_trunc pattern for vectorize_with_neon_quad.

Submitter Ramana Radhakrishnan
Date Aug. 26, 2011, 2:57 p.m.
Message ID <CACUk7=UpjxnFnvgbTR2UUeeiBK7UwHacx5OsoUofUtWZTmXzqw@mail.gmail.com>
Permalink /patch/111792/
State New

Comments

Ramana Radhakrishnan - Aug. 26, 2011, 2:57 p.m.
On 16 August 2011 15:20, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> Hi,
>
> While looking at a failure with regrename and
> -mvectorize-with-neon-quad I noticed that the early-clobber in this
> vec_pack_trunc pattern is superfluous given that we can use
> reg_overlap_mentioned_p to decide in which order we want to emit these
> 2 instructions. While it works around the problem in regrename.c I
> still think that the behaviour in regrename is a bit suspicious and
> needs some more investigation.
>

RichardS finally fixed the problem in data-flow, so we should be able
to turn on vectorize_with_neon_quad anyway.

Here's the patch that I thought I should commit as a workaround.
However, I think it's better to split this further in the case where
the two source registers (operands 1 and 2) are equal, because
otherwise we pointlessly create a stall in the Neon pipe while the
vmov waits for the vmovn result to arrive. Hence I'm not committing
this patch.
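The stall mentioned above is a read-after-write dependency: in the equal-operand sequence of the patch, `vmov %f0, %e0` reads the D register that the preceding `vmovn` writes, whereas the two `vmovn`s of the distinct-operand sequence read only the source Q registers. The sketch below is a toy check of that property, not GCC code; `struct insn`, `raw_dependent` and the register numbering (Q0 = D0:D1, Q1 = D2:D3, Q2 = D4:D5) are hypothetical, and the model only detects the dependency without modelling any latency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction model for illustration: each instruction
   writes one D register and reads up to two D registers.  Reading a Q
   register n is modelled as reading D[2n] and D[2n+1]. */
struct insn
{
  int def;       /* D register written */
  int uses[2];   /* D registers read */
  size_t n_uses; /* number of valid entries in uses[] */
};

/* Return true if `second` reads the register that `first` writes: a
   read-after-write dependency, so `second` cannot start until the
   result of `first` is available. */
static bool raw_dependent(const struct insn *first, const struct insn *second)
{
  assert(second->n_uses <= 2);
  for (size_t i = 0; i < second->n_uses; i++)
    if (second->uses[i] == first->def)
      return true;
  return false;
}
```

For example, `vmovn d0, q1` modelled as `{ 0, { 2, 3 }, 2 }` followed by `vmov d1, d0` modelled as `{ 1, { 0, 0 }, 1 }` is reported as dependent, while the same `vmovn` followed by `vmovn d1, q2` (`{ 1, { 4, 5 }, 2 }`) is not.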

Tests finished OK for this patch, btw.


cheers
Ramana

Patch

index 24dd941..2c60c5f 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -5631,14 +5631,29 @@ 
 ; the semantics of the instructions require.

 (define_insn "vec_pack_trunc_<mode>"
- [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=&w")
+ [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=w")
        (vec_concat:<V_narrow_pack>
                (truncate:<V_narrow>
                        (match_operand:VN 1 "register_operand" "w"))
                (truncate:<V_narrow>
                        (match_operand:VN 2 "register_operand" "w"))))]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
- "vmovn.i<V_sz_elem>\t%e0, %q1\;vmovn.i<V_sz_elem>\t%f0, %q2"
+ {
+ /* If operand1 and operand2 are identical, then the second
+    narrowing operation isn't needed as the values obtained
+    in both parts of the destination q register are identical.
+    This precludes the need for an early clobber in the destination
+    operand.  */
+ if (rtx_equal_p (operands[1], operands[2]))
+    return "vmovn.i<V_sz_elem>\\t%e0, %q1\;vmov.i<V_sz_elem>\\t%f0, %e0";
+ else
+  {
+   if (reg_overlap_mentioned_p (operands[0], operands[2]))
+     return "vmovn.i<V_sz_elem>\\t%f0, %q2\;vmovn.i<V_sz_elem>\\t%e0, %q1";
+   else
+     return "vmovn.i<V_sz_elem>\\t%e0, %q1\;vmovn.i<V_sz_elem>\\t%f0, %q2";
+  }
+ }
  [(set_attr "neon_type" "neon_shift_1")
   (set_attr "length" "8")]
 )
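The emission order chosen by the patch above can be checked with a small register-file model. This is a toy sketch under stated assumptions, not GCC code: the D/Q aliasing (Q n = D[2n]:D[2n+1]), the V4SI lane layout, and the helpers `set_q`, `vmovn_i32`, `pack_trunc` and `high_half_after_pack` are all hypothetical. It mirrors the three branches of the patch and shows why dropping the early clobber is only safe when the order depends on overlap with operand 2.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (an illustrative assumption, not GCC code): the NEON
   register file as 64-bit D registers.  Q register n aliases D[2n]
   (its low half, %e) and D[2n+1] (its high half, %f).  Each Q source
   holds four 32-bit lanes, as in V4SI mode. */
#define NUM_D 8
static uint64_t dreg[NUM_D];

/* Load four 32-bit lanes into Q register q. */
static void set_q(int q, const uint32_t v[4])
{
  assert(q >= 0 && 2 * q + 1 < NUM_D);
  dreg[2 * q] = (uint64_t)v[0] | ((uint64_t)v[1] << 32);
  dreg[2 * q + 1] = (uint64_t)v[2] | ((uint64_t)v[3] << 32);
}

/* Read 32-bit lane `lane` (0..3) of Q register q. */
static uint32_t q_lane32(int q, int lane)
{
  return (uint32_t)(dreg[2 * q + lane / 2] >> (32 * (lane % 2)));
}

/* vmovn.i32 Dd, Qm: truncate the four 32-bit lanes of Qm to 16 bits
   and pack them into Dd.  All source lanes are read before Dd is
   written, as a single instruction would do. */
static void vmovn_i32(int d, int q)
{
  uint64_t result = 0;
  assert(d >= 0 && d < NUM_D);
  for (int lane = 0; lane < 4; lane++)
    result |= (uint64_t)(uint16_t)q_lane32(q, lane) << (16 * lane);
  dreg[d] = result;
}

/* Pack trunc(Q[q1]) into the low half and trunc(Q[q2]) into the high
   half of Q[q0].  With overlap_aware == 0 the fixed order of the old
   template is used; otherwise the branches of the patch are mirrored
   ("overlap" reduces to q0 == q2, since Q registers never partially
   overlap in this model). */
static void pack_trunc(int q0, int q1, int q2, int overlap_aware)
{
  if (overlap_aware && q1 == q2)
    {
      vmovn_i32(2 * q0, q1);           /* vmovn %e0, %q1 */
      dreg[2 * q0 + 1] = dreg[2 * q0]; /* vmov  %f0, %e0 */
    }
  else if (overlap_aware && q0 == q2)
    {
      vmovn_i32(2 * q0 + 1, q2);       /* vmovn %f0, %q2 */
      vmovn_i32(2 * q0, q1);           /* vmovn %e0, %q1 */
    }
  else
    {
      vmovn_i32(2 * q0, q1);           /* vmovn %e0, %q1 */
      vmovn_i32(2 * q0 + 1, q2);       /* vmovn %f0, %q2 */
    }
}

/* Pack into q0 = 2 from q1 = 1 and q2 = 2 (the destination aliases
   operand 2) and return the high half of the destination. */
static uint64_t high_half_after_pack(int overlap_aware)
{
  const uint32_t a[4] = { 1, 2, 3, 4 };
  const uint32_t b[4] = { 0x10005, 0x10006, 0x10007, 0x10008 };
  set_q(1, a);
  set_q(2, b);
  pack_trunc(2, 1, 2, overlap_aware);
  return dreg[5];
}
```

With the destination aliasing operand 2, the fixed order leaves the lanes {1, 3, 7, 8} in the high half instead of {5, 6, 7, 8}, because the first `vmovn` has already overwritten the low half of operand 2 before it is read; the overlap-aware order produces the expected {5, 6, 7, 8}. This is exactly the hazard the `&` early clobber guarded against.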