[ARM] Fix vec_pack_trunc pattern for vectorize_with_neon_quad.

Submitted by Ramana Radhakrishnan on Aug. 26, 2011, 2:57 p.m.

Details

Message ID CACUk7=UpjxnFnvgbTR2UUeeiBK7UwHacx5OsoUofUtWZTmXzqw@mail.gmail.com
State New

Commit Message

Ramana Radhakrishnan Aug. 26, 2011, 2:57 p.m.
On 16 August 2011 15:20, Ramana Radhakrishnan
<ramana.radhakrishnan@linaro.org> wrote:
> Hi,
>
> While looking at a failure with regrename and
> -mvectorize-with-neon-quad I noticed that the early-clobber in this
> vec_pack_trunc pattern is superfluous given that we can use
> reg_overlap_mentioned_p to decide in which order we want to emit these
> 2 instructions. While it works around the problem in regrename.c I
> still think that the behaviour in regrename is a bit suspicious and
> needs some more investigation.
>

RichardS has since fixed the underlying problem in data-flow, so we
should be able to turn on vectorize_with_neon_quad regardless of this
patch.

Here's the patch that I had intended to commit as a workaround.
However, I think it's better to split the pattern further for the case
where the 2 input registers are equal, because otherwise you
pointlessly stall the Neon pipe while the second instruction waits for
the vmovn result to arrive. Hence I'm not committing this patch.

Tests finished OK btw for this patch.
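
To illustrate the ordering argument, here is a minimal sketch in plain C
that models the three cases handled by the patch. It is not GCC code:
qreg_t, vmovn_i32, pack_trunc and count_mismatches are hypothetical names
made up for this example. "Overlap" is simplified to "same Q register
index", whereas the real reg_overlap_mentioned_p works on RTL operands.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NUM_Q = 3 };

/* Hypothetical model of a Q register: four 32-bit lanes.  Lanes 0-1
   form the low D half ("%e"), lanes 2-3 the high D half ("%f").  */
typedef struct { uint32_t lane[4]; } qreg_t;

/* Model of "vmovn.i32 Dd, Qm": the source is read completely before
   the destination half is written, as for a single instruction.  */
static void vmovn_i32(qreg_t *file, int dst, int half, int src)
{
    uint16_t narrow[4];
    assert(half == 0 || half == 1);
    for (int i = 0; i < 4; i++)
        narrow[i] = (uint16_t) file[src].lane[i];
    memcpy(&file[dst].lane[2 * half], narrow, sizeof narrow);
}

/* Emit the two halves in an order that never reads a source after it
   has been partly overwritten, so no early clobber is required.  */
static void pack_trunc(qreg_t *file, int dst, int src1, int src2)
{
    if (src1 == src2) {
        /* Same input twice: narrow once, then copy low half to high.  */
        vmovn_i32(file, dst, 0, src1);
        file[dst].lane[2] = file[dst].lane[0];
        file[dst].lane[3] = file[dst].lane[1];
    } else if (dst == src2) {
        /* Destination overlaps operand 2: consume operand 2 first.  */
        vmovn_i32(file, dst, 1, src2);
        vmovn_i32(file, dst, 0, src1);
    } else {
        vmovn_i32(file, dst, 0, src1);
        vmovn_i32(file, dst, 1, src2);
    }
}

/* Try every register assignment and count results that differ from a
   reference computed on the untouched inputs.  */
int count_mismatches(void)
{
    int bad = 0;
    for (int dst = 0; dst < NUM_Q; dst++)
        for (int src1 = 0; src1 < NUM_Q; src1++)
            for (int src2 = 0; src2 < NUM_Q; src2++) {
                qreg_t file[NUM_Q], want;
                uint16_t n[8];
                for (int r = 0; r < NUM_Q; r++)
                    for (int i = 0; i < 4; i++)
                        file[r].lane[i] = 0x10000u * (r + 1) + 0x100u * r + i + 1;
                for (int i = 0; i < 4; i++) {
                    n[i] = (uint16_t) file[src1].lane[i];
                    n[4 + i] = (uint16_t) file[src2].lane[i];
                }
                memcpy(want.lane, n, sizeof n);
                pack_trunc(file, dst, src1, src2);
                if (memcmp(&file[dst], &want, sizeof want) != 0)
                    bad++;
            }
    return bad;
}
```

In this model, count_mismatches() can be called from a small main() or a
test harness. If the equal-operands branch is removed, the assignment
where all three registers coincide is the one expected to go wrong,
which is why that case gets its own sequence in the pattern.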


cheers
Ramana

index 24dd941..2c60c5f 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -5631,14 +5631,29 @@ 
 ; the semantics of the instructions require.

 (define_insn "vec_pack_trunc_<mode>"
- [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=&w")
+ [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=w")
        (vec_concat:<V_narrow_pack>
                (truncate:<V_narrow>
                        (match_operand:VN 1 "register_operand" "w"))
                (truncate:<V_narrow>
                        (match_operand:VN 2 "register_operand" "w"))))]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
- "vmovn.i<V_sz_elem>\t%e0, %q1\;vmovn.i<V_sz_elem>\t%f0, %q2"
+ {
+ /* If operand1 and operand2 are identical, then the second
+    narrowing operation isn't needed as the values obtained
+    in both parts of the destination q register are identical.
+    This precludes the need for an early clobber in the destination
+    operand.  */
+ if (rtx_equal_p (operands[1], operands[2]))
+    return "vmovn.i<V_sz_elem>\\t%e0, %q1\;vmov.i<V_sz_elem>\\t%f0, %e0";
+ else
+  {
+   if (reg_overlap_mentioned_p (operands[0], operands[2]))
+     return "vmovn.i<V_sz_elem>\\t%f0, %q2\;vmovn.i<V_sz_elem>\\t%e0, %q1";
+   else
+     return "vmovn.i<V_sz_elem>\\t%e0, %q1\;vmovn.i<V_sz_elem>\\t%f0, %q2";
+  }
+ }
  [(set_attr "neon_type" "neon_shift_1")
   (set_attr "length" "8")]
 )