
[ARM] Fix vec_pack_trunc pattern for vectorize_with_neon_quad.

Message ID CACUk7=XmJgAz4=WfvVApDSj3rvY7mXy3nxoJVv7s3cbBWwd5gg@mail.gmail.com
State New

Commit Message

Ramana Radhakrishnan Aug. 16, 2011, 2:20 p.m. UTC
Hi,

While looking at a failure with regrename and
-mvectorize-with-neon-quad, I noticed that the early-clobber in this
vec_pack_trunc pattern is superfluous, since we can use
reg_overlap_mentioned_p to decide in which order to emit the
2 instructions. Although this patch works around the problem in
regrename.c, I still think that the behaviour in regrename is a bit
suspicious and needs some more investigation.
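
To illustrate why the emission order matters once the early-clobber is
gone, here is a rough standalone C sketch. It is not GCC code: qreg,
vmovn_i32, patch_low_first, pack_trunc and pack_is_correct are
hypothetical names for a toy model in which a Q register is four 32-bit
lanes aliased by eight 16-bit lanes, and a plain register-number
comparison stands in for reg_overlap_mentioned_p. It assumes the
destination is not the same register as both inputs at once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 128-bit Q register: four 32-bit lanes aliasing eight 16-bit lanes.
   h[0..3] is the low D half (%e0), h[4..7] the high D half (%f0). */
typedef union {
  uint32_t w[4];
  uint16_t h[8];
} qreg;

/* Model of "vmovn.i32 Dd, Qm": read all four source lanes, then write
   the truncated values into one half of the destination Q register. */
static void vmovn_i32(qreg *dst, int high_half, const qreg *src)
{
  uint16_t tmp[4];
  for (int i = 0; i < 4; i++)
    tmp[i] = (uint16_t) src->w[i];
  memcpy(&dst->h[high_half ? 4 : 0], tmp, sizeof tmp);
}

/* Order chosen by the patch: low half first only when the destination
   overlaps operand 1 (hypothetical stand-in for reg_overlap_mentioned_p). */
static int patch_low_first(int d, int a)
{
  return d == a;
}

/* Emit the two narrowing moves in the requested order. */
static void pack_trunc(qreg *regs, int d, int a, int b, int low_first)
{
  assert(!(d == a && d == b));   /* case not covered by this sketch */
  if (low_first) {
    vmovn_i32(&regs[d], 0, &regs[a]);
    vmovn_i32(&regs[d], 1, &regs[b]);
  } else {
    vmovn_i32(&regs[d], 1, &regs[b]);
    vmovn_i32(&regs[d], 0, &regs[a]);
  }
}

/* Returns 1 if register d ends up with trunc(a) in its low half and
   trunc(b) in its high half, 0 otherwise. */
static int pack_is_correct(int d, int a, int b, int low_first)
{
  qreg regs[4];
  uint16_t want[8];

  for (int r = 0; r < 4; r++)
    for (int i = 0; i < 4; i++)
      regs[r].w[i] = 0x10000u * (uint32_t) (r + 1) + (uint32_t) (r * 16 + i);
  for (int i = 0; i < 4; i++) {
    want[i] = (uint16_t) regs[a].w[i];
    want[i + 4] = (uint16_t) regs[b].w[i];
  }
  pack_trunc(regs, d, a, b, low_first);
  return memcmp(regs[d].h, want, sizeof want) == 0;
}
```

In this model, pack_is_correct (d, a, b, patch_low_first (d, a)) holds
whether the destination aliases operand 1, operand 2, or neither, while
forcing the opposite order in either aliasing case corrupts the second
source before it is read.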

Refer to my post on gcc@ for more on that particular case.

http://gcc.gnu.org/ml/gcc/2011-08/msg00284.html

I am currently running tests with Ira's patch of this morning

http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01304.html

that turns on -mvectorize-with-neon-quad by default to make sure there
are no regressions.

Will commit if no regressions.


cheers
Ramana

2011-08-16  Ramana Radhakrishnan  <ramana.radhakrishnan@linaro.org>

	* config/arm/neon.md (vec_pack_trunc_<mode> VN): Remove
	early-clobber. Adjust output template for overlap checks.

Patch

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 24dd941..06c699a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -5631,14 +5631,19 @@ 
 ; the semantics of the instructions require.
 
 (define_insn "vec_pack_trunc_<mode>"
- [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=&w")
+ [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=w")
        (vec_concat:<V_narrow_pack> 
 		(truncate:<V_narrow> 
 			(match_operand:VN 1 "register_operand" "w"))
 		(truncate:<V_narrow>
 			(match_operand:VN 2 "register_operand" "w"))))]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
- "vmovn.i<V_sz_elem>\t%e0, %q1\;vmovn.i<V_sz_elem>\t%f0, %q2"
+ {
+ if (reg_overlap_mentioned_p (operands[0], operands[1]))
+   return "vmovn.i<V_sz_elem>\t%e0, %q1\;vmovn.i<V_sz_elem>\t%f0, %q2";
+ else
+   return "vmovn.i<V_sz_elem>\t%f0, %q2\;vmovn.i<V_sz_elem>\t%e0, %q1";
+ }
  [(set_attr "neon_type" "neon_shift_1")
   (set_attr "length" "8")]
 )