From patchwork Wed Sep 28 06:49:14 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@linaro.org>
X-Patchwork-Id: 116703
Return-Path: 
 <gcc-patches-return-302900-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	by ozlabs.org (Postfix) with SMTP id 1A064B6F7D
	for <incoming@patchwork.ozlabs.org>;
	Wed, 28 Sep 2011 16:49:45 +1000 (EST)
Received: (qmail 12895 invoked by alias); 28 Sep 2011 06:49:41 -0000
Received: (qmail 12883 invoked by uid 22791); 28 Sep 2011 06:49:39 -0000
X-SWARE-Spam-Status: No, hits=-2.6 required=5.0	tests=BAYES_00,
	RCVD_IN_DNSWL_LOW
X-Spam-Check-By: sourceware.org
Received: from mail-ww0-f51.google.com (HELO mail-ww0-f51.google.com)
	(74.125.82.51) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP;
	Wed, 28 Sep 2011 06:49:25 +0000
Received: by wwf10 with SMTP id 10so7170941wwf.8 for
	<gcc-patches@gcc.gnu.org>; Tue, 27 Sep 2011 23:49:24 -0700 (PDT)
Received: by 10.227.138.147 with SMTP id a19mr10182236wbu.23.1317192563961;
	Tue, 27 Sep 2011 23:49:23 -0700 (PDT)
Received: from richards-thinkpad.stglab.manchester.uk.ibm.com
	(rsandifo.gotadsl.co.uk. [82.133.89.107]) by mx.google.com
	with ESMTPS id ex12sm826158wbb.23.2011.09.27.23.49.21
	(version=TLSv1/SSLv3 cipher=OTHER);
	Tue, 27 Sep 2011 23:49:22 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@linaro.org>
To: Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
Mail-Followup-To: Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>,
	gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [ARM] Optimise handling of neon_vget_high/low
References: <g4ehzji9w6.fsf@richards-thinkpad.stglab.manchester.uk.ibm.com>
	<CACUk7=WxU83ET0FQdYo3YyVu3iOaj44ZT8F1sRD55-1caXuByw@mail.gmail.com>
Date: Wed, 28 Sep 2011 07:49:14 +0100
In-Reply-To: 
 <CACUk7=WxU83ET0FQdYo3YyVu3iOaj44ZT8F1sRD55-1caXuByw@mail.gmail.com>	(Ramana
	Radhakrishnan's message of "Fri, 23 Sep 2011 10:23:04 +0100")
Message-ID: <g4r5315fh1.fsf@richards-thinkpad.stglab.manchester.uk.ibm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
MIME-Version: 1.0
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: 
 <mailto:gcc-patches-unsubscribe-incoming=patchwork.ozlabs.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org> writes:
> On 14 September 2011 13:30, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> neon_vget_high and neon_vget_low extract one half of a vector.
>> The patterns look like:
>>
>> (define_insn "neon_vget_highv16qi"
>>  [(set (match_operand:V8QI 0 "s_register_operand" "=w")
>>        (vec_select:V8QI (match_operand:V16QI 1 "s_register_operand" "w")
>>                         (parallel [(const_int 8) (const_int 9)
>>                                    (const_int 10) (const_int 11)
>>                                    (const_int 12) (const_int 13)
>>                                    (const_int 14) (const_int 15)])))]
>>  "TARGET_NEON"
>> {
>>  int dest = REGNO (operands[0]);
>>  int src = REGNO (operands[1]);
>>
>>  if (dest != src + 2)
>>    return "vmov\t%P0, %f1";
>>  else
>>    return "";
>> }
>>  [(set_attr "neon_type" "neon_bp_simple")]
>> )
>>
>> But there's nothing here to tell the register allocator what's expected
>> of it, so we do often get the move.  The patch below makes the patterns
>> expand to normal subreg moves instead.
>>
>> Unfortunately, when I first tried this, I ran across some bugs in
>> simplify-rtx.c.  They should be fixed now.  Of course, I can't
>> guarantee that there are other similar bugs elsewhere, but I'll
>> try to fix any that crop up.
>>
>> The new patterns preserve the current treatment on big-endian targets.
>> Namely, the "low" half is always in the lower-numbered registers
>> (subreg byte offset 0).
>>
>> Tested on arm-linux-gnueabi.  OK to install?
>
>
> This is OK . Please watch out for any fallout that comes as a result
> of this . It would be useful to do some big endian testing at some
> point but I'm not going to let that hold up this patch.
>
> While you are here can you look at the quad_halves_<code>v4si etc.
> patterns neon_move_lo_quad_<mode> and neon_move_hi_quad_<mode>
> patterns which seem to have the same problem ?

Ah, yeah, thanks for the pointer.

Tested on arm-linux-gnueabi, and on benchmarks which should (and did)
benefit from it.  OK to install?

Richard


gcc/
	* config/arm/neon.md (neon_move_lo_quad_<mode>): Delete.
	(neon_move_hi_quad_<mode>): Likewise.
	(move_hi_quad_<mode>, move_lo_quad_<mode>): Use subreg moves.

Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	2011-09-27 09:27:18.000000000 +0100
+++ gcc/config/arm/neon.md	2011-09-27 09:45:26.008275971 +0100
@@ -1251,66 +1251,14 @@ (define_insn "quad_halves_<code>v16qi"
                     (const_string "neon_int_1") (const_string "neon_int_5")))]
 )
 
-; FIXME: We wouldn't need the following insns if we could write subregs of
-; vector registers. Make an attempt at removing unnecessary moves, though
-; we're really at the mercy of the register allocator.
-
-(define_insn "neon_move_lo_quad_<mode>"
-  [(set (match_operand:ANY128 0 "s_register_operand" "+w")
-        (vec_concat:ANY128
-          (match_operand:<V_HALF> 1 "s_register_operand" "w")
-          (vec_select:<V_HALF> 
-		(match_dup 0)
-	        (match_operand:ANY128 2 "vect_par_constant_high" ""))))]
-  "TARGET_NEON"
-{
-  int dest = REGNO (operands[0]);
-  int src = REGNO (operands[1]);
-
-  if (dest != src)
-    return "vmov\t%e0, %P1";
-  else
-    return "";
-}
-  [(set_attr "neon_type" "neon_bp_simple")]
-)
-
-(define_insn "neon_move_hi_quad_<mode>"
-  [(set (match_operand:ANY128 0 "s_register_operand" "+w")
-        (vec_concat:ANY128
-          (vec_select:<V_HALF>
-		(match_dup 0)
-	        (match_operand:ANY128 2 "vect_par_constant_low" ""))
-          (match_operand:<V_HALF> 1 "s_register_operand" "w")))]
-	   
-  "TARGET_NEON"
-{
-  int dest = REGNO (operands[0]);
-  int src = REGNO (operands[1]);
-
-  if (dest != src)
-    return "vmov\t%f0, %P1";
-  else
-    return "";
-}
-  [(set_attr "neon_type" "neon_bp_simple")]
-)
-
 (define_expand "move_hi_quad_<mode>"
  [(match_operand:ANY128 0 "s_register_operand" "")
   (match_operand:<V_HALF> 1 "s_register_operand" "")]
  "TARGET_NEON"
 {
-  rtvec v = rtvec_alloc (<V_mode_nunits>/2);
-  rtx t1;
-  int i;
-
-  for (i=0; i < (<V_mode_nunits>/2); i++)
-     RTVEC_ELT (v, i) = GEN_INT (i);
-
-  t1 = gen_rtx_PARALLEL (<MODE>mode, v);
-  emit_insn (gen_neon_move_hi_quad_<mode> (operands[0], operands[1], t1));
-
+  emit_move_insn (simplify_gen_subreg (<V_HALF>mode, operands[0], <MODE>mode,
+				       GET_MODE_SIZE (<V_HALF>mode)),
+		  operands[1]);
   DONE;
 })
 
@@ -1319,16 +1267,9 @@ (define_expand "move_lo_quad_<mode>"
   (match_operand:<V_HALF> 1 "s_register_operand" "")]
  "TARGET_NEON"
 {
-  rtvec v = rtvec_alloc (<V_mode_nunits>/2);
-  rtx t1;
-  int i;
-
-  for (i=0; i < (<V_mode_nunits>/2); i++)
-     RTVEC_ELT (v, i) = GEN_INT ((<V_mode_nunits>/2) + i);
-
-  t1 = gen_rtx_PARALLEL (<MODE>mode, v);
-  emit_insn (gen_neon_move_lo_quad_<mode> (operands[0], operands[1], t1));
-
+  emit_move_insn (simplify_gen_subreg (<V_HALF>mode, operands[0],
+				       <MODE>mode, 0),
+		  operands[1]);
   DONE;
 })