From patchwork Wed Sep 28 06:49:14 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 116703 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 1A064B6F7D for ; Wed, 28 Sep 2011 16:49:45 +1000 (EST) Received: (qmail 12895 invoked by alias); 28 Sep 2011 06:49:41 -0000 Received: (qmail 12883 invoked by uid 22791); 28 Sep 2011 06:49:39 -0000 X-SWARE-Spam-Status: No, hits=-2.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW X-Spam-Check-By: sourceware.org Received: from mail-ww0-f51.google.com (HELO mail-ww0-f51.google.com) (74.125.82.51) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 28 Sep 2011 06:49:25 +0000 Received: by wwf10 with SMTP id 10so7170941wwf.8 for ; Tue, 27 Sep 2011 23:49:24 -0700 (PDT) Received: by 10.227.138.147 with SMTP id a19mr10182236wbu.23.1317192563961; Tue, 27 Sep 2011 23:49:23 -0700 (PDT) Received: from richards-thinkpad.stglab.manchester.uk.ibm.com (rsandifo.gotadsl.co.uk. [82.133.89.107]) by mx.google.com with ESMTPS id ex12sm826158wbb.23.2011.09.27.23.49.21 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 27 Sep 2011 23:49:22 -0700 (PDT) From: Richard Sandiford To: Ramana Radhakrishnan Mail-Followup-To: Ramana Radhakrishnan , gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org Cc: gcc-patches@gcc.gnu.org Subject: Re: [ARM] Optimise handling of neon_vget_high/low References: Date: Wed, 28 Sep 2011 07:49:14 +0100 In-Reply-To: (Ramana Radhakrishnan's message of "Fri, 23 Sep 2011 10:23:04 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux) MIME-Version: 1.0 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Ramana Radhakrishnan writes: > On 14 September 2011 13:30, Richard Sandiford > wrote: >> neon_vget_high and neon_vget_low extract one half of a vector. >> The patterns look like: >> >> (define_insn "neon_vget_highv16qi" >>  [(set (match_operand:V8QI 0 "s_register_operand" "=w") >>        (vec_select:V8QI (match_operand:V16QI 1 "s_register_operand" "w") >>                         (parallel [(const_int 8) (const_int 9) >>                                    (const_int 10) (const_int 11) >>                                    (const_int 12) (const_int 13) >>                                    (const_int 14) (const_int 15)])))] >>  "TARGET_NEON" >> { >>  int dest = REGNO (operands[0]); >>  int src = REGNO (operands[1]); >> >>  if (dest != src + 2) >>    return "vmov\t%P0, %f1"; >>  else >>    return ""; >> } >>  [(set_attr "neon_type" "neon_bp_simple")] >> ) >> >> But there's nothing here to tell the register allocator what's expected >> of it, so we do often get the move.  The patch below makes the patterns >> expand to normal subreg moves instead. >> >> Unfortunately, when I first tried this, I ran across some bugs in >> simplify-rtx.c.  They should be fixed now.  Of course, I can't >> guarantee that there are other similar bugs elsewhere, but I'll >> try to fix any that crop up. >> >> The new patterns preserve the current treatment on big-endian targets. >> Namely, the "low" half is always in the lower-numbered registers >> (subreg byte offset 0). >> >> Tested on arm-linux-gnueabi.  OK to install? > > > This is OK . Please watch out for any fallout that comes as a result > of this . It would be useful to do some big endian testing at some > point but I'm not going to let that hold up this patch. > > While you are here can you look at the quad_halves_v4si etc. > patterns neon_move_lo_quad_ and neon_move_hi_quad_ > patterns which seem to have the same problem ? Ah, yeah, thanks for the pointer. Tested on arm-linux-gnueabi, and on benchmarks which should (and did) benefit from it. OK to install? Richard gcc/ * config/arm/neon.md (neon_move_lo_quad_): Delete. (neon_move_hi_quad_): Likewise. (move_hi_quad_, move_lo_quad_): Use subreg moves. Index: gcc/config/arm/neon.md =================================================================== --- gcc/config/arm/neon.md 2011-09-27 09:27:18.000000000 +0100 +++ gcc/config/arm/neon.md 2011-09-27 09:45:26.008275971 +0100 @@ -1251,66 +1251,14 @@ (define_insn "quad_halves_v16qi" (const_string "neon_int_1") (const_string "neon_int_5")))] ) -; FIXME: We wouldn't need the following insns if we could write subregs of -; vector registers. Make an attempt at removing unnecessary moves, though -; we're really at the mercy of the register allocator. - -(define_insn "neon_move_lo_quad_" - [(set (match_operand:ANY128 0 "s_register_operand" "+w") - (vec_concat:ANY128 - (match_operand: 1 "s_register_operand" "w") - (vec_select: - (match_dup 0) - (match_operand:ANY128 2 "vect_par_constant_high" ""))))] - "TARGET_NEON" -{ - int dest = REGNO (operands[0]); - int src = REGNO (operands[1]); - - if (dest != src) - return "vmov\t%e0, %P1"; - else - return ""; -} - [(set_attr "neon_type" "neon_bp_simple")] -) - -(define_insn "neon_move_hi_quad_" - [(set (match_operand:ANY128 0 "s_register_operand" "+w") - (vec_concat:ANY128 - (vec_select: - (match_dup 0) - (match_operand:ANY128 2 "vect_par_constant_low" "")) - (match_operand: 1 "s_register_operand" "w")))] - - "TARGET_NEON" -{ - int dest = REGNO (operands[0]); - int src = REGNO (operands[1]); - - if (dest != src) - return "vmov\t%f0, %P1"; - else - return ""; -} - [(set_attr "neon_type" "neon_bp_simple")] -) - (define_expand "move_hi_quad_" [(match_operand:ANY128 0 "s_register_operand" "") (match_operand: 1 "s_register_operand" "")] "TARGET_NEON" { - rtvec v = rtvec_alloc (/2); - rtx t1; - int i; - - for (i=0; i < (/2); i++) - RTVEC_ELT (v, i) = GEN_INT (i); - - t1 = gen_rtx_PARALLEL (mode, v); - emit_insn (gen_neon_move_hi_quad_ (operands[0], operands[1], t1)); - + emit_move_insn (simplify_gen_subreg (mode, operands[0], mode, + GET_MODE_SIZE (mode)), + operands[1]); DONE; }) @@ -1319,16 +1267,9 @@ (define_expand "move_lo_quad_" (match_operand: 1 "s_register_operand" "")] "TARGET_NEON" { - rtvec v = rtvec_alloc (/2); - rtx t1; - int i; - - for (i=0; i < (/2); i++) - RTVEC_ELT (v, i) = GEN_INT ((/2) + i); - - t1 = gen_rtx_PARALLEL (mode, v); - emit_insn (gen_neon_move_lo_quad_ (operands[0], operands[1], t1)); - + emit_move_insn (simplify_gen_subreg (mode, operands[0], + mode, 0), + operands[1]); DONE; })