From patchwork Tue Aug 5 07:05:00 2014
X-Patchwork-Submitter: Andrew Pinski
X-Patchwork-Id: 376576
Subject: Re: [AArch64] Implement movmem for the benefit of inline memcpy
From: Andrew Pinski
To: James Greenhalgh
Cc: "gcc-patches@gcc.gnu.org", "marcus.shawcroft@arm.com"
Date: Tue, 5 Aug 2014 00:05:00 -0700
References: <1402044609-30902-1-git-send-email-james.greenhalgh@arm.com>

On Fri, Aug 1, 2014 at 2:21 AM,  wrote:
>
>> On Jun 6, 2014, at 1:50 AM, James Greenhalgh wrote:
>>
>> Hi,
>>
>> The move_by_pieces infrastructure performs a copy by repeatedly trying
>> the largest safe copy it can make.  So for a 15-byte copy we might see:
>>
>> offset   amount   bytes copied
>> 0        8        0-7
>> 8        4        8-11
>> 12       2        12-13
>> 14       1        14
>>
>> However, we can implement a 15-byte copy as so:
>>
>> offset   amount   bytes copied
>> 0        8        0-7
>> 7        8        7-14
>>
>> Which can prove more efficient for both space and speed.
>>
>> In this patch we set MOVE_RATIO low to avoid using move_by_pieces, and
>> implement the movmem pattern name to expand small block copy cases.
>> Note, this optimization does not apply for -mstrict-align targets,
>> which must continue copying byte-by-byte.
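To make the overlapping scheme concrete, here is a rough C sketch of a
15-byte copy done as two 8-byte moves, the second starting at offset 7 so
that byte 7 is simply written twice.  The copy15 helper is purely
illustrative (it is not part of any patch here) and assumes unaligned
8-byte accesses are cheap, which is exactly the case being targeted:

#include <stdint.h>
#include <string.h>

/* Illustration only: a 15-byte copy as two overlapping 8-byte moves
   (bytes 0-7, then bytes 7-14).  memcpy through a uint64_t is the
   portable way to spell an unaligned 8-byte load/store in C.  */
static void
copy15 (char *dst, const char *src)
{
  uint64_t lo, hi;
  memcpy (&lo, src, 8);      /* load bytes 0-7   */
  memcpy (&hi, src + 7, 8);  /* load bytes 7-14  */
  memcpy (dst, &lo, 8);      /* store bytes 0-7  */
  memcpy (dst + 7, &hi, 8);  /* store bytes 7-14 */
}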
>
> Why not change move_by_pieces instead of having target-specific code?
> This seems like a better option.  You can check the SLOW_UNALIGNED_ACCESS
> target macro to see whether you want to do this optimization too.  As I
> mentioned in the other email, make sure you check the volatility of the
> from and to operands before doing this optimization.

Attached is the patch which does what I mentioned; I also changed
store_by_pieces to implement a similar optimization there (for memset and
strcpy).  Also, since I used SLOW_UNALIGNED_ACCESS, this is a generic
optimization.  I tested an earlier version on x86_64-linux-gnu and I am in
the middle of bootstrapping/testing this one.

Thanks,
Andrew Pinski

	* expr.c (move_by_pieces): Take the min of max_size and len to speed
	things up and to take advantage of the mode in move_by_pieces_1.
	(move_by_pieces_1): Read/write the leftovers using overlapping memory
	locations to reduce the number of reads/writes.
	(store_by_pieces_1): Take the min of max_size and len to speed things
	up and to take advantage of the mode in store_by_pieces_2.
	(store_by_pieces_2): Write the leftovers using overlapping memory
	locations to reduce the number of writes.

>
> Thanks,
> Andrew
>
>
>>
>> Setting MOVE_RATIO low in this way causes a few tests to begin failing;
>> both of these are documented in the test-case as expected to fail for
>> low MOVE_RATIO targets, which do not allow certain tree-level
>> optimizations.
>>
>> Bootstrapped on aarch64-unknown-linux-gnu with no issues.
>>
>> OK for trunk?
>>
>> Thanks,
>> James
>>
>> ---
>> gcc/
>>
>> 2014-06-06  James Greenhalgh
>>
>>     * config/aarch64/aarch64-protos.h (aarch64_expand_movmem): New.
>>     * config/aarch64/aarch64.c (aarch64_move_pointer): New.
>>     (aarch64_progress_pointer): Likewise.
>>     (aarch64_copy_one_part_and_move_pointers): Likewise.
>>     (aarch64_expand_movmem): Likewise.
>>     * config/aarch64/aarch64.h (MOVE_RATIO): Set low.
>>     * config/aarch64/aarch64.md (movmem): New.
>>
>> gcc/testsuite/
>>
>> 2014-06-06  James Greenhalgh
>>
>>     * gcc.dg/tree-ssa/pr42585.c: Skip for AArch64.
>>     * gcc.dg/tree-ssa/sra-12.c: Likewise.
>> <0001-AArch64-Implement-movmem-for-the-benefit-of-inline-m.patch>
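Before the patch itself, a note on the store_by_pieces side mentioned
above: for a memset the output takes an analogous shape.  The sketch below
is hand-written to illustrate that shape under the same assumption of
cheap unaligned stores; it is not code taken from GCC and the set15 name
is made up:

#include <stdint.h>
#include <string.h>

/* Illustration only: a 15-byte memset as two overlapping 8-byte stores of
   the replicated fill byte, instead of 8 + 4 + 2 + 1 separate stores.  */
static void
set15 (char *dst, unsigned char c)
{
  uint64_t v = (uint64_t) c * UINT64_C (0x0101010101010101);  /* fill byte repeated 8 times */
  memcpy (dst, &v, 8);      /* bytes 0-7   */
  memcpy (dst + 7, &v, 8);  /* bytes 7-14  */
}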
Index: expr.c
===================================================================
--- expr.c	(revision 213306)
+++ expr.c	(working copy)
@@ -876,6 +876,9 @@ move_by_pieces (rtx to, rtx from, unsign
   if (data.reverse)
     data.offset = len;
   data.len = len;
+  /* Use the MIN of the length and the max size we can use.  */
+  max_size = max_size > (len + 1) ? (len + 1) : max_size;
+
   /* If copying requires more than two move insns,
      copy addresses to registers (to make displacements shorter)
      and use post-increment if available.  */
@@ -1073,6 +1076,32 @@ move_by_pieces_1 (insn_gen_fn genfun, ma
 
       data->len -= size;
     }
+
+  /* If we have some data left and unaligned accesses
+     are not slow, back up slightly and emit the move.  */
+  if (data->len > 0
+      && !STRICT_ALIGNMENT
+      && !SLOW_UNALIGNED_ACCESS (mode, 1)
+      /* Not a stack push.  */
+      && data->to
+      /* Neither side is volatile memory.  */
+      && !MEM_VOLATILE_P (data->to)
+      && !MEM_VOLATILE_P (data->from)
+      && ceil_log2 (data->len) == exact_log2 (size)
+      /* No incrementing of the to or from.  */
+      && data->explicit_inc_to == 0
+      && data->explicit_inc_from == 0
+      /* No auto-incrementing of the to or from.  */
+      && !data->autinc_to
+      && !data->autinc_from
+      && !data->reverse)
+    {
+      unsigned offset = data->offset - (size - data->len);
+      to1 = adjust_address (data->to, mode, offset);
+      from1 = adjust_address (data->from, mode, offset);
+      emit_insn ((*genfun) (to1, from1));
+      data->len = 0;
+    }
 }
 
 /* Emit code to move a block Y to a block X.  This may be done with
@@ -2636,6 +2665,9 @@ store_by_pieces_1 (struct store_by_piece
   if (data->reverse)
     data->offset = data->len;
 
+  /* Use the MIN of the length and the max size we can use.  */
+  max_size = max_size > (data->len + 1) ? (data->len + 1) : max_size;
+
   /* If storing requires more than two move insns,
      copy addresses to registers (to make displacements shorter)
      and use post-increment if available.  */
@@ -2733,6 +2765,24 @@ store_by_pieces_2 (insn_gen_fn genfun, m
 
       data->len -= size;
     }
+
+  /* If we have some data left and unaligned accesses
+     are not slow, back up slightly and emit that constant.  */
+  if (data->len > 0
+      && !STRICT_ALIGNMENT
+      && !SLOW_UNALIGNED_ACCESS (mode, 1)
+      && !MEM_VOLATILE_P (data->to)
+      && ceil_log2 (data->len) == exact_log2 (size)
+      && data->explicit_inc_to == 0
+      && !data->autinc_to
+      && !data->reverse)
+    {
+      unsigned offset = data->offset - (size - data->len);
+      to1 = adjust_address (data->to, mode, offset);
+      cst = (*data->constfun) (data->constfundata, offset, mode);
+      emit_insn ((*genfun) (to1, cst));
+      data->len = 0;
+    }
 }
 
 /* Write zeros through the storage of OBJECT.  If OBJECT has BLKmode, SIZE is
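As a worked check of the tail condition in move_by_pieces_1 and
store_by_pieces_2 above: for a 15-byte copy the main loop issues one
8-byte move, leaving data->len == 7 with size == 8, so ceil_log2 (7) == 3
== exact_log2 (8) and the leftover is handled by a single overlapping
access starting at offset 8 - (8 - 7) == 7.  The snippet below just
re-does that arithmetic; the *_u helpers are hand-rolled stand-ins for
GCC's ceil_log2/exact_log2, written here for illustration only:

#include <assert.h>

/* Stand-ins for GCC's exact_log2/ceil_log2, for this illustration only.  */
static int exact_log2u (unsigned x) { return (x && !(x & (x - 1))) ? __builtin_ctz (x) : -1; }
static int ceil_log2u (unsigned x) { return x <= 1 ? 0 : 32 - __builtin_clz (x - 1); }

int
main (void)
{
  /* State after the first 8-byte move of a 15-byte copy.  */
  unsigned size = 8, len = 7, offset = 8;
  assert (ceil_log2u (len) == exact_log2u (size));  /* 3 == 3: leftover fits one more access of this mode.  */
  assert (offset - (size - len) == 7);              /* The overlapping access starts at byte 7.  */
  return 0;
}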