From patchwork Tue Aug 5 07:05:00 2014
X-Patchwork-Submitter: Andrew Pinski
X-Patchwork-Id: 376576
Subject: Re: [AArch64] Implement movmem for the benefit of inline memcpy
From: Andrew Pinski
To: James Greenhalgh
Cc: "gcc-patches@gcc.gnu.org", "marcus.shawcroft@arm.com"
Date: Tue, 5 Aug 2014 00:05:00 -0700
References: <1402044609-30902-1-git-send-email-james.greenhalgh@arm.com>

On Fri, Aug 1, 2014 at 2:21 AM,  wrote:
>
>> On Jun 6, 2014, at 1:50 AM, James Greenhalgh wrote:
>>
>> Hi,
>>
>> The move_by_pieces infrastructure performs a copy by repeatedly trying
>> the largest safe copy it can make.  So for a 15-byte copy we might see:
>>
>> offset   amount   bytes copied
>> 0        8        0-7
>> 8        4        8-11
>> 12       2        12-13
>> 14       1        14
>>
>> However, we can implement a 15-byte copy as so:
>>
>> offset   amount   bytes copied
>> 0        8        0-7
>> 7        8        7-14
>>
>> Which can prove more efficient for both space and speed.
>>
>> In this patch we set MOVE_RATIO low to avoid using move_by_pieces, and
>> implement the movmem pattern name to expand small block copy cases.
>> Note, this optimization does not apply for -mstrict-align targets,
>> which must continue copying byte-by-byte.
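To make the overlapping scheme concrete, here is a rough C sketch of a
15-byte copy done as two 8-byte moves, the second starting at offset 7 so
that byte 7 is simply written twice.  The copy15 helper is purely
illustrative (it is not part of any patch here) and assumes unaligned
8-byte accesses are cheap, which is exactly the case being targeted:

#include <stdint.h>
#include <string.h>

/* Illustration only: a 15-byte copy as two overlapping 8-byte moves
   (bytes 0-7, then bytes 7-14).  memcpy through a uint64_t is the
   portable way to spell an unaligned 8-byte load/store in C.  */
static void
copy15 (char *dst, const char *src)
{
  uint64_t lo, hi;
  memcpy (&lo, src, 8);      /* load bytes 0-7   */
  memcpy (&hi, src + 7, 8);  /* load bytes 7-14  */
  memcpy (dst, &lo, 8);      /* store bytes 0-7  */
  memcpy (dst + 7, &hi, 8);  /* store bytes 7-14 */
}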
>
> Why not change move_by_pieces instead of having target-specific code?
> This seems like a better option.  You can check the SLOW_UNALIGNED_ACCESS
> target macro to see whether you want to do this optimization too.  As I
> mentioned in the other email, make sure you check the volatility of the
> from and to operands before doing this optimization.

Attached is the patch which does what I mentioned; I also changed
store_by_pieces to implement a similar optimization there (for memset and
strcpy).  Also, since I used SLOW_UNALIGNED_ACCESS, this is a generic
optimization.  I tested an earlier version on x86_64-linux-gnu and I am in
the middle of bootstrapping/testing this one.

Thanks,
Andrew Pinski

	* expr.c (move_by_pieces): Take the min of max_size and len to speed
	things up and to take advantage of the mode in move_by_pieces_1.
	(move_by_pieces_1): Read/write the leftovers using overlapping memory
	locations to reduce the number of reads/writes.
	(store_by_pieces_1): Take the min of max_size and len to speed things
	up and to take advantage of the mode in store_by_pieces_2.
	(store_by_pieces_2): Write the leftovers using overlapping memory
	locations to reduce the number of writes.

>
> Thanks,
> Andrew
>
>
>>
>> Setting MOVE_RATIO low in this way causes a few tests to begin failing;
>> both of these are documented in the test-case as expected to fail for
>> low MOVE_RATIO targets, which do not allow certain tree-level
>> optimizations.
>>
>> Bootstrapped on aarch64-unknown-linux-gnu with no issues.
>>
>> OK for trunk?
>>
>> Thanks,
>> James
>>
>> ---
>> gcc/
>>
>> 2014-06-06  James Greenhalgh
>>
>>     * config/aarch64/aarch64-protos.h (aarch64_expand_movmem): New.
>>     * config/aarch64/aarch64.c (aarch64_move_pointer): New.
>>     (aarch64_progress_pointer): Likewise.
>>     (aarch64_copy_one_part_and_move_pointers): Likewise.
>>     (aarch64_expand_movmem): Likewise.
>>     * config/aarch64/aarch64.h (MOVE_RATIO): Set low.
>>     * config/aarch64/aarch64.md (movmem): New.
>>
>> gcc/testsuite/
>>
>> 2014-06-06  James Greenhalgh
>>
>>     * gcc.dg/tree-ssa/pr42585.c: Skip for AArch64.
>>     * gcc.dg/tree-ssa/sra-12.c: Likewise.
>> <0001-AArch64-Implement-movmem-for-the-benefit-of-inline-m.patch>
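Before the patch itself, a note on the store_by_pieces side mentioned
above: for a memset the output takes an analogous shape.  The sketch below
is hand-written to illustrate that shape under the same assumption of
cheap unaligned stores; it is not code taken from GCC and the set15 name
is made up:

#include <stdint.h>
#include <string.h>

/* Illustration only: a 15-byte memset as two overlapping 8-byte stores of
   the replicated fill byte, instead of 8 + 4 + 2 + 1 separate stores.  */
static void
set15 (char *dst, unsigned char c)
{
  uint64_t v = (uint64_t) c * UINT64_C (0x0101010101010101);  /* fill byte repeated 8 times */
  memcpy (dst, &v, 8);      /* bytes 0-7   */
  memcpy (dst + 7, &v, 8);  /* bytes 7-14  */
}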
Index: expr.c
===================================================================
--- expr.c	(revision 213306)
+++ expr.c	(working copy)
@@ -876,6 +876,9 @@ move_by_pieces (rtx to, rtx from, unsign
   if (data.reverse)
     data.offset = len;
   data.len = len;
+  /* Use the MIN of the length and the max size we can use.  */
+  max_size = max_size > (len + 1) ? (len + 1) : max_size;
+
   /* If copying requires more than two move insns,
      copy addresses to registers (to make displacements shorter)
      and use post-increment if available.  */
@@ -1073,6 +1076,32 @@ move_by_pieces_1 (insn_gen_fn genfun, ma
 
       data->len -= size;
     }
+
+  /* If we have some data left and unaligned accesses
+     are not slow, back up slightly and emit the move.  */
+  if (data->len > 0
+      && !STRICT_ALIGNMENT
+      && !SLOW_UNALIGNED_ACCESS (mode, 1)
+      /* Not a stack push.  */
+      && data->to
+      /* Neither side is volatile memory.  */
+      && !MEM_VOLATILE_P (data->to)
+      && !MEM_VOLATILE_P (data->from)
+      && ceil_log2 (data->len) == exact_log2 (size)
+      /* No incrementing of the to or from.  */
+      && data->explicit_inc_to == 0
+      && data->explicit_inc_from == 0
+      /* No auto-incrementing of the to or from.  */
+      && !data->autinc_to
+      && !data->autinc_from
+      && !data->reverse)
+    {
+      unsigned offset = data->offset - (size - data->len);
+      to1 = adjust_address (data->to, mode, offset);
+      from1 = adjust_address (data->from, mode, offset);
+      emit_insn ((*genfun) (to1, from1));
+      data->len = 0;
+    }
 }
 
 /* Emit code to move a block Y to a block X.  This may be done with
@@ -2636,6 +2665,9 @@ store_by_pieces_1 (struct store_by_piece
   if (data->reverse)
     data->offset = data->len;
 
+  /* Use the MIN of the length and the max size we can use.  */
+  max_size = max_size > (data->len + 1) ? (data->len + 1) : max_size;
+
   /* If storing requires more than two move insns,
      copy addresses to registers (to make displacements shorter)
      and use post-increment if available.  */
@@ -2733,6 +2765,24 @@ store_by_pieces_2 (insn_gen_fn genfun, m
 
       data->len -= size;
     }
+
+  /* If we have some data left and unaligned accesses
+     are not slow, back up slightly and emit that constant.  */
+  if (data->len > 0
+      && !STRICT_ALIGNMENT
+      && !SLOW_UNALIGNED_ACCESS (mode, 1)
+      && !MEM_VOLATILE_P (data->to)
+      && ceil_log2 (data->len) == exact_log2 (size)
+      && data->explicit_inc_to == 0
+      && !data->autinc_to
+      && !data->reverse)
+    {
+      unsigned offset = data->offset - (size - data->len);
+      to1 = adjust_address (data->to, mode, offset);
+      cst = (*data->constfun) (data->constfundata, offset, mode);
+      emit_insn ((*genfun) (to1, cst));
+      data->len = 0;
+    }
 }
 
 /* Write zeros through the storage of OBJECT.  If OBJECT has BLKmode, SIZE is
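As a worked check of the tail condition in move_by_pieces_1 and
store_by_pieces_2 above: for a 15-byte copy the main loop issues one
8-byte move, leaving data->len == 7 with size == 8, so ceil_log2 (7) == 3
== exact_log2 (8) and the leftover is handled by a single overlapping
access starting at offset 8 - (8 - 7) == 7.  The snippet below just
re-does that arithmetic; the *_u helpers are hand-rolled stand-ins for
GCC's ceil_log2/exact_log2, written here for illustration only:

#include <assert.h>

/* Stand-ins for GCC's exact_log2/ceil_log2, for this illustration only.  */
static int exact_log2u (unsigned x) { return (x && !(x & (x - 1))) ? __builtin_ctz (x) : -1; }
static int ceil_log2u (unsigned x) { return x <= 1 ? 0 : 32 - __builtin_clz (x - 1); }

int
main (void)
{
  /* State after the first 8-byte move of a 15-byte copy.  */
  unsigned size = 8, len = 7, offset = 8;
  assert (ceil_log2u (len) == exact_log2u (size));  /* 3 == 3: leftover fits one more access of this mode.  */
  assert (offset - (size - len) == 7);              /* The overlapping access starts at byte 7.  */
  return 0;
}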