From patchwork Thu Apr 12 17:37:22 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 897748
Date: Thu, 12 Apr 2018 19:37:22 +0200
From: Jakub Jelinek
To: Richard Biener, Wilco Dijkstra
Cc: nd, "mliska@suse.cz", "ubizjak@gmail.com", GCC Patches,
 "marc.glisse@inria.fr", "H.J. Lu", Jan Hubicka
Subject: [PATCH] Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657, take 2)
Message-ID: <20180412173722.GP8577@tucnak>
Reply-To: Jakub Jelinek
References: <20180412160306.GN8577@tucnak> <20180412164917.GO8577@tucnak>
User-Agent: Mutt/1.9.2 (2017-12-15)

On Thu, Apr 12, 2018 at 05:29:35PM +0000, Wilco Dijkstra wrote:
> > Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got added,
> > in 2013 other power versions, in 2016 s390*, etc.  Doing a decent mempcpy
> > isn't hard if you have asm version of memcpy and one spare register.
>
> More mempcpy implementations have been added in recent years indeed, but almost all
> add an extra copy of the memcpy code rather than using a single combined implementation.
> That means it is still better to call memcpy (which is frequently used and thus likely in L1/L2)
> rather than mempcpy (which is more likely to be cold and thus not cached).

That really depends; usually when an app uses mempcpy, it uses it very
heavily.  And all the proposed patches do is honor what the user asked for:
if you use memcpy () + n, we aren't transforming that into mempcpy behind
the user's back.

Anyway, here is what I think Richard was asking for, which I'm currently
bootstrapping/regtesting.  It can be easily combined with Martin's target
hook if needed, or do it only for
endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp)
etc.

2018-04-12  Martin Liska
            Jakub Jelinek

        PR middle-end/81657
        * expr.h (enum block_op_methods): Add BLOCK_OP_NO_LIBCALL_RET.
        * expr.c (emit_block_move_hints): Handle BLOCK_OP_NO_LIBCALL_RET.
        * builtins.c (expand_builtin_memory_copy_args): Use
        BLOCK_OP_NO_LIBCALL_RET method for mempcpy with non-ignored target,
        handle dest_addr == pc_rtx.

        * gcc.dg/string-opt-1.c: Remove bogus comment.  Expect a mempcpy
        call.

        Jakub

--- gcc/expr.h.jj	2018-01-12 11:35:51.424222835 +0100
+++ gcc/expr.h	2018-04-12 18:38:07.377464114 +0200
@@ -100,7 +100,11 @@ enum block_op_methods
   BLOCK_OP_NO_LIBCALL,
   BLOCK_OP_CALL_PARM,
   /* Like BLOCK_OP_NORMAL, but the libcall can be tail call optimized.  */
-  BLOCK_OP_TAILCALL
+  BLOCK_OP_TAILCALL,
+  /* Like BLOCK_OP_NO_LIBCALL, but instead of emitting a libcall return
+     pc_rtx to indicate nothing has been emitted and let the caller handle
+     it.  */
+  BLOCK_OP_NO_LIBCALL_RET
 };
 
 typedef rtx (*by_pieces_constfn) (void *, HOST_WIDE_INT, scalar_int_mode);
--- gcc/expr.c.jj	2018-04-06 19:19:14.954130838 +0200
+++ gcc/expr.c	2018-04-12 18:39:58.866536619 +0200
@@ -1565,7 +1565,7 @@ emit_block_move_hints (rtx x, rtx y, rtx
                        unsigned HOST_WIDE_INT max_size,
                        unsigned HOST_WIDE_INT probable_max_size)
 {
-  bool may_use_call;
+  int may_use_call;
   rtx retval = 0;
   unsigned int align;
 
@@ -1577,7 +1577,7 @@ emit_block_move_hints (rtx x, rtx y, rtx
     {
     case BLOCK_OP_NORMAL:
     case BLOCK_OP_TAILCALL:
-      may_use_call = true;
+      may_use_call = 1;
       break;
 
     case BLOCK_OP_CALL_PARM:
@@ -1589,7 +1589,11 @@ emit_block_move_hints (rtx x, rtx y, rtx
       break;
 
     case BLOCK_OP_NO_LIBCALL:
-      may_use_call = false;
+      may_use_call = 0;
+      break;
+
+    case BLOCK_OP_NO_LIBCALL_RET:
+      may_use_call = -1;
       break;
 
     default:
@@ -1625,6 +1629,9 @@ emit_block_move_hints (rtx x, rtx y, rtx
       && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
       && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (may_use_call < 0)
+        return pc_rtx;
+
       /* Since x and y are passed to a libcall, mark the corresponding
          tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
--- gcc/builtins.c.jj	2018-04-12 13:35:34.328395156 +0200
+++ gcc/builtins.c	2018-04-12 18:42:01.846616598 +0200
@@ -3650,12 +3650,16 @@ expand_builtin_memory_copy_args (tree de
   set_mem_align (src_mem, src_align);
 
   /* Copy word part most expediently.  */
-  dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
-                                     CALL_EXPR_TAILCALL (exp)
-                                     && (endp == 0 || target == const0_rtx)
-                                     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
+  enum block_op_methods method = BLOCK_OP_NORMAL;
+  if (CALL_EXPR_TAILCALL (exp) && (endp == 0 || target == const0_rtx))
+    method = BLOCK_OP_TAILCALL;
+  if (endp == 1 && target != const0_rtx)
+    method = BLOCK_OP_NO_LIBCALL_RET;
+  dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx, method,
                                      expected_align, expected_size,
                                      min_size, max_size, probable_max_size);
+  if (dest_addr == pc_rtx)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
--- gcc/testsuite/gcc.dg/string-opt-1.c.jj	2017-08-01 19:23:09.923512205 +0200
+++ gcc/testsuite/gcc.dg/string-opt-1.c	2018-04-12 18:57:10.940217129 +0200
@@ -1,4 +1,3 @@
-/* Ensure mempcpy is "optimized" into memcpy followed by addition.  */
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
 
@@ -48,5 +47,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\" } } */
+/* { dg-final { scan-assembler "mempcpy" } } */
 /* { dg-final { scan-assembler "memcpy" } } */
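
For reference, here is a minimal user-level sketch (not part of the patch) of the two source forms discussed in this thread: memcpy () + n, which stays a memcpy call, and mempcpy with its result used, which is the case the patch is meant to keep as a mempcpy call when no inline expansion is chosen. The function names copy_then_add, copy_mempcpy and join3 are hypothetical and only illustrate the semantics. The sketch assumes GCC or Clang (for __builtin_mempcpy) and a libc that provides mempcpy, such as glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Form 1: the user wrote memcpy () + n.  The end pointer is computed
   by a separate addition after the copy.  */
static char *
copy_then_add (char *dst, const char *src, size_t n)
{
  return (char *) memcpy (dst, src, n) + n;
}

/* Form 2: the user wrote mempcpy, which returns dst + n directly.
   __builtin_mempcpy is used here so that no _GNU_SOURCE declaration is
   required (an assumption: the compiler is GCC or Clang).  */
static char *
copy_mempcpy (char *dst, const char *src, size_t n)
{
  return (char *) __builtin_mempcpy (dst, src, n);
}

/* Hypothetical helper showing the chained-append pattern in which
   mempcpy tends to be used heavily.  BUF must be large enough for the
   three strings plus the terminating NUL.  Returns the joined length.  */
static size_t
join3 (char *buf, const char *a, const char *b, const char *c)
{
  assert (buf != NULL);
  char *p = buf;
  p = copy_mempcpy (p, a, strlen (a));
  p = copy_mempcpy (p, b, strlen (b));
  p = copy_mempcpy (p, c, strlen (c));
  *p = '\0';
  return (size_t) (p - buf);
}
```

Both forms return the same pointer for the same arguments; they differ only in which library function the source asks for.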