From patchwork Mon Sep 30 16:36:31 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Aaron Sawdey
X-Patchwork-Id: 1169502
To: GCC Patches
Cc: Segher Boessenkool, Bill Schmidt, David Edelsohn
From: Aaron Sawdey
Subject: [PATCH, RS6000] Add movmemsi pattern for inline expansion of memmove()
Date: Mon, 30 Sep 2019 11:36:31 -0500

This patch uses the support added in the patch I posted last week for
actually doing inline expansion of memmove(). I've added a might_overlap
parameter to expand_block_move() to tell it when it must make sure to
handle overlapping moves. I changed the code to save up the generated
rtx for both loads and stores instead of just stores. In the
might_overlap==true case, if we get to MAX_MOVE_REG and the move is not
done yet, then we bail out and return false.

So what this can now do is inline expand any memmove() that can be done
in at most 4 loads followed by the same number of stores. It will use
lxv/stxv if size/alignment allows, otherwise it will use unaligned
integer loads/stores. So it can expand most memmove() up to 32 bytes,
and some that are 33-64 bytes if the arguments are 16-byte aligned.

I've also removed the code from expand_block_move() for dealing with
mode==BLKmode because I don't believe that can happen.
The big if construct that figures out which size we are going to use
has a plain else on it, and every clause in it sets mode to something
other than BLKmode. So I removed that code to simplify things and just
left a gcc_assert(mode != BLKmode).

Regtest is in progress on ppc64le (power9). If the tests are ok, is this
ok for trunk after the movmem optab patch posted last week is approved?

Thanks!
   Aaron

2019-09-30  Aaron Sawdey

	* config/rs6000/rs6000-protos.h (expand_block_move): Change prototype.
	* config/rs6000/rs6000-string.c (expand_block_move): Add
	might_overlap parm.
	* config/rs6000/rs6000.md (movmemsi): Add new pattern.
	(cpymemsi): Add might_overlap parm to expand_block_move() call.

Index: gcc/config/rs6000/rs6000-protos.h
===================================================================
--- gcc/config/rs6000/rs6000-protos.h	(revision 276131)
+++ gcc/config/rs6000/rs6000-protos.h	(working copy)
@@ -69,7 +69,7 @@
 extern void rs6000_generate_float2_double_code (rtx, rtx, rtx);
 extern void rs6000_generate_vsigned2_code (bool, rtx, rtx, rtx);
 extern int expand_block_clear (rtx[]);
-extern int expand_block_move (rtx[]);
+extern int expand_block_move (rtx[], bool);
 extern bool expand_block_compare (rtx[]);
 extern bool expand_strn_compare (rtx[], int);
 extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode);
Index: gcc/config/rs6000/rs6000-string.c
===================================================================
--- gcc/config/rs6000/rs6000-string.c	(revision 276131)
+++ gcc/config/rs6000/rs6000-string.c	(working copy)
@@ -2719,7 +2719,7 @@
 #define MAX_MOVE_REG 4
 
 int
-expand_block_move (rtx operands[])
+expand_block_move (rtx operands[], bool might_overlap)
 {
   rtx orig_dest = operands[0];
   rtx orig_src = operands[1];
@@ -2730,6 +2730,7 @@
   int bytes;
   int offset;
   int move_bytes;
+  rtx loads[MAX_MOVE_REG];
   rtx stores[MAX_MOVE_REG];
   int num_reg = 0;
 
@@ -2817,47 +2818,35 @@
           gen_func.mov = gen_movqi;
         }
 
+      /* Mode is always set to something other than BLKmode by one of the
+         cases of the if statement above. */
+      gcc_assert (mode != BLKmode);
+
       src = adjust_address (orig_src, mode, offset);
       dest = adjust_address (orig_dest, mode, offset);
 
-      if (mode != BLKmode)
-        {
-          rtx tmp_reg = gen_reg_rtx (mode);
+      rtx tmp_reg = gen_reg_rtx (mode);
+
+      loads[num_reg] = (*gen_func.mov) (tmp_reg, src);
+      stores[num_reg++] = (*gen_func.mov) (dest, tmp_reg);
 
-          emit_insn ((*gen_func.mov) (tmp_reg, src));
-          stores[num_reg++] = (*gen_func.mov) (dest, tmp_reg);
-        }
+      /* If we didn't succeed in doing it in one pass, we can't do it in the
+         might_overlap case. Bail out and return failure. */
+      if (might_overlap && num_reg >= MAX_MOVE_REG
+          && bytes > move_bytes)
+        return 0;
 
-      if (mode == BLKmode || num_reg >= MAX_MOVE_REG || bytes == move_bytes)
+      /* Emit loads and stores saved up. */
+      if (num_reg >= MAX_MOVE_REG || bytes == move_bytes)
         {
           int i;
           for (i = 0; i < num_reg; i++)
+            emit_insn (loads[i]);
+          for (i = 0; i < num_reg; i++)
             emit_insn (stores[i]);
           num_reg = 0;
         }
-
-      if (mode == BLKmode)
-        {
-          /* Move the address into scratch registers. The movmemsi
-             patterns require zero offset. */
-          if (!REG_P (XEXP (src, 0)))
-            {
-              rtx src_reg = copy_addr_to_reg (XEXP (src, 0));
-              src = replace_equiv_address (src, src_reg);
-            }
-          set_mem_size (src, move_bytes);
-
-          if (!REG_P (XEXP (dest, 0)))
-            {
-              rtx dest_reg = copy_addr_to_reg (XEXP (dest, 0));
-              dest = replace_equiv_address (dest, dest_reg);
-            }
-          set_mem_size (dest, move_bytes);
-
-          emit_insn ((*gen_func.movmemsi) (dest, src,
-                                           GEN_INT (move_bytes & 31),
-                                           align_rtx));
-        }
+
     }
 
   return 1;
Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 276131)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -9057,7 +9057,7 @@
     FAIL;
 })
 
-;; String/block move insn.
+;; String/block copy insn (source and destination must not overlap).
 ;; Argument 0 is the destination
 ;; Argument 1 is the source
 ;; Argument 2 is the length
@@ -9070,11 +9070,31 @@
               (use (match_operand:SI 3 ""))])]
   ""
 {
-  if (expand_block_move (operands))
+  if (expand_block_move (operands, false))
     DONE;
   else
     FAIL;
 })
+
+;; String/block move insn (source and destination may overlap).
+;; Argument 0 is the destination
+;; Argument 1 is the source
+;; Argument 2 is the length
+;; Argument 3 is the alignment
+
+(define_expand "movmemsi"
+  [(parallel [(set (match_operand:BLK 0 "")
+                   (match_operand:BLK 1 ""))
+              (use (match_operand:SI 2 ""))
+              (use (match_operand:SI 3 ""))])]
+  ""
+{
+  if (expand_block_move (operands, true))
+    DONE;
+  else
+    FAIL;
+})
+
 
 ;; Define insns that do load or store with update. Some of these we can
 ;; get by using pre-decrement or pre-increment, but the hardware can also
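
Illustration only, not part of the patch: a small standalone C model of
the loads-then-stores batching described in the message above, to show
why saving up the loads makes an overlapping move safe. This is a sketch
under stated assumptions, not GCC code. model_block_move(),
MODEL_MAX_CHUNK and the byte arrays standing in for registers are
hypothetical names, and the model only uses integer chunks of at most
8 bytes, so it ignores the lxv/stxv case entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_MOVE_REG 4      /* same limit as in the patch */
#define MODEL_MAX_CHUNK 8   /* assumption: 8-byte integer moves only */

/* Hypothetical model of the might_overlap case: load every chunk into a
   temporary first, and only then store.  Returns false, before storing
   anything, if the move does not fit in one batch of MAX_MOVE_REG.  */
static bool
model_block_move (unsigned char *dest, const unsigned char *src, size_t bytes)
{
  unsigned char regs[MAX_MOVE_REG][MODEL_MAX_CHUNK];
  size_t off[MAX_MOVE_REG], len[MAX_MOVE_REG];
  size_t offset = 0;
  int num_reg = 0;

  assert (dest != NULL && src != NULL);

  /* Phase 1: all loads.  Nothing in dest is modified yet.  */
  while (offset < bytes)
    {
      size_t left = bytes - offset;
      size_t n = left >= 8 ? 8 : left >= 4 ? 4 : left >= 2 ? 2 : 1;

      /* Out of temporaries with bytes still left: bail out.  */
      if (num_reg == MAX_MOVE_REG)
        return false;

      memcpy (regs[num_reg], src + offset, n);
      off[num_reg] = offset;
      len[num_reg] = n;
      num_reg++;
      offset += n;
    }

  /* Phase 2: all stores.  Every source byte is already in a temporary,
     so overlap between src and dest cannot corrupt the result.  */
  for (int i = 0; i < num_reg; i++)
    memcpy (dest + off[i], regs[i], len[i]);

  return true;
}
```

Under these assumptions a 32-byte overlapping move fits in one batch,
while a 40-byte one is refused before any byte of the destination is
written, which mirrors the early "return 0" in the might_overlap case.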