From patchwork Wed Oct 16 14:01:18 2019
X-Patchwork-Submitter: Michael Meissner
X-Patchwork-Id: 1177925
Date: Wed, 16 Oct 2019 10:01:18 -0400
From: Michael Meissner
To: Michael Meissner, gcc-patches@gcc.gnu.org, segher@kernel.crashing.org,
 dje.gcc@gmail.com
Subject: [PATCH] V6, #5 of 17: Add prefixed instruction support to vector
 extract optimizations
Message-ID: <20191016140118.GE4483@ibm-toto.the-meissners.org>
References: <20191016125100.GA31255@ibm-toto.the-meissners.org>
In-Reply-To: <20191016125100.GA31255@ibm-toto.the-meissners.org>

This patch updates the support for optimizing vector extracts to know about
prefixed addressing.

There are two parts to the patch:

1) If a vector extract with a constant element number extracts an element
from a vector residing in memory that uses a prefixed address (either
numeric or PC-relative), the offset for the element number is folded into
the address of the vector and a scalar load is done.

2) If a vector extract with a variable element number extracts an element
from a vector residing in memory that uses a prefixed address, the
optimization is not done because we would need two temporary base registers
to do this calculation and currently we only have one.  Instead, the vector
is loaded into a vector register, and the element extract is done from the
value in a register.

We discovered this trying to run real code through the mambo simulator.  The
compiler previously would try to use the single base register both to hold
the address and to generate the element offset.  This patch adds a new
constraint (em) that prevents using prefixed addresses.  Without the
constraint the register allocator would recreate the prefixed address for
the insn.

This patch updates V5 patch #7 which did the same thing.

Along with the other patches, I have done bootstraps on a little endian
power8 system, and there were no regressions in the test suite.  I have
built both Spec 2006 and Spec 2017 with all of these patches installed using
-mcpu=future, and there were no failures.

Can I check this into the trunk?

Note, I may have limited email access on October 17th and 18th, 2019.

2019-10-15  Michael Meissner

	* config/rs6000/constraints.md (em constraint): New constraint for
	non-prefixed memory.
	* config/rs6000/predicates.md (non_prefixed_memory): New
	predicate.
	(reg_or_non_prefixed_memory): New predicate.
	* config/rs6000/rs6000.c (rs6000_adjust_vec_address): Add support
	for optimizing extracting a constant vector element from a vector
	that uses a prefixed address.  If the element number is variable
	and the address uses a prefixed address, abort.
	* config/rs6000/vsx.md (vsx_extract__var, VSX_D iterator): Do not
	allow combining prefixed memory with a variable vector extract.
	(vsx_extract_v4sf_var): Do not allow combining prefixed memory
	with a variable vector extract.
	(vsx_extract__var, VSX_EXTRACT_I iterator): Do not allow combining
	prefixed memory with a variable vector extract.
	(vsx_extract__mode_var): Do not allow combining prefixed memory
	with a variable vector extract.
	* doc/md.texi (PowerPC constraints): Document the em constraint.

Index: gcc/config/rs6000/constraints.md
===================================================================
--- gcc/config/rs6000/constraints.md	(revision 276974)
+++ gcc/config/rs6000/constraints.md	(working copy)
@@ -202,6 +202,11 @@ (define_constraint "H"
 
 ;; Memory constraints
 
+(define_memory_constraint "em"
+  "A memory operand that does not contain a prefixed address."
+  (and (match_code "mem")
+       (match_test "non_prefixed_memory (op, mode)")))
+
 (define_memory_constraint "es"
   "A ``stable'' memory operand; that is, one which does not include any
 automodification of the base register.  Unlike @samp{m}, this constraint
Index: gcc/config/rs6000/predicates.md
===================================================================
--- gcc/config/rs6000/predicates.md	(revision 277024)
+++ gcc/config/rs6000/predicates.md	(working copy)
@@ -1822,3 +1822,24 @@ (define_predicate "prefixed_memory"
 {
   return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
 })
+
+;; Return true if the operand is a memory address that does not use a prefixed
+;; address.
+(define_predicate "non_prefixed_memory"
+  (match_code "mem")
+{
+  /* If the operand is not a valid memory operand even if it is not prefixed,
+     do not return true.  */
+  if (!memory_operand (op, mode))
+    return false;
+
+  return !address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
+})
+
+;; Return true if the operand is either a register or it is a non-prefixed
+;; memory operand.
+(define_predicate "reg_or_non_prefixed_memory"
+  (match_code "reg,subreg,mem")
+{
+  return gpc_reg_operand (op, mode) || non_prefixed_memory (op, mode);
+})
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 277018)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -6702,6 +6702,7 @@ rs6000_adjust_vec_address (rtx scalar_re
   rtx element_offset;
   rtx new_addr;
   bool valid_addr_p;
+  bool pcrel_p = pcrel_local_address (addr, Pmode);
 
   /* Vector addresses should not have PRE_INC, PRE_DEC, or PRE_MODIFY.  */
   gcc_assert (GET_RTX_CLASS (GET_CODE (addr)) != RTX_AUTOINC);
@@ -6739,6 +6740,38 @@ rs6000_adjust_vec_address (rtx scalar_re
   else if (REG_P (addr) || SUBREG_P (addr))
     new_addr = gen_rtx_PLUS (Pmode, addr, element_offset);
 
+  /* Optimize PC-relative addresses with a constant offset.  */
+  else if (pcrel_p && CONST_INT_P (element_offset))
+    {
+      rtx addr2 = addr;
+      HOST_WIDE_INT offset = INTVAL (element_offset);
+
+      if (GET_CODE (addr2) == CONST)
+        addr2 = XEXP (addr2, 0);
+
+      if (GET_CODE (addr2) == PLUS)
+        {
+          offset += INTVAL (XEXP (addr2, 1));
+          addr2 = XEXP (addr2, 0);
+        }
+
+      gcc_assert (SIGNED_34BIT_OFFSET_P (offset));
+      if (offset)
+        {
+          addr2 = gen_rtx_PLUS (Pmode, addr2, GEN_INT (offset));
+          new_addr = gen_rtx_CONST (Pmode, addr2);
+        }
+      else
+        new_addr = addr2;
+    }
+
+  /* With only one temporary base register, we can't support a PC-relative
+     address added to a variable offset.  This is because the PADDI instruction
+     requires RA to be 0 when doing a PC-relative add (i.e. no register to add
+     to).  */
+  else if (pcrel_p)
+    gcc_unreachable ();
+
   /* Optimize D-FORM addresses with constant offset with a constant element, to
      include the element offset in the address directly.  */
   else if (GET_CODE (addr) == PLUS)
@@ -6753,8 +6786,11 @@ rs6000_adjust_vec_address (rtx scalar_re
           HOST_WIDE_INT offset = INTVAL (op1) + INTVAL (element_offset);
           rtx offset_rtx = GEN_INT (offset);
 
-          if (IN_RANGE (offset, -32768, 32767)
-              && (scalar_size < 8 || (offset & 0x3) == 0))
+          if (TARGET_PREFIXED_ADDR && SIGNED_34BIT_OFFSET_P (offset))
+            new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
+
+          else if (SIGNED_16BIT_OFFSET_P (offset)
+                   && (scalar_size < 8 || (offset & 0x3) == 0))
             new_addr = gen_rtx_PLUS (Pmode, op0, offset_rtx);
           else
             {
@@ -6802,11 +6838,11 @@ rs6000_adjust_vec_address (rtx scalar_re
       new_addr = gen_rtx_PLUS (Pmode, base_tmp, element_offset);
     }
 
-  /* If we have a PLUS, we need to see whether the particular register class
-     allows for D-FORM or X-FORM addressing.  */
-  if (GET_CODE (new_addr) == PLUS)
+  /* If we have a PLUS or a PC-relative address without the PLUS, we need to
+     see whether the particular register class allows for D-FORM or X-FORM
+     addressing.  */
+  if (GET_CODE (new_addr) == PLUS || pcrel_p)
     {
-      rtx op1 = XEXP (new_addr, 1);
       addr_mask_type addr_mask;
       unsigned int scalar_regno = reg_or_subregno (scalar_reg);
 
@@ -6823,10 +6859,16 @@ rs6000_adjust_vec_address (rtx scalar_re
       else
         gcc_unreachable ();
 
-      if (REG_P (op1) || SUBREG_P (op1))
-        valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;
-      else
+      if (pcrel_p)
         valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
+      else
+        {
+          rtx op1 = XEXP (new_addr, 1);
+          if (REG_P (op1) || SUBREG_P (op1))
+            valid_addr_p = (addr_mask & RELOAD_REG_INDEXED) != 0;
+          else
+            valid_addr_p = (addr_mask & RELOAD_REG_OFFSET) != 0;
+        }
     }
 
   else if (REG_P (new_addr) || SUBREG_P (new_addr))
Index: gcc/config/rs6000/vsx.md
===================================================================
--- gcc/config/rs6000/vsx.md	(revision 277018)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -3243,9 +3243,10 @@ (define_insn "vsx_vslo_"
 ;; Variable V2DI/V2DF extract
 (define_insn_and_split "vsx_extract__var"
   [(set (match_operand: 0 "gpc_reg_operand" "=v,wa,r")
-        (unspec: [(match_operand:VSX_D 1 "input_operand" "v,m,m")
-                  (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
-                 UNSPEC_VSX_EXTRACT))
+        (unspec:
+         [(match_operand:VSX_D 1 "reg_or_non_prefixed_memory" "v,em,em")
+          (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+         UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,&b,&b"))
    (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
   "VECTOR_MEM_VSX_P (mode) && TARGET_DIRECT_MOVE_64BIT"
@@ -3313,9 +3314,10 @@ (define_insn_and_split "*vsx_extract_v4s
 ;; Variable V4SF extract
 (define_insn_and_split "vsx_extract_v4sf_var"
   [(set (match_operand:SF 0 "gpc_reg_operand" "=wa,wa,?r")
-        (unspec:SF [(match_operand:V4SF 1 "input_operand" "v,m,m")
-                    (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
-                   UNSPEC_VSX_EXTRACT))
+        (unspec:SF
+         [(match_operand:V4SF 1 "reg_or_non_prefixed_memory" "v,em,em")
+          (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
+         UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,&b,&b"))
    (clobber (match_scratch:V2DI 4 "=&v,X,X"))]
   "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT"
@@ -3676,7 +3678,7 @@ (define_insn_and_split "*vsx_extract__var"
   [(set (match_operand: 0 "gpc_reg_operand" "=r,r,r")
         (unspec:
-         [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+         [(match_operand:VSX_EXTRACT_I 1 "reg_or_non_prefixed_memory" "v,v,em")
           (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
          UNSPEC_VSX_EXTRACT))
    (clobber (match_scratch:DI 3 "=r,r,&b"))
@@ -3696,7 +3698,7 @@ (define_insn_and_split "*vsx_extract_ 0 "gpc_reg_operand" "=r,r,r")
         (zero_extend:
          (unspec:
-          [(match_operand:VSX_EXTRACT_I 1 "input_operand" "v,v,m")
+          [(match_operand:VSX_EXTRACT_I 1 "reg_or_non_prefixed_memory" "v,v,em")
            (match_operand:DI 2 "gpc_reg_operand" "r,r,r")]
           UNSPEC_VSX_EXTRACT)))
    (clobber (match_scratch:DI 3 "=r,r,&b"))
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 276974)
+++ gcc/doc/md.texi	(working copy)
@@ -3373,6 +3373,9 @@ asm ("st %1,%0" : "=m<>" (mem) : "r" (va
 is not.
 
+@item em
+A memory operand that does not contain a prefixed address.
+
 @item es
 A ``stable'' memory operand; that is, one which does not include any
 automodification of the base register.  This used to be useful when
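
For readers who want to follow the offset arithmetic outside of GCC, here is a
minimal standalone sketch of the constant-element case in
rs6000_adjust_vec_address.  It is not the back end code.  The helper names
(signed_16bit_offset_p, signed_34bit_offset_p, fold_element_offset,
offset_fits_directly) are hypothetical stand-ins for the macros used in the
patch, and it assumes the element's byte offset is simply the element number
times the scalar size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for SIGNED_16BIT_OFFSET_P: true if VALUE fits in
   the signed 16-bit displacement of a traditional D-form instruction.  */
bool
signed_16bit_offset_p (int64_t value)
{
  return value >= -32768 && value <= 32767;
}

/* Hypothetical stand-in for SIGNED_34BIT_OFFSET_P: true if VALUE fits in
   the signed 34-bit displacement of a prefixed instruction.  */
bool
signed_34bit_offset_p (int64_t value)
{
  return value >= -(INT64_C (1) << 33) && value < (INT64_C (1) << 33);
}

/* Fold the byte offset of element ELEMENT (assumed to be ELEMENT times
   SCALAR_SIZE) into BASE_OFFSET, the constant already present in the
   vector's address.  As in the PC-relative case of the patch, the result
   must fit in 34 bits.  */
int64_t
fold_element_offset (int64_t base_offset, unsigned element,
                     unsigned scalar_size)
{
  int64_t offset = base_offset + (int64_t) element * scalar_size;
  assert (signed_34bit_offset_p (offset));
  return offset;
}

/* Mirror of the D-form test in the patch: with prefixed addressing any
   34-bit offset can be used directly; otherwise the offset must fit in
   16 bits and, for 8-byte scalars, have its bottom two bits clear.  */
bool
offset_fits_directly (int64_t offset, unsigned scalar_size,
                      bool prefixed_addr)
{
  if (prefixed_addr && signed_34bit_offset_p (offset))
    return true;

  return (signed_16bit_offset_p (offset)
          && (scalar_size < 8 || (offset & 0x3) == 0));
}
```

For example, extracting element 3 of a V4SF at base offset 16 gives
fold_element_offset (16, 3, 4) == 28.  An offset of 40000 is rejected by
offset_fits_directly without prefixed addressing and accepted with it.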