From patchwork Tue Aug 2 13:32:57 2016
X-Patchwork-Submitter: Alan Modra
X-Patchwork-Id: 654806
Date: Tue, 2 Aug 2016 23:02:57 +0930
From: Alan Modra
To: gcc-patches@gcc.gnu.org
Cc: Vladimir Makarov
Subject: [PATCH, LRA] PR71680, Reload of slow mems
Message-ID: <20160802133256.GB20904@bubble.grove.modra.org>

This is a patch for a problem in lra, triggered by the rs6000 backend
not allowing SImode in floating point registers.

First, some analysis. Here is the ira output for pr71680.c compiled
with -m64 -mcpu=power8 -O1 -mlra, showing the two problem insns.

(insn 7 5 26 3 (set (reg:SI 159 [ a ])
        (mem/c:SI (reg/f:DI 158) [1 a+0 S4 A8])) pr71680.c:18 464 {*movsi_internal1}
     (expr_list:REG_EQUIV (mem/c:SI (reg/f:DI 158) [1 a+0 S4 A8])
        (nil)))
(insn 26 7 27 3 (set (reg:DI 162)
        (unspec:DI [
                (fix:SI (subreg:SF (reg:SI 159 [ a ]) 0))
            ] UNSPEC_FCTIWZ)) pr71680.c:18 372 {fctiwz_sf}
     (expr_list:REG_DEAD (reg:SI 159 [ a ])
        (nil)))

Insn 26 requires that reg 159 be of class FLOAT_REGS.

First lra action:

         deleting insn with uid = 7.
         Changing pseudo 159 in operand 1 of insn 26 on equiv [r158:DI]
      Creating newreg=164, assigning class ALL_REGS to subreg reg r164
   26: r162:DI=unspec[fix(r164:SI#0)] 7
      REG_DEAD r159:SI
    Inserting subreg reload before:
   30: r164:SI=[r158:DI]

[snip]
         Change to class FLOAT_REGS for r164

Well, that didn't do much. lra tried the equiv mem, found that it
didn't work, and had to reload. This effectively gets us back to the
two original insns, but with r159 replaced by r164.
simplify_operand_subreg did not do anything in this case because
SLOW_UNALIGNED_ACCESS was true (wrongly for power8, but that's beside
the point).
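For readers less used to RTL, here is a rough C model of what insns 7 and 26 compute together: a 32-bit integer load from memory that is only byte aligned (A8, i.e. 8-bit alignment, which comes from the #pragma pack(1) in the testcase), a reinterpretation of those same bits as a float (the subreg:SF), and a float to int conversion rounding toward zero, as fctiwz does. This is an illustrative sketch, not GCC code; the function names are made up, and it assumes float is a 32-bit IEEE type.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits, matching SFmode and SImode.  */
_Static_assert (sizeof (float) == sizeof (uint32_t),
                "this sketch assumes a 32-bit float");

/* insn 7: (set (reg:SI 159) (mem/c:SI ...)), a 32-bit integer load
   from memory that may be only byte aligned.  memcpy avoids the
   undefined behaviour of dereferencing a misaligned uint32_t *.  */
static uint32_t
load_si (const unsigned char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* (subreg:SF (reg:SI 159) 0): view the same 32 bits as a float.
   No value conversion takes place.  */
static float
si_bits_as_sf (uint32_t bits)
{
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* insn 26: fix:SI inside UNSPEC_FCTIWZ, a float to integer conversion
   rounding toward zero, which is also what a C cast does for in-range
   values.  */
static int
fix_trunc_sf (float f)
{
  return (int) f;
}

/* Hypothetical driver: store F at offset 1 of a byte buffer (so it is
   not necessarily 4-byte aligned), then run the three steps.  */
static int
pr71680_semantics (float f)
{
  unsigned char buf[1 + sizeof (float)];
  memcpy (buf + 1, &f, sizeof f);
  return fix_trunc_sf (si_bits_as_sf (load_si (buf + 1)));
}
```

For example, pr71680_semantics (-2.75f) yields -2. The point of the model is that the value never needs to exist as an SImode quantity in a floating point register: only the load is done as an integer, and from the subreg onward it is a float.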
So now we have, using abbreviated rtl notation:

r164:SI=[r158:DI]
r162:DI=unspec[fix(r164:SI)]

The problem here is that the first insn isn't valid, due to the rs6000
backend not supporting SImode in fprs, and r164 must be an fpr to make
the second insn valid.

Next lra action:

      Creating newreg=165 from oldreg=164, assigning class GENERAL_REGS to r165
   30: r165:SI=[r158:DI]
    Inserting insn reload after:
   31: r164:SI=r165:SI

so now we have

r165:SI=[r158:DI]
r164:SI=r165:SI
r162:DI=unspec[fix(r164:SI)]

This ought to be good on power8, except for one little thing. r165 is
GENERAL_REGS, so the first insn is good, a gpr load from mem. r164 is
FLOAT_REGS, making the last insn good, a fctiwz. The second insn ought
to be a sldi, mtvsrd, xscvspdpn combination, but that is only supported
for SFmode. So lra continues reloading the second insn, but in vain,
because it never tries anything other than SImode and, as noted above,
SImode is not valid in fprs.

What this patch does is arrange to emit the two reloads needed for the
SLOW_UNALIGNED_ACCESS case at once, moving the subreg to the second
insn in order to switch modes, producing:

r164:SI=[r158:DI]
r165:SF=r164:SI#0
r162:DI=unspec[fix(r165:SF)]

I've also tidied a couple of other things:
1) "old" is unnecessary as it duplicates "operand".
2) Rejecting mem subregs due to SLOW_UNALIGNED_ACCESS only makes sense
   if the access in the original mode was fast.

Bootstrapped and regression tested on powerpc64le-linux and
powerpc64-linux. OK to apply?

	PR target/71680
	* lra-constraints.c (simplify_operand_subreg): Allow subreg
	mode for mem when SLOW_UNALIGNED_ACCESS if inner mode is also
	slow.  Emit two reloads for slow mem case, first loading in
	fast innermode, then converting to required mode.
testsuite/
	* gcc.target/powerpc/pr71680.c: New.
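Point 2) can be seen by looking at the alignment test on its own. The sketch below compares the pre-patch and patched conditions from simplify_operand_subreg on a made-up target in which float modes are slow when under-aligned and integer modes never are. toy_mode, toy_slow_unaligned_access and the mode table are hypothetical stand-ins, not GCC's machine_mode or any real backend's SLOW_UNALIGNED_ACCESS, and the sketch ignores the address validity checks that the real code performs first.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a machine mode: only what the test needs.  */
struct toy_mode
{
  const char *name;
  unsigned align_bits;  /* natural alignment, cf. GET_MODE_ALIGNMENT */
  bool is_float;
};

/* Made-up target rule standing in for SLOW_UNALIGNED_ACCESS:
   float modes are slow when under-aligned, integer modes never are.  */
static bool
toy_slow_unaligned_access (const struct toy_mode *m, unsigned mem_align)
{
  return m->is_float && mem_align < m->align_bits;
}

/* Pre-patch test: keep (subreg:MODE (mem:INNERMODE)) as a MODE access
   unless MODE is slow at this alignment.  INNERMODE is not consulted.  */
static bool
old_keeps_subreg (const struct toy_mode *mode, unsigned mem_align)
{
  return !toy_slow_unaligned_access (mode, mem_align)
         || mem_align >= mode->align_bits;
}

/* Patched test: also keep it when INNERMODE is slow as well, since
   reloading in INNERMODE would be no better.  */
static bool
new_keeps_subreg (const struct toy_mode *mode,
                  const struct toy_mode *innermode, unsigned mem_align)
{
  return !toy_slow_unaligned_access (mode, mem_align)
         || toy_slow_unaligned_access (innermode, mem_align)
         || mem_align >= mode->align_bits;
}

static const struct toy_mode toy_SI = { "SI", 32, false };
static const struct toy_mode toy_SF = { "SF", 32, true };
static const struct toy_mode toy_DF = { "DF", 64, true };
```

With an 8-bit aligned mem, subreg:SF of mem:SI fails both tests; in the patched code that is the case that falls through to the two-step "slow mem" reload (an SImode load, then the SFmode conversion). subreg:SF of mem:DF fails the old test but passes the new one, because in this toy model DFmode is just as slow.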
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 45b6506..b7b30b1 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1462,19 +1462,9 @@ simplify_operand_subreg (int nop, machine_mode reg_mode)
   reg = SUBREG_REG (operand);
   innermode = GET_MODE (reg);
   type = curr_static_id->operand[nop].type;
-  /* If we change address for paradoxical subreg of memory, the
-     address might violate the necessary alignment or the access might
-     be slow.  So take this into consideration.  We should not worry
-     about access beyond allocated memory for paradoxical memory
-     subregs as we don't substitute such equiv memory (see processing
-     equivalences in function lra_constraints) and because for spilled
-     pseudos we allocate stack memory enough for the biggest
-     corresponding paradoxical subreg.  */
-  if (MEM_P (reg)
-      && (! SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg))
-          || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode)))
+  if (MEM_P (reg))
     {
-      rtx subst, old = *curr_id->operand_loc[nop];
+      rtx subst;
 
       alter_subreg (curr_id->operand_loc[nop], false);
       subst = *curr_id->operand_loc[nop];
@@ -1482,27 +1472,78 @@ simplify_operand_subreg (int nop, machine_mode reg_mode)
       if (! valid_address_p (innermode, XEXP (reg, 0),
                              MEM_ADDR_SPACE (reg))
           || valid_address_p (GET_MODE (subst), XEXP (subst, 0),
-                              MEM_ADDR_SPACE (subst)))
-        return true;
-      else if ((get_constraint_type (lookup_constraint
-                                     (curr_static_id->operand[nop].constraint))
-                != CT_SPECIAL_MEMORY)
-               /* We still can reload address and if the address is
-                  valid, we can remove subreg without reloading its
-                  inner memory.  */
-               && valid_address_p (GET_MODE (subst),
-                                   regno_reg_rtx
-                                   [ira_class_hard_regs
-                                    [base_reg_class (GET_MODE (subst),
-                                                     MEM_ADDR_SPACE (subst),
-                                                     ADDRESS, SCRATCH)][0]],
-                                   MEM_ADDR_SPACE (subst)))
-        return true;
+                              MEM_ADDR_SPACE (subst))
+          || ((get_constraint_type (lookup_constraint
+                                    (curr_static_id->operand[nop].constraint))
+               != CT_SPECIAL_MEMORY)
+              /* We still can reload address and if the address is
+                 valid, we can remove subreg without reloading its
+                 inner memory.  */
+              && valid_address_p (GET_MODE (subst),
+                                  regno_reg_rtx
+                                  [ira_class_hard_regs
+                                   [base_reg_class (GET_MODE (subst),
+                                                    MEM_ADDR_SPACE (subst),
+                                                    ADDRESS, SCRATCH)][0]],
+                                  MEM_ADDR_SPACE (subst))))
+        {
+          /* If we change address for paradoxical subreg of memory, the
+             address might violate the necessary alignment or the access might
+             be slow.  So take this into consideration.  We should not worry
+             about access beyond allocated memory for paradoxical memory
+             subregs as we don't substitute such equiv memory (see processing
+             equivalences in function lra_constraints) and because for spilled
+             pseudos we allocate stack memory enough for the biggest
+             corresponding paradoxical subreg.  */
+          if (!SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg))
+              || SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))
+              || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode))
+            return true;
+
+          /* INNERMODE is fast, MODE slow.  Reload the mem in INNERMODE.  */
+          enum reg_class rclass
+            = (enum reg_class) targetm.preferred_reload_class (reg, ALL_REGS);
+          if (get_reload_reg (curr_static_id->operand[nop].type, innermode, reg,
+                              rclass, TRUE, "slow mem", &new_reg))
+            {
+              bool insert_before, insert_after;
+              bitmap_set_bit (&lra_subreg_reload_pseudos, REGNO (new_reg));
+
+              insert_before = (type != OP_OUT
+                               || GET_MODE_SIZE (innermode) > GET_MODE_SIZE (mode));
+              insert_after = type != OP_IN;
+              insert_move_for_subreg (insert_before ? &before : NULL,
+                                      insert_after ? &after : NULL,
+                                      reg, new_reg);
+            }
+          *curr_id->operand_loc[nop] = operand;
+          SUBREG_REG (operand) = new_reg;
+
+          /* Convert to MODE.  */
+          reg = operand;
+          rclass = (enum reg_class) targetm.preferred_reload_class (reg, ALL_REGS);
+          if (get_reload_reg (curr_static_id->operand[nop].type, mode, reg,
+                              rclass, TRUE, "slow mem", &new_reg))
+            {
+              bool insert_before, insert_after;
+              bitmap_set_bit (&lra_subreg_reload_pseudos, REGNO (new_reg));
+
+              insert_before = type != OP_OUT;
+              insert_after = type != OP_IN;
+              insert_move_for_subreg (insert_before ? &before : NULL,
+                                      insert_after ? &after : NULL,
+                                      reg, new_reg);
+            }
+          *curr_id->operand_loc[nop] = new_reg;
+          lra_process_new_insns (curr_insn, before, after,
+                                 "Inserting slow mem reload");
+          return true;
+        }
 
       /* If the address was valid and became invalid, prefer to reload
          the memory.  Typical case is when the index scale should
          correspond the memory.  */
-      *curr_id->operand_loc[nop] = old;
+      *curr_id->operand_loc[nop] = operand;
     }
   else if (REG_P (reg) && REGNO (reg) < FIRST_PSEUDO_REGISTER)
     {
diff --git a/gcc/testsuite/gcc.target/powerpc/pr71680.c b/gcc/testsuite/gcc.target/powerpc/pr71680.c
new file mode 100644
index 0000000..fe5260f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr71680.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-mcpu=power8 -O1 -mlra" } */
+
+#pragma pack(1)
+struct
+{
+  float f0;
+} a;
+
+extern void foo (int);
+
+int
+main (void)
+{
+  for (;;)
+    foo ((int) a.f0);
+}
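The insert_before/insert_after flags in the two get_reload_reg branches of the patch boil down to a small truth table. Here is a sketch of just those two expressions; toy_op_type, reload_placement and the function names are hypothetical stand-ins for LRA's OP_IN/OP_OUT/OP_INOUT operand types and the inline code, and sizes are in bytes, as GET_MODE_SIZE reports them.

```c
#include <stdbool.h>

/* Hypothetical mirror of LRA's operand types.  */
enum toy_op_type { TOY_OP_IN, TOY_OP_OUT, TOY_OP_INOUT };

/* Where a reload move goes relative to the current insn.  */
struct reload_placement
{
  bool before;  /* load the reload reg before the insn */
  bool after;   /* store the reload reg back after the insn */
};

/* First reload: the mem is loaded in INNERMODE.  An output operand
   normally needs no load before the insn, unless INNERMODE is wider
   than MODE.  */
static struct reload_placement
first_reload_placement (enum toy_op_type type,
                        unsigned inner_size, unsigned mode_size)
{
  struct reload_placement p;
  p.before = type != TOY_OP_OUT || inner_size > mode_size;
  p.after = type != TOY_OP_IN;
  return p;
}

/* Second reload: the conversion to MODE, placed purely by operand type.  */
static struct reload_placement
second_reload_placement (enum toy_op_type type)
{
  struct reload_placement p;
  p.before = type != TOY_OP_OUT;
  p.after = type != TOY_OP_IN;
  return p;
}
```

Presumably the extra size term exists because an output subreg that is narrower than its inner mem only writes part of the value, so the inner mem still has to be loaded before the insn for the untouched bytes to survive the store after it. For the input operand in the testcase, both reloads are placed before the insn and nothing is stored back.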