From patchwork Mon Oct 8 19:36:27 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Bergner X-Patchwork-Id: 980768 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-487158-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="wSvkki7K"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 42TVws6ShNz9sB7 for ; Tue, 9 Oct 2018 06:36:45 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:date:mime-version:content-type :content-transfer-encoding:message-id; q=dns; s=default; b=erWNx IaFPNtRdbU4AcW7Z+0DskMGsUdFnJ1rGeseCIjl7t2rx16/VpPKOHXVtR/JWON1a xSNXU0OY1aQxk4/hx3demHS1XM8WntolTMab282fPbzRb/l9fi0zWqzppC2PnLMa aSg5qOv5MhTWXSY0OBFKwmXPNr2ql1lVPrSo48= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:to:cc :from:subject:date:mime-version:content-type :content-transfer-encoding:message-id; s=default; bh=/Sx/alHUegM WBHNo8wU+/2qJlH4=; b=wSvkki7KcD+LVszImp05fo+8dg9mTc9vVTZEvpSUqq/ j3Mr/5A8pR+jyCIOstKaYh0NWaj5aGzNeMA/mc72fVnth9mqtOPGEmu9S3oskbUb mnjYqGNR/8XCrRZIqLMTNXul13uAR4pGOh1EPjQUboDNvbPCsGB59MK7U27M/wdY = Received: (qmail 2154 invoked by alias); 8 Oct 2018 19:36:38 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 2141 invoked by uid 89); 8 Oct 2018 19:36:37 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-11.8 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=tracks, frequency, TImode, timode X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.156.1) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 08 Oct 2018 19:36:35 +0000 Received: from pps.filterd (m0098393.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w98JYMtI120256 for ; Mon, 8 Oct 2018 15:36:33 -0400 Received: from e34.co.us.ibm.com (e34.co.us.ibm.com [32.97.110.152]) by mx0a-001b2d01.pphosted.com with ESMTP id 2n0axfpsmj-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Mon, 08 Oct 2018 15:36:33 -0400 Received: from localhost by e34.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 8 Oct 2018 13:36:32 -0600 Received: from b03cxnp07029.gho.boulder.ibm.com (9.17.130.16) by e34.co.us.ibm.com (192.168.1.134) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Mon, 8 Oct 2018 13:36:29 -0600 Received: from b03ledav003.gho.boulder.ibm.com (b03ledav003.gho.boulder.ibm.com [9.17.130.234]) by b03cxnp07029.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id w98JaSSb47579202 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Mon, 8 Oct 2018 12:36:28 -0700 Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5047F6A047; Mon, 8 Oct 2018 13:36:28 -0600 (MDT) Received: from b03ledav003.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D76C96A058; Mon, 8 Oct 2018 13:36:27 -0600 (MDT) Received: from otta.rchland.ibm.com (unknown [9.10.86.13]) by b03ledav003.gho.boulder.ibm.com (Postfix) with ESMTP; Mon, 8 Oct 2018 13:36:27 -0600 (MDT) To: GCC Patches Cc: Vladimir N Makarov , Jeff Law , Segher Boessenkool , Bill Schmidt From: Peter Bergner Subject: [RFC][PATCH IRA] Fix PR87507, IRA unnecessarily uses non-volatile registers during register assignment Date: Mon, 8 Oct 2018 14:36:27 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 x-cbid: 18100819-0016-0000-0000-0000093DC572 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00009843; HX=3.00000242; KW=3.00000007; PH=3.00000004; SC=3.00000267; SDB=6.01099794; UDB=6.00568966; IPR=6.00879842; MB=3.00023667; MTD=3.00000008; XFM=3.00000015; UTC=2018-10-08 19:36:31 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 18100819-0017-0000-0000-000040A1DD98 Message-Id: X-IsSubscribed: yes PR87507 shows a problem where IRA assigns a non-volatile TImode reg pair to a pseudo when there is a volatile reg pair available to use. This then causes us to emit save/restore code for the non-volatile reg usage. The problem here is that the only volatile reg pair that is available is an odd/even reg pair (r7,r8) and ira-costs.c:ira_tune_allocno_costs() disparages odd/even reg pairs by increasing their cost. That's fine, but an even/odd non-volatile reg pair should still be more expensive than an odd/even reg pair given the save/restore that we'd need to use it. However, the costs used in assign_hard_reg() show that non-volatile reg pair (r30,r31) is cheaper than odd/even reg pair (r7,r8) (15 versus 1000). That's a huge disparity in costs, so looking at where the non-volatile cost comes from, it comes from the code below in ira-color.c:assign_hard_reg(): if (!HONOR_REG_ALLOC_ORDER) { if ((saved_nregs = calculate_saved_nregs (hard_regno, mode)) != 0) /* We need to save/restore the hard register in epilogue/prologue. Therefore we increase the cost. */ { rclass = REGNO_REG_CLASS (hard_regno); add_cost = ((ira_memory_move_cost[mode][rclass][0] + ira_memory_move_cost[mode][rclass][1]) * saved_nregs / hard_regno_nregs (hard_regno, mode) - 1); cost += add_cost; full_cost += add_cost; } } I'm not sure I understand the "* saved_nregs / h_r_n (h_r, m) - 1" part of the calculation. If saved_nregs is the number of hard regs that need to be saved for hard_regno in mode MODE (ie, we don't need to save a hard reg if it's already been saved, etc.), then why aren't we just multiplying by saved_nregs? The other problem I see here is that we're not scaling the cost by the basic block frequency of the prologue/epilogue, which is what is causing the non-volatile reg cost to be so low compared to the odd/even volatile reg use, which is scaled. However, even if I fix that, improve_allocation() comes along and undoes it, because it too does not correctly compute the cost of non-volatiles, so that seems to me that it needs fixing too. I have the following work in progress patch I'd like some comments on. Am I on the right track here? I noticed that assign_hard_reg() tracks min_cost and min_full_cost, but min_cost is actually never used for anything other than setting min_cost, so I removed it. I also don't understand why we don't charge non-volatile usage for targets that define HONOR_REG_ALLOC_ORDER. Why shouldn't we always account for save/restore of non-volatile reg usage? I'll note I did not change that behavior. Thoughts on the patch below? Vlad, can you comment on some of my questions above? Peter gcc/ PR rtl-optimization/87507 * ira-color.c (calculate_saved_nregs): Rename from this... (calculate_saved_nregs_cost): ...to this. Return cost of saving NREGS. (assign_hard_reg): (improve_allocation): gcc/testsuite/ PR rtl-optimization/87507 * gcc.dg/pr10474.c: Don't XFAIL for powerpc*-*-*. * gcc.target/powerpc/vsx-vector-6.p8.c: Update expected output. Index: gcc/ira-color.c =================================================================== --- gcc/ira-color.c (revision 264795) +++ gcc/ira-color.c (working copy) @@ -1648,11 +1648,10 @@ check_hard_reg_p (ira_allocno_t a, int h return j == nregs; } -/* Return number of registers needed to be saved and restored at - function prologue/epilogue if we allocate HARD_REGNO to hold value - of MODE. */ +/* Return the cost of saving and restoring HARD_REGNO in mode MODE at + function prologue and epilogue. */ static int -calculate_saved_nregs (int hard_regno, machine_mode mode) +calculate_saved_nregs_cost (int hard_regno, machine_mode mode) { int i; int nregs = 0; @@ -1663,7 +1662,14 @@ calculate_saved_nregs (int hard_regno, m && !TEST_HARD_REG_BIT (call_used_reg_set, hard_regno + i) && !LOCAL_REGNO (hard_regno + i)) nregs++; - return nregs; + if (nregs == 0) + return 0; + + enum reg_class rclass = REGNO_REG_CLASS (hard_regno); + return (ira_memory_move_cost[mode][rclass][0] + + ira_memory_move_cost[mode][rclass][1]) + * nregs + * REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)); } /* Choose a hard register for allocno A. If RETRY_P is TRUE, it means @@ -1694,14 +1700,11 @@ assign_hard_reg (ira_allocno_t a, bool r { HARD_REG_SET conflicting_regs[2], profitable_hard_regs; int i, j, hard_regno, best_hard_regno, class_size; - int cost, mem_cost, min_cost, full_cost, min_full_cost, nwords, word; + int cost, mem_cost, full_cost, min_full_cost, nwords, word; int *a_costs; enum reg_class aclass; machine_mode mode; - static int costs[FIRST_PSEUDO_REGISTER], full_costs[FIRST_PSEUDO_REGISTER]; - int saved_nregs; - enum reg_class rclass; - int add_cost; + static int full_costs[FIRST_PSEUDO_REGISTER]; #ifdef STACK_REGS bool no_stack_reg_p; #endif @@ -1713,9 +1716,7 @@ assign_hard_reg (ira_allocno_t a, bool r aclass = ALLOCNO_CLASS (a); class_size = ira_class_hard_regs_num[aclass]; best_hard_regno = -1; - memset (full_costs, 0, sizeof (int) * class_size); mem_cost = 0; - memset (costs, 0, sizeof (int) * class_size); memset (full_costs, 0, sizeof (int) * class_size); #ifdef STACK_REGS no_stack_reg_p = false; @@ -1733,15 +1734,9 @@ assign_hard_reg (ira_allocno_t a, bool r cost = ALLOCNO_UPDATED_CLASS_COST (a); for (i = 0; i < class_size; i++) if (a_costs != NULL) - { - costs[i] += a_costs[i]; - full_costs[i] += a_costs[i]; - } + full_costs[i] += a_costs[i]; else - { - costs[i] += cost; - full_costs[i] += cost; - } + full_costs[i] += cost; nwords = ALLOCNO_NUM_OBJECTS (a); curr_allocno_process++; for (word = 0; word < nwords; word++) @@ -1861,7 +1856,7 @@ assign_hard_reg (ira_allocno_t a, bool r queue_update_cost (a, NULL, COST_HOP_DIVISOR); update_conflict_hard_regno_costs (full_costs, aclass, false); } - min_cost = min_full_cost = INT_MAX; + min_full_cost = INT_MAX; /* We don't care about giving callee saved registers to allocnos no living through calls because call clobbered registers are allocated first (it is usual practice to put them first in @@ -1878,25 +1873,15 @@ assign_hard_reg (ira_allocno_t a, bool r if (! check_hard_reg_p (a, hard_regno, conflicting_regs, profitable_hard_regs)) continue; - cost = costs[i]; + full_cost = full_costs[i]; if (!HONOR_REG_ALLOC_ORDER) { - if ((saved_nregs = calculate_saved_nregs (hard_regno, mode)) != 0) - /* We need to save/restore the hard register in - epilogue/prologue. Therefore we increase the cost. */ - { - rclass = REGNO_REG_CLASS (hard_regno); - add_cost = ((ira_memory_move_cost[mode][rclass][0] - + ira_memory_move_cost[mode][rclass][1]) - * saved_nregs / hard_regno_nregs (hard_regno, - mode) - 1); - cost += add_cost; - full_cost += add_cost; - } + /* If we need to save/restore the hard register in epilogue/prologue, + then increase its cost. */ + full_cost += calculate_saved_nregs_cost (hard_regno, mode); } - if (min_cost > cost) - min_cost = cost; + if (min_full_cost > full_cost) { min_full_cost = full_cost; @@ -2971,12 +2956,18 @@ improve_allocation (void) for (j = 0; j < class_size; j++) { hregno = ira_class_hard_regs[aclass][j]; + int hregno_cost = costs[hregno]; + + /* Include the cost of saving and restoring HREGNO in the function + prologue/epilogue. */ + hregno_cost += calculate_saved_nregs_cost (hregno, mode); + if (check_hard_reg_p (a, hregno, conflicting_regs, profitable_hard_regs) - && min_cost > costs[hregno]) + && min_cost > hregno_cost) { best = hregno; - min_cost = costs[hregno]; + min_cost = hregno_cost; } } if (min_cost >= 0)