From patchwork Fri Nov 19 17:25:16 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vladimir Makarov X-Patchwork-Id: 72278 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) by ozlabs.org (Postfix) with SMTP id 117301007D2 for ; Sat, 20 Nov 2010 04:24:48 +1100 (EST) Received: (qmail 27738 invoked by alias); 19 Nov 2010 17:24:45 -0000 Received: (qmail 27706 invoked by uid 22791); 19 Nov 2010 17:24:43 -0000 X-SWARE-Spam-Status: No, hits=-6.2 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_HI, SPF_HELO_PASS, T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 19 Nov 2010 17:24:39 +0000 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id oAJHOb6B000444 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Fri, 19 Nov 2010 12:24:37 -0500 Received: from ns3.rdu.redhat.com (ns3.rdu.redhat.com [10.11.255.199]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id oAJHObbR003972; Fri, 19 Nov 2010 12:24:37 -0500 Received: from 1005.home ([10.3.113.16]) by ns3.rdu.redhat.com (8.13.8/8.13.8) with ESMTP id oAJHOZc6016919; Fri, 19 Nov 2010 12:24:35 -0500 Message-ID: <4CE6B2FC.7080800@redhat.com> Date: Fri, 19 Nov 2010 12:25:16 -0500 From: Vladimir Makarov User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Fedora/3.1.6-1.fc13 Thunderbird/3.1.6 MIME-Version: 1.0 To: Jeff Law CC: gcc-patches Subject: Re: Improving reload inheritance code generation and predictability References: <4CE53C3D.6020001@redhat.com> <4CE59A33.707@redhat.com> <4CE69D7F.4000400@redhat.com> In-Reply-To: <4CE69D7F.4000400@redhat.com> X-IsSubscribed: yes Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org On 11/19/2010 10:53 AM, Jeff Law wrote: > On 11/18/10 14:27, Vladimir Makarov wrote: >>> >> >> I like this idea and also thought long ago to try it too. Because of >> better inheritance I think it should show some code size improvement >> and probably some performance improvement too besides better debugging. > There's a definite code size improvement. > >> >> I am afraid only that it will take some compilation time too (which >> will be probably compensated partially by less final insns >> processing) and IMHO that is not because of insn traversing but >> mostly because of usage of DF-infrastructure. > I'm also more concerned about the DF scanning than the BB scan when we > need a reload register. Obviously for something with huge blocks (say > our friend fpppp) scanning the insns in the BB could get expensive and > we could clamp the number of insns scanned on a PARAM value. > > Anyway, I quickly inserted some counters to measure some data and ran > a bootstrap (without java). > > The first thing I note only 56% of the source files we compile even > end up calling allocate_reload_reg. I did not track total number of > function's compiled. 56% is low enough that lazily initializing the > DF data is probably worth it since DF scans the entire insn stream. > If we could lazily initialize DF within a block only, then that'd > probably save even more. > > Within the files that called allocate_reload_reg, we had 207003 calls > to allocate_reload_reg and we scanned 2071962 insns in the loop, or 10 > insns per call. That seemed rather high to me as I was expecting a > scan rate of 5-7 insns per call. > > Two related obvious improvements came to mind. If there is only one > spill reg, then scanning is totally unnecessary and if there is only > one spill reg left to find during a scan, we can stop the scan, in > both cases the remaining reg is the most desirable reg and scanning > insns is totally unnecessary. These two improvements get us down to > 7.5 insns scanned per call to allocate_reload_reg. Still more than I > would have expected. > > libgcc's bid_round results in 918 calls and 60627 insns scanned (* 3 > since libgcc is built 3 times during a bootstrap), which represents > more than 10% of the total insns scanned. If we factored out > bid_round's effects we'd be looking at 6.5 insns scanned per call > which seems about right. > Interesting. >> >> Some time ago I analyzed how many memory is used by DF during an IRA >> snapshot. It was about 25% vs 7% allocated by IRA for its IR (% of >> all heap memory). Touching this huge footprint will worse code >> locality and result in slow code. >> >> Reload does not use DF and even automatic insn rescanning is switched >> off. I believe that if reload were rewritten to use DF, it could >> result in much slower code. This is just some my speculations which >> really hard to confirm or reject. > Note that we still have DF structures lying around because ira doesn't > call df_finish prior to calling reload. So the memory increase should > be minimal (basically just the increase due to insns inserted by > caller-saves and the like). > That is not about memory increase. It is about DF data expelling rtl data from caches. I've just did some measurements of compilation time on your patch on all_cp2k_gfortran.f90 (> 400K lines of fortran) without patch 219.20user with only df calls in reload (see patch below) 221.39user with all your patch 221.17user So 1% of degradation is only because the patch touches DF-data (not scanning insns in finding reload reg as someone can think). Better inheritance might improve compilation time because less insns are generated (although it is hard to say the difference on two last lines is too small). That data was for -O2. I guess that the compilation time degradation will be even more for -O0. > The alternative would be to deep scan each insn in the loop. Index: reload1.c =================================================================== --- reload1.c (revision 166913) +++ reload1.c (working copy) @@ -1045,6 +1045,11 @@ } } + df_finish_pass (true); + df_scan_alloc (NULL); + df_scan_blocks (); + df_analyze (); + /* Use the reload registers where necessary by generating move instructions to move the must-be-register values into or out of the reload registers. */ @@ -1061,6 +1066,8 @@ gcc_assert (verify_initial_elim_offsets ()); } + df_finish_pass (true); + /* If we were able to eliminate the frame pointer, show that it is no longer live at the start of any basic block. If it ls live by virtue of being in a pseudo, that pseudo will be marked live