From patchwork Wed Jun 16 14:33:34 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 55889
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
    by ozlabs.org (Postfix) with SMTP id C00D61007D3;
    Thu, 17 Jun 2010 00:33:51 +1000 (EST)
Received: (qmail 17789 invoked by alias); 16 Jun 2010 14:33:49 -0000
Received: (qmail 17780 invoked by uid 22791); 16 Jun 2010 14:33:48 -0000
X-SWARE-Spam-Status: No, hits=-6.0 required=5.0
    tests=AWL, BAYES_00, RCVD_IN_DNSWL_HI, SPF_HELO_PASS, TW_CP, T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)
    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP;
    Wed, 16 Jun 2010 14:33:41 +0000
Received: from int-mx08.intmail.prod.int.phx2.redhat.com
    (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.21])
    by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id o5GEXNNi004636
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);
    Wed, 16 Jun 2010 10:33:23 -0400
Received: from tyan-ft48-01.lab.bos.redhat.com
    (tyan-ft48-01.lab.bos.redhat.com [10.16.42.4])
    by int-mx08.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8)
    with ESMTP id o5GEXLia003038
    (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
    Wed, 16 Jun 2010 10:33:22 -0400
Received: from tyan-ft48-01.lab.bos.redhat.com
    (tyan-ft48-01.lab.bos.redhat.com [127.0.0.1])
    by tyan-ft48-01.lab.bos.redhat.com (8.14.4/8.14.4)
    with ESMTP id o5GEXZxG013179; Wed, 16 Jun 2010 16:33:35 +0200
Received: (from jakub@localhost)
    by tyan-ft48-01.lab.bos.redhat.com (8.14.4/8.14.4/Submit)
    id o5GEXY5x013178; Wed, 16 Jun 2010 16:33:34 +0200
Date: Wed, 16 Jun 2010 16:33:34 +0200
From: Jakub Jelinek
To: gcc-patches@gcc.gnu.org
Cc: Mark Mitchell, Jan Hubicka, Michael Matz
Subject: Re: Speed up genattrtab
Message-ID: <20100616143334.GP7811@tyan-ft48-01.lab.bos.redhat.com>
Reply-To: Jakub Jelinek
References: <4C17B61F.2030602@codesourcery.com>
    <20100615172641.GE10161@atrey.karlin.mff.cuni.cz>
    <4C17C8F3.10700@codesourcery.com>
    <20100615191633.GN7811@tyan-ft48-01.lab.bos.redhat.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20100615191633.GN7811@tyan-ft48-01.lab.bos.redhat.com>
User-Agent: Mutt/1.5.20 (2009-12-10)
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org

On Tue, Jun 15, 2010 at 09:16:33PM +0200, Jakub Jelinek wrote:
> On Tue, Jun 15, 2010 at 11:39:47AM -0700, Mark Mitchell wrote:
> I believe on x86_64/i686 the most time is spent in compiling
> internal_dfa_insn_code, primarily because there are so many different
> schedulings.
> The insn is a big switch on recog_memoized, where most of the cases first
> compare ix86_schedule var to some enum. I guess it would be certainly
> faster to compile to instead split the big function into separate function
> for each schedule and make internal_dfa_insn_code a function pointer, would
> need to be benchmarked how it would actually perform at runtime.

Here is a WIP, untested patch.

So far it is a little bit hackish: I hardcode the "cpu" attribute name.
The final variant could perhaps look at all EQ_ATTRs in the first
decl->condexp and see whether any of those attributes is used in an
EQ_ATTR in the decl->condexp of all reservations.

It is also unfinished.  It generates
  internal_dfa_insn_code_generic64
  internal_dfa_insn_code_amdfam10
  internal_dfa_insn_code_core2
  ...
functions (and similarly for insn_default_latency_), but so far doesn't
emit the glue.  In particular, it still needs to emit some function that
would be called to initialize the fn pointers based on ix86_schedule in
the x86_64/i686 case (not sure if we would need to call it once post
options, or e.g. during expand, or before each scheduling), emit the
actual fn pointer, and ensure the callers know it is a fn pointer instead
of a function (or we need to emit internal_dfa_insn_code as a wrapper
that calls a fn pointer).

In any case, assuming there are no bugs (in the end we would need to test
whether these subfunctions give the same return values as the original
function), the patch:
  - sped up genattrtab (-g -O0 build, i.e. non-optimized) from 1m9.429s
    down to 0m3.621s,
  - shrank insn-attrtab.c from 5139992 bytes to 3822187 bytes,
  - cut compilation time of insn-attrtab.c using an optimized release
    checking cc1 from 40.949s to 19.403s,
  - shrank the insn-attrtab.o .text section from 1003956 bytes to
    635384 bytes.

	Jakub

--- gcc/genattrtab.c.jj	2010-06-11 09:38:08.000000000 +0200
+++ gcc/genattrtab.c	2010-06-16 16:00:00.000000000 +0200
@@ -4379,28 +4379,115 @@ make_automaton_attrs (void)
   int i;
   struct insn_reserv *decl;
   rtx code_exp, lats_exp, byps_exp;
+  const char *cpu_name;
+  struct attr_desc *cpu_attr;
 
   if (n_insn_reservs == 0)
     return;
 
-  code_exp = rtx_alloc (COND);
-  lats_exp = rtx_alloc (COND);
-
-  XVEC (code_exp, 0) = rtvec_alloc (n_insn_reservs * 2);
-  XVEC (lats_exp, 0) = rtvec_alloc (n_insn_reservs * 2);
+  cpu_name = "cpu";
+  cpu_attr = find_attr (&cpu_name, 0);
+  if (cpu_attr != NULL && 0)
+    {
+      rtx *condexps = XNEWVEC (rtx, n_insn_reservs * 3);
+      struct attr_value *val;
 
-  XEXP (code_exp, 1) = make_numeric_value (n_insn_reservs + 1);
-  XEXP (lats_exp, 1) = make_numeric_value (0);
+      gcc_assert (cpu_attr->is_const
+                  && !cpu_attr->is_special
+                  && !cpu_attr->is_numeric);
+      for (val = cpu_attr->first_value; val; val = val->next)
+        {
+          int j;
+          char *name;
+          rtx test = attr_rtx (EQ_ATTR, cpu_name, XSTR (val->value, 0));
+
+          if (val == cpu_attr->default_val)
+            continue;
+          gcc_assert (GET_CODE (val->value) == CONST_STRING);
+          for (decl = all_insn_reservs, i = 0;
+               decl;
+               decl = decl->next)
+            {
+              rtx ctest = test;
+              rtx condexp
+                = simplify_and_tree (decl->condexp, &ctest, -2, 0);
+              if (condexp == false_rtx)
+                continue;
+              if (condexp == true_rtx)
+                break;
+              condexps[i] = condexp;
+              condexps[i + 1] = make_numeric_value (decl->insn_num);
+              condexps[i + 2] = make_numeric_value (decl->default_latency);
+              i += 3;
+            }
+
+          code_exp = rtx_alloc (COND);
+          lats_exp = rtx_alloc (COND);
+
+          j = i / 3 * 2;
+          XVEC (code_exp, 0) = rtvec_alloc (j);
+          XVEC (lats_exp, 0) = rtvec_alloc (j);
+
+          if (decl)
+            {
+              XEXP (code_exp, 1) = make_numeric_value (decl->insn_num);
+              XEXP (lats_exp, 1) = make_numeric_value (decl->default_latency);
+            }
+          else
+            {
+              XEXP (code_exp, 1) = make_numeric_value (n_insn_reservs + 1);
+              XEXP (lats_exp, 1) = make_numeric_value (0);
+            }
+
+          while (i > 0)
+            {
+              i -= 3;
+              j -= 2;
+              XVECEXP (code_exp, 0, j) = condexps[i];
+              XVECEXP (lats_exp, 0, j) = condexps[i];
+
+              XVECEXP (code_exp, 0, j + 1) = condexps[i + 1];
+              XVECEXP (lats_exp, 0, j + 1) = condexps[i + 2];
+            }
+
+          name = XNEWVEC (char,
+                          sizeof ("*internal_dfa_insn_code_")
+                          + strlen (XSTR (val->value, 0)));
+          strcpy (name, "*internal_dfa_insn_code_");
+          strcat (name, XSTR (val->value, 0));
+          make_internal_attr (name, code_exp, ATTR_NONE);
+          strcpy (name, "*insn_default_latency_");
+          strcat (name, XSTR (val->value, 0));
+          make_internal_attr (name, lats_exp, ATTR_NONE);
+          XDELETEVEC (name);
+        }
 
-  for (decl = all_insn_reservs, i = 0;
-       decl;
-       decl = decl->next, i += 2)
+      XDELETEVEC (condexps);
+    }
+  else
     {
-      XVECEXP (code_exp, 0, i) = decl->condexp;
-      XVECEXP (lats_exp, 0, i) = decl->condexp;
+      code_exp = rtx_alloc (COND);
+      lats_exp = rtx_alloc (COND);
+
+      XVEC (code_exp, 0) = rtvec_alloc (n_insn_reservs * 2);
+      XVEC (lats_exp, 0) = rtvec_alloc (n_insn_reservs * 2);
 
-      XVECEXP (code_exp, 0, i+1) = make_numeric_value (decl->insn_num);
-      XVECEXP (lats_exp, 0, i+1) = make_numeric_value (decl->default_latency);
+      XEXP (code_exp, 1) = make_numeric_value (n_insn_reservs + 1);
+      XEXP (lats_exp, 1) = make_numeric_value (0);
+
+      for (decl = all_insn_reservs, i = 0;
+           decl;
+           decl = decl->next, i += 2)
+        {
+          XVECEXP (code_exp, 0, i) = decl->condexp;
+          XVECEXP (lats_exp, 0, i) = decl->condexp;
+
+          XVECEXP (code_exp, 0, i+1) = make_numeric_value (decl->insn_num);
+          XVECEXP (lats_exp, 0, i+1)
+            = make_numeric_value (decl->default_latency);
+        }
+      make_internal_attr ("*internal_dfa_insn_code", code_exp, ATTR_NONE);
+      make_internal_attr ("*insn_default_latency", lats_exp, ATTR_NONE);
     }
 
   if (n_bypasses == 0)
@@ -4423,8 +4510,6 @@ make_automaton_attrs (void)
         }
     }
 
-  make_internal_attr ("*internal_dfa_insn_code", code_exp, ATTR_NONE);
-  make_internal_attr ("*insn_default_latency", lats_exp, ATTR_NONE);
   make_internal_attr ("*bypass_p", byps_exp, ATTR_NONE);
 }
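
For illustration only, the missing glue described in the message (an init function that picks a per-CPU function based on the schedule, plus a wrapper so callers still see an ordinary function) might look roughly like the stand-alone sketch below. Everything in it is an assumption made for the example: the enum sched_model, the three toy per-CPU functions, the table, and init_sched_attrs are hypothetical names, the functions take a plain int instead of an rtx insn, and none of this is what genattrtab emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the scheduling selector (ix86_schedule in the
   x86_64/i686 case).  */
enum sched_model
{
  SCHED_GENERIC64,
  SCHED_AMDFAM10,
  SCHED_CORE2,
  SCHED_MAX
};

/* Hypothetical stand-ins for the generated per-CPU functions.  The real ones
   would take an rtx insn and switch on recog_memoized (insn); here they just
   compute a toy value from an integer insn code.  */
static int code_generic64 (int insn_code) { return insn_code % 3; }
static int code_amdfam10 (int insn_code) { return insn_code % 5; }
static int code_core2 (int insn_code) { return insn_code % 7; }

typedef int (*dfa_code_fn) (int);

/* One entry per scheduling model, indexed by enum sched_model.  */
static const dfa_code_fn dfa_code_table[SCHED_MAX] =
{
  code_generic64,
  code_amdfam10,
  code_core2
};

/* The function pointer selected for the current scheduling model.  */
static dfa_code_fn internal_dfa_insn_code_ptr = NULL;

/* Hypothetical init hook: select the per-CPU function once the scheduling
   model is known (for example after option processing).  */
void
init_sched_attrs (enum sched_model model)
{
  assert (model >= 0 && model < SCHED_MAX);
  internal_dfa_insn_code_ptr = dfa_code_table[model];
}

/* Wrapper that keeps the original function name, so existing callers do not
   need to know that a function pointer is involved.  */
int
internal_dfa_insn_code (int insn_code)
{
  assert (internal_dfa_insn_code_ptr != NULL);
  return internal_dfa_insn_code_ptr (insn_code);
}
```

In this sketch a caller would run init_sched_attrs (SCHED_CORE2) once and then keep calling internal_dfa_insn_code (...) as before. The wrapper costs one extra indirect call per lookup, which is the runtime effect the message says would need to be benchmarked.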