From patchwork Tue Nov 15 21:34:38 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bernd Edlinger X-Patchwork-Id: 695293 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3tJLJn3YFbz9t0p for ; Wed, 16 Nov 2016 08:35:03 +1100 (AEDT) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="SnIg6/nH"; dkim-atps=neutral DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:references:in-reply-to:content-type :mime-version; q=dns; s=default; b=OKgywiUf1+R8N9psppR1FImVEd2ff g+x11v4bDFJ7kI6Fc91IJBikLGYAMqT3p/NEGcWeUZ6UwwrBGsFs1etjuyuAs1h9 zSgD3bxOszv6T1AtuJci0R06U5Vqmyb08s3y2mVSU95+m0wr/ExthpOE8hrEmg1d SHJaC4V2XRLAYE= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:references:in-reply-to:content-type :mime-version; s=default; bh=Bnv1x+kQwHE0c/wO3iQMXurfCWg=; b=SnI g6/nHmsQhEcgx3uy54l4NWG0fLc/UfBqdVQRysYPwlqlr8r3penRfHdcwo5c9IJa ddpzF1pC7iQ08N7rAn6n3ueUkFmOTfgOQGUzOGl7dJO+w+9KUzFCPGEci02jYwSO DVUBBXQL1qes77uwdjQ1eWg4JNU78pwssI5MqSJg= Received: (qmail 124584 invoked by alias); 15 Nov 2016 21:34:54 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 124567 invoked by uid 89); 15 Nov 2016 21:34:52 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=tossed, 6117, simplification, HX-MS-Has-Attach:yes X-HELO: SNT004-OMC4S20.hotmail.com Received: from snt004-omc4s20.hotmail.com (HELO SNT004-OMC4S20.hotmail.com) (65.55.90.223) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 15 Nov 2016 21:34:42 +0000 Received: from EUR03-DB5-obe.outbound.protection.outlook.com ([65.55.90.200]) by SNT004-OMC4S20.hotmail.com over TLS secured channel with Microsoft SMTPSVC(7.5.7601.23008); Tue, 15 Nov 2016 13:34:40 -0800 Received: from DB5EUR03FT010.eop-EUR03.prod.protection.outlook.com (10.152.20.58) by DB5EUR03HT108.eop-EUR03.prod.protection.outlook.com (10.152.21.204) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.721.5; Tue, 15 Nov 2016 21:34:38 +0000 Received: from AM4PR0701MB2162.eurprd07.prod.outlook.com (10.152.20.51) by DB5EUR03FT010.mail.protection.outlook.com (10.152.20.96) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P384) id 15.1.721.5 via Frontend Transport; Tue, 15 Nov 2016 21:34:38 +0000 Received: from AM4PR0701MB2162.eurprd07.prod.outlook.com ([10.167.132.147]) by AM4PR0701MB2162.eurprd07.prod.outlook.com ([10.167.132.147]) with mapi id 15.01.0734.007; Tue, 15 Nov 2016 21:34:38 +0000 From: Bernd Edlinger To: GCC Patches , Bernd Schmidt , "richard.sandiford@arm.com" Subject: Re: [PATCH] Significantly reduce memory usage of genattrtab Date: Tue, 15 Nov 2016 21:34:38 +0000 Message-ID: References: <87h9793pe5.fsf@e105548-lin.cambridge.arm.com> In-Reply-To: <87h9793pe5.fsf@e105548-lin.cambridge.arm.com> authentication-results: gcc.gnu.org; dkim=none (message not signed) header.d=none; gcc.gnu.org; dmarc=none action=none header.from=hotmail.de; x-incomingtopheadermarker: OriginalChecksum:; UpperCasedChecksum:; SizeAsReceived:7490; Count:36 x-ms-exchange-messagesentrepresentingtype: 1 x-incomingheadercount: 36 x-eopattributedmessage: 0 x-microsoft-exchange-diagnostics: 1; DB5EUR03HT108; 7:s2HHl1bjsM9OnCDnMhvSpQ1wJc3TPBbXYj+ajvuvgDeKnlP/Hou/OfOqrAVTxaLyEFzKvZsoffW3BDLMC7kV49MkR/gANc4juMVx0wVoKGiyonL2SKjdiSEcqzEnpSLkHGFCwttDb7imL/DlQi9573WP2iEm5eT324hzRHGumrstVs7RYKXwLrv62zGs5Z5dI762qZ+sMbAZtBkAKTBr7AfQZa5Q8szw2V8mQZfB0yGvtw3+HP01mHfWgM2w7/iD8+HffnCjLwSRtE1pBcvpaRAXLm0KafU4Ww8iy0MGOtuZxYVjRGlU3yUx77Gx1y+Mb06tlLGtyCcGtHEuqEJOMabMJ3PMkrXkqubDcwgiWR0= x-forefront-antispam-report: EFV:NLI; SFV:NSPM; SFS:(10019020)(98900003); DIR:OUT; SFP:1102; SCL:1; SRVR:DB5EUR03HT108; H:AM4PR0701MB2162.eurprd07.prod.outlook.com; FPR:; SPF:None; LANG:en; x-ms-office365-filtering-correlation-id: 62331604-b2f0-4b4c-612a-08d40d9f3300 x-microsoft-antispam: UriScan:; BCL:0; PCL:0; RULEID:(22001)(1601124038)(1603103113)(1601125047); SRVR:DB5EUR03HT108; x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(432015012)(102415395)(82015046); SRVR:DB5EUR03HT108; BCL:0; PCL:0; RULEID:; SRVR:DB5EUR03HT108; x-forefront-prvs: 012792EC17 spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM MIME-Version: 1.0 X-OriginatorOrg: outlook.com X-MS-Exchange-CrossTenant-originalarrivaltime: 15 Nov 2016 21:34:38.4856 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Internet X-MS-Exchange-CrossTenant-id: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB5EUR03HT108 On 11/15/16 13:21, Richard Sandiford wrote: > Bernd Edlinger writes: >> Hi! >> >> The genattrtab build-tool uses way too much memory in general. >> I think there is no other build step that uses more memory. >> >> On the currently trunk it takes around 700MB to build the >> ARM latency tab files. I debugged that yesterday >> and found that this can be reduced to 8MB (!). Yes, really. >> >> So the attached patch does try really hard to hash and re-use >> all ever created rtx objects. >> >> Bootstrapped and reg-tested on x86_64-pc-linux-gnu and ARM. >> Is it OK for trunk? > > Just to check: does this produce the same output as before? > And did you notice any difference in the time genattrtab > takes to run? > The run time was in the range of 24-25s, with and without the patch. However the tables are a bit different, although that seems only to be w flaw with the ATTR_CURR_SIMPLIFIED_P which is now re-used when a matching rtx was found in the hash. As I said, the generated functions do really work, probably because just a few simplifications are missing. So it looks like I need to clear the ATTR_CURR_SIMPLIFIED_P on re-used binary ops. That I found out just by try-and-error. I can say that now the generated functions are exactly identical for i386, arm and mips. The memory and the run time did not change due to this re-hashing. > > ATTR_PERMANENT_P is supposed to guarantee that no other rtx like it exists, > so that x != y when x or y is "permanent" implies that the attributes > must be different. This lets attr_equal_p avoid a recursive walk: > > static int > attr_equal_p (rtx x, rtx y) > { > return (x == y || (! (ATTR_PERMANENT_P (x) && ATTR_PERMANENT_P (y)) > && rtx_equal_p (x, y))); > } > > Does the patch still guarantee that? > Hmm, I see. I expected that ATTR_PERMANENT_P means more or less, that it is in the hash table. I believe that a long time ago, there was a kind of garbage collection of temporary rtx objects, that needed to be copied from the temporary space to the permanent space, after the simplification was done. And then all temporary objects were just tossed away. But that was long before my time. Today everything is permanent, that is why the memory usage is unbounded. But I can fix that, by only setting ATTR_PERMANENT_P on the hashed rtx when both sub-rtx are also ATTR_PERMANENT_P. How does that new version look, is it OK? Thanks Bernd. 2016-11-15 Bernd Edlinger * genattrtab.c (attr_rtx_1): Avoid allocating new rtx objects. Clear ATTR_CURR_SIMPLIFIED_P for re-used binary rtx objects. Use DEF_ATTR_STRING for string arguments. Use RTL_HASH for integer arguments. Only set ATTR_PERMANENT_P on newly hashed rtx when all sub-rtx are also permanent. (attr_eq): Simplify. (attr_copy_rtx): Remove. (make_canonical, get_attr_value): Use attr_equal_p. (copy_boolean): Rehash NOT. (simplify_test_exp_in_temp, optimize_attrs): Remove call to attr_copy_rtx. (attr_alt_intersection, attr_alt_union, attr_alt_complement, mk_attr_alt): Rehash EQ_ATTR_ALT. (make_automaton_attrs): Use attr_eq. Index: gcc/genattrtab.c =================================================================== --- gcc/genattrtab.c (revision 242335) +++ gcc/genattrtab.c (working copy) @@ -386,6 +386,7 @@ attr_rtx_1 (enum rtx_code code, va_list p) unsigned int hashcode; struct attr_hash *h; struct obstack *old_obstack = rtl_obstack; + int permanent_p = 1; /* For each of several cases, search the hash table for an existing entry. Use that entry if one is found; otherwise create a new RTL and add it @@ -395,13 +396,8 @@ attr_rtx_1 (enum rtx_code code, va_list p) { rtx arg0 = va_arg (p, rtx); - /* A permanent object cannot point to impermanent ones. */ if (! ATTR_PERMANENT_P (arg0)) - { - rt_val = rtx_alloc (code); - XEXP (rt_val, 0) = arg0; - return rt_val; - } + permanent_p = 0; hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0)); for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next) @@ -425,14 +421,8 @@ attr_rtx_1 (enum rtx_code code, va_list p) rtx arg0 = va_arg (p, rtx); rtx arg1 = va_arg (p, rtx); - /* A permanent object cannot point to impermanent ones. */ if (! ATTR_PERMANENT_P (arg0) || ! ATTR_PERMANENT_P (arg1)) - { - rt_val = rtx_alloc (code); - XEXP (rt_val, 0) = arg0; - XEXP (rt_val, 1) = arg1; - return rt_val; - } + permanent_p = 0; hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1)); for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next) @@ -440,7 +430,10 @@ attr_rtx_1 (enum rtx_code code, va_list p) && GET_CODE (h->u.rtl) == code && XEXP (h->u.rtl, 0) == arg0 && XEXP (h->u.rtl, 1) == arg1) - return h->u.rtl; + { + ATTR_CURR_SIMPLIFIED_P (h->u.rtl) = 0; + return h->u.rtl; + } if (h == 0) { @@ -481,6 +474,9 @@ attr_rtx_1 (enum rtx_code code, va_list p) char *arg0 = va_arg (p, char *); char *arg1 = va_arg (p, char *); + arg0 = DEF_ATTR_STRING (arg0); + arg1 = DEF_ATTR_STRING (arg1); + hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1)); for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next) if (h->hashcode == hashcode @@ -497,6 +493,29 @@ attr_rtx_1 (enum rtx_code code, va_list p) XSTR (rt_val, 1) = arg1; } } + else if (GET_RTX_LENGTH (code) == 2 + && GET_RTX_FORMAT (code)[0] == 'i' + && GET_RTX_FORMAT (code)[1] == 'i') + { + int arg0 = va_arg (p, int); + int arg1 = va_arg (p, int); + + hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1)); + for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next) + if (h->hashcode == hashcode + && GET_CODE (h->u.rtl) == code + && XINT (h->u.rtl, 0) == arg0 + && XINT (h->u.rtl, 1) == arg1) + return h->u.rtl; + + if (h == 0) + { + rtl_obstack = hash_obstack; + rt_val = rtx_alloc (code); + XINT (rt_val, 0) = arg0; + XINT (rt_val, 1) = arg1; + } + } else if (code == CONST_INT) { HOST_WIDE_INT arg0 = va_arg (p, HOST_WIDE_INT); @@ -552,7 +571,7 @@ attr_rtx_1 (enum rtx_code code, va_list p) rtl_obstack = old_obstack; attr_hash_add_rtx (hashcode, rt_val); - ATTR_PERMANENT_P (rt_val) = 1; + ATTR_PERMANENT_P (rt_val) = permanent_p; return rt_val; } @@ -592,7 +611,7 @@ attr_printf (unsigned int len, const char *fmt, .. static rtx attr_eq (const char *name, const char *value) { - return attr_rtx (EQ_ATTR, DEF_ATTR_STRING (name), DEF_ATTR_STRING (value)); + return attr_rtx (EQ_ATTR, name, value); } static const char * @@ -646,89 +665,6 @@ attr_equal_p (rtx x, rtx y) && rtx_equal_p (x, y))); } -/* Copy an attribute value expression, - descending to all depths, but not copying any - permanent hashed subexpressions. */ - -static rtx -attr_copy_rtx (rtx orig) -{ - rtx copy; - int i, j; - RTX_CODE code; - const char *format_ptr; - - /* No need to copy a permanent object. */ - if (ATTR_PERMANENT_P (orig)) - return orig; - - code = GET_CODE (orig); - - switch (code) - { - case REG: - CASE_CONST_ANY: - case SYMBOL_REF: - case MATCH_TEST: - case CODE_LABEL: - case PC: - case CC0: - return orig; - - default: - break; - } - - copy = rtx_alloc (code); - PUT_MODE (copy, GET_MODE (orig)); - ATTR_IND_SIMPLIFIED_P (copy) = ATTR_IND_SIMPLIFIED_P (orig); - ATTR_CURR_SIMPLIFIED_P (copy) = ATTR_CURR_SIMPLIFIED_P (orig); - ATTR_PERMANENT_P (copy) = ATTR_PERMANENT_P (orig); - - format_ptr = GET_RTX_FORMAT (GET_CODE (copy)); - - for (i = 0; i < GET_RTX_LENGTH (GET_CODE (copy)); i++) - { - switch (*format_ptr++) - { - case 'e': - XEXP (copy, i) = XEXP (orig, i); - if (XEXP (orig, i) != NULL) - XEXP (copy, i) = attr_copy_rtx (XEXP (orig, i)); - break; - - case 'E': - case 'V': - XVEC (copy, i) = XVEC (orig, i); - if (XVEC (orig, i) != NULL) - { - XVEC (copy, i) = rtvec_alloc (XVECLEN (orig, i)); - for (j = 0; j < XVECLEN (copy, i); j++) - XVECEXP (copy, i, j) = attr_copy_rtx (XVECEXP (orig, i, j)); - } - break; - - case 'n': - case 'i': - XINT (copy, i) = XINT (orig, i); - break; - - case 'w': - XWINT (copy, i) = XWINT (orig, i); - break; - - case 's': - case 'S': - XSTR (copy, i) = XSTR (orig, i); - break; - - default: - gcc_unreachable (); - } - } - return copy; -} - /* Given a test expression EXP for attribute ATTR, ensure it is validly formed. LOC is the location of the .md construct that contains EXP. @@ -1236,7 +1172,7 @@ make_canonical (file_location loc, struct attr_des XVECEXP (exp, 0, i) = copy_boolean (XVECEXP (exp, 0, i)); XVECEXP (exp, 0, i + 1) = make_canonical (loc, attr, XVECEXP (exp, 0, i + 1)); - if (! rtx_equal_p (XVECEXP (exp, 0, i + 1), defval)) + if (! attr_equal_p (XVECEXP (exp, 0, i + 1), defval)) allsame = 0; } if (allsame) @@ -1257,6 +1193,8 @@ copy_boolean (rtx exp) if (GET_CODE (exp) == AND || GET_CODE (exp) == IOR) return attr_rtx (GET_CODE (exp), copy_boolean (XEXP (exp, 0)), copy_boolean (XEXP (exp, 1))); + else if (GET_CODE (exp) == NOT) + return attr_rtx (NOT, copy_boolean (XEXP (exp, 0))); if (GET_CODE (exp) == MATCH_OPERAND) { XSTR (exp, 1) = DEF_ATTR_STRING (XSTR (exp, 1)); @@ -1298,7 +1236,7 @@ get_attr_value (file_location loc, rtx value, stru } for (av = attr->first_value; av; av = av->next) - if (rtx_equal_p (value, av->value) + if (attr_equal_p (value, av->value) && (num_alt == 0 || av->first_insn == NULL || insn_alternatives[av->first_insn->def->insn_code])) return av; @@ -2339,9 +2277,7 @@ simplify_test_exp_in_temp (rtx exp, int insn_code, rtl_obstack = temp_obstack; x = simplify_test_exp (exp, insn_code, insn_index); rtl_obstack = old; - if (x == exp || rtl_obstack == temp_obstack) - return x; - return attr_copy_rtx (x); + return x; } /* Returns true if S1 is a subset of S2. */ @@ -2397,28 +2333,27 @@ attr_alt_subset_of_compl_p (rtx s1, rtx s2) static rtx attr_alt_intersection (rtx s1, rtx s2) { - rtx result = rtx_alloc (EQ_ATTR_ALT); + int result; switch ((XINT (s1, 1) << 1) | XINT (s2, 1)) { case (0 << 1) | 0: - XINT (result, 0) = XINT (s1, 0) & XINT (s2, 0); + result = XINT (s1, 0) & XINT (s2, 0); break; case (0 << 1) | 1: - XINT (result, 0) = XINT (s1, 0) & ~XINT (s2, 0); + result = XINT (s1, 0) & ~XINT (s2, 0); break; case (1 << 1) | 0: - XINT (result, 0) = XINT (s2, 0) & ~XINT (s1, 0); + result = XINT (s2, 0) & ~XINT (s1, 0); break; case (1 << 1) | 1: - XINT (result, 0) = XINT (s1, 0) | XINT (s2, 0); + result = XINT (s1, 0) | XINT (s2, 0); break; default: gcc_unreachable (); } - XINT (result, 1) = XINT (s1, 1) & XINT (s2, 1); - return result; + return attr_rtx (EQ_ATTR_ALT, result, XINT (s1, 1) & XINT (s2, 1)); } /* Return EQ_ATTR_ALT expression representing union of S1 and S2. */ @@ -2426,28 +2361,27 @@ attr_alt_intersection (rtx s1, rtx s2) static rtx attr_alt_union (rtx s1, rtx s2) { - rtx result = rtx_alloc (EQ_ATTR_ALT); + int result; switch ((XINT (s1, 1) << 1) | XINT (s2, 1)) { case (0 << 1) | 0: - XINT (result, 0) = XINT (s1, 0) | XINT (s2, 0); + result = XINT (s1, 0) | XINT (s2, 0); break; case (0 << 1) | 1: - XINT (result, 0) = XINT (s2, 0) & ~XINT (s1, 0); + result = XINT (s2, 0) & ~XINT (s1, 0); break; case (1 << 1) | 0: - XINT (result, 0) = XINT (s1, 0) & ~XINT (s2, 0); + result = XINT (s1, 0) & ~XINT (s2, 0); break; case (1 << 1) | 1: - XINT (result, 0) = XINT (s1, 0) & XINT (s2, 0); + result = XINT (s1, 0) & XINT (s2, 0); break; default: gcc_unreachable (); } - XINT (result, 1) = XINT (s1, 1) | XINT (s2, 1); - return result; + return attr_rtx (EQ_ATTR_ALT, result, XINT (s1, 1) | XINT (s2, 1)); } /* Return EQ_ATTR_ALT expression representing complement of S. */ @@ -2455,12 +2389,7 @@ attr_alt_union (rtx s1, rtx s2) static rtx attr_alt_complement (rtx s) { - rtx result = rtx_alloc (EQ_ATTR_ALT); - - XINT (result, 0) = XINT (s, 0); - XINT (result, 1) = 1 - XINT (s, 1); - - return result; + return attr_rtx (EQ_ATTR_ALT, XINT (s, 0), 1 - XINT (s, 1)); } /* Return EQ_ATTR_ALT expression representing set containing elements set @@ -2469,12 +2398,7 @@ attr_alt_complement (rtx s) static rtx mk_attr_alt (uint64_t e) { - rtx result = rtx_alloc (EQ_ATTR_ALT); - - XINT (result, 0) = e; - XINT (result, 1) = 0; - - return result; + return attr_rtx (EQ_ATTR_ALT, (int)e, 0); } /* Given an expression, see if it can be simplified for a particular insn @@ -3045,7 +2969,6 @@ optimize_attrs (int num_insn_codes) && attr_rtx_cost (newexp) < 26 ) { - newexp = attr_copy_rtx (newexp); remove_insn_ent (av, ie); av = get_attr_value (ie->def->loc, newexp, attr, ie->def->insn_code); @@ -5004,7 +4927,7 @@ make_automaton_attrs (void) { int j; char *name; - rtx test = attr_rtx (EQ_ATTR, tune_attr->name, XSTR (val->value, 0)); + rtx test = attr_eq (tune_attr->name, XSTR (val->value, 0)); if (val == tune_attr->default_val) continue;