From patchwork Mon Nov 4 18:41:28 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yufeng Zhang X-Patchwork-Id: 288225 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 6630F2C0086 for ; Tue, 5 Nov 2013 05:43:24 +1100 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:content-type; q=dns; s=default; b=qX0YlqxCKn/rRqzlM+Kwy071x3PGZcGsEieWOkWJQ4g QdGfmo6SIgkjqyBimybcNV6MCFuiOl6iLB8cd1NBCcF79fVmC+iEWJt4bmyMPf5n 6dkyJe1aPc42dSLXB2NWpj7xdjgeBImYdDP/4r6i0Sw18tmmj3R/Y8+b8l73wgtA = DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :message-id:date:from:mime-version:to:cc:subject:content-type; s=default; bh=sYfRSihz+1pQgjwbv39NiGaHDGw=; b=P134hoX3IDVYRBfrf 4PiURNGH2zSIbseJ/j5sJGVk/1JglL+bvvlBjZCHHnQCZ+iqKpNrxnJmi0ebc1Wq e4MplqzFH129r4qnZA9yyypKB6IxIbOob4wXTwxtoUPJv+BOdubdX12zaftwSLZ3 tr99h7WsJCk3KzT4SEv9bXSCDU= Received: (qmail 32762 invoked by alias); 4 Nov 2013 18:41:41 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 32737 invoked by uid 89); 4 Nov 2013 18:41:40 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.6 required=5.0 tests=AWL, BAYES_50, RDNS_NONE, SPF_PASS autolearn=no version=3.3.2 X-HELO: service87.mimecast.com Received: from Unknown (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 04 Nov 2013 18:41:39 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Mon, 04 Nov 2013 18:41:30 +0000 Received: from [10.1.201.52] ([10.1.255.212]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.0); Mon, 4 Nov 2013 18:41:29 +0000 Message-ID: <5277EA58.5020303@arm.com> Date: Mon, 04 Nov 2013 18:41:28 +0000 From: Yufeng Zhang User-Agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:8.0) Gecko/20111105 Thunderbird/8.0 MIME-Version: 1.0 To: "gcc-patches@gcc.gnu.org" CC: Bill Schmidt , Richard Biener Subject: [PATCH] Optional alternative base_expr in finding basis for CAND_REFs X-MC-Unique: 113110418413000301 X-IsSubscribed: yes Hi, This patch extends the slsr pass to optionally use an alternative base expression in finding basis for CAND_REFs. Currently the pass uses hash-based algorithm to match the base_expr in a candidate. Given a test case like the following, slsr will not be able to recognize the two CAND_REFs have the same basis, as their base_expr are of different SSA_NAMEs: typedef int arr_2[20][20]; void foo (arr_2 a2, int i, int j) { a2[i][j] = 1; a2[i + 10][j] = 2; } The gimple dump before slsr is like the following (using an arm-none-eabi gcc): i.0_2 = (unsigned int) i_1(D); _3 = i.0_2 * 80; _5 = a2_4(D) + _3; *_5[j_7(D)] = 1; <---- _9 = _3 + 800; _10 = a2_4(D) + _9; *_10[j_7(D)] = 2; <---- Here are the dumps for the two CAND_REFs generated for the two statements pointed by the arrows: 4 [2] _5 = a2_4(D) + _3; ADD : a2_4(D) + (80 * i_1(D)) : int[20] * basis: 0 dependent: 0 sibling: 0 next-interp: 0 dead-savings: 0 8 [2] *_10[j_7(D)] = 2; REF : _10 + ((sizetype) j_7(D) * 4) + 0 : int[20] * basis: 5 dependent: 0 sibling: 0 next-interp: 0 dead-savings: 0 As mentioned previously, slsr cannot establish that candidate 4 is the basis for the candidate 8, as they have different base_exprs: a2_4(D) and _10, respectively. However, the two references actually only differ by an immediate offset (800). This patch uses the tree affine combination facilities to create an optional alternative base expression to be used in finding (as well as recording) the basis. It calls tree_to_aff_combination_expand on base_expr, reset the offset field of the generated aff_tree to 0 and generate a tree from it by calling aff_combination_to_tree. The new tree is recorded as a potential basis, and when find_basis_for_candidate fails to find a basis for a CAND_REF in its normal approach, it searches again using a tree expanded in such way. Such an expanded tree usually discloses the expression behind an SSA_NAME. In the example above, instead of seeing the strength reduction candidate chains like this: _5 -> 5 _10 -> 8 we are now having: _5 -> 5 _10 -> 8 a2_4(D) + (sizetype) i_1(D) * 80 -> 5 -> 8 With the candidates 5 and 8 linked to the same tree expression (a2_4(D) + (sizetype) i_1(D) * 80), slsr is now able to establish that 5 is the basis of 8. The patch doesn't attempt to change the content of any CAND_REF though. It only enables CAND_REFs which (1) have the same stride and (2) have the underlying expressions of their base_expr only differ in immediate offsets, to be recognized to have the same basis. The statements with such CAND_REFs will be lowered to MEM_REFs, and later on the RTL expander shall be able to fold and re-associate the immediate offsets to the rightmost side of the addressing expression, and therefore exposes the common sub-expression successfully. The code-gen difference of the example code on arm with -O2 -mcpu=cortex-15 is: mov r3, r1, asl #6 - add ip, r0, r2, asl #2 str lr, [sp, #-4]! + mov ip, #1 + mov lr, #2 add r1, r3, r1, asl #4 - mov lr, #1 - mov r3, #2 add r0, r0, r1 - add r0, r0, #800 - str lr, [ip, r1] - str r3, [r0, r2, asl #2] + add r3, r0, r2, asl #2 + str ip, [r0, r2, asl #2] + str lr, [r3, #800] ldr pc, [sp], #4 One fewer instruction in this simple case. The example used in illustration is too simple to show code-gen difference on x86_64, but the included test case will show the benefit of the patch quite obviously. The patch has passed * bootstrapping on arm and x86_64 * regtest on arm-none-eabi, aarch64-none-elf and x86_64 There is no regression in SPEC2K on arm or x86_64. OK to commit to the trunk? Any comment is welcomed! Thanks, Yufeng gcc/ * gimple-ssa-strength-reduction.c: Include tree-affine.h. (find_basis_for_base_expr): Update comment. (find_basis_for_candidate): Add new parameter 'alt_base_expr' of type 'tree'. Optionally call find_basis_for_base_expr with 'alt_base_expr'. (record_potential_basis): Add new parameter 'alt_base_expr' of type 'tree'; set node->base_expr with 'alt_base_expr' if it is not NULL. (name_expansions): New static variable. (get_alternative_base): New function. (alloc_cand_and_find_basis): Call get_alternative_base for CAND_REF. Update calls to find_basis_for_candidate and record_potential_basis. (execute_strength_reduction): Call free_affine_expand_cache with &name_expansions. gcc/testsuite/ * gcc.dg/tree-ssa/slsr-41.c: New test. diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c index 9a5072c..3150046 100644 --- a/gcc/gimple-ssa-strength-reduction.c +++ b/gcc/gimple-ssa-strength-reduction.c @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3. If not see #include "expmed.h" #include "params.h" #include "hash-table.h" +#include "tree-affine.h" /* Information about a strength reduction candidate. Each statement in the candidate table represents an expression of one of the @@ -434,9 +435,10 @@ find_phi_def (tree base) return c->cand_num; } -/* Helper routine for find_basis_for_candidate. May be called twice: +/* Helper routine for find_basis_for_candidate. May be called three times: once for the candidate's base expr, and optionally again for the - candidate's phi definition. */ + candidate's phi definition, as well as for an alternative base expr + passed as the 2nd argument to find_basis_for_candidate. */ static slsr_cand_t find_basis_for_base_expr (slsr_cand_t c, tree base_expr) @@ -477,10 +479,13 @@ find_basis_for_base_expr (slsr_cand_t c, tree base_expr) appear in a block that dominates the candidate statement and have the same stride and type. If more than one possible basis exists, the one with highest index in the vector is chosen; this will be - the most immediately dominating basis. */ + the most immediately dominating basis. + + When ALT_BASE_EXPR is not NULL, it will also be used to look for + possible candidates if all previous attempts have failed. */ static int -find_basis_for_candidate (slsr_cand_t c) +find_basis_for_candidate (slsr_cand_t c, tree alt_base_expr) { slsr_cand_t basis = find_basis_for_base_expr (c, c->base_expr); @@ -513,6 +518,9 @@ find_basis_for_candidate (slsr_cand_t c) } } + if (!basis && alt_base_expr) + basis = find_basis_for_base_expr (c, alt_base_expr); + if (basis) { c->sibling = basis->dependent; @@ -524,16 +532,17 @@ find_basis_for_candidate (slsr_cand_t c) } /* Record a mapping from the base expression of C to C itself, indicating that - C may potentially serve as a basis using that base expression. */ + C may potentially serve as a basis using that base expression. Use + ALT_BASE_EXPR as the base expression instead, if it is not NULL. */ static void -record_potential_basis (slsr_cand_t c) +record_potential_basis (slsr_cand_t c, tree alt_base_expr) { cand_chain_t node; cand_chain **slot; node = (cand_chain_t) obstack_alloc (&chain_obstack, sizeof (cand_chain)); - node->base_expr = c->base_expr; + node->base_expr = alt_base_expr ? alt_base_expr : c->base_expr; node->cand = c; node->next = NULL; slot = base_cand_map.find_slot (node, INSERT); @@ -548,14 +557,46 @@ record_potential_basis (slsr_cand_t c) *slot = node; } +static struct pointer_map_t *name_expansions; + +/* Given BASE, use the tree affine combiniation facilities to + find the underlying tree expression for BASE, with any + immediate offset excluded. */ + +static tree +get_alternative_base (tree base) +{ + tree expr; + aff_tree aff; + + tree_to_aff_combination_expand (base, TREE_TYPE (base), + &aff, &name_expansions); + aff.offset = tree_to_double_int (integer_zero_node); + expr = aff_combination_to_tree (&aff); + + if (expr == base) + expr = NULL; + + return expr; +} + /* Allocate storage for a new candidate and initialize its fields. - Attempt to find a basis for the candidate. */ + Attempt to find a basis for the candidate. + + For CAND_REF, an alternative base may also be recorded and used + to find a basis. This helps cases where the expression hidden + behind BASE (which is usually an SSA_NAME) has immediate offset, + e.g. + + a2[i][j] = 1; + a2[i + 20][j] = 2; */ static slsr_cand_t -alloc_cand_and_find_basis (enum cand_kind kind, gimple gs, tree base, +alloc_cand_and_find_basis (enum cand_kind kind, gimple gs, tree base, double_int index, tree stride, tree ctype, unsigned savings) { + tree alt_base_expr = NULL; slsr_cand_t c = (slsr_cand_t) obstack_alloc (&cand_obstack, sizeof (slsr_cand)); c->cand_stmt = gs; @@ -573,12 +614,17 @@ alloc_cand_and_find_basis (enum cand_kind kind, gimple gs, tree base, cand_vec.safe_push (c); + if (kind == CAND_REF) + alt_base_expr = get_alternative_base (base); + if (kind == CAND_PHI) c->basis = 0; else - c->basis = find_basis_for_candidate (c); + c->basis = find_basis_for_candidate (c, alt_base_expr); - record_potential_basis (c); + record_potential_basis (c, NULL); + if (alt_base_expr) + record_potential_basis (c, alt_base_expr); return c; } @@ -3534,6 +3580,8 @@ execute_strength_reduction (void) dump_cand_chains (); } + free_affine_expand_cache (&name_expansions); + /* Analyze costs and make appropriate replacements. */ analyze_candidates_and_replace (); diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-41.c b/gcc/testsuite/gcc.dg/tree-ssa/slsr-41.c new file mode 100644 index 0000000..870d714 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-41.c @@ -0,0 +1,24 @@ +/* Verify straight-line strength reduction in using + alternative base expr to record and look for the + potential candidate. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-slsr" } */ + +typedef int arr_2[50][50]; + +void foo (arr_2 a2, int v1) +{ + int i, j; + + i = v1 + 5; + j = i; + a2 [i-10] [j] = 2; + a2 [i] [j++] = i; + a2 [i+20] [j++] = i; + a2 [i-3] [i-1] += 1; + return; +} + +/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */ +/* { dg-final { cleanup-tree-dump "slsr" } } */