From patchwork Tue Oct 29 09:18:09 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bin Cheng X-Patchwork-Id: 286764 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 4BF552C033B for ; Tue, 29 Oct 2013 20:18:45 +1100 (EST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; q=dns; s= default; b=mQGf6EjTjsQNg05zugggGtCuVATYIXHcEY0SRoOShzN/oqv5AtZQP CzbYvMvh1IjCXFnXMC0FmdnTHB3pu88W1Bh4wHVTYUJ1Q4NXTpybBVBVjYF0ID4D vtbuiayW9zDT4rkAe6qWpAGTB+0Z1R3KA5m8fT6hD+khtl+dQuQkX8= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; s= default; bh=n+TA8fv92mvKck9dOOO70qFCGKs=; b=szDzOe7994Hwwd6vD3Xr pofzsvWqWfy+LH/vjE2+ODg0Xx62M/fh3RhSo+xpiCjlVfeOzmF8cnuf/MBcMas5 2fXt54igXZcNuElEIc3ztnnTl9y7KtQSvud16RXJrP81r9cBEvxWQS1TCOkXCJ+o N7uCBQQSKN6Hjhf7G4gMc2U= Received: (qmail 28687 invoked by alias); 29 Oct 2013 09:18:39 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 28672 invoked by uid 89); 29 Oct 2013 09:18:37 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 X-HELO: service87.mimecast.com Received: from service87.mimecast.com (HELO service87.mimecast.com) (91.220.42.44) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 29 Oct 2013 09:18:35 +0000 Received: from cam-owa1.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.21]) by service87.mimecast.com; Tue, 29 Oct 2013 09:18:31 +0000 Received: from SHAWIN162 ([10.1.255.212]) by cam-owa1.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.0); Tue, 29 Oct 2013 09:18:28 +0000 From: "bin.cheng" To: Subject: [PATCH GCC]Simplify address expression in IVOPT Date: Tue, 29 Oct 2013 17:18:09 +0800 Message-ID: <000001ced487$c9e8e540$5dbaafc0$@arm.com> MIME-Version: 1.0 X-MC-Unique: 113102909183104701 X-IsSubscribed: yes Hi, I noticed that IVOPT generates complex address expressions like below for iv base. &arr_base[0].y &arr[0] &MEM[p+o] It's even worse for targets support auto-increment addressing mode because IVOPT adjusts such base expression with +/- step, then creates below: &arr_base[0].y +/- step &arr[0] +/- step &MEM[p+o] +/- step It has two disadvantages: 1) Cost computation in IVOPT can't handle complex address expression and general returns spill_cost for it, which is bad since address iv is important to IVOPT. 2) IVOPT creates duplicate candidates for IVs which have same value in different forms, for example, two candidates are generated with each for "&a[0]" and "&a". Again, it's even worse for auto-increment addressing mode. This patch fixes the issue by simplifying address expression at the entry of allocating IV struct. Maybe the simplification can be put in various fold* functions but I think it might be better in this way, because: 1) fold* functions are used from front-end to various tree optimizations, the simplified address expressions may not be what each optimizer wanted. Think about parallelism related passes, they might want the array index information kept for further analysis. 2) In some way, the simplification is conflict with current implementation of fold* function. Take fold_binary_loc as an example, it tries to simplify "&a[i1] +p c* i2" into "&a[i1+i2]". Of course we can simplify in this way for IVOPT too, but that will cause new problems like: a) we have to add code in IVOPT to cope with complex ARRAY_REF which is the exactly thing we want to avoid; b) the simplification can't always be done because of the sign/unsigned offset problem (especially for auto-increment addressing mode). 3) There are many entry point for fold* functions, the change will be non-trivial. 4) The simplification is only done in alloc_iv for true (not duplicate ones) iv struct, the number of such iv should be moderate. With these points, I think it might be a win to do the simplification in IVOPT and create a kind of sand box to let IVOPT play. Any suggestions? Bootstrap and tested on x86/x86_64/arm. The patch causes three cases failed on some target, but all of them are false alarm, which can be resolved by refining test cases to check more accurate information. Is it OK? Thanks. bin gcc/testsuite/ChangeLog 2013-10-29 Bin Cheng * gcc.dg/tree-ssa/loop-2.c: Refine check condition. * gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto. * gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto. 2013-10-29 Bin Cheng * tree-ssa-loop-ivopts.c (alloc_iv): Simplify base expression. (get_shiftadd_cost): Check equality using operand_equal_p. (force_expr_to_var_cost): Refactor the code. Handle type conversion. (split_address_cost): Call force_expr_to_var_cost. Index: gcc/testsuite/gcc.dg/tree-ssa/loop-2.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/loop-2.c (revision 204117) +++ gcc/testsuite/gcc.dg/tree-ssa/loop-2.c (working copy) @@ -27,7 +27,7 @@ void xxx(void) /* { dg-final { scan-tree-dump-times " \\* \[^\\n\\r\]*=" 0 "optimized" } } */ /* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\* " 0 "optimized" } } */ -/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */ /* 17 * iter should be strength reduced. */ Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c (revision 204117) +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c (working copy) @@ -7,7 +7,8 @@ extern char a[]; -/* Can not infer loop iteration from array -- exit test can not be replaced. */ +/* Can not infer loop iteration from array -- exit test can not be + replaced by the array address. */ void foo (unsigned int i_width, TYPE dst) { unsigned long long i = 0; @@ -21,5 +22,5 @@ void foo (unsigned int i_width, TYPE dst) } } -/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times "\[^:\]*if \\(.*j_\[0-9\]+.*\\)" 1 "ivopts"} } */ /* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c (revision 204117) +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c (working copy) @@ -18,5 +18,5 @@ long foo(long* p, long* p2, int N1, int N2) return s; } -/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(.*p2.*\\)" 1 "ivopts"} } */ /* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/tree-ssa-loop-ivopts.c =================================================================== --- gcc/tree-ssa-loop-ivopts.c (revision 204117) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -924,9 +924,25 @@ determine_base_object (tree expr) static struct iv * alloc_iv (tree base, tree step) { + tree expr = base; struct iv *iv = XCNEW (struct iv); gcc_assert (step != NULL_TREE); + /* Simplify complex address expressions like &MEM_REF, &ARRAY_REF + and &COMPONENT_REF into general expressions. By doing this: + 1) More accurate cost can be computed for address expressions; + 2) Duplicate candidates won't be created. */ + STRIP_NOPS (expr); + if (TREE_CODE (expr) == ADDR_EXPR + && (TREE_CODE (TREE_OPERAND (expr, 0)) == MEM_REF + || TREE_CODE (TREE_OPERAND (expr, 0)) == ARRAY_REF + || TREE_CODE (TREE_OPERAND (expr, 0)) == COMPONENT_REF)) + { + aff_tree comb; + tree_to_aff_combination (expr, TREE_TYPE (base), &comb); + base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (&comb)); + } + iv->base = base; iv->base_object = determine_base_object (base); iv->step = step; @@ -3481,17 +3497,21 @@ get_shiftadd_cost (tree expr, enum machine_mode mo int m = exact_log2 (int_cst_value (cst)); int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode)); int sa_cost; + bool equal_p = false; if (!(m >= 0 && m < maxm)) return false; + if (operand_equal_p (op1, mult, 0)) + equal_p = true; + sa_cost = (TREE_CODE (expr) != MINUS_EXPR ? shiftadd_cost (speed, mode, m) - : (mult == op1 + : (equal_p ? shiftsub1_cost (speed, mode, m) : shiftsub0_cost (speed, mode, m))); res = new_cost (sa_cost, 0); - res = add_costs (res, mult == op1 ? cost0 : cost1); + res = add_costs (res, equal_p ? cost0 : cost1); STRIP_NOPS (multop); if (!is_gimple_val (multop)) @@ -3585,30 +3605,14 @@ force_expr_to_var_cost (tree expr, bool speed) op1 = TREE_OPERAND (expr, 1); STRIP_NOPS (op0); STRIP_NOPS (op1); - - if (is_gimple_val (op0)) - cost0 = no_cost; - else - cost0 = force_expr_to_var_cost (op0, speed); - - if (is_gimple_val (op1)) - cost1 = no_cost; - else - cost1 = force_expr_to_var_cost (op1, speed); - break; + case NOP_EXPR: + case CONVERT_EXPR: case NEGATE_EXPR: op0 = TREE_OPERAND (expr, 0); STRIP_NOPS (op0); op1 = NULL_TREE; - - if (is_gimple_val (op0)) - cost0 = no_cost; - else - cost0 = force_expr_to_var_cost (op0, speed); - - cost1 = no_cost; break; default: @@ -3616,6 +3620,18 @@ force_expr_to_var_cost (tree expr, bool speed) return new_cost (target_spill_cost[speed], 0); } + if (op0 == NULL_TREE + || (is_gimple_val (op0) && (TREE_CODE (op0) != ADDR_EXPR))) + cost0 = no_cost; + else + cost0 = force_expr_to_var_cost (op0, speed); + + if (op1 == NULL_TREE + || (is_gimple_val (op1) && (TREE_CODE (op1) != ADDR_EXPR))) + cost1 = no_cost; + else + cost1 = force_expr_to_var_cost (op1, speed); + mode = TYPE_MODE (TREE_TYPE (expr)); switch (TREE_CODE (expr)) { @@ -3639,7 +3655,18 @@ force_expr_to_var_cost (tree expr, bool speed) speed, &sa_cost)) return sa_cost; } + break; + case NOP_EXPR: + case CONVERT_EXPR: + { + tree inner_mode, outer_mode; + outer_mode = TREE_TYPE (expr); + inner_mode = TREE_TYPE (op0); + cost = new_cost (convert_cost (TYPE_MODE (outer_mode), + TYPE_MODE (inner_mode), speed), 0); + } + break; case MULT_EXPR: if (cst_and_fits_in_hwi (op0)) @@ -3713,7 +3740,7 @@ split_address_cost (struct ivopts_data *data, *var_present = true; fd_ivopts_data = data; walk_tree (&addr, find_depends, depends_on, NULL); - return new_cost (target_spill_cost[data->speed], 0); + return force_expr_to_var_cost (addr, data->speed); } *offset += bitpos / BITS_PER_UNIT;