Message ID | 000001ced487$c9e8e540$5dbaafc0$@arm.com |
---|---|
State | New |
Headers | show |
On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng <bin.cheng@arm.com> wrote: > Hi, > I noticed that IVOPT generates complex address expressions like below for iv > base. > &arr_base[0].y > &arr[0] > &MEM[p+o] > It's even worse for targets support auto-increment addressing mode because > IVOPT adjusts such base expression with +/- step, then creates below: > &arr_base[0].y +/- step > &arr[0] +/- step > &MEM[p+o] +/- step > It has two disadvantages: > 1) Cost computation in IVOPT can't handle complex address expression and > general returns spill_cost for it, which is bad since address iv is > important to IVOPT. > 2) IVOPT creates duplicate candidates for IVs which have same value in > different forms, for example, two candidates are generated with each for > "&a[0]" and "&a". Again, it's even worse for auto-increment addressing > mode. > > This patch fixes the issue by simplifying address expression at the entry of > allocating IV struct. Maybe the simplification can be put in various fold* > functions but I think it might be better in this way, because: > 1) fold* functions are used from front-end to various tree optimizations, > the simplified address expressions may not be what each optimizer wanted. > Think about parallelism related passes, they might want the array index > information kept for further analysis. > 2) In some way, the simplification is conflict with current implementation > of fold* function. Take fold_binary_loc as an example, it tries to simplify > "&a[i1] +p c* i2" into "&a[i1+i2]". Of course we can simplify in this way > for IVOPT too, but that will cause new problems like: a) we have to add code > in IVOPT to cope with complex ARRAY_REF which is the exactly thing we want > to avoid; b) the simplification can't always be done because of the > sign/unsigned offset problem (especially for auto-increment addressing > mode). > 3) There are many entry point for fold* functions, the change will be > non-trivial. > 4) The simplification is only done in alloc_iv for true (not duplicate ones) > iv struct, the number of such iv should be moderate. > > With these points, I think it might be a win to do the simplification in > IVOPT and create a kind of sand box to let IVOPT play. Any suggestions? > > Bootstrap and tested on x86/x86_64/arm. > The patch causes three cases failed on some target, but all of them are > false alarm, which can be resolved by refining test cases to check more > accurate information. > > Is it OK? Hmm. I think you want what get_inner_reference_aff does, using the return value of get_inner_reference as starting point for determine_base_object. And you don't want to restrict yourselves so much on what exprs to process, but only exclude DECL_Ps. Just amend get_inner_reference_aff to return the tree base object. Note that this isn't really "simplifying" but rather lowering all addresses. The other changes in this patch are unrelated, right? Thanks, Richard. > Thanks. > bin > > gcc/testsuite/ChangeLog > 2013-10-29 Bin Cheng <bin.cheng@arm.com> > > * gcc.dg/tree-ssa/loop-2.c: Refine check condition. > * gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto. > * gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto. > > 2013-10-29 Bin Cheng <bin.cheng@arm.com> > > * tree-ssa-loop-ivopts.c (alloc_iv): Simplify base expression. > (get_shiftadd_cost): Check equality using operand_equal_p. > (force_expr_to_var_cost): Refactor the code. Handle type > conversion. > (split_address_cost): Call force_expr_to_var_cost.
On 10/30/13 14:46, Richard Biener wrote: > On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng<bin.cheng@arm.com> wrote: >> Hi, >> I noticed that IVOPT generates complex address expressions like below for iv >> base. >> &arr_base[0].y >> &arr[0] >> &MEM[p+o] >> It's even worse for targets support auto-increment addressing mode because >> IVOPT adjusts such base expression with +/- step, then creates below: >> &arr_base[0].y +/- step >> &arr[0] +/- step >> &MEM[p+o] +/- step >> It has two disadvantages: >> 1) Cost computation in IVOPT can't handle complex address expression and >> general returns spill_cost for it, which is bad since address iv is >> important to IVOPT. >> 2) IVOPT creates duplicate candidates for IVs which have same value in >> different forms, for example, two candidates are generated with each for >> "&a[0]" and "&a". Again, it's even worse for auto-increment addressing >> mode. >> >> This patch fixes the issue by simplifying address expression at the entry of >> allocating IV struct. Maybe the simplification can be put in various fold* >> functions but I think it might be better in this way, because: >> 1) fold* functions are used from front-end to various tree optimizations, >> the simplified address expressions may not be what each optimizer wanted. >> Think about parallelism related passes, they might want the array index >> information kept for further analysis. >> 2) In some way, the simplification is conflict with current implementation >> of fold* function. Take fold_binary_loc as an example, it tries to simplify >> "&a[i1] +p c* i2" into "&a[i1+i2]". Of course we can simplify in this way >> for IVOPT too, but that will cause new problems like: a) we have to add code >> in IVOPT to cope with complex ARRAY_REF which is the exactly thing we want >> to avoid; b) the simplification can't always be done because of the >> sign/unsigned offset problem (especially for auto-increment addressing >> mode). >> 3) There are many entry point for fold* functions, the change will be >> non-trivial. >> 4) The simplification is only done in alloc_iv for true (not duplicate ones) >> iv struct, the number of such iv should be moderate. >> >> With these points, I think it might be a win to do the simplification in >> IVOPT and create a kind of sand box to let IVOPT play. Any suggestions? >> >> Bootstrap and tested on x86/x86_64/arm. >> The patch causes three cases failed on some target, but all of them are >> false alarm, which can be resolved by refining test cases to check more >> accurate information. >> >> Is it OK? > > Hmm. I think you want what get_inner_reference_aff does, using > the return value of get_inner_reference as starting point for > determine_base_object. And you don't want to restrict yourselves > so much on what exprs to process, but only exclude DECL_Ps. > Just amend get_inner_reference_aff to return the tree base object. Or, update determine_base_object to handle MEM_REF, ARRAY_REF, COMPONENT_REF, etc. by calling get_inner_reference to get the base and continuing the recursive determine_base_object on the return value (TREE_OPERAND (base, 0)). Calling an amended get_inner_reference_aff can be expensive, as the function will also spend time in transforming the reference from tree to aff_tree. Yufeng
> -----Original Message----- > From: Richard Biener [mailto:richard.guenther@gmail.com] > Sent: Wednesday, October 30, 2013 10:46 PM > To: Bin Cheng > Cc: GCC Patches > Subject: Re: [PATCH GCC]Simplify address expression in IVOPT > > On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng <bin.cheng@arm.com> wrote: > > Hmm. I think you want what get_inner_reference_aff does, using the return > value of get_inner_reference as starting point for determine_base_object. > And you don't want to restrict yourselves so much on what exprs to process, > but only exclude DECL_Ps. Did you mean I should pass all address expressions to get_inner_reference_aff except the one with declaration operand? I am a little confused here why DECL_Ps should be handled specially? Seems it can be handled properly by the get_* function. Anything important I missed? > Just amend get_inner_reference_aff to return the tree base object. I found it's inappropriate to do this because: functions like get_inner_reference* only accept reference expressions, while base_object has to be computed for other kinds of expressions too. Take gimple statement "a_1 = *ptr_1" as an example, the base passed to alloc_iv is ptr_1; the base_object computed by determine_base_object is also ptr_1, which we can't rely on get_inner_reference* . Also It seems the comment before determine_base_object might be inaccurate: " Returns a memory object to that EXPR points." which I think is " Returns a pointer pointing to the main memory object to that EXPR points." Right? > > Note that this isn't really "simplifying" but rather lowering all addresses. > > The other changes in this patch are unrelated, right? Right, I should have named this message like "refine cost computation of expressions in IVOPT" just as the patch file. Thanks. bin
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-2.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/loop-2.c (revision 204117) +++ gcc/testsuite/gcc.dg/tree-ssa/loop-2.c (working copy) @@ -27,7 +27,7 @@ void xxx(void) /* { dg-final { scan-tree-dump-times " \\* \[^\\n\\r\]*=" 0 "optimized" } } */ /* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\* " 0 "optimized" } } */ -/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */ /* 17 * iter should be strength reduced. */ Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c (revision 204117) +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c (working copy) @@ -7,7 +7,8 @@ extern char a[]; -/* Can not infer loop iteration from array -- exit test can not be replaced. */ +/* Can not infer loop iteration from array -- exit test can not be + replaced by the array address. */ void foo (unsigned int i_width, TYPE dst) { unsigned long long i = 0; @@ -21,5 +22,5 @@ void foo (unsigned int i_width, TYPE dst) } } -/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times "\[^:\]*if \\(.*j_\[0-9\]+.*\\)" 1 "ivopts"} } */ /* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c =================================================================== --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c (revision 204117) +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c (working copy) @@ -18,5 +18,5 @@ long foo(long* p, long* p2, int N1, int N2) return s; } -/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */ +/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(.*p2.*\\)" 1 "ivopts"} } */ /* { dg-final { cleanup-tree-dump "ivopts" } } */ Index: gcc/tree-ssa-loop-ivopts.c =================================================================== --- gcc/tree-ssa-loop-ivopts.c (revision 204117) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -924,9 +924,25 @@ determine_base_object (tree expr) static struct iv * alloc_iv (tree base, tree step) { + tree expr = base; struct iv *iv = XCNEW (struct iv); gcc_assert (step != NULL_TREE); + /* Simplify complex address expressions like &MEM_REF, &ARRAY_REF + and &COMPONENT_REF into general expressions. By doing this: + 1) More accurate cost can be computed for address expressions; + 2) Duplicate candidates won't be created. */ + STRIP_NOPS (expr); + if (TREE_CODE (expr) == ADDR_EXPR + && (TREE_CODE (TREE_OPERAND (expr, 0)) == MEM_REF + || TREE_CODE (TREE_OPERAND (expr, 0)) == ARRAY_REF + || TREE_CODE (TREE_OPERAND (expr, 0)) == COMPONENT_REF)) + { + aff_tree comb; + tree_to_aff_combination (expr, TREE_TYPE (base), &comb); + base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (&comb)); + } + iv->base = base; iv->base_object = determine_base_object (base); iv->step = step; @@ -3481,17 +3497,21 @@ get_shiftadd_cost (tree expr, enum machine_mode mo int m = exact_log2 (int_cst_value (cst)); int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode)); int sa_cost; + bool equal_p = false; if (!(m >= 0 && m < maxm)) return false; + if (operand_equal_p (op1, mult, 0)) + equal_p = true; + sa_cost = (TREE_CODE (expr) != MINUS_EXPR ? shiftadd_cost (speed, mode, m) - : (mult == op1 + : (equal_p ? shiftsub1_cost (speed, mode, m) : shiftsub0_cost (speed, mode, m))); res = new_cost (sa_cost, 0); - res = add_costs (res, mult == op1 ? cost0 : cost1); + res = add_costs (res, equal_p ? cost0 : cost1); STRIP_NOPS (multop); if (!is_gimple_val (multop)) @@ -3585,30 +3605,14 @@ force_expr_to_var_cost (tree expr, bool speed) op1 = TREE_OPERAND (expr, 1); STRIP_NOPS (op0); STRIP_NOPS (op1); - - if (is_gimple_val (op0)) - cost0 = no_cost; - else - cost0 = force_expr_to_var_cost (op0, speed); - - if (is_gimple_val (op1)) - cost1 = no_cost; - else - cost1 = force_expr_to_var_cost (op1, speed); - break; + case NOP_EXPR: + case CONVERT_EXPR: case NEGATE_EXPR: op0 = TREE_OPERAND (expr, 0); STRIP_NOPS (op0); op1 = NULL_TREE; - - if (is_gimple_val (op0)) - cost0 = no_cost; - else - cost0 = force_expr_to_var_cost (op0, speed); - - cost1 = no_cost; break; default: @@ -3616,6 +3620,18 @@ force_expr_to_var_cost (tree expr, bool speed) return new_cost (target_spill_cost[speed], 0); } + if (op0 == NULL_TREE + || (is_gimple_val (op0) && (TREE_CODE (op0) != ADDR_EXPR))) + cost0 = no_cost; + else + cost0 = force_expr_to_var_cost (op0, speed); + + if (op1 == NULL_TREE + || (is_gimple_val (op1) && (TREE_CODE (op1) != ADDR_EXPR))) + cost1 = no_cost; + else + cost1 = force_expr_to_var_cost (op1, speed); + mode = TYPE_MODE (TREE_TYPE (expr)); switch (TREE_CODE (expr)) { @@ -3639,7 +3655,18 @@ force_expr_to_var_cost (tree expr, bool speed) speed, &sa_cost)) return sa_cost; } + break; + case NOP_EXPR: + case CONVERT_EXPR: + { + tree inner_mode, outer_mode; + outer_mode = TREE_TYPE (expr); + inner_mode = TREE_TYPE (op0); + cost = new_cost (convert_cost (TYPE_MODE (outer_mode), + TYPE_MODE (inner_mode), speed), 0); + } + break; case MULT_EXPR: if (cst_and_fits_in_hwi (op0)) @@ -3713,7 +3740,7 @@ split_address_cost (struct ivopts_data *data, *var_present = true; fd_ivopts_data = data; walk_tree (&addr, find_depends, depends_on, NULL); - return new_cost (target_spill_cost[data->speed], 0); + return force_expr_to_var_cost (addr, data->speed); } *offset += bitpos / BITS_PER_UNIT;