Message ID | CAHFci29=FGUBoOKrhLjj9WMGmZn2Gsn_m-ZYVeWieZjxU=YcOw@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Mon, Nov 25, 2013 at 7:41 PM, Jeff Law <law@redhat.com> wrote: >>> On 11/25/13 02:22, bin.cheng wrote: >>>> >>>> Hi, >>>> I previously committed two patches lowering complex address expression for >>>> IVOPT at http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00546.html and >>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01103.html >>>> When I bootstrapping GCC I found there were some peculiar cases like >>>> &MEM[ptr+CST] + xxxx, which should be handled too. This patch consists >>>> below two changes: >>>> >>>> 1) change in alloc_iv: >>>> Original code lowers top level complex address expressions like >>>> &MEM[ptr+off]. The patch relaxes check condition in order to lower >>>> expressions like &MEM[ptr+off] + xxx, just as the BASE from below dump: >>>> use 2 >>>> generic >>>> in statement _595 = &MEM[(void *)&this_prg + 36B] + _594; >>>> >>>> at position >>>> type struct gcov_bucket_type * >>>> base (struct gcov_bucket_type *) &MEM[(void *)&this_prg + 36B] + >>>> (sizetype) ((unsigned int) (src_i_683 + -1) * 20) >>>> step 4294967276 >>>> base object (void *) &this_prg >>>> related candidates >>>> >>>> 2) change in tree_to_aff_combination: >>>> The function get_inner_reference returns "&MEM[ptr+off]" as the core for >>>> input like the memory ADDRESS in below dump: >>>> use 2 >>>> address >>>> in statement _59 = MEM[(const struct gcov_ctr_summary *)summary_22(D) + >>>> 4B].histogram[h_ix_111].min_value; >>>> >>>> at position MEM[(const struct gcov_ctr_summary *)summary_22(D) + >>>> 4B].histogram[h_ix_111].min_value >>>> type const gcov_type * >>>> base (const gcov_type *) &MEM[(const struct gcov_ctr_summary >>>> *)summary_22(D) + 4B] + 36 >>>> step 20 >>>> base object (void *) summary_22(D) >>>> related candidates >>>> >>>> Which can be further reduced into something like "summary_22(D) + 40B". >>>> This change is necessary for the first one, because I am using >>>> tree_to_aff_combination rather than get_inner_reference_aff now. >>>> >>>> Bootstrap and test on x86/x86_64/arm. Is it OK? >>>> >>>> Thanks. >>>> bin >>>> >>>> 2013-11-25 Bin Cheng <bin.cheng@arm.com> >>>> >>>> * tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New. >>>> (alloc_iv): Lower more cases by calling contain_complex_addr_expr >>>> and tree_to_aff_combination. >>>> * tree-affine.c (tree_to_aff_combination): Handle &MEM[ptr+CST] >>>> in core part of complex reference. >>>> >>>> gcc/testsuite/ChangeLog >>>> 2013-11-25 Bin Cheng <bin.cheng@arm.com> >>>> >>>> * gcc.dg/tree-ssa/ivopts-lower_base.c: New test. >>> >>> Unless there's a PR for this problem, I think this needs to wait. >> >> I agree. Btw, please split the patch. >> >> Index: gcc/tree-affine.c >> =================================================================== >> --- gcc/tree-affine.c (revision 205087) >> +++ gcc/tree-affine.c (working copy) >> @@ -328,7 +328,19 @@ tree_to_aff_combination (tree expr, tree type, aff >> double_int::from_uhwi (bitpos / BITS_PER_UNIT)); >> core = build_fold_addr_expr (core); >> if (TREE_CODE (core) == ADDR_EXPR) >> - aff_combination_add_elt (comb, core, double_int_one); >> + { >> + /* Handle &MEM[ptr + CST] in core part of complex reference. */ >> + if (TREE_CODE (TREE_OPERAND (core, 0)) == MEM_REF) >> + { >> + core = TREE_OPERAND (core, 0); >> + tree_to_aff_combination (TREE_OPERAND (core, 0), type, &tmp); >> + aff_combination_add (comb, &tmp); >> + tree_to_aff_combination (TREE_OPERAND (core, 1), sizetype, &tmp); >> + aff_combination_add (comb, &tmp); >> + } >> + else >> + aff_combination_add_elt (comb, core, double_int_one); >> + } >> else >> { >> tree_to_aff_combination (core, type, &tmp) >> >> please handle the offset before taking the address, thus >> >> if (TREE_CODE (core) == MEM_REF) >> { >> add constant offset; >> core = TREE_OPERAND (core, 0); >> } >> else >> core = build_fold_addr_expr (core); >> >> that simplifies things and avoids the address building. >> > Hi, > I split the patch into two and updated the test case. > The patches pass bootstrap/tests on x86/x86_64, also pass test on arm cortex-m3. > Is it OK? > > Thanks, > bin > > PATCH 1: > > 2014-05-06 Bin Cheng <bin.cheng@arm.com> > > * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for > core part of address expressions. No gcc/ in the changelog Simplify that by using aff_combination_add_cst: + if (TREE_CODE (core) == MEM_REF) + { + aff_combination_add_cst (comb, mem_ref_offset (core)); + core = TREE_OPERAND (core, 0); patch 1 is ok with that change. > PATCH 2: > > 2014-05-06 Bin Cheng <bin.cheng@arm.com> > > * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New. > (alloc_iv): Lower base expressions containing ADDR_EXPR. So this "lowers" addresses(?) which are based on &<not-decl>, like &a[0] + 4 but not &a + 4. I question this odd choice. ISTR when originally introducing address lowering (in rev. 204497) I was concerned about the cost? Now I wonder why we bother to convert the lowered form back to trees to store it in ->base and not simply keep (and always compute) the affine expansion only. Thus, change struct iv to have aff_tree base instead of tree base? Can you see what it takes to do such change? At the point we replace uses we go into affine representation again anyway. Thanks, Richard. > gcc/testsuite/ChangeLog > 2014-05-06 Bin Cheng <bin.cheng@arm.com> > > * gcc.dg/tree-ssa/ivopts-lower_base.c: New test. > > > > -- > Best Regards.
On Tue, May 6, 2014 at 6:44 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener >> <richard.guenther@gmail.com> wrote: >> Hi, >> I split the patch into two and updated the test case. >> The patches pass bootstrap/tests on x86/x86_64, also pass test on arm cortex-m3. >> Is it OK? >> >> Thanks, >> bin >> >> PATCH 1: >> >> 2014-05-06 Bin Cheng <bin.cheng@arm.com> >> >> * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for >> core part of address expressions. > > No gcc/ in the changelog > > Simplify that by using aff_combination_add_cst: > > + if (TREE_CODE (core) == MEM_REF) > + { > + aff_combination_add_cst (comb, mem_ref_offset (core)); > + core = TREE_OPERAND (core, 0); > > patch 1 is ok with that change. Installed with below change because of wide-int merge: - core = build_fold_addr_expr (core); + if (TREE_CODE (core) == MEM_REF) + { + aff_combination_add_cst (comb, wi::to_widest (TREE_OPERAND (core, 1))); + core = TREE_OPERAND (core, 0); > >> PATCH 2: >> >> 2014-05-06 Bin Cheng <bin.cheng@arm.com> >> >> * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New. >> (alloc_iv): Lower base expressions containing ADDR_EXPR. > > So this "lowers" addresses(?) which are based on &<not-decl>, > like &a[0] + 4 but not &a + 4. I question this odd choice. ISTR &a+4 is already in its lowered form, what we want to handle is &EXPR + 4, in which EXPR is MEM_REF/ARRAY_REF, etc.. > when originally introducing address lowering (in rev. 204497) I was > concerned about the cost? Yes, you did. I still think the iv number is relative small for each loop, thus the change won't cause compilation time issue considering the existing tree<->affine transformation in ivopt. I would like to collect more accurate time information for ivopt in gcc bootstrap. Should I use "-ftime-report" then add all time slices in TV_TREE_LOOP_IVOPTS category? Is there any better solutions? Thanks. > > Now I wonder why we bother to convert the lowered form back > to trees to store it in ->base and not simply keep (and always > compute) the affine expansion only. Thus, change struct iv > to have aff_tree base instead of tree base? > > Can you see what it takes to do such change? At the point > we replace uses we go into affine representation again anyway. Good idea, I may have a look. Thanks, bin
On Thu, May 8, 2014 at 5:08 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Tue, May 6, 2014 at 6:44 PM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener >>> <richard.guenther@gmail.com> wrote: > >>> Hi, >>> I split the patch into two and updated the test case. >>> The patches pass bootstrap/tests on x86/x86_64, also pass test on arm cortex-m3. >>> Is it OK? >>> >>> Thanks, >>> bin >>> >>> PATCH 1: >>> >>> 2014-05-06 Bin Cheng <bin.cheng@arm.com> >>> >>> * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for >>> core part of address expressions. >> >> No gcc/ in the changelog >> >> Simplify that by using aff_combination_add_cst: >> >> + if (TREE_CODE (core) == MEM_REF) >> + { >> + aff_combination_add_cst (comb, mem_ref_offset (core)); >> + core = TREE_OPERAND (core, 0); >> >> patch 1 is ok with that change. > > Installed with below change because of wide-int merge: > - core = build_fold_addr_expr (core); > + if (TREE_CODE (core) == MEM_REF) > + { > + aff_combination_add_cst (comb, wi::to_widest (TREE_OPERAND (core, 1))); > + core = TREE_OPERAND (core, 0); > >> >>> PATCH 2: >>> >>> 2014-05-06 Bin Cheng <bin.cheng@arm.com> >>> >>> * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New. >>> (alloc_iv): Lower base expressions containing ADDR_EXPR. >> >> So this "lowers" addresses(?) which are based on &<not-decl>, >> like &a[0] + 4 but not &a + 4. I question this odd choice. ISTR > &a+4 is already in its lowered form, what we want to handle is &EXPR + > 4, in which EXPR is MEM_REF/ARRAY_REF, etc.. > >> when originally introducing address lowering (in rev. 204497) I was >> concerned about the cost? > Yes, you did. I still think the iv number is relative small for each > loop, thus the change won't cause compilation time issue considering > the existing tree<->affine transformation in ivopt. > I would like to collect more accurate time information for ivopt in > gcc bootstrap. Should I use "-ftime-report" then add all time slices > in TV_TREE_LOOP_IVOPTS category? Is there any better solutions? > Thanks. > >> >> Now I wonder why we bother to convert the lowered form back >> to trees to store it in ->base and not simply keep (and always >> compute) the affine expansion only. Thus, change struct iv >> to have aff_tree base instead of tree base? >> >> Can you see what it takes to do such change? At the point >> we replace uses we go into affine representation again anyway. > Good idea, I may have a look. After going through work flow of ivopt, I think it's practical to make such change and can help compilation time. Ivopt calls tree_to_aff_combination to convert iv->base into affine_tree whenever it tries to represent an iv_use by an iv_cand, i.e., N*M times for computing cost of each iv_use represented by each iv_cand, and M times for rewriting each iv_use. Here, N and M stands for number of iv_use and iv_cand. Also strip_offset (which is a recursive function) is called for iv->base also at complexity of O(NM), so it may be improved too. To make a start, it's possible to store both tree and affine_tree of base in struct iv, and use either of them whenever needed. It seems to me there are two potential issues of this change. First, though affine_tree of base is stored, we can't operate on it directly, which means we have to copy it before using it. Second, ivopt sometimes computes affine_tree of base in different type other than TREE_TYPE(base), we may need to do additional calls to aff_combination_convert to get wanted type. All these will introduce additional computation and compromise benefit of the change. At last, I did some experiments on how many additional calls to tree_to_aff_combination are introduced by this patch. The data of bootstrap shows it's less than 3% comparing to calls made by current implementation of ivopt, which confirms that this patch shouldn't have issue of compilation time. Any comments? Thanks, bin
On Sun, May 11, 2014 at 2:49 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: > On Thu, May 8, 2014 at 5:08 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Tue, May 6, 2014 at 6:44 PM, Richard Biener >> <richard.guenther@gmail.com> wrote: >>> On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener >>>> <richard.guenther@gmail.com> wrote: >> >>>> Hi, >>>> I split the patch into two and updated the test case. >>>> The patches pass bootstrap/tests on x86/x86_64, also pass test on arm cortex-m3. >>>> Is it OK? >>>> >>>> Thanks, >>>> bin >>>> >>>> PATCH 1: >>>> >>>> 2014-05-06 Bin Cheng <bin.cheng@arm.com> >>>> >>>> * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for >>>> core part of address expressions. >>> >>> No gcc/ in the changelog >>> >>> Simplify that by using aff_combination_add_cst: >>> >>> + if (TREE_CODE (core) == MEM_REF) >>> + { >>> + aff_combination_add_cst (comb, mem_ref_offset (core)); >>> + core = TREE_OPERAND (core, 0); >>> >>> patch 1 is ok with that change. >> >> Installed with below change because of wide-int merge: >> - core = build_fold_addr_expr (core); >> + if (TREE_CODE (core) == MEM_REF) >> + { >> + aff_combination_add_cst (comb, wi::to_widest (TREE_OPERAND (core, 1))); >> + core = TREE_OPERAND (core, 0); >> >>> >>>> PATCH 2: >>>> >>>> 2014-05-06 Bin Cheng <bin.cheng@arm.com> >>>> >>>> * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New. >>>> (alloc_iv): Lower base expressions containing ADDR_EXPR. >>> >>> So this "lowers" addresses(?) which are based on &<not-decl>, >>> like &a[0] + 4 but not &a + 4. I question this odd choice. ISTR >> &a+4 is already in its lowered form, what we want to handle is &EXPR + >> 4, in which EXPR is MEM_REF/ARRAY_REF, etc.. >> >>> when originally introducing address lowering (in rev. 204497) I was >>> concerned about the cost? >> Yes, you did. I still think the iv number is relative small for each >> loop, thus the change won't cause compilation time issue considering >> the existing tree<->affine transformation in ivopt. >> I would like to collect more accurate time information for ivopt in >> gcc bootstrap. Should I use "-ftime-report" then add all time slices >> in TV_TREE_LOOP_IVOPTS category? Is there any better solutions? >> Thanks. >> >>> >>> Now I wonder why we bother to convert the lowered form back >>> to trees to store it in ->base and not simply keep (and always >>> compute) the affine expansion only. Thus, change struct iv >>> to have aff_tree base instead of tree base? >>> >>> Can you see what it takes to do such change? At the point >>> we replace uses we go into affine representation again anyway. >> Good idea, I may have a look. > After going through work flow of ivopt, I think it's practical to make > such change and can help compilation time. Ivopt calls > tree_to_aff_combination to convert iv->base into affine_tree whenever > it tries to represent an iv_use by an iv_cand, i.e., N*M times for > computing cost of each iv_use represented by each iv_cand, and M times > for rewriting each iv_use. Here, N and M stands for number of iv_use > and iv_cand. Also strip_offset (which is a recursive function) is > called for iv->base also at complexity of O(NM), so it may be improved > too. > To make a start, it's possible to store both tree and affine_tree of > base in struct iv, and use either of them whenever needed. > > It seems to me there are two potential issues of this change. First, > though affine_tree of base is stored, we can't operate on it directly, > which means we have to copy it before using it. You'd use it as unchanged operand to some aff_* function to avoid actual copying of course. > Second, ivopt > sometimes computes affine_tree of base in different type other than > TREE_TYPE(base), we may need to do additional calls to > aff_combination_convert to get wanted type. All these will introduce > additional computation and compromise benefit of the change. Sure. > At last, I did some experiments on how many additional calls to > tree_to_aff_combination are introduced by this patch. The data of > bootstrap shows it's less than 3% comparing to calls made by current > implementation of ivopt, which confirms that this patch shouldn't have > issue of compilation time. > > Any comments? I'd see the use of affines as making the algorithmic structure of IVOPTs clearer and more consistent (also with possibly using the _expand variants later to get even more simplifications). I don't want to force you to do this change but I thought it may help further changes (as you seem to be doing a lot of IVOPTs changes lately). So - the patch is ok as-is then, but please consider refactoring IVOPTs code when you do changes. It's current structure is somewhat awkward. Thanks, Richard. > Thanks, > bin > > > -- > Best Regards.
On Tue, May 13, 2014 at 4:59 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Sun, May 11, 2014 at 2:49 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >> On Thu, May 8, 2014 at 5:08 PM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>> On Tue, May 6, 2014 at 6:44 PM, Richard Biener >>> <richard.guenther@gmail.com> wrote: >>>> On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.cheng@gmail.com> wrote: >>>>> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener >>>>> <richard.guenther@gmail.com> wrote: >>> >>>>> Hi, >>>>> I split the patch into two and updated the test case. >>>>> The patches pass bootstrap/tests on x86/x86_64, also pass test on arm cortex-m3. >>>>> Is it OK? >>>>> >>>>> Thanks, >>>>> bin >>>>> >>>>> PATCH 1: >>>>> >>>>> 2014-05-06 Bin Cheng <bin.cheng@arm.com> >>>>> >>>>> * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for >>>>> core part of address expressions. >>>> >>>> No gcc/ in the changelog >>>> >>>> Simplify that by using aff_combination_add_cst: >>>> >>>> + if (TREE_CODE (core) == MEM_REF) >>>> + { >>>> + aff_combination_add_cst (comb, mem_ref_offset (core)); >>>> + core = TREE_OPERAND (core, 0); >>>> >>>> patch 1 is ok with that change. >>> >>> Installed with below change because of wide-int merge: >>> - core = build_fold_addr_expr (core); >>> + if (TREE_CODE (core) == MEM_REF) >>> + { >>> + aff_combination_add_cst (comb, wi::to_widest (TREE_OPERAND (core, 1))); >>> + core = TREE_OPERAND (core, 0); >>> >>>> >>>>> PATCH 2: >>>>> >>>>> 2014-05-06 Bin Cheng <bin.cheng@arm.com> >>>>> >>>>> * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New. >>>>> (alloc_iv): Lower base expressions containing ADDR_EXPR. >>>> >>>> So this "lowers" addresses(?) which are based on &<not-decl>, >>>> like &a[0] + 4 but not &a + 4. I question this odd choice. ISTR >>> &a+4 is already in its lowered form, what we want to handle is &EXPR + >>> 4, in which EXPR is MEM_REF/ARRAY_REF, etc.. >>> >>>> when originally introducing address lowering (in rev. 204497) I was >>>> concerned about the cost? >>> Yes, you did. I still think the iv number is relative small for each >>> loop, thus the change won't cause compilation time issue considering >>> the existing tree<->affine transformation in ivopt. >>> I would like to collect more accurate time information for ivopt in >>> gcc bootstrap. Should I use "-ftime-report" then add all time slices >>> in TV_TREE_LOOP_IVOPTS category? Is there any better solutions? >>> Thanks. >>> >>>> >>>> Now I wonder why we bother to convert the lowered form back >>>> to trees to store it in ->base and not simply keep (and always >>>> compute) the affine expansion only. Thus, change struct iv >>>> to have aff_tree base instead of tree base? >>>> >>>> Can you see what it takes to do such change? At the point >>>> we replace uses we go into affine representation again anyway. >>> Good idea, I may have a look. >> After going through work flow of ivopt, I think it's practical to make >> such change and can help compilation time. Ivopt calls >> tree_to_aff_combination to convert iv->base into affine_tree whenever >> it tries to represent an iv_use by an iv_cand, i.e., N*M times for >> computing cost of each iv_use represented by each iv_cand, and M times >> for rewriting each iv_use. Here, N and M stands for number of iv_use >> and iv_cand. Also strip_offset (which is a recursive function) is >> called for iv->base also at complexity of O(NM), so it may be improved >> too. >> To make a start, it's possible to store both tree and affine_tree of >> base in struct iv, and use either of them whenever needed. >> >> It seems to me there are two potential issues of this change. First, >> though affine_tree of base is stored, we can't operate on it directly, >> which means we have to copy it before using it. > > You'd use it as unchanged operand to some aff_* function to avoid > actual copying of course. > >> Second, ivopt >> sometimes computes affine_tree of base in different type other than >> TREE_TYPE(base), we may need to do additional calls to >> aff_combination_convert to get wanted type. All these will introduce >> additional computation and compromise benefit of the change. > > Sure. > >> At last, I did some experiments on how many additional calls to >> tree_to_aff_combination are introduced by this patch. The data of >> bootstrap shows it's less than 3% comparing to calls made by current >> implementation of ivopt, which confirms that this patch shouldn't have >> issue of compilation time. >> >> Any comments? > > I'd see the use of affines as making the algorithmic structure of > IVOPTs clearer and more consistent (also with possibly using > the _expand variants later to get even more simplifications). Yes, one example, it's possible to rewrite iv uses in a loop-invariant sensitive manner if we can use affine in the whole process. Currently fold_tree routines tend to decompose invariant into different expressions. > > I don't want to force you to do this change but I thought it may > help further changes (as you seem to be doing a lot of IVOPTs > changes lately). > > So - the patch is ok as-is then, but please consider refactoring Patch installed. > IVOPTs code when you do changes. It's current structure > is somewhat awkward. Understood. I will continue to look at it when have proper time. Thanks, bin
Index: gcc/tree-affine.c =================================================================== --- gcc/tree-affine.c (revision 210047) +++ gcc/tree-affine.c (working copy) @@ -331,7 +331,15 @@ tree_to_aff_combination (tree expr, tree type, aff break; aff_combination_const (comb, type, double_int::from_uhwi (bitpos / BITS_PER_UNIT)); - core = build_fold_addr_expr (core); + if (TREE_CODE (core) == MEM_REF) + { + tree_to_aff_combination (TREE_OPERAND (core, 1), sizetype, &tmp); + aff_combination_add (comb, &tmp); + core = TREE_OPERAND (core, 0); + } + else + core = build_fold_addr_expr (core); + if (TREE_CODE (core) == ADDR_EXPR) aff_combination_add_elt (comb, core, double_int_one); else