diff mbox

Pick up more address lowering cases for ivopt and tree-affine.c

Message ID CAHFci29=FGUBoOKrhLjj9WMGmZn2Gsn_m-ZYVeWieZjxU=YcOw@mail.gmail.com
State New
Headers show

Commit Message

Bin.Cheng May 6, 2014, 8:39 a.m. UTC
On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Nov 25, 2013 at 7:41 PM, Jeff Law <law@redhat.com> wrote:
>> On 11/25/13 02:22, bin.cheng wrote:
>>>
>>> Hi,
>>> I previously committed two patches lowering complex address expression for
>>> IVOPT at http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00546.html and
>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01103.html
>>> When I bootstrapping GCC I found there were some peculiar cases like
>>> &MEM[ptr+CST] + xxxx, which should be handled too.  This patch consists
>>> below two changes:
>>>
>>> 1) change in alloc_iv:
>>> Original code lowers top level complex address expressions like
>>> &MEM[ptr+off].  The patch relaxes check condition in order to lower
>>> expressions like &MEM[ptr+off] + xxx, just as the BASE from below dump:
>>> use 2
>>>    generic
>>>    in statement _595 = &MEM[(void *)&this_prg + 36B] + _594;
>>>
>>>    at position
>>>    type struct gcov_bucket_type *
>>>    base (struct gcov_bucket_type *) &MEM[(void *)&this_prg + 36B] +
>>> (sizetype) ((unsigned int) (src_i_683 + -1) * 20)
>>>    step 4294967276
>>>    base object (void *) &this_prg
>>>    related candidates
>>>
>>> 2) change in tree_to_aff_combination:
>>> The function get_inner_reference returns "&MEM[ptr+off]" as the core for
>>> input like the memory ADDRESS in below dump:
>>> use 2
>>>    address
>>>    in statement _59 = MEM[(const struct gcov_ctr_summary *)summary_22(D) +
>>> 4B].histogram[h_ix_111].min_value;
>>>
>>>    at position MEM[(const struct gcov_ctr_summary *)summary_22(D) +
>>> 4B].histogram[h_ix_111].min_value
>>>    type const gcov_type *
>>>    base (const gcov_type *) &MEM[(const struct gcov_ctr_summary
>>> *)summary_22(D) + 4B] + 36
>>>    step 20
>>>    base object (void *) summary_22(D)
>>>    related candidates
>>>
>>> Which can be further reduced into something like "summary_22(D) + 40B".
>>> This change is necessary for the first one, because I am using
>>> tree_to_aff_combination rather than get_inner_reference_aff now.
>>>
>>> Bootstrap and test on x86/x86_64/arm.  Is it OK?
>>>
>>> Thanks.
>>> bin
>>>
>>> 2013-11-25  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         * tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New.
>>>         (alloc_iv): Lower more cases by calling contain_complex_addr_expr
>>>         and tree_to_aff_combination.
>>>         * tree-affine.c (tree_to_aff_combination): Handle &MEM[ptr+CST]
>>>         in core part of complex reference.
>>>
>>> gcc/testsuite/ChangeLog
>>> 2013-11-25  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>         * gcc.dg/tree-ssa/ivopts-lower_base.c: New test.
>>
>> Unless there's a PR for this problem, I think this needs to wait.
>
> I agree.  Btw, please split the patch.
>
> Index: gcc/tree-affine.c
> ===================================================================
> --- gcc/tree-affine.c    (revision 205087)
> +++ gcc/tree-affine.c    (working copy)
> @@ -328,7 +328,19 @@ tree_to_aff_combination (tree expr, tree type, aff
>                   double_int::from_uhwi (bitpos / BITS_PER_UNIT));
>        core = build_fold_addr_expr (core);
>        if (TREE_CODE (core) == ADDR_EXPR)
> -    aff_combination_add_elt (comb, core, double_int_one);
> +    {
> +      /* Handle &MEM[ptr + CST] in core part of complex reference.  */
> +      if (TREE_CODE (TREE_OPERAND (core, 0)) == MEM_REF)
> +        {
> +          core = TREE_OPERAND (core, 0);
> +          tree_to_aff_combination (TREE_OPERAND (core, 0), type, &tmp);
> +          aff_combination_add (comb, &tmp);
> +          tree_to_aff_combination (TREE_OPERAND (core, 1), sizetype, &tmp);
> +          aff_combination_add (comb, &tmp);
> +        }
> +      else
> +        aff_combination_add_elt (comb, core, double_int_one);
> +    }
>        else
>      {
>        tree_to_aff_combination (core, type, &tmp)
>
> please handle the offset before taking the address, thus
>
>   if (TREE_CODE (core) == MEM_REF)
>     {
>        add constant offset;
>        core = TREE_OPERAND (core, 0);
>     }
>   else
>     core = build_fold_addr_expr (core);
>
> that simplifies things and avoids the address building.
>
Hi,
I split the patch into two and updated the test case.
The patches pass bootstrap/tests on x86/x86_64, also pass test on arm cortex-m3.
Is it OK?

Thanks,
bin

PATCH 1:

2014-05-06  Bin Cheng  <bin.cheng@arm.com>

    * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for
    core part of address expressions.

PATCH 2:

2014-05-06  Bin Cheng  <bin.cheng@arm.com>

    * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New.
    (alloc_iv): Lower base expressions containing ADDR_EXPR.

gcc/testsuite/ChangeLog
2014-05-06  Bin Cheng  <bin.cheng@arm.com>

    * gcc.dg/tree-ssa/ivopts-lower_base.c: New test.

Comments

Richard Biener May 6, 2014, 10:44 a.m. UTC | #1
On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Nov 25, 2013 at 7:41 PM, Jeff Law <law@redhat.com> wrote:
>>> On 11/25/13 02:22, bin.cheng wrote:
>>>>
>>>> Hi,
>>>> I previously committed two patches lowering complex address expression for
>>>> IVOPT at http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00546.html and
>>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01103.html
>>>> When I bootstrapping GCC I found there were some peculiar cases like
>>>> &MEM[ptr+CST] + xxxx, which should be handled too.  This patch consists
>>>> below two changes:
>>>>
>>>> 1) change in alloc_iv:
>>>> Original code lowers top level complex address expressions like
>>>> &MEM[ptr+off].  The patch relaxes check condition in order to lower
>>>> expressions like &MEM[ptr+off] + xxx, just as the BASE from below dump:
>>>> use 2
>>>>    generic
>>>>    in statement _595 = &MEM[(void *)&this_prg + 36B] + _594;
>>>>
>>>>    at position
>>>>    type struct gcov_bucket_type *
>>>>    base (struct gcov_bucket_type *) &MEM[(void *)&this_prg + 36B] +
>>>> (sizetype) ((unsigned int) (src_i_683 + -1) * 20)
>>>>    step 4294967276
>>>>    base object (void *) &this_prg
>>>>    related candidates
>>>>
>>>> 2) change in tree_to_aff_combination:
>>>> The function get_inner_reference returns "&MEM[ptr+off]" as the core for
>>>> input like the memory ADDRESS in below dump:
>>>> use 2
>>>>    address
>>>>    in statement _59 = MEM[(const struct gcov_ctr_summary *)summary_22(D) +
>>>> 4B].histogram[h_ix_111].min_value;
>>>>
>>>>    at position MEM[(const struct gcov_ctr_summary *)summary_22(D) +
>>>> 4B].histogram[h_ix_111].min_value
>>>>    type const gcov_type *
>>>>    base (const gcov_type *) &MEM[(const struct gcov_ctr_summary
>>>> *)summary_22(D) + 4B] + 36
>>>>    step 20
>>>>    base object (void *) summary_22(D)
>>>>    related candidates
>>>>
>>>> Which can be further reduced into something like "summary_22(D) + 40B".
>>>> This change is necessary for the first one, because I am using
>>>> tree_to_aff_combination rather than get_inner_reference_aff now.
>>>>
>>>> Bootstrap and test on x86/x86_64/arm.  Is it OK?
>>>>
>>>> Thanks.
>>>> bin
>>>>
>>>> 2013-11-25  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>         * tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New.
>>>>         (alloc_iv): Lower more cases by calling contain_complex_addr_expr
>>>>         and tree_to_aff_combination.
>>>>         * tree-affine.c (tree_to_aff_combination): Handle &MEM[ptr+CST]
>>>>         in core part of complex reference.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>> 2013-11-25  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>         * gcc.dg/tree-ssa/ivopts-lower_base.c: New test.
>>>
>>> Unless there's a PR for this problem, I think this needs to wait.
>>
>> I agree.  Btw, please split the patch.
>>
>> Index: gcc/tree-affine.c
>> ===================================================================
>> --- gcc/tree-affine.c    (revision 205087)
>> +++ gcc/tree-affine.c    (working copy)
>> @@ -328,7 +328,19 @@ tree_to_aff_combination (tree expr, tree type, aff
>>                   double_int::from_uhwi (bitpos / BITS_PER_UNIT));
>>        core = build_fold_addr_expr (core);
>>        if (TREE_CODE (core) == ADDR_EXPR)
>> -    aff_combination_add_elt (comb, core, double_int_one);
>> +    {
>> +      /* Handle &MEM[ptr + CST] in core part of complex reference.  */
>> +      if (TREE_CODE (TREE_OPERAND (core, 0)) == MEM_REF)
>> +        {
>> +          core = TREE_OPERAND (core, 0);
>> +          tree_to_aff_combination (TREE_OPERAND (core, 0), type, &tmp);
>> +          aff_combination_add (comb, &tmp);
>> +          tree_to_aff_combination (TREE_OPERAND (core, 1), sizetype, &tmp);
>> +          aff_combination_add (comb, &tmp);
>> +        }
>> +      else
>> +        aff_combination_add_elt (comb, core, double_int_one);
>> +    }
>>        else
>>      {
>>        tree_to_aff_combination (core, type, &tmp)
>>
>> please handle the offset before taking the address, thus
>>
>>   if (TREE_CODE (core) == MEM_REF)
>>     {
>>        add constant offset;
>>        core = TREE_OPERAND (core, 0);
>>     }
>>   else
>>     core = build_fold_addr_expr (core);
>>
>> that simplifies things and avoids the address building.
>>
> Hi,
> I split the patch into two and updated the test case.
> The patches pass bootstrap/tests on x86/x86_64, also pass test on arm cortex-m3.
> Is it OK?
>
> Thanks,
> bin
>
> PATCH 1:
>
> 2014-05-06  Bin Cheng  <bin.cheng@arm.com>
>
>     * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for
>     core part of address expressions.

No gcc/ in the changelog

Simplify that by using aff_combination_add_cst:

+      if (TREE_CODE (core) == MEM_REF)
+       {
+         aff_combination_add_cst (comb, mem_ref_offset (core));
+         core = TREE_OPERAND (core, 0);

patch 1 is ok with that change.

> PATCH 2:
>
> 2014-05-06  Bin Cheng  <bin.cheng@arm.com>
>
>     * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New.
>     (alloc_iv): Lower base expressions containing ADDR_EXPR.

So this "lowers" addresses(?) which are based on &<not-decl>,
like &a[0] + 4 but not &a + 4.  I question this odd choice.  ISTR
when originally introducing address lowering (in rev. 204497) I was
concerned about the cost?

Now I wonder why we bother to convert the lowered form back
to trees to store it in ->base and not simply keep (and always
compute) the affine expansion only.  Thus, change struct iv
to have aff_tree base instead of tree base?

Can you see what it takes to do such change?  At the point
we replace uses we go into affine representation again anyway.

Thanks,
Richard.

> gcc/testsuite/ChangeLog
> 2014-05-06  Bin Cheng  <bin.cheng@arm.com>
>
>     * gcc.dg/tree-ssa/ivopts-lower_base.c: New test.
>
>
>
> --
> Best Regards.
Bin.Cheng May 8, 2014, 9:08 a.m. UTC | #2
On Tue, May 6, 2014 at 6:44 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:

>> Hi,
>> I split the patch into two and updated the test case.
>> The patches pass bootstrap/tests on x86/x86_64, also pass test on arm cortex-m3.
>> Is it OK?
>>
>> Thanks,
>> bin
>>
>> PATCH 1:
>>
>> 2014-05-06  Bin Cheng  <bin.cheng@arm.com>
>>
>>     * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for
>>     core part of address expressions.
>
> No gcc/ in the changelog
>
> Simplify that by using aff_combination_add_cst:
>
> +      if (TREE_CODE (core) == MEM_REF)
> +       {
> +         aff_combination_add_cst (comb, mem_ref_offset (core));
> +         core = TREE_OPERAND (core, 0);
>
> patch 1 is ok with that change.

Installed with below change because of wide-int merge:
-      core = build_fold_addr_expr (core);
+      if (TREE_CODE (core) == MEM_REF)
+    {
+      aff_combination_add_cst (comb, wi::to_widest (TREE_OPERAND (core, 1)));
+      core = TREE_OPERAND (core, 0);

>
>> PATCH 2:
>>
>> 2014-05-06  Bin Cheng  <bin.cheng@arm.com>
>>
>>     * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New.
>>     (alloc_iv): Lower base expressions containing ADDR_EXPR.
>
> So this "lowers" addresses(?) which are based on &<not-decl>,
> like &a[0] + 4 but not &a + 4.  I question this odd choice.  ISTR
&a+4 is already in its lowered form, what we want to handle is &EXPR +
4, in which EXPR is MEM_REF/ARRAY_REF, etc..

> when originally introducing address lowering (in rev. 204497) I was
> concerned about the cost?
Yes, you did.  I still think the iv number is relative small for each
loop, thus the change won't cause compilation time issue considering
the existing tree<->affine transformation in ivopt.
I would like to collect more accurate time information for ivopt in
gcc bootstrap.  Should I use "-ftime-report" then add all time slices
in TV_TREE_LOOP_IVOPTS category?  Is there any better solutions?
Thanks.

>
> Now I wonder why we bother to convert the lowered form back
> to trees to store it in ->base and not simply keep (and always
> compute) the affine expansion only.  Thus, change struct iv
> to have aff_tree base instead of tree base?
>
> Can you see what it takes to do such change?  At the point
> we replace uses we go into affine representation again anyway.
Good idea, I may have a look.

Thanks,
bin
Bin.Cheng May 11, 2014, 12:49 p.m. UTC | #3
On Thu, May 8, 2014 at 5:08 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Tue, May 6, 2014 at 6:44 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>
>>> Hi,
>>> I split the patch into two and updated the test case.
>>> The patches pass bootstrap/tests on x86/x86_64, also pass test on arm cortex-m3.
>>> Is it OK?
>>>
>>> Thanks,
>>> bin
>>>
>>> PATCH 1:
>>>
>>> 2014-05-06  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>     * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for
>>>     core part of address expressions.
>>
>> No gcc/ in the changelog
>>
>> Simplify that by using aff_combination_add_cst:
>>
>> +      if (TREE_CODE (core) == MEM_REF)
>> +       {
>> +         aff_combination_add_cst (comb, mem_ref_offset (core));
>> +         core = TREE_OPERAND (core, 0);
>>
>> patch 1 is ok with that change.
>
> Installed with below change because of wide-int merge:
> -      core = build_fold_addr_expr (core);
> +      if (TREE_CODE (core) == MEM_REF)
> +    {
> +      aff_combination_add_cst (comb, wi::to_widest (TREE_OPERAND (core, 1)));
> +      core = TREE_OPERAND (core, 0);
>
>>
>>> PATCH 2:
>>>
>>> 2014-05-06  Bin Cheng  <bin.cheng@arm.com>
>>>
>>>     * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New.
>>>     (alloc_iv): Lower base expressions containing ADDR_EXPR.
>>
>> So this "lowers" addresses(?) which are based on &<not-decl>,
>> like &a[0] + 4 but not &a + 4.  I question this odd choice.  ISTR
> &a+4 is already in its lowered form, what we want to handle is &EXPR +
> 4, in which EXPR is MEM_REF/ARRAY_REF, etc..
>
>> when originally introducing address lowering (in rev. 204497) I was
>> concerned about the cost?
> Yes, you did.  I still think the iv number is relative small for each
> loop, thus the change won't cause compilation time issue considering
> the existing tree<->affine transformation in ivopt.
> I would like to collect more accurate time information for ivopt in
> gcc bootstrap.  Should I use "-ftime-report" then add all time slices
> in TV_TREE_LOOP_IVOPTS category?  Is there any better solutions?
> Thanks.
>
>>
>> Now I wonder why we bother to convert the lowered form back
>> to trees to store it in ->base and not simply keep (and always
>> compute) the affine expansion only.  Thus, change struct iv
>> to have aff_tree base instead of tree base?
>>
>> Can you see what it takes to do such change?  At the point
>> we replace uses we go into affine representation again anyway.
> Good idea, I may have a look.
After going through work flow of ivopt, I think it's practical to make
such change and can help compilation time.  Ivopt calls
tree_to_aff_combination to convert iv->base into affine_tree whenever
it tries to represent an iv_use by an iv_cand,  i.e., N*M times for
computing cost of each iv_use represented by each iv_cand, and M times
for rewriting each iv_use.  Here, N and M stands for number of iv_use
and iv_cand.  Also strip_offset (which is a recursive function) is
called for iv->base also at complexity of O(NM), so it may be improved
too.
To make a start, it's possible to store both tree and affine_tree of
base in struct iv, and use either of them whenever needed.

It seems to me there are two potential issues of this change.  First,
though affine_tree of base is stored, we can't operate on it directly,
which means we have to copy it before using it.  Second, ivopt
sometimes computes affine_tree of base in different type other than
TREE_TYPE(base), we may need to do additional calls to
aff_combination_convert to get wanted type.  All these will introduce
additional computation and compromise benefit of the change.

At last, I did some experiments on how many additional calls to
tree_to_aff_combination are introduced by this patch.  The data of
bootstrap shows it's less than 3% comparing to calls made by current
implementation of ivopt, which confirms that this patch shouldn't have
issue of compilation time.

Any comments?

Thanks,
bin
Richard Biener May 13, 2014, 8:59 a.m. UTC | #4
On Sun, May 11, 2014 at 2:49 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, May 8, 2014 at 5:08 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Tue, May 6, 2014 at 6:44 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>
>>>> Hi,
>>>> I split the patch into two and updated the test case.
>>>> The patches pass bootstrap/tests on x86/x86_64, also pass test on arm cortex-m3.
>>>> Is it OK?
>>>>
>>>> Thanks,
>>>> bin
>>>>
>>>> PATCH 1:
>>>>
>>>> 2014-05-06  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>     * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for
>>>>     core part of address expressions.
>>>
>>> No gcc/ in the changelog
>>>
>>> Simplify that by using aff_combination_add_cst:
>>>
>>> +      if (TREE_CODE (core) == MEM_REF)
>>> +       {
>>> +         aff_combination_add_cst (comb, mem_ref_offset (core));
>>> +         core = TREE_OPERAND (core, 0);
>>>
>>> patch 1 is ok with that change.
>>
>> Installed with below change because of wide-int merge:
>> -      core = build_fold_addr_expr (core);
>> +      if (TREE_CODE (core) == MEM_REF)
>> +    {
>> +      aff_combination_add_cst (comb, wi::to_widest (TREE_OPERAND (core, 1)));
>> +      core = TREE_OPERAND (core, 0);
>>
>>>
>>>> PATCH 2:
>>>>
>>>> 2014-05-06  Bin Cheng  <bin.cheng@arm.com>
>>>>
>>>>     * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New.
>>>>     (alloc_iv): Lower base expressions containing ADDR_EXPR.
>>>
>>> So this "lowers" addresses(?) which are based on &<not-decl>,
>>> like &a[0] + 4 but not &a + 4.  I question this odd choice.  ISTR
>> &a+4 is already in its lowered form, what we want to handle is &EXPR +
>> 4, in which EXPR is MEM_REF/ARRAY_REF, etc..
>>
>>> when originally introducing address lowering (in rev. 204497) I was
>>> concerned about the cost?
>> Yes, you did.  I still think the iv number is relative small for each
>> loop, thus the change won't cause compilation time issue considering
>> the existing tree<->affine transformation in ivopt.
>> I would like to collect more accurate time information for ivopt in
>> gcc bootstrap.  Should I use "-ftime-report" then add all time slices
>> in TV_TREE_LOOP_IVOPTS category?  Is there any better solutions?
>> Thanks.
>>
>>>
>>> Now I wonder why we bother to convert the lowered form back
>>> to trees to store it in ->base and not simply keep (and always
>>> compute) the affine expansion only.  Thus, change struct iv
>>> to have aff_tree base instead of tree base?
>>>
>>> Can you see what it takes to do such change?  At the point
>>> we replace uses we go into affine representation again anyway.
>> Good idea, I may have a look.
> After going through work flow of ivopt, I think it's practical to make
> such change and can help compilation time.  Ivopt calls
> tree_to_aff_combination to convert iv->base into affine_tree whenever
> it tries to represent an iv_use by an iv_cand,  i.e., N*M times for
> computing cost of each iv_use represented by each iv_cand, and M times
> for rewriting each iv_use.  Here, N and M stands for number of iv_use
> and iv_cand.  Also strip_offset (which is a recursive function) is
> called for iv->base also at complexity of O(NM), so it may be improved
> too.
> To make a start, it's possible to store both tree and affine_tree of
> base in struct iv, and use either of them whenever needed.
>
> It seems to me there are two potential issues of this change.  First,
> though affine_tree of base is stored, we can't operate on it directly,
> which means we have to copy it before using it.

You'd use it as unchanged operand to some aff_* function to avoid
actual copying of course.

>  Second, ivopt
> sometimes computes affine_tree of base in different type other than
> TREE_TYPE(base), we may need to do additional calls to
> aff_combination_convert to get wanted type.  All these will introduce
> additional computation and compromise benefit of the change.

Sure.

> At last, I did some experiments on how many additional calls to
> tree_to_aff_combination are introduced by this patch.  The data of
> bootstrap shows it's less than 3% comparing to calls made by current
> implementation of ivopt, which confirms that this patch shouldn't have
> issue of compilation time.
>
> Any comments?

I'd see the use of affines as making the algorithmic structure of
IVOPTs clearer and more consistent (also with possibly using
the _expand variants later to get even more simplifications).

I don't want to force you to do this change but I thought it may
help further changes (as you seem to be doing a lot of IVOPTs
changes lately).

So - the patch is ok as-is then, but please consider refactoring
IVOPTs code when you do changes.  It's current structure
is somewhat awkward.

Thanks,
Richard.

> Thanks,
> bin
>
>
> --
> Best Regards.
Bin.Cheng May 13, 2014, 10:18 a.m. UTC | #5
On Tue, May 13, 2014 at 4:59 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Sun, May 11, 2014 at 2:49 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Thu, May 8, 2014 at 5:08 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Tue, May 6, 2014 at 6:44 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>>> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>
>>>>> Hi,
>>>>> I split the patch into two and updated the test case.
>>>>> The patches pass bootstrap/tests on x86/x86_64, also pass test on arm cortex-m3.
>>>>> Is it OK?
>>>>>
>>>>> Thanks,
>>>>> bin
>>>>>
>>>>> PATCH 1:
>>>>>
>>>>> 2014-05-06  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>     * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for
>>>>>     core part of address expressions.
>>>>
>>>> No gcc/ in the changelog
>>>>
>>>> Simplify that by using aff_combination_add_cst:
>>>>
>>>> +      if (TREE_CODE (core) == MEM_REF)
>>>> +       {
>>>> +         aff_combination_add_cst (comb, mem_ref_offset (core));
>>>> +         core = TREE_OPERAND (core, 0);
>>>>
>>>> patch 1 is ok with that change.
>>>
>>> Installed with below change because of wide-int merge:
>>> -      core = build_fold_addr_expr (core);
>>> +      if (TREE_CODE (core) == MEM_REF)
>>> +    {
>>> +      aff_combination_add_cst (comb, wi::to_widest (TREE_OPERAND (core, 1)));
>>> +      core = TREE_OPERAND (core, 0);
>>>
>>>>
>>>>> PATCH 2:
>>>>>
>>>>> 2014-05-06  Bin Cheng  <bin.cheng@arm.com>
>>>>>
>>>>>     * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New.
>>>>>     (alloc_iv): Lower base expressions containing ADDR_EXPR.
>>>>
>>>> So this "lowers" addresses(?) which are based on &<not-decl>,
>>>> like &a[0] + 4 but not &a + 4.  I question this odd choice.  ISTR
>>> &a+4 is already in its lowered form, what we want to handle is &EXPR +
>>> 4, in which EXPR is MEM_REF/ARRAY_REF, etc..
>>>
>>>> when originally introducing address lowering (in rev. 204497) I was
>>>> concerned about the cost?
>>> Yes, you did.  I still think the iv number is relative small for each
>>> loop, thus the change won't cause compilation time issue considering
>>> the existing tree<->affine transformation in ivopt.
>>> I would like to collect more accurate time information for ivopt in
>>> gcc bootstrap.  Should I use "-ftime-report" then add all time slices
>>> in TV_TREE_LOOP_IVOPTS category?  Is there any better solutions?
>>> Thanks.
>>>
>>>>
>>>> Now I wonder why we bother to convert the lowered form back
>>>> to trees to store it in ->base and not simply keep (and always
>>>> compute) the affine expansion only.  Thus, change struct iv
>>>> to have aff_tree base instead of tree base?
>>>>
>>>> Can you see what it takes to do such change?  At the point
>>>> we replace uses we go into affine representation again anyway.
>>> Good idea, I may have a look.
>> After going through work flow of ivopt, I think it's practical to make
>> such change and can help compilation time.  Ivopt calls
>> tree_to_aff_combination to convert iv->base into affine_tree whenever
>> it tries to represent an iv_use by an iv_cand,  i.e., N*M times for
>> computing cost of each iv_use represented by each iv_cand, and M times
>> for rewriting each iv_use.  Here, N and M stands for number of iv_use
>> and iv_cand.  Also strip_offset (which is a recursive function) is
>> called for iv->base also at complexity of O(NM), so it may be improved
>> too.
>> To make a start, it's possible to store both tree and affine_tree of
>> base in struct iv, and use either of them whenever needed.
>>
>> It seems to me there are two potential issues of this change.  First,
>> though affine_tree of base is stored, we can't operate on it directly,
>> which means we have to copy it before using it.
>
> You'd use it as unchanged operand to some aff_* function to avoid
> actual copying of course.
>
>>  Second, ivopt
>> sometimes computes affine_tree of base in different type other than
>> TREE_TYPE(base), we may need to do additional calls to
>> aff_combination_convert to get wanted type.  All these will introduce
>> additional computation and compromise benefit of the change.
>
> Sure.
>
>> At last, I did some experiments on how many additional calls to
>> tree_to_aff_combination are introduced by this patch.  The data of
>> bootstrap shows it's less than 3% comparing to calls made by current
>> implementation of ivopt, which confirms that this patch shouldn't have
>> issue of compilation time.
>>
>> Any comments?
>
> I'd see the use of affines as making the algorithmic structure of
> IVOPTs clearer and more consistent (also with possibly using
> the _expand variants later to get even more simplifications).
Yes, one example, it's possible to rewrite iv uses in a loop-invariant
sensitive manner if we can use affine in the whole process.  Currently
fold_tree routines tend to decompose invariant into different
expressions.

>
> I don't want to force you to do this change but I thought it may
> help further changes (as you seem to be doing a lot of IVOPTs
> changes lately).
>
> So - the patch is ok as-is then, but please consider refactoring
Patch installed.

> IVOPTs code when you do changes.  It's current structure
> is somewhat awkward.
Understood.  I will continue to look at it when have proper time.

Thanks,
bin
diff mbox

Patch

Index: gcc/tree-affine.c
===================================================================
--- gcc/tree-affine.c	(revision 210047)
+++ gcc/tree-affine.c	(working copy)
@@ -331,7 +331,15 @@  tree_to_aff_combination (tree expr, tree type, aff
 	break;
       aff_combination_const (comb, type,
 			     double_int::from_uhwi (bitpos / BITS_PER_UNIT));
-      core = build_fold_addr_expr (core);
+      if (TREE_CODE (core) == MEM_REF)
+	{
+	  tree_to_aff_combination (TREE_OPERAND (core, 1), sizetype, &tmp);
+	  aff_combination_add (comb, &tmp);
+	  core = TREE_OPERAND (core, 0);
+	}
+      else
+	core = build_fold_addr_expr (core);
+
       if (TREE_CODE (core) == ADDR_EXPR)
 	aff_combination_add_elt (comb, core, double_int_one);
       else