[PR69489/01] Improve tree ifcvt by storing/tracking DR against its innermost loop bahavior if possible
diff mbox

Message ID CAFiYyc3-6ua2+uXMpEL_x5xp8bxp-Qrke8EFhHzFKQ6bqmeVdw@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener March 16, 2016, 12:20 p.m. UTC
On Wed, Mar 16, 2016 at 10:59 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> One issue revealed in tree ifcvt is the pass stores/tracks DRs against its memory references in IR.  This causes difficulty in identifying same memory references appearing in different forms.  Given below example:
>
> void foo (int a[], int b[])
> {
>   int i;
>   for (i = 0; i < 100; i++)
>     {
>       if (a[i] ==0)
>         a[i] = b[i]*4;
>       else
>         a[i] = b[i]*3;
>     }
> }
>
> The gimple dump before tree ifcvt is as:
>
>   <bb 2>:
>
>   <bb 3>:
>   # i_24 = PHI <i_19(7), 0(2)>
>   # ivtmp_28 = PHI <ivtmp_23(7), 100(2)>
>   _5 = (long unsigned int) i_24;
>   _6 = _5 * 4;
>   _8 = a_7(D) + _6;
>   _9 = *_8;
>   if (_9 == 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
>   <bb 4>:
>   _11 = b_10(D) + _6;
>   _12 = *_11;
>   _13 = _12 * 4;
>   goto <bb 6>;
>
>   <bb 5>:
>   _15 = b_10(D) + _6;
>   _16 = *_15;
>   _17 = _16 * 3;
>
>   <bb 6>:
>   # cstore_1 = PHI <_13(4), _17(5)>
>   *_8 = cstore_1;
>   i_19 = i_24 + 1;
>   ivtmp_23 = ivtmp_28 - 1;
>   if (ivtmp_23 != 0)
>     goto <bb 7>;
>   else
>     goto <bb 8>;
>
>   <bb 7>:
>   goto <bb 3>;
>
>   <bb 8>:
>   return;
>
> The two memory references "*_11" and "*_15" are actually the same thing, but ifcvt failed to discover this because they are recorded in different forms.  This patch fixes the issue by recording/tracking memory reference against its innermost_loop_behavior if: the memory reference has innermost_loop_behavior and it is not a compound reference.
> BTW, PR56625 reported that this case couldn't be vectorized even tree if-conv can handle it.  Interesting thing is at current time, it's tree if-conv that could not handle the case.  Once it's if-converted with this patch, vectorizer are able to handle it too.
>
> Bootstrap and test on x86_64 and AArch64.  Is it OK, not sure if it's GCC 7?

Hmm.

+  equal_p = true;
+  if (e1->base_address && e2->base_address)
+    equal_p &= operand_equal_p (e1->base_address, e2->base_address, 0);
+  if (e1->offset && e2->offset)
+    equal_p &= operand_equal_p (e1->offset, e2->offset, 0);

surely better to return false early.

I think we don't want this in tree-data-refs.h also because of ...

@@ -615,15 +619,29 @@
hash_memrefs_baserefs_and_store_DRs_read_written_info
(data_reference_p a)
   data_reference_p *master_dr, *base_master_dr;
   tree ref = DR_REF (a);
   tree base_ref = DR_BASE_OBJECT (a);
+  innermost_loop_behavior *innermost = &DR_INNERMOST (a);
   tree ca = bb_predicate (gimple_bb (DR_STMT (a)));
   bool exist1, exist2;

-  while (TREE_CODE (ref) == COMPONENT_REF
-        || TREE_CODE (ref) == IMAGPART_EXPR
-        || TREE_CODE (ref) == REALPART_EXPR)
-    ref = TREE_OPERAND (ref, 0);
+  /* If reference in DR has innermost loop behavior and it is not
+     a compound memory reference, we store it to innermost_DR_map,
+     otherwise to ref_DR_map.  */
+  if (TREE_CODE (ref) == COMPONENT_REF
+      || TREE_CODE (ref) == IMAGPART_EXPR
+      || TREE_CODE (ref) == REALPART_EXPR
+      || !(DR_BASE_ADDRESS (a) || DR_OFFSET (a)
+          || DR_INIT (a) || DR_STEP (a) || DR_ALIGNED_TO (a)))
+    {
+      while (TREE_CODE (ref) == COMPONENT_REF
+            || TREE_CODE (ref) == IMAGPART_EXPR
+            || TREE_CODE (ref) == REALPART_EXPR)
+       ref = TREE_OPERAND (ref, 0);
+
+      master_dr = &ref_DR_map->get_or_insert (ref, &exist1);
+    }
+  else
+    master_dr = &innermost_DR_map->get_or_insert (innermost, &exist1);

we don't want an extra hashmap but replace ref_DR_map entirely.  So we'd need to
strip outermost non-variant handled-components (COMPONENT_REF, IMAGPART
and REALPART) before creating the DR (or adjust the equality function
and hashing
to disregard them which means subtracting their offset from DR_INIT.

To adjust the references we collect you'd maybe could use a callback
to get_references_in_stmt
to adjust them.

OTOH post-processing the DRs in if_convertible_loop_p_1 can be as simple as



Can you add a testcase for the vectorization?

Richard.


> Thanks,
> bin
>
>
> 2016-03-11  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/56625
>         PR tree-optimization/69489
>         * tree-data-ref.h (innermost_loop_behavior_hash): New class for
>         hashing struct innermost_loop_behavior.
>         (DR_INNERMOST): New macro.
>         * tree-if-conv.c (innermost_DR_map): New map.
>         (ref_DR_map, baseref_DR_map): Revise the comment.
>         (hash_memrefs_baserefs_and_store_DRs_read_written_info): Store DR
>         to innermost_DR_map if it has innermost loop behavior and is not
>         a compound reference.
>         (ifcvt_memrefs_wont_trap): Get DR from innermost_DR_map if it has
>         innermost loop behavior and is not a compound reference.
>         (if_convertible_loop_p_1): Initialize innermost_DR_map.
>         (if_convertible_loop_p): Release innermost_DR_map.
>
> gcc/testsuite/ChangeLog
> 2016-03-11  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/56625
>         PR tree-optimization/69489
>         * gcc.dg/tree-ssa/ifc-pr69489-1.c: New test.
>

Comments

Bin.Cheng March 16, 2016, 4:17 p.m. UTC | #1
On Wed, Mar 16, 2016 at 12:20 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Wed, Mar 16, 2016 at 10:59 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> > Hi,
> > ......
> > Bootstrap and test on x86_64 and AArch64.  Is it OK, not sure if it's GCC 7?
>
> Hmm.
Hi,
Thanks for reviewing.
>
> +  equal_p = true;
> +  if (e1->base_address && e2->base_address)
> +    equal_p &= operand_equal_p (e1->base_address, e2->base_address, 0);
> +  if (e1->offset && e2->offset)
> +    equal_p &= operand_equal_p (e1->offset, e2->offset, 0);
>
> surely better to return false early.
>
> I think we don't want this in tree-data-refs.h also because of ...
>
> @@ -615,15 +619,29 @@
> hash_memrefs_baserefs_and_store_DRs_read_written_info
> (data_reference_p a)
>    data_reference_p *master_dr, *base_master_dr;and REALPART) before creating the DR (or adjust the equality function
and hashing
>    tree ref = DR_REF (a);
>    tree base_ref = DR_BASE_OBJECT (a);
> +  innermost_loop_behavior *innermost = &DR_INNERMOST (a);
>    tree ca = bb_predicate (gimple_bb (DR_STMT (a)));
>    bool exist1, exist2;
>
> -  while (TREE_CODE (ref) == COMPONENT_REF
> -        || TREE_CODE (ref) == IMAGPART_EXPR
> -        || TREE_CODE (ref) == REALPART_EXPR)
> -    ref = TREE_OPERAND (ref, 0);
> +  /* If reference in DR has innermost loop behavior and it is not
> +     a compound memory reference, we store it to innermost_DR_map,
> +     otherwise to ref_DR_map.  */
> +  if (TREE_CODE (ref) == COMPONENT_REF
> +      || TREE_CODE (ref) == IMAGPART_EXPR
> +      || TREE_CODE (ref) == REALPART_EXPR
> +      || !(DR_BASE_ADDRESS (a) || DR_OFFSET (a)
> +          || DR_INIT (a) || DR_STEP (a) || DR_ALIGNED_TO (a)))
> +    {
> +      while (TREE_CODE (ref) == COMPONENT_REF
> +            || TREE_CODE (ref) == IMAGPART_EXPR
> +            || TREE_CODE (ref) == REALPART_EXPR)
> +       ref = TREE_OPERAND (ref, 0);
> +
> +      master_dr = &ref_DR_map->get_or_insert (ref, &exist1);
> +    }
> +  else
> +    master_dr = &innermost_DR_map->get_or_insert (innermost, &exist1);
>
> we don't want an extra hashmap but replace ref_DR_map entirely.  So we'd need to
> strip outermost non-variant handled-components (COMPONENT_REF, IMAGPART
> and REALPART) before creating the DR (or adjust the equality function
> and hashing
> to disregard them which means subtracting their offset from DR_INIT.
I am not sure if I understand correctly.  But for component reference,
it is the base object that we want to record/track.  For example,

  for (i = 0; i < N; i++) {
    m = *data++;

    m1 = p1->x - m;
    m2 = p2->x + m;

    p3->y = (m1 >= m2) ? p1->y : p2->y;

    p1++;
    p2++;
    p3++;
  }
We want to infer that reads of p1/p2 in condition statement won't trap
because there are unconditional reads of the structures, though the
unconditional reads are actual of other sub-objects.  Here it is the
invariant part of address that we want to track.
Also illustrated by this example, we can't rely on data-ref analyzer
here.  Because in gathering/scattering cases, the address could be not
affine at all.
>
> To adjust the references we collect you'd maybe could use a callback
> to get_references_in_stmt
> to adjust them.
>
> OTOH post-processing the DRs in if_convertible_loop_p_1 can be as simple as
Is this a part of the method you suggested above, or is it an
alternative one?  If it's the latter, then I have below questions
embedded.
>
> Index: tree-if-conv.c
> ===================================================================
> --- tree-if-conv.c      (revision 234215)
> +++ tree-if-conv.c      (working copy)
> @@ -1235,6 +1220,38 @@ if_convertible_loop_p_1 (struct loop *lo
>
>    for (i = 0; refs->iterate (i, &dr); i++)
>      {
> +      tree *refp = &DR_REF (dr);
> +      while ((TREE_CODE (*refp) == COMPONENT_REF
> +             && TREE_OPERAND (*refp, 2) == NULL_TREE)
> +            || TREE_CODE (*refp) == IMAGPART_EXPR
> +            || TREE_CODE (*refp) == REALPART_EXPR)
> +       refp = &TREE_OPERAND (*refp, 0);
> +      if (refp != &DR_REF (dr))
> +       {
> +         tree saved_base = *refp;
> +         *refp = integer_zero_node;
> +
> +         if (DR_INIT (dr))
> +           {
> +             tree poffset;
> +             int punsignedp, preversep, pvolatilep;
> +             machine_mode pmode;
> +             HOST_WIDE_INT pbitsize, pbitpos;
> +             get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos, &poffset,
> +                                  &pmode, &punsignedp, &preversep, &pvolatilep,
> +                                  false);
> +             gcc_assert (poffset == NULL_TREE);
> +
> +             DR_INIT (dr)
> +               = wide_int_to_tree (ssizetype,
> +                                   wi::sub (DR_INIT (dr),
> +                                            pbitpos / BITS_PER_UNIT));
> +           }
> +
> +         *refp = saved_base;
> +         DR_REF (dr) = *refp;
> +       }
Looks to me the code is trying to resolve difference between two (or
more) component references, which is DR_INIT in the code.  But DR_INIT
is not the only thing needs to be handled.  For a structure containing
two sub-arrays, DR_OFFSET may be different too.
Actually I did try to replace ref_DR_map.  For memory reference
doesn't have innermost affine behavior, we need to modify DR fields by
populating it with artificial data.  This results in some kind of
over-designed code.  IMHO, modifying some DR structures outside of
data-ref analyzer isn't that good.  After comparing the two methods, I
think may be this one is better, of course with the cost of two
different maps.

I might be misunderstanding your idea, so please correct if I was wrong.

Thanks,
bin
> +
>        dr->aux = XNEW (struct ifc_dr);
>        DR_BASE_W_UNCONDITIONALLY (dr) = false;
>        DR_RW_UNCONDITIONALLY (dr) = false;
>
>
> Can you add a testcase for the vectorization?
>
> Richard.
>
Richard Biener March 17, 2016, 9:53 a.m. UTC | #2
On Wed, Mar 16, 2016 at 5:17 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, Mar 16, 2016 at 12:20 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>>
>> On Wed, Mar 16, 2016 at 10:59 AM, Bin Cheng <Bin.Cheng@arm.com> wrote:
>> > Hi,
>> > ......
>> > Bootstrap and test on x86_64 and AArch64.  Is it OK, not sure if it's GCC 7?
>>
>> Hmm.
> Hi,
> Thanks for reviewing.
>>
>> +  equal_p = true;
>> +  if (e1->base_address && e2->base_address)
>> +    equal_p &= operand_equal_p (e1->base_address, e2->base_address, 0);
>> +  if (e1->offset && e2->offset)
>> +    equal_p &= operand_equal_p (e1->offset, e2->offset, 0);
>>
>> surely better to return false early.
>>
>> I think we don't want this in tree-data-refs.h also because of ...
>>
>> @@ -615,15 +619,29 @@
>> hash_memrefs_baserefs_and_store_DRs_read_written_info
>> (data_reference_p a)
>>    data_reference_p *master_dr, *base_master_dr;and REALPART) before creating the DR (or adjust the equality function
> and hashing
>>    tree ref = DR_REF (a);
>>    tree base_ref = DR_BASE_OBJECT (a);
>> +  innermost_loop_behavior *innermost = &DR_INNERMOST (a);
>>    tree ca = bb_predicate (gimple_bb (DR_STMT (a)));
>>    bool exist1, exist2;
>>
>> -  while (TREE_CODE (ref) == COMPONENT_REF
>> -        || TREE_CODE (ref) == IMAGPART_EXPR
>> -        || TREE_CODE (ref) == REALPART_EXPR)
>> -    ref = TREE_OPERAND (ref, 0);
>> +  /* If reference in DR has innermost loop behavior and it is not
>> +     a compound memory reference, we store it to innermost_DR_map,
>> +     otherwise to ref_DR_map.  */
>> +  if (TREE_CODE (ref) == COMPONENT_REF
>> +      || TREE_CODE (ref) == IMAGPART_EXPR
>> +      || TREE_CODE (ref) == REALPART_EXPR
>> +      || !(DR_BASE_ADDRESS (a) || DR_OFFSET (a)
>> +          || DR_INIT (a) || DR_STEP (a) || DR_ALIGNED_TO (a)))
>> +    {
>> +      while (TREE_CODE (ref) == COMPONENT_REF
>> +            || TREE_CODE (ref) == IMAGPART_EXPR
>> +            || TREE_CODE (ref) == REALPART_EXPR)
>> +       ref = TREE_OPERAND (ref, 0);
>> +
>> +      master_dr = &ref_DR_map->get_or_insert (ref, &exist1);
>> +    }
>> +  else
>> +    master_dr = &innermost_DR_map->get_or_insert (innermost, &exist1);
>>
>> we don't want an extra hashmap but replace ref_DR_map entirely.  So we'd need to
>> strip outermost non-variant handled-components (COMPONENT_REF, IMAGPART
>> and REALPART) before creating the DR (or adjust the equality function
>> and hashing
>> to disregard them which means subtracting their offset from DR_INIT.
> I am not sure if I understand correctly.  But for component reference,
> it is the base object that we want to record/track.  For example,
>
>   for (i = 0; i < N; i++) {
>     m = *data++;
>
>     m1 = p1->x - m;
>     m2 = p2->x + m;
>
>     p3->y = (m1 >= m2) ? p1->y : p2->y;
>
>     p1++;
>     p2++;
>     p3++;
>   }
> We want to infer that reads of p1/p2 in condition statement won't trap
> because there are unconditional reads of the structures, though the
> unconditional reads are actual of other sub-objects.  Here it is the
> invariant part of address that we want to track.

Well, the variant parts - we want to strip invariant parts as far as we can
(offsetof (x) and offsetof (y))

> Also illustrated by this example, we can't rely on data-ref analyzer
> here.  Because in gathering/scattering cases, the address could be not
> affine at all.

Sure, but that's a different issue.

>>
>> To adjust the references we collect you'd maybe could use a callback
>> to get_references_in_stmt
>> to adjust them.
>>
>> OTOH post-processing the DRs in if_convertible_loop_p_1 can be as simple as
> Is this a part of the method you suggested above, or is it an
> alternative one?  If it's the latter, then I have below questions
> embedded.

It is an alternative to adding a hook to get_references_in_stmt and
probably "easier".

>>
>> Index: tree-if-conv.c
>> ===================================================================
>> --- tree-if-conv.c      (revision 234215)
>> +++ tree-if-conv.c      (working copy)
>> @@ -1235,6 +1220,38 @@ if_convertible_loop_p_1 (struct loop *lo
>>
>>    for (i = 0; refs->iterate (i, &dr); i++)
>>      {
>> +      tree *refp = &DR_REF (dr);
>> +      while ((TREE_CODE (*refp) == COMPONENT_REF
>> +             && TREE_OPERAND (*refp, 2) == NULL_TREE)
>> +            || TREE_CODE (*refp) == IMAGPART_EXPR
>> +            || TREE_CODE (*refp) == REALPART_EXPR)
>> +       refp = &TREE_OPERAND (*refp, 0);
>> +      if (refp != &DR_REF (dr))
>> +       {
>> +         tree saved_base = *refp;
>> +         *refp = integer_zero_node;
>> +
>> +         if (DR_INIT (dr))
>> +           {
>> +             tree poffset;
>> +             int punsignedp, preversep, pvolatilep;
>> +             machine_mode pmode;
>> +             HOST_WIDE_INT pbitsize, pbitpos;
>> +             get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos, &poffset,
>> +                                  &pmode, &punsignedp, &preversep, &pvolatilep,
>> +                                  false);
>> +             gcc_assert (poffset == NULL_TREE);
>> +
>> +             DR_INIT (dr)
>> +               = wide_int_to_tree (ssizetype,
>> +                                   wi::sub (DR_INIT (dr),
>> +                                            pbitpos / BITS_PER_UNIT));
>> +           }
>> +
>> +         *refp = saved_base;
>> +         DR_REF (dr) = *refp;
>> +       }
> Looks to me the code is trying to resolve difference between two (or
> more) component references, which is DR_INIT in the code.  But DR_INIT
> is not the only thing needs to be handled.  For a structure containing
> two sub-arrays, DR_OFFSET may be different too.

Yes, but we can't say that if

  a->a[i]

doesn't trap that then

  a->b[i]

doesn't trap either.  We can only "strip" outermost
non-variable-offset components.

But maybe I'm missing what example you are thinking of.

> Actually I did try to replace ref_DR_map.  For memory reference
> doesn't have innermost affine behavior, we need to modify DR fields by
> populating it with artificial data.  This results in some kind of
> over-designed code.  IMHO, modifying some DR structures outside of
> data-ref analyzer isn't that good.  After comparing the two methods, I
> think may be this one is better, of course with the cost of two
> different maps.

IMHO the code is already somewhat awful in using data-ref analysis at all
(for what it does it certainly doesn't need it).  Note that in the case
data-ref analysis cannot analyze innermost behavior we're good as well
with my proposed code - we simply leave DR_INIT NULL (but still strip
outermost components).

Richard.

> I might be misunderstanding your idea, so please correct if I was wrong.
>
> Thanks,
> bin
>> +
>>        dr->aux = XNEW (struct ifc_dr);
>>        DR_BASE_W_UNCONDITIONALLY (dr) = false;
>>        DR_RW_UNCONDITIONALLY (dr) = false;
>>
>>
>> Can you add a testcase for the vectorization?
>>
>> Richard.
>>
Bin.Cheng March 28, 2016, 7:57 p.m. UTC | #3
Sorry, Should have replied to gcc-patches list.

Thanks,
bin

---------- Forwarded message ----------
From: "Bin.Cheng" <amker.cheng@gmail.com>
Date: Tue, 29 Mar 2016 03:55:04 +0800
Subject: Re: [PATCH PR69489/01]Improve tree ifcvt by storing/tracking
DR against its innermost loop bahavior if possible
To: Richard Biener <richard.guenther@gmail.com>

On 3/17/16, Richard Biener <richard.guenther@gmail.com> wrote:
> On Wed, Mar 16, 2016 at 5:17 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Wed, Mar 16, 2016 at 12:20 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>>
>>> Hmm.
>> Hi,
>> Thanks for reviewing.
>>>
>>> +  equal_p = true;
>>> +  if (e1->base_address && e2->base_address)
>>> +    equal_p &= operand_equal_p (e1->base_address, e2->base_address, 0);
>>> +  if (e1->offset && e2->offset)
>>> +    equal_p &= operand_equal_p (e1->offset, e2->offset, 0);
>>>
>>> surely better to return false early.
>>>
>>> I think we don't want this in tree-data-refs.h also because of ...
>>>
>>> @@ -615,15 +619,29 @@
>>> hash_memrefs_baserefs_and_store_DRs_read_written_info
>>> (data_reference_p a)
>>>    data_reference_p *master_dr, *base_master_dr;and REALPART) before
>>> creating the DR (or adjust the equality function
>> and hashing
>>>    tree ref = DR_REF (a);
>>>    tree base_ref = DR_BASE_OBJECT (a);
>>> +  innermost_loop_behavior *innermost = &DR_INNERMOST (a);
>>>    tree ca = bb_predicate (gimple_bb (DR_STMT (a)));
>>>    bool exist1, exist2;
>>>
>>> -  while (TREE_CODE (ref) == COMPONENT_REF
>>> -        || TREE_CODE (ref) == IMAGPART_EXPR
>>> -        || TREE_CODE (ref) == REALPART_EXPR)
>>> -    ref = TREE_OPERAND (ref, 0);
>>> +  /* If reference in DR has innermost loop behavior and it is not
>>> +     a compound memory reference, we store it to innermost_DR_map,
>>> +     otherwise to ref_DR_map.  */
>>> +  if (TREE_CODE (ref) == COMPONENT_REF
>>> +      || TREE_CODE (ref) == IMAGPART_EXPR
>>> +      || TREE_CODE (ref) == REALPART_EXPR
>>> +      || !(DR_BASE_ADDRESS (a) || DR_OFFSET (a)
>>> +          || DR_INIT (a) || DR_STEP (a) || DR_ALIGNED_TO (a)))
>>> +    {
>>> +      while (TREE_CODE (ref) == COMPONENT_REF
>>> +            || TREE_CODE (ref) == IMAGPART_EXPR
>>> +            || TREE_CODE (ref) == REALPART_EXPR)
>>> +       ref = TREE_OPERAND (ref, 0);
>>> +
>>> +      master_dr = &ref_DR_map->get_or_insert (ref, &exist1);
>>> +    }
>>> +  else
>>> +    master_dr = &innermost_DR_map->get_or_insert (innermost, &exist1);
>>>
>>> we don't want an extra hashmap but replace ref_DR_map entirely.  So we'd
>>> need to
>>> strip outermost non-variant handled-components (COMPONENT_REF, IMAGPART
>>> and REALPART) before creating the DR (or adjust the equality function
>>> and hashing
>>> to disregard them which means subtracting their offset from DR_INIT.
>> I am not sure if I understand correctly.  But for component reference,
>> it is the base object that we want to record/track.  For example,
>>
>>   for (i = 0; i < N; i++) {
>>     m = *data++;
>>
>>     m1 = p1->x - m;
>>     m2 = p2->x + m;
>>
>>     p3->y = (m1 >= m2) ? p1->y : p2->y;
>>
>>     p1++;
>>     p2++;
>>     p3++;
>>   }
>> We want to infer that reads of p1/p2 in condition statement won't trap
>> because there are unconditional reads of the structures, though the
>> unconditional reads are actual of other sub-objects.  Here it is the
>> invariant part of address that we want to track.
>
> Well, the variant parts - we want to strip invariant parts as far as we can
> (offsetof (x) and offsetof (y))
>
>> Also illustrated by this example, we can't rely on data-ref analyzer
>> here.  Because in gathering/scattering cases, the address could be not
>> affine at all.
>
> Sure, but that's a different issue.
>
>>>
>>> To adjust the references we collect you'd maybe could use a callback
>>> to get_references_in_stmt
>>> to adjust them.
>>>
>>> OTOH post-processing the DRs in if_convertible_loop_p_1 can be as simple
>>> as
>> Is this a part of the method you suggested above, or is it an
>> alternative one?  If it's the latter, then I have below questions
>> embedded.
>
> It is an alternative to adding a hook to get_references_in_stmt and
> probably "easier".
>
>>>
>>> Index: tree-if-conv.c
>>> ===================================================================
>>> --- tree-if-conv.c      (revision 234215)
>>> +++ tree-if-conv.c      (working copy)
>>> @@ -1235,6 +1220,38 @@ if_convertible_loop_p_1 (struct loop *lo
>>>
>>>    for (i = 0; refs->iterate (i, &dr); i++)
>>>      {
>>> +      tree *refp = &DR_REF (dr);
>>> +      while ((TREE_CODE (*refp) == COMPONENT_REF
>>> +             && TREE_OPERAND (*refp, 2) == NULL_TREE)
>>> +            || TREE_CODE (*refp) == IMAGPART_EXPR
>>> +            || TREE_CODE (*refp) == REALPART_EXPR)
>>> +       refp = &TREE_OPERAND (*refp, 0);
>>> +      if (refp != &DR_REF (dr))
>>> +       {
>>> +         tree saved_base = *refp;
>>> +         *refp = integer_zero_node;
>>> +
>>> +         if (DR_INIT (dr))
>>> +           {
>>> +             tree poffset;
>>> +             int punsignedp, preversep, pvolatilep;
>>> +             machine_mode pmode;
>>> +             HOST_WIDE_INT pbitsize, pbitpos;
>>> +             get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos,
>>> &poffset,
>>> +                                  &pmode, &punsignedp, &preversep,
>>> &pvolatilep,
>>> +                                  false);
>>> +             gcc_assert (poffset == NULL_TREE);
>>> +
>>> +             DR_INIT (dr)
>>> +               = wide_int_to_tree (ssizetype,
>>> +                                   wi::sub (DR_INIT (dr),
>>> +                                            pbitpos / BITS_PER_UNIT));
>>> +           }
>>> +
>>> +         *refp = saved_base;
>>> +         DR_REF (dr) = *refp;
>>> +       }
>> Looks to me the code is trying to resolve difference between two (or
>> more) component references, which is DR_INIT in the code.  But DR_INIT
>> is not the only thing needs to be handled.  For a structure containing
>> two sub-arrays, DR_OFFSET may be different too.
>
> Yes, but we can't say that if
>
>   a->a[i]
>
> doesn't trap that then
>
>   a->b[i]
>
> doesn't trap either.  We can only "strip" outermost
> non-variable-offset components.
>
> But maybe I'm missing what example you are thinking of.
Hmm, this was the case I meant.  What I don't understand is current
code logic does infer trap information for a.b[i] from a.a[i].  Given
below example:
struct str
{
  int a[10];
  int b[20];
  char c;
};

void bar (struct str *);
int foo (int x, int n)
{
  int i;
  struct str s;
  bar (&s);
  for (i = 0; i < n; i++)
    {
      s.a[i] = s.b[i];
      if (x > i)
        s.b[i] = 0;
    }
  bar (&s);
  return 0;
}
The loop is convertible because of below code in function
ifcvt_memrefs_wont_trap:

  /* If a is unconditionally accessed then ... */
  if (DR_RW_UNCONDITIONALLY (*master_dr))
    {
      /* an unconditional read won't trap.  */
      if (DR_IS_READ (a))
	return true;

      /* an unconditionaly write won't trap if the base is written
         to unconditionally.  */
      if (base_master_dr
	  && DR_BASE_W_UNCONDITIONALLY (*base_master_dr))
	return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES);
      else
	{
	  /* or the base is know to be not readonly.  */
	  tree base_tree = get_base_address (DR_REF (a));
	  if (DECL_P (base_tree)
	      && decl_binds_to_current_def_p (base_tree)
	      && ! TREE_READONLY (base_tree))
	    return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES);
	}
    }
It is the main object '&s' that is recorded in base_master_dr, so
s.b[i] is considered not trap.  Even it's not, I suppose
get_base_address will give same result?

Thanks,
bin
Richard Biener March 29, 2016, 8:37 a.m. UTC | #4
On Mon, Mar 28, 2016 at 9:57 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> Sorry, Should have replied to gcc-patches list.
>
> Thanks,
> bin
>
> ---------- Forwarded message ----------
> From: "Bin.Cheng" <amker.cheng@gmail.com>
> Date: Tue, 29 Mar 2016 03:55:04 +0800
> Subject: Re: [PATCH PR69489/01]Improve tree ifcvt by storing/tracking
> DR against its innermost loop bahavior if possible
> To: Richard Biener <richard.guenther@gmail.com>
>
> On 3/17/16, Richard Biener <richard.guenther@gmail.com> wrote:
>> On Wed, Mar 16, 2016 at 5:17 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Wed, Mar 16, 2016 at 12:20 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>>
>>>> Hmm.
>>> Hi,
>>> Thanks for reviewing.
>>>>
>>>> +  equal_p = true;
>>>> +  if (e1->base_address && e2->base_address)
>>>> +    equal_p &= operand_equal_p (e1->base_address, e2->base_address, 0);
>>>> +  if (e1->offset && e2->offset)
>>>> +    equal_p &= operand_equal_p (e1->offset, e2->offset, 0);
>>>>
>>>> surely better to return false early.
>>>>
>>>> I think we don't want this in tree-data-refs.h also because of ...
>>>>
>>>> @@ -615,15 +619,29 @@
>>>> hash_memrefs_baserefs_and_store_DRs_read_written_info
>>>> (data_reference_p a)
>>>>    data_reference_p *master_dr, *base_master_dr;and REALPART) before
>>>> creating the DR (or adjust the equality function
>>> and hashing
>>>>    tree ref = DR_REF (a);
>>>>    tree base_ref = DR_BASE_OBJECT (a);
>>>> +  innermost_loop_behavior *innermost = &DR_INNERMOST (a);
>>>>    tree ca = bb_predicate (gimple_bb (DR_STMT (a)));
>>>>    bool exist1, exist2;
>>>>
>>>> -  while (TREE_CODE (ref) == COMPONENT_REF
>>>> -        || TREE_CODE (ref) == IMAGPART_EXPR
>>>> -        || TREE_CODE (ref) == REALPART_EXPR)
>>>> -    ref = TREE_OPERAND (ref, 0);
>>>> +  /* If reference in DR has innermost loop behavior and it is not
>>>> +     a compound memory reference, we store it to innermost_DR_map,
>>>> +     otherwise to ref_DR_map.  */
>>>> +  if (TREE_CODE (ref) == COMPONENT_REF
>>>> +      || TREE_CODE (ref) == IMAGPART_EXPR
>>>> +      || TREE_CODE (ref) == REALPART_EXPR
>>>> +      || !(DR_BASE_ADDRESS (a) || DR_OFFSET (a)
>>>> +          || DR_INIT (a) || DR_STEP (a) || DR_ALIGNED_TO (a)))
>>>> +    {
>>>> +      while (TREE_CODE (ref) == COMPONENT_REF
>>>> +            || TREE_CODE (ref) == IMAGPART_EXPR
>>>> +            || TREE_CODE (ref) == REALPART_EXPR)
>>>> +       ref = TREE_OPERAND (ref, 0);
>>>> +
>>>> +      master_dr = &ref_DR_map->get_or_insert (ref, &exist1);
>>>> +    }
>>>> +  else
>>>> +    master_dr = &innermost_DR_map->get_or_insert (innermost, &exist1);
>>>>
>>>> we don't want an extra hashmap but replace ref_DR_map entirely.  So we'd
>>>> need to
>>>> strip outermost non-variant handled-components (COMPONENT_REF, IMAGPART
>>>> and REALPART) before creating the DR (or adjust the equality function
>>>> and hashing
>>>> to disregard them which means subtracting their offset from DR_INIT.
>>> I am not sure if I understand correctly.  But for component reference,
>>> it is the base object that we want to record/track.  For example,
>>>
>>>   for (i = 0; i < N; i++) {
>>>     m = *data++;
>>>
>>>     m1 = p1->x - m;
>>>     m2 = p2->x + m;
>>>
>>>     p3->y = (m1 >= m2) ? p1->y : p2->y;
>>>
>>>     p1++;
>>>     p2++;
>>>     p3++;
>>>   }
>>> We want to infer that reads of p1/p2 in condition statement won't trap
>>> because there are unconditional reads of the structures, though the
>>> unconditional reads are actual of other sub-objects.  Here it is the
>>> invariant part of address that we want to track.
>>
>> Well, the variant parts - we want to strip invariant parts as far as we can
>> (offsetof (x) and offsetof (y))
>>
>>> Also illustrated by this example, we can't rely on data-ref analyzer
>>> here.  Because in gathering/scattering cases, the address could be not
>>> affine at all.
>>
>> Sure, but that's a different issue.
>>
>>>>
>>>> To adjust the references we collect you'd maybe could use a callback
>>>> to get_references_in_stmt
>>>> to adjust them.
>>>>
>>>> OTOH post-processing the DRs in if_convertible_loop_p_1 can be as simple
>>>> as
>>> Is this a part of the method you suggested above, or is it an
>>> alternative one?  If it's the latter, then I have below questions
>>> embedded.
>>
>> It is an alternative to adding a hook to get_references_in_stmt and
>> probably "easier".
>>
>>>>
>>>> Index: tree-if-conv.c
>>>> ===================================================================
>>>> --- tree-if-conv.c      (revision 234215)
>>>> +++ tree-if-conv.c      (working copy)
>>>> @@ -1235,6 +1220,38 @@ if_convertible_loop_p_1 (struct loop *lo
>>>>
>>>>    for (i = 0; refs->iterate (i, &dr); i++)
>>>>      {
>>>> +      tree *refp = &DR_REF (dr);
>>>> +      while ((TREE_CODE (*refp) == COMPONENT_REF
>>>> +             && TREE_OPERAND (*refp, 2) == NULL_TREE)
>>>> +            || TREE_CODE (*refp) == IMAGPART_EXPR
>>>> +            || TREE_CODE (*refp) == REALPART_EXPR)
>>>> +       refp = &TREE_OPERAND (*refp, 0);
>>>> +      if (refp != &DR_REF (dr))
>>>> +       {
>>>> +         tree saved_base = *refp;
>>>> +         *refp = integer_zero_node;
>>>> +
>>>> +         if (DR_INIT (dr))
>>>> +           {
>>>> +             tree poffset;
>>>> +             int punsignedp, preversep, pvolatilep;
>>>> +             machine_mode pmode;
>>>> +             HOST_WIDE_INT pbitsize, pbitpos;
>>>> +             get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos,
>>>> &poffset,
>>>> +                                  &pmode, &punsignedp, &preversep,
>>>> &pvolatilep,
>>>> +                                  false);
>>>> +             gcc_assert (poffset == NULL_TREE);
>>>> +
>>>> +             DR_INIT (dr)
>>>> +               = wide_int_to_tree (ssizetype,
>>>> +                                   wi::sub (DR_INIT (dr),
>>>> +                                            pbitpos / BITS_PER_UNIT));
>>>> +           }
>>>> +
>>>> +         *refp = saved_base;
>>>> +         DR_REF (dr) = *refp;
>>>> +       }
>>> Looks to me the code is trying to resolve difference between two (or
>>> more) component references, which is DR_INIT in the code.  But DR_INIT
>>> is not the only thing needs to be handled.  For a structure containing
>>> two sub-arrays, DR_OFFSET may be different too.
>>
>> Yes, but we can't say that if
>>
>>   a->a[i]
>>
>> doesn't trap that then
>>
>>   a->b[i]
>>
>> doesn't trap either.  We can only "strip" outermost
>> non-variable-offset components.
>>
>> But maybe I'm missing what example you are thinking of.
> Hmm, this was the case I meant.  What I don't understand is current
> code logic does infer trap information for a.b[i] from a.a[i].  Given
> below example:
> struct str
> {
>   int a[10];
>   int b[20];
>   char c;
> };
>
> void bar (struct str *);
> int foo (int x, int n)
> {
>   int i;
>   struct str s;
>   bar (&s);
>   for (i = 0; i < n; i++)
>     {
>       s.a[i] = s.b[i];
>       if (x > i)
>         s.b[i] = 0;
>     }
>   bar (&s);
>   return 0;
> }
> The loop is convertible because of below code in function
> ifcvt_memrefs_wont_trap:
>
>   /* If a is unconditionally accessed then ... */
>   if (DR_RW_UNCONDITIONALLY (*master_dr))
>     {
>       /* an unconditional read won't trap.  */
>       if (DR_IS_READ (a))
>         return true;
>
>       /* an unconditionaly write won't trap if the base is written
>          to unconditionally.  */
>       if (base_master_dr
>           && DR_BASE_W_UNCONDITIONALLY (*base_master_dr))
>         return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES);
>       else
>         {
>           /* or the base is know to be not readonly.  */
>           tree base_tree = get_base_address (DR_REF (a));
>           if (DECL_P (base_tree)
>               && decl_binds_to_current_def_p (base_tree)
>               && ! TREE_READONLY (base_tree))
>             return PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES);
>         }
>     }
> It is the main object '&s' that is recorded in base_master_dr, so
> s.b[i] is considered not trap.  Even it's not, I suppose
> get_base_address will give same result?

Well, for this case it sees that s.b[i] is read from so it can't be an
out-of-bound
access.  And s.a[i] is written to unconditionally so 's' cannot be a
readonly object.
With both pieces of information we can conclude that s.b[i] = 0 doesn't trap.

Richard.

>
> Thanks,
> bin
>
>
>
> --
> Best Regards.

Patch
diff mbox

Index: tree-if-conv.c
===================================================================
--- tree-if-conv.c      (revision 234215)
+++ tree-if-conv.c      (working copy)
@@ -1235,6 +1220,38 @@  if_convertible_loop_p_1 (struct loop *lo

   for (i = 0; refs->iterate (i, &dr); i++)
     {
+      tree *refp = &DR_REF (dr);
+      while ((TREE_CODE (*refp) == COMPONENT_REF
+             && TREE_OPERAND (*refp, 2) == NULL_TREE)
+            || TREE_CODE (*refp) == IMAGPART_EXPR
+            || TREE_CODE (*refp) == REALPART_EXPR)
+       refp = &TREE_OPERAND (*refp, 0);
+      if (refp != &DR_REF (dr))
+       {
+         tree saved_base = *refp;
+         *refp = integer_zero_node;
+
+         if (DR_INIT (dr))
+           {
+             tree poffset;
+             int punsignedp, preversep, pvolatilep;
+             machine_mode pmode;
+             HOST_WIDE_INT pbitsize, pbitpos;
+             get_inner_reference (DR_REF (dr), &pbitsize, &pbitpos, &poffset,
+                                  &pmode, &punsignedp, &preversep, &pvolatilep,
+                                  false);
+             gcc_assert (poffset == NULL_TREE);
+
+             DR_INIT (dr)
+               = wide_int_to_tree (ssizetype,
+                                   wi::sub (DR_INIT (dr),
+                                            pbitpos / BITS_PER_UNIT));
+           }
+
+         *refp = saved_base;
+         DR_REF (dr) = *refp;
+       }
+
       dr->aux = XNEW (struct ifc_dr);
       DR_BASE_W_UNCONDITIONALLY (dr) = false;
       DR_RW_UNCONDITIONALLY (dr) = false;