Be careful about combined chain with length == 0 (PR, tree-optimization/70754).
diff mbox

Message ID CAFiYyc0YRG7qgRks3pOUemtvE-sH4jJ9sOh0P+Pk0bJTe7hY4Q@mail.gmail.com
State New
Headers show

Commit Message

Richard Biener Jan. 19, 2017, 11:22 a.m. UTC
On Thu, Jan 19, 2017 at 11:25 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Jan 19, 2017 at 9:42 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Jan 18, 2017 at 4:32 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška <mliska@suse.cz> wrote:
>>>>> Hello.
>>>>>
>>>>>
>>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>>>
>>>>> Ready to be installed?
>>>>
>>>> I'm not sure.  If we have such zero distance refs in the IL at the
>>>> time pcom runs then not handling
>>>> them will pessimize code-gen for cases where they are part of a larger
>>>> chain.  Esp. I don't like
>>> Do you mean different chains distributed because of MAX_DISTANCE by
>>> "larger chain"?  With the patch, such chain of refs would still be
>>> pred-commoned, just the arithmetic operation not combined, which could
>>> be handled by later DOM?
>>>> another stmt_dominates_stmt_p call and thus rather not handle length
>>>> == 0 at all...
>>> Not handle length == 0 chains at all may be sub-optimal.  As you said,
>>> such chain of refs at the point may simply because previous dom/cse
>>> fail to analyze the references.
>>>>
>>>> We already seem to go great length in associating stuff when combining
>>>> stuff thus isn't this
>>>> maybe an artifact of this association?  Maybe we simply need to sort
>>>> the new chain after
>>>> combining it so the root stmt comes last?
>>>>
>>>> Note that there seems to be only a single length per chain but not all
>>>> refs in a chain need to
>>>> have the same distance.  This means your fix is likely incomplete?
>>>> What prevents the situation
>>>> to arise for distance != 0?
>>> Yes, it's possible for two refs have the same distance in a chain with
>>> length > 0.  But that should not be a problem, because existing uses
>>> are replaced by newly generated PHI variables which always dominate
>>> the uses, right?
>>
>> I must admit I don't know predcom in such detail but then can we handle
>> distance == 0 by simply inserting a PHI for those as well (a degenerate
>> one of course)?  Or can for distance == 0 the ref be not loop invariant?
> Not sure if I understand the question correctly.  Distance is
> difference of niter between one ref and the root ref of the chain, so
> 0 distance/length doesn't mean a loop invariant, it's very likely two
> (exactly the same) references in each loop iteration, the address of
> reference is still a SCEV.  OTOH, invariant chain has invariant
> address, and is handled separately.  For the first question, it's
> length, rather than distance that decides how the chain is handled.
> For length > 0 chain, we have to insert PHIs to pass carried result of
> memory reference, even some refs may have 0 distance to the root ref.
>>
>> Note that for length == 0 all refs in the chain will have a dependence distance
>> of zero.  So my first argument probably doesn't hold and we could simply
>> remove handling of length == 0 chains and rely on CSE?
> I am not sure, that CSE opportunity of references exists at this point
> means previous cse pass failed for some reason.

Or a later pass introduced it (in this case, the vectorizer).

> Predcom could be the
> only pass that can handle such case as it understands data reference
> better.  Note Martin's patch is not to skip handling of length == 0
> chain, later ref will still be CSEed with result of root ref, only the
> combination operation like chain1 + chain2 is skipped.  In this case,
> following dom should be able to handle such (loop independent) cse
> opportunities.

I must admit I don't completely understand the consequences of this
disabling but of course DOM should also be able to handle the CSE
(ok, DOM is known to be quite weak with memory equivalence but
it's not that data-dependence is really better in all cases).

Can't we simply re-order refs in new_chain appropriately or handle
this case in combinable_refs_p instead?

That is, I understand the patch as a hack as it should be always
possible to find dominating refs?

In fact the point of the assert tells us a simpler fix may be


which obviously matches the assert we run into for the testcase?  I'd
be ok with that
(no stmt_dominates_stmt_p, heh) with a suitable comment before it.

Richard.


>
> Thanks,
> bin
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>>>
>>>> Richard.
>>>>
>>>>> Martin
>>>>>

Comments

Bin.Cheng Jan. 19, 2017, 12:07 p.m. UTC | #1
On Thu, Jan 19, 2017 at 11:22 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Jan 19, 2017 at 11:25 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Thu, Jan 19, 2017 at 9:42 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Wed, Jan 18, 2017 at 4:32 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška <mliska@suse.cz> wrote:
>>>>>> Hello.
>>>>>>
>>>>>>
>>>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>>>>
>>>>>> Ready to be installed?
>>>>>
>>>>> I'm not sure.  If we have such zero distance refs in the IL at the
>>>>> time pcom runs then not handling
>>>>> them will pessimize code-gen for cases where they are part of a larger
>>>>> chain.  Esp. I don't like
>>>> Do you mean different chains distributed because of MAX_DISTANCE by
>>>> "larger chain"?  With the patch, such chain of refs would still be
>>>> pred-commoned, just the arithmetic operation not combined, which could
>>>> be handled by later DOM?
>>>>> another stmt_dominates_stmt_p call and thus rather not handle length
>>>>> == 0 at all...
>>>> Not handle length == 0 chains at all may be sub-optimal.  As you said,
>>>> such chain of refs at the point may simply because previous dom/cse
>>>> fail to analyze the references.
>>>>>
>>>>> We already seem to go great length in associating stuff when combining
>>>>> stuff thus isn't this
>>>>> maybe an artifact of this association?  Maybe we simply need to sort
>>>>> the new chain after
>>>>> combining it so the root stmt comes last?
>>>>>
>>>>> Note that there seems to be only a single length per chain but not all
>>>>> refs in a chain need to
>>>>> have the same distance.  This means your fix is likely incomplete?
>>>>> What prevents the situation
>>>>> to arise for distance != 0?
>>>> Yes, it's possible for two refs have the same distance in a chain with
>>>> length > 0.  But that should not be a problem, because existing uses
>>>> are replaced by newly generated PHI variables which always dominate
>>>> the uses, right?
>>>
>>> I must admit I don't know predcom in such detail but then can we handle
>>> distance == 0 by simply inserting a PHI for those as well (a degenerate
>>> one of course)?  Or can for distance == 0 the ref be not loop invariant?
>> Not sure if I understand the question correctly.  Distance is
>> difference of niter between one ref and the root ref of the chain, so
>> 0 distance/length doesn't mean a loop invariant, it's very likely two
>> (exactly the same) references in each loop iteration, the address of
>> reference is still a SCEV.  OTOH, invariant chain has invariant
>> address, and is handled separately.  For the first question, it's
>> length, rather than distance that decides how the chain is handled.
>> For length > 0 chain, we have to insert PHIs to pass carried result of
>> memory reference, even some refs may have 0 distance to the root ref.
>>>
>>> Note that for length == 0 all refs in the chain will have a dependence distance
>>> of zero.  So my first argument probably doesn't hold and we could simply
>>> remove handling of length == 0 chains and rely on CSE?
>> I am not sure, that CSE opportunity of references exists at this point
>> means previous cse pass failed for some reason.
>
> Or a later pass introduced it (in this case, the vectorizer).
>
>> Predcom could be the
>> only pass that can handle such case as it understands data reference
>> better.  Note Martin's patch is not to skip handling of length == 0
>> chain, later ref will still be CSEed with result of root ref, only the
>> combination operation like chain1 + chain2 is skipped.  In this case,
>> following dom should be able to handle such (loop independent) cse
>> opportunities.
>
> I must admit I don't completely understand the consequences of this
> disabling but of course DOM should also be able to handle the CSE
> (ok, DOM is known to be quite weak with memory equivalence but
> it's not that data-dependence is really better in all cases).
>
> Can't we simply re-order refs in new_chain appropriately or handle
> this case in combinable_refs_p instead?
It's not refs need to be reordered, root ref always dominates others.
But yes, we need to find a dominator insertion place for combined
operation.  Looking at function reassociate_to_the_same_stmt, it
simply inserts new_stmt at root_stmt of root ref, which causes ICE in
this case.  The new_stmt needs to be inserted at a place also
dominating combination of later refs.  We can either compute the
information in place, or compute and pass the information from
previous combinable_refs_p.  This should be the real fix.

Thanks,
bin
>
> That is, I understand the patch as a hack as it should be always
> possible to find dominating refs?
>
> In fact the point of the assert tells us a simpler fix may be
>
> Index: tree-predcom.c
> ===================================================================
> --- tree-predcom.c      (revision 244519)
> +++ tree-predcom.c      (working copy)
> @@ -2330,6 +2334,12 @@ combine_chains (chain_p ch1, chain_p ch2
>           break;
>         }
>      }
> +  if (new_chain->length == 0
> +      && ! new_chain->has_max_use_after)
> +    {
> +      release_chain (new_chain);
> +      return NULL;
> +    }
>
>    ch1->combined = true;
>    ch2->combined = true;
>
> which obviously matches the assert we run into for the testcase?  I'd
> be ok with that
> (no stmt_dominates_stmt_p, heh) with a suitable comment before it.
>
> Richard.
>
>
>>
>> Thanks,
>> bin
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> bin
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Martin
>>>>>>

Patch
diff mbox

Index: tree-predcom.c
===================================================================
--- tree-predcom.c      (revision 244519)
+++ tree-predcom.c      (working copy)
@@ -2330,6 +2334,12 @@  combine_chains (chain_p ch1, chain_p ch2
          break;
        }
     }
+  if (new_chain->length == 0
+      && ! new_chain->has_max_use_after)
+    {
+      release_chain (new_chain);
+      return NULL;
+    }

   ch1->combined = true;
   ch2->combined = true;