Patchwork: Fix for PR51879 - Missed tail merging with non-const/pure calls

Submitter Tom de Vries
Date April 24, 2012, 9:19 p.m.
Message ID <4F9718E9.5080903@mentor.com>
Permalink /patch/154763/
State New

Comments

Tom de Vries - April 24, 2012, 9:19 p.m.
On 17/04/12 14:24, Richard Guenther wrote:
> On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>> On 27/01/12 21:37, Tom de Vries wrote:
>>> On 24/01/12 11:40, Richard Guenther wrote:
>>>> On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>> Richard,
>>>>> Jakub,
>>>>>
>>>>> the following patch fixes PR51879.
>>>>>
>>>>> Consider the following test-case:
>>>>> ...
>>>>> int bar (int);
>>>>> void baz (int);
>>>>>
>>>>> void
>>>>> foo (int y)
>>>>> {
>>>>>  int a;
>>>>>  if (y == 6)
>>>>>    a = bar (7);
>>>>>  else
>>>>>    a = bar (7);
>>>>>  baz (a);
>>>>> }
>>>>> ...
>>>>>
>>>>> after compiling at -O2, the representation looks like this before tail-merging:
>>>>> ...
>>>>>  # BLOCK 3 freq:1991
>>>>>  # PRED: 2 [19.9%]  (true,exec)
>>>>>  # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
>>>>>  # USE = nonlocal
>>>>>  # CLB = nonlocal
>>>>>  aD.1709_3 = barD.1703 (7);
>>>>>  goto <bb 5>;
>>>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>>>
>>>>>  # BLOCK 4 freq:8009
>>>>>  # PRED: 2 [80.1%]  (false,exec)
>>>>>  # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
>>>>>  # USE = nonlocal
>>>>>  # CLB = nonlocal
>>>>>  aD.1709_4 = barD.1703 (7);
>>>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>>>
>>>>>  # BLOCK 5 freq:10000
>>>>>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>>>>>  # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
>>>>>  # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
>>>>>  # .MEMD.1714_9 = VDEF <.MEMD.1714_5>
>>>>>  # USE = nonlocal
>>>>>  # CLB = nonlocal
>>>>>  bazD.1705 (aD.1709_1);
>>>>>  # VUSE <.MEMD.1714_9>
>>>>>  return;
>>>>> ...
>>>>>
>>>>> the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8
>>>>> to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.
>>>>>
>>>>> The patch makes sure non-const/pure call results (gimple_vdef and
>>>>> gimple_call_lhs) are properly value numbered.
>>>>>
>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>
>>>>> ok for stage1?
>>>>
>>>> The following cannot really work:
>>>>
>>>> @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
>>>>    result = vn_reference_lookup_1 (&vr1, NULL);
>>>>    if (result)
>>>>      {
>>>> -      changed = set_ssa_val_to (lhs, result);
>>>> +      tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
>>>> +      if (vdef)
>>>> +       changed |= set_ssa_val_to (vdef, result_vdef);
>>>> +      changed |= set_ssa_val_to (lhs, result);
>>>>
>>>> because 'result' may be not an SSA name.  It might also not have
>>>> a proper definition statement (if VN_INFO (result)->needs_insertion
>>>> is true).  So you at least need to guard things properly.
>>>>
>>>
>>> Right. And that also doesn't work if the call has no lhs, such as in the
>>> new test-case pr51879-6.c.
>>>
>>> I fixed this by storing both lhs and vdef, such that I don't have to derive
>>> the vdef from the lhs.
>>>
>>>> (On a side-note - I _did_ want to remove value-numbering virtual operands
>>>> at some point ...)
>>>>
>>>
>>> Doing so will hurt performance of tail-merging in its current form.
>>> OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that
>>> value numbering as used in tail-merging has its limitations too.
>>> Do you have any ideas how to address that one?
>>>
>>>> @@ -3359,8 +3366,10 @@ visit_use (tree use)
>>>>           /* ???  We should handle stores from calls.  */
>>>>           else if (TREE_CODE (lhs) == SSA_NAME)
>>>>             {
>>>> +             tree vuse = gimple_vuse (stmt);
>>>>               if (!gimple_call_internal_p (stmt)
>>>> -                 && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
>>>> +                 && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
>>>> +                     || (vuse && SSA_VAL (vuse) != VN_TOP)))
>>>>                 changed = visit_reference_op_call (lhs, stmt);
>>>>               else
>>>>                 changed = defs_to_varying (stmt);
>>>>
>>>> ... exactly because of the issue that a stmt has multiple defs.  Btw,
>>>> vuse should have been visited here or be part of our SCC, so, why do
>>>> you need this check?
>>>>
>>>
>>> Removed now, that was a workaround for a bug in an earlier version of the patch,
>>> that I didn't need anymore.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for stage1?
>>>
>>
>> Richard,
>>
>> quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html:
>> ...
>> I think these fixes hint at that we should
>> use "structural" equality as fallback if value-numbering doesn't equate
>> two stmt effects.  Thus, treat two stmts with exactly the same operands
>> and flags as equal and using value-numbering to canonicalize operands
>> (when they are SSA names) for that comparison, or use VN entirely
>> if there are no side-effects on the stmt.
>>
>> Changing value-numbering of virtual operands, even if it looks correct in the
>> simple cases you change, doesn't look like a general solution for the missed
>> tail merging opportunities.
>> ...
>>
>> The test-case pr51879-6.c shows a case where improving value numbering will help
>> tail-merging, but structural equality comparison not:
>> ...
>>  # BLOCK 3 freq:1991
>>  # PRED: 2 [19.9%]  (true,exec)
>>  # .MEMD.1717_7 = VDEF <.MEMD.1717_6(D)>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  blaD.1708 (5);
>>  # .MEMD.1717_8 = VDEF <.MEMD.1717_7>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  aD.1712_3 = barD.1704 (7);
>>  goto <bb 5>;
>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>
>>  # BLOCK 4 freq:8009
>>  # PRED: 2 [80.1%]  (false,exec)
>>  # .MEMD.1717_9 = VDEF <.MEMD.1717_6(D)>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  blaD.1708 (5);
>>  # .MEMD.1717_10 = VDEF <.MEMD.1717_9>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  aD.1712_4 = barD.1704 (7);
>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>
>>  # BLOCK 5 freq:10000
>>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>>  # aD.1712_1 = PHI <aD.1712_3(3), aD.1712_4(4)>
>>  # .MEMD.1717_5 = PHI <.MEMD.1717_8(3), .MEMD.1717_10(4)>
>>  # .MEMD.1717_11 = VDEF <.MEMD.1717_5>
>>  # USE = nonlocal
>>  # CLB = nonlocal
>>  bazD.1706 (aD.1712_1);
>>  # VUSE <.MEMD.1717_11>
>>  return;
>> ...
>> By structurally comparing the gimples in both blocks, we can see that they are
>> equal.
>> What we can't see, is that aD.1712_3 and aD.1712_4 have the same value. For
>> that, we need value numbering to number the vuses the same. And if we get value
>> numbering to number these values equal, we don't need the structural comparison.
>>
>> In this case structural comparison cannot be used as fallback for
>> value-numbering. So, ok for trunk?
>>
>> And if not ok, do you have a suggestion of how to deal with this otherwise?
> 
> Hmm.  I'm not sure we can conclude that they have the same value!
> 
> +int bar (int);
> +void baz (int);
> +void bla (int);
> +
> +void
> +foo (int y)
> +{
> +  int a;
> +  if (y == 6)
> +    {
> +      bla (5);
> +      a = bar (7);
> +    }
> +  else
> +    {
> +      bla (5);
> +      a = bar (7);
> +    }
> +  baz (a);
> +}
> 
> I can implement bla as
> 
> void bla(int) { a = random (); }
> 
> thus a would not have the same value on both paths

I think it's hard to decide whether they have the same value or not, since we
cannot write a test-case that compares the results of calls from two branches of
an if statement.

I started looking at the problem in terms of subsequent calls. Consider the
imaginary attribute nosideeffect, similar to const and pure.

const:
- no effects except the return value
- the return value depends only on the parameters

pure:
- no effects except the return value
- the return value depends only on the parameters and/or global variables

nosideeffect:
- no effects except the return value

The code fragment:
...
extern int f (int) __attribute ((nosideeffect));
int g()
{
  int a;
  int b;
  a = f (5);
  b = f (5);
  return a + b;
}
...

would result in a gimple sequence more or less like this:
...
  # VUSE <.MEMD.1712_4(D)>
  aD.1707_1 = fD.1704 (5);
  # VUSE <.MEMD.1712_4(D)>
  bD.1708_2 = fD.1704 (5);
  D.1710_3 = aD.1707_1 + bD.1708_2;
  # VUSE <.MEMD.1712_6>
  return D.1710_3;
...

The value numbering modification I'm proposing would say that a and b have the
same value, which is not true. I updated the patch to ensure in this case that a
and b would be seen as different.
Note that things won't break if the attribute is missing on a function, since
that just means that the function would have a vdef, and there would be no two
subsequent function calls with the same vuse.

> - but that is not what
> matters - because obviously we can still merge the blocks.

Right.

>  That is,
> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
> are the same.  But is that not what your value-numbering changes will assert?

In the case of calls in different branches of a control flow statement, we can
aggressively assume they're the same, since we cannot write a check that would
start failing.

In the case of subsequent calls with side effects, the vuses will never be the same.

In the case of subsequent calls without side effects, but not pure or const, we
can't assume they have the same result. AFAIK, there's currently no way to
specify such a function, but the updated patch should be able to handle this.

> (not sure how you achieve this, but it looks like you insert/lookup general
> calls, thus not pure/const ones

Correct.

> - that could easily lead to miscompiles(?)).
> 

I have bootstrapped and reg-tested the updated (and the previous) patch on
x86_64 (ada inclusive), and found no issues.

Can you think of a test-case that fails with the previous or the updated patch?
Otherwise, ok for trunk?

Thanks,
- Tom

> Richard.

2012-04-24  Tom de Vries  <tom@codesourcery.com>

	PR tree-optimization/51879
	* tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field.
	* tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef.
	Handle case that lhs is NULL_TREE.
	(visit_use): Handle calls with side-effect using
	visit_reference_op_call.

	* gcc.dg/pr51879.c: New test.
	* gcc.dg/pr51879-2.c: Same.
	* gcc.dg/pr51879-3.c: Same.
	* gcc.dg/pr51879-4.c: Same.
	* gcc.dg/pr51879-6.c: Same.
Richard Guenther - April 25, 2012, 9:57 a.m.
On Tue, Apr 24, 2012 at 11:19 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
> On 17/04/12 14:24, Richard Guenther wrote:
>> On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>> On 27/01/12 21:37, Tom de Vries wrote:
>>>> On 24/01/12 11:40, Richard Guenther wrote:
>>>>> On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries <Tom_deVries@mentor.com> wrote:
>>>>>> Richard,
>>>>>> Jakub,
>>>>>>
>>>>>> the following patch fixes PR51879.
>>>>>>
>>>>>> Consider the following test-case:
>>>>>> ...
>>>>>> int bar (int);
>>>>>> void baz (int);
>>>>>>
>>>>>> void
>>>>>> foo (int y)
>>>>>> {
>>>>>>  int a;
>>>>>>  if (y == 6)
>>>>>>    a = bar (7);
>>>>>>  else
>>>>>>    a = bar (7);
>>>>>>  baz (a);
>>>>>> }
>>>>>> ...
>>>>>>
>>>>>> after compiling at -O2, the representation looks like this before tail-merging:
>>>>>> ...
>>>>>>  # BLOCK 3 freq:1991
>>>>>>  # PRED: 2 [19.9%]  (true,exec)
>>>>>>  # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)>
>>>>>>  # USE = nonlocal
>>>>>>  # CLB = nonlocal
>>>>>>  aD.1709_3 = barD.1703 (7);
>>>>>>  goto <bb 5>;
>>>>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>>>>
>>>>>>  # BLOCK 4 freq:8009
>>>>>>  # PRED: 2 [80.1%]  (false,exec)
>>>>>>  # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)>
>>>>>>  # USE = nonlocal
>>>>>>  # CLB = nonlocal
>>>>>>  aD.1709_4 = barD.1703 (7);
>>>>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>>>>
>>>>>>  # BLOCK 5 freq:10000
>>>>>>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>>>>>>  # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)>
>>>>>>  # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)>
>>>>>>  # .MEMD.1714_9 = VDEF <.MEMD.1714_5>
>>>>>>  # USE = nonlocal
>>>>>>  # CLB = nonlocal
>>>>>>  bazD.1705 (aD.1709_1);
>>>>>>  # VUSE <.MEMD.1714_9>
>>>>>>  return;
>>>>>> ...
>>>>>>
>>>>>> the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8
>>>>>> to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5.
>>>>>>
>>>>>> The patch makes sure non-const/pure call results (gimple_vdef and
>>>>>> gimple_call_lhs) are properly value numbered.
>>>>>>
>>>>>> Bootstrapped and reg-tested on x86_64.
>>>>>>
>>>>>> ok for stage1?
>>>>>
>>>>> The following cannot really work:
>>>>>
>>>>> @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl
>>>>>    result = vn_reference_lookup_1 (&vr1, NULL);
>>>>>    if (result)
>>>>>      {
>>>>> -      changed = set_ssa_val_to (lhs, result);
>>>>> +      tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result));
>>>>> +      if (vdef)
>>>>> +       changed |= set_ssa_val_to (vdef, result_vdef);
>>>>> +      changed |= set_ssa_val_to (lhs, result);
>>>>>
>>>>> because 'result' may be not an SSA name.  It might also not have
>>>>> a proper definition statement (if VN_INFO (result)->needs_insertion
>>>>> is true).  So you at least need to guard things properly.
>>>>>
>>>>
>>>> Right. And that also doesn't work if the call has no lhs, such as in the
>>>> new test-case pr51879-6.c.
>>>>
>>>> I fixed this by storing both lhs and vdef, such that I don't have to derive
>>>> the vdef from the lhs.
>>>>
>>>>> (On a side-note - I _did_ want to remove value-numbering virtual operands
>>>>> at some point ...)
>>>>>
>>>>
>>>> Doing so will hurt performance of tail-merging in its current form.
>>>> OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that
>>>> value numbering as used in tail-merging has its limitations too.
>>>> Do you have any ideas how to address that one?
>>>>
>>>>> @@ -3359,8 +3366,10 @@ visit_use (tree use)
>>>>>           /* ???  We should handle stores from calls.  */
>>>>>           else if (TREE_CODE (lhs) == SSA_NAME)
>>>>>             {
>>>>> +             tree vuse = gimple_vuse (stmt);
>>>>>               if (!gimple_call_internal_p (stmt)
>>>>> -                 && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
>>>>> +                 && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
>>>>> +                     || (vuse && SSA_VAL (vuse) != VN_TOP)))
>>>>>                 changed = visit_reference_op_call (lhs, stmt);
>>>>>               else
>>>>>                 changed = defs_to_varying (stmt);
>>>>>
>>>>> ... exactly because of the issue that a stmt has multiple defs.  Btw,
>>>>> vuse should have been visited here or be part of our SCC, so, why do
>>>>> you need this check?
>>>>>
>>>>
>>>> Removed now, that was a workaround for a bug in an earlier version of the patch,
>>>> that I didn't need anymore.
>>>>
>>>> Bootstrapped and reg-tested on x86_64.
>>>>
>>>> OK for stage1?
>>>>
>>>
>>> Richard,
>>>
>>> quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html:
>>> ...
>>> I think these fixes hint at that we should
>>> use "structural" equality as fallback if value-numbering doesn't equate
>>> two stmt effects.  Thus, treat two stmts with exactly the same operands
>>> and flags as equal and using value-numbering to canonicalize operands
>>> (when they are SSA names) for that comparison, or use VN entirely
>>> if there are no side-effects on the stmt.
>>>
>>> Changing value-numbering of virtual operands, even if it looks correct in the
>>> simple cases you change, doesn't look like a general solution for the missed
>>> tail merging opportunities.
>>> ...
>>>
>>> The test-case pr51879-6.c shows a case where improving value numbering will help
>>> tail-merging, but structural equality comparison not:
>>> ...
>>>  # BLOCK 3 freq:1991
>>>  # PRED: 2 [19.9%]  (true,exec)
>>>  # .MEMD.1717_7 = VDEF <.MEMD.1717_6(D)>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  blaD.1708 (5);
>>>  # .MEMD.1717_8 = VDEF <.MEMD.1717_7>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  aD.1712_3 = barD.1704 (7);
>>>  goto <bb 5>;
>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>
>>>  # BLOCK 4 freq:8009
>>>  # PRED: 2 [80.1%]  (false,exec)
>>>  # .MEMD.1717_9 = VDEF <.MEMD.1717_6(D)>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  blaD.1708 (5);
>>>  # .MEMD.1717_10 = VDEF <.MEMD.1717_9>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  aD.1712_4 = barD.1704 (7);
>>>  # SUCC: 5 [100.0%]  (fallthru,exec)
>>>
>>>  # BLOCK 5 freq:10000
>>>  # PRED: 3 [100.0%]  (fallthru,exec) 4 [100.0%]  (fallthru,exec)
>>>  # aD.1712_1 = PHI <aD.1712_3(3), aD.1712_4(4)>
>>>  # .MEMD.1717_5 = PHI <.MEMD.1717_8(3), .MEMD.1717_10(4)>
>>>  # .MEMD.1717_11 = VDEF <.MEMD.1717_5>
>>>  # USE = nonlocal
>>>  # CLB = nonlocal
>>>  bazD.1706 (aD.1712_1);
>>>  # VUSE <.MEMD.1717_11>
>>>  return;
>>> ...
>>> By structurally comparing the gimples in both blocks, we can see that they are
>>> equal.
>>> What we can't see, is that aD.1712_3 and aD.1712_4 have the same value. For
>>> that, we need value numbering to number the vuses the same. And if we get value
>>> numbering to number these values equal, we don't need the structural comparison.
>>>
>>> In this case structural comparison cannot be used as fallback for
>>> value-numbering. So, ok for trunk?
>>>
>>> And if not ok, do you have a suggestion of how to deal with this otherwise?
>>
>> Hmm.  I'm not sure we can conclude that they have the same value!
>>
>> +int bar (int);
>> +void baz (int);
>> +void bla (int);
>> +
>> +void
>> +foo (int y)
>> +{
>> +  int a;
>> +  if (y == 6)
>> +    {
>> +      bla (5);
>> +      a = bar (7);
>> +    }
>> +  else
>> +    {
>> +      bla (5);
>> +      a = bar (7);
>> +    }
>> +  baz (a);
>> +}
>>
>> I can implement bla as
>>
>> void bla(int) { a = random (); }
>>
>> thus a would not have the same value on both paths
>
> I think it's hard to decide whether they have the same value or not, since we
> cannot write a test-case that compares the results of calls from two branches of
> an if statement.

void *foo ()
{
  return __builtin_return_address (0);
}

void *bar (_Bool b)
{
  if (b)
    return foo ();
  else
    return foo ();
}

int main()
{
  if (bar(true) == bar(false))
    abort ();
}

ok ... outside of the scope of standard "C", but we certainly _can_ do this.
Which would question tail-merging the above at all, of course.

> I started looking at the problem in terms of subsequent calls. Consider the
> imaginary attribute nosideeffect, similar to const and pure.
>
> const:
> - no effects except the return value
> - the return value depends only on the parameters
>
> pure:
> - no effects except the return value
> - the return value depends only on the parameters and/or global variables
>
> nosideeffect:
> - no effects except the return value
>
> The code fragment:
> ...
> extern int f (int) __attribute ((nosideeffect));
> int g()
> {
>  int a;
>  int b;
>  a = f (5);
>  b = f (5);
>  return a + b;
> }
> ...
>
> would result in a gimple sequence more or less like this:
> ...
>  # VUSE <.MEMD.1712_4(D)>
>  aD.1707_1 = fD.1704 (5);
>  # VUSE <.MEMD.1712_4(D)>
>  bD.1708_2 = fD.1704 (5);
>  D.1710_3 = aD.1707_1 + bD.1708_2;
>  # VUSE <.MEMD.1712_6>
>  return D.1710_3;
> ...
>
> The value numbering modification I'm proposing would say that a and b have the
> same value, which is not true. I updated the patch to ensure in this case that a
> and b would be seen as different.
> Note that things won't break if the attribute is missing on a function, since
> that just means that the function would have a vdef, and there would be no two
> subsequent function calls with the same vuse.
>
>> - but that is not what
>> matters - because obviously we can still merge the blocks.
>
> Right.
>
>>  That is,
>> we cannot be sure that the side-effects on memory of bla (5) and bla (5)
>> are the same.  But is that not what your value-numbering changes will assert?
>
> In the case of calls in different branches of a control flow statement, we can
> aggressively assume they're the same, since we cannot write a check that would
> start failing.

Apart from the above ... which shows that even the returned values can matter.

> In the case of subsequent calls with side effects, the vuses will never be the same.
>
> In the case of subsequent calls without side effects, but not pure or const, we
> can't assume they have the same result. AFAIK, there's currently no way to
> specify such a function, but the updated patch should be able to handle this.
>
>> (not sure how you achieve this, but it looks like you insert/lookup general
>> calls, thus not pure/const ones
>
> Correct.
>
>> - that could easily lead to miscompiles(?)).
>>

And to increased compile-time (not sure if it matters).

> I have bootstrapped and reg-tested the updated (and the previous) patch on
> x86_64 (ada inclusive), and found no issues.
>
> Can you think of a test-case that fails with the previous or the updated patch?

I see you do not handle

struct S { int i; };
struct S foo (void);
struct S bar (void)
{
  struct S s1, s2;
  if (...)
   s1 = foo ();
  else
   s2 = foo ();

because the calls have a LHS that is not an SSA name.  In general
value-numbering virtual operands will change what further consumers lookup.
So changing

  # MEM_2 = VDEF <MEM_3>
  foo ();
  # VUSE <MEM_2>
  ... = x;

by value-numbering MEM_2 to MEM_3 will make the lookup of ... = x use
MEM_3, not considering a clobber by foo.  This might not actually be an
issue (after all we do this for stores, too).

I see you do not handle

int bar (int);
void baz (int);
void bla (int);
int s;
void
foo (int y)
{
  int a;
  if (y == 6)
    {
      s = 1;
      bla (5);
      a = bar (7);
    }
  else
    {
      s = 1;
      bla (5);
      a = bar (7);
    }
  baz (a);
}

and I don't see how you need the new ref->vdef member - that is also
not handled consistently (not in hashing or comparing, etc.).  Can't you
always use SSA_VAL (vuse)?

Btw, what's

+  if (vdef)
+    VN_INFO (vdef)->use_processed = true;

for?  We arrive here by visiting the VDEF after all and visit_use does already
set the use processed.  Is it that we end up visiting the call twice because
of the lhs SSA name definition and the virtual operand definition?  If so then
visit_use should instead do

FOR_EACH_SSA_DEF_OPERAND (...)
   VN_INFO (def)->use_processed = true;

and defs_to_varying then no longer needs to do that.

> Otherwise, ok for trunk?

Let's give this one more iteration, but I guess the basic idea should work.

Thanks,
Richard.

> Thanks,
> - Tom
>
>> Richard.
>
> 2012-04-24  Tom de Vries  <tom@codesourcery.com>
>
>        PR tree-optimization/51879
>        * tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field.
>        * tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef.
>        Handle case that lhs is NULL_TREE.
>        (visit_use): Handle calls with side-effect using
>        visit_reference_op_call.
>
>        gcc.dg/pr51879.c: New test.
>        gcc.dg/pr51879-2.c: Same.
>        gcc.dg/pr51879-3.c: Same.
>        gcc.dg/pr51879-4.c: Same.
>        gcc.dg/pr51879-6.c: Same.
>
>
Jakub Jelinek - April 25, 2012, 10:09 a.m.
On Wed, Apr 25, 2012 at 11:57:09AM +0200, Richard Guenther wrote:
> void *foo ()
> {
>   return __builtin_return_address (0);
> }
> 
> void *bar (_Bool b)
> {
>   if (b)
>     return foo ();
>   else
>     return foo ();
> }
> 
> int main()
> {
>   if (bar(true) == bar(false))
>     abort ();
> }
> 
> ok ... outside of the scope of standard "C", but we certainly _can_ do this.
> Which would question tail-merging the above at all, of course.

I don't think we guarantee the above, after all, even pure functions may
use __builtin_return_address (0) - it doesn't modify memory, and we happily
remove pure calls, CSE the return values etc.

	Jakub
Tom de Vries - April 25, 2012, 10:18 a.m.
On 25/04/12 12:09, Jakub Jelinek wrote:
> On Wed, Apr 25, 2012 at 11:57:09AM +0200, Richard Guenther wrote:
>> void *foo ()
>> {
>>   return __builtin_return_address (0);
>> }
>>
>> void *bar (_Bool b)
>> {
>>   if (b)
>>     return foo ();
>>   else
>>     return foo ();
>> }
>>
>> int main()
>> {
>>   if (bar(true) == bar(false))
>>     abort ();
>> }
>>
>> ok ... outside of the scope of standard "C", but we certainly _can_ do this.
>> Which would question tail-merging the above at all, of course.
> 
> I don't think we guarantee the above, after all, even pure functions may
> use __builtin_return_address (0) - it doesn't modify memory, and we happily
> remove pure calls, CSE the return values etc.
> 

Jakub,

pure:
- no effects except the return value
- return value depends only on the parameters and/or global variables

AFAIU, given this definition, a pure function cannot use
__builtin_return_address, since it would mean that the return value depends on
something other than parameters and global variables.

Thanks,
- Tom

> 	Jakub

Patch

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c (revision 185028)
+++ gcc/tree-ssa-sccvn.c (working copy)
@@ -2598,27 +2598,44 @@  visit_reference_op_call (tree lhs, gimpl
 {
   bool changed = false;
   struct vn_reference_s vr1;
-  tree result;
+  vn_reference_t vnresult = NULL;
   tree vuse = gimple_vuse (stmt);
+  tree vdef = gimple_vdef (stmt);
+
+  if (vdef)
+    VN_INFO (vdef)->use_processed = true;
 
   vr1.vuse = vuse ? SSA_VAL (vuse) : NULL_TREE;
   vr1.operands = valueize_shared_reference_ops_from_call (stmt);
   vr1.type = gimple_expr_type (stmt);
   vr1.set = 0;
   vr1.hashcode = vn_reference_compute_hash (&vr1);
-  result = vn_reference_lookup_1 (&vr1, NULL);
-  if (result)
+  vn_reference_lookup_1 (&vr1, &vnresult);
+
+  if (vnresult)
     {
-      changed = set_ssa_val_to (lhs, result);
-      if (TREE_CODE (result) == SSA_NAME
-	  && VN_INFO (result)->has_constants)
-	VN_INFO (lhs)->has_constants = true;
+      if (vnresult->vdef)
+	changed |= set_ssa_val_to (vdef, vnresult->vdef);
+
+      if (!vnresult->result && lhs)
+	vnresult->result = lhs;
+
+      if (vnresult->result && lhs)
+	{
+	  changed |= set_ssa_val_to (lhs, vnresult->result);
+
+	  if (VN_INFO (vnresult->result)->has_constants)
+	    VN_INFO (lhs)->has_constants = true;
+	}
     }
   else
     {
       void **slot;
       vn_reference_t vr2;
-      changed = set_ssa_val_to (lhs, lhs);
+      if (vdef)
+	changed |= set_ssa_val_to (vdef, vdef);
+      if (lhs)
+	changed |= set_ssa_val_to (lhs, lhs);
       vr2 = (vn_reference_t) pool_alloc (current_info->references_pool);
       vr2->vuse = vr1.vuse;
       vr2->operands = valueize_refs (create_reference_ops_from_call (stmt));
@@ -2626,6 +2643,7 @@  visit_reference_op_call (tree lhs, gimpl
       vr2->set = vr1.set;
       vr2->hashcode = vr1.hashcode;
       vr2->result = lhs;
+      vr2->vdef = vdef;
       slot = htab_find_slot_with_hash (current_info->references,
 				       vr2, vr2->hashcode, INSERT);
       if (*slot)
@@ -3186,8 +3204,7 @@  visit_use (tree use)
     {
       if (gimple_code (stmt) == GIMPLE_PHI)
 	changed = visit_phi (stmt);
-      else if (!gimple_has_lhs (stmt)
-	       || gimple_has_volatile_ops (stmt))
+      else if (gimple_has_volatile_ops (stmt))
 	changed = defs_to_varying (stmt);
       else if (is_gimple_assign (stmt))
 	{
@@ -3349,34 +3366,42 @@  visit_use (tree use)
 
 	  /* ???  We could try to simplify calls.  */
 
-	  if (stmt_has_constants (stmt)
-	      && TREE_CODE (lhs) == SSA_NAME)
-	    VN_INFO (lhs)->has_constants = true;
-	  else if (TREE_CODE (lhs) == SSA_NAME)
+	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
 	    {
-	      /* We reset expr and constantness here because we may
-		 have been value numbering optimistically, and
-		 iterating. They may become non-constant in this case,
-		 even if they were optimistically constant. */
-	      VN_INFO (lhs)->has_constants = false;
-	      VN_INFO (lhs)->expr = NULL_TREE;
+	      if (stmt_has_constants (stmt))
+		VN_INFO (lhs)->has_constants = true;
+	      else
+		{
+		  /* We reset expr and constantness here because we may
+		     have been value numbering optimistically, and
+		     iterating.  They may become non-constant in this case,
+		     even if they were optimistically constant.  */
+		  VN_INFO (lhs)->has_constants = false;
+		  VN_INFO (lhs)->expr = NULL_TREE;
+		}
+
+	      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+		{
+		  changed = defs_to_varying (stmt);
+		  goto done;
+		}
 	    }
 
-	  if (TREE_CODE (lhs) == SSA_NAME
-	      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
-	    changed = defs_to_varying (stmt);
 	  /* ???  We should handle stores from calls.  */
-	  else if (TREE_CODE (lhs) == SSA_NAME)
-	    {
-	      if (!gimple_call_internal_p (stmt)
-		  && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST))
-		changed = visit_reference_op_call (lhs, stmt);
-	      else
-		changed = defs_to_varying (stmt);
-	    }
+	  if (!gimple_call_internal_p (stmt)
+	      && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)
+		  /* If the call has side effects, subsequent calls won't have
+		     the same incoming vuse, so it's safe to assume
+		     equality.  */
+		  || gimple_has_side_effects (stmt))
+	      && ((lhs && TREE_CODE (lhs) == SSA_NAME)
+		  || (!lhs && gimple_vdef (stmt))))
+	    changed = visit_reference_op_call (lhs, stmt);
 	  else
 	    changed = defs_to_varying (stmt);
 	}
+      else
+	changed = defs_to_varying (stmt);
     }
  done:
   return changed;
Index: gcc/tree-ssa-sccvn.h
===================================================================
--- gcc/tree-ssa-sccvn.h (revision 185028)
+++ gcc/tree-ssa-sccvn.h (working copy)
@@ -110,6 +110,7 @@  typedef struct vn_reference_s
   tree type;
   VEC (vn_reference_op_s, heap) *operands;
   tree result;
+  tree vdef;
 } *vn_reference_t;
 typedef const struct vn_reference_s *const_vn_reference_t;
 
Index: gcc/testsuite/gcc.dg/pr51879-2.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-2.c (revision 0)
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y)
+    baz (bar (7) + 6);
+  else
+    baz (bar (7) + 6);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { scan-tree-dump-times "baz \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-6.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-6.c (revision 0)
@@ -0,0 +1,27 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+
+int bar (int);
+void baz (int);
+void bla (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    {
+      bla (5);
+      a = bar (7);
+    }
+  else
+    {
+      bla (5);
+      a = bar (7);
+    }
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879.c (revision 0)
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    a = bar (7);
+  else
+    a = bar (7);
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-3.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-3.c (revision 0)
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+void
+foo (int y)
+{
+  int a;
+  if (y == 6)
+    a = bar (7) + 6;
+  else
+    a = bar (7) + 6;
+  baz (a);
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */
Index: gcc/testsuite/gcc.dg/pr51879-4.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr51879-4.c (revision 0)
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre" } */
+
+int bar (int);
+void baz (int);
+
+int foo (int y)
+{
+  int a, b;
+  a = bar (7) + 6;
+  b = bar (7) + 6;
+  return a + b;
+}
+
+/* { dg-final { scan-tree-dump-times "bar \\(" 2 "pre"} } */
+/* { dg-final { cleanup-tree-dump "pre" } } */