Message ID | 4F230B0F.7080107@mentor.com |
---|---|
State | New |
Headers | show |
On 27/01/12 21:37, Tom de Vries wrote: > On 24/01/12 11:40, Richard Guenther wrote: >> On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries <Tom_deVries@mentor.com> wrote: >>> Richard, >>> Jakub, >>> >>> the following patch fixes PR51879. >>> >>> Consider the following test-case: >>> ... >>> int bar (int); >>> void baz (int); >>> >>> void >>> foo (int y) >>> { >>> int a; >>> if (y == 6) >>> a = bar (7); >>> else >>> a = bar (7); >>> baz (a); >>> } >>> ... >>> >>> after compiling at -02, the representation looks like this before tail-merging: >>> ... >>> # BLOCK 3 freq:1991 >>> # PRED: 2 [19.9%] (true,exec) >>> # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)> >>> # USE = nonlocal >>> # CLB = nonlocal >>> aD.1709_3 = barD.1703 (7); >>> goto <bb 5>; >>> # SUCC: 5 [100.0%] (fallthru,exec) >>> >>> # BLOCK 4 freq:8009 >>> # PRED: 2 [80.1%] (false,exec) >>> # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)> >>> # USE = nonlocal >>> # CLB = nonlocal >>> aD.1709_4 = barD.1703 (7); >>> # SUCC: 5 [100.0%] (fallthru,exec) >>> >>> # BLOCK 5 freq:10000 >>> # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) >>> # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)> >>> # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)> >>> # .MEMD.1714_9 = VDEF <.MEMD.1714_5> >>> # USE = nonlocal >>> # CLB = nonlocal >>> bazD.1705 (aD.1709_1); >>> # VUSE <.MEMD.1714_9> >>> return; >>> ... >>> >>> the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8 >>> to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5. >>> >>> The patch makes sure non-const/pure call results (gimple_vdef and >>> gimple_call_lhs) are properly value numbered. >>> >>> Bootstrapped and reg-tested on x86_64. >>> >>> ok for stage1? >> >> The following cannot really work: >> >> @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl >> result = vn_reference_lookup_1 (&vr1, NULL); >> if (result) >> { >> - changed = set_ssa_val_to (lhs, result); >> + tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result)); >> + if (vdef) >> + changed |= set_ssa_val_to (vdef, result_vdef); >> + changed |= set_ssa_val_to (lhs, result); >> >> because 'result' may be not an SSA name. It might also not have >> a proper definition statement (if VN_INFO (result)->needs_insertion >> is true). So you at least need to guard things properly. >> > > Right. And that also doesn't work if the function is without lhs, such as in the > new test-case pr51879-6.c. > > I fixed this by storing both lhs and vdef, such that I don't have to derive > the vdef from the lhs. > >> (On a side-note - I _did_ want to remove value-numbering virtual operands >> at some point ...) >> > > Doing so willl hurt performance of tail-merging in its current form. > OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that > value numbering as used in tail-merging has its limitations too. > Do you have any ideas how to address that one? > >> @@ -3359,8 +3366,10 @@ visit_use (tree use) >> /* ??? We should handle stores from calls. */ >> else if (TREE_CODE (lhs) == SSA_NAME) >> { >> + tree vuse = gimple_vuse (stmt); >> if (!gimple_call_internal_p (stmt) >> - && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)) >> + && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST) >> + || (vuse && SSA_VAL (vuse) != VN_TOP))) >> changed = visit_reference_op_call (lhs, stmt); >> else >> changed = defs_to_varying (stmt); >> >> ... exactly because of the issue that a stmt has multiple defs. Btw, >> vuse should have been visited here or be part of our SCC, so, why do >> you need this check? >> > > Removed now, that was a workaround for a bug in an earlier version of the patch, > that I didn't need anymore. > > Bootstrapped and reg-tested on x86_64. > > OK for stage1? > Richard, quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html: ... I think these fixes hint at that we should use "structural" equality as fallback if value-numbering doesn't equate two stmt effects. Thus, treat two stmts with exactly the same operands and flags as equal and using value-numbering to canonicalize operands (when they are SSA names) for that comparison, or use VN entirely if there are no side-effects on the stmt. Changing value-numbering of virtual operands, even if it looks correct in the simple cases you change, doesn't look like a general solution for the missed tail merging opportunities. ... The test-case pr51879-6.c shows a case where improving value numbering will help tail-merging, but structural equality comparison not: ... # BLOCK 3 freq:1991 # PRED: 2 [19.9%] (true,exec) # .MEMD.1717_7 = VDEF <.MEMD.1717_6(D)> # USE = nonlocal # CLB = nonlocal blaD.1708 (5); # .MEMD.1717_8 = VDEF <.MEMD.1717_7> # USE = nonlocal # CLB = nonlocal aD.1712_3 = barD.1704 (7); goto <bb 5>; # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 4 freq:8009 # PRED: 2 [80.1%] (false,exec) # .MEMD.1717_9 = VDEF <.MEMD.1717_6(D)> # USE = nonlocal # CLB = nonlocal blaD.1708 (5); # .MEMD.1717_10 = VDEF <.MEMD.1717_9> # USE = nonlocal # CLB = nonlocal aD.1712_4 = barD.1704 (7); # SUCC: 5 [100.0%] (fallthru,exec) # BLOCK 5 freq:10000 # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) # aD.1712_1 = PHI <aD.1712_3(3), aD.1712_4(4)> # .MEMD.1717_5 = PHI <.MEMD.1717_8(3), .MEMD.1717_10(4)> # .MEMD.1717_11 = VDEF <.MEMD.1717_5> # USE = nonlocal # CLB = nonlocal bazD.1706 (aD.1712_1); # VUSE <.MEMD.1717_11> return; ... by structurally comparing the gimples in both blocks, we can see that these are equal. What we can't see, is that aD.1712_3 and aD.1712_4 have the same value. For that, we need value numbering to number the vuses the same. And if we get value numbering to number these values equal, we don't need the structural comparison. In this case structural comparison cannot be used as fallback for value-numbering. So, ok for trunk? And if not ok, do you have a suggestion of how to deal with this otherwise? Thanks, - Tom > Thanks, > - Tom > >> Thanks, >> Richard. >> > > 2012-01-27 Tom de Vries <tom@codesourcery.com> > > PR tree-optimization/51879 > * tree-ssa-sccvn.h (struct vn_reference_s): Add vdef field. > * tree-ssa-sccvn.c (visit_reference_op_call): Handle gimple_vdef. > Handle case that lhs is NULL_TREE. > (visit_use): Handle non-pure/const calls and calls without result using > visit_reference_op_call. > > gcc.dg/pr51879.c: New test. > gcc.dg/pr51879-2.c: Same. > gcc.dg/pr51879-3.c: Same. > gcc.dg/pr51879-4.c: Same. > gcc.dg/pr51879-6.c: Same. >
On Sat, Apr 14, 2012 at 9:26 AM, Tom de Vries <Tom_deVries@mentor.com> wrote: > On 27/01/12 21:37, Tom de Vries wrote: >> On 24/01/12 11:40, Richard Guenther wrote: >>> On Mon, Jan 23, 2012 at 10:27 PM, Tom de Vries <Tom_deVries@mentor.com> wrote: >>>> Richard, >>>> Jakub, >>>> >>>> the following patch fixes PR51879. >>>> >>>> Consider the following test-case: >>>> ... >>>> int bar (int); >>>> void baz (int); >>>> >>>> void >>>> foo (int y) >>>> { >>>> int a; >>>> if (y == 6) >>>> a = bar (7); >>>> else >>>> a = bar (7); >>>> baz (a); >>>> } >>>> ... >>>> >>>> after compiling at -02, the representation looks like this before tail-merging: >>>> ... >>>> # BLOCK 3 freq:1991 >>>> # PRED: 2 [19.9%] (true,exec) >>>> # .MEMD.1714_7 = VDEF <.MEMD.1714_6(D)> >>>> # USE = nonlocal >>>> # CLB = nonlocal >>>> aD.1709_3 = barD.1703 (7); >>>> goto <bb 5>; >>>> # SUCC: 5 [100.0%] (fallthru,exec) >>>> >>>> # BLOCK 4 freq:8009 >>>> # PRED: 2 [80.1%] (false,exec) >>>> # .MEMD.1714_8 = VDEF <.MEMD.1714_6(D)> >>>> # USE = nonlocal >>>> # CLB = nonlocal >>>> aD.1709_4 = barD.1703 (7); >>>> # SUCC: 5 [100.0%] (fallthru,exec) >>>> >>>> # BLOCK 5 freq:10000 >>>> # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) >>>> # aD.1709_1 = PHI <aD.1709_3(3), aD.1709_4(4)> >>>> # .MEMD.1714_5 = PHI <.MEMD.1714_7(3), .MEMD.1714_8(4)> >>>> # .MEMD.1714_9 = VDEF <.MEMD.1714_5> >>>> # USE = nonlocal >>>> # CLB = nonlocal >>>> bazD.1705 (aD.1709_1); >>>> # VUSE <.MEMD.1714_9> >>>> return; >>>> ... >>>> >>>> the patch allows aD.1709_4 to be value numbered to aD.1709_3, and .MEMD.1714_8 >>>> to .MEMD.1714_7, which enables tail-merging of blocks 4 and 5. >>>> >>>> The patch makes sure non-const/pure call results (gimple_vdef and >>>> gimple_call_lhs) are properly value numbered. >>>> >>>> Bootstrapped and reg-tested on x86_64. >>>> >>>> ok for stage1? >>> >>> The following cannot really work: >>> >>> @@ -2600,7 +2601,11 @@ visit_reference_op_call (tree lhs, gimpl >>> result = vn_reference_lookup_1 (&vr1, NULL); >>> if (result) >>> { >>> - changed = set_ssa_val_to (lhs, result); >>> + tree result_vdef = gimple_vdef (SSA_NAME_DEF_STMT (result)); >>> + if (vdef) >>> + changed |= set_ssa_val_to (vdef, result_vdef); >>> + changed |= set_ssa_val_to (lhs, result); >>> >>> because 'result' may be not an SSA name. It might also not have >>> a proper definition statement (if VN_INFO (result)->needs_insertion >>> is true). So you at least need to guard things properly. >>> >> >> Right. And that also doesn't work if the function is without lhs, such as in the >> new test-case pr51879-6.c. >> >> I fixed this by storing both lhs and vdef, such that I don't have to derive >> the vdef from the lhs. >> >>> (On a side-note - I _did_ want to remove value-numbering virtual operands >>> at some point ...) >>> >> >> Doing so willl hurt performance of tail-merging in its current form. >> OTOH, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51964#c0 shows that >> value numbering as used in tail-merging has its limitations too. >> Do you have any ideas how to address that one? >> >>> @@ -3359,8 +3366,10 @@ visit_use (tree use) >>> /* ??? We should handle stores from calls. */ >>> else if (TREE_CODE (lhs) == SSA_NAME) >>> { >>> + tree vuse = gimple_vuse (stmt); >>> if (!gimple_call_internal_p (stmt) >>> - && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)) >>> + && (gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST) >>> + || (vuse && SSA_VAL (vuse) != VN_TOP))) >>> changed = visit_reference_op_call (lhs, stmt); >>> else >>> changed = defs_to_varying (stmt); >>> >>> ... exactly because of the issue that a stmt has multiple defs. Btw, >>> vuse should have been visited here or be part of our SCC, so, why do >>> you need this check? >>> >> >> Removed now, that was a workaround for a bug in an earlier version of the patch, >> that I didn't need anymore. >> >> Bootstrapped and reg-tested on x86_64. >> >> OK for stage1? >> > > Richard, > > quoting you in http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00618.html: > ... > I think these fixes hint at that we should > use "structural" equality as fallback if value-numbering doesn't equate > two stmt effects. Thus, treat two stmts with exactly the same operands > and flags as equal and using value-numbering to canonicalize operands > (when they are SSA names) for that comparison, or use VN entirely > if there are no side-effects on the stmt. > > Changing value-numbering of virtual operands, even if it looks correct in the > simple cases you change, doesn't look like a general solution for the missed > tail merging opportunities. > ... > > The test-case pr51879-6.c shows a case where improving value numbering will help > tail-merging, but structural equality comparison not: > ... > # BLOCK 3 freq:1991 > # PRED: 2 [19.9%] (true,exec) > # .MEMD.1717_7 = VDEF <.MEMD.1717_6(D)> > # USE = nonlocal > # CLB = nonlocal > blaD.1708 (5); > # .MEMD.1717_8 = VDEF <.MEMD.1717_7> > # USE = nonlocal > # CLB = nonlocal > aD.1712_3 = barD.1704 (7); > goto <bb 5>; > # SUCC: 5 [100.0%] (fallthru,exec) > > # BLOCK 4 freq:8009 > # PRED: 2 [80.1%] (false,exec) > # .MEMD.1717_9 = VDEF <.MEMD.1717_6(D)> > # USE = nonlocal > # CLB = nonlocal > blaD.1708 (5); > # .MEMD.1717_10 = VDEF <.MEMD.1717_9> > # USE = nonlocal > # CLB = nonlocal > aD.1712_4 = barD.1704 (7); > # SUCC: 5 [100.0%] (fallthru,exec) > > # BLOCK 5 freq:10000 > # PRED: 3 [100.0%] (fallthru,exec) 4 [100.0%] (fallthru,exec) > # aD.1712_1 = PHI <aD.1712_3(3), aD.1712_4(4)> > # .MEMD.1717_5 = PHI <.MEMD.1717_8(3), .MEMD.1717_10(4)> > # .MEMD.1717_11 = VDEF <.MEMD.1717_5> > # USE = nonlocal > # CLB = nonlocal > bazD.1706 (aD.1712_1); > # VUSE <.MEMD.1717_11> > return; > ... > by structurally comparing the gimples in both blocks, we can see that these are > equal. > What we can't see, is that aD.1712_3 and aD.1712_4 have the same value. For > that, we need value numbering to number the vuses the same. And if we get value > numbering to number these values equal, we don't need the structural comparison. > > In this case structural comparison cannot be used as fallback for > value-numbering. So, ok for trunk? > > And if not ok, do you have a suggestion of how to deal with this otherwise? Hmm. I'm not sure we can conclude that they have the same value! +int bar (int); +void baz (int); +void bla (int); + +void +foo (int y) +{ + int a; + if (y == 6) + { + bla (5); + a = bar (7); + } + else + { + bla (5); + a = bar (7); + } + baz (a); +} I can implement bla as void bla(int) { a = random (); } thus a would not have the same value on both paths - but that is not what matters - because obviously we can still merge the blocks. That is, we cannot be sure that the side-effects on memory of bla (5) and bla (5) are the same. But is that not what your value-numbering changes will assert? (not sure how you achieve this, but it looks like you insert/lookup general calls, thus not pure/const ones - that could easily lead to miscompiles(?)). Richard. Richard.
Index: gcc/tree-ssa-sccvn.c =================================================================== --- gcc/tree-ssa-sccvn.c (revision 183325) +++ gcc/tree-ssa-sccvn.c (working copy) @@ -2589,27 +2589,44 @@ visit_reference_op_call (tree lhs, gimpl { bool changed = false; struct vn_reference_s vr1; - tree result; + vn_reference_t vnresult = NULL; tree vuse = gimple_vuse (stmt); + tree vdef = gimple_vdef (stmt); + + if (vdef) + VN_INFO (vdef)->use_processed = true; vr1.vuse = vuse ? SSA_VAL (vuse) : NULL_TREE; vr1.operands = valueize_shared_reference_ops_from_call (stmt); vr1.type = gimple_expr_type (stmt); vr1.set = 0; vr1.hashcode = vn_reference_compute_hash (&vr1); - result = vn_reference_lookup_1 (&vr1, NULL); - if (result) + vn_reference_lookup_1 (&vr1, &vnresult); + + if (vnresult) { - changed = set_ssa_val_to (lhs, result); - if (TREE_CODE (result) == SSA_NAME - && VN_INFO (result)->has_constants) - VN_INFO (lhs)->has_constants = true; + if (vnresult->vdef) + changed |= set_ssa_val_to (vdef, vnresult->vdef); + + if (!vnresult->result && lhs) + vnresult->result = lhs; + + if (vnresult->result && lhs) + { + changed |= set_ssa_val_to (lhs, vnresult->result); + + if (VN_INFO (vnresult->result)->has_constants) + VN_INFO (lhs)->has_constants = true; + } } else { void **slot; vn_reference_t vr2; - changed = set_ssa_val_to (lhs, lhs); + if (vdef) + changed |= set_ssa_val_to (vdef, vdef); + if (lhs) + changed |= set_ssa_val_to (lhs, lhs); vr2 = (vn_reference_t) pool_alloc (current_info->references_pool); vr2->vuse = vr1.vuse; vr2->operands = valueize_refs (create_reference_ops_from_call (stmt)); @@ -2617,6 +2634,7 @@ visit_reference_op_call (tree lhs, gimpl vr2->set = vr1.set; vr2->hashcode = vr1.hashcode; vr2->result = lhs; + vr2->vdef = vdef; slot = htab_find_slot_with_hash (current_info->references, vr2, vr2->hashcode, INSERT); if (*slot) @@ -3177,8 +3195,7 @@ visit_use (tree use) { if (gimple_code (stmt) == GIMPLE_PHI) changed = visit_phi (stmt); - else if (!gimple_has_lhs (stmt) - || gimple_has_volatile_ops (stmt)) + else if (gimple_has_volatile_ops (stmt)) changed = defs_to_varying (stmt); else if (is_gimple_assign (stmt)) { @@ -3340,34 +3357,37 @@ visit_use (tree use) /* ??? We could try to simplify calls. */ - if (stmt_has_constants (stmt) - && TREE_CODE (lhs) == SSA_NAME) - VN_INFO (lhs)->has_constants = true; - else if (TREE_CODE (lhs) == SSA_NAME) + if (lhs && TREE_CODE (lhs) == SSA_NAME) { - /* We reset expr and constantness here because we may - have been value numbering optimistically, and - iterating. They may become non-constant in this case, - even if they were optimistically constant. */ - VN_INFO (lhs)->has_constants = false; - VN_INFO (lhs)->expr = NULL_TREE; + if (stmt_has_constants (stmt)) + VN_INFO (lhs)->has_constants = true; + else + { + /* We reset expr and constantness here because we may + have been value numbering optimistically, and + iterating. They may become non-constant in this case, + even if they were optimistically constant. */ + VN_INFO (lhs)->has_constants = false; + VN_INFO (lhs)->expr = NULL_TREE; + } + + if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs)) + { + changed = defs_to_varying (stmt); + goto done; + } } - if (TREE_CODE (lhs) == SSA_NAME - && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs)) - changed = defs_to_varying (stmt); /* ??? We should handle stores from calls. */ - else if (TREE_CODE (lhs) == SSA_NAME) - { - if (!gimple_call_internal_p (stmt) - && gimple_call_flags (stmt) & (ECF_PURE | ECF_CONST)) - changed = visit_reference_op_call (lhs, stmt); - else - changed = defs_to_varying (stmt); - } + if (!gimple_call_internal_p (stmt) + && ((lhs && TREE_CODE (lhs) == SSA_NAME) + || (!lhs && gimple_vdef (stmt)))) + changed = visit_reference_op_call (lhs, stmt); else changed = defs_to_varying (stmt); } + else + changed = defs_to_varying (stmt); } done: return changed; Index: gcc/tree-ssa-sccvn.h =================================================================== --- gcc/tree-ssa-sccvn.h (revision 183325) +++ gcc/tree-ssa-sccvn.h (working copy) @@ -110,6 +110,7 @@ typedef struct vn_reference_s tree type; VEC (vn_reference_op_s, heap) *operands; tree result; + tree vdef; } *vn_reference_t; typedef const struct vn_reference_s *const_vn_reference_t; Index: gcc/testsuite/gcc.dg/pr51879-2.c =================================================================== --- /dev/null (new file) +++ gcc/testsuite/gcc.dg/pr51879-2.c (revision 0) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-pre" } */ + +int bar (int); +void baz (int); + +void +foo (int y) +{ + int a; + if (y) + baz (bar (7) + 6); + else + baz (bar (7) + 6); +} + +/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */ +/* { dg-final { scan-tree-dump-times "baz \\(" 1 "pre"} } */ +/* { dg-final { cleanup-tree-dump "pre" } } */ Index: gcc/testsuite/gcc.dg/pr51879-6.c =================================================================== --- /dev/null (new file) +++ gcc/testsuite/gcc.dg/pr51879-6.c (revision 0) @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-pre" } */ + + +int bar (int); +void baz (int); +void bla (int); + +void +foo (int y) +{ + int a; + if (y == 6) + { + bla (5); + a = bar (7); + } + else + { + bla (5); + a = bar (7); + } + baz (a); +} + +/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */ +/* { dg-final { cleanup-tree-dump "pre" } } */ Index: gcc/testsuite/gcc.dg/pr51879.c =================================================================== --- /dev/null (new file) +++ gcc/testsuite/gcc.dg/pr51879.c (revision 0) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-pre" } */ + +int bar (int); +void baz (int); + +void +foo (int y) +{ + int a; + if (y == 6) + a = bar (7); + else + a = bar (7); + baz (a); +} + +/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */ +/* { dg-final { cleanup-tree-dump "pre" } } */ Index: gcc/testsuite/gcc.dg/pr51879-3.c =================================================================== --- /dev/null (new file) +++ gcc/testsuite/gcc.dg/pr51879-3.c (revision 0) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-pre" } */ + +int bar (int); +void baz (int); + +void +foo (int y) +{ + int a; + if (y == 6) + a = bar (7) + 6; + else + a = bar (7) + 6; + baz (a); +} + +/* { dg-final { scan-tree-dump-times "bar \\(" 1 "pre"} } */ +/* { dg-final { cleanup-tree-dump "pre" } } */ Index: gcc/testsuite/gcc.dg/pr51879-4.c =================================================================== --- /dev/null (new file) +++ gcc/testsuite/gcc.dg/pr51879-4.c (revision 0) @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-pre" } */ + +int bar (int); +void baz (int); + +int foo (int y) +{ + int a, b; + a = bar (7) + 6; + b = bar (7) + 6; + return a + b; +} + +/* { dg-final { scan-tree-dump-times "bar \\(" 2 "pre"} } */ +/* { dg-final { cleanup-tree-dump "pre" } } */