Message ID | alpine.DEB.2.21.1905120045560.26727@stedding.saclay.inria.fr |
---|---|
State | New |
Headers | show |
Series | malloc cannot alias preexisting pointers | expand |
On Sun, 12 May 2019, Marc Glisse wrote: > this patch lets gcc know that if a pointer existed before the call to malloc, > the result of malloc cannot alias it. This is a bit ad hoc and fragile. A > small improvement would be, when the 2 statements are in the same bb but in > the wrong order, to check if there is any statement in between that might > prevent from reordering them. But that's more complicated, and the patch as > it is already does help. I was also thinking of adding a few more cases: * if ptr1=PHI<x,y> and ptr2 can alias neither x nor y... but we probably need to be careful to avoid a combinatorial explosion. * if ptr1 is a NULL pointer, we cannot dereference it (at least with the right flags), so dereferencing it cannot alias anything. This seems mostly useful as a recursive call with PHI<x,0>. Let me know if this looks like a bad idea.
Marc Glisse <marc.glisse@inria.fr> writes: > Hello, > > this patch lets gcc know that if a pointer existed before the call to > malloc, the result of malloc cannot alias it. This is a bit ad hoc and > fragile. A small improvement would be, when the 2 statements are in the > same bb but in the wrong order, to check if there is any statement in > between that might prevent from reordering them. But that's more > complicated, and the patch as it is already does help. > > I expect people may not like the call to a function from > tree-ssa-loop-niter too much, but it is convenient. And if someone > improves it, they will likely have to rewrite something not quite > equivalent to stmt_dominates_stmt_p. It has linear complexity for statements in the same block though. (reassoc_stmt_dominates_stmt_p avoids that, but relies on uids being up-to-date.) > The testcase uses libstdc++ quite a bit. I thought about putting it in the > libstdc++ testsuite, but it does not know scan-tree-dump, and in the > assembly it may be too late, memcpy may get expanded to something > target-specific. So it is in g++.dg. I could write a more artificial > testcase, but the behavior of std::vector is really what I want to test... > > Bootstrap+regtest on x86_64-pc-linux-gnu. > > 2019-05-13 Marc Glisse <marc.glisse@inria.fr> > > gcc/ > * tree-ssa-loop-niter.c (stmt_dominates_stmt_p): Handle NULL > basic block. > * tree-ssa-alias.c: Include tree-ssa-loop-niter.h. > (ptr_derefs_may_alias_p): Detect malloc and an older pointer. > > gcc/testsuite/ > * g++.dg/tree-ssa/ldist-2.C: New file.
On Sun, 12 May 2019, Richard Sandiford wrote: > Marc Glisse <marc.glisse@inria.fr> writes: >> Hello, >> >> this patch lets gcc know that if a pointer existed before the call to >> malloc, the result of malloc cannot alias it. This is a bit ad hoc and >> fragile. A small improvement would be, when the 2 statements are in the >> same bb but in the wrong order, to check if there is any statement in >> between that might prevent from reordering them. But that's more >> complicated, and the patch as it is already does help. >> >> I expect people may not like the call to a function from >> tree-ssa-loop-niter too much, but it is convenient. And if someone >> improves it, they will likely have to rewrite something not quite >> equivalent to stmt_dominates_stmt_p. > > It has linear complexity for statements in the same block though. Ah, true. I should anyway test that the second statement is malloc (cheaper) before checking for domination, but even then that could be used to create a quadratic behavior :-( > (reassoc_stmt_dominates_stmt_p avoids that, but relies on uids > being up-to-date.) This function is called from all over the place. Unless there is a simple test to check if uids are safe to use (reassoc_in_progress?), that's going to be a problem. I find it surprising that this information is so hard to get to... Stopping stmt_dominates_stmt_p after walking 128 statements doesn't feel like a great solution. But without controlling the pass where this happens, I don't see any good solution. It would be great if that non-aliasing property could be recorded in the points-to information, then I could set it from a pass I control, but I somehow got the impression that it wouldn't work. Maybe I should try to understand PTA better to make sure.
On Sun, May 12, 2019 at 2:51 PM Marc Glisse <marc.glisse@inria.fr> wrote: > > On Sun, 12 May 2019, Richard Sandiford wrote: > > > Marc Glisse <marc.glisse@inria.fr> writes: > >> Hello, > >> > >> this patch lets gcc know that if a pointer existed before the call to > >> malloc, the result of malloc cannot alias it. This is a bit ad hoc and > >> fragile. A small improvement would be, when the 2 statements are in the > >> same bb but in the wrong order, to check if there is any statement in > >> between that might prevent from reordering them. But that's more > >> complicated, and the patch as it is already does help. > >> > >> I expect people may not like the call to a function from > >> tree-ssa-loop-niter too much, but it is convenient. And if someone > >> improves it, they will likely have to rewrite something not quite > >> equivalent to stmt_dominates_stmt_p. > > > > It has linear complexity for statements in the same block though. > > Ah, true. I should anyway test that the second statement is malloc > (cheaper) before checking for domination, but even then that could be used > to create a quadratic behavior :-( I also think the oracle queries shouldn't encompany such expensive pieces... > > (reassoc_stmt_dominates_stmt_p avoids that, but relies on uids > > being up-to-date.) > > This function is called from all over the place. Unless there is a simple > test to check if uids are safe to use (reassoc_in_progress?), that's going > to be a problem. I find it surprising that this information is so hard to > get to... Stopping stmt_dominates_stmt_p after walking 128 statements > doesn't feel like a great solution. But without controlling the pass where > this happens, I don't see any good solution. It would be great if that > non-aliasing property could be recorded in the points-to information, then > I could set it from a pass I control, but I somehow got the impression > that it wouldn't work. Maybe I should try to understand PTA better to make > sure. In princple PTA should know the aliasing cannot happen but possibly the info is either lost or the IL is too obfuscated at the point it gets computed. (hello C++...) So at ldist time I see # PT = null { D.47989 } (escaped, escaped heap) # ALIGN = 8, MISALIGN = 0 # USE = nonlocal null { D.47989 } (escaped, escaped heap) # CLB = nonlocal null { D.47989 } (escaped, escaped heap) _27 = malloc (8192); good. The interesting loop is the following where the allocation PTA is good and _4 intersect because they include ESCAPED. This is because _27 itself escapes and PTA not being flow-sensitive. I don't see an easy way to improve things here but dislike the stmt walking very much. That is, the proper solution is to re-write PTA in some way. <bb 4> [local count: 2866890688]: # PT = nonlocal escaped null # __first_13 = PHI <_4(12), __first_23(13)> # PT = null { D.47989 } (escaped, escaped heap) # ALIGN = 8, MISALIGN = 0 # __cur_12 = PHI <_27(12), __cur_24(13)> # PT = nonlocal escaped null _20 = MEM[(int * const &)__first_13 clique 2 base 0]; MEM[(int * &)__first_13 clique 2 base 0] = 0B; MEM[(struct &)__cur_12 clique 3 base 1] ={v} {CLOBBER}; MEM[(struct _Head_base *)__cur_12 clique 3 base 1]._M_head_impl = _20; # PT = nonlocal escaped null _22 = MEM[(int * &)__first_13]; if (_22 != 0B) goto <bb 5>; [53.47%] else goto <bb 16>; [46.53%] <bb 16> [local count: 1333964241]: goto <bb 6>; [100.00%] <bb 5> [local count: 1532926450]: *_22 ={v} {CLOBBER}; # USE = nonlocal null { D.47989 } (escaped, escaped heap) # CLB = nonlocal null { D.47989 } (escaped, escaped heap) operator delete (_22, 4); <bb 6> [local count: 2866890691]: MEM[(struct &)__first_13] ={v} {CLOBBER}; # PT = nonlocal escaped __first_23 = __first_13 + 8; # PT = { D.47989 } (escaped, escaped heap) # ALIGN = 8, MISALIGN = 0 __cur_24 = __cur_12 + 8; if (_9 == __first_23) goto <bb 7>; [11.00%] else goto <bb 13>; [89.00%] <bb 13> [local count: 2551532717]: goto <bb 4>; [100.00%] > -- > Marc Glisse
On Mon, 13 May 2019, Richard Biener wrote: > On Sun, May 12, 2019 at 2:51 PM Marc Glisse <marc.glisse@inria.fr> wrote: >> >> On Sun, 12 May 2019, Richard Sandiford wrote: >> >>> Marc Glisse <marc.glisse@inria.fr> writes: >>>> Hello, >>>> >>>> this patch lets gcc know that if a pointer existed before the call to >>>> malloc, the result of malloc cannot alias it. This is a bit ad hoc and >>>> fragile. A small improvement would be, when the 2 statements are in the >>>> same bb but in the wrong order, to check if there is any statement in >>>> between that might prevent from reordering them. But that's more >>>> complicated, and the patch as it is already does help. >>>> >>>> I expect people may not like the call to a function from >>>> tree-ssa-loop-niter too much, but it is convenient. And if someone >>>> improves it, they will likely have to rewrite something not quite >>>> equivalent to stmt_dominates_stmt_p. >>> >>> It has linear complexity for statements in the same block though. >> >> Ah, true. I should anyway test that the second statement is malloc >> (cheaper) before checking for domination, but even then that could be used >> to create a quadratic behavior :-( > > I also think the oracle queries shouldn't encompany such expensive pieces... Well, it is only expensive because finding the order of statements in a basic block is expensive. Would it be better if it only did this check if we are in one of a very limited set of passes where statements have reliable UIDs? Maybe that's not enough passes, loop-distribution uses UIDs but I didn't check if it uses them in a way compatible with this use... What if I only check basic-block domination and punt when the statements are in the same basic block? That seems cheap enough, and would already work for the vector testcase. >>> (reassoc_stmt_dominates_stmt_p avoids that, but relies on uids >>> being up-to-date.) >> >> This function is called from all over the place. Unless there is a simple >> test to check if uids are safe to use (reassoc_in_progress?), that's going >> to be a problem. I find it surprising that this information is so hard to >> get to... Stopping stmt_dominates_stmt_p after walking 128 statements >> doesn't feel like a great solution. But without controlling the pass where >> this happens, I don't see any good solution. It would be great if that >> non-aliasing property could be recorded in the points-to information, then >> I could set it from a pass I control, but I somehow got the impression >> that it wouldn't work. Maybe I should try to understand PTA better to make >> sure. > > In princple PTA should know the aliasing cannot happen but possibly > the info is either lost or the IL is too obfuscated at the point it gets > computed. (hello C++...) We don't need much obfuscation for this, a simple C program int g; int*f(int**p){ int*q=*p; int*r=__builtin_malloc(4); *q=1; *r=2; g=*q; return r; } gives q_4 = *p_3(D); r_6 = __builtin_malloc (4); *q_4 = 1; *r_6 = 2; _1 = *q_4; g = _1; return r_6; where we clearly don't manage to prove that q and r don't alias. > So at ldist time I see > > # PT = null { D.47989 } (escaped, escaped heap) > # ALIGN = 8, MISALIGN = 0 > # USE = nonlocal null { D.47989 } (escaped, escaped heap) > # CLB = nonlocal null { D.47989 } (escaped, escaped heap) > _27 = malloc (8192); > > good. The interesting loop is the following where the allocation PTA > is good and _4 intersect because they include ESCAPED. This is > because _27 itself escapes and PTA not being flow-sensitive. I don't > see an easy way to improve things here but dislike the stmt walking > very much. That is, the proper solution is to re-write PTA in some way. I am trying to imagine what that might look like. I imagine that instead of having a single "escaped" set, we would have multiple escaped sets at different points in the function (at most one per VDEF?). Then _4 would only contain escaped3 while heap5 (*_27) only appears in escaped9 and later? It may be tricky to keep a linear-ish complexity with anything more powerful than the current PTA. I don't know what LLVM is doing, but they seem to manage.
On 5/11/19 5:33 PM, Marc Glisse wrote: > Hello, > > this patch lets gcc know that if a pointer existed before the call to > malloc, the result of malloc cannot alias it. This is a bit ad hoc and > fragile. A small improvement would be, when the 2 statements are in the > same bb but in the wrong order, to check if there is any statement in > between that might prevent from reordering them. But that's more > complicated, and the patch as it is already does help. I.e., given the description of attribute malloc: the pointer P returned by the function cannot alias any other pointer valid when the function returns, and moreover no pointers to valid objects occur in any storage addressed by P. doesn't the same guarantee hold for other functions declared with the attribute (so that the same optimization could be applied to them as well)? Martin > > I expect people may not like the call to a function from > tree-ssa-loop-niter too much, but it is convenient. And if someone > improves it, they will likely have to rewrite something not quite > equivalent to stmt_dominates_stmt_p. > > The testcase uses libstdc++ quite a bit. I thought about putting it in > the libstdc++ testsuite, but it does not know scan-tree-dump, and in the > assembly it may be too late, memcpy may get expanded to something > target-specific. So it is in g++.dg. I could write a more artificial > testcase, but the behavior of std::vector is really what I want to test... > > Bootstrap+regtest on x86_64-pc-linux-gnu. > > 2019-05-13 Marc Glisse <marc.glisse@inria.fr> > > gcc/ > * tree-ssa-loop-niter.c (stmt_dominates_stmt_p): Handle NULL > basic block. > * tree-ssa-alias.c: Include tree-ssa-loop-niter.h. > (ptr_derefs_may_alias_p): Detect malloc and an older pointer. > > gcc/testsuite/ > * g++.dg/tree-ssa/ldist-2.C: New file. >
On Mon, May 13, 2019 at 10:36:00AM -0600, Martin Sebor wrote: > On 5/11/19 5:33 PM, Marc Glisse wrote: > > Hello, > > > > this patch lets gcc know that if a pointer existed before the call to > > malloc, the result of malloc cannot alias it. This is a bit ad hoc and > > fragile. A small improvement would be, when the 2 statements are in the > > same bb but in the wrong order, to check if there is any statement in > > between that might prevent from reordering them. But that's more > > complicated, and the patch as it is already does help. > > I.e., given the description of attribute malloc: > > the pointer P returned by the function cannot alias any other > pointer valid when the function returns, and moreover no pointers > to valid objects occur in any storage addressed by P. > > doesn't the same guarantee hold for other functions declared with > the attribute (so that the same optimization could be applied to > them as well)? Doesn't realloc have also the malloc attribute, but the return value can alias preexisting pointers (if no reallocation occurs, just the allocation is extended)? Jakub
On 5/13/19 10:49 AM, Jakub Jelinek wrote: > On Mon, May 13, 2019 at 10:36:00AM -0600, Martin Sebor wrote: >> On 5/11/19 5:33 PM, Marc Glisse wrote: >>> Hello, >>> >>> this patch lets gcc know that if a pointer existed before the call to >>> malloc, the result of malloc cannot alias it. This is a bit ad hoc and >>> fragile. A small improvement would be, when the 2 statements are in the >>> same bb but in the wrong order, to check if there is any statement in >>> between that might prevent from reordering them. But that's more >>> complicated, and the patch as it is already does help. >> >> I.e., given the description of attribute malloc: >> >> the pointer P returned by the function cannot alias any other >> pointer valid when the function returns, and moreover no pointers >> to valid objects occur in any storage addressed by P. >> >> doesn't the same guarantee hold for other functions declared with >> the attribute (so that the same optimization could be applied to >> them as well)? > > Doesn't realloc have also the malloc attribute, but the return value can > alias preexisting pointers (if no reallocation occurs, just the allocation > is extended)? Realloc doesn't have the malloc attribute. Martin
On Mon, 13 May 2019, Martin Sebor wrote: > On 5/11/19 5:33 PM, Marc Glisse wrote: >> Hello, >> >> this patch lets gcc know that if a pointer existed before the call to >> malloc, the result of malloc cannot alias it. This is a bit ad hoc and >> fragile. A small improvement would be, when the 2 statements are in the >> same bb but in the wrong order, to check if there is any statement in >> between that might prevent from reordering them. But that's more >> complicated, and the patch as it is already does help. > > I.e., given the description of attribute malloc: > > the pointer P returned by the function cannot alias any other > pointer valid when the function returns, and moreover no pointers > to valid objects occur in any storage addressed by P. > > doesn't the same guarantee hold for other functions declared with > the attribute (so that the same optimization could be applied to > them as well)? The patch tests DECL_IS_MALLOC, not BUILT_IN_MALLOC. From the failures I got with earlier versions, that seems to include alloca at least, so I would expect it also includes any function with the attribute.
On 5/13/19 11:37 AM, Marc Glisse wrote: > On Mon, 13 May 2019, Martin Sebor wrote: > >> On 5/11/19 5:33 PM, Marc Glisse wrote: >>> Hello, >>> >>> this patch lets gcc know that if a pointer existed before the call to >>> malloc, the result of malloc cannot alias it. This is a bit ad hoc >>> and fragile. A small improvement would be, when the 2 statements are >>> in the same bb but in the wrong order, to check if there is any >>> statement in between that might prevent from reordering them. But >>> that's more complicated, and the patch as it is already does help. >> >> I.e., given the description of attribute malloc: >> >> the pointer P returned by the function cannot alias any other >> pointer valid when the function returns, and moreover no pointers >> to valid objects occur in any storage addressed by P. >> >> doesn't the same guarantee hold for other functions declared with >> the attribute (so that the same optimization could be applied to >> them as well)? > > The patch tests DECL_IS_MALLOC, not BUILT_IN_MALLOC. From the failures I > got with earlier versions, that seems to include alloca at least, so I > would expect it also includes any function with the attribute. Very good. May I suggest to add a test case to verify that? Martin
On Mon, May 13, 2019 at 3:38 PM Marc Glisse <marc.glisse@inria.fr> wrote: > > On Mon, 13 May 2019, Richard Biener wrote: > > > On Sun, May 12, 2019 at 2:51 PM Marc Glisse <marc.glisse@inria.fr> wrote: > >> > >> On Sun, 12 May 2019, Richard Sandiford wrote: > >> > >>> Marc Glisse <marc.glisse@inria.fr> writes: > >>>> Hello, > >>>> > >>>> this patch lets gcc know that if a pointer existed before the call to > >>>> malloc, the result of malloc cannot alias it. This is a bit ad hoc and > >>>> fragile. A small improvement would be, when the 2 statements are in the > >>>> same bb but in the wrong order, to check if there is any statement in > >>>> between that might prevent from reordering them. But that's more > >>>> complicated, and the patch as it is already does help. > >>>> > >>>> I expect people may not like the call to a function from > >>>> tree-ssa-loop-niter too much, but it is convenient. And if someone > >>>> improves it, they will likely have to rewrite something not quite > >>>> equivalent to stmt_dominates_stmt_p. > >>> > >>> It has linear complexity for statements in the same block though. > >> > >> Ah, true. I should anyway test that the second statement is malloc > >> (cheaper) before checking for domination, but even then that could be used > >> to create a quadratic behavior :-( > > > > I also think the oracle queries shouldn't encompany such expensive pieces... > > Well, it is only expensive because finding the order of statements in a > basic block is expensive. Would it be better if it only did this check if > we are in one of a very limited set of passes where statements have > reliable UIDs? Maybe that's not enough passes, loop-distribution uses UIDs > but I didn't check if it uses them in a way compatible with this use... > > What if I only check basic-block domination and punt when the statements > are in the same basic block? That seems cheap enough, and would already > work for the vector testcase. > > >>> (reassoc_stmt_dominates_stmt_p avoids that, but relies on uids > >>> being up-to-date.) > >> > >> This function is called from all over the place. Unless there is a simple > >> test to check if uids are safe to use (reassoc_in_progress?), that's going > >> to be a problem. I find it surprising that this information is so hard to > >> get to... Stopping stmt_dominates_stmt_p after walking 128 statements > >> doesn't feel like a great solution. But without controlling the pass where > >> this happens, I don't see any good solution. It would be great if that > >> non-aliasing property could be recorded in the points-to information, then > >> I could set it from a pass I control, but I somehow got the impression > >> that it wouldn't work. Maybe I should try to understand PTA better to make > >> sure. > > > > In princple PTA should know the aliasing cannot happen but possibly > > the info is either lost or the IL is too obfuscated at the point it gets > > computed. (hello C++...) > > We don't need much obfuscation for this, a simple C program > > int g; > int*f(int**p){ > int*q=*p; > int*r=__builtin_malloc(4); > *q=1; > *r=2; > g=*q; > return r; > } > > gives > > q_4 = *p_3(D); > r_6 = __builtin_malloc (4); > *q_4 = 1; > *r_6 = 2; > _1 = *q_4; > g = _1; > return r_6; > > where we clearly don't manage to prove that q and r don't alias. We can probably improve this one in general from inside PTA by treating escapes through return specially. I wasn't looking too closely at the C++ testcase but I simply assumed the addition to ESCAPED happens through storing the malloc result to memory that PTA cannot prove local. > > So at ldist time I see > > > > # PT = null { D.47989 } (escaped, escaped heap) > > # ALIGN = 8, MISALIGN = 0 > > # USE = nonlocal null { D.47989 } (escaped, escaped heap) > > # CLB = nonlocal null { D.47989 } (escaped, escaped heap) > > _27 = malloc (8192); > > > > good. The interesting loop is the following where the allocation PTA > > is good and _4 intersect because they include ESCAPED. This is > > because _27 itself escapes and PTA not being flow-sensitive. I don't > > see an easy way to improve things here but dislike the stmt walking > > very much. That is, the proper solution is to re-write PTA in some way. > > I am trying to imagine what that might look like. I imagine that instead > of having a single "escaped" set, we would have multiple escaped sets at > different points in the function (at most one per VDEF?). Then _4 would > only contain escaped3 while heap5 (*_27) only appears in escaped9 and > later? It may be tricky to keep a linear-ish complexity with anything more > powerful than the current PTA. I don't know what LLVM is doing, but they > seem to manage. IIRC LLVM uses Steensgard analysis which is flow-sensitive but generally less powerful. We are using Andersen constraint-based analysis because Steensgard was patented back in time (that patent has now expired). One approach to make PTA "more" flow-sensitive is to partition the function into regions (basically represent the function as a series of function calls to its parts). For the issue above there's the long-standing issue of splitting escapes from function return from escapes to functions called to properly represent local variable lifetime. That said, your simpler patch still relies on up-to-date dominators, sth not guaranteed or required previously. We might consider implementing a separate (region-based) analysis adjusting/pruning points-to information with flow-sensitive knowlege. This might even work when done from inside PTA as post-processing. Richard. > > -- > Marc Glisse
On Tue, 14 May 2019, Richard Biener wrote: >>> In princple PTA should know the aliasing cannot happen but possibly >>> the info is either lost or the IL is too obfuscated at the point it gets >>> computed. (hello C++...) >> >> We don't need much obfuscation for this, a simple C program >> >> int g; >> int*f(int**p){ >> int*q=*p; >> int*r=__builtin_malloc(4); >> *q=1; >> *r=2; >> g=*q; >> return r; >> } >> >> gives >> >> q_4 = *p_3(D); >> r_6 = __builtin_malloc (4); >> *q_4 = 1; >> *r_6 = 2; >> _1 = *q_4; >> g = _1; >> return r_6; >> >> where we clearly don't manage to prove that q and r don't alias. > > We can probably improve this one in general from inside PTA > by treating escapes through return specially. I wasn't looking > too closely at the C++ testcase but I simply assumed the > addition to ESCAPED happens through storing the malloc > result to memory that PTA cannot prove local. Yes, that is indeed the case. I only wrote the version with return to simplify, but the vector testcase does store the malloc result in non-local memory, so handling return as a special case wouldn't help it. >>> So at ldist time I see >>> >>> # PT = null { D.47989 } (escaped, escaped heap) >>> # ALIGN = 8, MISALIGN = 0 >>> # USE = nonlocal null { D.47989 } (escaped, escaped heap) >>> # CLB = nonlocal null { D.47989 } (escaped, escaped heap) >>> _27 = malloc (8192); >>> >>> good. The interesting loop is the following where the allocation PTA >>> is good and _4 intersect because they include ESCAPED. This is >>> because _27 itself escapes and PTA not being flow-sensitive. I don't >>> see an easy way to improve things here but dislike the stmt walking >>> very much. That is, the proper solution is to re-write PTA in some way. >> >> I am trying to imagine what that might look like. I imagine that instead >> of having a single "escaped" set, we would have multiple escaped sets at >> different points in the function (at most one per VDEF?). Then _4 would >> only contain escaped3 while heap5 (*_27) only appears in escaped9 and >> later? It may be tricky to keep a linear-ish complexity with anything more >> powerful than the current PTA. I don't know what LLVM is doing, but they >> seem to manage. > > IIRC LLVM uses Steensgard analysis which is flow-sensitive but generally > less powerful. We are using Andersen constraint-based analysis because > Steensgard was patented back in time (that patent has now expired). In https://llvm.org/docs/AliasAnalysis.html they say that Steensgaard’s algorithm is flow-insensitive and context-insensitive, so it might not be it. They run about 5 different alias analysis engines and combine their results to disambiguate as much as they can. > One approach to make PTA "more" flow-sensitive is to partition the function > into regions (basically represent the function as a series of function calls > to its parts). > > For the issue above there's the long-standing issue of splitting escapes > from function return from escapes to functions called to properly represent > local variable lifetime. > > That said, your simpler patch still relies on up-to-date dominators, sth > not guaranteed or required previously. Ah, dom_info_available_p does not mean that the available information is up-to-date? :-( > We might consider implementing a separate (region-based) analysis > adjusting/pruning points-to information with flow-sensitive knowlege. > This might even work when done from inside PTA as post-processing.
On Tue, May 14, 2019 at 4:05 PM Marc Glisse <marc.glisse@inria.fr> wrote: > > On Tue, 14 May 2019, Richard Biener wrote: > > >>> In princple PTA should know the aliasing cannot happen but possibly > >>> the info is either lost or the IL is too obfuscated at the point it gets > >>> computed. (hello C++...) > >> > >> We don't need much obfuscation for this, a simple C program > >> > >> int g; > >> int*f(int**p){ > >> int*q=*p; > >> int*r=__builtin_malloc(4); > >> *q=1; > >> *r=2; > >> g=*q; > >> return r; > >> } > >> > >> gives > >> > >> q_4 = *p_3(D); > >> r_6 = __builtin_malloc (4); > >> *q_4 = 1; > >> *r_6 = 2; > >> _1 = *q_4; > >> g = _1; > >> return r_6; > >> > >> where we clearly don't manage to prove that q and r don't alias. > > > > We can probably improve this one in general from inside PTA > > by treating escapes through return specially. I wasn't looking > > too closely at the C++ testcase but I simply assumed the > > addition to ESCAPED happens through storing the malloc > > result to memory that PTA cannot prove local. > > Yes, that is indeed the case. I only wrote the version with return to > simplify, but the vector testcase does store the malloc result in > non-local memory, so handling return as a special case wouldn't help it. > > >>> So at ldist time I see > >>> > >>> # PT = null { D.47989 } (escaped, escaped heap) > >>> # ALIGN = 8, MISALIGN = 0 > >>> # USE = nonlocal null { D.47989 } (escaped, escaped heap) > >>> # CLB = nonlocal null { D.47989 } (escaped, escaped heap) > >>> _27 = malloc (8192); > >>> > >>> good. The interesting loop is the following where the allocation PTA > >>> is good and _4 intersect because they include ESCAPED. This is > >>> because _27 itself escapes and PTA not being flow-sensitive. I don't > >>> see an easy way to improve things here but dislike the stmt walking > >>> very much. That is, the proper solution is to re-write PTA in some way. > >> > >> I am trying to imagine what that might look like. I imagine that instead > >> of having a single "escaped" set, we would have multiple escaped sets at > >> different points in the function (at most one per VDEF?). Then _4 would > >> only contain escaped3 while heap5 (*_27) only appears in escaped9 and > >> later? It may be tricky to keep a linear-ish complexity with anything more > >> powerful than the current PTA. I don't know what LLVM is doing, but they > >> seem to manage. > > > > IIRC LLVM uses Steensgard analysis which is flow-sensitive but generally > > less powerful. We are using Andersen constraint-based analysis because > > Steensgard was patented back in time (that patent has now expired). > > In https://llvm.org/docs/AliasAnalysis.html they say that Steensgaard’s > algorithm is flow-insensitive and context-insensitive, so it might not be > it. They run about 5 different alias analysis engines and combine their > results to disambiguate as much as they can. > > > One approach to make PTA "more" flow-sensitive is to partition the function > > into regions (basically represent the function as a series of function calls > > to its parts). > > > > For the issue above there's the long-standing issue of splitting escapes > > from function return from escapes to functions called to properly represent > > local variable lifetime. > > > > That said, your simpler patch still relies on up-to-date dominators, sth > > not guaranteed or required previously. > > Ah, dom_info_available_p does not mean that the available information is > up-to-date? :-( Kind-of. A safer test is dom_info_state (cfun, CDI_DOMINATORS) == DOM_OK. I think the in_gimple_form test is superfluous. As you write the heuristic you could as well remove the malloc result points-to set from the others after points-to analysis is finished? One could even devise something like a "negative" live set (all-locals & ~live-locals), pruning the points-to sets of SSA names on the whole CFG considering that int *p; int y; void foo() { int x; *p = &y; if (p != &y) abort (); *p = &x; } computes p to point to both y and x at the point of the comparison. That is, this isn't special to malloc but to all variables that come into scope only after function entry. This is something I would like much more than these kinds of local tricks in the oracle itself. Richard. > > We might consider implementing a separate (region-based) analysis > > adjusting/pruning points-to information with flow-sensitive knowlege. > > This might even work when done from inside PTA as post-processing. > > -- > Marc Glisse
On Wed, 15 May 2019, Richard Biener wrote: > As you write the heuristic you could as well remove the malloc result > points-to set from the others after points-to analysis is finished? Looking at the vector testcase: #include <vector> #include <memory> #include <new> inline void* operator new(std::size_t n){return malloc(n);} inline void operator delete(void*p){free(p);} typedef std::unique_ptr<int> T; typedef std::vector<T> V; void f(V&v){v.reserve(1024);} I get in the alias pass for malloc _24, points-to NULL, points-to vars: { D.48250 } (escaped, escaped heap) and for the pointer I want to say cannot alias _24 _4, points-to non-local, points-to escaped, points-to NULL, points-to vars: { } I don't see what I can remove from where, all the important info is in "escaped" (well in this case I guess "points-to escaped" could be dropped because this is the very first statement and nothing has escaped yet, but that's too specialized). Actually, I don't even understand how I could encode the fact that they cannot alias in the current points-to structures. It seems that it would require at least changing from one "escaped" to several "escaped_XX" (or more radically several PT info per SSA_NAME). > One could even devise something like a "negative" live set > (all-locals & ~live-locals), pruning the points-to sets of SSA names > on the whole CFG considering that > > int *p; > int y; > void foo() > { > int x; > *p = &y; > if (p != &y) abort (); > *p = &x; > } > > computes p to point to both y and x at the point of the comparison. > > That is, this isn't special to malloc but to all variables that come into > scope only after function entry. Indeed, flow sensitivity can help for more than malloc. > This is something I would like much more than these kinds of local > tricks in the oracle itself. That's interesting, but I don't see how it would solve the "escaped" issue. One detail I noticed: in the alias dump, I see _10 = { ESCAPED NONLOCAL } for a POINTER_DIFF_EXPR that cannot encode a pointer. If I change that to get _10 = { } and _6 = { }, for _19 = (long unsigned int) _10; I have _19 = { } as expected, but for _7 = _6 /[ex] 8; I get _7 = { NONLOCAL } same as v where I was expecting _7 = { }. I guess that's because it involves a constant, and constants can be addresses. It feels like the pass is creating a bit of useless work for itself, though that probably doesn't matter...
On Sat, May 18, 2019 at 2:56 PM Marc Glisse <marc.glisse@inria.fr> wrote: > > On Wed, 15 May 2019, Richard Biener wrote: > > > As you write the heuristic you could as well remove the malloc result > > points-to set from the others after points-to analysis is finished? > > Looking at the vector testcase: > > #include <vector> > #include <memory> > #include <new> > inline void* operator new(std::size_t n){return malloc(n);} > inline void operator delete(void*p){free(p);} > typedef std::unique_ptr<int> T; > typedef std::vector<T> V; > void f(V&v){v.reserve(1024);} > > I get in the alias pass for malloc > > _24, points-to NULL, points-to vars: { D.48250 } (escaped, escaped heap) > > and for the pointer I want to say cannot alias _24 > > _4, points-to non-local, points-to escaped, points-to NULL, points-to vars: { } > > I don't see what I can remove from where, all the important info is in > "escaped" (well in this case I guess "points-to escaped" could be dropped > because this is the very first statement and nothing has escaped yet, but > that's too specialized). Actually, I don't even understand how I could > encode the fact that they cannot alias in the current points-to > structures. It seems that it would require at least changing from one > "escaped" to several "escaped_XX" (or more radically several PT info per > SSA_NAME). Note that while 'escaped' is a special variable during points-to solving once we transition to points-to sets as attached to SSA names it is just a placeholder for a set of DECLs (which does indeed save bits in the points-to bitmaps). To prune sth from escaped on a specific SSA name (as you need to do here) you could simply remove the 'points-to-escaped' bit, expand 'escaped' "inline" into the points-to bitmap of the SSA name and prune from the result. Note I have never done statistics on how much having that separate ESCAPED set helps memory-wise, it definitely makes points-to queries more expensive since we eventually have to look into the ESCAPED solution as well. A better data structure for points-to set representation might be nice to have. > > One could even devise something like a "negative" live set > > (all-locals & ~live-locals), pruning the points-to sets of SSA names > > on the whole CFG considering that > > > > int *p; > > int y; > > void foo() > > { > > int x; > > *p = &y; > > if (p != &y) abort (); > > *p = &x; > > } > > > > computes p to point to both y and x at the point of the comparison. > > > > That is, this isn't special to malloc but to all variables that come into > > scope only after function entry. > > Indeed, flow sensitivity can help for more than malloc. > > > This is something I would like much more than these kinds of local > > tricks in the oracle itself. > > That's interesting, but I don't see how it would solve the "escaped" > issue. > > > > One detail I noticed: in the alias dump, I see > _10 = { ESCAPED NONLOCAL } > for a POINTER_DIFF_EXPR that cannot encode a pointer. If I change that to > get _10 = { } and _6 = { }, for > _19 = (long unsigned int) _10; > I have _19 = { } as expected, but for > _7 = _6 /[ex] 8; > I get _7 = { NONLOCAL } same as v > where I was expecting _7 = { }. I guess that's because it involves a > constant, and constants can be addresses. > It feels like the pass is creating a bit of useless work for itself, > though that probably doesn't matter... Yeah, here it's missing special-casing of some operators, neither POINTER_MINUS_EXPR is handled (always creates a non-pointer, aka points-to nothing?), nor *_DIV_EXPR are handled (same points-to set as the LHS I guess, similar for *_MOD_EXPR). Tracking pointers through integers is both good and bad here I guess... I'll test a patch for the above. Richard. > > -- > Marc Glisse
Index: gcc/testsuite/g++.dg/tree-ssa/ldist-2.C =================================================================== --- gcc/testsuite/g++.dg/tree-ssa/ldist-2.C (nonexistent) +++ gcc/testsuite/g++.dg/tree-ssa/ldist-2.C (working copy) @@ -0,0 +1,14 @@ +// { dg-do compile { target c++11 } } +// { dg-options "-O3 -fdump-tree-optimized" } + +#include <vector> +#include <memory> +#include <new> +// Remove those 2 inlines once gcc knows that new/delete are special +inline void* operator new(std::size_t n){return malloc(n);} +inline void operator delete(void*p){free(p);} +typedef std::unique_ptr<int> T; +typedef std::vector<T> V; +void f(V&v){v.reserve(1024);} + +/* { dg-final { scan-tree-dump "memcpy" "optimized" } } */ Index: gcc/tree-ssa-alias.c =================================================================== --- gcc/tree-ssa-alias.c (revision 271097) +++ gcc/tree-ssa-alias.c (working copy) @@ -31,20 +31,21 @@ along with GCC; see the file COPYING3. #include "cgraph.h" #include "tree-pretty-print.h" #include "alias.h" #include "fold-const.h" #include "langhooks.h" #include "dumpfile.h" #include "tree-eh.h" #include "tree-dfa.h" #include "ipa-reference.h" #include "varasm.h" +#include "tree-ssa-loop-niter.h" /* Broad overview of how alias analysis on gimple works: Statements clobbering or using memory are linked through the virtual operand factored use-def chain. The virtual operand is unique per function, its symbol is accessible via gimple_vop (cfun). Virtual operands are used for efficiently walking memory statements in the gimple IL and are useful for things like value-numbering as a generation count for memory references. @@ -284,20 +285,40 @@ ptr_derefs_may_alias_p (tree ptr1, tree || !POINTER_TYPE_P (TREE_TYPE (ptr1)) || !POINTER_TYPE_P (TREE_TYPE (ptr2))) return true; /* We may end up with two empty points-to solutions for two same pointers. In this case we still want to say both pointers alias, so shortcut that here. */ if (ptr1 == ptr2) return true; + /* Memory returned by malloc cannot alias with a pre-existing pointer. + This is extremely crude, the order between the statements may be quite + arbitrary. */ + if (in_gimple_form && dom_info_available_p (cfun, CDI_DOMINATORS)) + { + gimple *def1 = SSA_NAME_DEF_STMT (ptr1); + gimple *def2 = SSA_NAME_DEF_STMT (ptr2); + tree decl; + if (stmt_dominates_stmt_p (def1, def2) + && is_gimple_call (def2) + && (decl = gimple_call_fndecl (def2)) + && DECL_IS_MALLOC (decl)) + return false; + else if (stmt_dominates_stmt_p (def2, def1) + && is_gimple_call (def1) + && (decl = gimple_call_fndecl (def1)) + && DECL_IS_MALLOC (decl)) + return false; + } + /* If we do not have useful points-to information for either pointer we cannot disambiguate anything else. */ pi1 = SSA_NAME_PTR_INFO (ptr1); pi2 = SSA_NAME_PTR_INFO (ptr2); if (!pi1 || !pi2) return true; /* ??? This does not use TBAA to prune decls from the intersection that not both pointers may access. */ return pt_solutions_intersect (&pi1->pt, &pi2->pt); Index: gcc/tree-ssa-loop-niter.c =================================================================== --- gcc/tree-ssa-loop-niter.c (revision 271097) +++ gcc/tree-ssa-loop-niter.c (working copy) @@ -4433,20 +4433,23 @@ stmt_dominates_stmt_p (gimple *s1, gimpl if (gimple_code (s1) == GIMPLE_PHI) return true; for (bsi = gsi_start_bb (bb1); gsi_stmt (bsi) != s2; gsi_next (&bsi)) if (gsi_stmt (bsi) == s1) return true; return false; } + if (!bb2) + return false; + return dominated_by_p (CDI_DOMINATORS, bb2, bb1); } /* Returns true when we can prove that the number of executions of STMT in the loop is at most NITER, according to the bound on the number of executions of the statement NITER_BOUND->stmt recorded in NITER_BOUND and fact that NITER_BOUND->stmt dominate STMT. ??? This code can become quite a CPU hog - we can have many bounds, and large basic block forcing stmt_dominates_stmt_p to be queried