diff mbox series

malloc cannot alias preexisting pointers

Message ID alpine.DEB.2.21.1905120045560.26727@stedding.saclay.inria.fr
State New
Headers show
Series malloc cannot alias preexisting pointers | expand

Commit Message

Marc Glisse May 11, 2019, 11:33 p.m. UTC
Hello,

this patch lets gcc know that if a pointer existed before the call to 
malloc, the result of malloc cannot alias it. This is a bit ad hoc and 
fragile. A small improvement would be, when the 2 statements are in the 
same bb but in the wrong order, to check if there is any statement in 
between that might prevent from reordering them. But that's more 
complicated, and the patch as it is already does help.

I expect people may not like the call to a function from 
tree-ssa-loop-niter too much, but it is convenient. And if someone 
improves it, they will likely have to rewrite something not quite 
equivalent to stmt_dominates_stmt_p.

The testcase uses libstdc++ quite a bit. I thought about putting it in the 
libstdc++ testsuite, but it does not know scan-tree-dump, and in the 
assembly it may be too late, memcpy may get expanded to something 
target-specific. So it is in g++.dg. I could write a more artificial 
testcase, but the behavior of std::vector is really what I want to test...

Bootstrap+regtest on x86_64-pc-linux-gnu.

2019-05-13  Marc Glisse  <marc.glisse@inria.fr>

gcc/
 	* tree-ssa-loop-niter.c (stmt_dominates_stmt_p): Handle NULL
 	basic block.
 	* tree-ssa-alias.c: Include tree-ssa-loop-niter.h.
 	(ptr_derefs_may_alias_p): Detect malloc and an older pointer.

gcc/testsuite/
 	* g++.dg/tree-ssa/ldist-2.C: New file.

Comments

Marc Glisse May 12, 2019, 12:13 a.m. UTC | #1
On Sun, 12 May 2019, Marc Glisse wrote:

> this patch lets gcc know that if a pointer existed before the call to malloc, 
> the result of malloc cannot alias it. This is a bit ad hoc and fragile. A 
> small improvement would be, when the 2 statements are in the same bb but in 
> the wrong order, to check if there is any statement in between that might 
> prevent from reordering them. But that's more complicated, and the patch as 
> it is already does help.

I was also thinking of adding a few more cases:

* if ptr1=PHI<x,y> and ptr2 can alias neither x nor y... but we probably 
need to be careful to avoid a combinatorial explosion.

* if ptr1 is a NULL pointer, we cannot dereference it (at least with the 
right flags), so dereferencing it cannot alias anything. This seems mostly 
useful as a recursive call with PHI<x,0>.

Let me know if this looks like a bad idea.
Richard Sandiford May 12, 2019, 11:33 a.m. UTC | #2
Marc Glisse <marc.glisse@inria.fr> writes:
> Hello,
>
> this patch lets gcc know that if a pointer existed before the call to 
> malloc, the result of malloc cannot alias it. This is a bit ad hoc and 
> fragile. A small improvement would be, when the 2 statements are in the 
> same bb but in the wrong order, to check if there is any statement in 
> between that might prevent from reordering them. But that's more 
> complicated, and the patch as it is already does help.
>
> I expect people may not like the call to a function from 
> tree-ssa-loop-niter too much, but it is convenient. And if someone 
> improves it, they will likely have to rewrite something not quite 
> equivalent to stmt_dominates_stmt_p.

It has linear complexity for statements in the same block though.
(reassoc_stmt_dominates_stmt_p avoids that, but relies on uids
being up-to-date.)

> The testcase uses libstdc++ quite a bit. I thought about putting it in the 
> libstdc++ testsuite, but it does not know scan-tree-dump, and in the 
> assembly it may be too late, memcpy may get expanded to something 
> target-specific. So it is in g++.dg. I could write a more artificial 
> testcase, but the behavior of std::vector is really what I want to test...
>
> Bootstrap+regtest on x86_64-pc-linux-gnu.
>
> 2019-05-13  Marc Glisse  <marc.glisse@inria.fr>
>
> gcc/
>  	* tree-ssa-loop-niter.c (stmt_dominates_stmt_p): Handle NULL
>  	basic block.
>  	* tree-ssa-alias.c: Include tree-ssa-loop-niter.h.
>  	(ptr_derefs_may_alias_p): Detect malloc and an older pointer.
>
> gcc/testsuite/
>  	* g++.dg/tree-ssa/ldist-2.C: New file.
Marc Glisse May 12, 2019, 12:51 p.m. UTC | #3
On Sun, 12 May 2019, Richard Sandiford wrote:

> Marc Glisse <marc.glisse@inria.fr> writes:
>> Hello,
>>
>> this patch lets gcc know that if a pointer existed before the call to
>> malloc, the result of malloc cannot alias it. This is a bit ad hoc and
>> fragile. A small improvement would be, when the 2 statements are in the
>> same bb but in the wrong order, to check if there is any statement in
>> between that might prevent from reordering them. But that's more
>> complicated, and the patch as it is already does help.
>>
>> I expect people may not like the call to a function from
>> tree-ssa-loop-niter too much, but it is convenient. And if someone
>> improves it, they will likely have to rewrite something not quite
>> equivalent to stmt_dominates_stmt_p.
>
> It has linear complexity for statements in the same block though.

Ah, true. I should anyway test that the second statement is malloc 
(cheaper) before checking for domination, but even then that could be used 
to create a quadratic behavior :-(

> (reassoc_stmt_dominates_stmt_p avoids that, but relies on uids
> being up-to-date.)

This function is called from all over the place. Unless there is a simple 
test to check if uids are safe to use (reassoc_in_progress?), that's going 
to be a problem. I find it surprising that this information is so hard to 
get to... Stopping stmt_dominates_stmt_p after walking 128 statements 
doesn't feel like a great solution. But without controlling the pass where 
this happens, I don't see any good solution. It would be great if that 
non-aliasing property could be recorded in the points-to information, then 
I could set it from a pass I control, but I somehow got the impression 
that it wouldn't work. Maybe I should try to understand PTA better to make 
sure.
Richard Biener May 13, 2019, 10:14 a.m. UTC | #4
On Sun, May 12, 2019 at 2:51 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Sun, 12 May 2019, Richard Sandiford wrote:
>
> > Marc Glisse <marc.glisse@inria.fr> writes:
> >> Hello,
> >>
> >> this patch lets gcc know that if a pointer existed before the call to
> >> malloc, the result of malloc cannot alias it. This is a bit ad hoc and
> >> fragile. A small improvement would be, when the 2 statements are in the
> >> same bb but in the wrong order, to check if there is any statement in
> >> between that might prevent from reordering them. But that's more
> >> complicated, and the patch as it is already does help.
> >>
> >> I expect people may not like the call to a function from
> >> tree-ssa-loop-niter too much, but it is convenient. And if someone
> >> improves it, they will likely have to rewrite something not quite
> >> equivalent to stmt_dominates_stmt_p.
> >
> > It has linear complexity for statements in the same block though.
>
> Ah, true. I should anyway test that the second statement is malloc
> (cheaper) before checking for domination, but even then that could be used
> to create a quadratic behavior :-(

I also think the oracle queries shouldn't encompany such expensive pieces...

> > (reassoc_stmt_dominates_stmt_p avoids that, but relies on uids
> > being up-to-date.)
>
> This function is called from all over the place. Unless there is a simple
> test to check if uids are safe to use (reassoc_in_progress?), that's going
> to be a problem. I find it surprising that this information is so hard to
> get to... Stopping stmt_dominates_stmt_p after walking 128 statements
> doesn't feel like a great solution. But without controlling the pass where
> this happens, I don't see any good solution. It would be great if that
> non-aliasing property could be recorded in the points-to information, then
> I could set it from a pass I control, but I somehow got the impression
> that it wouldn't work. Maybe I should try to understand PTA better to make
> sure.

In princple PTA should know the aliasing cannot happen but possibly
the info is either lost or the IL is too obfuscated at the point it gets
computed.  (hello C++...)

So at ldist time I see

  # PT = null { D.47989 } (escaped, escaped heap)
  # ALIGN = 8, MISALIGN = 0
  # USE = nonlocal null { D.47989 } (escaped, escaped heap)
  # CLB = nonlocal null { D.47989 } (escaped, escaped heap)
  _27 = malloc (8192);

good.  The interesting loop is the following where the allocation PTA
is good and _4 intersect because they include ESCAPED.  This is
because _27 itself escapes and PTA not being flow-sensitive.  I don't
see an easy way to improve things here but dislike the stmt walking
very much.  That is, the proper solution is to re-write PTA in some way.

  <bb 4> [local count: 2866890688]:
  # PT = nonlocal escaped null
  # __first_13 = PHI <_4(12), __first_23(13)>
  # PT = null { D.47989 } (escaped, escaped heap)
  # ALIGN = 8, MISALIGN = 0
  # __cur_12 = PHI <_27(12), __cur_24(13)>
  # PT = nonlocal escaped null
  _20 = MEM[(int * const &)__first_13 clique 2 base 0];
  MEM[(int * &)__first_13 clique 2 base 0] = 0B;
  MEM[(struct  &)__cur_12 clique 3 base 1] ={v} {CLOBBER};
  MEM[(struct _Head_base *)__cur_12 clique 3 base 1]._M_head_impl = _20;
  # PT = nonlocal escaped null
  _22 = MEM[(int * &)__first_13];
  if (_22 != 0B)
    goto <bb 5>; [53.47%]
  else
    goto <bb 16>; [46.53%]

  <bb 16> [local count: 1333964241]:
  goto <bb 6>; [100.00%]

  <bb 5> [local count: 1532926450]:
  *_22 ={v} {CLOBBER};
  # USE = nonlocal null { D.47989 } (escaped, escaped heap)
  # CLB = nonlocal null { D.47989 } (escaped, escaped heap)
  operator delete (_22, 4);

  <bb 6> [local count: 2866890691]:
  MEM[(struct  &)__first_13] ={v} {CLOBBER};
  # PT = nonlocal escaped
  __first_23 = __first_13 + 8;
  # PT = { D.47989 } (escaped, escaped heap)
  # ALIGN = 8, MISALIGN = 0
  __cur_24 = __cur_12 + 8;
  if (_9 == __first_23)
    goto <bb 7>; [11.00%]
  else
    goto <bb 13>; [89.00%]

  <bb 13> [local count: 2551532717]:
  goto <bb 4>; [100.00%]


> --
> Marc Glisse
Marc Glisse May 13, 2019, 1:38 p.m. UTC | #5
On Mon, 13 May 2019, Richard Biener wrote:

> On Sun, May 12, 2019 at 2:51 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>>
>> On Sun, 12 May 2019, Richard Sandiford wrote:
>>
>>> Marc Glisse <marc.glisse@inria.fr> writes:
>>>> Hello,
>>>>
>>>> this patch lets gcc know that if a pointer existed before the call to
>>>> malloc, the result of malloc cannot alias it. This is a bit ad hoc and
>>>> fragile. A small improvement would be, when the 2 statements are in the
>>>> same bb but in the wrong order, to check if there is any statement in
>>>> between that might prevent from reordering them. But that's more
>>>> complicated, and the patch as it is already does help.
>>>>
>>>> I expect people may not like the call to a function from
>>>> tree-ssa-loop-niter too much, but it is convenient. And if someone
>>>> improves it, they will likely have to rewrite something not quite
>>>> equivalent to stmt_dominates_stmt_p.
>>>
>>> It has linear complexity for statements in the same block though.
>>
>> Ah, true. I should anyway test that the second statement is malloc
>> (cheaper) before checking for domination, but even then that could be used
>> to create a quadratic behavior :-(
>
> I also think the oracle queries shouldn't encompany such expensive pieces...

Well, it is only expensive because finding the order of statements in a 
basic block is expensive. Would it be better if it only did this check if 
we are in one of a very limited set of passes where statements have 
reliable UIDs? Maybe that's not enough passes, loop-distribution uses UIDs 
but I didn't check if it uses them in a way compatible with this use...

What if I only check basic-block domination and punt when the statements 
are in the same basic block? That seems cheap enough, and would already 
work for the vector testcase.

>>> (reassoc_stmt_dominates_stmt_p avoids that, but relies on uids
>>> being up-to-date.)
>>
>> This function is called from all over the place. Unless there is a simple
>> test to check if uids are safe to use (reassoc_in_progress?), that's going
>> to be a problem. I find it surprising that this information is so hard to
>> get to... Stopping stmt_dominates_stmt_p after walking 128 statements
>> doesn't feel like a great solution. But without controlling the pass where
>> this happens, I don't see any good solution. It would be great if that
>> non-aliasing property could be recorded in the points-to information, then
>> I could set it from a pass I control, but I somehow got the impression
>> that it wouldn't work. Maybe I should try to understand PTA better to make
>> sure.
>
> In princple PTA should know the aliasing cannot happen but possibly
> the info is either lost or the IL is too obfuscated at the point it gets
> computed.  (hello C++...)

We don't need much obfuscation for this, a simple C program

int g;
int*f(int**p){
   int*q=*p;
   int*r=__builtin_malloc(4);
   *q=1;
   *r=2;
   g=*q;
   return r;
}

gives

   q_4 = *p_3(D);
   r_6 = __builtin_malloc (4);
   *q_4 = 1;
   *r_6 = 2;
   _1 = *q_4;
   g = _1;
   return r_6;

where we clearly don't manage to prove that q and r don't alias.

> So at ldist time I see
>
>  # PT = null { D.47989 } (escaped, escaped heap)
>  # ALIGN = 8, MISALIGN = 0
>  # USE = nonlocal null { D.47989 } (escaped, escaped heap)
>  # CLB = nonlocal null { D.47989 } (escaped, escaped heap)
>  _27 = malloc (8192);
>
> good.  The interesting loop is the following where the allocation PTA
> is good and _4 intersect because they include ESCAPED.  This is
> because _27 itself escapes and PTA not being flow-sensitive.  I don't
> see an easy way to improve things here but dislike the stmt walking
> very much.  That is, the proper solution is to re-write PTA in some way.

I am trying to imagine what that might look like. I imagine that instead 
of having a single "escaped" set, we would have multiple escaped sets at 
different points in the function (at most one per VDEF?). Then _4 would 
only contain escaped3 while heap5 (*_27) only appears in escaped9 and 
later? It may be tricky to keep a linear-ish complexity with anything more 
powerful than the current PTA. I don't know what LLVM is doing, but they 
seem to manage.
Martin Sebor May 13, 2019, 4:36 p.m. UTC | #6
On 5/11/19 5:33 PM, Marc Glisse wrote:
> Hello,
> 
> this patch lets gcc know that if a pointer existed before the call to 
> malloc, the result of malloc cannot alias it. This is a bit ad hoc and 
> fragile. A small improvement would be, when the 2 statements are in the 
> same bb but in the wrong order, to check if there is any statement in 
> between that might prevent from reordering them. But that's more 
> complicated, and the patch as it is already does help.

I.e., given the description of attribute malloc:

   the pointer P returned by the function cannot alias any other
   pointer valid when the function returns, and moreover no pointers
   to valid objects occur in any storage addressed by P.

doesn't the same guarantee hold for other functions declared with
the attribute (so that the same optimization could be applied to
them as well)?

Martin

> 
> I expect people may not like the call to a function from 
> tree-ssa-loop-niter too much, but it is convenient. And if someone 
> improves it, they will likely have to rewrite something not quite 
> equivalent to stmt_dominates_stmt_p.
> 
> The testcase uses libstdc++ quite a bit. I thought about putting it in 
> the libstdc++ testsuite, but it does not know scan-tree-dump, and in the 
> assembly it may be too late, memcpy may get expanded to something 
> target-specific. So it is in g++.dg. I could write a more artificial 
> testcase, but the behavior of std::vector is really what I want to test...
> 
> Bootstrap+regtest on x86_64-pc-linux-gnu.
> 
> 2019-05-13  Marc Glisse  <marc.glisse@inria.fr>
> 
> gcc/
>      * tree-ssa-loop-niter.c (stmt_dominates_stmt_p): Handle NULL
>      basic block.
>      * tree-ssa-alias.c: Include tree-ssa-loop-niter.h.
>      (ptr_derefs_may_alias_p): Detect malloc and an older pointer.
> 
> gcc/testsuite/
>      * g++.dg/tree-ssa/ldist-2.C: New file.
>
Jakub Jelinek May 13, 2019, 4:49 p.m. UTC | #7
On Mon, May 13, 2019 at 10:36:00AM -0600, Martin Sebor wrote:
> On 5/11/19 5:33 PM, Marc Glisse wrote:
> > Hello,
> > 
> > this patch lets gcc know that if a pointer existed before the call to
> > malloc, the result of malloc cannot alias it. This is a bit ad hoc and
> > fragile. A small improvement would be, when the 2 statements are in the
> > same bb but in the wrong order, to check if there is any statement in
> > between that might prevent from reordering them. But that's more
> > complicated, and the patch as it is already does help.
> 
> I.e., given the description of attribute malloc:
> 
>   the pointer P returned by the function cannot alias any other
>   pointer valid when the function returns, and moreover no pointers
>   to valid objects occur in any storage addressed by P.
> 
> doesn't the same guarantee hold for other functions declared with
> the attribute (so that the same optimization could be applied to
> them as well)?

Doesn't realloc have also the malloc attribute, but the return value can
alias preexisting pointers (if no reallocation occurs, just the allocation
is extended)?

	Jakub
Martin Sebor May 13, 2019, 5:19 p.m. UTC | #8
On 5/13/19 10:49 AM, Jakub Jelinek wrote:
> On Mon, May 13, 2019 at 10:36:00AM -0600, Martin Sebor wrote:
>> On 5/11/19 5:33 PM, Marc Glisse wrote:
>>> Hello,
>>>
>>> this patch lets gcc know that if a pointer existed before the call to
>>> malloc, the result of malloc cannot alias it. This is a bit ad hoc and
>>> fragile. A small improvement would be, when the 2 statements are in the
>>> same bb but in the wrong order, to check if there is any statement in
>>> between that might prevent from reordering them. But that's more
>>> complicated, and the patch as it is already does help.
>>
>> I.e., given the description of attribute malloc:
>>
>>    the pointer P returned by the function cannot alias any other
>>    pointer valid when the function returns, and moreover no pointers
>>    to valid objects occur in any storage addressed by P.
>>
>> doesn't the same guarantee hold for other functions declared with
>> the attribute (so that the same optimization could be applied to
>> them as well)?
> 
> Doesn't realloc have also the malloc attribute, but the return value can
> alias preexisting pointers (if no reallocation occurs, just the allocation
> is extended)?

Realloc doesn't have the malloc attribute.

Martin
Marc Glisse May 13, 2019, 5:37 p.m. UTC | #9
On Mon, 13 May 2019, Martin Sebor wrote:

> On 5/11/19 5:33 PM, Marc Glisse wrote:
>> Hello,
>> 
>> this patch lets gcc know that if a pointer existed before the call to 
>> malloc, the result of malloc cannot alias it. This is a bit ad hoc and 
>> fragile. A small improvement would be, when the 2 statements are in the 
>> same bb but in the wrong order, to check if there is any statement in 
>> between that might prevent from reordering them. But that's more 
>> complicated, and the patch as it is already does help.
>
> I.e., given the description of attribute malloc:
>
>  the pointer P returned by the function cannot alias any other
>  pointer valid when the function returns, and moreover no pointers
>  to valid objects occur in any storage addressed by P.
>
> doesn't the same guarantee hold for other functions declared with
> the attribute (so that the same optimization could be applied to
> them as well)?

The patch tests DECL_IS_MALLOC, not BUILT_IN_MALLOC. From the failures I 
got with earlier versions, that seems to include alloca at least, so I 
would expect it also includes any function with the attribute.
Martin Sebor May 13, 2019, 6:33 p.m. UTC | #10
On 5/13/19 11:37 AM, Marc Glisse wrote:
> On Mon, 13 May 2019, Martin Sebor wrote:
> 
>> On 5/11/19 5:33 PM, Marc Glisse wrote:
>>> Hello,
>>>
>>> this patch lets gcc know that if a pointer existed before the call to 
>>> malloc, the result of malloc cannot alias it. This is a bit ad hoc 
>>> and fragile. A small improvement would be, when the 2 statements are 
>>> in the same bb but in the wrong order, to check if there is any 
>>> statement in between that might prevent from reordering them. But 
>>> that's more complicated, and the patch as it is already does help.
>>
>> I.e., given the description of attribute malloc:
>>
>>  the pointer P returned by the function cannot alias any other
>>  pointer valid when the function returns, and moreover no pointers
>>  to valid objects occur in any storage addressed by P.
>>
>> doesn't the same guarantee hold for other functions declared with
>> the attribute (so that the same optimization could be applied to
>> them as well)?
> 
> The patch tests DECL_IS_MALLOC, not BUILT_IN_MALLOC. From the failures I 
> got with earlier versions, that seems to include alloca at least, so I 
> would expect it also includes any function with the attribute.

Very good.  May I suggest to add a test case to verify that?

Martin
Richard Biener May 14, 2019, 7:10 a.m. UTC | #11
On Mon, May 13, 2019 at 3:38 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Mon, 13 May 2019, Richard Biener wrote:
>
> > On Sun, May 12, 2019 at 2:51 PM Marc Glisse <marc.glisse@inria.fr> wrote:
> >>
> >> On Sun, 12 May 2019, Richard Sandiford wrote:
> >>
> >>> Marc Glisse <marc.glisse@inria.fr> writes:
> >>>> Hello,
> >>>>
> >>>> this patch lets gcc know that if a pointer existed before the call to
> >>>> malloc, the result of malloc cannot alias it. This is a bit ad hoc and
> >>>> fragile. A small improvement would be, when the 2 statements are in the
> >>>> same bb but in the wrong order, to check if there is any statement in
> >>>> between that might prevent from reordering them. But that's more
> >>>> complicated, and the patch as it is already does help.
> >>>>
> >>>> I expect people may not like the call to a function from
> >>>> tree-ssa-loop-niter too much, but it is convenient. And if someone
> >>>> improves it, they will likely have to rewrite something not quite
> >>>> equivalent to stmt_dominates_stmt_p.
> >>>
> >>> It has linear complexity for statements in the same block though.
> >>
> >> Ah, true. I should anyway test that the second statement is malloc
> >> (cheaper) before checking for domination, but even then that could be used
> >> to create a quadratic behavior :-(
> >
> > I also think the oracle queries shouldn't encompany such expensive pieces...
>
> Well, it is only expensive because finding the order of statements in a
> basic block is expensive. Would it be better if it only did this check if
> we are in one of a very limited set of passes where statements have
> reliable UIDs? Maybe that's not enough passes, loop-distribution uses UIDs
> but I didn't check if it uses them in a way compatible with this use...
>
> What if I only check basic-block domination and punt when the statements
> are in the same basic block? That seems cheap enough, and would already
> work for the vector testcase.
>
> >>> (reassoc_stmt_dominates_stmt_p avoids that, but relies on uids
> >>> being up-to-date.)
> >>
> >> This function is called from all over the place. Unless there is a simple
> >> test to check if uids are safe to use (reassoc_in_progress?), that's going
> >> to be a problem. I find it surprising that this information is so hard to
> >> get to... Stopping stmt_dominates_stmt_p after walking 128 statements
> >> doesn't feel like a great solution. But without controlling the pass where
> >> this happens, I don't see any good solution. It would be great if that
> >> non-aliasing property could be recorded in the points-to information, then
> >> I could set it from a pass I control, but I somehow got the impression
> >> that it wouldn't work. Maybe I should try to understand PTA better to make
> >> sure.
> >
> > In princple PTA should know the aliasing cannot happen but possibly
> > the info is either lost or the IL is too obfuscated at the point it gets
> > computed.  (hello C++...)
>
> We don't need much obfuscation for this, a simple C program
>
> int g;
> int*f(int**p){
>    int*q=*p;
>    int*r=__builtin_malloc(4);
>    *q=1;
>    *r=2;
>    g=*q;
>    return r;
> }
>
> gives
>
>    q_4 = *p_3(D);
>    r_6 = __builtin_malloc (4);
>    *q_4 = 1;
>    *r_6 = 2;
>    _1 = *q_4;
>    g = _1;
>    return r_6;
>
> where we clearly don't manage to prove that q and r don't alias.

We can probably improve this one in general from inside PTA
by treating escapes through return specially.  I wasn't looking
too closely at the C++ testcase but I simply assumed the
addition to ESCAPED happens through storing the malloc
result to memory that PTA cannot prove local.

> > So at ldist time I see
> >
> >  # PT = null { D.47989 } (escaped, escaped heap)
> >  # ALIGN = 8, MISALIGN = 0
> >  # USE = nonlocal null { D.47989 } (escaped, escaped heap)
> >  # CLB = nonlocal null { D.47989 } (escaped, escaped heap)
> >  _27 = malloc (8192);
> >
> > good.  The interesting loop is the following where the allocation PTA
> > is good and _4 intersect because they include ESCAPED.  This is
> > because _27 itself escapes and PTA not being flow-sensitive.  I don't
> > see an easy way to improve things here but dislike the stmt walking
> > very much.  That is, the proper solution is to re-write PTA in some way.
>
> I am trying to imagine what that might look like. I imagine that instead
> of having a single "escaped" set, we would have multiple escaped sets at
> different points in the function (at most one per VDEF?). Then _4 would
> only contain escaped3 while heap5 (*_27) only appears in escaped9 and
> later? It may be tricky to keep a linear-ish complexity with anything more
> powerful than the current PTA. I don't know what LLVM is doing, but they
> seem to manage.

IIRC LLVM uses Steensgard analysis which is flow-sensitive but generally
less powerful.  We are using Andersen constraint-based analysis because
Steensgard was patented back in time (that patent has now expired).

One approach to make PTA "more" flow-sensitive is to partition the function
into regions (basically represent the function as a series of function calls
to its parts).

For the issue above there's the long-standing issue of splitting escapes
from function return from escapes to functions called to properly represent
local variable lifetime.

That said, your simpler patch still relies on up-to-date dominators, sth
not guaranteed or required previously.

We might consider implementing a separate (region-based) analysis
adjusting/pruning points-to information with flow-sensitive knowlege.
This might even work when done from inside PTA as post-processing.

Richard.

>
> --
> Marc Glisse
Marc Glisse May 14, 2019, 2:05 p.m. UTC | #12
On Tue, 14 May 2019, Richard Biener wrote:

>>> In princple PTA should know the aliasing cannot happen but possibly
>>> the info is either lost or the IL is too obfuscated at the point it gets
>>> computed.  (hello C++...)
>>
>> We don't need much obfuscation for this, a simple C program
>>
>> int g;
>> int*f(int**p){
>>    int*q=*p;
>>    int*r=__builtin_malloc(4);
>>    *q=1;
>>    *r=2;
>>    g=*q;
>>    return r;
>> }
>>
>> gives
>>
>>    q_4 = *p_3(D);
>>    r_6 = __builtin_malloc (4);
>>    *q_4 = 1;
>>    *r_6 = 2;
>>    _1 = *q_4;
>>    g = _1;
>>    return r_6;
>>
>> where we clearly don't manage to prove that q and r don't alias.
>
> We can probably improve this one in general from inside PTA
> by treating escapes through return specially.  I wasn't looking
> too closely at the C++ testcase but I simply assumed the
> addition to ESCAPED happens through storing the malloc
> result to memory that PTA cannot prove local.

Yes, that is indeed the case. I only wrote the version with return to 
simplify, but the vector testcase does store the malloc result in 
non-local memory, so handling return as a special case wouldn't help it.

>>> So at ldist time I see
>>>
>>>  # PT = null { D.47989 } (escaped, escaped heap)
>>>  # ALIGN = 8, MISALIGN = 0
>>>  # USE = nonlocal null { D.47989 } (escaped, escaped heap)
>>>  # CLB = nonlocal null { D.47989 } (escaped, escaped heap)
>>>  _27 = malloc (8192);
>>>
>>> good.  The interesting loop is the following where the allocation PTA
>>> is good and _4 intersect because they include ESCAPED.  This is
>>> because _27 itself escapes and PTA not being flow-sensitive.  I don't
>>> see an easy way to improve things here but dislike the stmt walking
>>> very much.  That is, the proper solution is to re-write PTA in some way.
>>
>> I am trying to imagine what that might look like. I imagine that instead
>> of having a single "escaped" set, we would have multiple escaped sets at
>> different points in the function (at most one per VDEF?). Then _4 would
>> only contain escaped3 while heap5 (*_27) only appears in escaped9 and
>> later? It may be tricky to keep a linear-ish complexity with anything more
>> powerful than the current PTA. I don't know what LLVM is doing, but they
>> seem to manage.
>
> IIRC LLVM uses Steensgard analysis which is flow-sensitive but generally
> less powerful.  We are using Andersen constraint-based analysis because
> Steensgard was patented back in time (that patent has now expired).

In https://llvm.org/docs/AliasAnalysis.html they say that Steensgaard’s 
algorithm is flow-insensitive and context-insensitive, so it might not be 
it. They run about 5 different alias analysis engines and combine their 
results to disambiguate as much as they can.

> One approach to make PTA "more" flow-sensitive is to partition the function
> into regions (basically represent the function as a series of function calls
> to its parts).
>
> For the issue above there's the long-standing issue of splitting escapes
> from function return from escapes to functions called to properly represent
> local variable lifetime.
>
> That said, your simpler patch still relies on up-to-date dominators, sth
> not guaranteed or required previously.

Ah, dom_info_available_p does not mean that the available information is 
up-to-date? :-(

> We might consider implementing a separate (region-based) analysis
> adjusting/pruning points-to information with flow-sensitive knowlege.
> This might even work when done from inside PTA as post-processing.
Richard Biener May 15, 2019, 11:48 a.m. UTC | #13
On Tue, May 14, 2019 at 4:05 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Tue, 14 May 2019, Richard Biener wrote:
>
> >>> In princple PTA should know the aliasing cannot happen but possibly
> >>> the info is either lost or the IL is too obfuscated at the point it gets
> >>> computed.  (hello C++...)
> >>
> >> We don't need much obfuscation for this, a simple C program
> >>
> >> int g;
> >> int*f(int**p){
> >>    int*q=*p;
> >>    int*r=__builtin_malloc(4);
> >>    *q=1;
> >>    *r=2;
> >>    g=*q;
> >>    return r;
> >> }
> >>
> >> gives
> >>
> >>    q_4 = *p_3(D);
> >>    r_6 = __builtin_malloc (4);
> >>    *q_4 = 1;
> >>    *r_6 = 2;
> >>    _1 = *q_4;
> >>    g = _1;
> >>    return r_6;
> >>
> >> where we clearly don't manage to prove that q and r don't alias.
> >
> > We can probably improve this one in general from inside PTA
> > by treating escapes through return specially.  I wasn't looking
> > too closely at the C++ testcase but I simply assumed the
> > addition to ESCAPED happens through storing the malloc
> > result to memory that PTA cannot prove local.
>
> Yes, that is indeed the case. I only wrote the version with return to
> simplify, but the vector testcase does store the malloc result in
> non-local memory, so handling return as a special case wouldn't help it.
>
> >>> So at ldist time I see
> >>>
> >>>  # PT = null { D.47989 } (escaped, escaped heap)
> >>>  # ALIGN = 8, MISALIGN = 0
> >>>  # USE = nonlocal null { D.47989 } (escaped, escaped heap)
> >>>  # CLB = nonlocal null { D.47989 } (escaped, escaped heap)
> >>>  _27 = malloc (8192);
> >>>
> >>> good.  The interesting loop is the following where the allocation PTA
> >>> is good and _4 intersect because they include ESCAPED.  This is
> >>> because _27 itself escapes and PTA not being flow-sensitive.  I don't
> >>> see an easy way to improve things here but dislike the stmt walking
> >>> very much.  That is, the proper solution is to re-write PTA in some way.
> >>
> >> I am trying to imagine what that might look like. I imagine that instead
> >> of having a single "escaped" set, we would have multiple escaped sets at
> >> different points in the function (at most one per VDEF?). Then _4 would
> >> only contain escaped3 while heap5 (*_27) only appears in escaped9 and
> >> later? It may be tricky to keep a linear-ish complexity with anything more
> >> powerful than the current PTA. I don't know what LLVM is doing, but they
> >> seem to manage.
> >
> > IIRC LLVM uses Steensgard analysis which is flow-sensitive but generally
> > less powerful.  We are using Andersen constraint-based analysis because
> > Steensgard was patented back in time (that patent has now expired).
>
> In https://llvm.org/docs/AliasAnalysis.html they say that Steensgaard’s
> algorithm is flow-insensitive and context-insensitive, so it might not be
> it. They run about 5 different alias analysis engines and combine their
> results to disambiguate as much as they can.
>
> > One approach to make PTA "more" flow-sensitive is to partition the function
> > into regions (basically represent the function as a series of function calls
> > to its parts).
> >
> > For the issue above there's the long-standing issue of splitting escapes
> > from function return from escapes to functions called to properly represent
> > local variable lifetime.
> >
> > That said, your simpler patch still relies on up-to-date dominators, sth
> > not guaranteed or required previously.
>
> Ah, dom_info_available_p does not mean that the available information is
> up-to-date? :-(

Kind-of.  A safer test is dom_info_state (cfun, CDI_DOMINATORS) == DOM_OK.
I think the in_gimple_form test is superfluous.

As you write the heuristic you could as well remove the malloc result
points-to set from the others after points-to analysis is finished?

One could even devise something like a "negative" live set
(all-locals & ~live-locals), pruning the points-to sets of SSA names
on the whole CFG considering that

int *p;
int y;
void foo()
{
   int x;
   *p = &y;
   if (p != &y) abort ();
   *p = &x;
}

computes p to point to both y and x at the point of the comparison.

That is, this isn't special to malloc but to all variables that come into
scope only after function entry.

This is something I would like much more than these kinds of local
tricks in the oracle itself.

Richard.


> > We might consider implementing a separate (region-based) analysis
> > adjusting/pruning points-to information with flow-sensitive knowlege.
> > This might even work when done from inside PTA as post-processing.
>
> --
> Marc Glisse
Marc Glisse May 18, 2019, 12:55 p.m. UTC | #14
On Wed, 15 May 2019, Richard Biener wrote:

> As you write the heuristic you could as well remove the malloc result
> points-to set from the others after points-to analysis is finished?

Looking at the vector testcase:

#include <vector>
#include <memory>
#include <new>
inline void* operator new(std::size_t n){return malloc(n);}
inline void operator delete(void*p){free(p);}
typedef std::unique_ptr<int> T;
typedef std::vector<T> V;
void f(V&v){v.reserve(1024);}

I get in the alias pass for malloc

_24, points-to NULL, points-to vars: { D.48250 } (escaped, escaped heap)

and for the pointer I want to say cannot alias _24

_4, points-to non-local, points-to escaped, points-to NULL, points-to vars: { }

I don't see what I can remove from where, all the important info is in 
"escaped" (well in this case I guess "points-to escaped" could be dropped 
because this is the very first statement and nothing has escaped yet, but 
that's too specialized). Actually, I don't even understand how I could 
encode the fact that they cannot alias in the current points-to 
structures. It seems that it would require at least changing from one 
"escaped" to several "escaped_XX" (or more radically several PT info per 
SSA_NAME).

> One could even devise something like a "negative" live set
> (all-locals & ~live-locals), pruning the points-to sets of SSA names
> on the whole CFG considering that
>
> int *p;
> int y;
> void foo()
> {
>   int x;
>   *p = &y;
>   if (p != &y) abort ();
>   *p = &x;
> }
>
> computes p to point to both y and x at the point of the comparison.
>
> That is, this isn't special to malloc but to all variables that come into
> scope only after function entry.

Indeed, flow sensitivity can help for more than malloc.

> This is something I would like much more than these kinds of local
> tricks in the oracle itself.

That's interesting, but I don't see how it would solve the "escaped" 
issue.



One detail I noticed: in the alias dump, I see
_10 = { ESCAPED NONLOCAL }
for a POINTER_DIFF_EXPR that cannot encode a pointer. If I change that to 
get _10 = { } and _6 = { }, for
   _19 = (long unsigned int) _10;
I have _19 = { } as expected, but for
   _7 = _6 /[ex] 8;
I get _7 = { NONLOCAL } same as v
where I was expecting _7 = { }. I guess that's because it involves a 
constant, and constants can be addresses.
It feels like the pass is creating a bit of useless work for itself, 
though that probably doesn't matter...
Richard Biener May 20, 2019, 7:44 a.m. UTC | #15
On Sat, May 18, 2019 at 2:56 PM Marc Glisse <marc.glisse@inria.fr> wrote:
>
> On Wed, 15 May 2019, Richard Biener wrote:
>
> > As you write the heuristic you could as well remove the malloc result
> > points-to set from the others after points-to analysis is finished?
>
> Looking at the vector testcase:
>
> #include <vector>
> #include <memory>
> #include <new>
> inline void* operator new(std::size_t n){return malloc(n);}
> inline void operator delete(void*p){free(p);}
> typedef std::unique_ptr<int> T;
> typedef std::vector<T> V;
> void f(V&v){v.reserve(1024);}
>
> I get in the alias pass for malloc
>
> _24, points-to NULL, points-to vars: { D.48250 } (escaped, escaped heap)
>
> and for the pointer I want to say cannot alias _24
>
> _4, points-to non-local, points-to escaped, points-to NULL, points-to vars: { }
>
> I don't see what I can remove from where, all the important info is in
> "escaped" (well in this case I guess "points-to escaped" could be dropped
> because this is the very first statement and nothing has escaped yet, but
> that's too specialized). Actually, I don't even understand how I could
> encode the fact that they cannot alias in the current points-to
> structures. It seems that it would require at least changing from one
> "escaped" to several "escaped_XX" (or more radically several PT info per
> SSA_NAME).

Note that while 'escaped' is a special variable during points-to solving
once we transition to points-to sets as attached to SSA names it is
just a placeholder for a set of DECLs (which does indeed save bits
in the points-to bitmaps).  To prune sth from escaped on a specific
SSA name (as you need to do here) you could simply remove
the 'points-to-escaped' bit, expand 'escaped' "inline" into the points-to
bitmap of the SSA name and prune from the result.

Note I have never done statistics on how much having that separate
ESCAPED set helps memory-wise, it definitely makes points-to
queries more expensive since we eventually have to look into the
ESCAPED solution as well.  A better data structure for points-to
set representation might be nice to have.

> > One could even devise something like a "negative" live set
> > (all-locals & ~live-locals), pruning the points-to sets of SSA names
> > on the whole CFG considering that
> >
> > int *p;
> > int y;
> > void foo()
> > {
> >   int x;
> >   *p = &y;
> >   if (p != &y) abort ();
> >   *p = &x;
> > }
> >
> > computes p to point to both y and x at the point of the comparison.
> >
> > That is, this isn't special to malloc but to all variables that come into
> > scope only after function entry.
>
> Indeed, flow sensitivity can help for more than malloc.
>
> > This is something I would like much more than these kinds of local
> > tricks in the oracle itself.
>
> That's interesting, but I don't see how it would solve the "escaped"
> issue.
>
>
>
> One detail I noticed: in the alias dump, I see
> _10 = { ESCAPED NONLOCAL }
> for a POINTER_DIFF_EXPR that cannot encode a pointer. If I change that to
> get _10 = { } and _6 = { }, for
>    _19 = (long unsigned int) _10;
> I have _19 = { } as expected, but for
>    _7 = _6 /[ex] 8;
> I get _7 = { NONLOCAL } same as v
> where I was expecting _7 = { }. I guess that's because it involves a
> constant, and constants can be addresses.
> It feels like the pass is creating a bit of useless work for itself,
> though that probably doesn't matter...

Yeah, here it's missing special-casing of some operators, neither
POINTER_MINUS_EXPR is handled (always creates a non-pointer,
aka points-to nothing?), nor *_DIV_EXPR are handled (same
points-to set as the LHS I guess, similar for *_MOD_EXPR).
Tracking pointers through integers is both good and bad here
I guess...

I'll test a patch for the above.

Richard.


>
> --
> Marc Glisse
diff mbox series

Patch

Index: gcc/testsuite/g++.dg/tree-ssa/ldist-2.C
===================================================================
--- gcc/testsuite/g++.dg/tree-ssa/ldist-2.C	(nonexistent)
+++ gcc/testsuite/g++.dg/tree-ssa/ldist-2.C	(working copy)
@@ -0,0 +1,14 @@ 
+// { dg-do compile { target c++11 } }
+// { dg-options "-O3 -fdump-tree-optimized" }
+
+#include <vector>
+#include <memory>
+#include <new>
+// Remove those 2 inlines once gcc knows that new/delete are special
+inline void* operator new(std::size_t n){return malloc(n);}
+inline void operator delete(void*p){free(p);}
+typedef std::unique_ptr<int> T;
+typedef std::vector<T> V;
+void f(V&v){v.reserve(1024);}
+
+/* { dg-final { scan-tree-dump "memcpy" "optimized" } } */
Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c	(revision 271097)
+++ gcc/tree-ssa-alias.c	(working copy)
@@ -31,20 +31,21 @@  along with GCC; see the file COPYING3.
 #include "cgraph.h"
 #include "tree-pretty-print.h"
 #include "alias.h"
 #include "fold-const.h"
 #include "langhooks.h"
 #include "dumpfile.h"
 #include "tree-eh.h"
 #include "tree-dfa.h"
 #include "ipa-reference.h"
 #include "varasm.h"
+#include "tree-ssa-loop-niter.h"
 
 /* Broad overview of how alias analysis on gimple works:
 
    Statements clobbering or using memory are linked through the
    virtual operand factored use-def chain.  The virtual operand
    is unique per function, its symbol is accessible via gimple_vop (cfun).
    Virtual operands are used for efficiently walking memory statements
    in the gimple IL and are useful for things like value-numbering as
    a generation count for memory references.
 
@@ -284,20 +285,40 @@  ptr_derefs_may_alias_p (tree ptr1, tree
       || !POINTER_TYPE_P (TREE_TYPE (ptr1))
       || !POINTER_TYPE_P (TREE_TYPE (ptr2)))
     return true;
 
   /* We may end up with two empty points-to solutions for two same pointers.
      In this case we still want to say both pointers alias, so shortcut
      that here.  */
   if (ptr1 == ptr2)
     return true;
 
+  /* Memory returned by malloc cannot alias with a pre-existing pointer.
+     This is extremely crude, the order between the statements may be quite
+     arbitrary.  */
+  if (in_gimple_form && dom_info_available_p (cfun, CDI_DOMINATORS))
+    {
+      gimple *def1 = SSA_NAME_DEF_STMT (ptr1);
+      gimple *def2 = SSA_NAME_DEF_STMT (ptr2);
+      tree decl;
+      if (stmt_dominates_stmt_p (def1, def2)
+	  && is_gimple_call (def2)
+	  && (decl = gimple_call_fndecl (def2))
+	  && DECL_IS_MALLOC (decl))
+	return false;
+      else if (stmt_dominates_stmt_p (def2, def1)
+	       && is_gimple_call (def1)
+	       && (decl = gimple_call_fndecl (def1))
+	       && DECL_IS_MALLOC (decl))
+	return false;
+    }
+
   /* If we do not have useful points-to information for either pointer
      we cannot disambiguate anything else.  */
   pi1 = SSA_NAME_PTR_INFO (ptr1);
   pi2 = SSA_NAME_PTR_INFO (ptr2);
   if (!pi1 || !pi2)
     return true;
 
   /* ???  This does not use TBAA to prune decls from the intersection
      that not both pointers may access.  */
   return pt_solutions_intersect (&pi1->pt, &pi2->pt);
Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c	(revision 271097)
+++ gcc/tree-ssa-loop-niter.c	(working copy)
@@ -4433,20 +4433,23 @@  stmt_dominates_stmt_p (gimple *s1, gimpl
       if (gimple_code (s1) == GIMPLE_PHI)
 	return true;
 
       for (bsi = gsi_start_bb (bb1); gsi_stmt (bsi) != s2; gsi_next (&bsi))
 	if (gsi_stmt (bsi) == s1)
 	  return true;
 
       return false;
     }
 
+  if (!bb2)
+    return false;
+
   return dominated_by_p (CDI_DOMINATORS, bb2, bb1);
 }
 
 /* Returns true when we can prove that the number of executions of
    STMT in the loop is at most NITER, according to the bound on
    the number of executions of the statement NITER_BOUND->stmt recorded in
    NITER_BOUND and fact that NITER_BOUND->stmt dominate STMT.
 
    ??? This code can become quite a CPU hog - we can have many bounds,
    and large basic block forcing stmt_dominates_stmt_p to be queried