diff mbox series

doc: clarify the situation with pointer arithmetic

Message ID alpine.LNX.2.20.13.2001201801090.7204@monopod.intra.ispras.ru
State New
Headers show
Series doc: clarify the situation with pointer arithmetic | expand

Commit Message

Alexander Monakov Jan. 20, 2020, 3:08 p.m. UTC
Hi,

we have this paragraph in the documentation that attempts to prohibit something
that is allowed by the language.  Instead, I think we should say that this
generally should work and explain that a problem in GCC implementation
breaks this.

OK to apply?

Thanks.
Alexander


	* doc/implement-c.texi (Arrays and Pointers): Rewrite the paragraph
	about pointer-as-integer arithmetic.

Comments

Sandra Loosemore Jan. 20, 2020, 9:34 p.m. UTC | #1
On 1/20/20 8:08 AM, Alexander Monakov wrote:
> Hi,
> 
> we have this paragraph in the documentation that attempts to prohibit something
> that is allowed by the language.  Instead, I think we should say that this
> generally should work and explain that a problem in GCC implementation
> breaks this.
> 
> OK to apply?
> 
> Thanks.
> Alexander
> 
> 
> 	* doc/implement-c.texi (Arrays and Pointers): Rewrite the paragraph
> 	about pointer-as-integer arithmetic.
> 
> diff --git a/gcc/doc/implement-c.texi b/gcc/doc/implement-c.texi
> index 692297b69c4..beee2510670 100644
> --- a/gcc/doc/implement-c.texi
> +++ b/gcc/doc/implement-c.texi
> @@ -401,11 +401,15 @@ pointer representation is smaller than the integer type, extends according
>   to the signedness of the integer type if the pointer representation
>   is larger than the integer type, otherwise the bits are unchanged.
>   
> -When casting from pointer to integer and back again, the resulting
> -pointer must reference the same object as the original pointer, otherwise
> -the behavior is undefined.  That is, one may not use integer arithmetic to
> -avoid the undefined behavior of pointer arithmetic as proscribed in
> -C99 and C11 6.5.6/8.
> +Arithmetic on integers derived from pointers can produce a value such
> +that casting it back produces a valid pointer corresponding to one of
> +the original pointers.  Thus, integer arithmetic allows to express
> +computations that might not be expressible as pointer arithmetic without
> +undefined behavior.  However, at a certain point the distinction between
> +pointers and integers is lost (when GCC translates from GIMPLE internal
> +representation to RTL), but some optimizations still attempt to track
> +pointer arithmetic beyond that point.  In some cases this may cause
> +valid code to be incorrectly optimized.
>   
>   @item
>   @cite{The size of the result of subtracting two pointers to elements
> 

I'm not happy with this -- we shouldn't be talking about internal 
concepts like GIMPLE and RTL in the GCC user manual.  Can we instead 
talk about which user-visible optimization options cause problems, so 
that users who feel the urgent need to write hacky code like that know 
what to disable?  (I'd guess it would be things involving alias 
analysis.)  Otherwise, I think we should stick with the current "don't 
do this" language or some variant of it that explains briefly that, 
while such code is technically allowed by the C standard, it confuses 
GCC's optimizers.

-Sandra
Alexander Monakov Jan. 20, 2020, 10:16 p.m. UTC | #2
On Mon, 20 Jan 2020, Sandra Loosemore wrote:

> I'm not happy with this -- we shouldn't be talking about internal concepts
> like GIMPLE and RTL in the GCC user manual.  Can we instead talk about which
> user-visible optimization options cause problems, so that users who feel the
> urgent need to write hacky code like that know what to disable?  (I'd guess it
> would be things involving alias analysis.)  Otherwise, I think we should stick
> with the current "don't do this" language or some variant of it that explains
> briefly that, while such code is technically allowed by the C standard, it
> confuses GCC's optimizers.

The problem is in code shared by multiple RTL optimizers, so I'm afraid we
cannot give a list of optimizations to disable. Instead, we can suggest that
users can place optimization barriers on problematic pointers:

    ptr = (void *) iptr;
    __asm__ ("" : "+g"(ptr));
    // use *ptr

i.e. prevent incorrect tracking of possible pointer targets by copying the
pointer value through an asm statement. To make the intent of the asm more
clear, users may wish to enumerate possible targets in input constraints
of such asm:

    ptr = (void *) iptr;
    __asm__ ("" : "+g"(ptr) : "X"(&obj1), "X"(&obj2), ...);
    // use *ptr

Alexander
Joseph Myers Jan. 20, 2020, 11:25 p.m. UTC | #3
On Mon, 20 Jan 2020, Alexander Monakov wrote:

> Hi,
> 
> we have this paragraph in the documentation that attempts to prohibit 
> something that is allowed by the language.  Instead, I think we should 
> say that this generally should work and explain that a problem in GCC 
> implementation breaks this.

Is this based on the proposals to adopt a PNVI model (as described in 
N2362 / N2363 / N2364) for C2x?  Do you have a more detailed analysis of 
exactly which issues would need changes in GCC to follow the proposed 
PNVI-ae-udi semantics?  Setting up a meta-bug in Bugzilla with 
dependencies on such issues might be useful, for example - I know there 
are existing bugs you've filed or commented on, but it would help to have 
a list in a single place of issues for implementing PNVI-ae-udi.

It's not obvious that documentation relating to things that are 
implementation-defined in existing C standard versions, where detailed 
memory model questions were not defined, is an appropriate place to 
discuss questions of how some optimizations might not follow a more 
precise definition proposed for C2x.
Alexander Monakov Jan. 21, 2020, 12:53 p.m. UTC | #4
On Mon, 20 Jan 2020, Joseph Myers wrote:

> On Mon, 20 Jan 2020, Alexander Monakov wrote:
> 
> > Hi,
> > 
> > we have this paragraph in the documentation that attempts to prohibit 
> > something that is allowed by the language.  Instead, I think we should 
> > say that this generally should work and explain that a problem in GCC 
> > implementation breaks this.
> 
> Is this based on the proposals to adopt a PNVI model (as described in 
> N2362 / N2363 / N2364) for C2x?  Do you have a more detailed analysis of 
> exactly which issues would need changes in GCC to follow the proposed 
> PNVI-ae-udi semantics?  Setting up a meta-bug in Bugzilla with 
> dependencies on such issues might be useful, for example - I know there 
> are existing bugs you've filed or commented on, but it would help to have 
> a list in a single place of issues for implementing PNVI-ae-udi.
> 
> It's not obvious that documentation relating to things that are 
> implementation-defined in existing C standard versions, where detailed 
> memory model questions were not defined, is an appropriate place to 
> discuss questions of how some optimizations might not follow a more 
> precise definition proposed for C2x.

My intent was more basic. I observed that the paragraph can be interpreted as
saying that if you have a cast 'I1 = (intptr_t) P1', then perform some
computations on I1 that do not in any way depend on values of other pointers,
then casting the result back can not point to a different object than P1.
But that is not very helpful, because this is the case where the user might have
used normal pointer arithmetic in the first place. The important cases
involve computations that depend on multiple pointers to unrelated objects.

I think that the case discussed in the proposed patch is already not ambiguous
and does not need further clarifications to the standard.

I also think we should provide the users with rationale when imposing such
restrictions, and, as Sandra noted, offer a way to work around the problem.

However I don't mind dropping the patch if it's not appropriate.  Richard,
can you share your opinion here?

Thanks.
Alexander
Richard Biener Jan. 21, 2020, 2:32 p.m. UTC | #5
On Tue, 21 Jan 2020, Alexander Monakov wrote:

> On Mon, 20 Jan 2020, Joseph Myers wrote:
> 
> > On Mon, 20 Jan 2020, Alexander Monakov wrote:
> > 
> > > Hi,
> > > 
> > > we have this paragraph in the documentation that attempts to prohibit 
> > > something that is allowed by the language.  Instead, I think we should 
> > > say that this generally should work and explain that a problem in GCC 
> > > implementation breaks this.
> > 
> > Is this based on the proposals to adopt a PNVI model (as described in 
> > N2362 / N2363 / N2364) for C2x?  Do you have a more detailed analysis of 
> > exactly which issues would need changes in GCC to follow the proposed 
> > PNVI-ae-udi semantics?  Setting up a meta-bug in Bugzilla with 
> > dependencies on such issues might be useful, for example - I know there 
> > are existing bugs you've filed or commented on, but it would help to have 
> > a list in a single place of issues for implementing PNVI-ae-udi.
> > 
> > It's not obvious that documentation relating to things that are 
> > implementation-defined in existing C standard versions, where detailed 
> > memory model questions were not defined, is an appropriate place to 
> > discuss questions of how some optimizations might not follow a more 
> > precise definition proposed for C2x.
> 
> My intent was more basic. I observed that the paragraph can be interpreted as
> saying that if you have a cast 'I1 = (intptr_t) P1', then perform some
> computations on I1 that do not in any way depend on values of other pointers,
> then casting the result back can not point to a different object than P1.
> But that is not very helpful, because this is the case where the user might have
> used normal pointer arithmetic in the first place. The important cases
> involve computations that depend on multiple pointers to unrelated objects.
> 
> I think that the case discussed in the proposed patch is already not ambiguous
> and does not need further clarifications to the standard.
> 
> I also think we should provide the users with rationale when imposing such
> restrictions, and, as Sandra noted, offer a way to work around the problem.
> 
> However I don't mind dropping the patch if it's not appropriate.  Richard,
> can you share your opinion here?

Let me say a few things here.

First I always thought the only thing C guarantees is that you can
convert a pointer to an integer and back (the same value!) then you
get the same pointer if the integer type has sufficient size.  The
C standard nowhere specifies semantics of the case where you modify
the integer "representation" and then convert that to a pointer.
But that may have changed.

Second.  Fact is RTL does not distinguish between pointers and
integers and thus any attempt to make something valid when you
use integers and invalid when you use pointers is not going to work.

Third.  GCC tracks pointed to things through integers.  That means it
 a) assumes the integer representation cannot "leave" an object as
    the current documentation states
 b) it handles "mixing" pointers to different objects in their
    integer representation correctly as to that it assumes the
    result converted to a pointer might point to either object
the tracking implementation details also track pieces and funneling
pointers through FP values.

Fourth.  That PNVI (I assume it's the whole pointer-provenance stuff)
wants to get the "best" of both which can never be done since a compiler
needs to have a way to be conservative - in this area it's conflicting
conservative treatment which is impossible.

Fifth.  GCC has issues involving equivalences and when (implicitely)
turning control into data dependences.  The C standard seems to
have conflicting (to GCC) wording with regard to equivalences
of pointers vs. integers (well, not actually sure).  For all these
issues I usually point at the second issue above because trying
to plug holes without addressing that very thing isn't going to work
(unless you want to plug the "hole" for integer code as well).

And yeah, a meta-bug or even better a single WIKI page listing
issues (and maybe referencing the various partially redundant and
partially mixing different issues bugzillas) would be helpful here.
To be the entry-point in bugzilla is PR49330, see comment#7 for
the fundamental RTL representation issue.

Thanks,
Richard.
Alexander Monakov Jan. 21, 2020, 3:04 p.m. UTC | #6
On Tue, 21 Jan 2020, Richard Biener wrote:

> Fourth.  That PNVI (I assume it's the whole pointer-provenance stuff)
> wants to get the "best" of both which can never be done since a compiler
> needs to have a way to be conservative - in this area it's conflicting
> conservative treatment which is impossible.

This paragraph is unclear, I don't immediately see what the conflicting goals
are. The rest is clear enough given the previous discussions I saw.

Did you mean the restriction that you cannot do arithmetic involving two
integers based on pointers, get a value corresponding to one of them,
cast it back and get a pointer suitable for accessing either of two
originally pointed-to objects? I don't see that as a conflict because
it places a restriction on users, not the compiler.

Alexander
Joseph Myers Jan. 22, 2020, 1:15 a.m. UTC | #7
On Tue, 21 Jan 2020, Alexander Monakov wrote:

> My intent was more basic. I observed that the paragraph can be interpreted as
> saying that if you have a cast 'I1 = (intptr_t) P1', then perform some
> computations on I1 that do not in any way depend on values of other pointers,
> then casting the result back can not point to a different object than P1.

Yes.  I don't think the wording in the existing standard (where anything 
other than casting the same integer value back to pointer type - and, 
furthermore, the result is only specified to compare equal to the original 
pointer, not to be otherwise usable in place of it) can distinguish that 
version (PVI) from PNVI.

> But that is not very helpful, because this is the case where the user 
> might have used normal pointer arithmetic in the first place.

I think a typical use case of intptr_t might be e.g. masking off low bits 
of a pointer value for alignment while remaining within the same object.  
Another one might be to compare pointers to different objects in 
implementing memmove-like functions, where ordered pointer comparison 
would not be defined in ISO C.

> I think that the case discussed in the proposed patch is already not ambiguous
> and does not need further clarifications to the standard.

I don't think such a requirement for PNVI can be deduced from the existing 
text.
Joseph Myers Jan. 22, 2020, 1:19 a.m. UTC | #8
On Tue, 21 Jan 2020, Richard Biener wrote:

> Second.  Fact is RTL does not distinguish between pointers and
> integers and thus any attempt to make something valid when you
> use integers and invalid when you use pointers is not going to work.

That simply means that an earlier stage in the compiler would need to mark 
cases of integers converted from pointers in some way as "could have 
provenance of any address-exposed object" (that being the effect of 
PNVI-ae models) unless it can prove the offsets etc. applied do keep the 
pointer within the original object.
Richard Biener Jan. 22, 2020, 7:32 a.m. UTC | #9
On Tue, 21 Jan 2020, Alexander Monakov wrote:

> On Tue, 21 Jan 2020, Richard Biener wrote:
> 
> > Fourth.  That PNVI (I assume it's the whole pointer-provenance stuff)
> > wants to get the "best" of both which can never be done since a compiler
> > needs to have a way to be conservative - in this area it's conflicting
> > conservative treatment which is impossible.
> 
> This paragraph is unclear, I don't immediately see what the conflicting goals
> are. The rest is clear enough given the previous discussions I saw.
> 
> Did you mean the restriction that you cannot do arithmetic involving two
> integers based on pointers, get a value corresponding to one of them,
> cast it back and get a pointer suitable for accessing either of two
> originally pointed-to objects? I don't see that as a conflict because
> it places a restriction on users, not the compiler.

As far as I remember the discussions PNVI requires to track
provenance for correctness, you may not miss or attach wrong provenance
to a pointer and there's only "single" provenance, not "many"
(aka, may point to A and B).  I don't see how you can ever implement that.

But maybe I'm misremembering.

Richard.
Richard Biener Jan. 22, 2020, 8:04 a.m. UTC | #10
On Wed, 22 Jan 2020, Joseph Myers wrote:

> On Tue, 21 Jan 2020, Richard Biener wrote:
> 
> > Second.  Fact is RTL does not distinguish between pointers and
> > integers and thus any attempt to make something valid when you
> > use integers and invalid when you use pointers is not going to work.
> 
> That simply means that an earlier stage in the compiler would need to mark 
> cases of integers converted from pointers in some way as "could have 
> provenance of any address-exposed object" (that being the effect of 
> PNVI-ae models) unless it can prove the offsets etc. applied do keep the 
> pointer within the original object.

We already have that, it's called points-to analysis which computes
"points-to sets" even for integers.  But even there it assumes you
can't leave an object via arithmetic, doing that would be quite
bad for optimization in the current state or prohibitly expensive
since it would require exact offset tracking.

Now invent something that transfers this to RTL.  Patches to the main
piece where the issue triggers (find_base_term + base_alias_check and the
very similar find_base_value) welcome.

Richard.
Martin Sebor Jan. 22, 2020, 11:40 a.m. UTC | #11
On 1/22/20 8:32 AM, Richard Biener wrote:
> On Tue, 21 Jan 2020, Alexander Monakov wrote:
> 
>> On Tue, 21 Jan 2020, Richard Biener wrote:
>>
>>> Fourth.  That PNVI (I assume it's the whole pointer-provenance stuff)
>>> wants to get the "best" of both which can never be done since a compiler
>>> needs to have a way to be conservative - in this area it's conflicting
>>> conservative treatment which is impossible.
>>
>> This paragraph is unclear, I don't immediately see what the conflicting goals
>> are. The rest is clear enough given the previous discussions I saw.
>>
>> Did you mean the restriction that you cannot do arithmetic involving two
>> integers based on pointers, get a value corresponding to one of them,
>> cast it back and get a pointer suitable for accessing either of two
>> originally pointed-to objects? I don't see that as a conflict because
>> it places a restriction on users, not the compiler.
> 
> As far as I remember the discussions PNVI requires to track
> provenance for correctness, you may not miss or attach wrong provenance
> to a pointer and there's only "single" provenance, not "many"
> (aka, may point to A and B).  I don't see how you can ever implement that.

The PVNI variant preferred by the object model group is referred
to as "PNVI-ae-udi" which stands for "PNVI exposed-address user-
disambiguation."  (The PNVI part stands for "Provenance Not Via
Integers.)  This base PVNI model basically prohibits provenance
tracking via integers, making it possible for programs to derive
pointers to unrelated objects via casts between pointers and
integers (and modifying the integer in between the casts).  This
is considered a new restriction on implementations because
the standard doesn't permit it (as you said upthread, all it
specifies is that a pointer is equal to one obtained by casting
the original to a intptr_t and back).

The -ae-udi variant limits this restriction on implementations
to escaped pointers and provides a means for users/programs to
disambiguate between pointers to adjacent objects (i.e., a past
the end pointer and one to the beginning of the object stored
there).  The latest proposal is in N2362, with an overview in
N2378).  At the last WG14 meeting there was broad discomfort
with adopting the proposal for C2X because of the absence of
implementation experience and concerns raised by implementers.
The guidance to the study group was to target a separate technical
specification for the proposal and allow time for implementation
experience.  If the feedback from implementers is positive
(whatever that might mean) WG14 said it would consider adopting
the model for a revision of C after C2X.

Overall, the impact of the proposals as well as their goal is to
constrain implementations to the (presumed) benefit of programs
in terms of expressiveness.  There are numerous examples of code
that's currently treated as undefined by one or more compilers
(as a consequence of optimizations) that the model makes valid.
I'm not aware of any optimization opportunities the proposal
might open up in GCC.

Martin
Richard Biener Jan. 23, 2020, 1:18 p.m. UTC | #12
On Wed, Jan 22, 2020 at 12:40 PM Martin Sebor <msebor@gmail.com> wrote:
>
> On 1/22/20 8:32 AM, Richard Biener wrote:
> > On Tue, 21 Jan 2020, Alexander Monakov wrote:
> >
> >> On Tue, 21 Jan 2020, Richard Biener wrote:
> >>
> >>> Fourth.  That PNVI (I assume it's the whole pointer-provenance stuff)
> >>> wants to get the "best" of both which can never be done since a compiler
> >>> needs to have a way to be conservative - in this area it's conflicting
> >>> conservative treatment which is impossible.
> >>
> >> This paragraph is unclear, I don't immediately see what the conflicting goals
> >> are. The rest is clear enough given the previous discussions I saw.
> >>
> >> Did you mean the restriction that you cannot do arithmetic involving two
> >> integers based on pointers, get a value corresponding to one of them,
> >> cast it back and get a pointer suitable for accessing either of two
> >> originally pointed-to objects? I don't see that as a conflict because
> >> it places a restriction on users, not the compiler.
> >
> > As far as I remember the discussions PNVI requires to track
> > provenance for correctness, you may not miss or attach wrong provenance
> > to a pointer and there's only "single" provenance, not "many"
> > (aka, may point to A and B).  I don't see how you can ever implement that.
>
> The PVNI variant preferred by the object model group is referred
> to as "PNVI-ae-udi" which stands for "PNVI exposed-address user-
> disambiguation."  (The PNVI part stands for "Provenance Not Via
> Integers.)  This base PVNI model basically prohibits provenance
> tracking via integers, making it possible for programs to derive
> pointers to unrelated objects via casts between pointers and
> integers (and modifying the integer in between the casts).  This
> is considered a new restriction on implementations because
> the standard doesn't permit it (as you said upthread, all it
> specifies is that a pointer is equal to one obtained by casting
> the original to a intptr_t and back).
>
> The -ae-udi variant limits this restriction on implementations
> to escaped pointers and provides a means for users/programs to
> disambiguate between pointers to adjacent objects (i.e., a past
> the end pointer and one to the beginning of the object stored
> there).  The latest proposal is in N2362, with an overview in
> N2378).  At the last WG14 meeting there was broad discomfort
> with adopting the proposal for C2X because of the absence of
> implementation experience and concerns raised by implementers.
> The guidance to the study group was to target a separate technical
> specification for the proposal and allow time for implementation
> experience.  If the feedback from implementers is positive
> (whatever that might mean) WG14 said it would consider adopting
> the model for a revision of C after C2X.
>
> Overall, the impact of the proposals as well as their goal is to
> constrain implementations to the (presumed) benefit of programs
> in terms of expressiveness.  There are numerous examples of code
> that's currently treated as undefined by one or more compilers
> (as a consequence of optimizations) that the model makes valid.
> I'm not aware of any optimization opportunities the proposal
> might open up in GCC.

Well, PNVI limits optimization opportunities of GCC which currently
_does_ track provenance through integers and thus only allows
a very limited set(*) of "unrelated" pointers to appear here (documented
is that none are, implementation details differ from version to version).

There are no optimization "opportunities" by making pointer <-> integer
conversions lose information.

(*) for recent versions we allow pointers to globals to be "invented" but
place strict restrictions on automatic variables based on the idea that
you cannot have reliable absolute addressing of those (but you could
place objects at 0x12340 if you like via linker scripts)

Then there are of course bugs in GCC (just found PR93381) when
tracking provenance through integers (GCC also disregards the
possibility of someone actually moving pointers in very weird ways
which you could say is a bug).

You also can't easily disregard aligning bitwise ands of leaving the
current object if you consider overaligning so you'd again need
precise tracking of pointer offsets in points-to analysis which we don't have.

Richard.

>
> Martin
Uecker, Martin Jan. 23, 2020, 11:46 p.m. UTC | #13
Am Donnerstag, den 23.01.2020, 14:18 +0100 schrieb Richard Biener:
> On Wed, Jan 22, 2020 at 12:40 PM Martin Sebor <msebor@gmail.com> wrote:
> > 
> > On 1/22/20 8:32 AM, Richard Biener wrote:
> > > On Tue, 21 Jan 2020, Alexander Monakov wrote:
> > > 
> > > > On Tue, 21 Jan 2020, Richard Biener wrote:
> > > > 
> > > > > Fourth.  That PNVI (I assume it's the whole pointer-provenance stuff)
> > > > > wants to get the "best" of both which can never be done since a compiler
> > > > > needs to have a way to be conservative - in this area it's conflicting
> > > > > conservative treatment which is impossible.
> > > > 
> > > > This paragraph is unclear, I don't immediately see what the conflicting goals
> > > > are. The rest is clear enough given the previous discussions I saw.
> > > > 
> > > > Did you mean the restriction that you cannot do arithmetic involving two
> > > > integers based on pointers, get a value corresponding to one of them,
> > > > cast it back and get a pointer suitable for accessing either of two
> > > > originally pointed-to objects? I don't see that as a conflict because
> > > > it places a restriction on users, not the compiler.
> > > 
> > > As far as I remember the discussions PNVI requires to track
> > > provenance for correctness, you may not miss or attach wrong provenance
> > > to a pointer and there's only "single" provenance, not "many"
> > > (aka, may point to A and B).  I don't see how you can ever implement that.

I have not idea how you came to that conclusion. PNVI is perfectly
compatible with a naive compiler who does not track provenance at
all as well as an abstract machine that actually carries run-time
provenance around with each pointer and checks every operation. 
It was designed specifically to allow both cases and everything
in between (especially compilers who track provenance during
compile time but the programs then do not track provenance at
run-time).

You may be confused by the abstract formulation that indeed
assigns a single provenance to each pointer. A compiler would
track its *knowledge about provenance*, which would be a set
of possible targets.

> > The PVNI variant preferred by the object model group is referred
> > to as "PNVI-ae-udi" which stands for "PNVI exposed-address user-
> > disambiguation."  (The PNVI part stands for "Provenance Not Via
> > Integers.)  This base PVNI model basically prohibits provenance
> > tracking via integers, making it possible for programs to derive
> > pointers to unrelated objects via casts between pointers and
> > integers (and modifying the integer in between the casts).  This
> > is considered a new restriction on implementations because
> > the standard doesn't permit it (as you said upthread, all it
> > specifies is that a pointer is equal to one obtained by casting
> > the original to a intptr_t and back).

This is not entirely clear what the standard means. 7.20.1.4.

In my opinion, converting the same integer back should yield
a valid pointer where "same" is defined in the usual sense
(i.e. via mathematical identity and not via provenance).

> > The -ae-udi variant limits this restriction on implementations
> > to escaped pointers and provides a means for users/programs to
> > disambiguate between pointers to adjacent objects (i.e., a past
> > the end pointer and one to the beginning of the object stored
> > there).  The latest proposal is in N2362, with an overview in
> > N2378).  At the last WG14 meeting there was broad discomfort
> > with adopting the proposal for C2X because of the absence of
> > implementation experience and concerns raised by implementers.
> > The guidance to the study group was to target a separate technical
> > specification for the proposal and allow time for implementation
> > experience.  If the feedback from implementers is positive
> > (whatever that might mean) WG14 said it would consider adopting
> > the model for a revision of C after C2X.
> >
> > Overall, the impact of the proposals as well as their goal is to
> > constrain implementations to the (presumed) benefit of programs
> > in terms of expressiveness.  There are numerous examples of code
> > that's currently treated as undefined by one or more compilers
> > (as a consequence of optimizations) that the model makes valid.
> > I'm not aware of any optimization opportunities the proposal
> > might open up in GCC.
> 
> Well, PNVI limits optimization opportunities of GCC which currently
> _does_ track provenance through integers and thus only allows
> a very limited set(*) of "unrelated" pointers to appear here (documented
> is that none are, implementation details differ from version to version).
> 
> There are no optimization "opportunities" by making pointer <-> integer
> conversions lose information.

You are right: It is meant to constrain optimizations.

The reasoning behind this that currently all compilers behave
inconsistently and this is not terrible useful to anybody.

At the same time, there does not appear to
be any reasonable way how integers can have provenance. 
Any rules we came up with really got complicated and are
also fundamentally at odds with the usual mathematical
properties of integers one would naively expect.


> (*) for recent versions we allow pointers to globals to be "invented" but
> place strict restrictions on automatic variables based on the idea that
> you cannot have reliable absolute addressing of those (but you could
> place objects at 0x12340 if you like via linker scripts)

The idea with PVNI-ae would be that pointers could be "invented"
only to "exposed" objects, i.e. address taken and the address
was cast to int (or escaped compiler analysis). So pointers to
other automatic variables can not be invented.

> Then there are of course bugs in GCC (just found PR93381) when
> tracking provenance through integers (GCC also disregards the
> possibility of someone actually moving pointers in very weird ways
> which you could say is a bug).

Yes, nice example. With the current standard, it is not even
clear whether this is a bug or not.


Martin

> You also can't easily disregard aligning bitwise ands of leaving the
> current object if you consider overaligning so you'd again need
> precise tracking of pointer offsets in points-to analysis which we don't have.
> 
> Richard.
> 
> > 
> > Martin
Richard Biener Jan. 27, 2020, 2:42 p.m. UTC | #14
On Fri, Jan 24, 2020 at 12:46 AM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
> Am Donnerstag, den 23.01.2020, 14:18 +0100 schrieb Richard Biener:
> > On Wed, Jan 22, 2020 at 12:40 PM Martin Sebor <msebor@gmail.com> wrote:
> > >
> > > On 1/22/20 8:32 AM, Richard Biener wrote:
> > > > On Tue, 21 Jan 2020, Alexander Monakov wrote:
> > > >
> > > > > On Tue, 21 Jan 2020, Richard Biener wrote:
> > > > >
> > > > > > Fourth.  That PNVI (I assume it's the whole pointer-provenance stuff)
> > > > > > wants to get the "best" of both which can never be done since a compiler
> > > > > > needs to have a way to be conservative - in this area it's conflicting
> > > > > > conservative treatment which is impossible.
> > > > >
> > > > > This paragraph is unclear, I don't immediately see what the conflicting goals
> > > > > are. The rest is clear enough given the previous discussions I saw.
> > > > >
> > > > > Did you mean the restriction that you cannot do arithmetic involving two
> > > > > integers based on pointers, get a value corresponding to one of them,
> > > > > cast it back and get a pointer suitable for accessing either of two
> > > > > originally pointed-to objects? I don't see that as a conflict because
> > > > > it places a restriction on users, not the compiler.
> > > >
> > > > As far as I remember the discussions PNVI requires to track
> > > > provenance for correctness, you may not miss or attach wrong provenance
> > > > to a pointer and there's only "single" provenance, not "many"
> > > > (aka, may point to A and B).  I don't see how you can ever implement that.
>
> I have not idea how you came to that conclusion. PNVI is perfectly
> compatible with a naive compiler who does not track provenance at
> all as well as an abstract machine that actually carries run-time
> provenance around with each pointer and checks every operation.
> It was designed specifically to allow both cases and everything
> in between (especially compilers who track provenance during
> compile time but the programs then do not track provenance at
> run-time).
>
> You may be confused by the abstract formulation that indeed
> assigns a single provenance to each pointer. A compiler would
> track its *knowledge about provenance*, which would be a set
> of possible targets.

Well, the question is whether PVNI allows the compiler to put any
additional restriction on what the provenance of an interger is.  It
appears not, so any attempt to track provenance through integers
is doomed until the cases are very simple.  I'm not sure that's desirable (*).

> > > The PVNI variant preferred by the object model group is referred
> > > to as "PNVI-ae-udi" which stands for "PNVI exposed-address user-
> > > disambiguation."  (The PNVI part stands for "Provenance Not Via
> > > Integers.)  This base PVNI model basically prohibits provenance
> > > tracking via integers, making it possible for programs to derive
> > > pointers to unrelated objects via casts between pointers and
> > > integers (and modifying the integer in between the casts).  This
> > > is considered a new restriction on implementations because
> > > the standard doesn't permit it (as you said upthread, all it
> > > specifies is that a pointer is equal to one obtained by casting
> > > the original to a intptr_t and back).
>
> This is not entirely clear what the standard means. 7.20.1.4.
>
> In my opinion, converting the same integer back should yield
> a valid pointer where "same" is defined in the usual sense
> (i.e. via mathematical identity and not via provenance).
>
> > > The -ae-udi variant limits this restriction on implementations
> > > to escaped pointers and provides a means for users/programs to
> > > disambiguate between pointers to adjacent objects (i.e., a past
> > > the end pointer and one to the beginning of the object stored
> > > there).  The latest proposal is in N2362, with an overview in
> > > N2378).  At the last WG14 meeting there was broad discomfort
> > > with adopting the proposal for C2X because of the absence of
> > > implementation experience and concerns raised by implementers.
> > > The guidance to the study group was to target a separate technical
> > > specification for the proposal and allow time for implementation
> > > experience.  If the feedback from implementers is positive
> > > (whatever that might mean) WG14 said it would consider adopting
> > > the model for a revision of C after C2X.
> > >
> > > Overall, the impact of the proposals as well as their goal is to
> > > constrain implementations to the (presumed) benefit of programs
> > > in terms of expressiveness.  There are numerous examples of code
> > > that's currently treated as undefined by one or more compilers
> > > (as a consequence of optimizations) that the model makes valid.
> > > I'm not aware of any optimization opportunities the proposal
> > > might open up in GCC.
> >
> > Well, PNVI limits optimization opportunities of GCC which currently
> > _does_ track provenance through integers and thus only allows
> > a very limited set(*) of "unrelated" pointers to appear here (documented
> > is that none are, implementation details differ from version to version).
> >
> > There are no optimization "opportunities" by making pointer <-> integer
> > conversions lose information.
>
> You are right: It is meant to constrain optimizations.
>
> The reasoning behind this that currently all compilers behave
> inconsistently and this is not terrible useful to anybody.
>
> At the same time, there does not appear to
> be any reasonable way how integers can have provenance.
> Any rules we came up with really got complicated and are
> also fundamentally at odds with the usual mathematical
> properties of integers one would naively expect.

So the original point where GCC started to track provenance through
non-pointers (PVNI should really be PVNNP since I guess tracking
provenance through floats isn't to be done either ;)) was a testcase
showing that Matlab (IIRC) generated C code funneled pointers through
a pair of floats (obviously a single float isn't enough for 64bit pointers ...)
and that prevents a good deal of optimization due to missed alias analysis.
I fixed that and now GCC happily tracks provenance through a pair of
floats ...

(*) this also shows the level of "obfuscation" needed to fool compilers
to lose provenance knowledge is hard to predict.

>
> > (*) for recent versions we allow pointers to globals to be "invented" but
> > place strict restrictions on automatic variables based on the idea that
> > you cannot have reliable absolute addressing of those (but you could
> > place objects at 0x12340 if you like via linker scripts)
>
> The idea with PVNI-ae would be that pointers could be "invented"
> only to "exposed" objects, i.e. address taken and the address
> was cast to int (or escaped compiler analysis). So pointers to
> other automatic variables can not be invented.
>
> > Then there are of course bugs in GCC (just found PR93381) when
> > tracking provenance through integers (GCC also disregards the
> > possibility of someone actually moving pointers in very weird ways
> > which you could say is a bug).
>
> Yes, nice example. With the current standard, it is not even
> clear whether this is a bug or not.
>
>
> Martin
>
> > You also can't easily disregard aligning bitwise ands of leaving the
> > current object if you consider overaligning so you'd again need
> > precise tracking of pointer offsets in points-to analysis which we don't have.
> >
> > Richard.
> >
> > >
> > > Martin
Uecker, Martin Jan. 28, 2020, 12:54 a.m. UTC | #15
Hi Richard,

thank you for your response. 


Am Montag, den 27.01.2020, 15:42 +0100 schrieb Richard Biener:
> On Fri, Jan 24, 2020 at 12:46 AM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:
> > 
> > Am Donnerstag, den 23.01.2020, 14:18 +0100 schrieb Richard Biener:
> > > On Wed, Jan 22, 2020 at 12:40 PM Martin Sebor <msebor@gmail.com> wrote:
> > > > 
> > > > On 1/22/20 8:32 AM, Richard Biener wrote:
> > > > > On Tue, 21 Jan 2020, Alexander Monakov wrote:
> > > > > 
> > > > > > On Tue, 21 Jan 2020, Richard Biener wrote:
> > > > > > 
> > > > > > > Fourth.  That PNVI (I assume it's the whole pointer-provenance stuff)
> > > > > > > wants to get the "best" of both which can never be done since a compiler
> > > > > > > needs to have a way to be conservative - in this area it's conflicting
> > > > > > > conservative treatment which is impossible.
> > > > > > 
> > > > > > This paragraph is unclear, I don't immediately see what the conflicting goals
> > > > > > are. The rest is clear enough given the previous discussions I saw.
> > > > > > 
> > > > > > Did you mean the restriction that you cannot do arithmetic involving two
> > > > > > integers based on pointers, get a value corresponding to one of them,
> > > > > > cast it back and get a pointer suitable for accessing either of two
> > > > > > originally pointed-to objects? I don't see that as a conflict because
> > > > > > it places a restriction on users, not the compiler.
> > > > > 
> > > > > As far as I remember the discussions PNVI requires to track
> > > > > provenance for correctness, you may not miss or attach wrong provenance
> > > > > to a pointer and there's only "single" provenance, not "many"
> > > > > (aka, may point to A and B).  I don't see how you can ever implement that.
> > 
> > I have not idea how you came to that conclusion. PNVI is perfectly
> > compatible with a naive compiler who does not track provenance at
> > all as well as an abstract machine that actually carries run-time
> > provenance around with each pointer and checks every operation.
> > It was designed specifically to allow both cases and everything
> > in between (especially compilers who track provenance during
> > compile time but the programs then do not track provenance at
> > run-time).
> > 
> > You may be confused by the abstract formulation that indeed
> > assigns a single provenance to each pointer. A compiler would
> > track its *knowledge about provenance*, which would be a set
> > of possible targets.
> 
> Well, the question is whether PVNI allows the compiler to put any
> additional restriction on what the provenance of an interger is.  It
> appears not, so any attempt to track provenance through integers
> is doomed until the cases are very simple.  I'm not sure that's desirable (*).

PVNI makes something which is now implementation
defined have well defined semantics. This is not possible to do
without putting additional restrictions on the compiler and
this will make some optimizations non-compliant. In both
committees  (C and C++) there is a consensus that the
rules about provenance should be made clear and explicit.
So far, this is the only well-developed proposal.

....

> > > Well, PNVI limits optimization opportunities of GCC which currently
> > > _does_ track provenance through integers and thus only allows
> > > a very limited set(*) of "unrelated" pointers to appear here (documented
> > > is that none are, implementation details differ from version to version).
> > > 
> > > There are no optimization "opportunities" by making pointer <-> integer
> > > conversions lose information.
> > 
> > You are right: It is meant to constrain optimizations.
> > 
> > The reasoning behind this that currently all compilers behave
> > inconsistently and this is not terrible useful to anybody.
> > 
> > At the same time, there does not appear to
> > be any reasonable way how integers can have provenance.
> > Any rules we came up with really got complicated and are
> > also fundamentally at odds with the usual mathematical
> > properties of integers one would naively expect.
> 
> So the original point where GCC started to track provenance through
> non-pointers (PVNI should really be PVNNP since I guess tracking
> provenance through floats isn't to be done either ;)) was a testcase
> showing that Matlab (IIRC) generated C code funneled pointers through
> a pair of floats (obviously a single float isn't enough for 64bit pointers ...)
> and that prevents a good deal of optimization due to missed alias analysis.
> I fixed that and now GCC happily tracks provenance through a pair of
> floats ...

Yes,  this is impressive. And yes, the proposal would
forbid this (i.e. doing aliasing analysis based on the provenance
tracked through integers). 

PVNI would make the Matlab code well-defined (under some
conditions), but break the optimization based on tracking
provenance through floats. But as the currently the code is not
portable at all, I do not think it is really a problem if
we compiler offers this optimization only as a non-standard
compiler option.


> (*) this also shows the level of "obfuscation" needed to fool compilers
> to lose provenance knowledge is hard to predict.

Well, this is exactly the problem we want to address by defining
a clear way to do this. Casting to an integer would be the way
to state: "consider the pointer as escaped and forget the 
provenance"  and casting an integer to a  pointer would
mean "this pointer may point to all objects whose pointer has
escaped". This would give the programmer explicit control about
this aspect and make most existing code using pointer-to-integer
casts well-defined. At the same time, this should be simple
to add to existing points-to analysis.

Best,
Martin
Alexander Monakov Jan. 28, 2020, 7:20 a.m. UTC | #16
On Tue, 28 Jan 2020, Uecker, Martin wrote:

> > (*) this also shows the level of "obfuscation" needed to fool compilers
> > to lose provenance knowledge is hard to predict.
> 
> Well, this is exactly the problem we want to address by defining
> a clear way to do this. Casting to an integer would be the way
> to state: "consider the pointer as escaped and forget the 
> provenance"  and casting an integer to a  pointer would
> mean "this pointer may point to all objects whose pointer has
> escaped". This would give the programmer explicit control about
> this aspect and make most existing code using pointer-to-integer
> casts well-defined. At the same time, this should be simple
> to add to existing points-to analysis.

Can you explain why you make it required for the compiler to treat the
points-to set unnecessarily broader than it could prove? In the Matlab
example, there's a simple chain of computations that the compiler is
following to prove that the pointer resulting from the final cast is
derived from exactly one other pointer (no other pointers have
participated in the computations).

Or, in other words:

is there an example where a programmer can distinguish between the
requirement you explain above vs. the fine-grained interpretation
that GIMPLE aims to implement (at least as I understand it), which is:

  when the program creates a pointer by means of non-pointer computations
  (casts, representation access, etc), the resulting pointer may point to:

    * any object which address could have participated in the computation
      (which is at worst the entire set of "exposed" objects up to that
       program point, but can be much narrower if the compiler can see
       the entire chain of computations)

    * any objects which is not "exposed" but could have known address, e.g.
      because it is placed at a specific address during linking

Thanks.
Alexander
Richard Biener Jan. 28, 2020, 10:01 a.m. UTC | #17
On Tue, Jan 28, 2020 at 8:20 AM Alexander Monakov <amonakov@ispras.ru> wrote:
>
> On Tue, 28 Jan 2020, Uecker, Martin wrote:
>
> > > (*) this also shows the level of "obfuscation" needed to fool compilers
> > > to lose provenance knowledge is hard to predict.
> >
> > Well, this is exactly the problem we want to address by defining
> > a clear way to do this. Casting to an integer would be the way
> > to state: "consider the pointer as escaped and forget the
> > provenance"  and casting an integer to a  pointer would
> > mean "this pointer may point to all objects whose pointer has
> > escaped". This would give the programmer explicit control about
> > this aspect and make most existing code using pointer-to-integer
> > casts well-defined. At the same time, this should be simple
> > to add to existing points-to analysis.
>
> Can you explain why you make it required for the compiler to treat the
> points-to set unnecessarily broader than it could prove? In the Matlab
> example, there's a simple chain of computations that the compiler is
> following to prove that the pointer resulting from the final cast is
> derived from exactly one other pointer (no other pointers have
> participated in the computations).
>
> Or, in other words:
>
> is there an example where a programmer can distinguish between the
> requirement you explain above vs. the fine-grained interpretation
> that GIMPLE aims to implement (at least as I understand it), which is:
>
>   when the program creates a pointer by means of non-pointer computations
>   (casts, representation access, etc), the resulting pointer may point to:
>
>     * any object which address could have participated in the computation
>       (which is at worst the entire set of "exposed" objects up to that
>        program point, but can be much narrower if the compiler can see
>        the entire chain of computations)
>
>     * any objects which is not "exposed" but could have known address, e.g.
>       because it is placed at a specific address during linking

Note for the current PTA implementation there's almost no cases we can
handle conservatively enough.  Consider the simple

 int a[4];
 int *p = &a[1];
 uintptr_t pi = (uintptr_t)p;
 pi += 4;
 int *q = (int *)pi;

our PTA knows that p points to a (but not the exact offset), same for pi
(the cast doesn't change the value).  Now you add 4 - this could lead
you outside of 'a' so the points-to set becomes 'a and anything'.

I'm also not sure what PVNI does to

 int a[4];
 int *p = &a[1];
 p += 10;
 uintptr_t pi = (uintptr_t)p;
 p = (int *)pi;

we assume that p points to a even after p += 10 (but it of course points
outside of the object - obvious here, but not necessarily in more
obfuscated cases).
Now, can we assume pi points to a?  The cast isn't value-changing.  Do we have
to assume (int *)pi points to anything?  So, is

 p = (int *)(uintptr_t)p;

something like "laundering" a pointer?  We don't preserve such simple round-trip
casts since they are value-preserving.  Are they provenance preserving?

Richard.

> Thanks.
> Alexander
Uecker, Martin Jan. 28, 2020, 12:05 p.m. UTC | #18
Am Dienstag, den 28.01.2020, 10:20 +0300 schrieb Alexander Monakov:
> On Tue, 28 Jan 2020, Uecker, Martin wrote:
> 
> > > (*) this also shows the level of "obfuscation" needed to fool compilers
> > > to lose provenance knowledge is hard to predict.
> > 
> > Well, this is exactly the problem we want to address by defining
> > a clear way to do this. Casting to an integer would be the way
> > to state: "consider the pointer as escaped and forget the 
> > provenance"  and casting an integer to a  pointer would
> > mean "this pointer may point to all objects whose pointer has
> > escaped". This would give the programmer explicit control about
> > this aspect and make most existing code using pointer-to-integer
> > casts well-defined. At the same time, this should be simple
> > to add to existing points-to analysis.
> 
> Can you explain why you make it required for the compiler to treat the
> points-to set unnecessarily broader than it could prove? In the Matlab
> example, there's a simple chain of computations that the compiler is
> following to prove that the pointer resulting from the final cast is
> derived from exactly one other pointer (no other pointers have
> participated in the computations).
> 
> Or, in other words:
> 
> is there an example where a programmer can distinguish between the
> requirement you explain above vs. the fine-grained interpretation
> that GIMPLE aims to implement (at least as I understand it), which is:
> 
>   when the program creates a pointer by means of non-pointer computations
>   (casts, representation access, etc), the resulting pointer may point to:
> 
>     * any object which address could have participated in the computation
>       (which is at worst the entire set of "exposed" objects up to that
>        program point, but can be much narrower if the compiler can see
>        the entire chain of computations)
> 
>     * any objects which is not "exposed" but could have known address, e.g.
>       because it is placed at a specific address during linking

Unfortunately, this is not as simple. It is not trivial to
define the set of objects whose "address could have participated
in the computation"

int a = ... random number
int b = &y;
if (a == b) {
  int *p = (int*)a; 

Did '&y' participate in the computation?

What if you output and integer using I/O and read it back in?

What if you copy an integer using control flow?

There are many similar questions like this.

If we want to make this part of the standard, we need to formulate
rules for all integer operations about how the provenance flows.


There are several problems with this. A compiler needs to be able
to compute the complete points-to set. If it might miss an object
which is allowed to be used it has to be conservative with
aliasing - the analysis become useless. 

So we always have the trade-off between making the rules simpler
and restrict the tracking or specify more complicated rules that
allow sophisticated tracking but  then all compilers that want
to do this optimization have to implement this complicated
rules or need to fall back to being conservative. 


Finally, all integer operations would have a potential hidden
second meaning when applied to addresses of objects, which
makes it much easier to reason about.

Assume you have a function

int difference(int a, int b)
{
	return a - b;
}

And later you replace an expression 
'difference(a, a)' with '0' in the program.
This seems a trivial and logical thing to do, but if
this expression was applied to addresses, you might
have broken a provenance chain and introduced a bug.


In my opinion, integers should stay integers with simple
logical properties. 

Gruß,
Martin
Uecker, Martin Jan. 28, 2020, 12:24 p.m. UTC | #19
Am Dienstag, den 28.01.2020, 11:01 +0100 schrieb Richard Biener:
> On Tue, Jan 28, 2020 at 8:20 AM Alexander Monakov <amonakov@ispras.ru> wrote:
> > 
> > On Tue, 28 Jan 2020, Uecker, Martin wrote:
> > 
> > > > (*) this also shows the level of "obfuscation" needed to fool compilers
> > > > to lose provenance knowledge is hard to predict.
> > > 
> > > Well, this is exactly the problem we want to address by defining
> > > a clear way to do this. Casting to an integer would be the way
> > > to state: "consider the pointer as escaped and forget the
> > > provenance"  and casting an integer to a  pointer would
> > > mean "this pointer may point to all objects whose pointer has
> > > escaped". This would give the programmer explicit control about
> > > this aspect and make most existing code using pointer-to-integer
> > > casts well-defined. At the same time, this should be simple
> > > to add to existing points-to analysis.
> > 
> > Can you explain why you make it required for the compiler to treat the
> > points-to set unnecessarily broader than it could prove? In the Matlab
> > example, there's a simple chain of computations that the compiler is
> > following to prove that the pointer resulting from the final cast is
> > derived from exactly one other pointer (no other pointers have
> > participated in the computations).
> > 
> > Or, in other words:
> > 
> > is there an example where a programmer can distinguish between the
> > requirement you explain above vs. the fine-grained interpretation
> > that GIMPLE aims to implement (at least as I understand it), which is:
> > 
> >   when the program creates a pointer by means of non-pointer computations
> >   (casts, representation access, etc), the resulting pointer may point to:
> > 
> >     * any object which address could have participated in the computation
> >       (which is at worst the entire set of "exposed" objects up to that
> >        program point, but can be much narrower if the compiler can see
> >        the entire chain of computations)
> > 
> >     * any objects which is not "exposed" but could have known address, e.g.
> >       because it is placed at a specific address during linking
> 
> Note for the current PTA implementation there's almost no cases we can
> handle conservatively enough.  Consider the simple
> 
>  int a[4];
>  int *p = &a[1];
>  uintptr_t pi = (uintptr_t)p;
>  pi += 4;
>  int *q = (int *)pi;
> 
> our PTA knows that p points to a (but not the exact offset), same for pi
> (the cast doesn't change the value).  Now you add 4 - this could lead
> you outside of 'a' so the points-to set becomes 'a and anything'.

PVNI would say that 'a' gets exposed in the first case and
then 'q' can point to all exposed objects because of the
second cast.

A correct and conservative implementation would do this:
In the PTA you would need to mark the address of a escaped
and then later assign a conservative points-to set (all
escaped objects) to q.

Yes, this limits optimizations, but I do not think this is
terrible. (optimizations could be re-enabled with a compiler
option)

> I'm also not sure what PVNI does to
> 
>  int a[4];
>  int *p = &a[1];
>  p += 10;
>  uintptr_t pi = (uintptr_t)p;
>  p = (int *)pi;
> 
> we assume that p points to a even after p += 10 (but it of course points
> outside of the object - obvious here, but not necessarily in more
> obfuscated cases).

This is UB because a pointer to outside of the object is formed.

> Now, can we assume pi points to a?  The cast isn't value-changing.  Do we have
> to assume (int *)pi points to anything?  So, is
> 
>  p = (int *)(uintptr_t)p;
> 
> something like "laundering" a pointer?  We don't preserve such simple round-trip
> casts since they are value-preserving.  Are they provenance preserving?

Yes, such a pair would be "laundering" as it allows 'p' to then 
point to any exposed object provenance-wise.

For such casts the FE would maybe add a marker. Maybe a calls
to builtin functions 'builtin_expose(a)' and 'builting_bless'.
(having those builtins would be interesting on its own, btw).

Having said this, some optimizations could still be allowed using
the "as-if" rule and other lines of reasoning. Specifally, PVNI
states that 'p' gets assigned the provenance of the object the
integer values is the address of. So if the compiler can proof
that the address belongs to certain objects it can reassign the
points-to set to the new 'p'. Only if there is ambiguity between
which objects the address belongs to, the reasoning needs to
be more conservative. 

For example:
int a[3];
int b[3];

&b; // b also exposed
int *p = (int*)(uintptr_t)&a[3];

Here, p could point to the one-after address of 'a' or the
first element of 'b'. (but only because 'b' was also exposed).

If the compiler can prove that something like this can not
happen (e.g. by considering offsets), it can still do some
tracking of points-to sets. 

Best,
Martin



> Richard.
> 
> > Thanks.
> > Alexander
Alexander Monakov Jan. 28, 2020, 5:42 p.m. UTC | #20
On Tue, 28 Jan 2020, Uecker, Martin wrote:

> Unfortunately, this is not as simple. It is not trivial to
> define the set of objects whose "address could have participated
> in the computation"
> [...]

I am not satisfied with the response, but I'm not sure I should
continue arguing here.

Alexander
Richard Biener Jan. 29, 2020, 8:45 a.m. UTC | #21
On Tue, Jan 28, 2020 at 1:24 PM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
> Am Dienstag, den 28.01.2020, 11:01 +0100 schrieb Richard Biener:
> > On Tue, Jan 28, 2020 at 8:20 AM Alexander Monakov <amonakov@ispras.ru> wrote:
> > >
> > > On Tue, 28 Jan 2020, Uecker, Martin wrote:
> > >
> > > > > (*) this also shows the level of "obfuscation" needed to fool compilers
> > > > > to lose provenance knowledge is hard to predict.
> > > >
> > > > Well, this is exactly the problem we want to address by defining
> > > > a clear way to do this. Casting to an integer would be the way
> > > > to state: "consider the pointer as escaped and forget the
> > > > provenance"  and casting an integer to a  pointer would
> > > > mean "this pointer may point to all objects whose pointer has
> > > > escaped". This would give the programmer explicit control about
> > > > this aspect and make most existing code using pointer-to-integer
> > > > casts well-defined. At the same time, this should be simple
> > > > to add to existing points-to analysis.
> > >
> > > Can you explain why you make it required for the compiler to treat the
> > > points-to set unnecessarily broader than it could prove? In the Matlab
> > > example, there's a simple chain of computations that the compiler is
> > > following to prove that the pointer resulting from the final cast is
> > > derived from exactly one other pointer (no other pointers have
> > > participated in the computations).
> > >
> > > Or, in other words:
> > >
> > > is there an example where a programmer can distinguish between the
> > > requirement you explain above vs. the fine-grained interpretation
> > > that GIMPLE aims to implement (at least as I understand it), which is:
> > >
> > >   when the program creates a pointer by means of non-pointer computations
> > >   (casts, representation access, etc), the resulting pointer may point to:
> > >
> > >     * any object which address could have participated in the computation
> > >       (which is at worst the entire set of "exposed" objects up to that
> > >        program point, but can be much narrower if the compiler can see
> > >        the entire chain of computations)
> > >
> > >     * any objects which is not "exposed" but could have known address, e.g.
> > >       because it is placed at a specific address during linking
> >
> > Note for the current PTA implementation there's almost no cases we can
> > handle conservatively enough.  Consider the simple
> >
> >  int a[4];
> >  int *p = &a[1];
> >  uintptr_t pi = (uintptr_t)p;
> >  pi += 4;
> >  int *q = (int *)pi;
> >
> > our PTA knows that p points to a (but not the exact offset), same for pi
> > (the cast doesn't change the value).  Now you add 4 - this could lead
> > you outside of 'a' so the points-to set becomes 'a and anything'.
>
> PVNI would say that 'a' gets exposed in the first case and
> then 'q' can point to all exposed objects because of the
> second cast.
>
> A correct and conservative implementation would do this:
> In the PTA you would need to mark the address of a escaped
> and then later assign a conservative points-to set (all
> escaped objects) to q.

I see (I'm asking all these questions to see what implementing -fpta-pnvi
would mean).  That might be a feasible implementation route.  How
does "escaped" as used in your answer apply when doing inter-procedural
analysis?  I guess we would need to assume points-to sets include
automatic variables that escaped in the caller.

> Yes, this limits optimizations, but I do not think this is
> terrible. (optimizations could be re-enabled with a compiler
> option)

We'll see.

> > I'm also not sure what PVNI does to
> >
> >  int a[4];
> >  int *p = &a[1];
> >  p += 10;
> >  uintptr_t pi = (uintptr_t)p;
> >  p = (int *)pi;
> >
> > we assume that p points to a even after p += 10 (but it of course points
> > outside of the object - obvious here, but not necessarily in more
> > obfuscated cases).
>
> This is UB because a pointer to outside of the object is formed.
>
> > Now, can we assume pi points to a?  The cast isn't value-changing.  Do we have
> > to assume (int *)pi points to anything?  So, is
> >
> >  p = (int *)(uintptr_t)p;
> >
> > something like "laundering" a pointer?  We don't preserve such simple round-trip
> > casts since they are value-preserving.  Are they provenance preserving?
>
> Yes, such a pair would be "laundering" as it allows 'p' to then
> point to any exposed object provenance-wise.
>
> For such casts the FE would maybe add a marker. Maybe a calls
> to builtin functions 'builtin_expose(a)' and 'builting_bless'.
> (having those builtins would be interesting on its own, btw).

Uh, ok.

> Having said this, some optimizations could still be allowed using
> the "as-if" rule and other lines of reasoning. Specifally, PVNI
> states that 'p' gets assigned the provenance of the object the
> integer values is the address of. So if the compiler can proof
> that the address belongs to certain objects it can reassign the
> points-to set to the new 'p'. Only if there is ambiguity between
> which objects the address belongs to, the reasoning needs to
> be more conservative.
>
> For example:
> int a[3];
> int b[3];
>
> &b; // b also exposed
> int *p = (int*)(uintptr_t)&a[3];
>
> Here, p could point to the one-after address of 'a' or the
> first element of 'b'. (but only because 'b' was also exposed).
>
> If the compiler can prove that something like this can not
> happen (e.g. by considering offsets), it can still do some
> tracking of points-to sets.

That's probably the very case that we'll get wrong since
we definitely won't be able to reliably preserve these
kind of laundering points...

I guess they could be obfuscated like

union { void *p; long l; } u;

u.p = p;
u.l = u.l;
p = u.p;

where GCC (and IIRC also now the standard) allows the
type-punning when reading u.p via u.l and vice versa.
That is, the "conversions" might be hidden in the
memory access types.  That means (our PTA tracks
points-to sets of memory) that all stores of pointers
(even to automatic vars that are themselves not exposed)
make them escaped and CSE would need to desperately
avoid CSEing the load from u.p to p or insert laundering
operations.

I guess I'd me much more happy if PVNI said that when
an integer is converted to a pointer and the integer
is value-equivalent to pointers { p1, p2, ... } then
the provenance of the resulting pointer is
that of p1 (or p2, ... which is semantically equivalent)
and when two pointers p1 and p2 are
value-equivalent and their provenance is not the
same then the behavior is undefined.  That is,

int a, b;
int  *p = &a + 1;
int *q = &b;
if (p == q)
  ... undefined ...

Richard.

> Best,
> Martin
>
>
>
> > Richard.
> >
> > > Thanks.
> > > Alexander
Uecker, Martin Jan. 29, 2020, 2 p.m. UTC | #22
Am Mittwoch, den 29.01.2020, 09:45 +0100 schrieb Richard Biener:
> On Tue, Jan 28, 2020 at 1:24 PM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:

> > > 
> > > Note for the current PTA implementation there's almost no cases we can
> > > handle conservatively enough.  Consider the simple
> > > 
> > >  int a[4];
> > >  int *p = &a[1];
> > >  uintptr_t pi = (uintptr_t)p;
> > >  pi += 4;
> > >  int *q = (int *)pi;
> > > 
> > > our PTA knows that p points to a (but not the exact offset), same for pi
> > > (the cast doesn't change the value).  Now you add 4 - this could lead
> > > you outside of 'a' so the points-to set becomes 'a and anything'.
> > 
> > PVNI would say that 'a' gets exposed in the first case and
> > then 'q' can point to all exposed objects because of the
> > second cast.
> > 
> > A correct and conservative implementation would do this:
> > In the PTA you would need to mark the address of a escaped
> > and then later assign a conservative points-to set (all
> > escaped objects) to q.
> 
> I see (I'm asking all these questions to see what implementing -fpta-pnvi
> would mean).  

Thank you for looking into this.

> That might be a feasible implementation route.  How
> does "escaped" as used in your answer apply when doing inter-procedural
> analysis?  I guess we would need to assume points-to sets include
> automatic variables that escaped in the caller.

Yes. Is this not something the compiler assumes in general?

If you have code like this:

int pi = foo(p);
int *q = bar(pi);

where 'foo' and 'bar' are unknown to the compiler, it
should have the same  semantics with regard to PTA
as needed for the casts in PVNI.

(except that one knows that q and p have the same
value which could be exploited for optimization)

> > Yes, this limits optimizations, but I do not think this is
> > terrible. (optimizations could be re-enabled with a compiler
> > option)
> 
> We'll see.
> 
> > > I'm also not sure what PVNI does to
> > > 
> > >  int a[4];
> > >  int *p = &a[1];
> > >  p += 10;
> > >  uintptr_t pi = (uintptr_t)p;
> > >  p = (int *)pi;
> > > 
> > > we assume that p points to a even after p += 10 (but it of course points
> > > outside of the object - obvious here, but not necessarily in more
> > > obfuscated cases).
> > 
> > This is UB because a pointer to outside of the object is formed.
> > 
> > > Now, can we assume pi points to a?  The cast isn't value-changing.  Do we have
> > > to assume (int *)pi points to anything?  So, is
> > > 
> > >  p = (int *)(uintptr_t)p;
> > > 
> > > something like "laundering" a pointer?  We don't preserve such simple round-trip
> > > casts since they are value-preserving.  Are they provenance preserving?
> > 
> > Yes, such a pair would be "laundering" as it allows 'p' to then
> > point to any exposed object provenance-wise.
> > 
> > For such casts the FE would maybe add a marker. Maybe a calls
> > to builtin functions 'builtin_expose(a)' and 'builting_bless'.
> > (having those builtins would be interesting on its own, btw).
> 
> Uh, ok.

For testing, I implemented builtin_expose by adding
in 'find_func_aliases_for_builtin_call' 

   case BUILT_IN_ESCAPE:
       {
          make_escape_constraint (gimple_call_arg (t, 0));
          return true;
       }

which seems to work, but I wasn't sure about
the other function.

There are concurrent algorithms which revive dead pointers.
They might also need explicit control to work around provenance
issues.

> > Having said this, some optimizations could still be allowed using
> > the "as-if" rule and other lines of reasoning. Specifally, PVNI
> > states that 'p' gets assigned the provenance of the object the
> > integer values is the address of. So if the compiler can proof
> > that the address belongs to certain objects it can reassign the
> > points-to set to the new 'p'. Only if there is ambiguity between
> > which objects the address belongs to, the reasoning needs to
> > be more conservative.
> > 
> > For example:
> > int a[3];
> > int b[3];
> > 
> > &b; // b also exposed

Correction: this should be:

(intptr_t)&b;

as it is the cast to integers and not just the address-taken
operation that marks the object exposed.

> > int *p = (int*)(uintptr_t)&a[3];
> > 
> > Here, p could point to the one-after address of 'a' or the
> > first element of 'b'. (but only because 'b' was also exposed).
> > 
> > If the compiler can prove that something like this can not
> > happen (e.g. by considering offsets), it can still do some
> > tracking of points-to sets.
> 
> That's probably the very case that we'll get wrong since
> we definitely won't be able to reliably preserve these
> kind of laundering points...

Maybe the FE could add the builtins ?

> I guess they could be obfuscated like
> 
> union { void *p; long l; } u;
> 
> u.p = p;
> u.l = u.l;
> p = u.p;
> 
> where GCC (and IIRC also now the standard) allows the
> type-punning when reading u.p via u.l and vice versa.

It is allowed in C but not in C++.

> That is, the "conversions" might be hidden in the
> memory access types.  That means (our PTA tracks
> points-to sets of memory) that all stores of pointers
> (even to automatic vars that are themselves not exposed)
> make them escaped and CSE would need to desperately
> avoid CSEing the load from u.p to p or insert laundering
> operations.

I am not sure I fully understand the proposal.

> I guess I'd me much more happy if PVNI said that when
> an integer is converted to a pointer and the integer
> is value-equivalent to pointers { p1, p2, ... } then
> the provenance of the resulting pointer is
> that of p1 (or p2, ... which is semantically equivalent)

(if the provenance is the same)

> and when two pointers p1 and p2 are
> value-equivalent and their provenance is not the
> same then the behavior is undefined. 

I see. Then here..

int a[3];
int b[3];

(uintptr_t)&b[0]; // b also exposed
int *p = (int*)(uintptr_t)&a[3];

..the behavior is undefined because the
two pointers have identical addresses 
but different provenance.

I agree, from a compiler writer's point-of-view
this would be a good solution. But to a programmer,
this would be quite difficult to explain.
The preference of the working group was that the casts
should just work in all cases and do what the
programmer intended, even if this prevents some
optimization. But I will see that this is
added to the list of options under consideration.


PVNI-ae-ud assigns the provenance of an
exposed object at the address. If there
are two possible objects (as in the example
above), the pointer could point to both but
then has to be used consistently only with
only one object. Essentially, we want the
pointer to have exactly one provenance but
we might delay the decision. The idea is
that a compiler might figure out the correct
provenance later, e.g. by observing accesses.


It is possible to formulate
some conditions about when a pointer converted
from an integer could get assigned the
points-to-set of a value-equivalent pointer:

1) using knowledge about object location in
memory: If there is no adjacent object which
was exposed, one can conclude that the
provenance is the object at this address.

2) based on offsets: If the pointer points
in the middle of an object, there is also
no ambiguity.

3) a mix of both, to differentiate objects
before and after in memory.


>  That is,
> 
> int a, b;
> int  *p = &a + 1;
> int *q = &b;
> if (p == q)
>   ... undefined ...

We considered making the comparison undefined in the
specific situation where one of the pointer is one-after
pointer and the other a pointer to the beginning of a
different object. This would solve the problems with
conditional equivalences.

Others proposed to make the result of the comparison
unspecified, but I think this does not help.

At the moment, the consensus is that pointer
comparison should be always allowed and the
result should only depend on the address. Again,
the idea is to make is simpler and more consistent
for the programmer. But yes, this makes it more
difficult for the compiler writer.

Best,
Martin
Richard Biener Jan. 30, 2020, 8:30 a.m. UTC | #23
On Wed, Jan 29, 2020 at 3:00 PM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
> Am Mittwoch, den 29.01.2020, 09:45 +0100 schrieb Richard Biener:
> > On Tue, Jan 28, 2020 at 1:24 PM Uecker, Martin
> > <Martin.Uecker@med.uni-goettingen.de> wrote:
>
> > > >
> > > > Note for the current PTA implementation there's almost no cases we can
> > > > handle conservatively enough.  Consider the simple
> > > >
> > > >  int a[4];
> > > >  int *p = &a[1];
> > > >  uintptr_t pi = (uintptr_t)p;
> > > >  pi += 4;
> > > >  int *q = (int *)pi;
> > > >
> > > > our PTA knows that p points to a (but not the exact offset), same for pi
> > > > (the cast doesn't change the value).  Now you add 4 - this could lead
> > > > you outside of 'a' so the points-to set becomes 'a and anything'.
> > >
> > > PVNI would say that 'a' gets exposed in the first case and
> > > then 'q' can point to all exposed objects because of the
> > > second cast.
> > >
> > > A correct and conservative implementation would do this:
> > > In the PTA you would need to mark the address of a escaped
> > > and then later assign a conservative points-to set (all
> > > escaped objects) to q.
> >
> > I see (I'm asking all these questions to see what implementing -fpta-pnvi
> > would mean).
>
> Thank you for looking into this.
>
> > That might be a feasible implementation route.  How
> > does "escaped" as used in your answer apply when doing inter-procedural
> > analysis?  I guess we would need to assume points-to sets include
> > automatic variables that escaped in the caller.
>
> Yes. Is this not something the compiler assumes in general?
>
> If you have code like this:
>
> int pi = foo(p);
> int *q = bar(pi);
>
> where 'foo' and 'bar' are unknown to the compiler, it
> should have the same  semantics with regard to PTA
> as needed for the casts in PVNI.
>
> (except that one knows that q and p have the same
> value which could be exploited for optimization)
>
> > > Yes, this limits optimizations, but I do not think this is
> > > terrible. (optimizations could be re-enabled with a compiler
> > > option)
> >
> > We'll see.
> >
> > > > I'm also not sure what PVNI does to
> > > >
> > > >  int a[4];
> > > >  int *p = &a[1];
> > > >  p += 10;
> > > >  uintptr_t pi = (uintptr_t)p;
> > > >  p = (int *)pi;
> > > >
> > > > we assume that p points to a even after p += 10 (but it of course points
> > > > outside of the object - obvious here, but not necessarily in more
> > > > obfuscated cases).
> > >
> > > This is UB because a pointer to outside of the object is formed.
> > >
> > > > Now, can we assume pi points to a?  The cast isn't value-changing.  Do we have
> > > > to assume (int *)pi points to anything?  So, is
> > > >
> > > >  p = (int *)(uintptr_t)p;
> > > >
> > > > something like "laundering" a pointer?  We don't preserve such simple round-trip
> > > > casts since they are value-preserving.  Are they provenance preserving?
> > >
> > > Yes, such a pair would be "laundering" as it allows 'p' to then
> > > point to any exposed object provenance-wise.
> > >
> > > For such casts the FE would maybe add a marker. Maybe a calls
> > > to builtin functions 'builtin_expose(a)' and 'builting_bless'.
> > > (having those builtins would be interesting on its own, btw).
> >
> > Uh, ok.
>
> For testing, I implemented builtin_expose by adding
> in 'find_func_aliases_for_builtin_call'
>
>    case BUILT_IN_ESCAPE:
>        {
>           make_escape_constraint (gimple_call_arg (t, 0));
>           return true;
>        }
>
> which seems to work, but I wasn't sure about
> the other function.
>
> There are concurrent algorithms which revive dead pointers.
> They might also need explicit control to work around provenance
> issues.
>
> > > Having said this, some optimizations could still be allowed using
> > > the "as-if" rule and other lines of reasoning. Specifally, PVNI
> > > states that 'p' gets assigned the provenance of the object the
> > > integer values is the address of. So if the compiler can proof
> > > that the address belongs to certain objects it can reassign the
> > > points-to set to the new 'p'. Only if there is ambiguity between
> > > which objects the address belongs to, the reasoning needs to
> > > be more conservative.
> > >
> > > For example:
> > > int a[3];
> > > int b[3];
> > >
> > > &b; // b also exposed
>
> Correction: this should be:
>
> (intptr_t)&b;
>
> as it is the cast to integers and not just the address-taken
> operation that marks the object exposed.
>
> > > int *p = (int*)(uintptr_t)&a[3];
> > >
> > > Here, p could point to the one-after address of 'a' or the
> > > first element of 'b'. (but only because 'b' was also exposed).
> > >
> > > If the compiler can prove that something like this can not
> > > happen (e.g. by considering offsets), it can still do some
> > > tracking of points-to sets.
> >
> > That's probably the very case that we'll get wrong since
> > we definitely won't be able to reliably preserve these
> > kind of laundering points...
>
> Maybe the FE could add the builtins ?
>
> > I guess they could be obfuscated like
> >
> > union { void *p; long l; } u;
> >
> > u.p = p;
> > u.l = u.l;
> > p = u.p;
> >
> > where GCC (and IIRC also now the standard) allows the
> > type-punning when reading u.p via u.l and vice versa.
>
> It is allowed in C but not in C++.
>
> > That is, the "conversions" might be hidden in the
> > memory access types.  That means (our PTA tracks
> > points-to sets of memory) that all stores of pointers
> > (even to automatic vars that are themselves not exposed)
> > make them escaped and CSE would need to desperately
> > avoid CSEing the load from u.p to p or insert laundering
> > operations.
>
> I am not sure I fully understand the proposal.
>
> > I guess I'd me much more happy if PVNI said that when
> > an integer is converted to a pointer and the integer
> > is value-equivalent to pointers { p1, p2, ... } then
> > the provenance of the resulting pointer is
> > that of p1 (or p2, ... which is semantically equivalent)
>
> (if the provenance is the same)
>
> > and when two pointers p1 and p2 are
> > value-equivalent and their provenance is not the
> > same then the behavior is undefined.
>
> I see. Then here..
>
> int a[3];
> int b[3];
>
> (uintptr_t)&b[0]; // b also exposed
> int *p = (int*)(uintptr_t)&a[3];
>
> ..the behavior is undefined because the
> two pointers have identical addresses
> but different provenance.
>
> I agree, from a compiler writer's point-of-view
> this would be a good solution. But to a programmer,
> this would be quite difficult to explain.
> The preference of the working group was that the casts
> should just work in all cases and do what the
> programmer intended, even if this prevents some
> optimization. But I will see that this is
> added to the list of options under consideration.
>
>
> PVNI-ae-ud assigns the provenance of an
> exposed object at the address. If there
> are two possible objects (as in the example
> above), the pointer could point to both but
> then has to be used consistently only with
> only one object. Essentially, we want the
> pointer to have exactly one provenance but
> we might delay the decision. The idea is
> that a compiler might figure out the correct
> provenance later, e.g. by observing accesses.

I thought about alternatives to PVNI and implementation
consequences.  But all different kind of must-behave-like-this
guarantees face serious implementation difficulties I think
so the only alternative to PVNI (which I think is implementable
but at a optimization opportunity cost) is one that makes
two pointers with the same value always have the same
provenance (and otherwise make the behavior undefined).

> It is possible to formulate
> some conditions about when a pointer converted
> from an integer could get assigned the
> points-to-set of a value-equivalent pointer:
>
> 1) using knowledge about object location in
> memory: If there is no adjacent object which
> was exposed, one can conclude that the
> provenance is the object at this address.

Usually at the point compilers want to know objects
are not laid out.  So what compilers do is simply
say the user cannot possibly know so it can
choose at will (even if later object layout disagrees).

> 2) based on offsets: If the pointer points
> in the middle of an object, there is also
> no ambiguity.

The difficulty here lies in the requirement of
exact offset tracking which makes (some?)
points-to implementations prohibitly expensive.
But yes, sure.

> 3) a mix of both, to differentiate objects
> before and after in memory.
>
>
> >  That is,
> >
> > int a, b;
> > int  *p = &a + 1;
> > int *q = &b;
> > if (p == q)
> >   ... undefined ...
>
> We considered making the comparison undefined in the
> specific situation where one of the pointer is one-after
> pointer and the other a pointer to the beginning of a
> different object. This would solve the problems with
> conditional equivalences.

Note my proposal doesn't make the comparison undefined
but the case where both are equivalent cannot be reached
at runtime without invoking undefined behavior.  That means
we can optimize the comparison based on provenance
where p points to a and q points to b.

> Others proposed to make the result of the comparison
> unspecified, but I think this does not help.

Indeed.  It's not unspecified, it's known to evaluate to false.
I think there's existing wording in the standard that
allows it to evaluate to true for pointers one-after-the-object,
that would need to be changed of course.

> At the moment, the consensus is that pointer
> comparison should be always allowed and the
> result should only depend on the address. Again,
> the idea is to make is simpler and more consistent
> for the programmer. But yes, this makes it more
> difficult for the compiler writer.

it's a conflict of interest on the user side as well - users
expect DWIM semantics but at the same time want
fastest possible code...

Richard.

> Best,
> Martin
Uecker, Martin Jan. 30, 2020, 2:10 p.m. UTC | #24
Am Donnerstag, den 30.01.2020, 09:30 +0100 schrieb Richard Biener:
> On Wed, Jan 29, 2020 at 3:00 PM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:

...

> > > I guess I'd me much more happy if PVNI said that when
> > > an integer is converted to a pointer and the integer
> > > is value-equivalent to pointers { p1, p2, ... } then
> > > the provenance of the resulting pointer is
> > > that of p1 (or p2, ... which is semantically equivalent)
> > 
> > (if the provenance is the same)
> > 
> > > and when two pointers p1 and p2 are
> > > value-equivalent and their provenance is not the
> > > same then the behavior is undefined.
> > 
> > I see. Then here..
> > 
> > int a[3];
> > int b[3];
> > 
> > (uintptr_t)&b[0]; // b also exposed
> > int *p = (int*)(uintptr_t)&a[3];
> > 
> > ..the behavior is undefined because the
> > two pointers have identical addresses
> > but different provenance.
> > 
> > I agree, from a compiler writer's point-of-view
> > this would be a good solution. But to a programmer,
> > this would be quite difficult to explain.
> > The preference of the working group was that the casts
> > should just work in all cases and do what the
> > programmer intended, even if this prevents some
> > optimization. But I will see that this is
> > added to the list of options under consideration.
> > 
> > 
> > PVNI-ae-ud assigns the provenance of an
> > exposed object at the address. If there
> > are two possible objects (as in the example
> > above), the pointer could point to both but
> > then has to be used consistently only with
> > only one object. Essentially, we want the
> > pointer to have exactly one provenance but
> > we might delay the decision. The idea is
> > that a compiler might figure out the correct
> > provenance later, e.g. by observing accesses.
> 
> I thought about alternatives to PVNI and implementation
> consequences.  But all different kind of must-behave-like-this
> guarantees face serious implementation difficulties I think
> so the only alternative to PVNI (which I think is implementable
> but at a optimization opportunity cost) is one that makes
> two pointers with the same value always have the same
> provenance (and otherwise make the behavior undefined).

This would need to come with precise rules about
when the occurance of two such pointers is UB,
e.g. comparisons of such pointers, or that
two such pointers are cast to int in the same
execution.

The mere existance of such pointers should be
quite common and should not already be UB.

But I am uncomfortable with the idea that
comparison of pointers is always allowed except
for some special case which then is UB. This
might cause are and very difficult to find bugs.

> > It is possible to formulate
> > some conditions about when a pointer converted
> > from an integer could get assigned the
> > points-to-set of a value-equivalent pointer:
> > 
> > 1) using knowledge about object location in
> > memory: If there is no adjacent object which
> > was exposed, one can conclude that the
> > provenance is the object at this address.
> 
> Usually at the point compilers want to know objects
> are not laid out.  So what compilers do is simply
> say the user cannot possibly know so it can
> choose at will (even if later object layout disagrees).

The compiler is free to choose at will. But in
my opinion, it then has to stick with its choice.

Otherwise, this leads to really abstract and
confusing semantics. The wording of the standard
also implies that UB is based on actual behavior.

> > 2) based on offsets: If the pointer points
> > in the middle of an object, there is also
> > no ambiguity.
> 
> The difficulty here lies in the requirement of
> exact offset tracking which makes (some?)
> points-to implementations prohibitly expensive.
> But yes, sure.

Yes, but perhaps there are some low-hanging fruit
where it is easy to determine.

> > 3) a mix of both, to differentiate objects
> > before and after in memory.
> > 
> > 
> > >  That is,
> > > 
> > > int a, b;
> > > int  *p = &a + 1;
> > > int *q = &b;
> > > if (p == q)
> > >   ... undefined ...
> > 
> > We considered making the comparison undefined in the
> > specific situation where one of the pointer is one-after
> > pointer and the other a pointer to the beginning of a
> > different object. This would solve the problems with
> > conditional equivalences.
> 
> Note my proposal doesn't make the comparison undefined
> but the case where both are equivalent cannot be reached
> at runtime without invoking undefined behavior.  That means
> we can optimize the comparison based on provenance
> where p points to a and q points to b.

Sorry, I did not get this. What are the exact conditions
for UB?

> > Others proposed to make the result of the comparison
> > unspecified, but I think this does not help.
> 
> Indeed.  It's not unspecified, it's known to evaluate to false.
> I think there's existing wording in the standard that
> allows it to evaluate to true for pointers one-after-the-object,
> that would need to be changed of course.

The problem is that if the comparison if not optimized
and the pointers have the same address, then it would
evaluate to true at run-time. If I understand correctly,
you somehow want to make this case be UB, but I haven't
quite understood how (if it is not the comparison of such
pointers that invokes UB).


> > At the moment, the consensus is that pointer
> > comparison should be always allowed and the
> > result should only depend on the address. Again,
> > the idea is to make is simpler and more consistent
> > for the programmer. But yes, this makes it more
> > difficult for the compiler writer.
> 
> it's a conflict of interest on the user side as well - users
> expect DWIM semantics but at the same time want
> fastest possible code...

Yes. It is not just DWIM but also more easily to understand
semantics which should lead to less bugs. The general feeling
is that C moved a bit too much to the side of fastest
possible code. My personal preference would be to put
PVNI-ae-ud in the standard and have compiler options which
re-enable these - then unsafe from the standard' point-of-view -
optimization for those who need it.

Best,
Martin
Michael Matz Jan. 30, 2020, 4:50 p.m. UTC | #25
Hi,

On Thu, 30 Jan 2020, Uecker, Martin wrote:

> > guarantees face serious implementation difficulties I think
> > so the only alternative to PVNI (which I think is implementable
> > but at a optimization opportunity cost) is one that makes
> > two pointers with the same value always have the same
> > provenance (and otherwise make the behavior undefined).
> 
> This would need to come with precise rules about
> when the occurance of two such pointers is UB,
> e.g. comparisons of such pointers, or that
> two such pointers are cast to int in the same
> execution.
> 
> The mere existance of such pointers should be
> quite common and should not already be UB.
> 
> But I am uncomfortable with the idea that
> comparison of pointers is always allowed except
> for some special case which then is UB. This
> might cause are and very difficult to find bugs.

As Richi said, the comparison itself wouldn't be UB, all comparisons would 
be allowed.  But _if_ the pointers compare equal, they must have same (or 
overlapping) provenance (i.e. when they have not, then _that_ is UB).

> > > Others proposed to make the result of the comparison unspecified, 
> > > but I think this does not help.
> > 
> > Indeed.  It's not unspecified, it's known to evaluate to false. I 
> > think there's existing wording in the standard that allows it to 
> > evaluate to true for pointers one-after-the-object, that would need to 
> > be changed of course.
> 
> The problem is that if the comparison if not optimized
> and the pointers have the same address, then it would
> evaluate to true at run-time. If I understand correctly,
> you somehow want to make this case be UB, but I haven't
> quite understood how (if it is not the comparison of such
> pointers that invokes UB).

By saying something like "if two pointers compare equal they must have the 
same provenance, otherwise the behaviour is undefined".

(I don't know if this definition would or would not help with the problems 
PVNI poses to compilers).


Ciao,
Michael.
Michael Matz Jan. 30, 2020, 4:59 p.m. UTC | #26
Hi,

On Thu, 30 Jan 2020, Michael Matz wrote:

> > and the pointers have the same address, then it would evaluate to true 
> > at run-time. If I understand correctly, you somehow want to make this 
> > case be UB, but I haven't quite understood how (if it is not the 
> > comparison of such pointers that invokes UB).
> 
> By saying something like "if two pointers compare equal they must have 
> the same provenance, otherwise the behaviour is undefined".
> 
> (I don't know if this definition would or would not help with the 
> problems PVNI poses to compilers).

Or, actually I know at least one case.  The problem with allowing 
value-equivalent pointers to have non-overlapping provenance is the 
following: many of the compiler optimizations are based on as-if rules.  
Now, if it's very easy for users to detect certain situations, that means 
that the as-if rules can't be invoked as often.  In this specific 
instance, if the user writes a program where the compiler would optimize 
mem accesses based on non-overlapping provenance (e.g. a stored value is 
propagated downwards over a store of different provenance), and then 
somewhere else also compares these non-overlapping pointers for equality, 
and then, if they are equal prints out "hah! invalid optimization 
detected", and the outcome of the comparison of non-overlapping pointers 
weren't left unspecified, then that's the reason why the compiler would 
have to globally disable the first optimization (at least when it can't 
prove that there aren't any such comparisons).  Ideally we don't want that 
:)


Ciao,
Michael.
Uecker, Martin Jan. 30, 2020, 5:09 p.m. UTC | #27
Am Donnerstag, den 30.01.2020, 16:50 +0000 schrieb Michael Matz:
> Hi,
> 
> On Thu, 30 Jan 2020, Uecker, Martin wrote:
> 
> > > guarantees face serious implementation difficulties I think
> > > so the only alternative to PVNI (which I think is implementable
> > > but at a optimization opportunity cost) is one that makes
> > > two pointers with the same value always have the same
> > > provenance (and otherwise make the behavior undefined).
> > 
> > This would need to come with precise rules about
> > when the occurance of two such pointers is UB,
> > e.g. comparisons of such pointers, or that
> > two such pointers are cast to int in the same
> > execution.
> > 
> > The mere existance of such pointers should be
> > quite common and should not already be UB.
> > 
> > But I am uncomfortable with the idea that
> > comparison of pointers is always allowed except
> > for some special case which then is UB. This
> > might cause are and very difficult to find bugs.
> 
> As Richi said, the comparison itself wouldn't be UB, all comparisons would 
> be allowed.  But _if_ the pointers compare equal, they must have same (or 
> overlapping) provenance (i.e. when they have not, then _that_ is UB).

Sorry, I still don't get it.  In the following example,

int a[1];
int b[1];

it is often the case that &a[1] and &b[0] compare equal
because they have the same address but the pointer
have different provenance.  

Or does there need to be an actual evaluation of a comparison
operations? In this case, I do not see the difference to what
I said.


Best,
Martin

> > > > Others proposed to make the result of the comparison unspecified, 
> > > > but I think this does not help.
> > > 
> > > Indeed.  It's not unspecified, it's known to evaluate to false. I 
> > > think there's existing wording in the standard that allows it to 
> > > evaluate to true for pointers one-after-the-object, that would need to 
> > > be changed of course.
> > 
> > The problem is that if the comparison if not optimized
> > and the pointers have the same address, then it would
> > evaluate to true at run-time. If I understand correctly,
> > you somehow want to make this case be UB, but I haven't
> > quite understood how (if it is not the comparison of such
> > pointers that invokes UB).
> 
> By saying something like "if two pointers compare equal they must have the 
> same provenance, otherwise the behaviour is undefined".

> (I don't know if this definition would or would not help with the problems 
> PVNI poses to compilers).
> 
> 
> Ciao,
> Michael.
Richard Biener Jan. 31, 2020, 8:02 a.m. UTC | #28
On Thu, Jan 30, 2020 at 6:09 PM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
> Am Donnerstag, den 30.01.2020, 16:50 +0000 schrieb Michael Matz:
> > Hi,
> >
> > On Thu, 30 Jan 2020, Uecker, Martin wrote:
> >
> > > > guarantees face serious implementation difficulties I think
> > > > so the only alternative to PVNI (which I think is implementable
> > > > but at a optimization opportunity cost) is one that makes
> > > > two pointers with the same value always have the same
> > > > provenance (and otherwise make the behavior undefined).
> > >
> > > This would need to come with precise rules about
> > > when the occurance of two such pointers is UB,
> > > e.g. comparisons of such pointers, or that
> > > two such pointers are cast to int in the same
> > > execution.
> > >
> > > The mere existance of such pointers should be
> > > quite common and should not already be UB.
> > >
> > > But I am uncomfortable with the idea that
> > > comparison of pointers is always allowed except
> > > for some special case which then is UB. This
> > > might cause are and very difficult to find bugs.
> >
> > As Richi said, the comparison itself wouldn't be UB, all comparisons would
> > be allowed.  But _if_ the pointers compare equal, they must have same (or
> > overlapping) provenance (i.e. when they have not, then _that_ is UB).
>
> Sorry, I still don't get it.  In the following example,
>
> int a[1];
> int b[1];
>
> it is often the case that &a[1] and &b[0] compare equal
> because they have the same address but the pointer
> have different provenance.
>
> Or does there need to be an actual evaluation of a comparison
> operations? In this case, I do not see the difference to what
> I said.

I guess I wanted to say that if you do

  if (&a[1] == &b[0])
    if (&a[1] != &b[0])
      abort ();

then the abort might happen.  I'm using the term "undefined behavior"
here.  So whenever you create a value based on two pointers with
the same value and different provenance you invoke undefined behavior.
That allows the compiler to optimize

int *q, *r;
if (q == r)
  *r = 1;

into

if (q == r)
  *q = 1;

which it is currently not allowed to do because of that dread one-after-the
object equality compare, not because of PNVI, but similar cases
obviously can be constructed with integers (and make our live difficult
as we're tracking provenance through integers).

Compilers fundamentally work with value-equivalences, the above example
shows we may not.  That's IMHO a defect in the standard.

Richard.

>
> Best,
> Martin
>
> > > > > Others proposed to make the result of the comparison unspecified,
> > > > > but I think this does not help.
> > > >
> > > > Indeed.  It's not unspecified, it's known to evaluate to false. I
> > > > think there's existing wording in the standard that allows it to
> > > > evaluate to true for pointers one-after-the-object, that would need to
> > > > be changed of course.
> > >
> > > The problem is that if the comparison if not optimized
> > > and the pointers have the same address, then it would
> > > evaluate to true at run-time. If I understand correctly,
> > > you somehow want to make this case be UB, but I haven't
> > > quite understood how (if it is not the comparison of such
> > > pointers that invokes UB).
> >
> > By saying something like "if two pointers compare equal they must have the
> > same provenance, otherwise the behaviour is undefined".
>
> > (I don't know if this definition would or would not help with the problems
> > PVNI poses to compilers).
> >
> >
> > Ciao,
> > Michael.
Uecker, Martin Jan. 31, 2020, 12:05 p.m. UTC | #29
Am Freitag, den 31.01.2020, 09:02 +0100 schrieb Richard Biener:
> On Thu, Jan 30, 2020 at 6:09 PM Uecker, Martin
> <Martin.Uecker@med.uni-goettingen.de> wrote:
> > 
> > Am Donnerstag, den 30.01.2020, 16:50 +0000 schrieb Michael Matz:
> > > Hi,
> > > 
> > > On Thu, 30 Jan 2020, Uecker, Martin wrote:
> > > 
> > > > > guarantees face serious implementation difficulties I think
> > > > > so the only alternative to PVNI (which I think is implementable
> > > > > but at a optimization opportunity cost) is one that makes
> > > > > two pointers with the same value always have the same
> > > > > provenance (and otherwise make the behavior undefined).
> > > > 
> > > > This would need to come with precise rules about
> > > > when the occurance of two such pointers is UB,
> > > > e.g. comparisons of such pointers, or that
> > > > two such pointers are cast to int in the same
> > > > execution.
> > > > 
> > > > The mere existance of such pointers should be
> > > > quite common and should not already be UB.
> > > > 
> > > > But I am uncomfortable with the idea that
> > > > comparison of pointers is always allowed except
> > > > for some special case which then is UB. This
> > > > might cause are and very difficult to find bugs.
> > > 
> > > As Richi said, the comparison itself wouldn't be UB, all comparisons would
> > > be allowed.  But _if_ the pointers compare equal, they must have same (or
> > > overlapping) provenance (i.e. when they have not, then _that_ is UB).
> > 
> > Sorry, I still don't get it.  In the following example,
> > 
> > int a[1];
> > int b[1];
> > 
> > it is often the case that &a[1] and &b[0] compare equal
> > because they have the same address but the pointer
> > have different provenance.
> > 
> > Or does there need to be an actual evaluation of a comparison
> > operations? In this case, I do not see the difference to what
> > I said.
> 
> I guess I wanted to say that if you do
> 
>   if (&a[1] == &b[0])
>     if (&a[1] != &b[0])
>       abort ();
> 
> then the abort might happen.  I'm using the term "undefined behavior"
> here.  So whenever you create a value based on two pointers with
> the same value and different provenance you invoke undefined behavior.

Yes, but it is tricky because one needs to define
"create a value based on two pointers with..."

Assuming one does not track provenance through integers,
the only way to create expressions using two pointers
are comparisons, pointer subtraction, and the tertiary
operator. 

The tertiary operator seems unproblematic. For pointer
subtraction, the standard already requires same provenance.

For comparisons, one could consider making this case UB.
But I fear this could be the source of subtle bugs.

Then there is the question about what happens if a
programm inspects the representation bytes  of a 
pointer directly...

> That allows the compiler to optimize
> 
> int *q, *r;
> if (q == r)
>   *r = 1;
> 
> into
> 
> if (q == r)
>   *q = 1;
> 
> which it is currently not allowed to do because of that dread one-after-the
> object equality compare, not because of PNVI, but similar cases

Yes, but as provenance is tracked at compile-time, you could still
do the optimization if you assign the right provenance to the
replaced variable, i.e. you replace 'r' with 'q' but keep the
provenance of 'r'. So while this puts a burden on the compiler
writers, it seems feasible. Or am  I missing something?

> obviously can be constructed with integers (and make our live difficult
> as we're tracking provenance through integers).

As in PVNI integers do not have provenance, such an optimization would
always be valid for integers as would all other natural algebraic
optimizations for integers. I consider this a major strength of
the proposal and I kind of hoped that compiler writers would agree.

> Compilers fundamentally work with value-equivalences, the above example
> shows we may not.  That's IMHO a defect in the standard.

I consider provenance to be part of the value. Think about
architectures with descriptors that actually trap if you use
the wrong pointer. This nicely corresponds to a concept
of abstract pointers which not simple the address of a
memory location.

The problems we have that we can not (cheaply) track provenance
at runtime on modern CPUs and only the address part of the pointer
is available ar runtime. For the standard, this implies
that the rules must work both abstract pointers with provenance
and address-only pointers where information about provenance
is not available. Whenever there is a discrepancy between
these two models, we can either make it UB or use the semantics
of the address-only case.

The only real problematic case we have with PVNI is comparisons
for one-after-the object pointers with a pointer of different
provenance. The only choices we have is to make this UB or 
to make the result well-defined and based on the address. 
Both choices have disadvantages.

If we track provenance through integers, there are many
other difficult problems. The reason is that you then
cannot work with value-equivalences anymore even for
integer expressions which are much more complex.
The amount of additional problems we create here 
is the main reason we want to have PVNI and not
track provenance through integers.

Best,
Martin
Richard Biener Jan. 31, 2020, 12:26 p.m. UTC | #30
On Fri, Jan 31, 2020 at 1:05 PM Uecker, Martin
<Martin.Uecker@med.uni-goettingen.de> wrote:
>
> Am Freitag, den 31.01.2020, 09:02 +0100 schrieb Richard Biener:
> > On Thu, Jan 30, 2020 at 6:09 PM Uecker, Martin
> > <Martin.Uecker@med.uni-goettingen.de> wrote:
> > >
> > > Am Donnerstag, den 30.01.2020, 16:50 +0000 schrieb Michael Matz:
> > > > Hi,
> > > >
> > > > On Thu, 30 Jan 2020, Uecker, Martin wrote:
> > > >
> > > > > > guarantees face serious implementation difficulties I think
> > > > > > so the only alternative to PVNI (which I think is implementable
> > > > > > but at a optimization opportunity cost) is one that makes
> > > > > > two pointers with the same value always have the same
> > > > > > provenance (and otherwise make the behavior undefined).
> > > > >
> > > > > This would need to come with precise rules about
> > > > > when the occurance of two such pointers is UB,
> > > > > e.g. comparisons of such pointers, or that
> > > > > two such pointers are cast to int in the same
> > > > > execution.
> > > > >
> > > > > The mere existance of such pointers should be
> > > > > quite common and should not already be UB.
> > > > >
> > > > > But I am uncomfortable with the idea that
> > > > > comparison of pointers is always allowed except
> > > > > for some special case which then is UB. This
> > > > > might cause are and very difficult to find bugs.
> > > >
> > > > As Richi said, the comparison itself wouldn't be UB, all comparisons would
> > > > be allowed.  But _if_ the pointers compare equal, they must have same (or
> > > > overlapping) provenance (i.e. when they have not, then _that_ is UB).
> > >
> > > Sorry, I still don't get it.  In the following example,
> > >
> > > int a[1];
> > > int b[1];
> > >
> > > it is often the case that &a[1] and &b[0] compare equal
> > > because they have the same address but the pointer
> > > have different provenance.
> > >
> > > Or does there need to be an actual evaluation of a comparison
> > > operations? In this case, I do not see the difference to what
> > > I said.
> >
> > I guess I wanted to say that if you do
> >
> >   if (&a[1] == &b[0])
> >     if (&a[1] != &b[0])
> >       abort ();
> >
> > then the abort might happen.  I'm using the term "undefined behavior"
> > here.  So whenever you create a value based on two pointers with
> > the same value and different provenance you invoke undefined behavior.
>
> Yes, but it is tricky because one needs to define
> "create a value based on two pointers with..."
>
> Assuming one does not track provenance through integers,
> the only way to create expressions using two pointers
> are comparisons, pointer subtraction, and the tertiary
> operator.
>
> The tertiary operator seems unproblematic. For pointer
> subtraction, the standard already requires same provenance.
>
> For comparisons, one could consider making this case UB.
> But I fear this could be the source of subtle bugs.
>
> Then there is the question about what happens if a
> programm inspects the representation bytes  of a
> pointer directly...

At least the pointer is then no longer a register ;)

> > That allows the compiler to optimize
> >
> > int *q, *r;
> > if (q == r)
> >   *r = 1;
> >
> > into
> >
> > if (q == r)
> >   *q = 1;
> >
> > which it is currently not allowed to do because of that dread one-after-the
> > object equality compare, not because of PNVI, but similar cases
>
> Yes, but as provenance is tracked at compile-time, you could still
> do the optimization if you assign the right provenance to the
> replaced variable, i.e. you replace 'r' with 'q' but keep the
> provenance of 'r'. So while this puts a burden on the compiler
> writers, it seems feasible. Or am  I missing something?

With SSA it's not easy since q before the comparison is the same so it's

  *q_1 = 2;
  if (q_1 == r_2)
    *r_2 = 1;  ->  *q_1 = 1;

and we cannot change the provenance of q_1 since that affects the
earlier store.  We'd have to somehow attach provenance to all
_operations_ where it matters (the dereference in this case).  That's
a much larger change.

> > obviously can be constructed with integers (and make our live difficult
> > as we're tracking provenance through integers).
>
> As in PVNI integers do not have provenance, such an optimization would
> always be valid for integers as would all other natural algebraic
> optimizations for integers. I consider this a major strength of
> the proposal and I kind of hoped that compiler writers would agree.

Yes, sure - it avoids this class of problems.  PVNI is probably the
very simplest approach to fix whatever problem it tries to fix ;)

> > Compilers fundamentally work with value-equivalences, the above example
> > shows we may not.  That's IMHO a defect in the standard.
>
> I consider provenance to be part of the value. Think about
> architectures with descriptors that actually trap if you use
> the wrong pointer. This nicely corresponds to a concept
> of abstract pointers which not simple the address of a
> memory location.
>
> The problems we have that we can not (cheaply) track provenance
> at runtime on modern CPUs and only the address part of the pointer
> is available ar runtime. For the standard, this implies
> that the rules must work both abstract pointers with provenance
> and address-only pointers where information about provenance
> is not available. Whenever there is a discrepancy between
> these two models, we can either make it UB or use the semantics
> of the address-only case.
>
> The only real problematic case we have with PVNI is comparisons
> for one-after-the object pointers with a pointer of different
> provenance. The only choices we have is to make this UB or
> to make the result well-defined and based on the address.
> Both choices have disadvantages.
>
> If we track provenance through integers, there are many
> other difficult problems. The reason is that you then
> cannot work with value-equivalences anymore even for
> integer expressions which are much more complex.
> The amount of additional problems we create here
> is the main reason we want to have PVNI and not
> track provenance through integers.

I understand that.

Richard.

> Best,
> Martin
>
>
diff mbox series

Patch

diff --git a/gcc/doc/implement-c.texi b/gcc/doc/implement-c.texi
index 692297b69c4..beee2510670 100644
--- a/gcc/doc/implement-c.texi
+++ b/gcc/doc/implement-c.texi
@@ -401,11 +401,15 @@  pointer representation is smaller than the integer type, extends according
 to the signedness of the integer type if the pointer representation
 is larger than the integer type, otherwise the bits are unchanged.
 
-When casting from pointer to integer and back again, the resulting
-pointer must reference the same object as the original pointer, otherwise
-the behavior is undefined.  That is, one may not use integer arithmetic to
-avoid the undefined behavior of pointer arithmetic as proscribed in
-C99 and C11 6.5.6/8.
+Arithmetic on integers derived from pointers can produce a value such
+that casting it back produces a valid pointer corresponding to one of
+the original pointers.  Thus, integer arithmetic allows to express
+computations that might not be expressible as pointer arithmetic without
+undefined behavior.  However, at a certain point the distinction between
+pointers and integers is lost (when GCC translates from GIMPLE internal
+representation to RTL), but some optimizations still attempt to track
+pointer arithmetic beyond that point.  In some cases this may cause
+valid code to be incorrectly optimized.
 
 @item
 @cite{The size of the result of subtracting two pointers to elements