diff mbox series

unshare expressions in attribute arguments

Message ID 56b14df5-16a6-58ae-519a-7d2a9c84e948@gmail.com
State New
Headers show
Series unshare expressions in attribute arguments | expand

Commit Message

Martin Sebor Nov. 20, 2020, 7 p.m. UTC
To detect a subset of VLA misuses, the C front associates the bounds
of VLAs in function argument lists with the corresponding variables
by implicitly adding an instance of attribute access to each function
declared to take VLAs with the bound expressions chained on the list
of attribute arguments.

Some of these expressions end up modified by the middle end, which
results in references to nonlocal variables (and perhaps other nodes)
used in these expression getting garbage collected.  A simple example
of this is described in pr97172.

By unsharing the bound expressions the patch below prevents this from
happening (it's not a fix for pr97172).

My understanding of the details of node sharing and garbage collection
in GCC is very limited (I didn't expect a tree to be garbage-collected
if it's still referenced by something).  Is this the right approach
to solving this problem?

Thanks
Martin

             }
@@ -5834,6 +5835,7 @@ get_parm_array_spec (const struct c_parm *parm, 
tree attrs)

        /* Each variable VLA bound is represented by a dollar sign.  */
        spec += "$";
+      nelts = unshare_expr (nelts);
        vbchain = tree_cons (NULL_TREE, nelts, vbchain);
      }

Comments

Marek Polacek Nov. 20, 2020, 7:29 p.m. UTC | #1
On Fri, Nov 20, 2020 at 12:00:58PM -0700, Martin Sebor via Gcc-patches wrote:
> To detect a subset of VLA misuses, the C front associates the bounds
> of VLAs in function argument lists with the corresponding variables
> by implicitly adding an instance of attribute access to each function
> declared to take VLAs with the bound expressions chained on the list
> of attribute arguments.
> 
> Some of these expressions end up modified by the middle end, which
> results in references to nonlocal variables (and perhaps other nodes)
> used in these expression getting garbage collected.  A simple example
> of this is described in pr97172.
> 
> By unsharing the bound expressions the patch below prevents this from
> happening (it's not a fix for pr97172).
> 
> My understanding of the details of node sharing and garbage collection
> in GCC is very limited (I didn't expect a tree to be garbage-collected
> if it's still referenced by something).  Is this the right approach
> to solving this problem?

ISTM that a more natural thing would be to use build_distinct_type_copy
to copy the type you're about to modify.

> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index d348e39c27a..4aea4dcafb9 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -58,7 +58,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "c-family/name-hint.h"
>  #include "c-family/known-headers.h"
>  #include "c-family/c-spellcheck.h"
> -
> +#include "gimplify.h"
>  #include "tree-pretty-print.h"
> 
>  /* In grokdeclarator, distinguish syntactic contexts of declarators.  */
> @@ -5780,6 +5780,7 @@ get_parm_array_spec (const struct c_parm *parm, tree
> attrs)
>                   /* Each variable VLA bound is represented by the dollar
>                      sign.  */
>                   spec += "$";
> +                 nelts = unshare_expr (nelts);
>                   tpbnds = tree_cons (NULL_TREE, nelts, tpbnds);
>                 }
>             }
> @@ -5834,6 +5835,7 @@ get_parm_array_spec (const struct c_parm *parm, tree
> attrs)
> 
>        /* Each variable VLA bound is represented by a dollar sign.  */
>        spec += "$";
> +      nelts = unshare_expr (nelts);
>        vbchain = tree_cons (NULL_TREE, nelts, vbchain);
>      }
> 

Marek
Martin Sebor Nov. 20, 2020, 8:28 p.m. UTC | #2
On 11/20/20 12:29 PM, Marek Polacek wrote:
> On Fri, Nov 20, 2020 at 12:00:58PM -0700, Martin Sebor via Gcc-patches wrote:
>> To detect a subset of VLA misuses, the C front associates the bounds
>> of VLAs in function argument lists with the corresponding variables
>> by implicitly adding an instance of attribute access to each function
>> declared to take VLAs with the bound expressions chained on the list
>> of attribute arguments.
>>
>> Some of these expressions end up modified by the middle end, which
>> results in references to nonlocal variables (and perhaps other nodes)
>> used in these expression getting garbage collected.  A simple example
>> of this is described in pr97172.
>>
>> By unsharing the bound expressions the patch below prevents this from
>> happening (it's not a fix for pr97172).
>>
>> My understanding of the details of node sharing and garbage collection
>> in GCC is very limited (I didn't expect a tree to be garbage-collected
>> if it's still referenced by something).  Is this the right approach
>> to solving this problem?
> 
> ISTM that a more natural thing would be to use build_distinct_type_copy
> to copy the type you're about to modify.

The get_parm_array_spec function doesn't modify a type.  It's called
from push_parm_decl() to build an "arg spec" attribute with the VLA
bounds as arguments.  push_parm_decl() then adds the attribute to
the function's PARM_DECL by calling decl_attributes().  When all of
the function's parameters have been processed the "arg specs" are
then extracted and added as an attribute access specification with
the VLA bounds added to the function declaration.

Martin

> 
>> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
>> index d348e39c27a..4aea4dcafb9 100644
>> --- a/gcc/c/c-decl.c
>> +++ b/gcc/c/c-decl.c
>> @@ -58,7 +58,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "c-family/name-hint.h"
>>   #include "c-family/known-headers.h"
>>   #include "c-family/c-spellcheck.h"
>> -
>> +#include "gimplify.h"
>>   #include "tree-pretty-print.h"
>>
>>   /* In grokdeclarator, distinguish syntactic contexts of declarators.  */
>> @@ -5780,6 +5780,7 @@ get_parm_array_spec (const struct c_parm *parm, tree
>> attrs)
>>                    /* Each variable VLA bound is represented by the dollar
>>                       sign.  */
>>                    spec += "$";
>> +                 nelts = unshare_expr (nelts);
>>                    tpbnds = tree_cons (NULL_TREE, nelts, tpbnds);
>>                  }
>>              }
>> @@ -5834,6 +5835,7 @@ get_parm_array_spec (const struct c_parm *parm, tree
>> attrs)
>>
>>         /* Each variable VLA bound is represented by a dollar sign.  */
>>         spec += "$";
>> +      nelts = unshare_expr (nelts);
>>         vbchain = tree_cons (NULL_TREE, nelts, vbchain);
>>       }
>>
> 
> Marek
>
Jakub Jelinek Nov. 20, 2020, 8:37 p.m. UTC | #3
On Fri, Nov 20, 2020 at 01:28:03PM -0700, Martin Sebor via Gcc-patches wrote:
> On 11/20/20 12:29 PM, Marek Polacek wrote:
> > On Fri, Nov 20, 2020 at 12:00:58PM -0700, Martin Sebor via Gcc-patches wrote:
> > > To detect a subset of VLA misuses, the C front associates the bounds
> > > of VLAs in function argument lists with the corresponding variables
> > > by implicitly adding an instance of attribute access to each function
> > > declared to take VLAs with the bound expressions chained on the list
> > > of attribute arguments.
> > > 
> > > Some of these expressions end up modified by the middle end, which
> > > results in references to nonlocal variables (and perhaps other nodes)
> > > used in these expression getting garbage collected.  A simple example
> > > of this is described in pr97172.
> > > 
> > > By unsharing the bound expressions the patch below prevents this from
> > > happening (it's not a fix for pr97172).
> > > 
> > > My understanding of the details of node sharing and garbage collection
> > > in GCC is very limited (I didn't expect a tree to be garbage-collected
> > > if it's still referenced by something).  Is this the right approach
> > > to solving this problem?
> > 
> > ISTM that a more natural thing would be to use build_distinct_type_copy
> > to copy the type you're about to modify.
> 
> The get_parm_array_spec function doesn't modify a type.  It's called
> from push_parm_decl() to build an "arg spec" attribute with the VLA
> bounds as arguments.  push_parm_decl() then adds the attribute to
> the function's PARM_DECL by calling decl_attributes().  When all of
> the function's parameters have been processed the "arg specs" are
> then extracted and added as an attribute access specification with
> the VLA bounds added to the function declaration.

Guess it isn't that the trees would be GC collected, that can't happen if
they are referenced from reachable trees, but the thing is that the
gimplifier is destructive, it overwrites various trees as it is gimplifying
function bodies.  That is why the function bodies are normally unshared, but
that unsharing doesn't really walk function attributes.
On the other side, for VLAs unsharing is quite harmful, e.g. if there is
  int vla[foo ()];
then if it is unshared (except when it is a SAVE_EXPR that wouldn't be
unshared), then it could call the foo () function multiple times.
For VLA bounds in PARM_DECLs we are hopefully more restricted than that,
if it involves only other PARM_DECLs and constants and expressions composed
of them, the unsharing could be fine.

	Jakub
Martin Sebor Nov. 20, 2020, 9:30 p.m. UTC | #4
On 11/20/20 1:37 PM, Jakub Jelinek wrote:
> On Fri, Nov 20, 2020 at 01:28:03PM -0700, Martin Sebor via Gcc-patches wrote:
>> On 11/20/20 12:29 PM, Marek Polacek wrote:
>>> On Fri, Nov 20, 2020 at 12:00:58PM -0700, Martin Sebor via Gcc-patches wrote:
>>>> To detect a subset of VLA misuses, the C front associates the bounds
>>>> of VLAs in function argument lists with the corresponding variables
>>>> by implicitly adding an instance of attribute access to each function
>>>> declared to take VLAs with the bound expressions chained on the list
>>>> of attribute arguments.
>>>>
>>>> Some of these expressions end up modified by the middle end, which
>>>> results in references to nonlocal variables (and perhaps other nodes)
>>>> used in these expression getting garbage collected.  A simple example
>>>> of this is described in pr97172.
>>>>
>>>> By unsharing the bound expressions the patch below prevents this from
>>>> happening (it's not a fix for pr97172).
>>>>
>>>> My understanding of the details of node sharing and garbage collection
>>>> in GCC is very limited (I didn't expect a tree to be garbage-collected
>>>> if it's still referenced by something).  Is this the right approach
>>>> to solving this problem?
>>>
>>> ISTM that a more natural thing would be to use build_distinct_type_copy
>>> to copy the type you're about to modify.
>>
>> The get_parm_array_spec function doesn't modify a type.  It's called
>> from push_parm_decl() to build an "arg spec" attribute with the VLA
>> bounds as arguments.  push_parm_decl() then adds the attribute to
>> the function's PARM_DECL by calling decl_attributes().  When all of
>> the function's parameters have been processed the "arg specs" are
>> then extracted and added as an attribute access specification with
>> the VLA bounds added to the function declaration.
> 
> Guess it isn't that the trees would be GC collected, that can't happen if
> they are referenced from reachable trees, but the thing is that the
> gimplifier is destructive, it overwrites various trees as it is gimplifying
> function bodies.  That is why the function bodies are normally unshared, but
> that unsharing doesn't really walk function attributes.
> On the other side, for VLAs unsharing is quite harmful, e.g. if there is
>    int vla[foo ()];
> then if it is unshared (except when it is a SAVE_EXPR that wouldn't be
> unshared), then it could call the foo () function multiple times.
> For VLA bounds in PARM_DECLs we are hopefully more restricted than that,
> if it involves only other PARM_DECLs and constants and expressions composed
> of them, the unsharing could be fine.

VLA parameter bounds can involve any other expressions, including
function calls.  It's those rather than other parameters that also
trigger the problem (at least in the test cases I've seen).

When/how would the unsharing cause the expression to be evaluated
multiple times?  And if/when it did, would simply wrapping the whole
expression in a SAVE_EXPR be the right way to avoid it or would it
need to be more involved than that?

Martin
Jakub Jelinek Nov. 20, 2020, 9:41 p.m. UTC | #5
On Fri, Nov 20, 2020 at 02:30:43PM -0700, Martin Sebor wrote:
> VLA parameter bounds can involve any other expressions, including
> function calls.  It's those rather than other parameters that also
> trigger the problem (at least in the test cases I've seen).
> 
> When/how would the unsharing cause the expression to be evaluated
> multiple times?  And if/when it did, would simply wrapping the whole
> expression in a SAVE_EXPR be the right way to avoid it or would it
> need to be more involved than that?

Well, unshare_expr just doesn't unshare SAVE_EXPRs, it only ensures that
the trees inside of them aren't shared with something else (aka unshares the
subtrees the first time it sees the SAVE_EXPR), but doesn't unshare the
SAVE_EXPR node itself and doesn't walk children the second and following
time.
So, the question is whether you are creating the attributes before the
SAVE_EXPRs are added to the bounds or after it, and whether when evaluating
the (unshared) expressions in there you always place it after something
initialized those SAVE_EXPRs first.
The SAVE_EXPRs are essential, so that the functions aren't called multiple
times.

	Jakub
Martin Sebor Nov. 20, 2020, 9:54 p.m. UTC | #6
On 11/20/20 2:41 PM, Jakub Jelinek wrote:
> On Fri, Nov 20, 2020 at 02:30:43PM -0700, Martin Sebor wrote:
>> VLA parameter bounds can involve any other expressions, including
>> function calls.  It's those rather than other parameters that also
>> trigger the problem (at least in the test cases I've seen).
>>
>> When/how would the unsharing cause the expression to be evaluated
>> multiple times?  And if/when it did, would simply wrapping the whole
>> expression in a SAVE_EXPR be the right way to avoid it or would it
>> need to be more involved than that?
> 
> Well, unshare_expr just doesn't unshare SAVE_EXPRs, it only ensures that
> the trees inside of them aren't shared with something else (aka unshares the
> subtrees the first time it sees the SAVE_EXPR), but doesn't unshare the
> SAVE_EXPR node itself and doesn't walk children the second and following
> time.
> So, the question is whether you are creating the attributes before the
> SAVE_EXPRs are added to the bounds or after it, and whether when evaluating
> the (unshared) expressions in there you always place it after something
> initialized those SAVE_EXPRs first.
> The SAVE_EXPRs are essential, so that the functions aren't called multiple
> times.

At the point the attribute is created there is no SAVE_EXPR.  So for
something like:

  int f (void);
  void g (int a[f () + 1]) { }

the bound is a PLUS_EXPR (CALL_EXPR (f), 1).

I don't do anything with the expression except put them on the chain
of arguments to the two attributes and print them in warnings.

Martin
Jakub Jelinek Nov. 20, 2020, 9:57 p.m. UTC | #7
On Fri, Nov 20, 2020 at 02:54:34PM -0700, Martin Sebor wrote:
> At the point the attribute is created there is no SAVE_EXPR.  So for
> something like:
> 
>  int f (void);
>  void g (int a[f () + 1]) { }
> 
> the bound is a PLUS_EXPR (CALL_EXPR (f), 1).
> 
> I don't do anything with the expression except put them on the chain
> of arguments to the two attributes and print them in warnings.

So that likely means you are doing it too early.

	Jakub
Martin Sebor Nov. 20, 2020, 10:44 p.m. UTC | #8
On 11/20/20 2:57 PM, Jakub Jelinek wrote:
> On Fri, Nov 20, 2020 at 02:54:34PM -0700, Martin Sebor wrote:
>> At the point the attribute is created there is no SAVE_EXPR.  So for
>> something like:
>>
>>   int f (void);
>>   void g (int a[f () + 1]) { }
>>
>> the bound is a PLUS_EXPR (CALL_EXPR (f), 1).
>>
>> I don't do anything with the expression except put them on the chain
>> of arguments to the two attributes and print them in warnings.
> 
> So that likely means you are doing it too early.

The bounds are added to attribute "arg spec" for each param in
push_parm_decl.  I think that's both as early and (except maybe
in function definitions) as late as can be.  After that point,
the association between a VLA parameter and its most significant
bound is lost.

For example, in:

   void f (int n, int A[n], int B[foo () + 1]);

A and B become pointers in push_parm_decl() (in grokdeclarator()
called from it) and there's no way that I know to retrieve
the bounds at a later point.  AFAICT, they're gone unless
the function is being defined.  Is there another/better point
to extract this association that escapes me?

The VLA bounds are evaluated in function definitions so there
must be a point where that's done.  I don't know where that
happens but unless at that point the most significant bound
is still associated with the param (AFAIK, it never really
is at the tree level) it wouldn't help me.

Martin
Jakub Jelinek Nov. 21, 2020, 8:01 a.m. UTC | #9
On Fri, Nov 20, 2020 at 03:44:01PM -0700, Martin Sebor via Gcc-patches wrote:
> > So that likely means you are doing it too early.
> 
> The bounds are added to attribute "arg spec" for each param in
> push_parm_decl.  I think that's both as early and (except maybe
> in function definitions) as late as can be.  After that point,
> the association between a VLA parameter and its most significant
> bound is lost.

If the most significant bound is lost, why don't you save in the attribute
early only the most significant bound before it is lost and for the other
bounds just refer to the SAVE_EXPRs in the FUNCTION_TYPE's TYPE_ARG_TYPES
ARRAY_TYPEs?  And for function definitions, even the outermost bounds
aren't really lost, the FE for
int bar ();
int baz ();

int
foo (int n, int x[bar () + 4][baz () + 2])
{
  return sizeof (x[0]);
}
emits all the side-effects, though not sure if it creates a SAVE_EXPR for
that.  But for the definitions you really want to use the same SAVE_EXPR
as the function evaluates.

	Jakub
Jeff Law Nov. 22, 2020, 4:01 a.m. UTC | #10
On 11/20/20 12:00 PM, Martin Sebor via Gcc-patches wrote:
> To detect a subset of VLA misuses, the C front associates the bounds
> of VLAs in function argument lists with the corresponding variables
> by implicitly adding an instance of attribute access to each function
> declared to take VLAs with the bound expressions chained on the list
> of attribute arguments.
>
> Some of these expressions end up modified by the middle end, which
> results in references to nonlocal variables (and perhaps other nodes)
> used in these expression getting garbage collected.  A simple example
> of this is described in pr97172.
>
> By unsharing the bound expressions the patch below prevents this from
> happening (it's not a fix for pr97172).
>
> My understanding of the details of node sharing and garbage collection
> in GCC is very limited (I didn't expect a tree to be garbage-collected
> if it's still referenced by something).  Is this the right approach
> to solving this problem?
So if the tree node is reachable from a GC root, then it won't be
removed by the GC system.     It's a simple mark/sweep with a set of
registered roots.  The only real complexity is the auto-generated code
to walk the data structures (ie, all the gengtype insanity).

From the BZ:


 <tree_list 0x7fffea924f28
    value <tree_list 0x7fffea924d20
        value <plus_expr 0x7fffea924c80 type <integer_type
0x7fffea8105e8 int>
            arg:0 <var_decl 0x7ffff7ffbb40 n>
            arg:1 <integer_cst 0x7fffea815090 constant 1>
            /build/tmp/z.c:2:48 start: /build/tmp/z.c:2:46 finish:
/build/tmp/z.c:2:50>>>

Then later indicate it looks like this (presumably at LTO stream-out time):


 <tree_list 0x7fffea924ed8
    value <tree_list 0x7fffea924cf8
        value <plus_expr 0x7fffea924c80 type <integer_type
0x7fffea8105e8 int>
          
            arg:0 <ssa_name 0x7fffea801cf0 type <error_mark 0x7fffea7f7cc0>
                nothrow
                def_stmt
                version:1 in-free-list>
            arg:1 <integer_cst 0x7fffea815090 constant 1>
            /build/tmp/z.c:2:55 start: /build/tmp/z.c:2:45 finish:
/build/tmp/z.c:2:57>>>


Note the structure of the value in the tree list, in particular note the
PLUS_EXPR node.  It's at address 0x7fffea924c80 in both.  But in the
first it's a VAR_DECL.  In the second it's a released SSA_NAME.


That to me doesn't look like a GC issue.  To me it looks like you have
violated the structure sharing assumptions by inadvertently sharing the
PLUS_EXPR node.  Naturally when the gimplifier and SSA renaming does its
thing, the first operand of the PLUS_EXPR gets changed to an SSA_NAME. 
I strongly suspect that SSA_NAME ultimately ends up dead and gets
released back to the SSA_NAME manager for re-use (hence the
error_mark_node for the type and in-free-list tag for arg0 of the
PLUS_EXPR in the second instance).

So the first question is presumably you want the original form with the
_DECL node?   That argues that you need the unshare_expr so that your
copy is independent of the actions of gimplification and SSA renaming. 
However, as Jakub noted, there may be a SAVE_EXPR issue that needs to be
addressed here.



jeff
Martin Sebor Nov. 23, 2020, 5:03 p.m. UTC | #11
On 11/21/20 1:01 AM, Jakub Jelinek wrote:
> On Fri, Nov 20, 2020 at 03:44:01PM -0700, Martin Sebor via Gcc-patches wrote:
>>> So that likely means you are doing it too early.
>>
>> The bounds are added to attribute "arg spec" for each param in
>> push_parm_decl.  I think that's both as early and (except maybe
>> in function definitions) as late as can be.  After that point,
>> the association between a VLA parameter and its most significant
>> bound is lost.
> 
> If the most significant bound is lost, why don't you save in the attribute
> early only the most significant bound before it is lost

The other bounds are a part of the type so saving them in the attribute
isn't essential.  I save all of them because it simplifies their lookup.
With only the most significant bound in the attribute argument, looking
up the other bounds (e.g., when checking function redeclarations for
mismatches) will, in addition to doing what's done for the most
significant bound, involve scanning the declarations' argument lists,
extracting the bounds from the SAVE_EXPRs, before comparing them.
As an example, in

   void f (int A[m][n]);

the attribute has the chain (VAR_DECL(m), VAR_DECL(n)) as arguments
and comparing them with another declaration of f is as simple as
traversing the chain and comparing each node value.

With the change you suggest, the attribute will only have VAR_DECL(m)
and the least significant bound will have to be extracted from A[m]'s
type's size which is:

   MULT (NOP (bitsizetype, NOP (sizetype, SAVE (VAR (n)))), 32)

It's possible to do but not without some additional complexity and
cost.

> and for the other
> bounds just refer to the SAVE_EXPRs in the FUNCTION_TYPE's TYPE_ARG_TYPES
> ARRAY_TYPEs?  And for function definitions, even the outermost bounds
> aren't really lost, the FE for
> int bar ();
> int baz ();
> 
> int
> foo (int n, int x[bar () + 4][baz () + 2])
> {
>    return sizeof (x[0]);
> }
> emits all the side-effects, though not sure if it creates a SAVE_EXPR for
> that.  But for the definitions you really want to use the same SAVE_EXPR
> as the function evaluates.

It does create a SAVE_EXPR.  I found the code in grokdeclarator.
The SAVE_EXPR is available to callers so I can use it instead.

I'm still not sure I understand why using the SAVE_EXPR of the bound
in the attribute argument where it isn't evaluated is preferable to
saving the unshared bound as the argument instead.  Can you explain
the problem with doing that?

Thanks
Martin
Jakub Jelinek Nov. 23, 2020, 5:30 p.m. UTC | #12
On Mon, Nov 23, 2020 at 10:03:44AM -0700, Martin Sebor wrote:
> > If the most significant bound is lost, why don't you save in the attribute
> > early only the most significant bound before it is lost
> 
> The other bounds are a part of the type so saving them in the attribute
> isn't essential.  I save all of them because it simplifies their lookup.
> With only the most significant bound in the attribute argument, looking
> up the other bounds (e.g., when checking function redeclarations for
> mismatches) will, in addition to doing what's done for the most
> significant bound, involve scanning the declarations' argument lists,
> extracting the bounds from the SAVE_EXPRs, before comparing them.
> As an example, in
> 
>   void f (int A[m][n]);
> 
> the attribute has the chain (VAR_DECL(m), VAR_DECL(n)) as arguments
> and comparing them with another declaration of f is as simple as
> traversing the chain and comparing each node value.
> 
> With the change you suggest, the attribute will only have VAR_DECL(m)
> and the least significant bound will have to be extracted from A[m]'s
> type's size which is:
> 
>   MULT (NOP (bitsizetype, NOP (sizetype, SAVE (VAR (n)))), 32)
> 
> It's possible to do but not without some additional complexity and
> cost.

I don't think it would be significant complication, on the other side you
avoid wasting compile time memory on that (GC one, which means it will be
wasted until GC collection if there is one ever).  Plus all the issues from
having the same information in multiple different places.

	Jakub
Martin Sebor Nov. 23, 2020, 6:08 p.m. UTC | #13
On 11/23/20 10:30 AM, Jakub Jelinek wrote:
> On Mon, Nov 23, 2020 at 10:03:44AM -0700, Martin Sebor wrote:
>>> If the most significant bound is lost, why don't you save in the attribute
>>> early only the most significant bound before it is lost
>>
>> The other bounds are a part of the type so saving them in the attribute
>> isn't essential.  I save all of them because it simplifies their lookup.
>> With only the most significant bound in the attribute argument, looking
>> up the other bounds (e.g., when checking function redeclarations for
>> mismatches) will, in addition to doing what's done for the most
>> significant bound, involve scanning the declarations' argument lists,
>> extracting the bounds from the SAVE_EXPRs, before comparing them.
>> As an example, in
>>
>>    void f (int A[m][n]);
>>
>> the attribute has the chain (VAR_DECL(m), VAR_DECL(n)) as arguments
>> and comparing them with another declaration of f is as simple as
>> traversing the chain and comparing each node value.
>>
>> With the change you suggest, the attribute will only have VAR_DECL(m)
>> and the least significant bound will have to be extracted from A[m]'s
>> type's size which is:
>>
>>    MULT (NOP (bitsizetype, NOP (sizetype, SAVE (VAR (n)))), 32)
>>
>> It's possible to do but not without some additional complexity and
>> cost.
> 
> I don't think it would be significant complication, on the other side you
> avoid wasting compile time memory on that (GC one, which means it will be
> wasted until GC collection if there is one ever).  Plus all the issues from
> having the same information in multiple different places.

So just to make sure I understand correctly (and answer the question
I asked): unsharing the expression as in the proposed patch won't
cause any correctness issues.  You just find rewriting the code to
use the existing SAVE_EXPRs instead preferable for the reasons above.
Please correct me if I misunderstood something.

Thanks
Martin
Jakub Jelinek Nov. 23, 2020, 6:21 p.m. UTC | #14
On Mon, Nov 23, 2020 at 11:08:13AM -0700, Martin Sebor wrote:
> > I don't think it would be significant complication, on the other side you
> > avoid wasting compile time memory on that (GC one, which means it will be
> > wasted until GC collection if there is one ever).  Plus all the issues from
> > having the same information in multiple different places.
> 
> So just to make sure I understand correctly (and answer the question
> I asked): unsharing the expression as in the proposed patch won't
> cause any correctness issues.  You just find rewriting the code to
> use the existing SAVE_EXPRs instead preferable for the reasons above.
> Please correct me if I misunderstood something.

I admit I haven't looked into detail what exactly you are doing
with those expressions.
If they ever result in code generation rather than just warnings, then if
they lack SAVE_EXPRs by the time you unshare, it would be a wrong-code
(evaluating the expression multiple times).
If they are just compared (lexicographically?), it can work, I guess a
question is if foo () in two expressions is really equal, but I guess it
would mean just one invocation of the same function and so can be considered
equal.  Another thing is that one expression can call something and the
other something else and just the user will know they will in that case do
the same thing.  A warning (but not error) is fine in that case though.

	Jakub
Martin Sebor Nov. 23, 2020, 6:51 p.m. UTC | #15
On 11/23/20 11:21 AM, Jakub Jelinek wrote:
> On Mon, Nov 23, 2020 at 11:08:13AM -0700, Martin Sebor wrote:
>>> I don't think it would be significant complication, on the other side you
>>> avoid wasting compile time memory on that (GC one, which means it will be
>>> wasted until GC collection if there is one ever).  Plus all the issues from
>>> having the same information in multiple different places.
>>
>> So just to make sure I understand correctly (and answer the question
>> I asked): unsharing the expression as in the proposed patch won't
>> cause any correctness issues.  You just find rewriting the code to
>> use the existing SAVE_EXPRs instead preferable for the reasons above.
>> Please correct me if I misunderstood something.
> 
> I admit I haven't looked into detail what exactly you are doing
> with those expressions.
> If they ever result in code generation rather than just warnings, then if
> they lack SAVE_EXPRs by the time you unshare, it would be a wrong-code
> (evaluating the expression multiple times).
> If they are just compared (lexicographically?), it can work, I guess a
> question is if foo () in two expressions is really equal, but I guess it
> would mean just one invocation of the same function and so can be considered
> equal.  Another thing is that one expression can call something and the
> other something else and just the user will know they will in that case do
> the same thing.  A warning (but not error) is fine in that case though.

Nontrivial expressions (anything but a straight PARM_DECL which is
mapped to a positional argument) are only used in lexicographical
comparisons to check for equivalence.  They're not used for codegen
decisions.

Examples can certainly be contrived where relying on equivalence is
not reliable and results in warnings for safe code.  There are even
very simple cases where this can happen that are clearly correct
(e.g., pr97548).  Some of the former are probably unavoidable (such
as the one you described) but I'd like to try to deal with at least
the basic ones.

Martin
Joseph Myers Nov. 23, 2020, 11:51 p.m. UTC | #16
On Fri, 20 Nov 2020, Martin Sebor via Gcc-patches wrote:

> The VLA bounds are evaluated in function definitions so there
> must be a point where that's done.  I don't know where that
> happens but unless at that point the most significant bound
> is still associated with the param (AFAIK, it never really
> is at the tree level) it wouldn't help me.

grokdeclarator combines VLA bounds with the *expr passed in using a 
COMPOUND_EXPR.  These later get stored in arg_info->pending_sizes, and the 
evaluation happens via the add_stmt call in store_parm_decls.
diff mbox series

Patch

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index d348e39c27a..4aea4dcafb9 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -58,7 +58,7 @@  along with GCC; see the file COPYING3.  If not see
  #include "c-family/name-hint.h"
  #include "c-family/known-headers.h"
  #include "c-family/c-spellcheck.h"
-
+#include "gimplify.h"
  #include "tree-pretty-print.h"

  /* In grokdeclarator, distinguish syntactic contexts of declarators.  */
@@ -5780,6 +5780,7 @@  get_parm_array_spec (const struct c_parm *parm, 
tree attrs)
                   /* Each variable VLA bound is represented by the dollar
                      sign.  */
                   spec += "$";
+                 nelts = unshare_expr (nelts);
                   tpbnds = tree_cons (NULL_TREE, nelts, tpbnds);
                 }