diff mbox series

Avoid char[] array in tree_def

Message ID 20201029155054.GA54974@kam.mff.cuni.cz
State New
Headers show
Series Avoid char[] array in tree_def | expand

Commit Message

Jan Hubicka Oct. 29, 2020, 3:50 p.m. UTC
Hi,
this patch removes second char array from tree_def union and makes it
!TYPELESS_STORAGE.  Now all accesses to anything in tree no longer have alias
set 0, but they all have alias set 1 :)
This is because the way we handle unions. However it still increases TBAA
effectivity by about 12%. From:

Alias oracle query stats:
  refs_may_alias_p: 65066258 disambiguations, 74846942 queries
  ref_maybe_used_by_call_p: 152444 disambiguations, 65966862 queries
  call_may_clobber_ref_p: 22546 disambiguations, 28559 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 36816 queries
  nonoverlapping_refs_since_match_p: 27230 disambiguations, 58300 must overlaps, 86300 queries
  aliasing_component_refs_p: 66090 disambiguations, 2048800 queries
  TBAA oracle: 25578632 disambiguations 59483650 queries
               12219919 are in alias set 0
               10534575 queries asked about the same object
               125 queries asked about the same alias set
               0 access volatile
               9491563 are dependent in the DAG
               1658836 are aritificially in conflict with void *

Modref stats:
  modref use: 14421 disambiguations, 48129 queries
  modref clobber: 1528229 disambiguations, 1926907 queries
  3881547 tbaa queries (2.014392 per modref query)
  565057 base compares (0.293246 per modref query)

PTA query stats:
  pt_solution_includes: 947491 disambiguations, 13119151 queries
  pt_solutions_intersect: 1043695 disambiguations, 13221495 queries

To:

Alias oracle query stats:
  refs_may_alias_p: 66455561 disambiguations, 75202803 queries
  ref_maybe_used_by_call_p: 155301 disambiguations, 67370278 queries
  call_may_clobber_ref_p: 22550 disambiguations, 28587 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 37058 queries
  nonoverlapping_refs_since_match_p: 28126 disambiguations, 59906 must overlaps, 88990 queries
  aliasing_component_refs_p: 66375 disambiguations, 2440039 queries
  TBAA oracle: 28800751 disambiguations 64328055 queries
               8053661 are in alias set 0
               11181983 queries asked about the same object
               125 queries asked about the same alias set
               0 access volatile
               13905691 are dependent in the DAG
               2385844 are aritificially in conflict with void *

Modref stats:
  modref use: 16781 disambiguations, 52031 queries
  modref clobber: 1745589 disambiguations, 2149518 queries
  4192266 tbaa queries (1.950328 per modref query)
  559148 base compares (0.260127 per modref query)

PTA query stats:
  pt_solution_includes: 906487 disambiguations, 13105994 queries
  pt_solutions_intersect: 1041144 disambiguations, 13659726 queries

Bootstrapped/regtested x86_64-linux, OK?

	* tree.c (build_string): Update.
	* tree-core.h (tree_fixed_cst): Avoid typeless storage.

Comments

Jakub Jelinek Oct. 29, 2020, 3:55 p.m. UTC | #1
On Thu, Oct 29, 2020 at 04:50:54PM +0100, Jan Hubicka wrote:
> 	* tree.c (build_string): Update.
> 	* tree-core.h (tree_fixed_cst): Avoid typeless storage.

Is it valid then to
#define TREE_STRING_POINTER(NODE) \
  ((const char *)(STRING_CST_CHECK (NODE)->string.str))
and strcpy etc. it around though?
Maybe yes, because stores through char can alias anything.

> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index c9280a8d3b1..63dbb5b8eab 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1401,7 +1401,8 @@ struct GTY(()) tree_fixed_cst {
>  struct GTY(()) tree_string {
>    struct tree_typed typed;
>    int length;
> -  char str[1];
> +  /* Avoid char array that would make whole type to be typeless storage.  */
> +  struct {char c;} str[1];
>  };
>  
>  struct GTY(()) tree_complex {
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 81f867ddded..84115630184 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -2273,7 +2273,7 @@ build_string (unsigned len, const char *str /*= NULL */)
>      memcpy (s->string.str, str, len);
>    else
>      memset (s->string.str, 0, len);
> -  s->string.str[len] = '\0';
> +  s->string.str[len].c = '\0';
>  
>    return s;
>  }

	Jakub
Richard Biener Oct. 29, 2020, 3:58 p.m. UTC | #2
On Thu, 29 Oct 2020, Jan Hubicka wrote:

> Hi,
> this patch removes second char array from tree_def union and makes it
> !TYPELESS_STORAGE.  Now all accesses to anything in tree no longer have alias
> set 0, but they all have alias set 1 :)
> This is because the way we handle unions. However it still increases TBAA
> effectivity by about 12%. From:
> 
> Alias oracle query stats:
>   refs_may_alias_p: 65066258 disambiguations, 74846942 queries
>   ref_maybe_used_by_call_p: 152444 disambiguations, 65966862 queries
>   call_may_clobber_ref_p: 22546 disambiguations, 28559 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 36816 queries
>   nonoverlapping_refs_since_match_p: 27230 disambiguations, 58300 must overlaps, 86300 queries
>   aliasing_component_refs_p: 66090 disambiguations, 2048800 queries
>   TBAA oracle: 25578632 disambiguations 59483650 queries
>                12219919 are in alias set 0
>                10534575 queries asked about the same object
>                125 queries asked about the same alias set
>                0 access volatile
>                9491563 are dependent in the DAG
>                1658836 are aritificially in conflict with void *
> 
> Modref stats:
>   modref use: 14421 disambiguations, 48129 queries
>   modref clobber: 1528229 disambiguations, 1926907 queries
>   3881547 tbaa queries (2.014392 per modref query)
>   565057 base compares (0.293246 per modref query)
> 
> PTA query stats:
>   pt_solution_includes: 947491 disambiguations, 13119151 queries
>   pt_solutions_intersect: 1043695 disambiguations, 13221495 queries
> 
> To:
> 
> Alias oracle query stats:
>   refs_may_alias_p: 66455561 disambiguations, 75202803 queries
>   ref_maybe_used_by_call_p: 155301 disambiguations, 67370278 queries
>   call_may_clobber_ref_p: 22550 disambiguations, 28587 queries
>   nonoverlapping_component_refs_p: 0 disambiguations, 37058 queries
>   nonoverlapping_refs_since_match_p: 28126 disambiguations, 59906 must overlaps, 88990 queries
>   aliasing_component_refs_p: 66375 disambiguations, 2440039 queries
>   TBAA oracle: 28800751 disambiguations 64328055 queries
>                8053661 are in alias set 0
>                11181983 queries asked about the same object
>                125 queries asked about the same alias set
>                0 access volatile
>                13905691 are dependent in the DAG
>                2385844 are aritificially in conflict with void *
> 
> Modref stats:
>   modref use: 16781 disambiguations, 52031 queries
>   modref clobber: 1745589 disambiguations, 2149518 queries
>   4192266 tbaa queries (1.950328 per modref query)
>   559148 base compares (0.260127 per modref query)
> 
> PTA query stats:
>   pt_solution_includes: 906487 disambiguations, 13105994 queries
>   pt_solutions_intersect: 1041144 disambiguations, 13659726 queries
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
> 	* tree.c (build_string): Update.
> 	* tree-core.h (tree_fixed_cst): Avoid typeless storage.
> 
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index c9280a8d3b1..63dbb5b8eab 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1401,7 +1401,8 @@ struct GTY(()) tree_fixed_cst {
>  struct GTY(()) tree_string {
>    struct tree_typed typed;
>    int length;
> -  char str[1];
> +  /* Avoid char array that would make whole type to be typeless storage.  */
> +  struct {char c;} str[1];

That's ugly and will for sure defeat warning / access code
when we access this as char[], no?  I mean, we could
as well use 'int str[1];' here?

Maybe we can invent some C++ attribute for this?

[[gnu::string]]

or so that marks it as actual char and not typeless storage?

Richard.

>  };
>  
>  struct GTY(()) tree_complex {
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 81f867ddded..84115630184 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -2273,7 +2273,7 @@ build_string (unsigned len, const char *str /*= NULL */)
>      memcpy (s->string.str, str, len);
>    else
>      memset (s->string.str, 0, len);
> -  s->string.str[len] = '\0';
> +  s->string.str[len].c = '\0';
>  
>    return s;
>  }
>
Jan Hubicka Oct. 29, 2020, 3:58 p.m. UTC | #3
> On Thu, Oct 29, 2020 at 04:50:54PM +0100, Jan Hubicka wrote:
> > 	* tree.c (build_string): Update.
> > 	* tree-core.h (tree_fixed_cst): Avoid typeless storage.
> 
> Is it valid then to
> #define TREE_STRING_POINTER(NODE) \
>   ((const char *)(STRING_CST_CHECK (NODE)->string.str))
> and strcpy etc. it around though?
> Maybe yes, because stores through char can alias anything.

Yep, I think it should be valid for that reason.  The whole thing is not
terribly pretty (the wide-int change was better), but I do not know of
better solution and it affects our core datastructure.  Typeless storage
is really complicated concept.  Forutnately it seems that there are no
more hacks like this needed: both tree and gimple now gets non-zero
alias set.

Honza
> 
> > diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> > index c9280a8d3b1..63dbb5b8eab 100644
> > --- a/gcc/tree-core.h
> > +++ b/gcc/tree-core.h
> > @@ -1401,7 +1401,8 @@ struct GTY(()) tree_fixed_cst {
> >  struct GTY(()) tree_string {
> >    struct tree_typed typed;
> >    int length;
> > -  char str[1];
> > +  /* Avoid char array that would make whole type to be typeless storage.  */
> > +  struct {char c;} str[1];
> >  };
> >  
> >  struct GTY(()) tree_complex {
> > diff --git a/gcc/tree.c b/gcc/tree.c
> > index 81f867ddded..84115630184 100644
> > --- a/gcc/tree.c
> > +++ b/gcc/tree.c
> > @@ -2273,7 +2273,7 @@ build_string (unsigned len, const char *str /*= NULL */)
> >      memcpy (s->string.str, str, len);
> >    else
> >      memset (s->string.str, 0, len);
> > -  s->string.str[len] = '\0';
> > +  s->string.str[len].c = '\0';
> >  
> >    return s;
> >  }
> 
> 	Jakub
>
Jan Hubicka Oct. 29, 2020, 4 p.m. UTC | #4
> 
> That's ugly and will for sure defeat warning / access code
> when we access this as char[], no?  I mean, we could
> as well use 'int str[1];' here?

Well, we always get char pointer via macro that is IMO OK, but I am also
not very much in love with this.
> 
> Maybe we can invent some C++ attribute for this?
> 
> [[gnu::string]]
> 
> or so that marks it as actual char and not typeless storage?

Attribute would probably make sense.  Not sure if gnu::string is best
name given that it can be also meaningful for array of small integers
(such as in wide_int).

Honza
> 
> Richard.
> 
> >  };
> >  
> >  struct GTY(()) tree_complex {
> > diff --git a/gcc/tree.c b/gcc/tree.c
> > index 81f867ddded..84115630184 100644
> > --- a/gcc/tree.c
> > +++ b/gcc/tree.c
> > @@ -2273,7 +2273,7 @@ build_string (unsigned len, const char *str /*= NULL */)
> >      memcpy (s->string.str, str, len);
> >    else
> >      memset (s->string.str, 0, len);
> > -  s->string.str[len] = '\0';
> > +  s->string.str[len].c = '\0';
> >  
> >    return s;
> >  }
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> Germany; GF: Felix Imend
Richard Biener Oct. 29, 2020, 4:04 p.m. UTC | #5
On Thu, 29 Oct 2020, Jan Hubicka wrote:

> > 
> > That's ugly and will for sure defeat warning / access code
> > when we access this as char[], no?  I mean, we could
> > as well use 'int str[1];' here?
> 
> Well, we always get char pointer via macro that is IMO OK, but I am also
> not very much in love with this.
> > 
> > Maybe we can invent some C++ attribute for this?
> > 
> > [[gnu::string]]
> > 
> > or so that marks it as actual char and not typeless storage?
> 
> Attribute would probably make sense.  Not sure if gnu::string is best
> name given that it can be also meaningful for array of small integers
> (such as in wide_int).

OK, maybe [[gnu::strictly_typed]] then?

> Honza
> > 
> > Richard.
> > 
> > >  };
> > >  
> > >  struct GTY(()) tree_complex {
> > > diff --git a/gcc/tree.c b/gcc/tree.c
> > > index 81f867ddded..84115630184 100644
> > > --- a/gcc/tree.c
> > > +++ b/gcc/tree.c
> > > @@ -2273,7 +2273,7 @@ build_string (unsigned len, const char *str /*= NULL */)
> > >      memcpy (s->string.str, str, len);
> > >    else
> > >      memset (s->string.str, 0, len);
> > > -  s->string.str[len] = '\0';
> > > +  s->string.str[len].c = '\0';
> > >  
> > >    return s;
> > >  }
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
> > Germany; GF: Felix Imend
>
Jakub Jelinek Oct. 29, 2020, 4:07 p.m. UTC | #6
On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote:
> > 
> > That's ugly and will for sure defeat warning / access code
> > when we access this as char[], no?  I mean, we could
> > as well use 'int str[1];' here?
> 
> Well, we always get char pointer via macro that is IMO OK, but I am also
> not very much in love with this.

Do we treat signed char [...]; as typeless storage too, or just
what the C++ standard requires (i.e. char, unsigned char and std::byte
where the last one is enum type with unsigned char underlying type)?

	Jakub
Jan Hubicka Oct. 29, 2020, 4:59 p.m. UTC | #7
> On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote:
> > > 
> > > That's ugly and will for sure defeat warning / access code
> > > when we access this as char[], no?  I mean, we could
> > > as well use 'int str[1];' here?
> > 
> > Well, we always get char pointer via macro that is IMO OK, but I am also
> > not very much in love with this.
> 
> Do we treat signed char [...]; as typeless storage too, or just
> what the C++ standard requires (i.e. char, unsigned char and std::byte
> where the last one is enum type with unsigned char underlying type)?
struct a {signed char b[10];int d;} c;
void
test ()
{
  c.d=1;
}

still leads to alias set 0 access, so perhaps this can be improved.
Where the standard specifies this? (also coincidentally I have no idea
where C++ sets typeless storage to 1 :)

Honza
> 
> 	Jakub
>
Jan Hubicka Oct. 29, 2020, 5:01 p.m. UTC | #8
> On Thu, 29 Oct 2020, Jan Hubicka wrote:
> 
> > > 
> > > That's ugly and will for sure defeat warning / access code
> > > when we access this as char[], no?  I mean, we could
> > > as well use 'int str[1];' here?
> > 
> > Well, we always get char pointer via macro that is IMO OK, but I am also
> > not very much in love with this.
> > > 
> > > Maybe we can invent some C++ attribute for this?
> > > 
> > > [[gnu::string]]
> > > 
> > > or so that marks it as actual char and not typeless storage?
> > 
> > Attribute would probably make sense.  Not sure if gnu::string is best
> > name given that it can be also meaningful for array of small integers
> > (such as in wide_int).
> 
> OK, maybe [[gnu::strictly_typed]] then?

This looks like good idea to me (and probably also making difference
with signed char as Jakub suggests).

I am adding Jason to CC, since he may know better.
Honza
Richard Biener Oct. 29, 2020, 5:40 p.m. UTC | #9
On Thu, 29 Oct 2020, Jakub Jelinek wrote:

> On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote:
> > > 
> > > That's ugly and will for sure defeat warning / access code
> > > when we access this as char[], no?  I mean, we could
> > > as well use 'int str[1];' here?
> > 
> > Well, we always get char pointer via macro that is IMO OK, but I am also
> > not very much in love with this.
> 
> Do we treat signed char [...]; as typeless storage too, or just
> what the C++ standard requires (i.e. char, unsigned char and std::byte
> where the last one is enum type with unsigned char underlying type)?

All that is covered by is_byte_access_type which includes all
character types (including char16_t and wchar it seems) and std::byte.

Richard.

> 	Jakub
> 
>
Jason Merrill Oct. 29, 2020, 8:46 p.m. UTC | #10
On 10/29/20 1:40 PM, Richard Biener wrote:
> On Thu, 29 Oct 2020, Jakub Jelinek wrote:
> 
>> On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote:
>>>>
>>>> That's ugly and will for sure defeat warning / access code
>>>> when we access this as char[], no?  I mean, we could
>>>> as well use 'int str[1];' here?
>>>
>>> Well, we always get char pointer via macro that is IMO OK, but I am also
>>> not very much in love with this.
>>
>> Do we treat signed char [...]; as typeless storage too, or just
>> what the C++ standard requires (i.e. char, unsigned char and std::byte
>> where the last one is enum type with unsigned char underlying type)?
> 
> All that is covered by is_byte_access_type which includes all
> character types (including char16_t and wchar it seems) and std::byte.

Well, that's a bug, apparently from the PR94923 patch (r11-499); 
previously it was char, signed char, and unsigned char.  And even that 
is too much; even C++98 said just char and unsigned char could be used 
for bytewise access.

When C++17 clarified this with the notion of "provides storage", it 
applied to only unsigned char and std::byte, not even the full set of 
byte-access types.  We still need to allow bytewise access using plain 
char, but perhaps we don't need to treat plain char arrays as typeless.

Attributes to say that a particular array does or does not provide 
storage for objects of other types do sound useful.

Jason
Jan Hubicka Oct. 31, 2020, 10:35 p.m. UTC | #11
> On 10/29/20 1:40 PM, Richard Biener wrote:
> > On Thu, 29 Oct 2020, Jakub Jelinek wrote:
> > 
> > > On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote:
> > > > > 
> > > > > That's ugly and will for sure defeat warning / access code
> > > > > when we access this as char[], no?  I mean, we could
> > > > > as well use 'int str[1];' here?
> > > > 
> > > > Well, we always get char pointer via macro that is IMO OK, but I am also
> > > > not very much in love with this.
> > > 
> > > Do we treat signed char [...]; as typeless storage too, or just
> > > what the C++ standard requires (i.e. char, unsigned char and std::byte
> > > where the last one is enum type with unsigned char underlying type)?
> > 
> > All that is covered by is_byte_access_type which includes all
> > character types (including char16_t and wchar it seems) and std::byte.
> 
> Well, that's a bug, apparently from the PR94923 patch (r11-499); previously
> it was char, signed char, and unsigned char.  And even that is too much;
> even C++98 said just char and unsigned char could be used for bytewise
> access.
> 
> When C++17 clarified this with the notion of "provides storage", it applied
> to only unsigned char and std::byte, not even the full set of byte-access
> types.  We still need to allow bytewise access using plain char, but perhaps
> we don't need to treat plain char arrays as typeless.
> 
> Attributes to say that a particular array does or does not provide storage
> for objects of other types do sound useful.

Thanks a lot for explanation!  
I am adding Martin Sebor to CC. Perhaps you can fix the problem with 
std::byte, and signed char to imply typeless storage and ideally also
add an attribute? This seem too deep into C++ FE details for me.

I was thiking a bit more about all accesses to trees getting same alias
set.  This is because we allow type puning through unions.  If our tree
datastructure was a C++ class hiearchy, this would not happen since
accesses to specialized tree node would not be through unions but
through casted pointers.

We could do that with our acessor macros, too.
I tried to turn (NODE)->base in our accesors to ((tree_base *)(node))->base
and this breaks with const_tree pointers. However the following seem to
work and get better TBAA.

I think converting other acessors should be quite easy and make our
predicates more easy to optimize.

What do you think?
Honza

diff --git a/gcc/tree.h b/gcc/tree.h
index 7f0aa5b8d1d..cd8146808f7 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -235,13 +235,24 @@ as_internal_fn (combined_fn code)
 
 #define NULL_TREE (tree) NULL
 
+inline tree_base *
+as_a_tree_base (tree t)
+{
+  return (tree_base *)t;
+}
+inline const tree_base *
+as_a_tree_base (const_tree t)
+{
+  return (const tree_base *)t;
+}
+
 /* Define accessors for the fields that all tree nodes have
    (though some fields are not used for all kinds of nodes).  */
 
 /* The tree-code says what kind of node it is.
    Codes are defined in tree.def.  */
-#define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code)
-#define TREE_SET_CODE(NODE, VALUE) ((NODE)->base.code = (VALUE))
+#define TREE_CODE(NODE) ((enum tree_code) (as_a_tree_base (NODE)->code))
+#define TREE_SET_CODE(NODE, VALUE) (as_a_tree_base (NODE)->code = (VALUE))
 
 /* When checking is enabled, errors will be generated if a tree node
    is accessed incorrectly. The macros die with a fatal error.  */
@@ -642,7 +653,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
    In IDENTIFIER_NODEs, this means that some extern decl for this name
    had its address taken.  That matters for inline functions.
    In a STMT_EXPR, it means we want the result of the enclosed expression.  */
-#define TREE_ADDRESSABLE(NODE) ((NODE)->base.addressable_flag)
+#define TREE_ADDRESSABLE(NODE) (as_a_tree_base (NODE)->addressable_flag)
 
 /* Set on a CALL_EXPR if the call is in a tail position, ie. just before the
    exit of a function.  Calls for which this is true are candidates for tail
@@ -670,7 +681,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 /* In a VAR_DECL, nonzero means allocate static storage.
    In a FUNCTION_DECL, nonzero if function has been defined.
    In a CONSTRUCTOR, nonzero means allocate static storage.  */
-#define TREE_STATIC(NODE) ((NODE)->base.static_flag)
+#define TREE_STATIC(NODE) (as_a_tree_base (NODE)->static_flag)
 
 /* In an ADDR_EXPR, nonzero means do not use a trampoline.  */
 #define TREE_NO_TRAMPOLINE(NODE) (ADDR_EXPR_CHECK (NODE)->base.static_flag)
@@ -701,7 +712,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
    warnings concerning the decl should be suppressed.  This is used at
    least for used-before-set warnings, and it set after one warning is
    emitted.  */
-#define TREE_NO_WARNING(NODE) ((NODE)->base.nowarning_flag)
+#define TREE_NO_WARNING(NODE) (as_a_tree_base (NODE)->nowarning_flag)
 
 /* Nonzero if we should warn about the change in empty class parameter
    passing ABI in this TU.  */
@@ -744,7 +755,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
    In an IDENTIFIER_NODE, nonzero means an external declaration
    accessible from outside this translation unit was previously seen
    for this name in an inner scope.  */
-#define TREE_PUBLIC(NODE) ((NODE)->base.public_flag)
+#define TREE_PUBLIC(NODE) (as_a_tree_base (NODE)->public_flag)
 
 /* In a _TYPE, indicates whether TYPE_CACHED_VALUES contains a vector
    of cached values, or is something else.  */
@@ -766,7 +777,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
    reference to a volatile variable.  In a ..._DECL, this is set only if the
    declaration said `volatile'.  This will never be set for a constant.  */
 #define TREE_SIDE_EFFECTS(NODE) \
-  (NON_TYPE_CHECK (NODE)->base.side_effects_flag)
+  (as_a_tree_base (NON_TYPE_CHECK (NODE))->side_effects_flag)
 
 /* In a LABEL_DECL, nonzero means this label had its address taken
    and therefore can never be deleted and is a jump target for
@@ -796,7 +807,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
    because eventually we may make that a different bit.
 
    If this bit is set in an expression, so is TREE_SIDE_EFFECTS.  */
-#define TREE_THIS_VOLATILE(NODE) ((NODE)->base.volatile_flag)
+#define TREE_THIS_VOLATILE(NODE) (as_a_tree_base (NODE)->volatile_flag)
 
 /* Nonzero means this node will not trap.  In an INDIRECT_REF, means
    accessing the memory pointed to won't generate a trap.  However,
@@ -877,20 +888,20 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
    In a BLOCK node, nonzero if reorder_blocks has already seen this block.
    In an SSA_NAME node, nonzero if the SSA_NAME occurs in an abnormal
    PHI node.  */
-#define TREE_ASM_WRITTEN(NODE) ((NODE)->base.asm_written_flag)
+#define TREE_ASM_WRITTEN(NODE) (as_a_tree_base (NODE)->asm_written_flag)
 
 /* Nonzero in a _DECL if the name is used in its scope.
    Nonzero in an expr node means inhibit warning if value is unused.
    In IDENTIFIER_NODEs, this means that some extern decl for this name
    was used.
    In a BLOCK, this means that the block contains variables that are used.  */
-#define TREE_USED(NODE) ((NODE)->base.used_flag)
+#define TREE_USED(NODE) (as_a_tree_base (NODE)->used_flag)
 
 /* In a FUNCTION_DECL, nonzero means a call to the function cannot
    throw an exception.  In a CALL_EXPR, nonzero means the call cannot
    throw.  We can't easily check the node type here as the C++
    frontend also uses this flag (for AGGR_INIT_EXPR).  */
-#define TREE_NOTHROW(NODE) ((NODE)->base.nothrow_flag)
+#define TREE_NOTHROW(NODE) (as_a_tree_base (NODE)->nothrow_flag)
 
 /* In a CALL_EXPR, means that it's safe to use the target of the call
    expansion as the return slot for a call that returns in memory.  */
@@ -939,9 +950,9 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
   (CALL_EXPR_CHECK (NODE)->base.protected_flag)
 
 /* Used in classes in C++.  */
-#define TREE_PRIVATE(NODE) ((NODE)->base.private_flag)
+#define TREE_PRIVATE(NODE) (as_a_tree_base (NODE)->private_flag)
 /* Used in classes in C++. */
-#define TREE_PROTECTED(NODE) ((NODE)->base.protected_flag)
+#define TREE_PROTECTED(NODE) (as_a_tree_base (NODE)->protected_flag)
 
 /* True if reference type NODE is a C++ rvalue reference.  */
 #define TYPE_REF_IS_RVALUE(NODE) \
@@ -950,7 +961,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 /* Nonzero in a _DECL if the use of the name is defined as a
    deprecated feature by __attribute__((deprecated)).  */
 #define TREE_DEPRECATED(NODE) \
-  ((NODE)->base.deprecated_flag)
+  (as_a_tree_base (NODE)->deprecated_flag)
 
 /* Nonzero indicates an IDENTIFIER_NODE that names an anonymous
    aggregate, (as created by anon_aggr_name_format).  */
@@ -2170,7 +2181,7 @@ extern tree vector_element_bits_tree (const_tree);
 
 /* Used to keep track of visited nodes in tree traversals.  This is set to
    0 by copy_node and make_node.  */
-#define TREE_VISITED(NODE) ((NODE)->base.visited)
+#define TREE_VISITED(NODE) (as_a_tree_base (NODE)->visited)
 
 /* If set in an ARRAY_TYPE, indicates a string type (for languages
    that distinguish string from array of char).
Richard Biener Nov. 1, 2020, 6:50 a.m. UTC | #12
On October 31, 2020 11:35:01 PM GMT+01:00, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On 10/29/20 1:40 PM, Richard Biener wrote:
>> > On Thu, 29 Oct 2020, Jakub Jelinek wrote:
>> > 
>> > > On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote:
>> > > > > 
>> > > > > That's ugly and will for sure defeat warning / access code
>> > > > > when we access this as char[], no?  I mean, we could
>> > > > > as well use 'int str[1];' here?
>> > > > 
>> > > > Well, we always get char pointer via macro that is IMO OK, but
>I am also
>> > > > not very much in love with this.
>> > > 
>> > > Do we treat signed char [...]; as typeless storage too, or just
>> > > what the C++ standard requires (i.e. char, unsigned char and
>std::byte
>> > > where the last one is enum type with unsigned char underlying
>type)?
>> > 
>> > All that is covered by is_byte_access_type which includes all
>> > character types (including char16_t and wchar it seems) and
>std::byte.
>> 
>> Well, that's a bug, apparently from the PR94923 patch (r11-499);
>previously
>> it was char, signed char, and unsigned char.  And even that is too
>much;
>> even C++98 said just char and unsigned char could be used for
>bytewise
>> access.
>> 
>> When C++17 clarified this with the notion of "provides storage", it
>applied
>> to only unsigned char and std::byte, not even the full set of
>byte-access
>> types.  We still need to allow bytewise access using plain char, but
>perhaps
>> we don't need to treat plain char arrays as typeless.
>> 
>> Attributes to say that a particular array does or does not provide
>storage
>> for objects of other types do sound useful.
>
>Thanks a lot for explanation!  
>I am adding Martin Sebor to CC. Perhaps you can fix the problem with 
>std::byte, and signed char to imply typeless storage and ideally also
>add an attribute? This seem too deep into C++ FE details for me.
>
>I was thiking a bit more about all accesses to trees getting same alias
>set.  This is because we allow type puning through unions.  If our tree
>datastructure was a C++ class hiearchy, this would not happen since
>accesses to specialized tree node would not be through unions but
>through casted pointers.
>
>We could do that with our acessor macros, too.
>I tried to turn (NODE)->base in our accesors to ((tree_base
>*)(node))->base
>and this breaks with const_tree pointers. However the following seem to
>work and get better TBAA.
>
>I think converting other acessors should be quite easy and make our
>predicates more easy to optimize.
>
>What do you think?

Can you adjust TREE_SET_CODE to assert the corresponding tree struct doesn't change? Otherwise sure - this would be a good change. Can we avoid the online function though? It'll make -O0 debugging even more awkward...

Richard. 

>Honza
>
>diff --git a/gcc/tree.h b/gcc/tree.h
>index 7f0aa5b8d1d..cd8146808f7 100644
>--- a/gcc/tree.h
>+++ b/gcc/tree.h
>@@ -235,13 +235,24 @@ as_internal_fn (combined_fn code)
> 
> #define NULL_TREE (tree) NULL
> 
>+inline tree_base *
>+as_a_tree_base (tree t)
>+{
>+  return (tree_base *)t;
>+}
>+inline const tree_base *
>+as_a_tree_base (const_tree t)
>+{
>+  return (const tree_base *)t;
>+}
>+
> /* Define accessors for the fields that all tree nodes have
>    (though some fields are not used for all kinds of nodes).  */
> 
> /* The tree-code says what kind of node it is.
>    Codes are defined in tree.def.  */
>-#define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code)
>-#define TREE_SET_CODE(NODE, VALUE) ((NODE)->base.code = (VALUE))
>+#define TREE_CODE(NODE) ((enum tree_code) (as_a_tree_base
>(NODE)->code))
>+#define TREE_SET_CODE(NODE, VALUE) (as_a_tree_base (NODE)->code =
>(VALUE))
> 
> /* When checking is enabled, errors will be generated if a tree node
>    is accessed incorrectly. The macros die with a fatal error.  */
>@@ -642,7 +653,7 @@ extern void omp_clause_range_check_failed
>(const_tree, const char *, int,
>    In IDENTIFIER_NODEs, this means that some extern decl for this name
>    had its address taken.  That matters for inline functions.
>In a STMT_EXPR, it means we want the result of the enclosed expression.
> */
>-#define TREE_ADDRESSABLE(NODE) ((NODE)->base.addressable_flag)
>+#define TREE_ADDRESSABLE(NODE) (as_a_tree_base
>(NODE)->addressable_flag)
> 
>/* Set on a CALL_EXPR if the call is in a tail position, ie. just
>before the
>exit of a function.  Calls for which this is true are candidates for
>tail
>@@ -670,7 +681,7 @@ extern void omp_clause_range_check_failed
>(const_tree, const char *, int,
> /* In a VAR_DECL, nonzero means allocate static storage.
>    In a FUNCTION_DECL, nonzero if function has been defined.
>    In a CONSTRUCTOR, nonzero means allocate static storage.  */
>-#define TREE_STATIC(NODE) ((NODE)->base.static_flag)
>+#define TREE_STATIC(NODE) (as_a_tree_base (NODE)->static_flag)
> 
> /* In an ADDR_EXPR, nonzero means do not use a trampoline.  */
>#define TREE_NO_TRAMPOLINE(NODE) (ADDR_EXPR_CHECK
>(NODE)->base.static_flag)
>@@ -701,7 +712,7 @@ extern void omp_clause_range_check_failed
>(const_tree, const char *, int,
>    warnings concerning the decl should be suppressed.  This is used at
>    least for used-before-set warnings, and it set after one warning is
>    emitted.  */
>-#define TREE_NO_WARNING(NODE) ((NODE)->base.nowarning_flag)
>+#define TREE_NO_WARNING(NODE) (as_a_tree_base (NODE)->nowarning_flag)
> 
> /* Nonzero if we should warn about the change in empty class parameter
>    passing ABI in this TU.  */
>@@ -744,7 +755,7 @@ extern void omp_clause_range_check_failed
>(const_tree, const char *, int,
>    In an IDENTIFIER_NODE, nonzero means an external declaration
>    accessible from outside this translation unit was previously seen
>    for this name in an inner scope.  */
>-#define TREE_PUBLIC(NODE) ((NODE)->base.public_flag)
>+#define TREE_PUBLIC(NODE) (as_a_tree_base (NODE)->public_flag)
> 
> /* In a _TYPE, indicates whether TYPE_CACHED_VALUES contains a vector
>    of cached values, or is something else.  */
>@@ -766,7 +777,7 @@ extern void omp_clause_range_check_failed
>(const_tree, const char *, int,
>reference to a volatile variable.  In a ..._DECL, this is set only if
>the
>declaration said `volatile'.  This will never be set for a constant. 
>*/
> #define TREE_SIDE_EFFECTS(NODE) \
>-  (NON_TYPE_CHECK (NODE)->base.side_effects_flag)
>+  (as_a_tree_base (NON_TYPE_CHECK (NODE))->side_effects_flag)
> 
> /* In a LABEL_DECL, nonzero means this label had its address taken
>    and therefore can never be deleted and is a jump target for
>@@ -796,7 +807,7 @@ extern void omp_clause_range_check_failed
>(const_tree, const char *, int,
>    because eventually we may make that a different bit.
> 
>    If this bit is set in an expression, so is TREE_SIDE_EFFECTS.  */
>-#define TREE_THIS_VOLATILE(NODE) ((NODE)->base.volatile_flag)
>+#define TREE_THIS_VOLATILE(NODE) (as_a_tree_base
>(NODE)->volatile_flag)
> 
> /* Nonzero means this node will not trap.  In an INDIRECT_REF, means
>    accessing the memory pointed to won't generate a trap.  However,
>@@ -877,20 +888,20 @@ extern void omp_clause_range_check_failed
>(const_tree, const char *, int,
>In a BLOCK node, nonzero if reorder_blocks has already seen this block.
>    In an SSA_NAME node, nonzero if the SSA_NAME occurs in an abnormal
>    PHI node.  */
>-#define TREE_ASM_WRITTEN(NODE) ((NODE)->base.asm_written_flag)
>+#define TREE_ASM_WRITTEN(NODE) (as_a_tree_base
>(NODE)->asm_written_flag)
> 
> /* Nonzero in a _DECL if the name is used in its scope.
>    Nonzero in an expr node means inhibit warning if value is unused.
>    In IDENTIFIER_NODEs, this means that some extern decl for this name
>    was used.
>In a BLOCK, this means that the block contains variables that are used.
> */
>-#define TREE_USED(NODE) ((NODE)->base.used_flag)
>+#define TREE_USED(NODE) (as_a_tree_base (NODE)->used_flag)
> 
> /* In a FUNCTION_DECL, nonzero means a call to the function cannot
>    throw an exception.  In a CALL_EXPR, nonzero means the call cannot
>    throw.  We can't easily check the node type here as the C++
>    frontend also uses this flag (for AGGR_INIT_EXPR).  */
>-#define TREE_NOTHROW(NODE) ((NODE)->base.nothrow_flag)
>+#define TREE_NOTHROW(NODE) (as_a_tree_base (NODE)->nothrow_flag)
> 
> /* In a CALL_EXPR, means that it's safe to use the target of the call
>    expansion as the return slot for a call that returns in memory.  */
>@@ -939,9 +950,9 @@ extern void omp_clause_range_check_failed
>(const_tree, const char *, int,
>   (CALL_EXPR_CHECK (NODE)->base.protected_flag)
> 
> /* Used in classes in C++.  */
>-#define TREE_PRIVATE(NODE) ((NODE)->base.private_flag)
>+#define TREE_PRIVATE(NODE) (as_a_tree_base (NODE)->private_flag)
> /* Used in classes in C++. */
>-#define TREE_PROTECTED(NODE) ((NODE)->base.protected_flag)
>+#define TREE_PROTECTED(NODE) (as_a_tree_base (NODE)->protected_flag)
> 
> /* True if reference type NODE is a C++ rvalue reference.  */
> #define TYPE_REF_IS_RVALUE(NODE) \
>@@ -950,7 +961,7 @@ extern void omp_clause_range_check_failed
>(const_tree, const char *, int,
> /* Nonzero in a _DECL if the use of the name is defined as a
>    deprecated feature by __attribute__((deprecated)).  */
> #define TREE_DEPRECATED(NODE) \
>-  ((NODE)->base.deprecated_flag)
>+  (as_a_tree_base (NODE)->deprecated_flag)
> 
> /* Nonzero indicates an IDENTIFIER_NODE that names an anonymous
>    aggregate, (as created by anon_aggr_name_format).  */
>@@ -2170,7 +2181,7 @@ extern tree vector_element_bits_tree
>(const_tree);
> 
>/* Used to keep track of visited nodes in tree traversals.  This is set
>to
>    0 by copy_node and make_node.  */
>-#define TREE_VISITED(NODE) ((NODE)->base.visited)
>+#define TREE_VISITED(NODE) (as_a_tree_base (NODE)->visited)
> 
> /* If set in an ARRAY_TYPE, indicates a string type (for languages
>    that distinguish string from array of char).
Jason Merrill Nov. 3, 2020, 10:18 p.m. UTC | #13
On Sat, Oct 31, 2020 at 6:35 PM Jan Hubicka <hubicka@ucw.cz> wrote:

> > On 10/29/20 1:40 PM, Richard Biener wrote:
> > > On Thu, 29 Oct 2020, Jakub Jelinek wrote:
> > >
> > > > On Thu, Oct 29, 2020 at 05:00:40PM +0100, Jan Hubicka wrote:
> > > > > >
> > > > > > That's ugly and will for sure defeat warning / access code
> > > > > > when we access this as char[], no?  I mean, we could
> > > > > > as well use 'int str[1];' here?
> > > > >
> > > > > Well, we always get char pointer via macro that is IMO OK, but I
> am also
> > > > > not very much in love with this.
> > > >
> > > > Do we treat signed char [...]; as typeless storage too, or just
> > > > what the C++ standard requires (i.e. char, unsigned char and
> std::byte
> > > > where the last one is enum type with unsigned char underlying type)?
> > >
> > > All that is covered by is_byte_access_type which includes all
> > > character types (including char16_t and wchar it seems) and std::byte.
> >
> > Well, that's a bug, apparently from the PR94923 patch (r11-499);
> previously
> > it was char, signed char, and unsigned char.  And even that is too much;
> > even C++98 said just char and unsigned char could be used for bytewise
> > access.
> >
> > When C++17 clarified this with the notion of "provides storage", it
> applied
> > to only unsigned char and std::byte, not even the full set of byte-access
> > types.  We still need to allow bytewise access using plain char, but
> perhaps
> > we don't need to treat plain char arrays as typeless.
> >
> > Attributes to say that a particular array does or does not provide
> storage
> > for objects of other types do sound useful.
>
> Thanks a lot for explanation!
> I am adding Martin Sebor to CC. Perhaps you can fix the problem with
> std::byte, and signed char to imply typeless storage and ideally also
> add an attribute? This seem too deep into C++ FE details for me.
>

I've fixed is_byte_access_type; now we shouldn't treat arrays of signed
char as typeless storage.  I don't have time to add the attribute in stage
1.

Jason
diff mbox series

Patch

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index c9280a8d3b1..63dbb5b8eab 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1401,7 +1401,8 @@  struct GTY(()) tree_fixed_cst {
 struct GTY(()) tree_string {
   struct tree_typed typed;
   int length;
-  char str[1];
+  /* Avoid char array that would make whole type to be typeless storage.  */
+  struct {char c;} str[1];
 };
 
 struct GTY(()) tree_complex {
diff --git a/gcc/tree.c b/gcc/tree.c
index 81f867ddded..84115630184 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -2273,7 +2273,7 @@  build_string (unsigned len, const char *str /*= NULL */)
     memcpy (s->string.str, str, len);
   else
     memset (s->string.str, 0, len);
-  s->string.str[len] = '\0';
+  s->string.str[len].c = '\0';
 
   return s;
 }