diff mbox

Avoid some C++ local statics with constructors

Message ID 20160923084433.GK7282@tucnak.redhat.com
State New
Headers show

Commit Message

Jakub Jelinek Sept. 23, 2016, 8:44 a.m. UTC
On Fri, Sep 23, 2016 at 08:55:02AM +0200, Richard Biener wrote:
> While I agree with the goal to reduce the number of static constructors
> esp. the vNULL cases I disagree with.  This is just introducing
> undefined behavior (uninitialized object use), and in case we end up
> making vNULL not all-zeros it's bound to break.  So IMHO your patch
> is very much premature optimization here.

No, there is no undefined behavior, and we rely on the zero initialization
in > 100 cases, see
grep '\(^\|static \)vec \?<.*;' *.c *.cc */*.c ada/*/*.c | grep -v '\*'
or anywhere where we have vec part of a struct that we allocate cleared or
clear after allocation.
The thing is, vec is (intentionally) a POD, thus doesn't have a ctor,
and the = vNULL "construction" is
  template <typename T, typename A, typename L>
  operator vec<T, A, L> () { return vec<T, A, L>(); }
actually clearing of the vector.  If we ever wanted to initialize vec to
something not-all-zeros, we'd actually need to turn it into non-POD and add
ctor.

I'm attaching 3 further patches (not bootstrapped/regtested yet), which:
1) adds constexpr keyword for C++11 + to the vNULL operator, this seems to
be enough to get rid of the runtime initialization, both in the
tree-ssa-sccvn.o case with _ZGV var and e.g. in passes.c get rid of
_Z41__static_initialization_and_destruction_0ii ctor
2) another patch to remove the useless = vNULL initializers from file scope
vecs, we don't usually = 0 static vars either (though, apparently we have
hundreds of those)
3) a patch that attempts to clarify when = vNULL is needed and when it is
not

> Maybe we can change vNULL to sth that avoids the constructor, if only
> if C++14 is available (thus in stage2+)?
> 
> For cases like this I hope we could make GCC optimize away the static
> construction, maybe you can reduce it to a simple testcase and file
> a bugreport?  It will require making static constructors "first class"
> in the cgraph so we know at some point the function body initializing
> a specific global and some later cgraph phase builds up the global
> constructor calling all remaining relevant individual constructor
> functions (which can then still be inlined into that fn).

A short testcase is e.g.
struct S { int a; };
void baz (S *);
#if __cpp_constexpr >= 200704
constexpr
#endif
inline S foo () { return (S) { 2 }; }

S s = foo ();

void
bar ()
{
  static S t = foo ();
  baz (&t);
}
Here for -std=c++98 (or even -std=c++1z without the constexpr keyword)
at -O0 there is both _Z41__static_initialization_and_destruction_0ii and
the __cxa_guard*+_ZGV stuff, at -O2 we manage to inline
_Z41__static_initialization_and_destruction_0ii but still end up with
_GLOBAL__sub_I_s:
        movl    $2, s(%rip)
        ret
registered in .ctors.  And the guard stuff is left too.
Optimizing that away would be nice, but non-fun, we'd have to recognize
something like:

  static struct S t;
  unsigned char _1;
  int _2;

  <bb 2>:
  _1 = __atomic_load_1 (&_ZGVZ3barvE1t, 2);
  if (_1 == 0)
    goto <bb 4>;
  else
    goto <bb 3>;

  <bb 3>:
  goto <bb 6>;

  <bb 4>:
  _2 = __cxa_guard_acquire (&_ZGVZ3barvE1t);
  if (_2 != 0)
    goto <bb 5>;
  else
    goto <bb 3>;

  <bb 5>:
  MEM[(struct S *)&t] = 2;
  __cxa_guard_release (&_ZGVZ3barvE1t);

  <bb 6>:

and turn those stores into the static var into a static initializer
and throw away the _ZGV* var, plus if this is in an inline function
(i.e. the static var and its _ZGV* var are comdat) it might be an
ABI thing too (unfortunately the _ZGV* var and the actual var it is guarding
aren't in the same comdat, so even a trick like keeping the runtime
initialization in, but initialize the var to the expected values and
preinitialize the _ZGV* guard into the "already initialized" state wouldn't
work reliably, if the linker picks the _ZGV* var from one TU and the guarded
var from another one, it could fail).  So, I'm afraid if we want to do this
optimization, we'd need to limit it to the cases where the _ZGV* var isn't
visible outside of the TU.

Ok for the 3 patches (or some subset of them) for trunk if they pass
bootstrap/regtest?

	Jakub
2016-09-23  Jakub Jelinek  <jakub@redhat.com>

	* vec.h (vnull::operator vec): Add constexpr keyword for
	C++11 and later.
2016-09-23  Jakub Jelinek  <jakub@redhat.com>

	* sel-sched-ir.c (sel_global_bb_info, sel_region_bb_info,
	loop_nests, s_i_d, last_added_blocks): Remove unnecessary
	= vNULL initialization of file scope vec.
	* passes.c (pass_tab, enabled_pass_uid_range_tab,
	disabled_pass_uid_range_tab): Likewise.
	* haifa-sched.c (sched_luids, h_i_d): Likewise.
	* tree-chkp-opt.c (check_infos): Likewise.
	* sel-sched.c (vec_av_set, vec_temp_moveop_nops): Likewise.
c/
	* c-parser.c (incomplete_record_decls): Remove unnecessary
	= vNULL initialization of file scope vec.
cp/
	* constexpr.c (call_stack): Remove unnecessary
	= vNULL initialization of file scope vec.

--- gcc/sel-sched-ir.c.jj	2016-03-15 17:10:19.000000000 +0100
+++ gcc/sel-sched-ir.c	2016-09-23 09:56:54.145357346 +0200
@@ -45,12 +45,10 @@ along with GCC; see the file COPYING3.
 #include "sel-sched-dump.h"
 
 /* A vector holding bb info for whole scheduling pass.  */
-vec<sel_global_bb_info_def>
-    sel_global_bb_info = vNULL;
+vec<sel_global_bb_info_def> sel_global_bb_info;
 
 /* A vector holding bb info.  */
-vec<sel_region_bb_info_def>
-    sel_region_bb_info = vNULL;
+vec<sel_region_bb_info_def> sel_region_bb_info;
 
 /* A pool for allocating all lists.  */
 object_allocator<_list_node> sched_lists_pool ("sel-sched-lists");
@@ -66,7 +64,7 @@ struct loop *current_loop_nest;
 
 /* LOOP_NESTS is a vector containing the corresponding loop nest for
    each region.  */
-static vec<loop_p> loop_nests = vNULL;
+static vec<loop_p> loop_nests;
 
 /* Saves blocks already in loop regions, indexed by bb->index.  */
 static sbitmap bbs_in_loop_rgns = NULL;
@@ -4163,7 +4161,7 @@ finish_region_bb_info (void)
 
 
 /* Data for each insn in current region.  */
-vec<sel_insn_data_def> s_i_d = vNULL;
+vec<sel_insn_data_def> s_i_d;
 
 /* Extend data structures for insns from current region.  */
 static void
@@ -4499,8 +4497,7 @@ get_av_level (insn_t insn)
 
 /* The basic block that already has been processed by the sched_data_update (),
    but hasn't been in sel_add_bb () yet.  */
-static vec<basic_block>
-    last_added_blocks = vNULL;
+static vec<basic_block> last_added_blocks;
 
 /* A pool for allocating successor infos.  */
 static struct
--- gcc/passes.c.jj	2016-09-14 16:00:50.000000000 +0200
+++ gcc/passes.c	2016-09-23 09:55:41.265281188 +0200
@@ -862,7 +862,7 @@ pass_manager::register_pass_name (opt_pa
 /* Map from pass id to canonicalized pass name.  */
 
 typedef const char *char_ptr;
-static vec<char_ptr> pass_tab = vNULL;
+static vec<char_ptr> pass_tab;
 
 /* Callback function for traversing NAME_TO_PASS_MAP.  */
 
@@ -982,10 +982,8 @@ struct uid_range
 typedef struct uid_range *uid_range_p;
 
 
-static vec<uid_range_p>
-      enabled_pass_uid_range_tab = vNULL;
-static vec<uid_range_p>
-      disabled_pass_uid_range_tab = vNULL;
+static vec<uid_range_p> enabled_pass_uid_range_tab;
+static vec<uid_range_p> disabled_pass_uid_range_tab;
 
 
 /* Parse option string for -fdisable- and -fenable-
--- gcc/haifa-sched.c.jj	2016-08-29 12:17:29.000000000 +0200
+++ gcc/haifa-sched.c	2016-09-23 09:55:15.440612566 +0200
@@ -401,13 +401,13 @@ const struct common_sched_info_def haifa
   };
 
 /* Mapping from instruction UID to its Logical UID.  */
-vec<int> sched_luids = vNULL;
+vec<int> sched_luids;
 
 /* Next LUID to assign to an instruction.  */
 int sched_max_luid = 1;
 
 /* Haifa Instruction Data.  */
-vec<haifa_insn_data_def> h_i_d = vNULL;
+vec<haifa_insn_data_def> h_i_d;
 
 void (* sched_init_only_bb) (basic_block, basic_block);
 
--- gcc/tree-chkp-opt.c.jj	2016-08-17 10:22:14.000000000 +0200
+++ gcc/tree-chkp-opt.c	2016-09-23 09:57:09.715159981 +0200
@@ -84,7 +84,7 @@ static void chkp_collect_value (tree ssa
 #define chkp_checku_fndecl \
   (targetm.builtin_chkp_function (BUILT_IN_CHKP_BNDCU))
 
-static vec<struct bb_checks, va_heap, vl_ptr> check_infos = vNULL;
+static vec<struct bb_checks, va_heap, vl_ptr> check_infos;
 
 /* Comparator for pol_item structures I1 and I2 to be used
    to find items with equal var.  Also used for polynomial
--- gcc/sel-sched.c.jj	2016-08-06 12:11:51.000000000 +0200
+++ gcc/sel-sched.c	2016-09-23 09:56:20.679781563 +0200
@@ -494,7 +494,7 @@ static int max_ws;
 static int num_insns_scheduled;
 
 /* A vector of expressions is used to be able to sort them.  */
-static vec<expr_t> vec_av_set = vNULL;
+static vec<expr_t> vec_av_set;
 
 /* A vector of vinsns is used to hold temporary lists of vinsns.  */
 typedef vec<vinsn_t> vinsn_vec_t;
@@ -512,7 +512,7 @@ static vinsn_vec_t vec_target_unavailabl
 
 /* Vector to store temporary nops inserted in move_op to prevent removal
    of empty bbs.  */
-static vec<insn_t> vec_temp_moveop_nops = vNULL;
+static vec<insn_t> vec_temp_moveop_nops;
 
 /* These bitmaps record original instructions scheduled on the current
    iteration and bookkeeping copies created by them.  */
--- gcc/c/c-parser.c.jj	2016-09-14 23:48:59.000000000 +0200
+++ gcc/c/c-parser.c	2016-09-23 09:57:28.249925030 +0200
@@ -67,7 +67,7 @@ along with GCC; see the file COPYING3.
    In c_parser_translation_unit(), we iterate over incomplete_record_decls
    and report error if any of the decls are still incomplete.  */ 
 
-vec<tree> incomplete_record_decls = vNULL;
+vec<tree> incomplete_record_decls;
 
 void
 set_c_expr_source_range (c_expr *expr,
--- gcc/cp/constexpr.c.jj	2016-09-20 17:17:51.000000000 +0200
+++ gcc/cp/constexpr.c	2016-09-23 09:57:46.055699321 +0200
@@ -1253,7 +1253,7 @@ cxx_bind_parameters_in_call (const const
    These do not need to be marked for PCH or GC.  */
 
 /* FIXME remember and print actual constant arguments.  */
-static vec<tree> call_stack = vNULL;
+static vec<tree> call_stack;
 static int call_stack_tick;
 static int last_cx_error_tick;
2016-09-23  Jakub Jelinek  <jakub@redhat.com>

	* vec.h (vNULL): Extend comment to say = vNULL initialization
	isn't needed for static vars.

--- gcc/vec.h.jj	2016-09-23 09:53:13.000000000 +0200
+++ gcc/vec.h	2016-09-23 10:11:06.811548785 +0200
@@ -410,7 +410,9 @@ struct GTY((user)) vec
 /* Type to provide NULL values for vec<T, A, L>.  This is used to
    provide nil initializers for vec instances.  Since vec must be
    a POD, we cannot have proper ctor/dtor for it.  To initialize
-   a vec instance, you can assign it the value vNULL.  */
+   a vec instance, you can assign it the value vNULL.  This isn't
+   needed for file-scope and function-local static vectors, which
+   are zero-initialized by default.  */
 struct vnull
 {
   template <typename T, typename A, typename L>

Comments

Richard Biener Sept. 23, 2016, 8:59 a.m. UTC | #1
On Fri, 23 Sep 2016, Jakub Jelinek wrote:

> On Fri, Sep 23, 2016 at 08:55:02AM +0200, Richard Biener wrote:
> > While I agree with the goal to reduce the number of static constructors
> > esp. the vNULL cases I disagree with.  This is just introducing
> > undefined behavior (uninitialized object use), and in case we end up
> > making vNULL not all-zeros it's bound to break.  So IMHO your patch
> > is very much premature optimization here.
> 
> No, there is no undefined behavior, and we rely on the zero initialization
> in > 100 cases, see
> grep '\(^\|static \)vec \?<.*;' *.c *.cc */*.c ada/*/*.c | grep -v '\*'
> or anywhere where we have vec part of a struct that we allocate cleared or
> clear after allocation.
> The thing is, vec is (intentionally) a POD, thus doesn't have a ctor,
> and the = vNULL "construction" is
>   template <typename T, typename A, typename L>
>   operator vec<T, A, L> () { return vec<T, A, L>(); }
> actually clearing of the vector.  If we ever wanted to initialize vec to
> something not-all-zeros, we'd actually need to turn it into non-POD and add
> ctor.
> 
> I'm attaching 3 further patches (not bootstrapped/regtested yet), which:
> 1) adds constexpr keyword for C++11 + to the vNULL operator, this seems to
> be enough to get rid of the runtime initialization, both in the
> tree-ssa-sccvn.o case with _ZGV var and e.g. in passes.c get rid of
> _Z41__static_initialization_and_destruction_0ii ctor

Really?  We have

extern vnull vNULL;

thus all = vNULL initializers are really copies of an instance in vec.c.

This patch is nevertheless fine.

> 2) another patch to remove the useless = vNULL initializers from file scope
> vecs, we don't usually = 0 static vars either (though, apparently we have
> hundreds of those)

Ok.

> 3) a patch that attempts to clarify when = vNULL is needed and when it is
> not

Ok as well.

> > Maybe we can change vNULL to sth that avoids the constructor, if only
> > if C++14 is available (thus in stage2+)?
> > 
> > For cases like this I hope we could make GCC optimize away the static
> > construction, maybe you can reduce it to a simple testcase and file
> > a bugreport?  It will require making static constructors "first class"
> > in the cgraph so we know at some point the function body initializing
> > a specific global and some later cgraph phase builds up the global
> > constructor calling all remaining relevant individual constructor
> > functions (which can then still be inlined into that fn).
> 
> A short testcase is e.g.
> struct S { int a; };
> void baz (S *);
> #if __cpp_constexpr >= 200704
> constexpr
> #endif
> inline S foo () { return (S) { 2 }; }
> 
> S s = foo ();
> 
> void
> bar ()
> {
>   static S t = foo ();
>   baz (&t);
> }
> Here for -std=c++98 (or even -std=c++1z without the constexpr keyword)
> at -O0 there is both _Z41__static_initialization_and_destruction_0ii and
> the __cxa_guard*+_ZGV stuff, at -O2 we manage to inline
> _Z41__static_initialization_and_destruction_0ii but still end up with
> _GLOBAL__sub_I_s:
>         movl    $2, s(%rip)
>         ret
> registered in .ctors.

Yeah, I think this is where we could improve.

>  And the guard stuff is left too.
> Optimizing that away would be nice, but non-fun, we'd have to recognize
> something like:
> 
>   static struct S t;
>   unsigned char _1;
>   int _2;
> 
>   <bb 2>:
>   _1 = __atomic_load_1 (&_ZGVZ3barvE1t, 2);
>   if (_1 == 0)
>     goto <bb 4>;
>   else
>     goto <bb 3>;
> 
>   <bb 3>:
>   goto <bb 6>;
> 
>   <bb 4>:
>   _2 = __cxa_guard_acquire (&_ZGVZ3barvE1t);
>   if (_2 != 0)
>     goto <bb 5>;
>   else
>     goto <bb 3>;
> 
>   <bb 5>:
>   MEM[(struct S *)&t] = 2;
>   __cxa_guard_release (&_ZGVZ3barvE1t);
> 
>   <bb 6>:

Ick - I totally forgot about the local static initializer woes
with guards.  _Maybe_ we can represent this by putting the guard
into the initializer function (remember, I'd like to see all
static initializers be represented as functions, refered to
from their initializing varpool node as, say, ->initializer).
Thus we'd have the FE generate

bar ()
{
  __static_init_for_t ();
...
}

__static_init_for_t ()
{
  all the guard stuff plus the init
}

the initializers would be always-inline (at IPA time when optimizing
to be able to do the promotion to a static ctor).

> and turn those stores into the static var into a static initializer
> and throw away the _ZGV* var, plus if this is in an inline function
> (i.e. the static var and its _ZGV* var are comdat) it might be an
> ABI thing too (unfortunately the _ZGV* var and the actual var it is guarding
> aren't in the same comdat, so even a trick like keeping the runtime
> initialization in, but initialize the var to the expected values and
> preinitialize the _ZGV* guard into the "already initialized" state wouldn't
> work reliably, if the linker picks the _ZGV* var from one TU and the guarded
> var from another one, it could fail).  So, I'm afraid if we want to do this
> optimization, we'd need to limit it to the cases where the _ZGV* var isn't
> visible outside of the TU.
> 
> Ok for the 3 patches (or some subset of them) for trunk if they pass
> bootstrap/regtest?

Ok.

Richard.
Jakub Jelinek Sept. 23, 2016, 9:15 a.m. UTC | #2
On Fri, Sep 23, 2016 at 10:59:12AM +0200, Richard Biener wrote:
> On Fri, 23 Sep 2016, Jakub Jelinek wrote:
> 
> > On Fri, Sep 23, 2016 at 08:55:02AM +0200, Richard Biener wrote:
> > > While I agree with the goal to reduce the number of static constructors
> > > esp. the vNULL cases I disagree with.  This is just introducing
> > > undefined behavior (uninitialized object use), and in case we end up
> > > making vNULL not all-zeros it's bound to break.  So IMHO your patch
> > > is very much premature optimization here.
> > 
> > No, there is no undefined behavior, and we rely on the zero initialization
> > in > 100 cases, see
> > grep '\(^\|static \)vec \?<.*;' *.c *.cc */*.c ada/*/*.c | grep -v '\*'
> > or anywhere where we have vec part of a struct that we allocate cleared or
> > clear after allocation.
> > The thing is, vec is (intentionally) a POD, thus doesn't have a ctor,
> > and the = vNULL "construction" is
> >   template <typename T, typename A, typename L>
> >   operator vec<T, A, L> () { return vec<T, A, L>(); }
> > actually clearing of the vector.  If we ever wanted to initialize vec to
> > something not-all-zeros, we'd actually need to turn it into non-POD and add
> > ctor.
> > 
> > I'm attaching 3 further patches (not bootstrapped/regtested yet), which:
> > 1) adds constexpr keyword for C++11 + to the vNULL operator, this seems to
> > be enough to get rid of the runtime initialization, both in the
> > tree-ssa-sccvn.o case with _ZGV var and e.g. in passes.c get rid of
> > _Z41__static_initialization_and_destruction_0ii ctor
> 
> Really?  We have
> 
> extern vnull vNULL;

vNULL is not a vec (after all, vec is a template, so there can't be a single
object), just a dummy object with a conversion operator template which
value-initializes an unnamed vec temporary and the assignment copies it into
the variable.

Anyway, I'll file 2 PRs, one for the file scope static initialization
optimization, another for the local static.

	Jakub
diff mbox

Patch

--- gcc/vec.h.jj	2016-04-26 08:08:18.000000000 +0200
+++ gcc/vec.h	2016-09-23 09:48:48.813586327 +0200
@@ -414,6 +414,9 @@  struct GTY((user)) vec
 struct vnull
 {
   template <typename T, typename A, typename L>
+#if __cpp_constexpr >= 201103
+  constexpr
+#endif
   operator vec<T, A, L> () { return vec<T, A, L>(); }
 };
 extern vnull vNULL;