diff mbox

Fix PR c++/70452 (regression in C++ parsing performance)

Message ID 1459538004-13812-1-git-send-email-patrick@parcs.ath.cx
State New
Headers show

Commit Message

Patrick Palka April 1, 2016, 7:13 p.m. UTC
Currently during constexpr CALL_EXPR evaluation we create a new copy of
the callee function's body for each separate call with no attempt made
at reusing the function body.  So when a function ends up getting called
10s of thousands of times durnig constexpr evaluuation, we end up
creating 10s of thousands of copies of the function.

This patch is an attempt at reusing these copies instead of having a new
copy get created each call.  It implements a per-function freelist of
the unshared body/parm/res copies that were created by copy_fn().  When
a constexpr call is finished it pushes the body/parm/res trees to the
freelist, and before a call is evaluated it tries to reuse the trees
from the freelist, falling back to copy_fn() only if the freelist is
empty.

Consequently, instead of creating 10s of thousands of copies for 10s of
thousands of calls, the number of copies that're made corresponds to the
recursion depth of the function.  This makes sense because the reason we
create copies in the first place (IIUC) is to ensure that recursive
calls to a function do not share the same
PARM_DECLs/VAR_DECLs/RESULT_DECLs.

In order for this reuse to work properly in the presence of SAVE_EXPRs,
we have to track each SAVE_EXPR (belonging to the callee) that we
evaluate during the call and then forget its value after the call.
constexpr-vla4.C is a new test that would fail if this patch had missed
this SAVE_EXPR part.

With this patch, the compile time of the test case in the PR gets cut
from 3.5s to 2s and memory usage gets cut from 550M to 200M.

I bootstrapped + regtested this change on x86_64-pc-linux-gnu, and also
compile-tested it against boost with no regressions.

gcc/cp/ChangeLog:

	PR c++/70452
	* constexpr.c (struct bpr_entry): New struct.
	(struct constexpr_fundef): New field bpr_freelist.
	(retrieve_constexpr_fundef): Initialize field bpr_freelist to
	NULL.
	(register_constexpr_fundef): Likewise.
	(cxx_eval_call_expression): Check constexpr_fundef::bpr_freelist
	for a reusable copy of the function's body before creating a new
	copy.  Keep track of the callee's SAVE_EXPRs that we evaluate
	and forget their values afterwards. Push the function body we
	used onto the freelist for later reuse.

gcc/testsuite/ChangeLog:

	PR c++/70452
	* g++.dg/ext/constexpr-vla4.C: New test.
---
 gcc/cp/constexpr.c                        | 55 ++++++++++++++++++++++++++++---
 gcc/testsuite/g++.dg/ext/constexpr-vla4.C | 17 ++++++++++
 2 files changed, 67 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-vla4.C

Comments

Patrick Palka April 1, 2016, 7:24 p.m. UTC | #1
On Fri, Apr 1, 2016 at 3:13 PM, Patrick Palka <patrick@parcs.ath.cx> wrote:
> Currently during constexpr CALL_EXPR evaluation we create a new copy of
> the callee function's body for each separate call with no attempt made
> at reusing the function body.  So when a function ends up getting called
> 10s of thousands of times durnig constexpr evaluuation, we end up
> creating 10s of thousands of copies of the function.
>
> This patch is an attempt at reusing these copies instead of having a new
> copy get created each call.  It implements a per-function freelist of
> the unshared body/parm/res copies that were created by copy_fn().  When
> a constexpr call is finished it pushes the body/parm/res trees to the
> freelist, and before a call is evaluated it tries to reuse the trees
> from the freelist, falling back to copy_fn() only if the freelist is
> empty.
>
> Consequently, instead of creating 10s of thousands of copies for 10s of
> thousands of calls, the number of copies that're made corresponds to the
> recursion depth of the function.  This makes sense because the reason we
> create copies in the first place (IIUC) is to ensure that recursive
> calls to a function do not share the same
> PARM_DECLs/VAR_DECLs/RESULT_DECLs.
>
> In order for this reuse to work properly in the presence of SAVE_EXPRs,
> we have to track each SAVE_EXPR (belonging to the callee) that we
> evaluate during the call and then forget its value after the call.
> constexpr-vla4.C is a new test that would fail if this patch had missed
> this SAVE_EXPR part.
>
> With this patch, the compile time of the test case in the PR gets cut
> from 3.5s to 2s and memory usage gets cut from 550M to 200M.
>
> I bootstrapped + regtested this change on x86_64-pc-linux-gnu, and also
> compile-tested it against boost with no regressions.
>
> gcc/cp/ChangeLog:
>
>         PR c++/70452
>         * constexpr.c (struct bpr_entry): New struct.
>         (struct constexpr_fundef): New field bpr_freelist.
>         (retrieve_constexpr_fundef): Initialize field bpr_freelist to
>         NULL.
>         (register_constexpr_fundef): Likewise.
>         (cxx_eval_call_expression): Check constexpr_fundef::bpr_freelist
>         for a reusable copy of the function's body before creating a new
>         copy.  Keep track of the callee's SAVE_EXPRs that we evaluate
>         and forget their values afterwards. Push the function body we
>         used onto the freelist for later reuse.
>
> gcc/testsuite/ChangeLog:
>
>         PR c++/70452
>         * g++.dg/ext/constexpr-vla4.C: New test.
> ---
>  gcc/cp/constexpr.c                        | 55 ++++++++++++++++++++++++++++---
>  gcc/testsuite/g++.dg/ext/constexpr-vla4.C | 17 ++++++++++
>  2 files changed, 67 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-vla4.C
>
> diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
> index ea605dc..fc9b67c 100644
> --- a/gcc/cp/constexpr.c
> +++ b/gcc/cp/constexpr.c
> @@ -112,11 +112,24 @@ ensure_literal_type_for_constexpr_object (tree decl)
>    return decl;
>  }
>
> +/* The representation of a single node in the constexpr_fundef::bpr_freelist chain.  */
> +
> +struct GTY((chain_next ("%h.prev"))) bpr_entry
> +{
> +  tree body;
> +  tree parms;
> +  tree res;
> +  struct bpr_entry *prev;
> +};
> +
>  /* Representation of entries in the constexpr function definition table.  */
>
>  struct GTY((for_user)) constexpr_fundef {
>    tree decl;
>    tree body;
> +  /* A chain of unused copies of this function's body awaiting reuse for
> +     CALL_EXPR evaluation.  */
> +  struct bpr_entry *bpr_freelist;
>  };
>
>  struct constexpr_fundef_hasher : ggc_ptr_hash<constexpr_fundef>
> @@ -154,7 +167,7 @@ constexpr_fundef_hasher::hash (constexpr_fundef *fundef)
>  static constexpr_fundef *
>  retrieve_constexpr_fundef (tree fun)
>  {
> -  constexpr_fundef fundef = { NULL, NULL };
> +  constexpr_fundef fundef = { NULL, NULL, NULL };
>    if (constexpr_fundef_table == NULL)
>      return NULL;
>
> @@ -806,6 +819,7 @@ register_constexpr_fundef (tree fun, tree body)
>
>    entry.decl = fun;
>    entry.body = body;
> +  entry.bpr_freelist = NULL;
>    slot = constexpr_fundef_table->find_slot (&entry, INSERT);
>
>    gcc_assert (*slot == NULL);
> @@ -1377,10 +1391,21 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>        if (!result || result == error_mark_node)
>         {
>           gcc_assert (DECL_SAVED_TREE (fun));
> -         tree parms, res;
> +         tree body, parms, res;
>
> -         /* Unshare the whole function body.  */
> -         tree body = copy_fn (fun, parms, res);
> +         /* Reuse or create a new unshared copy of this function's body.  */
> +         if (bpr_entry *bpr = new_call.fundef->bpr_freelist)
> +           {
> +             body = bpr->body;
> +             parms = bpr->parms;
> +             res = bpr->res;
> +             new_call.fundef->bpr_freelist = bpr->prev;
> +           }
> +         else
> +           {
> +             /* Unshare the whole function body.  */
> +             body = copy_fn (fun, parms, res);
> +           }
>
>           /* Associate the bindings with the remapped parms.  */
>           tree bound = new_call.bindings;
> @@ -1409,11 +1434,22 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
>           else
>             ctx->values->put (res, NULL_TREE);
>
> +         /* Track the SAVE_EXPRs of the callee that we evaluate so that
> +            we can later forget their values.  */
> +         constexpr_ctx ctx_with_save_exprs = *ctx;
> +         hash_set<tree> save_exprs;
> +         ctx_with_save_exprs.save_exprs = &save_exprs;

Didn't like the way this comment is worded so I changed it to

+         /* Track the callee's evaluated SAVE_EXPRs so that we can forget
+            their values after the call.  */
Jason Merrill April 2, 2016, 1:28 a.m. UTC | #2
I like this approach a lot.  One thing, though:

On 04/01/2016 03:13 PM, Patrick Palka wrote:
> +struct GTY((chain_next ("%h.prev"))) bpr_entry
> +{
> +  tree body;
> +  tree parms;
> +  tree res;
> +  struct bpr_entry *prev;
> +};
> +
>   /* Representation of entries in the constexpr function definition table.  */
>
>   struct GTY((for_user)) constexpr_fundef {
>     tree decl;
>     tree body;
> +  /* A chain of unused copies of this function's body awaiting reuse for
> +     CALL_EXPR evaluation.  */
> +  struct bpr_entry *bpr_freelist;
>   };

The freelist should be discarded on GC.  I think the way to achieve that 
is to put all the freelists into a separate deletable hash table rather 
than hang them off the fundefs.

Jason
Patrick Palka April 2, 2016, 1:54 a.m. UTC | #3
On Fri, Apr 1, 2016 at 9:28 PM, Jason Merrill <jason@redhat.com> wrote:
> I like this approach a lot.  One thing, though:
>
> On 04/01/2016 03:13 PM, Patrick Palka wrote:
>>
>> +struct GTY((chain_next ("%h.prev"))) bpr_entry
>> +{
>> +  tree body;
>> +  tree parms;
>> +  tree res;
>> +  struct bpr_entry *prev;
>> +};
>> +
>>   /* Representation of entries in the constexpr function definition table.
>> */
>>
>>   struct GTY((for_user)) constexpr_fundef {
>>     tree decl;
>>     tree body;
>> +  /* A chain of unused copies of this function's body awaiting reuse for
>> +     CALL_EXPR evaluation.  */
>> +  struct bpr_entry *bpr_freelist;
>>   };
>
>
> The freelist should be discarded on GC.  I think the way to achieve that is
> to put all the freelists into a separate deletable hash table rather than
> hang them off the fundefs.
>
> Jason
>

Would instead marking the bpr_freelist field as GTY ((skip)) have the
same effect?
Patrick Palka April 2, 2016, 2:09 a.m. UTC | #4
On Fri, Apr 1, 2016 at 9:54 PM, Patrick Palka <patrick@parcs.ath.cx> wrote:
> On Fri, Apr 1, 2016 at 9:28 PM, Jason Merrill <jason@redhat.com> wrote:
>> I like this approach a lot.  One thing, though:
>>
>> On 04/01/2016 03:13 PM, Patrick Palka wrote:
>>>
>>> +struct GTY((chain_next ("%h.prev"))) bpr_entry
>>> +{
>>> +  tree body;
>>> +  tree parms;
>>> +  tree res;
>>> +  struct bpr_entry *prev;
>>> +};
>>> +
>>>   /* Representation of entries in the constexpr function definition table.
>>> */
>>>
>>>   struct GTY((for_user)) constexpr_fundef {
>>>     tree decl;
>>>     tree body;
>>> +  /* A chain of unused copies of this function's body awaiting reuse for
>>> +     CALL_EXPR evaluation.  */
>>> +  struct bpr_entry *bpr_freelist;
>>>   };
>>
>>
>> The freelist should be discarded on GC.  I think the way to achieve that is
>> to put all the freelists into a separate deletable hash table rather than
>> hang them off the fundefs.
>>
>> Jason
>>
>
> Would instead marking the bpr_freelist field as GTY ((skip)) have the
> same effect?

Looks like not according to
https://gcc.gnu.org/onlinedocs/gccint/GTY-Options.html.  I should have
read that first :)
diff mbox

Patch

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index ea605dc..fc9b67c 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -112,11 +112,24 @@  ensure_literal_type_for_constexpr_object (tree decl)
   return decl;
 }
 
+/* The representation of a single node in the constexpr_fundef::bpr_freelist chain.  */
+
+struct GTY((chain_next ("%h.prev"))) bpr_entry
+{
+  tree body;
+  tree parms;
+  tree res;
+  struct bpr_entry *prev;
+};
+
 /* Representation of entries in the constexpr function definition table.  */
 
 struct GTY((for_user)) constexpr_fundef {
   tree decl;
   tree body;
+  /* A chain of unused copies of this function's body awaiting reuse for
+     CALL_EXPR evaluation.  */
+  struct bpr_entry *bpr_freelist;
 };
 
 struct constexpr_fundef_hasher : ggc_ptr_hash<constexpr_fundef>
@@ -154,7 +167,7 @@  constexpr_fundef_hasher::hash (constexpr_fundef *fundef)
 static constexpr_fundef *
 retrieve_constexpr_fundef (tree fun)
 {
-  constexpr_fundef fundef = { NULL, NULL };
+  constexpr_fundef fundef = { NULL, NULL, NULL };
   if (constexpr_fundef_table == NULL)
     return NULL;
 
@@ -806,6 +819,7 @@  register_constexpr_fundef (tree fun, tree body)
 
   entry.decl = fun;
   entry.body = body;
+  entry.bpr_freelist = NULL;
   slot = constexpr_fundef_table->find_slot (&entry, INSERT);
 
   gcc_assert (*slot == NULL);
@@ -1377,10 +1391,21 @@  cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
       if (!result || result == error_mark_node)
 	{
 	  gcc_assert (DECL_SAVED_TREE (fun));
-	  tree parms, res;
+	  tree body, parms, res;
 
-	  /* Unshare the whole function body.  */
-	  tree body = copy_fn (fun, parms, res);
+	  /* Reuse or create a new unshared copy of this function's body.  */
+	  if (bpr_entry *bpr = new_call.fundef->bpr_freelist)
+	    {
+	      body = bpr->body;
+	      parms = bpr->parms;
+	      res = bpr->res;
+	      new_call.fundef->bpr_freelist = bpr->prev;
+	    }
+	  else
+	    {
+	      /* Unshare the whole function body.  */
+	      body = copy_fn (fun, parms, res);
+	    }
 
 	  /* Associate the bindings with the remapped parms.  */
 	  tree bound = new_call.bindings;
@@ -1409,11 +1434,22 @@  cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 	  else
 	    ctx->values->put (res, NULL_TREE);
 
+	  /* Track the SAVE_EXPRs of the callee that we evaluate so that
+	     we can later forget their values.  */
+	  constexpr_ctx ctx_with_save_exprs = *ctx;
+	  hash_set<tree> save_exprs;
+	  ctx_with_save_exprs.save_exprs = &save_exprs;
+
 	  tree jump_target = NULL_TREE;
-	  cxx_eval_constant_expression (ctx, body,
+	  cxx_eval_constant_expression (&ctx_with_save_exprs, body,
 					lval, non_constant_p, overflow_p,
 					&jump_target);
 
+	  /* Forget the saved values of the callee's SAVE_EXPRs.  */
+	  for (hash_set<tree>::iterator iter = save_exprs.begin();
+	       iter != save_exprs.end(); ++iter)
+	    ctx_with_save_exprs.values->remove (*iter);
+
 	  if (DECL_CONSTRUCTOR_P (fun))
 	    /* This can be null for a subobject constructor call, in
 	       which case what we care about is the initialization
@@ -1444,6 +1480,15 @@  cxx_eval_call_expression (const constexpr_ctx *ctx, tree t,
 	    ctx->values->remove (slot);
 	  for (tree parm = parms; parm; parm = TREE_CHAIN (parm))
 	    ctx->values->remove (parm);
+
+	  /* Make the unshared function body that we evaluated available for
+	     reuse by a later call to cxx_eval_call_expression.  */
+	  bpr_entry *bpr = ggc_alloc<bpr_entry> ();
+	  bpr->body = body;
+	  bpr->parms = parms;
+	  bpr->res = res;
+	  bpr->prev = new_call.fundef->bpr_freelist;
+	  new_call.fundef->bpr_freelist = bpr;
 	}
 
       if (result == error_mark_node)
diff --git a/gcc/testsuite/g++.dg/ext/constexpr-vla4.C b/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
new file mode 100644
index 0000000..428a8fd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/constexpr-vla4.C
@@ -0,0 +1,17 @@ 
+// PR c++/70452
+// { dg-do compile { target c++14 } }
+
+constexpr int
+foo (int n, bool p)
+{
+  __extension__ int a [n] = { 0 };
+  if (n == 3)
+    foo (n - 2, false);
+  if (n == 3)
+    foo(n - 1, true);
+  if (p)
+    return a[1];
+  return 0;
+}
+
+constexpr int i2 = foo (3, false); // { dg-bogus "array subscript out of bound" }