Perform anonymous constant propagation during inlining

Message ID 6476732.lMFodJZTVz@polaris
State New

Commit Message

Eric Botcazou April 29, 2015, 9:50 a.m. UTC
Historically the pragma Inline_Always of GNAT had been implemented in the FE
because the RTL inliner and then the Tree inliner weren't invoked at -O0 or
weren't powerful enough to inline some constructs.  But this approach had
drawbacks, especially wrt debug info.  These restrictions were gradually
lifted and now the pragma is entirely piggybacked on the Tree inliner.

This went mostly OK, except for a few cases where intrinsic operations that
used to be reasonably handled at -O0 now generate awful code, the typical
example being a modulus or division intrinsic by a power of 2 generating a
fully-fledged modulus or division instruction instead of a simple shift.

Therefore the attached patch implements anonymous constant propagation in the 
inliner to fix the code quality regression.
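
For illustration, here is a minimal C analogue of the situation (the real
testcase is in Ada; the names below are made up):

  /* Stand-in for an intrinsic operator wrapped in an Inline_Always
     function; align is a power of 2 at every call site.  */
  static inline __attribute__((always_inline))
  unsigned long mod_align (unsigned long addr, unsigned long align)
  {
    return addr % align;
  }

  unsigned long test (unsigned long addr)
  {
    /* Without constant propagation in the inliner, the inlined body at
       -O0 still computes the modulus with align in a register, so a real
       division instruction is emitted instead of "addr & 7".  */
    return mod_align (addr, 8);
  }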

Tested on x86_64-suse-linux, OK for the mainline?


2015-04-29  Eric Botcazou  <ebotcazou@adacore.com>

	* tree-inline.c (remap_gimple_op_r): Do anonymous constant propagation.
	(copy_bb): Fold conversions of constants immediately.


2015-04-29  Eric Botcazou  <ebotcazou@adacore.com>

	* gnat.dg/inline12.adb: New test.

Comments

Richard Biener April 29, 2015, 11:55 a.m. UTC | #1
On Wed, Apr 29, 2015 at 11:50 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> Historically the pragma Inline_Always of GNAT had been implemented in the FE
> because the RTL inliner and then the Tree inliner weren't invoked at -O0 or
> weren't powerful enough to inline some constructs.  But this approach had
> drawbacks, especially wrt debug info.  These restrictions were gradually
> lifted and now the pragma is entirely piggybacked on the Tree inliner.
>
> This went mostly OK, except for a few cases where intrinsic operations that
> used to be reasonably handled at -O0 now generate awful code, the typical
> example being a modulus or division intrinsic by a power of 2 generating a
> fully-fledged modulus or division instruction instead of a simple shift.
>
> Therefore the attached patch implements anonymous constant propagation in the
> inliner to fix the code quality regression.
>
> Tested on x86_64-suse-linux, OK for the mainline?

Hmm, special-casing this in the inliner looks odd to me.  ISTR the inliner
already propagates constant parameters to immediate uses, so I guess
you run into the casting case you handle specially.

But then if any constant propagation is OK at -O0 why not perform full-fledged
constant propagation (with !DECL_IGNORED_P vars as a propagation barrier)
as a regular SSA pass?  That is, if you'd had an Inline_Always function like

int foo (int i)
{
  return (i + 1) / i;
}

you'd not get the desired result as the i + 1 stmt wouldn't be folded
and thus the (i + 1) / i stmt wouldn't either.

That said - why does RTL optimization not save you here anyway?
After all, it is responsible for turning divisions into shifts.

Soo - if you use the existing CCP pass and make surely_varying_stmt_p
return true for defs of !DECL_IGNORED_P stmts at -O0 (strictly needed to make
setting vars in gdb possible) and somehow arrange for ccp to run at -O0...
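
A rough sketch of that change (not an actual patch; the exact placement
inside surely_varying_stmt_p in tree-ssa-ccp.c and the condition itself are
only illustrative):

  /* In surely_varying_stmt_p: at -O0, treat a definition of a
     user-visible variable as varying, so its value can still be
     changed from within gdb.  */
  tree lhs = gimple_get_lhs (stmt);
  if (!optimize
      && lhs
      && TREE_CODE (lhs) == SSA_NAME
      && SSA_NAME_VAR (lhs)
      && !DECL_IGNORED_P (SSA_NAME_VAR (lhs)))
    return true;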

It might help other obscure cases with inline-asms to behave consistently
with -O0 vs. -On as well.

Richard.

> 2015-04-29  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * tree-inline.c (remap_gimple_op_r): Do anonymous constant propagation.
>         (copy_bb): Fold conversions of constants immediately.
>
>
> 2015-04-29  Eric Botcazou  <ebotcazou@adacore.com>
>
>         * gnat.dg/inline12.adb: New test.
>
>
> --
> Eric Botcazou
Jan Hubicka April 29, 2015, 1:23 p.m. UTC | #2
> Historically the pragma Inline_Always of GNAT had been implemented in the FE
> because the RTL inliner and then the Tree inliner weren't invoked at -O0 or
> weren't powerful enough to inline some constructs.  But this approach had
> drawbacks, especially wrt debug info.  These restrictions were gradually
> lifted and now the pragma is entirely piggybacked on the Tree inliner.
> 
> This went mostly OK, except for a few cases where intrinsic operations that
> used to be reasonably handled at -O0 now generate awful code, the typical
> example being a modulus or division intrinsic by a power of 2 generating a
> fully-fledged modulus or division instruction instead of a simple shift.
> 
> Therefore the attached patch implements anonymous constant propagation in the 
> inliner to fix the code quality regression.
> 
> Tested on x86_64-suse-linux, OK for the mainline?
> 
>    if (TREE_CODE (*tp) == SSA_NAME)
>      {
> -      *tp = remap_ssa_name (*tp, id);
> +      tree t = remap_ssa_name (*tp, id);
> +      /* Perform anonymous constant propagation, this makes it possible to
> +         generate reasonable code even at -O0 for operators implemented as
> +         inline functions.  */
> +      if (TREE_CODE (t) == SSA_NAME
> +	  && SSA_NAME_DEF_STMT (t)
> +	  && (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t)))
> +	  && gimple_assign_copy_p (SSA_NAME_DEF_STMT (t))
> +	  && is_gimple_min_invariant
> +	     (gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t))))
> +	*tp = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t));
> +      else
> +	*tp = t;

This looks like a good idea to me (though I can't approve it).  We may want to
lift the (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t))) check when
optimize is set - the amount of garbage the inliner produces is large and
killing it early is better than killing it later.  This has a chance to help
early opts, where the ordering between ccp and einline is quite hard.
>        *walk_subtrees = 0;
>        return NULL;
>      }
> @@ -1965,7 +1977,7 @@ copy_bb (copy_body_data *id, basic_block
>  
>  	  /* Statements produced by inlining can be unfolded, especially
>  	     when we constant propagated some operands.  We can't fold
> -	     them right now for two reasons:
> +	     them right now in the general case for two reasons:
>  	     1) folding require SSA_NAME_DEF_STMTs to be correct
>  	     2) we can't change function calls to builtins.
>  	     So we just mark statement for later folding.  We mark
> @@ -1974,7 +1986,10 @@ copy_bb (copy_body_data *id, basic_block
>  	     foldable indirectly are updated.  If this turns out to be
>  	     expensive, copy_body can be told to watch for nontrivial
>  	     changes.  */
> -	  if (id->statements_to_fold)
> +	  if (gimple_assign_cast_p (stmt)
> +	      && is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
> +	    fold_stmt (&copy_gsi);
> +	  else if (id->statements_to_fold)

Why is this needed?  Is it just an optimization because we know that folding of
casts will not do crazy stuff like SSA graph walking (that was the original
reason for delaying the folding), or just an optimization to reduce the size of
the list?

Honza
>  	    id->statements_to_fold->add (stmt);
>  
>  	  /* We're duplicating a CALL_EXPR.  Find any corresponding
Richard Biener April 29, 2015, 1:42 p.m. UTC | #3
On Wed, Apr 29, 2015 at 3:23 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> Historically the pragma Inline_Always of GNAT had been implemented in the FE
>> because the RTL inliner and then the Tree inliner weren't invoked at -O0 or
>> weren't powerful enough to inline some constructs.  But this approach had
>> drawbacks, especially wrt debug info.  These restrictions were gradually
>> lifted and now the pragma is entirely piggybacked on the Tree inliner.
>>
>> This went mostly OK, except for a few cases where intrinsic operations that
>> used to be reasonably handled at -O0 now generate awful code, the typical
>> example being a modulus or division intrinsic by a power of 2 generating a
>> fully-fledged modulus or division instruction instead of a simple shift.
>>
>> Therefore the attached patch implements anonymous constant propagation in the
>> inliner to fix the code quality regression.
>>
>> Tested on x86_64-suse-linux, OK for the mainline?
>>
>>    if (TREE_CODE (*tp) == SSA_NAME)
>>      {
>> -      *tp = remap_ssa_name (*tp, id);
>> +      tree t = remap_ssa_name (*tp, id);
>> +      /* Perform anonymous constant propagation, this makes it possible to
>> +         generate reasonable code even at -O0 for operators implemented as
>> +         inline functions.  */
>> +      if (TREE_CODE (t) == SSA_NAME
>> +       && SSA_NAME_DEF_STMT (t)
>> +       && (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t)))
>> +       && gimple_assign_copy_p (SSA_NAME_DEF_STMT (t))
>> +       && is_gimple_min_invariant
>> +          (gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t))))
>> +     *tp = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t));
>> +      else
>> +     *tp = t;
>
> This looks like a good idea to me (though I can't approve it).  We may want to
> lift the (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t))) check when
> optimize is set - the amount of garbage the inliner produces is large and
> killing it early is better than killing it later.  This has a chance to help
> early opts, where the ordering between ccp and einline is quite hard.

Early opts run CCP as pretty much the first pass, so I don't see what
you are referring to here.

>>        *walk_subtrees = 0;
>>        return NULL;
>>      }
>> @@ -1965,7 +1977,7 @@ copy_bb (copy_body_data *id, basic_block
>>
>>         /* Statements produced by inlining can be unfolded, especially
>>            when we constant propagated some operands.  We can't fold
>> -          them right now for two reasons:
>> +          them right now in the general case for two reasons:
>>            1) folding require SSA_NAME_DEF_STMTs to be correct
>>            2) we can't change function calls to builtins.
>>            So we just mark statement for later folding.  We mark
>> @@ -1974,7 +1986,10 @@ copy_bb (copy_body_data *id, basic_block
>>            foldable indirectly are updated.  If this turns out to be
>>            expensive, copy_body can be told to watch for nontrivial
>>            changes.  */
>> -       if (id->statements_to_fold)
>> +       if (gimple_assign_cast_p (stmt)
>> +           && is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
>> +         fold_stmt (&copy_gsi);
>> +       else if (id->statements_to_fold)
>
> Why is this needed?  Is it just an optimization because we know that folding of
> casts will not do crazy stuff like SSA graph walking (that was the original
> reason for delaying the folding), or just an optimization to reduce the size of
> the list?

Because this folding turns the cast into something the above hunk handles...

It's basically a hack ;)

A better approach might be to simply fold stmts we copy during the
stmt copy itself by using gimple_build instead of gimple_copy + fold_stmt.
Note that the match-and-simplify machinery does not perform constant
propagation but relies on a valueization hook.
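
A hypothetical illustration of such a hook (the callback name is made up;
fold_stmt does accept a valueization callback):

  /* Valueize an SSA name by looking through a trivial constant copy,
     so that fold_stmt sees the constant directly.  */
  static tree
  copy_valueize (tree name)
  {
    if (TREE_CODE (name) == SSA_NAME)
      {
        gimple def = SSA_NAME_DEF_STMT (name);
        if (def
            && gimple_assign_copy_p (def)
            && is_gimple_min_invariant (gimple_assign_rhs1 (def)))
          return gimple_assign_rhs1 (def);
      }
    return name;
  }

and then fold_stmt (&copy_gsi, copy_valueize) instead of a plain fold_stmt.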

Richard.

> Honza
>>           id->statements_to_fold->add (stmt);
>>
>>         /* We're duplicating a CALL_EXPR.  Find any corresponding
Jan Hubicka April 29, 2015, 2:06 p.m. UTC | #4
> On Wed, Apr 29, 2015 at 3:23 PM, Jan Hubicka <hubicka@ucw.cz> wrote:
> >> Historically the pragma Inline_Always of GNAT had been implemented in the FE
> >> because the RTL inliner and then the Tree inliner weren't invoked at -O0 or
> >> weren't powerful enough to inline some constructs.  But this approach had
> >> drawbacks, especially wrt debug info.  These restrictions were gradually
> >> lifted and now the pragma is entirely piggybacked on the Tree inliner.
> >>
> >> This went mostly OK, except for a few cases where intrinsic operations that
> >> used to be reasonably handled at -O0 now generate awful code, the typical
> >> example being a modulus or division intrinsic by a power of 2 generating a
> >> fully-fledged modulus or division instruction instead of a simple shift.
> >>
> >> Therefore the attached patch implements anonymous constant propagation in the
> >> inliner to fix the code quality regression.
> >>
> >> Tested on x86_64-suse-linux, OK for the mainline?
> >>
> >>    if (TREE_CODE (*tp) == SSA_NAME)
> >>      {
> >> -      *tp = remap_ssa_name (*tp, id);
> >> +      tree t = remap_ssa_name (*tp, id);
> >> +      /* Perform anonymous constant propagation, this makes it possible to
> >> +         generate reasonable code even at -O0 for operators implemented as
> >> +         inline functions.  */
> >> +      if (TREE_CODE (t) == SSA_NAME
> >> +       && SSA_NAME_DEF_STMT (t)
> >> +       && (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t)))
> >> +       && gimple_assign_copy_p (SSA_NAME_DEF_STMT (t))
> >> +       && is_gimple_min_invariant
> >> +          (gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t))))
> >> +     *tp = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t));
> >> +      else
> >> +     *tp = t;
> >
> > This looks like a good idea to me (though I can't approve it).  We may want to
> > lift the (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t))) check when
> > optimize is set - the amount of garbage the inliner produces is large and
> > killing it early is better than killing it later.  This has a chance to help
> > early opts, where the ordering between ccp and einline is quite hard.
> 
> Early opts run CCP as pretty much the first pass, so I don't see what
> you are referring to here.

Hmm, you are right.  I remember playing with a similar patch, but that was
before we turned off iteration in the early inliner, and it was motivated by
doing more indirect call promotion.

Since ipa-prop is no longer a complete joke at propagating devirtualization
info, perhaps this is no longer too important.

honza
Eric Botcazou May 1, 2015, 10:27 a.m. UTC | #5
> Hmm, special-casing this in the inliner looks odd to me.  ISTR the inliner
> already propagates constant parameters to immediate uses, so I guess
> you run into the casting case you handle specially.

Right on both counts, the original GIMPLE looks like:

  right.3 = (system__storage_elements__integer_address) right;
  D.4203 = D.4201 % right.3;
  D.4200 = (system__storage_elements__storage_offset) D.4203;
  return D.4200;

and setup_one_parameter has:

  /* If there is no setup required and we are in SSA, take the easy route
     replacing all SSA names representing the function parameter by the
     SSA name passed to function.

     We need to construct map for the variable anyway as it might be used
     in different SSA names when parameter is set in function.

     Do replacement at -O0 for const arguments replaced by constant.
     This is important for builtin_constant_p and other construct requiring
     constant argument to be visible in inlined function body.  */
  if (gimple_in_ssa_p (cfun) && rhs && def && is_gimple_reg (p)
      && (optimize
          || (TREE_READONLY (p)
	      && is_gimple_min_invariant (rhs)))
      && (TREE_CODE (rhs) == SSA_NAME
	  || is_gimple_min_invariant (rhs))
      && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (def))
    {
      insert_decl_map (id, def, rhs);
      return insert_init_debug_bind (id, bb, var, rhs, NULL);
    }

and here is the GIMPLE after inlining:

  _17 = _16;
  _18 = _17;
  right.3_19 = 8;
  _20 = _18 % right.3_19;
  _21 = (system__storage_elements__storage_offset) _20;

so the constant replacement is done for right.3_19 by the above block.

> But then if any constant propagation is OK at -O0 why not perform
> full-fledged constant propagation (with !DECL_IGNORED_P vars as a
> propagation barrier) as a regular SSA pass?  That is, if you'd had an
> Inline_Always function like
> 
> int foo (int i)
> {
>   return (i + 1) / i;
> }
> 
> you'd not get the desired result as the i + 1 stmt wouldn't be folded
> and thus the (i + 1) / i stmt wouldn't either.

That's OK, only the trivial cases need to be dealt with since it's -O0, so
running a fully-fledged constant propagation seems overkill.

> That said - why does RTL optimization not save you here anyway?
> After all, it is responsible for turning divisions into shifts.

You mean the RTL expander?  Because the SSA name isn't replaced with the RHS 
of its defining statement in:

      /* For EXPAND_INITIALIZER try harder to get something simpler.  */
      if (g == NULL
	  && modifier == EXPAND_INITIALIZER
	  && !SSA_NAME_IS_DEFAULT_DEF (exp)
	  && (optimize || DECL_IGNORED_P (SSA_NAME_VAR (exp)))
	  && stmt_is_replaceable_p (SSA_NAME_DEF_STMT (exp)))
	g = SSA_NAME_DEF_STMT (exp);

since modifier is EXPAND_NORMAL.  Do you think it would be OK to be a little 
more aggressive here?  Something like:

      /* If we aren't on the LHS, look into the defining statement.  */
      if (g == NULL
	  && modifier != EXPAND_WRITE
	  && !SSA_NAME_IS_DEFAULT_DEF (exp)
	  && (optimize || DECL_IGNORED_P (SSA_NAME_VAR (exp)))
	  && stmt_is_replaceable_p (SSA_NAME_DEF_STMT (exp)))
	 {
	    g = SSA_NAME_DEF_STMT (exp);
	    /* For EXPAND_INITIALIZER substitute in all cases, otherwise do
	       it only for constants.  */
	    if (modifier != EXPAND_INITIALIZER
		&& (!gimple_assign_copy_p (g)
		    || !is_gimple_min_invariant (gimple_assign_rhs1 (g))))
	      g = NULL;
	 }

That's certainly sufficient here.
Richard Biener May 1, 2015, 2:19 p.m. UTC | #6
On May 1, 2015 12:27:17 PM GMT+02:00, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> Hmm, special-casing this in the inliner looks odd to me.  ISTR the inliner
>> already propagates constant parameters to immediate uses, so I guess
>> you run into the casting case you handle specially.
>
> Right on both counts, the original GIMPLE looks like:
>
>   right.3 = (system__storage_elements__integer_address) right;
>   D.4203 = D.4201 % right.3;
>   D.4200 = (system__storage_elements__storage_offset) D.4203;
>   return D.4200;
>
> and setup_one_parameter has:
>
>   /* If there is no setup required and we are in SSA, take the easy route
>      replacing all SSA names representing the function parameter by the
>      SSA name passed to function.
>
>      We need to construct map for the variable anyway as it might be used
>      in different SSA names when parameter is set in function.
>
>      Do replacement at -O0 for const arguments replaced by constant.
>      This is important for builtin_constant_p and other construct requiring
>      constant argument to be visible in inlined function body.  */
>   if (gimple_in_ssa_p (cfun) && rhs && def && is_gimple_reg (p)
>       && (optimize
>           || (TREE_READONLY (p)
>               && is_gimple_min_invariant (rhs)))
>       && (TREE_CODE (rhs) == SSA_NAME
>           || is_gimple_min_invariant (rhs))
>       && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (def))
>     {
>       insert_decl_map (id, def, rhs);
>       return insert_init_debug_bind (id, bb, var, rhs, NULL);
>     }
>
> and here is the GIMPLE after inlining:
>
>   _17 = _16;
>   _18 = _17;
>   right.3_19 = 8;
>   _20 = _18 % right.3_19;
>   _21 = (system__storage_elements__storage_offset) _20;
>
> so the constant replacement is done for right.3_19 by the above block.
>
>> But then if any constant propagation is OK at -O0 why not perform
>> full-fledged constant propagation (with !DECL_IGNORED_P vars as a
>> propagation barrier) as a regular SSA pass?  That is, if you'd had an
>> Inline_Always function like
>>
>> int foo (int i)
>> {
>>   return (i + 1) / i;
>> }
>>
>> you'd not get the desired result as the i + 1 stmt wouldn't be folded
>> and thus the (i + 1) / i stmt wouldn't either.
>
> That's OK, only the trivial cases need to be dealt with since it's -O0, so
> running a fully-fledged constant propagation seems overkill.
>
>> That said - why does RTL optimization not save you here anyway?
>> After all, it is responsible for turning divisions into shifts.
>
> You mean the RTL expander?  Because the SSA name isn't replaced with the RHS
> of its defining statement in:
>
>       /* For EXPAND_INITIALIZER try harder to get something simpler.  */
>       if (g == NULL
>           && modifier == EXPAND_INITIALIZER
>           && !SSA_NAME_IS_DEFAULT_DEF (exp)
>           && (optimize || DECL_IGNORED_P (SSA_NAME_VAR (exp)))
>           && stmt_is_replaceable_p (SSA_NAME_DEF_STMT (exp)))
>         g = SSA_NAME_DEF_STMT (exp);
>
> since modifier is EXPAND_NORMAL.  Do you think it would be OK to be a little
> more aggressive here?  Something like:
>
>       /* If we aren't on the LHS, look into the defining statement.  */
>       if (g == NULL
>           && modifier != EXPAND_WRITE
>           && !SSA_NAME_IS_DEFAULT_DEF (exp)
>           && (optimize || DECL_IGNORED_P (SSA_NAME_VAR (exp)))
>           && stmt_is_replaceable_p (SSA_NAME_DEF_STMT (exp)))
>         {
>           g = SSA_NAME_DEF_STMT (exp);
>           /* For EXPAND_INITIALIZER substitute in all cases, otherwise do
>              it only for constants.  */
>           if (modifier != EXPAND_INITIALIZER
>               && (!gimple_assign_copy_p (g)
>                   || !is_gimple_min_invariant (gimple_assign_rhs1 (g))))
>             g = NULL;
>         }
>
> That's certainly sufficient here.

Yeah, I think that's a way better place for the hack.

Richard.
Eric Botcazou May 1, 2015, 4:42 p.m. UTC | #7
> Yeah, I think that's a way better place for the hack.

OK, how aggressive then?  We could just as well do the substitution for all copies:

      /* For EXPAND_INITIALIZER try harder to get something simpler.
	 Otherwise, substitute copies on the RHS; this can propagate
	 constants at -O0 and thus simplify arithmetic operations.  */
      if (g == NULL
	  && !SSA_NAME_IS_DEFAULT_DEF (exp)
	  && (optimize || DECL_IGNORED_P (SSA_NAME_VAR (exp)))
	  && (modifier == EXPAND_INITIALIZER
	      || (modifier != EXPAND_WRITE
		  && gimple_assign_copy_p (SSA_NAME_DEF_STMT (exp))))
	  && stmt_is_replaceable_p (SSA_NAME_DEF_STMT (exp)))
	g = SSA_NAME_DEF_STMT (exp);

Patch

Index: tree-inline.c
===================================================================
--- tree-inline.c	(revision 222439)
+++ tree-inline.c	(working copy)
@@ -898,7 +898,19 @@  remap_gimple_op_r (tree *tp, int *walk_s
 
   if (TREE_CODE (*tp) == SSA_NAME)
     {
-      *tp = remap_ssa_name (*tp, id);
+      tree t = remap_ssa_name (*tp, id);
+      /* Perform anonymous constant propagation, this makes it possible to
+         generate reasonable code even at -O0 for operators implemented as
+         inline functions.  */
+      if (TREE_CODE (t) == SSA_NAME
+	  && SSA_NAME_DEF_STMT (t)
+	  && (!SSA_NAME_VAR (t) || DECL_IGNORED_P (SSA_NAME_VAR (t)))
+	  && gimple_assign_copy_p (SSA_NAME_DEF_STMT (t))
+	  && is_gimple_min_invariant
+	     (gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t))))
+	*tp = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (t));
+      else
+	*tp = t;
       *walk_subtrees = 0;
       return NULL;
     }
@@ -1965,7 +1977,7 @@  copy_bb (copy_body_data *id, basic_block
 
 	  /* Statements produced by inlining can be unfolded, especially
 	     when we constant propagated some operands.  We can't fold
-	     them right now for two reasons:
+	     them right now in the general case for two reasons:
 	     1) folding require SSA_NAME_DEF_STMTs to be correct
 	     2) we can't change function calls to builtins.
 	     So we just mark statement for later folding.  We mark
@@ -1974,7 +1986,10 @@  copy_bb (copy_body_data *id, basic_block
 	     foldable indirectly are updated.  If this turns out to be
 	     expensive, copy_body can be told to watch for nontrivial
 	     changes.  */
-	  if (id->statements_to_fold)
+	  if (gimple_assign_cast_p (stmt)
+	      && is_gimple_min_invariant (gimple_assign_rhs1 (stmt)))
+	    fold_stmt (&copy_gsi);
+	  else if (id->statements_to_fold)
 	    id->statements_to_fold->add (stmt);
 
 	  /* We're duplicating a CALL_EXPR.  Find any corresponding