
gimple-iterator, ubsan: Fix ICE during instrumentation of returns_twice calls [PR112709]

Message ID ZfAbWpHgGI/Bs6hp@tucnak
State New
Series gimple-iterator, ubsan: Fix ICE during instrumentation of returns_twice calls [PR112709]

Commit Message

Jakub Jelinek March 12, 2024, 9:07 a.m. UTC
Hi!

ubsan, asan (both PR112709) and _BitInt lowering (PR113466) want to
insert some instrumentation or adjustment statements before some statement.
This unfortunately creates invalid IL if inserting before a returns_twice
call, because we require that such a call is the first statement in its basic
block and that there is an edge from the .ABNORMAL_DISPATCHER block to
the block containing the returns_twice call (in addition to other edge(s)).

The following patch adds helper functions for such insertions and uses them
for now in ubsan (I'll post a follow-up which uses them in asan and will
work later on the _BitInt lowering PR).

In particular, if the bb with the returns_twice call at the start has just
2 predecessor edges, one EDGE_ABNORMAL from .ABNORMAL_DISPATCHER and another
(non-EDGE_ABNORMAL/EDGE_EH) from some other bb, it just inserts the
statement or sequence on that other edge.
If the bb has more predecessor edges, or the one not from
.ABNORMAL_DISPATCHER is e.g. an EH edge (this latter case likely shouldn't
happen, one would need labels or something like that), the patch splits the
block with the returns_twice call such that there is just one edge besides the
.ABNORMAL_DISPATCHER edge and adjusts PHIs as needed to make that happen.
The functions also replace uses of the PHI results from the returns_twice bb
with the corresponding PHI arguments, because otherwise it would be invalid IL.

E.g. in ubsan/pr112709-2.c (qux) we have before the ubsan pass
  <bb 10> :
  # .MEM_5(ab) = PHI <.MEM_4(9), .MEM_25(ab)(11)>
  # _7(ab) = PHI <_20(9), _8(ab)(11)>
  # .MEM_21(ab) = VDEF <.MEM_5(ab)>
  _22 = bar (*_7(ab));
where bar is a returns_twice call and bb 11 has the .ABNORMAL_DISPATCHER call,
this patch instruments it like:
  <bb 9> :
  # .MEM_4 = PHI <.MEM_17(ab)(4), .MEM_10(D)(5), .MEM_14(ab)(8)>
  # DEBUG BEGIN_STMT
  # VUSE <.MEM_4>
  _20 = p;
  # .MEM_27 = VDEF <.MEM_4>
  .UBSAN_NULL (_20, 0B, 0);
  # VUSE <.MEM_27>
  _2 = __builtin_dynamic_object_size (_20, 0);
  # .MEM_28 = VDEF <.MEM_27>
  .UBSAN_OBJECT_SIZE (_20, 1024, _2, 0);
  
  <bb 10> :
  # .MEM_5(ab) = PHI <.MEM_28(9), .MEM_25(ab)(11)>
  # _7(ab) = PHI <_20(9), _8(ab)(11)>
  # .MEM_21(ab) = VDEF <.MEM_5(ab)>
  _22 = bar (*_7(ab));
The edge from .ABNORMAL_DISPATCHER is there just to represent the
returning for the 2nd and later times; the instrumentation can't be
done at that point because no code is actually executed there.
The ubsan/pr112709-1.c testcase includes non-virtual PHIs to cover
the handling of those as well.
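
Passes which have a gimple_stmt_iterator pointing at the statement they want
to instrument can go through a small wrapper; as a rough sketch (this just
mirrors the ubsan_insert_before wrapper the patch adds below, the wrapper
name here is only for illustration):

  static void
  insert_before_maybe_returns_twice (gimple_stmt_iterator *gsi, gimple *g)
  {
    gimple *stmt = gsi_stmt (*gsi);
    /* A returns_twice call must stay first in its bb, so the statement
       can't go into the same block; let the new helper put it on the
       incoming non-abnormal edge (splitting the block if needed).  */
    if (stmt
        && is_gimple_call (stmt)
        && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
      gsi_insert_before_returns_twice_call (gsi_bb (*gsi), g);
    else
      gsi_insert_before (gsi, g, GSI_SAME_STMT);
  }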

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-03-12  Jakub Jelinek  <jakub@redhat.com>

	PR sanitizer/112709
	* gimple-iterator.h (gsi_insert_before_returns_twice_call,
	gsi_insert_seq_before_returns_twice_call): Declare.
	* gimple-iterator.cc: Include gimplify.h.
	(edge_before_returns_twice_call, gsi_insert_before_returns_twice_call,
	gsi_insert_seq_before_returns_twice_call): New functions.
	* ubsan.cc (ubsan_insert_before, ubsan_insert_seq_before): New
	functions.
	(instrument_mem_ref, instrument_pointer_overflow,
	instrument_nonnull_arg, instrument_nonnull_return): Use
	ubsan_insert_before instead of gsi_insert_before.
	(maybe_instrument_pointer_overflow): Use force_gimple_operand,
	gimple_seq_add_seq_without_update and ubsan_insert_seq_before
	instead of force_gimple_operand_gsi.
	(instrument_object_size): Likewise.  Use ubsan_insert_before instead
	of gsi_insert_before.

	* gcc.dg/ubsan/pr112709-1.c: New test.
	* gcc.dg/ubsan/pr112709-2.c: New test.


	Jakub

Comments

Richard Biener March 12, 2024, 10:42 a.m. UTC | #1
On Tue, 12 Mar 2024, Jakub Jelinek wrote:

> Hi!
> 
> ubsan, asan (both PR112709) and _BitInt lowering (PR113466) want to
> insert some instrumentation or adjustment statements before some statement.
> This unfortunately creates invalid IL if inserting before a returns_twice
> call, because we require that such calls are the first statement in a basic
> block and that we have an edge from the .ABNORMAL_DISPATCHER block to
> the block containing the returns_twice call (in addition to other edge(s)).
> 
> The following patch adds helper functions for such insertions and uses it
> for now in ubsan (I'll post a follow up which uses it in asan and will
> work later on the _BitInt lowering PR).
> 
> In particular, if the bb with returns_twice call at the start has just
> 2 edges, one EDGE_ABNORMAL from .ABNORMAL_DISPATCHER and another
> (non-EDGE_ABNORMAL/EDGE_EH) from some other bb, it just inserts the
> statement or sequence on that other edge.
> If the bb has more predecessor edges or the one not from
> .ABNORMAL_DISPATCHER is e.g. an EH edge (this latter case likely shouldn't
> happen, one would need labels or something like that), the patch splits the
> block with returns_twice call such that there is just one edge next to
> .ABNORMAL_DISPATCHER edge and adjusts PHIs as needed to make it happen.
> The functions also replace uses of PHIs from the returns_twice bb with
> the corresponding PHI arguments, because otherwise it would be invalid IL.
> 
> E.g. in ubsan/pr112709-2.c (qux) we have before the ubsan pass
>   <bb 10> :
>   # .MEM_5(ab) = PHI <.MEM_4(9), .MEM_25(ab)(11)>
>   # _7(ab) = PHI <_20(9), _8(ab)(11)>
>   # .MEM_21(ab) = VDEF <.MEM_5(ab)>
>   _22 = bar (*_7(ab));
> where bar is returns_twice call and bb 11 has .ABNORMAL_DISPATCHER call,
> this patch instruments it like:
>   <bb 9> :
>   # .MEM_4 = PHI <.MEM_17(ab)(4), .MEM_10(D)(5), .MEM_14(ab)(8)>
>   # DEBUG BEGIN_STMT
>   # VUSE <.MEM_4>
>   _20 = p;
>   # .MEM_27 = VDEF <.MEM_4>
>   .UBSAN_NULL (_20, 0B, 0);
>   # VUSE <.MEM_27>
>   _2 = __builtin_dynamic_object_size (_20, 0);
>   # .MEM_28 = VDEF <.MEM_27>
>   .UBSAN_OBJECT_SIZE (_20, 1024, _2, 0);
>   
>   <bb 10> :
>   # .MEM_5(ab) = PHI <.MEM_28(9), .MEM_25(ab)(11)>
>   # _7(ab) = PHI <_20(9), _8(ab)(11)>
>   # .MEM_21(ab) = VDEF <.MEM_5(ab)>
>   _22 = bar (*_7(ab));
> The edge from .ABNORMAL_DISPATCHER is there just to represent the
> returning for 2nd and later times, the instrumentation can't be
> done at that point as there is no code executed during that point.
> The ubsan/pr112709-1.c testcase includes non-virtual PHIs to cover
> the handling of those as well.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2024-03-12  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR sanitizer/112709
> 	* gimple-iterator.h (gsi_insert_before_returns_twice_call,
> 	gsi_insert_seq_before_returns_twice_call): Declare.
> 	* gimple-iterator.cc: Include gimplify.h.
> 	(edge_before_returns_twice_call, gsi_insert_before_returns_twice_call,
> 	gsi_insert_seq_before_returns_twice_call): New functions.
> 	* ubsan.cc (ubsan_insert_before, ubsan_insert_seq_before): New
> 	functions.
> 	(instrument_mem_ref, instrument_pointer_overflow,
> 	instrument_nonnull_arg, instrument_nonnull_return): Use
> 	ubsan_insert_before instead of gsi_insert_before.
> 	(maybe_instrument_pointer_overflow): Use force_gimple_operand,
> 	gimple_seq_add_seq_without_update and ubsan_insert_seq_before
> 	instead of force_gimple_operand_gsi.
> 	(instrument_object_size): Likewise.  Use ubsan_insert_before instead
> 	of gsi_insert_before.
> 
> 	* gcc.dg/ubsan/pr112709-1.c: New test.
> 	* gcc.dg/ubsan/pr112709-2.c: New test.
> 
> --- gcc/gimple-iterator.h.jj	2024-02-08 11:17:37.144661383 +0100
> +++ gcc/gimple-iterator.h	2024-03-11 16:16:24.670464737 +0100
> @@ -93,6 +93,10 @@ extern void gsi_insert_on_edge (edge, gi
>  extern void gsi_insert_seq_on_edge (edge, gimple_seq);
>  extern basic_block gsi_insert_on_edge_immediate (edge, gimple *);
>  extern basic_block gsi_insert_seq_on_edge_immediate (edge, gimple_seq);
> +extern basic_block gsi_insert_before_returns_twice_call (basic_block,
> +							 gimple *);
> +extern basic_block gsi_insert_seq_before_returns_twice_call (basic_block,
> +							     gimple_seq);
>  extern void gsi_commit_edge_inserts (void);
>  extern void gsi_commit_one_edge_insert (edge, basic_block *);
>  extern gphi_iterator gsi_start_phis (basic_block);
> --- gcc/gimple-iterator.cc.jj	2024-02-08 11:17:37.144661383 +0100
> +++ gcc/gimple-iterator.cc	2024-03-11 18:17:08.994804808 +0100
> @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-cfg.h"
>  #include "tree-ssa.h"
>  #include "value-prof.h"
> +#include "gimplify.h"
>  
>  
>  /* Mark the statement STMT as modified, and update it.  */
> @@ -944,3 +945,135 @@ gsi_start_phis (basic_block bb)
>  
>    return i;
>  }
> +
> +/* Helper function for gsi_insert_before_returns_twice_call and
> +   gsi_insert_seq_before_returns_twice_call.  Find edge to
> +   insert statements before returns_twice call at the start of BB,
> +   if there isn't just one, split the bb and adjust PHIs to ensure
> +   that.  */
> +
> +static edge
> +edge_before_returns_twice_call (basic_block bb)
> +{
> +  gimple_stmt_iterator gsi = gsi_start_nondebug_bb (bb);
> +  gcc_checking_assert (is_gimple_call (gsi_stmt (gsi))
> +		       && (gimple_call_flags (gsi_stmt (gsi))
> +			   & ECF_RETURNS_TWICE) != 0);
> +  edge_iterator ei;
> +  edge e, ad_edge = NULL, other_edge = NULL;
> +  bool split = false;
> +  /* Look for a fallthru and possibly a single backedge.  */

I can't see the backedge handling (I'm not sure it's actually required?)

> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +    {
> +      if ((e->flags & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL)

EDGE_EH is special because of the landing pad?

> +	{
> +	  gimple_stmt_iterator gsi
> +	    = gsi_start_nondebug_after_labels_bb (e->src);
> +	  gimple *ad = gsi_stmt (gsi);
> +	  if (ad && gimple_call_internal_p (ad, IFN_ABNORMAL_DISPATCHER))
> +	    {
> +	      gcc_checking_assert (ad_edge == NULL);
> +	      ad_edge = e;
> +	      continue;
> +	    }
> +	}
> +      if (other_edge || e->flags & (EDGE_ABNORMAL | EDGE_EH))
> +	split = true;
> +      other_edge = e;
> +    }
> +  gcc_checking_assert (ad_edge);
> +  if (other_edge == NULL)
> +    split = true;
> +  if (split)
> +    {
> +      other_edge = split_block_after_labels (bb);
> +      e = make_edge (ad_edge->src, other_edge->dest, EDGE_ABNORMAL);

In theory there could be multiple abnormal edges.  Did you try
to use make_forwarder_block instead of manually doing it?  Or
does that fail because we end up redirecting the abnormal edges
which you avoid by re-making it new and scrapping the old one?
I fail to remember why we can't redirect abnormal edges - yes,
there's no flow to be redirected but the process of "redirection"
still should work.

> +      for (gphi_iterator gsi = gsi_start_phis (other_edge->src);
> +	   !gsi_end_p (gsi); gsi_next (&gsi))
> +	{
> +	  gphi *phi = gsi.phi ();
> +	  tree lhs = gimple_phi_result (phi);
> +	  tree new_lhs;
> +	  if (virtual_operand_p (lhs))
> +	    new_lhs = make_ssa_name (SSA_NAME_VAR (lhs));
> +	  else
> +	    new_lhs = make_ssa_name (TREE_TYPE (lhs));
> +	  gimple_phi_set_result (phi, new_lhs);
> +	  gphi *new_phi = create_phi_node (lhs, other_edge->dest);
> +	  add_phi_arg (new_phi, new_lhs, other_edge, UNKNOWN_LOCATION);
> +	  add_phi_arg (new_phi, gimple_phi_arg_def_from_edge (phi, ad_edge),
> +		       e, gimple_phi_arg_location_from_edge (phi, ad_edge));
> +	}
> +      remove_edge (ad_edge);
> +    }
> +  return other_edge;
> +}
> +
> +/* Insert G before BB which starts with an ECF_RETURNS_TWICE call.
> +   If BB has a single predecessor edge except for edge from
> +   .ABNORMAL_DISPATCHER, insert on that edge, otherwise split
> +   BB before the call, adjust PHIs and insert into the new BB.
> +   If G refers to any results of BB's PHIs, replace them afterwards
> +   with corresponding PHI arg.  */
> +
> +basic_block
> +gsi_insert_before_returns_twice_call (basic_block bb, gimple *g)
> +{
> +  edge e = edge_before_returns_twice_call (bb);
> +  basic_block new_bb = gsi_insert_on_edge_immediate (e, g);
> +  use_operand_p use_p;
> +  ssa_op_iter iter;
> +  bool m = false;
> +  FOR_EACH_SSA_USE_OPERAND (use_p, g, iter, SSA_OP_USE)
> +    {
> +      tree s = USE_FROM_PTR (use_p);
> +      if (SSA_NAME_DEF_STMT (s)
> +	  && gimple_code (SSA_NAME_DEF_STMT (s)) == GIMPLE_PHI
> +	  && gimple_bb (SSA_NAME_DEF_STMT (s)) == e->dest)
> +	{
> +	  tree r = gimple_phi_arg_def_from_edge (SSA_NAME_DEF_STMT (s), e);
> +	  SET_USE (use_p, unshare_expr (r));
> +	  m = true;
> +	}
> +    }

Ick.  Doesn't the forwarder keep the PHI defs, with new PHI defs
created in e->dest instead?  I don't see how we need this.

Btw, gsi_insert_before_returns_twice_call might be usable as
gsi_insert_after_labels ()?  Or alternatively not expose
gsi_insert_before_returns_twice_call but instead make
split_block () behave "correctly" for abnormals (or add
split_block_before_returns_twice_call ()).  But that's just bike-shedding.

Richard.


> +  if (m)
> +    update_stmt (g);
> +  return new_bb;
> +}
> +
> +/* Similarly for sequence SEQ.  */
> +
> +basic_block
> +gsi_insert_seq_before_returns_twice_call (basic_block bb, gimple_seq seq)
> +{
> +  if (gimple_seq_empty_p (seq))
> +    return NULL;
> +  gimple *f = gimple_seq_first_stmt (seq);
> +  gimple *l = gimple_seq_last_stmt (seq);
> +  edge e = edge_before_returns_twice_call (bb);
> +  basic_block new_bb = gsi_insert_seq_on_edge_immediate (e, seq);
> +  use_operand_p use_p;
> +  ssa_op_iter iter;
> +  for (gimple_stmt_iterator gsi = gsi_for_stmt (f); ; gsi_next (&gsi))
> +    {
> +      gimple *g = gsi_stmt (gsi);
> +      bool m = false;
> +      FOR_EACH_SSA_USE_OPERAND (use_p, g, iter, SSA_OP_USE)
> +	{
> +	  tree s = USE_FROM_PTR (use_p);
> +	  if (SSA_NAME_DEF_STMT (s)
> +	      && gimple_code (SSA_NAME_DEF_STMT (s)) == GIMPLE_PHI
> +	      && gimple_bb (SSA_NAME_DEF_STMT (s)) == e->dest)
> +	    {
> +	      tree r = gimple_phi_arg_def_from_edge (SSA_NAME_DEF_STMT (s), e);
> +	      SET_USE (use_p, unshare_expr (r));
> +	      m = true;
> +	    }
> +	}
> +      if (m)
> +	update_stmt (g);
> +      if (g == l)
> +	break;
> +    }
> +  return new_bb;
> +}
> --- gcc/ubsan.cc.jj	2024-01-03 11:51:32.312720350 +0100
> +++ gcc/ubsan.cc	2024-03-11 17:07:58.104243341 +0100
> @@ -1432,6 +1432,37 @@ ubsan_expand_vptr_ifn (gimple_stmt_itera
>    return true;
>  }
>  
> +/* Insert G stmt before ITER.  If ITER is a returns_twice call,
> +   insert it on an appropriate edge instead.  */
> +
> +static void
> +ubsan_insert_before (gimple_stmt_iterator *iter, gimple *g)
> +{
> +  gimple *stmt = gsi_stmt (*iter);
> +  if (stmt
> +      && is_gimple_call (stmt)
> +      && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
> +    gsi_insert_before_returns_twice_call (gsi_bb (*iter), g);
> +  else
> +    gsi_insert_before (iter, g, GSI_SAME_STMT);
> +}
> +
> +/* Similarly for sequence SEQ.  */
> +
> +static void
> +ubsan_insert_seq_before (gimple_stmt_iterator *iter, gimple_seq seq)
> +{
> +  if (gimple_seq_empty_p (seq))
> +    return;
> +  gimple *stmt = gsi_stmt (*iter);
> +  if (stmt
> +      && is_gimple_call (stmt)
> +      && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
> +    gsi_insert_seq_before_returns_twice_call (gsi_bb (*iter), seq);
> +  else
> +    gsi_insert_seq_before (iter, seq, GSI_SAME_STMT);
> +}
> +
>  /* Instrument a memory reference.  BASE is the base of MEM, IS_LHS says
>     whether the pointer is on the left hand side of the assignment.  */
>  
> @@ -1458,7 +1489,7 @@ instrument_mem_ref (tree mem, tree base,
>    tree alignt = build_int_cst (pointer_sized_int_node, align);
>    gcall *g = gimple_build_call_internal (IFN_UBSAN_NULL, 3, t, kind, alignt);
>    gimple_set_location (g, gimple_location (gsi_stmt (*iter)));
> -  gsi_insert_before (iter, g, GSI_SAME_STMT);
> +  ubsan_insert_before (iter, g);
>  }
>  
>  /* Perform the pointer instrumentation.  */
> @@ -1485,7 +1516,7 @@ instrument_pointer_overflow (gimple_stmt
>      return;
>    gcall *g = gimple_build_call_internal (IFN_UBSAN_PTR, 2, ptr, off);
>    gimple_set_location (g, gimple_location (gsi_stmt (*gsi)));
> -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  ubsan_insert_before (gsi, g);
>  }
>  
>  /* Instrument pointer arithmetics if any.  */
> @@ -1577,10 +1608,11 @@ maybe_instrument_pointer_overflow (gimpl
>        else
>  	t = fold_convert (sizetype, moff);
>      }
> -  t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, true,
> -				GSI_SAME_STMT);
> -  base_addr = force_gimple_operand_gsi (gsi, base_addr, true, NULL_TREE, true,
> -					GSI_SAME_STMT);
> +  gimple_seq seq, this_seq;
> +  t = force_gimple_operand (t, &seq, true, NULL_TREE);
> +  base_addr = force_gimple_operand (base_addr, &this_seq, true, NULL_TREE);
> +  gimple_seq_add_seq_without_update (&seq, this_seq);
> +  ubsan_insert_seq_before (gsi, seq);
>    instrument_pointer_overflow (gsi, base_addr, t);
>  }
>  
> @@ -2035,7 +2067,7 @@ instrument_nonnull_arg (gimple_stmt_iter
>  	    {
>  	      g = gimple_build_assign (make_ssa_name (TREE_TYPE (arg)), arg);
>  	      gimple_set_location (g, loc[0]);
> -	      gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +	      ubsan_insert_before (gsi, g);
>  	      arg = gimple_assign_lhs (g);
>  	    }
>  
> @@ -2068,7 +2100,7 @@ instrument_nonnull_arg (gimple_stmt_iter
>  	      g = gimple_build_call (fn, 1, data);
>  	    }
>  	  gimple_set_location (g, loc[0]);
> -	  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +	  ubsan_insert_before (gsi, g);
>  	  ubsan_create_edge (g);
>  	}
>        *gsi = gsi_for_stmt (stmt);
> @@ -2124,7 +2156,7 @@ instrument_nonnull_return (gimple_stmt_i
>  	  g = gimple_build_call (fn, 2, data, data2);
>  	}
>        gimple_set_location (g, loc[0]);
> -      gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +      ubsan_insert_before (gsi, g);
>        ubsan_create_edge (g);
>        *gsi = gsi_for_stmt (stmt);
>      }
> @@ -2231,6 +2263,7 @@ instrument_object_size (gimple_stmt_iter
>    tree sizet;
>    tree base_addr = base;
>    gimple *bos_stmt = NULL;
> +  gimple_seq seq = NULL;
>    if (decl_p)
>      base_addr = build1 (ADDR_EXPR,
>  			build_pointer_type (TREE_TYPE (base)), base);
> @@ -2244,19 +2277,12 @@ instrument_object_size (gimple_stmt_iter
>        sizet = builtin_decl_explicit (BUILT_IN_DYNAMIC_OBJECT_SIZE);
>        sizet = build_call_expr_loc (loc, sizet, 2, base_addr,
>  				   integer_zero_node);
> -      sizet = force_gimple_operand_gsi (gsi, sizet, false, NULL_TREE, true,
> -					GSI_SAME_STMT);
> +      sizet = force_gimple_operand (sizet, &seq, false, NULL_TREE);
>        /* If the call above didn't end up being an integer constant, go one
>  	 statement back and get the __builtin_object_size stmt.  Save it,
>  	 we might need it later.  */
>        if (SSA_VAR_P (sizet))
> -	{
> -	  gsi_prev (gsi);
> -	  bos_stmt = gsi_stmt (*gsi);
> -
> -	  /* Move on to where we were.  */
> -	  gsi_next (gsi);
> -	}
> +	bos_stmt = gsi_stmt (gsi_last (seq));
>      }
>    else
>      return;
> @@ -2298,21 +2324,24 @@ instrument_object_size (gimple_stmt_iter
>        && !TREE_ADDRESSABLE (base))
>      mark_addressable (base);
>  
> +  /* We have to emit the check.  */
> +  gimple_seq this_seq;
> +  t = force_gimple_operand (t, &this_seq, true, NULL_TREE);
> +  gimple_seq_add_seq_without_update (&seq, this_seq);
> +  ptr = force_gimple_operand (ptr, &this_seq, true, NULL_TREE);
> +  gimple_seq_add_seq_without_update (&seq, this_seq);
> +  ubsan_insert_seq_before (gsi, seq);
> +
>    if (bos_stmt
>        && gimple_call_builtin_p (bos_stmt, BUILT_IN_DYNAMIC_OBJECT_SIZE))
>      ubsan_create_edge (bos_stmt);
>  
> -  /* We have to emit the check.  */
> -  t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, true,
> -				GSI_SAME_STMT);
> -  ptr = force_gimple_operand_gsi (gsi, ptr, true, NULL_TREE, true,
> -				  GSI_SAME_STMT);
>    tree ckind = build_int_cst (unsigned_char_type_node,
>  			      is_lhs ? UBSAN_STORE_OF : UBSAN_LOAD_OF);
>    gimple *g = gimple_build_call_internal (IFN_UBSAN_OBJECT_SIZE, 4,
>  					 ptr, t, sizet, ckind);
>    gimple_set_location (g, loc);
> -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  ubsan_insert_before (gsi, g);
>  }
>  
>  /* Instrument values passed to builtin functions.  */
> --- gcc/testsuite/gcc.dg/ubsan/pr112709-1.c.jj	2024-03-11 16:55:09.038760951 +0100
> +++ gcc/testsuite/gcc.dg/ubsan/pr112709-1.c	2024-03-11 16:55:02.159854950 +0100
> @@ -0,0 +1,52 @@
> +/* PR sanitizer/112709 */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=undefined -O2" } */
> +
> +struct S { char c[1024]; };
> +int foo (int);
> +
> +__attribute__((returns_twice, noipa)) struct S
> +bar (int x)
> +{
> +  (void) x;
> +  struct S s = {};
> +  s.c[42] = 42;
> +  return s;
> +}
> +
> +void
> +baz (struct S *p)
> +{
> +  foo (1);
> +  *p = bar (0);
> +}
> +
> +void
> +qux (int x, struct S *p)
> +{
> +  if (x == 25)
> +    x = foo (2);
> +  else if (x == 42)
> +    x = foo (foo (3));
> +  *p = bar (x);
> +}
> +
> +void
> +corge (int x, struct S *p)
> +{
> +  void *q[] = { &&l1, &&l2, &&l3, &&l3 };
> +  if (x == 25)
> +    {
> +    l1:
> +      x = foo (2);
> +    }
> +  else if (x == 42)
> +    {
> +    l2:
> +      x = foo (foo (3));
> +    }
> +l3:
> +  *p = bar (x);
> +  if (x < 4)
> +    goto *q[x & 3];
> +}
> --- gcc/testsuite/gcc.dg/ubsan/pr112709-2.c.jj	2024-03-11 16:55:37.000378840 +0100
> +++ gcc/testsuite/gcc.dg/ubsan/pr112709-2.c	2024-03-11 17:13:37.517599492 +0100
> @@ -0,0 +1,50 @@
> +/* PR sanitizer/112709 */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=undefined -O2" } */
> +
> +struct S { char c[1024]; } *p;
> +int foo (int);
> +
> +__attribute__((returns_twice, noipa)) int
> +bar (struct S x)
> +{
> +  (void) x.c[0];
> +  return 0;
> +}
> +
> +void
> +baz (int *y)
> +{
> +  foo (1);
> +  *y = bar (*p);
> +}
> +
> +void
> +qux (int x, int *y)
> +{
> +  if (x == 25)
> +    x = foo (2);
> +  else if (x == 42)
> +    x = foo (foo (3));
> +  *y = bar (*p);
> +}
> +
> +void
> +corge (int x, int *y)
> +{
> +  void *q[] = { &&l1, &&l2, &&l3, &&l3 };
> +  if (x == 25)
> +    {
> +    l1:
> +      x = foo (2);
> +    }
> +  else if (x == 42)
> +    {
> +    l2:
> +      x = foo (foo (3));
> +    }
> +l3:
> +  *y = bar (*p);
> +  if (x < 4)
> +    goto *q[x & 3];
> +}
> 
> 	Jakub
> 
>
Jakub Jelinek March 12, 2024, 12:17 p.m. UTC | #2
On Tue, Mar 12, 2024 at 11:42:03AM +0100, Richard Biener wrote:
> > +static edge
> > +edge_before_returns_twice_call (basic_block bb)
> > +{
> > +  gimple_stmt_iterator gsi = gsi_start_nondebug_bb (bb);
> > +  gcc_checking_assert (is_gimple_call (gsi_stmt (gsi))
> > +		       && (gimple_call_flags (gsi_stmt (gsi))
> > +			   & ECF_RETURNS_TWICE) != 0);
> > +  edge_iterator ei;
> > +  edge e, ad_edge = NULL, other_edge = NULL;
> > +  bool split = false;
> > +  /* Look for a fallthru and possibly a single backedge.  */
> 
> I can't see the backedge handling (I'm not sure it's actually required?)

Oops, that comment line just shouldn't be there.  Grepping around, I've
pasted it from tree-ssa-uninit.cc (warn_uninit_phi_uses) - just wanted
to grab something with the edge_iterator, edge declaration and
FOR_EACH_EDGE on bb->preds, and forgot to remove that comment line.

Consider it removed.

> > +  FOR_EACH_EDGE (e, ei, bb->preds)
> > +    {
> > +      if ((e->flags & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL)
> 
> EDGE_EH is special because of the landing pad?

Yes, though I'd think a landing pad would need to start with a label.
Anyway, the (x & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL
test is what is tested for ABNORMAL_DISPATCHER in other places:
tree-cfgcleanup.cc-	  if ((ei_edge (ei)->flags
tree-cfgcleanup.cc:	       & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL)
tree-cfgcleanup.cc-	    {
tree-cfgcleanup.cc-	      gimple_stmt_iterator gsi
tree-cfgcleanup.cc-		= gsi_start_nondebug_after_labels_bb (dest);
tree-cfgcleanup.cc-	      gimple *g = gsi_stmt (gsi);
tree-cfgcleanup.cc-	      if (g && gimple_call_internal_p (g, IFN_ABNORMAL_DISPATCHER))
tree-cfgcleanup.cc-		abnormal_dispatchers.safe_push (dest);
(that is where I copied it from),
tree-cfg.cc-  FOR_EACH_EDGE (e, ei, bb->succs)
tree-cfg.cc:    if ((e->flags & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL)
tree-cfg.cc-      {
tree-cfg.cc-	gimple_stmt_iterator gsi
tree-cfg.cc-	  = gsi_start_nondebug_after_labels_bb (e->dest);
tree-cfg.cc-	gimple *g = gsi_stmt (gsi);
tree-cfg.cc-	if (g && gimple_call_internal_p (g, IFN_ABNORMAL_DISPATCHER))
tree-cfg.cc-	  return e->dest;
Though, admittedly, git blame shows both were written by myself, so it
doesn't count much.  The edge from IFN_ABNORMAL_DISPATCHER has EDGE_ABNORMAL
flag set:
  make_edge (*dispatcher, for_bb, EDGE_ABNORMAL);
and I bet some EDGE_* flags can be set on that edge later on, but EDGE_EH
certainly shouldn't be.
I bet in both of those preexisting cases and in this new one it could be done
just as a flags & EDGE_ABNORMAL test, though perhaps the tree-cfgcleanup.cc
and tree-cfg.cc cases can see EDGE_EH edges there and would query the first
non-debug stmt in them uselessly, while among the returns_twice
predecessors EDGE_EH is very unlikely and not worth testing.
So, if you want, I can change that line to
      if ((e->flags & EDGE_ABNORMAL) != 0)

> 
> > +	{
> > +	  gimple_stmt_iterator gsi
> > +	    = gsi_start_nondebug_after_labels_bb (e->src);
> > +	  gimple *ad = gsi_stmt (gsi);
> > +	  if (ad && gimple_call_internal_p (ad, IFN_ABNORMAL_DISPATCHER))
> > +	    {
> > +	      gcc_checking_assert (ad_edge == NULL);
> > +	      ad_edge = e;
> > +	      continue;
> > +	    }
> > +	}
> > +      if (other_edge || e->flags & (EDGE_ABNORMAL | EDGE_EH))
> > +	split = true;
> > +      other_edge = e;
> > +    }
> > +  gcc_checking_assert (ad_edge);
> > +  if (other_edge == NULL)
> > +    split = true;
> > +  if (split)
> > +    {
> > +      other_edge = split_block_after_labels (bb);
> > +      e = make_edge (ad_edge->src, other_edge->dest, EDGE_ABNORMAL);
> 
> In theory there could be multiple abnormal edges.

Sure, but a function has at most one .ABNORMAL_DISPATCHER block.
The edge from that block is the only one that needs to be kept pointing to
the returns_twice function.  In the testcases I've even tried to
get other abnormal edges there, but because of labels we actually have
labels on the block that falls through to the returns_twice block.
I bet for EH edges it is the same.

>  Did you try
> to use make_forwarder_block instead of manually doing it?

Looking at it, it does a lot of unnecessary work.  We don't need
to do any redirection of anything but moving the .ABNORMAL_DISPATCHER
edge after splitting the block.  And indeed, it would fail on redirection
of the EDGE_ABNORMAL edge.

>  Or
> does that fail because we end up redirecting the abnormal edges
> which you avoid by re-making it new and scrapping the old one?
> I fail to remember why we can't redirect abnormal edges - yes,
> there's no flow to be redirected but the process of "redirection"
> still should work.

Seems the redirection always wants to adjust the src blocks of the
edges.  That isn't something that should be done for this particular
abnormal edge; that one actually can be redirected just by replacing it,
though for, say, EH edges we'd need to adjust the on-the-side EH info etc.

> > +      for (gphi_iterator gsi = gsi_start_phis (other_edge->src);
> > +	   !gsi_end_p (gsi); gsi_next (&gsi))
> > +	{
> > +	  gphi *phi = gsi.phi ();
> > +	  tree lhs = gimple_phi_result (phi);
> > +	  tree new_lhs;
> > +	  if (virtual_operand_p (lhs))
> > +	    new_lhs = make_ssa_name (SSA_NAME_VAR (lhs));
> > +	  else
> > +	    new_lhs = make_ssa_name (TREE_TYPE (lhs));

Guess I should use new_lhs = copy_ssa_name (lhs); instead

> > +	  gimple_phi_set_result (phi, new_lhs);
> > +	  gphi *new_phi = create_phi_node (lhs, other_edge->dest);
> > +	  add_phi_arg (new_phi, new_lhs, other_edge, UNKNOWN_LOCATION);
> > +	  add_phi_arg (new_phi, gimple_phi_arg_def_from_edge (phi, ad_edge),
> > +		       e, gimple_phi_arg_location_from_edge (phi, ad_edge));
> > +	}
> > +      remove_edge (ad_edge);
> > +    }
> > +  return other_edge;
> > +}

> > +  FOR_EACH_SSA_USE_OPERAND (use_p, g, iter, SSA_OP_USE)
> > +    {
> > +      tree s = USE_FROM_PTR (use_p);
> > +      if (SSA_NAME_DEF_STMT (s)
> > +	  && gimple_code (SSA_NAME_DEF_STMT (s)) == GIMPLE_PHI
> > +	  && gimple_bb (SSA_NAME_DEF_STMT (s)) == e->dest)
> > +	{
> > +	  tree r = gimple_phi_arg_def_from_edge (SSA_NAME_DEF_STMT (s), e);
> > +	  SET_USE (use_p, unshare_expr (r));
> > +	  m = true;
> > +	}
> > +    }
> 
> Ick.  Doesn't the forwarder keep the PHI defs and new PHI defs
> are created in e->dest instead, I don't see how we need this?

Admittedly the above is the ugliest part of the patch IMHO.
It isn't needed in all cases, but e.g. for the pr112709-2.c (qux) case
we have before ubsan pass
  # _7(ab) = PHI <_20(9), _8(ab)(11)>
  _22 = bar (*_7(ab));
and want to instrument the *_7(ab) load among the parameters for object
size.
So, it wants to add
  _2 = __builtin_dynamic_object_size (_7(ab), 0);
sequence and later:
  .UBSAN_OBJECT_SIZE (_7(ab), 1024, _2, 0);
before the call insn.  If it weren't a returns_twice call, ubsan would just
add those statements into the same basic block using gsi_insert*before.
But because it is a returns_twice call, it needs to be done in a different
block.  And having all of ubsan/asan/bitintlower find that out first and
figure out that it should use _20 instead of _7(ab) seems hard, especially
since for cases like:
void
freddy (int x, int *y, struct S *p)
{
  bar (*p);
  ++p;
  if (x == 25)
    x = foo (2);
  else if (x == 42)
    x = foo (foo (3));
  *y = bar (*p);
}
added to pr112709-2.c we'd have:
  # x_5(ab) = PHI <x_3(ab)(7), x_4(ab)(3), x_24(6), x_21(10)>
  # p_8(ab) = PHI <p_16(ab)(7), p_7(ab)(3), p_16(ab)(6), p_16(ab)(10)>
  _26 = bar (*p_8(ab));
before ubsan and the .ABNORMAL_DISPATCHER block is 3.
And the caller doesn't know what to replace the p_8(ab) in there with
until the new bb is created and PHIs are adjusted.

Though, I ran into an ICE on the above testcase, not on the second bar call
with the multiple edges, but on the first one, because unlike everything
else added in the testcases, the first bar splits the e edge on
gsi_insert_on_edge_immediate, so the assumption that e is the edge from
the single non-EDGE_ABNORMAL predecessor of the returns_twice block
to the returns_twice block is then no longer true.  Fixed by adding
  if (new_bb)
    e = single_succ_edge (new_bb);
twice.

> Btw, gsi_insert_before_returns_twice_call might be usable as
> gsi_insert_after_labels ()?

It could be called like that too, sure, though the callers would still
need to check whether they want to insert before some statement or
"after labels", which for the returns_twice case actually isn't after any
labels but on an edge, and if there isn't one (except for the EDGE_ABNORMAL
edge), ensure there is just one.
In theory even gsi_insert_before and gsi_insert_seq_before could do this,
but I think most of the half thousand users of those don't need that
behavior and would be surprised if something like that happened.

> Or alternatively not expose
> gsi_insert_before_returns_twice_call but instead make
> split_block () behave "correctly" for abnormals (or add
> split_block_before_returns_twice_call ()).  But that's just bike-shedding.

I don't see how that is possible.  The usual case is that no split_block
is needed, usually the returns_twice block has just a single normal
predecessor and the .ABNORMAL_DISPATCHER predecessor.  And in that case
we shouldn't be creating a further block, just insert on the edge with the
PHI result to arg replacements.  Similarly, even if we do split it, all
further instrumentation that should go before the call should go to the
created block, not result in further splits.

Anyway, here is an updated patch, with the bogus comment removed,
copy_ssa_name used, the ICE fix mentioned above and the freddy functions
added to the tests.

Tested so far with make check-gcc check-g++ RUNTESTFLAGS='ubsan.exp asan.exp'

2024-03-12  Jakub Jelinek  <jakub@redhat.com>

	PR sanitizer/112709
	* gimple-iterator.h (gsi_insert_before_returns_twice_call,
	gsi_insert_seq_before_returns_twice_call): Declare.
	* gimple-iterator.cc: Include gimplify.h.
	(edge_before_returns_twice_call, gsi_insert_before_returns_twice_call,
	gsi_insert_seq_before_returns_twice_call): New functions.
	* ubsan.cc (ubsan_insert_before, ubsan_insert_seq_before): New
	functions.
	(instrument_mem_ref, instrument_pointer_overflow,
	instrument_nonnull_arg, instrument_nonnull_return): Use
	ubsan_insert_before instead of gsi_insert_before.
	(maybe_instrument_pointer_overflow): Use force_gimple_operand,
	gimple_seq_add_seq_without_update and ubsan_insert_seq_before
	instead of force_gimple_operand_gsi.
	(instrument_object_size): Likewise.  Use ubsan_insert_before instead
	of gsi_insert_before.

	* gcc.dg/ubsan/pr112709-1.c: New test.
	* gcc.dg/ubsan/pr112709-2.c: New test.

--- gcc/gimple-iterator.h.jj	2024-03-12 10:15:41.253529859 +0100
+++ gcc/gimple-iterator.h	2024-03-12 12:34:56.026880701 +0100
@@ -93,6 +93,10 @@ extern void gsi_insert_on_edge (edge, gi
 extern void gsi_insert_seq_on_edge (edge, gimple_seq);
 extern basic_block gsi_insert_on_edge_immediate (edge, gimple *);
 extern basic_block gsi_insert_seq_on_edge_immediate (edge, gimple_seq);
+extern basic_block gsi_insert_before_returns_twice_call (basic_block,
+							 gimple *);
+extern basic_block gsi_insert_seq_before_returns_twice_call (basic_block,
+							     gimple_seq);
 extern void gsi_commit_edge_inserts (void);
 extern void gsi_commit_one_edge_insert (edge, basic_block *);
 extern gphi_iterator gsi_start_phis (basic_block);
--- gcc/gimple-iterator.cc.jj	2024-03-12 10:15:41.209530471 +0100
+++ gcc/gimple-iterator.cc	2024-03-12 12:53:26.174467429 +0100
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.
 #include "tree-cfg.h"
 #include "tree-ssa.h"
 #include "value-prof.h"
+#include "gimplify.h"
 
 
 /* Mark the statement STMT as modified, and update it.  */
@@ -944,3 +945,134 @@ gsi_start_phis (basic_block bb)
 
   return i;
 }
+
+/* Helper function for gsi_insert_before_returns_twice_call and
+   gsi_insert_seq_before_returns_twice_call.  Find edge to
+   insert statements before returns_twice call at the start of BB,
+   if there isn't just one, split the bb and adjust PHIs to ensure
+   that.  */
+
+static edge
+edge_before_returns_twice_call (basic_block bb)
+{
+  gimple_stmt_iterator gsi = gsi_start_nondebug_bb (bb);
+  gcc_checking_assert (is_gimple_call (gsi_stmt (gsi))
+		       && (gimple_call_flags (gsi_stmt (gsi))
+			   & ECF_RETURNS_TWICE) != 0);
+  edge_iterator ei;
+  edge e, ad_edge = NULL, other_edge = NULL;
+  bool split = false;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      if ((e->flags & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL)
+	{
+	  gimple_stmt_iterator gsi
+	    = gsi_start_nondebug_after_labels_bb (e->src);
+	  gimple *ad = gsi_stmt (gsi);
+	  if (ad && gimple_call_internal_p (ad, IFN_ABNORMAL_DISPATCHER))
+	    {
+	      gcc_checking_assert (ad_edge == NULL);
+	      ad_edge = e;
+	      continue;
+	    }
+	}
+      if (other_edge || e->flags & (EDGE_ABNORMAL | EDGE_EH))
+	split = true;
+      other_edge = e;
+    }
+  gcc_checking_assert (ad_edge);
+  if (other_edge == NULL)
+    split = true;
+  if (split)
+    {
+      other_edge = split_block_after_labels (bb);
+      e = make_edge (ad_edge->src, other_edge->dest, EDGE_ABNORMAL);
+      for (gphi_iterator gsi = gsi_start_phis (other_edge->src);
+	   !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gphi *phi = gsi.phi ();
+	  tree lhs = gimple_phi_result (phi);
+	  tree new_lhs = copy_ssa_name (lhs);
+	  gimple_phi_set_result (phi, new_lhs);
+	  gphi *new_phi = create_phi_node (lhs, other_edge->dest);
+	  add_phi_arg (new_phi, new_lhs, other_edge, UNKNOWN_LOCATION);
+	  add_phi_arg (new_phi, gimple_phi_arg_def_from_edge (phi, ad_edge),
+		       e, gimple_phi_arg_location_from_edge (phi, ad_edge));
+	}
+      remove_edge (ad_edge);
+    }
+  return other_edge;
+}
+
+/* Insert G before BB which starts with an ECF_RETURNS_TWICE call.
+   If BB has a single predecessor edge except for edge from
+   .ABNORMAL_DISPATCHER, insert on that edge, otherwise split
+   BB before the call, adjust PHIs and insert into the new BB.
+   If G refers to any results of BB's PHIs, replace them afterwards
+   with corresponding PHI arg.  */
+
+basic_block
+gsi_insert_before_returns_twice_call (basic_block bb, gimple *g)
+{
+  edge e = edge_before_returns_twice_call (bb);
+  basic_block new_bb = gsi_insert_on_edge_immediate (e, g);
+  if (new_bb)
+    e = single_succ_edge (new_bb);
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  bool m = false;
+  FOR_EACH_SSA_USE_OPERAND (use_p, g, iter, SSA_OP_USE)
+    {
+      tree s = USE_FROM_PTR (use_p);
+      if (SSA_NAME_DEF_STMT (s)
+	  && gimple_code (SSA_NAME_DEF_STMT (s)) == GIMPLE_PHI
+	  && gimple_bb (SSA_NAME_DEF_STMT (s)) == e->dest)
+	{
+	  tree r = gimple_phi_arg_def_from_edge (SSA_NAME_DEF_STMT (s), e);
+	  SET_USE (use_p, unshare_expr (r));
+	  m = true;
+	}
+    }
+  if (m)
+    update_stmt (g);
+  return new_bb;
+}
+
+/* Similarly for sequence SEQ.  */
+
+basic_block
+gsi_insert_seq_before_returns_twice_call (basic_block bb, gimple_seq seq)
+{
+  if (gimple_seq_empty_p (seq))
+    return NULL;
+  gimple *f = gimple_seq_first_stmt (seq);
+  gimple *l = gimple_seq_last_stmt (seq);
+  edge e = edge_before_returns_twice_call (bb);
+  basic_block new_bb = gsi_insert_seq_on_edge_immediate (e, seq);
+  if (new_bb)
+    e = single_succ_edge (new_bb);
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  for (gimple_stmt_iterator gsi = gsi_for_stmt (f); ; gsi_next (&gsi))
+    {
+      gimple *g = gsi_stmt (gsi);
+      bool m = false;
+      FOR_EACH_SSA_USE_OPERAND (use_p, g, iter, SSA_OP_USE)
+	{
+	  tree s = USE_FROM_PTR (use_p);
+	  if (SSA_NAME_DEF_STMT (s)
+	      && gimple_code (SSA_NAME_DEF_STMT (s)) == GIMPLE_PHI
+	      && gimple_bb (SSA_NAME_DEF_STMT (s)) == e->dest)
+	    {
+	      tree r = gimple_phi_arg_def_from_edge (SSA_NAME_DEF_STMT (s), e);
+	      SET_USE (use_p, unshare_expr (r));
+	      m = true;
+	    }
+	}
+      if (m)
+	update_stmt (g);
+      if (g == l)
+	break;
+    }
+  return new_bb;
+}
--- gcc/ubsan.cc.jj	2024-03-12 10:15:41.306529121 +0100
+++ gcc/ubsan.cc	2024-03-12 12:34:56.027880687 +0100
@@ -1432,6 +1432,37 @@ ubsan_expand_vptr_ifn (gimple_stmt_itera
   return true;
 }
 
+/* Insert G stmt before ITER.  If ITER is a returns_twice call,
+   insert it on an appropriate edge instead.  */
+
+static void
+ubsan_insert_before (gimple_stmt_iterator *iter, gimple *g)
+{
+  gimple *stmt = gsi_stmt (*iter);
+  if (stmt
+      && is_gimple_call (stmt)
+      && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
+    gsi_insert_before_returns_twice_call (gsi_bb (*iter), g);
+  else
+    gsi_insert_before (iter, g, GSI_SAME_STMT);
+}
+
+/* Similarly for sequence SEQ.  */
+
+static void
+ubsan_insert_seq_before (gimple_stmt_iterator *iter, gimple_seq seq)
+{
+  if (gimple_seq_empty_p (seq))
+    return;
+  gimple *stmt = gsi_stmt (*iter);
+  if (stmt
+      && is_gimple_call (stmt)
+      && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
+    gsi_insert_seq_before_returns_twice_call (gsi_bb (*iter), seq);
+  else
+    gsi_insert_seq_before (iter, seq, GSI_SAME_STMT);
+}
+
 /* Instrument a memory reference.  BASE is the base of MEM, IS_LHS says
    whether the pointer is on the left hand side of the assignment.  */
 
@@ -1458,7 +1489,7 @@ instrument_mem_ref (tree mem, tree base,
   tree alignt = build_int_cst (pointer_sized_int_node, align);
   gcall *g = gimple_build_call_internal (IFN_UBSAN_NULL, 3, t, kind, alignt);
   gimple_set_location (g, gimple_location (gsi_stmt (*iter)));
-  gsi_insert_before (iter, g, GSI_SAME_STMT);
+  ubsan_insert_before (iter, g);
 }
 
 /* Perform the pointer instrumentation.  */
@@ -1485,7 +1516,7 @@ instrument_pointer_overflow (gimple_stmt
     return;
   gcall *g = gimple_build_call_internal (IFN_UBSAN_PTR, 2, ptr, off);
   gimple_set_location (g, gimple_location (gsi_stmt (*gsi)));
-  gsi_insert_before (gsi, g, GSI_SAME_STMT);
+  ubsan_insert_before (gsi, g);
 }
 
 /* Instrument pointer arithmetics if any.  */
@@ -1577,10 +1608,11 @@ maybe_instrument_pointer_overflow (gimpl
       else
 	t = fold_convert (sizetype, moff);
     }
-  t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, true,
-				GSI_SAME_STMT);
-  base_addr = force_gimple_operand_gsi (gsi, base_addr, true, NULL_TREE, true,
-					GSI_SAME_STMT);
+  gimple_seq seq, this_seq;
+  t = force_gimple_operand (t, &seq, true, NULL_TREE);
+  base_addr = force_gimple_operand (base_addr, &this_seq, true, NULL_TREE);
+  gimple_seq_add_seq_without_update (&seq, this_seq);
+  ubsan_insert_seq_before (gsi, seq);
   instrument_pointer_overflow (gsi, base_addr, t);
 }
 
@@ -2035,7 +2067,7 @@ instrument_nonnull_arg (gimple_stmt_iter
 	    {
 	      g = gimple_build_assign (make_ssa_name (TREE_TYPE (arg)), arg);
 	      gimple_set_location (g, loc[0]);
-	      gsi_insert_before (gsi, g, GSI_SAME_STMT);
+	      ubsan_insert_before (gsi, g);
 	      arg = gimple_assign_lhs (g);
 	    }
 
@@ -2068,7 +2100,7 @@ instrument_nonnull_arg (gimple_stmt_iter
 	      g = gimple_build_call (fn, 1, data);
 	    }
 	  gimple_set_location (g, loc[0]);
-	  gsi_insert_before (gsi, g, GSI_SAME_STMT);
+	  ubsan_insert_before (gsi, g);
 	  ubsan_create_edge (g);
 	}
       *gsi = gsi_for_stmt (stmt);
@@ -2124,7 +2156,7 @@ instrument_nonnull_return (gimple_stmt_i
 	  g = gimple_build_call (fn, 2, data, data2);
 	}
       gimple_set_location (g, loc[0]);
-      gsi_insert_before (gsi, g, GSI_SAME_STMT);
+      ubsan_insert_before (gsi, g);
       ubsan_create_edge (g);
       *gsi = gsi_for_stmt (stmt);
     }
@@ -2231,6 +2263,7 @@ instrument_object_size (gimple_stmt_iter
   tree sizet;
   tree base_addr = base;
   gimple *bos_stmt = NULL;
+  gimple_seq seq = NULL;
   if (decl_p)
     base_addr = build1 (ADDR_EXPR,
 			build_pointer_type (TREE_TYPE (base)), base);
@@ -2244,19 +2277,12 @@ instrument_object_size (gimple_stmt_iter
       sizet = builtin_decl_explicit (BUILT_IN_DYNAMIC_OBJECT_SIZE);
       sizet = build_call_expr_loc (loc, sizet, 2, base_addr,
 				   integer_zero_node);
-      sizet = force_gimple_operand_gsi (gsi, sizet, false, NULL_TREE, true,
-					GSI_SAME_STMT);
+      sizet = force_gimple_operand (sizet, &seq, false, NULL_TREE);
       /* If the call above didn't end up being an integer constant, go one
 	 statement back and get the __builtin_object_size stmt.  Save it,
 	 we might need it later.  */
       if (SSA_VAR_P (sizet))
-	{
-	  gsi_prev (gsi);
-	  bos_stmt = gsi_stmt (*gsi);
-
-	  /* Move on to where we were.  */
-	  gsi_next (gsi);
-	}
+	bos_stmt = gsi_stmt (gsi_last (seq));
     }
   else
     return;
@@ -2298,21 +2324,24 @@ instrument_object_size (gimple_stmt_iter
       && !TREE_ADDRESSABLE (base))
     mark_addressable (base);
 
+  /* We have to emit the check.  */
+  gimple_seq this_seq;
+  t = force_gimple_operand (t, &this_seq, true, NULL_TREE);
+  gimple_seq_add_seq_without_update (&seq, this_seq);
+  ptr = force_gimple_operand (ptr, &this_seq, true, NULL_TREE);
+  gimple_seq_add_seq_without_update (&seq, this_seq);
+  ubsan_insert_seq_before (gsi, seq);
+
   if (bos_stmt
       && gimple_call_builtin_p (bos_stmt, BUILT_IN_DYNAMIC_OBJECT_SIZE))
     ubsan_create_edge (bos_stmt);
 
-  /* We have to emit the check.  */
-  t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, true,
-				GSI_SAME_STMT);
-  ptr = force_gimple_operand_gsi (gsi, ptr, true, NULL_TREE, true,
-				  GSI_SAME_STMT);
   tree ckind = build_int_cst (unsigned_char_type_node,
 			      is_lhs ? UBSAN_STORE_OF : UBSAN_LOAD_OF);
   gimple *g = gimple_build_call_internal (IFN_UBSAN_OBJECT_SIZE, 4,
 					 ptr, t, sizet, ckind);
   gimple_set_location (g, loc);
-  gsi_insert_before (gsi, g, GSI_SAME_STMT);
+  ubsan_insert_before (gsi, g);
 }
 
 /* Instrument values passed to builtin functions.  */
--- gcc/testsuite/gcc.dg/ubsan/pr112709-1.c.jj	2024-03-12 12:34:56.027880687 +0100
+++ gcc/testsuite/gcc.dg/ubsan/pr112709-1.c	2024-03-12 12:57:58.993687023 +0100
@@ -0,0 +1,64 @@
+/* PR sanitizer/112709 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -O2" } */
+
+struct S { char c[1024]; };
+int foo (int);
+
+__attribute__((returns_twice, noipa)) struct S
+bar (int x)
+{
+  (void) x;
+  struct S s = {};
+  s.c[42] = 42;
+  return s;
+}
+
+void
+baz (struct S *p)
+{
+  foo (1);
+  *p = bar (0);
+}
+
+void
+qux (int x, struct S *p)
+{
+  if (x == 25)
+    x = foo (2);
+  else if (x == 42)
+    x = foo (foo (3));
+  *p = bar (x);
+}
+
+void
+corge (int x, struct S *p)
+{
+  void *q[] = { &&l1, &&l2, &&l3, &&l3 };
+  if (x == 25)
+    {
+    l1:
+      x = foo (2);
+    }
+  else if (x == 42)
+    {
+    l2:
+      x = foo (foo (3));
+    }
+l3:
+  *p = bar (x);
+  if (x < 4)
+    goto *q[x & 3];
+}
+
+void
+freddy (int x, struct S *p)
+{
+  *p = bar (x);
+  ++p;
+  if (x == 25)
+    x = foo (2);
+  else if (x == 42)
+    x = foo (foo (3));
+  *p = bar (x);
+}
--- gcc/testsuite/gcc.dg/ubsan/pr112709-2.c.jj	2024-03-12 12:34:56.027880687 +0100
+++ gcc/testsuite/gcc.dg/ubsan/pr112709-2.c	2024-03-12 12:58:13.305488706 +0100
@@ -0,0 +1,62 @@
+/* PR sanitizer/112709 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -O2" } */
+
+struct S { char c[1024]; } *p;
+int foo (int);
+
+__attribute__((returns_twice, noipa)) int
+bar (struct S x)
+{
+  (void) x.c[0];
+  return 0;
+}
+
+void
+baz (int *y)
+{
+  foo (1);
+  *y = bar (*p);
+}
+
+void
+qux (int x, int *y)
+{
+  if (x == 25)
+    x = foo (2);
+  else if (x == 42)
+    x = foo (foo (3));
+  *y = bar (*p);
+}
+
+void
+corge (int x, int *y)
+{
+  void *q[] = { &&l1, &&l2, &&l3, &&l3 };
+  if (x == 25)
+    {
+    l1:
+      x = foo (2);
+    }
+  else if (x == 42)
+    {
+    l2:
+      x = foo (foo (3));
+    }
+l3:
+  *y = bar (*p);
+  if (x < 4)
+    goto *q[x & 3];
+}
+
+void
+freddy (int x, int *y, struct S *p)
+{
+  bar (*p);
+  ++p;
+  if (x == 25)
+    x = foo (2);
+  else if (x == 42)
+    x = foo (foo (3));
+  *y = bar (*p);
+}


	Jakub
Richard Biener March 12, 2024, 12:47 p.m. UTC | #3
On Tue, 12 Mar 2024, Jakub Jelinek wrote:

> On Tue, Mar 12, 2024 at 11:42:03AM +0100, Richard Biener wrote:
> > > +static edge
> > > +edge_before_returns_twice_call (basic_block bb)
> > > +{
> > > +  gimple_stmt_iterator gsi = gsi_start_nondebug_bb (bb);
> > > +  gcc_checking_assert (is_gimple_call (gsi_stmt (gsi))
> > > +		       && (gimple_call_flags (gsi_stmt (gsi))
> > > +			   & ECF_RETURNS_TWICE) != 0);
> > > +  edge_iterator ei;
> > > +  edge e, ad_edge = NULL, other_edge = NULL;
> > > +  bool split = false;
> > > +  /* Look for a fallthru and possibly a single backedge.  */
> > 
> > I can't see the backedge handling (I'm not sure it's actually required?)
> 
> Oops, that comment line just shouldn't be there.  Grepping around, I've
> pasted it from tree-ssa-uninit.cc (warn_uninit_phi_uses) - just wanted
> to grab something with the edge_iterator, edge declaration and
> FOR_EACH_EDGE on bb->preds) and forgot to remove that comment line.
> 
> Consider it removed.
> 
> > > +  FOR_EACH_EDGE (e, ei, bb->preds)
> > > +    {
> > > +      if ((e->flags & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL)
> > 
> > EDGE_EH is special because of the landing pad?
> 
> Yes, though I'd think a landing pad would need to start with a label.
> Anyway, the (x & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL
> test is what is tested for ABNORMAL_DISPATCHER in other places:
> tree-cfgcleanup.cc-	  if ((ei_edge (ei)->flags
> tree-cfgcleanup.cc:	       & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL)
> tree-cfgcleanup.cc-	    {
> tree-cfgcleanup.cc-	      gimple_stmt_iterator gsi
> tree-cfgcleanup.cc-		= gsi_start_nondebug_after_labels_bb (dest);
> tree-cfgcleanup.cc-	      gimple *g = gsi_stmt (gsi);
> tree-cfgcleanup.cc-	      if (g && gimple_call_internal_p (g, IFN_ABNORMAL_DISPATCHER))
> tree-cfgcleanup.cc-		abnormal_dispatchers.safe_push (dest);
> (that is where I copied it from),
> tree-cfg.cc-  FOR_EACH_EDGE (e, ei, bb->succs)
> tree-cfg.cc:    if ((e->flags & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL)
> tree-cfg.cc-      {
> tree-cfg.cc-	gimple_stmt_iterator gsi
> tree-cfg.cc-	  = gsi_start_nondebug_after_labels_bb (e->dest);
> tree-cfg.cc-	gimple *g = gsi_stmt (gsi);
> tree-cfg.cc-	if (g && gimple_call_internal_p (g, IFN_ABNORMAL_DISPATCHER))
> tree-cfg.cc-	  return e->dest;
> Though, admittedly, git blame shows both were written by myself, so it
> doesn't count much.  The edge from IFN_ABNORMAL_DISPATCHER has EDGE_ABNORMAL
> flag set:
>   make_edge (*dispatcher, for_bb, EDGE_ABNORMAL);
> and I bet some EDGE_* flags can be set on that edge later on, but EDGE_EH
> certainly shouldn't be.
> Bet in both of those preexisting cases and in this new one it could be done
> just as flags & EDGE_ABNORMAL test, though perhaps the tree-cfgcleanup.cc
> and tree-cfg.cc cases can see there EDGE_EH edges and would query the first
> non-debug stmt in them uselessly, while perhaps among the returns_twice
> predecessor EDGE_EH is very unlikely and not worth testing.
> So, if you want, I can change that line to
>       if ((e->flags & EDGE_ABNORMAL) != 0)
> 
> > 
> > > +	{
> > > +	  gimple_stmt_iterator gsi
> > > +	    = gsi_start_nondebug_after_labels_bb (e->src);
> > > +	  gimple *ad = gsi_stmt (gsi);
> > > +	  if (ad && gimple_call_internal_p (ad, IFN_ABNORMAL_DISPATCHER))
> > > +	    {
> > > +	      gcc_checking_assert (ad_edge == NULL);
> > > +	      ad_edge = e;
> > > +	      continue;
> > > +	    }
> > > +	}
> > > +      if (other_edge || e->flags & (EDGE_ABNORMAL | EDGE_EH))
> > > +	split = true;
> > > +      other_edge = e;
> > > +    }
> > > +  gcc_checking_assert (ad_edge);
> > > +  if (other_edge == NULL)
> > > +    split = true;
> > > +  if (split)
> > > +    {
> > > +      other_edge = split_block_after_labels (bb);
> > > +      e = make_edge (ad_edge->src, other_edge->dest, EDGE_ABNORMAL);
> > 
> > In theory there could be multiple abnormal edges.
> 
> Sure, but a function has at most one .ABNORMAL_DISPATCHER block.
> Edge from that block is the only one that needs to be kept pointing to
> the returns_twice function.  In the testcases I've even tried to
> get other abnormal edges there, but because of labels we actually have
> labels on the block that falls through to the returns_twice block.
> I bet for EH edges it is the same.
> 
> >  Did you try
> > to use make_forwarder_block instead of manually doing it?
> 
> Looking at it, it doesn't a lot of unnecessary work.  We don't need
> to do any redirection of anything but moving the .ABNORMAL_DISPATCHER
> edge after splitting the block.  And indeed, it would fail on redirection
> of the EDGE_ABNORMAL edge.

I'm mostly concerned about maintainability here, but let's leave that
aside because of the edge redirection that doesn't work (I don't
see why we couldn't just make it work, but ...)

> >  Or
> > does that fail because we end up redirecting the abnormal edges
> > which you avoid by re-making it new and scrapping the old one?
> > I fail to remember why we can't redirect abnormal edges - yes,
> > there's no flow to be redirected but the process of "redirection"
> > still should work.
> 
> Seems the redirection always wants to adjust the src blocks of the
> edges.  That isn't something that should be done for this particular
> abnormal edge, that one actually can be redirected just by replacing it,
> though for say EH edges we'd need to adjust the on the side EH info etc.
> 
> > > +      for (gphi_iterator gsi = gsi_start_phis (other_edge->src);
> > > +	   !gsi_end_p (gsi); gsi_next (&gsi))
> > > +	{
> > > +	  gphi *phi = gsi.phi ();
> > > +	  tree lhs = gimple_phi_result (phi);
> > > +	  tree new_lhs;
> > > +	  if (virtual_operand_p (lhs))
> > > +	    new_lhs = make_ssa_name (SSA_NAME_VAR (lhs));
> > > +	  else
> > > +	    new_lhs = make_ssa_name (TREE_TYPE (lhs));
> 
> Guess I should use new_lhs = copy_ssa_name (lhs); instead
> 
> > > +	  gimple_phi_set_result (phi, new_lhs);
> > > +	  gphi *new_phi = create_phi_node (lhs, other_edge->dest);
> > > +	  add_phi_arg (new_phi, new_lhs, other_edge, UNKNOWN_LOCATION);
> > > +	  add_phi_arg (new_phi, gimple_phi_arg_def_from_edge (phi, ad_edge),
> > > +		       e, gimple_phi_arg_location_from_edge (phi, ad_edge));
> > > +	}
> > > +      remove_edge (ad_edge);
> > > +    }
> > > +  return other_edge;
> > > +}
> 
> > > +  FOR_EACH_SSA_USE_OPERAND (use_p, g, iter, SSA_OP_USE)
> > > +    {
> > > +      tree s = USE_FROM_PTR (use_p);
> > > +      if (SSA_NAME_DEF_STMT (s)
> > > +	  && gimple_code (SSA_NAME_DEF_STMT (s)) == GIMPLE_PHI
> > > +	  && gimple_bb (SSA_NAME_DEF_STMT (s)) == e->dest)
> > > +	{
> > > +	  tree r = gimple_phi_arg_def_from_edge (SSA_NAME_DEF_STMT (s), e);
> > > +	  SET_USE (use_p, unshare_expr (r));
> > > +	  m = true;
> > > +	}
> > > +    }
> > 
> > Ick.  Doesn't the forwarder keep the PHI defs and new PHI defs
> > are created in e->dest instead, I don't see how we need this?
> 
> Admittedly the above is the ugliest part of the patch IMHO.
> It isn't needed in all cases, but e.g. for the pr112709-2.c (qux) case
> we have before ubsan pass
>   # _7(ab) = PHI <_20(9), _8(ab)(11)>
>   _22 = bar (*_7(ab));
> and want to instrument the *_7(ab) load among the parameters for object
> size.
> So, it wants to add
>   _2 = __builtin_dynamic_object_size (_7(ab), 0);
> sequence and later:
>   .UBSAN_OBJECT_SIZE (_7(ab), 1024, _2, 0);
> before the call insn.  If it isn't a noreturn call, it would just
> add those statements into the same basic block using gsi_insert*before.
> But because it is a returns_twice call, it needs to be done in a different
> block.  And having all of ubsan/asan/bitintlower find that out first and
> figure out that it should use _20 instead of _7(ab) seems hard, especially
> that for cases like:

I would have expected the result as

  # _7 = PHI <_20(9)>
  _2 = __builtin_dynamic_object_size (_7, 0);
  .UBSAN_OBJECT_SIZE (_7(ab), 1024, _2, 0);
  // fallthru

  # _99(ab) = PHI <_7, _8(ab)(11)>
  _22 = bar (*_99(ab));

where all uses of _7 in the IL before splitting get replaced via
their immediate uses but the to-be inserted IL can continue using
the defs (that requires the not yet inserted IL to not have update_stmt
called on it, of course).

I think split_block would have the PHIs organized that way already,
you are adding the _99 defs anyway, right?  That is,

+         tree new_lhs = copy_ssa_name (lhs);
+         gimple_phi_set_result (phi, new_lhs);
+         gphi *new_phi = create_phi_node (lhs, other_edge->dest);

here you swap the defs, I'd omit that.  And instead essentially do
replace_uses_by (lhs, new_lhs) here (without the folding and stuff).
You have to clear SSA_NAME_OCCURS_IN_ABNORMAL_PHI on the old
and set it on the new LHS if it was set.

I think that should contain the "SSA update" to
edge_before_returns_twice_call for the case it splits the block?
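
Roughly (an untested sketch just to illustrate the idea, reusing the names
from edge_before_returns_twice_call in your patch; replace_uses_by here
stands for that or an equivalent immediate-use walk without the folding):

      for (gphi_iterator gsi = gsi_start_phis (other_edge->src);
           !gsi_end_p (gsi); gsi_next (&gsi))
        {
          gphi *phi = gsi.phi ();
          tree lhs = gimple_phi_result (phi);
          /* Keep LHS on the PHI feeding the to-be-inserted code and rewrite
             the pre-existing uses to a fresh name defined by the new PHI.  */
          tree new_lhs = copy_ssa_name (lhs);
          replace_uses_by (lhs, new_lhs);
          gphi *new_phi = create_phi_node (new_lhs, other_edge->dest);
          add_phi_arg (new_phi, lhs, other_edge, UNKNOWN_LOCATION);
          add_phi_arg (new_phi, gimple_phi_arg_def_from_edge (phi, ad_edge),
                       e, gimple_phi_arg_location_from_edge (phi, ad_edge));
          /* Move the abnormal-PHI marker from the old name to the new one.  */
          if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
            {
              SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs) = 0;
              SSA_NAME_OCCURS_IN_ABNORMAL_PHI (new_lhs) = 1;
            }
        }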

> void
> freddy (int x, int *y, struct S *p)
> {
>   bar (*p);
>   ++p;
>   if (x == 25)
>     x = foo (2);
>   else if (x == 42)
>     x = foo (foo (3));
>   *y = bar (*p);
> }
> added to pr112709-2.c we'd have:
>   # x_5(ab) = PHI <x_3(ab)(7), x_4(ab)(3), x_24(6), x_21(10)>
>   # p_8(ab) = PHI <p_16(ab)(7), p_7(ab)(3), p_16(ab)(6), p_16(ab)(10)>
>   _26 = bar (*p_8(ab));
> before ubsan and the .ABNORMAL_DISPATCHER block is 3.
> And the caller doesn't know what to replace the p_8(ab) in there with
> until the new bb is created and PHIs are adjusted.
> 
> Though, I ran into ICE on the above testcase, not on the second bar call
> with the multiple edges, but on the first one, because unlike everything
> else added in the testcases, the first bar splits the e edge on
> gsi_insert_on_edge_immediate, so the assumption that e is the edge from
> the single non-EDGE_ABNORMAL predecessor of the returns_twice block
> to the returns_twice block is then no longer true.  Fixed by adding
>   if (new_bb)
>     e = single_succ_edge (new_bb);
> twice.
> 
> > Btw, gsi_insert_before_returns_twice_call might be usable as
> > gsi_insert_after_labels ()?
> 
> It could be called like that too, sure, though the callers would still
> need to check whether they want to insert before some statement or
> "after labels", which for the returns_twice case actually isn't after any
> labels but on an edge, and if there isn't one (other than the EDGE_ABNORMAL
> one), ensure there is just one.
> In theory even gsi_insert_before and gsi_insert_seq_before could do this,
> but I think most of the half thousand users of those don't need that
> behavior and would be surprised if something like that happened.
> 
> > Or alternatively not expose
> > gsi_insert_before_returns_twice_call but instead make
> > split_block () behave "correctly" for abnormals (or add
> > split_block_before_returns_twice_call ()).  But that's just bike-shedding.
> 
> I don't see how that is possible.  The usual case is that no split_block
> is needed; usually the returns_twice block has just a single normal
> predecessor plus the .ABNORMAL_DISPATCHER predecessor.  And in that case
> we shouldn't be creating a further block, just insert on the edge with the
> PHI-result-to-arg replacements.  Similarly, even if we do split it, all
> further instrumentation that should go before the call should go into the
> created block, not result in further splits.
> 
> Anyway, here is an updated patch, with the bogus comment removed,
> copy_ssa_name used, the ICE fix mentioned above, and the freddy functions
> added to the tests.
> 
> Tested so far with make check-gcc check-g++ RUNTESTFLAGS='ubsan.exp asan.exp'
> 
> 2024-03-12  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR sanitizer/112709
> 	* gimple-iterator.h (gsi_insert_before_returns_twice_call,
> 	gsi_insert_seq_before_returns_twice_call): Declare.
> 	* gimple-iterator.cc: Include gimplify.h.
> 	(edge_before_returns_twice_call, gsi_insert_before_returns_twice_call,
> 	gsi_insert_seq_before_returns_twice_call): New functions.
> 	* ubsan.cc (ubsan_insert_before, ubsan_insert_seq_before): New
> 	functions.
> 	(instrument_mem_ref, instrument_pointer_overflow,
> 	instrument_nonnull_arg, instrument_nonnull_return): Use
> 	ubsan_insert_before instead of gsi_insert_before.
> 	(maybe_instrument_pointer_overflow): Use force_gimple_operand,
> 	gimple_seq_add_seq_without_update and ubsan_insert_seq_before
> 	instead of force_gimple_operand_gsi.
> 	(instrument_object_size): Likewise.  Use ubsan_insert_before instead
> 	of gsi_insert_before.
> 
> 	* gcc.dg/ubsan/pr112709-1.c: New test.
> 	* gcc.dg/ubsan/pr112709-2.c: New test.
> 
> --- gcc/gimple-iterator.h.jj	2024-03-12 10:15:41.253529859 +0100
> +++ gcc/gimple-iterator.h	2024-03-12 12:34:56.026880701 +0100
> @@ -93,6 +93,10 @@ extern void gsi_insert_on_edge (edge, gi
>  extern void gsi_insert_seq_on_edge (edge, gimple_seq);
>  extern basic_block gsi_insert_on_edge_immediate (edge, gimple *);
>  extern basic_block gsi_insert_seq_on_edge_immediate (edge, gimple_seq);
> +extern basic_block gsi_insert_before_returns_twice_call (basic_block,
> +							 gimple *);
> +extern basic_block gsi_insert_seq_before_returns_twice_call (basic_block,
> +							     gimple_seq);
>  extern void gsi_commit_edge_inserts (void);
>  extern void gsi_commit_one_edge_insert (edge, basic_block *);
>  extern gphi_iterator gsi_start_phis (basic_block);
> --- gcc/gimple-iterator.cc.jj	2024-03-12 10:15:41.209530471 +0100
> +++ gcc/gimple-iterator.cc	2024-03-12 12:53:26.174467429 +0100
> @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-cfg.h"
>  #include "tree-ssa.h"
>  #include "value-prof.h"
> +#include "gimplify.h"
>  
>  
>  /* Mark the statement STMT as modified, and update it.  */
> @@ -944,3 +945,134 @@ gsi_start_phis (basic_block bb)
>  
>    return i;
>  }
> +
> +/* Helper function for gsi_insert_before_returns_twice_call and
> +   gsi_insert_seq_before_returns_twice_call.  Find edge to
> +   insert statements before returns_twice call at the start of BB,
> +   if there isn't just one, split the bb and adjust PHIs to ensure
> +   that.  */
> +
> +static edge
> +edge_before_returns_twice_call (basic_block bb)
> +{
> +  gimple_stmt_iterator gsi = gsi_start_nondebug_bb (bb);
> +  gcc_checking_assert (is_gimple_call (gsi_stmt (gsi))
> +		       && (gimple_call_flags (gsi_stmt (gsi))
> +			   & ECF_RETURNS_TWICE) != 0);
> +  edge_iterator ei;
> +  edge e, ad_edge = NULL, other_edge = NULL;
> +  bool split = false;
> +  FOR_EACH_EDGE (e, ei, bb->preds)
> +    {
> +      if ((e->flags & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL)
> +	{
> +	  gimple_stmt_iterator gsi
> +	    = gsi_start_nondebug_after_labels_bb (e->src);
> +	  gimple *ad = gsi_stmt (gsi);
> +	  if (ad && gimple_call_internal_p (ad, IFN_ABNORMAL_DISPATCHER))
> +	    {
> +	      gcc_checking_assert (ad_edge == NULL);
> +	      ad_edge = e;
> +	      continue;
> +	    }
> +	}
> +      if (other_edge || e->flags & (EDGE_ABNORMAL | EDGE_EH))
> +	split = true;
> +      other_edge = e;
> +    }
> +  gcc_checking_assert (ad_edge);
> +  if (other_edge == NULL)
> +    split = true;
> +  if (split)
> +    {
> +      other_edge = split_block_after_labels (bb);
> +      e = make_edge (ad_edge->src, other_edge->dest, EDGE_ABNORMAL);
> +      for (gphi_iterator gsi = gsi_start_phis (other_edge->src);
> +	   !gsi_end_p (gsi); gsi_next (&gsi))
> +	{
> +	  gphi *phi = gsi.phi ();
> +	  tree lhs = gimple_phi_result (phi);
> +	  tree new_lhs = copy_ssa_name (lhs);
> +	  gimple_phi_set_result (phi, new_lhs);
> +	  gphi *new_phi = create_phi_node (lhs, other_edge->dest);
> +	  add_phi_arg (new_phi, new_lhs, other_edge, UNKNOWN_LOCATION);
> +	  add_phi_arg (new_phi, gimple_phi_arg_def_from_edge (phi, ad_edge),
> +		       e, gimple_phi_arg_location_from_edge (phi, ad_edge));
> +	}
> +      remove_edge (ad_edge);
> +    }
> +  return other_edge;
> +}
> +
> +/* Insert G before BB which starts with an ECF_RETURNS_TWICE call.
> +   If BB has a single predecessor edge except for edge from
> +   .ABNORMAL_DISPATCHER, insert on that edge, otherwise split
> +   BB before the call, adjust PHIs and insert into the new BB.
> +   If G refers to any results of BB's PHIs, replace them afterwards
> +   with corresponding PHI arg.  */
> +
> +basic_block
> +gsi_insert_before_returns_twice_call (basic_block bb, gimple *g)
> +{
> +  edge e = edge_before_returns_twice_call (bb);
> +  basic_block new_bb = gsi_insert_on_edge_immediate (e, g);
> +  if (new_bb)
> +    e = single_succ_edge (new_bb);
> +  use_operand_p use_p;
> +  ssa_op_iter iter;
> +  bool m = false;
> +  FOR_EACH_SSA_USE_OPERAND (use_p, g, iter, SSA_OP_USE)
> +    {
> +      tree s = USE_FROM_PTR (use_p);
> +      if (SSA_NAME_DEF_STMT (s)
> +	  && gimple_code (SSA_NAME_DEF_STMT (s)) == GIMPLE_PHI
> +	  && gimple_bb (SSA_NAME_DEF_STMT (s)) == e->dest)
> +	{
> +	  tree r = gimple_phi_arg_def_from_edge (SSA_NAME_DEF_STMT (s), e);
> +	  SET_USE (use_p, unshare_expr (r));
> +	  m = true;
> +	}
> +    }
> +  if (m)
> +    update_stmt (g);
> +  return new_bb;
> +}
> +
> +/* Similarly for sequence SEQ.  */
> +
> +basic_block
> +gsi_insert_seq_before_returns_twice_call (basic_block bb, gimple_seq seq)
> +{
> +  if (gimple_seq_empty_p (seq))
> +    return NULL;
> +  gimple *f = gimple_seq_first_stmt (seq);
> +  gimple *l = gimple_seq_last_stmt (seq);
> +  edge e = edge_before_returns_twice_call (bb);
> +  basic_block new_bb = gsi_insert_seq_on_edge_immediate (e, seq);
> +  if (new_bb)
> +    e = single_succ_edge (new_bb);
> +  use_operand_p use_p;
> +  ssa_op_iter iter;
> +  for (gimple_stmt_iterator gsi = gsi_for_stmt (f); ; gsi_next (&gsi))
> +    {
> +      gimple *g = gsi_stmt (gsi);
> +      bool m = false;
> +      FOR_EACH_SSA_USE_OPERAND (use_p, g, iter, SSA_OP_USE)
> +	{
> +	  tree s = USE_FROM_PTR (use_p);
> +	  if (SSA_NAME_DEF_STMT (s)
> +	      && gimple_code (SSA_NAME_DEF_STMT (s)) == GIMPLE_PHI
> +	      && gimple_bb (SSA_NAME_DEF_STMT (s)) == e->dest)
> +	    {
> +	      tree r = gimple_phi_arg_def_from_edge (SSA_NAME_DEF_STMT (s), e);
> +	      SET_USE (use_p, unshare_expr (r));
> +	      m = true;
> +	    }
> +	}
> +      if (m)
> +	update_stmt (g);
> +      if (g == l)
> +	break;
> +    }
> +  return new_bb;
> +}
> --- gcc/ubsan.cc.jj	2024-03-12 10:15:41.306529121 +0100
> +++ gcc/ubsan.cc	2024-03-12 12:34:56.027880687 +0100
> @@ -1432,6 +1432,37 @@ ubsan_expand_vptr_ifn (gimple_stmt_itera
>    return true;
>  }
>  
> +/* Insert G stmt before ITER.  If ITER is a returns_twice call,
> +   insert it on an appropriate edge instead.  */
> +
> +static void
> +ubsan_insert_before (gimple_stmt_iterator *iter, gimple *g)
> +{
> +  gimple *stmt = gsi_stmt (*iter);
> +  if (stmt
> +      && is_gimple_call (stmt)
> +      && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
> +    gsi_insert_before_returns_twice_call (gsi_bb (*iter), g);
> +  else
> +    gsi_insert_before (iter, g, GSI_SAME_STMT);
> +}
> +
> +/* Similarly for sequence SEQ.  */
> +
> +static void
> +ubsan_insert_seq_before (gimple_stmt_iterator *iter, gimple_seq seq)
> +{
> +  if (gimple_seq_empty_p (seq))
> +    return;
> +  gimple *stmt = gsi_stmt (*iter);
> +  if (stmt
> +      && is_gimple_call (stmt)
> +      && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
> +    gsi_insert_seq_before_returns_twice_call (gsi_bb (*iter), seq);
> +  else
> +    gsi_insert_seq_before (iter, seq, GSI_SAME_STMT);
> +}
> +
>  /* Instrument a memory reference.  BASE is the base of MEM, IS_LHS says
>     whether the pointer is on the left hand side of the assignment.  */
>  
> @@ -1458,7 +1489,7 @@ instrument_mem_ref (tree mem, tree base,
>    tree alignt = build_int_cst (pointer_sized_int_node, align);
>    gcall *g = gimple_build_call_internal (IFN_UBSAN_NULL, 3, t, kind, alignt);
>    gimple_set_location (g, gimple_location (gsi_stmt (*iter)));
> -  gsi_insert_before (iter, g, GSI_SAME_STMT);
> +  ubsan_insert_before (iter, g);
>  }
>  
>  /* Perform the pointer instrumentation.  */
> @@ -1485,7 +1516,7 @@ instrument_pointer_overflow (gimple_stmt
>      return;
>    gcall *g = gimple_build_call_internal (IFN_UBSAN_PTR, 2, ptr, off);
>    gimple_set_location (g, gimple_location (gsi_stmt (*gsi)));
> -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  ubsan_insert_before (gsi, g);
>  }
>  
>  /* Instrument pointer arithmetics if any.  */
> @@ -1577,10 +1608,11 @@ maybe_instrument_pointer_overflow (gimpl
>        else
>  	t = fold_convert (sizetype, moff);
>      }
> -  t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, true,
> -				GSI_SAME_STMT);
> -  base_addr = force_gimple_operand_gsi (gsi, base_addr, true, NULL_TREE, true,
> -					GSI_SAME_STMT);
> +  gimple_seq seq, this_seq;
> +  t = force_gimple_operand (t, &seq, true, NULL_TREE);
> +  base_addr = force_gimple_operand (base_addr, &this_seq, true, NULL_TREE);
> +  gimple_seq_add_seq_without_update (&seq, this_seq);
> +  ubsan_insert_seq_before (gsi, seq);
>    instrument_pointer_overflow (gsi, base_addr, t);
>  }
>  
> @@ -2035,7 +2067,7 @@ instrument_nonnull_arg (gimple_stmt_iter
>  	    {
>  	      g = gimple_build_assign (make_ssa_name (TREE_TYPE (arg)), arg);
>  	      gimple_set_location (g, loc[0]);
> -	      gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +	      ubsan_insert_before (gsi, g);
>  	      arg = gimple_assign_lhs (g);
>  	    }
>  
> @@ -2068,7 +2100,7 @@ instrument_nonnull_arg (gimple_stmt_iter
>  	      g = gimple_build_call (fn, 1, data);
>  	    }
>  	  gimple_set_location (g, loc[0]);
> -	  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +	  ubsan_insert_before (gsi, g);
>  	  ubsan_create_edge (g);
>  	}
>        *gsi = gsi_for_stmt (stmt);
> @@ -2124,7 +2156,7 @@ instrument_nonnull_return (gimple_stmt_i
>  	  g = gimple_build_call (fn, 2, data, data2);
>  	}
>        gimple_set_location (g, loc[0]);
> -      gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +      ubsan_insert_before (gsi, g);
>        ubsan_create_edge (g);
>        *gsi = gsi_for_stmt (stmt);
>      }
> @@ -2231,6 +2263,7 @@ instrument_object_size (gimple_stmt_iter
>    tree sizet;
>    tree base_addr = base;
>    gimple *bos_stmt = NULL;
> +  gimple_seq seq = NULL;
>    if (decl_p)
>      base_addr = build1 (ADDR_EXPR,
>  			build_pointer_type (TREE_TYPE (base)), base);
> @@ -2244,19 +2277,12 @@ instrument_object_size (gimple_stmt_iter
>        sizet = builtin_decl_explicit (BUILT_IN_DYNAMIC_OBJECT_SIZE);
>        sizet = build_call_expr_loc (loc, sizet, 2, base_addr,
>  				   integer_zero_node);
> -      sizet = force_gimple_operand_gsi (gsi, sizet, false, NULL_TREE, true,
> -					GSI_SAME_STMT);
> +      sizet = force_gimple_operand (sizet, &seq, false, NULL_TREE);
>        /* If the call above didn't end up being an integer constant, go one
>  	 statement back and get the __builtin_object_size stmt.  Save it,
>  	 we might need it later.  */
>        if (SSA_VAR_P (sizet))
> -	{
> -	  gsi_prev (gsi);
> -	  bos_stmt = gsi_stmt (*gsi);
> -
> -	  /* Move on to where we were.  */
> -	  gsi_next (gsi);
> -	}
> +	bos_stmt = gsi_stmt (gsi_last (seq));
>      }
>    else
>      return;
> @@ -2298,21 +2324,24 @@ instrument_object_size (gimple_stmt_iter
>        && !TREE_ADDRESSABLE (base))
>      mark_addressable (base);
>  
> +  /* We have to emit the check.  */
> +  gimple_seq this_seq;
> +  t = force_gimple_operand (t, &this_seq, true, NULL_TREE);
> +  gimple_seq_add_seq_without_update (&seq, this_seq);
> +  ptr = force_gimple_operand (ptr, &this_seq, true, NULL_TREE);
> +  gimple_seq_add_seq_without_update (&seq, this_seq);
> +  ubsan_insert_seq_before (gsi, seq);
> +
>    if (bos_stmt
>        && gimple_call_builtin_p (bos_stmt, BUILT_IN_DYNAMIC_OBJECT_SIZE))
>      ubsan_create_edge (bos_stmt);
>  
> -  /* We have to emit the check.  */
> -  t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, true,
> -				GSI_SAME_STMT);
> -  ptr = force_gimple_operand_gsi (gsi, ptr, true, NULL_TREE, true,
> -				  GSI_SAME_STMT);
>    tree ckind = build_int_cst (unsigned_char_type_node,
>  			      is_lhs ? UBSAN_STORE_OF : UBSAN_LOAD_OF);
>    gimple *g = gimple_build_call_internal (IFN_UBSAN_OBJECT_SIZE, 4,
>  					 ptr, t, sizet, ckind);
>    gimple_set_location (g, loc);
> -  gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +  ubsan_insert_before (gsi, g);
>  }
>  
>  /* Instrument values passed to builtin functions.  */
> --- gcc/testsuite/gcc.dg/ubsan/pr112709-1.c.jj	2024-03-12 12:34:56.027880687 +0100
> +++ gcc/testsuite/gcc.dg/ubsan/pr112709-1.c	2024-03-12 12:57:58.993687023 +0100
> @@ -0,0 +1,64 @@
> +/* PR sanitizer/112709 */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=undefined -O2" } */
> +
> +struct S { char c[1024]; };
> +int foo (int);
> +
> +__attribute__((returns_twice, noipa)) struct S
> +bar (int x)
> +{
> +  (void) x;
> +  struct S s = {};
> +  s.c[42] = 42;
> +  return s;
> +}
> +
> +void
> +baz (struct S *p)
> +{
> +  foo (1);
> +  *p = bar (0);
> +}
> +
> +void
> +qux (int x, struct S *p)
> +{
> +  if (x == 25)
> +    x = foo (2);
> +  else if (x == 42)
> +    x = foo (foo (3));
> +  *p = bar (x);
> +}
> +
> +void
> +corge (int x, struct S *p)
> +{
> +  void *q[] = { &&l1, &&l2, &&l3, &&l3 };
> +  if (x == 25)
> +    {
> +    l1:
> +      x = foo (2);
> +    }
> +  else if (x == 42)
> +    {
> +    l2:
> +      x = foo (foo (3));
> +    }
> +l3:
> +  *p = bar (x);
> +  if (x < 4)
> +    goto *q[x & 3];
> +}
> +
> +void
> +freddy (int x, struct S *p)
> +{
> +  *p = bar (x);
> +  ++p;
> +  if (x == 25)
> +    x = foo (2);
> +  else if (x == 42)
> +    x = foo (foo (3));
> +  *p = bar (x);
> +}
> --- gcc/testsuite/gcc.dg/ubsan/pr112709-2.c.jj	2024-03-12 12:34:56.027880687 +0100
> +++ gcc/testsuite/gcc.dg/ubsan/pr112709-2.c	2024-03-12 12:58:13.305488706 +0100
> @@ -0,0 +1,62 @@
> +/* PR sanitizer/112709 */
> +/* { dg-do compile } */
> +/* { dg-options "-fsanitize=undefined -O2" } */
> +
> +struct S { char c[1024]; } *p;
> +int foo (int);
> +
> +__attribute__((returns_twice, noipa)) int
> +bar (struct S x)
> +{
> +  (void) x.c[0];
> +  return 0;
> +}
> +
> +void
> +baz (int *y)
> +{
> +  foo (1);
> +  *y = bar (*p);
> +}
> +
> +void
> +qux (int x, int *y)
> +{
> +  if (x == 25)
> +    x = foo (2);
> +  else if (x == 42)
> +    x = foo (foo (3));
> +  *y = bar (*p);
> +}
> +
> +void
> +corge (int x, int *y)
> +{
> +  void *q[] = { &&l1, &&l2, &&l3, &&l3 };
> +  if (x == 25)
> +    {
> +    l1:
> +      x = foo (2);
> +    }
> +  else if (x == 42)
> +    {
> +    l2:
> +      x = foo (foo (3));
> +    }
> +l3:
> +  *y = bar (*p);
> +  if (x < 4)
> +    goto *q[x & 3];
> +}
> +
> +void
> +freddy (int x, int *y, struct S *p)
> +{
> +  bar (*p);
> +  ++p;
> +  if (x == 25)
> +    x = foo (2);
> +  else if (x == 42)
> +    x = foo (foo (3));
> +  *y = bar (*p);
> +}
> 
> 
> 	Jakub
> 
>
Jakub Jelinek March 12, 2024, 1:12 p.m. UTC | #4
On Tue, Mar 12, 2024 at 01:47:53PM +0100, Richard Biener wrote:
> > Admittedly the above is the ugliest part of the patch IMHO.
> > It isn't needed in all cases, but e.g. for the pr112709-2.c (qux) case
> > we have before ubsan pass
> >   # _7(ab) = PHI <_20(9), _8(ab)(11)>
> >   _22 = bar (*_7(ab));
> > and want to instrument the *_7(ab) load among the parameters for object
> > size.
> > So, it wants to add
> >   _2 = __builtin_dynamic_object_size (_7(ab), 0);
> > sequence and later:
> >   .UBSAN_OBJECT_SIZE (_7(ab), 1024, _2, 0);
> > before the call insn.  If it isn't a noreturn call, it would just
> > add those statements into the same basic block using gsi_insert*before.
> > But because it is a returns_twice call, it needs to be done in a different
> > block.  And having all of ubsan/asan/bitintlower find that out first and
> > figure out that it should use _20 instead of _7(ab) seems hard, especially
> > since for cases like:
> 
> I would have expected the result as
> 
>   # _7 = PHI <_20(9)>
>   _2 = __builtin_dynamic_object_size (_7, 0);
>   .UBSAN_OBJECT_SIZE (_7(ab), 1024, _2, 0);
>   // fallthru
> 
>   # _99(ab) = PHI <_7, _8(ab)(11)>
>   _22 = bar (*_99(ab));
> 
> where all uses of _7 in the IL that existed before the split get replaced
> via their immediate uses, but the to-be-inserted IL can continue using
> the defs (that requires the not-yet-inserted IL to not have update_stmt
> called on it, of course).
> 
> I think split_block would have the PHIs organized that way already,
> you are adding the _99 defs anyway, right?  That is,
> 
> +         tree new_lhs = copy_ssa_name (lhs);
> +         gimple_phi_set_result (phi, new_lhs);
> +         gphi *new_phi = create_phi_node (lhs, other_edge->dest);
> 
> here you swap the defs, I'd omit that.  And instead essentially do
> replace_uses_by (lhs, new_lhs) here (without the folding and stuff).
> You have to clear SSA_NAME_OCCURS_IN_ABNORMAL_PHI on the old
> and set it on the new LHS if it was set.
> 
> I think that should contain the "SSA update" to
> edge_before_returns_twice_call for the case it splits the block?

I have considered that, but there are problems with that.

The most important is that in the most common case, there is no
block splitting nor any edge splitting, all we have is that
  # _99(ab) = PHI <_7(6), _8(ab)(11)>
  _22 = bar (*_99(ab));
or similar.  And we can do several insertions of statements or sequences
before the bar call.  If it is not returns_twice or even not a call,
we want to use _99(ab) in there and insert just before the call/statement.
If it is returns_twice call, we want to replace the _99(ab) uses with
something that is actually usable on the 6->10 edge.
So, if we wanted to keep using the _99(ab) (wouldn't it be even wrong
that it is (ab)?), we'd need to essentially on every addition
add statements like
  _99(ab) = _7;
(for each PHI node except the virtual one) before the added statement or
statements (or add it only if used in the statement/sequence?) and
change the PHI to be
  # _100(ab) = PHI <_99(ab)(6), _8(ab)(11)>
and replace all uses:
  _22 = bar (*_100(ab));
Right now ubsan can insert something like 3 statements/sequences per
argument, so there can be thousands of them and the block could have
thousands of PHIs, so that is millions of useless SSA_NAME copies.

So, the intention of edge_before_returns_twice_call is that in the common
case it just finds the non-EDGE_ABNORMAL edge if there is exactly one, and
if there isn't, it adjusts the IL such that there is just one.
And then the next step is to handle that case.
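The callers can then stay trivial; any other intended user of the helper
(asan, _BitInt lowering) would presumably wrap it the same way the ubsan
wrappers in the patch do, e.g. (hypothetical name, just to illustrate the
pattern):

  /* Insert G before the statement at *GSI, diverting to the edge-based
     helper when that statement is a returns_twice call.  */
  static void
  insert_before_maybe_returns_twice (gimple_stmt_iterator *gsi, gimple *g)
  {
    gimple *stmt = gsi_stmt (*gsi);
    if (stmt
        && is_gimple_call (stmt)
        && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
      gsi_insert_before_returns_twice_call (gsi_bb (*gsi), g);
    else
      gsi_insert_before (gsi, g, GSI_SAME_STMT);
  }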

	Jakub
Richard Biener March 12, 2024, 1:31 p.m. UTC | #5
On Tue, 12 Mar 2024, Jakub Jelinek wrote:

> On Tue, Mar 12, 2024 at 01:47:53PM +0100, Richard Biener wrote:
> > > Admittedly the above is the ugliest part of the patch IMHO.
> > > It isn't needed in all cases, but e.g. for the pr112709-2.c (qux) case
> > > we have before ubsan pass
> > >   # _7(ab) = PHI <_20(9), _8(ab)(11)>
> > >   _22 = bar (*_7(ab));
> > > and want to instrument the *_7(ab) load among the parameters for object
> > > size.
> > > So, it wants to add
> > >   _2 = __builtin_dynamic_object_size (_7(ab), 0);
> > > sequence and later:
> > >   .UBSAN_OBJECT_SIZE (_7(ab), 1024, _2, 0);
> > > before the call insn.  If it isn't a noreturn call, it would just
> > > add those statements into the same basic block using gsi_insert*before.
> > > But because it is a returns_twice call, it needs to be done in a different
> > > block.  And having all of ubsan/asan/bitintlower find that out first and
> > > figure out that it should use _20 instead of _7(ab) seems hard, especially
> > > since for cases like:
> > 
> > I would have expected the result as
> > 
> >   # _7 = PHI <_20(9)>
> >   _2 = __builtin_dynamic_object_size (_7, 0);
> >   .UBSAN_OBJECT_SIZE (_7(ab), 1024, _2, 0);
> >   // fallthru
> > 
> >   # _99(ab) = PHI <_7, _8(ab)(11)>
> >   _22 = bar (*_99(ab));
> > 
> > where all uses of _7 in the IL that existed before the split get replaced
> > via their immediate uses, but the to-be-inserted IL can continue using
> > the defs (that requires the not-yet-inserted IL to not have update_stmt
> > called on it, of course).
> > 
> > I think split_block would have the PHIs organized that way already,
> > you are adding the _99 defs anyway, right?  That is,
> > 
> > +         tree new_lhs = copy_ssa_name (lhs);
> > +         gimple_phi_set_result (phi, new_lhs);
> > +         gphi *new_phi = create_phi_node (lhs, other_edge->dest);
> > 
> > here you swap the defs, I'd omit that.  And instead essentially do
> > replace_uses_by (lhs, new_lhs) here (without the folding and stuff).
> > You have to clear SSA_NAME_OCCURS_IN_ABNORMAL_PHI on the old
> > and set it on the new LHS if it was set.
> > 
> > I think that should contain the "SSA update" to
> > edge_before_returns_twice_call for the case it splits the block?
> 
> I have considered that, but there are problems with that.
> 
> The most important is that in the most common case, there is no
> block splitting nor any edge splitting, all we have is that
>   # _99(ab) = PHI <_7(6), _8(ab)(11)>
>   _22 = bar (*_99(ab));
> or similar.  And we can do several insertions of statements or sequences
> before the bar call.  If it is not returns_twice or even not a call,
> we want to use _99(ab) in there and insert just before the call/statement.
> If it is returns_twice call, we want to replace the _99(ab) uses with
> something that is actually usable on the 6->10 edge.
> So, if we wanted to keep using the _99(ab) (wouldn't it be even wrong
> that it is (ab)?), we'd need to essentially on every addition
> add statements like
>   _99(ab) = _7;
> (for each PHI node except the virtual one) before the added statement or
> statements (or add it only if used in the statement/sequence?) and
> change the PHI to be
>   # _100(ab) = PHI <_99(ab)(6), _8(ab)(11)>
> and replace all uses:
>   _22 = bar (*_100(ab));
> Right now ubsan can insert something like 3 statements/sequences per
> argument, so there can be thousands of them and the block could have
> thousands of PHIs, so that is millions of useless SSA_NAME copies.

Ah, yeah, I see :/

> So, the intention of edge_before_returns_twice_call is that in the common
> case it just finds the non-EDGE_ABNORMAL edge if there is exactly one, and
> if there isn't, it adjusts the IL such that there is just one.
> And then the next step is to handle that case.

So I guess the updated patch is OK then.

Thanks,
Richard.
diff mbox series

Patch

--- gcc/gimple-iterator.h.jj	2024-02-08 11:17:37.144661383 +0100
+++ gcc/gimple-iterator.h	2024-03-11 16:16:24.670464737 +0100
@@ -93,6 +93,10 @@  extern void gsi_insert_on_edge (edge, gi
 extern void gsi_insert_seq_on_edge (edge, gimple_seq);
 extern basic_block gsi_insert_on_edge_immediate (edge, gimple *);
 extern basic_block gsi_insert_seq_on_edge_immediate (edge, gimple_seq);
+extern basic_block gsi_insert_before_returns_twice_call (basic_block,
+							 gimple *);
+extern basic_block gsi_insert_seq_before_returns_twice_call (basic_block,
+							     gimple_seq);
 extern void gsi_commit_edge_inserts (void);
 extern void gsi_commit_one_edge_insert (edge, basic_block *);
 extern gphi_iterator gsi_start_phis (basic_block);
--- gcc/gimple-iterator.cc.jj	2024-02-08 11:17:37.144661383 +0100
+++ gcc/gimple-iterator.cc	2024-03-11 18:17:08.994804808 +0100
@@ -32,6 +32,7 @@  along with GCC; see the file COPYING3.
 #include "tree-cfg.h"
 #include "tree-ssa.h"
 #include "value-prof.h"
+#include "gimplify.h"
 
 
 /* Mark the statement STMT as modified, and update it.  */
@@ -944,3 +945,135 @@  gsi_start_phis (basic_block bb)
 
   return i;
 }
+
+/* Helper function for gsi_insert_before_returns_twice_call and
+   gsi_insert_seq_before_returns_twice_call.  Find edge to
+   insert statements before returns_twice call at the start of BB,
+   if there isn't just one, split the bb and adjust PHIs to ensure
+   that.  */
+
+static edge
+edge_before_returns_twice_call (basic_block bb)
+{
+  gimple_stmt_iterator gsi = gsi_start_nondebug_bb (bb);
+  gcc_checking_assert (is_gimple_call (gsi_stmt (gsi))
+		       && (gimple_call_flags (gsi_stmt (gsi))
+			   & ECF_RETURNS_TWICE) != 0);
+  edge_iterator ei;
+  edge e, ad_edge = NULL, other_edge = NULL;
+  bool split = false;
+  /* Look for a fallthru and possibly a single backedge.  */
+  FOR_EACH_EDGE (e, ei, bb->preds)
+    {
+      if ((e->flags & (EDGE_ABNORMAL | EDGE_EH)) == EDGE_ABNORMAL)
+	{
+	  gimple_stmt_iterator gsi
+	    = gsi_start_nondebug_after_labels_bb (e->src);
+	  gimple *ad = gsi_stmt (gsi);
+	  if (ad && gimple_call_internal_p (ad, IFN_ABNORMAL_DISPATCHER))
+	    {
+	      gcc_checking_assert (ad_edge == NULL);
+	      ad_edge = e;
+	      continue;
+	    }
+	}
+      if (other_edge || e->flags & (EDGE_ABNORMAL | EDGE_EH))
+	split = true;
+      other_edge = e;
+    }
+  gcc_checking_assert (ad_edge);
+  if (other_edge == NULL)
+    split = true;
+  if (split)
+    {
+      other_edge = split_block_after_labels (bb);
+      e = make_edge (ad_edge->src, other_edge->dest, EDGE_ABNORMAL);
+      for (gphi_iterator gsi = gsi_start_phis (other_edge->src);
+	   !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gphi *phi = gsi.phi ();
+	  tree lhs = gimple_phi_result (phi);
+	  tree new_lhs;
+	  if (virtual_operand_p (lhs))
+	    new_lhs = make_ssa_name (SSA_NAME_VAR (lhs));
+	  else
+	    new_lhs = make_ssa_name (TREE_TYPE (lhs));
+	  gimple_phi_set_result (phi, new_lhs);
+	  gphi *new_phi = create_phi_node (lhs, other_edge->dest);
+	  add_phi_arg (new_phi, new_lhs, other_edge, UNKNOWN_LOCATION);
+	  add_phi_arg (new_phi, gimple_phi_arg_def_from_edge (phi, ad_edge),
+		       e, gimple_phi_arg_location_from_edge (phi, ad_edge));
+	}
+      remove_edge (ad_edge);
+    }
+  return other_edge;
+}
+
+/* Insert G before BB which starts with an ECF_RETURNS_TWICE call.
+   If BB has a single predecessor edge except for edge from
+   .ABNORMAL_DISPATCHER, insert on that edge, otherwise split
+   BB before the call, adjust PHIs and insert into the new BB.
+   If G refers to any results of BB's PHIs, replace them afterwards
+   with corresponding PHI arg.  */
+
+basic_block
+gsi_insert_before_returns_twice_call (basic_block bb, gimple *g)
+{
+  edge e = edge_before_returns_twice_call (bb);
+  basic_block new_bb = gsi_insert_on_edge_immediate (e, g);
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  bool m = false;
+  FOR_EACH_SSA_USE_OPERAND (use_p, g, iter, SSA_OP_USE)
+    {
+      tree s = USE_FROM_PTR (use_p);
+      if (SSA_NAME_DEF_STMT (s)
+	  && gimple_code (SSA_NAME_DEF_STMT (s)) == GIMPLE_PHI
+	  && gimple_bb (SSA_NAME_DEF_STMT (s)) == e->dest)
+	{
+	  tree r = gimple_phi_arg_def_from_edge (SSA_NAME_DEF_STMT (s), e);
+	  SET_USE (use_p, unshare_expr (r));
+	  m = true;
+	}
+    }
+  if (m)
+    update_stmt (g);
+  return new_bb;
+}
+
+/* Similarly for sequence SEQ.  */
+
+basic_block
+gsi_insert_seq_before_returns_twice_call (basic_block bb, gimple_seq seq)
+{
+  if (gimple_seq_empty_p (seq))
+    return NULL;
+  gimple *f = gimple_seq_first_stmt (seq);
+  gimple *l = gimple_seq_last_stmt (seq);
+  edge e = edge_before_returns_twice_call (bb);
+  basic_block new_bb = gsi_insert_seq_on_edge_immediate (e, seq);
+  use_operand_p use_p;
+  ssa_op_iter iter;
+  for (gimple_stmt_iterator gsi = gsi_for_stmt (f); ; gsi_next (&gsi))
+    {
+      gimple *g = gsi_stmt (gsi);
+      bool m = false;
+      FOR_EACH_SSA_USE_OPERAND (use_p, g, iter, SSA_OP_USE)
+	{
+	  tree s = USE_FROM_PTR (use_p);
+	  if (SSA_NAME_DEF_STMT (s)
+	      && gimple_code (SSA_NAME_DEF_STMT (s)) == GIMPLE_PHI
+	      && gimple_bb (SSA_NAME_DEF_STMT (s)) == e->dest)
+	    {
+	      tree r = gimple_phi_arg_def_from_edge (SSA_NAME_DEF_STMT (s), e);
+	      SET_USE (use_p, unshare_expr (r));
+	      m = true;
+	    }
+	}
+      if (m)
+	update_stmt (g);
+      if (g == l)
+	break;
+    }
+  return new_bb;
+}
--- gcc/ubsan.cc.jj	2024-01-03 11:51:32.312720350 +0100
+++ gcc/ubsan.cc	2024-03-11 17:07:58.104243341 +0100
@@ -1432,6 +1432,37 @@  ubsan_expand_vptr_ifn (gimple_stmt_itera
   return true;
 }
 
+/* Insert G stmt before ITER.  If ITER is a returns_twice call,
+   insert it on an appropriate edge instead.  */
+
+static void
+ubsan_insert_before (gimple_stmt_iterator *iter, gimple *g)
+{
+  gimple *stmt = gsi_stmt (*iter);
+  if (stmt
+      && is_gimple_call (stmt)
+      && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
+    gsi_insert_before_returns_twice_call (gsi_bb (*iter), g);
+  else
+    gsi_insert_before (iter, g, GSI_SAME_STMT);
+}
+
+/* Similarly for sequence SEQ.  */
+
+static void
+ubsan_insert_seq_before (gimple_stmt_iterator *iter, gimple_seq seq)
+{
+  if (gimple_seq_empty_p (seq))
+    return;
+  gimple *stmt = gsi_stmt (*iter);
+  if (stmt
+      && is_gimple_call (stmt)
+      && (gimple_call_flags (stmt) & ECF_RETURNS_TWICE) != 0)
+    gsi_insert_seq_before_returns_twice_call (gsi_bb (*iter), seq);
+  else
+    gsi_insert_seq_before (iter, seq, GSI_SAME_STMT);
+}
+
 /* Instrument a memory reference.  BASE is the base of MEM, IS_LHS says
    whether the pointer is on the left hand side of the assignment.  */
 
@@ -1458,7 +1489,7 @@  instrument_mem_ref (tree mem, tree base,
   tree alignt = build_int_cst (pointer_sized_int_node, align);
   gcall *g = gimple_build_call_internal (IFN_UBSAN_NULL, 3, t, kind, alignt);
   gimple_set_location (g, gimple_location (gsi_stmt (*iter)));
-  gsi_insert_before (iter, g, GSI_SAME_STMT);
+  ubsan_insert_before (iter, g);
 }
 
 /* Perform the pointer instrumentation.  */
@@ -1485,7 +1516,7 @@  instrument_pointer_overflow (gimple_stmt
     return;
   gcall *g = gimple_build_call_internal (IFN_UBSAN_PTR, 2, ptr, off);
   gimple_set_location (g, gimple_location (gsi_stmt (*gsi)));
-  gsi_insert_before (gsi, g, GSI_SAME_STMT);
+  ubsan_insert_before (gsi, g);
 }
 
 /* Instrument pointer arithmetics if any.  */
@@ -1577,10 +1608,11 @@  maybe_instrument_pointer_overflow (gimpl
       else
 	t = fold_convert (sizetype, moff);
     }
-  t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, true,
-				GSI_SAME_STMT);
-  base_addr = force_gimple_operand_gsi (gsi, base_addr, true, NULL_TREE, true,
-					GSI_SAME_STMT);
+  gimple_seq seq, this_seq;
+  t = force_gimple_operand (t, &seq, true, NULL_TREE);
+  base_addr = force_gimple_operand (base_addr, &this_seq, true, NULL_TREE);
+  gimple_seq_add_seq_without_update (&seq, this_seq);
+  ubsan_insert_seq_before (gsi, seq);
   instrument_pointer_overflow (gsi, base_addr, t);
 }
 
@@ -2035,7 +2067,7 @@  instrument_nonnull_arg (gimple_stmt_iter
 	    {
 	      g = gimple_build_assign (make_ssa_name (TREE_TYPE (arg)), arg);
 	      gimple_set_location (g, loc[0]);
-	      gsi_insert_before (gsi, g, GSI_SAME_STMT);
+	      ubsan_insert_before (gsi, g);
 	      arg = gimple_assign_lhs (g);
 	    }
 
@@ -2068,7 +2100,7 @@  instrument_nonnull_arg (gimple_stmt_iter
 	      g = gimple_build_call (fn, 1, data);
 	    }
 	  gimple_set_location (g, loc[0]);
-	  gsi_insert_before (gsi, g, GSI_SAME_STMT);
+	  ubsan_insert_before (gsi, g);
 	  ubsan_create_edge (g);
 	}
       *gsi = gsi_for_stmt (stmt);
@@ -2124,7 +2156,7 @@  instrument_nonnull_return (gimple_stmt_i
 	  g = gimple_build_call (fn, 2, data, data2);
 	}
       gimple_set_location (g, loc[0]);
-      gsi_insert_before (gsi, g, GSI_SAME_STMT);
+      ubsan_insert_before (gsi, g);
       ubsan_create_edge (g);
       *gsi = gsi_for_stmt (stmt);
     }
@@ -2231,6 +2263,7 @@  instrument_object_size (gimple_stmt_iter
   tree sizet;
   tree base_addr = base;
   gimple *bos_stmt = NULL;
+  gimple_seq seq = NULL;
   if (decl_p)
     base_addr = build1 (ADDR_EXPR,
 			build_pointer_type (TREE_TYPE (base)), base);
@@ -2244,19 +2277,12 @@  instrument_object_size (gimple_stmt_iter
       sizet = builtin_decl_explicit (BUILT_IN_DYNAMIC_OBJECT_SIZE);
       sizet = build_call_expr_loc (loc, sizet, 2, base_addr,
 				   integer_zero_node);
-      sizet = force_gimple_operand_gsi (gsi, sizet, false, NULL_TREE, true,
-					GSI_SAME_STMT);
+      sizet = force_gimple_operand (sizet, &seq, false, NULL_TREE);
       /* If the call above didn't end up being an integer constant, go one
 	 statement back and get the __builtin_object_size stmt.  Save it,
 	 we might need it later.  */
       if (SSA_VAR_P (sizet))
-	{
-	  gsi_prev (gsi);
-	  bos_stmt = gsi_stmt (*gsi);
-
-	  /* Move on to where we were.  */
-	  gsi_next (gsi);
-	}
+	bos_stmt = gsi_stmt (gsi_last (seq));
     }
   else
     return;
@@ -2298,21 +2324,24 @@  instrument_object_size (gimple_stmt_iter
       && !TREE_ADDRESSABLE (base))
     mark_addressable (base);
 
+  /* We have to emit the check.  */
+  gimple_seq this_seq;
+  t = force_gimple_operand (t, &this_seq, true, NULL_TREE);
+  gimple_seq_add_seq_without_update (&seq, this_seq);
+  ptr = force_gimple_operand (ptr, &this_seq, true, NULL_TREE);
+  gimple_seq_add_seq_without_update (&seq, this_seq);
+  ubsan_insert_seq_before (gsi, seq);
+
   if (bos_stmt
       && gimple_call_builtin_p (bos_stmt, BUILT_IN_DYNAMIC_OBJECT_SIZE))
     ubsan_create_edge (bos_stmt);
 
-  /* We have to emit the check.  */
-  t = force_gimple_operand_gsi (gsi, t, true, NULL_TREE, true,
-				GSI_SAME_STMT);
-  ptr = force_gimple_operand_gsi (gsi, ptr, true, NULL_TREE, true,
-				  GSI_SAME_STMT);
   tree ckind = build_int_cst (unsigned_char_type_node,
 			      is_lhs ? UBSAN_STORE_OF : UBSAN_LOAD_OF);
   gimple *g = gimple_build_call_internal (IFN_UBSAN_OBJECT_SIZE, 4,
 					 ptr, t, sizet, ckind);
   gimple_set_location (g, loc);
-  gsi_insert_before (gsi, g, GSI_SAME_STMT);
+  ubsan_insert_before (gsi, g);
 }
 
 /* Instrument values passed to builtin functions.  */
--- gcc/testsuite/gcc.dg/ubsan/pr112709-1.c.jj	2024-03-11 16:55:09.038760951 +0100
+++ gcc/testsuite/gcc.dg/ubsan/pr112709-1.c	2024-03-11 16:55:02.159854950 +0100
@@ -0,0 +1,52 @@ 
+/* PR sanitizer/112709 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -O2" } */
+
+struct S { char c[1024]; };
+int foo (int);
+
+__attribute__((returns_twice, noipa)) struct S
+bar (int x)
+{
+  (void) x;
+  struct S s = {};
+  s.c[42] = 42;
+  return s;
+}
+
+void
+baz (struct S *p)
+{
+  foo (1);
+  *p = bar (0);
+}
+
+void
+qux (int x, struct S *p)
+{
+  if (x == 25)
+    x = foo (2);
+  else if (x == 42)
+    x = foo (foo (3));
+  *p = bar (x);
+}
+
+void
+corge (int x, struct S *p)
+{
+  void *q[] = { &&l1, &&l2, &&l3, &&l3 };
+  if (x == 25)
+    {
+    l1:
+      x = foo (2);
+    }
+  else if (x == 42)
+    {
+    l2:
+      x = foo (foo (3));
+    }
+l3:
+  *p = bar (x);
+  if (x < 4)
+    goto *q[x & 3];
+}
--- gcc/testsuite/gcc.dg/ubsan/pr112709-2.c.jj	2024-03-11 16:55:37.000378840 +0100
+++ gcc/testsuite/gcc.dg/ubsan/pr112709-2.c	2024-03-11 17:13:37.517599492 +0100
@@ -0,0 +1,50 @@ 
+/* PR sanitizer/112709 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=undefined -O2" } */
+
+struct S { char c[1024]; } *p;
+int foo (int);
+
+__attribute__((returns_twice, noipa)) int
+bar (struct S x)
+{
+  (void) x.c[0];
+  return 0;
+}
+
+void
+baz (int *y)
+{
+  foo (1);
+  *y = bar (*p);
+}
+
+void
+qux (int x, int *y)
+{
+  if (x == 25)
+    x = foo (2);
+  else if (x == 42)
+    x = foo (foo (3));
+  *y = bar (*p);
+}
+
+void
+corge (int x, int *y)
+{
+  void *q[] = { &&l1, &&l2, &&l3, &&l3 };
+  if (x == 25)
+    {
+    l1:
+      x = foo (2);
+    }
+  else if (x == 42)
+    {
+    l2:
+      x = foo (foo (3));
+    }
+l3:
+  *y = bar (*p);
+  if (x < 4)
+    goto *q[x & 3];
+}