
Enhance ASAN_CHECK optimization

Message ID 5474B6F8.809@samsung.com
State New

Commit Message

Yury Gribov Nov. 25, 2014, 5:06 p.m. UTC
Hi all,

This patch improves the current optimization of ASAN_CHECKs performed by
the sanopt pass.  In addition to searching for the sanitized pointer in
asan_check_map, it also tries to look up the definition of this pointer.
This allows more checks to be dropped when the definition is not a GIMPLE
value (e.g. a load from a struct field) and thus cannot be propagated by
forwprop.
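
To illustrate the kind of code this targets (a hypothetical example, not
taken from the patch or its testsuite): both dereferences below go through
p->buf, which is a load from a struct field rather than a GIMPLE value, so
forwprop cannot propagate it into the ASAN_CHECK arguments and the two
checks may end up referencing different SSA names even though they guard
the same address.  Matching on the common definition lets sanopt treat
them as checks of the same pointer:

  struct s { char *buf; };

  int
  f (struct s *p)
  {
    *p->buf = 1;     /* ASAN_CHECK on the address loaded from p->buf.  */
    return *p->buf;  /* Check on the same definition; with this patch
                        sanopt can recognize it as dominated.  */
  }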

In my experiments this rarely removed more than 0.5% of ASAN_CHECKs, but
some files saw as much as a 1% improvement, e.g.
* gimple.c: 49 (out of 5293)
* varasm.c: 42 (out of 3678)

In total it removed 2657 checks in Asan-bootstrapped GCC (out of ~500K).

I've Asan-bootstrapped, bootstrapped and regtested this on x64.

Is this ok for stage3?

Best regards,
Yury

Comments

Jakub Jelinek Nov. 26, 2014, 9:01 a.m. UTC | #1
On Tue, Nov 25, 2014 at 08:06:00PM +0300, Yury Gribov wrote:
> +/* Traits class for tree hash maps below.  */
> +
> +struct tree_map_traits : default_hashmap_traits
> +{
> +  static inline hashval_t hash (const_tree ref)
> +    {
> +      return iterative_hash_expr (ref, 0);
> +    }
> +
> +  static inline bool equal_keys (const_tree ref1, const_tree ref2)
> +    {
> +      return operand_equal_p (ref1, ref2, 0);
> +    }

Formatting.  The {} should be indented like static and return 2 columns to
the right of that.
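
For reference, the requested layout would be along these lines:

  static inline hashval_t hash (const_tree ref)
  {
    return iterative_hash_expr (ref, 0);
  }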

> @@ -281,37 +316,46 @@ maybe_optimize_asan_check_ifn (struct sanopt_ctx *ctx, gimple stmt)
>  
>    gimple_set_uid (stmt, info->freeing_call_events);
>  
> -  auto_vec<gimple> &v = ctx->asan_check_map.get_or_insert (ptr);
> -  if (v.is_empty ())
> +  auto_vec<gimple> *ptr_checks = &ctx->asan_check_map.get_or_insert (ptr);
> +  gimple g = maybe_get_dominating_check (*ptr_checks);
> +
> +  tree base_addr = maybe_get_single_definition (ptr);
> +  auto_vec<gimple> *base_checks = NULL;
> +  if (base_addr)
>      {
> -      /* For this PTR we don't have any ASAN_CHECK stmts recorded, so there's
> -	 nothing to optimize yet.  */
> -      v.safe_push (stmt);
> -      return false;
> +      base_checks = &ctx->asan_check_map.get_or_insert (base_addr);
> +      /* Original pointer might have been invalidated.  */
> +      ptr_checks = ctx->asan_check_map.get (ptr);
>      }

For base_addr computation, you don't really need g or ptr_checks,
do you?  So why not move the:
  auto_vec<gimple> *ptr_checks = &ctx->asan_check_map.get_or_insert (ptr);
  gimple g = maybe_get_dominating_check (*ptr_checks);
lines below the if?

> @@ -404,10 +445,7 @@ sanopt_optimize_walker (basic_block bb, struct sanopt_ctx *ctx)
>    basic_block son;
>    gimple_stmt_iterator gsi;
>    sanopt_info *info = (sanopt_info *) bb->aux;
> -  bool asan_check_optimize
> -    = (flag_sanitize & SANITIZE_ADDRESS)
> -      && ((flag_sanitize & flag_sanitize_recover
> -	   & SANITIZE_KERNEL_ADDRESS) == 0);
> +  bool asan_check_optimize = (flag_sanitize & SANITIZE_ADDRESS) != 0;
>  
>    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
>      {

I'm afraid I'm not convinced about this hunk.  If asan (kernel-address) is
recovering, I don't see a difference between not reporting two different
invalid accesses in the same function and not reporting two integer
overflows in the same function, at least if they have different location_t.

	Jakub
Yuri Gribov Nov. 26, 2014, 6:21 p.m. UTC | #2
> Formatting.  The {} should be indented like static
> and return 2 columns to the right of that.

Right.

> For base_addr computation, you don't really need g or ptr_checks, 
> do you?  So why not move the: 
>   auto_vec<gimple> *ptr_checks = &ctx->asan_check_map.get_or_insert (ptr); 
>   gimple g = maybe_get_dominating_check (*ptr_checks); 
> lines below the if?

I can do this.  But then base_checks would be invalidated when I call
get_or_insert for ptr_checks, so I'd still have to call hash_map::get.
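
A minimal sketch of the hazard, with hypothetical names (map, key1, key2):
any get_or_insert may resize the underlying table and invalidate references
obtained from it earlier, so the earlier one has to be re-fetched with get.

  auto_vec<gimple> *a = &map.get_or_insert (key1);
  /* This insertion may grow the hash table...  */
  auto_vec<gimple> *b = &map.get_or_insert (key2);
  /* ...so 'a' may now be dangling and must be looked up again.  */
  a = map.get (key1);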

> If asan (kernel-address) is
> recovering, I don't see a difference between not reporting two different
> invalid accesses in the same function and not reporting two integer
> overflows in the same function, at least if they have different
> location_t.

Ok, agreed. BTW how about replacing '& SANITIZE_KERNEL_ADDRESS' with '&
SANITIZE_ADDRESS'?  I know we do not support recovery for userspace but
having a general enum sounds more logical.

-Y



Jakub Jelinek Nov. 26, 2014, 6:28 p.m. UTC | #3
On Wed, Nov 26, 2014 at 11:21:06AM -0700, ygribov wrote:
> > Formatting.  The {} should be indented like static
> > and return 2 columns to the right of that.
> 
> Right.
> 
> > For base_addr computation, you don't really need g or ptr_checks, 
> > do you?  So why not move the: 
> >   auto_vec<gimple> *ptr_checks = &ctx->asan_check_map.get_or_insert (ptr); 
> >   gimple g = maybe_get_dominating_check (*ptr_checks); 
> > lines below the if?
> 
> I can do this.  But then base_checks would be invalidated when I call
> get_or_insert for ptr_checks, so I'd still have to call hash_map::get.

You're right.

> > If asan (kernel-address) is
> > recovering, I don't see a difference between not reporting two different
> > invalid accesses in the same function and not reporting two integer
> > overflows in the same function, at least if they have different
> > location_t.
> 
> Ok, agreed. BTW how about replacing '& SANITIZE_KERNEL_ADDRESS' with '&
> SANITIZE_ADDRESS'?  I know we do not support recovery for userspace but
> having a general enum sounds more logical.

Testing the SANITIZE_ADDRESS bit in flag_sanitize_recover doesn't make sense;
testing it in flag_sanitize of course does.  But for recover you care whether
the SANITIZE_{KERNEL,USER}_ADDRESS bit in flag_sanitize_recover is set,
depending on which of SANITIZE_{KERNEL,USER}_ADDRESS is set in
flag_sanitize.
So supposedly
((flag_sanitize & flag_sanitize_recover)
 & (SANITIZE_USER_ADDRESS | SANITIZE_KERNEL_ADDRESS)) != 0
is the right check for whether the current address sanitization wants to
recover.
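
Spelled out in the hunk above, that condition would read roughly as follows
(just an illustration of the suggestion, not the final form):

  bool asan_check_optimize
    = (flag_sanitize & SANITIZE_ADDRESS) != 0
      && ((flag_sanitize & flag_sanitize_recover
           & (SANITIZE_USER_ADDRESS | SANITIZE_KERNEL_ADDRESS)) == 0);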

	Jakub
Yuri Gribov Nov. 26, 2014, 6:42 p.m. UTC | #4
> Testing the SANITIZE_ADDRESS bit in flag_sanitize_recover doesn't make sense;
> testing it in flag_sanitize of course does.  But for recover you care whether
> the SANITIZE_{KERNEL,USER}_ADDRESS bit in flag_sanitize_recover is set,
> depending on which of SANITIZE_{KERNEL,USER}_ADDRESS is set in
> flag_sanitize.

Ok, got it. BTW shouldn't we disable local optimization of ASan checks (in
asan.c) as well? That would be a massive perf hit ...

-Y



Jakub Jelinek Nov. 26, 2014, 6:51 p.m. UTC | #5
On Wed, Nov 26, 2014 at 11:42:57AM -0700, ygribov wrote:
> > Testing the SANITIZE_ADDRESS bit in flag_sanitize_recover doesn't make sense;
> > testing it in flag_sanitize of course does.  But for recover you care whether
> > the SANITIZE_{KERNEL,USER}_ADDRESS bit in flag_sanitize_recover is set,
> > depending on which of SANITIZE_{KERNEL,USER}_ADDRESS is set in
> > flag_sanitize.
> 
> Ok, got it. BTW shouldn't we disable local optimization of ASan checks (in
> asan.c) as well? That would be a massive perf hit ...

Ah, you're right, we are already doing that.  So let's just optimize always,
even when recovering, and we'll see whether users complain too much.

	Jakub

Patch

From 85f65c403f132245e9efcc8a420269f8d631fae6 Mon Sep 17 00:00:00 2001
From: Yury Gribov <y.gribov@samsung.com>
Date: Tue, 25 Nov 2014 11:49:11 +0300
Subject: [PATCH] 2014-11-25  Yury Gribov  <y.gribov@samsung.com>

gcc/
	* sanopt.c (maybe_get_single_definition): New function.
	(struct tree_map_traits): New struct.
	(struct sanopt_ctx): Use custom traits for asan_check_map.
	(maybe_get_dominating_check): New function.
	(maybe_optimize_ubsan_null_ifn): Move code to
	maybe_get_dominating_check.
	(maybe_optimize_asan_check_ifn): Ditto. Take non-SSA expressions
	into account when optimizing.
	(sanopt_optimize_walker): Do not treat recoverable sanitization
	specially.
---
 gcc/sanopt.c |  194 +++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 116 insertions(+), 78 deletions(-)

diff --git a/gcc/sanopt.c b/gcc/sanopt.c
index e1d11e0..9fe87de 100644
--- a/gcc/sanopt.c
+++ b/gcc/sanopt.c
@@ -84,6 +84,35 @@  struct sanopt_info
   bool visited_p;
 };
 
+/* If T has a single definition of form T = T2, return T2.  */
+
+static tree
+maybe_get_single_definition (tree t)
+{
+  if (TREE_CODE (t) == SSA_NAME)
+    {
+      gimple g = SSA_NAME_DEF_STMT (t);
+      if (gimple_assign_single_p (g))
+	return gimple_assign_rhs1 (g);
+    }
+  return NULL_TREE;
+}
+
+/* Traits class for tree hash maps below.  */
+
+struct tree_map_traits : default_hashmap_traits
+{
+  static inline hashval_t hash (const_tree ref)
+    {
+      return iterative_hash_expr (ref, 0);
+    }
+
+  static inline bool equal_keys (const_tree ref1, const_tree ref2)
+    {
+      return operand_equal_p (ref1, ref2, 0);
+    }
+}; 
+
 /* This is used to carry various hash maps and variables used
    in sanopt_optimize_walker.  */
 
@@ -95,7 +124,7 @@  struct sanopt_ctx
 
   /* This map maps a pointer (the second argument of ASAN_CHECK) to
      a vector of ASAN_CHECK call statements that check the access.  */
-  hash_map<tree, auto_vec<gimple> > asan_check_map;
+  hash_map<tree, auto_vec<gimple>, tree_map_traits> asan_check_map;
 
   /* Number of IFN_ASAN_CHECK statements.  */
   int asan_num_accesses;
@@ -197,6 +226,24 @@  imm_dom_path_with_freeing_call (basic_block bb, basic_block dom)
   return false;
 }
 
+/* Get the first dominating check from the list of stored checks.
+   Non-dominating checks are silently dropped.  */
+
+static gimple
+maybe_get_dominating_check (auto_vec<gimple> &v)
+{
+  for (; !v.is_empty (); v.pop ())
+    {
+      gimple g = v.last ();
+      sanopt_info *si = (sanopt_info *) gimple_bb (g)->aux;
+      if (!si->visited_p)
+	/* At this point we shouldn't have any statements
+	   that aren't dominating the current BB.  */
+	return g;
+    }
+  return NULL;
+}
+
 /* Optimize away redundant UBSAN_NULL calls.  */
 
 static bool
@@ -209,7 +256,8 @@  maybe_optimize_ubsan_null_ifn (struct sanopt_ctx *ctx, gimple stmt)
   bool remove = false;
 
   auto_vec<gimple> &v = ctx->null_check_map.get_or_insert (ptr);
-  if (v.is_empty ())
+  gimple g = maybe_get_dominating_check (v);
+  if (!g)
     {
       /* For this PTR we don't have any UBSAN_NULL stmts recorded, so there's
 	 nothing to optimize yet.  */
@@ -220,43 +268,30 @@  maybe_optimize_ubsan_null_ifn (struct sanopt_ctx *ctx, gimple stmt)
   /* We already have recorded a UBSAN_NULL check for this pointer. Perhaps we
      can drop this one.  But only if this check doesn't specify stricter
      alignment.  */
-  while (!v.is_empty ())
-    {
-      gimple g = v.last ();
-      /* Remove statements for BBs that have been already processed.  */
-      sanopt_info *si = (sanopt_info *) gimple_bb (g)->aux;
-      if (si->visited_p)
-	{
-	  v.pop ();
-	  continue;
-	}
 
-      /* At this point we shouldn't have any statements that aren't dominating
-	 the current BB.  */
-      tree align = gimple_call_arg (g, 2);
-      int kind = tree_to_shwi (gimple_call_arg (g, 1));
-      /* If this is a NULL pointer check where we had segv anyway, we can
-	 remove it.  */
-      if (integer_zerop (align)
-	  && (kind == UBSAN_LOAD_OF
-	      || kind == UBSAN_STORE_OF
-	      || kind == UBSAN_MEMBER_ACCESS))
-	remove = true;
-      /* Otherwise remove the check in non-recovering mode, or if the
-	 stmts have same location.  */
-      else if (integer_zerop (align))
-	remove = (flag_sanitize_recover & SANITIZE_NULL) == 0
-		 || flag_sanitize_undefined_trap_on_error
-		 || gimple_location (g) == gimple_location (stmt);
-      else if (tree_int_cst_le (cur_align, align))
-	remove = (flag_sanitize_recover & SANITIZE_ALIGNMENT) == 0
-		 || flag_sanitize_undefined_trap_on_error
-		 || gimple_location (g) == gimple_location (stmt);
-      if (!remove && gimple_bb (g) == gimple_bb (stmt)
-	  && tree_int_cst_compare (cur_align, align) == 0)
-	v.pop ();
-      break;
-    }
+  tree align = gimple_call_arg (g, 2);
+  int kind = tree_to_shwi (gimple_call_arg (g, 1));
+  /* If this is a NULL pointer check where we had segv anyway, we can
+     remove it.  */
+  if (integer_zerop (align)
+      && (kind == UBSAN_LOAD_OF
+	  || kind == UBSAN_STORE_OF
+	  || kind == UBSAN_MEMBER_ACCESS))
+    remove = true;
+  /* Otherwise remove the check in non-recovering mode, or if the
+     stmts have same location.  */
+  else if (integer_zerop (align))
+    remove = (flag_sanitize_recover & SANITIZE_NULL) == 0
+	      || flag_sanitize_undefined_trap_on_error
+	      || gimple_location (g) == gimple_location (stmt);
+  else if (tree_int_cst_le (cur_align, align))
+    remove = (flag_sanitize_recover & SANITIZE_ALIGNMENT) == 0
+	      || flag_sanitize_undefined_trap_on_error
+	      || gimple_location (g) == gimple_location (stmt);
+
+  if (!remove && gimple_bb (g) == gimple_bb (stmt)
+      && tree_int_cst_compare (cur_align, align) == 0)
+    v.pop ();
 
   if (!remove)
     v.safe_push (stmt);
@@ -281,37 +316,46 @@  maybe_optimize_asan_check_ifn (struct sanopt_ctx *ctx, gimple stmt)
 
   gimple_set_uid (stmt, info->freeing_call_events);
 
-  auto_vec<gimple> &v = ctx->asan_check_map.get_or_insert (ptr);
-  if (v.is_empty ())
+  auto_vec<gimple> *ptr_checks = &ctx->asan_check_map.get_or_insert (ptr);
+  gimple g = maybe_get_dominating_check (*ptr_checks);
+
+  tree base_addr = maybe_get_single_definition (ptr);
+  auto_vec<gimple> *base_checks = NULL;
+  if (base_addr)
     {
-      /* For this PTR we don't have any ASAN_CHECK stmts recorded, so there's
-	 nothing to optimize yet.  */
-      v.safe_push (stmt);
-      return false;
+      base_checks = &ctx->asan_check_map.get_or_insert (base_addr);
+      /* Original pointer might have been invalidated.  */
+      ptr_checks = ctx->asan_check_map.get (ptr);
     }
 
-  /* We already have recorded a ASAN_CHECK check for this pointer.  Perhaps
-     we can drop this one.  But only if this check doesn't specify larger
-     size.  */
-  while (!v.is_empty ())
+  auto_vec<gimple> *checks = ptr_checks;
+
+  if (!g)
     {
-      gimple g = v.last ();
-      /* Remove statements for BBs that have been already processed.  */
-      sanopt_info *si = (sanopt_info *) gimple_bb (g)->aux;
-      if (si->visited_p)
-	v.pop ();
-      else
-	break;
+      /* Try with base address as well.  */
+      if (base_checks)
+	{
+	  g = maybe_get_dominating_check (*base_checks);
+	  checks = base_checks;
+	}
+      if (!g)
+	{
+	  /* For this PTR we don't have any ASAN_CHECK stmts recorded, so there's
+	     nothing to optimize yet.  */
+	  ptr_checks->safe_push (stmt);
+	  if (base_checks)
+	    base_checks->safe_push (stmt);
+	  return false;
+	}
     }
 
   unsigned int i;
-  gimple g;
   gimple to_pop = NULL;
   bool remove = false;
   basic_block last_bb = bb;
   bool cleanup = false;
 
-  FOR_EACH_VEC_ELT_REVERSE (v, i, g)
+  FOR_EACH_VEC_ELT_REVERSE (*checks, i, g)
     {
       basic_block gbb = gimple_bb (g);
       sanopt_info *si = (sanopt_info *) gbb->aux;
@@ -323,17 +367,9 @@  maybe_optimize_asan_check_ifn (struct sanopt_ctx *ctx, gimple stmt)
 	  continue;
 	}
 
-      if (TREE_CODE (len) != INTEGER_CST)
-	{
-	  /* If there is some stmts not followed by freeing call event
-	     for ptr already in the current bb, no need to insert anything.
-	     Non-constant len is treated just as length 1.  */
-	  if (gbb == bb)
-	    return false;
-	  break;
-	}
-
       tree glen = gimple_call_arg (g, 2);
+      gcc_assert (TREE_CODE (glen) == INTEGER_CST);
+
       /* If we've checked only smaller length than we want to check now,
 	 we can't remove the current stmt.  If g is in the same basic block,
 	 we want to remove it though, as the current stmt is better.  */
@@ -369,22 +405,27 @@  maybe_optimize_asan_check_ifn (struct sanopt_ctx *ctx, gimple stmt)
 
   if (cleanup)
     {
-      unsigned int j = 0, l = v.length ();
+      unsigned int j = 0, l = checks->length ();
       for (i = 0; i < l; i++)
-	if (v[i] != to_pop
-	    && (gimple_uid (v[i])
+	if ((*checks)[i] != to_pop
+	    && (gimple_uid ((*checks)[i])
 		== ((sanopt_info *)
-		    gimple_bb (v[i])->aux)->freeing_call_events))
+		    gimple_bb ((*checks)[i])->aux)->freeing_call_events))
 	  {
 	    if (i != j)
-	      v[j] = v[i];
+	      (*checks)[j] = (*checks)[i];
 	    j++;
 	  }
-      v.truncate (j);
+      checks->truncate (j);
     }
 
   if (!remove)
-    v.safe_push (stmt);
+    {
+      ptr_checks->safe_push (stmt);
+      if (base_checks)
+	base_checks->safe_push (stmt);
+    }
+
   return remove;
 }
 
@@ -404,10 +445,7 @@  sanopt_optimize_walker (basic_block bb, struct sanopt_ctx *ctx)
   basic_block son;
   gimple_stmt_iterator gsi;
   sanopt_info *info = (sanopt_info *) bb->aux;
-  bool asan_check_optimize
-    = (flag_sanitize & SANITIZE_ADDRESS)
-      && ((flag_sanitize & flag_sanitize_recover
-	   & SANITIZE_KERNEL_ADDRESS) == 0);
+  bool asan_check_optimize = (flag_sanitize & SANITIZE_ADDRESS) != 0;
 
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
     {
-- 
1.7.9.5