Add support to trace comparison instructions and switch statements
diff mbox

Message ID 234840fd-a06a-4dfd-a1c5-254e26144754.weixi.wwx@antfin.com
State New
Headers show

Commit Message

=?UTF-8?B?5ZC05r2N5rWgKOatpOW9vCk=?= July 10, 2017, 12:07 p.m. UTC
Hi

I write some codes to make gcc support comparison-guided fuzzing.
It is very like http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow .
With -fsanitize-coverage=trace-cmp the compiler will insert extra instrumentation around comparison instructions and switch statements.
I think it is useful for fuzzing.  :D

Patch is below, I may supply test cases later.

With Regards
Wish Wu

Comments

Jeff Law July 14, 2017, 7:37 a.m. UTC | #1
On 07/10/2017 06:07 AM, 吴潍浠(此彼) wrote:
> Hi
> 
> I write some codes to make gcc support comparison-guided fuzzing.
> It is very like http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow .
> With -fsanitize-coverage=trace-cmp the compiler will insert extra instrumentation around comparison instructions and switch statements.
> I think it is useful for fuzzing.  :D
> 
> Patch is below, I may supply test cases later.
Before anyone can really look at this code you'll need to get a
copyright assignment on file with the FSF.

See:
https://gcc.gnu.org/contribute.html

If you've already done this, please let me know and I'll confirm with
the FSF copyright clerk.

Jeff
David Edelsohn July 21, 2017, 1:14 p.m. UTC | #2
On Fri, Jul 21, 2017 at 1:38 AM, 吴潍浠(此彼) <weixi.wwx@antfin.com> wrote:
> Hi Jeff
>
> I have signed the copyright assignment, and used the name 'Wish Wu' .
> Should I send you a copy of my assignment ?

Your assignment now is on file in the FSF Copyright Assignment list
where Jeff, I and other maintainers can see it.  We cannot accept
assurances from developers; please do not send copies of copyright
assignments.

Thanks, David

P.S. It normally is unnecessary to send email to both GCC and GCC
Patches mailing lists.
Ed Smith-Rowland via gcc-patches Aug. 30, 2017, 7:05 p.m. UTC | #3
On Sat, Aug 5, 2017 at 11:53 AM, 吴潍浠(此彼) <weixi.wwx@antfin.com> wrote:
> Hi all
> Is it worth adding my codes to gcc ? Are there some steps I need to do ?
> Could somebody tell me the progress ?


FYI, we've mailed a Linux kernel change that uses this instrumentation:
https://groups.google.com/forum/#!topic/syzkaller/r0ARNVV-Bhg
Another reason to have this in gcc.

Can somebody from gcc maintainers take a look at this? Jakub?

Thanks


> Maybe there should be a project like libfuzzer to solve bugs in program.
>
> Wish Wu
> ------------------------------------------------------------------
> From:Wish Wu <weixi.wwx@antfin.com>
> Time:2017 Jul 21 (Fri) 13:38
> To:gcc <gcc@gcc.gnu.org>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>
> Cc:wishwu007 <wishwu007@gmail.com>
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> Hi Jeff
>
> I have signed the copyright assignment, and used the name 'Wish Wu' .
> Should I send you a copy of my assignment ?
>
> The attachment is my new patch with small changes.
> Codes are checked by ./contrib/check_GNU_style.sh, except some special files.
>
> With
>
> ------------------------------------------------------------------
> From:Jeff Law <law@redhat.com>
> Time:2017 Jul 14 (Fri) 15:37
> To:Wish Wu <weixi.wwx@antfin.com>; gcc <gcc@gcc.gnu.org>; gcc-patches <gcc-patches@gcc.gnu.org>
> Cc:wishwu007 <wishwu007@gmail.com>
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> On 07/10/2017 06:07 AM, 吴潍浠(此彼) wrote:
>> Hi
>>
>> I write some codes to make gcc support comparison-guided fuzzing.
>> It is very like http://clang.llvm.org/docs/SanitizerCoverage.html#tracing-data-flow .
>> With -fsanitize-coverage=trace-cmp the compiler will insert extra instrumentation around comparison instructions and switch statements.
>> I think it is useful for fuzzing.  :D
>>
>> Patch is below, I may supply test cases later.
> Before anyone can really look at this code you'll need to get a
> copyright assignment on file with the FSF.
>
> See:
> https://gcc.gnu.org/contribute.html
>
> If you've already done this, please let me know and I'll confirm with
> the FSF copyright clerk.
>
> Jeff
Jakub Jelinek Sept. 1, 2017, 4:23 p.m. UTC | #4
On Fri, Jul 21, 2017 at 01:38:17PM +0800, 吴潍浠(此彼) wrote:
> Hi Jeff
> 
> I have signed the copyright assignment, and used the name 'Wish Wu' .
> Should I send you a copy of my assignment ?
> 
> The attachment is my new patch with small changes. 
> Codes are checked by ./contrib/check_GNU_style.sh, except some special files.

Please provide a ChangeLog entry, you can use ./contrib/mklog as a start.

@@ -975,6 +974,10 @@ fsanitize=
 Common Driver Report Joined
 Select what to sanitize.
 
+fsanitize-coverage=
+Common Driver Report Joined
+Select what to coverage sanitize.
+

Why Driver?  The reason fsanitize= needs it is that say for
-fsanitize=address we add libraries in the driver, etc., but that
isn't the case for the coverage, right?

--- gcc/flag-types.h	(revision 250199)
+++ gcc/flag-types.h	(working copy)
@@ -250,6 +250,14 @@ enum sanitize_code {
 				  | SANITIZE_BOUNDS_STRICT
 };
 
+/* Different trace modes.  */
+enum sanitize_coverage_code {
+  /* Trace PC.  */
+  SANITIZE_COV_TRACE_PC = 1UL << 0,
+  /* Trace Compare.  */
+  SANITIZE_COV_TRACE_CMP = 1UL << 1
+};

No need for UL suffixes, the reason sanitize_code uses them is
that it includes 1 << 16 and above and might be included even in target code
(for host we require 32-bit integers, for target code it might be just
16-bit).

--- gcc/opts.c	(revision 250199)
+++ gcc/opts.c	(working copy)
@@ -1519,6 +1519,17 @@ const struct sanitizer_opts_s sanitizer_opts[] =
   { NULL, 0U, 0UL, false }
 };
 
+/* -f{,no-}sanitize-coverage= suboptions.  */
+const struct sanitizer_opts_s coverage_sanitizer_opts[] =
+{
+#define SANITIZER_OPT(name, flags, recover) \
+    { #name, flags, sizeof #name - 1, recover }
+  SANITIZER_OPT (trace-pc, SANITIZE_COV_TRACE_PC, false),
+  SANITIZER_OPT (trace-cmp, SANITIZE_COV_TRACE_CMP, false),
+#undef SANITIZER_OPT
+  { NULL, 0U, 0UL, false }

No need to have the recover argument for the macro, just add false to it
(unless you want to use a different struct type that wouldn't even include
that member).

+/* Given ARG, an unrecognized coverage sanitizer option, return the best
+   matching coverage sanitizer option, or NULL if there isn't one.  */
+
+static const char *
+get_closest_coverage_sanitizer_option (const string_fragment &arg)
+{
+  best_match <const string_fragment &, const char*> bm (arg);
+  for (int i = 0; coverage_sanitizer_opts[i].name != NULL; ++i)
+    {
+      bm.consider (coverage_sanitizer_opts[i].name);
+    }

Body which contains just one line shouldn't be wrapped in {}s, just use
  for (int i = 0; coverage_sanitizer_opts[i].name != NULL; ++i)
    bm.consider (coverage_sanitizer_opts[i].name);

+unsigned int
+parse_coverage_sanitizer_options (const char *p, location_t loc,
+			 unsigned int flags, int value, bool complain)

Wrong formatting, unsigned int should go below const char *, like:

parse_coverage_sanitizer_options (const char *p, location_t loc,
				  unsigned int flags, int value, bool complain)

+{
+  while (*p != 0)
+    {
+      size_t len, i;
+      bool found = false;
+      const char *comma = strchr (p, ',');
+
+      if (comma == NULL)
+	len = strlen (p);
+      else
+	len = comma - p;
+      if (len == 0)
+	{
+	  p = comma + 1;
+	  continue;
+	}
+
+      /* Check to see if the string matches an option class name.  */
+      for (i = 0; coverage_sanitizer_opts[i].name != NULL; ++i)
+	if (len == coverage_sanitizer_opts[i].len
+	    && memcmp (p, coverage_sanitizer_opts[i].name, len) == 0)
+	  {
+	    if (value)
+	      flags |= coverage_sanitizer_opts[i].flag;
+	    else
+	      flags &= ~coverage_sanitizer_opts[i].flag;
+	    found = true;
+	    break;
+	  }
+
+      if (! found && complain)
+	{
+	  const char *hint
+	    = get_closest_coverage_sanitizer_option (string_fragment (p, len));
+
+	  if (hint)
+	    error_at (loc,
+		      "unrecognized argument to "
+		      "-f%ssanitize-coverage= option: %q.*s;"
+		      " did you mean %qs?",
+		      value ? "" : "no-",
+		      (int) len, p, hint);
+	  else
+	    error_at (loc,
+		      "unrecognized argument to "
+		      "-f%ssanitize-coverage= option: %q.*s",
+		      value ? "" : "no-",
+		      (int) len, p);
+	}
+
+      if (comma == NULL)
+	break;
+      p = comma + 1;
+    }
+  return flags;
+}

Though, looking at the above, it sounds like there is just way too much
duplication.  So, maybe better just use the parse_sanitizer_options
and get_closest_coverage_option functions for all of
-f{,no-}sanitize{,-recover,-coverage}= , add
  const struct sanitizer_opts_s *opts = sanitizer_opts;
  if (code == OPT_fsanitize_coverage_)
    opts = coverage_sanitizer_opts;
early in both functions, deal with the diagnostics (to print "-coverage"
when needed) and look if there is anything else that needs to be
conditionalized.

@@ -1943,6 +2032,12 @@ common_handle_option (struct gcc_options *opts,
 	  &= ~(SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT);
       break;
 
+    case OPT_fsanitize_coverage_:
+      opts->x_flag_sanitize_coverage
+	= parse_coverage_sanitizer_options (arg, loc,
+				   opts->x_flag_sanitize_coverage, value, true);

Wrong formatting, opts should go below arg.  But if you use
process_sanitizer_options as mentioned above, it will be a non-issue.
+      break;
+
     case OPT_O:
     case OPT_Os:
     case OPT_Ofast:
Index: gcc/sancov.c
===================================================================
--- gcc/sancov.c	(revision 250199)
+++ gcc/sancov.c	(working copy)
@@ -1,6 +1,7 @@
 /* Code coverage instrumentation for fuzzing.
    Copyright (C) 2015-2017 Free Software Foundation, Inc.
-   Contributed by Dmitry Vyukov <dvyukov@google.com>
+   Contributed by Dmitry Vyukov <dvyukov@google.com> and
+   Wish Wu <wishwu007@gmail.com>
 
 This file is part of GCC.
 
@@ -29,31 +30,202 @@ along with GCC; see the file COPYING3.  If not see
 #include "flags.h"
 #include "stmt.h"
 #include "gimple-iterator.h"
+#include "tree-core.h"

tree-core.h is already included by tree.h, why do you need to include it
explicitly?

 #include "tree-cfg.h"
 #include "tree-pass.h"
 #include "tree-iterator.h"
+#include "fold-const.h"
+#include "stringpool.h"
+#include "output.h"
+#include "cgraph.h"
 #include "asan.h"
 
 namespace {
 
+static void
+instrument_cond (gimple_stmt_iterator *gsi, gimple *stmt)
+{
+  unsigned int bitno;
+  tree lhs = gimple_cond_lhs (stmt);
+  tree rhs = gimple_cond_rhs (stmt);
+
+  bitno = MAX (TYPE_PRECISION (TREE_TYPE (lhs)),
+	       TYPE_PRECISION (TREE_TYPE (rhs)));

1) this is C++, so you can mix declarations and code,
   in this case there are even just declarations before it,
   so you can use unsigned int bitno = ...;
2) better use prec than bitno for the variable name
3) MAX doesn't make sense, a GIMPLE_COND should have
   both operands of compatible types
4) TYPE_PRECISION is only meaningful for some types, and could
   mean something unrelated for some other types, I think you
   want to move it after the check for type kind
+
+  if (TREE_CODE (TREE_TYPE (lhs)) == INTEGER_TYPE)

What about ENUMERAL_TYPEs (and maybe BOOLEAN_TYPEs)?
Shouldn't they be handled like INTEGER_TYPE, i.e. use
  if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
?
+    {
+      enum built_in_function fncode;
+      switch (bitno)
+	{
+	case 8:
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMP1;
+	  break;
+
+	case 16:
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMP2;
+	  break;
+
+	case 32:
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMP4;
+	  break;
+
+	case 64:
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMP8;
+	  break;
+
+	default:
+	  return;
+	}
+      tree fndecl = builtin_decl_implicit (fncode);

For signed integers, when all these builtins have unsigned arguments
it looks like pedantically invalid GIMPLE IL, you'd need to
cast the arguments to corresponding unsigned type.

+      gimple *gcall = gimple_build_call (fndecl, 2, lhs, rhs);
+      gimple_set_location (gcall, gimple_location (stmt));
+      gsi_insert_before (gsi, gcall, GSI_SAME_STMT);
+    }
+  else if (TREE_CODE (TREE_TYPE (lhs)) == REAL_TYPE)

  else if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (lhs))) ?

+    {
+      enum built_in_function fncode;
+      switch (bitno)
+	{
+	case 32:
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPF;
+	  break;
+
+	case 64:
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPD;
+	  break;

Not really sure it is a good idea to use a precision of
REAL_TYPE to decide which builtin to use.
I'd think it should be a check whether TYPE_MODE (TREE_TYPE (lhs))
is equal to TYPE_MODE (float_type_node) or TYPE_MODE (double_type_node).

+static void
+instrument_switch (gimple_stmt_iterator *gsi, gimple *stmt, function *fun)
+{
+  gswitch *switch_stmt = as_a<gswitch *> (stmt);
+  tree index = gimple_switch_index (switch_stmt);
+  unsigned bitno = TYPE_PRECISION (TREE_TYPE (index));

Again, please use prec instead of bitno.
Shouldn't you punt if prec > 64?

+  unsigned i, n = gimple_switch_num_labels (switch_stmt), num = 0;
+  for (i = 0; i < n; ++i)
+    {
+      tree label = gimple_switch_label (switch_stmt, i);
+      tree low_case = CASE_LOW (label);
+      if (low_case != NULL_TREE)
+	  num++;

Wrong formatting, num++ should be just one tab indented, not tab + 2 spaces.

+      tree high_case = CASE_HIGH (label);
+      if (high_case != NULL_TREE)
+	  num++;

Is that really how do you want to handle CASE_HIGH non-NULL?  That is for
cases like:
  case 'a' ... 'z':
In that case it is actually like 26 cases, not 2.

+    }
+
+  tree case_array_elem_type = build_type_variant (uint64_type_node, 1, 0);
+  tree case_array_type = build_array_type (case_array_elem_type,
+					   build_index_type (size_int (num + 2
+								       - 1)));

Better wrap like:
  tree case_array_type
    = build_array_type (...);

+  for (i = 0; i < n; ++i)
+    {
+      tree label = gimple_switch_label (switch_stmt, i);
+
+      tree low_case = CASE_LOW (label);
+      if (low_case != NULL_TREE)
+	CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+				build_int_cst (uint64_type_node,
+					       TREE_INT_CST_LOW (low_case)));

You don't want build_int_cst, instead use fold_convert (uint64_type_node, low_case);

+  /* Insert callback to every compare statments.  */
+  if (flag_sanitize_coverage & SANITIZE_COV_TRACE_CMP)
+    {
+      FOR_EACH_BB_FN (bb, fun)
+	{
+	  gimple_stmt_iterator gsi;
+	  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	    {
+	      gimple *stmt = gsi_stmt (gsi);
+	      switch (gimple_code (stmt))
+		{
+		case GIMPLE_COND:
+		  instrument_cond (&gsi, stmt);
+		  break;
+
+		case GIMPLE_SWITCH:
+		  instrument_switch (&gsi, stmt, fun);
+	    	  break;

If you are just looking for GIMPLE_COND and GIMPLE_SWITCH, then both of them
always satisfy stmt_ends_bb_p and thus you don't need to walk the whole bb
to find them; just look at the last stmt in the bb.

That said, the question is what you really want to instrument and why.
Consider simple:
void bar (void);
void
foo (int x)
{
  if (x == 21 || x == 64 || x == 98 || x == 135)
    bar ();
}
testcase, in GIMPLE IL that will be depending on branch cost e.g. on x86_64:
  <bb 2> [100.00%] [count: INV]:
  _1 = x_8(D) == 21;
  _2 = x_8(D) == 64;
  _3 = _1 | _2;
  if (_3 != 0)
    goto <bb 4>; [33.00%] [count: INV]
  else
    goto <bb 3>; [67.00%] [count: INV]

  <bb 3> [67.00%] [count: INV]:
  _4 = x_8(D) == 98;
  _5 = x_8(D) == 135;
  _6 = _4 | _5;
  if (_6 != 0)
    goto <bb 4>; [50.00%] [count: INV]
  else
    goto <bb 5>; [50.00%] [count: INV]

  <bb 4> [66.50%] [count: INV]:
  bar (); [tail call]

  <bb 5> [100.00%] [count: INV]:
  return;
but e.g. on powerpc64:
  <bb 2> [100.00%] [count: INV]:
  if (x_2(D) == 21)
    goto <bb 6>; [20.24%] [count: INV]
  else
    goto <bb 3>; [79.76%] [count: INV]

  <bb 3> [79.76%] [count: INV]:
  if (x_2(D) == 64)
    goto <bb 6>; [34.00%] [count: INV]
  else
    goto <bb 4>; [66.00%] [count: INV]

  <bb 4> [52.64%] [count: INV]:
  if (x_2(D) == 98)
    goto <bb 6>; [34.00%] [count: INV]
  else
    goto <bb 5>; [66.00%] [count: INV]

  <bb 5> [34.74%] [count: INV]:
  if (x_2(D) == 135)
    goto <bb 6>; [34.00%] [count: INV]
  else
    goto <bb 7>; [66.00%] [count: INV]

  <bb 6> [77.07%] [count: INV]:
  bar (); [tail call]

  <bb 7> [100.00%] [count: INV]:
  return;
So in the powerpc64 case where it is closer to the source you'd instrument
comparisons with the 4 constants, while for x86_64 you wouldn't instrument
anything at all (as you don't instrument GIMPLE_COND with BOOLEAN_TYPE
operands - those have precision 1 and aren't INTEGRAL_TYPE, and those
GIMPLE_CONDs in there are actually artificial anyway and the original
comparisons are GIMPLE_ASSIGNs with EQ_EXPR comparison code).
Do you trace or should you trace the comparison kind, or do you really want
to treat x == 64, x > 64, x < 64, x != 64 etc. the same?
And, should what you trace be tied to basic blocks at whose start you add
the __sanitizer_cov_trace_pc instrumentation for
-fsanitize-coverage=trace-pc or is that completely unrelated?

Perhaps for -fsanitize-coverage= it might be a good idea to force
LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST or whatever affects GIMPLE
decisions mentioned above so that the IL is closer to what the user wrote.

	Jakub
Ed Smith-Rowland via gcc-patches Sept. 3, 2017, 8:50 a.m. UTC | #5
On Fri, Sep 1, 2017 at 6:23 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Jul 21, 2017 at 01:38:17PM +0800, 吴潍浠(此彼) wrote:
>> Hi Jeff
>>
>> I have signed the copyright assignment, and used the name 'Wish Wu' .
>> Should I send you a copy of my assignment ?
>>
>> The attachment is my new patch with small changes.
>> Codes are checked by ./contrib/check_GNU_style.sh, except some special files.
>
> Please provide a ChangeLog entry, you can use ./contrib/mklog as a start.
>
> @@ -975,6 +974,10 @@ fsanitize=
>  Common Driver Report Joined
>  Select what to sanitize.
>
> +fsanitize-coverage=
> +Common Driver Report Joined
> +Select what to coverage sanitize.
> +
>
> Why Driver?  The reason fsanitize= needs it is that say for
> -fsanitize=address we add libraries in the driver, etc., but that
> isn't the case for the coverage, right?


Yes, there is no compiler-provided library that provides
implementation of the emitted instrumentation. User is meant to
provide them (or, use a third-party fuzzer that provides them).



> --- gcc/flag-types.h    (revision 250199)
> +++ gcc/flag-types.h    (working copy)
> @@ -250,6 +250,14 @@ enum sanitize_code {
>                                   | SANITIZE_BOUNDS_STRICT
>  };
>
> +/* Different trace modes.  */
> +enum sanitize_coverage_code {
> +  /* Trace PC.  */
> +  SANITIZE_COV_TRACE_PC = 1UL << 0,
> +  /* Trace Compare.  */
> +  SANITIZE_COV_TRACE_CMP = 1UL << 1
> +};
>
> No need for UL suffixes, the reason sanitize_code uses them is
> that it includes 1 << 16 and above and might be included even in target code
> (for host we require 32-bit integers, for target code it might be just
> 16-bit).
>
> --- gcc/opts.c  (revision 250199)
> +++ gcc/opts.c  (working copy)
> @@ -1519,6 +1519,17 @@ const struct sanitizer_opts_s sanitizer_opts[] =
>    { NULL, 0U, 0UL, false }
>  };
>
> +/* -f{,no-}sanitize-coverage= suboptions.  */
> +const struct sanitizer_opts_s coverage_sanitizer_opts[] =
> +{
> +#define SANITIZER_OPT(name, flags, recover) \
> +    { #name, flags, sizeof #name - 1, recover }
> +  SANITIZER_OPT (trace-pc, SANITIZE_COV_TRACE_PC, false),
> +  SANITIZER_OPT (trace-cmp, SANITIZE_COV_TRACE_CMP, false),
> +#undef SANITIZER_OPT
> +  { NULL, 0U, 0UL, false }
>
> No need to have the recover argument for the macro, just add false to it
> (unless you want to use a different struct type that wouldn't even include
> that member).
>
> +/* Given ARG, an unrecognized coverage sanitizer option, return the best
> +   matching coverage sanitizer option, or NULL if there isn't one.  */
> +
> +static const char *
> +get_closest_coverage_sanitizer_option (const string_fragment &arg)
> +{
> +  best_match <const string_fragment &, const char*> bm (arg);
> +  for (int i = 0; coverage_sanitizer_opts[i].name != NULL; ++i)
> +    {
> +      bm.consider (coverage_sanitizer_opts[i].name);
> +    }
>
> Body which contains just one line shouldn't be wrapped in {}s, just use
>   for (int i = 0; coverage_sanitizer_opts[i].name != NULL; ++i)
>     bm.consider (coverage_sanitizer_opts[i].name);
>
> +unsigned int
> +parse_coverage_sanitizer_options (const char *p, location_t loc,
> +                        unsigned int flags, int value, bool complain)
>
> Wrong formatting, unsigned int should go below const char *, like:
>
> parse_coverage_sanitizer_options (const char *p, location_t loc,
>                                   unsigned int flags, int value, bool complain)
>
> +{
> +  while (*p != 0)
> +    {
> +      size_t len, i;
> +      bool found = false;
> +      const char *comma = strchr (p, ',');
> +
> +      if (comma == NULL)
> +       len = strlen (p);
> +      else
> +       len = comma - p;
> +      if (len == 0)
> +       {
> +         p = comma + 1;
> +         continue;
> +       }
> +
> +      /* Check to see if the string matches an option class name.  */
> +      for (i = 0; coverage_sanitizer_opts[i].name != NULL; ++i)
> +       if (len == coverage_sanitizer_opts[i].len
> +           && memcmp (p, coverage_sanitizer_opts[i].name, len) == 0)
> +         {
> +           if (value)
> +             flags |= coverage_sanitizer_opts[i].flag;
> +           else
> +             flags &= ~coverage_sanitizer_opts[i].flag;
> +           found = true;
> +           break;
> +         }
> +
> +      if (! found && complain)
> +       {
> +         const char *hint
> +           = get_closest_coverage_sanitizer_option (string_fragment (p, len));
> +
> +         if (hint)
> +           error_at (loc,
> +                     "unrecognized argument to "
> +                     "-f%ssanitize-coverage= option: %q.*s;"
> +                     " did you mean %qs?",
> +                     value ? "" : "no-",
> +                     (int) len, p, hint);
> +         else
> +           error_at (loc,
> +                     "unrecognized argument to "
> +                     "-f%ssanitize-coverage= option: %q.*s",
> +                     value ? "" : "no-",
> +                     (int) len, p);
> +       }
> +
> +      if (comma == NULL)
> +       break;
> +      p = comma + 1;
> +    }
> +  return flags;
> +}
>
> Though, looking at the above, it sounds like there is just way too much
> duplication.  So, maybe better just use the parse_sanitizer_options
> and get_closest_coverage_option functions for all of
> -f{,no-}sanitize{,-recover,-coverage}= , add
>   const struct sanitizer_opts_s *opts = sanitizer_opts;
>   if (code == OPT_fsanitize_coverage_)
>     opts = coverage_sanitizer_opts;
> early in both functions, deal with the diagnostics (to print "-coverage"
> when needed) and look if there is anything else that needs to be
> conditionalized.
>
> @@ -1943,6 +2032,12 @@ common_handle_option (struct gcc_options *opts,
>           &= ~(SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT);
>        break;
>
> +    case OPT_fsanitize_coverage_:
> +      opts->x_flag_sanitize_coverage
> +       = parse_coverage_sanitizer_options (arg, loc,
> +                                  opts->x_flag_sanitize_coverage, value, true);
>
> Wrong formatting, opts should go below arg.  But if you use
> process_sanitizer_options as mentioned above, it will be a non-issue.
> +      break;
> +
>      case OPT_O:
>      case OPT_Os:
>      case OPT_Ofast:
> Index: gcc/sancov.c
> ===================================================================
> --- gcc/sancov.c        (revision 250199)
> +++ gcc/sancov.c        (working copy)
> @@ -1,6 +1,7 @@
>  /* Code coverage instrumentation for fuzzing.
>     Copyright (C) 2015-2017 Free Software Foundation, Inc.
> -   Contributed by Dmitry Vyukov <dvyukov@google.com>
> +   Contributed by Dmitry Vyukov <dvyukov@google.com> and
> +   Wish Wu <wishwu007@gmail.com>
>
>  This file is part of GCC.
>
> @@ -29,31 +30,202 @@ along with GCC; see the file COPYING3.  If not see
>  #include "flags.h"
>  #include "stmt.h"
>  #include "gimple-iterator.h"
> +#include "tree-core.h"
>
> tree-core.h is already included by tree.h, why do you need to include it
> explicitly?
>
>  #include "tree-cfg.h"
>  #include "tree-pass.h"
>  #include "tree-iterator.h"
> +#include "fold-const.h"
> +#include "stringpool.h"
> +#include "output.h"
> +#include "cgraph.h"
>  #include "asan.h"
>
>  namespace {
>
> +static void
> +instrument_cond (gimple_stmt_iterator *gsi, gimple *stmt)
> +{
> +  unsigned int bitno;
> +  tree lhs = gimple_cond_lhs (stmt);
> +  tree rhs = gimple_cond_rhs (stmt);
> +
> +  bitno = MAX (TYPE_PRECISION (TREE_TYPE (lhs)),
> +              TYPE_PRECISION (TREE_TYPE (rhs)));
>
> 1) this is C++, so you can mix declarations and code,
>    in this case there are even just declarations before it,
>    so you can use unsigned int bitno = ...;
> 2) better use prec than bitno for the variable name
> 3) MAX doesn't make sense, a GIMPLE_COND should have
>    both operands of compatible types
> 4) TYPE_PRECISION is only meaningful for some types, and could
>    mean something unrelated for some other types, I think you
>    want to move it after the check for type kind
> +
> +  if (TREE_CODE (TREE_TYPE (lhs)) == INTEGER_TYPE)
>
> What about ENUMERAL_TYPEs (and maybe BOOLEAN_TYPEs)?
> Shouldn't they be handled like INTEGER_TYPE, i.e. use
>   if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
> ?
> +    {
> +      enum built_in_function fncode;
> +      switch (bitno)
> +       {
> +       case 8:
> +         fncode = BUILT_IN_SANITIZER_COV_TRACE_CMP1;
> +         break;
> +
> +       case 16:
> +         fncode = BUILT_IN_SANITIZER_COV_TRACE_CMP2;
> +         break;
> +
> +       case 32:
> +         fncode = BUILT_IN_SANITIZER_COV_TRACE_CMP4;
> +         break;
> +
> +       case 64:
> +         fncode = BUILT_IN_SANITIZER_COV_TRACE_CMP8;
> +         break;
> +
> +       default:
> +         return;
> +       }
> +      tree fndecl = builtin_decl_implicit (fncode);
>
> For signed integers, when all these builtins have unsigned arguments
> it looks like pedantically invalid GIMPLE IL, you'd need to
> cast the arguments to corresponding unsigned type.
>
> +      gimple *gcall = gimple_build_call (fndecl, 2, lhs, rhs);
> +      gimple_set_location (gcall, gimple_location (stmt));
> +      gsi_insert_before (gsi, gcall, GSI_SAME_STMT);
> +    }
> +  else if (TREE_CODE (TREE_TYPE (lhs)) == REAL_TYPE)
>
>   else if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (lhs))) ?
>
> +    {
> +      enum built_in_function fncode;
> +      switch (bitno)
> +       {
> +       case 32:
> +         fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPF;
> +         break;
> +
> +       case 64:
> +         fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPD;
> +         break;
>
> Not really sure it is a good idea to use a precision of
> REAL_TYPE to decide which builtin to use.
> I'd think it should be a check whether TYPE_MODE (TREE_TYPE (lhs))
> is equal to TYPE_MODE (float_type_node) or TYPE_MODE (double_type_node).
>
> +static void
> +instrument_switch (gimple_stmt_iterator *gsi, gimple *stmt, function *fun)
> +{
> +  gswitch *switch_stmt = as_a<gswitch *> (stmt);
> +  tree index = gimple_switch_index (switch_stmt);
> +  unsigned bitno = TYPE_PRECISION (TREE_TYPE (index));
>
> Again, please use prec instead of bitno.
> Shouldn't you punt if prec > 64?
>
> +  unsigned i, n = gimple_switch_num_labels (switch_stmt), num = 0;
> +  for (i = 0; i < n; ++i)
> +    {
> +      tree label = gimple_switch_label (switch_stmt, i);
> +      tree low_case = CASE_LOW (label);
> +      if (low_case != NULL_TREE)
> +         num++;
>
> Wrong formatting, num++ should be just one tab indented, not tab + 2 spaces.
>
> +      tree high_case = CASE_HIGH (label);
> +      if (high_case != NULL_TREE)
> +         num++;
>
> Is that really how do you want to handle CASE_HIGH non-NULL?  That is for
> cases like:
>   case 'a' ... 'z':
> In that case it is actually like 26 cases, not 2.
>
> +    }
> +
> +  tree case_array_elem_type = build_type_variant (uint64_type_node, 1, 0);
> +  tree case_array_type = build_array_type (case_array_elem_type,
> +                                          build_index_type (size_int (num + 2
> +                                                                      - 1)));
>
> Better wrap like:
>   tree case_array_type
>     = build_array_type (...);
>
> +  for (i = 0; i < n; ++i)
> +    {
> +      tree label = gimple_switch_label (switch_stmt, i);
> +
> +      tree low_case = CASE_LOW (label);
> +      if (low_case != NULL_TREE)
> +       CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
> +                               build_int_cst (uint64_type_node,
> +                                              TREE_INT_CST_LOW (low_case)));
>
> You don't want build_int_cst, instead use fold_convert (uint64_type_node, low_case);
>
> +  /* Insert callback to every compare statments.  */
> +  if (flag_sanitize_coverage & SANITIZE_COV_TRACE_CMP)
> +    {
> +      FOR_EACH_BB_FN (bb, fun)
> +       {
> +         gimple_stmt_iterator gsi;
> +         for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +           {
> +             gimple *stmt = gsi_stmt (gsi);
> +             switch (gimple_code (stmt))
> +               {
> +               case GIMPLE_COND:
> +                 instrument_cond (&gsi, stmt);
> +                 break;
> +
> +               case GIMPLE_SWITCH:
> +                 instrument_switch (&gsi, stmt, fun);
> +                 break;
>
> If you are just looking for GIMPLE_COND and GIMPLE_SWITCH, then both of them
> always satisfy stmt_ends_bb_p and thus you don't need to walk the whole bb
> to find them; just look at the last stmt in the bb.
>
> That said, the question is what you really want to instrument and why.
> Consider simple:
> void bar (void);
> void
> foo (int x)
> {
>   if (x == 21 || x == 64 || x == 98 || x == 135)
>     bar ();
> }
> testcase, in GIMPLE IL that will be depending on branch cost e.g. on x86_64:
>   <bb 2> [100.00%] [count: INV]:
>   _1 = x_8(D) == 21;
>   _2 = x_8(D) == 64;
>   _3 = _1 | _2;
>   if (_3 != 0)
>     goto <bb 4>; [33.00%] [count: INV]
>   else
>     goto <bb 3>; [67.00%] [count: INV]
>
>   <bb 3> [67.00%] [count: INV]:
>   _4 = x_8(D) == 98;
>   _5 = x_8(D) == 135;
>   _6 = _4 | _5;
>   if (_6 != 0)
>     goto <bb 4>; [50.00%] [count: INV]
>   else
>     goto <bb 5>; [50.00%] [count: INV]
>
>   <bb 4> [66.50%] [count: INV]:
>   bar (); [tail call]
>
>   <bb 5> [100.00%] [count: INV]:
>   return;
> but e.g. on powerpc64:
>   <bb 2> [100.00%] [count: INV]:
>   if (x_2(D) == 21)
>     goto <bb 6>; [20.24%] [count: INV]
>   else
>     goto <bb 3>; [79.76%] [count: INV]
>
>   <bb 3> [79.76%] [count: INV]:
>   if (x_2(D) == 64)
>     goto <bb 6>; [34.00%] [count: INV]
>   else
>     goto <bb 4>; [66.00%] [count: INV]
>
>   <bb 4> [52.64%] [count: INV]:
>   if (x_2(D) == 98)
>     goto <bb 6>; [34.00%] [count: INV]
>   else
>     goto <bb 5>; [66.00%] [count: INV]
>
>   <bb 5> [34.74%] [count: INV]:
>   if (x_2(D) == 135)
>     goto <bb 6>; [34.00%] [count: INV]
>   else
>     goto <bb 7>; [66.00%] [count: INV]
>
>   <bb 6> [77.07%] [count: INV]:
>   bar (); [tail call]
>
>   <bb 7> [100.00%] [count: INV]:
>   return;
> So in the powerpc64 case where it is closer to the source you'd instrument
> comparisons with the 4 constants, while for x86_64 you wouldn't instrument
> anything at all (as you don't instrument GIMPLE_COND with BOOLEAN_TYPE
> operands - those have precision 1 and aren't INTEGRAL_TYPE, and those
> GIMPLE_CONDs in there are actually artificial anyway and the original
> comparisons are GIMPLE_ASSIGNs with EQ_EXPR comparison code).


What we instrument in LLVM is _comparisons_ rather than control
structures. So that would be:
    _4 = x_8(D) == 98;
For example, result of the comparison can be stored into a bool struct
field, and then used in branching long time after. We still want to
intercept this comparison.


> Do you trace or should you trace the comparison kind, or do you really want
> to treat x == 64, x > 64, x < 64, x != 64 etc. the same?
> And, should what you trace be tied to basic blocks at whose start you add
> the __sanitizer_cov_trace_pc instrumentation for
> -fsanitize-coverage=trace-pc or is that completely unrelated?

We don't trace comparison kind in LLVM (did not find any use for it)
and it can't be passed to the current API. So, probably, not.
I would say it is unrelated to trace pc instrumentation.


> Perhaps for -fsanitize-coverage= it might be a good idea to force
> LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST or whatever affects GIMPLE
> decisions mentioned above so that the IL is closer to what the user wrote.

If we recurse down to comparison operations and instrument them, this
will not be so important, right?


Wish, do you plan to update this patch?
Jakub Jelinek Sept. 3, 2017, 10:01 a.m. UTC | #6
On Sun, Sep 03, 2017 at 10:50:16AM +0200, Dmitry Vyukov wrote:
> What we instrument in LLVM is _comparisons_ rather than control
> structures. So that would be:
>     _4 = x_8(D) == 98;
> For example, result of the comparison can be stored into a bool struct
> field, and then used in branching long time after. We still want to
> intercept this comparison.

Then we need to instrument not just GIMPLE_COND, which is the stmt
where the comparison decides to which of the two basic block successors to
jump, but also GIMPLE_ASSIGN with tcc_comparison class
gimple_assign_rhs_code (the comparison above), and maybe also
GIMPLE_ASSIGN with COND_EXPR comparison code (that is say
  _4 = x_1 == y_2 ? 23 : _3;
).

> > Perhaps for -fsanitize-coverage= it might be a good idea to force
> > LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST or whatever affects GIMPLE
> > decisions mentioned above so that the IL is closer to what the user wrote.
> 
> If we recurse down to comparison operations and instrument them, this
> will not be so important, right?

Well, if you just handle tcc_comparison GIMPLE_ASSIGN and not GIMPLE_COND,
then you don't handle many comparisons from the source code.  And if you
handle both, some of the GIMPLE_CONDs might be just artificial comparisons.
By pretending small branch cost for the tracing case you get fewer
artificial comparisons.

	Jakub
Ed Smith-Rowland via gcc-patches Sept. 3, 2017, 10:19 a.m. UTC | #7
On Sun, Sep 3, 2017 at 12:01 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Sun, Sep 03, 2017 at 10:50:16AM +0200, Dmitry Vyukov wrote:
>> What we instrument in LLVM is _comparisons_ rather than control
>> structures. So that would be:
>>     _4 = x_8(D) == 98;
>> For example, result of the comparison can be stored into a bool struct
>> field, and then used in branching long time after. We still want to
>> intercept this comparison.
>
> Then we need to instrument not just GIMPLE_COND, which is the stmt
> where the comparison decides to which of the two basic block successors to
> jump, but also GIMPLE_ASSIGN with tcc_comparison class
> gimple_assign_rhs_code (the comparison above), and maybe also
> GIMPLE_ASSIGN with COND_EXPR comparison code (that is say
>   _4 = x_1 == y_2 ? 23 : _3;
> ).
>
>> > Perhaps for -fsanitize-coverage= it might be a good idea to force
>> > LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST or whatever affects GIMPLE
>> > decisions mentioned above so that the IL is closer to what the user wrote.
>>
>> If we recurse down to comparison operations and instrument them, this
>> will not be so important, right?
>
> Well, if you just handle tcc_comparison GIMPLE_ASSIGN and not GIMPLE_COND,
> then you don't handle many comparisons from the source code.  And if you
> handle both, some of the GIMPLE_CONDs might be just artificial comparisons.
> By pretending small branch cost for the tracing case you get fewer
> artificial comparisons.


Are these artificial comparisons on BOOLEAN_TYPE? I think BOOLEAN_TYPE
needs to be ignored entirely, there is just like 2 combinations of
possible values.
If not, then what it is? Is it a dup of previous comparisons?

I am not saying these modes should not be enabled. You know much
better. I just wanted to point that that integer comparisons is what
we should be handling.

Your example:

  _1 = x_8(D) == 21;
  _2 = x_8(D) == 64;
  _3 = _1 | _2;
  if (_3 != 0)

raises another point. Most likely we don't want to see speculative
comparisons. At least not yet (we will see them once we get through
the first comparison). So that may be another reason to enable these
modes (make compiler stick closer to original code).
Ed Smith-Rowland via gcc-patches Sept. 3, 2017, 10:21 a.m. UTC | #8
On Sun, Sep 3, 2017 at 12:19 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Sun, Sep 3, 2017 at 12:01 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Sun, Sep 03, 2017 at 10:50:16AM +0200, Dmitry Vyukov wrote:
>>> What we instrument in LLVM is _comparisons_ rather than control
>>> structures. So that would be:
>>>     _4 = x_8(D) == 98;
>>> For example, result of the comparison can be stored into a bool struct
>>> field, and then used in branching long time after. We still want to
>>> intercept this comparison.
>>
>> Then we need to instrument not just GIMPLE_COND, which is the stmt
>> where the comparison decides to which of the two basic block successors to
>> jump, but also GIMPLE_ASSIGN with tcc_comparison class
>> gimple_assign_rhs_code (the comparison above), and maybe also
>> GIMPLE_ASSIGN with COND_EXPR comparison code (that is say
>>   _4 = x_1 == y_2 ? 23 : _3;
>> ).
>>
>>> > Perhaps for -fsanitize-coverage= it might be a good idea to force
>>> > LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST or whatever affects GIMPLE
>>> > decisions mentioned above so that the IL is closer to what the user wrote.
>>>
>>> If we recurse down to comparison operations and instrument them, this
>>> will not be so important, right?
>>
>> Well, if you just handle tcc_comparison GIMPLE_ASSIGN and not GIMPLE_COND,
>> then you don't handle many comparisons from the source code.  And if you
>> handle both, some of the GIMPLE_CONDs might be just artificial comparisons.
>> By pretending small branch cost for the tracing case you get fewer
>> artificial comparisons.
>
>
> Are these artificial comparisons on BOOLEAN_TYPE? I think BOOLEAN_TYPE
> needs to be ignored entirely, there is just like 2 combinations of
> possible values.
> If not, then what it is? Is it a dup of previous comparisons?
>
> I am not saying these modes should not be enabled. You know much
> better. I just wanted to point that that integer comparisons is what
> we should be handling.
>
> Your example:
>
>   _1 = x_8(D) == 21;
>   _2 = x_8(D) == 64;
>   _3 = _1 | _2;
>   if (_3 != 0)
>
> raises another point. Most likely we don't want to see speculative
> comparisons. At least not yet (we will see them once we get through
> the first comparison). So that may be another reason to enable these
> modes (make compiler stick closer to original code).

Wait, it is not speculative in this case as branch is on _1 | _2. But
still, it just makes it harder for fuzzer to get through as it needs
to guess both values at the same time rather then doing incremental
progress.
=?UTF-8?B?5ZC05r2N5rWgKOatpOW9vCk=?= Sept. 3, 2017, 10:38 a.m. UTC | #9
Hi
I will update the patch according to your requirements, and with some my suggestions.
It will take me one or two days.

Wish Wu

------------------------------------------------------------------
From:Dmitry Vyukov <dvyukov@google.com>
Time:2017 Sep 3 (Sun) 18:21
To:Jakub Jelinek <jakub@redhat.com>
Cc:Wish Wu <weixi.wwx@antfin.com>; gcc <gcc@gcc.gnu.org>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
Subject:Re: Add support to trace comparison instructions and switch statements


On Sun, Sep 3, 2017 at 12:19 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Sun, Sep 3, 2017 at 12:01 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Sun, Sep 03, 2017 at 10:50:16AM +0200, Dmitry Vyukov wrote:
>>> What we instrument in LLVM is _comparisons_ rather than control
>>> structures. So that would be:
>>>     _4 = x_8(D) == 98;
>>> For example, result of the comparison can be stored into a bool struct
>>> field, and then used in branching long time after. We still want to
>>> intercept this comparison.
>>
>> Then we need to instrument not just GIMPLE_COND, which is the stmt
>> where the comparison decides to which of the two basic block successors to
>> jump, but also GIMPLE_ASSIGN with tcc_comparison class
>> gimple_assign_rhs_code (the comparison above), and maybe also
>> GIMPLE_ASSIGN with COND_EXPR comparison code (that is say
>>   _4 = x_1 == y_2 ? 23 : _3;
>> ).
>>
>>> > Perhaps for -fsanitize-coverage= it might be a good idea to force
>>> > LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST or whatever affects GIMPLE
>>> > decisions mentioned above so that the IL is closer to what the user wrote.
>>>
>>> If we recurse down to comparison operations and instrument them, this
>>> will not be so important, right?
>>
>> Well, if you just handle tcc_comparison GIMPLE_ASSIGN and not GIMPLE_COND,
>> then you don't handle many comparisons from the source code.  And if you
>> handle both, some of the GIMPLE_CONDs might be just artificial comparisons.
>> By pretending small branch cost for the tracing case you get fewer
>> artificial comparisons.
>
>
> Are these artificial comparisons on BOOLEAN_TYPE? I think BOOLEAN_TYPE
> needs to be ignored entirely, there is just like 2 combinations of
> possible values.
> If not, then what it is? Is it a dup of previous comparisons?
>
> I am not saying these modes should not be enabled. You know much
> better. I just wanted to point that that integer comparisons is what
> we should be handling.
>
> Your example:
>
>   _1 = x_8(D) == 21;
>   _2 = x_8(D) == 64;
>   _3 = _1 | _2;
>   if (_3 != 0)
>
> raises another point. Most likely we don't want to see speculative
> comparisons. At least not yet (we will see them once we get through
> the first comparison). So that may be another reason to enable these
> modes (make compiler stick closer to original code).

Wait, it is not speculative in this case as branch is on _1 | _2. But
still, it just makes it harder for fuzzer to get through as it needs
to guess both values at the same time rather then doing incremental
progress.
Ed Smith-Rowland via gcc-patches Sept. 3, 2017, 11:04 a.m. UTC | #10
On Sun, Sep 3, 2017 at 12:38 PM, 吴潍浠(此彼) <weixi.wwx@antfin.com> wrote:
> Hi
> I will update the patch according to your requirements, and with some my suggestions.
> It will take me one or two days.

Thanks! No hurry, just wanted to make sure you still want to pursue this.

> Wish Wu
>
> ------------------------------------------------------------------
> From:Dmitry Vyukov <dvyukov@google.com>
> Time:2017 Sep 3 (Sun) 18:21
> To:Jakub Jelinek <jakub@redhat.com>
> Cc:Wish Wu <weixi.wwx@antfin.com>; gcc <gcc@gcc.gnu.org>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> On Sun, Sep 3, 2017 at 12:19 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Sun, Sep 3, 2017 at 12:01 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Sun, Sep 03, 2017 at 10:50:16AM +0200, Dmitry Vyukov wrote:
>>>> What we instrument in LLVM is _comparisons_ rather than control
>>>> structures. So that would be:
>>>>     _4 = x_8(D) == 98;
>>>> For example, result of the comparison can be stored into a bool struct
>>>> field, and then used in branching long time after. We still want to
>>>> intercept this comparison.
>>>
>>> Then we need to instrument not just GIMPLE_COND, which is the stmt
>>> where the comparison decides to which of the two basic block successors to
>>> jump, but also GIMPLE_ASSIGN with tcc_comparison class
>>> gimple_assign_rhs_code (the comparison above), and maybe also
>>> GIMPLE_ASSIGN with COND_EXPR comparison code (that is say
>>>   _4 = x_1 == y_2 ? 23 : _3;
>>> ).
>>>
>>>> > Perhaps for -fsanitize-coverage= it might be a good idea to force
>>>> > LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST or whatever affects GIMPLE
>>>> > decisions mentioned above so that the IL is closer to what the user wrote.
>>>>
>>>> If we recurse down to comparison operations and instrument them, this
>>>> will not be so important, right?
>>>
>>> Well, if you just handle tcc_comparison GIMPLE_ASSIGN and not GIMPLE_COND,
>>> then you don't handle many comparisons from the source code.  And if you
>>> handle both, some of the GIMPLE_CONDs might be just artificial comparisons.
>>> By pretending small branch cost for the tracing case you get fewer
>>> artificial comparisons.
>>
>>
>> Are these artificial comparisons on BOOLEAN_TYPE? I think BOOLEAN_TYPE
>> needs to be ignored entirely, there is just like 2 combinations of
>> possible values.
>> If not, then what it is? Is it a dup of previous comparisons?
>>
>> I am not saying these modes should not be enabled. You know much
>> better. I just wanted to point that that integer comparisons is what
>> we should be handling.
>>
>> Your example:
>>
>>   _1 = x_8(D) == 21;
>>   _2 = x_8(D) == 64;
>>   _3 = _1 | _2;
>>   if (_3 != 0)
>>
>> raises another point. Most likely we don't want to see speculative
>> comparisons. At least not yet (we will see them once we get through
>> the first comparison). So that may be another reason to enable these
>> modes (make compiler stick closer to original code).
>
> Wait, it is not speculative in this case as branch is on _1 | _2. But
> still, it just makes it harder for fuzzer to get through as it needs
> to guess both values at the same time rather then doing incremental
> progress.
=?UTF-8?B?5ZC05r2N5rWgKOatpOW9vCk=?= Sept. 4, 2017, 1:17 p.m. UTC | #11
Hi
I updated the patch and put it in attachment.

gcc/ChangeLog:                                                     

2017-09-04  Wish Wu  <wishwu007@gmail.com>                         

        * asan.c (initialize_sanitizer_builtins):                  
        * builtin-types.def (BT_FN_VOID_UINT8_UINT8):              
        (BT_FN_VOID_UINT16_UINT16):                                
        (BT_FN_VOID_UINT32_UINT32):                                
        (BT_FN_VOID_FLOAT_FLOAT):                                  
        (BT_FN_VOID_DOUBLE_DOUBLE):                                
        (BT_FN_VOID_UINT64_PTR):                                   
        * common.opt:                                              
        * flag-types.h (enum sanitize_coverage_code):              
        * opts.c (COVERAGE_SANITIZER_OPT):                         
        (get_closest_sanitizer_option):                            
        (parse_sanitizer_options):                                 
        (common_handle_option):                                    
        * sancov.c (instrument_cond):                              
        (instrument_switch):                                       
        (sancov_pass):                                             
        * sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1):       
        (BUILT_IN_SANITIZER_COV_TRACE_CMP2):                       
        (BUILT_IN_SANITIZER_COV_TRACE_CMP4):                       
        (BUILT_IN_SANITIZER_COV_TRACE_CMP8):                       
        (BUILT_IN_SANITIZER_COV_TRACE_CMPF):                       
        (BUILT_IN_SANITIZER_COV_TRACE_CMPD):                       
        (BUILT_IN_SANITIZER_COV_TRACE_SWITCH):                     

gcc/testsuite/ChangeLog:                                           

2017-09-04  Wish Wu  <wishwu007@gmail.com>                         

        * gcc.dg/sancov/basic3.c: New test.                        

I think the aim of "trace-cmp" is finding reasonable values in runtime, playing approximate tricks when fuzzing.
We don't need to save all of values from low_case to high_case, there may be too much values and wasting resource.
For code :
void bar (void);
void
foo (int x)
{
  if (x == 21 || x == 64 || x == 98 || x == 135)
    bar ();
}
GIMPLE IL on x86_64:
  1 
  2 ;; Function foo (foo, funcdef_no=0, decl_uid=2161, cgraph_uid=0, symbol_order=0)
  3 
  4 foo (int x)
  5 {
  6   unsigned int _5;
  7   unsigned int _6;
  8   unsigned int _7;
  9   unsigned int _8;
 10   unsigned int _9;
 11   unsigned int _10;
 12   unsigned int _11;
 13   unsigned int _12;
 14 
 15   <bb 2> [0.00%] [count: INV]:
 16   _5 = (unsigned int) x_2(D);
 17   _6 = (unsigned int) 21;
 18   __builtin___sanitizer_cov_trace_cmp4 (_5, _6);
 19   if (x_2(D) == 21)
 20     goto <bb 6>; [INV] [count: INV]
 21   else
 22     goto <bb 3>; [INV] [count: INV]
 23 
 24   <bb 3> [0.00%] [count: INV]:
 25   _7 = (unsigned int) x_2(D);
 26   _8 = (unsigned int) 64;
 27   __builtin___sanitizer_cov_trace_cmp4 (_7, _8);
 28   if (x_2(D) == 64)
 29     goto <bb 6>; [INV] [count: INV]
 30   else
 31     goto <bb 4>; [INV] [count: INV]
 32 
 33   <bb 4> [0.00%] [count: INV]:
 34   _9 = (unsigned int) x_2(D);
 35   _10 = (unsigned int) 98;
 36   __builtin___sanitizer_cov_trace_cmp4 (_9, _10);
 37   if (x_2(D) == 98)
 38     goto <bb 6>; [INV] [count: INV]
 39   else
 40     goto <bb 5>; [INV] [count: INV]
 41 
 42   <bb 5> [0.00%] [count: INV]:
 43   _11 = (unsigned int) x_2(D);
 44   _12 = (unsigned int) 135;
 45   __builtin___sanitizer_cov_trace_cmp4 (_11, _12);
 46   if (x_2(D) == 135)
 47     goto <bb 6>; [INV] [count: INV]
 48   else
 49     goto <bb 7>; [INV] [count: INV]
 50 
 51   <bb 6> [0.00%] [count: INV]:
 52   bar ();
 53 
 54   <bb 7> [0.00%] [count: INV]:
 55   return;
 56 
 57 }
 58 
 59
It actually catches reasonable and meaningful values. 
Maybe we can improve it in the future for tracing all of expression for comparison.

Wish Wu
------------------------------------------------------------------
From:Dmitry Vyukov <dvyukov@google.com>
Time:2017 Sep 3 (Sun) 19:05
To:Wish Wu <weixi.wwx@antfin.com>
Cc:Jakub Jelinek <jakub@redhat.com>; gcc <gcc@gcc.gnu.org>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
Subject:Re: Add support to trace comparison instructions and switch statements


On Sun, Sep 3, 2017 at 12:38 PM, 吴潍浠(此彼) <weixi.wwx@antfin.com> wrote:
> Hi
> I will update the patch according to your requirements, and with some my suggestions.
> It will take me one or two days.

Thanks! No hurry, just wanted to make sure you still want to pursue this.

> Wish Wu
>
> ------------------------------------------------------------------
> From:Dmitry Vyukov <dvyukov@google.com>
> Time:2017 Sep 3 (Sun) 18:21
> To:Jakub Jelinek <jakub@redhat.com>
> Cc:Wish Wu <weixi.wwx@antfin.com>; gcc <gcc@gcc.gnu.org>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> On Sun, Sep 3, 2017 at 12:19 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Sun, Sep 3, 2017 at 12:01 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Sun, Sep 03, 2017 at 10:50:16AM +0200, Dmitry Vyukov wrote:
>>>> What we instrument in LLVM is _comparisons_ rather than control
>>>> structures. So that would be:
>>>>     _4 = x_8(D) == 98;
>>>> For example, result of the comparison can be stored into a bool struct
>>>> field, and then used in branching long time after. We still want to
>>>> intercept this comparison.
>>>
>>> Then we need to instrument not just GIMPLE_COND, which is the stmt
>>> where the comparison decides to which of the two basic block successors to
>>> jump, but also GIMPLE_ASSIGN with tcc_comparison class
>>> gimple_assign_rhs_code (the comparison above), and maybe also
>>> GIMPLE_ASSIGN with COND_EXPR comparison code (that is say
>>>   _4 = x_1 == y_2 ? 23 : _3;
>>> ).
>>>
>>>> > Perhaps for -fsanitize-coverage= it might be a good idea to force
>>>> > LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST or whatever affects GIMPLE
>>>> > decisions mentioned above so that the IL is closer to what the user wrote.
>>>>
>>>> If we recurse down to comparison operations and instrument them, this
>>>> will not be so important, right?
>>>
>>> Well, if you just handle tcc_comparison GIMPLE_ASSIGN and not GIMPLE_COND,
>>> then you don't handle many comparisons from the source code.  And if you
>>> handle both, some of the GIMPLE_CONDs might be just artificial comparisons.
>>> By pretending small branch cost for the tracing case you get fewer
>>> artificial comparisons.
>>
>>
>> Are these artificial comparisons on BOOLEAN_TYPE? I think BOOLEAN_TYPE
>> needs to be ignored entirely, there is just like 2 combinations of
>> possible values.
>> If not, then what it is? Is it a dup of previous comparisons?
>>
>> I am not saying these modes should not be enabled. You know much
>> better. I just wanted to point that that integer comparisons is what
>> we should be handling.
>>
>> Your example:
>>
>>   _1 = x_8(D) == 21;
>>   _2 = x_8(D) == 64;
>>   _3 = _1 | _2;
>>   if (_3 != 0)
>>
>> raises another point. Most likely we don't want to see speculative
>> comparisons. At least not yet (we will see them once we get through
>> the first comparison). So that may be another reason to enable these
>> modes (make compiler stick closer to original code).
>
> Wait, it is not speculative in this case as branch is on _1 | _2. But
> still, it just makes it harder for fuzzer to get through as it needs
> to guess both values at the same time rather then doing incremental
> progress.
=?UTF-8?B?5ZC05r2N5rWgKOatpOW9vCk=?= Sept. 4, 2017, 1:36 p.m. UTC | #12
Sorry. I made a terrible bug in it.
Attachment is my new patch.

Wish Wu

------------------------------------------------------------------
From:Wish Wu <weixi.wwx@antfin.com>
Time:2017 Sep 4 (Mon) 21:17
To:Dmitry Vyukov <dvyukov@google.com>; Jakub Jelinek <jakub@redhat.com>
Cc:gcc <gcc@gcc.gnu.org>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
Subject:Re: Add support to trace comparison instructions and switch statements


Hi
I updated the patch and put it in attachment.

gcc/ChangeLog:                                                     

2017-09-04  Wish Wu  <wishwu007@gmail.com>                         

        * asan.c (initialize_sanitizer_builtins):                  
        * builtin-types.def (BT_FN_VOID_UINT8_UINT8):              
        (BT_FN_VOID_UINT16_UINT16):                                
        (BT_FN_VOID_UINT32_UINT32):                                
        (BT_FN_VOID_FLOAT_FLOAT):                                  
        (BT_FN_VOID_DOUBLE_DOUBLE):                                
        (BT_FN_VOID_UINT64_PTR):                                   
        * common.opt:                                              
        * flag-types.h (enum sanitize_coverage_code):              
        * opts.c (COVERAGE_SANITIZER_OPT):                         
        (get_closest_sanitizer_option):                            
        (parse_sanitizer_options):                                 
        (common_handle_option):                                    
        * sancov.c (instrument_cond):                              
        (instrument_switch):                                       
        (sancov_pass):                                             
        * sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1):       
        (BUILT_IN_SANITIZER_COV_TRACE_CMP2):                       
        (BUILT_IN_SANITIZER_COV_TRACE_CMP4):                       
        (BUILT_IN_SANITIZER_COV_TRACE_CMP8):                       
        (BUILT_IN_SANITIZER_COV_TRACE_CMPF):                       
        (BUILT_IN_SANITIZER_COV_TRACE_CMPD):                       
        (BUILT_IN_SANITIZER_COV_TRACE_SWITCH):                     

gcc/testsuite/ChangeLog:                                           

2017-09-04  Wish Wu  <wishwu007@gmail.com>                         

        * gcc.dg/sancov/basic3.c: New test.                        

I think the aim of "trace-cmp" is finding reasonable values in runtime, playing approximate tricks when fuzzing.
We don't need to save all of values from low_case to high_case, there may be too much values and wasting resource.
For code :
void bar (void);
void
foo (int x)
{
  if (x == 21 || x == 64 || x == 98 || x == 135)
    bar ();
}
GIMPLE IL on x86_64:
  1 
  2 ;; Function foo (foo, funcdef_no=0, decl_uid=2161, cgraph_uid=0, symbol_order=0)
  3 
  4 foo (int x)
  5 {
  6   unsigned int _5;
  7   unsigned int _6;
  8   unsigned int _7;
  9   unsigned int _8;
 10   unsigned int _9;
 11   unsigned int _10;
 12   unsigned int _11;
 13   unsigned int _12;
 14 
 15   <bb 2> [0.00%] [count: INV]:
 16   _5 = (unsigned int) x_2(D);
 17   _6 = (unsigned int) 21;
 18   __builtin___sanitizer_cov_trace_cmp4 (_5, _6);
 19   if (x_2(D) == 21)
 20     goto <bb 6>; [INV] [count: INV]
 21   else
 22     goto <bb 3>; [INV] [count: INV]
 23 
 24   <bb 3> [0.00%] [count: INV]:
 25   _7 = (unsigned int) x_2(D);
 26   _8 = (unsigned int) 64;
 27   __builtin___sanitizer_cov_trace_cmp4 (_7, _8);
 28   if (x_2(D) == 64)
 29     goto <bb 6>; [INV] [count: INV]
 30   else
 31     goto <bb 4>; [INV] [count: INV]
 32 
 33   <bb 4> [0.00%] [count: INV]:
 34   _9 = (unsigned int) x_2(D);
 35   _10 = (unsigned int) 98;
 36   __builtin___sanitizer_cov_trace_cmp4 (_9, _10);
 37   if (x_2(D) == 98)
 38     goto <bb 6>; [INV] [count: INV]
 39   else
 40     goto <bb 5>; [INV] [count: INV]
 41 
 42   <bb 5> [0.00%] [count: INV]:
 43   _11 = (unsigned int) x_2(D);
 44   _12 = (unsigned int) 135;
 45   __builtin___sanitizer_cov_trace_cmp4 (_11, _12);
 46   if (x_2(D) == 135)
 47     goto <bb 6>; [INV] [count: INV]
 48   else
 49     goto <bb 7>; [INV] [count: INV]
 50 
 51   <bb 6> [0.00%] [count: INV]:
 52   bar ();
 53 
 54   <bb 7> [0.00%] [count: INV]:
 55   return;
 56 
 57 }
 58 
 59
It actually catches reasonable and meaningful values. 
Maybe we can improve it in the future for tracing all of expression for comparison.

Wish Wu
------------------------------------------------------------------
From:Dmitry Vyukov <dvyukov@google.com>
Time:2017 Sep 3 (Sun) 19:05
To:Wish Wu <weixi.wwx@antfin.com>
Cc:Jakub Jelinek <jakub@redhat.com>; gcc <gcc@gcc.gnu.org>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
Subject:Re: Add support to trace comparison instructions and switch statements


On Sun, Sep 3, 2017 at 12:38 PM, 吴潍浠(此彼) <weixi.wwx@antfin.com> wrote:
> Hi
> I will update the patch according to your requirements, and with some my suggestions.
> It will take me one or two days.

Thanks! No hurry, just wanted to make sure you still want to pursue this.

> Wish Wu
>
> ------------------------------------------------------------------
> From:Dmitry Vyukov <dvyukov@google.com>
> Time:2017 Sep 3 (Sun) 18:21
> To:Jakub Jelinek <jakub@redhat.com>
> Cc:Wish Wu <weixi.wwx@antfin.com>; gcc <gcc@gcc.gnu.org>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> On Sun, Sep 3, 2017 at 12:19 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> On Sun, Sep 3, 2017 at 12:01 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Sun, Sep 03, 2017 at 10:50:16AM +0200, Dmitry Vyukov wrote:
>>>> What we instrument in LLVM is _comparisons_ rather than control
>>>> structures. So that would be:
>>>>     _4 = x_8(D) == 98;
>>>> For example, result of the comparison can be stored into a bool struct
>>>> field, and then used in branching long time after. We still want to
>>>> intercept this comparison.
>>>
>>> Then we need to instrument not just GIMPLE_COND, which is the stmt
>>> where the comparison decides to which of the two basic block successors to
>>> jump, but also GIMPLE_ASSIGN with tcc_comparison class
>>> gimple_assign_rhs_code (the comparison above), and maybe also
>>> GIMPLE_ASSIGN with COND_EXPR comparison code (that is say
>>>   _4 = x_1 == y_2 ? 23 : _3;
>>> ).
>>>
>>>> > Perhaps for -fsanitize-coverage= it might be a good idea to force
>>>> > LOGICAL_OP_NON_SHORT_CIRCUIT/BRANCH_COST or whatever affects GIMPLE
>>>> > decisions mentioned above so that the IL is closer to what the user wrote.
>>>>
>>>> If we recurse down to comparison operations and instrument them, this
>>>> will not be so important, right?
>>>
>>> Well, if you just handle tcc_comparison GIMPLE_ASSIGN and not GIMPLE_COND,
>>> then you don't handle many comparisons from the source code.  And if you
>>> handle both, some of the GIMPLE_CONDs might be just artificial comparisons.
>>> By pretending small branch cost for the tracing case you get fewer
>>> artificial comparisons.
>>
>>
>> Are these artificial comparisons on BOOLEAN_TYPE? I think BOOLEAN_TYPE
>> needs to be ignored entirely, there is just like 2 combinations of
>> possible values.
>> If not, then what it is? Is it a dup of previous comparisons?
>>
>> I am not saying these modes should not be enabled. You know much
>> better. I just wanted to point that that integer comparisons is what
>> we should be handling.
>>
>> Your example:
>>
>>   _1 = x_8(D) == 21;
>>   _2 = x_8(D) == 64;
>>   _3 = _1 | _2;
>>   if (_3 != 0)
>>
>> raises another point. Most likely we don't want to see speculative
>> comparisons. At least not yet (we will see them once we get through
>> the first comparison). So that may be another reason to enable these
>> modes (make compiler stick closer to original code).
>
> Wait, it is not speculative in this case as branch is on _1 | _2. But
> still, it just makes it harder for fuzzer to get through as it needs
> to guess both values at the same time rather then doing incremental
> progress.
Jakub Jelinek Sept. 4, 2017, 5:34 p.m. UTC | #13
On Mon, Sep 04, 2017 at 09:36:40PM +0800, 吴潍浠(此彼) wrote:
> gcc/ChangeLog:                                                     
> 
> 2017-09-04  Wish Wu  <wishwu007@gmail.com>                         
> 
>         * asan.c (initialize_sanitizer_builtins):                  
>         * builtin-types.def (BT_FN_VOID_UINT8_UINT8):              
>         (BT_FN_VOID_UINT16_UINT16):                                
>         (BT_FN_VOID_UINT32_UINT32):                                
>         (BT_FN_VOID_FLOAT_FLOAT):                                  
>         (BT_FN_VOID_DOUBLE_DOUBLE):                                
>         (BT_FN_VOID_UINT64_PTR):                                   
>         * common.opt:                                              
>         * flag-types.h (enum sanitize_coverage_code):              
>         * opts.c (COVERAGE_SANITIZER_OPT):                         
>         (get_closest_sanitizer_option):                            
>         (parse_sanitizer_options):                                 
>         (common_handle_option):                                    
>         * sancov.c (instrument_cond):                              
>         (instrument_switch):                                       
>         (sancov_pass):                                             
>         * sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1):       
>         (BUILT_IN_SANITIZER_COV_TRACE_CMP2):                       
>         (BUILT_IN_SANITIZER_COV_TRACE_CMP4):                       
>         (BUILT_IN_SANITIZER_COV_TRACE_CMP8):                       
>         (BUILT_IN_SANITIZER_COV_TRACE_CMPF):                       
>         (BUILT_IN_SANITIZER_COV_TRACE_CMPD):                       
>         (BUILT_IN_SANITIZER_COV_TRACE_SWITCH):                     

mklog just generates a template, you need to fill in the details
on what has been changed or added or removed.  See other ChangeLog
entries etc. to see what is expected.

> For code :
> void bar (void);
> void
> foo (int x)
> {
>   if (x == 21 || x == 64 || x == 98 || x == 135)
>     bar ();
> }
> GIMPLE IL on x86_64:
>   1 
>   2 ;; Function foo (foo, funcdef_no=0, decl_uid=2161, cgraph_uid=0, symbol_order=0)
>   3 
>   4 foo (int x)
>   5 {

...

That is with -O0 though?  With -O2 you'll see that it changes.
IMNSHO you really want to also handle the GIMPLE_ASSIGN with tcc_comparison
class rhs_code.  Shouldn't be that hard to handle that within
instrument_cond, just the way how you extract lhs and rhs from the insn will
differ based on if it is a GIMPLE_COND or GIMPLE_ASSIGN (and in that case
also for tcc_comparison rhs_code or for COND_EXPR with tcc_comparison first
operand).
And I really think we should change the 2 LOGICAL_OP_NON_SHORT_CIRCUIT
uses in fold-const.c and one in tree-ssa-ifcombine.c with
  LOGICAL_OP_NON_SHORT_CIRCUIT
  && !flag_sanitize_coverage
with a comment that for sanitize coverage we want to avoid this optimization
because it negatively affects it.

@@ -1611,38 +1631,51 @@ parse_sanitizer_options (const char *p, location_t
 	}
 
       /* Check to see if the string matches an option class name.  */
-      for (i = 0; sanitizer_opts[i].name != NULL; ++i)
-	if (len == sanitizer_opts[i].len
-	    && memcmp (p, sanitizer_opts[i].name, len) == 0)
+      for (i = 0; opts[i].name != NULL; ++i)
+	if (len == opts[i].len
+	    && memcmp (p, opts[i].name, len) == 0)
 	  {
-	    /* Handle both -fsanitize and -fno-sanitize cases.  */
-	    if (value && sanitizer_opts[i].flag == ~0U)
+	    if (code == OPT_fsanitize_coverage_)
 	      {
-		if (code == OPT_fsanitize_)
-		  {
-		    if (complain)
-		      error_at (loc, "%<-fsanitize=all%> option is not valid");
-		  }
+		if (value)
+		  flags |= opts[i].flag;
 		else
-		  flags |= ~(SANITIZE_THREAD | SANITIZE_LEAK
-			     | SANITIZE_UNREACHABLE | SANITIZE_RETURN);
+		  flags &= ~opts[i].flag;
+		found = true;
+		break;
 	      }
-	    else if (value)
+	    else
 	      {
-		/* Do not enable -fsanitize-recover=unreachable and
-		   -fsanitize-recover=return if -fsanitize-recover=undefined
-		   is selected.  */
-		if (code == OPT_fsanitize_recover_
-		    && sanitizer_opts[i].flag == SANITIZE_UNDEFINED)
-		  flags |= (SANITIZE_UNDEFINED
-			    & ~(SANITIZE_UNREACHABLE | SANITIZE_RETURN));
+		/* Handle both -fsanitize and -fno-sanitize cases.  */
+		if (value && opts[i].flag == ~0U)
+		  {
+		    if (code == OPT_fsanitize_)
+		      {
+			if (complain)
+			  error_at (loc,
+				    "%<-fsanitize=all%> option is not valid");
+		      }
+		    else
+		      flags |= ~(SANITIZE_THREAD | SANITIZE_LEAK
+				 | SANITIZE_UNREACHABLE | SANITIZE_RETURN);
+		  }
+		else if (value)
+		  {
+		    /* Do not enable -fsanitize-recover=unreachable and
+		       -fsanitize-recover=return if -fsanitize-recover=undefined
+		       is selected.  */
+		    if (code == OPT_fsanitize_recover_
+			&& opts[i].flag == SANITIZE_UNDEFINED)
+		      flags |= (SANITIZE_UNDEFINED
+				& ~(SANITIZE_UNREACHABLE | SANITIZE_RETURN));
+		    else
+		      flags |= opts[i].flag;
+		  }
 		else
-		  flags |= sanitizer_opts[i].flag;
+		  flags &= ~opts[i].flag;
+		found = true;
+		break;
 	      }
-	    else
-	      flags &= ~sanitizer_opts[i].flag;
-	    found = true;
-	    break;
 	  }

I don't see a need for this hunk.  For code == OPT_fsanitize_coverage_
(where you know that there is no entry with ~0U flag value and also
know that code is not OPT_fsanitize_recover_) I think it will
just DTRT without any changes.
 
 namespace {

There should be a function comment before the function here, explaining
what the function is for.
 
+static void
+instrument_cond (gimple_stmt_iterator *gsi, gimple *stmt)
+{
+  tree lhs = gimple_cond_lhs (stmt);
+  tree rhs = gimple_cond_rhs (stmt);
+  tree lhs_type = TREE_TYPE (lhs);
+  tree rhs_type = TREE_TYPE (rhs);
+
+  HOST_WIDE_INT size_in_bytes = MAX (int_size_in_bytes (lhs_type),
+				     int_size_in_bytes (rhs_type));
+  if (size_in_bytes == -1)
+    return;

As I said, GIMPLE_COND operands should have the same (or compatible)
type, so there is no need to use rhs_type at all, nor test it.
And there is no need to test size_in_bytes before the if, just do
it right before the switch in there (and no need to special case -1,
because it is like any other default: handled by return;).

+  else if (SCALAR_FLOAT_TYPE_P (lhs_type) && SCALAR_FLOAT_TYPE_P (rhs_type))

Again, no need to test both.

+    {
+      if (TYPE_MODE (lhs_type) == TYPE_MODE (double_type_node)
+	  || TYPE_MODE (rhs_type) == TYPE_MODE (double_type_node))

Or here.

+	{
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPD;
+	  to_type = double_type_node;
+	}
+      else if (TYPE_MODE (lhs_type) == TYPE_MODE (float_type_node)
+	       || TYPE_MODE (rhs_type) == TYPE_MODE (float_type_node))

Or here.  Just use type instead of lhs_type.

+	{
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPF;
+	  to_type = float_type_node;
+	}
+
+    }
+  if (to_type != NULL_TREE)
+    {
+      gimple_seq seq = NULL;
+
+      gimple_seq_add_stmt (&seq, build_type_cast (to_type, lhs));
+      tree clhs = gimple_assign_lhs (gimple_seq_last_stmt (seq));
+
+      gimple_seq_add_stmt (&seq, build_type_cast (to_type, rhs));
+      tree crhs = gimple_assign_lhs (gimple_seq_last_stmt (seq));

If the var already has the right type, that will just create waste
that further opts will need to clean up.  Better:
  if (!useless_type_conversion_p (to_type, lhs_type))	// or s/lhs_// if you do the above
    {
      gimple_seq_add_stmt (&seq, build_type_cast (to_type, lhs));
      lhs = gimple_assign_lhs (gimple_seq_last_stmt (seq));

      gimple_seq_add_stmt (&seq, build_type_cast (to_type, rhs));
      rhs = gimple_assign_rhs (gimple_seq_last_stmt (seq));
    }
or perhaps even 
  if (!useless_type_conversion_p (to_type, type))
    {
      if (TREE_CODE (lhs) == INTEGER_CST)
	lhs = fold_convert (to_type, lhs);
      else
	{
	  gimple_seq_add_stmt (&seq, build_type_cast (to_type, lhs));
	  lhs = gimple_assign_lhs (gimple_seq_last_stmt (seq));
	}
...
    }

+  tree index = gimple_switch_index (switch_stmt);
+
+  HOST_WIDE_INT size_in_bytes = int_size_in_bytes (TREE_TYPE (index));
+  if (size_in_bytes == -1)
+    return;

Well, you want to punt not just when it is -1, but also when it is
> 8.

+
+  unsigned i, n = gimple_switch_num_labels (switch_stmt), num = 0;
+  for (i = 0; i < n; ++i)

I think gimple_switch_label (switch_stmt, 0) must be the
default label and thus have no low/high case, so you should
use for (i = 1; i < n; ++i).

+  for (i = 0; i < n; ++i)

Ditto.

	Jakub
=?UTF-8?B?5ZC05r2N5rWgKOatpOW9vCk=?= Sept. 5, 2017, 1:03 p.m. UTC | #14
Hi
Attachment is my updated path.
The implementation of parse_sanitizer_options is not elegance enough. Mixing handling flags of fsanitize is easy to make mistakes.

ChangeLog:
gcc/ChangeLog:

2017-09-05  Wish Wu  <wishwu007@gmail.com>

	* asan.c (initialize_sanitizer_builtins):
	Build function type list of trace-cmp.
	* builtin-types.def (BT_FN_VOID_UINT8_UINT8):
	Define function type of trace-cmp.
	(BT_FN_VOID_UINT16_UINT16): Likewise.
	(BT_FN_VOID_UINT32_UINT32): Likewise.
	(BT_FN_VOID_FLOAT_FLOAT): Likewise.
	(BT_FN_VOID_DOUBLE_DOUBLE): Likewise.
	(BT_FN_VOID_UINT64_PTR): Likewise.
	* common.opt: Add options of sanitize coverage.
	* flag-types.h (enum sanitize_coverage_code):
	Declare flags of sanitize coverage.
	* fold-const.c (fold_range_test):
	Disable non-short-circuit feature when sanitize coverage is enabled.
	(fold_truth_andor): Likewise.
	* tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise.
	* opts.c (COVERAGE_SANITIZER_OPT):
	Define coverage sanitizer options.
	(get_closest_sanitizer_option): Make OPT_fsanitize_,
	OPT_fsanitize_recover_ and OPT_fsanitize_coverage_ to use same
	function.
	(parse_sanitizer_options): Likewise.
	(common_handle_option): Add OPT_fsanitize_coverage_.
	* sancov.c (instrument_comparison): Instrument comparisons.
	(instrument_switch): Likewise.
	(sancov_pass): Add trace-cmp support.
	* sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1):
	Define builtin functions of trace-cmp.
	(BUILT_IN_SANITIZER_COV_TRACE_CMP2): Likewise.
	(BUILT_IN_SANITIZER_COV_TRACE_CMP4): Likewise.
	(BUILT_IN_SANITIZER_COV_TRACE_CMP8): Likewise.
	(BUILT_IN_SANITIZER_COV_TRACE_CMPF): Likewise.
	(BUILT_IN_SANITIZER_COV_TRACE_CMPD): Likewise.
	(BUILT_IN_SANITIZER_COV_TRACE_SWITCH): Likewise.

gcc/testsuite/ChangeLog:

2017-09-05  Wish Wu  <wishwu007@gmail.com>

	* gcc.dg/sancov/basic3.c: New test.

Thank you every much for improving my codes.

Wish Wu

------------------------------------------------------------------
From:Jakub Jelinek <jakub@redhat.com>
Time:2017 Sep 5 (Tue) 01:34
To:Wish Wu <weixi.wwx@antfin.com>
Cc:Dmitry Vyukov <dvyukov@google.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
Subject:Re: Add support to trace comparison instructions and switch statements


On Mon, Sep 04, 2017 at 09:36:40PM +0800, 吴潍浠(此彼) wrote:
> gcc/ChangeLog:                                                     
> 
> 2017-09-04  Wish Wu  <wishwu007@gmail.com>                         
> 
>         * asan.c (initialize_sanitizer_builtins):                  
>         * builtin-types.def (BT_FN_VOID_UINT8_UINT8):              
>         (BT_FN_VOID_UINT16_UINT16):                                
>         (BT_FN_VOID_UINT32_UINT32):                                
>         (BT_FN_VOID_FLOAT_FLOAT):                                  
>         (BT_FN_VOID_DOUBLE_DOUBLE):                                
>         (BT_FN_VOID_UINT64_PTR):                                   
>         * common.opt:                                              
>         * flag-types.h (enum sanitize_coverage_code):              
>         * opts.c (COVERAGE_SANITIZER_OPT):                         
>         (get_closest_sanitizer_option):                            
>         (parse_sanitizer_options):                                 
>         (common_handle_option):                                    
>         * sancov.c (instrument_cond):                              
>         (instrument_switch):                                       
>         (sancov_pass):                                             
>         * sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1):       
>         (BUILT_IN_SANITIZER_COV_TRACE_CMP2):                       
>         (BUILT_IN_SANITIZER_COV_TRACE_CMP4):                       
>         (BUILT_IN_SANITIZER_COV_TRACE_CMP8):                       
>         (BUILT_IN_SANITIZER_COV_TRACE_CMPF):                       
>         (BUILT_IN_SANITIZER_COV_TRACE_CMPD):                       
>         (BUILT_IN_SANITIZER_COV_TRACE_SWITCH):                     

mklog just generates a template, you need to fill in the details
on what has been changed or added or removed.  See other ChangeLog
entries etc. to see what is expected.

> For code :
> void bar (void);
> void
> foo (int x)
> {
>   if (x == 21 || x == 64 || x == 98 || x == 135)
>     bar ();
> }
> GIMPLE IL on x86_64:
>   1 
>   2 ;; Function foo (foo, funcdef_no=0, decl_uid=2161, cgraph_uid=0, symbol_order=0)
>   3 
>   4 foo (int x)
>   5 {

...

That is with -O0 though?  With -O2 you'll see that it changes.
IMNSHO you really want to also handle the GIMPLE_ASSIGN with tcc_comparison
class rhs_code.  Shouldn't be that hard to handle that within
instrument_cond, just the way how you extract lhs and rhs from the insn will
differ based on if it is a GIMPLE_COND or GIMPLE_ASSIGN (and in that case
also for tcc_comparison rhs_code or for COND_EXPR with tcc_comparison first
operand).
And I really think we should change the 2 LOGICAL_OP_NON_SHORT_CIRCUIT
uses in fold-const.c and one in tree-ssa-ifcombine.c with
  LOGICAL_OP_NON_SHORT_CIRCUIT
  && !flag_sanitize_coverage
with a comment that for sanitize coverage we want to avoid this optimization
because it negatively affects it.

@@ -1611,38 +1631,51 @@ parse_sanitizer_options (const char *p, location_t
  }
 
       /* Check to see if the string matches an option class name.  */
-      for (i = 0; sanitizer_opts[i].name != NULL; ++i)
- if (len == sanitizer_opts[i].len
-     && memcmp (p, sanitizer_opts[i].name, len) == 0)
+      for (i = 0; opts[i].name != NULL; ++i)
+ if (len == opts[i].len
+     && memcmp (p, opts[i].name, len) == 0)
    {
-     /* Handle both -fsanitize and -fno-sanitize cases.  */
-     if (value && sanitizer_opts[i].flag == ~0U)
+     if (code == OPT_fsanitize_coverage_)
        {
-  if (code == OPT_fsanitize_)
-    {
-      if (complain)
-        error_at (loc, "%<-fsanitize=all%> option is not valid");
-    }
+  if (value)
+    flags |= opts[i].flag;
   else
-    flags |= ~(SANITIZE_THREAD | SANITIZE_LEAK
-        | SANITIZE_UNREACHABLE | SANITIZE_RETURN);
+    flags &= ~opts[i].flag;
+  found = true;
+  break;
        }
-     else if (value)
+     else
        {
-  /* Do not enable -fsanitize-recover=unreachable and
-     -fsanitize-recover=return if -fsanitize-recover=undefined
-     is selected.  */
-  if (code == OPT_fsanitize_recover_
-      && sanitizer_opts[i].flag == SANITIZE_UNDEFINED)
-    flags |= (SANITIZE_UNDEFINED
-       & ~(SANITIZE_UNREACHABLE | SANITIZE_RETURN));
+  /* Handle both -fsanitize and -fno-sanitize cases.  */
+  if (value && opts[i].flag == ~0U)
+    {
+      if (code == OPT_fsanitize_)
+        {
+   if (complain)
+     error_at (loc,
+        "%<-fsanitize=all%> option is not valid");
+        }
+      else
+        flags |= ~(SANITIZE_THREAD | SANITIZE_LEAK
+     | SANITIZE_UNREACHABLE | SANITIZE_RETURN);
+    }
+  else if (value)
+    {
+      /* Do not enable -fsanitize-recover=unreachable and
+         -fsanitize-recover=return if -fsanitize-recover=undefined
+         is selected.  */
+      if (code == OPT_fsanitize_recover_
+   && opts[i].flag == SANITIZE_UNDEFINED)
+        flags |= (SANITIZE_UNDEFINED
+    & ~(SANITIZE_UNREACHABLE | SANITIZE_RETURN));
+      else
+        flags |= opts[i].flag;
+    }
   else
-    flags |= sanitizer_opts[i].flag;
+    flags &= ~opts[i].flag;
+  found = true;
+  break;
        }
-     else
-       flags &= ~sanitizer_opts[i].flag;
-     found = true;
-     break;
    }

I don't see a need for this hunk.  For code == OPT_fsanitize_coverage_
(where you know that there is no entry with ~0U flag value and also
know that code is not OPT_fsanitize_recover_) I think it will
just DTRT without any changes.
 
 namespace {

There should be a function comment before the function here, explaining
what the function is for.
 
+static void
+instrument_cond (gimple_stmt_iterator *gsi, gimple *stmt)
+{
+  tree lhs = gimple_cond_lhs (stmt);
+  tree rhs = gimple_cond_rhs (stmt);
+  tree lhs_type = TREE_TYPE (lhs);
+  tree rhs_type = TREE_TYPE (rhs);
+
+  HOST_WIDE_INT size_in_bytes = MAX (int_size_in_bytes (lhs_type),
+         int_size_in_bytes (rhs_type));
+  if (size_in_bytes == -1)
+    return;

As I said, GIMPLE_COND operands should have the same (or compatible)
type, so there is no need to use rhs_type at all, nor test it.
And there is no need to test size_in_bytes before the if, just do
it right before the switch in there (and no need to special case -1,
because it is like any other default: handled by return;).

+  else if (SCALAR_FLOAT_TYPE_P (lhs_type) && SCALAR_FLOAT_TYPE_P (rhs_type))

Again, no need to test both.

+    {
+      if (TYPE_MODE (lhs_type) == TYPE_MODE (double_type_node)
+   || TYPE_MODE (rhs_type) == TYPE_MODE (double_type_node))

Or here.

+ {
+         fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPD;
+   to_type = double_type_node;
+ }
+      else if (TYPE_MODE (lhs_type) == TYPE_MODE (float_type_node)
+        || TYPE_MODE (rhs_type) == TYPE_MODE (float_type_node))

Or here.  Just use type instead of lhs_type.

+ {
+         fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPF;
+   to_type = float_type_node;
+ }
+
+    }
+  if (to_type != NULL_TREE)
+    {
+      gimple_seq seq = NULL;
+
+      gimple_seq_add_stmt (&seq, build_type_cast (to_type, lhs));
+      tree clhs = gimple_assign_lhs (gimple_seq_last_stmt (seq));
+
+      gimple_seq_add_stmt (&seq, build_type_cast (to_type, rhs));
+      tree crhs = gimple_assign_lhs (gimple_seq_last_stmt (seq));

If the var already has the right type, that will just create waste
that further opts will need to clean up.  Better:
  if (!useless_type_conversion_p (to_type, lhs_type)) // or s/lhs_// if you do the above
    {
      gimple_seq_add_stmt (&seq, build_type_cast (to_type, lhs));
      lhs = gimple_assign_lhs (gimple_seq_last_stmt (seq));

      gimple_seq_add_stmt (&seq, build_type_cast (to_type, rhs));
      rhs = gimple_assign_rhs (gimple_seq_last_stmt (seq));
    }
or perhaps even 
  if (!useless_type_conversion_p (to_type, type))
    {
      if (TREE_CODE (lhs) == INTEGER_CST)
 lhs = fold_convert (to_type, lhs);
      else
 {
   gimple_seq_add_stmt (&seq, build_type_cast (to_type, lhs));
   lhs = gimple_assign_lhs (gimple_seq_last_stmt (seq));
 }
...
    }

+  tree index = gimple_switch_index (switch_stmt);
+
+  HOST_WIDE_INT size_in_bytes = int_size_in_bytes (TREE_TYPE (index));
+  if (size_in_bytes == -1)
+    return;

Well, you want to punt not just when it is -1, but also when it is
> 8.

+
+  unsigned i, n = gimple_switch_num_labels (switch_stmt), num = 0;
+  for (i = 0; i < n; ++i)

I think gimple_switch_label (switch_stmt, 0) must be the
default label and thus have no low/high case, so you should
use for (i = 1; i < n; ++i).

+  for (i = 0; i < n; ++i)

Ditto.

 Jakub
Jakub Jelinek Sept. 5, 2017, 9:44 p.m. UTC | #15
On Tue, Sep 05, 2017 at 09:03:52PM +0800, 吴潍浠(此彼) wrote:
> Attachment is my updated path.
> The implementation of parse_sanitizer_options is not elegance enough. Mixing handling flags of fsanitize is easy to make mistakes.

To avoid too many further iterations, I took the liberty to tweak your
patch.  From https://clang.llvm.org/docs/SanitizerCoverage.html
I've noticed that since 2017-08-11 clang/llvm wants to emit
__sanitizer_cov_trace_const_cmpN with the first argument a constant
if one of the comparison operands is a constant, so the patch implements
that too.
I wonder about the __sanitizer_cov_trace_cmp{f,d} entry-points, because
I can't find them on that page nor in llvm sources.
I've also added handling of COND_EXPRs and added some documentation.

I've bootstrapped/regtested the patch on x86_64-linux and i686-linux.
Can you test it on whatever you want to use the patch for?

2017-09-05  Wish Wu  <wishwu007@gmail.com>
	    Jakub Jelinek  <jakub@redhat.com>

	* asan.c (initialize_sanitizer_builtins): Add
	BT_FN_VOID_UINT8_UINT8, BT_FN_VOID_UINT16_UINT16,
	BT_FN_VOID_UINT32_UINT32, BT_FN_VOID_UINT64_UINT64,
	BT_FN_VOID_FLOAT_FLOAT, BT_FN_VOID_DOUBLE_DOUBLE and
	BT_FN_VOID_UINT64_PTR variables.
	* builtin-types.def (BT_FN_VOID_UINT8_UINT8): New fn type.
	(BT_FN_VOID_UINT16_UINT16): Likewise.
	(BT_FN_VOID_UINT32_UINT32): Likewise.
	(BT_FN_VOID_FLOAT_FLOAT): Likewise.
	(BT_FN_VOID_DOUBLE_DOUBLE): Likewise.
	(BT_FN_VOID_UINT64_PTR): Likewise.
	* common.opt (flag_sanitize_coverage): New variable.
	(fsanitize-coverage=trace-pc): Remove.
	(fsanitize-coverage=): Add.
	* flag-types.h (enum sanitize_coverage_code): New enum.
	* fold-const.c (fold_range_test): Disable non-short-circuit
	optimization if flag_sanitize_coverage.
	(fold_truth_andor): Likewise.
	* tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise.
	* opts.c (COVERAGE_SANITIZER_OPT): Define.
	(coverage_sanitizer_opts): New array.
	(get_closest_sanitizer_option): Add OPTS argument, handle also
	OPT_fsanitize_coverage_.
	(parse_sanitizer_options): Adjusted to also handle
	OPT_fsanitize_coverage_.
	(common_handle_option): Add OPT_fsanitize_coverage_.
	* sancov.c (instrument_comparison, instrument_switch): New function.
	(sancov_pass): Add trace-cmp support.
	* sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1,
	BUILT_IN_SANITIZER_COV_TRACE_CMP2, BUILT_IN_SANITIZER_COV_TRACE_CMP4,
	BUILT_IN_SANITIZER_COV_TRACE_CMP8,
	BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP1,
	BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP2,
	BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP4,
	BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP8,
	BUILT_IN_SANITIZER_COV_TRACE_CMPF, BUILT_IN_SANITIZER_COV_TRACE_CMPD,
	BUILT_IN_SANITIZER_COV_TRACE_SWITCH): New builtins.
	* doc/invoke.texi: Document -fsanitize-coverage=trace-cmp.

	* gcc.dg/sancov/cmp0.c: New test.

--- gcc/asan.c.jj	2017-09-04 09:55:26.600687479 +0200
+++ gcc/asan.c	2017-09-05 15:39:32.452612728 +0200
@@ -2709,6 +2709,29 @@ initialize_sanitizer_builtins (void)
   tree BT_FN_SIZE_CONST_PTR_INT
     = build_function_type_list (size_type_node, const_ptr_type_node,
 				integer_type_node, NULL_TREE);
+
+  tree BT_FN_VOID_UINT8_UINT8
+    = build_function_type_list (void_type_node, unsigned_char_type_node,
+				unsigned_char_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT16_UINT16
+    = build_function_type_list (void_type_node, uint16_type_node,
+				uint16_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT32_UINT32
+    = build_function_type_list (void_type_node, uint32_type_node,
+				uint32_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT64_UINT64
+    = build_function_type_list (void_type_node, uint64_type_node,
+				uint64_type_node, NULL_TREE);
+  tree BT_FN_VOID_FLOAT_FLOAT
+    = build_function_type_list (void_type_node, float_type_node,
+				float_type_node, NULL_TREE);
+  tree BT_FN_VOID_DOUBLE_DOUBLE
+    = build_function_type_list (void_type_node, double_type_node,
+				double_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT64_PTR
+    = build_function_type_list (void_type_node, uint64_type_node,
+				ptr_type_node, NULL_TREE);
+
   tree BT_FN_BOOL_VPTR_PTR_IX_INT_INT[5];
   tree BT_FN_IX_CONST_VPTR_INT[5];
   tree BT_FN_IX_VPTR_IX_INT[5];
--- gcc/builtin-types.def.jj	2017-06-28 09:05:45.249396972 +0200
+++ gcc/builtin-types.def	2017-09-05 15:39:32.453612716 +0200
@@ -338,8 +338,20 @@ DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTRMODE_
 		     BT_VOID, BT_PTRMODE, BT_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTR_PTRMODE,
 		     BT_VOID, BT_PTR, BT_PTRMODE)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT8_UINT8,
+     		     BT_VOID, BT_UINT8, BT_UINT8)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT16_UINT16,
+     		     BT_VOID, BT_UINT16, BT_UINT16)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT32_UINT32,
+     		     BT_VOID, BT_UINT32, BT_UINT32)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT64_UINT64,
      		     BT_VOID, BT_UINT64, BT_UINT64)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_FLOAT_FLOAT,
+     		     BT_VOID, BT_FLOAT, BT_FLOAT)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_DOUBLE_DOUBLE,
+     		     BT_VOID, BT_DOUBLE, BT_DOUBLE)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT64_PTR,
+     		     BT_VOID, BT_UINT64, BT_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_VALIST_REF_VALIST_ARG,
 		     BT_VOID, BT_VALIST_REF, BT_VALIST_ARG)
 DEF_FUNCTION_TYPE_2 (BT_FN_LONG_LONG_LONG,
--- gcc/common.opt.jj	2017-09-01 09:26:48.441614145 +0200
+++ gcc/common.opt	2017-09-05 15:39:32.454612704 +0200
@@ -233,10 +233,9 @@ unsigned int flag_sanitize
 Variable
 unsigned int flag_sanitize_recover = (SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT | SANITIZE_KERNEL_ADDRESS) & ~(SANITIZE_UNREACHABLE | SANITIZE_RETURN)
 
-fsanitize-coverage=trace-pc
-Common Report Var(flag_sanitize_coverage)
-Enable coverage-guided fuzzing code instrumentation.
-Inserts call to __sanitizer_cov_trace_pc into every basic block.
+; What the coverage sanitizers should instrument
+Variable
+unsigned int flag_sanitize_coverage
 
 ; Flag whether a prefix has been added to dump_base_name
 Variable
@@ -982,6 +981,10 @@ fsanitize=
 Common Driver Report Joined
 Select what to sanitize.
 
+fsanitize-coverage=
+Common Report Joined
+Select what to coverage sanitize.
+
 fasan-shadow-offset=
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -fasan-shadow-offset=<number>	Use custom shadow memory offset.
--- gcc/flag-types.h.jj	2017-07-28 12:35:27.562412268 +0200
+++ gcc/flag-types.h	2017-09-05 15:39:32.454612704 +0200
@@ -252,6 +252,14 @@ enum sanitize_code {
 				  | SANITIZE_BOUNDS_STRICT
 };
 
+/* Different trace modes.  */
+enum sanitize_coverage_code {
+  /* Trace PC.  */
+  SANITIZE_COV_TRACE_PC = 1 << 0,
+  /* Trace Comparison.  */
+  SANITIZE_COV_TRACE_CMP = 1 << 1
+};
+
 /* flag_vtable_verify initialization levels. */
 enum vtv_priority {
   VTV_NO_PRIORITY       = 0,  /* i.E. Do NOT do vtable verification. */
--- gcc/fold-const.c.jj	2017-09-01 09:25:46.275345134 +0200
+++ gcc/fold-const.c	2017-09-05 15:39:32.457612668 +0200
@@ -5394,6 +5394,7 @@ fold_range_test (location_t loc, enum tr
      short-circuited branch and the underlying object on both sides
      is the same, make a non-short-circuit operation.  */
   else if (LOGICAL_OP_NON_SHORT_CIRCUIT
+	   && !flag_sanitize_coverage
 	   && lhs != 0 && rhs != 0
 	   && (code == TRUTH_ANDIF_EXPR
 	       || code == TRUTH_ORIF_EXPR)
@@ -8035,6 +8036,7 @@ fold_truth_andor (location_t loc, enum t
     return tem;
 
   if (LOGICAL_OP_NON_SHORT_CIRCUIT
+      && !flag_sanitize_coverage
       && (code == TRUTH_AND_EXPR
           || code == TRUTH_ANDIF_EXPR
           || code == TRUTH_OR_EXPR
--- gcc/opts.c.jj	2017-09-01 09:26:28.041854018 +0200
+++ gcc/opts.c	2017-09-05 15:53:18.752765907 +0200
@@ -1526,6 +1526,17 @@ const struct sanitizer_opts_s sanitizer_
   { NULL, 0U, 0UL, false }
 };
 
+/* -f{,no-}sanitize-coverage= suboptions.  */
+const struct sanitizer_opts_s coverage_sanitizer_opts[] =
+{
+#define COVERAGE_SANITIZER_OPT(name, flags) \
+    { #name, flags, sizeof #name - 1, true }
+  COVERAGE_SANITIZER_OPT (trace-pc, SANITIZE_COV_TRACE_PC),
+  COVERAGE_SANITIZER_OPT (trace-cmp, SANITIZE_COV_TRACE_CMP),
+#undef COVERAGE_SANITIZER_OPT
+  { NULL, 0U, 0UL, false }
+};
+
 /* A struct for describing a run of chars within a string.  */
 
 struct string_fragment
@@ -1556,31 +1567,34 @@ struct edit_distance_traits<const string
 
 /* Given ARG, an unrecognized sanitizer option, return the best
    matching sanitizer option, or NULL if there isn't one.
-   CODE is OPT_fsanitize_ or OPT_fsanitize_recover_.
+   OPTS is array of candidate sanitizer options.
+   CODE is OPT_fsanitize_, OPT_fsanitize_recover_ or
+   OPT_fsanitize_coverage_.
    VALUE is non-zero for the regular form of the option, zero
    for the "no-" form (e.g. "-fno-sanitize-recover=").  */
 
 static const char *
 get_closest_sanitizer_option (const string_fragment &arg,
+			      const struct sanitizer_opts_s *opts,
 			      enum opt_code code, int value)
 {
   best_match <const string_fragment &, const char*> bm (arg);
-  for (int i = 0; sanitizer_opts[i].name != NULL; ++i)
+  for (int i = 0; opts[i].name != NULL; ++i)
     {
       /* -fsanitize=all is not valid, so don't offer it.  */
-      if (sanitizer_opts[i].flag == ~0U
-	  && code == OPT_fsanitize_
+      if (code == OPT_fsanitize_
+	  && opts[i].flag == ~0U
 	  && value)
 	continue;
 
       /* For -fsanitize-recover= (and not -fno-sanitize-recover=),
 	 don't offer the non-recoverable options.  */
-      if (!sanitizer_opts[i].can_recover
-	  && code == OPT_fsanitize_recover_
+      if (code == OPT_fsanitize_recover_
+	  && !opts[i].can_recover
 	  && value)
 	continue;
 
-      bm.consider (sanitizer_opts[i].name);
+      bm.consider (opts[i].name);
     }
   return bm.get_best_meaningful_candidate ();
 }
@@ -1594,6 +1608,13 @@ parse_sanitizer_options (const char *p,
 			 unsigned int flags, int value, bool complain)
 {
   enum opt_code code = (enum opt_code) scode;
+
+  const struct sanitizer_opts_s *opts;
+  if (code == OPT_fsanitize_coverage_)
+    opts = coverage_sanitizer_opts;
+  else
+    opts = sanitizer_opts;
+
   while (*p != 0)
     {
       size_t len, i;
@@ -1611,12 +1632,11 @@ parse_sanitizer_options (const char *p,
 	}
 
       /* Check to see if the string matches an option class name.  */
-      for (i = 0; sanitizer_opts[i].name != NULL; ++i)
-	if (len == sanitizer_opts[i].len
-	    && memcmp (p, sanitizer_opts[i].name, len) == 0)
+      for (i = 0; opts[i].name != NULL; ++i)
+	if (len == opts[i].len && memcmp (p, opts[i].name, len) == 0)
 	  {
 	    /* Handle both -fsanitize and -fno-sanitize cases.  */
-	    if (value && sanitizer_opts[i].flag == ~0U)
+	    if (value && opts[i].flag == ~0U)
 	      {
 		if (code == OPT_fsanitize_)
 		  {
@@ -1633,14 +1653,14 @@ parse_sanitizer_options (const char *p,
 		   -fsanitize-recover=return if -fsanitize-recover=undefined
 		   is selected.  */
 		if (code == OPT_fsanitize_recover_
-		    && sanitizer_opts[i].flag == SANITIZE_UNDEFINED)
+		    && opts[i].flag == SANITIZE_UNDEFINED)
 		  flags |= (SANITIZE_UNDEFINED
 			    & ~(SANITIZE_UNREACHABLE | SANITIZE_RETURN));
 		else
-		  flags |= sanitizer_opts[i].flag;
+		  flags |= opts[i].flag;
 	      }
 	    else
-	      flags &= ~sanitizer_opts[i].flag;
+	      flags &= ~opts[i].flag;
 	    found = true;
 	    break;
 	  }
@@ -1649,21 +1669,27 @@ parse_sanitizer_options (const char *p,
 	{
 	  const char *hint
 	    = get_closest_sanitizer_option (string_fragment (p, len),
-					    code, value);
+					    opts, code, value);
+
+	  const char *suffix;
+	  if (code == OPT_fsanitize_recover_)
+	    suffix = "-recover";
+	  else if (code == OPT_fsanitize_coverage_)
+	    suffix = "-coverage";
+	  else
+	    suffix = "";
 
 	  if (hint)
 	    error_at (loc,
 		      "unrecognized argument to -f%ssanitize%s= option: %q.*s;"
 		      " did you mean %qs?",
 		      value ? "" : "no-",
-		      code == OPT_fsanitize_ ? "" : "-recover",
-		      (int) len, p, hint);
+		      suffix, (int) len, p, hint);
 	  else
 	    error_at (loc,
 		      "unrecognized argument to -f%ssanitize%s= option: %q.*s",
 		      value ? "" : "no-",
-		      code == OPT_fsanitize_ ? "" : "-recover",
-		      (int) len, p);
+		      suffix, (int) len, p);
 	}
 
       if (comma == NULL)
@@ -1956,6 +1982,12 @@ common_handle_option (struct gcc_options
 	  &= ~(SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT);
       break;
 
+    case OPT_fsanitize_coverage_:
+      opts->x_flag_sanitize_coverage
+	= parse_sanitizer_options (arg, loc, code,
+				   opts->x_flag_sanitize_coverage, value, true);
+      break;
+
     case OPT_O:
     case OPT_Os:
     case OPT_Ofast:
--- gcc/sancov.c.jj	2017-09-01 09:26:48.603612240 +0200
+++ gcc/sancov.c	2017-09-05 17:51:02.209865830 +0200
@@ -1,6 +1,7 @@
 /* Code coverage instrumentation for fuzzing.
    Copyright (C) 2015-2017 Free Software Foundation, Inc.
-   Contributed by Dmitry Vyukov <dvyukov@google.com>
+   Contributed by Dmitry Vyukov <dvyukov@google.com> and
+   Wish Wu <wishwu007@gmail.com>
 
 This file is part of GCC.
 
@@ -29,32 +30,271 @@ along with GCC; see the file COPYING3.
 #include "flags.h"
 #include "stmt.h"
 #include "gimple-iterator.h"
+#include "gimple-builder.h"
 #include "tree-cfg.h"
 #include "tree-pass.h"
 #include "tree-iterator.h"
+#include "fold-const.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "output.h"
+#include "cgraph.h"
 #include "asan.h"
 
 namespace {
 
+/* Instrument one comparison operation, which compares lhs and rhs.
+   Call the instrumentation function with the comparison operand.
+   For integral comparisons if exactly one of the comparison operands is
+   constant, call __sanitizer_cov_trace_const_cmp* instead of
+   __sanitizer_cov_trace_cmp*.  */
+
+static void
+instrument_comparison (gimple_stmt_iterator *gsi, tree lhs, tree rhs)
+{
+  tree type = TREE_TYPE (lhs);
+  enum built_in_function fncode = END_BUILTINS;
+  tree to_type = NULL_TREE;
+  bool c = false;
+
+  if (INTEGRAL_TYPE_P (type))
+    {
+      c = (is_gimple_min_invariant (lhs)
+	   ^ is_gimple_min_invariant (rhs));
+      switch (int_size_in_bytes (type))
+	{
+	case 1:
+      	  fncode = c ? BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP1
+		     : BUILT_IN_SANITIZER_COV_TRACE_CMP1;
+	  to_type = unsigned_char_type_node;
+	  break;
+	case 2:
+      	  fncode = c ? BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP2
+		     : BUILT_IN_SANITIZER_COV_TRACE_CMP2;
+	  to_type = uint16_type_node;
+	  break;
+	case 4:
+      	  fncode = c ? BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP4
+		     : BUILT_IN_SANITIZER_COV_TRACE_CMP4;
+	  to_type = uint32_type_node;
+	  break;
+	default:
+      	  fncode = c ? BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP8
+		     : BUILT_IN_SANITIZER_COV_TRACE_CMP8;
+	  to_type = uint64_type_node;
+	  break;
+	}
+    }
+  else if (SCALAR_FLOAT_TYPE_P (type))
+    {
+      if (TYPE_MODE (type) == TYPE_MODE (float_type_node))
+	{
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPF;
+	  to_type = float_type_node;
+	}
+      else if (TYPE_MODE (type) == TYPE_MODE (double_type_node))
+	{
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPD;
+	  to_type = double_type_node;
+	}
+    }
+
+  if (to_type != NULL_TREE)
+    {
+      gimple_seq seq = NULL;
+
+      if (!useless_type_conversion_p (to_type, type))
+	{
+	  if (TREE_CODE (lhs) == INTEGER_CST)
+	    lhs = fold_convert (to_type, lhs);
+	  else
+	    {
+	      gimple_seq_add_stmt (&seq, build_type_cast (to_type, lhs));
+	      lhs = gimple_assign_lhs (gimple_seq_last_stmt (seq));
+	    }
+
+	  if (TREE_CODE (rhs) == INTEGER_CST)
+	    rhs = fold_convert (to_type, rhs);
+	  else
+	    {
+	      gimple_seq_add_stmt (&seq, build_type_cast (to_type, rhs));
+	      rhs = gimple_assign_lhs (gimple_seq_last_stmt (seq));
+	    }
+	}
+
+      if (c && !is_gimple_min_invariant (lhs))
+	std::swap (lhs, rhs);
+
+      tree fndecl = builtin_decl_implicit (fncode);
+      gimple *gcall = gimple_build_call (fndecl, 2, lhs, rhs);
+      gimple_seq_add_stmt (&seq, gcall);
+
+      gimple_seq_set_location (seq, gimple_location (gsi_stmt (*gsi)));
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+    }
+}
+
+/* Instrument switch statement.  Call __sanitizer_cov_trace_switch with
+   the value of the index and array that contains number of case values,
+   the bitsize of the index and the case values converted to uint64_t.  */
+
+static void
+instrument_switch (gimple_stmt_iterator *gsi, gimple *stmt, function *fun)
+{
+  gswitch *switch_stmt = as_a<gswitch *> (stmt);
+  tree index = gimple_switch_index (switch_stmt);
+  HOST_WIDE_INT size_in_bytes = int_size_in_bytes (TREE_TYPE (index));
+  if (size_in_bytes == -1 || size_in_bytes > 8)
+    return;
+
+  location_t loc = gimple_location (stmt);
+  unsigned i, n = gimple_switch_num_labels (switch_stmt), num = 0;
+  for (i = 1; i < n; ++i)
+    {
+      tree label = gimple_switch_label (switch_stmt, i);
+
+      tree low_case = CASE_LOW (label);
+      if (low_case != NULL_TREE)
+	num++;
+
+      tree high_case = CASE_HIGH (label);
+      if (high_case != NULL_TREE)
+	num++;
+    }
+
+  tree case_array_type
+   = build_array_type (build_type_variant (uint64_type_node, 1, 0),
+		       build_index_type (size_int (num + 2 - 1)));
+
+  char name[64];
+  static size_t case_array_count = 0;
+  ASM_GENERATE_INTERNAL_LABEL (name, "LCASEARRAY", case_array_count++);
+  tree case_array_var = build_decl (loc, VAR_DECL, get_identifier (name),
+				    case_array_type);
+  TREE_STATIC (case_array_var) = 1;
+  TREE_PUBLIC (case_array_var) = 0;
+  TREE_CONSTANT (case_array_var) = 1;
+  TREE_READONLY (case_array_var) = 1;
+  DECL_EXTERNAL (case_array_var) = 0;
+  DECL_ARTIFICIAL (case_array_var) = 1;
+  DECL_IGNORED_P (case_array_var) = 1;
+
+  vec <constructor_elt, va_gc> *v = NULL;
+  vec_alloc (v, num + 2);
+  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+			  build_int_cst (uint64_type_node, num));
+  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+			  build_int_cst (uint64_type_node,
+					 size_in_bytes * BITS_PER_UNIT));
+  for (i = 1; i < n; ++i)
+    {
+      tree label = gimple_switch_label (switch_stmt, i);
+
+      tree low_case = CASE_LOW (label);
+      if (low_case != NULL_TREE)
+	CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+				fold_convert (uint64_type_node, low_case));
+
+      tree high_case = CASE_HIGH (label);
+      if (high_case != NULL_TREE)
+	CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+				fold_convert (uint64_type_node, high_case));
+    }
+  tree ctor = build_constructor (case_array_type, v);
+  TREE_STATIC (ctor) = 1;
+  TREE_PUBLIC (ctor) = 0;
+  TREE_CONSTANT (ctor) = 1;
+  TREE_READONLY (ctor) = 1;
+  DECL_INITIAL (case_array_var) = ctor;
+  varpool_node::finalize_decl (case_array_var);
+  add_local_decl (fun, case_array_var);
+
+  gimple_seq seq = NULL;
+
+  if (!useless_type_conversion_p (uint64_type_node, TREE_TYPE (index)))
+    {
+      if (TREE_CODE (index) == INTEGER_CST)
+	index = fold_convert (uint64_type_node, index);
+      else
+	{
+	  gimple_seq_add_stmt (&seq, build_type_cast (uint64_type_node, index));
+	  index = gimple_assign_lhs (gimple_seq_last_stmt (seq));
+	}
+    }
+
+  tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_SWITCH);
+  gimple *gcall = gimple_build_call (fndecl, 2, index,
+				     build_fold_addr_expr (case_array_var));
+  gimple_seq_add_stmt (&seq, gcall);
+
+  gimple_seq_set_location (seq, loc);
+  gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+}
+
 unsigned
 sancov_pass (function *fun)
 {
   initialize_sanitizer_builtins ();
 
   /* Insert callback into beginning of every BB. */
-  tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_PC);
-  basic_block bb;
-  FOR_EACH_BB_FN (bb, fun)
-    {
-      gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb);
-      if (gsi_end_p (gsi))
-	continue;
-      gimple *stmt = gsi_stmt (gsi);
-      gimple *gcall = gimple_build_call (fndecl, 0);
-      gimple_set_location (gcall, gimple_location (stmt));
-      gsi_insert_before (&gsi, gcall, GSI_SAME_STMT);
+  if (flag_sanitize_coverage & SANITIZE_COV_TRACE_PC)
+    {
+      basic_block bb;
+      tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_PC);
+      FOR_EACH_BB_FN (bb, fun)
+	{
+	  gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb);
+	  if (gsi_end_p (gsi))
+	    continue;
+	  gimple *stmt = gsi_stmt (gsi);
+	  gimple *gcall = gimple_build_call (fndecl, 0);
+	  gimple_set_location (gcall, gimple_location (stmt));
+	  gsi_insert_before (&gsi, gcall, GSI_SAME_STMT);
+	}
+    }
+
+  /* Insert callback into every comparison related operation.  */
+  if (flag_sanitize_coverage & SANITIZE_COV_TRACE_CMP)
+    {
+      basic_block bb;
+      FOR_EACH_BB_FN (bb, fun)
+	{
+	  gimple_stmt_iterator gsi;
+	  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	    {
+	      gimple *stmt = gsi_stmt (gsi);
+	      enum tree_code rhs_code;
+	      switch (gimple_code (stmt))
+		{
+		case GIMPLE_ASSIGN:
+		  rhs_code = gimple_assign_rhs_code (stmt);
+		  if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
+		    instrument_comparison (&gsi,
+					   gimple_assign_rhs1 (stmt),
+					   gimple_assign_rhs2 (stmt));
+		  else if (rhs_code == COND_EXPR
+			   && COMPARISON_CLASS_P (gimple_assign_rhs1 (stmt)))
+		    {
+		      tree cond = gimple_assign_rhs1 (stmt);
+		      instrument_comparison (&gsi, TREE_OPERAND (cond, 0),
+					     TREE_OPERAND (cond, 1));
+		    }
+		  break;
+		case GIMPLE_COND:
+		  instrument_comparison (&gsi,
+					 gimple_cond_lhs (stmt),
+					 gimple_cond_rhs (stmt));
+		  break;
+
+		case GIMPLE_SWITCH:
+		  instrument_switch (&gsi, stmt, fun);
+		  break;
+
+		default:
+		  break;
+		}
+	    }
+	}
     }
   return 0;
 }
--- gcc/sanitizer.def.jj	2017-07-28 12:35:27.565412232 +0200
+++ gcc/sanitizer.def	2017-09-05 17:01:51.281308488 +0200
@@ -537,6 +537,39 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HAN
 DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_PC,
 		      "__sanitizer_cov_trace_pc",
 		      BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMP1,
+		      "__sanitizer_cov_trace_cmp1",
+		      BT_FN_VOID_UINT8_UINT8, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMP2,
+		      "__sanitizer_cov_trace_cmp2",
+		      BT_FN_VOID_UINT16_UINT16, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMP4,
+		      "__sanitizer_cov_trace_cmp4",
+		      BT_FN_VOID_UINT32_UINT32, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMP8,
+		      "__sanitizer_cov_trace_cmp8",
+		      BT_FN_VOID_UINT64_UINT64, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP1,
+		      "__sanitizer_cov_trace_const_cmp1",
+		      BT_FN_VOID_UINT8_UINT8, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP2,
+		      "__sanitizer_cov_trace_const_cmp2",
+		      BT_FN_VOID_UINT16_UINT16, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP4,
+		      "__sanitizer_cov_trace_const_cmp4",
+		      BT_FN_VOID_UINT32_UINT32, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP8,
+		      "__sanitizer_cov_trace_const_cmp8",
+		      BT_FN_VOID_UINT64_UINT64, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMPF,
+		      "__sanitizer_cov_trace_cmpf",
+		      BT_FN_VOID_FLOAT_FLOAT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMPD,
+		      "__sanitizer_cov_trace_cmpd",
+		      BT_FN_VOID_DOUBLE_DOUBLE, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_SWITCH,
+		      "__sanitizer_cov_trace_switch",
+		      BT_FN_VOID_UINT64_PTR, ATTR_NOTHROW_LEAF_LIST)
 
 /* This has to come after all the sanitizer builtins.  */
 DEF_BUILTIN_STUB(END_SANITIZER_BUILTINS, (const char *)0)
--- gcc/tree-ssa-ifcombine.c.jj	2017-06-30 09:49:28.004662123 +0200
+++ gcc/tree-ssa-ifcombine.c	2017-09-05 15:39:32.460612633 +0200
@@ -560,7 +560,7 @@ ifcombine_ifandif (basic_block inner_con
 	{
 	  tree t1, t2;
 	  gimple_stmt_iterator gsi;
-	  if (!LOGICAL_OP_NON_SHORT_CIRCUIT)
+	  if (!LOGICAL_OP_NON_SHORT_CIRCUIT || flag_sanitize_coverage)
 	    return false;
 	  /* Only do this optimization if the inner bb contains only the conditional. */
 	  if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (inner_cond_bb)))
--- gcc/doc/invoke.texi.jj	2017-09-04 09:55:25.000000000 +0200
+++ gcc/doc/invoke.texi	2017-09-05 18:22:02.863499354 +0200
@@ -11161,6 +11161,20 @@ is usable even in freestanding environme
 Enable coverage-guided fuzzing code instrumentation.
 Inserts a call to @code{__sanitizer_cov_trace_pc} into every basic block.
 
+@item -fsanitize-coverage=trace-cmp
+@opindex fsanitize-coverage=trace-cmp
+Enable dataflow guided fuzzing code instrumentation.
+Inserts a call to @code{__sanitizer_cov_trace_cmp1},
+@code{__sanitizer_cov_trace_cmp2}, @code{__sanitizer_cov_trace_cmp4} or
+@code{__sanitizer_cov_trace_cmp8} for integral comparison with both operands
+variable or @code{__sanitizer_cov_trace_const_cmp1},
+@code{__sanitizer_cov_trace_const_cmp2},
+@code{__sanitizer_cov_trace_const_cmp4} or
+@code{__sanitizer_cov_trace_const_cmp8} for integral comparison with one
+operand constant, @code{__sanitizer_cov_trace_cmpf} or
+@code{__sanitizer_cov_trace_cmpd} for float or double comparisons and
+@code{__sanitizer_cov_trace_switch} for switch statements.
+
 @item -fbounds-check
 @opindex fbounds-check
 For front ends that support it, generate additional code to check that
--- gcc/testsuite/gcc.dg/sancov/cmp0.c.jj	2017-09-05 17:54:00.906717081 +0200
+++ gcc/testsuite/gcc.dg/sancov/cmp0.c	2017-09-05 18:04:15.319329045 +0200
@@ -0,0 +1,92 @@
+/* Basic test on number of inserted callbacks.  */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize-coverage=trace-cmp -fdump-tree-optimized" } */
+
+void
+foo (char *a, short *b, int *c, long long *d, float *e, double *f)
+{
+  if (*a)
+    *a += 1;
+  if (*b)
+    *b = *a;
+  if (*c)
+    *c += 1;
+  if (*d)
+    *d = *c;
+  if (*e == *c)
+    *e = *c;
+  if (*f == *e)
+    *f = *e;
+  switch (*a)
+    {
+    case 2:
+      *b += 2;
+      break;
+    case 3:
+      *b += 3;
+      break;
+    case 4:
+      *b += 4;
+      break;
+    case 5:
+      *b += 5;
+      break;
+    case 6:
+      *b += 6;
+      break;
+    case 7 ... 24:
+      *b += 7;
+      break;
+    default:
+      break;
+    }
+  switch (*d)
+    {
+    case 3:
+      *d += 3;
+    case -4:
+      *d -= 4;
+    case -5:
+      *d -= 5;
+    case -6:
+      *d -= 6;
+    case -7:
+      *d -= 7;
+    case -8:
+      *d -= 8;
+    case -9:
+      *d -= 9;
+    case -10:
+      *d -= 10;
+    }
+}
+
+void
+bar (int *c)
+{
+  if (*c == 27)
+    *c += 2;
+  if (*c == 37)
+    *c += 2;
+}
+
+int
+baz (int *c, long long d, long long e)
+{
+  *c = (*c == 48) ? 12 : 24;
+  return d == e;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp1 \\(0, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp2 \\(0, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp4 \\(0, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp8 \\(0, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp4 \\(27, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp4 \\(37, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp4 \\(48, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_cmp8 \\(" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_cmpf \\(" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_cmpd \\(" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp" 7 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_cmp" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_switch \\(" 2 "optimized" } } */


	Jakub
=?UTF-8?B?5ZC05r2N5rWgKOatpOW9vCk=?= Sept. 6, 2017, 11:47 a.m. UTC | #16
Hi Jakub
I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address".
And run my fuzzer with pc and cmp feedbacks for hours. It works fine.
About __sanitizer_cov_trace_cmp{f,d} , yes, it  isn't provided by llvm. But once we trace integer comparisons, why not real type comparisons.
I remember Dmitry said it is not enough useful to trace real type comparisons because it is rare to see them in programs.
But libdng_sdk really has real type comparisons. So I want to keep them and implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary.

And thanks again for your professional help.

Wish Wu

------------------------------------------------------------------
From:Jakub Jelinek <jakub@redhat.com>
Time:2017 Sep 6 (Wed) 05:44
To:Wish Wu <weixi.wwx@antfin.com>
Cc:Dmitry Vyukov <dvyukov@google.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
Subject:Re: Add support to trace comparison instructions and switch statements


On Tue, Sep 05, 2017 at 09:03:52PM +0800, 吴潍浠(此彼) wrote:
> Attachment is my updated path.
> The implementation of parse_sanitizer_options is not elegance enough. Mixing handling flags of fsanitize is easy to make mistakes.

To avoid too many further iterations, I took the liberty to tweak your
patch.  From https://clang.llvm.org/docs/SanitizerCoverage.html
I've noticed that since 2017-08-11 clang/llvm wants to emit
__sanitizer_cov_trace_const_cmpN with the first argument a constant
if one of the comparison operands is a constant, so the patch implements
that too.
I wonder about the __sanitizer_cov_trace_cmp{f,d} entry-points, because
I can't find them on that page nor in llvm sources.
I've also added handling of COND_EXPRs and added some documentation.

I've bootstrapped/regtested the patch on x86_64-linux and i686-linux.
Can you test it on whatever you want to use the patch for?

2017-09-05  Wish Wu  <wishwu007@gmail.com>
     Jakub Jelinek  <jakub@redhat.com>

 * asan.c (initialize_sanitizer_builtins): Add
 BT_FN_VOID_UINT8_UINT8, BT_FN_VOID_UINT16_UINT16,
 BT_FN_VOID_UINT32_UINT32, BT_FN_VOID_UINT64_UINT64,
 BT_FN_VOID_FLOAT_FLOAT, BT_FN_VOID_DOUBLE_DOUBLE and
 BT_FN_VOID_UINT64_PTR variables.
 * builtin-types.def (BT_FN_VOID_UINT8_UINT8): New fn type.
 (BT_FN_VOID_UINT16_UINT16): Likewise.
 (BT_FN_VOID_UINT32_UINT32): Likewise.
 (BT_FN_VOID_FLOAT_FLOAT): Likewise.
 (BT_FN_VOID_DOUBLE_DOUBLE): Likewise.
 (BT_FN_VOID_UINT64_PTR): Likewise.
 * common.opt (flag_sanitize_coverage): New variable.
 (fsanitize-coverage=trace-pc): Remove.
 (fsanitize-coverage=): Add.
 * flag-types.h (enum sanitize_coverage_code): New enum.
 * fold-const.c (fold_range_test): Disable non-short-circuit
 optimization if flag_sanitize_coverage.
 (fold_truth_andor): Likewise.
 * tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise.
 * opts.c (COVERAGE_SANITIZER_OPT): Define.
 (coverage_sanitizer_opts): New array.
 (get_closest_sanitizer_option): Add OPTS argument, handle also
 OPT_fsanitize_coverage_.
 (parse_sanitizer_options): Adjusted to also handle
 OPT_fsanitize_coverage_.
 (common_handle_option): Add OPT_fsanitize_coverage_.
 * sancov.c (instrument_comparison, instrument_switch): New function.
 (sancov_pass): Add trace-cmp support.
 * sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1,
 BUILT_IN_SANITIZER_COV_TRACE_CMP2, BUILT_IN_SANITIZER_COV_TRACE_CMP4,
 BUILT_IN_SANITIZER_COV_TRACE_CMP8,
 BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP1,
 BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP2,
 BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP4,
 BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP8,
 BUILT_IN_SANITIZER_COV_TRACE_CMPF, BUILT_IN_SANITIZER_COV_TRACE_CMPD,
 BUILT_IN_SANITIZER_COV_TRACE_SWITCH): New builtins.
 * doc/invoke.texi: Document -fsanitize-coverage=trace-cmp.

 * gcc.dg/sancov/cmp0.c: New test.

--- gcc/asan.c.jj 2017-09-04 09:55:26.600687479 +0200
+++ gcc/asan.c 2017-09-05 15:39:32.452612728 +0200
@@ -2709,6 +2709,29 @@ initialize_sanitizer_builtins (void)
   tree BT_FN_SIZE_CONST_PTR_INT
     = build_function_type_list (size_type_node, const_ptr_type_node,
     integer_type_node, NULL_TREE);
+
+  tree BT_FN_VOID_UINT8_UINT8
+    = build_function_type_list (void_type_node, unsigned_char_type_node,
+    unsigned_char_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT16_UINT16
+    = build_function_type_list (void_type_node, uint16_type_node,
+    uint16_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT32_UINT32
+    = build_function_type_list (void_type_node, uint32_type_node,
+    uint32_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT64_UINT64
+    = build_function_type_list (void_type_node, uint64_type_node,
+    uint64_type_node, NULL_TREE);
+  tree BT_FN_VOID_FLOAT_FLOAT
+    = build_function_type_list (void_type_node, float_type_node,
+    float_type_node, NULL_TREE);
+  tree BT_FN_VOID_DOUBLE_DOUBLE
+    = build_function_type_list (void_type_node, double_type_node,
+    double_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT64_PTR
+    = build_function_type_list (void_type_node, uint64_type_node,
+    ptr_type_node, NULL_TREE);
+
   tree BT_FN_BOOL_VPTR_PTR_IX_INT_INT[5];
   tree BT_FN_IX_CONST_VPTR_INT[5];
   tree BT_FN_IX_VPTR_IX_INT[5];
--- gcc/builtin-types.def.jj 2017-06-28 09:05:45.249396972 +0200
+++ gcc/builtin-types.def 2017-09-05 15:39:32.453612716 +0200
@@ -338,8 +338,20 @@ DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTRMODE_
        BT_VOID, BT_PTRMODE, BT_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTR_PTRMODE,
        BT_VOID, BT_PTR, BT_PTRMODE)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT8_UINT8,
+            BT_VOID, BT_UINT8, BT_UINT8)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT16_UINT16,
+            BT_VOID, BT_UINT16, BT_UINT16)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT32_UINT32,
+            BT_VOID, BT_UINT32, BT_UINT32)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT64_UINT64,
             BT_VOID, BT_UINT64, BT_UINT64)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_FLOAT_FLOAT,
+            BT_VOID, BT_FLOAT, BT_FLOAT)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_DOUBLE_DOUBLE,
+            BT_VOID, BT_DOUBLE, BT_DOUBLE)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT64_PTR,
+            BT_VOID, BT_UINT64, BT_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_VALIST_REF_VALIST_ARG,
        BT_VOID, BT_VALIST_REF, BT_VALIST_ARG)
 DEF_FUNCTION_TYPE_2 (BT_FN_LONG_LONG_LONG,
--- gcc/common.opt.jj 2017-09-01 09:26:48.441614145 +0200
+++ gcc/common.opt 2017-09-05 15:39:32.454612704 +0200
@@ -233,10 +233,9 @@ unsigned int flag_sanitize
 Variable
 unsigned int flag_sanitize_recover = (SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT | SANITIZE_KERNEL_ADDRESS) & ~(SANITIZE_UNREACHABLE | SANITIZE_RETURN)
 
-fsanitize-coverage=trace-pc
-Common Report Var(flag_sanitize_coverage)
-Enable coverage-guided fuzzing code instrumentation.
-Inserts call to __sanitizer_cov_trace_pc into every basic block.
+; What the coverage sanitizers should instrument
+Variable
+unsigned int flag_sanitize_coverage
 
 ; Flag whether a prefix has been added to dump_base_name
 Variable
@@ -982,6 +981,10 @@ fsanitize=
 Common Driver Report Joined
 Select what to sanitize.
 
+fsanitize-coverage=
+Common Report Joined
+Select what to coverage sanitize.
+
 fasan-shadow-offset=
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -fasan-shadow-offset=<number> Use custom shadow memory offset.
--- gcc/flag-types.h.jj 2017-07-28 12:35:27.562412268 +0200
+++ gcc/flag-types.h 2017-09-05 15:39:32.454612704 +0200
@@ -252,6 +252,14 @@ enum sanitize_code {
       | SANITIZE_BOUNDS_STRICT
 };
 
+/* Different trace modes.  */
+enum sanitize_coverage_code {
+  /* Trace PC.  */
+  SANITIZE_COV_TRACE_PC = 1 << 0,
+  /* Trace Comparison.  */
+  SANITIZE_COV_TRACE_CMP = 1 << 1
+};
+
 /* flag_vtable_verify initialization levels. */
 enum vtv_priority {
   VTV_NO_PRIORITY       = 0,  /* i.E. Do NOT do vtable verification. */
--- gcc/fold-const.c.jj 2017-09-01 09:25:46.275345134 +0200
+++ gcc/fold-const.c 2017-09-05 15:39:32.457612668 +0200
@@ -5394,6 +5394,7 @@ fold_range_test (location_t loc, enum tr
      short-circuited branch and the underlying object on both sides
      is the same, make a non-short-circuit operation.  */
   else if (LOGICAL_OP_NON_SHORT_CIRCUIT
+    && !flag_sanitize_coverage
     && lhs != 0 && rhs != 0
     && (code == TRUTH_ANDIF_EXPR
         || code == TRUTH_ORIF_EXPR)
@@ -8035,6 +8036,7 @@ fold_truth_andor (location_t loc, enum t
     return tem;
 
   if (LOGICAL_OP_NON_SHORT_CIRCUIT
+      && !flag_sanitize_coverage
       && (code == TRUTH_AND_EXPR
           || code == TRUTH_ANDIF_EXPR
           || code == TRUTH_OR_EXPR
--- gcc/opts.c.jj 2017-09-01 09:26:28.041854018 +0200
+++ gcc/opts.c 2017-09-05 15:53:18.752765907 +0200
@@ -1526,6 +1526,17 @@ const struct sanitizer_opts_s sanitizer_
   { NULL, 0U, 0UL, false }
 };
 
+/* -f{,no-}sanitize-coverage= suboptions.  */
+const struct sanitizer_opts_s coverage_sanitizer_opts[] =
+{
+#define COVERAGE_SANITIZER_OPT(name, flags) \
+    { #name, flags, sizeof #name - 1, true }
+  COVERAGE_SANITIZER_OPT (trace-pc, SANITIZE_COV_TRACE_PC),
+  COVERAGE_SANITIZER_OPT (trace-cmp, SANITIZE_COV_TRACE_CMP),
+#undef COVERAGE_SANITIZER_OPT
+  { NULL, 0U, 0UL, false }
+};
+
 /* A struct for describing a run of chars within a string.  */
 
 struct string_fragment
@@ -1556,31 +1567,34 @@ struct edit_distance_traits<const string
 
 /* Given ARG, an unrecognized sanitizer option, return the best
    matching sanitizer option, or NULL if there isn't one.
-   CODE is OPT_fsanitize_ or OPT_fsanitize_recover_.
+   OPTS is array of candidate sanitizer options.
+   CODE is OPT_fsanitize_, OPT_fsanitize_recover_ or
+   OPT_fsanitize_coverage_.
    VALUE is non-zero for the regular form of the option, zero
    for the "no-" form (e.g. "-fno-sanitize-recover=").  */
 
 static const char *
 get_closest_sanitizer_option (const string_fragment &arg,
+         const struct sanitizer_opts_s *opts,
          enum opt_code code, int value)
 {
   best_match <const string_fragment &, const char*> bm (arg);
-  for (int i = 0; sanitizer_opts[i].name != NULL; ++i)
+  for (int i = 0; opts[i].name != NULL; ++i)
     {
       /* -fsanitize=all is not valid, so don't offer it.  */
-      if (sanitizer_opts[i].flag == ~0U
-   && code == OPT_fsanitize_
+      if (code == OPT_fsanitize_
+   && opts[i].flag == ~0U
    && value)
  continue;
 
       /* For -fsanitize-recover= (and not -fno-sanitize-recover=),
   don't offer the non-recoverable options.  */
-      if (!sanitizer_opts[i].can_recover
-   && code == OPT_fsanitize_recover_
+      if (code == OPT_fsanitize_recover_
+   && !opts[i].can_recover
    && value)
  continue;
 
-      bm.consider (sanitizer_opts[i].name);
+      bm.consider (opts[i].name);
     }
   return bm.get_best_meaningful_candidate ();
 }
@@ -1594,6 +1608,13 @@ parse_sanitizer_options (const char *p,
     unsigned int flags, int value, bool complain)
 {
   enum opt_code code = (enum opt_code) scode;
+
+  const struct sanitizer_opts_s *opts;
+  if (code == OPT_fsanitize_coverage_)
+    opts = coverage_sanitizer_opts;
+  else
+    opts = sanitizer_opts;
+
   while (*p != 0)
     {
       size_t len, i;
@@ -1611,12 +1632,11 @@ parse_sanitizer_options (const char *p,
  }
 
       /* Check to see if the string matches an option class name.  */
-      for (i = 0; sanitizer_opts[i].name != NULL; ++i)
- if (len == sanitizer_opts[i].len
-     && memcmp (p, sanitizer_opts[i].name, len) == 0)
+      for (i = 0; opts[i].name != NULL; ++i)
+ if (len == opts[i].len && memcmp (p, opts[i].name, len) == 0)
    {
      /* Handle both -fsanitize and -fno-sanitize cases.  */
-     if (value && sanitizer_opts[i].flag == ~0U)
+     if (value && opts[i].flag == ~0U)
        {
   if (code == OPT_fsanitize_)
     {
@@ -1633,14 +1653,14 @@ parse_sanitizer_options (const char *p,
      -fsanitize-recover=return if -fsanitize-recover=undefined
      is selected.  */
   if (code == OPT_fsanitize_recover_
-      && sanitizer_opts[i].flag == SANITIZE_UNDEFINED)
+      && opts[i].flag == SANITIZE_UNDEFINED)
     flags |= (SANITIZE_UNDEFINED
        & ~(SANITIZE_UNREACHABLE | SANITIZE_RETURN));
   else
-    flags |= sanitizer_opts[i].flag;
+    flags |= opts[i].flag;
        }
      else
-       flags &= ~sanitizer_opts[i].flag;
+       flags &= ~opts[i].flag;
      found = true;
      break;
    }
@@ -1649,21 +1669,27 @@ parse_sanitizer_options (const char *p,
  {
    const char *hint
      = get_closest_sanitizer_option (string_fragment (p, len),
-         code, value);
+         opts, code, value);
+
+   const char *suffix;
+   if (code == OPT_fsanitize_recover_)
+     suffix = "-recover";
+   else if (code == OPT_fsanitize_coverage_)
+     suffix = "-coverage";
+   else
+     suffix = "";
 
    if (hint)
      error_at (loc,
         "unrecognized argument to -f%ssanitize%s= option: %q.*s;"
         " did you mean %qs?",
         value ? "" : "no-",
-        code == OPT_fsanitize_ ? "" : "-recover",
-        (int) len, p, hint);
+        suffix, (int) len, p, hint);
    else
      error_at (loc,
         "unrecognized argument to -f%ssanitize%s= option: %q.*s",
         value ? "" : "no-",
-        code == OPT_fsanitize_ ? "" : "-recover",
-        (int) len, p);
+        suffix, (int) len, p);
  }
 
       if (comma == NULL)
@@ -1956,6 +1982,12 @@ common_handle_option (struct gcc_options
    &= ~(SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT);
       break;
 
+    case OPT_fsanitize_coverage_:
+      opts->x_flag_sanitize_coverage
+ = parse_sanitizer_options (arg, loc, code,
+       opts->x_flag_sanitize_coverage, value, true);
+      break;
+
     case OPT_O:
     case OPT_Os:
     case OPT_Ofast:
--- gcc/sancov.c.jj 2017-09-01 09:26:48.603612240 +0200
+++ gcc/sancov.c 2017-09-05 17:51:02.209865830 +0200
@@ -1,6 +1,7 @@
 /* Code coverage instrumentation for fuzzing.
    Copyright (C) 2015-2017 Free Software Foundation, Inc.
-   Contributed by Dmitry Vyukov <dvyukov@google.com>
+   Contributed by Dmitry Vyukov <dvyukov@google.com> and
+   Wish Wu <wishwu007@gmail.com>
 
 This file is part of GCC.
 
@@ -29,32 +30,271 @@ along with GCC; see the file COPYING3.
 #include "flags.h"
 #include "stmt.h"
 #include "gimple-iterator.h"
+#include "gimple-builder.h"
 #include "tree-cfg.h"
 #include "tree-pass.h"
 #include "tree-iterator.h"
+#include "fold-const.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "output.h"
+#include "cgraph.h"
 #include "asan.h"
 
 namespace {
 
+/* Instrument one comparison operation, which compares lhs and rhs.
+   Call the instrumentation function with the comparison operand.
+   For integral comparisons if exactly one of the comparison operands is
+   constant, call __sanitizer_cov_trace_const_cmp* instead of
+   __sanitizer_cov_trace_cmp*.  */
+
+static void
+instrument_comparison (gimple_stmt_iterator *gsi, tree lhs, tree rhs)
+{
+  tree type = TREE_TYPE (lhs);
+  enum built_in_function fncode = END_BUILTINS;
+  tree to_type = NULL_TREE;
+  bool c = false;
+
+  if (INTEGRAL_TYPE_P (type))
+    {
+      c = (is_gimple_min_invariant (lhs)
+    ^ is_gimple_min_invariant (rhs));
+      switch (int_size_in_bytes (type))
+ {
+ case 1:
+         fncode = c ? BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP1
+       : BUILT_IN_SANITIZER_COV_TRACE_CMP1;
+   to_type = unsigned_char_type_node;
+   break;
+ case 2:
+         fncode = c ? BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP2
+       : BUILT_IN_SANITIZER_COV_TRACE_CMP2;
+   to_type = uint16_type_node;
+   break;
+ case 4:
+         fncode = c ? BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP4
+       : BUILT_IN_SANITIZER_COV_TRACE_CMP4;
+   to_type = uint32_type_node;
+   break;
+ default:
+         fncode = c ? BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP8
+       : BUILT_IN_SANITIZER_COV_TRACE_CMP8;
+   to_type = uint64_type_node;
+   break;
+ }
+    }
+  else if (SCALAR_FLOAT_TYPE_P (type))
+    {
+      if (TYPE_MODE (type) == TYPE_MODE (float_type_node))
+ {
+         fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPF;
+   to_type = float_type_node;
+ }
+      else if (TYPE_MODE (type) == TYPE_MODE (double_type_node))
+ {
+         fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPD;
+   to_type = double_type_node;
+ }
+    }
+
+  if (to_type != NULL_TREE)
+    {
+      gimple_seq seq = NULL;
+
+      if (!useless_type_conversion_p (to_type, type))
+ {
+   if (TREE_CODE (lhs) == INTEGER_CST)
+     lhs = fold_convert (to_type, lhs);
+   else
+     {
+       gimple_seq_add_stmt (&seq, build_type_cast (to_type, lhs));
+       lhs = gimple_assign_lhs (gimple_seq_last_stmt (seq));
+     }
+
+   if (TREE_CODE (rhs) == INTEGER_CST)
+     rhs = fold_convert (to_type, rhs);
+   else
+     {
+       gimple_seq_add_stmt (&seq, build_type_cast (to_type, rhs));
+       rhs = gimple_assign_lhs (gimple_seq_last_stmt (seq));
+     }
+ }
+
+      if (c && !is_gimple_min_invariant (lhs))
+ std::swap (lhs, rhs);
+
+      tree fndecl = builtin_decl_implicit (fncode);
+      gimple *gcall = gimple_build_call (fndecl, 2, lhs, rhs);
+      gimple_seq_add_stmt (&seq, gcall);
+
+      gimple_seq_set_location (seq, gimple_location (gsi_stmt (*gsi)));
+      gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+    }
+}
+
+/* Instrument switch statement.  Call __sanitizer_cov_trace_switch with
+   the value of the index and array that contains number of case values,
+   the bitsize of the index and the case values converted to uint64_t.  */
+
+static void
+instrument_switch (gimple_stmt_iterator *gsi, gimple *stmt, function *fun)
+{
+  gswitch *switch_stmt = as_a<gswitch *> (stmt);
+  tree index = gimple_switch_index (switch_stmt);
+  HOST_WIDE_INT size_in_bytes = int_size_in_bytes (TREE_TYPE (index));
+  if (size_in_bytes == -1 || size_in_bytes > 8)
+    return;
+
+  location_t loc = gimple_location (stmt);
+  unsigned i, n = gimple_switch_num_labels (switch_stmt), num = 0;
+  for (i = 1; i < n; ++i)
+    {
+      tree label = gimple_switch_label (switch_stmt, i);
+
+      tree low_case = CASE_LOW (label);
+      if (low_case != NULL_TREE)
+ num++;
+
+      tree high_case = CASE_HIGH (label);
+      if (high_case != NULL_TREE)
+ num++;
+    }
+
+  tree case_array_type
+   = build_array_type (build_type_variant (uint64_type_node, 1, 0),
+         build_index_type (size_int (num + 2 - 1)));
+
+  char name[64];
+  static size_t case_array_count = 0;
+  ASM_GENERATE_INTERNAL_LABEL (name, "LCASEARRAY", case_array_count++);
+  tree case_array_var = build_decl (loc, VAR_DECL, get_identifier (name),
+        case_array_type);
+  TREE_STATIC (case_array_var) = 1;
+  TREE_PUBLIC (case_array_var) = 0;
+  TREE_CONSTANT (case_array_var) = 1;
+  TREE_READONLY (case_array_var) = 1;
+  DECL_EXTERNAL (case_array_var) = 0;
+  DECL_ARTIFICIAL (case_array_var) = 1;
+  DECL_IGNORED_P (case_array_var) = 1;
+
+  vec <constructor_elt, va_gc> *v = NULL;
+  vec_alloc (v, num + 2);
+  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+     build_int_cst (uint64_type_node, num));
+  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+     build_int_cst (uint64_type_node,
+      size_in_bytes * BITS_PER_UNIT));
+  for (i = 1; i < n; ++i)
+    {
+      tree label = gimple_switch_label (switch_stmt, i);
+
+      tree low_case = CASE_LOW (label);
+      if (low_case != NULL_TREE)
+ CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+    fold_convert (uint64_type_node, low_case));
+
+      tree high_case = CASE_HIGH (label);
+      if (high_case != NULL_TREE)
+ CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+    fold_convert (uint64_type_node, high_case));
+    }
+  tree ctor = build_constructor (case_array_type, v);
+  TREE_STATIC (ctor) = 1;
+  TREE_PUBLIC (ctor) = 0;
+  TREE_CONSTANT (ctor) = 1;
+  TREE_READONLY (ctor) = 1;
+  DECL_INITIAL (case_array_var) = ctor;
+  varpool_node::finalize_decl (case_array_var);
+  add_local_decl (fun, case_array_var);
+
+  gimple_seq seq = NULL;
+
+  if (!useless_type_conversion_p (uint64_type_node, TREE_TYPE (index)))
+    {
+      if (TREE_CODE (index) == INTEGER_CST)
+ index = fold_convert (uint64_type_node, index);
+      else
+ {
+   gimple_seq_add_stmt (&seq, build_type_cast (uint64_type_node, index));
+   index = gimple_assign_lhs (gimple_seq_last_stmt (seq));
+ }
+    }
+
+  tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_SWITCH);
+  gimple *gcall = gimple_build_call (fndecl, 2, index,
+         build_fold_addr_expr (case_array_var));
+  gimple_seq_add_stmt (&seq, gcall);
+
+  gimple_seq_set_location (seq, loc);
+  gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+}
+
 unsigned
 sancov_pass (function *fun)
 {
   initialize_sanitizer_builtins ();
 
   /* Insert callback into beginning of every BB. */
-  tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_PC);
-  basic_block bb;
-  FOR_EACH_BB_FN (bb, fun)
-    {
-      gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb);
-      if (gsi_end_p (gsi))
- continue;
-      gimple *stmt = gsi_stmt (gsi);
-      gimple *gcall = gimple_build_call (fndecl, 0);
-      gimple_set_location (gcall, gimple_location (stmt));
-      gsi_insert_before (&gsi, gcall, GSI_SAME_STMT);
+  if (flag_sanitize_coverage & SANITIZE_COV_TRACE_PC)
+    {
+      basic_block bb;
+      tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_PC);
+      FOR_EACH_BB_FN (bb, fun)
+ {
+   gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb);
+   if (gsi_end_p (gsi))
+     continue;
+   gimple *stmt = gsi_stmt (gsi);
+   gimple *gcall = gimple_build_call (fndecl, 0);
+   gimple_set_location (gcall, gimple_location (stmt));
+   gsi_insert_before (&gsi, gcall, GSI_SAME_STMT);
+ }
+    }
+
+  /* Insert callback into every comparison related operation.  */
+  if (flag_sanitize_coverage & SANITIZE_COV_TRACE_CMP)
+    {
+      basic_block bb;
+      FOR_EACH_BB_FN (bb, fun)
+ {
+   gimple_stmt_iterator gsi;
+   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+     {
+       gimple *stmt = gsi_stmt (gsi);
+       enum tree_code rhs_code;
+       switch (gimple_code (stmt))
+  {
+  case GIMPLE_ASSIGN:
+    rhs_code = gimple_assign_rhs_code (stmt);
+    if (TREE_CODE_CLASS (rhs_code) == tcc_comparison)
+      instrument_comparison (&gsi,
+        gimple_assign_rhs1 (stmt),
+        gimple_assign_rhs2 (stmt));
+    else if (rhs_code == COND_EXPR
+      && COMPARISON_CLASS_P (gimple_assign_rhs1 (stmt)))
+      {
+        tree cond = gimple_assign_rhs1 (stmt);
+        instrument_comparison (&gsi, TREE_OPERAND (cond, 0),
+          TREE_OPERAND (cond, 1));
+      }
+    break;
+  case GIMPLE_COND:
+    instrument_comparison (&gsi,
+      gimple_cond_lhs (stmt),
+      gimple_cond_rhs (stmt));
+    break;
+
+  case GIMPLE_SWITCH:
+    instrument_switch (&gsi, stmt, fun);
+    break;
+
+  default:
+    break;
+  }
+     }
+ }
     }
   return 0;
 }
--- gcc/sanitizer.def.jj 2017-07-28 12:35:27.565412232 +0200
+++ gcc/sanitizer.def 2017-09-05 17:01:51.281308488 +0200
@@ -537,6 +537,39 @@ DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HAN
 DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_PC,
         "__sanitizer_cov_trace_pc",
         BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMP1,
+        "__sanitizer_cov_trace_cmp1",
+        BT_FN_VOID_UINT8_UINT8, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMP2,
+        "__sanitizer_cov_trace_cmp2",
+        BT_FN_VOID_UINT16_UINT16, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMP4,
+        "__sanitizer_cov_trace_cmp4",
+        BT_FN_VOID_UINT32_UINT32, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMP8,
+        "__sanitizer_cov_trace_cmp8",
+        BT_FN_VOID_UINT64_UINT64, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP1,
+        "__sanitizer_cov_trace_const_cmp1",
+        BT_FN_VOID_UINT8_UINT8, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP2,
+        "__sanitizer_cov_trace_const_cmp2",
+        BT_FN_VOID_UINT16_UINT16, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP4,
+        "__sanitizer_cov_trace_const_cmp4",
+        BT_FN_VOID_UINT32_UINT32, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CONST_CMP8,
+        "__sanitizer_cov_trace_const_cmp8",
+        BT_FN_VOID_UINT64_UINT64, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMPF,
+        "__sanitizer_cov_trace_cmpf",
+        BT_FN_VOID_FLOAT_FLOAT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMPD,
+        "__sanitizer_cov_trace_cmpd",
+        BT_FN_VOID_DOUBLE_DOUBLE, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_SWITCH,
+        "__sanitizer_cov_trace_switch",
+        BT_FN_VOID_UINT64_PTR, ATTR_NOTHROW_LEAF_LIST)
 
 /* This has to come after all the sanitizer builtins.  */
 DEF_BUILTIN_STUB(END_SANITIZER_BUILTINS, (const char *)0)
--- gcc/tree-ssa-ifcombine.c.jj 2017-06-30 09:49:28.004662123 +0200
+++ gcc/tree-ssa-ifcombine.c 2017-09-05 15:39:32.460612633 +0200
@@ -560,7 +560,7 @@ ifcombine_ifandif (basic_block inner_con
  {
    tree t1, t2;
    gimple_stmt_iterator gsi;
-   if (!LOGICAL_OP_NON_SHORT_CIRCUIT)
+   if (!LOGICAL_OP_NON_SHORT_CIRCUIT || flag_sanitize_coverage)
      return false;
    /* Only do this optimization if the inner bb contains only the conditional. */
    if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (inner_cond_bb)))
--- gcc/doc/invoke.texi.jj 2017-09-04 09:55:25.000000000 +0200
+++ gcc/doc/invoke.texi 2017-09-05 18:22:02.863499354 +0200
@@ -11161,6 +11161,20 @@ is usable even in freestanding environme
 Enable coverage-guided fuzzing code instrumentation.
 Inserts a call to @code{__sanitizer_cov_trace_pc} into every basic block.
 
+@item -fsanitize-coverage=trace-cmp
+@opindex fsanitize-coverage=trace-cmp
+Enable dataflow guided fuzzing code instrumentation.
+Inserts a call to @code{__sanitizer_cov_trace_cmp1},
+@code{__sanitizer_cov_trace_cmp2}, @code{__sanitizer_cov_trace_cmp4} or
+@code{__sanitizer_cov_trace_cmp8} for integral comparison with both operands
+variable or @code{__sanitizer_cov_trace_const_cmp1},
+@code{__sanitizer_cov_trace_const_cmp2},
+@code{__sanitizer_cov_trace_const_cmp4} or
+@code{__sanitizer_cov_trace_const_cmp8} for integral comparison with one
+operand constant, @code{__sanitizer_cov_trace_cmpf} or
+@code{__sanitizer_cov_trace_cmpd} for float or double comparisons and
+@code{__sanitizer_cov_trace_switch} for switch statements.
+
 @item -fbounds-check
 @opindex fbounds-check
 For front ends that support it, generate additional code to check that
--- gcc/testsuite/gcc.dg/sancov/cmp0.c.jj 2017-09-05 17:54:00.906717081 +0200
+++ gcc/testsuite/gcc.dg/sancov/cmp0.c 2017-09-05 18:04:15.319329045 +0200
@@ -0,0 +1,92 @@
+/* Basic test on number of inserted callbacks.  */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize-coverage=trace-cmp -fdump-tree-optimized" } */
+
+void
+foo (char *a, short *b, int *c, long long *d, float *e, double *f)
+{
+  if (*a)
+    *a += 1;
+  if (*b)
+    *b = *a;
+  if (*c)
+    *c += 1;
+  if (*d)
+    *d = *c;
+  if (*e == *c)
+    *e = *c;
+  if (*f == *e)
+    *f = *e;
+  switch (*a)
+    {
+    case 2:
+      *b += 2;
+      break;
+    case 3:
+      *b += 3;
+      break;
+    case 4:
+      *b += 4;
+      break;
+    case 5:
+      *b += 5;
+      break;
+    case 6:
+      *b += 6;
+      break;
+    case 7 ... 24:
+      *b += 7;
+      break;
+    default:
+      break;
+    }
+  switch (*d)
+    {
+    case 3:
+      *d += 3;
+    case -4:
+      *d -= 4;
+    case -5:
+      *d -= 5;
+    case -6:
+      *d -= 6;
+    case -7:
+      *d -= 7;
+    case -8:
+      *d -= 8;
+    case -9:
+      *d -= 9;
+    case -10:
+      *d -= 10;
+    }
+}
+
+void
+bar (int *c)
+{
+  if (*c == 27)
+    *c += 2;
+  if (*c == 37)
+    *c += 2;
+}
+
+int
+baz (int *c, long long d, long long e)
+{
+  *c = (*c == 48) ? 12 : 24;
+  return d == e;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp1 \\(0, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp2 \\(0, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp4 \\(0, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp8 \\(0, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp4 \\(27, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp4 \\(37, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp4 \\(48, " 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_cmp8 \\(" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_cmpf \\(" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_cmpd \\(" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_const_cmp" 7 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_cmp" 3 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "__builtin___sanitizer_cov_trace_switch \\(" 2 "optimized" } } */


 Jakub
Jakub Jelinek Sept. 6, 2017, 2:37 p.m. UTC | #17
On Wed, Sep 06, 2017 at 07:47:29PM +0800, 吴潍浠(此彼) wrote:
> Hi Jakub
> I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address".
> And run my fuzzer with pc and cmp feedbacks for hours. It works fine.
> About __sanitizer_cov_trace_cmp{f,d} , yes, it  isn't provided by llvm. But once we trace integer comparisons, why not real type comparisons.
> I remember Dmitry said it is not enough useful to trace real type comparisons because it is rare to see them in programs.
> But libdng_sdk really has real type comparisons. So I want to keep them and implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary.

Ok.  Please make sure those entrypoints make it into the various example
__sanitier_cov_trace* fuzzer implementations though, so that people using
-fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves.
At least it should be added to sanitizer_common (both in LLVM and GCC).

BTW, https://clang.llvm.org/docs/SanitizerCoverage.html shows various other
-fsanitize-coverage= options, some of them terribly misnamed (e.g. trace-gep
using some weirdo LLVM IL acronym instead of being named by what it really
traces (trace-array-idx or something similar)).

Any plans to implement some or all of those?

	Jakub
Jakub Jelinek Sept. 6, 2017, 2:38 p.m. UTC | #18
On Wed, Sep 06, 2017 at 04:37:18PM +0200, Jakub Jelinek wrote:
> Ok.  Please make sure those entrypoints make it into the various example
> __sanitier_cov_trace* fuzzer implementations though, so that people using
> -fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves.
> At least it should be added to sanitizer_common (both in LLVM and GCC).

Forgot to say that I've committed the patch to GCC trunk today.

	Jakub
=?UTF-8?B?5ZC05r2N5rWgKOatpOW9vCk=?= Sept. 7, 2017, 7:02 a.m. UTC | #19
Hi
The trace-div and trace-gep options seems be used to evaluate corpus 
to trigger specific kind of bugs. And they don't have strong effect to coverage.

The trace-pc-guard is useful, but it may be much more complex than trace-pc.
I think the best benefit of trace-pc-guard is avoiding dealing ASLR of programs,
especially programs with dlopen(). Seems hard to implement it in Linux kernel.

The inline-8bit-counters and pc-table is too close to implementation of fuzz tool and
option trace-pc-guard .

I am interesting in "stack-depth" and "func". If we trace user-defined functions enter and exit, 
we can calculate stack-depth and difference level of stack to past existed stack. 
Adding __sanitizer_cov_trace_pc_{enter,exit} is easy , but it is not standard of llvm.

How do you think Dmitry ?

Wish Wu

------------------------------------------------------------------
From:Jakub Jelinek <jakub@redhat.com>
Time:2017 Sep 6 (Wed) 22:37
To:Wish Wu <weixi.wwx@antfin.com>
Cc:Dmitry Vyukov <dvyukov@google.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
Subject:Re: Add support to trace comparison instructions and switch statements


On Wed, Sep 06, 2017 at 07:47:29PM +0800, 吴潍浠(此彼) wrote:
> Hi Jakub
> I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address".
> And run my fuzzer with pc and cmp feedbacks for hours. It works fine.
> About __sanitizer_cov_trace_cmp{f,d} , yes, it  isn't provided by llvm. But once we trace integer comparisons, why not real type comparisons.
> I remember Dmitry said it is not enough useful to trace real type comparisons because it is rare to see them in programs.
> But libdng_sdk really has real type comparisons. So I want to keep them and implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary.

Ok.  Please make sure those entrypoints make it into the various example
__sanitier_cov_trace* fuzzer implementations though, so that people using
-fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves.
At least it should be added to sanitizer_common (both in LLVM and GCC).

BTW, https://clang.llvm.org/docs/SanitizerCoverage.html shows various other
-fsanitize-coverage= options, some of them terribly misnamed (e.g. trace-gep
using some weirdo LLVM IL acronym instead of being named by what it really
traces (trace-array-idx or something similar)).

Any plans to implement some or all of those?

 Jakub
Ed Smith-Rowland via gcc-patches Sept. 12, 2017, 2:32 p.m. UTC | #20
On Thu, Sep 7, 2017 at 9:02 AM, 吴潍浠(此彼) <weixi.wwx@antfin.com> wrote:
> Hi
> The trace-div and trace-gep options seems be used to evaluate corpus
> to trigger specific kind of bugs. And they don't have strong effect to coverage.
>
> The trace-pc-guard is useful, but it may be much more complex than trace-pc.
> I think the best benefit of trace-pc-guard is avoiding dealing ASLR of programs,
> especially programs with dlopen(). Seems hard to implement it in Linux kernel.
>
> The inline-8bit-counters and pc-table is too close to implementation of fuzz tool and
> option trace-pc-guard .
>
> I am interesting in "stack-depth" and "func". If we trace user-defined functions enter and exit,
> we can calculate stack-depth and difference level of stack to past existed stack.
> Adding __sanitizer_cov_trace_pc_{enter,exit} is easy , but it is not standard of llvm.
>
> How do you think Dmitry ?
>
> Wish Wu
>
> ------------------------------------------------------------------
> From:Jakub Jelinek <jakub@redhat.com>
> Time:2017 Sep 6 (Wed) 22:37
> To:Wish Wu <weixi.wwx@antfin.com>
> Cc:Dmitry Vyukov <dvyukov@google.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
> Subject:Re: Add support to trace comparison instructions and switch statements
>
>
> On Wed, Sep 06, 2017 at 07:47:29PM +0800, 吴潍浠(此彼) wrote:
>> Hi Jakub
>> I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address".
>> And run my fuzzer with pc and cmp feedbacks for hours. It works fine.
>> About __sanitizer_cov_trace_cmp{f,d} , yes, it  isn't provided by llvm. But once we trace integer comparisons, why not real type comparisons.
>> I remember Dmitry said it is not enough useful to trace real type comparisons because it is rare to see them in programs.
>> But libdng_sdk really has real type comparisons. So I want to keep them and implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary.
>
> Ok.  Please make sure those entrypoints make it into the various example
> __sanitier_cov_trace* fuzzer implementations though, so that people using
> -fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves.
> At least it should be added to sanitizer_common (both in LLVM and GCC).
>
> BTW, https://clang.llvm.org/docs/SanitizerCoverage.html shows various other
> -fsanitize-coverage= options, some of them terribly misnamed (e.g. trace-gep
> using some weirdo LLVM IL acronym instead of being named by what it really
> traces (trace-array-idx or something similar)).
>
> Any plans to implement some or all of those?


Thanks, Jakub!

I've tested it on Linux kernel. Compiler does not crash, code is instrumented.

Re  terribly misnamed trace-gep, dunno, I will leave this to Kostya.

Re __sanitizer_cov_trace_cmp{f,d}, I am still not sure.

> But libdng_sdk really has real type comparisons.

Do they come from input data? In what format? How do you want to use
them? E.g. if they come from input but with using some non-trivial
transformation and the fuzzer will try to find them in the input, it
won't be able to do so.
On the other hand, it does not seem to be harmful, fuzzers that are
not interested in them can just add empty callbacks.

Re trace-pc-guard, I also don't have strong preference. Global
variables should work for kernel, but we probably will not use them in
kernel, because even aslr aside we would need to establish some global
numbering of these globals across multiple different machines. But
then it's much easier and simper to just use PCs as identifiers.

Re __sanitizer_cov_trace_pc_{enter,exit}, I don't think we ever
experimented/evaluated this idea. Do you have any data that it's
useful? I suspect that it can grow corpus too much.
Ed Smith-Rowland via gcc-patches Sept. 12, 2017, 2:49 p.m. UTC | #21
Some stats from kernel build for number of trace_cmp callbacks:

gcc
non-const: 38051
const: 272726
total: 310777

clang:
non-const: 45944
const: 266299
total: 312243

The total is quite close. Gcc seems to emit more const callbacks, which is good.



On Tue, Sep 12, 2017 at 4:32 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Thu, Sep 7, 2017 at 9:02 AM, 吴潍浠(此彼) <weixi.wwx@antfin.com> wrote:
>> Hi
>> The trace-div and trace-gep options seems be used to evaluate corpus
>> to trigger specific kind of bugs. And they don't have strong effect to coverage.
>>
>> The trace-pc-guard is useful, but it may be much more complex than trace-pc.
>> I think the best benefit of trace-pc-guard is avoiding dealing ASLR of programs,
>> especially programs with dlopen(). Seems hard to implement it in Linux kernel.
>>
>> The inline-8bit-counters and pc-table is too close to implementation of fuzz tool and
>> option trace-pc-guard .
>>
>> I am interesting in "stack-depth" and "func". If we trace user-defined functions enter and exit,
>> we can calculate stack-depth and difference level of stack to past existed stack.
>> Adding __sanitizer_cov_trace_pc_{enter,exit} is easy , but it is not standard of llvm.
>>
>> How do you think Dmitry ?
>>
>> Wish Wu
>>
>> ------------------------------------------------------------------
>> From:Jakub Jelinek <jakub@redhat.com>
>> Time:2017 Sep 6 (Wed) 22:37
>> To:Wish Wu <weixi.wwx@antfin.com>
>> Cc:Dmitry Vyukov <dvyukov@google.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
>> Subject:Re: Add support to trace comparison instructions and switch statements
>>
>>
>> On Wed, Sep 06, 2017 at 07:47:29PM +0800, 吴潍浠(此彼) wrote:
>>> Hi Jakub
>>> I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address".
>>> And run my fuzzer with pc and cmp feedbacks for hours. It works fine.
>>> About __sanitizer_cov_trace_cmp{f,d} , yes, it  isn't provided by llvm. But once we trace integer comparisons, why not real type comparisons.
>>> I remember Dmitry said it is not enough useful to trace real type comparisons because it is rare to see them in programs.
>>> But libdng_sdk really has real type comparisons. So I want to keep them and implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary.
>>
>> Ok.  Please make sure those entrypoints make it into the various example
>> __sanitier_cov_trace* fuzzer implementations though, so that people using
>> -fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves.
>> At least it should be added to sanitizer_common (both in LLVM and GCC).
>>
>> BTW, https://clang.llvm.org/docs/SanitizerCoverage.html shows various other
>> -fsanitize-coverage= options, some of them terribly misnamed (e.g. trace-gep
>> using some weirdo LLVM IL acronym instead of being named by what it really
>> traces (trace-array-idx or something similar)).
>>
>> Any plans to implement some or all of those?
>
>
> Thanks, Jakub!
>
> I've tested it on Linux kernel. Compiler does not crash, code is instrumented.
>
> Re  terribly misnamed trace-gep, dunno, I will leave this to Kostya.
>
> Re __sanitizer_cov_trace_cmp{f,d}, I am still not sure.
>
>> But libdng_sdk really has real type comparisons.
>
> Do they come from input data? In what format? How do you want to use
> them? E.g. if they come from input but with using some non-trivial
> transformation and the fuzzer will try to find them in the input, it
> won't be able to do so.
> On the other hand, it does not seem to be harmful, fuzzers that are
> not interested in them can just add empty callbacks.
>
> Re trace-pc-guard, I also don't have strong preference. Global
> variables should work for kernel, but we probably will not use them in
> kernel, because even aslr aside we would need to establish some global
> numbering of these globals across multiple different machines. But
> then it's much easier and simper to just use PCs as identifiers.
>
> Re __sanitizer_cov_trace_pc_{enter,exit}, I don't think we ever
> experimented/evaluated this idea. Do you have any data that it's
> useful? I suspect that it can grow corpus too much.
Ed Smith-Rowland via gcc-patches Sept. 12, 2017, 4:35 p.m. UTC | #22
On Tue, Sep 12, 2017 at 7:32 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Thu, Sep 7, 2017 at 9:02 AM, 吴潍浠(此彼) <weixi.wwx@antfin.com> wrote:
>> Hi
>> The trace-div and trace-gep options seems be used to evaluate corpus
>> to trigger specific kind of bugs. And they don't have strong effect to coverage.

These are used for what I call data-flow-driven mutations.
If we see x/y we tried to drive the inputs toward y==1.
Similarly, when we see a[idx], we try to drive towards idx being large
or negative.
When combined with value profiling, these do affect "generalized"
coverage and thus the corpus size.
They don't directly affect the regular edge coverage.

>>
>> The trace-pc-guard is useful, but it may be much more complex than trace-pc.
>> I think the best benefit of trace-pc-guard is avoiding dealing ASLR of programs,
>> especially programs with dlopen(). Seems hard to implement it in Linux kernel.

One of the benefits of trace-pc-guard is that you have to deal with
consecutive integers, not PCs (that are indeed affected by ARLR).

>>
>> The inline-8bit-counters and pc-table is too close to implementation of fuzz tool and
>> option trace-pc-guard .
>>
>> I am interesting in "stack-depth" and "func".

stack-depth is fully independent of all others.

"func" is a modifier that tells to insert callbacks only at the function entry.
It applies to trace-pc, trace-pc-guard, inline-8bit-counters, and pc-table
Other modifiers are bb (basic blocks) and edge (split critical edges,
then apply instrumentation to all BBs)

>> If we trace user-defined functions enter and exit,
>> we can calculate stack-depth and difference level of stack to past existed stack.
>> Adding __sanitizer_cov_trace_pc_{enter,exit} is easy , but it is not standard of llvm.

__sanitizer_cov_trace_pc_enter would be equivalent to trace-pc,func

we don't have __sanitizer_cov_trace_pc_exit. It's not very useful for
fuzzing (afaict).
I had one or two requests to implement __sanitizer_cov_trace_pc_exit
but at the end I was able to convince the folks that they don't need
it.

With pc-table and trace-pc[-guard] we already can distinguish func
entry from other blocks.
Can probably add special handling for exit, if someone explains why
this is interesting (in a separate thread, perhaps).

Also, gcc already has -finstrument-functions


>>
>> How do you think Dmitry ?
>>
>> Wish Wu
>>
>> ------------------------------------------------------------------
>> From:Jakub Jelinek <jakub@redhat.com>
>> Time:2017 Sep 6 (Wed) 22:37
>> To:Wish Wu <weixi.wwx@antfin.com>
>> Cc:Dmitry Vyukov <dvyukov@google.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>
>> Subject:Re: Add support to trace comparison instructions and switch statements
>>
>>
>> On Wed, Sep 06, 2017 at 07:47:29PM +0800, 吴潍浠(此彼) wrote:
>>> Hi Jakub
>>> I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address".
>>> And run my fuzzer with pc and cmp feedbacks for hours. It works fine.
>>> About __sanitizer_cov_trace_cmp{f,d} , yes, it  isn't provided by llvm. But once we trace integer comparisons, why not real type comparisons.

But why would you need to trace floats?
"Just for completeness" is not good enough.
It's extremely easy to add, but I don't want to pollute the code &
docs with something nobody needs.


>>> I remember Dmitry said it is not enough useful to trace real type comparisons because it is rare to see them in programs.
>>> But libdng_sdk really has real type comparisons. So I want to keep them and implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary.
>>
>> Ok.  Please make sure those entrypoints make it into the various example
>> __sanitier_cov_trace* fuzzer implementations though, so that people using
>> -fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves.

Yes. It would be lovely if we can keep both LLVM and GCC in sync wrt
the interface.

>> At least it should be added to sanitizer_common (both in LLVM and GCC).
>>
>> BTW, https://clang.llvm.org/docs/SanitizerCoverage.html shows various other
>> -fsanitize-coverage= options, some of them terribly misnamed (e.g. trace-gep
>> using some weirdo LLVM IL acronym instead of being named by what it really
>> traces (trace-array-idx or something similar)).

I don't mind renaming trace-gep.

>>
>> Any plans to implement some or all of those?
>
>
> Thanks, Jakub!
>
> I've tested it on Linux kernel. Compiler does not crash, code is instrumented.
>
> Re  terribly misnamed trace-gep, dunno, I will leave this to Kostya.
>
> Re __sanitizer_cov_trace_cmp{f,d}, I am still not sure.
>
>> But libdng_sdk really has real type comparisons.
>
> Do they come from input data? In what format? How do you want to use
> them? E.g. if they come from input but with using some non-trivial
> transformation and the fuzzer will try to find them in the input, it
> won't be able to do so.
> On the other hand, it does not seem to be harmful, fuzzers that are
> not interested in them can just add empty callbacks.
>
> Re trace-pc-guard, I also don't have strong preference. Global
> variables should work for kernel, but we probably will not use them in
> kernel, because even aslr aside we would need to establish some global
> numbering of these globals across multiple different machines. But
> then it's much easier and simper to just use PCs as identifiers.
>
> Re __sanitizer_cov_trace_pc_{enter,exit}, I don't think we ever
> experimented/evaluated this idea. Do you have any data that it's
> useful? I suspect that it can grow corpus too much.
Tamar Christina Sept. 19, 2017, 1:14 p.m. UTC | #23
Adding the list back in.
________________________________________
From: Tamar Christina

Sent: Tuesday, September 19, 2017 2:11 PM
To: Dmitry Vyukov; Kostya Serebryany
Cc: 吴潍浠(此彼); Jakub Jelinek; Jeff Law; wishwu007; Alexander Potapenko; andreyknvl; nd
Subject: Re: Add support to trace comparison instructions and switch statements

Hi Jakub,

The testcase seems to be broken on AArch64 (aarch64-none-elf) and -O3:

PASS: gcc.dg/sancov/cmp0.c   -O0 -g  (test for excess errors)
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_const_cmp1 \\(0, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_const_cmp2 \\(0, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_const_cmp4 \\(0, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_const_cmp8 \\(0, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_const_cmp4 \\(27, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_const_cmp4 \\(37, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_const_cmp4 \\(48, " 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_cmp8 \\(" 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_cmpf \\(" 1
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_cmpd \\(" 1
FAIL: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_const_cmp" 7
PASS: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_cmp" 3
FAIL: gcc.dg/sancov/cmp0.c   -O0 -g   scan-tree-dump-times optimized "__builtin___sanitizer_cov_trace_switch \\(" 2

it's fine at O1, O2 and O3 though. Should the test be running for O0?

Thanks,
Tamar
________________________________________
From: gcc-patches-owner@gcc.gnu.org <gcc-patches-owner@gcc.gnu.org> on behalf of Kostya Serebryany via gcc-patches <gcc-patches@gcc.gnu.org>

Sent: Tuesday, September 12, 2017 5:35 PM
To: Dmitry Vyukov
Cc: 吴潍浠(此彼); Jakub Jelinek; gcc-patches; Jeff Law; wishwu007; Alexander Potapenko; andreyknvl
Subject: Re: Add support to trace comparison instructions and switch statements

On Tue, Sep 12, 2017 at 7:32 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Thu, Sep 7, 2017 at 9:02 AM, 吴潍浠(此彼) <weixi.wwx@antfin.com> wrote:

>> Hi

>> The trace-div and trace-gep options seems be used to evaluate corpus

>> to trigger specific kind of bugs. And they don't have strong effect to coverage.


These are used for what I call data-flow-driven mutations.
If we see x/y we tried to drive the inputs toward y==1.
Similarly, when we see a[idx], we try to drive towards idx being large
or negative.
When combined with value profiling, these do affect "generalized"
coverage and thus the corpus size.
They don't directly affect the regular edge coverage.

>>

>> The trace-pc-guard is useful, but it may be much more complex than trace-pc.

>> I think the best benefit of trace-pc-guard is avoiding dealing ASLR of programs,

>> especially programs with dlopen(). Seems hard to implement it in Linux kernel.


One of the benefits of trace-pc-guard is that you have to deal with
consecutive integers, not PCs (that are indeed affected by ARLR).

>>

>> The inline-8bit-counters and pc-table is too close to implementation of fuzz tool and

>> option trace-pc-guard .

>>

>> I am interesting in "stack-depth" and "func".


stack-depth is fully independent of all others.

"func" is a modifier that tells to insert callbacks only at the function entry.
It applies to trace-pc, trace-pc-guard, inline-8bit-counters, and pc-table
Other modifiers are bb (basic blocks) and edge (split critical edges,
then apply instrumentation to all BBs)

>> If we trace user-defined functions enter and exit,

>> we can calculate stack-depth and difference level of stack to past existed stack.

>> Adding __sanitizer_cov_trace_pc_{enter,exit} is easy , but it is not standard of llvm.


__sanitizer_cov_trace_pc_enter would be equivalent to trace-pc,func

we don't have __sanitizer_cov_trace_pc_exit. It's not very useful for
fuzzing (afaict).
I had one or two requests to implement __sanitizer_cov_trace_pc_exit
but at the end I was able to convince the folks that they don't need
it.

With pc-table and trace-pc[-guard] we already can distinguish func
entry from other blocks.
Can probably add special handling for exit, if someone explains why
this is interesting (in a separate thread, perhaps).

Also, gcc already has -finstrument-functions


>>

>> How do you think Dmitry ?

>>

>> Wish Wu

>>

>> ------------------------------------------------------------------

>> From:Jakub Jelinek <jakub@redhat.com>

>> Time:2017 Sep 6 (Wed) 22:37

>> To:Wish Wu <weixi.wwx@antfin.com>

>> Cc:Dmitry Vyukov <dvyukov@google.com>; gcc-patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; wishwu007 <wishwu007@gmail.com>

>> Subject:Re: Add support to trace comparison instructions and switch statements

>>

>>

>> On Wed, Sep 06, 2017 at 07:47:29PM +0800, 吴潍浠(此彼) wrote:

>>> Hi Jakub

>>> I compiled libjpeg-turbo and libdng_sdk with options "-g -O3 -Wall -fsanitize-coverage=trace-pc,trace-cmp -fsanitize=address".

>>> And run my fuzzer with pc and cmp feedbacks for hours. It works fine.

>>> About __sanitizer_cov_trace_cmp{f,d} , yes, it  isn't provided by llvm. But once we trace integer comparisons, why not real type comparisons.


But why would you need to trace floats?
"Just for completeness" is not good enough.
It's extremely easy to add, but I don't want to pollute the code &
docs with something nobody needs.


>>> I remember Dmitry said it is not enough useful to trace real type comparisons because it is rare to see them in programs.

>>> But libdng_sdk really has real type comparisons. So I want to keep them and implementing __sanitizer_cov_trace_const_cmp{f,d} may be necessary.

>>

>> Ok.  Please make sure those entrypoints make it into the various example

>> __sanitier_cov_trace* fuzzer implementations though, so that people using

>> -fsanitize-coverage=trace-cmp in GCC will not need to hack stuff themselves.


Yes. It would be lovely if we can keep both LLVM and GCC in sync wrt
the interface.

>> At least it should be added to sanitizer_common (both in LLVM and GCC).

>>

>> BTW, https://clang.llvm.org/docs/SanitizerCoverage.html shows various other

>> -fsanitize-coverage= options, some of them terribly misnamed (e.g. trace-gep

>> using some weirdo LLVM IL acronym instead of being named by what it really

>> traces (trace-array-idx or something similar)).


I don't mind renaming trace-gep.

>>

>> Any plans to implement some or all of those?

>

>

> Thanks, Jakub!

>

> I've tested it on Linux kernel. Compiler does not crash, code is instrumented.

>

> Re  terribly misnamed trace-gep, dunno, I will leave this to Kostya.

>

> Re __sanitizer_cov_trace_cmp{f,d}, I am still not sure.

>

>> But libdng_sdk really has real type comparisons.

>

> Do they come from input data? In what format? How do you want to use

> them? E.g. if they come from input but with using some non-trivial

> transformation and the fuzzer will try to find them in the input, it

> won't be able to do so.

> On the other hand, it does not seem to be harmful, fuzzers that are

> not interested in them can just add empty callbacks.

>

> Re trace-pc-guard, I also don't have strong preference. Global

> variables should work for kernel, but we probably will not use them in

> kernel, because even aslr aside we would need to establish some global

> numbering of these globals across multiple different machines. But

> then it's much easier and simper to just use PCs as identifiers.

>

> Re __sanitizer_cov_trace_pc_{enter,exit}, I don't think we ever

> experimented/evaluated this idea. Do you have any data that it's

> useful? I suspect that it can grow corpus too much.
Martin Liška Sept. 19, 2017, 1:31 p.m. UTC | #24
On 09/19/2017 03:14 PM, Tamar Christina wrote:
> it's fine at O1, O2 and O3 though. Should the test be running for O0?

It's a known issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82183

Martin
Tamar Christina Sept. 19, 2017, 1:40 p.m. UTC | #25
Ah thanks!
________________________________________
From: Martin Liška <mliska@suse.cz>

Sent: Tuesday, September 19, 2017 2:31 PM
To: Tamar Christina; Dmitry Vyukov; Kostya Serebryany
Cc: 吴潍浠(此彼); Jakub Jelinek; Jeff Law; wishwu007; Alexander Potapenko; andreyknvl; nd; GCC Patches
Subject: Re: Add support to trace comparison instructions and switch statements

On 09/19/2017 03:14 PM, Tamar Christina wrote:
> it's fine at O1, O2 and O3 though. Should the test be running for O0?


It's a known issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82183

Martin

Patch
diff mbox

Index: gcc/asan.c
===================================================================
--- gcc/asan.c	(revision 250082)
+++ gcc/asan.c	(working copy)
@@ -2705,6 +2705,29 @@  initialize_sanitizer_builtins (void)
   tree BT_FN_SIZE_CONST_PTR_INT
     = build_function_type_list (size_type_node, const_ptr_type_node,
 				integer_type_node, NULL_TREE);
+
+  tree BT_FN_VOID_UINT8_UINT8
+    = build_function_type_list (void_type_node, unsigned_char_type_node,
+				unsigned_char_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT16_UINT16
+    = build_function_type_list (void_type_node, uint16_type_node,
+				uint16_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT32_UINT32
+    = build_function_type_list (void_type_node, uint32_type_node,
+				uint32_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT64_UINT64
+    = build_function_type_list (void_type_node, uint64_type_node,
+				uint64_type_node, NULL_TREE);
+  tree BT_FN_VOID_FLOAT_FLOAT
+    = build_function_type_list (void_type_node, float_type_node,
+				float_type_node, NULL_TREE);
+  tree BT_FN_VOID_DOUBLE_DOUBLE
+    = build_function_type_list (void_type_node, double_type_node,
+				double_type_node, NULL_TREE);
+  tree BT_FN_VOID_UINT64_PTR
+    = build_function_type_list (void_type_node, uint64_type_node,
+				ptr_type_node, NULL_TREE);
+
   tree BT_FN_BOOL_VPTR_PTR_IX_INT_INT[5];
   tree BT_FN_IX_CONST_VPTR_INT[5];
   tree BT_FN_IX_VPTR_IX_INT[5];
Index: gcc/builtin-types.def
===================================================================
--- gcc/builtin-types.def	(revision 250082)
+++ gcc/builtin-types.def	(working copy)
@@ -338,8 +338,20 @@  DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTRMODE_PTR,
 		     BT_VOID, BT_PTRMODE, BT_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTR_PTRMODE,
 		     BT_VOID, BT_PTR, BT_PTRMODE)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT8_UINT8,
+     		     BT_VOID, BT_UINT8, BT_UINT8)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT16_UINT16,
+     		     BT_VOID, BT_UINT16, BT_UINT16)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT32_UINT32,
+     		     BT_VOID, BT_UINT32, BT_UINT32)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT64_UINT64,
      		     BT_VOID, BT_UINT64, BT_UINT64)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_FLOAT_FLOAT,
+     		     BT_VOID, BT_FLOAT, BT_FLOAT)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_DOUBLE_DOUBLE,
+     		     BT_VOID, BT_DOUBLE, BT_DOUBLE)
+DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT64_PTR,
+     		     BT_VOID, BT_UINT64, BT_PTR)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_VALIST_REF_VALIST_ARG,
 		     BT_VOID, BT_VALIST_REF, BT_VALIST_ARG)
 DEF_FUNCTION_TYPE_2 (BT_FN_LONG_LONG_LONG,
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 250082)
+++ gcc/common.opt	(working copy)
@@ -226,10 +226,9 @@  unsigned int flag_sanitize
 Variable
 unsigned int flag_sanitize_recover = (SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT | SANITIZE_KERNEL_ADDRESS) & ~(SANITIZE_UNREACHABLE | SANITIZE_RETURN)
 
-fsanitize-coverage=trace-pc
-Common Report Var(flag_sanitize_coverage)
-Enable coverage-guided fuzzing code instrumentation.
-Inserts call to __sanitizer_cov_trace_pc into every basic block.
+; What the coverage sanitizers should instrument
+Variable
+unsigned int flag_sanitize_coverage
 
 ; Flag whether a prefix has been added to dump_base_name
 Variable
@@ -975,6 +974,10 @@  fsanitize=
 Common Driver Report Joined
 Select what to sanitize.
 
+fsanitize-coverage=
+Common Driver Report Joined
+Select what to coverage sanitize.
+
 fasan-shadow-offset=
 Common Joined RejectNegative Var(common_deferred_options) Defer
 -fasan-shadow-offset=<number>	Use custom shadow memory offset.
Index: gcc/flag-types.h
===================================================================
--- gcc/flag-types.h	(revision 250082)
+++ gcc/flag-types.h	(working copy)
@@ -250,6 +250,14 @@  enum sanitize_code {
 				  | SANITIZE_BOUNDS_STRICT
 };
 
+/* Different trace modes */
+enum sanitize_coverage_code {
+  /* Trace PC */
+  SANITIZE_COV_TRACE_PC = 1UL << 0,
+  /* Trace Compare */
+  SANITIZE_COV_TRACE_CMP = 1UL << 1
+};
+
 /* flag_vtable_verify initialization levels. */
 enum vtv_priority {
   VTV_NO_PRIORITY       = 0,  /* i.E. Do NOT do vtable verification. */
Index: gcc/opts.c
===================================================================
--- gcc/opts.c	(revision 250082)
+++ gcc/opts.c	(working copy)
@@ -1518,6 +1518,17 @@  const struct sanitizer_opts_s sanitizer_opts[] =
   { NULL, 0U, 0UL, false }
 };
 
+/* -f{,no-}sanitize-coverage= suboptions.  */
+const struct sanitizer_opts_s coverage_sanitizer_opts[] =
+{
+#define SANITIZER_OPT(name, flags, recover) \
+    { #name, flags, sizeof #name - 1, recover }
+  SANITIZER_OPT (trace-pc, SANITIZE_COV_TRACE_PC, false),
+  SANITIZER_OPT (trace-cmp, SANITIZE_COV_TRACE_CMP, false),
+#undef SANITIZER_OPT
+  { NULL, 0U, 0UL, false }
+};
+
 /* A struct for describing a run of chars within a string.  */
 
 struct string_fragment
@@ -1665,6 +1676,85 @@  parse_sanitizer_options (const char *p, location_t
   return flags;
 }
 
+/* Given ARG, an unrecognized coverage sanitizer option, return the best
+   matching coverage sanitizer option, or NULL if there isn't one.
+   VALUE is non-zero for the regular form of the option, zero
+   for the "no-" form (e.g. "-fno-sanitize-coverage=").  */
+
+static const char *
+get_closest_coverage_sanitizer_option (const string_fragment &arg, int value)
+{
+  best_match <const string_fragment &, const char*> bm (arg);
+  for (int i = 0; coverage_sanitizer_opts[i].name != NULL; ++i)
+    {
+      bm.consider (coverage_sanitizer_opts[i].name);
+    }
+  return bm.get_best_meaningful_candidate ();
+}
+
+/* Parse comma separated sanitizer suboptions from P for option SCODE,
+   adjust previous FLAGS and return new ones.  If COMPLAIN is false,
+   don't issue diagnostics.  */
+
+unsigned int
+parse_coverage_sanitizer_options (const char *p, location_t loc,
+			 unsigned int flags, int value, bool complain)
+{
+  while (*p != 0)
+    {
+      size_t len, i;
+      bool found = false;
+      const char *comma = strchr (p, ',');
+
+      if (comma == NULL)
+	len = strlen (p);
+      else
+	len = comma - p;
+      if (len == 0)
+	{
+	  p = comma + 1;
+	  continue;
+	}
+
+      /* Check to see if the string matches an option class name.  */
+      for (i = 0; coverage_sanitizer_opts[i].name != NULL; ++i)
+	if (len == coverage_sanitizer_opts[i].len
+	    && memcmp (p, coverage_sanitizer_opts[i].name, len) == 0)
+	  {
+	    if (value)
+	      flags |= coverage_sanitizer_opts[i].flag;
+	    else
+	      flags &= ~coverage_sanitizer_opts[i].flag;
+	    found = true;
+	    break;
+	  }
+
+      if (! found && complain)
+	{
+	  const char *hint
+	    = get_closest_coverage_sanitizer_option (string_fragment (p, len),
+					    	     value);
+
+	  if (hint)
+	    error_at (loc,
+		      "unrecognized argument to -f%ssanitize-coverage= option: %q.*s;"
+		      " did you mean %qs?",
+		      value ? "" : "no-",
+		      (int) len, p, hint);
+	  else
+	    error_at (loc,
+		      "unrecognized argument to -f%ssanitize-coverage= option: %q.*s",
+		      value ? "" : "no-",
+		      (int) len, p);
+	}
+
+      if (comma == NULL)
+	break;
+      p = comma + 1;
+    }
+  return flags;
+}
+
 /* Parse string values of no_sanitize attribute passed in VALUE.
    Values are separated with comma.  Wrong argument is stored to
    WRONG_ARGUMENT variable.  */
@@ -1942,6 +2032,12 @@  common_handle_option (struct gcc_options *opts,
 	  &= ~(SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NONDEFAULT);
       break;
 
+    case OPT_fsanitize_coverage_:
+      opts->x_flag_sanitize_coverage
+	= parse_coverage_sanitizer_options (arg, loc,
+				   opts->x_flag_sanitize_coverage, value, true);
+      break;
+
     case OPT_O:
     case OPT_Os:
     case OPT_Ofast:
Index: gcc/sancov.c
===================================================================
--- gcc/sancov.c	(revision 250082)
+++ gcc/sancov.c	(working copy)
@@ -29,31 +29,194 @@  along with GCC; see the file COPYING3.  If not see
 #include "flags.h"
 #include "stmt.h"
 #include "gimple-iterator.h"
+#include "tree-core.h"
 #include "tree-cfg.h"
 #include "tree-pass.h"
 #include "tree-iterator.h"
+#include "fold-const.h"
+#include "stringpool.h"
+#include "output.h"
+#include "cgraph.h"
 #include "asan.h"
 
 namespace {
 
+static void
+instrument_cond (gimple_stmt_iterator *gsi, gimple *stmt)
+{
+  tree lhs = gimple_cond_lhs (stmt);
+  tree rhs = gimple_cond_rhs (stmt);
+  unsigned int bitno = TYPE_PRECISION (TREE_TYPE (lhs)) > TYPE_PRECISION (TREE_TYPE (rhs)) ?
+   		       TYPE_PRECISION (TREE_TYPE (lhs)) : TYPE_PRECISION (TREE_TYPE (rhs));
+  if (TREE_CODE (TREE_TYPE (lhs)) == INTEGER_TYPE)
+    {
+      enum built_in_function fncode;
+      switch (bitno)
+	{
+	case 8:
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMP1;
+	  break;
+
+	case 16:
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMP2;
+	  break;
+
+	case 32:
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMP4;
+	  break;
+
+	case 64:
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMP8;
+	  break;
+
+	default:
+	  return;
+	  break;
+	}
+      tree fndecl = builtin_decl_implicit (fncode);
+      gimple *gcall = gimple_build_call (fndecl, 2, lhs, rhs);
+      gimple_set_location (gcall, gimple_location (stmt));
+      gsi_insert_before (gsi, gcall, GSI_SAME_STMT);
+    }
+  else if (TREE_CODE (TREE_TYPE (lhs)) == REAL_TYPE)
+    {
+      enum built_in_function fncode;
+      switch (bitno)
+	{
+	case 32:
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPF;
+	  break;
+
+	case 64:
+      	  fncode = BUILT_IN_SANITIZER_COV_TRACE_CMPD;
+	  break;
+
+	default:
+	  return;
+	  break;
+        }
+      tree fndecl = builtin_decl_implicit (fncode);
+      gimple *gcall = gimple_build_call (fndecl, 2, lhs, rhs);
+      gimple_set_location (gcall, gimple_location (stmt));
+      gsi_insert_before (gsi, gcall, GSI_SAME_STMT);
+    }
+}
+
+static void
+instrument_switch (gimple_stmt_iterator *gsi, gimple *stmt, function *fun)
+{
+  gswitch *switch_stmt = as_a<gswitch *> (stmt);
+  tree index = gimple_switch_index (switch_stmt);
+  unsigned bitno = TYPE_PRECISION (TREE_TYPE (index));
+  unsigned i, n = gimple_switch_num_labels (switch_stmt), num = 0;
+  for (i = 0; i < n; ++i)
+    {
+      tree label = gimple_switch_label (switch_stmt, i);
+      tree low_case = CASE_LOW (label);
+      if (low_case != NULL_TREE)
+          num++;
+      tree high_case = CASE_HIGH (label);
+      if (high_case != NULL_TREE)
+          num++;
+    }
+
+  tree case_array_elem_type = build_type_variant (uint64_type_node, 1, 0);
+  tree case_array_type = build_array_type (case_array_elem_type, 
+					   build_index_type (size_int (num + 2 - 1)));
+  char name[64];
+  static size_t case_array_count = 0;
+  snprintf(name, sizeof(name) - 1, "__sanitizer_cov_trace_switch_array%lu", case_array_count++);
+  tree case_array_var = build_decl (UNKNOWN_LOCATION, VAR_DECL, 
+				    get_identifier (name), case_array_type);
+  TREE_STATIC (case_array_var) = 1;
+  TREE_PUBLIC (case_array_var) = 0;
+  TREE_CONSTANT (case_array_var) = 1;
+  TREE_READONLY (case_array_var) = 1;
+  DECL_EXTERNAL (case_array_var) = 0;
+  DECL_ARTIFICIAL (case_array_var) = 1;
+  DECL_IGNORED_P (case_array_var) = 1;
+
+  vec <constructor_elt, va_gc> *v = NULL;
+  vec_alloc (v, num + 2);
+  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, build_int_cst (uint64_type_node, num));
+  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, build_int_cst (uint64_type_node, bitno));
+  for (i = 0; i < n; ++i)
+    {
+      tree label = gimple_switch_label (switch_stmt, i);
+
+      tree low_case = CASE_LOW (label);
+      if (low_case != NULL_TREE)
+        CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, 
+				build_int_cst (uint64_type_node, TREE_INT_CST_LOW (low_case)));
+
+      tree high_case = CASE_HIGH (label);
+      if (high_case != NULL_TREE)
+        CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, 
+				build_int_cst (uint64_type_node, TREE_INT_CST_LOW (high_case)));
+    }
+  tree ctor = build_constructor (case_array_type, v);
+  TREE_STATIC (ctor) = 1;
+  TREE_PUBLIC (ctor) = 0;
+  TREE_CONSTANT (ctor) = 1;
+  TREE_READONLY (ctor) = 1;
+  DECL_EXTERNAL (ctor) = 0;
+  DECL_INITIAL (case_array_var) = ctor;
+  varpool_node::finalize_decl (case_array_var);
+
+  tree case_array_var_ref = build_fold_addr_expr (case_array_var);
+  add_local_decl (fun, case_array_var);
+  tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_SWITCH);
+  gimple *gcall = gimple_build_call (fndecl, 2, index, case_array_var_ref);
+  gimple_set_location (gcall, gimple_location (stmt));
+  gsi_insert_before(gsi, gcall, GSI_SAME_STMT);
+}
+
 unsigned
 sancov_pass (function *fun)
 {
   initialize_sanitizer_builtins ();
 
+  basic_block bb;
+
   /* Insert callback into beginning of every BB. */
-  tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_PC);
-  basic_block bb;
-  FOR_EACH_BB_FN (bb, fun)
+  if (flag_sanitize_coverage & SANITIZE_COV_TRACE_PC)
     {
-      gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb);
-      if (gsi_end_p (gsi))
-	continue;
-      gimple *stmt = gsi_stmt (gsi);
-      gimple *gcall = gimple_build_call (fndecl, 0);
-      gimple_set_location (gcall, gimple_location (stmt));
-      gsi_insert_before (&gsi, gcall, GSI_SAME_STMT);
+      tree fndecl = builtin_decl_implicit (BUILT_IN_SANITIZER_COV_TRACE_PC);
+      FOR_EACH_BB_FN (bb, fun)
+        {
+          gimple_stmt_iterator gsi = gsi_start_nondebug_after_labels_bb (bb);
+          if (gsi_end_p (gsi))
+	    continue;
+          gimple *stmt = gsi_stmt (gsi);
+          gimple *gcall = gimple_build_call (fndecl, 0);
+          gimple_set_location (gcall, gimple_location (stmt));
+          gsi_insert_before (&gsi, gcall, GSI_SAME_STMT);
+        }
     }
+
+  /* Insert callback to every compare statments. */
+  if (flag_sanitize_coverage & SANITIZE_COV_TRACE_CMP)
+    {
+      FOR_EACH_BB_FN (bb, fun)
+	{
+          gimple_stmt_iterator gsi;
+          for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	    {
+              gimple *stmt = gsi_stmt (gsi);
+              switch (gimple_code (stmt))
+	        {
+                case GIMPLE_COND:
+		  instrument_cond (&gsi, stmt);
+		  break;
+            	case GIMPLE_SWITCH:
+		  instrument_switch (&gsi, stmt, fun);
+            	  break;
+		default:
+		  break;
+            	}
+            }
+        }
+    }
   return 0;
 }
 
Index: gcc/sanitizer.def
===================================================================
--- gcc/sanitizer.def	(revision 250082)
+++ gcc/sanitizer.def	(working copy)
@@ -529,6 +529,27 @@  DEF_SANITIZER_BUILTIN(BUILT_IN_UBSAN_HANDLE_DYNAMI
 DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_PC,
 		      "__sanitizer_cov_trace_pc",
 		      BT_FN_VOID, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMP1,
+		      "__sanitizer_cov_trace_cmp1",
+		      BT_FN_VOID_UINT8_UINT8, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMP2,
+		      "__sanitizer_cov_trace_cmp2",
+		      BT_FN_VOID_UINT16_UINT16, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMP4,
+		      "__sanitizer_cov_trace_cmp4",
+		      BT_FN_VOID_UINT32_UINT32, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMP8,
+		      "__sanitizer_cov_trace_cmp8",
+		      BT_FN_VOID_UINT64_UINT64, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMPF,
+		      "__sanitizer_cov_trace_cmpf",
+		      BT_FN_VOID_FLOAT_FLOAT, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_CMPD,
+		      "__sanitizer_cov_trace_cmpd",
+		      BT_FN_VOID_DOUBLE_DOUBLE, ATTR_NOTHROW_LEAF_LIST)
+DEF_SANITIZER_BUILTIN(BUILT_IN_SANITIZER_COV_TRACE_SWITCH,
+		      "__sanitizer_cov_trace_switch",
+		      BT_FN_VOID_UINT64_PTR, ATTR_NOTHROW_LEAF_LIST)
 
 /* This has to come after all the sanitizer builtins.  */
 DEF_BUILTIN_STUB(END_SANITIZER_BUILTINS, (const char *)0)