
[v2,2/3] Add a pass to automatically add ptwrite instrumentation

Message ID 20181116035704.14820-3-andi@firstfloor.org
State New
Series [v2,1/3] Allow memory operands for PTWRITE

Commit Message

Andi Kleen Nov. 16, 2018, 3:57 a.m. UTC
From: Andi Kleen <ak@linux.intel.com>

Add a new pass to automatically instrument changes to variables
with the new PTWRITE instruction on x86. PTWRITE writes a 4- or 8-byte
value into the Processor Trace log, which allows low-overhead
logging of information. Essentially it's a hardware-accelerated
printf.

This makes it possible to reconstruct values later from the log,
which can be useful for debugging or other analysis of program
behavior. With compiler support this can be done without
having to manually add instrumentation to the code.

Using DWARF information the logged values can later be mapped back
to variables. The decoder decodes the PTWRITE instructions using the
IP information in the log and looks up the argument in the debug
information. From that it can reconstruct the original variable
name and display a value history for the variable.

There are new options to enable instrumentation for different kinds
of accesses, and a new attribute to control instrumentation at a
fine-grained per-function or per-variable level. The attributes can
be set on both the variable and the type level, and also on
structure fields. This allows enabling tracing only for specific
code in large programs in a flexible manner.

The pass is generic, but only the x86 backend enables the necessary
hooks. When the backend enables them (with -mptwrite), an
additional pass looks through the code for functions or variables
with the vartrace attribute enabled.

The -fvartrace=locals option is experimental: it works, but it
generates redundant ptwrites because the pass doesn't use
the SSA information to minimize instrumentation.
This could be optimized later.

Currently the code can be tested with SDE, or on an Intel
Gemini Lake system with a new enough Linux kernel (v4.10+)
that supports PTWRITE for PT. Gemini Lake is used in low-end
laptops ("Intel Pentium Silver J5...... / Celeron N4... /
Celeron J4...").

Linux perf can be used to record the values:

perf record -e intel_pt/ptw=1,branch=0/ program
perf script --itrace=crw -F +synth ...

I have an experimental version of perf that can also use
DWARF information to symbolize many values back to their variable
names. So far it is not in standard perf, but it is available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4

It is currently not able to decode all variable locations to names,
but a large subset.

Longer term hopefully gdb will support this information too.

The CPU can potentially generate very high data bandwidths when
code doing a lot of computation is heavily instrumented.
This can cause some data loss, both in the CPU and in perf
logging the data when the disk cannot keep up.

Running larger workloads, most do not cause CPU-level
overflows, but I've seen them with -fvartrace
on crafty, and with more workloads with -fvartrace=locals.

The recommendation is to not fully instrument programs,
but only areas of interest, either at the file level or using
the attributes.

The other issue is that perf and the disk often cannot keep up
with the data bandwidth for longer computations. In this case
it's possible to use perf snapshot mode (add --snapshot
to the command line above). The data is then only logged to
a memory ring buffer, and the buffers are dumped only on events
of interest, by sending SIGUSR2 to the perf binary.

In the future this will hopefully be better supported with
core files and gdb.

Passes bootstrap and the test suite on x86_64-linux. I also
bootstrapped and tested gcc itself with full -fvartrace
and -fvartrace=locals instrumentation.

gcc/:

2018-11-15  Andi Kleen  <ak@linux.intel.com>

	* Makefile.in: Add tree-vartrace.o.
	* common.opt: Add -fvartrace
	* opts.c (parse_vartrace_options): Add.
	(common_handle_option): Call parse_vartrace_options.
	* config/i386/i386.c (ix86_vartrace_func): Add.
	(TARGET_VARTRACE_FUNC): Add.
	* doc/extend.texi: Document vartrace/no_vartrace
	attributes.
	* doc/invoke.texi: Document -fvartrace.
	* doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
	* passes.def: Add vartrace pass.
	* target.def (vartrace_func): Add.
	* tree-pass.h (make_pass_vartrace): Add.
	* tree-vartrace.c: New file to implement vartrace pass.

gcc/c-family/:

2018-11-15  Andi Kleen  <ak@linux.intel.com>

	* c-attribs.c (handle_vartrace_attribute,
	  handle_no_vartrace_attribute): New functions.
	  (attr_vartrace_exclusions): Add.

config/:

2018-11-03  Andi Kleen  <ak@linux.intel.com>

	* bootstrap-vartrace.mk: New.
	* bootstrap-vartrace-locals.mk: New.
---
 config/bootstrap-vartrace-locals.mk |   3 +
 config/bootstrap-vartrace.mk        |   3 +
 gcc/Makefile.in                     |   1 +
 gcc/c-family/c-attribs.c            |  77 +++++
 gcc/common.opt                      |   8 +
 gcc/config/i386/i386.c              |  32 ++
 gcc/doc/extend.texi                 |  32 ++
 gcc/doc/invoke.texi                 |  27 ++
 gcc/doc/tm.texi                     |   6 +
 gcc/doc/tm.texi.in                  |   2 +
 gcc/flag-types.h                    |   9 +
 gcc/opts.c                          |  63 ++++
 gcc/passes.def                      |   1 +
 gcc/target.def                      |   9 +
 gcc/tree-pass.h                     |   1 +
 gcc/tree-vartrace.c                 | 491 ++++++++++++++++++++++++++++
 16 files changed, 765 insertions(+)
 create mode 100644 config/bootstrap-vartrace-locals.mk
 create mode 100644 config/bootstrap-vartrace.mk
 create mode 100644 gcc/tree-vartrace.c

Comments

Richard Biener Nov. 20, 2018, 12:04 p.m. UTC | #1
On Fri, Nov 16, 2018 at 4:57 AM Andi Kleen <andi@firstfloor.org> wrote:
> [...]

In the cover mail you mentioned you didn't get rid of SSA update.
That is because your instrumentation does not set the call's
virtual operands.  Since your builtin clobbers memory and you
instrument non-memory ops that's only possible if you'd track
the active virtual operand during the walk over the function.  I
suppose using SSA update is OK for now.

More comments inline

> [...]
>
> diff --git a/config/bootstrap-vartrace-locals.mk b/config/bootstrap-vartrace-locals.mk
> new file mode 100644
> index 00000000000..dd16640df74
> --- /dev/null
> +++ b/config/bootstrap-vartrace-locals.mk
> @@ -0,0 +1,3 @@
> +STAGE2_CFLAGS += -mptwrite -fvartrace=all
> +STAGE3_CFLAGS += -mptwrite -fvartrace=all
> +STAGE4_CFLAGS += -mptwrite -fvartrace=all
> diff --git a/config/bootstrap-vartrace.mk b/config/bootstrap-vartrace.mk
> new file mode 100644
> index 00000000000..e29824d799b
> --- /dev/null
> +++ b/config/bootstrap-vartrace.mk
> @@ -0,0 +1,3 @@
> +STAGE2_CFLAGS += -mptwrite -fvartrace
> +STAGE3_CFLAGS += -mptwrite -fvartrace
> +STAGE4_CFLAGS += -mptwrite -fvartrace
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index ec793175c3b..64a99a1ec8a 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1594,6 +1594,7 @@ OBJS = \
>         tree-vectorizer.o \
>         tree-vector-builder.o \
>         tree-vrp.o \
> +       tree-vartrace.o \
>         tree.o \
>         typed-splay-tree.o \
>         unique-ptr-tests.o \
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index 1657df7f9df..d50e78de830 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -104,6 +104,10 @@ static tree handle_tls_model_attribute (tree *, tree, tree, int,
>                                         bool *);
>  static tree handle_no_instrument_function_attribute (tree *, tree,
>                                                      tree, int, bool *);
> +static tree handle_vartrace_attribute (tree *, tree,
> +                                                    tree, int, bool *);
> +static tree handle_no_vartrace_attribute (tree *, tree,
> +                                                    tree, int, bool *);
>  static tree handle_no_profile_instrument_function_attribute (tree *, tree,
>                                                              tree, int, bool *);
>  static tree handle_malloc_attribute (tree *, tree, tree, int, bool *);
> @@ -235,6 +239,13 @@ static const struct attribute_spec::exclusions attr_const_pure_exclusions[] =
>    ATTR_EXCL (NULL, false, false, false)
>  };
>
> +static const struct attribute_spec::exclusions attr_vartrace_exclusions[] =
> +{
> +  ATTR_EXCL ("vartrace", true, true, true),
> +  ATTR_EXCL ("no_vartrace", true, true, true),
> +  ATTR_EXCL (NULL, false, false, false)
> +};
> +
>  /* Table of machine-independent attributes common to all C-like languages.
>
>     Current list of processed common attributes: nonnull.  */
> @@ -326,6 +337,12 @@ const struct attribute_spec c_common_attribute_table[] =
>    { "no_instrument_function", 0, 0, true,  false, false, false,
>                               handle_no_instrument_function_attribute,
>                               NULL },
> +  { "vartrace",                      0, 0, false,  false, false, false,
> +                             handle_vartrace_attribute,
> +                             attr_vartrace_exclusions },
> +  { "no_vartrace",           0, 0, false,  false, false, false,
> +                             handle_no_vartrace_attribute,
> +                             attr_vartrace_exclusions },
>    { "no_profile_instrument_function",  0, 0, true, false, false, false,
>                               handle_no_profile_instrument_function_attribute,
>                               NULL },
> @@ -770,6 +787,66 @@ handle_no_sanitize_undefined_attribute (tree *node, tree name, tree, int,
>    return NULL_TREE;
>  }
>
> +/* Handle "vartrace" attribute; arguments as in struct
> +   attribute_spec.handler.  */
> +
> +static tree
> +handle_vartrace_attribute (tree *node, tree name, tree, int flags,
> +                          bool *no_add_attrs)
> +{
> +  if (!VAR_OR_FUNCTION_DECL_P (*node) && !TYPE_P (*node) &&
> +      TREE_CODE (*node) != FIELD_DECL)
> +    {
> +      warning (OPT_Wattributes, "%qE attribute ignored for object", name);
> +      *no_add_attrs = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (!targetm.vartrace_func)
> +    {
> +      warning (OPT_Wattributes, "%qE attribute not supported for target", name);
> +      *no_add_attrs = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (TREE_TYPE (*node)
> +      && TREE_CODE (*node) != FUNCTION_DECL
> +      && targetm.vartrace_func (TREE_TYPE (*node), true) == NULL_TREE)
> +   {
> +      warning (OPT_Wattributes, "%qE attribute not supported for type", name);
> +      *no_add_attrs = true;
> +      return NULL_TREE;
> +   }
> +
> +  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
> +    *node = build_variant_type_copy (*node);
> +
> +  /* We look it up later with lookup_attribute.  */
> +  return NULL_TREE;
> +}
> +
> +/* Handle "no_vartrace" attribute; arguments as in struct
> +   attribute_spec.handler.  */
> +
> +static tree
> +handle_no_vartrace_attribute (tree *node, tree name, tree, int flags,
> +                             bool *no_add_attrs)
> +{
> +  if (!VAR_OR_FUNCTION_DECL_P (*node) && !TYPE_P (*node)
> +      && TREE_CODE (*node) != FIELD_DECL)
> +    {
> +      warning (OPT_Wattributes, "%qE attribute ignored", name);
> +      *no_add_attrs = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
> +    *node = build_variant_type_copy (*node);
> +
> +  /* We look it up later with lookup_attribute.  */
> +  return NULL_TREE;
> +}
> +
>  /* Handle an "asan odr indicator" attribute; arguments as in
>     struct attribute_spec.handler.  */
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 72a713593c3..f27a57b1177 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -221,6 +221,10 @@ unsigned int flag_sanitize_recover = (SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NO
>  Variable
>  unsigned int flag_sanitize_coverage
>
> +; What to instrument with vartrace
> +Variable
> +unsigned int flag_vartrace
> +
>  ; Flag whether a prefix has been added to dump_base_name
>  Variable
>  bool dump_base_name_prefixed = false
> @@ -2856,6 +2860,10 @@ ftree-scev-cprop
>  Common Report Var(flag_tree_scev_cprop) Init(1) Optimization
>  Enable copy propagation of scalar-evolution information.
>
> +fvartrace
> +Common JoinedOrMissing Report Driver
> +-fvartrace=default|all|locals|returns|args|reads|writes|off   Enable variable tracing instrumentation.
> +
>  ; -fverbose-asm causes extra commentary information to be produced in
>  ; the generated assembly code (to make it more readable).  This option
>  ; is generally only of use to those who actually need to read the
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index c18c60a1d19..6d130bf0804 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -31900,6 +31900,35 @@ ix86_mangle_function_version_assembler_name (tree decl, tree id)
>    return ret;
>  }
>
> +/* Hook to determine that TYPE can be traced.  Ignore target flags
> +   if FORCE is true. Returns the tracing builtin if tracing is possible,
> +   or otherwise NULL.  */
> +
> +static tree
> +ix86_vartrace_func (tree type, bool force)
> +{
> +  if (!(ix86_isa_flags2 & OPTION_MASK_ISA_PTWRITE))
> +    {
> +      /* With force, as in checking for the attribute, ignore
> +        the current target settings. Otherwise it's not
> +        possible to declare vartrace variables outside
> +        an __attribute__((target("ptwrite"))) function
> +        if -mptwrite is not specified.  */
> +      if (!force)
> +       return NULL;
> +      /* Initialize the builtins if missing, so that we have
> +        something to return.  */
> +      if (!ix86_builtins[(int)IX86_BUILTIN_PTWRITE32])
> +       ix86_add_new_builtins (0, OPTION_MASK_ISA_PTWRITE);
> +    }
> +
> +  if (TYPE_PRECISION (type) == 32)
> +    return ix86_builtins[(int) IX86_BUILTIN_PTWRITE32];
> +  else if (TYPE_PRECISION (type) == 64)
> +    return ix86_builtins[(int) IX86_BUILTIN_PTWRITE64];

I think it makes more sense to pass the target hook a mode
instead of a type.  Thus above you'd check for SImode vs. DImode.

> +  else
> +    return NULL;
> +}
>
>  static tree
>  ix86_mangle_decl_assembler_name (tree decl, tree id)
> @@ -50894,6 +50923,9 @@ ix86_run_selftests (void)
>  #undef TARGET_ASAN_SHADOW_OFFSET
>  #define TARGET_ASAN_SHADOW_OFFSET ix86_asan_shadow_offset
>
> +#undef TARGET_VARTRACE_FUNC
> +#define TARGET_VARTRACE_FUNC ix86_vartrace_func
> +
>  #undef TARGET_GIMPLIFY_VA_ARG_EXPR
>  #define TARGET_GIMPLIFY_VA_ARG_EXPR ix86_gimplify_va_arg
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index d4b1046b6ae..f0cde5dcf0f 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3228,6 +3228,13 @@ the standard C library can be guaranteed not to throw an exception
>  with the notable exceptions of @code{qsort} and @code{bsearch} that
>  take function pointer arguments.
>
> +@item no_vartrace
> +The @code{no_vartrace} attribute disables data tracing for
> +the function [or variable or structure field] declared with
> +the attribute. See @pxref{Common Variable Attributes} and
> +@pxref{Common Type Attributes}. When specified for a function
> +nothing in the function is traced.
> +
>  @item optimize (@var{level}, @dots{})
>  @item optimize (@var{string}, @dots{})
>  @cindex @code{optimize} function attribute
> @@ -3489,6 +3496,21 @@ When applied to a member function of a C++ class template, the
>  attribute also means that the function is instantiated if the
>  class itself is instantiated.
>
> +@item vartrace
> +@cindex @code{vartrace} function or variable attribute
> +Enable data tracing for the function or variable or structure field
> +marked with this attribute. When applied to a type all instances of the type
> +will be traced. When applied to a structure or union all fields will be traced.
> +When applied to a structure field that field will be traced.
> +For functions will not trace locals (unless enabled on the command line or
> +on as an attribute the local itself) but arguments, returns, globals, pointer
> +references. For variables or types or structure members any reads or writes will be traced.
> +Only integer or pointer types can be tracked.
> +
> +Currently implemented for x86 when the @option{ptwrite} target option
> +is enabled for systems that support the @code{PTWRITE} instruction,
> +supporting 4 and 8 byte integers.
> +
>  @item visibility ("@var{visibility_type}")
>  @cindex @code{visibility} function attribute
>  This attribute affects the linkage of the declaration to which it is attached.
> @@ -7196,6 +7218,16 @@ A @{ /* @r{@dots{}} */ @};
>  struct __attribute__ ((copy ( (struct A *)0)) B @{ /* @r{@dots{}} */ @};
>  @end smallexample
>
> +@cindex @code{vartrace} type attribute
> +@cindex @code{no_vartrace} type attribute
> +@item vartrace
> +@itemx no_vartrace
> +Specify that all instances of type should be variable traced
> +or not variable traced. Can be also also applied to function
> +types to disable tracing for all instances of that function type.
> +Can be also applied to structure fields. See the description in
> +@pxref{Variable Attributes} for more details.
> +
>  @item deprecated
>  @itemx deprecated (@var{msg})
>  @cindex @code{deprecated} type attribute
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 535b258d22b..8b9d1d71669 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -2754,6 +2754,33 @@ Don't use the @code{__cxa_get_exception_ptr} runtime routine.  This
>  causes @code{std::uncaught_exception} to be incorrect, but is necessary
>  if the runtime routine is not available.
>
> +@item -fvartrace
> +@opindex -fvartrace=...
> +Insert trace instructions to trace variables at runtime with a tracer
> +such as Intel Processor Trace.
> +
> +Requires enabling a backend specific option, like @option{-mptwrite} to enable
> +@code{PTWRITE} instruction generation on x86.
> +
> +Additional qualifiers can be specified after the =: @option{args}
> +for arguments, @option{off} to disable, @option{returns} for tracing
> +return values, @option{reads} to trace reads, @option{writes} to trace
> +writes, @option{locals} to trace locals.
> +Default when enabled is tracing reads, writes, arguments, returns, objects
> +with a static or thread duration but no locals.
> +Multiple options can be separated by comma.
> +
> +When -fvartrace enabled, the compiler will add trace instructions. By
> +default these trace options act like nops, unless tracing is
> +enabled. For example to enable tracing on x86 Linux with Linux perf
> +use:
> +
> +@smallexample
> +gcc -fvartrace -o program -g program.c
> +perf record -e intel_pt/pt=1,ptw=1,branch=0/ program
> +perf script
> +@end smallexample
> +
>  @item -fvisibility-inlines-hidden
>  @opindex fvisibility-inlines-hidden
>  This switch declares that the user does not attempt to compare
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 0a2ad9a745e..861914ac646 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -11936,6 +11936,12 @@ Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not
>  supported by the target.
>  @end deftypefn
>
> +@deftypefn {Target Hook} tree TARGET_VARTRACE_FUNC (tree @var{type}, bool @var{force})
> +Return a builtin to call to trace variables of type TYPE or NULL if not supported
> +by the target. Ignore target configuration if FORCE is true. The builtin gets called with a
> +single argument of TYPE.

So make this take an enum machine_mode @var{mode}, specified to be an
integer mode.  The argument type of the function would then be an
unsigned integer type with this mode.  That should allow more flexible
instrumentation, including of floats or doubles or small aggregates.

Maybe even instead pass it a number of bytes so it models how atomics work.

> +@end deftypefn
> +
>  @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_MEMMODEL_CHECK (unsigned HOST_WIDE_INT @var{val})
>  Validate target specific memory model mask bits. When NULL no target specific
>  memory model bits are allowed.
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index f1ad80da467..756e84d0e07 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -8104,6 +8104,8 @@ and the associated definitions of those functions.
>
>  @hook TARGET_ASAN_SHADOW_OFFSET
>
> +@hook TARGET_VARTRACE_FUNC
> +
>  @hook TARGET_MEMMODEL_CHECK
>
>  @hook TARGET_ATOMIC_TEST_AND_SET_TRUEVAL
> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
> index 500f6638f36..6e5add2da9d 100644
> --- a/gcc/flag-types.h
> +++ b/gcc/flag-types.h
> @@ -261,6 +261,15 @@ enum sanitize_code {
>                                   | SANITIZE_BOUNDS_STRICT
>  };
>
> +/* Settings for flag_vartrace */
> +enum flag_vartrace {
> +  VARTRACE_LOCALS = 1 << 0,
> +  VARTRACE_ARGS = 1 << 1,
> +  VARTRACE_RETURNS = 1 << 2,
> +  VARTRACE_READS = 1 << 3,
> +  VARTRACE_WRITES = 1 << 4
> +};
> +
>  /* Settings of flag_incremental_link.  */
>  enum incremental_link {
>    INCREMENTAL_LINK_NONE,
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 318ed442057..c6c47b55a39 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -1872,6 +1872,65 @@ check_alignment_argument (location_t loc, const char *flag, const char *name)
>    parse_and_check_align_values (flag, name, align_result, true, loc);
>  }
>
> +/* Parse vartrace options in P, updating flags OPTS at LOC and return
> +   updated flags.  */
> +
> +static int
> +parse_vartrace_options (const char *p, int opts, location_t loc)
> +{
> +  static struct {
> +    const char *name;
> +    int opt;
> +  } vopts[] =
> +      {
> +       { "default",
> +        VARTRACE_ARGS | VARTRACE_RETURNS | VARTRACE_READS
> +        | VARTRACE_WRITES }, /* Keep as first entry.  */
> +       { "all",
> +        VARTRACE_ARGS | VARTRACE_RETURNS | VARTRACE_READS
> +        | VARTRACE_WRITES | VARTRACE_LOCALS },
> +       { "args", VARTRACE_ARGS },
> +       { "returns", VARTRACE_RETURNS },
> +       { "reads", VARTRACE_READS },
> +       { "writes", VARTRACE_WRITES },
> +       { "locals", VARTRACE_LOCALS },
> +       { NULL, 0 }
> +      };
> +
> +  if (*p == '=')
> +    p++;
> +  if (*p == 0)
> +    return opts | vopts[0].opt;
> +
> +  if (!strcmp (p, "off"))
> +    return 0;
> +
> +  while (*p)
> +    {
> +      unsigned len = strcspn (p, ",");
> +      int i;
> +
> +      for (i = 0; vopts[i].name; i++)
> +       {
> +         if (len == strlen (vopts[i].name) && !strncmp (p, vopts[i].name, len))
> +           {
> +             opts |= vopts[i].opt;
> +             break;
> +           }
> +       }
> +      if (vopts[i].name == NULL)
> +       {
> +         error_at (loc, "invalid argument to %qs", "-fvartrace");
> +         break;
> +       }
> +
> +      p += len;
> +      if (*p == ',')
> +       p++;
> +    }
> +  return opts;
> +}
> +
>  /* Handle target- and language-independent options.  Return zero to
>     generate an "unknown option" message.  Only options that need
>     extra handling need to be listed here; if you simply want
> @@ -2078,6 +2137,10 @@ common_handle_option (struct gcc_options *opts,
>      case OPT__completion_:
>        break;
>
> +    case OPT_fvartrace:
> +      opts->x_flag_vartrace = parse_vartrace_options (arg, opts->x_flag_vartrace, loc);
> +      break;
> +
>      case OPT_fsanitize_:
>        opts->x_flag_sanitize
>         = parse_sanitizer_options (arg, loc, code,
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 24f212c8e31..36d14b0386a 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -394,6 +394,7 @@ along with GCC; see the file COPYING3.  If not see
>    NEXT_PASS (pass_asan_O0);
>    NEXT_PASS (pass_tsan_O0);
>    NEXT_PASS (pass_sanopt);
> +  NEXT_PASS (pass_vartrace);

I'd move it one lower, after pass_cleanup_eh.  A further enhancement
would be to make it an RTL pass ...

>    NEXT_PASS (pass_cleanup_eh);
>    NEXT_PASS (pass_lower_resx);
>    NEXT_PASS (pass_nrv);
> diff --git a/gcc/target.def b/gcc/target.def
> index f9469d69cb0..b5d970870a4 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -4300,6 +4300,15 @@ supported by the target.",
>   unsigned HOST_WIDE_INT, (void),
>   NULL)
>
> +/* Defines the builtin to trace variables, or NULL.  */
> +DEFHOOK
> +(vartrace_func,
> + "Return a builtin to call to trace variables of type TYPE or NULL if not supported\n\
> +by the target. Ignore target configuration if FORCE is true. The builtin gets called with a\n\
> +single argument of TYPE.",
> + tree, (tree type, bool force),
> + NULL)
> +
>  /* Functions relating to calls - argument passing, returns, etc.  */
>  /* Members of struct call have no special macro prefix.  */
>  HOOK_VECTOR (TARGET_CALLS, calls)
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index af15adc8e0c..2cf31785a6f 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -423,6 +423,7 @@ extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_post_ipa_warn (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_stdarg (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_vartrace (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_early_warn_uninitialized (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_late_warn_uninitialized (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_cse_reciprocals (gcc::context *ctxt);
> diff --git a/gcc/tree-vartrace.c b/gcc/tree-vartrace.c
> new file mode 100644
> index 00000000000..1ef81b743fc
> --- /dev/null
> +++ b/gcc/tree-vartrace.c
> @@ -0,0 +1,491 @@
> +/* Insert instructions for data value tracing.
> +   Copyright (C) 2017, 2018 Free Software Foundation, Inc.
> +   Contributed by Andi Kleen.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify
> +it under the terms of the GNU General Public License as published by
> +the Free Software Foundation; either version 3, or (at your option)
> +any later version.
> +
> +GCC is distributed in the hope that it will be useful,
> +but WITHOUT ANY WARRANTY; without even the implied warranty of
> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +GNU General Public License for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "target.h"
> +#include "tree.h"
> +#include "tree-iterator.h"
> +#include "tree-pass.h"
> +#include "basic-block.h"
> +#include "gimple.h"
> +#include "gimple-iterator.h"
> +#include "gimplify.h"
> +#include "gimplify-me.h"
> +#include "gimple-ssa.h"
> +#include "gimple-pretty-print.h"
> +#include "cfghooks.h"
> +#include "fold-const.h"
> +#include "ssa.h"
> +#include "tree-dfa.h"
> +#include "attribs.h"
> +
> +namespace {
> +
> +enum attrstate { force_off, force_on, neutral };
> +
> +/* Can we trace with attributes ATTR.  */
> +
> +attrstate
> +supported_attr (tree attr)
> +{
> +  if (lookup_attribute ("no_vartrace", attr))
> +    return force_off;
> +  if (lookup_attribute ("vartrace", attr))
> +    return force_on;
> +  return neutral;
> +}
> +
> +/* Is tracing enabled for ARG considering S.  */
> +
> +attrstate
> +supported_op (tree arg, attrstate s)
> +{
> +  if (s != neutral)
> +    return s;
> +  if (DECL_P (arg))
> +    {
> +      s = supported_attr (DECL_ATTRIBUTES (arg));
> +      if (s != neutral)
> +       return s;
> +    }
> +  return supported_attr (TYPE_ATTRIBUTES (TREE_TYPE (arg)));
> +}
> +
> +/* Can we trace DECL.  */
> +
> +bool
> +supported_type (tree decl)
> +{
> +  tree type = TREE_TYPE (decl);
> +
> +  return POINTER_TYPE_P (type) || INTEGRAL_TYPE_P (type);

So in reality the restriction is on the size of the object, correct?
Given you only use this in supported_mem I'd elide this function
(also see below).

The supported_ names are a bit odd since some of them are about
whether tracing is enabled and some are about whether a specific value
can be logged at all.  I suggest differentiating appropriately here,
like enabled_op vs. supported_type.

> +}
> +
> +/* Return true if DECL is referring to a local variable.  */
> +
> +bool
> +is_local (tree decl)
> +{
> +  if (!(flag_vartrace & VARTRACE_LOCALS) && supported_op (decl, neutral) != force_on)
> +    return false;
> +  return auto_var_in_fn_p (decl, cfun->decl);
> +}

Likewise.

> +/* Is T something we can log, FORCEing the type if needed.  */
> +
> +bool
> +supported_mem (tree t, bool force)

This looks like a bad name since 't' isn't necessarily a memory location?

> +{
> +  if (!supported_type (t))

You handle some nested cases below via recursion,
like a.b.c (but not a[i][j]).  But then this check will
fire.  I think it would be better to restructure the
function to look at the outermost level for whether
the op is of supported type, thus we can log it
at all and then get all the way down to the base via
sth like

  if (!supported_type (t))
    return false;
  enum attrstate s = <some default>;
  do
    {
       s = supported_op (t, s);
       if (s == force_off)
         return false;
    }
  while (handled_component_p (t) && (t = TREE_OPERAND (t, 0)))

Now t is either an SSA_NAME, a DECL (you fail to handle PARM_DECL
and RESULT_DECL below) or a [TARGET_]MEM_REF.  To get rid
of non-pointer indirections do then

      t = get_base_address (t);
      if (DECL_P (t) && is_local (t))
  ....

because...

> +    return false;
> +
> +  enum attrstate s = supported_op (t, neutral);
> +  if (s == force_off)
> +    return false;
> +  if (s == force_on)
> +    force = true;
> +
> +  switch (TREE_CODE (t))
> +    {
> +    case VAR_DECL:
> +      if (DECL_ARTIFICIAL (t))
> +       return false;
> +      if (is_local (t))
> +       return true;
> +      return s == force_on || force;
> +
> +    case ARRAY_REF:
> +      t = TREE_OPERAND (t, 0);
> +      s = supported_op (t, s);
> +      if (s == force_off)
> +       return false;
> +      return supported_type (TREE_TYPE (t));

Your supported_type is said to take a DECL.  And you
already asked for this type - it's the type of the original t
(well, the type of this type given TREE_TYPE (t) is an array type).
But you'll reject a[i][j] where the type of this type is an array type as well.

> +
> +    case COMPONENT_REF:
> +      s = supported_op (TREE_OPERAND (t, 1), s);
> +      t = TREE_OPERAND (t, 0);
> +      if (s == neutral && is_local (t))
> +       return true;
> +      s = supported_op (t, s);
> +      if (s != neutral)
> +       return s == force_on ? true : false;
> +      return supported_mem (t, force);
> +
> +      // support BIT_FIELD_REF?
> +
> +    case VIEW_CONVERT_EXPR:
> +    case TARGET_MEM_REF:
> +    case MEM_REF:
> +      return supported_mem (TREE_OPERAND (t, 0), force);
> +
> +    case SSA_NAME:
> +      if ((flag_vartrace & VARTRACE_LOCALS)
> +         && SSA_NAME_VAR (t)
> +         && !DECL_IGNORED_P (SSA_NAME_VAR (t)))
> +       return true;
> +      return force;
> +
> +    default:
> +      break;
> +    }
> +
> +  return false;
> +}
> +
> +/* Print debugging for inserting CODE at ORIG_STMT with type of VAL for WHY.  */
> +
> +void
> +log_trace_code (gimple *orig_stmt, gimple *code, tree val, const char *why)
> +{
> +  if (!dump_file)
> +    return;
> +  if (orig_stmt)
> +    fprintf (dump_file, "BB%d ", gimple_bb (orig_stmt)->index);
> +  fprintf (dump_file, "%s inserting ", why);
> +  print_gimple_stmt (dump_file, code, 0, TDF_VOPS|TDF_MEMSYMS);
> +  if (orig_stmt)
> +    {
> +      fprintf (dump_file, "orig ");
> +      print_gimple_stmt (dump_file, orig_stmt, 2,
> +                            TDF_VOPS|TDF_MEMSYMS);

TDF_MEMSYMS is dead, or rather it seems to be a leftover
now equivalent to just TDF_VOPS.

> +    }
> +  fprintf (dump_file, "type ");
> +  print_generic_expr (dump_file, TREE_TYPE (val), TDF_SLIM);
> +  fputc ('\n', dump_file);
> +  fputc ('\n', dump_file);
> +}
> +
> +/* Insert variable tracing code for VAL before iterator GI, originally
> +   for ORIG_STMT and optionally at LOC. Normally before ORIG_STMT, but
> +   AFTER if true. Reason is WHY. Return the trace var if successful,
> +   or NULL_TREE.  */
> +
> +tree
> +insert_trace (gimple_stmt_iterator *gi, tree val, gimple *orig_stmt,
> +             const char *why, location_t loc = -1, bool after = false)
> +{
> +  if (loc == (location_t)-1)
> +    loc = gimple_location (orig_stmt);
> +
> +  tree func = targetm.vartrace_func (TREE_TYPE (val), false);
> +  if (!func)
> +    return NULL_TREE;
> +
> +  tree tvar = val;
> +  if (!is_gimple_reg (val))
> +    {
> +      tvar = make_ssa_name (TREE_TYPE (val));
> +      gassign *assign = gimple_build_assign (tvar, unshare_expr (val));
> +      log_trace_code (orig_stmt, assign, val, "copy");
> +      gimple_set_location (assign, loc);
> +      if (after)
> +       gsi_insert_after (gi, assign, GSI_CONTINUE_LINKING);
> +      else
> +       gsi_insert_before (gi, assign, GSI_SAME_STMT);
> +      update_stmt (assign);

gsi_insert_* does update_stmt already.  Btw, if you allow any
SImode or DImode size value you can use a VIEW_CONVERT_EXPR
to view the value as an integer type.  You can of course also do
simple promotion of char or short to int.

I wonder why you do not get IL verification errors when passing
the builtin signed ints/longs btw...  ah, because we do not actually
verify call arguments :/

> +    }
> +
> +  gcall *call = gimple_build_call (func, 1, tvar);
> +  log_trace_code (NULL, call, tvar, why);
> +  gimple_set_location (call, loc);
> +  if (after)
> +    gsi_insert_after (gi, call, GSI_CONTINUE_LINKING);
> +  else
> +    gsi_insert_before (gi, call, GSI_SAME_STMT);
> +  update_stmt (call);
> +  return tvar;
> +}
> +
> +/* Insert trace at GI for T in FUN if suitable memory or variable
> +   reference.  Always if FORCE. Originally on ORIG_STMT. Reason is
> +   WHY.  Insert after GI if AFTER. Returns trace variable or NULL_TREE.  */
> +
> +tree
> +instrument_mem (gimple_stmt_iterator *gi, tree t, bool force,
> +               gimple *orig_stmt, const char *why, bool after = false)
> +{
> +  if (supported_mem (t, force))
> +    return insert_trace (gi, t, orig_stmt, why, -1, after);
> +  return NULL_TREE;
> +}
> +
> +/* Instrument arguments for FUN. Return true if changed.  */
> +
> +bool
> +instrument_args (function *fun)
> +{
> +  gimple_stmt_iterator gi;
> +  bool changed = false;
> +
> +  /* Local tracing usually takes care of the argument too, when
> +     they are read. This avoids redundant trace instructions.  */

But only when instrumenting reads?

> +  if (flag_vartrace & VARTRACE_LOCALS)
> +    return false;
> +
> +  for (tree arg = DECL_ARGUMENTS (current_function_decl);
> +       arg != NULL_TREE;
> +       arg = DECL_CHAIN (arg))
> +    {
> +     gi = gsi_start_bb (BASIC_BLOCK_FOR_FN (fun, NUM_FIXED_BLOCKS));

leftover?  It seems to be unused (and is bogus).

> +     tree type = TREE_TYPE (arg);
> +     if (POINTER_TYPE_P (type) || INTEGRAL_TYPE_P (type))

You had supported_type for this...  I think you want to bail out for

   DECL_BY_REFERENCE (arg)  // arg will be a pointer to the actual argument
   || DECL_IGNORED_P (arg) // no debug info

> +       {
> +         tree func = targetm.vartrace_func (TREE_TYPE (arg), false);
> +         if (!func)
> +           continue;
> +
> +         if (!is_gimple_reg (arg))

TREE_CODE (arg) != SSA_NAME  (since you are calling ssa_default_def)

> +           continue;
> +         tree sarg = ssa_default_def (fun, arg);
> +         if (!sarg)
> +           continue;
> +
> +         gimple_stmt_iterator egi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (fun)));
> +         changed |= !!insert_trace (&egi, sarg, NULL, "arg", fun->function_start_locus);
> +       }
> +    }
> +  return changed;
> +}
> +
> +/* Generate trace call for store GAS at GI, force if FORCE.  Return true
> +   if successfully inserted.  */
> +
> +bool
> +instrument_store (gimple_stmt_iterator *gi, gassign *gas, bool force)
> +{
> +  tree orig = gimple_assign_lhs (gas);
> +
> +  if (!supported_mem (orig, force))
> +    return false;
> +
> +  tree func = targetm.vartrace_func (TREE_TYPE (orig), false);
> +  if (!func)
> +    return false;
> +
> +  /* Generate another reference to target. That can be racy, but is
> +     guaranteed to have the debug location of the target.  Better
> +     would be to use the original value to avoid any races, but we
> +     would need to somehow force the target location of the
> +     builtin.  */

Hmm, but then this requires the target instruction to have a memory operand?
That's going to be unlikely for RISCy cases?  On x86 does it work if
combine later does not synthesize a ptwrite with memory operand?
I also wonder how this survives RTL CSE since you are basically doing

  mem = val;  // orig stmt
  val' = mem;
  ptwrite (val');

that probably means when CSE removes the load there ends up a debug-insn
reflecting what you want?

> +  tree tvar = make_ssa_name (TREE_TYPE (orig));
> +  gassign *assign = gimple_build_assign (tvar, unshare_expr (orig));
> +  log_trace_code (gas, assign, orig, "store copy");
> +  gimple_set_location (assign, gimple_location (gas));
> +  gsi_insert_after (gi, assign, GSI_CONTINUE_LINKING);
> +  update_stmt (assign);
> +
> +  gcall *tcall = gimple_build_call (func, 1, tvar);
> +  log_trace_code (gas, tcall, tvar, "store");
> +  gimple_set_location (tcall, gimple_location (gas));
> +  gsi_insert_after (gi, tcall, GSI_CONTINUE_LINKING);
> +  update_stmt (tcall);
> +  return true;
> +}
> +
> +/* Instrument STMT at GI. Force if FORCE. Return true if changed.  */
> +
> +bool
> +instrument_assign (gimple_stmt_iterator *gi, gassign *gas, bool force)
> +{
> +  if (gimple_clobber_p (gas))
> +    return false;
> +  bool changed = false;
> +  tree tvar = instrument_mem (gi, gimple_assign_rhs1 (gas),
> +                             (flag_vartrace & VARTRACE_READS) || force,
> +                             gas, "assign load1");
> +  if (tvar)
> +    {
> +      gimple_assign_set_rhs1 (gas, tvar);
> +      changed = true;
> +    }
> +  /* Handle operators in case they read locals.  */

Does it make sense at all to instrument SSA "reads"?  You know
all SSA vars have a single definition and all locals not in SSA form
are represented with memory reads/writes, so ...

> +  if (gimple_num_ops (gas) > 2)
> +    {

... this case looks spurious.  And as said you probably do not want to
instrument SSA "reads"?  Also not above when instrumenting
gimple_assign_rhs1 (gas),
so better guard that with gimple_assign_load_p (gas)?

> +      tvar = instrument_mem (gi, gimple_assign_rhs2 (gas),
> +                             (flag_vartrace & VARTRACE_READS) || force,
> +                             gas, "assign load2");
> +      if (tvar)
> +       {
> +         gimple_assign_set_rhs2 (gas, tvar);

you still have this stmt adjusting here, why?

> +         changed = true;
> +       }
> +    }
> +  // handle more ops?
> +
> +  if (gimple_store_p (gas))
> +    changed |= instrument_store (gi, gas,
> +                                (flag_vartrace & VARTRACE_WRITES) || force);
> +
> +  if (changed)
> +    update_stmt (gas);
> +  return changed;
> +}
> +
> +/* Instrument return at statement STMT at GI with FORCE. Return true
> +   if changed.  */
> +
> +bool
> +instrument_return (gimple_stmt_iterator *gi, greturn *gret, bool force)
> +{
> +  tree rval = gimple_return_retval (gret);
> +
> +  if (!rval)
> +    return false;
> +  if (DECL_P (rval) && DECL_BY_REFERENCE (rval))
> +    rval = build_simple_mem_ref (ssa_default_def (cfun, rval));

This looks bogus.  If DECL_BY_REFERENCE is true then rval
is of pointer type and thus a register, you shouldn't ever see that
returned plainly in gimple_return_retval.

> +  if (supported_mem (rval, force))
> +    return !!insert_trace (gi, rval, gret, "return");
> +  return false;
> +}
> +
> +/* Instrument asm at GI in statement STMT with FORCE if needed. Return
> +   true if changed.  */
> +
> +bool
> +instrument_asm (gimple_stmt_iterator *gi, gasm *stmt, bool force)
> +{
> +  bool changed = false;
> +
> +  for (unsigned i = 0; i < gimple_asm_ninputs (stmt); i++)
> +    changed |= !!instrument_mem (gi, TREE_VALUE (gimple_asm_input_op (stmt, i)),
> +                                force || (flag_vartrace & VARTRACE_READS), stmt,
> +                                "asm input");
> +  for (unsigned i = 0; i < gimple_asm_noutputs (stmt); i++)
> +    {
> +      tree o = TREE_VALUE (gimple_asm_output_op (stmt, i));
> +      if (supported_mem (o, force | (flag_vartrace & VARTRACE_WRITES)))
> +         changed |= !!insert_trace (gi, o, stmt, "asm output", -1, true);
> +    }

I would guess instrumenting asms() has the chance of disturbing
reg-alloc quite a bit...

> +  return changed;
> +}
> +
> +/* Insert vartrace calls for FUN.  */
> +
> +unsigned int
> +vartrace_execute (function *fun)
> +{
> +  basic_block bb;
> +  gimple_stmt_iterator gi;
> +  bool force = 0;
> +
> +  if (lookup_attribute ("vartrace", TYPE_ATTRIBUTES (TREE_TYPE (fun->decl)))
> +      || lookup_attribute ("vartrace", DECL_ATTRIBUTES (fun->decl)))

btw, I wonder whether the vartrace attribute should have an argument like
vartrace(locals,reads,...)?

> +    force = true;
> +
> +  bool changed = false;
> +
> +  if ((flag_vartrace & VARTRACE_ARGS) || force)
> +    changed |= instrument_args (fun);
> +
> +  FOR_EACH_BB_FN (bb, fun)
> +    for (gi = gsi_start_bb (bb); !gsi_end_p (gi); gsi_next (&gi))
> +      {
> +       gimple *stmt = gsi_stmt (gi);

Not sure if I suggested it during the first review but there's

  walk_stmt_load_store_ops ()

which lets you walk (via callbacks) all memory loads and stores in a stmt
(also loads of non-SSA registers).  Then there's

  FOR_EACH_SSA_DEF_OPERAND ()

or alternatively FOR_EACH_SSA_TREE_OPERAND () in case you are
also interested in uses.  For SSA uses you are currently missing
indexes in ARRAY_REFs and friends.  But as said I think you really
want to avoid instrumenting SSA uses.

> +       switch (gimple_code (stmt))
> +         {
> +         case GIMPLE_ASSIGN:
> +           changed |= instrument_assign (&gi, as_a <gassign *> (stmt), force);
> +           break;
> +         case GIMPLE_RETURN:
> +           changed |= instrument_return (&gi, as_a <greturn *> (stmt),
> +                                         force || (flag_vartrace & VARTRACE_RETURNS));
> +           break;
> +
> +           // for GIMPLE_CALL we use the argument logging in the callee
> +           // we could optionally log in the caller too to handle all possible
> +           // reads of a local/global when the callee is not instrumented
> +           // possibly later we could also instrument copy and clear calls.
> +
> +         case GIMPLE_SWITCH:
> +           changed |= !!instrument_mem (&gi, gimple_switch_index (as_a <gswitch *> (stmt)),
> +                                        force, stmt, "switch");
> +           break;
> +         case GIMPLE_COND:
> +           changed |= !!instrument_mem (&gi, gimple_cond_lhs (stmt), force, stmt, "if lhs");
> +           changed |= !!instrument_mem (&gi, gimple_cond_rhs (stmt), force, stmt, "if rhs");

switch and cond are always SSA uses (or constants).

> +           break;
> +
> +         case GIMPLE_ASM:
> +           changed |= instrument_asm (&gi, as_a<gasm *> (stmt), force);
> +           break;
> +         default:
> +           // everything else that reads/writes variables should be lowered already
> +           break;
> +         }
> +      }
> +
> +  // for now, until we fix all cases that destroy ssa
> +  return changed ? TODO_update_ssa : 0;
> +}
> +
> +const pass_data pass_data_vartrace =
> +{
> +  GIMPLE_PASS, /* type */
> +  "vartrace", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_vartrace : public gimple_opt_pass
> +{
> +public:
> +  pass_vartrace (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_vartrace, ctxt)
> +  {}
> +
> +  virtual opt_pass * clone ()
> +    {
> +      return new pass_vartrace (m_ctxt);
> +    }
> +
> +  virtual bool gate (function *fun)
> +    {
> +      // check if vartrace is supported in backend
> +      if (!targetm.vartrace_func
> +         || targetm.vartrace_func (integer_type_node, false) == NULL)
> +       return false;
> +
> +      if (lookup_attribute ("no_vartrace", TYPE_ATTRIBUTES (TREE_TYPE (fun->decl)))
> +         || lookup_attribute ("no_vartrace", DECL_ATTRIBUTES (fun->decl)))
> +       return false;
> +
> +      // need to run pass always to check for variable attributes
> +      return true;
> +    }
> +
> +  virtual unsigned int execute (function *f) { return vartrace_execute (f); }
> +};
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_vartrace (gcc::context *ctxt)
> +{
> +  return new pass_vartrace (ctxt);
> +}
> --
> 2.19.1
>
Andi Kleen Nov. 20, 2018, 6:27 p.m. UTC | #2
On Tue, Nov 20, 2018 at 01:04:19PM +0100, Richard Biener wrote:
> Since your builtin clobbers memory

Hmm, maybe we could get rid of that, but then how to avoid
the optimizer moving it around over function calls etc.?
The instrumentation should still be useful when the program
crashes, so we don't want to delay logging too much.

> Maybe even instead pass it a number of bytes so it models how atomics work.

How could that reject float?

Mode seems better for now.

Eventually might support float/double through memory, but not in the
first version.


> >    NEXT_PASS (pass_tsan_O0);
> >    NEXT_PASS (pass_sanopt);
> > +  NEXT_PASS (pass_vartrace);
> 
> I'd move it one lower, after pass_cleanup_eh.  Further enhancement
> would make it a
> RTL pass ...

It's after pass_nrv now.

> So in reality the restriction is on the size of the object, correct?

The instruction accepts a 32- or 64-bit memory or register operand.

In principle everything could be logged through this, but I was trying
to limit the cases to integers and pointers for now to simplify
the problem.

Right now the backend bails out when something other than 4 or 8 bytes
is passed.

> 
> > +{
> > +  if (!supported_type (t))
> 
> You handle some nested cases below via recursion,
> like a.b.c (but not a[i][j]).  But then this check will
> fire.  I think it would be better to restructure the
> function to look at the outermost level for whether
> the op is of supported type, thus we can log it
> at all and then get all the way down to the base via
> sth like
> 
>   if (!supported_type (t))
>     return false;
>   enum attrstate s = <some default>;
>   do
>     {
>        s = supported_op (t, s);
>        if (s == force_off)
>          return false;
>     }
>   while (handled_component_p (t) && (t = TREE_OPERAND (t, 0)))
> 
> Now t is either an SSA_NAME, a DECL (you fail to handle PARM_DECL

Incoming arguments and returns are handled separately.

> and RESULT_DECL below) or a [TARGET_]MEM_REF.  To get rid
> of non-pointer indirections do then
> 
>       t = get_base_address (t);
>       if (DECL_P (t) && is_local (t))
>   ....
> 
> because...
> 
> > +    return false;
> > +
> > +  enum attrstate s = supported_op (t, neutral);
> > +  if (s == force_off)
> > +    return false;
> > +  if (s == force_on)
> > +    force = true;
> > +
> > +  switch (TREE_CODE (t))
> > +    {
> > +    case VAR_DECL:
> > +      if (DECL_ARTIFICIAL (t))
> > +       return false;
> > +      if (is_local (t))
> > +       return true;
> > +      return s == force_on || force;
> > +
> > +    case ARRAY_REF:
> > +      t = TREE_OPERAND (t, 0);
> > +      s = supported_op (t, s);
> > +      if (s == force_off)
> > +       return false;
> > +      return supported_type (TREE_TYPE (t));
> 
> Your supported_type is said to take a DECL.  And you
> already asked for this type - it's the type of the original t
> (well, the type of this type given TREE_TYPE (t) is an array type).
> But you'll reject a[i][j] where the type of this type is an array type as well.

Just to be clear, after your changes above I only need
to handle VAR_DECL and SSA_NAME here then, correct?

So one of the reasons I handled ARRAY_REF this way is to
trace the index as a local if needed. If I could assume the index was
always loaded from a MEM by its own earlier assignment whenever the
local is user visible, that wouldn't be needed (nor would some similar
code elsewhere).

But when I look at a simple test case like vartrace-6 

void
f (void)
{
  int i;
  for (i = 0; i < 10; i++)
   f2 ();
}

i appears to be an SSA name only that is referenced everywhere without
a MEM. And if the user wants to track the value of i I would need
to explicitly handle all these cases. Do I miss something here?

I'm starting to think I should perhaps drop locals support to simplify
everything? But that might limit usability for debugging somewhat.


> gsi_insert_* does update_stmt already.  Btw, if you allow any
> SImode or DImode size value you can use a VIEW_CONVERT_EXPR

Just add them unconditionally? 

> > +bool
> > +instrument_args (function *fun)
> > +{
> > +  gimple_stmt_iterator gi;
> > +  bool changed = false;
> > +
> > +  /* Local tracing usually takes care of the argument too, when
> > +     they are read. This avoids redundant trace instructions.  */
> 
> But only when instrumenting reads?

Yes will add the check.

> 
> Hmm, but then this requires the target instruction to have a memory operand?

Yes that's right for now. Eventually it will be fixed and x86 would
benefit too.

> That's going to be unlikely for RISCy cases?  On x86 does it work if
> combine later does not syntesize a ptwrite with memory operand?
> I also wonder how this survives RTL CSE since you are basically doing
> 
>   mem = val;  // orig stmt
>   val' = mem;
>   ptwrite (val');
> 
> that probably means when CSE removes the load there ends up a debug-insn
> reflecting what you want?

I'll check.

> > +  /* Handle operators in case they read locals.  */
> 
> Does it make sense at all to instrument SSA "reads"?  You know
> all SSA vars have a single definition and all locals not in SSA form
> are represented with memory reads/writes, so ...

Ok but how about the locals in SSA form? Like in the example above.

I would love to simplify all this, but I fear that it would make
the locals tracking unusable. 

> 
> > +  if (gimple_num_ops (gas) > 2)
> > +    {
> 
> ... this case looks spurious.  And as said you probably do not want to
> instrument SSA "reads"?  Also not above when instrumenting
> gimple_assign_rhs1 (gas),
> so better guard that with gimple_assign_load_p (gas)?
> 
> > +      tvar = instrument_mem (gi, gimple_assign_rhs2 (gas),
> > +                             (flag_vartrace & VARTRACE_READS) || force,
> > +                             gas, "assign load2");
> > +      if (tvar)
> > +       {
> > +         gimple_assign_set_rhs2 (gas, tvar);
> 
> you still have this stmt adjusting here, why?

I tried removing it, but I had problems during testing (IIRC it was
definitions used after assignment), so I re-added it. It might be me
doing something bogus elsewhere.

> > +/* Instrument return at statement STMT at GI with FORCE. Return true
> > +   if changed.  */
> > +
> > +bool
> > +instrument_return (gimple_stmt_iterator *gi, greturn *gret, bool force)
> > +{
> > +  tree rval = gimple_return_retval (gret);
> > +
> > +  if (!rval)
> > +    return false;
> > +  if (DECL_P (rval) && DECL_BY_REFERENCE (rval))
> > +    rval = build_simple_mem_ref (ssa_default_def (cfun, rval));
> 
> This looks bogus.  If DECL_BY_REFERENCE is true then rval
> is of pointer type and thus a register, you shouldn't ever see that
> returned plainly in gimple_return_retval.

Ok so what to do with the DECL_BY_REFERENCE then?

I think I copied this from some other pass.
> 
> I would guess instrumenting asms() has the chance of disturbing
> reg-alloc quite a bit...

Do you want me to make it optional with a new argument?

> > +{
> > +  basic_block bb;
> > +  gimple_stmt_iterator gi;
> > +  bool force = 0;
> > +
> > +  if (lookup_attribute ("vartrace", TYPE_ATTRIBUTES (TREE_TYPE (fun->decl)))
> > +      || lookup_attribute ("vartrace", DECL_ATTRIBUTES (fun->decl)))
> 
> btw, I wonder whether the vartrace attribute should have an argument like
> vartrace(locals,reads,...)?

I was hoping this could be delayed until actually needed. It'll need
some changes because I don't want to do the full parsing for every decl
all the time, so I would need to store a bitmap of options somewhere
in the tree.

> > +
> > +  FOR_EACH_BB_FN (bb, fun)
> > +    for (gi = gsi_start_bb (bb); !gsi_end_p (gi); gsi_next (&gi))
> > +      {
> > +       gimple *stmt = gsi_stmt (gi);
> 
> Not sure if I suggested it during the first review but there's
> 
>   walk_stmt_load_store_ops ()
> 
> which lets you walk (via callbacks) all memory loads and stores in a stmt
> (also loads of non-SSA registers).  Then there's
> 
>   FOR_EACH_SSA_DEF_OPERAND ()
> 
> or alternatively FOR_EACH_SSA_TREE_OPERAND () in case you are
> also interested in uses.  For SSA uses you are currently missing
> indexes in ARRAY_REFs and friends.  But as said I think you really
> want to avoid instrumenting SSA uses.

I would love too, but would the example above still trace "i" ?


-Andi
Richard Biener Nov. 22, 2018, 1:53 p.m. UTC | #3
On Tue, Nov 20, 2018 at 7:27 PM Andi Kleen <andi@firstfloor.org> wrote:
>
> On Tue, Nov 20, 2018 at 01:04:19PM +0100, Richard Biener wrote:
> > Since your builtin clobbers memory
>
> Hmm, maybe we could get rid of that, but then how to avoid
> the optimizer moving it around over function calls etc.?
> The instrumentation should still be useful when the program
> crashes, so we don't want to delay logging too much.

You can't avoid moving it, yes.  But it can even be moved now, effectively
detaching it from the "interesting" $pc ranges we have debug location lists for?

> > Maybe even instead pass it a number of bytes so it models how atomics work.
>
> How could that reject float?

Why does it need to reject floats?  Note you are already rejecting floats
in the instrumentation pass.

> Mode seems better for now.
>
> Eventually might support float/double through memory, but not in the
> first version.

Why does movq %xmm0, %rax; ptwrite %rax not work?

>
> > >    NEXT_PASS (pass_tsan_O0);
> > >    NEXT_PASS (pass_sanopt);
> > > +  NEXT_PASS (pass_vartrace);
> >
> > I'd move it one lower, after pass_cleanup_eh.  Further enhancement
> > would make it a
> > RTL pass ...
>
> It's after pass_nrv now.
>
> > So in reality the restriction is on the size of the object, correct?
>
> The instruction accepts 32 or 64bit memory or register.
>
> In principle everything could be logged through this, but i was trying
> to limit the cases to integers and pointers for now to simplify
> the problem.
>
> Right now the backend fails up when something else than 4 or 8 bytes
> is passed.

Fair enough, the instrumentation would need to pad out smaller values
and/or split larger values.  I think 1- and 2-byte values would be
interesting so you can support chars and shorts.  Eventually 16-byte
values for __int128 or vectors.

> >
> > > +{
> > > +  if (!supported_type (t))
> >
> > You handle some nested cases below via recursion,
> > like a.b.c (but not a[i][j]).  But then this check will
> > fire.  I think it would be better to restructure the
> > function to look at the outermost level for whether
> > the op is of supported type, thus we can log it
> > at all and then get all the way down to the base via
> > sth like
> >
> >   if (!supported_type (t))
> >     return false;
> >   enum attrstate s = <some default>;
> >   do
> >     {
> >        s = supported_op (t, s);
> >        if (s == force_off)
> >          return false;
> >     }
> >   while (handled_component_p (t) && (t = TREE_OPERAND (t, 0)))
> >
> > Now t is either an SSA_NAME, a DECL (you fail to handle PARM_DECL
>
> Incoming arguments and returns are handled separately.
>
> > and RESULT_DECL below) or a [TARGET_]MEM_REF.  To get rid
> > of non-pointer indirections do then
> >
> >       t = get_base_address (t);
> >       if (DECL_P (t) && is_local (t))
> >   ....
> >
> > because...
> >
> > > +    return false;
> > > +
> > > +  enum attrstate s = supported_op (t, neutral);
> > > +  if (s == force_off)
> > > +    return false;
> > > +  if (s == force_on)
> > > +    force = true;
> > > +
> > > +  switch (TREE_CODE (t))
> > > +    {
> > > +    case VAR_DECL:
> > > +      if (DECL_ARTIFICIAL (t))
> > > +       return false;
> > > +      if (is_local (t))
> > > +       return true;
> > > +      return s == force_on || force;
> > > +
> > > +    case ARRAY_REF:
> > > +      t = TREE_OPERAND (t, 0);
> > > +      s = supported_op (t, s);
> > > +      if (s == force_off)
> > > +       return false;
> > > +      return supported_type (TREE_TYPE (t));
> >
> > Your supported_type is said to take a DECL.  And you
> > already asked for this type - it's the type of the original t
> > (well, the type of this type given TREE_TYPE (t) is an array type).
> > But you'll reject a[i][j] where the type of this type is an array type as well.
>
> Just to be clear, after your changes above I only need
> to handle VAR_DECL and SSA_NAME here then, correct?

Yes (and PARM_DECL and RESULT_DECL).

> So one of the reasons I handled ARRAY_REF this way is to
> trace the index as a local if needed. If I can assume
> it was always in a MEM with an own ASSIGN earlier if the local
> was a user visible that wouldn't be needed (and also some other similar
> code elsewhere)

If the index lives in memory it has a corresponding load.  For SSA uses
see my comment about instrumenting them at all (together with the
suggestion on how to handle them in an easier way).

>
> But when I look at a simple test case like vartrace-6
>
> void
> f (void)
> {
>   int i;
>   for (i = 0; i < 10; i++)
>    f2 ();
> }
>
> i appears to be an SSA name only that is referenced everywhere without
> a MEM. And if the user wants to track the value of i I would need
> to explicitly handle all these cases. Am I missing something here?

You handle those in different places I think.

> I'm starting to think I should perhaps drop locals support to simplify
> everything? But that might limit usability for debugging somewhat.

I think you confuse "locals" a bit.  In GIMPLE an automatic variable
of type 'int' might be assigned a memory location, then reads
(if not cached in a register) happen via separate memory load statements.
If it has not been assigned a memory location then it lives in a (SSA)
register.

>
> > gsi_insert_* does update_stmt already.  Btw, if you allow any
> > SImode or DImode size value you can use a VIEW_CONVERT_EXPR
>
> Just add them unconditionally?

You can use the simplification machinery to elide it automatically
for example.

   tree tem = gimple_build (&seq, VIEW_CONVERT_EXPR,
                                       uint/ulong-type, val);

gives you a register or value of desired type with a conversion
stmt appended to seq if one is required.

> > > +bool
> > > +instrument_args (function *fun)
> > > +{
> > > +  gimple_stmt_iterator gi;
> > > +  bool changed = false;
> > > +
> > > +  /* Local tracing usually takes care of the argument too, when
> > > +     they are read. This avoids redundant trace instructions.  */
> >
> > But only when instrumenting reads?
>
> Yes will add the check.
>
> >
> > Hmm, but then this requires the target instruction to have a memory operand?
>
> Yes that's right for now. Eventually it will be fixed and x86 would
> benefit too.
>
> > That's going to be unlikely for RISCy cases?  On x86 does it work if
> > combine later does not synthesize a ptwrite with memory operand?
> > I also wonder how this survives RTL CSE since you are basically doing
> >
> >   mem = val;  // orig stmt
> >   val' = mem;
> >   ptwrite (val');
> >
> > that probably means when CSE removes the load there ends up a debug-insn
> > reflecting what you want?
>
> I'll check.
>
> > > +  /* Handle operators in case they read locals.  */
> >
> > Does it make sense at all to instrument SSA "reads"?  You know
> > all SSA vars have a single definition and all locals not in SSA form
> > are represented with memory reads/writes, so ...
>
> Ok but how about the locals in SSA form? Like in the example above.
>
> I would love to simplify all this, but I fear that it would make
> the locals tracking unusable.

I'd instrument SSA variables at the point of their definition.

> >
> > > +  if (gimple_num_ops (gas) > 2)
> > > +    {
> >
> > ... this case looks spurious.  And as said you probably do not want to
> > instrument SSA "reads"?  Also not above when instrumenting
> > gimple_assign_rhs1 (gas),
> > so better guard that with gimple_assign_load_p (gas)?
> >
> > > +      tvar = instrument_mem (gi, gimple_assign_rhs2 (gas),
> > > +                             (flag_vartrace & VARTRACE_READS) || force,
> > > +                             gas, "assign load2");
> > > +      if (tvar)
> > > +       {
> > > +         gimple_assign_set_rhs2 (gas, tvar);
> >
> > you still have this stmt adjusting here, why?
>
> I tried removing it, but I had problems during testing (IIRC it was
> definitions used after assignment), so I re-added it. It might be me
> doing something bogus elsewhere.

Probably...

> > > +/* Instrument return at statement STMT at GI with FORCE. Return true
> > > +   if changed.  */
> > > +
> > > +bool
> > > +instrument_return (gimple_stmt_iterator *gi, greturn *gret, bool force)
> > > +{
> > > +  tree rval = gimple_return_retval (gret);
> > > +
> > > +  if (!rval)
> > > +    return false;
> > > +  if (DECL_P (rval) && DECL_BY_REFERENCE (rval))
> > > +    rval = build_simple_mem_ref (ssa_default_def (cfun, rval));
> >
> > This looks bogus.  If DECL_BY_REFERENCE is true then rval
> > is of pointer type and thus a register, you shouldn't ever see that
> > returned plainly in gimple_return_retval.
>
> Ok so what to do with the DECL_BY_REFERENCE then?
>
> I think i copied this from some other pass.

The above should be unreachable so just remove it ;)

> >
> > I would guess instrumenting asms() has the chance of disturbing
> > reg-alloc quite a bit...
>
> You want to make it optional with a new argument?

Not sure.

> > > +{
> > > +  basic_block bb;
> > > +  gimple_stmt_iterator gi;
> > > +  bool force = 0;
> > > +
> > > +  if (lookup_attribute ("vartrace", TYPE_ATTRIBUTES (TREE_TYPE (fun->decl)))
> > > +      || lookup_attribute ("vartrace", DECL_ATTRIBUTES (fun->decl)))
> >
> > btw, I wonder whether the vartrace attribute should have an argument like
> > vartrace(locals,reads,...)?
>
> I was hoping this could be delayed until actually needed. It'll need
> some changes because I don't want to do the full parsing for every decl
> all the time, so would need to store a bitmap of options somewhere
> in tree.

Hmm, indeed.  I wonder if you need the function attribute at all then?  Maybe
a negative, no-vartrace is enough?  Maybe even that is not needed?
That is, on types and decls you'd interpret it as if locals tracing is on
then locals of type are traced but otherwise locals of type are not traced?
That is, if I just want to trace variable i and do

 int i __attribute__((vartrace));

then what options do I enable to make that work and how can I avoid
tracing anything else?  Similar for

 typedef int traced_int __attribute_((vartrace));

?  I guess we'd need -fvar-trace=locals to get the described effect for
the type attribute and then -fvar-trace=locals,all to have _all_ locals
traced?  Or -fvar-trace=locals,only-marked? ... or forgo with the idea
of marking types?

> > > +
> > > +  FOR_EACH_BB_FN (bb, fun)
> > > +    for (gi = gsi_start_bb (bb); !gsi_end_p (gi); gsi_next (&gi))
> > > +      {
> > > +       gimple *stmt = gsi_stmt (gi);
> >
> > Not sure if I suggested it during the first review but there's
> >
> >   walk_stmt_load_store_ops ()
> >
> > which lets you walk (via callbacks) all memory loads and stores in a stmt
> > (also loads of non-SSA registers).  Then there's
> >
> >   FOR_EACH_SSA_DEF_OPERAND ()
> >
> > or alternatively FOR_EACH_SSA_TREE_OPERAND () in case you are
> > also interested in uses.  For SSA uses you are currently missing
> > indexes in ARRAY_REFs and friends.  But as said I think you really
> > want to avoid instrumenting SSA uses.
>
> I would love too, but would the example above still trace "i" ?

Yes.  Or, well...  at -O0 you instrument sth like

f ()
{
  int i;

  <bb 2> :
  i_3 = 0;
  goto <bb 4>; [INV]

  <bb 3> :
  f2 ();
  i_6 = i_1 + 1;

  <bb 4> :
  # i_1 = PHI <i_3(2), i_6(3)>
  if (i_1 <= 9)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 5> :
  return;

if you instrument at SSA definition sites this would become

f ()
{
  int i;

  <bb 2> :
  i_3 = 0;
  trace (0);
  goto <bb 4>; [INV]

  <bb 3> :
  f2 ();
  i_6 = i_1 + 1;
  trace (i_6);

  <bb 4> :
  # i_1 = PHI <i_3(2), i_6(3)>
  trace (i_1);
  if (i_1 <= 9)
    goto <bb 3>; [INV]
  else
    goto <bb 5>; [INV]

  <bb 5> :
  return;

of course when optimizing you'll instead see

f ()
{
  unsigned int ivtmp_2;
  unsigned int ivtmp_10;

  <bb 2> [local count: 97603132]:

  <bb 3> [local count: 976138693]:
  # ivtmp_10 = PHI <10(2), ivtmp_2(3)>
  f2 ();
  ivtmp_2 = ivtmp_10 + 4294967295;
  if (ivtmp_2 != 0)
    goto <bb 3>; [90.91%]
  else
    goto <bb 4>; [9.09%]

  <bb 4> [local count: 97603132]:
  return;

so there's no 'i' anymore.  If you enable debug-info you
can get it back via looking at DEBUG_INSNs:

  <bb 2> [local count: 97603132]:
  # DEBUG BEGIN_STMT
  # DEBUG BEGIN_STMT
  # DEBUG i => 0
  # DEBUG i => 0

  <bb 3> [local count: 976138693]:
  # ivtmp_10 = PHI <10(2), ivtmp_2(3)>
  # DEBUG i => (int) (10 - ivtmp_10)
  # DEBUG BEGIN_STMT
  f2 ();
  # DEBUG D#1 => (int) (11 - ivtmp_10)
  # DEBUG i => D#1
  # DEBUG i => D#1
  ivtmp_2 = ivtmp_10 + 4294967295;
  if (ivtmp_2 != 0)
    goto <bb 3>; [90.91%]
  else
    goto <bb 4>; [9.09%]

  <bb 4> [local count: 97603132]:
  return;

but that's going to be a bit more tricky.  That is,
instrumenting memory is easy, instrumenting
register "reads"/"writes" is going to be a bit
tricky, at least when looking at optimized code
and instrumenting late.  (but ptwrite is only
interesting for optimized code, no?)

Richard.

>
> -Andi
>
Andi Kleen Nov. 22, 2018, 8:58 p.m. UTC | #4
On Thu, Nov 22, 2018 at 02:53:11PM +0100, Richard Biener wrote:
> > the optimizer moving it around over function calls etc.?
> > The instrumentation should still be useful when the program
> > crashes, so we don't want to delay logging too much.
> 
> You can't avoid moving it, yes.  But it can even be moved now, effectively
> detaching it from the "interesting" $pc ranges we have debug location lists for?
> 
> > > Maybe even instead pass it a number of bytes so it models how atomics work.
> >
> > How could that reject float?
> 
> Why does it need to reject floats?  Note you are already rejecting floats
> in the instrumentation pass.

The backend doesn't know how to generate the code for ptwrite %xmm0. 
So it would either need to learn about all these conversions, or reject
them in the hook so that the middle end eventually can generate
them. Or maybe hard code the knowledge in the middle end?

> > Mode seems better for now.
> >
> > Eventually might support float/double through memory, but not in the
> > first version.
> 
> Why does movq %xmm0, %rax; ptwrite %rax not work?

It works of course; the question is just who is in charge
of figuring out that the movq needs to be generated
(and that it's a bit cast, not a rounding conversion).

> Fair enough, the instrumentation would need to pad out smaller values
> and/or split larger values.  I think 1 and 2 byte values would be interesting
> so you can support char and shorts.  Eventually 16byte values for __int128
> or vectors.

Right. And for copies/clears too. But I was hoping to just get
the basics solid and then do these more advanced features later.

> > > btw, I wonder whether the vartrace attribute should have an argument like
> > > vartrace(locals,reads,...)?
> >
> > I was hoping this could be delayed until actually needed. It'll need
> > some changes because I don't want to do the full parsing for every decl
> > all the time, so would need to store a bitmap of options somewhere
> > in tree.
> 
> Hmm, indeed.  I wonder if you need the function attribute at all then?  Maybe

I really would like an opt-in per function. I think that's an important
use case. Just instrument a few functions that contribute to the bug
you're debugging to cut down the bandwidth.

The idea was that __attribute__((vartrace)) for a function would
log everything in that function, including locals.

I could see a use case for opting in to a function without
locals (mainly because local tracking is more likely to cause
trace overflows if there is a lot of execution). But I think I can do
without that in v1 and might add it later.

> That is, on types and decls you'd interpret it as if locals tracing is on
> then locals of type are traced but otherwise locals of type are not traced?

When the opt-in is on the type or the variable then the variable
should be tracked, even if it is a local (or maybe even a cast).

The locals check is mainly for the command line.

> That is, if I just want to trace variable i and do
> 
>  int i __attribute__((vartrace));
> 
> then what options do I enable to make that work and how can I avoid
> tracing anything else?  Similar for

Just enable -mptwrite (or nothing if it's implied by -mcpu=...).

> 
>  typedef int traced_int __attribute_((vartrace));

Same.

> 
> ?  I guess we'd need -fvar-trace=locals to get the described effect for
> the type attribute and then -fvar-trace=locals,all to have _all_ locals
> traced?  Or -fvar-trace=locals,only-marked? ... or forgo with the idea
> of marking types?

You want a command line override to not trace if the attribute
is set?

-mno-ptwrite would work. Could also add something separate
in -fvartrace (perhaps implied in =off) but not sure if it's worth it.

I suppose it could make sense if someone wants to use the _ptwrite 
builtin separately still. I'll add it to =off.

> so there's no 'i' anymore.  If you enable debug-info you

Hmm, I guess that's OK for now. I suppose it'll work with -Og.

> and instrumenting late.  (but ptwrite is only
> interesting for optimized code, no?)

It's very interesting for non-optimized code too. In fact
that would be a common use case during debugging.


-Andi
diff mbox series

Patch

diff --git a/config/bootstrap-vartrace-locals.mk b/config/bootstrap-vartrace-locals.mk
new file mode 100644
index 00000000000..dd16640df74
--- /dev/null
+++ b/config/bootstrap-vartrace-locals.mk
@@ -0,0 +1,3 @@ 
+STAGE2_CFLAGS += -mptwrite -fvartrace=all
+STAGE3_CFLAGS += -mptwrite -fvartrace=all
+STAGE4_CFLAGS += -mptwrite -fvartrace=all
diff --git a/config/bootstrap-vartrace.mk b/config/bootstrap-vartrace.mk
new file mode 100644
index 00000000000..e29824d799b
--- /dev/null
+++ b/config/bootstrap-vartrace.mk
@@ -0,0 +1,3 @@ 
+STAGE2_CFLAGS += -mptwrite -fvartrace
+STAGE3_CFLAGS += -mptwrite -fvartrace
+STAGE4_CFLAGS += -mptwrite -fvartrace
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index ec793175c3b..64a99a1ec8a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1594,6 +1594,7 @@  OBJS = \
 	tree-vectorizer.o \
 	tree-vector-builder.o \
 	tree-vrp.o \
+	tree-vartrace.o \
 	tree.o \
 	typed-splay-tree.o \
 	unique-ptr-tests.o \
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 1657df7f9df..d50e78de830 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -104,6 +104,10 @@  static tree handle_tls_model_attribute (tree *, tree, tree, int,
 					bool *);
 static tree handle_no_instrument_function_attribute (tree *, tree,
 						     tree, int, bool *);
+static tree handle_vartrace_attribute (tree *, tree,
+						     tree, int, bool *);
+static tree handle_no_vartrace_attribute (tree *, tree,
+						     tree, int, bool *);
 static tree handle_no_profile_instrument_function_attribute (tree *, tree,
 							     tree, int, bool *);
 static tree handle_malloc_attribute (tree *, tree, tree, int, bool *);
@@ -235,6 +239,13 @@  static const struct attribute_spec::exclusions attr_const_pure_exclusions[] =
   ATTR_EXCL (NULL, false, false, false)
 };
 
+static const struct attribute_spec::exclusions attr_vartrace_exclusions[] =
+{
+  ATTR_EXCL ("vartrace", true, true, true),
+  ATTR_EXCL ("no_vartrace", true, true, true),
+  ATTR_EXCL (NULL, false, false, false)
+};
+
 /* Table of machine-independent attributes common to all C-like languages.
 
    Current list of processed common attributes: nonnull.  */
@@ -326,6 +337,12 @@  const struct attribute_spec c_common_attribute_table[] =
   { "no_instrument_function", 0, 0, true,  false, false, false,
 			      handle_no_instrument_function_attribute,
 			      NULL },
+  { "vartrace",		      0, 0, false,  false, false, false,
+			      handle_vartrace_attribute,
+			      attr_vartrace_exclusions },
+  { "no_vartrace",	      0, 0, false,  false, false, false,
+			      handle_no_vartrace_attribute,
+			      attr_vartrace_exclusions },
   { "no_profile_instrument_function",  0, 0, true, false, false, false,
 			      handle_no_profile_instrument_function_attribute,
 			      NULL },
@@ -770,6 +787,66 @@  handle_no_sanitize_undefined_attribute (tree *node, tree name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle "vartrace" attribute; arguments as in struct
+   attribute_spec.handler.  */
+
+static tree
+handle_vartrace_attribute (tree *node, tree name, tree, int flags,
+			   bool *no_add_attrs)
+{
+  if (!VAR_OR_FUNCTION_DECL_P (*node) && !TYPE_P (*node)
+      && TREE_CODE (*node) != FIELD_DECL)
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored for object", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  if (!targetm.vartrace_func)
+    {
+      warning (OPT_Wattributes, "%qE attribute not supported for target", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  if (TREE_TYPE (*node)
+      && TREE_CODE (*node) != FUNCTION_DECL
+      && targetm.vartrace_func (TREE_TYPE (*node), true) == NULL_TREE)
+   {
+      warning (OPT_Wattributes, "%qE attribute not supported for type", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+   }
+
+  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
+    *node = build_variant_type_copy (*node);
+
+  /* We look it up later with lookup_attribute.  */
+  return NULL_TREE;
+}
+
+/* Handle "no_vartrace" attribute; arguments as in struct
+   attribute_spec.handler.  */
+
+static tree
+handle_no_vartrace_attribute (tree *node, tree name, tree, int flags,
+			      bool *no_add_attrs)
+{
+  if (!VAR_OR_FUNCTION_DECL_P (*node) && !TYPE_P (*node)
+      && TREE_CODE (*node) != FIELD_DECL)
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
+    *node = build_variant_type_copy (*node);
+
+  /* We look it up later with lookup_attribute.  */
+  return NULL_TREE;
+}
+
 /* Handle an "asan odr indicator" attribute; arguments as in
    struct attribute_spec.handler.  */
 
diff --git a/gcc/common.opt b/gcc/common.opt
index 72a713593c3..f27a57b1177 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -221,6 +221,10 @@  unsigned int flag_sanitize_recover = (SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NO
 Variable
 unsigned int flag_sanitize_coverage
 
+; What to instrument with vartrace
+Variable
+unsigned int flag_vartrace
+
 ; Flag whether a prefix has been added to dump_base_name
 Variable
 bool dump_base_name_prefixed = false
@@ -2856,6 +2860,10 @@  ftree-scev-cprop
 Common Report Var(flag_tree_scev_cprop) Init(1) Optimization
 Enable copy propagation of scalar-evolution information.
 
+fvartrace
+Common JoinedOrMissing Report Driver
+-fvartrace=default|all|locals|returns|args|reads|writes|off   Enable variable tracing instrumentation.
+
 ; -fverbose-asm causes extra commentary information to be produced in
 ; the generated assembly code (to make it more readable).  This option
 ; is generally only of use to those who actually need to read the
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c18c60a1d19..6d130bf0804 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -31900,6 +31900,35 @@  ix86_mangle_function_version_assembler_name (tree decl, tree id)
   return ret;
 }
 
+/* Hook to determine that TYPE can be traced.  Ignore target flags
+   if FORCE is true. Returns the tracing builtin if tracing is possible,
+   or otherwise NULL.  */
+
+static tree
+ix86_vartrace_func (tree type, bool force)
+{
+  if (!(ix86_isa_flags2 & OPTION_MASK_ISA_PTWRITE))
+    {
+      /* With force, as in checking for the attribute, ignore
+	 the current target settings. Otherwise it's not
+	 possible to declare vartrace variables outside
+	 an __attribute__((target("ptwrite"))) function
+	 if -mptwrite is not specified.  */
+      if (!force)
+	return NULL;
+      /* Initialize the builtins if missing, so that we have
+	 something to return.  */
+      if (!ix86_builtins[(int) IX86_BUILTIN_PTWRITE32])
+	ix86_add_new_builtins (0, OPTION_MASK_ISA_PTWRITE);
+    }
+
+  if (TYPE_PRECISION (type) == 32)
+    return ix86_builtins[(int) IX86_BUILTIN_PTWRITE32];
+  else if (TYPE_PRECISION (type) == 64)
+    return ix86_builtins[(int) IX86_BUILTIN_PTWRITE64];
+  else
+    return NULL;
+}
 
 static tree 
 ix86_mangle_decl_assembler_name (tree decl, tree id)
@@ -50894,6 +50923,9 @@  ix86_run_selftests (void)
 #undef TARGET_ASAN_SHADOW_OFFSET
 #define TARGET_ASAN_SHADOW_OFFSET ix86_asan_shadow_offset
 
+#undef TARGET_VARTRACE_FUNC
+#define TARGET_VARTRACE_FUNC ix86_vartrace_func
+
 #undef TARGET_GIMPLIFY_VA_ARG_EXPR
 #define TARGET_GIMPLIFY_VA_ARG_EXPR ix86_gimplify_va_arg
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d4b1046b6ae..f0cde5dcf0f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3228,6 +3228,13 @@  the standard C library can be guaranteed not to throw an exception
 with the notable exceptions of @code{qsort} and @code{bsearch} that
 take function pointer arguments.
 
+@item no_vartrace
+The @code{no_vartrace} attribute disables data tracing for
+the function, variable, or structure field declared with
+the attribute.  @xref{Common Variable Attributes} and
+@ref{Common Type Attributes}.  When specified for a function,
+nothing in the function is traced.
+
 @item optimize (@var{level}, @dots{})
 @item optimize (@var{string}, @dots{})
 @cindex @code{optimize} function attribute
@@ -3489,6 +3496,21 @@  When applied to a member function of a C++ class template, the
 attribute also means that the function is instantiated if the
 class itself is instantiated.
 
+@item vartrace
+@cindex @code{vartrace} function or variable attribute
+Enable data tracing for the function or variable or structure field
+marked with this attribute. When applied to a type all instances of the type
+will be traced. When applied to a structure or union all fields will be traced.
+When applied to a structure field that field will be traced.
+For functions, locals are not traced (unless enabled on the command line
+or via an attribute on the local itself), but arguments, return values,
+globals, and pointer references are.  For variables, types, and structure
+members, any reads or writes are traced.
+Only integer and pointer types can be traced.
+
+Currently this is implemented for x86 when the @option{-mptwrite} target
+option is enabled, on systems that support the @code{PTWRITE} instruction;
+4 and 8 byte integers are supported.
+
 @item visibility ("@var{visibility_type}")
 @cindex @code{visibility} function attribute
 This attribute affects the linkage of the declaration to which it is attached.
@@ -7196,6 +7218,16 @@  A @{ /* @r{@dots{}} */ @};
 struct __attribute__ ((copy ( (struct A *)0)) B @{ /* @r{@dots{}} */ @};
 @end smallexample
 
+@cindex @code{vartrace} type attribute
+@cindex @code{no_vartrace} type attribute
+@item vartrace
+@itemx no_vartrace
+Specify that all instances of a type should be traced or not
+traced.  Can also be applied to function types to disable tracing
+for all instances of that function type.
+Can also be applied to structure fields.  See @ref{Variable Attributes}
+for more details.
+
 @item deprecated
 @itemx deprecated (@var{msg})
 @cindex @code{deprecated} type attribute
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 535b258d22b..8b9d1d71669 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2754,6 +2754,33 @@  Don't use the @code{__cxa_get_exception_ptr} runtime routine.  This
 causes @code{std::uncaught_exception} to be incorrect, but is necessary
 if the runtime routine is not available.
 
+@item -fvartrace
+@opindex -fvartrace=...
+Insert trace instructions to trace variables at runtime with a tracer
+such as Intel Processor Trace.
+
+Requires enabling a backend specific option, like @option{-mptwrite} to enable
+@code{PTWRITE} instruction generation on x86.
+
+Additional qualifiers can be specified after the @samp{=}:
+@option{args} for tracing arguments, @option{returns} for tracing
+return values, @option{reads} for tracing reads, @option{writes} for
+tracing writes, @option{locals} for tracing locals, and @option{off}
+to disable tracing.
+The default when enabled is to trace reads, writes, arguments, return
+values, and objects with static or thread storage duration, but no locals.
+Multiple qualifiers can be separated by commas.
+
+When @option{-fvartrace} is enabled, the compiler adds trace
+instructions.  At runtime these instructions act like nops unless
+tracing is enabled.  For example, to enable tracing on x86 Linux with
+Linux perf use:
+
+@smallexample
+gcc -fvartrace -mptwrite -o program -g program.c
+perf record -e intel_pt/pt=1,ptw=1,branch=0/ program
+perf script
+@end smallexample
+
 @item -fvisibility-inlines-hidden
 @opindex fvisibility-inlines-hidden
 This switch declares that the user does not attempt to compare
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 0a2ad9a745e..861914ac646 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11936,6 +11936,12 @@  Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not
 supported by the target.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_VARTRACE_FUNC (tree @var{type}, bool @var{force})
+Return a builtin to call to trace variables of type @var{type}, or
+@code{NULL_TREE} if not supported by the target.  Ignore the target
+configuration if @var{force} is true.  The builtin is called with a
+single argument of type @var{type}.
+@end deftypefn
+
 @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_MEMMODEL_CHECK (unsigned HOST_WIDE_INT @var{val})
 Validate target specific memory model mask bits. When NULL no target specific
 memory model bits are allowed.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f1ad80da467..756e84d0e07 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8104,6 +8104,8 @@  and the associated definitions of those functions.
 
 @hook TARGET_ASAN_SHADOW_OFFSET
 
+@hook TARGET_VARTRACE_FUNC
+
 @hook TARGET_MEMMODEL_CHECK
 
 @hook TARGET_ATOMIC_TEST_AND_SET_TRUEVAL
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 500f6638f36..6e5add2da9d 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -261,6 +261,15 @@  enum sanitize_code {
 				  | SANITIZE_BOUNDS_STRICT
 };
 
+/* Settings for flag_vartrace */
+enum flag_vartrace {
+  VARTRACE_LOCALS = 1 << 0,
+  VARTRACE_ARGS = 1 << 1,
+  VARTRACE_RETURNS = 1 << 2,
+  VARTRACE_READS = 1 << 3,
+  VARTRACE_WRITES = 1 << 4
+};
+
 /* Settings of flag_incremental_link.  */
 enum incremental_link {
   INCREMENTAL_LINK_NONE,
diff --git a/gcc/opts.c b/gcc/opts.c
index 318ed442057..c6c47b55a39 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1872,6 +1872,65 @@  check_alignment_argument (location_t loc, const char *flag, const char *name)
   parse_and_check_align_values (flag, name, align_result, true, loc);
 }
 
+/* Parse vartrace options in P, updating flags OPTS at LOC and return
+   updated flags.  */
+
+static int
+parse_vartrace_options (const char *p, int opts, location_t loc)
+{
+  static struct {
+    const char *name;
+    int opt;
+  } vopts[] =
+      {
+       { "default",
+	 VARTRACE_ARGS | VARTRACE_RETURNS | VARTRACE_READS
+	 | VARTRACE_WRITES }, /* Keep as first entry.  */
+       { "all",
+	 VARTRACE_ARGS | VARTRACE_RETURNS | VARTRACE_READS
+	 | VARTRACE_WRITES | VARTRACE_LOCALS },
+       { "args", VARTRACE_ARGS },
+       { "returns", VARTRACE_RETURNS },
+       { "reads", VARTRACE_READS },
+       { "writes", VARTRACE_WRITES },
+       { "locals", VARTRACE_LOCALS },
+       { NULL, 0 }
+      };
+
+  if (*p == '=')
+    p++;
+  if (*p == 0)
+    return opts | vopts[0].opt;
+
+  if (!strcmp (p, "off"))
+    return 0;
+
+  while (*p)
+    {
+      unsigned len = strcspn (p, ",");
+      int i;
+
+      for (i = 0; vopts[i].name; i++)
+	{
+	  if (len == strlen (vopts[i].name) && !strncmp (p, vopts[i].name, len))
+	    {
+	      opts |= vopts[i].opt;
+	      break;
+	    }
+	}
+      if (vopts[i].name == NULL)
+	{
+	  error_at (loc, "invalid argument to %qs", "-fvartrace");
+	  break;
+	}
+
+      p += len;
+      if (*p == ',')
+	p++;
+    }
+  return opts;
+}
+
 /* Handle target- and language-independent options.  Return zero to
    generate an "unknown option" message.  Only options that need
    extra handling need to be listed here; if you simply want
@@ -2078,6 +2137,10 @@  common_handle_option (struct gcc_options *opts,
     case OPT__completion_:
       break;
 
+    case OPT_fvartrace:
+      opts->x_flag_vartrace = parse_vartrace_options (arg, opts->x_flag_vartrace, loc);
+      break;
+
     case OPT_fsanitize_:
       opts->x_flag_sanitize
 	= parse_sanitizer_options (arg, loc, code,
diff --git a/gcc/passes.def b/gcc/passes.def
index 24f212c8e31..36d14b0386a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -394,6 +394,7 @@  along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_asan_O0);
   NEXT_PASS (pass_tsan_O0);
   NEXT_PASS (pass_sanopt);
+  NEXT_PASS (pass_vartrace);
   NEXT_PASS (pass_cleanup_eh);
   NEXT_PASS (pass_lower_resx);
   NEXT_PASS (pass_nrv);
diff --git a/gcc/target.def b/gcc/target.def
index f9469d69cb0..b5d970870a4 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4300,6 +4300,15 @@  supported by the target.",
  unsigned HOST_WIDE_INT, (void),
  NULL)
 
+/* Defines the builtin to trace variables, or NULL.  */
+DEFHOOK
+(vartrace_func,
+ "Return a builtin to call to trace variables of type @var{type}, or\n\
+@code{NULL_TREE} if not supported by the target.  Ignore the target\n\
+configuration if @var{force} is true.  The builtin is called with a\n\
+single argument of type @var{type}.",
+ tree, (tree type, bool force),
+ NULL)
+
 /* Functions relating to calls - argument passing, returns, etc.  */
 /* Members of struct call have no special macro prefix.  */
 HOOK_VECTOR (TARGET_CALLS, calls)
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index af15adc8e0c..2cf31785a6f 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -423,6 +423,7 @@  extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_post_ipa_warn (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_stdarg (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_vartrace (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_early_warn_uninitialized (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_late_warn_uninitialized (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cse_reciprocals (gcc::context *ctxt);
diff --git a/gcc/tree-vartrace.c b/gcc/tree-vartrace.c
new file mode 100644
index 00000000000..1ef81b743fc
--- /dev/null
+++ b/gcc/tree-vartrace.c
@@ -0,0 +1,491 @@ 
+/* Insert instructions for data value tracing.
+   Copyright (C) 2017, 2018 Free Software Foundation, Inc.
+   Contributed by Andi Kleen.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "tree.h"
+#include "tree-iterator.h"
+#include "tree-pass.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "gimplify-me.h"
+#include "gimple-ssa.h"
+#include "gimple-pretty-print.h"
+#include "cfghooks.h"
+#include "fold-const.h"
+#include "ssa.h"
+#include "tree-dfa.h"
+#include "attribs.h"
+
+namespace {
+
+enum attrstate { force_off, force_on, neutral };
+
+/* Determine whether attributes ATTR force tracing on or off.  */
+
+attrstate
+supported_attr (tree attr)
+{
+  if (lookup_attribute ("no_vartrace", attr))
+    return force_off;
+  if (lookup_attribute ("vartrace", attr))
+    return force_on;
+  return neutral;
+}
+
+/* Is tracing enabled for ARG considering S.  */
+
+attrstate
+supported_op (tree arg, attrstate s)
+{
+  if (s != neutral)
+    return s;
+  if (DECL_P (arg))
+    {
+      s = supported_attr (DECL_ATTRIBUTES (arg));
+      if (s != neutral)
+	return s;
+    }
+  return supported_attr (TYPE_ATTRIBUTES (TREE_TYPE (arg)));
+}
+
+/* Return true if DECL has a type we can trace.  */
+
+bool
+supported_type (tree decl)
+{
+  tree type = TREE_TYPE (decl);
+
+  return POINTER_TYPE_P (type) || INTEGRAL_TYPE_P (type);
+}
+
+/* Return true if DECL is referring to a local variable.  */
+
+bool
+is_local (tree decl)
+{
+  if (!(flag_vartrace & VARTRACE_LOCALS)
+      && supported_op (decl, neutral) != force_on)
+    return false;
+  return auto_var_in_fn_p (decl, cfun->decl);
+}
+
+/* Return true if T is something we can log.  FORCE enables logging
+   even when no attribute requests it.  */
+
+bool
+supported_mem (tree t, bool force)
+{
+  if (!supported_type (t))
+    return false;
+
+  enum attrstate s = supported_op (t, neutral);
+  if (s == force_off)
+    return false;
+  if (s == force_on)
+    force = true;
+
+  switch (TREE_CODE (t))
+    {
+    case VAR_DECL:
+      if (DECL_ARTIFICIAL (t))
+	return false;
+      if (is_local (t))
+	return true;
+      return s == force_on || force;
+
+    case ARRAY_REF:
+      t = TREE_OPERAND (t, 0);
+      s = supported_op (t, s);
+      if (s == force_off)
+	return false;
+      return supported_type (TREE_TYPE (t));
+
+    case COMPONENT_REF:
+      s = supported_op (TREE_OPERAND (t, 1), s);
+      t = TREE_OPERAND (t, 0);
+      if (s == neutral && is_local (t))
+	return true;
+      s = supported_op (t, s);
+      if (s != neutral)
+	return s == force_on;
+      return supported_mem (t, force);
+
+      // support BIT_FIELD_REF?
+
+    case VIEW_CONVERT_EXPR:
+    case TARGET_MEM_REF:
+    case MEM_REF:
+      return supported_mem (TREE_OPERAND (t, 0), force);
+
+    case SSA_NAME:
+      if ((flag_vartrace & VARTRACE_LOCALS)
+	  && SSA_NAME_VAR (t)
+	  && !DECL_IGNORED_P (SSA_NAME_VAR (t)))
+	return true;
+      return force;
+
+    default:
+      break;
+    }
+
+  return false;
+}
+
+/* Print debug output for inserting CODE at ORIG_STMT, with the type
+   of VAL, for reason WHY.  */
+
+void
+log_trace_code (gimple *orig_stmt, gimple *code, tree val, const char *why)
+{
+  if (!dump_file)
+    return;
+  if (orig_stmt)
+    fprintf (dump_file, "BB%d ", gimple_bb (orig_stmt)->index);
+  fprintf (dump_file, "%s inserting ", why);
+  print_gimple_stmt (dump_file, code, 0, TDF_VOPS|TDF_MEMSYMS);
+  if (orig_stmt)
+    {
+      fprintf (dump_file, "orig ");
+      print_gimple_stmt (dump_file, orig_stmt, 2,
+			 TDF_VOPS|TDF_MEMSYMS);
+    }
+  fprintf (dump_file, "type ");
+  print_generic_expr (dump_file, TREE_TYPE (val), TDF_SLIM);
+  fputc ('\n', dump_file);
+  fputc ('\n', dump_file);
+}
+
+/* Insert variable tracing code for VAL at iterator GI, originally for
+   ORIG_STMT, at location LOC (defaulting to ORIG_STMT's location).
+   Insert after GI if AFTER is true, otherwise before.  WHY describes
+   the reason for dump output.  Return the trace variable if
+   successful, or NULL_TREE.  */
+
+tree
+insert_trace (gimple_stmt_iterator *gi, tree val, gimple *orig_stmt,
+	      const char *why, location_t loc = -1, bool after = false)
+{
+  if (loc == (location_t)-1)
+    loc = gimple_location (orig_stmt);
+
+  tree func = targetm.vartrace_func (TREE_TYPE (val), false);
+  if (!func)
+    return NULL_TREE;
+
+  tree tvar = val;
+  if (!is_gimple_reg (val))
+    {
+      tvar = make_ssa_name (TREE_TYPE (val));
+      gassign *assign = gimple_build_assign (tvar, unshare_expr (val));
+      log_trace_code (orig_stmt, assign, val, "copy");
+      gimple_set_location (assign, loc);
+      if (after)
+	gsi_insert_after (gi, assign, GSI_CONTINUE_LINKING);
+      else
+	gsi_insert_before (gi, assign, GSI_SAME_STMT);
+      update_stmt (assign);
+    }
+
+  gcall *call = gimple_build_call (func, 1, tvar);
+  log_trace_code (NULL, call, tvar, why);
+  gimple_set_location (call, loc);
+  if (after)
+    gsi_insert_after (gi, call, GSI_CONTINUE_LINKING);
+  else
+    gsi_insert_before (gi, call, GSI_SAME_STMT);
+  update_stmt (call);
+  return tvar;
+}
+
+/* Insert trace at GI for T in FUN if suitable memory or variable
+   reference.  Always if FORCE. Originally on ORIG_STMT. Reason is
+   WHY.  Insert after GI if AFTER. Returns trace variable or NULL_TREE.  */
+
+tree
+instrument_mem (gimple_stmt_iterator *gi, tree t, bool force,
+		gimple *orig_stmt, const char *why, bool after = false)
+{
+  if (supported_mem (t, force))
+    return insert_trace (gi, t, orig_stmt, why, -1, after);
+  return NULL_TREE;
+}
+
+/* Instrument arguments for FUN. Return true if changed.  */
+
+bool
+instrument_args (function *fun)
+{
+  bool changed = false;
+
+  /* Local tracing usually takes care of the arguments too, when
+     they are read.  This avoids redundant trace instructions.  */
+  if (flag_vartrace & VARTRACE_LOCALS)
+    return false;
+
+  for (tree arg = DECL_ARGUMENTS (current_function_decl);
+       arg != NULL_TREE;
+       arg = DECL_CHAIN (arg))
+    {
+      tree type = TREE_TYPE (arg);
+      if (POINTER_TYPE_P (type) || INTEGRAL_TYPE_P (type))
+	{
+	  tree func = targetm.vartrace_func (TREE_TYPE (arg), false);
+	  if (!func)
+	    continue;
+
+	  if (!is_gimple_reg (arg))
+	    continue;
+	  tree sarg = ssa_default_def (fun, arg);
+	  if (!sarg)
+	    continue;
+
+	  gimple_stmt_iterator egi
+	    = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (fun)));
+	  changed |= !!insert_trace (&egi, sarg, NULL, "arg", fun->function_start_locus);
+	}
+    }
+  return changed;
+}
+
+/* Generate trace call for store GAS at GI, forced if FORCE.  Return
+   true if successfully inserted.  */
+
+bool
+instrument_store (gimple_stmt_iterator *gi, gassign *gas, bool force)
+{
+  tree orig = gimple_assign_lhs (gas);
+
+  if (!supported_mem (orig, force))
+    return false;
+
+  tree func = targetm.vartrace_func (TREE_TYPE (orig), false);
+  if (!func)
+    return false;
+
+  /* Generate another reference to the target.  That can be racy, but is
+     guaranteed to have the debug location of the target.  Better
+     would be to use the original value to avoid any races, but we
+     would need to somehow force the target location of the
+     builtin.  */
+
+  tree tvar = make_ssa_name (TREE_TYPE (orig));
+  gassign *assign = gimple_build_assign (tvar, unshare_expr (orig));
+  log_trace_code (gas, assign, orig, "store copy");
+  gimple_set_location (assign, gimple_location (gas));
+  gsi_insert_after (gi, assign, GSI_CONTINUE_LINKING);
+  update_stmt (assign);
+
+  gcall *tcall = gimple_build_call (func, 1, tvar);
+  log_trace_code (gas, tcall, tvar, "store");
+  gimple_set_location (tcall, gimple_location (gas));
+  gsi_insert_after (gi, tcall, GSI_CONTINUE_LINKING);
+  update_stmt (tcall);
+  return true;
+}
+
+/* Instrument assignment GAS at GI, forced if FORCE.  Return true if
+   changed.  */
+
+bool
+instrument_assign (gimple_stmt_iterator *gi, gassign *gas, bool force)
+{
+  if (gimple_clobber_p (gas))
+    return false;
+  bool changed = false;
+  tree tvar = instrument_mem (gi, gimple_assign_rhs1 (gas),
+			      (flag_vartrace & VARTRACE_READS) || force,
+			      gas, "assign load1");
+  if (tvar)
+    {
+      gimple_assign_set_rhs1 (gas, tvar);
+      changed = true;
+    }
+  /* Handle operators in case they read locals.  */
+  if (gimple_num_ops (gas) > 2)
+    {
+      tvar = instrument_mem (gi, gimple_assign_rhs2 (gas),
+			      (flag_vartrace & VARTRACE_READS) || force,
+			      gas, "assign load2");
+      if (tvar)
+	{
+	  gimple_assign_set_rhs2 (gas, tvar);
+	  changed = true;
+	}
+    }
+  // handle more ops?
+
+  if (gimple_store_p (gas))
+    changed |= instrument_store (gi, gas,
+				 (flag_vartrace & VARTRACE_WRITES) || force);
+
+  if (changed)
+    update_stmt (gas);
+  return changed;
+}
+
+/* Instrument return statement GRET at GI, forced if FORCE.  Return
+   true if changed.  */
+
+bool
+instrument_return (gimple_stmt_iterator *gi, greturn *gret, bool force)
+{
+  tree rval = gimple_return_retval (gret);
+
+  if (!rval)
+    return false;
+  if (DECL_P (rval) && DECL_BY_REFERENCE (rval))
+    rval = build_simple_mem_ref (ssa_default_def (cfun, rval));
+  if (supported_mem (rval, force))
+    return !!insert_trace (gi, rval, gret, "return");
+  return false;
+}
+
+/* Instrument asm statement STMT at GI, forced if FORCE.  Return true
+   if changed.  */
+
+bool
+instrument_asm (gimple_stmt_iterator *gi, gasm *stmt, bool force)
+{
+  bool changed = false;
+
+  for (unsigned i = 0; i < gimple_asm_ninputs (stmt); i++)
+    changed |= !!instrument_mem (gi, TREE_VALUE (gimple_asm_input_op (stmt, i)),
+				 force || (flag_vartrace & VARTRACE_READS), stmt,
+				 "asm input");
+  for (unsigned i = 0; i < gimple_asm_noutputs (stmt); i++)
+    {
+      tree o = TREE_VALUE (gimple_asm_output_op (stmt, i));
+      if (supported_mem (o, force || (flag_vartrace & VARTRACE_WRITES)))
+	changed |= !!insert_trace (gi, o, stmt, "asm output", -1, true);
+    }
+  return changed;
+}
+
+/* Insert vartrace calls for FUN.  */
+
+unsigned int
+vartrace_execute (function *fun)
+{
+  basic_block bb;
+  gimple_stmt_iterator gi;
+  bool force = false;
+
+  if (lookup_attribute ("vartrace", TYPE_ATTRIBUTES (TREE_TYPE (fun->decl)))
+      || lookup_attribute ("vartrace", DECL_ATTRIBUTES (fun->decl)))
+    force = true;
+
+  bool changed = false;
+
+  if ((flag_vartrace & VARTRACE_ARGS) || force)
+    changed |= instrument_args (fun);
+
+  FOR_EACH_BB_FN (bb, fun)
+    for (gi = gsi_start_bb (bb); !gsi_end_p (gi); gsi_next (&gi))
+      {
+	gimple *stmt = gsi_stmt (gi);
+	switch (gimple_code (stmt))
+	  {
+	  case GIMPLE_ASSIGN:
+	    changed |= instrument_assign (&gi, as_a <gassign *> (stmt), force);
+	    break;
+	  case GIMPLE_RETURN:
+	    changed |= instrument_return (&gi, as_a <greturn *> (stmt),
+					  force || (flag_vartrace & VARTRACE_RETURNS));
+	    break;
+
+	    // for GIMPLE_CALL we use the argument logging in the callee
+	    // we could optionally log in the caller too to handle all possible
+	    // reads of a local/global when the callee is not instrumented
+	    // possibly later we could also instrument copy and clear calls.
+
+	  case GIMPLE_SWITCH:
+	    changed |= !!instrument_mem (&gi, gimple_switch_index (as_a <gswitch *> (stmt)),
+					 force, stmt, "switch");
+	    break;
+	  case GIMPLE_COND:
+	    changed |= !!instrument_mem (&gi, gimple_cond_lhs (stmt), force, stmt, "if lhs");
+	    changed |= !!instrument_mem (&gi, gimple_cond_rhs (stmt), force, stmt, "if rhs");
+	    break;
+
+	  case GIMPLE_ASM:
+	    changed |= instrument_asm (&gi, as_a<gasm *> (stmt), force);
+	    break;
+	  default:
+	    // everything else that reads/writes variables should be lowered already
+	    break;
+	  }
+      }
+
+  // For now, until we fix all cases that destroy SSA.
+  return changed ? TODO_update_ssa : 0;
+}
+
+const pass_data pass_data_vartrace =
+{
+  GIMPLE_PASS, /* type */
+  "vartrace", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_vartrace : public gimple_opt_pass
+{
+public:
+  pass_vartrace (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_vartrace, ctxt)
+  {}
+
+  virtual opt_pass * clone ()
+    {
+      return new pass_vartrace (m_ctxt);
+    }
+
+  virtual bool gate (function *fun)
+    {
+      // check if vartrace is supported in backend
+      if (!targetm.vartrace_func
+	  || targetm.vartrace_func (integer_type_node, false) == NULL)
+	return false;
+
+      if (lookup_attribute ("no_vartrace", TYPE_ATTRIBUTES (TREE_TYPE (fun->decl)))
+	  || lookup_attribute ("no_vartrace", DECL_ATTRIBUTES (fun->decl)))
+	return false;
+
+      // need to run pass always to check for variable attributes
+      return true;
+    }
+
+  virtual unsigned int execute (function *f) { return vartrace_execute (f); }
+};
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_vartrace (gcc::context *ctxt)
+{
+  return new pass_vartrace (ctxt);
+}