diff mbox series

[1/2] Add a pass to automatically add ptwrite instrumentation

Message ID 20191111074026.26013-1-andi@firstfloor.org
State New

Commit Message

Andi Kleen Nov. 11, 2019, 7:40 a.m. UTC
From: Andi Kleen <ak@linux.intel.com>

[v4: Rebased on current tree. Avoid some redundant log statements
for locals and a few other fixes.  Fix some comments. Improve
documentation. Did some studies on the debug information quality,
see below]

Add a new pass to automatically instrument changes to variables
with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
field into a Processor Trace log, which allows low-overhead
logging of information. Essentially it's a hardware accelerated
printf.
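
As a rough illustration of the 4 or 8 byte limit, the size check can be
modelled in plain C. This is only a sketch based on ix86_vartrace_func in
the patch below; pick_trace_width and enum trace_width are hypothetical
names that do not exist in GCC.

```c
#include <stddef.h>

/* Hypothetical model of the width choice: which PTWRITE form, if any,
   could log a value of the given size.  */
enum trace_width
{
  TRACE_NONE = 0,	/* Too large to log, e.g. vectors or big structs.  */
  TRACE_32 = 32,	/* 4 byte PTWRITE form.  */
  TRACE_64 = 64		/* 8 byte PTWRITE form, 64-bit targets only.  */
};

enum trace_width
pick_trace_width (size_t size_bytes, int target_is_64bit)
{
  /* Values of up to 4 bytes use the 32-bit form; narrower values are
     assumed to be widened by the caller.  */
  if (size_bytes <= 4)
    return TRACE_32;
  /* 8 byte values need the 64-bit form, which requires a 64-bit target.  */
  if (target_is_64bit && size_bytes <= 8)
    return TRACE_64;
  return TRACE_NONE;
}
```

On a 32-bit target an 8 byte value therefore cannot be logged at all in
this model.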

This allows reconstructing values later from the log,
which can be useful for debugging or other analysis of the program
behavior. With compiler support this can be done without
having to manually add instrumentation to the code.
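
For comparison, manual instrumentation of the kind the pass is meant to
replace might look like the sketch below. It assumes the _ptwrite64
intrinsic from <immintrin.h> and the __PTWRITE__ macro that GCC defines
under -mptwrite; trace_u64 and sum_squares are hypothetical names. Without
-mptwrite the helper compiles to nothing, so the code also runs on CPUs
that lack PTWRITE.

```c
#include <stdint.h>
#if defined (__PTWRITE__)
#include <immintrin.h>	/* Assumed to provide _ptwrite64 with -mptwrite.  */
#endif

/* Hypothetical helper: log one 64-bit value to the Processor Trace
   stream, or do nothing when PTWRITE code generation is not enabled.  */
static inline void
trace_u64 (uint64_t v)
{
#if defined (__PTWRITE__) && defined (__x86_64__)
  _ptwrite64 (v);
#else
  (void) v;
#endif
}

/* Hypothetical function with hand-placed logging of each update to
   TOTAL.  */
uint64_t
sum_squares (const uint32_t *vals, unsigned n)
{
  uint64_t total = 0;
  for (unsigned i = 0; i < n; i++)
    {
      total += (uint64_t) vals[i] * vals[i];
      trace_u64 (total);
    }
  return total;
}
```

Every traced variable needs such a call at every update, which is the
bookkeeping the new pass takes over.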

Using DWARF information, this can later be mapped back to the variables.
The decoder decodes the PTWRITE instructions using IP information
in the log, and then looks up the argument in the debug information.
Then this can be used to reconstruct the original variable
name to display a value history for the variable.

There are new options to enable instrumentation for different kinds
of accesses, and also a new attribute to control instrumentation at a
fine-grained per-function or per-variable level. The attributes can be
set on both the variable and the type level, and also on structure fields.
This allows enabling tracing only for specific code in large
programs in a flexible manner.
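
A sketch of how the attributes might be placed at each level. The
attribute names vartrace and no_vartrace come from this patch; the
USE_VARTRACE_ATTRS macro and all declarations are hypothetical, and the
macros expand to nothing unless USE_VARTRACE_ATTRS is defined, so the
example also builds with a compiler that lacks the patch.

```c
/* Assumption: define USE_VARTRACE_ATTRS only with a patched compiler.  */
#ifdef USE_VARTRACE_ATTRS
#define VARTRACE    __attribute__ ((vartrace))
#define NO_VARTRACE __attribute__ ((no_vartrace))
#else
#define VARTRACE
#define NO_VARTRACE
#endif

/* Variable level: accesses to this global are traced.  */
int request_count VARTRACE;

/* Field level: only STATE is traced, not the rest of the record.  */
struct conn
{
  int fd;
  int state VARTRACE;
  char scratch[64];
};

/* Type level: fields are traced when accessed individually.  */
struct VARTRACE stats
{
  int hits;
  int misses;
};

/* Function level opt-out: nothing inside this function is traced.  */
NO_VARTRACE int
checksum (const unsigned char *buf, int len)
{
  unsigned sum = 0;
  for (int i = 0; i < len; i++)
    sum += buf[i];
  return (int) (sum & 0xff);
}

/* Function level opt-in.  */
VARTRACE int
advance (struct conn *c, struct stats *s)
{
  c->state++;
  s->hits++;
  request_count++;
  return c->state;
}
```

With a patched compiler this would be built with -mptwrite and
-DUSE_VARTRACE_ATTRS; no -fvartrace option should be needed, because the
attributes opt in on their own.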

The pass is generic, but only the x86 backend enables the necessary
hooks. When the backend enables the necessary hooks (with -mptwrite)
there is an additional pass that looks through the code for
attribute vartrace enabled functions or variables.

Earlier there were concerns that the debug information is not
always associated with the ptwrite instruction because the
backend doesn't know how to keep it together.

I did some experiments using -fdump-rtl-final, just checking if the
PTWRITE builtin has a variable location on "loop-unroll.c" from
the gcc source.

With -fvartrace=args there is good coverage: the ptwrite always had a
usable variable name.

With -fvartrace=returns there is usually no variable name, but in this
case the decoder can figure it out by looking for the RET, and knowing
that the value is in %rax.

With -fvartrace=reads,writes the ptwrite usually just has the variable
name of the gimple temporary in a register.
However, there is nearly always an RTL set for the address just before it,
and the set tends to have the expected name/type and offset (if for a
structure). There are two ways to handle this: either teach the
decoder to track debug info for registers, or alternatively
change the ptwrite define_insn to avoid splitting the effective
address into a separate register. I tried the latter with some different
constraints, but wasn't successful. I hope there's some way to do
this though.

With -fvartrace=locals there's a mix. Most accesses have the correct
variable name, but sometimes it is lost. I believe that's acceptable,
locals is more an experimental option and may not be too useful anyways
because it generates a lot of traffic.
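
The sub-options discussed above combine into a bitmask. The following
standalone sketch is modelled on parse_vartrace_options and enum
vartrace_flags from the patch below; parse_vartrace is a hypothetical name,
and unlike the real code, which reports an error through error_at, it
returns -1 on an unknown name.

```c
#include <stddef.h>
#include <string.h>

/* Same bit values as enum vartrace_flags in the patch.  */
enum vartrace_flags
{
  VARTRACE_LOCALS = 1 << 0,
  VARTRACE_ARGS = 1 << 1,
  VARTRACE_RETURNS = 1 << 2,
  VARTRACE_READS = 1 << 3,
  VARTRACE_WRITES = 1 << 4,
  VARTRACE_OFF = 1 << 5
};

/* Parse a comma-separated option string P (with or without a leading
   '=') into OPTS.  Return the updated mask, or -1 on an unknown name.  */
int
parse_vartrace (const char *p, int opts)
{
  static const struct { const char *name; int opt; } vopts[] =
    {
      { "default", VARTRACE_ARGS | VARTRACE_RETURNS | VARTRACE_READS
		   | VARTRACE_WRITES }, /* Keep as first entry.  */
      { "all", VARTRACE_ARGS | VARTRACE_RETURNS | VARTRACE_READS
	       | VARTRACE_WRITES | VARTRACE_LOCALS },
      { "args", VARTRACE_ARGS },
      { "returns", VARTRACE_RETURNS },
      { "reads", VARTRACE_READS },
      { "writes", VARTRACE_WRITES },
      { "locals", VARTRACE_LOCALS },
      { NULL, 0 }
    };

  if (*p == '=')
    p++;
  if (*p == '\0')		/* Plain -fvartrace selects the default set.  */
    return opts | vopts[0].opt;
  if (strcmp (p, "off") == 0)	/* "off" replaces everything else.  */
    return VARTRACE_OFF;

  while (*p)
    {
      size_t len = strcspn (p, ",");
      int i;

      for (i = 0; vopts[i].name; i++)
	if (len == strlen (vopts[i].name)
	    && strncmp (p, vopts[i].name, len) == 0)
	  {
	    opts |= vopts[i].opt;
	    break;
	  }
      if (vopts[i].name == NULL)
	return -1;

      p += len;
      if (*p == ',')
	p++;
    }
  return opts;
}
```

The mask only records which kinds of accesses were requested; per the
documentation in the patch, locals tracing has an effect only when reads or
writes tracing is enabled as well.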

Currently the code can be tested with SDE, or on an Intel
Gemini Lake system with a new enough Linux kernel (v4.10+)
that supports PTWRITE for PT. Gemini Lake is used in low
end laptops ("Intel Pentium Silver J5...... / Celeron N4... /
Celeron J4...")

Linux perf (4.10+) can be used to record the values:

perf record -e intel_pt/ptw=1,pt=1,branch=0,fup_on_ptw=1/u ./program
perf script -F +srcline ..

I have an experimental version of perf that can also use
DWARF information to symbolize many[1] values back to their variable
names. So far it is not in standard perf, but it is available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-5

It is currently not able to decode all variable locations to names,
but a large subset.

The CPU can potentially generate very high data bandwidths when
code doing a lot of computation is heavily instrumented.
This can cause some data loss both in the CPU and in perf
logging the data when the disk cannot keep up.

When running some larger workloads, most do not cause
CPU-level overflows, but I've seen them with -fvartrace
with crafty, and with more workloads with -fvartrace=locals.

The recommendation is not to fully instrument programs,
but only areas of interest, either at the file level or using
the attributes.

perf and the disk often cannot keep up
with the data bandwidth for longer computations. In this case
it's possible to use perf snapshot mode (add --snapshot
to the command line above). The data is then only logged to
a memory ring buffer, and the buffers are only dumped on events
of interest by sending SIGUSR2 to the perf binary.

In the future this will hopefully be better supported with
core files and gdb.

Passes bootstrap and test suite on x86_64-linux, also
bootstrapped and tested gcc itself with full -fvartrace
and -fvartrace=all instrumentation, and running
the test suite with -fvartrace=all. There are some
additional failures in the test suite with -fvartrace=all, but they
all seem to be due to fragile tests not expecting additional
code.

gcc/:

2019-11-10  Andi Kleen  <ak@linux.intel.com>
	    Richard Biener  <rguenther@suse.de>

	* Makefile.in: Add tree-vartrace.o.
	* common.opt: Add -fvartrace
	* opts.c (parse_vartrace_options): Add.
	(common_handle_option): Call parse_vartrace_options.
	* config/i386/i386.c (ix86_vartrace_func): Add.
	(TARGET_VARTRACE_FUNC): Add.
	* doc/extend.texi: Document vartrace/no_vartrace
	attributes.
	* doc/invoke.texi: Document -fvartrace.
	* doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
	* passes.def: Add vartrace pass.
	* target.def (vartrace_func): Add.
	* tree-pass.h (make_pass_vartrace): Add.
	* tree-vartrace.c: New file to implement vartrace pass.

gcc/c-family/:

2019-11-10  Andi Kleen  <ak@linux.intel.com>

	* c-attribs.c (handle_vartrace_attribute,
	  handle_no_vartrace_attribute): New functions.
	  (attr_vartrace_exclusions): Add.

config/:

2019-11-10  Andi Kleen  <ak@linux.intel.com>

	* bootstrap-vartrace.mk: New.
	* bootstrap-vartrace-locals.mk: New.
---
 config/bootstrap-vartrace-locals.mk |   3 +
 config/bootstrap-vartrace.mk        |   3 +
 gcc/Makefile.in                     |   1 +
 gcc/c-family/c-attribs.c            |  78 ++++
 gcc/common.opt                      |   8 +
 gcc/config/i386/i386-builtins.c     |  31 ++
 gcc/config/i386/i386-protos.h       |   2 +
 gcc/config/i386/i386.c              |   3 +
 gcc/doc/extend.texi                 |  32 ++
 gcc/doc/invoke.texi                 | 109 +++++
 gcc/doc/tm.texi                     |   6 +
 gcc/doc/tm.texi.in                  |   2 +
 gcc/flag-types.h                    |  10 +
 gcc/opts.c                          |  63 +++
 gcc/passes.def                      |   1 +
 gcc/target.def                      |   9 +
 gcc/tree-pass.h                     |   1 +
 gcc/tree-vartrace.c                 | 657 ++++++++++++++++++++++++++++
 18 files changed, 1019 insertions(+)
 create mode 100644 config/bootstrap-vartrace-locals.mk
 create mode 100644 config/bootstrap-vartrace.mk
 create mode 100644 gcc/tree-vartrace.c

Comments

Andi Kleen Nov. 21, 2019, 1:17 p.m. UTC | #1
Andi Kleen <andi@firstfloor.org> writes:

Ping!

Andi Kleen Dec. 3, 2019, 6:38 p.m. UTC | #2
Andi Kleen <ak@linux.intel.com> writes:

Ping!

Andi Kleen Dec. 9, 2019, 6:47 p.m. UTC | #3
Andi Kleen <ak@linux.intel.com> writes:

Ping!

Andi Kleen Dec. 16, 2019, 6:19 p.m. UTC | #4
Andi Kleen <ak@linux.intel.com> writes:

Ping!

Andi Kleen Jan. 7, 2020, 3:40 p.m. UTC | #5
Andi Kleen <ak@linux.intel.com> writes:

Ping!


Patch

diff --git a/config/bootstrap-vartrace-locals.mk b/config/bootstrap-vartrace-locals.mk
new file mode 100644
index 00000000000..dd16640df74
--- /dev/null
+++ b/config/bootstrap-vartrace-locals.mk
@@ -0,0 +1,3 @@ 
+STAGE2_CFLAGS += -mptwrite -fvartrace=all
+STAGE3_CFLAGS += -mptwrite -fvartrace=all
+STAGE4_CFLAGS += -mptwrite -fvartrace=all
diff --git a/config/bootstrap-vartrace.mk b/config/bootstrap-vartrace.mk
new file mode 100644
index 00000000000..e29824d799b
--- /dev/null
+++ b/config/bootstrap-vartrace.mk
@@ -0,0 +1,3 @@ 
+STAGE2_CFLAGS += -mptwrite -fvartrace
+STAGE3_CFLAGS += -mptwrite -fvartrace
+STAGE4_CFLAGS += -mptwrite -fvartrace
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 035b58f50c0..b2303fd46ee 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1598,6 +1598,7 @@  OBJS = \
 	tree-vectorizer.o \
 	tree-vector-builder.o \
 	tree-vrp.o \
+	tree-vartrace.o \
 	tree.o \
 	typed-splay-tree.o \
 	unique-ptr-tests.o \
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 1c9f28587fb..c2af67cbf58 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -105,6 +105,10 @@  static tree handle_tls_model_attribute (tree *, tree, tree, int,
 					bool *);
 static tree handle_no_instrument_function_attribute (tree *, tree,
 						     tree, int, bool *);
+static tree handle_vartrace_attribute (tree *, tree,
+						     tree, int, bool *);
+static tree handle_no_vartrace_attribute (tree *, tree,
+						     tree, int, bool *);
 static tree handle_no_profile_instrument_function_attribute (tree *, tree,
 							     tree, int, bool *);
 static tree handle_malloc_attribute (tree *, tree, tree, int, bool *);
@@ -245,6 +249,13 @@  static const struct attribute_spec::exclusions attr_noinit_exclusions[] =
   ATTR_EXCL (NULL, false, false, false),
 };
 
+static const struct attribute_spec::exclusions attr_vartrace_exclusions[] =
+{
+  ATTR_EXCL ("vartrace", true, true, true),
+  ATTR_EXCL ("no_vartrace", true, true, true),
+  ATTR_EXCL (NULL, false, false, false)
+};
+
 /* Table of machine-independent attributes common to all C-like languages.
 
    Current list of processed common attributes: nonnull.  */
@@ -336,6 +347,12 @@  const struct attribute_spec c_common_attribute_table[] =
   { "no_instrument_function", 0, 0, true,  false, false, false,
 			      handle_no_instrument_function_attribute,
 			      NULL },
+  { "vartrace",		      0, 0, false,  false, false, false,
+			      handle_vartrace_attribute,
+			      attr_vartrace_exclusions },
+  { "no_vartrace",	      0, 0, false,  false, false, false,
+			      handle_no_vartrace_attribute,
+			      attr_vartrace_exclusions },
   { "no_profile_instrument_function",  0, 0, true, false, false, false,
 			      handle_no_profile_instrument_function_attribute,
 			      NULL },
@@ -991,6 +1008,67 @@  handle_no_sanitize_undefined_attribute (tree *node, tree name, tree, int,
   return NULL_TREE;
 }
 
+/* Handle "vartrace" attribute; arguments as in struct
+   attribute_spec.handler.  */
+
+static tree
+handle_vartrace_attribute (tree *node, tree name, tree, int flags,
+			   bool *no_add_attrs)
+{
+  if (!VAR_OR_FUNCTION_DECL_P (*node) && !TYPE_P (*node) &&
+      TREE_CODE (*node) != FIELD_DECL)
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored for object", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  if (!targetm.vartrace_func)
+    {
+      warning (OPT_Wattributes, "%qE attribute not supported for target", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  if (TREE_TYPE (*node)
+      && TREE_CODE (*node) != FUNCTION_DECL
+      && targetm.vartrace_func (TYPE_MODE (TREE_TYPE (*node)), true) ==
+      NULL_TREE)
+   {
+      warning (OPT_Wattributes, "%qE attribute not supported for type", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+   }
+
+  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
+    *node = build_variant_type_copy (*node);
+
+  /* We look it up later with lookup_attribute.  */
+  return NULL_TREE;
+}
+
+/* Handle "no_vartrace" attribute; arguments as in struct
+   attribute_spec.handler.  */
+
+static tree
+handle_no_vartrace_attribute (tree *node, tree name, tree, int flags,
+			      bool *no_add_attrs)
+{
+  if (!VAR_OR_FUNCTION_DECL_P (*node) && !TYPE_P (*node)
+      && TREE_CODE (*node) != FIELD_DECL)
+    {
+      warning (OPT_Wattributes, "%qE attribute ignored", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+
+  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
+    *node = build_variant_type_copy (*node);
+
+  /* We look it up later with lookup_attribute.  */
+  return NULL_TREE;
+}
+
 /* Handle an "asan odr indicator" attribute; arguments as in
    struct attribute_spec.handler.  */
 
diff --git a/gcc/common.opt b/gcc/common.opt
index cc279f411d7..2d2f146a7d9 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -221,6 +221,10 @@  unsigned int flag_sanitize_recover = (SANITIZE_UNDEFINED | SANITIZE_UNDEFINED_NO
 Variable
 unsigned int flag_sanitize_coverage
 
+; What to instrument with vartrace
+Variable
+unsigned int flag_vartrace
+
 ; Flag whether a prefix has been added to dump_base_name
 Variable
 bool dump_base_name_prefixed = false
@@ -2922,6 +2926,10 @@  ftree-scev-cprop
 Common Report Var(flag_tree_scev_cprop) Init(1) Optimization
 Enable copy propagation of scalar-evolution information.
 
+fvartrace
+Common JoinedOrMissing Report Driver
+-fvartrace=default|all|locals|returns|args|reads|writes|off   Enable variable tracing instrumentation.
+
 ; -fverbose-asm causes extra commentary information to be produced in
 ; the generated assembly code (to make it more readable).  This option
 ; is generally only of use to those who actually need to read the
diff --git a/gcc/config/i386/i386-builtins.c b/gcc/config/i386/i386-builtins.c
index 5b388ec7910..5816a294676 100644
--- a/gcc/config/i386/i386-builtins.c
+++ b/gcc/config/i386/i386-builtins.c
@@ -2556,4 +2556,35 @@  fold_builtin_cpu (tree fndecl, tree *args)
   gcc_unreachable ();
 }
 
+/* Hook to determine that MODE can be traced.  Ignore target flags
+   if FORCE is true. Returns the tracing builtin if tracing is possible,
+   or otherwise NULL.  */
+
+tree
+ix86_vartrace_func (machine_mode mode, bool force)
+{
+  if (!(ix86_isa_flags2 & OPTION_MASK_ISA_PTWRITE))
+    {
+      /* With force, as in checking for the attribute, ignore
+	 the current target settings. Otherwise it's not
+	 possible to declare vartrace variables outside
+	 an __attribute__((target("ptwrite"))) function
+	 if -mptwrite is not specified.  */
+      if (!force)
+	return NULL;
+      /* Initialize the builtins if missing, so that we have
+	 something to return.  */
+      if (!ix86_builtins[(int)IX86_BUILTIN_PTWRITE32])
+	ix86_add_new_builtins (0, OPTION_MASK_ISA_PTWRITE);
+    }
+  // middle end will generate the necessary conversions.
+  if (GET_MODE_SIZE (mode) <= 4)
+    return ix86_builtins[(int) IX86_BUILTIN_PTWRITE32];
+  if (TARGET_64BIT && GET_MODE_SIZE (mode) <= 8)
+      return ix86_builtins[(int) IX86_BUILTIN_PTWRITE64];
+  // so far vectors and larger structs cannot be logged.
+  return NULL;
+}
+
+
 #include "gt-i386-builtins.h"
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ced1780be23..67b95e8950f 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -248,6 +248,8 @@  extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_abs (rtx, rtx);
 
+extern tree ix86_vartrace_func (machine_mode mode, bool force);
+
 /* In i386-c.c  */
 extern void ix86_target_macros (void);
 extern void ix86_register_pragmas (void);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 03a7082d2fc..23275ba9883 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22872,6 +22872,9 @@  ix86_run_selftests (void)
 #undef TARGET_ASAN_SHADOW_OFFSET
 #define TARGET_ASAN_SHADOW_OFFSET ix86_asan_shadow_offset
 
+#undef TARGET_VARTRACE_FUNC
+#define TARGET_VARTRACE_FUNC ix86_vartrace_func
+
 #undef TARGET_GIMPLIFY_VA_ARG_EXPR
 #define TARGET_GIMPLIFY_VA_ARG_EXPR ix86_gimplify_va_arg
 
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 9db4f9b1d29..f0bf382d29f 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3397,6 +3397,14 @@  the standard C library can be guaranteed not to throw an exception
 with the notable exceptions of @code{qsort} and @code{bsearch} that
 take function pointer arguments.
 
+@item no_vartrace
+The @code{no_vartrace} attribute disables data tracing for
+the function [or variable or structure field or type] declared with
+the attribute. See @pxref{Common Variable Attributes} and
+@pxref{Common Type Attributes}. When specified for a function
+nothing in the function is traced. @code{no_vartrace} overrides
+any @code{vartrace} attributes for the specific object.
+
 @item optimize (@var{level}, @dots{})
 @item optimize (@var{string}, @dots{})
 @cindex @code{optimize} function attribute
@@ -3679,6 +3687,20 @@  When applied to a member function of a C++ class template, the
 attribute also means that the function is instantiated if the
 class itself is instantiated.
 
+@item vartrace
+@cindex @code{vartrace} function or variable attribute
+Enable data tracing for the function or variable or structure field
+or type marked with this attribute. When applied to a type all instances of the type
+will be traced. When applied to a structure or union all fields will be traced
+when individually accessed, however not when the whole record is copied.
+When applied to a structure field that field will be traced.
+For functions, arguments, returns, globals, and pointer
+references are traced (unless overridden with -fvartrace=off).
+
+Currently implemented for x86 when the @option{ptwrite} target option
+is enabled for systems that support the @code{PTWRITE} instruction,
+and supporting data types of 8 bytes or smaller.
+
 @item visibility ("@var{visibility_type}")
 @cindex @code{visibility} function attribute
 This attribute affects the linkage of the declaration to which it is attached.
@@ -7914,6 +7936,16 @@  A @{ /* @r{@dots{}} */ @};
 struct __attribute__ ((copy ( (struct A *)0)) B @{ /* @r{@dots{}} */ @};
 @end smallexample
 
+@cindex @code{vartrace} type attribute
+@cindex @code{no_vartrace} type attribute
+@item vartrace
+@itemx no_vartrace
+Specify that all instances of the type should be variable traced
+or not variable traced. Can also be applied to function
+types to disable tracing for all instances of that function type.
+Can also be applied to structure fields. See the description in
+@pxref{Variable Attributes} for more details.
+
 @item deprecated
 @itemx deprecated (@var{msg})
 @cindex @code{deprecated} type attribute
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index faa7fa95a0e..596f0cf9eca 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -511,6 +511,7 @@  Objective-C and Objective-C++ Dialects}.
 -fno-stack-limit  -fsplit-stack @gol
 -fvtable-verify=@r{[}std@r{|}preinit@r{|}none@r{]} @gol
 -fvtv-counts  -fvtv-debug @gol
+-fvartrace=@var{options} @gol
 -finstrument-functions @gol
 -finstrument-functions-exclude-function-list=@var{sym},@var{sym},@dots{} @gol
 -finstrument-functions-exclude-file-list=@var{file},@var{file},@dots{}}
@@ -2798,6 +2799,114 @@  Don't use the @code{__cxa_get_exception_ptr} runtime routine.  This
 causes @code{std::uncaught_exception} to be incorrect, but is necessary
 if the runtime routine is not available.
 
+@item -fvartrace=@var{options}
+@opindex -fvartrace=options
+Insert trace instructions to trace reads and writes and other operations
+at runtime.
+
+Requires enabling a backend specific option, like @option{-mptwrite} to enable
+@code{PTWRITE} instruction generation on x86.
+
+Additional qualifiers can be specified after the equal:
+
+@itemize
+@item
+@option{args}
+for tracing arguments at the beginning of each function
+
+@item
+@option{returns} for tracing the values each function returns
+inside the function
+
+@item
+@option{reads} to trace memory reads, such as pointer, structure,
+union references, static variables and globals.
+
+@item
+@option{writes} to trace memory writes, similar to the reads above.
+
+@item
+@option{locals} to trace local variables. It will only trace locals
+when they appear after program optimization; many locals may
+actually be modified or eliminated by the optimizer. Local variable
+tracing depends on memory read or write tracing being enabled.
+
+@item
+@option{off} to disable all
+tracing, even overriding @code{vartrace} attributes in the program
+
+@item
+@option{all}
+enable all options above.
+@end itemize
+
+Multiple options can be separated by comma.
+
+Default when nothing is specified (as in @option{-fvartrace}) is
+to trace reads, writes, arguments, returns, objects with a
+static or thread duration but no locals.
+
+Only individual variables or members can be logged, not complete structures,
+or variables larger than 8 bytes, such as vectors.
+
+For tracing the compiler will add trace instructions. By
+default these trace instructions act like nops, unless tracing is
+enabled at execution time, although there might still be
+some overhead from additional memory accesses and larger
+code footprint.
+
+For example to do variable (but no branch) tracing on x86 Linux
+with Linux kernel / Linux perf tool 4.9 or later and a CPU that
+supports Intel Processor Trace and PTWRITE use:
+
+@smallexample
+gcc -fvartrace -mptwrite -o program -g program.c
+perf record -e intel_pt/pt=1,branch=0,ptw=1,fup_on_ptw=1/u ./program
+@end smallexample
+
+Then output the variables with
+
+@smallexample
+perf script -F +srcline
+@end smallexample
+
+To do variable tracing including all branches
+
+@smallexample
+perf record -e intel_pt/ptw=1/u ./program
+perf script --itrace=bew -F +srcline
+@end smallexample
+
+Note that this may report lost data, and
+not be able to trace all updates in a long
+running program doing a lot of computation,
+because the program may generate trace data faster
+than perf or the CPU or the disk can process.
+
+In such a case, consider the following:
+
+@itemize
+@item
+Increase the perf buffer size through the kernel.perf_event_mlock_kb
+sysctl.
+
+@item
+Don't enable global tracing with this option, but
+only opt-in to tracing for specific strategic functions, variables,
+fields, or types of interest with the @code{vartrace} attribute.
+
+@item
+Avoid using locals tracing.
+
+@item
+Don't save all trace data continuously.
+Use perf snapshot mode with --snapshot and only save the trace
+buffer by sending SIGUSR2 to perf on events of interest,
+or only attach perf record for a short time with the -p
+option.
+
+@end itemize
+
 @item -fvisibility-inlines-hidden
 @opindex fvisibility-inlines-hidden
 This switch declares that the user does not attempt to compare
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index cd9aed9874f..4d82b304a7e 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12029,6 +12029,12 @@  Address Sanitizer shadow memory address.  NULL if Address Sanitizer is not
 supported by the target.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_VARTRACE_FUNC (machine_mode @var{mode}, bool @var{force})
+Return a builtin to call to trace variables of mode MODE or NULL if not supported
+by the target. Ignore target configuration if FORCE is true. The builtin gets called with a
+single argument of TYPE.
+@end deftypefn
+
 @deftypefn {Target Hook} {unsigned HOST_WIDE_INT} TARGET_MEMMODEL_CHECK (unsigned HOST_WIDE_INT @var{val})
 Validate target specific memory model mask bits. When NULL no target specific
 memory model bits are allowed.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 2739e9ceec5..7ed71778688 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8129,6 +8129,8 @@  and the associated definitions of those functions.
 
 @hook TARGET_ASAN_SHADOW_OFFSET
 
+@hook TARGET_VARTRACE_FUNC
+
 @hook TARGET_MEMMODEL_CHECK
 
 @hook TARGET_ATOMIC_TEST_AND_SET_TRUEVAL
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index a2103282d46..9a9d4bb3672 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -269,6 +269,16 @@  enum sanitize_code {
 				  | SANITIZE_BOUNDS_STRICT
 };
 
+/* Settings for flag_vartrace */
+enum vartrace_flags {
+  VARTRACE_LOCALS = 1 << 0,
+  VARTRACE_ARGS = 1 << 1,
+  VARTRACE_RETURNS = 1 << 2,
+  VARTRACE_READS = 1 << 3,
+  VARTRACE_WRITES = 1 << 4,
+  VARTRACE_OFF = 1 << 5
+};
+
 /* Settings of flag_incremental_link.  */
 enum incremental_link {
   INCREMENTAL_LINK_NONE,
diff --git a/gcc/opts.c b/gcc/opts.c
index 10b9f108f8d..4f92b654656 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2217,6 +2217,65 @@  print_help (struct gcc_options *opts, unsigned int lang_mask,
 			 lang_mask);
 }
 
+/* Parse vartrace options in P, updating flags OPTS at LOC and return
+   updated flags.  */
+
+static int
+parse_vartrace_options (const char *p, int opts, location_t loc)
+{
+  static struct {
+    const char *name;
+    int opt;
+  } vopts[] =
+      {
+       { "default",
+	 VARTRACE_ARGS | VARTRACE_RETURNS | VARTRACE_READS
+	 | VARTRACE_WRITES }, /* Keep as first entry.  */
+       { "all",
+	 VARTRACE_ARGS | VARTRACE_RETURNS | VARTRACE_READS
+	 | VARTRACE_WRITES | VARTRACE_LOCALS },
+       { "args", VARTRACE_ARGS },
+       { "returns", VARTRACE_RETURNS },
+       { "reads", VARTRACE_READS },
+       { "writes", VARTRACE_WRITES },
+       { "locals", VARTRACE_LOCALS },
+       { NULL, 0 }
+      };
+
+  if (*p == '=')
+    p++;
+  if (*p == 0)
+    return opts | vopts[0].opt;
+
+  if (!strcmp (p, "off"))
+    return VARTRACE_OFF;
+
+  while (*p)
+    {
+      unsigned len = strcspn (p, ",");
+      int i;
+
+      for (i = 0; vopts[i].name; i++)
+	{
+	  if (len == strlen (vopts[i].name) && !strncmp (p, vopts[i].name, len))
+	    {
+	      opts |= vopts[i].opt;
+	      break;
+	    }
+	}
+      if (vopts[i].name == NULL)
+	{
+	  error_at (loc, "invalid argument to %qs", "-fvartrace");
+	  break;
+	}
+
+      p += len;
+      if (*p == ',')
+	p++;
+    }
+  return opts;
+}
+
 /* Handle target- and language-independent options.  Return zero to
    generate an "unknown option" message.  Only options that need
    extra handling need to be listed here; if you simply want
@@ -2299,6 +2358,10 @@  common_handle_option (struct gcc_options *opts,
     case OPT__completion_:
       break;
 
+    case OPT_fvartrace:
+      opts->x_flag_vartrace = parse_vartrace_options (arg, opts->x_flag_vartrace, loc);
+      break;
+
     case OPT_fsanitize_:
       opts->x_flag_sanitize
 	= parse_sanitizer_options (arg, loc, code,
diff --git a/gcc/passes.def b/gcc/passes.def
index 798a391bd35..e960673c4db 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -394,6 +394,7 @@  along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_cleanup_eh);
   NEXT_PASS (pass_lower_resx);
   NEXT_PASS (pass_nrv);
+  NEXT_PASS (pass_vartrace);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
   NEXT_PASS (pass_gen_hsail);
diff --git a/gcc/target.def b/gcc/target.def
index 8e83c2c7a71..275c8787249 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4381,6 +4381,15 @@  supported by the target.",
  unsigned HOST_WIDE_INT, (void),
  NULL)
 
+/* Defines the builtin to trace variables, or NULL.  */
+DEFHOOK
+(vartrace_func,
+ "Return a builtin to call to trace variables of mode MODE or NULL if not supported\n\
+by the target. Ignore target configuration if FORCE is true. The builtin gets called with a\n\
+single argument of TYPE.",
+ tree, (machine_mode mode, bool force),
+ NULL)
+
 /* Functions relating to calls - argument passing, returns, etc.  */
 /* Members of struct call have no special macro prefix.  */
 HOOK_VECTOR (TARGET_CALLS, calls)
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index a987661530e..cd9a76383fa 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -423,6 +423,7 @@  extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_post_ipa_warn (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_stdarg (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_vartrace (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_early_warn_uninitialized (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_late_warn_uninitialized (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cse_reciprocals (gcc::context *ctxt);
diff --git a/gcc/tree-vartrace.c b/gcc/tree-vartrace.c
new file mode 100644
index 00000000000..5e1a3eb51d5
--- /dev/null
+++ b/gcc/tree-vartrace.c
@@ -0,0 +1,657 @@ 
+/* Insert instructions for data value tracing.
+   Copyright (C) 2017, 2018 Free Software Foundation, Inc.
+   Contributed by Andi Kleen.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* General theory:
+
+   Insert trace builtins into a function to log data of interest
+   for debugging purposes later. We rely on the backend to do
+   the logging; currently only through the PTWRITE instruction
+   on Intel, which writes data to the CPU's Processor Trace log.
+
+   We support arguments, returns, all memory reads and writes (such as
+   pointer references) for user-visible variables, and locals if
+   enabled.
+
+   The locals tracing is quite limited and often cannot trace
+   everything when optimization is enabled. It may also lead
+   to significantly larger code and more run time overhead.
+
+   We only log one def per basic block to avoid too many redundancies.
+   Memory references currently do not use alias information, so there
+   might be redundant logging with non-locals.
+
+   When a variable has the vartrace attribute specified, it is always
+   logged independently of -fvartrace options (unless -fvartrace=off is
+   specified which overrides even attributes). Same for types and
+   structure members.
+
+   When a function has the vartrace attribute specified, every user visible
+   access in it (except locals) is logged, except when -fvartrace=off
+   globally overrides it.
+
+   When a variable or type has no_vartrace set it is never logged.
+
+   When a function has no_vartrace set, nothing in it is logged.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "tree.h"
+#include "tree-iterator.h"
+#include "tree-pass.h"
+#include "tree-eh.h"
+#include "basic-block.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "gimplify-me.h"
+#include "gimple-ssa.h"
+#include "gimple-walk.h"
+#include "gimple-pretty-print.h"
+#include "cfghooks.h"
+#include "fold-const.h"
+#include "ssa.h"
+#include "tree-dfa.h"
+#include "attribs.h"
+
+enum attrstate { force_off, force_on, neutral };
+
+/* Are locals enabled or forced with FLAG (read or write)?  */
+
+static bool
+locals_enabled (enum vartrace_flags flag)
+{
+  return (flag_vartrace & VARTRACE_LOCALS) && (flag_vartrace & flag);
+}
+
+/* Log tree OP not being traced to dump_file.  */
+
+static void
+log_op (tree op, const char *what)
+{
+  if (dump_file)
+    {
+      fprintf (dump_file, "%s ", what);
+      print_generic_expr (dump_file, op, TDF_VOPS);
+      fprintf (dump_file, " type ");
+      print_generic_expr (dump_file, TREE_TYPE (op));
+      fputc ('\n', dump_file);
+    }
+}
+
+/* Log tree OP if verbose dump enabled.  */
+
+static void
+log_op_verbose (tree op, const char *what)
+{
+  if (dump_flags & TDF_DETAILS)
+    log_op (op, what);
+}
+
+/* Is tracing enabled with attributes ATTR.  */
+
+static attrstate
+enabled_attr (tree attr)
+{
+  if (lookup_attribute ("no_vartrace", attr))
+    return force_off;
+  if (lookup_attribute ("vartrace", attr))
+    return force_on;
+  return neutral;
+}
+
+/* Is tracing enabled for ARG considering S.  */
+
+static attrstate
+enabled_op (tree arg, attrstate s)
+{
+  if (s != neutral)
+    return s;
+  if (DECL_P (arg))
+    {
+      s = enabled_attr (DECL_ATTRIBUTES (arg));
+      if (s != neutral)
+	return s;
+    }
+  return enabled_attr (TYPE_ATTRIBUTES (TREE_TYPE (arg)));
+}
+
+/* Can we trace OP's type? Return true if yes.  */
+
+static bool
+supported_type (tree op)
+{
+  // right now we can only trace objects that fit into gimple
+  // registers. this rejects records, would need special handling
+  // in insert_trace for those.
+  tree type = TREE_TYPE (op);
+  return INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type)
+	  || POINTER_TYPE_P (type);
+}
+
+/* Is OP something we can trace?  Helper for supported_op. Read/Write
+   flag in FLAG. Force state in S. Logs temporaries if LOG_TEMPS is true.
+   Return true if traceable.  */
+
+static bool
+do_supported_op (tree op, vartrace_flags flag, attrstate s, bool log_temps)
+{
+  // cannot instrument accesses that throw because that would
+  // need splitting BBs. (with some options they could be common?)
+  if (tree_could_throw_p (op))
+    return false;
+
+  if (DECL_P (op))
+    {
+      // don't log temporaries, unless this is a return
+      if ((DECL_IGNORED_P (op) || DECL_ARTIFICIAL (op)) && !log_temps)
+	return false;
+      if (s == force_on)
+	return true;
+      // rejects locals unless enabled.
+      if (!locals_enabled (flag) && !DECL_EXTERNAL (op) && !TREE_STATIC (op))
+	return false;
+      if (flag_vartrace & flag)
+	return true;
+    }
+  if (TREE_CODE (op) == SSA_NAME)
+    {
+      // don't log with computed gotos
+      // would need code generation handling abnormal edges
+      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (op))
+	return false;
+      if (log_temps && (flag_vartrace & flag))
+	return true;
+
+      // only log user visible variables for SSA_NAMES
+      if (!SSA_NAME_VAR (op)
+	  || DECL_ARTIFICIAL (SSA_NAME_VAR (op))
+	  || DECL_IGNORED_P (SSA_NAME_VAR (op)))
+	return false;
+
+      if (s == force_on)
+	return true;
+      return locals_enabled (flag);
+    }
+  // log anything that references memory
+  else if (TREE_CODE (op) == MEM_REF || TREE_CODE (op) == TARGET_MEM_REF)
+    {
+      // ??? Handle locals that use mem_ref here?
+      if (s == force_on || (flag_vartrace & flag))
+	return true;
+    }
+
+  return false;
+}
+
+/* Is OP something we can trace?  Read/Write flag in FLAG. Logs
+   temporaries if LOG_TEMPS is true. Return true if traceable.  */
+
+static bool
+supported_op (tree op, vartrace_flags flag, bool log_temps)
+{
+  if (!supported_type (op))
+    {
+      log_op (op, "unsupported orig type");
+      return false;
+    }
+
+  attrstate s = neutral;
+  do
+    {
+      s = enabled_op (op, s);
+      // also check the field declaration for attributes
+      if (TREE_CODE (op) == COMPONENT_REF
+	  || TREE_CODE (op) == BIT_FIELD_REF)
+	s = enabled_op (TREE_OPERAND (op, 1), s);
+    }
+  while (handled_component_p (op) && (op = TREE_OPERAND (op, 0)));
+  op = get_base_address (op);
+  s = enabled_op (op, s);
+  if (s == force_off)
+    return false;
+
+  if (do_supported_op (op, flag, s, log_temps))
+    return true;
+
+  log_op (op, "not tracing unsupported");
+  return false;
+}
+
+/* Print debugging for inserting CODE at ORIG_STMT with type of VAL for WHY.  */
+
+static void
+log_trace_code (gimple *orig_stmt, gimple *code, tree val, const char *why)
+{
+  if (!dump_file)
+    return;
+  fprintf (dump_file, "%s:", IDENTIFIER_POINTER (DECL_NAME (current_function_decl)));
+  if (orig_stmt)
+    {
+      location_t l = gimple_location (orig_stmt);
+      if (l > BUILTINS_LOCATION)
+	fprintf (dump_file, "%s:%d:%d:",
+		 LOCATION_FILE (l), LOCATION_LINE (l), LOCATION_COLUMN (l));
+      fprintf (dump_file, " BB%d ", gimple_bb (orig_stmt)->index);
+    }
+  fprintf (dump_file, "%s inserting ", why);
+  print_gimple_stmt (dump_file, code, 0, TDF_VOPS);
+  if (orig_stmt)
+    {
+      fprintf (dump_file, "orig ");
+      print_gimple_stmt (dump_file, orig_stmt, 2, TDF_VOPS);
+    }
+  fprintf (dump_file, "type ");
+  print_generic_expr (dump_file, TREE_TYPE (val), TDF_SLIM);
+  fputc ('\n', dump_file);
+  fputc ('\n', dump_file);
+}
+
+/* Insert STMT with location LOC after GI if AFTER is true, else before.  */
+
+static void
+insert_stmt (gimple_stmt_iterator *gi, gimple *stmt, location_t loc, bool after)
+{
+  gimple_set_location (stmt, loc);
+  if (after)
+    gsi_insert_after (gi, stmt, GSI_CONTINUE_LINKING);
+  else
+    gsi_insert_before (gi, stmt, GSI_SAME_STMT);
+}
+
+/* Insert conversion at GI from VAL to type TTYPE originally at LOC
+   and STMT, inserting AFTER if after is true. Resulting value is
+   returned.  */
+
+static tree
+insert_conversion (gimple_stmt_iterator *gi, tree val,
+		   tree ttype,
+		   gimple *stmt, location_t loc,
+		   bool after)
+{
+  tree type = TREE_TYPE (val);
+  if (ttype == type)
+    return val;
+
+  tree tvar = create_tmp_reg (ttype, "trace");
+  if (ttype != type)
+    {
+      if ((INTEGRAL_TYPE_P (ttype) && INTEGRAL_TYPE_P (type))
+		|| TYPE_SIZE_UNIT (ttype) != TYPE_SIZE_UNIT (type))
+	val = build1 (CONVERT_EXPR, ttype, val);
+      else
+	val = build1 (VIEW_CONVERT_EXPR, ttype, val);
+    }
+  gassign *assign = gimple_build_assign (tvar, val);
+  log_trace_code (stmt, assign, val, "conversion");
+  insert_stmt (gi, assign, loc, after);
+  return tvar;
+}
+
+/* Insert variable tracing code for VAL at iterator GI, originally
+   for ORIG_STMT and optionally at LOC. Inserts before GI, or after
+   it if AFTER is true. Reason is WHY. Return true if successful.  */
+
+static bool
+insert_trace (gimple_stmt_iterator *gi, tree val, gimple *orig_stmt,
+	      const char *why, location_t loc = -1, bool after = false)
+{
+  if (loc == (location_t)-1)
+    loc = gimple_location (orig_stmt);
+
+  tree type = TREE_TYPE (val);
+
+  tree func = targetm.vartrace_func (TYPE_MODE (type), false);
+  if (!func)
+    return false;
+
+  tree ttype = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (func)));
+
+  val = unshare_expr (val);
+
+  tree tvar = create_tmp_reg (type, "trace");
+  gassign *assign = gimple_build_assign (tvar, val);
+  log_trace_code (orig_stmt, assign, val, "mem access");
+  insert_stmt (gi, assign, loc, after);
+
+  tree cvar = insert_conversion (gi, tvar, ttype, orig_stmt, loc, after);
+
+  gcall *call = gimple_build_call (func, 1, cvar);
+  log_trace_code (NULL, call, cvar, why);
+  insert_stmt (gi, call, loc, after);
+  return true;
+}
+
+/* Find unique identifier for tree T.  */
+
+static gimple *
+ref_identifier (tree t)
+{
+  if (TREE_CODE (t) == SSA_NAME)
+    return SSA_NAME_DEF_STMT (t);
+  return NULL;
+}
+
+/* Try instrumenting ARG at position EGI in function FUN. Return true if
+   changed.  */
+
+static bool
+try_instrument_arg (gimple_stmt_iterator *egi, tree arg, function *fun)
+{
+  if (!(flag_vartrace & VARTRACE_ARGS) && enabled_op (arg, neutral) != force_on)
+    return false;
+
+  if (TREE_CODE (arg) != PARM_DECL
+      || DECL_BY_REFERENCE (arg) // arg will be pointer to arg. could reference?
+      || DECL_IGNORED_P (arg)
+      || !supported_type (arg))
+    return false;
+
+  tree sarg = ssa_default_def (fun, arg);
+  if (sarg == NULL)
+    return false;
+
+  return insert_trace (egi, sarg, NULL, "arg",
+		       fun->function_start_locus,
+		       false);
+}
+
+/* Instrument arguments for FUN. Return true if changed.  */
+
+static bool
+instrument_args (function *fun)
+{
+  bool changed = false;
+  gimple_stmt_iterator egi;
+
+  // Arguments will be traced when read later. In theory we could check
+  // if they are read unconditionally, but assume that the arguments
+  // are not that interesting if they are not used.
+  if (locals_enabled (VARTRACE_READS))
+    return false;
+
+  egi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (fun)));
+
+  for (tree arg = DECL_ARGUMENTS (current_function_decl);
+       arg != NULL_TREE;
+       arg = DECL_CHAIN (arg))
+    {
+      if (try_instrument_arg (&egi, arg, fun))
+	changed = true;
+      else
+	log_op (arg, "not tracing argument");
+    }
+  return changed;
+}
+
+/* Generate trace call for store ORIG at GI. Return true if
+   successful.  */
+
+static bool
+instrument_store (gimple_stmt_iterator *gi, gimple *stmt, tree orig)
+{
+  log_op_verbose (orig, "store");
+  if (!supported_op (orig, VARTRACE_WRITES, false))
+    return false;
+
+  tree type = TREE_TYPE (orig);
+  tree func = targetm.vartrace_func (TYPE_MODE (type), false);
+  if (!func)
+    return false;
+
+  tree ttype = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (func)));
+  location_t loc = gimple_location (stmt);
+
+  /* Generate another reference to target. That can be racy, but is
+     more likely to have the debug location of the target.  Better
+     would be to use the original value to avoid any races, but we
+     would need to somehow force the target location of the
+     builtin.  */
+
+  tree tvar = create_tmp_reg (type, "trace");
+  gassign *assign = gimple_build_assign (tvar, orig);
+  log_trace_code (stmt, assign, orig, "store copy");
+  insert_stmt (gi, assign, loc, true);
+
+  tvar = insert_conversion (gi, tvar, ttype, stmt, loc, true);
+
+  gcall *tcall = gimple_build_call (func, 1, tvar);
+  log_trace_code (stmt, tcall, tvar, "store");
+  insert_stmt (gi, tcall, loc, true);
+  return true;
+}
+
+/* Instrument return statement GRET at GI if needed. LOGGED contains
+   already logged defs. Return true if changed.  */
+
+static bool
+instrument_return (gimple_stmt_iterator *gi, greturn *gret,
+		   hash_set<gimple *> &logged)
+{
+  tree rval = gimple_return_retval (gret);
+
+  if (!rval)
+    return false;
+  // check if it may be already traced to avoid redundancies
+  // does not catch all cases
+  if (TREE_CODE (rval) == SSA_NAME)
+    {
+      gimple *def_stmt = SSA_NAME_DEF_STMT (rval);
+      if (logged.contains (def_stmt))
+	{
+	  log_op (rval, "not tracing return with logged def");
+	  return false;
+	}
+
+      if (gimple_code (def_stmt) == GIMPLE_ASSIGN
+	  && gimple_assign_single_p (def_stmt)
+	  && supported_op (gimple_assign_rhs1 (def_stmt),
+			   (vartrace_flags)(VARTRACE_READS|VARTRACE_WRITES),
+			   false))
+	{
+	  log_op (rval, "not tracing redundant return");
+	  return false;
+	}
+
+    }
+  // otherwise trace temporaries like expressions.
+  if (supported_op (rval, VARTRACE_RETURNS, true))
+    return insert_trace (gi, rval, gret, "return");
+  log_op (rval, "not tracing unsupported return");
+  return false;
+}
+
+/* Data for the loads/stores walker.  */
+
+struct visit_data {
+  gimple_stmt_iterator *gi;
+};
+
+/* Visit all loads at STMT with tree OP and visit_data DATA.  */
+
+static bool
+vartrace_visit_load (gimple *stmt, tree, tree op, void *data)
+{
+  visit_data *vd = (visit_data *)data;
+
+  log_op_verbose (op, "load op");
+  return supported_op (op, VARTRACE_READS, false)
+    && insert_trace (vd->gi, op, stmt, "stmt mem load", -1, true);
+}
+
+/* Visit all stores at STMT with tree OP and visit_data DATA.  */
+
+static bool
+vartrace_visit_store (gimple *stmt, tree, tree op, void *data)
+{
+  visit_data *vd = (visit_data *)data;
+
+  log_op_verbose (op, "store op");
+  return supported_op (op, VARTRACE_WRITES, false)
+    && insert_trace (vd->gi, op, stmt, "stmt mem store", -1, true);
+}
+
+/* Insert vartrace calls for FUN.  */
+
+static unsigned int
+vartrace_execute (function *fun)
+{
+  basic_block bb;
+  gimple_stmt_iterator gi;
+  vartrace_flags old_state = (vartrace_flags)flag_vartrace;
+  hash_set<gimple *> logged;
+  gimple *od;
+
+  if (lookup_attribute ("vartrace", TYPE_ATTRIBUTES (TREE_TYPE (fun->decl)))
+      || lookup_attribute ("vartrace", DECL_ATTRIBUTES (fun->decl)))
+    {
+      // reset the global temporarily for a forced function
+      flag_vartrace = VARTRACE_READS | VARTRACE_WRITES | VARTRACE_ARGS
+	| VARTRACE_RETURNS;
+    }
+
+  bool changed = instrument_args (fun);
+
+  /* Instrument each gimple statement.  */
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      // avoid redundancies per BB
+      // could extend this to super blocks?
+      logged.empty();
+
+      for (gi = gsi_start_bb (bb); !gsi_end_p (gi); gsi_next (&gi))
+	{
+	  gimple *stmt = gsi_stmt (gi);
+
+	  if (gimple_code (stmt) == GIMPLE_RETURN)
+	    {
+	      changed |= instrument_return (&gi, as_a<greturn *> (stmt),
+					    logged);
+	      continue;
+	    }
+
+	  /* Cannot handle asm goto. Ignore for now.  */
+	  if (gimple_code (stmt) == GIMPLE_ASM
+	      && gimple_asm_nlabels (as_a<gasm *> (stmt)) > 0)
+	    continue;
+
+	  /* Instrument DEFs.  */
+	  def_operand_p defp;
+	  ssa_op_iter iter;
+	  FOR_EACH_SSA_DEF_OPERAND (defp, stmt, iter, SSA_OP_DEF)
+	    {
+	      tree def = DEF_FROM_PTR (defp);
+	      changed |= instrument_store (&gi, stmt, def);
+	      od = ref_identifier (def);
+	      if (od)
+		logged.add (od);
+	    }
+
+	  // For locals tracing need to handle uses, otherwise
+	  // very little is traced with -O2. But only log one DEF
+	  // per basic block.
+	  if (flag_vartrace & VARTRACE_LOCALS)
+	    {
+	      use_operand_p usep;
+
+	      FOR_EACH_SSA_USE_OPERAND (usep, stmt, iter, SSA_OP_USE)
+		{
+		  tree use = USE_FROM_PTR (usep);
+
+		  od = ref_identifier (use);
+		  if (od && logged.contains (od))
+		    continue;
+		  if (!supported_op (use, VARTRACE_READS, false))
+		    continue;
+		  log_op_verbose (use, "local use");
+		  if (od)
+		    logged.add (od);
+		  changed |= insert_trace (&gi, use, stmt, "local use");
+		}
+	    }
+
+	  /* And memory loads and stores.  */
+	  visit_data vd;
+	  vd.gi = &gi;
+	  changed |= walk_stmt_load_store_ops (stmt, &vd, vartrace_visit_load,
+					       vartrace_visit_store);
+	}
+    }
+
+  flag_vartrace = old_state;
+
+  // for now, until we fix all cases that destroy ssa
+  return changed ? TODO_update_ssa : 0;
+}
+
+static const pass_data pass_data_vartrace =
+{
+  GIMPLE_PASS, /* type */
+  "vartrace", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_vartrace : public gimple_opt_pass
+{
+public:
+  pass_vartrace (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_vartrace, ctxt)
+  {}
+
+  virtual opt_pass * clone ()
+    {
+      return new pass_vartrace (m_ctxt);
+    }
+
+  virtual bool gate (function *fun)
+    {
+      if (flag_vartrace & VARTRACE_OFF)
+	return false;
+
+      // check if vartrace is supported in backend
+      if (!targetm.vartrace_func
+	  || targetm.vartrace_func (SImode, false) == NULL)
+	return false;
+
+      // disabled for function?
+      if (lookup_attribute ("no_vartrace", TYPE_ATTRIBUTES (TREE_TYPE (fun->decl)))
+	  || lookup_attribute ("no_vartrace", DECL_ATTRIBUTES (fun->decl)))
+	return false;
+
+      // need to run pass always to check for variable attributes
+      return true;
+    }
+
+  virtual unsigned int execute (function *f) { return vartrace_execute (f); }
+};
+
+gimple_opt_pass *
+make_pass_vartrace (gcc::context *ctxt)
+{
+  return new pass_vartrace (ctxt);
+}
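Note on the attribute precedence described in the "General theory" comment and implemented by enabled_attr, enabled_op and the loop in supported_op: the standalone sketch below models that tri-state resolution outside of GCC. AttrSet, Level and resolve are hypothetical names invented for this illustration, and each Level collapses "the decl and its type" into one record, which is a simplification of the real tree walk.

```cpp
#include <set>
#include <string>
#include <vector>

// Tri-state mirroring `enum attrstate` in the patch.
enum class AttrState { ForceOff, ForceOn, Neutral };

// Hypothetical stand-in for a GCC attribute list.
using AttrSet = std::set<std::string>;

// Same order as enabled_attr: no_vartrace is checked first, so it wins
// when both attributes are present on the same entity.
AttrState
enabled_attr (const AttrSet &attrs)
{
  if (attrs.count ("no_vartrace"))
    return AttrState::ForceOff;
  if (attrs.count ("vartrace"))
    return AttrState::ForceOn;
  return AttrState::Neutral;
}

// Hypothetical model of one level of a reference: a decl plus its type.
struct Level
{
  AttrSet decl_attrs;
  AttrSet type_attrs;
};

// Same shape as enabled_op: an already decided state sticks; otherwise
// the decl attributes are consulted before the type attributes.
AttrState
enabled_op (const Level &l, AttrState s)
{
  if (s != AttrState::Neutral)
    return s;
  s = enabled_attr (l.decl_attrs);
  if (s != AttrState::Neutral)
    return s;
  return enabled_attr (l.type_attrs);
}

// Walk the levels from the outermost reference towards the base, as the
// loop in supported_op does.  The first non-neutral answer wins.
AttrState
resolve (const std::vector<Level> &levels)
{
  AttrState s = AttrState::Neutral;
  for (const Level &l : levels)
    s = enabled_op (l, s);
  return s;
}
```

In this model a field marked vartrace inside a structure whose base variable is marked no_vartrace resolves to ForceOn, because the field is seen first. Whether that ordering is the desired policy is a design question for the patch, not something the sketch decides.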