diff mbox

of: introduce event tracepoints for dynamic device_node lifecyle

Message ID 1492475525-10827-1-git-send-email-tyreld@linux.vnet.ibm.com (mailing list archive)
State RFC
Headers show

Commit Message

Tyrel Datwyler April 18, 2017, 12:32 a.m. UTC
This patch introduces event tracepoints for tracking a device_nodes
reference cycle as well as reconfig notifications generated in response
to node/property manipulations.

With the recent upstreaming of the refcount API several device_node
underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
resources at runtime such as cpus or IOAs). These tracepoints provide a
easy and quick mechanism for validating the reference counting of
device_nodes during their lifetime.

Further, when pseries lpars are migrated to a different machine we
perform a live update of our device tree to bring it into alignment with the
configuration of the new machine. The of_reconfig_notify trace point
provides a mechanism that can be turned for debuging the device tree
modifications with out having to build a custom kernel to get at the
DEBUG code introduced by commit 00aa3720.

The following trace events are provided: of_node_get, of_node_put,
of_node_release, and of_reconfig_notify. These trace points require a kernel
built with ftrace support to be enabled. In a typical environment where
debugfs is mounted at /sys/kernel/debug the entire set of tracepoints
can be set with the following:

  echo "of:*" > /sys/kernel/debug/tracing/set_event

or

  echo 1 > /sys/kernel/debug/tracing/of/enable

The following shows the trace point data from a DLPAR remove of a cpu
from a pseries lpar:

cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"

cpuhp/23-147   [023] ....   128.324827:
	of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
cpuhp/23-147   [023] ....   128.324829:
	of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
cpuhp/23-147   [023] ....   128.324829:
	of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
cpuhp/23-147   [023] ....   128.324831:
	of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
   drmgr-7284  [009] ....   128.439000:
	of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
   drmgr-7284  [009] ....   128.439002:
	of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
			    prop->name=null, old_prop->name=null
   drmgr-7284  [009] ....   128.439015:
	of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
   drmgr-7284  [009] ....   128.439016:
	of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
 drivers/of/dynamic.c      | 30 ++++++---------
 include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+), 18 deletions(-)
 create mode 100644 include/trace/events/of.h

Comments

Tyrel Datwyler April 18, 2017, 12:35 a.m. UTC | #1
FYI, this patch was meant to be sent as an RFC. Looks like the RFC tag
got lost in my last spin.

On 04/17/2017 05:32 PM, Tyrel Datwyler wrote:
> This patch introduces event tracepoints for tracking a device_nodes
> reference cycle as well as reconfig notifications generated in response
> to node/property manipulations.
> 
> With the recent upstreaming of the refcount API several device_node
> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
> resources at runtime such as cpus or IOAs). These tracepoints provide a
> easy and quick mechanism for validating the reference counting of
> device_nodes during their lifetime.
> 
> Further, when pseries lpars are migrated to a different machine we
> perform a live update of our device tree to bring it into alignment with the
> configuration of the new machine. The of_reconfig_notify trace point
> provides a mechanism that can be turned for debuging the device tree
> modifications with out having to build a custom kernel to get at the
> DEBUG code introduced by commit 00aa3720.
> 
> The following trace events are provided: of_node_get, of_node_put,
> of_node_release, and of_reconfig_notify. These trace points require a kernel
> built with ftrace support to be enabled. In a typical environment where
> debugfs is mounted at /sys/kernel/debug the entire set of tracepoints
> can be set with the following:
> 
>   echo "of:*" > /sys/kernel/debug/tracing/set_event
> 
> or
> 
>   echo 1 > /sys/kernel/debug/tracing/of/enable
> 
> The following shows the trace point data from a DLPAR remove of a cpu
> from a pseries lpar:
> 
> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"
> 
> cpuhp/23-147   [023] ....   128.324827:
> 	of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
> 	of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
> 	of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324831:
> 	of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439000:
> 	of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439002:
> 	of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
> 			    prop->name=null, old_prop->name=null
>    drmgr-7284  [009] ....   128.439015:
> 	of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439016:
> 	of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> ---
>  drivers/of/dynamic.c      | 30 ++++++---------
>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 105 insertions(+), 18 deletions(-)
>  create mode 100644 include/trace/events/of.h
> 
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 888fdbc..85c0966 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -16,6 +16,9 @@
> 
>  #include "of_private.h"
> 
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/of.h>
> +
>  /**
>   * of_node_get() - Increment refcount of a node
>   * @node:	Node to inc refcount, NULL is supported to simplify writing of
> @@ -25,8 +28,10 @@
>   */
>  struct device_node *of_node_get(struct device_node *node)
>  {
> -	if (node)
> +	if (node) {
>  		kobject_get(&node->kobj);
> +		trace_of_node_get(refcount_read(&node->kobj.kref.refcount), node->full_name);
> +	}
>  	return node;
>  }
>  EXPORT_SYMBOL(of_node_get);
> @@ -38,8 +43,10 @@ struct device_node *of_node_get(struct device_node *node)
>   */
>  void of_node_put(struct device_node *node)
>  {
> -	if (node)
> +	if (node) {
> +		trace_of_node_put(refcount_read(&node->kobj.kref.refcount) - 1, node->full_name);
>  		kobject_put(&node->kobj);
> +	}
>  }
>  EXPORT_SYMBOL(of_node_put);
> 
> @@ -92,24 +99,9 @@ int of_reconfig_notifier_unregister(struct notifier_block *nb)
>  int of_reconfig_notify(unsigned long action, struct of_reconfig_data *p)
>  {
>  	int rc;
> -#ifdef DEBUG
> -	struct of_reconfig_data *pr = p;
> 
> -	switch (action) {
> -	case OF_RECONFIG_ATTACH_NODE:
> -	case OF_RECONFIG_DETACH_NODE:
> -		pr_debug("notify %-15s %s\n", action_names[action],
> -			pr->dn->full_name);
> -		break;
> -	case OF_RECONFIG_ADD_PROPERTY:
> -	case OF_RECONFIG_REMOVE_PROPERTY:
> -	case OF_RECONFIG_UPDATE_PROPERTY:
> -		pr_debug("notify %-15s %s:%s\n", action_names[action],
> -			pr->dn->full_name, pr->prop->name);
> -		break;
> +	trace_of_reconfig_notify(action, p);
> 
> -	}
> -#endif
>  	rc = blocking_notifier_call_chain(&of_reconfig_chain, action, p);
>  	return notifier_to_errno(rc);
>  }
> @@ -326,6 +318,8 @@ void of_node_release(struct kobject *kobj)
>  	struct device_node *node = kobj_to_device_node(kobj);
>  	struct property *prop = node->properties;
> 
> +	trace_of_node_release(node);
> +
>  	/* We should never be releasing nodes that haven't been detached. */
>  	if (!of_node_check_flag(node, OF_DETACHED)) {
>  		pr_err("ERROR: Bad of_node_put() on %s\n", node->full_name);
> diff --git a/include/trace/events/of.h b/include/trace/events/of.h
> new file mode 100644
> index 0000000..0d53271
> --- /dev/null
> +++ b/include/trace/events/of.h
> @@ -0,0 +1,93 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM of
> +
> +#if !defined(_TRACE_OF_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_OF_H
> +
> +#include <linux/of.h>
> +#include <linux/tracepoint.h>
> +
> +DECLARE_EVENT_CLASS(of_node_ref_template,
> +
> +	TP_PROTO(int refcount, const char* dn_name),
> +
> +	TP_ARGS(refcount, dn_name),
> +
> +	TP_STRUCT__entry(
> +		__string(dn_name, dn_name)
> +		__field(int, refcount)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dn_name, dn_name);
> +		__entry->refcount = refcount;
> +	),
> +
> +	TP_printk("refcount=%d, dn->full_name=%s",
> +		  __entry->refcount, __get_str(dn_name))
> +);
> +
> +DEFINE_EVENT(of_node_ref_template, of_node_get,
> +	     TP_PROTO(int refcount, const char* dn_name),
> +	     TP_ARGS(refcount, dn_name));
> +
> +DEFINE_EVENT(of_node_ref_template, of_node_put,
> +	     TP_PROTO(int refcount, const char* dn_name),
> +	     TP_ARGS(refcount, dn_name));
> +
> +TRACE_EVENT(of_node_release,
> +
> +	TP_PROTO(struct device_node *dn),
> +
> +	TP_ARGS(dn),
> +
> +	TP_STRUCT__entry(
> +		__string(dn_name, dn->full_name)
> +		__field(unsigned long, flags)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dn_name, dn->full_name);
> +		__entry->flags = dn->_flags;
> +	),
> +
> +	TP_printk("dn->full_name=%s, dn->_flags=%lu", 
> +		  __get_str(dn_name), __entry->flags)
> +);
> +
> +#define of_reconfig_action_names \
> +	{OF_RECONFIG_ATTACH_NODE, "ATTACH_NODE"}, \
> +	{OF_RECONFIG_DETACH_NODE, "DETACH_NODE"}, \
> +	{OF_RECONFIG_ADD_PROPERTY, "ADD_PROPERTY"}, \
> +	{OF_RECONFIG_REMOVE_PROPERTY, "REMOVE_PROPERTY"}, \
> +	{OF_RECONFIG_UPDATE_PROPERTY, "UPDATE_PROPERTY"}
> +
> +TRACE_EVENT(of_reconfig_notify,
> +
> +	TP_PROTO(unsigned long action, struct of_reconfig_data *ord),
> +
> +	TP_ARGS(action, ord),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, action)
> +		__string(dn_name, ord->dn->full_name)
> +		__string(prop_name, ord->prop ? ord->prop->name : "null")
> +		__string(oldprop_name, ord->old_prop ? ord->old_prop->name : "null")
> +	),
> +
> +	TP_fast_assign(
> +		__entry->action = action;
> +		__assign_str(dn_name, ord->dn->full_name);
> +		__assign_str(prop_name, ord->prop ? ord->prop->name : "null");
> +		__assign_str(oldprop_name, ord->old_prop ? ord->old_prop->name : "null");
> +	),
> +
> +	TP_printk("action=%s, dn->full_name=%s, prop->name=%s, old_prop->name=%s",
> +		  __print_symbolic(__entry->action, of_reconfig_action_names),
> +		  __get_str(dn_name), __get_str(prop_name), __get_str(oldprop_name))
> +);
> +
> +#endif /*	_TRACE_OF_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
>
Rob Herring April 18, 2017, 4:46 p.m. UTC | #2
On Mon, Apr 17, 2017 at 7:32 PM, Tyrel Datwyler
<tyreld@linux.vnet.ibm.com> wrote:
> This patch introduces event tracepoints for tracking a device_nodes
> reference cycle as well as reconfig notifications generated in response
> to node/property manipulations.
>
> With the recent upstreaming of the refcount API several device_node
> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
> resources at runtime such as cpus or IOAs). These tracepoints provide a
> easy and quick mechanism for validating the reference counting of
> device_nodes during their lifetime.

Not really relevant for this patch, but since you are looking at
pseries and refcounting, the refcounting largely exists for pseries.
It's also hard to get right as this type of fix is fairly common. It's
now used for overlays, but we really probably only need to refcount
the overlays or changesets as a whole, not at a node level. If you
have any thoughts on how a different model of refcounting could work
for pseries, I'd like to discuss it.

> Further, when pseries lpars are migrated to a different machine we
> perform a live update of our device tree to bring it into alignment with the
> configuration of the new machine. The of_reconfig_notify trace point
> provides a mechanism that can be turned for debuging the device tree
> modifications with out having to build a custom kernel to get at the
> DEBUG code introduced by commit 00aa3720.
>
> The following trace events are provided: of_node_get, of_node_put,
> of_node_release, and of_reconfig_notify. These trace points require a kernel
> built with ftrace support to be enabled. In a typical environment where
> debugfs is mounted at /sys/kernel/debug the entire set of tracepoints
> can be set with the following:
>
>   echo "of:*" > /sys/kernel/debug/tracing/set_event
>
> or
>
>   echo 1 > /sys/kernel/debug/tracing/of/enable
>
> The following shows the trace point data from a DLPAR remove of a cpu
> from a pseries lpar:
>
> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"
>
> cpuhp/23-147   [023] ....   128.324827:
>         of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
>         of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
>         of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324831:
>         of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439000:
>         of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439002:
>         of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
>                             prop->name=null, old_prop->name=null
>    drmgr-7284  [009] ....   128.439015:
>         of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439016:
>         of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
>
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> ---
>  drivers/of/dynamic.c      | 30 ++++++---------
>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 105 insertions(+), 18 deletions(-)
>  create mode 100644 include/trace/events/of.h
>
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 888fdbc..85c0966 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -16,6 +16,9 @@
>
>  #include "of_private.h"
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/of.h>
> +
>  /**
>   * of_node_get() - Increment refcount of a node
>   * @node:      Node to inc refcount, NULL is supported to simplify writing of
> @@ -25,8 +28,10 @@
>   */
>  struct device_node *of_node_get(struct device_node *node)
>  {
> -       if (node)
> +       if (node) {
>                 kobject_get(&node->kobj);
> +               trace_of_node_get(refcount_read(&node->kobj.kref.refcount), node->full_name);

Seems like there should be a kobj wrapper to read the refcount.

> +       }
>         return node;
>  }
>  EXPORT_SYMBOL(of_node_get);
> @@ -38,8 +43,10 @@ struct device_node *of_node_get(struct device_node *node)
>   */
>  void of_node_put(struct device_node *node)
>  {
> -       if (node)
> +       if (node) {
> +               trace_of_node_put(refcount_read(&node->kobj.kref.refcount) - 1, node->full_name);
>                 kobject_put(&node->kobj);
> +       }
>  }
>  EXPORT_SYMBOL(of_node_put);
>
> @@ -92,24 +99,9 @@ int of_reconfig_notifier_unregister(struct notifier_block *nb)
>  int of_reconfig_notify(unsigned long action, struct of_reconfig_data *p)
>  {
>         int rc;
> -#ifdef DEBUG
> -       struct of_reconfig_data *pr = p;
>
> -       switch (action) {
> -       case OF_RECONFIG_ATTACH_NODE:
> -       case OF_RECONFIG_DETACH_NODE:
> -               pr_debug("notify %-15s %s\n", action_names[action],
> -                       pr->dn->full_name);
> -               break;
> -       case OF_RECONFIG_ADD_PROPERTY:
> -       case OF_RECONFIG_REMOVE_PROPERTY:
> -       case OF_RECONFIG_UPDATE_PROPERTY:
> -               pr_debug("notify %-15s %s:%s\n", action_names[action],
> -                       pr->dn->full_name, pr->prop->name);
> -               break;
> +       trace_of_reconfig_notify(action, p);
>
> -       }
> -#endif
>         rc = blocking_notifier_call_chain(&of_reconfig_chain, action, p);
>         return notifier_to_errno(rc);
>  }
> @@ -326,6 +318,8 @@ void of_node_release(struct kobject *kobj)
>         struct device_node *node = kobj_to_device_node(kobj);
>         struct property *prop = node->properties;
>
> +       trace_of_node_release(node);
> +
>         /* We should never be releasing nodes that haven't been detached. */
>         if (!of_node_check_flag(node, OF_DETACHED)) {
>                 pr_err("ERROR: Bad of_node_put() on %s\n", node->full_name);
> diff --git a/include/trace/events/of.h b/include/trace/events/of.h
> new file mode 100644
> index 0000000..0d53271
> --- /dev/null
> +++ b/include/trace/events/of.h
> @@ -0,0 +1,93 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM of
> +
> +#if !defined(_TRACE_OF_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_OF_H
> +
> +#include <linux/of.h>
> +#include <linux/tracepoint.h>
> +
> +DECLARE_EVENT_CLASS(of_node_ref_template,
> +
> +       TP_PROTO(int refcount, const char* dn_name),
> +
> +       TP_ARGS(refcount, dn_name),
> +
> +       TP_STRUCT__entry(
> +               __string(dn_name, dn_name)
> +               __field(int, refcount)
> +       ),
> +
> +       TP_fast_assign(
> +               __assign_str(dn_name, dn_name);
> +               __entry->refcount = refcount;
> +       ),
> +
> +       TP_printk("refcount=%d, dn->full_name=%s",
> +                 __entry->refcount, __get_str(dn_name))
> +);
> +
> +DEFINE_EVENT(of_node_ref_template, of_node_get,
> +            TP_PROTO(int refcount, const char* dn_name),
> +            TP_ARGS(refcount, dn_name));
> +
> +DEFINE_EVENT(of_node_ref_template, of_node_put,
> +            TP_PROTO(int refcount, const char* dn_name),
> +            TP_ARGS(refcount, dn_name));
> +
> +TRACE_EVENT(of_node_release,
> +
> +       TP_PROTO(struct device_node *dn),
> +
> +       TP_ARGS(dn),
> +
> +       TP_STRUCT__entry(
> +               __string(dn_name, dn->full_name)
> +               __field(unsigned long, flags)
> +       ),
> +
> +       TP_fast_assign(
> +               __assign_str(dn_name, dn->full_name);
> +               __entry->flags = dn->_flags;
> +       ),
> +
> +       TP_printk("dn->full_name=%s, dn->_flags=%lu",
> +                 __get_str(dn_name), __entry->flags)
> +);
> +
> +#define of_reconfig_action_names \
> +       {OF_RECONFIG_ATTACH_NODE, "ATTACH_NODE"}, \
> +       {OF_RECONFIG_DETACH_NODE, "DETACH_NODE"}, \
> +       {OF_RECONFIG_ADD_PROPERTY, "ADD_PROPERTY"}, \
> +       {OF_RECONFIG_REMOVE_PROPERTY, "REMOVE_PROPERTY"}, \
> +       {OF_RECONFIG_UPDATE_PROPERTY, "UPDATE_PROPERTY"}
> +
> +TRACE_EVENT(of_reconfig_notify,
> +
> +       TP_PROTO(unsigned long action, struct of_reconfig_data *ord),
> +
> +       TP_ARGS(action, ord),
> +
> +       TP_STRUCT__entry(
> +               __field(unsigned long, action)
> +               __string(dn_name, ord->dn->full_name)
> +               __string(prop_name, ord->prop ? ord->prop->name : "null")
> +               __string(oldprop_name, ord->old_prop ? ord->old_prop->name : "null")
> +       ),
> +
> +       TP_fast_assign(
> +               __entry->action = action;
> +               __assign_str(dn_name, ord->dn->full_name);
> +               __assign_str(prop_name, ord->prop ? ord->prop->name : "null");
> +               __assign_str(oldprop_name, ord->old_prop ? ord->old_prop->name : "null");
> +       ),
> +
> +       TP_printk("action=%s, dn->full_name=%s, prop->name=%s, old_prop->name=%s",
> +                 __print_symbolic(__entry->action, of_reconfig_action_names),
> +                 __get_str(dn_name), __get_str(prop_name), __get_str(oldprop_name))
> +);
> +
> +#endif /*      _TRACE_OF_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> --
> 1.8.3.1
>
Frank Rowand April 19, 2017, 12:07 a.m. UTC | #3
On 04/17/17 17:32, Tyrel Datwyler wrote:
> This patch introduces event tracepoints for tracking a device_nodes
> reference cycle as well as reconfig notifications generated in response
> to node/property manipulations.
> 
> With the recent upstreaming of the refcount API several device_node
> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
> resources at runtime such as cpus or IOAs). These tracepoints provide a
> easy and quick mechanism for validating the reference counting of
> device_nodes during their lifetime.
> 
> Further, when pseries lpars are migrated to a different machine we
> perform a live update of our device tree to bring it into alignment with the
> configuration of the new machine. The of_reconfig_notify trace point
> provides a mechanism that can be turned for debuging the device tree
> modifications with out having to build a custom kernel to get at the
> DEBUG code introduced by commit 00aa3720.

I do not like changing individual (or small groups of) printk() style
debugging information to tracepoint style.

As far as I know, there is no easy way to combine trace data and printk()
style data to create a single chronology of events.  If some of the
information needed to debug an issue is trace data and some is printk()
style data then it becomes more difficult to understand the overall
situation.

If Rob wants to convert printk() style data to trace data (and I can't
convince him otherwise) then I will have further comments on this specific
patch.

-Frank

> 
> The following trace events are provided: of_node_get, of_node_put,
> of_node_release, and of_reconfig_notify. These trace points require a kernel
> built with ftrace support to be enabled. In a typical environment where
> debugfs is mounted at /sys/kernel/debug the entire set of tracepoints
> can be set with the following:
> 
>   echo "of:*" > /sys/kernel/debug/tracing/set_event
> 
> or
> 
>   echo 1 > /sys/kernel/debug/tracing/of/enable
> 
> The following shows the trace point data from a DLPAR remove of a cpu
> from a pseries lpar:
> 
> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"
> 
> cpuhp/23-147   [023] ....   128.324827:
> 	of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
> 	of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
> 	of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324831:
> 	of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439000:
> 	of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439002:
> 	of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
> 			    prop->name=null, old_prop->name=null
>    drmgr-7284  [009] ....   128.439015:
> 	of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439016:
> 	of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> ---
>  drivers/of/dynamic.c      | 30 ++++++---------
>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 105 insertions(+), 18 deletions(-)
>  create mode 100644 include/trace/events/of.h
> 

< snip >
Michael Ellerman April 19, 2017, 1:31 a.m. UTC | #4
Frank Rowand <frowand.list@gmail.com> writes:

> On 04/17/17 17:32, Tyrel Datwyler wrote:
>> This patch introduces event tracepoints for tracking a device_nodes
>> reference cycle as well as reconfig notifications generated in response
>> to node/property manipulations.
>> 
>> With the recent upstreaming of the refcount API several device_node
>> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
>> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
>> resources at runtime such as cpus or IOAs). These tracepoints provide a
>> easy and quick mechanism for validating the reference counting of
>> device_nodes during their lifetime.
>> 
>> Further, when pseries lpars are migrated to a different machine we
>> perform a live update of our device tree to bring it into alignment with the
>> configuration of the new machine. The of_reconfig_notify trace point
>> provides a mechanism that can be turned for debuging the device tree
>> modifications with out having to build a custom kernel to get at the
>> DEBUG code introduced by commit 00aa3720.
>
> I do not like changing individual (or small groups of) printk() style
> debugging information to tracepoint style.

I'm not quite sure which printks() you're referring to.

The only printks that are removed in this series are under #ifdef DEBUG,
and so are essentially not there unless you build a custom kernel.

They also only cover the reconfig case, which is actually less
interesting than the much more common and bug-prone get/put logic.

> As far as I know, there is no easy way to combine trace data and printk()
> style data to create a single chronology of events.  If some of the
> information needed to debug an issue is trace data and some is printk()
> style data then it becomes more difficult to understand the overall
> situation.

If you enable CONFIG_PRINTK_TIME then you should be able to just sort
the trace and the printk output by the timestamp. If you're really
trying to correlate the two then you should probably just be using
trace_printk().

But IMO this level of detail, tracing every get/put, does not belong in
printk. Trace points are absolutely the right solution for this type of
debugging.

cheers
Frank Rowand April 19, 2017, 1:42 a.m. UTC | #5
On 04/18/17 17:07, Frank Rowand wrote:
> On 04/17/17 17:32, Tyrel Datwyler wrote:
>> This patch introduces event tracepoints for tracking a device_nodes
>> reference cycle as well as reconfig notifications generated in response
>> to node/property manipulations.
>>
>> With the recent upstreaming of the refcount API several device_node
>> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
>> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
>> resources at runtime such as cpus or IOAs). These tracepoints provide a
>> easy and quick mechanism for validating the reference counting of
>> device_nodes during their lifetime.
>>
>> Further, when pseries lpars are migrated to a different machine we
>> perform a live update of our device tree to bring it into alignment with the
>> configuration of the new machine. The of_reconfig_notify trace point
>> provides a mechanism that can be turned for debuging the device tree
>> modifications with out having to build a custom kernel to get at the
>> DEBUG code introduced by commit 00aa3720.
> 
> I do not like changing individual (or small groups of) printk() style
> debugging information to tracepoint style.
> 
> As far as I know, there is no easy way to combine trace data and printk()
> style data to create a single chronology of events.  If some of the
> information needed to debug an issue is trace data and some is printk()
> style data then it becomes more difficult to understand the overall
> situation.

And of course the other issue with using tracepoints is the extra space
required to hold the tracepoint info.  With the pr_debug() approach, the
space usage can be easily removed for a production kernel via a config
option.

Tracepoints are wonderful technology, but not always the proper tool to
use for debug info.

> If Rob wants to convert printk() style data to trace data (and I can't
> convince him otherwise) then I will have further comments on this specific
> patch.
> 
> -Frank
> 
>>
>> The following trace events are provided: of_node_get, of_node_put,
>> of_node_release, and of_reconfig_notify. These trace points require a kernel
>> built with ftrace support to be enabled. In a typical environment where
>> debugfs is mounted at /sys/kernel/debug the entire set of tracepoints
>> can be set with the following:
>>
>>   echo "of:*" > /sys/kernel/debug/tracing/set_event
>>
>> or
>>
>>   echo 1 > /sys/kernel/debug/tracing/of/enable
>>
>> The following shows the trace point data from a DLPAR remove of a cpu
>> from a pseries lpar:
>>
>> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"
>>
>> cpuhp/23-147   [023] ....   128.324827:
>> 	of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
>> cpuhp/23-147   [023] ....   128.324829:
>> 	of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
>> cpuhp/23-147   [023] ....   128.324829:
>> 	of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
>> cpuhp/23-147   [023] ....   128.324831:
>> 	of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>>    drmgr-7284  [009] ....   128.439000:
>> 	of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>>    drmgr-7284  [009] ....   128.439002:
>> 	of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
>> 			    prop->name=null, old_prop->name=null
>>    drmgr-7284  [009] ....   128.439015:
>> 	of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>>    drmgr-7284  [009] ....   128.439016:
>> 	of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
>>
>> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
>> ---
>>  drivers/of/dynamic.c      | 30 ++++++---------
>>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 105 insertions(+), 18 deletions(-)
>>  create mode 100644 include/trace/events/of.h
>>
> 
> < snip >
> 
>
Oliver O'Halloran April 19, 2017, 2:30 a.m. UTC | #6
On Wed, Apr 19, 2017 at 2:46 AM, Rob Herring <robh+dt@kernel.org> wrote:
> On Mon, Apr 17, 2017 at 7:32 PM, Tyrel Datwyler
> <tyreld@linux.vnet.ibm.com> wrote:
>> This patch introduces event tracepoints for tracking a device_nodes
>> reference cycle as well as reconfig notifications generated in response
>> to node/property manipulations.
>>
>> With the recent upstreaming of the refcount API several device_node
>> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
>> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
>> resources at runtime such as cpus or IOAs). These tracepoints provide a
>> easy and quick mechanism for validating the reference counting of
>> device_nodes during their lifetime.
>
> Not really relevant for this patch, but since you are looking at
> pseries and refcounting, the refcounting largely exists for pseries.
> It's also hard to get right as this type of fix is fairly common. It's
> now used for overlays, but we really probably only need to refcount
> the overlays or changesets as a whole, not at a node level. If you
> have any thoughts on how a different model of refcounting could work
> for pseries, I'd like to discuss it.

One idea I've been kicking around is differentiating short and long
term references to a node. I figure most leaks are due to a missing
of_node_put() within a stack frame so it might be possible to use the
ftrace infrastructure to detect and emit warnings if a short term
reference is leaked. Long term references are slightly harder to deal
with, but they're less common so we can add more detailed reference
tracking there (devm_of_get_node?).

Oliver
Frank Rowand April 19, 2017, 2:31 a.m. UTC | #7
On 04/18/17 18:31, Michael Ellerman wrote:
> Frank Rowand <frowand.list@gmail.com> writes:
> 
>> On 04/17/17 17:32, Tyrel Datwyler wrote:
>>> This patch introduces event tracepoints for tracking a device_nodes
>>> reference cycle as well as reconfig notifications generated in response
>>> to node/property manipulations.
>>>
>>> With the recent upstreaming of the refcount API several device_node
>>> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
>>> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
>>> resources at runtime such as cpus or IOAs). These tracepoints provide a
>>> easy and quick mechanism for validating the reference counting of
>>> device_nodes during their lifetime.
>>>
>>> Further, when pseries lpars are migrated to a different machine we
>>> perform a live update of our device tree to bring it into alignment with the
>>> configuration of the new machine. The of_reconfig_notify trace point
>>> provides a mechanism that can be turned for debuging the device tree
>>> modifications with out having to build a custom kernel to get at the
>>> DEBUG code introduced by commit 00aa3720.
>>
>> I do not like changing individual (or small groups of) printk() style
>> debugging information to tracepoint style.
> 
> I'm not quite sure which printks() you're referring to.
> 
> The only printks that are removed in this series are under #ifdef DEBUG,
> and so are essentially not there unless you build a custom kernel.

Yes, I am talking about pr_debug(), pr_info(), pr_err(), etc.


> 
> They also only cover the reconfig case, which is actually less
> interesting than the much more common and bug-prone get/put logic.

When I was looking at the get/put issue I used pr_debug().


>> As far as I know, there is no easy way to combine trace data and printk()
>> style data to create a single chronology of events.  If some of the
>> information needed to debug an issue is trace data and some is printk()
>> style data then it becomes more difficult to understand the overall
>> situation.
> 
> If you enable CONFIG_PRINTK_TIME then you should be able to just sort
> the trace and the printk output by the timestamp. If you're really
> trying to correlate the two then you should probably just be using
> trace_printk().

Except the existing debug code that uses pr_debug() does not use
trace_printk().

And "just sort" does not apply to multi-line output like:

cpuhp/23-147   [023] ....   128.324827:
	of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
cpuhp/23-147   [023] ....   128.324829:
	of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
cpuhp/23-147   [023] ....   128.324829:
	of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
cpuhp/23-147   [023] ....   128.324831:
	of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
   drmgr-7284  [009] ....   128.439000:
	of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
   drmgr-7284  [009] ....   128.439002:
	of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
			    prop->name=null, old_prop->name=null
   drmgr-7284  [009] ....   128.439015:
	of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
   drmgr-7284  [009] ....   128.439016:
	of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4

I was kinda hoping that maybe someone had already created a tool to deal
with this issue.  But not too optimistic.


> But IMO this level of detail, tracing every get/put, does not belong in
> printk. Trace points are absolutely the right solution for this type of
> debugging.
> 
> cheers
> .
>
Steven Rostedt April 19, 2017, 2:46 a.m. UTC | #8
On Tue, 18 Apr 2017 17:07:17 -0700
Frank Rowand <frowand.list@gmail.com> wrote:


> As far as I know, there is no easy way to combine trace data and printk()
> style data to create a single chronology of events.  If some of the
> information needed to debug an issue is trace data and some is printk()
> style data then it becomes more difficult to understand the overall
> situation.

You mean like:

 # echo 1 > /sys/kernel/debug/tracing/events/printk/console/enable

Makes all printks also go into the ftrace ring buffer.

-- Steve

> 
> If Rob wants to convert printk() style data to trace data (and I can't
> convince him otherwise) then I will have further comments on this specific
> patch.
>
Steven Rostedt April 19, 2017, 2:49 a.m. UTC | #9
On Tue, 18 Apr 2017 18:42:32 -0700
Frank Rowand <frowand.list@gmail.com> wrote:

> And of course the other issue with using tracepoints is the extra space
> required to hold the tracepoint info.  With the pr_debug() approach, the
> space usage can be easily removed for a production kernel via a config
> option.

Now if you are saying you want to be able to enable debugging without
the tracing infrastructure I would agree. As the tracing infrastructure
is large. But I'm working on shrinking it more.

> 
> Tracepoints are wonderful technology, but not always the proper tool to
> use for debug info.

But if you are going to have tracing enabled regardless, adding a few
more tracepoints isn't going to make the difference.

-- Steve

> 
> > If Rob wants to convert printk() style data to trace data (and I can't
> > convince him otherwise) then I will have further comments on this specific
> > patch.
> >
Frank Rowand April 19, 2017, 3:12 a.m. UTC | #10
On 04/18/17 19:46, Steven Rostedt wrote:
> On Tue, 18 Apr 2017 17:07:17 -0700
> Frank Rowand <frowand.list@gmail.com> wrote:
> 
> 
>> As far as I know, there is no easy way to combine trace data and printk()
>> style data to create a single chronology of events.  If some of the
>> information needed to debug an issue is trace data and some is printk()
>> style data then it becomes more difficult to understand the overall
>> situation.
> 
> You mean like:
> 
>  # echo 1 > /sys/kernel/debug/tracing/events/printk/console/enable
> 
> Makes all printks also go into the ftrace ring buffer.

Thanks!  I was hoping there was going to be an easy answer like this.


> -- Steve
> 
>>
>> If Rob wants to convert printk() style data to trace data (and I can't
>> convince him otherwise) then I will have further comments on this specific
>> patch.
>>
> .
>
Michael Ellerman April 19, 2017, 10:13 a.m. UTC | #11
Oliver O'Halloran <oohall@gmail.com> writes:

> On Wed, Apr 19, 2017 at 2:46 AM, Rob Herring <robh+dt@kernel.org> wrote:
>> On Mon, Apr 17, 2017 at 7:32 PM, Tyrel Datwyler
>> <tyreld@linux.vnet.ibm.com> wrote:
>>> This patch introduces event tracepoints for tracking a device_nodes
>>> reference cycle as well as reconfig notifications generated in response
>>> to node/property manipulations.
>>>
>>> With the recent upstreaming of the refcount API several device_node
>>> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
>>> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
>>> resources at runtime such as cpus or IOAs). These tracepoints provide a
>>> easy and quick mechanism for validating the reference counting of
>>> device_nodes during their lifetime.
>>
>> Not really relevant for this patch, but since you are looking at
>> pseries and refcounting, the refcounting largely exists for pseries.
>> It's also hard to get right as this type of fix is fairly common. It's
>> now used for overlays, but we really probably only need to refcount
>> the overlays or changesets as a whole, not at a node level. If you
>> have any thoughts on how a different model of refcounting could work
>> for pseries, I'd like to discuss it.
>
> One idea I've been kicking around is differentiating short and long
> term references to a node.

I actually did this a long time ago, but balked at the size of the patch
to do the conversion. Let me see if I can find it lying around ...

cheers
Frank Rowand April 19, 2017, 5:44 p.m. UTC | #12
On 04/17/17 17:32, Tyrel Datwyler wrote:
> This patch introduces event tracepoints for tracking a device_nodes
> reference cycle as well as reconfig notifications generated in response
> to node/property manipulations.
> 
> With the recent upstreaming of the refcount API several device_node
> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
> resources at runtime such as cpus or IOAs). These tracepoints provide a
> easy and quick mechanism for validating the reference counting of
> device_nodes during their lifetime.
> 
> Further, when pseries lpars are migrated to a different machine we
> perform a live update of our device tree to bring it into alignment with the
> configuration of the new machine. The of_reconfig_notify trace point
> provides a mechanism that can be turned for debuging the device tree
> modifications with out having to build a custom kernel to get at the
> DEBUG code introduced by commit 00aa3720.

Is the normal kernel built with CONFIG_DYNAMIC_DEBUG=y?  If so, then
simply removing the "ifdef DEBUG" around the switch in
of_reconfig_notify() would solve that issue.

-Frank


> The following trace events are provided: of_node_get, of_node_put,
> of_node_release, and of_reconfig_notify. These trace points require a kernel
> built with ftrace support to be enabled. In a typical environment where
> debugfs is mounted at /sys/kernel/debug the entire set of tracepoints
> can be set with the following:
> 
>   echo "of:*" > /sys/kernel/debug/tracing/set_event
> 
> or
> 
>   echo 1 > /sys/kernel/debug/tracing/of/enable
> 
> The following shows the trace point data from a DLPAR remove of a cpu
> from a pseries lpar:
> 
> cat /sys/kernel/debug/tracing/trace | grep "POWER8@10"
> 
> cpuhp/23-147   [023] ....   128.324827:
> 	of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
> 	of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
> 	of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324831:
> 	of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439000:
> 	of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439002:
> 	of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
> 			    prop->name=null, old_prop->name=null
>    drmgr-7284  [009] ....   128.439015:
> 	of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439016:
> 	of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
> 
> Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
> ---
>  drivers/of/dynamic.c      | 30 ++++++---------
>  include/trace/events/of.h | 93 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 105 insertions(+), 18 deletions(-)
>  create mode 100644 include/trace/events/of.h
> 
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index 888fdbc..85c0966 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -16,6 +16,9 @@
>  
>  #include "of_private.h"
>  
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/of.h>
> +
>  /**
>   * of_node_get() - Increment refcount of a node
>   * @node:	Node to inc refcount, NULL is supported to simplify writing of
> @@ -25,8 +28,10 @@
>   */
>  struct device_node *of_node_get(struct device_node *node)
>  {
> -	if (node)
> +	if (node) {
>  		kobject_get(&node->kobj);
> +		trace_of_node_get(refcount_read(&node->kobj.kref.refcount), node->full_name);
> +	}
>  	return node;
>  }
>  EXPORT_SYMBOL(of_node_get);
> @@ -38,8 +43,10 @@ struct device_node *of_node_get(struct device_node *node)
>   */
>  void of_node_put(struct device_node *node)
>  {
> -	if (node)
> +	if (node) {
> +		trace_of_node_put(refcount_read(&node->kobj.kref.refcount) - 1, node->full_name);
>  		kobject_put(&node->kobj);
> +	}
>  }
>  EXPORT_SYMBOL(of_node_put);
>  
> @@ -92,24 +99,9 @@ int of_reconfig_notifier_unregister(struct notifier_block *nb)
>  int of_reconfig_notify(unsigned long action, struct of_reconfig_data *p)
>  {
>  	int rc;
> -#ifdef DEBUG
> -	struct of_reconfig_data *pr = p;
>  
> -	switch (action) {
> -	case OF_RECONFIG_ATTACH_NODE:
> -	case OF_RECONFIG_DETACH_NODE:
> -		pr_debug("notify %-15s %s\n", action_names[action],
> -			pr->dn->full_name);
> -		break;
> -	case OF_RECONFIG_ADD_PROPERTY:
> -	case OF_RECONFIG_REMOVE_PROPERTY:
> -	case OF_RECONFIG_UPDATE_PROPERTY:
> -		pr_debug("notify %-15s %s:%s\n", action_names[action],
> -			pr->dn->full_name, pr->prop->name);
> -		break;
> +	trace_of_reconfig_notify(action, p);
>  
> -	}
> -#endif
>  	rc = blocking_notifier_call_chain(&of_reconfig_chain, action, p);
>  	return notifier_to_errno(rc);
>  }
> @@ -326,6 +318,8 @@ void of_node_release(struct kobject *kobj)
>  	struct device_node *node = kobj_to_device_node(kobj);
>  	struct property *prop = node->properties;
>  
> +	trace_of_node_release(node);
> +
>  	/* We should never be releasing nodes that haven't been detached. */
>  	if (!of_node_check_flag(node, OF_DETACHED)) {
>  		pr_err("ERROR: Bad of_node_put() on %s\n", node->full_name);
> diff --git a/include/trace/events/of.h b/include/trace/events/of.h
> new file mode 100644
> index 0000000..0d53271
> --- /dev/null
> +++ b/include/trace/events/of.h
> @@ -0,0 +1,93 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM of
> +
> +#if !defined(_TRACE_OF_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_OF_H
> +
> +#include <linux/of.h>
> +#include <linux/tracepoint.h>
> +
> +DECLARE_EVENT_CLASS(of_node_ref_template,
> +
> +	TP_PROTO(int refcount, const char* dn_name),
> +
> +	TP_ARGS(refcount, dn_name),
> +
> +	TP_STRUCT__entry(
> +		__string(dn_name, dn_name)
> +		__field(int, refcount)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dn_name, dn_name);
> +		__entry->refcount = refcount;
> +	),
> +
> +	TP_printk("refcount=%d, dn->full_name=%s",
> +		  __entry->refcount, __get_str(dn_name))
> +);
> +
> +DEFINE_EVENT(of_node_ref_template, of_node_get,
> +	     TP_PROTO(int refcount, const char* dn_name),
> +	     TP_ARGS(refcount, dn_name));
> +
> +DEFINE_EVENT(of_node_ref_template, of_node_put,
> +	     TP_PROTO(int refcount, const char* dn_name),
> +	     TP_ARGS(refcount, dn_name));
> +
> +TRACE_EVENT(of_node_release,
> +
> +	TP_PROTO(struct device_node *dn),
> +
> +	TP_ARGS(dn),
> +
> +	TP_STRUCT__entry(
> +		__string(dn_name, dn->full_name)
> +		__field(unsigned long, flags)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dn_name, dn->full_name);
> +		__entry->flags = dn->_flags;
> +	),
> +
> +	TP_printk("dn->full_name=%s, dn->_flags=%lu", 
> +		  __get_str(dn_name), __entry->flags)
> +);
> +
> +#define of_reconfig_action_names \
> +	{OF_RECONFIG_ATTACH_NODE, "ATTACH_NODE"}, \
> +	{OF_RECONFIG_DETACH_NODE, "DETACH_NODE"}, \
> +	{OF_RECONFIG_ADD_PROPERTY, "ADD_PROPERTY"}, \
> +	{OF_RECONFIG_REMOVE_PROPERTY, "REMOVE_PROPERTY"}, \
> +	{OF_RECONFIG_UPDATE_PROPERTY, "UPDATE_PROPERTY"}
> +
> +TRACE_EVENT(of_reconfig_notify,
> +
> +	TP_PROTO(unsigned long action, struct of_reconfig_data *ord),
> +
> +	TP_ARGS(action, ord),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, action)
> +		__string(dn_name, ord->dn->full_name)
> +		__string(prop_name, ord->prop ? ord->prop->name : "null")
> +		__string(oldprop_name, ord->old_prop ? ord->old_prop->name : "null")
> +	),
> +
> +	TP_fast_assign(
> +		__entry->action = action;
> +		__assign_str(dn_name, ord->dn->full_name);
> +		__assign_str(prop_name, ord->prop ? ord->prop->name : "null");
> +		__assign_str(oldprop_name, ord->old_prop ? ord->old_prop->name : "null");
> +	),
> +
> +	TP_printk("action=%s, dn->full_name=%s, prop->name=%s, old_prop->name=%s",
> +		  __print_symbolic(__entry->action, of_reconfig_action_names),
> +		  __get_str(dn_name), __get_str(prop_name), __get_str(oldprop_name))
> +);
> +
> +#endif /*	_TRACE_OF_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
>
Tyrel Datwyler April 19, 2017, 6:33 p.m. UTC | #13
On 04/18/2017 07:31 PM, Frank Rowand wrote:
> On 04/18/17 18:31, Michael Ellerman wrote:
>> Frank Rowand <frowand.list@gmail.com> writes:
>>
>>> On 04/17/17 17:32, Tyrel Datwyler wrote:
>>>> This patch introduces event tracepoints for tracking a device_nodes
>>>> reference cycle as well as reconfig notifications generated in response
>>>> to node/property manipulations.
>>>>
>>>> With the recent upstreaming of the refcount API several device_node
>>>> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
>>>> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
>>>> resources at runtime such as cpus or IOAs). These tracepoints provide a
>>>> easy and quick mechanism for validating the reference counting of
>>>> device_nodes during their lifetime.
>>>>
>>>> Further, when pseries lpars are migrated to a different machine we
>>>> perform a live update of our device tree to bring it into alignment with the
>>>> configuration of the new machine. The of_reconfig_notify trace point
>>>> provides a mechanism that can be turned for debuging the device tree
>>>> modifications with out having to build a custom kernel to get at the
>>>> DEBUG code introduced by commit 00aa3720.
>>>
>>> I do not like changing individual (or small groups of) printk() style
>>> debugging information to tracepoint style.
>>
>> I'm not quite sure which printks() you're referring to.
>>
>> The only printks that are removed in this series are under #ifdef DEBUG,
>> and so are essentially not there unless you build a custom kernel.
> 
> Yes, I am talking about pr_debug(), pr_info(), pr_err(), etc.
> 
> 
>>
>> They also only cover the reconfig case, which is actually less
>> interesting than the much more common and bug-prone get/put logic.
> 
> When I was looking at the get/put issue I used pr_debug().
> 
> 
>>> As far as I know, there is no easy way to combine trace data and printk()
>>> style data to create a single chronology of events.  If some of the
>>> information needed to debug an issue is trace data and some is printk()
>>> style data then it becomes more difficult to understand the overall
>>> situation.
>>
>> If you enable CONFIG_PRINTK_TIME then you should be able to just sort
>> the trace and the printk output by the timestamp. If you're really
>> trying to correlate the two then you should probably just be using
>> trace_printk().
> 
> Except the existing debug code that uses pr_debug() does not use
> trace_printk().
> 
> And "just sort" does not apply to multi-line output like:
> 
> cpuhp/23-147   [023] ....   128.324827:
> 	of_node_put: refcount=5, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
> 	of_node_put: refcount=4, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324829:
> 	of_node_put: refcount=3, dn->full_name=/cpus/PowerPC,POWER8@10
> cpuhp/23-147   [023] ....   128.324831:
> 	of_node_put: refcount=2, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439000:
> 	of_node_put: refcount=1, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439002:
> 	of_reconfig_notify: action=DETACH_NODE, dn->full_name=/cpus/PowerPC,POWER8@10,
> 			    prop->name=null, old_prop->name=null
>    drmgr-7284  [009] ....   128.439015:
> 	of_node_put: refcount=0, dn->full_name=/cpus/PowerPC,POWER8@10
>    drmgr-7284  [009] ....   128.439016:
> 	of_node_release: dn->full_name=/cpus/PowerPC,POWER8@10, dn->_flags=4
> 
> I was kinda hoping that maybe someone had already created a tool to deal
> with this issue.  But not too optimistic.

This output was actually broken into multiple lines for the commit
message. Each trace point is actually a single line in the trace buffer.
This output was pulled from the trace buffer with the following:

cat /sys/kernel/debug/trace/tracing | grep "POWER8@10"

-Tyrel

> 
> 
>> But IMO this level of detail, tracing every get/put, does not belong in
>> printk. Trace points are absolutely the right solution for this type of
>> debugging.
>>
>> cheers
>> .
>>
>
Tyrel Datwyler April 19, 2017, 6:45 p.m. UTC | #14
On 04/18/2017 07:49 PM, Steven Rostedt wrote:
> On Tue, 18 Apr 2017 18:42:32 -0700
> Frank Rowand <frowand.list@gmail.com> wrote:
> 
>> And of course the other issue with using tracepoints is the extra space
>> required to hold the tracepoint info.  With the pr_debug() approach, the
>> space usage can be easily removed for a production kernel via a config
>> option.
> 
> Now if you are saying you want to be able to enable debugging without
> the tracing infrastructure I would agree. As the tracing infrastructure
> is large. But I'm working on shrinking it more.

The primary consumers of OF_DYNAMIC seem to be pseries and powernv where
we are generally going to see the trace infrastructure enabled by
default in production.

-Tyrel

> 
>>
>> Tracepoints are wonderful technology, but not always the proper tool to
>> use for debug info.
> 
> But if you are going to have tracing enabled regardless, adding a few
> more tracepoints isn't going to make the difference.
> 
> -- Steve
> 
>>
>>> If Rob wants to convert printk() style data to trace data (and I can't
>>> convince him otherwise) then I will have further comments on this specific
>>> patch.
>>>
Tyrel Datwyler April 19, 2017, 9:13 p.m. UTC | #15
On 04/19/2017 03:13 AM, Michael Ellerman wrote:
> Oliver O'Halloran <oohall@gmail.com> writes:
> 
>> On Wed, Apr 19, 2017 at 2:46 AM, Rob Herring <robh+dt@kernel.org> wrote:
>>> On Mon, Apr 17, 2017 at 7:32 PM, Tyrel Datwyler
>>> <tyreld@linux.vnet.ibm.com> wrote:
>>>> This patch introduces event tracepoints for tracking a device_nodes
>>>> reference cycle as well as reconfig notifications generated in response
>>>> to node/property manipulations.
>>>>
>>>> With the recent upstreaming of the refcount API several device_node
>>>> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
>>>> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
>>>> resources at runtime such as cpus or IOAs). These tracepoints provide a
>>>> easy and quick mechanism for validating the reference counting of
>>>> device_nodes during their lifetime.
>>>
>>> Not really relevant for this patch, but since you are looking at
>>> pseries and refcounting, the refcounting largely exists for pseries.
>>> It's also hard to get right as this type of fix is fairly common. It's
>>> now used for overlays, but we really probably only need to refcount
>>> the overlays or changesets as a whole, not at a node level. If you
>>> have any thoughts on how a different model of refcounting could work
>>> for pseries, I'd like to discuss it.
>>
>> One idea I've been kicking around is differentiating short and long
>> term references to a node.
> 
> I actually did this a long time ago, but balked at the size of the patch
> to do the conversion. Let me see if I can find it lying around ...

It would be interesting to revisit this, and toss around any other ideas
anyone may have.

-Tyrel

> 
> cheers
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Tyrel Datwyler April 19, 2017, 11:27 p.m. UTC | #16
On 04/18/2017 06:31 PM, Michael Ellerman wrote:
> Frank Rowand <frowand.list@gmail.com> writes:
> 
>> On 04/17/17 17:32, Tyrel Datwyler wrote:
>>> This patch introduces event tracepoints for tracking a device_nodes
>>> reference cycle as well as reconfig notifications generated in response
>>> to node/property manipulations.
>>>
>>> With the recent upstreaming of the refcount API several device_node
>>> underflows and leaks have come to my attention in the pseries (DLPAR) dynamic
>>> logical partitioning code (ie. POWER speak for hotplugging virtual and physcial
>>> resources at runtime such as cpus or IOAs). These tracepoints provide a
>>> easy and quick mechanism for validating the reference counting of
>>> device_nodes during their lifetime.
>>>
>>> Further, when pseries lpars are migrated to a different machine we
>>> perform a live update of our device tree to bring it into alignment with the
>>> configuration of the new machine. The of_reconfig_notify trace point
>>> provides a mechanism that can be turned for debuging the device tree
>>> modifications with out having to build a custom kernel to get at the
>>> DEBUG code introduced by commit 00aa3720.
>>
>> I do not like changing individual (or small groups of) printk() style
>> debugging information to tracepoint style.
> 
> I'm not quite sure which printks() you're referring to.
> 
> The only printks that are removed in this series are under #ifdef DEBUG,
> and so are essentially not there unless you build a custom kernel.
> 
> They also only cover the reconfig case, which is actually less
> interesting than the much more common and bug-prone get/put logic.
> 
>> As far as I know, there is no easy way to combine trace data and printk()
>> style data to create a single chronology of events.  If some of the
>> information needed to debug an issue is trace data and some is printk()
>> style data then it becomes more difficult to understand the overall
>> situation.
> 
> If you enable CONFIG_PRINTK_TIME then you should be able to just sort
> the trace and the printk output by the timestamp. If you're really
> trying to correlate the two then you should probably just be using
> trace_printk().
> 
> But IMO this level of detail, tracing every get/put, does not belong in
> printk. Trace points are absolutely the right solution for this type of
> debugging.

Something else to keep in mind is that while pr_debugs could be used to
provide feedback on the reference counts and of_reconfig events they
don't in anyway tell us where they are happening in the kernel. The
trace infrastructure provides the ability to stack trace those events.
The following example provides me a lot more information about who is
doing what and where after I hot-add an ethernet adapter:

# echo stacktrace > /sys/kernel/debug/tracing/trace_options
# cat trace | grep -A6 "/pci@800000020000018"
...
           drmgr-7349  [006] d...  7138.821875: of_node_get: refcount=8,
dn->full_name=/pci@800000020000018
           drmgr-7349  [006] d...  7138.821876: <stack trace>
 => .msi_quota_for_device
 => .rtas_setup_msi_irqs
 => .arch_setup_msi_irqs
 => .__pci_enable_msix
 => .pci_enable_msix_range
--
           drmgr-7349  [006] d...  7138.821876: of_node_put: refcount=2,
dn->full_name=/pci@800000020000018/ethernet@0
           drmgr-7349  [006] d...  7138.821877: <stack trace>
 => .msi_quota_for_device
 => .rtas_setup_msi_irqs
 => .arch_setup_msi_irqs
 => .__pci_enable_msix
 => .pci_enable_msix_range
--
           drmgr-7349  [006] ....  7138.821878: of_node_put: refcount=7,
dn->full_name=/pci@800000020000018
           drmgr-7349  [006] ....  7138.821879: <stack trace>
 => .rtas_setup_msi_irqs
 => .arch_setup_msi_irqs
 => .__pci_enable_msix
 => .pci_enable_msix_range
 => .bnx2x_enable_msix
--

To get that same info as far as I know is to add a dump_stack() after
each pr_debug.

Further, filters can be set on the tracepoint event fields such that
trace data could be restricted to a particular device_node or refcount
threshold. For example:

# cd /sys/kernel/debug/tracing# cat events/of/of_node_get/format
# echo "dn_name == /pci@800000020000018" > events/of/filter

# cat trace
           drmgr-10542 [003] ....  9630.677001: of_node_put: refcount=5,
dn->full_name=/pci@800000020000018
           drmgr-10542 [003] d...  9631.677368: of_node_get: refcount=6,
dn->full_name=/pci@800000020000018
           drmgr-10542 [003] ....  9631.677389: of_node_put: refcount=5,
dn->full_name=/pci@800000020000018
           drmgr-10542 [003] ....  9631.677390: of_reconfig_notify:
action=DETACH_NODE, dn->full_name=/pci@800000020000018, prop->name=null,
old_prop->name=null
           drmgr-10542 [003] .n..  9632.025656: of_node_put: refcount=4,
dn->full_name=/pci@800000020000018
           drmgr-10542 [003] .n..  9632.025657: of_node_put: refcount=3,
dn->full_name=/pci@800000020000018

After setting the filter and doing a hot-remove of the pci device in
question the trace quickly tells me 3 references are being leaked. In
combination with the stacktrace option I can quickly correlate call
sites that take references without releasing them.

-Tyrel

> 
> cheers
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Steven Rostedt April 20, 2017, 2:33 a.m. UTC | #17
On Wed, 19 Apr 2017 16:27:10 -0700
Tyrel Datwyler <turtle.in.the.kernel@gmail.com> wrote:

> # echo stacktrace > /sys/kernel/debug/tracing/trace_options
> # cat trace | grep -A6 "/pci@800000020000018"

Just to let you know that there is now stacktrace event triggers, where
you don't need to stacktrace all events, you can pick and choose. And
even filter the stack trace on specific fields of the event.

 # cd /sys/kernel/debug/tracing
 # echo "stacktrace if common_pid == $$ && reason == 3" \
   > events/tlb/tlb_flush/trigger

 # cat trace
            bash-1103  [003] ...1  1290.100133: tlb_flush: pages:-1 reason:local mm shootdown (3)
            bash-1103  [003] ...2  1290.100140: <stack trace>
 => copy_process.part.39
 => _do_fork
 => SyS_clone
 => do_syscall_64
 => return_from_SYSCALL_64

-- Steve
Frank Rowand April 20, 2017, 2:37 a.m. UTC | #18
On 04/19/17 11:45, Tyrel Datwyler wrote:
> On 04/18/2017 07:49 PM, Steven Rostedt wrote:
>> On Tue, 18 Apr 2017 18:42:32 -0700
>> Frank Rowand <frowand.list@gmail.com> wrote:
>>
>>> And of course the other issue with using tracepoints is the extra space
>>> required to hold the tracepoint info.  With the pr_debug() approach, the
>>> space usage can be easily removed for a production kernel via a config
>>> option.
>>
>> Now if you are saying you want to be able to enable debugging without
>> the tracing infrastructure I would agree. As the tracing infrastructure
>> is large. But I'm working on shrinking it more.
> 
> The primary consumers of OF_DYNAMIC seem to be pseries and powernv where
> we are generally going to see the trace infrastructure enabled by
> default in production.

Another primary consumer will be overlays for ARM expansion boards.  Still
a work in progress.

-Frank

> 
> -Tyrel
> 
>>
>>>
>>> Tracepoints are wonderful technology, but not always the proper tool to
>>> use for debug info.
>>
>> But if you are going to have tracing enabled regardless, adding a few
>> more tracepoints isn't going to make the difference.
>>
>> -- Steve
>>
>>>
>>>> If Rob wants to convert printk() style data to trace data (and I can't
>>>> convince him otherwise) then I will have further comments on this specific
>>>> patch.
>>>>
> 
> .
>
Frank Rowand April 20, 2017, 4:47 a.m. UTC | #19
On 04/19/17 19:33, Steven Rostedt wrote:
> On Wed, 19 Apr 2017 16:27:10 -0700
> Tyrel Datwyler <turtle.in.the.kernel@gmail.com> wrote:
> 
>> # echo stacktrace > /sys/kernel/debug/tracing/trace_options
>> # cat trace | grep -A6 "/pci@800000020000018"
> 
> Just to let you know that there is now stacktrace event triggers, where
> you don't need to stacktrace all events, you can pick and choose. And
> even filter the stack trace on specific fields of the event.
> 
>  # cd /sys/kernel/debug/tracing
>  # echo "stacktrace if common_pid == $$ && reason == 3" \
>    > events/tlb/tlb_flush/trigger
> 
>  # cat trace
>             bash-1103  [003] ...1  1290.100133: tlb_flush: pages:-1 reason:local mm shootdown (3)
>             bash-1103  [003] ...2  1290.100140: <stack trace>
>  => copy_process.part.39
>  => _do_fork
>  => SyS_clone
>  => do_syscall_64
>  => return_from_SYSCALL_64
> 
> -- Steve
> .
> 

Thanks for chiming in.

The power and flexibility of the trace tools is quite amazing
I need to make room in my schedule to catch up on what has been
added in the last several years.

-Frank
Tyrel Datwyler April 20, 2017, 5:24 a.m. UTC | #20
On 04/19/2017 07:33 PM, Steven Rostedt wrote:
> On Wed, 19 Apr 2017 16:27:10 -0700
> Tyrel Datwyler <turtle.in.the.kernel@gmail.com> wrote:
> 
>> # echo stacktrace > /sys/kernel/debug/tracing/trace_options
>> # cat trace | grep -A6 "/pci@800000020000018"
> 
> Just to let you know that there is now stacktrace event triggers, where
> you don't need to stacktrace all events, you can pick and choose. And
> even filter the stack trace on specific fields of the event.

This is great, and I did figure that out this afternoon. One thing I was
still trying to determine though was whether its possible to set these
triggers at boot? As far as I could tell I'm still limited to
"trace_options=stacktrace" as a kernel boot parameter to get the stack
for event tracepoints.

-Tyrel

> 
>  # cd /sys/kernel/debug/tracing
>  # echo "stacktrace if common_pid == $$ && reason == 3" \
>    > events/tlb/tlb_flush/trigger
> 
>  # cat trace
>             bash-1103  [003] ...1  1290.100133: tlb_flush: pages:-1 reason:local mm shootdown (3)
>             bash-1103  [003] ...2  1290.100140: <stack trace>
>  => copy_process.part.39
>  => _do_fork
>  => SyS_clone
>  => do_syscall_64
>  => return_from_SYSCALL_64
> 
> -- Steve
>
Frank Rowand April 20, 2017, 10:44 a.m. UTC | #21
On 04/19/17 19:37, Frank Rowand wrote:
> On 04/19/17 11:45, Tyrel Datwyler wrote:
>> On 04/18/2017 07:49 PM, Steven Rostedt wrote:
>>> On Tue, 18 Apr 2017 18:42:32 -0700
>>> Frank Rowand <frowand.list@gmail.com> wrote:
>>>
>>>> And of course the other issue with using tracepoints is the extra space
>>>> required to hold the tracepoint info.  With the pr_debug() approach, the
>>>> space usage can be easily removed for a production kernel via a config
>>>> option.
>>>
>>> Now if you are saying you want to be able to enable debugging without
>>> the tracing infrastructure I would agree. As the tracing infrastructure
>>> is large. But I'm working on shrinking it more.
>>
>> The primary consumers of OF_DYNAMIC seem to be pseries and powernv where
>> we are generally going to see the trace infrastructure enabled by
>> default in production.
> 
> Another primary consumer will be overlays for ARM expansion boards.  Still
> a work in progress.

And dynamic configuration for the FPGA folks.


> -Frank
> 
>>
>> -Tyrel
>>
>>>
>>>>
>>>> Tracepoints are wonderful technology, but not always the proper tool to
>>>> use for debug info.
>>>
>>> But if you are going to have tracing enabled regardless, adding a few
>>> more tracepoints isn't going to make the difference.
>>>
>>> -- Steve
>>>
>>>>
>>>>> If Rob wants to convert printk() style data to trace data (and I can't
>>>>> convince him otherwise) then I will have further comments on this specific
>>>>> patch.
>>>>>
>>
>> .
>>
> 
> .
>
Steven Rostedt April 20, 2017, 1:37 p.m. UTC | #22
On Wed, 19 Apr 2017 22:24:58 -0700
Tyrel Datwyler <turtle.in.the.kernel@gmail.com> wrote:

> On 04/19/2017 07:33 PM, Steven Rostedt wrote:
> > On Wed, 19 Apr 2017 16:27:10 -0700
> > Tyrel Datwyler <turtle.in.the.kernel@gmail.com> wrote:
> >   
> >> # echo stacktrace > /sys/kernel/debug/tracing/trace_options
> >> # cat trace | grep -A6 "/pci@800000020000018"  
> > 
> > Just to let you know that there is now stacktrace event triggers, where
> > you don't need to stacktrace all events, you can pick and choose. And
> > even filter the stack trace on specific fields of the event.  
> 
> This is great, and I did figure that out this afternoon. One thing I was
> still trying to determine though was whether its possible to set these
> triggers at boot? As far as I could tell I'm still limited to
> "trace_options=stacktrace" as a kernel boot parameter to get the stack
> for event tracepoints.

No not yet. But I'll add that to the todo list.

Thanks,

-- Steve
diff mbox

Patch

diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index 888fdbc..85c0966 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -16,6 +16,9 @@ 
 
 #include "of_private.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/of.h>
+
 /**
  * of_node_get() - Increment refcount of a node
  * @node:	Node to inc refcount, NULL is supported to simplify writing of
@@ -25,8 +28,10 @@ 
  */
 struct device_node *of_node_get(struct device_node *node)
 {
-	if (node)
+	if (node) {
 		kobject_get(&node->kobj);
+		trace_of_node_get(refcount_read(&node->kobj.kref.refcount), node->full_name);
+	}
 	return node;
 }
 EXPORT_SYMBOL(of_node_get);
@@ -38,8 +43,10 @@  struct device_node *of_node_get(struct device_node *node)
  */
 void of_node_put(struct device_node *node)
 {
-	if (node)
+	if (node) {
+		trace_of_node_put(refcount_read(&node->kobj.kref.refcount) - 1, node->full_name);
 		kobject_put(&node->kobj);
+	}
 }
 EXPORT_SYMBOL(of_node_put);
 
@@ -92,24 +99,9 @@  int of_reconfig_notifier_unregister(struct notifier_block *nb)
 int of_reconfig_notify(unsigned long action, struct of_reconfig_data *p)
 {
 	int rc;
-#ifdef DEBUG
-	struct of_reconfig_data *pr = p;
 
-	switch (action) {
-	case OF_RECONFIG_ATTACH_NODE:
-	case OF_RECONFIG_DETACH_NODE:
-		pr_debug("notify %-15s %s\n", action_names[action],
-			pr->dn->full_name);
-		break;
-	case OF_RECONFIG_ADD_PROPERTY:
-	case OF_RECONFIG_REMOVE_PROPERTY:
-	case OF_RECONFIG_UPDATE_PROPERTY:
-		pr_debug("notify %-15s %s:%s\n", action_names[action],
-			pr->dn->full_name, pr->prop->name);
-		break;
+	trace_of_reconfig_notify(action, p);
 
-	}
-#endif
 	rc = blocking_notifier_call_chain(&of_reconfig_chain, action, p);
 	return notifier_to_errno(rc);
 }
@@ -326,6 +318,8 @@  void of_node_release(struct kobject *kobj)
 	struct device_node *node = kobj_to_device_node(kobj);
 	struct property *prop = node->properties;
 
+	trace_of_node_release(node);
+
 	/* We should never be releasing nodes that haven't been detached. */
 	if (!of_node_check_flag(node, OF_DETACHED)) {
 		pr_err("ERROR: Bad of_node_put() on %s\n", node->full_name);
diff --git a/include/trace/events/of.h b/include/trace/events/of.h
new file mode 100644
index 0000000..0d53271
--- /dev/null
+++ b/include/trace/events/of.h
@@ -0,0 +1,93 @@ 
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM of
+
+#if !defined(_TRACE_OF_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_OF_H
+
+#include <linux/of.h>
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(of_node_ref_template,
+
+	TP_PROTO(int refcount, const char* dn_name),
+
+	TP_ARGS(refcount, dn_name),
+
+	TP_STRUCT__entry(
+		__string(dn_name, dn_name)
+		__field(int, refcount)
+	),
+
+	TP_fast_assign(
+		__assign_str(dn_name, dn_name);
+		__entry->refcount = refcount;
+	),
+
+	TP_printk("refcount=%d, dn->full_name=%s",
+		  __entry->refcount, __get_str(dn_name))
+);
+
+DEFINE_EVENT(of_node_ref_template, of_node_get,
+	     TP_PROTO(int refcount, const char* dn_name),
+	     TP_ARGS(refcount, dn_name));
+
+DEFINE_EVENT(of_node_ref_template, of_node_put,
+	     TP_PROTO(int refcount, const char* dn_name),
+	     TP_ARGS(refcount, dn_name));
+
+TRACE_EVENT(of_node_release,
+
+	TP_PROTO(struct device_node *dn),
+
+	TP_ARGS(dn),
+
+	TP_STRUCT__entry(
+		__string(dn_name, dn->full_name)
+		__field(unsigned long, flags)
+	),
+
+	TP_fast_assign(
+		__assign_str(dn_name, dn->full_name);
+		__entry->flags = dn->_flags;
+	),
+
+	TP_printk("dn->full_name=%s, dn->_flags=%lu", 
+		  __get_str(dn_name), __entry->flags)
+);
+
+#define of_reconfig_action_names \
+	{OF_RECONFIG_ATTACH_NODE, "ATTACH_NODE"}, \
+	{OF_RECONFIG_DETACH_NODE, "DETACH_NODE"}, \
+	{OF_RECONFIG_ADD_PROPERTY, "ADD_PROPERTY"}, \
+	{OF_RECONFIG_REMOVE_PROPERTY, "REMOVE_PROPERTY"}, \
+	{OF_RECONFIG_UPDATE_PROPERTY, "UPDATE_PROPERTY"}
+
+TRACE_EVENT(of_reconfig_notify,
+
+	TP_PROTO(unsigned long action, struct of_reconfig_data *ord),
+
+	TP_ARGS(action, ord),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, action)
+		__string(dn_name, ord->dn->full_name)
+		__string(prop_name, ord->prop ? ord->prop->name : "null")
+		__string(oldprop_name, ord->old_prop ? ord->old_prop->name : "null")
+	),
+
+	TP_fast_assign(
+		__entry->action = action;
+		__assign_str(dn_name, ord->dn->full_name);
+		__assign_str(prop_name, ord->prop ? ord->prop->name : "null");
+		__assign_str(oldprop_name, ord->old_prop ? ord->old_prop->name : "null");
+	),
+
+	TP_printk("action=%s, dn->full_name=%s, prop->name=%s, old_prop->name=%s",
+		  __print_symbolic(__entry->action, of_reconfig_action_names),
+		  __get_str(dn_name), __get_str(prop_name), __get_str(oldprop_name))
+);
+
+#endif /*	_TRACE_OF_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>