mbox series

[v4,0/6] Coresight: Support panic kdump

Message ID 1522379724-30648-1-git-send-email-leo.yan@linaro.org
Headers show
Series Coresight: Support panic kdump | expand

Message

Leo Yan March 30, 2018, 3:15 a.m. UTC
This patch set is to explore Coresight tracing data for postmortem
debugging.  When kernel panic happens, the Coresight panic kdump can
help to save on-chip tracing data and tracer metadata into DRAM, later
relies on kdump and crash/perf tools to recovery tracing data for
"offline" analysis.

The documentation is important to understand the purpose of Coresight
panic kdump, the implementation of framework and usage. Patches 0001
and patch 0002 are used for creating new sub directory for placing
Coresight docs and add a new doc for Coresight panic kdump.

Patch 0003 introduces the simple panic kdump framework which provides
helper functions can be used by Coresight devices, and it registers
panic notifier for dump tracing data.

Patches 0004/0005 support panic kdump for ETB; Patch 0006 supports the
kdump for ETMv4.

This patch set has been reworked by following suggestions at Linaro
HKG18 connect (mainly suggestions from Mathieu, thanks a lot!), and
it's rebased on acme git tree [1] with last commit 109d59b900e7 ('perf
vendor events s390: Add JSON files for IBM z14').

Due Coresight kdump data structure has been changed significantly, the
corresponding crash extension program also has been updated for this
reason [2]; furthermore the crash extension program is updated to
dynamically generate kernel buildid according to vmlinux elf info [3],
this is a fixing for the old code which uses hard-coded buildid value.

This patch set has been verified on 96boards Hikey620 with Coresight
enabling by the sysFS interface.  Also the updated crash extension
program has been verified to cowork with Coresight panic kdump and it
successfully extracts tracing data from the vmcore and finally can be
decoded by perf tool.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
[2] https://git.linaro.org/people/leo.yan/crash.git/tree/extensions/csdump.c
[3] https://git.linaro.org/people/leo.yan/crash.git/tree/extensions/csdump_buildid.c

Changes from v3:
* Following Mathieu suggestion, reworked the panic kdump framework,
  used kdump array to maintain source and sink device handlers;
* According to Mathieu suggestion, optimized panic notifier to
  firstly dump panic CPU tracing data and then dump other CPUs tracing
  data;
* Refined doc to reflect these implementation changes;
* Changed ETMv4 driver to add source device handler at probe phase;
* Refactored crash extension program to reflect kernel changes.

Changes from v2:
* Add the two patches for documentation.
* Following Mathieu suggestion, reworked the panic kdump framework,
  removed the useless flag "PRE_PANIC".
* According to comment, changed to add and delete kdump node operations
  in sink enable/disable functions;
* According to Mathieu suggestion, handle kdump node
  addition/deletion/updating separately for sysFS interface and perf
  method.

Changes from v1:
* Add support to dump ETMv4 meta data.
* Wrote 'crash' extension csdump.so so rely on it to generate 'perf'
  format compatible file.
* Refactored panic dump driver to support pre & post panic dump.

Changes from RFC:
* Follow Mathieu's suggestion, use general framework to support dump
  functionality.
* Changed to use perf to analyse trace data.

Leo Yan (6):
  doc: Add Coresight documentation directory
  doc: Add documentation for Coresight panic kdump
  coresight: Support panic kdump functionality
  coresight: tmc: Hook callback for panic kdump
  coresight: Set and clear sink device handler for kdump node
  coresight: etm4x: Support panic kdump

 Documentation/trace/coresight-cpu-debug.txt        | 187 ----------
 Documentation/trace/coresight.txt                  | 383 ---------------------
 .../trace/coresight/coresight-cpu-debug.txt        | 187 ++++++++++
 .../trace/coresight/coresight-panic-kdump.txt      | 130 +++++++
 Documentation/trace/coresight/coresight.txt        | 383 +++++++++++++++++++++
 MAINTAINERS                                        |   5 +-
 drivers/hwtracing/coresight/Kconfig                |   9 +
 drivers/hwtracing/coresight/Makefile               |   1 +
 drivers/hwtracing/coresight/coresight-etm-perf.c   |   5 +
 drivers/hwtracing/coresight/coresight-etm4x.c      |  27 ++
 drivers/hwtracing/coresight/coresight-etm4x.h      |  15 +
 .../hwtracing/coresight/coresight-panic-kdump.c    | 199 +++++++++++
 drivers/hwtracing/coresight/coresight-priv.h       |  12 +
 drivers/hwtracing/coresight/coresight-tmc-etf.c    |  30 ++
 drivers/hwtracing/coresight/coresight.c            |  16 +-
 include/linux/coresight.h                          |   4 +
 16 files changed, 1019 insertions(+), 574 deletions(-)
 delete mode 100644 Documentation/trace/coresight-cpu-debug.txt
 delete mode 100644 Documentation/trace/coresight.txt
 create mode 100644 Documentation/trace/coresight/coresight-cpu-debug.txt
 create mode 100644 Documentation/trace/coresight/coresight-panic-kdump.txt
 create mode 100644 Documentation/trace/coresight/coresight.txt
 create mode 100644 drivers/hwtracing/coresight/coresight-panic-kdump.c

Comments

Mathieu Poirier April 2, 2018, 4:40 p.m. UTC | #1
Hi Leo,

Please see below (and in upcoming patches) my comments related to your latest
work.

Thanks,
Mathieu

On Fri, Mar 30, 2018 at 11:15:18AM +0800, Leo Yan wrote:
> This patch set is to explore Coresight tracing data for postmortem
> debugging.  When kernel panic happens, the Coresight panic kdump can
> help to save on-chip tracing data and tracer metadata into DRAM, later
> relies on kdump and crash/perf tools to recovery tracing data for
> "offline" analysis.
> 
> The documentation is important to understand the purpose of Coresight
> panic kdump, the implementation of framework and usage. Patches 0001
> and patch 0002 are used for creating new sub directory for placing
> Coresight docs and add a new doc for Coresight panic kdump.
> 
> Patch 0003 introduces the simple panic kdump framework which provides
> helper functions can be used by Coresight devices, and it registers
> panic notifier for dump tracing data.
> 
> Patches 0004/0005 support panic kdump for ETB; Patch 0006 supports the
> kdump for ETMv4.
> 
> This patch set has been reworked by following suggestions at Linaro
> HKG18 connect (mainly suggestions from Mathieu, thanks a lot!), and
> it's rebased on acme git tree [1] with last commit 109d59b900e7 ('perf
> vendor events s390: Add JSON files for IBM z14').
> 
> Due Coresight kdump data structure has been changed significantly, the
> corresponding crash extension program also has been updated for this
> reason [2]; furthermore the crash extension program is updated to
> dynamically generate kernel buildid according to vmlinux elf info [3],
> this is a fixing for the old code which uses hard-coded buildid value.
> 
> This patch set has been verified on 96boards Hikey620 with Coresight
> enabling by the sysFS interface.  Also the updated crash extension
> program has been verified to cowork with Coresight panic kdump and it
> successfully extracts tracing data from the vmcore and finally can be
> decoded by perf tool.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
> [2] https://git.linaro.org/people/leo.yan/crash.git/tree/extensions/csdump.c
> [3] https://git.linaro.org/people/leo.yan/crash.git/tree/extensions/csdump_buildid.c
> 
> Changes from v3:
> * Following Mathieu suggestion, reworked the panic kdump framework,
>   used kdump array to maintain source and sink device handlers;
> * According to Mathieu suggestion, optimized panic notifier to
>   firstly dump panic CPU tracing data and then dump other CPUs tracing
>   data;
> * Refined doc to reflect these implementation changes;
> * Changed ETMv4 driver to add source device handler at probe phase;
> * Refactored crash extension program to reflect kernel changes.
> 
> Changes from v2:
> * Add the two patches for documentation.
> * Following Mathieu suggestion, reworked the panic kdump framework,
>   removed the useless flag "PRE_PANIC".
> * According to comment, changed to add and delete kdump node operations
>   in sink enable/disable functions;
> * According to Mathieu suggestion, handle kdump node
>   addition/deletion/updating separately for sysFS interface and perf
>   method.
> 
> Changes from v1:
> * Add support to dump ETMv4 meta data.
> * Wrote 'crash' extension csdump.so so rely on it to generate 'perf'
>   format compatible file.
> * Refactored panic dump driver to support pre & post panic dump.
> 
> Changes from RFC:
> * Follow Mathieu's suggestion, use general framework to support dump
>   functionality.
> * Changed to use perf to analyse trace data.
> 
> Leo Yan (6):
>   doc: Add Coresight documentation directory
>   doc: Add documentation for Coresight panic kdump
>   coresight: Support panic kdump functionality
>   coresight: tmc: Hook callback for panic kdump
>   coresight: Set and clear sink device handler for kdump node
>   coresight: etm4x: Support panic kdump
> 
>  Documentation/trace/coresight-cpu-debug.txt        | 187 ----------
>  Documentation/trace/coresight.txt                  | 383 ---------------------
>  .../trace/coresight/coresight-cpu-debug.txt        | 187 ++++++++++
>  .../trace/coresight/coresight-panic-kdump.txt      | 130 +++++++
>  Documentation/trace/coresight/coresight.txt        | 383 +++++++++++++++++++++

Please use the -M option with git format-patch in order to prevent the metrics
associated with the renaming of files to be tallied.

>  MAINTAINERS                                        |   5 +-
>  drivers/hwtracing/coresight/Kconfig                |   9 +
>  drivers/hwtracing/coresight/Makefile               |   1 +
>  drivers/hwtracing/coresight/coresight-etm-perf.c   |   5 +
>  drivers/hwtracing/coresight/coresight-etm4x.c      |  27 ++
>  drivers/hwtracing/coresight/coresight-etm4x.h      |  15 +
>  .../hwtracing/coresight/coresight-panic-kdump.c    | 199 +++++++++++
>  drivers/hwtracing/coresight/coresight-priv.h       |  12 +
>  drivers/hwtracing/coresight/coresight-tmc-etf.c    |  30 ++
>  drivers/hwtracing/coresight/coresight.c            |  16 +-
>  include/linux/coresight.h                          |   4 +
>  16 files changed, 1019 insertions(+), 574 deletions(-)
>  delete mode 100644 Documentation/trace/coresight-cpu-debug.txt
>  delete mode 100644 Documentation/trace/coresight.txt
>  create mode 100644 Documentation/trace/coresight/coresight-cpu-debug.txt
>  create mode 100644 Documentation/trace/coresight/coresight-panic-kdump.txt
>  create mode 100644 Documentation/trace/coresight/coresight.txt
>  create mode 100644 drivers/hwtracing/coresight/coresight-panic-kdump.c
> 
> -- 
> 2.7.4
>
Mathieu Poirier April 2, 2018, 8:22 p.m. UTC | #2
On Fri, Mar 30, 2018 at 11:15:21AM +0800, Leo Yan wrote:
> After kernel panic happens, Coresight tracing data has much useful info
> which can be used for analysis.  For example, the trace info from ETB
> RAM can be used to check the CPU execution flows before the crash.  So
> we can save the tracing data from sink devices, and rely on kdump to
> save DDR content and uses "crash" tool to extract Coresight dumping
> from the vmcore file.
> 
> This patch is to add a simple framework to support panic dump
> functionality; it registers panic notifier, and provide the helper
> functions coresight_kdump_source()/coresight_kdump_sink() so Coresight
> source and sink devices can be recorded into Coresight kdump node for
> kernel panic kdump.
> 
> When kernel panic happens, the notifier iterates dump array and invoke
> callback function to dump tracing data.  Later the tracing data can be
> used to reverse execution flow before the kernel panic.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  drivers/hwtracing/coresight/Kconfig                |   9 +
>  drivers/hwtracing/coresight/Makefile               |   1 +
>  .../hwtracing/coresight/coresight-panic-kdump.c    | 199 +++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-priv.h       |  12 ++
>  include/linux/coresight.h                          |   4 +
>  5 files changed, 225 insertions(+)
>  create mode 100644 drivers/hwtracing/coresight/coresight-panic-kdump.c
> 
> diff --git a/drivers/hwtracing/coresight/Kconfig b/drivers/hwtracing/coresight/Kconfig
> index ef9cb3c..3089abf 100644
> --- a/drivers/hwtracing/coresight/Kconfig
> +++ b/drivers/hwtracing/coresight/Kconfig
> @@ -103,4 +103,13 @@ config CORESIGHT_CPU_DEBUG
>  	  properly, please refer Documentation/trace/coresight-cpu-debug.txt
>  	  for detailed description and the example for usage.
>  
> +config CORESIGHT_PANIC_KDUMP
> +	bool "CoreSight Panic Kdump driver"
> +	depends on ARM || ARM64
> +	help
> +	  This driver provides panic kdump functionality for CoreSight devices.
> +	  When kernel panic happen Coresight device supplied callback function

s/Coresight/CoreSight

> +	  is to dump trace data to memory. From then on, kdump can be used to
> +	  extract the trace data from kernel dump file.
> +
>  endif
> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
> index 61db9dd..946fe19 100644
> --- a/drivers/hwtracing/coresight/Makefile
> +++ b/drivers/hwtracing/coresight/Makefile
> @@ -18,3 +18,4 @@ obj-$(CONFIG_CORESIGHT_SOURCE_ETM4X) += coresight-etm4x.o \
>  obj-$(CONFIG_CORESIGHT_DYNAMIC_REPLICATOR) += coresight-dynamic-replicator.o
>  obj-$(CONFIG_CORESIGHT_STM) += coresight-stm.o
>  obj-$(CONFIG_CORESIGHT_CPU_DEBUG) += coresight-cpu-debug.o
> +obj-$(CONFIG_CORESIGHT_PANIC_KDUMP) += coresight-panic-kdump.o
> diff --git a/drivers/hwtracing/coresight/coresight-panic-kdump.c b/drivers/hwtracing/coresight/coresight-panic-kdump.c
> new file mode 100644
> index 0000000..f4589e9
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-panic-kdump.c
> @@ -0,0 +1,199 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2017~2018 Linaro Limited.

I don't remember if I commented on this before but the above line (not the SPDX)
should be enclosed with C style comments (/* */) rather than C++ (//).

I would also add a new line between the copyright statement and the header file
listing.

> +#include <linux/coresight.h>
> +#include <linux/coresight-pmu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/init.h>
> +#include <linux/list.h>
> +#include <linux/mm.h>
> +#include <linux/perf_event.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +
> +#include "coresight-priv.h"
> +
> +/**
> + * struct coresight_kdump_node - Node information for dump
> + * @source_csdev:	Handler for source coresight device
> + * @sink_csdev:		Handler for sink coresight device
> + */
> +struct coresight_kdump_node {
> +	struct coresight_device *source_csdev;
> +	struct coresight_device *sink_csdev;
> +};
> +
> +static DEFINE_SPINLOCK(coresight_kdump_lock);
> +static struct coresight_kdump_node *coresight_kdump_nodes;
> +static struct notifier_block coresight_kdump_nb;
> +
> +/**
> + * coresight_kdump_source - Set source dump info for specific CPU
> + * @cpu:	CPU ID
> + * @csdev:	Source device structure handler
> + * @data:	Pointer for source device metadata buffer
> + * @data_sz:	Size of source device metadata buffer
> + *
> + * This function is a helper function which is used to set/clear source device
> + * handler and metadata when the tracer is enabled; and it can be used to clear
> + * source device related info when the tracer is disabled.
> + *
> + * Returns:	0 on success, negative errno otherwise.
> + */
> +int coresight_kdump_source(int cpu, struct coresight_device *csdev,
> +			   char *data, unsigned int data_sz)
> +{
> +	struct coresight_kdump_node *node;
> +	unsigned long flags;
> +
> +	if (!coresight_kdump_nodes)
> +		return -EPROBE_DEFER;

Before grabbing the lock you also need to make sure
@cpu is < num_possible_cpus().

> +
> +	spin_lock_irqsave(&coresight_kdump_lock, flags);
> +
> +	node = &coresight_kdump_nodes[cpu];
> +	node->source_csdev = csdev;
> +
> +	csdev->kdump_buf = data;
> +	csdev->kdump_buf_sz = data_sz;
> +
> +	spin_unlock_irqrestore(&coresight_kdump_lock, flags);
> +	return 0;
> +}
> +
> +/**
> + * coresight_kdump_sink - Set sink device handler for specific CPU
> + * @cpu:	CPU ID
> + * @csdev:	Sink device structure handler
> + *
> + * This function is a helper function which is used to set sink device handler
> + * when the Coresight path has been enabled for specific CPU; and it can be used
> + * to clear sink device handler when the path is disabled.
> + *
> + * Returns:	0 on success, negative errno otherwise.
> + */
> +int coresight_kdump_sink(int cpu, struct coresight_device *csdev)
> +{
> +	struct coresight_kdump_node *node;
> +	unsigned long flags;
> +
> +	if (!coresight_kdump_nodes)
> +		return -EPROBE_DEFER;

Same comment as above.

> +
> +	spin_lock_irqsave(&coresight_kdump_lock, flags);
> +
> +	node = &coresight_kdump_nodes[cpu];
> +	node->sink_csdev = csdev;

	csdev->kdump_buf = NULL;
        csdev->kdump_buf_sz = 0;

> +
> +	spin_unlock_irqrestore(&coresight_kdump_lock, flags);
> +	return 0;
> +}
> +
> +/**
> + * coresight_kdump_sink_cb - Invoke sink callback for specific CPU
> + * @cpu:	CPU ID
> + *
> + * This function is to invoke sink device corresponding callback.  It needs
> + * to check two cases: one case is the CPU has not been enabled for Coresight
> + * path so there totally has no trace data for the CPU, another case is the
> + * CPU shares the same sink device with other CPUs but the tracing data has
> + * been dumped by previous CPUs; it skips dump for these two cases.
> + */
> +static void coresight_kdump_sink_cb(int cpu)
> +{
> +	struct coresight_kdump_node *node;
> +	struct coresight_device *csdev;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&coresight_kdump_lock, flags);
> +
> +	node = &coresight_kdump_nodes[cpu];
> +	csdev = node->sink_csdev;
> +
> +	/* Path has not been enabled */
> +	if (!csdev)
> +		goto skip_dump;
> +
> +	/* Have been dumped by previous CPU */
> +	if (csdev->kdump_buf)

I would use csdev->kdump_buf_sz instead of csdev->kdump_buf.  The reason is that
the sink may not always use an internal SRAM buffer.  For instance the ETR uses
system RAM, either as a contiguous buffer or a scatter-gather list.  When the
panic callback for an ETR is implemented the code in the core (i.e this file)
need not change as kdump_buf_sz is independent of the way data is conveyed.

> +		goto skip_dump;
> +
> +	/* Invoke panic callback */
> +	csdev = coresight_kdump_nodes[cpu].sink_csdev;
> +	if (csdev && sink_ops(csdev)->panic_cb)
> +		sink_ops(csdev)->panic_cb(csdev);
> +
> +skip_dump:
> +	spin_unlock_irqrestore(&coresight_kdump_lock, flags);
> +}
> +
> +/**
> + * coresight_kdump_notify - Invoke panic dump callbacks
> + * @nb:		Pointer to notifier block
> + * @event:	Notification reason
> + * @_unused:	Pointer to notification data object, unused
> + *
> + * This function is called when panic happens to invoke dump callbacks, it takes
> + * panic CPU tracing data with high priority to firstly invoke panic CPU sink
> + * callback function, then the notifier iterates callback functions one by one
> + * for other CPUs.  If one sink device is shared among CPUs, the sink panic
> + * callback is invoked for the first traversed CPU node and other sequential
> + * CPUs are skipped.
> + *
> + * Returns:	0 on success.
> + */
> +static int coresight_kdump_notify(struct notifier_block *nb,
> +				  unsigned long event, void *_unused)
> +{
> +	int cpu, first;
> +
> +	/* Give panic CPU trace data with high priority */

I would replace the above comment with "Start with the panic'ed CPU".


> +	first = atomic_read(&panic_cpu);
> +	coresight_kdump_sink_cb(first);
> +
> +	/* Dump rest CPUs trace data */
> +	for (cpu = 0; cpu < num_possible_cpus(); cpu++) {
> +		if (cpu == first)
> +			continue;
> +
> +		coresight_kdump_sink_cb(cpu);
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * coresight_kdump_init - Coresight kdump module initialization
> + *
> + * This function allcoates dump array and register panic norifier.
> + *
> + * Returns:	0 on success, negative errno otherwise.
> + */
> +static int __init coresight_kdump_init(void)
> +{
> +	int ret;
> +
> +	coresight_kdump_nodes = kmalloc_array(num_possible_cpus(),
> +					      sizeof(*coresight_kdump_nodes),
> +					      GFP_KERNEL);
> +	if (!coresight_kdump_nodes) {
> +		pr_err("%s: kmalloc failed\n", __func__);
> +		return -ENOMEM;
> +	}
> +
> +	memset(coresight_kdump_nodes, 0,
> +	       num_possible_cpus() * sizeof(*coresight_kdump_nodes));

If you use kcalloc() above you don't need to explicitly zero out the memory.

> +
> +	coresight_kdump_nb.notifier_call = coresight_kdump_notify;
> +	ret = atomic_notifier_chain_register(&panic_notifier_list,
> +					     &coresight_kdump_nb);
> +	if (ret) {
> +		pr_err("%s: unable to register notifier: %d\n",
> +		       __func__, ret);
> +		kfree(coresight_kdump_nodes);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +postcore_initcall(coresight_kdump_init);
> diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h
> index f1d0e21d..76d27d6 100644
> --- a/drivers/hwtracing/coresight/coresight-priv.h
> +++ b/drivers/hwtracing/coresight/coresight-priv.h
> @@ -151,4 +151,16 @@ static inline int etm_readl_cp14(u32 off, unsigned int *val) { return 0; }
>  static inline int etm_writel_cp14(u32 off, u32 val) { return 0; }
>  #endif
>  
> +#ifdef CONFIG_CORESIGHT_PANIC_KDUMP
> +extern int coresight_kdump_source(int cpu, struct coresight_device *csdev,
> +				  char *data, unsigned int data_sz);
> +extern int coresight_kdump_sink(int cpu, struct coresight_device *csdev);
> +#else
> +static inline int coresight_kdump_source(int cpu,
> +		struct coresight_device *csdev,
> +		char *data, unsigned int data_sz) { return 0; }
> +static inline void coresight_kdump_sink(int cpu,
> +		struct coresight_device *csdev) { return 0; }

To me the above is harder to read - I suggest:

static inline int
coresight_kdump_source(int cpu,
                       struct coresight_device *csdev,
                       char *data, unsigned int data_sz) { return 0; }
static inline void
coresight_kdump_sink(int cpu,
                     struct coresight_device *csdev) { return 0; }

> +#endif
> +
>  #endif
> diff --git a/include/linux/coresight.h b/include/linux/coresight.h
> index d950dad..89aad8d 100644
> --- a/include/linux/coresight.h
> +++ b/include/linux/coresight.h
> @@ -171,6 +171,8 @@ struct coresight_device {
>  	bool orphan;
>  	bool enable;	/* true only if configured as part of a path */
>  	bool activated;	/* true only if a sink is part of a path */
> +	char *kdump_buf;
> +	unsigned int kdump_buf_sz;

Please add structure documentation, the same way all the other fields in this
structure is.

>  };
>  
>  #define to_coresight_device(d) container_of(d, struct coresight_device, dev)
> @@ -189,6 +191,7 @@ struct coresight_device {
>   * @set_buffer:		initialises buffer mechanic before a trace session.
>   * @reset_buffer:	finalises buffer mechanic after a trace session.
>   * @update_buffer:	update buffer pointers after a trace session.
> + * @panic_cb:		hook function for panic notifier.
>   */
>  struct coresight_ops_sink {
>  	int (*enable)(struct coresight_device *csdev, u32 mode);
> @@ -205,6 +208,7 @@ struct coresight_ops_sink {
>  	void (*update_buffer)(struct coresight_device *csdev,
>  			      struct perf_output_handle *handle,
>  			      void *sink_config);
> +	void (*panic_cb)(void *data);
>  };
>  
>  /**
> -- 
> 2.7.4
>
Mathieu Poirier April 2, 2018, 8:36 p.m. UTC | #3
On Fri, Mar 30, 2018 at 11:15:22AM +0800, Leo Yan wrote:
> Since Coresight panic kdump functionality has been ready, this patch is
> to hook panic callback function for ETB/ETF driver.  The driver data
> structure has allocated a buffer when the session started, so simply
> save tracing data into this buffer when panic happens and update buffer
> related info for kdump.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  drivers/hwtracing/coresight/coresight-tmc-etf.c | 30 +++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> index e2513b7..d20d546 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
> @@ -504,6 +504,35 @@ static void tmc_update_etf_buffer(struct coresight_device *csdev,
>  	CS_LOCK(drvdata->base);
>  }
>  
> +static void tmc_panic_cb(void *data)

I would call the function tmc_kdump_panic_cb()... That way there is absolutely
no confusion as to what it does.

> +{
> +	struct coresight_device *csdev = (struct coresight_device *)data;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	unsigned long flags;
> +
> +	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETB &&
> +			 drvdata->config_type != TMC_CONFIG_TYPE_ETF))
> +		return;
> +
> +	if (drvdata->mode == CS_MODE_DISABLED)
> +		return;

This is racy - between the check and acquiring the spinlock someone may beat you
to it.

> +
> +	spin_lock_irqsave(&drvdata->spinlock, flags);

        if (drvdata->mode == CS_MODE_DISABLED)
                goto out;

        drvdata->mode = CS_MODE_DISABLED

> +
> +	CS_UNLOCK(drvdata->base);
> +
> +	tmc_flush_and_stop(drvdata);
> +	tmc_etb_dump_hw(drvdata);
> +
> +	CS_LOCK(drvdata->base);
> +
> +	/* Update buffer info for panic dump */
> +	csdev->kdump_buf = drvdata->buf;
> +	csdev->kdump_buf_sz = drvdata->len;

out:
> +
> +	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> +}
> +
>  static const struct coresight_ops_sink tmc_etf_sink_ops = {
>  	.enable		= tmc_enable_etf_sink,
>  	.disable	= tmc_disable_etf_sink,
> @@ -512,6 +541,7 @@ static const struct coresight_ops_sink tmc_etf_sink_ops = {
>  	.set_buffer	= tmc_set_etf_buffer,
>  	.reset_buffer	= tmc_reset_etf_buffer,
>  	.update_buffer	= tmc_update_etf_buffer,
> +	.panic_cb	= tmc_panic_cb,
>  };
>  
>  static const struct coresight_ops_link tmc_etf_link_ops = {
> -- 
> 2.7.4
>
Mathieu Poirier April 2, 2018, 9:34 p.m. UTC | #4
On Fri, Mar 30, 2018 at 11:15:23AM +0800, Leo Yan wrote:
> If Coresight path is enabled for specific CPU, the sink device handler
> need to be set to kdump node; on the other hand we also need to clear
> sink device handler when path is disabled.
> 
> This patch sets sink devices handler for kdump node for two separate
> Coresight enabling modes: CS_MODE_SYSFS and CS_MODE_PERF; and clear the
> handler when Coresight is disabled.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  drivers/hwtracing/coresight/coresight-etm-perf.c |  5 +++++
>  drivers/hwtracing/coresight/coresight.c          | 16 ++++++++++++++--
>  2 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 8a0ad77..f8b159c 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -139,6 +139,8 @@ static void free_event_data(struct work_struct *work)
>  	for_each_cpu(cpu, mask) {
>  		if (!(IS_ERR_OR_NULL(event_data->path[cpu])))
>  			coresight_release_path(event_data->path[cpu]);
> +
> +		coresight_kdump_sink(cpu, NULL);
>  	}
>  
>  	kfree(event_data->path);
> @@ -238,6 +240,9 @@ static void *etm_setup_aux(int event_cpu, void **pages,
>  		event_data->path[cpu] = coresight_build_path(csdev, sink);
>  		if (IS_ERR(event_data->path[cpu]))
>  			goto err;
> +
> +		if (coresight_kdump_sink(cpu, sink))
> +			goto err;

I remember telling you to use free_event_data() and etm_setup_aux().  _Maybe_ it
made sense in the previous patchset but in this one it won't work.  We need to
reflect the current trace context, as such use etm_event_start() and
etm_event_stop().

In etm_event_start() call coresight_kdump_sink(cpu, sink) just before
source_ops(csdev)->enable().  Similarly call coresight_kdump_sink(cpu, NULL)
right after source_ops(csdev)->disable() in etm_event_stop().

Find me on IRC if you want more information on this.

>  	}
>  
>  	if (!sink_ops(sink)->alloc_buffer)
> diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c
> index 389c4ba..483a1f7 100644
> --- a/drivers/hwtracing/coresight/coresight.c
> +++ b/drivers/hwtracing/coresight/coresight.c
> @@ -272,6 +272,7 @@ static int coresight_enable_source(struct coresight_device *csdev, u32 mode)
>  static bool coresight_disable_source(struct coresight_device *csdev)
>  {
>  	if (atomic_dec_return(csdev->refcnt) == 0) {
> +

This newline shouldn't be part of this set.

>  		if (source_ops(csdev)->disable)
>  			source_ops(csdev)->disable(csdev, NULL);
>  		csdev->enable = false;
> @@ -612,6 +613,13 @@ int coresight_enable(struct coresight_device *csdev)
>  	if (ret)
>  		goto err_source;
>  
> +	cpu = source_ops(csdev)->cpu_id(csdev);
> +
> +	/* Set sink device handler into kdump node */
> +	ret = coresight_kdump_sink(cpu, sink);
> +	if (ret)
> +		goto err_kdump;
> +

Call coresight_kdump_sink() just before coresight_enable_source().  That way if
there is a dump just after coresight_enable_source() is called we get the chance
of getting some traces in the dump file. 

>  	switch (subtype) {
>  	case CORESIGHT_DEV_SUBTYPE_SOURCE_PROC:
>  		/*
> @@ -621,7 +629,6 @@ int coresight_enable(struct coresight_device *csdev)
>  		 * be a single session per tracer (when working from sysFS)
>  		 * a per-cpu variable will do just fine.
>  		 */
> -		cpu = source_ops(csdev)->cpu_id(csdev);
>  		per_cpu(tracer_path, cpu) = path;
>  		break;
>  	case CORESIGHT_DEV_SUBTYPE_SOURCE_SOFTWARE:
> @@ -636,6 +643,9 @@ int coresight_enable(struct coresight_device *csdev)
>  	mutex_unlock(&coresight_mutex);
>  	return ret;
>  
> +err_kdump:
> +	coresight_disable_source(csdev);
> +
>  err_source:
>  	coresight_disable_path(path);
>  
> @@ -659,9 +669,10 @@ void coresight_disable(struct coresight_device *csdev)
>  	if (!csdev->enable || !coresight_disable_source(csdev))
>  		goto out;
>  
> +	cpu = source_ops(csdev)->cpu_id(csdev);
> +
>  	switch (csdev->subtype.source_subtype) {
>  	case CORESIGHT_DEV_SUBTYPE_SOURCE_PROC:
> -		cpu = source_ops(csdev)->cpu_id(csdev);
>  		path = per_cpu(tracer_path, cpu);
>  		per_cpu(tracer_path, cpu) = NULL;
>  		break;
> @@ -674,6 +685,7 @@ void coresight_disable(struct coresight_device *csdev)
>  		break;
>  	}
>  
> +	coresight_kdump_sink(cpu, NULL);
>  	coresight_disable_path(path);
>  	coresight_release_path(path);
>  
> -- 
> 2.7.4
>
Mathieu Poirier April 2, 2018, 9:57 p.m. UTC | #5
On Fri, Mar 30, 2018 at 11:15:24AM +0800, Leo Yan wrote:
> ETMv4 hardware information and configuration needs to be saved as
> metadata; the metadata format should be compatible with 'perf' tool and
> finally is used by tracing data decoder.  ETMv4 works as tracer per CPU,
> we cannot wait for gathering ETM info after CPU panic has happened in
> case there have CPU is locked up and can't response inter-processor
> interrupt for execution dump operations; so it's more reliable to gather
> tracer metadata when all of the CPUs are alive.
> 
> This patch saves ETMv4 metadata but with the different method for
> different registers.  Since values in TRCIDR{0, 1, 2, 8} and
> TRCAUTHSTATUS are read-only and won't change afterward, thus those
> registers values are filled into metadata structure when tracers are
> instantiated.  The configuration and control registers TRCCONFIGR and
> TRCTRACEIDR are dynamically configured, their values are recorded during
> tracer enabling phase.
> 
> To avoid unnecessary overload introduced by set/clear operations for
> updating kdump node, we only set ETMv4 metadata info for the
> corresponding kdump node at initialization and won't be cleared anymore.
> 
> Suggested-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  drivers/hwtracing/coresight/coresight-etm4x.c | 27 +++++++++++++++++++++++++++
>  drivers/hwtracing/coresight/coresight-etm4x.h | 15 +++++++++++++++
>  2 files changed, 42 insertions(+)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
> index cf364a5..88b1e19 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x.c
> @@ -288,6 +288,8 @@ static int etm4_enable(struct coresight_device *csdev,
>  	int ret;
>  	u32 val;
>  	struct etmv4_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> +	struct etmv4_config *config = &drvdata->config;
> +	struct etmv4_metadata *metadata = &drvdata->metadata;
>  
>  	val = local_cmpxchg(&drvdata->mode, CS_MODE_DISABLED, mode);
>  
> @@ -306,6 +308,10 @@ static int etm4_enable(struct coresight_device *csdev,
>  		ret = -EINVAL;
>  	}
>  
> +	/* Update tracer meta data after tracer configuration */
> +	metadata->trcconfigr = config->cfg;
> +	metadata->trctraceidr = drvdata->trcid;
> +
>  	/* The tracer didn't start */
>  	if (ret)
>  		local_set(&drvdata->mode, CS_MODE_DISABLED);
> @@ -438,6 +444,7 @@ static void etm4_init_arch_data(void *info)
>  	u32 etmidr4;
>  	u32 etmidr5;
>  	struct etmv4_drvdata *drvdata = info;
> +	struct etmv4_metadata *metadata = &drvdata->metadata;
>  
>  	/* Make sure all registers are accessible */
>  	etm4_os_unlock(drvdata);
> @@ -590,6 +597,16 @@ static void etm4_init_arch_data(void *info)
>  	drvdata->nrseqstate = BMVAL(etmidr5, 25, 27);
>  	/* NUMCNTR, bits[30:28] number of counters available for tracing */
>  	drvdata->nr_cntr = BMVAL(etmidr5, 28, 30);
> +
> +	/* Update metadata */
> +	metadata->magic = ETM4_METADATA_MAGIC;
> +	metadata->cpu = drvdata->cpu;
> +	metadata->trcidr0 = readl_relaxed(drvdata->base + TRCIDR0);
> +	metadata->trcidr1 = readl_relaxed(drvdata->base + TRCIDR1);
> +	metadata->trcidr2 = readl_relaxed(drvdata->base + TRCIDR2);
> +	metadata->trcidr8 = readl_relaxed(drvdata->base + TRCIDR8);
> +	metadata->trcauthstatus = readl_relaxed(drvdata->base + TRCAUTHSTATUS);
> +
>  	CS_LOCK(drvdata->base);
>  }
>  
> @@ -957,6 +974,7 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id)
>  	struct device *dev = &adev->dev;
>  	struct coresight_platform_data *pdata = NULL;
>  	struct etmv4_drvdata *drvdata;
> +	struct etmv4_metadata *metadata;
>  	struct resource *res = &adev->res;
>  	struct coresight_desc desc = { 0 };
>  	struct device_node *np = adev->dev.of_node;
> @@ -1027,6 +1045,15 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id)
>  		goto err_arch_supported;
>  	}
>  
> +	/* Set source device handler and metadata into kdump node */
> +	metadata = &drvdata->metadata;
> +	ret = coresight_kdump_source(drvdata->cpu, drvdata->csdev,
> +				     (char *)metadata, sizeof(*metadata));
> +	if (ret) {
> +		coresight_unregister(drvdata->csdev);
> +		goto err_arch_supported;
> +	}
> +
>  	ret = etm_perf_symlink(drvdata->csdev, true);
>  	if (ret) {
>  		coresight_unregister(drvdata->csdev);
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
> index b3b5ea7..08dc8b7 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x.h
> +++ b/drivers/hwtracing/coresight/coresight-etm4x.h
> @@ -198,6 +198,20 @@
>  #define ETM_EXLEVEL_NS_HYP		BIT(14)
>  #define ETM_EXLEVEL_NS_NA		BIT(15)
>  
> +#define ETM4_METADATA_MAGIC		0x4040404040404040ULL

Please dd a comment that this needs to be kept in sync with
__perf_cs_etmv4_magic in tools/perf/util/cs-etm.h.

> +
> +struct etmv4_metadata {
> +	u64 magic;
> +	u64 cpu;
> +	u64 trcconfigr;
> +	u64 trctraceidr;
> +	u64 trcidr0;
> +	u64 trcidr1;
> +	u64 trcidr2;
> +	u64 trcidr8;
> +	u64 trcauthstatus;
> +};
> +
>  /**
>   * struct etmv4_config - configuration information related to an ETMv4
>   * @mode:	Controls various modes supported by this ETM.
> @@ -393,6 +407,7 @@ struct etmv4_drvdata {
>  	bool				atbtrig;
>  	bool				lpoverride;
>  	struct etmv4_config		config;
> +	struct etmv4_metadata		metadata;

Structure documentation please.

>  };
>  
>  /* Address comparator access types */
> -- 
> 2.7.4
>