diff mbox

[2/4] perf: jevents: Program to convert JSON file to C style file

Message ID 1432080130-6678-3-git-send-email-sukadev@linux.vnet.ibm.com (mailing list archive)
State Not Applicable
Headers show

Commit Message

Sukadev Bhattiprolu May 20, 2015, 12:02 a.m. UTC
From: Andi Kleen <ak@linux.intel.com>

This is a modified version of an earlier patch by Andi Kleen.

We expect architectures to describe the performance monitoring events
for each CPU in a corresponding JSON file, which looks like:

	[
	{
	"EventCode": "0x00",
	"UMask": "0x01",
	"EventName": "INST_RETIRED.ANY",
	"BriefDescription": "Instructions retired from execution.",
	"PublicDescription": "Instructions retired from execution.",
	"Counter": "Fixed counter 1",
	"CounterHTOff": "Fixed counter 1",
	"SampleAfterValue": "2000003",
	"SampleAfterValue": "2000003",
	"MSRIndex": "0",
	"MSRValue": "0",
	"TakenAlone": "0",
	"CounterMask": "0",
	"Invert": "0",
	"AnyThread": "0",
	"EdgeDetect": "0",
	"PEBS": "0",
	"PRECISE_STORE": "0",
	"Errata": "null",
	"Offcore": "0"
	}
	]

We also expect the architectures to provide a mapping from individual
CPUs to their JSON files, e.g.:

	GenuineIntel-6-1E,V1,/NHM-EP/NehalemEP_core_V1.json,core

which maps each CPU, identified by [vendor, family, model, version, type]
to a JSON file.

Given these files, the jevents program:
	- locates all JSON files for the architecture;
	- parses each JSON file and generates a C-style "PMU-events table"
	  (pmu-events.c);
	- locates a mapfile for the architecture;
	- builds a global table mapping each CPU model to the
	  corresponding PMU-events table.

The pmu-events.c file is generated when building perf and linked into
libperf.a. The global pmu_events_map[] table in pmu-events.c will be
used by perf in a follow-on patch.
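For reference, the generated pmu-events.c would contain content along
these lines. This is a hypothetical sketch pieced together from the
fprintf() calls quoted later in the thread; the real struct layout lives
in the new pmu-events.h, and the table and event names below are made up:

```c
#include <string.h>

/* Sketch only: inferred from process_mapfile()'s output; real
 * definitions are in tools/perf/pmu-events/pmu-events.h. */
struct pmu_event {
	const char *name;
	const char *event;
	const char *desc;
};

struct pmu_events_map {
	const char *vfm;	/* vendor-family-model, e.g. "GenuineIntel-6-1E" */
	const char *version;
	const char *type;	/* e.g. "core" */
	const struct pmu_event *table;
};

/* One per-CPU table generated from a JSON file (name hypothetical) */
static const struct pmu_event pme_nehalemep_core_v1[] = {
	{
		.name = "inst_retired.any",
		.event = "event=0x0,umask=0x1",
		.desc = "Instructions retired from execution.",
	},
	{ .name = NULL },	/* sentinel */
};

/* The global map built from the mapfile */
const struct pmu_events_map pmu_events_map[] = {
	{
		.vfm = "GenuineIntel-6-1E",
		.version = "V1",
		.type = "core",
		.table = pme_nehalemep_core_v1,
	},
	{ .vfm = NULL },	/* sentinel */
};

/* The run-time lookup perf could do against the local CPU id string */
const struct pmu_event *find_events_table(const char *cpuid)
{
	int i;

	for (i = 0; pmu_events_map[i].vfm; i++)
		if (!strcmp(pmu_events_map[i].vfm, cpuid))
			return pmu_events_map[i].table;
	return NULL;
}
```

The sentinel entries let perf walk both tables without a separate length
field, which keeps the generated file simple.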

If the architecture does not have any JSON files, or there is an error
in processing them, an output file with an empty mapping table is
created. This allows the perf build to proceed even if we are unable to
provide aliases for events.

The parser handles Intel-style JSON event files, so an Intel event list
can be used directly with perf. These event lists can be quite large and
are too big to store in unswappable kernel memory.

The conversion from JSON to C is straightforward. The parser knows very
little Intel-specific information and can easily be extended to handle
fields for other CPUs.

The parser code is partially shared with an independent parsing library,
which is 2-clause BSD licensed. To avoid any conflicts I marked those
files as BSD licensed too. As part of perf they become GPLv2.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>

v2: Address review feedback. Rename option to --event-files
v3: Add JSON example
v4: Update manpages.
v5: Don't remove dot in fixname. Fix compile error. Add include
	protection. Comment realloc.
v6: Include debug/util.h
v7: (Sukadev Bhattiprolu)
	Rebase to 4.0 and fix some conflicts.
v8: (Sukadev Bhattiprolu)
	Move jevents.[hc] to tools/perf/pmu-events/
	Rewrite to locate and process arch specific JSON and "map" files;
	and generate a C file.
	(Removed acked-by Namhyung Kim due to modest changes to patch)
	Compile the generated pmu-events.c and add the pmu-events.o to
	libperf.a
---
 tools/perf/Build                   |    1 +
 tools/perf/Makefile.perf           |    4 +-
 tools/perf/pmu-events/Build        |   38 ++
 tools/perf/pmu-events/README       |   67 ++++
 tools/perf/pmu-events/jevents.c    |  700 ++++++++++++++++++++++++++++++++++++
 tools/perf/pmu-events/jevents.h    |   17 +
 tools/perf/pmu-events/pmu-events.h |   39 ++
 7 files changed, 865 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/pmu-events/Build
 create mode 100644 tools/perf/pmu-events/README
 create mode 100644 tools/perf/pmu-events/jevents.c
 create mode 100644 tools/perf/pmu-events/jevents.h
 create mode 100644 tools/perf/pmu-events/pmu-events.h

Comments

Jiri Olsa May 22, 2015, 2:56 p.m. UTC | #1
On Tue, May 19, 2015 at 05:02:08PM -0700, Sukadev Bhattiprolu wrote:

SNIP

> +int main(int argc, char *argv[])
> +{
> +	int rc;
> +	int flags;

SNIP

> +
> +	rc = uname(&uts);
> +	if (rc < 0) {
> +		printf("%s: uname() failed: %s\n", argv[0], strerror(errno));
> +		goto empty_map;
> +	}
> +
> +	/* TODO: Add other flavors of machine type here */
> +	if (!strcmp(uts.machine, "ppc64"))
> +		arch = "powerpc";
> +	else if (!strcmp(uts.machine, "i686"))
> +		arch = "x86";
> +	else if (!strcmp(uts.machine, "x86_64"))
> +		arch = "x86";
> +	else {
> +		printf("%s: Unknown architecture %s\n", argv[0], uts.machine);
> +		goto empty_map;
> +	}

hum, wouldnt it be easier to pass the arch directly from the Makefile,
we should have it ready in the $(ARCH) variable..

jirka
Sukadev Bhattiprolu May 22, 2015, 5:25 p.m. UTC | #2
Jiri Olsa [jolsa@redhat.com] wrote:
| On Tue, May 19, 2015 at 05:02:08PM -0700, Sukadev Bhattiprolu wrote:
| 
| SNIP
| 
| > +int main(int argc, char *argv[])
| > +{
| > +	int rc;
| > +	int flags;
| 
| SNIP
| 
| > +
| > +	rc = uname(&uts);
| > +	if (rc < 0) {
| > +		printf("%s: uname() failed: %s\n", argv[0], strerror(errno));
| > +		goto empty_map;
| > +	}
| > +
| > +	/* TODO: Add other flavors of machine type here */
| > +	if (!strcmp(uts.machine, "ppc64"))
| > +		arch = "powerpc";
| > +	else if (!strcmp(uts.machine, "i686"))
| > +		arch = "x86";
| > +	else if (!strcmp(uts.machine, "x86_64"))
| > +		arch = "x86";
| > +	else {
| > +		printf("%s: Unknown architecture %s\n", argv[0], uts.machine);
| > +		goto empty_map;
| > +	}
| 
| hum, wouldnt it be easier to pass the arch directly from the Makefile,
| we should have it ready in the $(ARCH) variable..

Yes, I will do that and make all three args (arch, start_dir, output_file)
mandatory (jevents won't be run from command line often, it doesn't need
default args).
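As a sketch of that change, making all three arguments mandatory could
look like the following. The function name and usage text are
hypothetical, not taken from any posted version of the patch:

```c
#include <stdio.h>

/* Sketch only: argument handling if arch, start_dir and output_file
 * all become mandatory, with arch passed in from $(ARCH) by the
 * Makefile instead of being derived from uname(). */
int parse_args(int argc, char *argv[],
	       const char **arch, const char **start_dir,
	       const char **output_file)
{
	if (argc != 4) {
		fprintf(stderr,
			"Usage: %s <arch> <start_dir> <output_file>\n",
			argv[0]);
		return -1;
	}
	*arch = argv[1];
	*start_dir = argv[2];
	*output_file = argv[3];
	return 0;
}
```

With this, the Makefile would invoke something like
`jevents x86 pmu-events/arch pmu-events.c`, and no uname()-based
architecture guessing is needed.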
Namhyung Kim May 27, 2015, 1:54 p.m. UTC | #3
Hi Sukadev,

On Tue, May 19, 2015 at 05:02:08PM -0700, Sukadev Bhattiprolu wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> This is a modified version of an earlier patch by Andi Kleen.
> 
> We expect architectures to describe the performance monitoring events
> for each CPU in a corresponding JSON file, which look like:
> 
> 	[
> 	{
> 	"EventCode": "0x00",
> 	"UMask": "0x01",
> 	"EventName": "INST_RETIRED.ANY",
> 	"BriefDescription": "Instructions retired from execution.",
> 	"PublicDescription": "Instructions retired from execution.",
> 	"Counter": "Fixed counter 1",
> 	"CounterHTOff": "Fixed counter 1",
> 	"SampleAfterValue": "2000003",
> 	"SampleAfterValue": "2000003",
> 	"MSRIndex": "0",
> 	"MSRValue": "0",
> 	"TakenAlone": "0",
> 	"CounterMask": "0",
> 	"Invert": "0",
> 	"AnyThread": "0",
> 	"EdgeDetect": "0",
> 	"PEBS": "0",
> 	"PRECISE_STORE": "0",
> 	"Errata": "null",
> 	"Offcore": "0"
> 	}
> 	]
> 
> We also expect the architectures to provide a mapping between individual
> CPUs to their JSON files. Eg:
> 
> 	GenuineIntel-6-1E,V1,/NHM-EP/NehalemEP_core_V1.json,core
> 
> which maps each CPU, identified by [vendor, family, model, version, type]
> to a JSON file.
> 
> Given these files, the program, jevents::
> 	- locates all JSON files for the architecture,
> 	- parses each JSON file and generates a C-style "PMU-events table"
> 	  (pmu-events.c)
> 	- locates a mapfile for the architecture
> 	- builds a global table, mapping each model of CPU to the
> 	  corresponding PMU-events table.

So we build tables of all models in the architecture, and choose the
matching one when compiling perf, right?  Can't we do that when
building the tables?  IOW, why don't we check the VFM and discard
non-matching tables?  Are those non-matching tables also needed?

Sorry if I missed something..


> 
> The 'pmu-events.c' is generated when building perf and added to libperf.a.
> The global table pmu_events_map[] table in this pmu-events.c will be used
> in perf in a follow-on patch.
> 
> If the architecture does not have any JSON files or there is an error in
> processing them, an empty mapping file is created. This would allow the
> build of perf to proceed even if we are not able to provide aliases for
> events.
> 
> The parser for JSON files allows parsing Intel style JSON event files. This
> allows to use an Intel event list directly with perf. The Intel event lists
> can be quite large and are too big to store in unswappable kernel memory.
> 
> The conversion from JSON to C-style is straight forward.  The parser knows
> (very little) Intel specific information, and can be easily extended to
> handle fields for other CPUs.
> 
> The parser code is partially shared with an independent parsing library,
> which is 2-clause BSD licenced. To avoid any conflicts I marked those
> files as BSD licenced too. As part of perf they become GPLv2.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> 
> v2: Address review feedback. Rename option to --event-files
> v3: Add JSON example
> v4: Update manpages.
> v5: Don't remove dot in fixname. Fix compile error. Add include
> 	protection. Comment realloc.
> v6: Include debug/util.h
> v7: (Sukadev Bhattiprolu)
> 	Rebase to 4.0 and fix some conflicts.
> v8: (Sukadev Bhattiprolu)
> 	Move jevents.[hc] to tools/perf/pmu-events/
> 	Rewrite to locate and process arch specific JSON and "map" files;
> 	and generate a C file.
> 	(Removed acked-by Namhyung Kim due to modest changes to patch)
> 	Compile the generated pmu-events.c and add the pmu-events.o to
> 	libperf.a
> ---

[SNIP]
> +/* Call func with each event in the json file */
> +int json_events(const char *fn,
> +	  int (*func)(void *data, char *name, char *event, char *desc),
> +	  void *data)
> +{
> +	int err = -EIO;
> +	size_t size;
> +	jsmntok_t *tokens, *tok;
> +	int i, j, len;
> +	char *map;
> +
> +	if (!fn)
> +		return -ENOENT;
> +
> +	tokens = parse_json(fn, &map, &size, &len);
> +	if (!tokens)
> +		return -EIO;
> +	EXPECT(tokens->type == JSMN_ARRAY, tokens, "expected top level array");
> +	tok = tokens + 1;
> +	for (i = 0; i < tokens->size; i++) {
> +		char *event = NULL, *desc = NULL, *name = NULL;
> +		struct msrmap *msr = NULL;
> +		jsmntok_t *msrval = NULL;
> +		jsmntok_t *precise = NULL;
> +		jsmntok_t *obj = tok++;
> +
> +		EXPECT(obj->type == JSMN_OBJECT, obj, "expected object");
> +		for (j = 0; j < obj->size; j += 2) {
> +			jsmntok_t *field, *val;
> +			int nz;
> +
> +			field = tok + j;
> +			EXPECT(field->type == JSMN_STRING, tok + j,
> +			       "Expected field name");
> +			val = tok + j + 1;
> +			EXPECT(val->type == JSMN_STRING, tok + j + 1,
> +			       "Expected string value");
> +
> +			nz = !json_streq(map, val, "0");
> +			if (match_field(map, field, nz, &event, val)) {
> +				/* ok */
> +			} else if (json_streq(map, field, "EventName")) {
> +				addfield(map, &name, "", "", val);
> +			} else if (json_streq(map, field, "BriefDescription")) {
> +				addfield(map, &desc, "", "", val);
> +				fixdesc(desc);
> +			} else if (json_streq(map, field, "PEBS") && nz) {
> +				precise = val;
> +			} else if (json_streq(map, field, "MSRIndex") && nz) {
> +				msr = lookup_msr(map, val);
> +			} else if (json_streq(map, field, "MSRValue")) {
> +				msrval = val;
> +			} else if (json_streq(map, field, "Errata") &&
> +				   !json_streq(map, val, "null")) {
> +				addfield(map, &desc, ". ",
> +					" Spec update: ", val);
> +			} else if (json_streq(map, field, "Data_LA") && nz) {
> +				addfield(map, &desc, ". ",
> +					" Supports address when precise",
> +					NULL);
> +			}

Wouldn't it be better to split arch-specific fields and put them
somewhere in an arch directory?

> +			/* ignore unknown fields */
> +		}
> +		if (precise && !strstr(desc, "(Precise Event)")) {
> +			if (json_streq(map, precise, "2"))
> +				addfield(map, &desc, " ", "(Must be precise)",
> +						NULL);
> +			else
> +				addfield(map, &desc, " ",
> +						"(Precise event)", NULL);
> +		}
> +		if (msr != NULL)
> +			addfield(map, &event, ",", msr->pname, msrval);
> +		fixname(name);
> +		err = func(data, name, event, desc);
> +		free(event);
> +		free(desc);
> +		free(name);
> +		if (err)
> +			break;
> +		tok += j;
> +	}
> +	EXPECT(tok - tokens == len, tok, "unexpected objects at end");
> +	err = 0;
> +out_free:
> +	free_json(map, size, tokens);
> +	return err;
> +}

[SNIP]
> +static int process_mapfile(FILE *outfp, char *fpath)
> +{
> +	int n = 16384;
> +	FILE *mapfp;
> +	char *save;
> +	char *line, *p;
> +	int line_num;
> +	char *tblname;
> +
> +	printf("Processing mapfile %s\n", fpath);
> +
> +	line = malloc(n);
> +	if (!line)
> +		return -1;
> +
> +	mapfp = fopen(fpath, "r");
> +	if (!mapfp) {
> +		printf("Error %s opening %s\n", strerror(errno), fpath);
> +		return -1;
> +	}
> +
> +	print_mapping_table_prefix(outfp);
> +
> +	line_num = 0;
> +	while (1) {
> +		char *vfm, *version, *type, *fname;
> +
> +		line_num++;
> +		p = fgets(line, n, mapfp);
> +		if (!p)
> +			break;
> +
> +		if (line[0] == '#')
> +			continue;
> +
> +		if (line[strlen(line)-1] != '\n') {
> +			/* TODO Deal with lines longer than 16K */
> +			printf("Mapfile %s: line %d too long, aborting\n",
> +					fpath, line_num);
> +			return -1;
> +		}
> +		line[strlen(line)-1] = '\0';
> +
> +		vfm = strtok_r(p, ",", &save);
> +		version = strtok_r(NULL, ",", &save);
> +		fname = strtok_r(NULL, ",", &save);
> +		type = strtok_r(NULL, ",", &save);
> +
> +		tblname = file_name_to_table_name(fname);
> +		fprintf(outfp, "{\n");
> +		fprintf(outfp, "\t.vfm = \"%s\",\n", vfm);
> +		fprintf(outfp, "\t.version = \"%s\",\n", version);
> +		fprintf(outfp, "\t.type = \"%s\",\n", type);
> +
> +		/*
> +		 * CHECK: We can't use the type (eg "core") field in the
> +		 * table name. For us to do that, we need to somehow tweak
> +		 * the other caller of file_name_to_table(), process_json()
> +		 * to determine the type. process_json() file has no way
> +		 * of knowing these are "core" events unless file name has
> +		 * core in it. If filename has core in it, we can safely
> +		 * ignore the type field here also.
> +		 */
> +		fprintf(outfp, "\t.table = %s\n", tblname);
> +		fprintf(outfp, "},\n");
> +	}
> +
> +	print_mapping_table_suffix(outfp);
> +

You need to free 'line' for each return path..

Thanks,
Namhyung


> +	return 0;
> +}
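The leak Namhyung points out is conventionally fixed with a single exit
label so that 'line' is freed on every path. A simplified sketch, with
the per-line parsing elided:

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of process_mapfile() restructured so that 'line' (and the
 * open mapfile) are released on every return path. */
int process_mapfile_sketch(const char *fpath)
{
	int n = 16384;
	int ret = -1;
	FILE *mapfp = NULL;
	char *line;

	line = malloc(n);
	if (!line)
		return -1;

	mapfp = fopen(fpath, "r");
	if (!mapfp)
		goto out;	/* 'line' still freed below */

	/* ... per-line strtok_r() parsing; any error does 'goto out' ... */

	ret = 0;
out:
	if (mapfp)
		fclose(mapfp);
	free(line);
	return ret;
}
```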
Andi Kleen May 27, 2015, 2:40 p.m. UTC | #4
> So we build tables of all models in the architecture, and choose
> matching one when compiling perf, right?  Can't we do that when
> building the tables?  IOW, why don't we check the VFM and discard
> non-matching tables?  Those non-matching tables are also needed?

We build it for all cpus in an architecture, not all architectures.
So e.g. for an x86 binary power is not included, and vice versa.
It always includes all CPUs for a given architecture, so it's possible
to use the perf binary on other systems than just the one it was
built on.

-andi
Namhyung Kim May 27, 2015, 2:59 p.m. UTC | #5
Hi Andi,

On Wed, May 27, 2015 at 11:40 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> So we build tables of all models in the architecture, and choose
>> matching one when compiling perf, right?  Can't we do that when
>> building the tables?  IOW, why don't we check the VFM and discard
>> non-matching tables?  Those non-matching tables are also needed?
>
> We build it for all cpus in an architecture, not all architectures.
> So e.g. for an x86 binary power is not included, and vice versa.

OK.

> It always includes all CPUs for a given architecture, so it's possible
> to use the perf binary on other systems than just the one it was
> build on.

So it selects one at run-time not build-time, good.  But I worry about
the size of the intel tables.  How large are they?  Maybe we can make
it dynamic-loadable if needed..

Thanks,
Namhyung
Jiri Olsa May 28, 2015, 11:52 a.m. UTC | #6
On Wed, May 27, 2015 at 11:59:04PM +0900, Namhyung Kim wrote:
> Hi Andi,
> 
> On Wed, May 27, 2015 at 11:40 PM, Andi Kleen <ak@linux.intel.com> wrote:
> >> So we build tables of all models in the architecture, and choose
> >> matching one when compiling perf, right?  Can't we do that when
> >> building the tables?  IOW, why don't we check the VFM and discard
> >> non-matching tables?  Those non-matching tables are also needed?
> >
> > We build it for all cpus in an architecture, not all architectures.
> > So e.g. for an x86 binary power is not included, and vice versa.
> 
> OK.
> 
> > It always includes all CPUs for a given architecture, so it's possible
> > to use the perf binary on other systems than just the one it was
> > build on.
> 
> So it selects one at run-time not build-time, good.  But I worry about
> the size of the intel tables.  How large are they?  Maybe we can make
> it dynamic-loadable if needed..

just compiled Sukadev's new version with Andi's events list
and stripped binary size is:

[jolsa@krava perf]$ ls -l perf
-rwxrwxr-x 1 jolsa jolsa 2772640 May 28 13:49 perf


while perf on Arnaldo's perf/core is:

[jolsa@krava perf]$ ls -l perf
-rwxrwxr-x 1 jolsa jolsa 2334816 May 28 13:49 perf


seems not that bad

jirka
Ingo Molnar May 28, 2015, 12:09 p.m. UTC | #7
* Jiri Olsa <jolsa@redhat.com> wrote:

> On Wed, May 27, 2015 at 11:59:04PM +0900, Namhyung Kim wrote:
> > Hi Andi,
> > 
> > On Wed, May 27, 2015 at 11:40 PM, Andi Kleen <ak@linux.intel.com> wrote:
> > >> So we build tables of all models in the architecture, and choose
> > >> matching one when compiling perf, right?  Can't we do that when
> > >> building the tables?  IOW, why don't we check the VFM and discard
> > >> non-matching tables?  Those non-matching tables are also needed?
> > >
> > > We build it for all cpus in an architecture, not all architectures.
> > > So e.g. for an x86 binary power is not included, and vice versa.
> > 
> > OK.
> > 
> > > It always includes all CPUs for a given architecture, so it's possible
> > > to use the perf binary on other systems than just the one it was
> > > build on.
> > 
> > So it selects one at run-time not build-time, good.  But I worry about
> > the size of the intel tables.  How large are they?  Maybe we can make
> > it dynamic-loadable if needed..
> 
> just compiled Sukadev's new version with Andi's events list
> and stripped binary size is:
> 
> [jolsa@krava perf]$ ls -l perf
> -rwxrwxr-x 1 jolsa jolsa 2772640 May 28 13:49 perf
> 
> 
> while perf on Arnaldo's perf/core is:
> 
> [jolsa@krava perf]$ ls -l perf
> -rwxrwxr-x 1 jolsa jolsa 2334816 May 28 13:49 perf
> 
> seems not that bad

It's not bad at all.

Do you have a Git tree URI where I could take a look at its current state? A tree 
would be nice that has as many of these patches integrated as possible.

Thanks,

	Ingo
Ingo Molnar May 28, 2015, 1:07 p.m. UTC | #8
* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Jiri Olsa <jolsa@redhat.com> wrote:
> 
> > On Wed, May 27, 2015 at 11:59:04PM +0900, Namhyung Kim wrote:
> > > Hi Andi,
> > > 
> > > On Wed, May 27, 2015 at 11:40 PM, Andi Kleen <ak@linux.intel.com> wrote:
> > > >> So we build tables of all models in the architecture, and choose
> > > >> matching one when compiling perf, right?  Can't we do that when
> > > >> building the tables?  IOW, why don't we check the VFM and discard
> > > >> non-matching tables?  Those non-matching tables are also needed?
> > > >
> > > > We build it for all cpus in an architecture, not all architectures.
> > > > So e.g. for an x86 binary power is not included, and vice versa.
> > > 
> > > OK.
> > > 
> > > > It always includes all CPUs for a given architecture, so it's possible
> > > > to use the perf binary on other systems than just the one it was
> > > > build on.
> > > 
> > > So it selects one at run-time not build-time, good.  But I worry about
> > > the size of the intel tables.  How large are they?  Maybe we can make
> > > it dynamic-loadable if needed..
> > 
> > just compiled Sukadev's new version with Andi's events list
> > and stripped binary size is:
> > 
> > [jolsa@krava perf]$ ls -l perf
> > -rwxrwxr-x 1 jolsa jolsa 2772640 May 28 13:49 perf
> > 
> > 
> > while perf on Arnaldo's perf/core is:
> > 
> > [jolsa@krava perf]$ ls -l perf
> > -rwxrwxr-x 1 jolsa jolsa 2334816 May 28 13:49 perf
> > 
> > seems not that bad
> 
> It's not bad at all.
> 
> Do you have a Git tree URI where I could take a look at its current state? A 
> tree would be nice that has as many of these patches integrated as possible.

A couple of observations:

1)

The x86 JSON files are unnecessarily large, and for no good reason, for example:

 triton:~/tip/tools/perf/pmu-events/arch/x86> grep -h EdgeDetect * | sort | uniq -c
   5534         "EdgeDetect": "0",
     57         "EdgeDetect": "1",

it's ridiculous to repeat "EdgeDetect": "0" more than 5 thousand times, just so 
that in 57 cases we can say '1'. Those lines should be omitted, and the default 
value should be 0.

This would reduce the source code line count of the JSON files by 40% already:

 triton:~/tip/tools/perf/pmu-events/arch/x86> grep ': "0",' * | wc -l
 42127
 triton:~/tip/tools/perf/pmu-events/arch/x86> cat * | wc -l
 103702

And no, I don't care if manufacturers release crappy JSON files - they need to be 
fixed/stripped before applied to our source tree.

2)

Also, the JSON files should carry more high level structure than they do today. 
Let's take SandyBridge_core.json as an example: it defines 386 events, but they 
are all in a 'flat' hierarchy, which is almost impossible for all but the most 
expert users to overview.

So instead of this flat structure, there should at minimum be broad categorization 
of the various parts of the hardware they relate to: whether they relate to the 
branch predictor, memory caches, TLB caches, memory ops, offcore, decoders, 
execution units, FPU ops, etc., etc. - so that they can be queried via 'perf 
list'.

We don't just want to import the unstructured mess that these event files are - 
we want to turn them into real structure. We can still keep the messy vendor names 
as well, like IDQ.DSB_CYCLES, but we want to impose structure as well.

3)

There should be good 'perf list' visualization for these events: grouping, 
individual names, with a good interface to query details if needed. I.e. it should 
be possible to browse and discover events relevant to the CPU the tool is 
executing on.

Thanks,

	Ingo
Andi Kleen May 28, 2015, 3:39 p.m. UTC | #9
> So instead of this flat structure, there should at minimum be broad categorization 
> of the various parts of the hardware they relate to: whether they relate to the 
> branch predictor, memory caches, TLB caches, memory ops, offcore, decoders, 
> execution units, FPU ops, etc., etc. - so that they can be queried via 'perf 
> list'.

The categorization is generally on the stem name, which already works fine with
the existing perf list wildcard support. So, for example, if you only want
branches:

perf list br*
...
  br_inst_exec.all_branches                         
       [Speculative and retired branches]
  br_inst_exec.all_conditional                      
       [Speculative and retired macro-conditional branches]
  br_inst_exec.all_direct_jmp                       
       [Speculative and retired macro-unconditional branches excluding calls and indirects]
  br_inst_exec.all_direct_near_call                 
       [Speculative and retired direct near calls]
  br_inst_exec.all_indirect_jump_non_call_ret       
       [Speculative and retired indirect branches excluding calls and returns]
  br_inst_exec.all_indirect_near_return             
       [Speculative and retired indirect return branches]
...

Or mid level cache events:

perf list l2*
...
  l2_l1d_wb_rqsts.all                               
       [Not rejected writebacks from L1D to L2 cache lines in any state]
  l2_l1d_wb_rqsts.hit_e                             
       [Not rejected writebacks from L1D to L2 cache lines in E state]
  l2_l1d_wb_rqsts.hit_m                             
       [Not rejected writebacks from L1D to L2 cache lines in M state]
  l2_l1d_wb_rqsts.miss                              
       [Count the number of modified Lines evicted from L1 and missed L2. (Non-rejected WBs from the DCU.)]
  l2_lines_in.all                                   
       [L2 cache lines filling L2]
...

There are some exceptions, but generally it works this way.
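perf implements its own glob matcher internally, but the wildcard
behaviour described here can be illustrated with POSIX fnmatch(); a
minimal sketch:

```c
#include <fnmatch.h>

/* Illustration of the wildcard matching behind 'perf list br*':
 * an event is listed when its name matches the glob pattern.
 * (perf's actual helper differs; fnmatch() just shows the idea.) */
int event_matches(const char *pattern, const char *event_name)
{
	return fnmatch(pattern, event_name, 0) == 0;
}
```

With no flags set, '*' also matches the '.' separating the stem from
the sub-event, so "br*" covers the whole br_inst_exec family.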

The stem could be put into a separate header, but it would seem redundant to me. 

> We don't just want the import the unstructured mess that these event files are - 
> we want to turn them into real structure. We can still keep the messy vendor names 
> as well, like IDQ.DSB_CYCLES, but we want to impose structure as well.

The vendor names directly map to the micro architecture, which is whole
point of the events. IDQ is a part of the CPU, and is described in the 
CPU manuals. One of the main motivations for adding event lists is to make
perf match to that documentation.

> 
> 3)
> 
> There should be good 'perf list' visualization for these events: grouping, 
> individual names, with a good interface to query details if needed. I.e. it should 
> be possible to browse and discover events relevant to the CPU the tool is 
> executing on.

I suppose we could change perf list to give the stem names as section headers
to make the long list a bit more readable.

Generally you need to have some knowledge of the micro architecture to use
these events. There is no way around that.

-Andi
Ingo Molnar May 29, 2015, 7:27 a.m. UTC | #10
* Andi Kleen <ak@linux.intel.com> wrote:

> > So instead of this flat structure, there should at minimum be broad categorization 
> > of the various parts of the hardware they relate to: whether they relate to the 
> > branch predictor, memory caches, TLB caches, memory ops, offcore, decoders, 
> > execution units, FPU ops, etc., etc. - so that they can be queried via 'perf 
> > list'.
> 
> The categorization is generally on the stem name, which already works fine with 
> the existing perf list wildcard support. So for example you only want branches.
>
> perf list br*
> ...
>   br_inst_exec.all_branches                         
>        [Speculative and retired branches]
>   br_inst_exec.all_conditional                      
>        [Speculative and retired macro-conditional branches]
>   br_inst_exec.all_direct_jmp                       
>        [Speculative and retired macro-unconditional branches excluding calls and indirects]
>   br_inst_exec.all_direct_near_call                 
>        [Speculative and retired direct near calls]
>   br_inst_exec.all_indirect_jump_non_call_ret       
>        [Speculative and retired indirect branches excluding calls and returns]
>   br_inst_exec.all_indirect_near_return             
>        [Speculative and retired indirect return branches]
> ...
> 
> Or mid level cache events:
> 
> perf list l2*
> ...
>   l2_l1d_wb_rqsts.all                               
>        [Not rejected writebacks from L1D to L2 cache lines in any state]
>   l2_l1d_wb_rqsts.hit_e                             
>        [Not rejected writebacks from L1D to L2 cache lines in E state]
>   l2_l1d_wb_rqsts.hit_m                             
>        [Not rejected writebacks from L1D to L2 cache lines in M state]
>   l2_l1d_wb_rqsts.miss                              
>        [Count the number of modified Lines evicted from L1 and missed L2. (Non-rejected WBs from the DCU.)]
>   l2_lines_in.all                                   
>        [L2 cache lines filling L2]
> ...
> 
> There are some exceptions, but generally it works this way.

You are missing my point in several ways:

1)

Firstly, there are _tons_ of 'exceptions' to the 'stem name' grouping, to the 
level that makes it unusable for high level grouping of events.

Here's the 'stem name' histogram on the SandyBridge event list:

  $ grep EventName pmu-events/arch/x86/SandyBridge_core.json  | cut -d\. -f1 | cut -d\" -f4 | cut -d\_ -f1 | sort | uniq -c | sort -n

      1 AGU
      1 BACLEARS
      1 EPT
      1 HW
      1 ICACHE
      1 INSTS
      1 PAGE
      1 ROB
      1 RS
      1 SQ
      2 ARITH
      2 DSB2MITE
      2 ILD
      2 LOAD
      2 LOCK
      2 LONGEST
      2 MISALIGN
      2 SIMD
      2 TLB
      3 CPL
      3 DSB
      3 INST
      3 INT
      3 LSD
      3 MACHINE
      4 CPU
      4 OTHER
      4 PARTIAL
      5 CYCLE
      5 ITLB
      6 LD
      7 L1D
      8 DTLB
     10 FP
     12 RESOURCE
     21 UOPS
     24 IDQ
     25 MEM
     37 BR
     37 L2
    131 OFFCORE

Out of 386 events. This grouping has the following severe problems:

  - that's 41 'stem name' groups, way too much as a first hop high level 
    structure. We want the kind of high level categorization I suggested:
    cache, decoding, branches, execution pipeline, memory events, vector unit 
    events - which broad categories exist in all CPUs and are microarchitecture 
    independent.

  - even these 'stem names' are mostly unstructured and unreadable. The two 
    examples you cited are the best case that are borderline readable, but they
    cover less than 20% of all events.

  - the 'stem name' concept is not even used consistently, the names are 
    essentially a random collection of Intel internal acronyms, which occasionally 
    match up with high level concepts. These vendor defined names have very poor 
    high level structure.

  - the 'stem names' are totally imbalanced: there's one 'super' category 'stem 
    name': OFFCORE_RESPONSE, with 131 events in it and then there are super small 
    groups in the list above. Not well suited to get a good overview about what 
    measurement capabilities the hardware has.

So forget about using 'stem names' as the high level structure. These events have 
no high level structure and we should provide that, instead of dumping 380+ events 
on the unsuspecting user.

2)

Secondly, categorization and higher level hierarchy should be used to keep the 
list manageable. The fact that if _you_ know what to search for you can list just 
a subset does not mean anything to the new user trying to discover events.

A simple 'perf list' should list the high level categories by default, with a 
count displayed that shows how many further events are within that category. 
(compacted tree output would be usable as well.)
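One possible reading of this proposal is a prefix-to-topic categorizer
that plain 'perf list' could use to print per-topic counts. The sketch
below is purely illustrative; the prefix table is hypothetical and not
part of any posted patch:

```c
#include <string.h>

/* Toy categorizer: map a vendor event-name prefix to one of the broad,
 * microarchitecture-independent topics discussed above. */
static const struct {
	const char *prefix;
	const char *topic;
} topic_map[] = {
	{ "br_",  "branches" },
	{ "l1d",  "cache" },
	{ "l2_",  "cache" },
	{ "dtlb", "tlb" },
	{ "itlb", "tlb" },
	{ "fp_",  "fpu/vector" },
	{ "uops", "pipeline" },
	{ "mem_", "memory" },
};

const char *event_topic(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(topic_map) / sizeof(topic_map[0]); i++)
		if (!strncmp(name, topic_map[i].prefix,
			     strlen(topic_map[i].prefix)))
			return topic_map[i].topic;
	return "other";
}
```

A default 'perf list' could then group events under these topics and
show counts, instead of dumping the flat 380+ event list.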

> The stem could be put into a separate header, but it would seem redundant to me.

Higher level categories simply don't exist in these names in any usable form, so 
it has to be created. Just redundantly repeating the 'stem name' would be silly, 
as they are unusable for the purposes of high level categorization.

> > We don't just want the import the unstructured mess that these event files are 
> > - we want to turn them into real structure. We can still keep the messy vendor 
> > names as well, like IDQ.DSB_CYCLES, but we want to impose structure as well.
> 
> The vendor names directly map to the micro architecture, which is whole point of 
> the events. IDQ is a part of the CPU, and is described in the CPU manuals. One 
> of the main motivations for adding event lists is to make perf match to that 
> documentation.

Your argument is a logical fallacy: there is absolutely no conflict between also 
supporting quirky vendor names and also having good high level structure and 
naming, to make it all accessible to the first time user.

> > 3)
> > 
> > There should be good 'perf list' visualization for these events: grouping, 
> > individual names, with a good interface to query details if needed. I.e. it 
> > should be possible to browse and discover events relevant to the CPU the tool 
> > is executing on.
> 
> I suppose we could change perf list to give the stem names as section headers to 
> make the long list a bit more readable.

No, the 'stem names' are crap - instead we want to create sensible high level 
categories and want to categorize the events, I gave you a few ideas above and in 
the previous mail.

> Generally you need to have some knowledge of the micro architecture to use these 
> events. There is no way around that.

Here your argument again relies on a logical fallacy: there is absolutely no 
conflict between good high level structure, and the idea that you need to know 
about CPUs to make sense of hardware events that deal with fine internal details.

Also, you are denying the plain fact that the highest level categories _are_ 
largely microarchitecture independent: can you show me a single modern mainstream 
x86 CPU that doesn't have these broad high level categories:

  - CPU cache
  - memory accesses
  - decoding, branch execution
  - execution pipeline
  - FPU, vector units

?

There's none, and the reason is simple: the high level structure of CPUs is still 
dictated by basic physics, and physics is microarchitecture independent.

Lower level structure will inevitably be microarchitecture and sometimes even 
model specific - but that's absolutely no excuse to not have good high level 
structure.

So these are not difficult concepts at all; please make an honest effort at
understanding them and responding to them, as properly addressing them is a
must-have for this patch submission.

Thanks,

	Ingo
Andi Kleen May 31, 2015, 4:07 p.m. UTC | #11
Ok I did some scripting to add these topics you requested to the Intel JSON files,
and changed perf list to group events by them. 

I'll redirect any questions on their value to you.  
And I certainly hope this is the last of your "improvements" for now.

The updated event lists are available in

git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/intel-json-files-3

The updated patches are available in 

git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc perf/builtin-json-6

Also posted separately.

The output looks like this

% perf list
...
Cache:
  l1d.replacement                                   
       [L1D data line replacements]
  l1d_pend_miss.pending                             
       [L1D miss oustandings duration in cycles]
  l1d_pend_miss.pending_cycles                      
       [Cycles with L1D load Misses outstanding]
...
Floating point:
  fp_assist.any                                     
       [Cycles with any input/output SSE or FP assist]
  fp_assist.simd_input                              
       [Number of SIMD FP assists due to input values]
  fp_assist.simd_output                             
       [Number of SIMD FP assists due to Output values]
...
Memory:
  machine_clears.memory_ordering                    
       [Counts the number of machine clears due to memory order conflicts]
  mem_trans_retired.load_latency_gt_128             
       [Loads with latency value being above 128 (Must be precise)]
  mem_trans_retired.load_latency_gt_16              
       [Loads with latency value being above 16 (Must be precise)]
...
Pipeline:
  arith.fpu_div                                     
       [Divide operations executed]
  arith.fpu_div_active                              
       [Cycles when divider is busy executing divide operations]
  baclears.any                                      
       [Counts the total number when the front end is resteered, mainly when the BPU cannot provide a correct
        prediction and this is corrected by other branch handling mechanisms at the front end]


-Andi

P.S.: You may want to look up the definition of logical fallacy in wikipedia.
diff mbox

Patch

diff --git a/tools/perf/Build b/tools/perf/Build
index b77370e..40bffa0 100644
--- a/tools/perf/Build
+++ b/tools/perf/Build
@@ -36,6 +36,7 @@  CFLAGS_builtin-help.o      += $(paths)
 CFLAGS_builtin-timechart.o += $(paths)
 CFLAGS_perf.o              += -DPERF_HTML_PATH="BUILD_STR($(htmldir_SQ))" -include $(OUTPUT)PERF-VERSION-FILE
 
+libperf-y += pmu-events/
 libperf-y += util/
 libperf-y += arch/
 libperf-y += ui/
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index c43a205..d078c71 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -306,6 +306,8 @@  perf.spec $(SCRIPTS) \
 ifneq ($(OUTPUT),)
 %.o: $(OUTPUT)%.o
 	@echo "    # Redirected target $@ => $(OUTPUT)$@"
+pmu-events/%.o: $(OUTPUT)pmu-events/%.o
+	@echo "    # Redirected target $@ => $(OUTPUT)$@"
 util/%.o: $(OUTPUT)util/%.o
 	@echo "    # Redirected target $@ => $(OUTPUT)$@"
 bench/%.o: $(OUTPUT)bench/%.o
@@ -529,7 +531,7 @@  clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
 	$(call QUIET_CLEAN, core-objs)  $(RM) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
 	$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
 	$(Q)$(RM) .config-detected
-	$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32
+	$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32 $(OUTPUT)pmu-events/jevents
 	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
 	$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
 	$(python-clean)
diff --git a/tools/perf/pmu-events/Build b/tools/perf/pmu-events/Build
new file mode 100644
index 0000000..7a2aaaf
--- /dev/null
+++ b/tools/perf/pmu-events/Build
@@ -0,0 +1,38 @@ 
+.SUFFIXES:
+
+libperf-y += pmu-events.o
+
+JEVENTS =	$(OUTPUT)pmu-events/jevents
+JEVENTS_OBJS =	$(OUTPUT)pmu-events/json.o $(OUTPUT)pmu-events/jsmn.o \
+		$(OUTPUT)pmu-events/jevents.o
+
+PMU_EVENTS =	$(srctree)/tools/perf/pmu-events/
+
+all: $(OUTPUT)pmu-events.o
+
+$(OUTPUT)pmu-events/jevents: $(JEVENTS_OBJS)
+	$(call rule_mkdir)
+	$(CC) -o $@ $(JEVENTS_OBJS)
+
+#
+# Look for JSON files in $(PMU_EVENTS)/arch directory,
+# process them and create tables in $(PMU_EVENTS)/pmu-events.c
+#
+pmu-events/pmu-events.c: $(JEVENTS) FORCE
+	$(JEVENTS) $(PMU_EVENTS)/arch $(PMU_EVENTS)/pmu-events.c
+ 
+
+#
+# If we fail to build pmu-events.o, it could very well be due to
+# inconsistencies between the architecture's mapfile.csv and the
+# directory tree. If the compilation of the pmu-events.c generated
+# by jevents fails, create an "empty" mapping table in pmu-events.c
+# so the build of perf can succeed even if we are not able to use
+# the PMU event aliases.
+#
+
+clean:
+	rm -f $(JEVENTS_OBJS) $(JEVENTS) $(OUTPUT)pmu-events.o \
+		$(PMU_EVENTS)pmu-events.c
+
+FORCE:
diff --git a/tools/perf/pmu-events/README b/tools/perf/pmu-events/README
new file mode 100644
index 0000000..d9ed641
--- /dev/null
+++ b/tools/perf/pmu-events/README
@@ -0,0 +1,67 @@ 
+The contents of this directory allow users to specify PMU events in
+their CPUs by their symbolic names rather than raw event codes
+(see the example below).
+
+
+The main program in this directory is 'jevents', which is built and
+executed _before_ the perf binary itself is built.
+
+The 'jevents' program tries to locate and process JSON files in the directory
+tree tools/perf/pmu-events/arch/xxx. 
+
+	- Regular files with .json extension in the name are assumed to be
+	  JSON files.
+
+	- Regular files with a base name starting with 'mapfile' are assumed
+	  to be CSV files that map a specific CPU to its set of PMU events.
+
+Directories are traversed but all other files are ignored.
+
+Using the JSON files and the mapfile, 'jevents' generates a C source file,
+'pmu-events.c', which encodes the two sets of tables:
+
+	- Set of 'PMU events tables' for all known CPUs in the architecture
+
+	- A 'mapping table' that maps a CPU to its 'PMU events table'
+
+'pmu-events.h' has an extern declaration for the mapping table and the
+generated 'pmu-events.c' defines this table.
+
+After the 'pmu-events.c' is generated, it is compiled and the resulting
+'pmu-events.o' is added to 'libperf.a' which is then used by perf to process
+PMU event aliases. eg:
+
+	$ perf stat -e pm_1plus_ppc_cmpl sleep 1
+
+where pm_1plus_ppc_cmpl is a Power8 PMU event.
+
+In case of errors when processing files in the tools/perf/pmu-events/arch
+directory, 'jevents' tries to create an empty mapping file to allow perf
+build to succeed even if the PMU event aliases cannot be used.
+
+However, some errors in processing may cause the perf build to fail.
+
+The mapfile format is expected to be:
+
+	VFM,Version,JSON_file_path_name,Type
+
+where:
+	Comma:
+		is the required field delimiter.
+
+	VFM:
+		represents vendor, family, model of the CPU. Architectures
+		can use a delimiter other than comma to further separate the
+		fields if they so choose. Architectures should implement the
+		function arch_pmu_events_match_cpu() and can use the
+		VFM, Version and Type fields to uniquely identify a CPU.
+
+	Version:
+		is the CPU version (PVR in the case of PowerPC)
+
+	JSON_file_path_name:
+		is the pathname for the JSON file, relative to the directory
+		containing the mapfile.
+
+	Type:
+		indicates whether the events are "core" or "uncore" events.
diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
new file mode 100644
index 0000000..3afa6e9
--- /dev/null
+++ b/tools/perf/pmu-events/jevents.c
@@ -0,0 +1,700 @@ 
+#define  _XOPEN_SOURCE 500	/* needed for nftw() */
+
+/* Parse event JSON files */
+
+/*
+ * Copyright (c) 2014, Intel Corporation
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice,
+ * this list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+*/
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <errno.h>
+#include <string.h>
+#include <ctype.h>
+#include <unistd.h>
+#include <stdarg.h>
+#include <libgen.h>
+#include <dirent.h>
+#include <sys/utsname.h>
+#include <sys/time.h>			/* getrlimit */
+#include <sys/resource.h>		/* getrlimit */
+#include <ftw.h>
+#include <sys/stat.h>
+#include "jsmn.h"
+#include "json.h"
+#include "jevents.h"
+
+#ifndef  __maybe_unused
+#define __maybe_unused                  __attribute__((unused))
+#endif
+
+int verbose = 1;
+
+int eprintf(int level, int var, const char *fmt, ...)
+{
+
+	int ret;
+	va_list args;
+
+	if (var < level)
+		return 0;
+
+	va_start(args, fmt);
+
+	ret = vfprintf(stderr, fmt, args);
+
+	va_end(args);
+
+	return ret;
+}
+
+__attribute__((weak)) char *get_cpu_str(void)
+{
+	return NULL;
+}
+
+static void addfield(char *map, char **dst, const char *sep,
+		     const char *a, jsmntok_t *bt)
+{
+	unsigned len = strlen(a) + 1 + strlen(sep);
+	int olen = *dst ? strlen(*dst) : 0;
+	int blen = bt ? json_len(bt) : 0;
+	char *out;
+
+	out = realloc(*dst, len + olen + blen);
+	if (!out) {
+		/* Don't add field in this case */
+		return;
+	}
+	*dst = out;
+
+	if (!olen)
+		*(*dst) = 0;
+	else
+		strcat(*dst, sep);
+	strcat(*dst, a);
+	if (bt)
+		strncat(*dst, map + bt->start, blen);
+}
+
+static void fixname(char *s)
+{
+	for (; *s; s++)
+		*s = tolower(*s);
+}
+
+static void fixdesc(char *s)
+{
+	char *e = s + strlen(s);
+
+	/* Remove trailing dots that look ugly in perf list */
+	--e;
+	while (e >= s && isspace(*e))
+		--e;
+	if (*e == '.')
+		*e = 0;
+}
+
+static struct msrmap {
+	const char *num;
+	const char *pname;
+} msrmap[] = {
+	{ "0x3F6", "ldlat=" },
+	{ "0x1A6", "offcore_rsp=" },
+	{ "0x1A7", "offcore_rsp=" },
+	{ NULL, NULL }
+};
+
+static struct field {
+	const char *field;
+	const char *kernel;
+} fields[] = {
+	{ "EventCode",	"event=" },
+	{ "UMask",	"umask=" },
+	{ "CounterMask", "cmask=" },
+	{ "Invert",	"inv=" },
+	{ "AnyThread",	"any=" },
+	{ "EdgeDetect",	"edge=" },
+	{ "SampleAfterValue", "period=" },
+	{ NULL, NULL }
+};
+
+static void cut_comma(char *map, jsmntok_t *newval)
+{
+	int i;
+
+	/* Cut off everything after comma */
+	for (i = newval->start; i < newval->end; i++) {
+		if (map[i] == ',')
+			newval->end = i;
+	}
+}
+
+static int match_field(char *map, jsmntok_t *field, int nz,
+		       char **event, jsmntok_t *val)
+{
+	struct field *f;
+	jsmntok_t newval = *val;
+
+	for (f = fields; f->field; f++)
+		if (json_streq(map, field, f->field) && nz) {
+			cut_comma(map, &newval);
+			addfield(map, event, ",", f->kernel, &newval);
+			return 1;
+		}
+	return 0;
+}
+
+static struct msrmap *lookup_msr(char *map, jsmntok_t *val)
+{
+	jsmntok_t newval = *val;
+	static bool warned;
+	int i;
+
+	cut_comma(map, &newval);
+	for (i = 0; msrmap[i].num; i++)
+		if (json_streq(map, &newval, msrmap[i].num))
+			return &msrmap[i];
+	if (!warned) {
+		warned = true;
+		pr_err("Unknown MSR in event file %.*s\n",
+			json_len(val), map + val->start);
+	}
+	return NULL;
+}
+
+#define EXPECT(e, t, m) do { if (!(e)) {			\
+	jsmntok_t *loc = (t);					\
+	if (!(t)->start && (t) > tokens)			\
+		loc = (t) - 1;					\
+		pr_err("%s:%d: " m ", got %s\n", fn,		\
+			json_line(map, loc),			\
+			json_name(t));				\
+	goto out_free;						\
+} } while (0)
+
+static void print_events_table_prefix(FILE *fp, const char *tblname)
+{
+	fprintf(fp, "struct pmu_event %s[] = {\n", tblname);
+}
+
+static int print_events_table_entry(void *data, char *name, char *event,
+				    char *desc)
+{
+	FILE *outfp = data;
+	/*
+	 * TODO: Remove formatting chars after debugging to reduce
+	 *	 string lengths.
+	 */
+	fprintf(outfp, "{\n");
+
+	fprintf(outfp, "\t.name = \"%s\",\n", name);
+	fprintf(outfp, "\t.event = \"%s\",\n", event);
+	fprintf(outfp, "\t.desc = \"%s\",\n", desc);
+
+	fprintf(outfp, "},\n");
+
+	return 0;
+}
+
+static void print_events_table_suffix(FILE *outfp)
+{
+	fprintf(outfp, "{\n");
+
+	fprintf(outfp, "\t.name = 0,\n");
+	fprintf(outfp, "\t.event = 0,\n");
+	fprintf(outfp, "\t.desc = 0,\n");
+
+	fprintf(outfp, "},\n");
+	fprintf(outfp, "};\n");
+}
+
+/* Call func with each event in the json file */
+int json_events(const char *fn,
+	  int (*func)(void *data, char *name, char *event, char *desc),
+	  void *data)
+{
+	int err = -EIO;
+	size_t size;
+	jsmntok_t *tokens, *tok;
+	int i, j, len;
+	char *map;
+
+	if (!fn)
+		return -ENOENT;
+
+	tokens = parse_json(fn, &map, &size, &len);
+	if (!tokens)
+		return -EIO;
+	EXPECT(tokens->type == JSMN_ARRAY, tokens, "expected top level array");
+	tok = tokens + 1;
+	for (i = 0; i < tokens->size; i++) {
+		char *event = NULL, *desc = NULL, *name = NULL;
+		struct msrmap *msr = NULL;
+		jsmntok_t *msrval = NULL;
+		jsmntok_t *precise = NULL;
+		jsmntok_t *obj = tok++;
+
+		EXPECT(obj->type == JSMN_OBJECT, obj, "expected object");
+		for (j = 0; j < obj->size; j += 2) {
+			jsmntok_t *field, *val;
+			int nz;
+
+			field = tok + j;
+			EXPECT(field->type == JSMN_STRING, tok + j,
+			       "Expected field name");
+			val = tok + j + 1;
+			EXPECT(val->type == JSMN_STRING, tok + j + 1,
+			       "Expected string value");
+
+			nz = !json_streq(map, val, "0");
+			if (match_field(map, field, nz, &event, val)) {
+				/* ok */
+			} else if (json_streq(map, field, "EventName")) {
+				addfield(map, &name, "", "", val);
+			} else if (json_streq(map, field, "BriefDescription")) {
+				addfield(map, &desc, "", "", val);
+				fixdesc(desc);
+			} else if (json_streq(map, field, "PEBS") && nz) {
+				precise = val;
+			} else if (json_streq(map, field, "MSRIndex") && nz) {
+				msr = lookup_msr(map, val);
+			} else if (json_streq(map, field, "MSRValue")) {
+				msrval = val;
+			} else if (json_streq(map, field, "Errata") &&
+				   !json_streq(map, val, "null")) {
+				addfield(map, &desc, ". ",
+					" Spec update: ", val);
+			} else if (json_streq(map, field, "Data_LA") && nz) {
+				addfield(map, &desc, ". ",
+					" Supports address when precise",
+					NULL);
+			}
+			/* ignore unknown fields */
+		}
+		if (precise && !strstr(desc, "(Precise Event)")) {
+			if (json_streq(map, precise, "2"))
+				addfield(map, &desc, " ", "(Must be precise)",
+						NULL);
+			else
+				addfield(map, &desc, " ",
+						"(Precise event)", NULL);
+		}
+		if (msr != NULL)
+			addfield(map, &event, ",", msr->pname, msrval);
+		fixname(name);
+		err = func(data, name, event, desc);
+		free(event);
+		free(desc);
+		free(name);
+		if (err)
+			break;
+		tok += j;
+	}
+	EXPECT(tok - tokens == len, tok, "unexpected objects at end");
+	err = 0;
+out_free:
+	free_json(map, size, tokens);
+	return err;
+}
+
+static char *file_name_to_table_name(char *fname)
+{
+	unsigned int i, j;
+	int c;
+	int n = 1024;		/* use max variable length? */
+	char *tblname;
+	char *p;
+
+	tblname = malloc(n);
+	if (!tblname)
+		return NULL;
+
+	p = basename(fname);
+
+	memset(tblname, 0, n);
+
+	/* Ensure table name starts with an alphabetic char */
+	strcpy(tblname, "pme_");
+
+	n = strlen(fname) + strlen(tblname);
+	n = min(1024, n);
+
+	for (i = 0, j = strlen(tblname); i < strlen(fname); i++, j++) {
+		c = p[i];
+		if (isalnum(c) || c == '_')
+			tblname[j] = c;
+		else if (c == '-')
+			tblname[j] = '_';
+		else if (c == '.') {
+			tblname[j] = '\0';
+			break;
+		} else {
+			printf("Invalid character '%c' in file name %s\n",
+					c, p);
+			free(tblname);
+			return NULL;
+		}
+	}
+
+	return tblname;
+}
+
+static void print_mapping_table_prefix(FILE *outfp)
+{
+	fprintf(outfp, "struct pmu_events_map pmu_events_map[] = {\n");
+}
+
+static void print_mapping_table_suffix(FILE *outfp)
+{
+	/*
+	 * Print the terminating, NULL entry.
+	 */
+	fprintf(outfp, "{\n");
+	fprintf(outfp, "\t.vfm = 0,\n");
+	fprintf(outfp, "\t.version = 0,\n");
+	fprintf(outfp, "\t.type = 0,\n");
+	fprintf(outfp, "\t.table = 0,\n");
+	fprintf(outfp, "},\n");
+
+	/* and finally, the closing curly bracket for the struct */
+	fprintf(outfp, "};\n");
+}
+
+/*
+ * Process the JSON file @json_file and write a table of PMU events found in
+ * the JSON file to the outfp.
+ */
+static int process_json(FILE *outfp, const char *json_file)
+{
+	char *tblname;
+	int err;
+
+	/*
+	 * Drop the file name suffix. Replace hyphens with underscores.
+	 * Fail if the file name contains any characters other than
+	 * alphanumerics, underscores and hyphens.
+	 */
+	tblname = file_name_to_table_name((char *)json_file);
+	if (!tblname) {
+		printf("Error determining table name for %s\n", json_file);
+		return -1;
+	}
+
+	print_events_table_prefix(outfp, tblname);
+
+	err = json_events(json_file, print_events_table_entry, outfp);
+
+	if (err) {
+		printf("Translation failed\n");
+		_Exit(1);
+	}
+
+	print_events_table_suffix(outfp);
+
+	return 0;
+}
+
+static int process_mapfile(FILE *outfp, char *fpath)
+{
+	int n = 16384;
+	FILE *mapfp;
+	char *save;
+	char *line, *p;
+	int line_num;
+	char *tblname;
+
+	printf("Processing mapfile %s\n", fpath);
+
+	line = malloc(n);
+	if (!line)
+		return -1;
+
+	mapfp = fopen(fpath, "r");
+	if (!mapfp) {
+		printf("Error %s opening %s\n", strerror(errno), fpath);
+		return -1;
+	}
+
+	print_mapping_table_prefix(outfp);
+
+	line_num = 0;
+	while (1) {
+		char *vfm, *version, *type, *fname;
+
+		line_num++;
+		p = fgets(line, n, mapfp);
+		if (!p)
+			break;
+
+		if (line[0] == '#')
+			continue;
+
+		if (line[strlen(line)-1] != '\n') {
+			/* TODO Deal with lines longer than 16K */
+			printf("Mapfile %s: line %d too long, aborting\n",
+					fpath, line_num);
+			return -1;
+		}
+		line[strlen(line)-1] = '\0';
+
+		vfm = strtok_r(p, ",", &save);
+		version = strtok_r(NULL, ",", &save);
+		fname = strtok_r(NULL, ",", &save);
+		type = strtok_r(NULL, ",", &save);
+
+		tblname = file_name_to_table_name(fname);
+		fprintf(outfp, "{\n");
+		fprintf(outfp, "\t.vfm = \"%s\",\n", vfm);
+		fprintf(outfp, "\t.version = \"%s\",\n", version);
+		fprintf(outfp, "\t.type = \"%s\",\n", type);
+
+		/*
+		 * CHECK: We can't use the type (eg "core") field in the
+		 * table name. For us to do that, we need to somehow tweak
+		 * the other caller of file_name_to_table(), process_json()
+		 * to determine the type. process_json() file has no way
+		 * of knowing these are "core" events unless file name has
+		 * core in it. If filename has core in it, we can safely
+		 * ignore the type field here also.
+		 */
+		fprintf(outfp, "\t.table = %s\n", tblname);
+		fprintf(outfp, "},\n");
+	}
+
+	print_mapping_table_suffix(outfp);
+
+	return 0;
+}
+
+/*
+ * If we fail to locate/process JSON and map files, create a NULL mapping
+ * table. This would at least allow perf to build even if we can't find/use
+ * the aliases.
+ */
+static void create_empty_mapping(const char *output_file)
+{
+	FILE *outfp;
+
+	printf("Creating empty pmu_events_map[] table\n");
+
+	/* Unlink file to clear any partial writes to it */
+	unlink(output_file);
+
+	outfp = fopen(output_file, "a");
+	if (!outfp) {
+		perror("fopen()");
+		_Exit(1);
+	}
+
+	fprintf(outfp, "#include \"pmu-events.h\"\n");
+	print_mapping_table_prefix(outfp);
+	print_mapping_table_suffix(outfp);
+	fclose(outfp);
+}
+
+static int get_maxfds(void)
+{
+	struct rlimit rlim;
+
+	if (getrlimit(RLIMIT_NOFILE, &rlim) == 0)
+		return rlim.rlim_max;
+
+	return 512;
+}
+
+/*
+ * nftw() doesn't let us pass an argument to the processing function,
+ * so use global variables.
+ */
+FILE *eventsfp;
+char *mapfile;
+
+static int process_one_file(const char *fpath, const struct stat *sb,
+				int typeflag __maybe_unused,
+				struct FTW *ftwbuf __maybe_unused)
+{
+	char *bname;
+
+	if (!S_ISREG(sb->st_mode))
+		return 0;
+
+	/*
+	 * Save the mapfile name for now. We will process mapfile
+	 * after processing all JSON files (so we can write out the
+	 * mapping table after all PMU events tables).
+	 *
+	 * Allow for optional .csv on mapfile name.
+	 *
+	 * TODO: Allow for multiple mapfiles? Punt for now.
+	 */
+	bname = basename((char *)fpath);
+	if (!strncmp(bname, "mapfile", 7)) {
+		if (mapfile) {
+			printf("Multiple mapfiles? Using %s, ignoring %s\n",
+					mapfile, fpath);
+		} else {
+			mapfile = strdup(fpath);
+		}
+		return 0;
+	}
+
+	/*
+	 * If the file name does not have a .json extension,
+	 * ignore it. It could be a readme.txt for instance.
+	 */
+	bname += strlen(bname) - 5;
+	if (strncmp(bname, ".json", 5)) {
+		printf("Ignoring file without .json suffix %s\n", fpath);
+		return 0;
+	}
+
+	/*
+	 * Assume all other files are JSON files.
+	 *
+	 * If mapfile refers to 'power7_core.json', we create a table
+	 * named 'power7_core'. Any inconsistencies between the mapfile
+	 * and directory tree could result in build failure due to table
+	 * names not being found.
+	 *
+	 * At least for now, be strict with processing JSON file names.
+	 * i.e. if JSON file name cannot be mapped to C-style table name,
+	 * fail.
+	 */
+	if (process_json(eventsfp, fpath)) {
+		printf("Error processing JSON file %s, ignoring all\n", fpath);
+		return -1;
+	}
+
+	return 0;
+}
+
+#ifndef PATH_MAX
+#define PATH_MAX	4096
+#endif
+
+/*
+ * Starting in directory 'start_dirname', find the "mapfile.csv" and
+ * the set of JSON files for this architecture.
+ *
+ * From each JSON file, create a C-style "PMU events table" from the
+ * JSON file (see struct pmu_event).
+ *
+ * From the mapfile, create a mapping between the CPU revisions and
+ * PMU event tables (see struct pmu_events_map).
+ *
+ * Write out the PMU events tables and the mapping table to pmu-events.c.
+ *
+ * If unable to process the JSON or arch files, create an empty mapping
+ * table so we can continue to build/use perf even if we cannot use the
+ * PMU event aliases.
+ */
+int main(int argc, char *argv[])
+{
+	int rc;
+	int flags;
+	int maxfds;
+	const char *arch;
+	struct utsname uts;
+
+	char dirname[PATH_MAX];
+
+	const char *output_file = "pmu-events.c";
+	const char *start_dirname = "arch";
+
+	if (argc > 1)
+		start_dirname = argv[1];
+
+	if (argc > 2)
+		output_file = argv[2];
+
+	unlink(output_file);
+	eventsfp = fopen(output_file, "a");
+	if (!eventsfp) {
+		printf("%s Unable to create required file %s (%s)\n",
+				argv[0], output_file, strerror(errno));
+		_Exit(1);
+	}
+
+	rc = uname(&uts);
+	if (rc < 0) {
+		printf("%s: uname() failed: %s\n", argv[0], strerror(errno));
+		goto empty_map;
+	}
+
+	/* TODO: Add other flavors of machine type here */
+	if (!strcmp(uts.machine, "ppc64"))
+		arch = "powerpc";
+	else if (!strcmp(uts.machine, "i686"))
+		arch = "x86";
+	else if (!strcmp(uts.machine, "x86_64"))
+		arch = "x86";
+	else {
+		printf("%s: Unknown architecture %s\n", argv[0], uts.machine);
+		goto empty_map;
+	}
+
+	/* Include pmu-events.h first */
+	fprintf(eventsfp, "#include \"pmu-events.h\"\n");
+
+	sprintf(dirname, "%s/%s", start_dirname, arch);
+
+	/*
+	 * Treat symlinks of JSON files as regular files for now and create
+	 * separate tables for each symlink (presumably, each symlink refers
+	 * to specific version of the CPU).
+	 *
+	 * TODO: Maybe add another level of mapping if necessary to allow
+	 *	 several processor versions (i.e symlinks) share a table
+	 *	 of PMU events.
+	 */
+	maxfds = get_maxfds();
+	mapfile = NULL;
+	flags = FTW_DEPTH;
+	rc = nftw(dirname, process_one_file, maxfds, flags);
+	if (rc) {
+		printf("%s: Error walking file tree %s\n", argv[0], dirname);
+		goto empty_map;
+	}
+
+	if (!mapfile) {
+		printf("No CPU->JSON mapping?\n");
+		goto empty_map;
+	}
+
+	if (process_mapfile(eventsfp, mapfile)) {
+		printf("Error processing mapfile %s\n", mapfile);
+		goto empty_map;
+	}
+
+	return 0;
+
+empty_map:
+	fclose(eventsfp);
+	create_empty_mapping(output_file);
+	return 0;
+}
diff --git a/tools/perf/pmu-events/jevents.h b/tools/perf/pmu-events/jevents.h
new file mode 100644
index 0000000..996601f
--- /dev/null
+++ b/tools/perf/pmu-events/jevents.h
@@ -0,0 +1,17 @@ 
+#ifndef JEVENTS_H
+#define JEVENTS_H 1
+
+int json_events(const char *fn,
+		int (*func)(void *data, char *name, char *event, char *desc),
+		void *data);
+char *get_cpu_str(void);
+
+#ifndef min
+#define min(x, y) ({                            \
+	typeof(x) _min1 = (x);                  \
+	typeof(y) _min2 = (y);                  \
+	(void) (&_min1 == &_min2);              \
+	_min1 < _min2 ? _min1 : _min2; })
+#endif
+
+#endif
diff --git a/tools/perf/pmu-events/pmu-events.h b/tools/perf/pmu-events/pmu-events.h
new file mode 100644
index 0000000..a24faef
--- /dev/null
+++ b/tools/perf/pmu-events/pmu-events.h
@@ -0,0 +1,39 @@ 
+#ifndef PMU_EVENTS_H
+#define PMU_EVENTS_H
+
+/*
+ * Describe each PMU event. Each CPU has a table of these
+ * events.
+ */
+struct pmu_event {
+	const char *name;
+	const char *event;
+	const char *desc;
+};
+
+/*
+ * Map a CPU to its table of PMU events. The CPU is identified, in
+ * an arch-specific manner, in arch_pmu_events_match_cpu(), by one
+ * or more of the following attributes:
+ *
+ *	vendor, family, model, revision, type
+ *
+ * TODO: Split vfm into individual fields or leave it to architectures
+ *	 to split it with an alternate delimiter like hyphen in the
+ *	 mapfile?
+ */
+struct pmu_events_map {
+	const char *vfm;		/* vendor, family, model */
+	const char *version;
+	const char *type;		/* core, uncore etc */
+	struct pmu_event *table;
+};
+
+/*
+ * Global table mapping each known CPU for the architecture to its
+ * table of PMU-Events.
+ */
+extern struct pmu_events_map pmu_events_map[];
+
+#endif