[1/4] Add gcc-auto-profile script

Message ID 1459140266-6902-1-git-send-email-andi@firstfloor.org
State New
Commit Message

Andi Kleen March 28, 2016, 4:44 a.m. UTC
From: Andi Kleen <ak@linux.intel.com>

Using autofdo is currently somewhat difficult. It requires using the
model-specific branches-taken event, which differs between CPUs.
The example shown in the manual requires a specially patched version of
perf that is non-standard, and it will likely not work everywhere.

This patch adds a new gcc-auto-profile script that figures out the
correct event and runs perf. The script is installed on Linux systems.
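
As a rough illustration of what "figuring out the correct event" involves,
the sketch below maps a CPU model number to a raw perf event string and
builds the perf command line. The names taken_branch_event and perf_command
and the two-entry table are hypothetical and only for illustration; the two
model numbers and their event encodings are copied from the generated script
in this patch, which covers many more models.

```python
# Illustrative subset of the model -> raw event table. The generated
# gcc-auto-profile script in this patch lists many more models.
EVENTS = {
    94: "cpu/event=0xC4,umask=0x20/p",
    44: "cpu/event=0x88,umask=0x40/p",
}


def taken_branch_event(model, flags="u"):
    """Return the perf event string for a family-6 Intel model, or None."""
    base = EVENTS.get(model)
    if base is None:
        return None
    # The privilege flags (u, k or uk) follow the PEBS "p" suffix.
    return base + flags


def perf_command(model, args, flags="u"):
    """Build the argument vector that the wrapper would exec."""
    event = taken_branch_event(model, flags)
    if event is None:
        raise ValueError("unknown CPU model %d" % model)
    return ["perf", "record", "-e", event, "-b"] + list(args)
```

For example, perf_command(94, ["./your_program"]) yields a perf record
invocation with the event cpu/event=0xC4,umask=0x20/pu and the -b option.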

Since maintaining the script would be somewhat tedious (needs changes
every time a new CPU comes out) I auto generated it from the online
Intel event database. The script to do that is in contrib and can be
rerun.

Right now configure does not test whether perf works. The result
would vary depending on the build and target system, and since
perf currently doesn't work under virtualization and needs an up-to-date
kernel, it may often fail in common distribution build setups.

So Linux just hardcodes installing the script, but it may fail at runtime.

This is needed to actually make use of autofdo in a generic way
in the build system and in the test suite.

So far the script is not installed.

gcc/:
2016-03-27  Andi Kleen  <ak@linux.intel.com>

	* doc/invoke.texi: Document gcc-auto-profile.
	* gcc-auto-profile: Create.

contrib/:

2016-03-27  Andi Kleen  <ak@linux.intel.com>

	* gen_autofdo_event.py: New file to regenerate
	gcc-auto-profile.
---
 contrib/gen_autofdo_event.py | 155 +++++++++++++++++++++++++++++++++++++++++++
 gcc/doc/invoke.texi          |  31 +++++++--
 gcc/gcc-auto-profile         |  70 +++++++++++++++++++
 3 files changed, 251 insertions(+), 5 deletions(-)
 create mode 100755 contrib/gen_autofdo_event.py
 create mode 100755 gcc/gcc-auto-profile

Comments

Joseph Myers March 30, 2016, 6:05 p.m. UTC | #1
On Sun, 27 Mar 2016, Andi Kleen wrote:

> 2016-03-27  Andi Kleen  <ak@linux.intel.com>
> 
> 	* gen_autofdo_event.py: New file to regenerate
> 	gcc-auto-profile.

It may not be required in contrib, but does this script work with both 
Python 2 and Python 3?  (New code that only works with Python 2 seems like 
a bad idea nowadays, with GNU/Linux distributions moving to having only 
Python 3 in a default install.)
Andi Kleen March 30, 2016, 6:47 p.m. UTC | #2
On Wed, Mar 30, 2016 at 06:05:00PM +0000, Joseph Myers wrote:
> On Sun, 27 Mar 2016, Andi Kleen wrote:
> 
> > 2016-03-27  Andi Kleen  <ak@linux.intel.com>
> > 
> > 	* gen_autofdo_event.py: New file to regenerate
> > 	gcc-auto-profile.
> 
> It may not be required in contrib, but does this script work with both 
> Python 2 and Python 3?  (New code that only works with Python 2 seems like 
> a bad idea nowadays, with GNU/Linux distributions moving to having only 
> Python 3 in a default install.)

Currently it's python 2. It could be run through the conversion tool,
but then you would lose older distributions which don't have 2.7.
As I understand the newer distributions always have options to install 2.7.

-andi
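
For reference, a script of this kind can usually be made to run under both
Python 2 and Python 3 without a conversion tool. The following is only a
minimal sketch of the usual compatibility idioms and is not part of the
posted patch; the helper names log and items are hypothetical.

```python
from __future__ import print_function  # print() as a function on Python 2.6+
import sys

try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2


def log(*parts):
    """Write a progress message to stderr on either Python version."""
    print(*parts, file=sys.stderr)


def items(mapping):
    """Iterate over key/value pairs; dict.iteritems() is gone in Python 3."""
    return iter(mapping.items())
```

With these in place, print >>sys.stderr, "Downloading", url becomes
log("Downloading", url), and eventmap.iteritems() becomes items(eventmap).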
Andi Kleen April 16, 2016, 3:23 p.m. UTC | #3
Andi Kleen <andi@firstfloor.org> writes:

Ping for the patch series!

> diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
> new file mode 100755
> index 0000000..db4db33
> --- /dev/null
> +++ b/contrib/gen_autofdo_event.py
> @@ -0,0 +1,155 @@
> +#!/usr/bin/python
> +# generate Intel taken branches Linux perf event script for autofdo profiling
> +
> +# Copyright (C) 2016 Free Software Foundation, Inc.
> +#
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +#
> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +# for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +# run it with perf record -b -e EVENT program ...
> +# The Linux Kernel needs to support the PMU of the current CPU, and
> +# it will likely not work in VMs.
> +# add --all to print for all cpus, otherwise for current cpu
> +# add --script to generate shell script to run correct event
> +#
> +# requires internet (https) access. this may require setting up a proxy
> +# with export https_proxy=...
> +#
> +import urllib2
> +import sys
> +import json
> +import argparse
> +import collections
> +
> +baseurl = "https://download.01.org/perfmon"
> +
> +target_events = (u'BR_INST_RETIRED.NEAR_TAKEN',
> +                 u'BR_INST_EXEC.TAKEN',
> +                 u'BR_INST_RETIRED.TAKEN_JCC',
> +                 u'BR_INST_TYPE_RETIRED.COND_TAKEN')
> +
> +ap = argparse.ArgumentParser()
> +ap.add_argument('--all', '-a', help='Print for all CPUs', action='store_true')
> +ap.add_argument('--script', help='Generate shell script', action='store_true')
> +args = ap.parse_args()
> +
> +eventmap = collections.defaultdict(list)
> +
> +def get_cpu_str():
> +    with open('/proc/cpuinfo', 'r') as c:
> +        vendor, fam, model = None, None, None
> +        for j in c:
> +            n = j.split()
> +            if n[0] == 'vendor_id':
> +                vendor = n[2]
> +            elif n[0] == 'model' and n[1] == ':':
> +                model = int(n[2])
> +            elif n[0] == 'cpu' and n[1] == 'family':
> +                fam = int(n[3])
> +            if vendor and fam and model:
> +                return "%s-%d-%X" % (vendor, fam, model), model
> +    return None, None
> +
> +def find_event(eventurl, model):
> +    print >>sys.stderr, "Downloading", eventurl
> +    u = urllib2.urlopen(eventurl)
> +    events = json.loads(u.read())
> +    u.close()
> +
> +    found = 0
> +    for j in events:
> +        if j[u'EventName'] in target_events:
> +            event = "cpu/event=%s,umask=%s/" % (j[u'EventCode'], j[u'UMask'])
> +            if u'PEBS' in j and j[u'PEBS'] > 0:
> +                event += "p"
> +            if args.script:
> +                eventmap[event].append(model)
> +            else:
> +                print j[u'EventName'], "event for model", model, "is", event
> +            found += 1
> +    return found
> +
> +if not args.all:
> +    cpu, model = get_cpu_str()
> +    if not cpu:
> +        sys.exit("Unknown CPU type")
> +
> +url = baseurl + "/mapfile.csv"
> +print >>sys.stderr, "Downloading", url
> +u = urllib2.urlopen(url)
> +found = 0
> +cpufound = 0
> +for j in u:
> +    n = j.rstrip().split(',')
> +    if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core":
> +        if args.all:
> +            vendor, fam, model = n[0].split("-")
> +            model = int(model, 16)
> +        cpufound += 1
> +        found += find_event(baseurl + n[2], model)
> +u.close()
> +
> +if args.script:
> +    print '''#!/bin/sh
> +# profile workload for gcc profile feedback (autofdo) using Linux perf
> +# auto generated. to regenerate for new CPUs run
> +# contrib/gen_autofdo_event.py --script --all in gcc source
> +
> +# usages:
> +# gcc-auto-profile program             (profile program and children)
> +# gcc-auto-profile -a sleep X          (profile all for X secs, may need root)
> +# gcc-auto-profile -p PID sleep X      (profile PID)
> +# gcc-auto-profile --kernel -a sleep X (profile kernel)
> +# gcc-auto-profile --all -a sleep X    (profile kernel and user space)
> +
> +# identify branches taken event for CPU
> +#
> +
> +FLAGS=u
> +
> +if [ "$1" = "--kernel" ] ; then
> +  FLAGS=k
> +  shift
> +fi
> +if [ "$1" = "--all" ] ; then
> +  FLAGS=uk
> +  shift
> +fi
> +
> +if ! grep -q Intel /proc/cpuinfo ; then
> +  echo >&2 "Only Intel CPUs supported"
> +  exit 1
> +fi
> +
> +if grep -q hypervisor /proc/cpuinfo ; then
> +  echo >&2 "Warning: branch profiling may not be functional in VMs"
> +fi
> +
> +case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
> +  egrep "^model\s*:" /proc/cpuinfo | head -1` in'''
> +    for event, mod in eventmap.iteritems():
> +        for m in mod[:-1]:
> +            print "model*:\ %s|\\" % m
> +        print 'model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event)
> +    print '''*)
> +echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
> +	exit 1 ;;'''
> +    print "esac"
> +    print 'exec perf record -e $E -b "$@"'
> +
> +if cpufound == 0 and not args.all:
> +    sys.exit('CPU %s not found' % cpu)
> +
> +if found == 0:
> +    sys.exit('Branch event not found')
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 9e54bb7..427d89a 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -8249,13 +8249,34 @@ which are generally profitable only with profile feedback available:
>  If omitted, it defaults to @file{fbdata.afdo} in the current directory.
>  
>  Producing an AutoFDO profile data file requires running your program
> -with the @command{perf} utility on a supported GNU/Linux target system.
> +with the @command{gcc-auto-profile} utility on a supported GNU/Linux target system. @command{gcc-auto-profile} calls the @command{perf} utility.
> +It also requires Last-Branch-Record support, which typically requires
> +a new enough kernel not running virtualized.
> +@command{gcc-auto-profile} accepts the same arguments as @command{perf record}.
>  For more information, see @uref{https://perf.wiki.kernel.org/}.
>  
> -E.g.
>  @smallexample
> -perf record -e br_inst_retired:near_taken -b -o perf.data \
> -    -- your_program
> +gcc-auto-profile your_program
> +@end smallexample
> +
> +On larger programs the resulting perf.data file may be very large.
> +In this case it can be better to reduce the sampling rate.
> +Collect samples every million taken branches:
> +
> +@smallexample
> +gcc-auto-profile -c 1000000 program
> +@end smallexample
> +
> +Or only profile representative run intervals of the program:
> +
> +@smallexample
> +gcc-auto-profile -p PID-OF-PROGRAM sleep 5
> +@end smallexample
> +
> +Profile the complete system for 10 seconds (may require root):
> +
> +@smallexample
> +gcc-auto-profile -a sleep 10
>  @end smallexample
>  
>  Then use the @command{create_gcov} tool to convert the raw profile data
> @@ -8266,7 +8287,7 @@ See @uref{https://github.com/google/autofdo}.
>  E.g.
>  @smallexample
>  create_gcov --binary=your_program.unstripped --profile=perf.data \
> -    --gcov=profile.afdo
> +    --gcov=profile.afdo -gcov_version 1
>  @end smallexample
>  @end table
>  
> diff --git a/gcc/gcc-auto-profile b/gcc/gcc-auto-profile
> new file mode 100755
> index 0000000..c6712b2
> --- /dev/null
> +++ b/gcc/gcc-auto-profile
> @@ -0,0 +1,70 @@
> +#!/bin/sh
> +# profile workload for gcc profile feedback (autofdo) using Linux perf
> +# auto generated. to regenerate for new CPUs run
> +# contrib/gen_autofdo_event.py --script --all in gcc source
> +
> +# usages:
> +# gcc-auto-profile program             (profile program and children)
> +# gcc-auto-profile -a sleep X          (profile all for X secs, may need root)
> +# gcc-auto-profile -p PID sleep X      (profile PID)
> +# gcc-auto-profile --kernel -a sleep X (profile kernel)
> +# gcc-auto-profile --all -a sleep X    (profile kernel and user space)
> +
> +# identify branches taken event for CPU
> +#
> +
> +FLAGS=u
> +
> +if [ "$1" = "--kernel" ] ; then
> +  FLAGS=k
> +  shift
> +fi
> +if [ "$1" = "--all" ] ; then
> +  FLAGS=uk
> +  shift
> +fi
> +
> +if ! grep -q Intel /proc/cpuinfo ; then
> +  echo >&2 "Only Intel CPUs supported"
> +  exit 1
> +fi
> +
> +if grep -q hypervisor /proc/cpuinfo ; then
> +  echo >&2 "Warning: branch profiling may not be functional in VMs"
> +fi
> +
> +case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
> +  egrep "^model\s*:" /proc/cpuinfo | head -1` in
> +model*:\ 55|\
> +model*:\ 77|\
> +model*:\ 76) E="cpu/event=0xC4,umask=0xFE/p$FLAGS" ;;
> +model*:\ 42|\
> +model*:\ 45|\
> +model*:\ 58|\
> +model*:\ 62|\
> +model*:\ 60|\
> +model*:\ 69|\
> +model*:\ 70|\
> +model*:\ 63|\
> +model*:\ 61|\
> +model*:\ 71|\
> +model*:\ 86|\
> +model*:\ 78|\
> +model*:\ 94) E="cpu/event=0xC4,umask=0x20/p$FLAGS" ;;
> +model*:\ 46|\
> +model*:\ 30|\
> +model*:\ 31|\
> +model*:\ 26|\
> +model*:\ 47|\
> +model*:\ 37|\
> +model*:\ 44) E="cpu/event=0x88,umask=0x40/p$FLAGS" ;;
> +model*:\ 28|\
> +model*:\ 38|\
> +model*:\ 39|\
> +model*:\ 54|\
> +model*:\ 53) E="cpu/event=0x88,umask=0x41/p$FLAGS" ;;
> +*)
> +echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
> +	exit 1 ;;
> +esac
> +exec perf record -e $E -b "$@"
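
The CPU detection in gen_autofdo_event.py above can also be written as a pure
function over the text of /proc/cpuinfo, which makes it testable without the
real file. This is a sketch under assumptions, not the patch's code: the name
cpu_id_from_cpuinfo and the SAMPLE text are hypothetical, and it is written
for Python 3. It splits each line on the colon instead of on whitespace and
compares against None, so blank lines and a model number of 0 are handled.

```python
def cpu_id_from_cpuinfo(text):
    """Return ("Vendor-Family-MODELHEX", model) for the first CPU listed,
    or (None, None) if the fields are missing."""
    vendor = fam = model = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue                  # blank line or line without a colon
        key, value = key.strip(), value.strip()
        if key == "vendor_id":
            vendor = value
        elif key == "cpu family":
            fam = int(value)
        elif key == "model":          # exact match, so "model name" is skipped
            model = int(value)
        # Compare against None so that a model number of 0 still counts.
        if vendor is not None and fam is not None and model is not None:
            return "%s-%d-%X" % (vendor, fam, model), model
    return None, None


# Made-up excerpt in the /proc/cpuinfo layout, for illustration only.
SAMPLE = """processor\t: 0
vendor_id\t: GenuineIntel
cpu family\t: 6
model\t\t: 94
model name\t: Example CPU
"""
```

The first element of the result has the same Vendor-Family-Model form that
the script compares against the first column of mapfile.csv.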
Andi Kleen April 23, 2016, 4:15 p.m. UTC | #4
Andi Kleen <andi@firstfloor.org> writes:

Ping^2 for the patch series!

Andi Kleen April 27, 2016, 2:29 p.m. UTC | #5
Andi Kleen <andi@firstfloor.org> writes:

Ping^3 for the patch series!

>>> +    for event, mod in eventmap.iteritems():
>>> +        for m in mod[:-1]:
>>> +            print "model*:\ %s|\\" % m
>>> +        print 'model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event)
>>> +    print '''*)
>>> +echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
>>> +	exit 1 ;;'''
>>> +    print "esac"
>>> +    print 'exec perf record -e $E -b "$@"'
>>> +
>>> +if cpufound == 0 and not args.all:
>>> +    sys.exit('CPU %s not found' % cpu)
>>> +
>>> +if found == 0:
>>> +    sys.exit('Branch event not found')
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 9e54bb7..427d89a 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -8249,13 +8249,34 @@ which are generally profitable only with profile feedback available:
>>>  If omitted, it defaults to @file{fbdata.afdo} in the current directory.
>>>  
>>>  Producing an AutoFDO profile data file requires running your program
>>> -with the @command{perf} utility on a supported GNU/Linux target system.
>>> +with the @command{gcc-auto-profile} utility on a supported GNU/Linux target system. @command{gcc-auto-profile} calls the @command{perf} utility.
>>> +It also requires Last-Branch-Record support, which typically requires
>>> +a new enough kernel not running virtualized.
>>> +@command{gcc-auto-profile} accepts the same arguments as @command{perf record}.
>>>  For more information, see @uref{https://perf.wiki.kernel.org/}.
>>>  
>>> -E.g.
>>>  @smallexample
>>> -perf record -e br_inst_retired:near_taken -b -o perf.data \
>>> -    -- your_program
>>> +gcc-auto-profile your_program
>>> +@end smallexample
>>> +
>>> +On larger programs the resulting perf.data file may be very large.
>>> +In this case it can be better to reduce the sampling rate.
>>> +Collect samples every million taken branches:
>>> +
>>> +@smallexample
>>> +gcc-auto-profile -c 1000000 program
>>> +@end smallexample
>>> +
>>> +Or only profile representative run intervals of the program:
>>> +
>>> +@smallexample
>>> +gcc-auto-profile -p PID-OF-PROGRAM sleep 5
>>> +@end smallexample
>>> +
>>> +Profile complete system for 10 seconds (may require root)
>>> +
>>> +@smallexample
>>> +gcc-auto-profile -a sleep 10
>>>  @end smallexample
>>>  
>>>  Then use the @command{create_gcov} tool to convert the raw profile data
>>> @@ -8266,7 +8287,7 @@ See @uref{https://github.com/google/autofdo}.
>>>  E.g.
>>>  @smallexample
>>>  create_gcov --binary=your_program.unstripped --profile=perf.data \
>>> -    --gcov=profile.afdo
>>> +    --gcov=profile.afdo -gcov_version 1
>>>  @end smallexample
>>>  @end table
>>>  
>>> diff --git a/gcc/gcc-auto-profile b/gcc/gcc-auto-profile
>>> new file mode 100755
>>> index 0000000..c6712b2
>>> --- /dev/null
>>> +++ b/gcc/gcc-auto-profile
>>> @@ -0,0 +1,70 @@
>>> +#!/bin/sh
>>> +# profile workload for gcc profile feedback (autofdo) using Linux perf
>>> +# auto generated. to regenerate for new CPUs run
>>> +# contrib/gen_autofdo_event.py --shell --all in gcc source
>>> +
>>> +# usages:
>>> +# gcc-auto-profile program             (profile program and children)
>>> +# gcc-auto-profile -a sleep X          (profile all for X secs, may need root)
>>> +# gcc-auto-profile -p PID sleep X      (profile PID)
>>> +# gcc-auto-profile --kernel -a sleep X (profile kernel)
>>> +# gcc-auto-profile --all -a sleep X    (profile kernel and user space)
>>> +
>>> +# identify branches taken event for CPU
>>> +#
>>> +
>>> +FLAGS=u
>>> +
>>> +if [ "$1" = "--kernel" ] ; then
>>> +  FLAGS=k
>>> +  shift
>>> +fi
>>> +if [ "$1" == "--all" ] ; then
>>> +  FLAGS=uk
>>> +  shift
>>> +fi
>>> +
>>> +if ! grep -q Intel /proc/cpuinfo ] ; then
>>> +  echo >&2 "Only Intel CPUs supported"
>>> +  exit 1
>>> +fi
>>> +
>>> +if grep -q hypervisor /proc/cpuinfo ; then
>>> +  echo >&2 "Warning: branch profiling may not be functional in VMs"
>>> +fi
>>> +
>>> +case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
>>> +  egrep "^model\s*:" /proc/cpuinfo | head -1` in
>>> +model*:\ 55|\
>>> +model*:\ 77|\
>>> +model*:\ 76) E="cpu/event=0xC4,umask=0xFE/p$FLAGS" ;;
>>> +model*:\ 42|\
>>> +model*:\ 45|\
>>> +model*:\ 58|\
>>> +model*:\ 62|\
>>> +model*:\ 60|\
>>> +model*:\ 69|\
>>> +model*:\ 70|\
>>> +model*:\ 63|\
>>> +model*:\ 61|\
>>> +model*:\ 71|\
>>> +model*:\ 86|\
>>> +model*:\ 78|\
>>> +model*:\ 94) E="cpu/event=0xC4,umask=0x20/p$FLAGS" ;;
>>> +model*:\ 46|\
>>> +model*:\ 30|\
>>> +model*:\ 31|\
>>> +model*:\ 26|\
>>> +model*:\ 47|\
>>> +model*:\ 37|\
>>> +model*:\ 44) E="cpu/event=0x88,umask=0x40/p$FLAGS" ;;
>>> +model*:\ 28|\
>>> +model*:\ 38|\
>>> +model*:\ 39|\
>>> +model*:\ 54|\
>>> +model*:\ 53) E="cpu/event=0x88,umask=0x41/p$FLAGS" ;;
>>> +*)
>>> +echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
>>> +	exit 1 ;;
>>> +esac
>>> +exec perf record -e $E -b "$@"
Bernd Schmidt April 27, 2016, 3:42 p.m. UTC | #6
On 03/28/2016 06:44 AM, Andi Kleen wrote:
> This patch adds a new gcc-auto-profile script that figures out the
> correct event and runs perf. The script is installed with on Linux systems.

That sounds useful, and I think we'll want to accept this.

> So Linux just hardcodes installing the script, but it may fail at runtime.

For this reason it would probably be best to retain the documentation 
for the old method alongside the new one.

> +
> +baseurl = "https://download.01.org/perfmon"

Slightly scary to see a random unknown download URL. Apparently it's an 
Intel thing? Is this referenced somewhere on an intel.com web page?

>   E.g.
>   @smallexample
>   create_gcov --binary=your_program.unstripped --profile=perf.data \
> -    --gcov=profile.afdo
> +    --gcov=profile.afdo -gcov_version 1
>   @end smallexample
>   @end table

Why this change? What does it do?

Why isn't the new script in contrib? Does it have to be in gcc to be 
installed? As a target-specific thing it probably needs to live at least 
inside config/.

Please review the patch yourself for proper sentences everywhere.


Bernd
Andi Kleen April 27, 2016, 4:30 p.m. UTC | #7
On Wed, Apr 27, 2016 at 05:42:48PM +0200, Bernd Schmidt wrote:
> On 03/28/2016 06:44 AM, Andi Kleen wrote:
> >This patch adds a new gcc-auto-profile script that figures out the
> >correct event and runs perf. The script is installed with on Linux systems.
> 
> That sounds useful, and I think we'll want to accept this.
> 
> >So Linux just hardcodes installing the script, but it may fail at runtime.
> 
> For this reason it would probably be best to retain the documentation for
> the old method alongside the new one.

The old method actually doesn't work, unless you apply a very obscure
patch to your perf. I don't think it is very useful for users.
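
To illustrate the approach: the generated script picks a raw event
encoding by CPU model instead of relying on a symbolic event name.
Below is a minimal Python sketch of that lookup. The table is copied
from the generated gcc-auto-profile in this patch; the helper names
event_for_model and perf_command are made up for this illustration and
are not part of the patch.

```python
# Illustrative sketch only. The model table mirrors the generated
# gcc-auto-profile script; the function names are hypothetical.
EVENT_BY_MODELS = {
    "cpu/event=0xC4,umask=0xFE/p": (55, 77, 76),
    "cpu/event=0xC4,umask=0x20/p": (42, 45, 58, 62, 60, 69, 70,
                                    63, 61, 71, 86, 78, 94),
    "cpu/event=0x88,umask=0x40/p": (46, 30, 31, 26, 47, 37, 44),
    "cpu/event=0x88,umask=0x41/p": (28, 38, 39, 54, 53),
}


def event_for_model(model, flags="u"):
    """Return the raw perf event for a family 6 Intel model, or None.

    flags is appended as the event modifier, like $FLAGS in the shell
    script: "u" for user space, "k" for kernel, "uk" for both.
    """
    for event, models in EVENT_BY_MODELS.items():
        if model in models:
            return event + flags
    return None


def perf_command(model, args, flags="u"):
    """Build the argv that the shell script would exec for this model."""
    event = event_for_model(model, flags)
    if event is None:
        raise ValueError("Unknown CPU model %d" % model)
    return ["perf", "record", "-e", event, "-b"] + list(args)
```

For example, perf_command(94, ["./your_program"]) gives the same
arguments the script would pass to perf on a model 94 CPU.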

> 
> >+
> >+baseurl = "https://download.01.org/perfmon"
> 
> Slightly scary to see a random unknown download URL. Apparently it's an
> Intel thing? Is this referenced somewhere on an intel.com web page?

http://www.intel.com/content/www/us/en/search.html?toplevelcategory=none&keyword=01.org
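
For context, the generator only uses that site as a data source: it
reads mapfile.csv, keeps the rows whose fourth field is "core", and
fetches the JSON event file named in the third field. Here is a rough
Python 3 sketch of the row selection, with no network access. The
sample rows and file paths are invented for illustration and are not
real entries from the server, and the function name core_event_files
is hypothetical.

```python
def core_event_files(mapfile_text, cpu=None):
    """Yield (model, path) for every "core" row, as the main loop does.

    cpu is a "vendor-family-MODEL" string like the one get_cpu_str()
    builds; None selects every CPU (the --all case).
    """
    for line in mapfile_text.splitlines():
        n = line.rstrip().split(",")
        if len(n) >= 4 and (cpu is None or n[0] == cpu) and n[3] == "core":
            # The model is the third dash-separated field, in hex.
            model = int(n[0].split("-")[2], 16)
            yield model, n[2]


# Made-up sample data; real rows come from baseurl + "/mapfile.csv".
SAMPLE = """Family-model,Version,Filename,EventType
GenuineIntel-6-5E,V1,/EXAMPLE/example_core.json,core
GenuineIntel-6-5E,V1,/EXAMPLE/example_uncore.json,uncore
GenuineIntel-6-2A,V1,/EXAMPLE/other_core.json,core
"""

if __name__ == "__main__":
    for model, path in core_event_files(SAMPLE):
        print(model, path)
```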


> 
> >  E.g.
> >  @smallexample
> >  create_gcov --binary=your_program.unstripped --profile=perf.data \
> >-    --gcov=profile.afdo
> >+    --gcov=profile.afdo -gcov_version 1
> >  @end smallexample
> >  @end table
> 
> Why this change? What does it do?

It actually makes it work. The Google autofdo distribution defaults
to a Google-internal gcov version number that doesn't work
with standard gcc.

I can split it out.
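
As a sketch of the resulting invocation, here is the command line from
the updated documentation built up in Python. The function name and
its defaults are only for illustration; the option spellings are the
ones used in the invoke.texi hunk.

```python
def create_gcov_command(binary, profile="perf.data",
                        gcov="profile.afdo", gcov_version=1):
    """Return the create_gcov argv shown in the documentation example."""
    return [
        "create_gcov",
        "--binary=" + binary,
        "--profile=" + profile,
        "--gcov=" + gcov,
        # Explicit version, so the tool's default is not used.
        "-gcov_version", str(gcov_version),
    ]


if __name__ == "__main__":
    print(" ".join(create_gcov_command("your_program.unstripped")))
```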

> 
> Why isn't the new script in contrib? Does it have to be in gcc to be
> installed?

Because autoprofiledfeedback needs it.

Also the idea was to eventually install it by default (although the patch
doesn't do that yet).

-Andi

Patch

diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
new file mode 100755
index 0000000..db4db33
--- /dev/null
+++ b/contrib/gen_autofdo_event.py
@@ -0,0 +1,155 @@ 
+#!/usr/bin/python
+# generate Intel taken branches Linux perf event script for autofdo profiling
+
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# run it with perf record -b -e EVENT program ...
+# The Linux Kernel needs to support the PMU of the current CPU, and
+# it will likely not work in VMs.
+# add --all to print for all cpus, otherwise for current cpu
+# add --script to generate shell script to run correct event
+#
+# requires internet (https) access. this may require setting up a proxy
+# with export https_proxy=...
+#
+import urllib2
+import sys
+import json
+import argparse
+import collections
+
+baseurl = "https://download.01.org/perfmon"
+
+target_events = (u'BR_INST_RETIRED.NEAR_TAKEN',
+                 u'BR_INST_EXEC.TAKEN',
+                 u'BR_INST_RETIRED.TAKEN_JCC',
+                 u'BR_INST_TYPE_RETIRED.COND_TAKEN')
+
+ap = argparse.ArgumentParser()
+ap.add_argument('--all', '-a', help='Print for all CPUs', action='store_true')
+ap.add_argument('--script', help='Generate shell script', action='store_true')
+args = ap.parse_args()
+
+eventmap = collections.defaultdict(list)
+
+def get_cpu_str():
+    with open('/proc/cpuinfo', 'r') as c:
+        vendor, fam, model = None, None, None
+        for j in c:
+            n = j.split()
+            if n[0] == 'vendor_id':
+                vendor = n[2]
+            elif n[0] == 'model' and n[1] == ':':
+                model = int(n[2])
+            elif n[0] == 'cpu' and n[1] == 'family':
+                fam = int(n[3])
+            if vendor and fam and model:
+                return "%s-%d-%X" % (vendor, fam, model), model
+    return None, None
+
+def find_event(eventurl, model):
+    print >>sys.stderr, "Downloading", eventurl
+    u = urllib2.urlopen(eventurl)
+    events = json.loads(u.read())
+    u.close()
+
+    found = 0
+    for j in events:
+        if j[u'EventName'] in target_events:
+            event = "cpu/event=%s,umask=%s/" % (j[u'EventCode'], j[u'UMask'])
+            if u'PEBS' in j and j[u'PEBS'] > 0:
+                event += "p"
+            if args.script:
+                eventmap[event].append(model)
+            else:
+                print j[u'EventName'], "event for model", model, "is", event
+            found += 1
+    return found
+
+if not args.all:
+    cpu, model = get_cpu_str()
+    if not cpu:
+        sys.exit("Unknown CPU type")
+
+url = baseurl + "/mapfile.csv"
+print >>sys.stderr, "Downloading", url
+u = urllib2.urlopen(url)
+found = 0
+cpufound = 0
+for j in u:
+    n = j.rstrip().split(',')
+    if len(n) >= 4 and (args.all or n[0] == cpu) and n[3] == "core":
+        if args.all:
+            vendor, fam, model = n[0].split("-")
+            model = int(model, 16)
+        cpufound += 1
+        found += find_event(baseurl + n[2], model)
+u.close()
+
+if args.script:
+    print '''#!/bin/sh
+# profile workload for gcc profile feedback (autofdo) using Linux perf
+# auto generated. to regenerate for new CPUs run
+# contrib/gen_autofdo_event.py --script --all in gcc source
+
+# usages:
+# gcc-auto-profile program             (profile program and children)
+# gcc-auto-profile -a sleep X          (profile all for X secs, may need root)
+# gcc-auto-profile -p PID sleep X      (profile PID)
+# gcc-auto-profile --kernel -a sleep X (profile kernel)
+# gcc-auto-profile --all -a sleep X    (profile kernel and user space)
+
+# identify branches taken event for CPU
+#
+
+FLAGS=u
+
+if [ "$1" = "--kernel" ] ; then
+  FLAGS=k
+  shift
+fi
+if [ "$1" = "--all" ] ; then
+  FLAGS=uk
+  shift
+fi
+
+if ! grep -q Intel /proc/cpuinfo ; then
+  echo >&2 "Only Intel CPUs supported"
+  exit 1
+fi
+
+if grep -q hypervisor /proc/cpuinfo ; then
+  echo >&2 "Warning: branch profiling may not be functional in VMs"
+fi
+
+case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
+  egrep "^model\s*:" /proc/cpuinfo | head -1` in'''
+    for event, mod in eventmap.iteritems():
+        for m in mod[:-1]:
+            print "model*:\ %s|\\" % m
+        print 'model*:\ %s) E="%s$FLAGS" ;;' % (mod[-1], event)
+    print '''*)
+echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+	exit 1 ;;'''
+    print "esac"
+    print 'exec perf record -e $E -b "$@"'
+
+if cpufound == 0 and not args.all:
+    sys.exit('CPU %s not found' % cpu)
+
+if found == 0:
+    sys.exit('Branch event not found')
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9e54bb7..427d89a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8249,13 +8249,34 @@  which are generally profitable only with profile feedback available:
 If omitted, it defaults to @file{fbdata.afdo} in the current directory.
 
 Producing an AutoFDO profile data file requires running your program
-with the @command{perf} utility on a supported GNU/Linux target system.
+with the @command{gcc-auto-profile} utility on a supported GNU/Linux target system. @command{gcc-auto-profile} calls the @command{perf} utility.
+It also requires Last Branch Record (LBR) support, which typically needs
+a sufficiently recent kernel that is not running virtualized.
+@command{gcc-auto-profile} accepts the same arguments as @command{perf record}.
 For more information, see @uref{https://perf.wiki.kernel.org/}.
 
-E.g.
 @smallexample
-perf record -e br_inst_retired:near_taken -b -o perf.data \
-    -- your_program
+gcc-auto-profile your_program
+@end smallexample
+
+For larger programs the resulting @file{perf.data} file may be very large.
+In this case it can be better to reduce the sampling rate,
+for example by collecting a sample only every million taken branches:
+
+@smallexample
+gcc-auto-profile -c 1000000 program
+@end smallexample
+
+Or only profile representative run intervals of the program:
+
+@smallexample
+gcc-auto-profile -p PID-OF-PROGRAM sleep 5
+@end smallexample
+
+Profile the complete system for 10 seconds (may require root):
+
+@smallexample
+gcc-auto-profile -a sleep 10
 @end smallexample
 
 Then use the @command{create_gcov} tool to convert the raw profile data
@@ -8266,7 +8287,7 @@  See @uref{https://github.com/google/autofdo}.
 E.g.
 @smallexample
 create_gcov --binary=your_program.unstripped --profile=perf.data \
-    --gcov=profile.afdo
+    --gcov=profile.afdo -gcov_version 1
 @end smallexample
 @end table
 
diff --git a/gcc/gcc-auto-profile b/gcc/gcc-auto-profile
new file mode 100755
index 0000000..c6712b2
--- /dev/null
+++ b/gcc/gcc-auto-profile
@@ -0,0 +1,70 @@ 
+#!/bin/sh
+# profile workload for gcc profile feedback (autofdo) using Linux perf
+# auto generated. to regenerate for new CPUs run
+# contrib/gen_autofdo_event.py --script --all in gcc source
+
+# usages:
+# gcc-auto-profile program             (profile program and children)
+# gcc-auto-profile -a sleep X          (profile all for X secs, may need root)
+# gcc-auto-profile -p PID sleep X      (profile PID)
+# gcc-auto-profile --kernel -a sleep X (profile kernel)
+# gcc-auto-profile --all -a sleep X    (profile kernel and user space)
+
+# identify branches taken event for CPU
+#
+
+FLAGS=u
+
+if [ "$1" = "--kernel" ] ; then
+  FLAGS=k
+  shift
+fi
+if [ "$1" = "--all" ] ; then
+  FLAGS=uk
+  shift
+fi
+
+if ! grep -q Intel /proc/cpuinfo ; then
+  echo >&2 "Only Intel CPUs supported"
+  exit 1
+fi
+
+if grep -q hypervisor /proc/cpuinfo ; then
+  echo >&2 "Warning: branch profiling may not be functional in VMs"
+fi
+
+case `egrep -q "^cpu family\s*: 6" /proc/cpuinfo &&
+  egrep "^model\s*:" /proc/cpuinfo | head -1` in
+model*:\ 55|\
+model*:\ 77|\
+model*:\ 76) E="cpu/event=0xC4,umask=0xFE/p$FLAGS" ;;
+model*:\ 42|\
+model*:\ 45|\
+model*:\ 58|\
+model*:\ 62|\
+model*:\ 60|\
+model*:\ 69|\
+model*:\ 70|\
+model*:\ 63|\
+model*:\ 61|\
+model*:\ 71|\
+model*:\ 86|\
+model*:\ 78|\
+model*:\ 94) E="cpu/event=0xC4,umask=0x20/p$FLAGS" ;;
+model*:\ 46|\
+model*:\ 30|\
+model*:\ 31|\
+model*:\ 26|\
+model*:\ 47|\
+model*:\ 37|\
+model*:\ 44) E="cpu/event=0x88,umask=0x40/p$FLAGS" ;;
+model*:\ 28|\
+model*:\ 38|\
+model*:\ 39|\
+model*:\ 54|\
+model*:\ 53) E="cpu/event=0x88,umask=0x41/p$FLAGS" ;;
+*)
+echo >&2 "Unknown CPU. Run contrib/gen_autofdo_event.py --all --script to update script."
+	exit 1 ;;
+esac
+exec perf record -e $E -b "$@"