
[10/10] perf/doc: update design.txt for exclude_{host|guest} flags

Message ID 1542363853-13849-11-git-send-email-andrew.murray@arm.com (mailing list archive)
State Not Applicable
Series perf/core: Generalise event exclusion checking

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success next/apply_patch Successfully applied
snowpatch_ozlabs/build-ppc64le success build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64be success build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-ppc64e success build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/build-pmac32 success build succeeded & removed 0 sparse warning(s)
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked

Commit Message

Andrew Murray Nov. 16, 2018, 10:24 a.m. UTC
Update design.txt to reflect the presence of the exclude_host
and exclude_guest perf flags.

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
---
 tools/perf/design.txt | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Michael Ellerman Nov. 20, 2018, 11:31 a.m. UTC | #1
Andrew Murray <andrew.murray@arm.com> writes:

> Update design.txt to reflect the presence of the exclude_host
> and exclude_guest perf flags.
>
> Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> ---
>  tools/perf/design.txt | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> index a28dca2..7de7d83 100644
> --- a/tools/perf/design.txt
> +++ b/tools/perf/design.txt
> @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
>  way to request that counting of events be restricted to times when the
>  CPU is in user, kernel and/or hypervisor mode.
>  
> +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> +to request counting of events restricted to guest and host contexts when
> +using virtualisation.

How does exclude_host differ from exclude_hv ?

cheers
Andrew Murray Nov. 20, 2018, 1:32 p.m. UTC | #2
On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
> Andrew Murray <andrew.murray@arm.com> writes:
> 
> > Update design.txt to reflect the presence of the exclude_host
> > and exclude_guest perf flags.
> >
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > ---
> >  tools/perf/design.txt | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> > index a28dca2..7de7d83 100644
> > --- a/tools/perf/design.txt
> > +++ b/tools/perf/design.txt
> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
> >  way to request that counting of events be restricted to times when the
> >  CPU is in user, kernel and/or hypervisor mode.
> >  
> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> > +to request counting of events restricted to guest and host contexts when
> > +using virtualisation.
> 
> How does exclude_host differ from exclude_hv ?

I believe exclude_host / exclude_guest are intended to distinguish
between host and guest in the hosted hypervisor context (KVM).
Whereas exclude_hv allows to distinguish between guest and
hypervisor in the bare-metal type hypervisors.

In the case of arm64 - if VHE extensions are present then the host
kernel will run at a higher privilege to the guest kernel, in which
case there is no distinction between hypervisor and host so we ignore
exclude_hv. But where VHE extensions are not present then the host
kernel runs at the same privilege level as the guest and we use a
higher privilege level to switch between them - in this case we can
use exclude_hv to discount that hypervisor role of switching between
guests.

Thanks,

Andrew Murray

> 
> cheers
Michael Ellerman Dec. 11, 2018, 11:06 a.m. UTC | #3
[ Reviving old thread. ]

Andrew Murray <andrew.murray@arm.com> writes:
> On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
>> Andrew Murray <andrew.murray@arm.com> writes:
>> 
>> > Update design.txt to reflect the presence of the exclude_host
>> > and exclude_guest perf flags.
>> >
>> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
>> > ---
>> >  tools/perf/design.txt | 4 ++++
>> >  1 file changed, 4 insertions(+)
>> >
>> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
>> > index a28dca2..7de7d83 100644
>> > --- a/tools/perf/design.txt
>> > +++ b/tools/perf/design.txt
>> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
>> >  way to request that counting of events be restricted to times when the
>> >  CPU is in user, kernel and/or hypervisor mode.
>> >  
>> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
>> > +to request counting of events restricted to guest and host contexts when
>> > +using virtualisation.
>> 
>> How does exclude_host differ from exclude_hv ?
>
> I believe exclude_host / exclude_guest are intended to distinguish
> between host and guest in the hosted hypervisor context (KVM).

OK yeah, from the perf-list man page:

           u - user-space counting
           k - kernel counting
           h - hypervisor counting
           I - non idle counting
           G - guest counting (in KVM guests)
           H - host counting (not in KVM guests)
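
As a rough illustration of how these modifiers relate to the exclude_*
bits in perf_event_attr, here is a simplified model. It is a sketch, not
perf's actual parser: the function name and dict layout are hypothetical,
and it assumes that naming any of u/k/h (or any of G/H) restricts counting
to exactly the levels (or sides) named. Tool-specific defaults are ignored.

```python
# Simplified, hypothetical model of how event modifiers map to exclude_* bits.
# This is NOT perf's real parser; it only covers the modifiers quoted above.

def modifiers_to_excludes(mods: str) -> dict:
    """Return the exclude_* bits implied by a modifier string such as 'uG'."""
    excl = {
        "exclude_user": 0,
        "exclude_kernel": 0,
        "exclude_hv": 0,
        "exclude_idle": 0,
        "exclude_host": 0,
        "exclude_guest": 0,
    }
    # Naming any privilege level restricts counting to the named levels.
    levels = {"u": "exclude_user", "k": "exclude_kernel", "h": "exclude_hv"}
    if any(m in levels for m in mods):
        for m, bit in levels.items():
            excl[bit] = 0 if m in mods else 1
    # Naming G and/or H restricts counting to the named side(s).
    sides = {"G": "exclude_guest", "H": "exclude_host"}
    if any(m in sides for m in mods):
        for m, bit in sides.items():
            excl[bit] = 0 if m in mods else 1
    # 'I' requests non-idle counting, i.e. idle time is excluded.
    if "I" in mods:
        excl["exclude_idle"] = 1
    return excl
```

Under this model cycles:G asks for exclude_host=1 and exclude_guest=0,
while cycles:GH leaves both bits clear.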

> Whereas exclude_hv allows to distinguish between guest and
> hypervisor in the bare-metal type hypervisors.

Except that's exactly not how we use them on powerpc :)

We use exclude_hv to exclude "the hypervisor", regardless of whether
it's KVM or PowerVM (which is a bare-metal hypervisor).

We don't use exclude_host / exclude_guest at all, which I guess is a
bug, except I didn't know they existed until this thread.

eg, in a KVM guest:

  $ perf record -e cycles:G /bin/bash -c "for i in {0..100000}; do :;done"
  $ perf report -D | grep -Fc "dso: [hypervisor]"
  16


> In the case of arm64 - if VHE extensions are present then the host
> kernel will run at a higher privilege to the guest kernel, in which
> case there is no distinction between hypervisor and host so we ignore
> exclude_hv. But where VHE extensions are not present then the host
> kernel runs at the same privilege level as the guest and we use a
> higher privilege level to switch between them - in this case we can
> use exclude_hv to discount that hypervisor role of switching between
> guests.

I couldn't find any arm64 perf code using exclude_host/guest at all?

And I don't see any x86 code using exclude_hv.

But maybe that's OK, I just worry this is confusing for users.

cheers
Andrew Murray Dec. 11, 2018, 1:59 p.m. UTC | #4
On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
> [ Reviving old thread. ]
> 
> Andrew Murray <andrew.murray@arm.com> writes:
> > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
> >> Andrew Murray <andrew.murray@arm.com> writes:
> >> 
> >> > Update design.txt to reflect the presence of the exclude_host
> >> > and exclude_guest perf flags.
> >> >
> >> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> >> > ---
> >> >  tools/perf/design.txt | 4 ++++
> >> >  1 file changed, 4 insertions(+)
> >> >
> >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> >> > index a28dca2..7de7d83 100644
> >> > --- a/tools/perf/design.txt
> >> > +++ b/tools/perf/design.txt
> >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
> >> >  way to request that counting of events be restricted to times when the
> >> >  CPU is in user, kernel and/or hypervisor mode.
> >> >  
> >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> >> > +to request counting of events restricted to guest and host contexts when
> >> > +using virtualisation.
> >> 
> >> How does exclude_host differ from exclude_hv ?
> >
> > > I believe exclude_host / exclude_guest are intended to distinguish
> > between host and guest in the hosted hypervisor context (KVM).
> 
> OK yeah, from the perf-list man page:
> 
>            u - user-space counting
>            k - kernel counting
>            h - hypervisor counting
>            I - non idle counting
>            G - guest counting (in KVM guests)
>            H - host counting (not in KVM guests)
> 
> > Whereas exclude_hv allows to distinguish between guest and
> > hypervisor in the bare-metal type hypervisors.
> 
> Except that's exactly not how we use them on powerpc :)
> 
> We use exclude_hv to exclude "the hypervisor", regardless of whether
> it's KVM or PowerVM (which is a bare-metal hypervisor).
> 
> We don't use exclude_host / exclude_guest at all, which I guess is a
> bug, except I didn't know they existed until this thread.
> 
> eg, in a KVM guest:
> 
>   $ perf record -e cycles:G /bin/bash -c "for i in {0..100000}; do :;done"
>   $ perf report -D | grep -Fc "dso: [hypervisor]"
>   16
> 
> 
> > In the case of arm64 - if VHE extensions are present then the host
> > kernel will run at a higher privilege to the guest kernel, in which
> > case there is no distinction between hypervisor and host so we ignore
> > exclude_hv. But where VHE extensions are not present then the host
> > kernel runs at the same privilege level as the guest and we use a
> > higher privilege level to switch between them - in this case we can
> > use exclude_hv to discount that hypervisor role of switching between
> > guests.
> 
> I couldn't find any arm64 perf code using exclude_host/guest at all?

Correct - but this is in flight as I am currently adding support for this
see [1].

> 
> And I don't see any x86 code using exclude_hv.

I can't find any either.

> 
> But maybe that's OK, I just worry this is confusing for users.

There is some extra context regarding this where exclude_guest/exclude_host
was added, see [2] and where exclude_hv was added, see [3]

Generally it seems that exclude_guest/exclude_host relies upon switching
counters off/on on guest/host switch code (which works well in the nested
virt case). Whereas exclude_hv tends to rely solely on hardware capability
based on privilege level (which works well in the bare metal case where
the guest doesn't run at same privilege as the host).

I think from the user perspective exclude_hv allows you to see your overhead
if you are a guest (i.e. work done by bare metal hypervisor associated with
you as the guest). Whereas exclude_guest/exclude_host doesn't allow you to
see events above you (i.e. the kernel hypervisor) if you are the guest...

At least that's how I read this, I've copied in others that may provide
more authoritative feedback.

[1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-December/033698.html
[2] https://www.spinics.net/lists/kvm/msg53996.html
[3] https://lore.kernel.org/patchwork/patch/143918/

Thanks,

Andrew Murray

> 
> cheers
Michael Ellerman Dec. 12, 2018, 4:48 a.m. UTC | #5
Andrew Murray <andrew.murray@arm.com> writes:
> On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
>> [ Reviving old thread. ]
>> 
>> Andrew Murray <andrew.murray@arm.com> writes:
>> > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
>> >> Andrew Murray <andrew.murray@arm.com> writes:
>> >> 
>> >> > Update design.txt to reflect the presence of the exclude_host
>> >> > and exclude_guest perf flags.
>> >> >
>> >> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
>> >> > ---
>> >> >  tools/perf/design.txt | 4 ++++
>> >> >  1 file changed, 4 insertions(+)
>> >> >
>> >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
>> >> > index a28dca2..7de7d83 100644
>> >> > --- a/tools/perf/design.txt
>> >> > +++ b/tools/perf/design.txt
>> >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
>> >> >  way to request that counting of events be restricted to times when the
>> >> >  CPU is in user, kernel and/or hypervisor mode.
>> >> >  
>> >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
>> >> > +to request counting of events restricted to guest and host contexts when
>> >> > +using virtualisation.
>> >> 
>> >> How does exclude_host differ from exclude_hv ?
>> >
>> > I believe exclude_host / exclude_guest are intended to distinguish
>> > between host and guest in the hosted hypervisor context (KVM).
>> 
>> OK yeah, from the perf-list man page:
>> 
>>            u - user-space counting
>>            k - kernel counting
>>            h - hypervisor counting
>>            I - non idle counting
>>            G - guest counting (in KVM guests)
>>            H - host counting (not in KVM guests)
>> 
>> > Whereas exclude_hv allows to distinguish between guest and
>> > hypervisor in the bare-metal type hypervisors.
>> 
>> Except that's exactly not how we use them on powerpc :)
>> 
>> We use exclude_hv to exclude "the hypervisor", regardless of whether
>> it's KVM or PowerVM (which is a bare-metal hypervisor).
>> 
>> We don't use exclude_host / exclude_guest at all, which I guess is a
>> bug, except I didn't know they existed until this thread.
>> 
>> eg, in a KVM guest:
>> 
>>   $ perf record -e cycles:G /bin/bash -c "for i in {0..100000}; do :;done"
>>   $ perf report -D | grep -Fc "dso: [hypervisor]"
>>   16
>> 
>> 
>> > In the case of arm64 - if VHE extensions are present then the host
>> > kernel will run at a higher privilege to the guest kernel, in which
>> > case there is no distinction between hypervisor and host so we ignore
>> > exclude_hv. But where VHE extensions are not present then the host
>> > kernel runs at the same privilege level as the guest and we use a
>> > higher privilege level to switch between them - in this case we can
>> > use exclude_hv to discount that hypervisor role of switching between
>> > guests.
>> 
>> I couldn't find any arm64 perf code using exclude_host/guest at all?
>
> Correct - but this is in flight as I am currently adding support for this
> see [1].

OK, so at least that will be consistent across arm64 & x86.

>> And I don't see any x86 code using exclude_hv.
>
> I can't find any either.

I think that's because they don't need it, because they don't let guests
program the PMU directly. It's all handled by the host and the host
doesn't let the guest count host cycles anyway. But I could be wrong, I'm
no x86 expert.

>> But maybe that's OK, I just worry this is confusing for users.
>
> There is some extra context regarding this where exclude_guest/exclude_host
> was added, see [2]

Good find. I had looked at that commit, but the thread on the list is
more informative.

In fact there was even a man page update! Never occurred to me to look
there :P

http://man7.org/linux/man-pages/man2/perf_event_open.2.html

       exclude_host (since Linux 3.2)
              When conducting measurements that include processes running VM
              instances (i.e., have executed a KVM_RUN ioctl(2)), only
              measure events happening inside a guest instance.  This is only
              meaningful outside the guests; this setting does not change
              counts gathered inside of a guest.  Currently, this
              functionality is x86 only.

       exclude_guest (since Linux 3.2)
              When conducting measurements that include processes running VM
              instances (i.e., have executed a KVM_RUN ioctl(2)), do not
              measure events happening inside guest instances.  This is only
              meaningful outside the guests; this setting does not change
              counts gathered inside of a guest.  Currently, this
              functionality is x86 only.


Which makes things much clearer.

Perhaps you want to add a reference to the man page in your text,
something like?

  Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
  to request counting of events restricted to guest and host contexts when
  using virtualisation. See the perf_event_open(2) man page for more
  detail.


cheers
Christoffer Dall Dec. 12, 2018, 8:07 a.m. UTC | #6
On Tue, Dec 11, 2018 at 01:59:03PM +0000, Andrew Murray wrote:
> On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
> > [ Reviving old thread. ]
> > 
> > Andrew Murray <andrew.murray@arm.com> writes:
> > > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
> > >> Andrew Murray <andrew.murray@arm.com> writes:
> > >> 
> > >> > Update design.txt to reflect the presence of the exclude_host
> > >> > and exclude_guest perf flags.
> > >> >
> > >> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > >> > ---
> > >> >  tools/perf/design.txt | 4 ++++
> > >> >  1 file changed, 4 insertions(+)
> > >> >
> > >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> > >> > index a28dca2..7de7d83 100644
> > >> > --- a/tools/perf/design.txt
> > >> > +++ b/tools/perf/design.txt
> > >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
> > >> >  way to request that counting of events be restricted to times when the
> > >> >  CPU is in user, kernel and/or hypervisor mode.
> > >> >  
> > >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> > >> > +to request counting of events restricted to guest and host contexts when
> > >> > +using virtualisation.
> > >> 
> > >> How does exclude_host differ from exclude_hv ?
> > >
> > > > I believe exclude_host / exclude_guest are intended to distinguish
> > > between host and guest in the hosted hypervisor context (KVM).
> > 
> > OK yeah, from the perf-list man page:
> > 
> >            u - user-space counting
> >            k - kernel counting
> >            h - hypervisor counting
> >            I - non idle counting
> >            G - guest counting (in KVM guests)
> >            H - host counting (not in KVM guests)
> > 
> > > Whereas exclude_hv allows to distinguish between guest and
> > > hypervisor in the bare-metal type hypervisors.
> > 
> > Except that's exactly not how we use them on powerpc :)
> > 
> > We use exclude_hv to exclude "the hypervisor", regardless of whether
> > it's KVM or PowerVM (which is a bare-metal hypervisor).
> > 
> > We don't use exclude_host / exclude_guest at all, which I guess is a
> > bug, except I didn't know they existed until this thread.
> > 
> > eg, in a KVM guest:
> > 
> >   $ perf record -e cycles:G /bin/bash -c "for i in {0..100000}; do :;done"
> >   $ perf report -D | grep -Fc "dso: [hypervisor]"
> >   16
> > 
> > 
> > > In the case of arm64 - if VHE extensions are present then the host
> > > kernel will run at a higher privilege to the guest kernel, in which
> > > case there is no distinction between hypervisor and host so we ignore
> > > exclude_hv. But where VHE extensions are not present then the host
> > > kernel runs at the same privilege level as the guest and we use a
> > > higher privilege level to switch between them - in this case we can
> > > use exclude_hv to discount that hypervisor role of switching between
> > > guests.
> > 
> > I couldn't find any arm64 perf code using exclude_host/guest at all?
> 
> Correct - but this is in flight as I am currently adding support for this
> see [1].
> 
> > 
> > And I don't see any x86 code using exclude_hv.
> 
> I can't find any either.
> 
> > 
> > But maybe that's OK, I just worry this is confusing for users.
> 
> There is some extra context regarding this where exclude_guest/exclude_host
> was added, see [2] and where exclude_hv was added, see [3]
> 
> Generally it seems that exclude_guest/exclude_host relies upon switching
> counters off/on on guest/host switch code (which works well in the nested
> virt case). Whereas exclude_hv tends to rely solely on hardware capability
> based on privilege level (which works well in the bare metal case where
> the guest doesn't run at same privilege as the host).
> 
> I think from the user perspective exclude_hv allows you to see your overhead
> if you are a guest (i.e. work done by bare metal hypervisor associated with
> you as the guest). Whereas exclude_guest/exclude_host doesn't allow you to
> see events above you (i.e. the kernel hypervisor) if you are the guest...
> 
> At least that's how I read this, I've copied in others that may provide
> more authoritative feedback.
> 
> [1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-December/033698.html
> [2] https://www.spinics.net/lists/kvm/msg53996.html
> [3] https://lore.kernel.org/patchwork/patch/143918/
> 

I'll try to answer this in a different way, based on previous
discussions with Joerg et al. who introduced these flags.  Assume no
support for nested virtualization as a first approximation:

  If you are running as a guest:
    - exclude_hv: stop counting events when the hypervisor runs
    - exclude_host: has no effect
    - exclude_guest: has no effect
  
  If you are running as a host/hypervisor:
   - exclude_hv: has no effect
   - exclude_host: only count events when the guest is running
   - exclude_guest: only count events when the host is running

With nested virtualization, you get the natural union of the above.

**This has nothing to do with the design of the hypervisor such as the
ARM non-VHE KVM which splits its execution across EL1 and EL2 -- those
are both considered host from the point of view of Linux as a hypervisor
using KVM, and both considered hypervisor from the point of view of a
guest.**
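
The per-role rules listed above can be sketched as a small predicate. This
is a minimal model under the stated "no nested virtualization" assumption;
the names is_counted, role and running are hypothetical and it does not
describe any particular kernel implementation.

```python
# Minimal sketch of the flag semantics described above, no nesting.
# role    : who opened the event, "guest" or "host"
# running : what is executing at the moment the event would be counted

def is_counted(role, running, exclude_hv=False, exclude_host=False,
               exclude_guest=False):
    """Return True if an event occurring now would be counted."""
    if role == "guest":
        if running not in ("guest", "hypervisor"):
            raise ValueError("a guest observes 'guest' or 'hypervisor' only")
        # Only exclude_hv matters; exclude_host/exclude_guest have no effect.
        return not (exclude_hv and running == "hypervisor")
    if role == "host":
        if running not in ("host", "guest"):
            raise ValueError("a host observes 'host' or 'guest' only")
        # exclude_hv has no effect here.
        if exclude_host and running == "host":
            return False
        if exclude_guest and running == "guest":
            return False
        return True
    raise ValueError("role must be 'guest' or 'host'")
```

For example, is_counted("host", "guest", exclude_host=True) is True, while
is_counted("guest", "hypervisor", exclude_hv=True) is False.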


Thanks,

    Christoffer
Andrew Murray Dec. 12, 2018, 5:08 p.m. UTC | #7
On Wed, Dec 12, 2018 at 09:07:42AM +0100, Christoffer Dall wrote:
> On Tue, Dec 11, 2018 at 01:59:03PM +0000, Andrew Murray wrote:
> > On Tue, Dec 11, 2018 at 10:06:53PM +1100, Michael Ellerman wrote:
> > > [ Reviving old thread. ]
> > > 
> > > Andrew Murray <andrew.murray@arm.com> writes:
> > > > On Tue, Nov 20, 2018 at 10:31:36PM +1100, Michael Ellerman wrote:
> > > >> Andrew Murray <andrew.murray@arm.com> writes:
> > > >> 
> > > >> > Update design.txt to reflect the presence of the exclude_host
> > > >> > and exclude_guest perf flags.
> > > >> >
> > > >> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> > > >> > ---
> > > >> >  tools/perf/design.txt | 4 ++++
> > > >> >  1 file changed, 4 insertions(+)
> > > >> >
> > > >> > diff --git a/tools/perf/design.txt b/tools/perf/design.txt
> > > >> > index a28dca2..7de7d83 100644
> > > >> > --- a/tools/perf/design.txt
> > > >> > +++ b/tools/perf/design.txt
> > > >> > @@ -222,6 +222,10 @@ The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
> > > >> >  way to request that counting of events be restricted to times when the
> > > >> >  CPU is in user, kernel and/or hypervisor mode.
> > > >> >  
> > > >> > +Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
> > > >> > +to request counting of events restricted to guest and host contexts when
> > > >> > +using virtualisation.
> > > >> 
> > > >> How does exclude_host differ from exclude_hv ?
> > > >
> > > > > I believe exclude_host / exclude_guest are intended to distinguish
> > > > between host and guest in the hosted hypervisor context (KVM).
> > > 
> > > OK yeah, from the perf-list man page:
> > > 
> > >            u - user-space counting
> > >            k - kernel counting
> > >            h - hypervisor counting
> > >            I - non idle counting
> > >            G - guest counting (in KVM guests)
> > >            H - host counting (not in KVM guests)
> > > 
> > > > Whereas exclude_hv allows to distinguish between guest and
> > > > hypervisor in the bare-metal type hypervisors.
> > > 
> > > Except that's exactly not how we use them on powerpc :)
> > > 
> > > We use exclude_hv to exclude "the hypervisor", regardless of whether
> > > it's KVM or PowerVM (which is a bare-metal hypervisor).
> > > 
> > > We don't use exclude_host / exclude_guest at all, which I guess is a
> > > bug, except I didn't know they existed until this thread.
> > > 
> > > eg, in a KVM guest:
> > > 
> > >   $ perf record -e cycles:G /bin/bash -c "for i in {0..100000}; do :;done"
> > >   $ perf report -D | grep -Fc "dso: [hypervisor]"
> > >   16
> > > 
> > > 
> > > > In the case of arm64 - if VHE extensions are present then the host
> > > > kernel will run at a higher privilege to the guest kernel, in which
> > > > case there is no distinction between hypervisor and host so we ignore
> > > > exclude_hv. But where VHE extensions are not present then the host
> > > > kernel runs at the same privilege level as the guest and we use a
> > > > higher privilege level to switch between them - in this case we can
> > > > use exclude_hv to discount that hypervisor role of switching between
> > > > guests.
> > > 
> > > I couldn't find any arm64 perf code using exclude_host/guest at all?
> > 
> > Correct - but this is in flight as I am currently adding support for this
> > see [1].
> > 
> > > 
> > > And I don't see any x86 code using exclude_hv.
> > 
> > I can't find any either.
> > 
> > > 
> > > But maybe that's OK, I just worry this is confusing for users.
> > 
> > There is some extra context regarding this where exclude_guest/exclude_host
> > was added, see [2] and where exclude_hv was added, see [3]
> > 
> > Generally it seems that exclude_guest/exclude_host relies upon switching
> > counters off/on on guest/host switch code (which works well in the nested
> > virt case). Whereas exclude_hv tends to rely solely on hardware capability
> > based on privilege level (which works well in the bare metal case where
> > the guest doesn't run at same privilege as the host).
> > 
> > I think from the user perspective exclude_hv allows you to see your overhead
> > if you are a guest (i.e. work done by bare metal hypervisor associated with
> > you as the guest). Whereas exclude_guest/exclude_host doesn't allow you to
> > see events above you (i.e. the kernel hypervisor) if you are the guest...
> > 
> > At least that's how I read this, I've copied in others that may provide
> > more authoritative feedback.
> > 
> > [1] https://lists.cs.columbia.edu/pipermail/kvmarm/2018-December/033698.html
> > [2] https://www.spinics.net/lists/kvm/msg53996.html
> > [3] https://lore.kernel.org/patchwork/patch/143918/
> > 
> 
> I'll try to answer this in a different way, based on previous
> discussions with Joerg et al. who introduced these flags.  Assume no
> support for nested virtualization as a first approximation:
> 
>   If you are running as a guest:
>     - exclude_hv: stop counting events when the hypervisor runs
>     - exclude_host: has no effect
>     - exclude_guest: has no effect
>   
>   If you are running as a host/hypervisor:
>    - exclude_hv: has no effect
>    - exclude_host: only count events when the guest is running
>    - exclude_guest: only count events when the host is running
> 
> With nested virtualization, you get the natural union of the above.
> 
> **This has nothing to do with the design of the hypervisor such as the
> ARM non-VHE KVM which splits its execution across EL1 and EL2 -- those
> are both considered host from the point of view of Linux as a hypervisor
> using KVM, and both considered hypervisor from the point of view of a
> guest.**

For clarity, this is what arm64 currently does (assuming no nesting and
without the current version of this patchset):

   If you are running as a guest (VHE or !VHE host):
     - exclude_hv: has no effect for a KVM guest (filters hypervisor on !VHE
		   bare metal hypervisor guest)
     - exclude_host: has no effect
     - exclude_guest: has no effect
   
   If you are running as a host/hypervisor:
    - exclude_hv: has no effect for VHE (filters EL2 on !VHE)
    - exclude_host: only count events when the guest is running
    - exclude_guest: only count events when the host is running
 
Is this as expected?
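
To make the comparison concrete, the two tables in this thread can be
written down as data and compared entry by entry. This is only a sketch:
the dict layout and helper name are hypothetical, and the cell text is
paraphrased from the two emails rather than taken from any code.

```python
# The semantics described earlier in the thread, keyed by (role, flag).
EXPECTED = {
    ("guest", "exclude_hv"): "stop counting when the hypervisor runs",
    ("guest", "exclude_host"): "no effect",
    ("guest", "exclude_guest"): "no effect",
    ("host", "exclude_hv"): "no effect",
    ("host", "exclude_host"): "count only while the guest runs",
    ("host", "exclude_guest"): "count only while the host runs",
}

# What arm64 is described as doing today; only the exclude_hv rows differ.
ARM64_CURRENT = dict(EXPECTED)
ARM64_CURRENT[("guest", "exclude_hv")] = (
    "no effect for a KVM guest; filters hypervisor on a !VHE "
    "bare metal hypervisor guest"
)
ARM64_CURRENT[("host", "exclude_hv")] = "no effect for VHE; filters EL2 on !VHE"


def differing_entries(a, b):
    """Return the sorted (role, flag) keys whose descriptions differ."""
    return sorted(key for key in a if a[key] != b[key])
```

Comparing the two tables this way shows that only the exclude_hv rows
differ, in both the guest and the host role.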

Thanks,

Andrew Murray

> 
> 
> Thanks,
> 
>     Christoffer

Patch

diff --git a/tools/perf/design.txt b/tools/perf/design.txt
index a28dca2..7de7d83 100644
--- a/tools/perf/design.txt
+++ b/tools/perf/design.txt
@@ -222,6 +222,10 @@  The 'exclude_user', 'exclude_kernel' and 'exclude_hv' bits provide a
 way to request that counting of events be restricted to times when the
 CPU is in user, kernel and/or hypervisor mode.
 
+Furthermore the 'exclude_host' and 'exclude_guest' bits provide a way
+to request counting of events restricted to guest and host contexts when
+using virtualisation.
+
 The 'mmap' and 'munmap' bits allow recording of PROT_EXEC mmap/munmap
 operations, these can be used to relate userspace IP addresses to actual
 code, even after the mapping (or even the whole process) is gone,