mbox series

[00/35] exec: drop BQL from interrupt handling

Message ID 20180917163103.6113-1-cota@braap.org
Headers show
Series exec: drop BQL from interrupt handling | expand

Message

Emilio Cota Sept. 17, 2018, 4:30 p.m. UTC
This series comes originally from a series of patches that Paolo
sent to me a long time ago. I have kept most of his S-o-b tags,
but I have done the forward port of the patches to the current
QEMU code base, so please blame all possible bugs on me, not him.

The goal of this series is to push the BQL out of interrupt
handling wherever possible. Many targets do not need the BQL
to handle interrupts, so for those targets we get rid of the
BQL in interrupt handling. In TCG mode, this by itself does
not yield measurable performance gains. However, it is a
necessary first step before tackling BQL contention when
managing CPU states (i.e. exiting the exec loop always
involves taking the BQL, which does not scale).

The patches:

- Patch 0 accesses icount_decr.u16 consistently with atomics.
  This is important because we use it across threads to signal
  that an interrupt is pending.

- 2-28: convert direct callers of cpu->interrupt_request
  to use atomic accesses or CPU helper functions that use atomics.
  I have broken these up into individual patches to ease review.
  Note that in some cases I have added many atomic_reads, most
  of which are probably unnecessary. However, I'd leave this up
  to the maintainers, since I do not want to break things by
  doing a single atomic_read(&cpu->interrupt_request) at the
  beginning of a function and then miss subsequent updates to
  cpu->interrupt_request.

- 29-31: drop the BQL requirement when handling cpu->interrupt_request

- 32-33: drop now unnecessary BQL acquisitions in i386 and ppc

- 34-35: push down the BQL acquisition to cpu->do_interrupt
  and cpu->cpu_exec_interrupt. Here I am Cc'ing everyone so that
  we can make sure I am not missing any target that requires
  the BQL on either do_interrupt or cpu_exec_interrupt.

This series is checkpatch-clean, and can be fetched from:
  https://github.com/cota/qemu/tree/cpu-lock

Note that the series applies on top of patches that are
already on the qemu-devel list and will hopefully soon
be upstream. This series begins after commit
4ece0cbd74 ("cpus: access .qemu_icount_bias with atomic64")
in that tree.

Thanks,

		Emilio

Comments

David Hildenbrand Sept. 18, 2018, 9:51 a.m. UTC | #1
Am 17.09.18 um 18:30 schrieb Emilio G. Cota:
> This series comes originally from a series of patches that Paolo
> sent to me a long time ago. I have kept most of his S-o-b tags,
> but I have done the forward port of the patches to the current
> QEMU code base, so please blame all possible bugs on me, not him.
> 
> The goal of this series is to push the BQL out of interrupt
> handling wherever possible. Many targets do not need the BQL
> to handle interrupts, so for those targets we get rid of the
> BQL in interrupt handling. In TCG mode, this by itself does
> not yield measurable performance gains. However, it is a
> necessary first step before tackling BQL contention when
> managing CPU states (i.e. exiting the exec loop always
> involves taking the BQL, which does not scale).
> 
> The patches:
> 
> - Patch 0 accesses icount_decr.u16 consistently with atomics.
>   This is important because we use it across threads to signal
>   that an interrupt is pending.
> 
> - 2-28: convert direct callers of cpu->interrupt_request
>   to use atomic accesses or CPU helper functions that use atomics.
>   I have broken these up into individual patches to ease review.
>   Note that in some cases I have added many atomic_reads, most
>   of which are probably unnecessary. However, I'd leave this up
>   to the maintainers, since I do not want to break things by
>   doing a single atomic_read(&cpu->interrupt_request) at the
>   beginning of a function and then miss subsequent updates to
>   cpu->interrupt_request.
> 
> - 29-31: drop the BQL requirement when handling cpu->interrupt_request
> 
> - 32-33: drop now unnecessary BQL acquisitions in i386 and ppc
> 
> - 34-35: push down the BQL acquisition to cpu->do_interrupt
>   and cpu->cpu_exec_interrupt. Here I am Cc'ing everyone so that
>   we can make sure I am not missing any target that requires
>   the BQL on either do_interrupt or cpu_exec_interrupt.
> 
> This series is checkpatch-clean, and can be fetched from:
>   https://github.com/cota/qemu/tree/cpu-lock
> 
> Note that the series applies on top of patches that are
> already on the qemu-devel list and will hopefully soon
> be upstream. This series begins after commit
> 4ece0cbd74 ("cpus: access .qemu_icount_bias with atomic64")
> in that tree.
> 
> Thanks,
> 
> 		Emilio
> 

I just did a (very slow) make -j32 of QEMU on my 16 core notebook inside
a MTTCG s390x-softmmu guest with 32 VCPUs. I used the same test when
reworking floating interrupt support to make sure there are no races.

As I got no deadlock, I assume you series does not break anything for s390x.

Thanks!
Mark Cave-Ayland Sept. 20, 2018, 8:05 p.m. UTC | #2
On 17/09/2018 17:30, Emilio G. Cota wrote:

> This series comes originally from a series of patches that Paolo
> sent to me a long time ago. I have kept most of his S-o-b tags,
> but I have done the forward port of the patches to the current
> QEMU code base, so please blame all possible bugs on me, not him.
> 
> The goal of this series is to push the BQL out of interrupt
> handling wherever possible. Many targets do not need the BQL
> to handle interrupts, so for those targets we get rid of the
> BQL in interrupt handling. In TCG mode, this by itself does
> not yield measurable performance gains. However, it is a
> necessary first step before tackling BQL contention when
> managing CPU states (i.e. exiting the exec loop always
> involves taking the BQL, which does not scale).
> 
> The patches:
> 
> - Patch 0 accesses icount_decr.u16 consistently with atomics.
>   This is important because we use it across threads to signal
>   that an interrupt is pending.
> 
> - 2-28: convert direct callers of cpu->interrupt_request
>   to use atomic accesses or CPU helper functions that use atomics.
>   I have broken these up into individual patches to ease review.
>   Note that in some cases I have added many atomic_reads, most
>   of which are probably unnecessary. However, I'd leave this up
>   to the maintainers, since I do not want to break things by
>   doing a single atomic_read(&cpu->interrupt_request) at the
>   beginning of a function and then miss subsequent updates to
>   cpu->interrupt_request.
> 
> - 29-31: drop the BQL requirement when handling cpu->interrupt_request
> 
> - 32-33: drop now unnecessary BQL acquisitions in i386 and ppc
> 
> - 34-35: push down the BQL acquisition to cpu->do_interrupt
>   and cpu->cpu_exec_interrupt. Here I am Cc'ing everyone so that
>   we can make sure I am not missing any target that requires
>   the BQL on either do_interrupt or cpu_exec_interrupt.
> 
> This series is checkpatch-clean, and can be fetched from:
>   https://github.com/cota/qemu/tree/cpu-lock
> 
> Note that the series applies on top of patches that are
> already on the qemu-devel list and will hopefully soon
> be upstream. This series begins after commit
> 4ece0cbd74 ("cpus: access .qemu_icount_bias with atomic64")
> in that tree.

I ran the above cpu-lock branch through my SPARC32 and SPARC64 test suite just to be
sure, and everything appeared to work fine. So for the SPARC32/64 patches:

Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>


ATB,

Mark.