[RFC,0/4] async: fix hangs on weakly-ordered architectures

Message ID 20200407140746.8041-1-pbonzini@redhat.com

Message

Paolo Bonzini April 7, 2020, 2:07 p.m. UTC
ARM machines and other weakly-ordered architectures have been suffering
for a long time from hangs in qemu-img and qemu-io.  For QEMU binaries
these are mitigated by the timers that sooner or later fire in the main
loop, but these will not happen for the tools and probably not with I/O
threads either.

The fix is in patch 5.  Patches 1-3 are docs updates that explain the bug,
and patch 4 fixes a bug exposed by the new patch.

Paolo

Paolo Bonzini (5):
  atomics: convert to reStructuredText
  atomics: update documentation
  rcu: do not mention atomic_mb_read/set in documentation
  aio-wait: delegate polling of main AioContext if BQL not held
  async: use explicit memory barriers

 docs/devel/atomics.rst   | 501 +++++++++++++++++++++++++++++++++++++++
 docs/devel/atomics.txt   | 403 -------------------------------
 docs/devel/index.rst     |   1 +
 docs/devel/rcu.txt       |   4 +-
 include/block/aio-wait.h |  22 ++
 include/block/aio.h      |  29 +--
 util/aio-posix.c         |  16 +-
 util/aio-win32.c         |  17 +-
 util/async.c             |  16 +-
 9 files changed, 576 insertions(+), 433 deletions(-)
 create mode 100644 docs/devel/atomics.rst
 delete mode 100644 docs/devel/atomics.txt
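
For readers unfamiliar with this class of bug, here is a minimal sketch of
a lost wakeup of the kind that explicit memory barriers are meant to
prevent.  It is an illustration only and is not taken from the patches:
the names waiter_sleeping, work_pending, waiter_may_block and
producer_must_wake are hypothetical, and it uses plain C11 atomics rather
than QEMU's own atomic helpers.  Each side stores its own flag and then
loads the other side's flag; the full fence between the store and the load
is what rules out both sides reading the stale value at once.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for event-loop state; not QEMU's actual fields. */
static atomic_int waiter_sleeping;  /* set by the waiter before it blocks */
static atomic_int work_pending;     /* set by the producer when work is queued */

/* Waiter side: announce the intent to sleep, then re-check for work. */
bool waiter_may_block(void)
{
    atomic_store_explicit(&waiter_sleeping, 1, memory_order_relaxed);
    /* Full barrier: order the store above before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&work_pending, memory_order_relaxed) == 0;
}

/* Producer side: publish the work, then check whether a wakeup is needed. */
bool producer_must_wake(void)
{
    atomic_store_explicit(&work_pending, 1, memory_order_relaxed);
    /* Full barrier: order the store above before the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&waiter_sleeping, memory_order_relaxed) != 0;
}
```

With both fences in place, it cannot happen that waiter_may_block()
returns true while a concurrent producer_must_wake() returns false.
Without them, each store may be reordered after the following load, so the
waiter can go to sleep while the producer skips the wakeup, and nothing
short of an unrelated event (such as a timer) would ever wake the waiter.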

Comments

fangying April 8, 2020, 9:12 a.m. UTC | #1
On 2020/4/7 22:07, Paolo Bonzini wrote:
> ARM machines and other weakly-ordered architectures have been suffering
> for a long time from hangs in qemu-img and qemu-io.  For QEMU binaries
> these are mitigated by the timers that sooner or later fire in the main
> loop, but these will not happen for the tools and probably not with I/O
> threads either.
Yes, we occasionally see the QEMU main thread hang and the VM get stuck in the
in-shutdown state on the aarch64 platform.  So this could happen with I/O threads.
> 
> The fix is in patch 5.  Patches 1-3 are docs updates that explain the bug,
> and patch 4 fixes a bug exposed by the new patch.
> 
> Paolo
> 
> Paolo Bonzini (5):
>    atomics: convert to reStructuredText
>    atomics: update documentation
>    rcu: do not mention atomic_mb_read/set in documentation
>    aio-wait: delegate polling of main AioContext if BQL not held
>    async: use explicit memory barriers
> 
>   docs/devel/atomics.rst   | 501 +++++++++++++++++++++++++++++++++++++++
>   docs/devel/atomics.txt   | 403 -------------------------------
>   docs/devel/index.rst     |   1 +
>   docs/devel/rcu.txt       |   4 +-
>   include/block/aio-wait.h |  22 ++
>   include/block/aio.h      |  29 +--
>   util/aio-posix.c         |  16 +-
>   util/aio-win32.c         |  17 +-
>   util/async.c             |  16 +-
>   9 files changed, 576 insertions(+), 433 deletions(-)
>   create mode 100644 docs/devel/atomics.rst
>   delete mode 100644 docs/devel/atomics.txt
>
Paolo Bonzini April 8, 2020, 3:05 p.m. UTC | #2
On 08/04/20 11:12, Ying Fang wrote:
> On 2020/4/7 22:07, Paolo Bonzini wrote:
>> ARM machines and other weakly-ordered architectures have been suffering
>> for a long time from hangs in qemu-img and qemu-io.  For QEMU binaries
>> these are mitigated by the timers that sooner or later fire in the main
>> loop, but these will not happen for the tools and probably not with I/O
>> threads either.
>
> Yes, we occasionally see the QEMU main thread hang and the VM get stuck in the
> in-shutdown state on the aarch64 platform.  So this could happen with I/O threads.

Thanks for confirming!  Have you managed to test the final version of
the patches?  It would be great to include test results.

Paolo
fangying April 9, 2020, 6:54 a.m. UTC | #3
On 2020/4/8 23:05, Paolo Bonzini wrote:
> On 08/04/20 11:12, Ying Fang wrote:
>> On 2020/4/7 22:07, Paolo Bonzini wrote:
>>> ARM machines and other weakly-ordered architectures have been suffering
>>> for a long time from hangs in qemu-img and qemu-io.  For QEMU binaries
>>> these are mitigated by the timers that sooner or later fire in the main
>>> loop, but these will not happen for the tools and probably not with I/O
>>> threads either.
>>
>> Yes, we occasionally see the QEMU main thread hang and the VM get stuck in the
>> in-shutdown state on the aarch64 platform.  So this could happen with I/O threads.
> 
> Thanks for confirming!  Have you managed to test the final version of
> the patches?  It would be great to include test results.

Yes, I tested your latest patches on both the aarch64 and
x86 platforms.  Test results show that the hang has been fixed.  Thanks.

Stefan Hajnoczi April 9, 2020, 3:17 p.m. UTC | #4
On Tue, Apr 07, 2020 at 10:07:41AM -0400, Paolo Bonzini wrote:
> ARM machines and other weakly-ordered architectures have been suffering
> for a long time from hangs in qemu-img and qemu-io.  For QEMU binaries
> these are mitigated by the timers that sooner or later fire in the main
> loop, but these will not happen for the tools and probably not with I/O
> threads either.
> 
> The fix is in patch 5.  Patches 1-3 are docs updates that explain the bug,
> and patch 4 fixes a bug exposed by the new patch.
> 
> Paolo
> 
> Paolo Bonzini (5):
>   atomics: convert to reStructuredText
>   atomics: update documentation
>   rcu: do not mention atomic_mb_read/set in documentation
>   aio-wait: delegate polling of main AioContext if BQL not held
>   async: use explicit memory barriers
> 
>  docs/devel/atomics.rst   | 501 +++++++++++++++++++++++++++++++++++++++
>  docs/devel/atomics.txt   | 403 -------------------------------
>  docs/devel/index.rst     |   1 +
>  docs/devel/rcu.txt       |   4 +-
>  include/block/aio-wait.h |  22 ++
>  include/block/aio.h      |  29 +--
>  util/aio-posix.c         |  16 +-
>  util/aio-win32.c         |  17 +-
>  util/async.c             |  16 +-
>  9 files changed, 576 insertions(+), 433 deletions(-)
>  create mode 100644 docs/devel/atomics.rst
>  delete mode 100644 docs/devel/atomics.txt

Applied patches 4 and 5 to my block branch.

Stefan