[v7,0/8] Introduce 'yank' oob qmp command to recover from hanging qemu

Message ID cover.1596528468.git.lukasstraub2@web.de

Lukas Straub Aug. 4, 2020, 8:11 a.m. UTC
Hello Everyone,
In many cases, if QEMU has a network connection (QMP, migration, chardev, etc.)
to some other server and that server dies or hangs, QEMU hangs too.
These patches introduce the new 'yank' out-of-band QMP command to recover from
these kinds of hangs. The different subsystems register callbacks, which get
executed by the yank command. For example, a callback can shutdown() a
socket. This is intended for the COLO use case, but it can of course be used
for other things too.
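
To illustrate the idea, here is a minimal single-threaded sketch of such a
callback registry. It is not the code from util/yank.c: every name in it
(sketch_yank_register, sketch_yank, sketch_yank_socket, the "chardev:serial0"
style instance string) is made up for illustration, and it leaves out the
locking and Error **errp reporting a real implementation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical names throughout; this is not the API from util/yank.c. */
typedef void (YankFn)(void *opaque);

typedef struct {
    const char *instance;   /* made-up naming, e.g. "chardev:serial0" */
    YankFn *func;           /* callback to run when the instance is yanked */
    void *opaque;           /* argument passed to the callback */
} YankEntry;

#define MAX_YANK_ENTRIES 16
static YankEntry yank_entries[MAX_YANK_ENTRIES];
static size_t yank_count;

/* Register a callback for an instance; returns 0, or -1 if the table is full. */
int sketch_yank_register(const char *instance, YankFn *func, void *opaque)
{
    assert(instance != NULL && func != NULL);
    if (yank_count == MAX_YANK_ENTRIES) {
        return -1;
    }
    yank_entries[yank_count++] = (YankEntry){ instance, func, opaque };
    return 0;
}

/* Run every callback registered for the named instance; returns how many ran. */
int sketch_yank(const char *instance)
{
    int ran = 0;

    for (size_t i = 0; i < yank_count; i++) {
        if (strcmp(yank_entries[i].instance, instance) == 0) {
            yank_entries[i].func(yank_entries[i].opaque);
            ran++;
        }
    }
    return ran;
}

/* Example callback: shut the socket down so blocked I/O on it returns. */
void sketch_yank_socket(void *opaque)
{
    int fd = *(int *)opaque;

    shutdown(fd, SHUT_RDWR);
}
```

A subsystem would register sketch_yank_socket with a pointer to its socket fd
when it opens the connection. Calling sketch_yank() with that instance name
then shuts the socket down, so a thread blocked in recv() on it sees
end-of-file instead of waiting for the dead peer.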

Regards,
Lukas Straub

v7:
 -yank_register_instance now returns error via Error **errp instead of aborting
 -dropped "chardev/char.c: Check for duplicate id before creating chardev"

v6:
 -add Reviewed-by and Acked-by tags
 -rebase on master
 -lots of changes in nbd due to rebase
 -only take maintainership of util/yank.c and include/qemu/yank.h (Daniel P. Berrangé)
 -fix a crash discovered by the newly added chardev test
 -fix the test itself

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_acquire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __constructor__ attribute instead (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances

Lukas Straub (8):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document thread-safety of qio_channel_shutdown
  MAINTAINERS: Add myself as maintainer for yank feature
  tests/test-char.c: Wait for the chardev to connect in
    char_socket_client_dupid_test

 MAINTAINERS                   |   6 ++
 block/nbd.c                   | 129 +++++++++++++++---------
 chardev/char-socket.c         |  31 ++++++
 include/io/channel.h          |   2 +
 include/qemu/yank.h           |  80 +++++++++++++++
 io/channel-tls.c              |   6 +-
 migration/channel.c           |  12 +++
 migration/migration.c         |  25 ++++-
 migration/multifd.c           |  10 ++
 migration/qemu-file-channel.c |   6 ++
 migration/savevm.c            |   6 ++
 qapi/misc.json                |  45 +++++++++
 tests/Makefile.include        |   2 +-
 tests/test-char.c             |   1 +
 util/Makefile.objs            |   1 +
 util/yank.c                   | 184 ++++++++++++++++++++++++++++++++++
 16 files changed, 493 insertions(+), 53 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 util/yank.c

--
2.20.1

Lukas Straub Aug. 18, 2020, 12:26 p.m. UTC | #1
On Tue, 4 Aug 2020 10:11:22 +0200
Lukas Straub <lukasstraub2@web.de> wrote:

> [...]

Ping...
Lukas Straub Aug. 27, 2020, 8:42 a.m. UTC | #2
On Tue, 18 Aug 2020 14:26:31 +0200
Lukas Straub <lukasstraub2@web.de> wrote:

> On Tue, 4 Aug 2020 10:11:22 +0200
> Lukas Straub <lukasstraub2@web.de> wrote:
> 
> > [...]
> 
> Ping...

Ping 2...

Also, can the different subsystems have a look at this and give their ok?

Regards,
Lukas Straub
Daniel P. Berrangé Aug. 27, 2020, 10:41 a.m. UTC | #3
On Thu, Aug 27, 2020 at 10:42:46AM +0200, Lukas Straub wrote:
> On Tue, 18 Aug 2020 14:26:31 +0200
> Lukas Straub <lukasstraub2@web.de> wrote:
> 
> > On Tue, 4 Aug 2020 10:11:22 +0200
> > Lukas Straub <lukasstraub2@web.de> wrote:
> > 
> > > [...]
> > 
> > Ping...
> 
> Ping 2...
> 
> Also, can the different subsystems have a look at this and give their ok?

We need ACKs from the NBD, migration and chardev maintainers, for the
respective patches, then I think this series is ready for a pull request.

Once acks arrive, I'm happy to send a PULL unless someone else has a
desire to do it.


Regards,
Daniel
Markus Armbruster Aug. 27, 2020, 2:18 p.m. UTC | #4
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Aug 27, 2020 at 10:42:46AM +0200, Lukas Straub wrote:
[...]
>> Also, can the different subsystems have a look at this and give their ok?
>
> We need ACKs from the NBD, migration and chardev maintainers, for the
> respective patches, then I think this series is ready for a pull request.

The QMP interface and its documentation need a bit of work, see my
review of PATCH 1.  I'm hopeful v8 will nail it.

> Once acks arrive, I'm happy to send a PULL unless someone else has a
> desire to do it.

Not yet, please.
Dr. David Alan Gilbert Aug. 27, 2020, 5:58 p.m. UTC | #5
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Thu, Aug 27, 2020 at 10:42:46AM +0200, Lukas Straub wrote:
> > On Tue, 18 Aug 2020 14:26:31 +0200
> > Lukas Straub <lukasstraub2@web.de> wrote:
> > 
> > > On Tue, 4 Aug 2020 10:11:22 +0200
> > > Lukas Straub <lukasstraub2@web.de> wrote:
> > > 
> > > > [...]
> > > 
> > > Ping...
> > 
> > Ping 2...
> > 
> > Also, can the different subsystems have a look at this and give their ok?
> 
> We need ACKs from the NBD, migration and chardev maintainers, for the
> respective patches, then I think this series is ready for a pull request.

I'm happy from Migration:

Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> Once acks arrive, I'm happy to send a PULL unless someone else has a
> desire to do it.

Looks like Markus would like a QMP tweak; but other than that I'd also
be happy to take it via migration; whichever is easiest.

Dave