Message ID: 20180328170207.49512-1-dgilbert@redhat.com
State: New
Series: migration: Don't activate block devices if using -S
On 03/28/2018 12:02 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Activating the block devices causes the locks to be taken on
> the backing file. If we're running with -S and the destination libvirt
> hasn't started the destination with 'cont', it's expecting the locks are
> still untaken.
>
> Don't activate the block devices if we're not going to autostart the VM;
> 'cont' already will do that anyway.
>
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  migration/migration.c | 22 +++++++++++++++-------
>  1 file changed, 15 insertions(+), 7 deletions(-)

Sounds like 2.12 material.

Reviewed-by: Eric Blake <eblake@redhat.com>
* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Activating the block devices causes the locks to be taken on
> the backing file. If we're running with -S and the destination libvirt
> hasn't started the destination with 'cont', it's expecting the locks are
> still untaken.
>
> Don't activate the block devices if we're not going to autostart the VM;
> 'cont' already will do that anyway.
>
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Queued

> ---
>  migration/migration.c | 22 +++++++++++++++-------
>  1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 52a5092add..58bd382730 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -306,13 +306,21 @@ static void process_incoming_migration_bh(void *opaque)
>      Error *local_err = NULL;
>      MigrationIncomingState *mis = opaque;
>
> -    /* Make sure all file formats flush their mutable metadata.
> -     * If we get an error here, just don't restart the VM yet. */
> -    bdrv_invalidate_cache_all(&local_err);
> -    if (local_err) {
> -        error_report_err(local_err);
> -        local_err = NULL;
> -        autostart = false;
> +    /* Only fire up the block code now if we're going to restart the
> +     * VM, else 'cont' will do it.
> +     * This causes file locking to happen; so we don't want it to happen
> +     * unless we really are starting the VM.
> +     */
> +    if (autostart && (!global_state_received() ||
> +        global_state_get_runstate() == RUN_STATE_RUNNING)) {
> +        /* Make sure all file formats flush their mutable metadata.
> +         * If we get an error here, just don't restart the VM yet. */
> +        bdrv_invalidate_cache_all(&local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            local_err = NULL;
> +            autostart = false;
> +        }
>      }
>
>      /*
> --
> 2.14.3

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Hi,

This series failed docker-build@min-glib build test. Please find the testing
commands and their output below. If you have Docker installed, you can
probably reproduce it locally.

Type: series
Message-id: 20180328170207.49512-1-dgilbert@redhat.com
Subject: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-build@min-glib
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
4862dfc311 migration: Don't activate block devices if using -S

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-_1xmie25/src/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
  BUILD   min-glib
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-_1xmie25/src'
  GEN     /var/tmp/patchew-tester-tmp-_1xmie25/src/docker-src.2018-03-31-03.55.58.6863/qemu.tar
Cloning into '/var/tmp/patchew-tester-tmp-_1xmie25/src/docker-src.2018-03-31-03.55.58.6863/qemu.tar.vroot'...
done.
Checking out files: 100% (6066/6066), done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-_1xmie25/src/docker-src.2018-03-31-03.55.58.6863/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into '/var/tmp/patchew-tester-tmp-_1xmie25/src/docker-src.2018-03-31-03.55.58.6863/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
tar: /var/tmp/patchew-tester-tmp-_1xmie25/src/docker-src.2018-03-31-03.55.58.6863/qemu.tar: Wrote only 2048 of 10240 bytes
tar: Error is not recoverable: exiting now
failed to create tar file
  COPY    RUNNER
  RUN     test-build in qemu:min-glib
tar: Unexpected EOF in archive
tar: rmtlseek not stopped at a record boundary
tar: Error is not recoverable: exiting now
/var/tmp/qemu/run: line 32: prep_fail: command not found
Environment variables:
HOSTNAME=e1cbd43a966e
MAKEFLAGS= -j8
J=8
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
TARGET_LIST=
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
FEATURES= dtc
DEBUG=
_=/usr/bin/env
/var/tmp/qemu/run: line 52: cd: /tmp/qemu-test/src/tests/docker: No such file or directory
/var/tmp/qemu/run: line 57: /test-build: No such file or directory
/var/tmp/qemu/run: line 57: exec: /test-build: cannot execute: No such file or directory
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 407, in <module>
    sys.exit(main())
  File "./tests/docker/docker.py", line 404, in main
    return args.cmdobj.run(args, argv)
  File "./tests/docker/docker.py", line 261, in run
    return Docker().run(argv, args.keep, quiet=args.quiet)
  File "./tests/docker/docker.py", line 229, in run
    quiet=quiet)
  File "./tests/docker/docker.py", line 147, in _do_check
    return subprocess.check_call(self._command + cmd, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'run', '--label',
'com.qemu.instance.uuid=0111c03c34b911e88e0d52540069c830', '-u', '0',
'--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e',
'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8',
'-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache',
'-v', '/root/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v',
'/var/tmp/patchew-tester-tmp-_1xmie25/src/docker-src.2018-03-31-03.55.58.6863:/var/tmp/qemu:z,ro',
'qemu:min-glib', '/var/tmp/qemu/run', 'test-build']' returned non-zero exit status 126
make[1]: *** [tests/docker/Makefile.include:129: docker-run] Error 1
make[1]: Leaving directory '/var/tmp/patchew-tester-tmp-_1xmie25/src'
make: *** [tests/docker/Makefile.include:163: docker-run-test-build@min-glib] Error 2

real    0m36.921s
user    0m8.999s
sys     0m6.657s
=== OUTPUT END ===

Test command exited with code: 2

---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
On 28.03.2018 at 19:02, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Activating the block devices causes the locks to be taken on
> the backing file. If we're running with -S and the destination libvirt
> hasn't started the destination with 'cont', it's expecting the locks are
> still untaken.
>
> Don't activate the block devices if we're not going to autostart the VM;
> 'cont' already will do that anyway.
>
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

I'm not sure that this is a good idea. Going back to my old writeup of
the migration phases...

https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html

...the phase between migration completion and 'cont' is described like
this:

    b) Migration converges:
       Both VMs are stopped (assuming -S is given on the destination,
       otherwise this phase is skipped), the destination is in control
       of the resources

This patch changes the definition of the phase so that neither side is
in control of the resources. We lose the phase where the destination is
in control, but the VM isn't running yet. This feels like a problem to
me.

Consider a case where the management tool keeps a mirror job with
sync=none running to expose all I/O requests to some external process.
It needs to shut down the old block job on the source in the
'pre-switchover' state, and start a new block job on the destination
when the destination controls the images, but the VM doesn't run yet (so
that it doesn't miss an I/O request). This patch removes the migration
phase that the management tool needs to implement this correctly.

If we need a "neither side has control" phase, we might need to
introduce it in addition to the existing phases rather than replacing a
phase that is still needed in other cases.

Kevin
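[Editor's note: a sketch of the kind of mirror job Kevin describes, as a QMP command. The `blockdev-mirror` command and its `sync: none` mode are real, but the node names "disk0" and "mirror-target" are made up for illustration, and the target node would first have to be created with `blockdev-add`.]

```
-> { "execute": "blockdev-mirror",
     "arguments": { "job-id": "expose-io",
                    "device": "disk0",
                    "target": "mirror-target",
                    "sync": "none" } }
<- { "return": {} }
```

With sync=none, only writes issued from now on are copied to the target, which is what lets an external process observe every new I/O request without copying the whole image.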
* Kevin Wolf (kwolf@redhat.com) wrote:
> On 28.03.2018 at 19:02, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > [...]
>
> I'm not sure that this is a good idea. Going back to my old writeup of
> the migration phases...
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html
>
> ...the phase between migration completion and 'cont' is described like
> this:
>
>     b) Migration converges:
>        Both VMs are stopped (assuming -S is given on the destination,
>        otherwise this phase is skipped), the destination is in control
>        of the resources
>
> This patch changes the definition of the phase so that neither side is
> in control of the resources. We lose the phase where the destination is
> in control, but the VM isn't running yet. This feels like a problem to
> me.

But see Jiri's writeup on that bz; libvirt is hitting the opposite
problem; in this corner case they can't have the destination taking
control yet.

> Consider a case where the management tool keeps a mirror job with
> sync=none running to expose all I/O requests to some external process.
> It needs to shut down the old block job on the source in the
> 'pre-switchover' state, and start a new block job on the destination
> when the destination controls the images, but the VM doesn't run yet (so
> that it doesn't miss an I/O request). This patch removes the migration
> phase that the management tool needs to implement this correctly.
>
> If we need a "neither side has control" phase, we might need to
> introduce it in addition to the existing phases rather than replacing a
> phase that is still needed in other cases.

This is yet another phase to be added.
IMHO this needs the management tool to explicitly take control in the
case you're talking about.

> Kevin

Dave
(out this week)

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 03.04.2018 at 22:52, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > On 28.03.2018 at 19:02, Dr. David Alan Gilbert (git) wrote:
> > > [...]
> >
> > This patch changes the definition of the phase so that neither side is
> > in control of the resources. We lose the phase where the destination is
> > in control, but the VM isn't running yet. This feels like a problem to
> > me.
>
> But see Jiri's writeup on that bz; libvirt is hitting the opposite
> problem; in this corner case they can't have the destination taking
> control yet.

I wonder if they can't already grant the destination QEMU the necessary
permission in the pre-switchover phase. Just a thought, I don't know how
this works in detail, so it might not be possible after all.

> > Consider a case where the management tool keeps a mirror job with
> > sync=none running to expose all I/O requests to some external process.
> > It needs to shut down the old block job on the source in the
> > 'pre-switchover' state, and start a new block job on the destination
> > when the destination controls the images, but the VM doesn't run yet (so
> > that it doesn't miss an I/O request). This patch removes the migration
> > phase that the management tool needs to implement this correctly.
> >
> > If we need a "neither side has control" phase, we might need to
> > introduce it in addition to the existing phases rather than replacing a
> > phase that is still needed in other cases.
>
> This is yet another phase to be added.
> IMHO this needs the management tool to explicitly take control in the
> case you're talking about.

What kind of mechanism do you have in mind there?

Maybe what could work would be separate QMP commands to inactivate (and
possibly for symmetry activate) all block nodes. Then the management
tool could use the pre-switchover phase to shut down its block jobs
etc., inactivate all block nodes, transfer its own locks and then call
migrate-continue.

Kevin
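[Editor's note: concretely, the sequence Kevin sketches might look like this on the wire. `block-inactivate` and `block-activate` are hypothetical command names that do not exist at this point; `migrate-continue` and `cont` are real commands.]

```
# Source, once migration reaches the 'pre-switchover' state
# (management tool has already shut down its block jobs):
-> { "execute": "block-inactivate" }
<- { "return": {} }

# ...management tool transfers its own locks, then lets migration finish:
-> { "execute": "migrate-continue", "arguments": { "state": "pre-switchover" } }
<- { "return": {} }

# Destination, when it should take control of the images (before 'cont'):
-> { "execute": "block-activate" }
<- { "return": {} }
-> { "execute": "cont" }
<- { "return": {} }
```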
* Kevin Wolf (kwolf@redhat.com) wrote:
> On 03.04.2018 at 22:52, Dr. David Alan Gilbert wrote:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > [...]
> > >
> > > This patch changes the definition of the phase so that neither side is
> > > in control of the resources. We lose the phase where the destination is
> > > in control, but the VM isn't running yet. This feels like a problem to
> > > me.
> >
> > But see Jiri's writeup on that bz; libvirt is hitting the opposite
> > problem; in this corner case they can't have the destination taking
> > control yet.
>
> I wonder if they can't already grant the destination QEMU the necessary
> permission in the pre-switchover phase. Just a thought, I don't know how
> this works in detail, so it might not be possible after all.

It's a fairly hairy failure case they had; if I remember correctly it's:
  a) Start migration
  b) Migration gets to completion point
  c) Destination is still paused
  d) Libvirt is restarted on the source
  e) Since libvirt was restarted it fails the migration (and hence knows
     the destination won't be started)
  f) It now tries to resume the qemu on the source

(f) fails because (b) caused the locks to be taken on the destination;
hence this patch stops doing that. It's a case we don't really think
about - i.e. that the migration has actually completed and all the data
is on the destination, but libvirt decides for some other reason to
abandon migration.

> > > Consider a case where the management tool keeps a mirror job with
> > > sync=none running to expose all I/O requests to some external
> > > process. [...]
> >
> > This is yet another phase to be added.
> > IMHO this needs the management tool to explicitly take control in the
> > case you're talking about.
>
> What kind of mechanism do you have in mind there?
>
> Maybe what could work would be separate QMP commands to inactivate (and
> possibly for symmetry activate) all block nodes. Then the management
> tool could use the pre-switchover phase to shut down its block jobs
> etc., inactivate all block nodes, transfer its own locks and then call
> migrate-continue.

Yes it was a 'block-activate' that I'd wondered about.
One complication is that if this is now under the control of the
management layer then we should stop asserting when the block devices
aren't in the expected state and just cleanly fail the command instead.

Dave

> Kevin

-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
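[Editor's note: the lock conflict behind step (f) can be sketched in a few lines. This is a deliberately simplified model: QEMU's image locking actually uses OFD byte-range fcntl() locks with shared/exclusive semantics per permission, not a single flock(); the only thing illustrated is that once the destination process's open file description holds the lock, the source cannot reacquire it until the destination drops it.]

```python
import fcntl
import tempfile

# Stand-in for the shared image/backing file both QEMUs open.
img = tempfile.NamedTemporaryFile()

# Destination QEMU: activating the block devices takes the lock
# (analogous to what bdrv_invalidate_cache_all() triggers).
dst = open(img.name)
fcntl.flock(dst, fcntl.LOCK_EX | fcntl.LOCK_NB)

# Source QEMU: libvirt tries to resume it -- step (f).
src = open(img.name)
try:
    fcntl.flock(src, fcntl.LOCK_EX | fcntl.LOCK_NB)
    resumed = True
except BlockingIOError:
    # EWOULDBLOCK: the destination still holds the lock, resume fails.
    resumed = False

# Quitting the destination releases the lock; resuming then succeeds.
fcntl.flock(dst, fcntl.LOCK_UN)
fcntl.flock(src, fcntl.LOCK_EX | fcntl.LOCK_NB)

print(resumed)  # False: the resume attempt failed while the lock was held
```

The two `open()` calls matter: flock() locks belong to the open file description, so even in one process the second descriptor conflicts with the first, just as two QEMU processes would.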
On 09.04.2018 at 12:27, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > [...]
> >
> > I wonder if they can't already grant the destination QEMU the necessary
> > permission in the pre-switchover phase. Just a thought, I don't know how
> > this works in detail, so it might not be possible after all.
>
> It's a fairly hairy failure case they had; if I remember correctly it's:
>   a) Start migration
>   b) Migration gets to completion point
>   c) Destination is still paused
>   d) Libvirt is restarted on the source
>   e) Since libvirt was restarted it fails the migration (and hence knows
>      the destination won't be started)
>   f) It now tries to resume the qemu on the source
>
> (f) fails because (b) caused the locks to be taken on the destination;
> hence this patch stops doing that. It's a case we don't really think
> about - i.e. that the migration has actually completed and all the data
> is on the destination, but libvirt decides for some other reason to
> abandon migration.

If you do remember correctly, that scenario doesn't feel tricky at all.
libvirt needs to quit the destination qemu, which will inactivate the
images on the destination and release the lock, and then it can continue
the source.

In fact, this is so straightforward that I wonder what else libvirt is
doing. Is the destination qemu only shut down after trying to continue
the source? That would be libvirt using the wrong order of steps.

> > [...]
> >
> > Maybe what could work would be separate QMP commands to inactivate (and
> > possibly for symmetry activate) all block nodes. Then the management
> > tool could use the pre-switchover phase to shut down its block jobs
> > etc., inactivate all block nodes, transfer its own locks and then call
> > migrate-continue.
>
> Yes it was a 'block-activate' that I'd wondered about. One complication
> is that if this is now under the control of the management layer then we
> should stop asserting when the block devices aren't in the expected
> state and just cleanly fail the command instead.

Requiring an explicit 'block-activate' on the destination would be an
incompatible change, so you would have to introduce a new option for
that. 'block-inactivate' on the source feels a bit simpler.

And yes, you're probably right that we would have to be more careful to
catch inactive images without crashing. On the other hand, it would
become a state that is easier to test because it can be directly
influenced via QMP rather than being only a side effect of migration.

Kevin
* Kevin Wolf (kwolf@redhat.com) wrote: > Am 09.04.2018 um 12:27 hat Dr. David Alan Gilbert geschrieben: > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > Am 03.04.2018 um 22:52 hat Dr. David Alan Gilbert geschrieben: > > > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > > > Am 28.03.2018 um 19:02 hat Dr. David Alan Gilbert (git) geschrieben: > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > > > > > > > > > Activating the block devices causes the locks to be taken on > > > > > > the backing file. If we're running with -S and the destination libvirt > > > > > > hasn't started the destination with 'cont', it's expecting the locks are > > > > > > still untaken. > > > > > > > > > > > > Don't activate the block devices if we're not going to autostart the VM; > > > > > > 'cont' already will do that anyway. > > > > > > > > > > > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854 > > > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > > > > > > > > > I'm not sure that this is a good idea. Going back to my old writeup of > > > > > the migration phases... > > > > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html > > > > > > > > > > ...the phase between migration completion and 'cont' is described like > > > > > this: > > > > > > > > > > b) Migration converges: > > > > > Both VMs are stopped (assuming -S is given on the destination, > > > > > otherwise this phase is skipped), the destination is in control of > > > > > the resources > > > > > > > > > > This patch changes the definition of the phase so that neither side is > > > > > in control of the resources. We lose the phase where the destination is > > > > > in control, but the VM isn't running yet. This feels like a problem to > > > > > me. > > > > > > > > But see Jiri's writeup on that bz; libvirt is hitting the opposite > > > > problem; in this corner case they can't have the destination taking > > > > control yet. 
> > > > > > I wonder if they can't already grant the destination QEMU the necessary > > > permission in the pre-switchover phase. Just a thought, I don't know how > > > this works in detail, so it might not possible after all. > > > > It's a fairly hairy failure case they had; if I remember correctly it's: > > a) Start migration > > b) Migration gets to completion point > > c) Destination is still paused > > d) Libvirt is restarted on the source > > e) Since libvirt was restarted it fails the migration (and hence knows > > the destination won't be started) > > f) It now tries to resume the qemu on the source > > > > (f) fails because (b) caused the locks to be taken on the destination; > > hence this patch stops doing that. It's a case we don't really think > > about - i.e. that the migration has actually completed and all the data > > is on the destination, but libvirt decides for some other reason to > > abandon migration. > > If you do remember correctly, that scenario doesn't feel tricky at all. > libvirt needs to quit the destination qemu, which will inactivate the > images on the destination and release the lock, and then it can continue > the source. > > In fact, this is so straightforward that I wonder what else libvirt is > doing. Is the destination qemu only shut down after trying to continue > the source? That would be libvirt using the wrong order of steps. I'll leave Jiri to reply to this; I think this is a case of the source realising libvirt has restarted, then trying to recover all of it's VMs without being in the position of being able to check on the destination. > > > > > Consider a case where the management tool keeps a mirror job with > > > > > sync=none running to expose all I/O requests to some external process. 
> > > > > It needs to shut down the old block job on the source in the > > > > > 'pre-switchover' state, and start a new block job on the destination > > > > > when the destination controls the images, but the VM doesn't run yet (so > > > > > that it doesn't miss an I/O request). This patch removes the migration > > > > > phase that the management tool needs to implement this correctly. > > > > > > > > > > If we need a "neither side has control" phase, we might need to > > > > > introduce it in addition to the existing phases rather than replacing a > > > > > phase that is still needed in other cases. > > > > > > > > This is yet another phase to be added. > > > > IMHO this needs the managment tool to explicitly take control in the > > > > case you're talking about. > > > > > > What kind of mechanism do you have in mind there? > > > > > > Maybe what could work would be separate QMP commands to inactivate (and > > > possibly for symmetry activate) all block nodes. Then the management > > > tool could use the pre-switchover phase to shut down its block jobs > > > etc., inactivate all block nodes, transfer its own locks and then call > > > migrate-continue. > > > > Yes it was a 'block-activate' that I'd wondered about. One complication > > is that if this now under the control of the management layer then we > > should stop asserting when the block devices aren't in the expected > > state and just cleanly fail the command instead. > > Requiring an explicit 'block-activate' on the destination would be an > incompatible change, so you would have to introduce a new option for > that. 'block-inactivate' on the source feels a bit simpler. I'd only want the 'block-activate' in the case of this new block-job you're suggesting; not in the case of normal migrates - they'd still get it when they do 'cont' - so the change in behaviour is only with that block-job case that must start before the end of migrate. 
> And yes, you're probably right that we would have to be more careful to > catch inactive images without crashing. On the other hand, it would > become a state that is easier to test because it can be directly > influenced via QMP rather than being only a side effect of migration. Yes; but crashing is really bad, so we should really, really stop asserting all over. Dave > Kevin -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
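[Editor's note: the (a)-(f) failure sequence discussed above can be sketched as a toy lock model. This is illustrative Python, not QEMU code; the names are hypothetical. It only demonstrates the ordering problem: if the destination takes the image lock at migration completion (the pre-patch behaviour), a later attempt to resume the source fails.]

```python
class ImageLock:
    """Toy stand-in for the file lock qemu takes on an image's backing file."""
    def __init__(self):
        self.holder = None

    def acquire(self, who):
        if self.holder not in (None, who):
            raise RuntimeError(f"lock held by {self.holder}")
        self.holder = who

    def release(self, who):
        if self.holder == who:
            self.holder = None


def scenario(activate_on_completion):
    """Replay steps (a)-(f); return True if the source can resume."""
    lock = ImageLock()
    lock.acquire("source")            # (a) source runs with the image locked
    lock.release("source")            # (b) migration completes: source inactivates images
    if activate_on_completion:        # pre-patch: destination activates immediately,
        lock.acquire("destination")   #     even though it was started with -S (c)
    # (d)/(e) libvirt restarts on the source and abandons the migration
    try:
        lock.acquire("source")        # (f) try to resume the source qemu
        return True
    except RuntimeError:
        return False


print(scenario(activate_on_completion=True))   # pre-patch behaviour: resume fails
print(scenario(activate_on_completion=False))  # patched: activation deferred to 'cont'
```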
Am 09.04.2018 um 16:04 hat Dr. David Alan Gilbert geschrieben: > * Kevin Wolf (kwolf@redhat.com) wrote: > > Am 09.04.2018 um 12:27 hat Dr. David Alan Gilbert geschrieben: > > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > > Am 03.04.2018 um 22:52 hat Dr. David Alan Gilbert geschrieben: > > > > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > > > > Am 28.03.2018 um 19:02 hat Dr. David Alan Gilbert (git) geschrieben: > > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > > > > > > > > > > > Activating the block devices causes the locks to be taken on > > > > > > > the backing file. If we're running with -S and the destination libvirt > > > > > > > hasn't started the destination with 'cont', it's expecting the locks are > > > > > > > still untaken. > > > > > > > > > > > > > > Don't activate the block devices if we're not going to autostart the VM; > > > > > > > 'cont' already will do that anyway. > > > > > > > > > > > > > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854 > > > > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > > > > > > > > > > > I'm not sure that this is a good idea. Going back to my old writeup of > > > > > > the migration phases... > > > > > > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html > > > > > > > > > > > > ...the phase between migration completion and 'cont' is described like > > > > > > this: > > > > > > > > > > > > b) Migration converges: > > > > > > Both VMs are stopped (assuming -S is given on the destination, > > > > > > otherwise this phase is skipped), the destination is in control of > > > > > > the resources > > > > > > > > > > > > This patch changes the definition of the phase so that neither side is > > > > > > in control of the resources. We lose the phase where the destination is > > > > > > in control, but the VM isn't running yet. This feels like a problem to > > > > > > me. 
> > > > > > > > > > But see Jiri's writeup on that bz; libvirt is hitting the opposite > > > > > problem; in this corner case they can't have the destination taking > > > > > control yet. > > > > > > > > I wonder if they can't already grant the destination QEMU the necessary > > > > permission in the pre-switchover phase. Just a thought, I don't know how > > > > this works in detail, so it might not possible after all. > > > > > > It's a fairly hairy failure case they had; if I remember correctly it's: > > > a) Start migration > > > b) Migration gets to completion point > > > c) Destination is still paused > > > d) Libvirt is restarted on the source > > > e) Since libvirt was restarted it fails the migration (and hence knows > > > the destination won't be started) > > > f) It now tries to resume the qemu on the source > > > > > > (f) fails because (b) caused the locks to be taken on the destination; > > > hence this patch stops doing that. It's a case we don't really think > > > about - i.e. that the migration has actually completed and all the data > > > is on the destination, but libvirt decides for some other reason to > > > abandon migration. > > > > If you do remember correctly, that scenario doesn't feel tricky at all. > > libvirt needs to quit the destination qemu, which will inactivate the > > images on the destination and release the lock, and then it can continue > > the source. > > > > In fact, this is so straightforward that I wonder what else libvirt is > > doing. Is the destination qemu only shut down after trying to continue > > the source? That would be libvirt using the wrong order of steps. > > I'll leave Jiri to reply to this; I think this is a case of the source > realising libvirt has restarted, then trying to recover all of it's VMs > without being in the position of being able to check on the destination. 
> > > > > > > Consider a case where the management tool keeps a mirror job with > > > > > > sync=none running to expose all I/O requests to some external process. > > > > > > It needs to shut down the old block job on the source in the > > > > > > 'pre-switchover' state, and start a new block job on the destination > > > > > > when the destination controls the images, but the VM doesn't run yet (so > > > > > > that it doesn't miss an I/O request). This patch removes the migration > > > > > > phase that the management tool needs to implement this correctly. > > > > > > > > > > > > If we need a "neither side has control" phase, we might need to > > > > > > introduce it in addition to the existing phases rather than replacing a > > > > > > phase that is still needed in other cases. > > > > > > > > > > This is yet another phase to be added. > > > > > IMHO this needs the managment tool to explicitly take control in the > > > > > case you're talking about. > > > > > > > > What kind of mechanism do you have in mind there? > > > > > > > > Maybe what could work would be separate QMP commands to inactivate (and > > > > possibly for symmetry activate) all block nodes. Then the management > > > > tool could use the pre-switchover phase to shut down its block jobs > > > > etc., inactivate all block nodes, transfer its own locks and then call > > > > migrate-continue. > > > > > > Yes it was a 'block-activate' that I'd wondered about. One complication > > > is that if this now under the control of the management layer then we > > > should stop asserting when the block devices aren't in the expected > > > state and just cleanly fail the command instead. > > > > Requiring an explicit 'block-activate' on the destination would be an > > incompatible change, so you would have to introduce a new option for > > that. 'block-inactivate' on the source feels a bit simpler. 
> > I'd only want the 'block-activate' in the case of this new block-job > you're suggesting; not in the case of normal migrates - they'd still get > it when they do 'cont' - so the change in behaviour is only with that > block-job case that must start before the end of migrate. I'm not aware of having suggested a new block job? > > And yes, you're probably right that we would have to be more careful to > > catch inactive images without crashing. On the other hand, it would > > become a state that is easier to test because it can be directly > > influenced via QMP rather than being only a side effect of migration. > > Yes; but crashing is really bad, so we should really really stopping > asserting all over. Are you aware of any wrong assertions currently? The thing is, inactive images can only happen in a fairly restricted set of scenarios today - either on the source after migration completed, or on the destination before it completed. If you get any write I/O requests in these states, that's a QEMU bug, so assertions to catch these bugs feel right to me. Kevin
On Wed, Apr 04, 2018 at 12:03:03 +0200, Kevin Wolf wrote: > Am 03.04.2018 um 22:52 hat Dr. David Alan Gilbert geschrieben: > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > Consider a case where the management tool keeps a mirror job with > > > sync=none running to expose all I/O requests to some external process. > > > It needs to shut down the old block job on the source in the > > > 'pre-switchover' state, and start a new block job on the destination > > > when the destination controls the images, but the VM doesn't run yet (so > > > that it doesn't miss an I/O request). This patch removes the migration > > > phase that the management tool needs to implement this correctly. > > > > > > If we need a "neither side has control" phase, we might need to > > > introduce it in addition to the existing phases rather than replacing a > > > phase that is still needed in other cases. > > > > This is yet another phase to be added. > > IMHO this needs the managment tool to explicitly take control in the > > case you're talking about. > > What kind of mechanism do you have in mind there? > > Maybe what could work would be separate QMP commands to inactivate (and > possibly for symmetry activate) all block nodes. Then the management > tool could use the pre-switchover phase to shut down its block jobs > etc., inactivate all block nodes, transfer its own locks and then call > migrate-continue.

Libvirt's migration protocol consists of several phases; the ones relevant to QEMU are:

1. destination libvirtd starts a new QEMU process with -incoming
2. source libvirtd calls the migrate QMP command and waits until migration completes; in this step we actually wait for the pre-switchover state, shut down all block jobs, and call migrate-continue
3. destination libvirtd calls the cont QMP command

The daemons only communicate between these steps, i.e., the result of each step is transferred to the other side of migration.
In other words, transferring of locks and other state can only happen between these steps.

IIUC, what you suggested would require step 2 to be split so that some work can be done on the destination between the "migrate" command and completed migration. This would be pretty complicated and I don't think the problem we're trying to solve would be worth it. Not to mention that it would multiply the number of possible code paths in the migration code.

However, I don't think the extra step is even necessary. We could just have a separate QMP command to activate block nodes. Thus the source would wait for pre-switchover, shut down all block jobs and call migrate-continue. Once migration completes, the control would be transferred to the destination, which would call the new command to activate block nodes followed by cont. That is, the "cont" command would just be replaced with two commands.

And similarly to the pre-switchover state, this could be controlled by a new migration capability to make sure all block nodes are activated automatically with old libvirt, which doesn't know anything about the new command. This would solve compatibility with non-libvirt use cases too.

Jirka
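[Editor's note: the flow Jiri proposes above might look like the following QMP command sequence, sketched here as Python dicts. The `migrate-set-capabilities`, `migrate`, `migrate-continue` and `cont` commands and the `pause-before-switchover` capability are real QEMU interfaces; the `block-activate` command is hypothetical, as it did not exist at the time of this thread.]

```python
# Source side: opt in to the pre-switchover pause, then start migration.
source_cmds = [
    {"execute": "migrate-set-capabilities",
     "arguments": {"capabilities": [
         {"capability": "pause-before-switchover", "state": True}]}},
    {"execute": "migrate", "arguments": {"uri": "tcp:dest:4444"}},
    # ... wait for the pre-switchover state, shut down block jobs ...
    {"execute": "migrate-continue", "arguments": {"state": "pre-switchover"}},
]

# Destination side, after migration completes: instead of a bare 'cont',
# the management tool issues two commands so it controls exactly when the
# image locks are taken.  'block-activate' is the hypothetical new command.
dest_cmds = [
    {"execute": "block-activate"},   # take image locks / activate block nodes
    {"execute": "cont"},             # then actually start the VM
]

for cmd in source_cmds + dest_cmds:
    print(cmd["execute"])
```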
* Kevin Wolf (kwolf@redhat.com) wrote: > Am 09.04.2018 um 16:04 hat Dr. David Alan Gilbert geschrieben: > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > Am 09.04.2018 um 12:27 hat Dr. David Alan Gilbert geschrieben: > > > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > > > Am 03.04.2018 um 22:52 hat Dr. David Alan Gilbert geschrieben: > > > > > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > > > > > Am 28.03.2018 um 19:02 hat Dr. David Alan Gilbert (git) geschrieben: > > > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > > > > > > > > > > > > > > > Activating the block devices causes the locks to be taken on > > > > > > > > the backing file. If we're running with -S and the destination libvirt > > > > > > > > hasn't started the destination with 'cont', it's expecting the locks are > > > > > > > > still untaken. > > > > > > > > > > > > > > > > Don't activate the block devices if we're not going to autostart the VM; > > > > > > > > 'cont' already will do that anyway. > > > > > > > > > > > > > > > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854 > > > > > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > > > > > > > > > > > > > I'm not sure that this is a good idea. Going back to my old writeup of > > > > > > > the migration phases... > > > > > > > > > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html > > > > > > > > > > > > > > ...the phase between migration completion and 'cont' is described like > > > > > > > this: > > > > > > > > > > > > > > b) Migration converges: > > > > > > > Both VMs are stopped (assuming -S is given on the destination, > > > > > > > otherwise this phase is skipped), the destination is in control of > > > > > > > the resources > > > > > > > > > > > > > > This patch changes the definition of the phase so that neither side is > > > > > > > in control of the resources. We lose the phase where the destination is > > > > > > > in control, but the VM isn't running yet. 
This feels like a problem to > > > > > > > me. > > > > > > > > > > > > But see Jiri's writeup on that bz; libvirt is hitting the opposite > > > > > > problem; in this corner case they can't have the destination taking > > > > > > control yet. > > > > > > > > > > I wonder if they can't already grant the destination QEMU the necessary > > > > > permission in the pre-switchover phase. Just a thought, I don't know how > > > > > this works in detail, so it might not possible after all. > > > > > > > > It's a fairly hairy failure case they had; if I remember correctly it's: > > > > a) Start migration > > > > b) Migration gets to completion point > > > > c) Destination is still paused > > > > d) Libvirt is restarted on the source > > > > e) Since libvirt was restarted it fails the migration (and hence knows > > > > the destination won't be started) > > > > f) It now tries to resume the qemu on the source > > > > > > > > (f) fails because (b) caused the locks to be taken on the destination; > > > > hence this patch stops doing that. It's a case we don't really think > > > > about - i.e. that the migration has actually completed and all the data > > > > is on the destination, but libvirt decides for some other reason to > > > > abandon migration. > > > > > > If you do remember correctly, that scenario doesn't feel tricky at all. > > > libvirt needs to quit the destination qemu, which will inactivate the > > > images on the destination and release the lock, and then it can continue > > > the source. > > > > > > In fact, this is so straightforward that I wonder what else libvirt is > > > doing. Is the destination qemu only shut down after trying to continue > > > the source? That would be libvirt using the wrong order of steps. > > > > I'll leave Jiri to reply to this; I think this is a case of the source > > realising libvirt has restarted, then trying to recover all of it's VMs > > without being in the position of being able to check on the destination. 
> > > > > > > > > Consider a case where the management tool keeps a mirror job with > > > > > > > sync=none running to expose all I/O requests to some external process. > > > > > > > It needs to shut down the old block job on the source in the > > > > > > > 'pre-switchover' state, and start a new block job on the destination > > > > > > > when the destination controls the images, but the VM doesn't run yet (so > > > > > > > that it doesn't miss an I/O request). This patch removes the migration > > > > > > > phase that the management tool needs to implement this correctly. > > > > > > > > > > > > > > If we need a "neither side has control" phase, we might need to > > > > > > > introduce it in addition to the existing phases rather than replacing a > > > > > > > phase that is still needed in other cases. > > > > > > > > > > > > This is yet another phase to be added. > > > > > > IMHO this needs the managment tool to explicitly take control in the > > > > > > case you're talking about. > > > > > > > > > > What kind of mechanism do you have in mind there? > > > > > > > > > > Maybe what could work would be separate QMP commands to inactivate (and > > > > > possibly for symmetry activate) all block nodes. Then the management > > > > > tool could use the pre-switchover phase to shut down its block jobs > > > > > etc., inactivate all block nodes, transfer its own locks and then call > > > > > migrate-continue. > > > > > > > > Yes it was a 'block-activate' that I'd wondered about. One complication > > > > is that if this now under the control of the management layer then we > > > > should stop asserting when the block devices aren't in the expected > > > > state and just cleanly fail the command instead. > > > > > > Requiring an explicit 'block-activate' on the destination would be an > > > incompatible change, so you would have to introduce a new option for > > > that. 'block-inactivate' on the source feels a bit simpler. 
> > > > I'd only want the 'block-activate' in the case of this new block-job > > you're suggesting; not in the case of normal migrates - they'd still get > > it when they do 'cont' - so the change in behaviour is only with that > > block-job case that must start before the end of migrate. > > I'm not aware of having suggested a new block job? I'm referring to your concern in your first reply in the thread: Consider a case where the management tool keeps a mirror job with sync=none running to expose all I/O requests to some external process. > > > And yes, you're probably right that we would have to be more careful to > > > catch inactive images without crashing. On the other hand, it would > > > become a state that is easier to test because it can be directly > > > influenced via QMP rather than being only a side effect of migration. > > > > Yes; but crashing is really bad, so we should really really stopping > > asserting all over. > > Are you aware of any wrong assertions currently? Well, there's https://bugzilla.redhat.com/show_bug.cgi?id=1408653 that I've not looked at for a while. But we have had a few lately. > The thing is, inactive images can only happen in a fairly restricted set > of scenarios today - either on the source after migration completed, or > on the destination before it completed. If you get any write I/O > requests in these states, that's a QEMU bug, so assertions to catch > these bugs feel right to me. But if we add the 'block-inactivate' command you suggest, then it could be a management screwup rather than a qemu bug, and so assertions feel wrong. Dave > Kevin -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote: > Am 09.04.2018 um 12:27 hat Dr. David Alan Gilbert geschrieben: > > It's a fairly hairy failure case they had; if I remember correctly it's: > > a) Start migration > > b) Migration gets to completion point > > c) Destination is still paused > > d) Libvirt is restarted on the source > > e) Since libvirt was restarted it fails the migration (and hence knows > > the destination won't be started) > > f) It now tries to resume the qemu on the source > > > > (f) fails because (b) caused the locks to be taken on the destination; > > hence this patch stops doing that. It's a case we don't really think > > about - i.e. that the migration has actually completed and all the data > > is on the destination, but libvirt decides for some other reason to > > abandon migration. > > If you do remember correctly, that scenario doesn't feel tricky at all. > libvirt needs to quit the destination qemu, which will inactivate the > images on the destination and release the lock, and then it can continue > the source. > > In fact, this is so straightforward that I wonder what else libvirt is > doing. Is the destination qemu only shut down after trying to continue > the source? That would be libvirt using the wrong order of steps. There's no connection between the two libvirt daemons in the case we're talking about so they can't really synchronize the actions. The destination daemon will kill the new QEMU process and the source will resume the old one, but the order is completely random. ... > > Yes it was a 'block-activate' that I'd wondered about. One complication > > is that if this now under the control of the management layer then we > > should stop asserting when the block devices aren't in the expected > > state and just cleanly fail the command instead. > > Requiring an explicit 'block-activate' on the destination would be an > incompatible change, so you would have to introduce a new option for > that. 
'block-inactivate' on the source feels a bit simpler. As I said in another email, the explicit block-activate command could depend on a migration capability similarly to how pre-switchover state works. Jirka
Am 10.04.2018 um 09:36 hat Jiri Denemark geschrieben: > On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote: > > Am 09.04.2018 um 12:27 hat Dr. David Alan Gilbert geschrieben: > > > It's a fairly hairy failure case they had; if I remember correctly it's: > > > a) Start migration > > > b) Migration gets to completion point > > > c) Destination is still paused > > > d) Libvirt is restarted on the source > > > e) Since libvirt was restarted it fails the migration (and hence knows > > > the destination won't be started) > > > f) It now tries to resume the qemu on the source > > > > > > (f) fails because (b) caused the locks to be taken on the destination; > > > hence this patch stops doing that. It's a case we don't really think > > > about - i.e. that the migration has actually completed and all the data > > > is on the destination, but libvirt decides for some other reason to > > > abandon migration. > > > > If you do remember correctly, that scenario doesn't feel tricky at all. > > libvirt needs to quit the destination qemu, which will inactivate the > > images on the destination and release the lock, and then it can continue > > the source. > > > > In fact, this is so straightforward that I wonder what else libvirt is > > doing. Is the destination qemu only shut down after trying to continue > > the source? That would be libvirt using the wrong order of steps. > > There's no connection between the two libvirt daemons in the case we're > talking about so they can't really synchronize the actions. The > destination daemon will kill the new QEMU process and the source will > resume the old one, but the order is completely random. Hm, okay... > > > Yes it was a 'block-activate' that I'd wondered about. One complication > > > is that if this now under the control of the management layer then we > > > should stop asserting when the block devices aren't in the expected > > > state and just cleanly fail the command instead. 
> > > > Requiring an explicit 'block-activate' on the destination would be an > > incompatible change, so you would have to introduce a new option for > > that. 'block-inactivate' on the source feels a bit simpler. > > As I said in another email, the explicit block-activate command could > depend on a migration capability similarly to how pre-switchover state > works. Yeah, that's exactly the thing that we wouldn't need if we could use 'block-inactivate' on the source instead. It feels a bit wrong to design a more involved QEMU interface around the libvirt internals, but as long as we implement both sides for symmetry and libvirt just happens to pick the destination side for now, I think it's okay. By the way, are block devices the only thing that needs to be explicitly activated? For example, what about qemu_announce_self() for network cards, do we need to delay that, too? In any case, I think this patch needs to be reverted for 2.12 because it's wrong, and then we can create the proper solution in the 2.13 timeframe. Kevin
* Kevin Wolf (kwolf@redhat.com) wrote: > Am 10.04.2018 um 09:36 hat Jiri Denemark geschrieben: > > On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote: > > > Am 09.04.2018 um 12:27 hat Dr. David Alan Gilbert geschrieben: > > > > It's a fairly hairy failure case they had; if I remember correctly it's: > > > > a) Start migration > > > > b) Migration gets to completion point > > > > c) Destination is still paused > > > > d) Libvirt is restarted on the source > > > > e) Since libvirt was restarted it fails the migration (and hence knows > > > > the destination won't be started) > > > > f) It now tries to resume the qemu on the source > > > > > > > > (f) fails because (b) caused the locks to be taken on the destination; > > > > hence this patch stops doing that. It's a case we don't really think > > > > about - i.e. that the migration has actually completed and all the data > > > > is on the destination, but libvirt decides for some other reason to > > > > abandon migration. > > > > > > If you do remember correctly, that scenario doesn't feel tricky at all. > > > libvirt needs to quit the destination qemu, which will inactivate the > > > images on the destination and release the lock, and then it can continue > > > the source. > > > > > > In fact, this is so straightforward that I wonder what else libvirt is > > > doing. Is the destination qemu only shut down after trying to continue > > > the source? That would be libvirt using the wrong order of steps. > > > > There's no connection between the two libvirt daemons in the case we're > > talking about so they can't really synchronize the actions. The > > destination daemon will kill the new QEMU process and the source will > > resume the old one, but the order is completely random. > > Hm, okay... > > > > > Yes it was a 'block-activate' that I'd wondered about. 
One complication > > > > is that if this now under the control of the management layer then we > > > > should stop asserting when the block devices aren't in the expected > > > > state and just cleanly fail the command instead. > > > > > > Requiring an explicit 'block-activate' on the destination would be an > > > incompatible change, so you would have to introduce a new option for > > > that. 'block-inactivate' on the source feels a bit simpler. > > > > As I said in another email, the explicit block-activate command could > > depend on a migration capability similarly to how pre-switchover state > > works. > > Yeah, that's exactly the thing that we wouldn't need if we could use > 'block-inactivate' on the source instead. It feels a bit wrong to > design a more involved QEMU interface around the libvirt internals, It's not necessarily 'libvirt internals' - it's a case of them having to cope with recovering from failures that happen around migration; it's not an easy problem, and if they've got a way to stop both sides running at the same time that's pretty important. > but > as long as we implement both sides for symmetry and libvirt just happens > to pick the destination side for now, I think it's okay. > > By the way, are block devices the only thing that need to be explicitly > activated? For example, what about qemu_announce_self() for network > cards, do we need to delay that, too? > > In any case, I think this patch needs to be reverted for 2.12 because > it's wrong, and then we can create the proper solution in the 2.13 > timefrage. what case does this break? I'm a bit wary of reverting this, which fixes a known problem, on the basis that it causes a theoretical problem. Dave > Kevin -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Am 10.04.2018 um 10:45 hat Dr. David Alan Gilbert geschrieben: > * Kevin Wolf (kwolf@redhat.com) wrote: > > Am 10.04.2018 um 09:36 hat Jiri Denemark geschrieben: > > > On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote: > > > > Am 09.04.2018 um 12:27 hat Dr. David Alan Gilbert geschrieben: > > > > > It's a fairly hairy failure case they had; if I remember correctly it's: > > > > > a) Start migration > > > > > b) Migration gets to completion point > > > > > c) Destination is still paused > > > > > d) Libvirt is restarted on the source > > > > > e) Since libvirt was restarted it fails the migration (and hence knows > > > > > the destination won't be started) > > > > > f) It now tries to resume the qemu on the source > > > > > > > > > > (f) fails because (b) caused the locks to be taken on the destination; > > > > > hence this patch stops doing that. It's a case we don't really think > > > > > about - i.e. that the migration has actually completed and all the data > > > > > is on the destination, but libvirt decides for some other reason to > > > > > abandon migration. > > > > > > > > If you do remember correctly, that scenario doesn't feel tricky at all. > > > > libvirt needs to quit the destination qemu, which will inactivate the > > > > images on the destination and release the lock, and then it can continue > > > > the source. > > > > > > > > In fact, this is so straightforward that I wonder what else libvirt is > > > > doing. Is the destination qemu only shut down after trying to continue > > > > the source? That would be libvirt using the wrong order of steps. > > > > > > There's no connection between the two libvirt daemons in the case we're > > > talking about so they can't really synchronize the actions. The > > > destination daemon will kill the new QEMU process and the source will > > > resume the old one, but the order is completely random. > > > > Hm, okay... > > > > > > > Yes it was a 'block-activate' that I'd wondered about. 
One complication > > > > > is that if this now under the control of the management layer then we > > > > > should stop asserting when the block devices aren't in the expected > > > > > state and just cleanly fail the command instead. > > > > > > > > Requiring an explicit 'block-activate' on the destination would be an > > > > incompatible change, so you would have to introduce a new option for > > > > that. 'block-inactivate' on the source feels a bit simpler. > > > > > > As I said in another email, the explicit block-activate command could > > > depend on a migration capability similarly to how pre-switchover state > > > works. > > > > Yeah, that's exactly the thing that we wouldn't need if we could use > > 'block-inactivate' on the source instead. It feels a bit wrong to > > design a more involved QEMU interface around the libvirt internals, > > It's not necessarily 'libvirt internals' - it's a case of them having to > cope with recovering from failures that happen around migration; it's > not an easy problem, and if they've got a way to stop both sides running > at the same time that's pretty important. The 'libvirt internals' isn't that it needs an additional state where neither source nor destination QEMU own the images, but that it has to be between migration completion and image activation on the destination rather than between image inactivation on the source and migration completion. The latter would be much easier for qemu, but apparently it doesn't work for libvirt because of how it works internally. But as I said, I'd just implement both for symmetry and then management tools can pick whatever makes their life easier. > > but > > as long as we implement both sides for symmetry and libvirt just happens > > to pick the destination side for now, I think it's okay. > > > > By the way, are block devices the only thing that need to be explicitly > > activated? For example, what about qemu_announce_self() for network > > cards, do we need to delay that, too? 
> > > > In any case, I think this patch needs to be reverted for 2.12 because > > it's wrong, and then we can create the proper solution in the 2.13 > > timefrage. > > what case does this break? > I'm a bit wary of reverting this, which fixes a known problem, on the > basis that it causes a theoretical problem. It breaks the API. And the final design we're having in mind now is compatible with the old API, not with the new one exposed by this patch, so that switch would break the API again to get back to the old state. Do you know all the scripts that people are using around QEMU? I don't, but I know that plenty of them exist, so I don't think we can declare this API breakage purely theoretical. Yes, the patch fixes a known problem, but also a problem that is a rare corner case error that you can only hit with really bad timing. Do we really want to risk unconditionally breaking success cases for fixing a mostly theoretical corner case error path (with the failure mode that the guest is paused when it shouldn't be)? Kevin
* Kevin Wolf (kwolf@redhat.com) wrote: > Am 10.04.2018 um 10:45 hat Dr. David Alan Gilbert geschrieben: > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > Am 10.04.2018 um 09:36 hat Jiri Denemark geschrieben: > > > > On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote: > > > > > Am 09.04.2018 um 12:27 hat Dr. David Alan Gilbert geschrieben: > > > > > > It's a fairly hairy failure case they had; if I remember correctly it's: > > > > > > a) Start migration > > > > > > b) Migration gets to completion point > > > > > > c) Destination is still paused > > > > > > d) Libvirt is restarted on the source > > > > > > e) Since libvirt was restarted it fails the migration (and hence knows > > > > > > the destination won't be started) > > > > > > f) It now tries to resume the qemu on the source > > > > > > > > > > > > (f) fails because (b) caused the locks to be taken on the destination; > > > > > > hence this patch stops doing that. It's a case we don't really think > > > > > > about - i.e. that the migration has actually completed and all the data > > > > > > is on the destination, but libvirt decides for some other reason to > > > > > > abandon migration. > > > > > > > > > > If you do remember correctly, that scenario doesn't feel tricky at all. > > > > > libvirt needs to quit the destination qemu, which will inactivate the > > > > > images on the destination and release the lock, and then it can continue > > > > > the source. > > > > > > > > > > In fact, this is so straightforward that I wonder what else libvirt is > > > > > doing. Is the destination qemu only shut down after trying to continue > > > > > the source? That would be libvirt using the wrong order of steps. > > > > > > > > There's no connection between the two libvirt daemons in the case we're > > > > talking about so they can't really synchronize the actions. The > > > > destination daemon will kill the new QEMU process and the source will > > > > resume the old one, but the order is completely random. 
> > > > > > Hm, okay... > > > > > > > > > Yes it was a 'block-activate' that I'd wondered about. One complication > > > > > > is that if this now under the control of the management layer then we > > > > > > should stop asserting when the block devices aren't in the expected > > > > > > state and just cleanly fail the command instead. > > > > > > > > > > Requiring an explicit 'block-activate' on the destination would be an > > > > > incompatible change, so you would have to introduce a new option for > > > > > that. 'block-inactivate' on the source feels a bit simpler. > > > > > > > > As I said in another email, the explicit block-activate command could > > > > depend on a migration capability similarly to how pre-switchover state > > > > works. > > > > > > Yeah, that's exactly the thing that we wouldn't need if we could use > > > 'block-inactivate' on the source instead. It feels a bit wrong to > > > design a more involved QEMU interface around the libvirt internals, > > > > It's not necessarily 'libvirt internals' - it's a case of them having to > > cope with recovering from failures that happen around migration; it's > > not an easy problem, and if they've got a way to stop both sides running > > at the same time that's pretty important. > > The 'libvirt internals' isn't that it needs an additional state where > neither source nor destination QEMU own the images, but that it has to > be between migration completion and image activation on the destination > rather than between image inactivation on the source and migration > completion. The latter would be much easier for qemu, but apparently it > doesn't work for libvirt because of how it works internally. I suspect this is actually a fundamental requirement to ensuring that we don't end up with a QEMU running on both sides rather than how libvirt is structured. > But as I said, I'd just implement both for symmetry and then management > tools can pick whatever makes their life easier. 
> > > > but > > > as long as we implement both sides for symmetry and libvirt just happens > > > to pick the destination side for now, I think it's okay. > > > > > > By the way, are block devices the only thing that need to be explicitly > > > activated? For example, what about qemu_announce_self() for network > > > cards, do we need to delay that, too? > > > > > > In any case, I think this patch needs to be reverted for 2.12 because > > > it's wrong, and then we can create the proper solution in the 2.13 > > > timefrage. > > > > what case does this break? > > I'm a bit wary of reverting this, which fixes a known problem, on the > > basis that it causes a theoretical problem. > > It breaks the API. And the final design we're having in mind now is > compatible with the old API, not with the new one exposed by this patch, > so that switch would break the API again to get back to the old state. > > Do you know all the scripts that people are using around QEMU? I don't, > but I know that plenty of them exist, so I don't think we can declare > this API breakage purely theoretical. > > Yes, the patch fixes a known problem, but also a problem that is a rare > corner case error that you can only hit with really bad timing. Do we > really want to risk unconditionally breaking success cases for fixing a > mostly theoretical corner case error path (with the failure mode that > the guest is paused when it shouldn't be)? Hmm; having chatted to Jiri I'm OK with reverting it, on the condition that I actually understand how this alternative would work first. I can't currently see how a block-inactivate would be used. I also can't see how a block-activate unless it's also with the change that you're asking to revert. Can you explain the way you see it working? Dave > Kevin -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Am 10.04.2018 um 12:40 hat Dr. David Alan Gilbert geschrieben: > * Kevin Wolf (kwolf@redhat.com) wrote: > > Am 10.04.2018 um 10:45 hat Dr. David Alan Gilbert geschrieben: > > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > > Am 10.04.2018 um 09:36 hat Jiri Denemark geschrieben: > > > > > On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote: > > > > > > Am 09.04.2018 um 12:27 hat Dr. David Alan Gilbert geschrieben: > > > > > > > It's a fairly hairy failure case they had; if I remember correctly it's: > > > > > > > a) Start migration > > > > > > > b) Migration gets to completion point > > > > > > > c) Destination is still paused > > > > > > > d) Libvirt is restarted on the source > > > > > > > e) Since libvirt was restarted it fails the migration (and hence knows > > > > > > > the destination won't be started) > > > > > > > f) It now tries to resume the qemu on the source > > > > > > > > > > > > > > (f) fails because (b) caused the locks to be taken on the destination; > > > > > > > hence this patch stops doing that. It's a case we don't really think > > > > > > > about - i.e. that the migration has actually completed and all the data > > > > > > > is on the destination, but libvirt decides for some other reason to > > > > > > > abandon migration. > > > > > > > > > > > > If you do remember correctly, that scenario doesn't feel tricky at all. > > > > > > libvirt needs to quit the destination qemu, which will inactivate the > > > > > > images on the destination and release the lock, and then it can continue > > > > > > the source. > > > > > > > > > > > > In fact, this is so straightforward that I wonder what else libvirt is > > > > > > doing. Is the destination qemu only shut down after trying to continue > > > > > > the source? That would be libvirt using the wrong order of steps. > > > > > > > > > > There's no connection between the two libvirt daemons in the case we're > > > > > talking about so they can't really synchronize the actions. 
The > > > > > destination daemon will kill the new QEMU process and the source will > > > > > resume the old one, but the order is completely random. > > > > > > > > Hm, okay... > > > > > > > > > > > Yes it was a 'block-activate' that I'd wondered about. One complication > > > > > > > is that if this now under the control of the management layer then we > > > > > > > should stop asserting when the block devices aren't in the expected > > > > > > > state and just cleanly fail the command instead. > > > > > > > > > > > > Requiring an explicit 'block-activate' on the destination would be an > > > > > > incompatible change, so you would have to introduce a new option for > > > > > > that. 'block-inactivate' on the source feels a bit simpler. > > > > > > > > > > As I said in another email, the explicit block-activate command could > > > > > depend on a migration capability similarly to how pre-switchover state > > > > > works. > > > > > > > > Yeah, that's exactly the thing that we wouldn't need if we could use > > > > 'block-inactivate' on the source instead. It feels a bit wrong to > > > > design a more involved QEMU interface around the libvirt internals, > > > > > > It's not necessarily 'libvirt internals' - it's a case of them having to > > > cope with recovering from failures that happen around migration; it's > > > not an easy problem, and if they've got a way to stop both sides running > > > at the same time that's pretty important. > > > > The 'libvirt internals' isn't that it needs an additional state where > > neither source nor destination QEMU own the images, but that it has to > > be between migration completion and image activation on the destination > > rather than between image inactivation on the source and migration > > completion. The latter would be much easier for qemu, but apparently it > > doesn't work for libvirt because of how it works internally. 
> > I suspect this is actually a fundamental requirement to ensuring that we > don't end up with a QEMU running on both sides rather than how libvirt is > structured. I don't think so. In theory, both options can provide the same. If anything, it's related specifically to the phases that Jirka described that libvirt uses to implement migration. > > But as I said, I'd just implement both for symmetry and then management > > tools can pick whatever makes their life easier. > > > > > > but > > > > as long as we implement both sides for symmetry and libvirt just happens > > > > to pick the destination side for now, I think it's okay. > > > > > > > > By the way, are block devices the only thing that need to be explicitly > > > > activated? For example, what about qemu_announce_self() for network > > > > cards, do we need to delay that, too? > > > > > > > > In any case, I think this patch needs to be reverted for 2.12 because > > > > it's wrong, and then we can create the proper solution in the 2.13 > > > > timefrage. > > > > > > what case does this break? > > > I'm a bit wary of reverting this, which fixes a known problem, on the > > > basis that it causes a theoretical problem. > > > > It breaks the API. And the final design we're having in mind now is > > compatible with the old API, not with the new one exposed by this patch, > > so that switch would break the API again to get back to the old state. > > > > Do you know all the scripts that people are using around QEMU? I don't, > > but I know that plenty of them exist, so I don't think we can declare > > this API breakage purely theoretical. > > > > Yes, the patch fixes a known problem, but also a problem that is a rare > > corner case error that you can only hit with really bad timing. Do we > > really want to risk unconditionally breaking success cases for fixing a > > mostly theoretical corner case error path (with the failure mode that > > the guest is paused when it shouldn't be)? 
> > Hmm; having chatted to Jiri I'm OK with reverting it, on the condition > that I actually understand how this alternative would work first. > > I can't currently see how a block-inactivate would be used. > I also can't see how a block-activate unless it's also with the > change that you're asking to revert. > > Can you explain the way you see it working? The key is making the delayed activation of block devices (and probably delayed announcement of NICs? - you didn't answer that part) optional instead of making it the default. We can use Jirka's suggestion of adding a migration capability that enables it, or I suppose a new option to -incoming could work, too. It doesn't really matter what the syntax is, but the management tool must request it explicitly. Kevin
* Kevin Wolf (kwolf@redhat.com) wrote: > Am 10.04.2018 um 12:40 hat Dr. David Alan Gilbert geschrieben: > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > Am 10.04.2018 um 10:45 hat Dr. David Alan Gilbert geschrieben: > > > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > > > Am 10.04.2018 um 09:36 hat Jiri Denemark geschrieben: > > > > > > On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote: > > > > > > > Am 09.04.2018 um 12:27 hat Dr. David Alan Gilbert geschrieben: > > > > > > > > It's a fairly hairy failure case they had; if I remember correctly it's: > > > > > > > > a) Start migration > > > > > > > > b) Migration gets to completion point > > > > > > > > c) Destination is still paused > > > > > > > > d) Libvirt is restarted on the source > > > > > > > > e) Since libvirt was restarted it fails the migration (and hence knows > > > > > > > > the destination won't be started) > > > > > > > > f) It now tries to resume the qemu on the source > > > > > > > > > > > > > > > > (f) fails because (b) caused the locks to be taken on the destination; > > > > > > > > hence this patch stops doing that. It's a case we don't really think > > > > > > > > about - i.e. that the migration has actually completed and all the data > > > > > > > > is on the destination, but libvirt decides for some other reason to > > > > > > > > abandon migration. > > > > > > > > > > > > > > If you do remember correctly, that scenario doesn't feel tricky at all. > > > > > > > libvirt needs to quit the destination qemu, which will inactivate the > > > > > > > images on the destination and release the lock, and then it can continue > > > > > > > the source. > > > > > > > > > > > > > > In fact, this is so straightforward that I wonder what else libvirt is > > > > > > > doing. Is the destination qemu only shut down after trying to continue > > > > > > > the source? That would be libvirt using the wrong order of steps. 
> > > > > > > > > > > > There's no connection between the two libvirt daemons in the case we're > > > > > > talking about so they can't really synchronize the actions. The > > > > > > destination daemon will kill the new QEMU process and the source will > > > > > > resume the old one, but the order is completely random. > > > > > > > > > > Hm, okay... > > > > > > > > > > > > > Yes it was a 'block-activate' that I'd wondered about. One complication > > > > > > > > is that if this now under the control of the management layer then we > > > > > > > > should stop asserting when the block devices aren't in the expected > > > > > > > > state and just cleanly fail the command instead. > > > > > > > > > > > > > > Requiring an explicit 'block-activate' on the destination would be an > > > > > > > incompatible change, so you would have to introduce a new option for > > > > > > > that. 'block-inactivate' on the source feels a bit simpler. > > > > > > > > > > > > As I said in another email, the explicit block-activate command could > > > > > > depend on a migration capability similarly to how pre-switchover state > > > > > > works. > > > > > > > > > > Yeah, that's exactly the thing that we wouldn't need if we could use > > > > > 'block-inactivate' on the source instead. It feels a bit wrong to > > > > > design a more involved QEMU interface around the libvirt internals, > > > > > > > > It's not necessarily 'libvirt internals' - it's a case of them having to > > > > cope with recovering from failures that happen around migration; it's > > > > not an easy problem, and if they've got a way to stop both sides running > > > > at the same time that's pretty important. 
> > > > > > The 'libvirt internals' isn't that it needs an additional state where > > > neither source nor destination QEMU own the images, but that it has to > > > be between migration completion and image activation on the destination > > > rather than between image inactivation on the source and migration > > > completion. The latter would be much easier for qemu, but apparently it > > > doesn't work for libvirt because of how it works internally. > > > > I suspect this is actually a fundamental requirement to ensuring that we > > don't end up with a QEMU running on both sides rather than how libvirt is > > structured. > > I don't think so. In theory, both options can provide the same. If > anything, it's related specifically to the phases that Jirka described > that libvirt uses to implement migration. > > > > But as I said, I'd just implement both for symmetry and then management > > > tools can pick whatever makes their life easier. > > > > > > > > but > > > > > as long as we implement both sides for symmetry and libvirt just happens > > > > > to pick the destination side for now, I think it's okay. > > > > > > > > > > By the way, are block devices the only thing that need to be explicitly > > > > > activated? For example, what about qemu_announce_self() for network > > > > > cards, do we need to delay that, too? > > > > > > > > > > In any case, I think this patch needs to be reverted for 2.12 because > > > > > it's wrong, and then we can create the proper solution in the 2.13 > > > > > timefrage. > > > > > > > > what case does this break? > > > > I'm a bit wary of reverting this, which fixes a known problem, on the > > > > basis that it causes a theoretical problem. > > > > > > It breaks the API. And the final design we're having in mind now is > > > compatible with the old API, not with the new one exposed by this patch, > > > so that switch would break the API again to get back to the old state. 
> > > > > > Do you know all the scripts that people are using around QEMU? I don't, > > > but I know that plenty of them exist, so I don't think we can declare > > > this API breakage purely theoretical. > > > > > > Yes, the patch fixes a known problem, but also a problem that is a rare > > > corner case error that you can only hit with really bad timing. Do we > > > really want to risk unconditionally breaking success cases for fixing a > > > mostly theoretical corner case error path (with the failure mode that > > > the guest is paused when it shouldn't be)? > > > > Hmm; having chatted to Jiri I'm OK with reverting it, on the condition > > that I actually understand how this alternative would work first. > > > > I can't currently see how a block-inactivate would be used. > > I also can't see how a block-activate unless it's also with the > > change that you're asking to revert. > > > > Can you explain the way you see it working? > > The key is making the delayed activation of block devices (and probably > delayed announcement of NICs? - you didn't answer that part) optional > instead of making it the default. NIC announcements are broken in similar but slightly different ways; we did have a series on list to help a while ago but it never got merged; I'd like to keep that mess separate. > We can use Jirka's suggestion of adding a migration capability that > enables it, or I suppose a new option to -incoming could work, too. It > doesn't really matter what the syntax is, but the management tool must > request it explicitly. A new capability can easily gate the change in behaviour that this patch added; I'll do that first thing for 2.13 (given today is the rc3 tag, it's too late). However, once we turn this on, to cope with the situation of a block user that must start prior to the 'cont' when this behaviour is active, we'd also need the 'block-activate' command. Dave > Kevin -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Am 10.04.2018 um 16:22 hat Dr. David Alan Gilbert geschrieben: > * Kevin Wolf (kwolf@redhat.com) wrote: > > Am 10.04.2018 um 12:40 hat Dr. David Alan Gilbert geschrieben: > > > Hmm; having chatted to Jiri I'm OK with reverting it, on the condition > > > that I actually understand how this alternative would work first. > > > > > > I can't currently see how a block-inactivate would be used. > > > I also can't see how a block-activate unless it's also with the > > > change that you're asking to revert. > > > > > > Can you explain the way you see it working? > > > > The key is making the delayed activation of block devices (and probably > > delayed announcement of NICs? - you didn't answer that part) optional > > instead of making it the default. > > NIC announcments are broken in similar but slightly different ways; we > did have a series on list to help a while ago but it never got merged; > I'd like to keep that mess separate. Okay. I just thought that it would make sense to have clear migration phases that are the same for all external resources that the QEMU processes use. > > We can use Jirka's suggestion of adding a migration capability that > > enables it, or I suppose a new option to -incoming could work, too. It > > doesn't really matter what the syntax is, but the management tool must > > request it explicitly. > > A new capability is easy to gate the change in behaviour that this patch > added; I'll do that first thing for 2.13 (given today is rc3 tag it's > too late). > > However, once we turn this on, to cope with the situation of a block user > that must start prior to the 'cont' when this behaviour is active, we'd > also need the 'block-activate' command. Yes, that's right. Kevin
On Tue, Apr 10, 2018 at 16:47:56 +0200, Kevin Wolf wrote: > Am 10.04.2018 um 16:22 hat Dr. David Alan Gilbert geschrieben: > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > Am 10.04.2018 um 12:40 hat Dr. David Alan Gilbert geschrieben: > > > > Hmm; having chatted to Jiri I'm OK with reverting it, on the condition > > > > that I actually understand how this alternative would work first. > > > > > > > > I can't currently see how a block-inactivate would be used. > > > > I also can't see how a block-activate unless it's also with the > > > > change that you're asking to revert. > > > > > > > > Can you explain the way you see it working? > > > > > > The key is making the delayed activation of block devices (and probably > > > delayed announcement of NICs? - you didn't answer that part) optional > > > instead of making it the default. > > > > NIC announcments are broken in similar but slightly different ways; we > > did have a series on list to help a while ago but it never got merged; > > I'd like to keep that mess separate. > > Okay. I just thought that it would make sense to have clear migration > phases that are the same for all external resources that the QEMU > processes use. I don't think NIC announcements should be delayed in this specific case since we're dealing with a failure recovery which should be rare in comparison to successful migration when we want NIC announcements to be sent early. In other words, any NIC issues should be solved separately and Laine would likely be a better person for discussing them since he has a broader knowledge of all the fancy network stuff which libvirt needs to coordinate with. Jirka
Am 11.04.2018 um 12:01 hat Jiri Denemark geschrieben: > On Tue, Apr 10, 2018 at 16:47:56 +0200, Kevin Wolf wrote: > > Am 10.04.2018 um 16:22 hat Dr. David Alan Gilbert geschrieben: > > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > > Am 10.04.2018 um 12:40 hat Dr. David Alan Gilbert geschrieben: > > > > > Hmm; having chatted to Jiri I'm OK with reverting it, on the condition > > > > > that I actually understand how this alternative would work first. > > > > > > > > > > I can't currently see how a block-inactivate would be used. > > > > > I also can't see how a block-activate unless it's also with the > > > > > change that you're asking to revert. > > > > > > > > > > Can you explain the way you see it working? > > > > > > > > The key is making the delayed activation of block devices (and probably > > > > delayed announcement of NICs? - you didn't answer that part) optional > > > > instead of making it the default. > > > > > > NIC announcments are broken in similar but slightly different ways; we > > > did have a series on list to help a while ago but it never got merged; > > > I'd like to keep that mess separate. > > > > Okay. I just thought that it would make sense to have clear migration > > phases that are the same for all external resources that the QEMU > > processes use. > > I don't think NIC announcements should be delayed in this specific case > since we're dealing with a failure recovery which should be rare in > comparison to successful migration when we want NIC announcements to be > send early. In other words, any NIC issues should be solved separately > and Laine would likely be a better person for discussing them since he > has a broader knowledge of all the fancy network stuff which libvirt > needs to coordinate with. Well, if I were the migration maintainer, I would insist on a properly designed phase model that solves the problem once and for all because it would be clear where everything belongs. 
We could still have bugs in the future, but that would be internal implementation bugs with no effect on the API. But I'm not the maintainer and Dave prefers to deal with it basically as a bunch of one-off fixes, and that will work, too. It will probably clutter up the external API a bit (because the management tool will have to separately address migration of block devices, network devices and possibly other things in the future), but that shouldn't matter much for libvirt. Maybe what we do need is some documentation of the recommended process for performing a live migration so that management tools know which QMP commands they need to issue when. Kevin
* Kevin Wolf (kwolf@redhat.com) wrote: > Am 11.04.2018 um 12:01 hat Jiri Denemark geschrieben: > > On Tue, Apr 10, 2018 at 16:47:56 +0200, Kevin Wolf wrote: > > > Am 10.04.2018 um 16:22 hat Dr. David Alan Gilbert geschrieben: > > > > * Kevin Wolf (kwolf@redhat.com) wrote: > > > > > Am 10.04.2018 um 12:40 hat Dr. David Alan Gilbert geschrieben: > > > > > > Hmm; having chatted to Jiri I'm OK with reverting it, on the condition > > > > > > that I actually understand how this alternative would work first. > > > > > > > > > > > > I can't currently see how a block-inactivate would be used. > > > > > > I also can't see how a block-activate unless it's also with the > > > > > > change that you're asking to revert. > > > > > > > > > > > > Can you explain the way you see it working? > > > > > > > > > > The key is making the delayed activation of block devices (and probably > > > > > delayed announcement of NICs? - you didn't answer that part) optional > > > > > instead of making it the default. > > > > > > > > NIC announcments are broken in similar but slightly different ways; we > > > > did have a series on list to help a while ago but it never got merged; > > > > I'd like to keep that mess separate. > > > > > > Okay. I just thought that it would make sense to have clear migration > > > phases that are the same for all external resources that the QEMU > > > processes use. > > > > I don't think NIC announcements should be delayed in this specific case > > since we're dealing with a failure recovery which should be rare in > > comparison to successful migration when we want NIC announcements to be > > send early. In other words, any NIC issues should be solved separately > > and Laine would likely be a better person for discussing them since he > > has a broader knowledge of all the fancy network stuff which libvirt > > needs to coordinate with. 
> > Well, if I were the migration maintainer, I would insist on a properly > designed phase model that solves the problem once and for all because it > would be clear where everything belongs. We could still have bugs in the > future, but that would be internal implementation bugs with no effect on > the API. My main reason for believing this wouldn't work is that most of the things we've had recently have been things where we've found out about subtle constraints that we previously didn't realise, and hence if we were writing down the mythical phase model we wouldn't have put them in. I'd have loved to have had some more discussion about what those requirements were _before_ block locking went in a few versions back, because unsurprisingly adding hard locking constraints shook a lot of problems out (and IMHO was a much bigger API change than this change). > But I'm not the maintainer and Dave prefers to deal with it basically as > a bunch of one-off fixes, and that will work, too. It will probably > clutter up the external API a bit (because the management tool will have > to separately address migration of block devices, network devices and > possibly other things in the future), but that shouldn't matter much for > libvirt. Maybe what we do need is some documentation of the recommended > process for performing a live migration so that management tools know > which QMP commands they need to issue when. I'd like to keep the networking stuff separate because it's got a whole bunch of other interactions that we found out last time we tried to fix it; in particular OpenStack's networking interaction with Libvirt isn't quite what's expected and OpenStack have a whole bunch of different network configurations whose behaviour when we change something is fun.
Oh and it depends heavily on the guest - which fortunately the block stuff doesn't (because on modern virtio-net guests it does the modern announce from the guest and so only ever happens once the guest CPU is running which simplifies stuff massively). Dave > Kevin -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
diff --git a/migration/migration.c b/migration/migration.c
index 52a5092add..58bd382730 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -306,13 +306,21 @@ static void process_incoming_migration_bh(void *opaque)
     Error *local_err = NULL;
     MigrationIncomingState *mis = opaque;
 
-    /* Make sure all file formats flush their mutable metadata.
-     * If we get an error here, just don't restart the VM yet. */
-    bdrv_invalidate_cache_all(&local_err);
-    if (local_err) {
-        error_report_err(local_err);
-        local_err = NULL;
-        autostart = false;
+    /* Only fire up the block code now if we're going to restart the
+     * VM, else 'cont' will do it.
+     * This causes file locking to happen; so we don't want it to happen
+     * unless we really are starting the VM.
+     */
+    if (autostart && (!global_state_received() ||
+                      global_state_get_runstate() == RUN_STATE_RUNNING)) {
+        /* Make sure all file formats flush their mutable metadata.
+         * If we get an error here, just don't restart the VM yet. */
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            local_err = NULL;
+            autostart = false;
+        }
     }
 
     /*