Message ID | 1317729885-17534-1-git-send-email-pbonzini@redhat.com |
---|---|
State | New |
Headers | show |
On Tue, 4 Oct 2011 14:04:45 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > Trying to migrate a paused machine fails. The reason is that > the RSTATE_PRE_MIGRATE is reached with vm_stop, and this > transition is eaten when the vm is already paused. This patch > fixes the problem by always going through runstate_set and > always notifying the new state. There's a semantic change which I'm not completely sure it won't generate unexpected side-effects: today vm_stop() will only carry any action if the machine is running, otherwise it's no-op. This patch changes that. We probably can fix it by adding a new transition to the transition table too... > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > cpus.c | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/cpus.c b/cpus.c > index 8978779..eab8ff6 100644 > --- a/cpus.c > +++ b/cpus.c > @@ -128,6 +128,8 @@ static void do_vm_stop(RunState state) > qemu_aio_flush(); > bdrv_flush_all(); > monitor_protocol_event(QEVENT_STOP, NULL); > + } else { > + runstate_set(state); > } > } >
On 10/04/2011 03:49 PM, Luiz Capitulino wrote: > There's a semantic change which I'm not completely sure it won't generate > unexpected side-effects: today vm_stop() will only carry any action if the > machine is running, otherwise it's no-op. This patch changes that. More or less, yes. I tried to limit the semantic change by not running notifiers, which again could be better or worse. I don't think adding a new transition is a good solution, because you'll have to add a transition from PAUSED to anything that uses runstate_set instead of vm_stop. However, you could change all vm_stop() to vm_stop(RSTATE_PAUSED) followed by runstate_set(), adjust the transition table consequently and possibly drop the argument to vm_stop. I tried to get the smallest patch, but I did need to follow-up with changes to the transition table. In any case, can I assume this to be in your hands now? :) Paolo
On Tue, 04 Oct 2011 16:09:03 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > On 10/04/2011 03:49 PM, Luiz Capitulino wrote: > > There's a semantic change which I'm not completely sure it won't generate > > unexpected side-effects: today vm_stop() will only carry any action if the > > machine is running, otherwise it's no-op. This patch changes that. > > More or less, yes. I tried to limit the semantic change by not running > notifiers, which again could be better or worse. > > I don't think adding a new transition is a good solution, because you'll > have to add a transition from PAUSED to anything that uses runstate_set > instead of vm_stop. > > However, you could change all vm_stop() to vm_stop(RSTATE_PAUSED) > followed by runstate_set(), adjust the transition table consequently and > possibly drop the argument to vm_stop. I tried to get the smallest > patch, but I did need to follow-up with changes to the transition table. > > In any case, can I assume this to be in your hands now? :) Yes. I'm not sure what's the best solution here, but I'll decide soon :)
On Tue, 4 Oct 2011 14:04:45 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > Trying to migrate a paused machine fails. The reason is that > the RSTATE_PRE_MIGRATE is reached with vm_stop, and this > transition is eaten when the vm is already paused. This patch > fixes the problem by always going through runstate_set and > always notifying the new state. Let's start over, this time CC'ing Jan, Anthony and Avi. Basically, what Paolo is describing above is this: 1. The user issues the stop command. vm_stop() will set the state to RSTATE_PAUSED 2. The user starts a migration. migrate_fd_put_ready() will call vm_stop(RSTATE_PRE_MIGRATE). However, the VM is already stopped so vm_stop() just returns (IOW, the state is still RSTATE_PAUSED) 3. The migration process completes. migrate_fd_put_ready() will now call runstate_set(RSTATE_POST_MIGRATE), which in turn causes the transition RSTATE_PAUSED -> RSTATE_POST_MIGRATE, which is invalid and the world of qemu ends Now, we have three options to fix this but I don't know which one to choose: 1. We could just add the transition RSTATE_PAUSED -> RSTATE_POST_MIGRATE as valid. Not sure this is a good thing to do though, as it seems a silly workaround for the fact that the transition to RSTATE_PRE_MIGRATE has never occurred 2. This patch makes vm_stop() do the state transition even if the VM is already stopped. Seems good enough, except that I fear two things. First, today we know that vm_stop() is a no-op if the VM is already stopped, so there's a semantic change that could turn out to be trap. Second, I also fear people using vm_stop() as a way to change the VM status, just like runstate_set() (which can also become an horrible trap) 3. Avi suggested we should keep a reference count, so that states are not discarded: http://lists.gnu.org/archive/html/qemu-devel/2011-08/msg00595.html That solution seemed to be the perfect one, except for one important detail: how should we implement vm_start() (and thus 'cont')? In order to maintain how we behave with the external world, the only option is that vm_start() will set the stop count to 0. Ie, doesn't matter if we have stopped the VM 500 times at some point, a vm_start() call will discard all stored states. Not sure if that's what you expected, but the first time I read Avi's idea I had the impression that it would be a good idea that vm_start() decremented the ref count only once, ie. vm_stop() and vm_start() calls have to match. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > cpus.c | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/cpus.c b/cpus.c > index 8978779..eab8ff6 100644 > --- a/cpus.c > +++ b/cpus.c > @@ -128,6 +128,8 @@ static void do_vm_stop(RunState state) > qemu_aio_flush(); > bdrv_flush_all(); > monitor_protocol_event(QEVENT_STOP, NULL); > + } else { > + runstate_set(state); > } > } >
On 10/05/2011 04:37 PM, Luiz Capitulino wrote: > Now, we have three options to fix this but I don't know which one to choose: > > 1. We could just add the transition RSTATE_PAUSED -> RSTATE_POST_MIGRATE > as valid. Not sure this is a good thing to do though, as it seems a silly > workaround for the fact that the transition to RSTATE_PRE_MIGRATE has > never occurred > > 2. This patch makes vm_stop() do the state transition even if the VM > is already stopped. Seems good enough, except that I fear two things. > First, today we know that vm_stop() is a no-op if the VM is already > stopped, so there's a semantic change that could turn out to be trap. > Second, I also fear people using vm_stop() as a way to change the > VM status, just like runstate_set() (which can also become an horrible > trap) > > 3. Avi suggested we should keep a reference count, so that states are > not discarded: > > http://lists.gnu.org/archive/html/qemu-devel/2011-08/msg00595.html > > That solution seemed to be the perfect one, except for one important > detail: how should we implement vm_start() (and thus 'cont')? > > In order to maintain how we behave with the external world, the only > option is that vm_start() will set the stop count to 0. Ie, doesn't > matter if we have stopped the VM 500 times at some point, a vm_start() > call will discard all stored states. > > Not sure if that's what you expected, but the first time I read Avi's > idea I had the impression that it would be a good idea that vm_start() > decremented the ref count only once, ie. vm_stop() and vm_start() calls > have to match. > vm_start() should be symmetric with vm_stop(). That is, if a piece of code wants to execute with vcpus stopped, it should just run inside a stop/start pair. The only confusion can come from the user, if he sees multiple stop events and expects that just one cont will continue the vm. For the machine monitor, we should just document that the you have to issue one cont for every stop event you see (plus any stops you issue). It's not unnatural - the code that handles a stop_due_to_enospace can work to fix the error and issue a cont, disregarding any other stops in progress (due to a user pressing the stop button, or migration, or cpu hotplug, or whatever). For the human monitor, it's not so intuitive, but the situation is so rare we can just rely on the user to issue cont again.
On 10/05/2011 05:43 PM, Avi Kivity wrote: > On 10/05/2011 04:37 PM, Luiz Capitulino wrote: >> Now, we have three options to fix this but I don't know which one to >> choose: >> >> 1. We could just add the transition RSTATE_PAUSED -> >> RSTATE_POST_MIGRATE >> as valid. Not sure this is a good thing to do though, as it >> seems a silly >> workaround for the fact that the transition to >> RSTATE_PRE_MIGRATE has >> never occurred >> >> 2. This patch makes vm_stop() do the state transition even if the VM >> is already stopped. Seems good enough, except that I fear two >> things. >> First, today we know that vm_stop() is a no-op if the VM is already >> stopped, so there's a semantic change that could turn out to be >> trap. >> Second, I also fear people using vm_stop() as a way to change the >> VM status, just like runstate_set() (which can also become an >> horrible >> trap) >> >> 3. Avi suggested we should keep a reference count, so that states are >> not discarded: >> >> http://lists.gnu.org/archive/html/qemu-devel/2011-08/msg00595.html >> >> That solution seemed to be the perfect one, except for one >> important >> detail: how should we implement vm_start() (and thus 'cont')? >> >> In order to maintain how we behave with the external world, the >> only >> option is that vm_start() will set the stop count to 0. Ie, doesn't >> matter if we have stopped the VM 500 times at some point, a >> vm_start() >> call will discard all stored states. >> >> Not sure if that's what you expected, but the first time I read >> Avi's >> idea I had the impression that it would be a good idea that >> vm_start() >> decremented the ref count only once, ie. vm_stop() and >> vm_start() calls >> have to match. >> > > vm_start() should be symmetric with vm_stop(). That is, if a piece of > code wants to execute with vcpus stopped, it should just run inside a > stop/start pair. > > The only confusion can come from the user, if he sees multiple stop > events and expects that just one cont will continue the vm. For the > machine monitor, we should just document that the you have to issue > one cont for every stop event you see (plus any stops you issue). > It's not unnatural - the code that handles a stop_due_to_enospace can > work to fix the error and issue a cont, disregarding any other stops > in progress (due to a user pressing the stop button, or migration, or > cpu hotplug, or whatever). For the human monitor, it's not so > intuitive, but the situation is so rare we can just rely on the user > to issue cont again. > btw, another way to look at it is like a lock: vm_lock() / vm_unlock(). It's obvious that vm_unlock() can't release all the locks.
On 2011-10-05 17:43, Avi Kivity wrote: > On 10/05/2011 04:37 PM, Luiz Capitulino wrote: >> Now, we have three options to fix this but I don't know which one to >> choose: >> >> 1. We could just add the transition RSTATE_PAUSED -> >> RSTATE_POST_MIGRATE >> as valid. Not sure this is a good thing to do though, as it seems >> a silly >> workaround for the fact that the transition to RSTATE_PRE_MIGRATE >> has >> never occurred >> >> 2. This patch makes vm_stop() do the state transition even if the VM >> is already stopped. Seems good enough, except that I fear two >> things. >> First, today we know that vm_stop() is a no-op if the VM is already >> stopped, so there's a semantic change that could turn out to be >> trap. >> Second, I also fear people using vm_stop() as a way to change the >> VM status, just like runstate_set() (which can also become an >> horrible >> trap) >> >> 3. Avi suggested we should keep a reference count, so that states are >> not discarded: >> >> http://lists.gnu.org/archive/html/qemu-devel/2011-08/msg00595.html >> >> That solution seemed to be the perfect one, except for one important >> detail: how should we implement vm_start() (and thus 'cont')? >> >> In order to maintain how we behave with the external world, the only >> option is that vm_start() will set the stop count to 0. Ie, doesn't >> matter if we have stopped the VM 500 times at some point, a >> vm_start() >> call will discard all stored states. >> >> Not sure if that's what you expected, but the first time I read >> Avi's >> idea I had the impression that it would be a good idea that >> vm_start() >> decremented the ref count only once, ie. vm_stop() and vm_start() >> calls >> have to match. >> > > vm_start() should be symmetric with vm_stop(). That is, if a piece of > code wants to execute with vcpus stopped, it should just run inside a > stop/start pair. > > The only confusion can come from the user, if he sees multiple stop > events and expects that just one cont will continue the vm. For the > machine monitor, we should just document that the you have to issue one > cont for every stop event you see (plus any stops you issue). It's not > unnatural - the code that handles a stop_due_to_enospace can work to fix > the error and issue a cont, disregarding any other stops in progress > (due to a user pressing the stop button, or migration, or cpu hotplug, > or whatever). For the human monitor, it's not so intuitive, but the > situation is so rare we can just rely on the user to issue cont again. Making this kind of user-visible change would be a bad idea. We are talking about multiple stop states here, but only a single function (vm_stop) to enter them - maybe that's not optimal. But the point is that we were missing one stop-to-stop transition. And that needs to be fixed, either inside vm_stop or when it is called. If you want to lock the VM into paused state, add a new symmetric API that does precisely this. That API would send the VM into RSTATE_LOCKED if it is not yet stopped on lock or is still locked on resume. That would avoid redefining stop states that have no use for lock-counting semantics. Jan
On 10/05/2011 06:31 PM, Jan Kiszka wrote: > >> > > > > vm_start() should be symmetric with vm_stop(). That is, if a piece of > > code wants to execute with vcpus stopped, it should just run inside a > > stop/start pair. > > > > The only confusion can come from the user, if he sees multiple stop > > events and expects that just one cont will continue the vm. For the > > machine monitor, we should just document that the you have to issue one > > cont for every stop event you see (plus any stops you issue). It's not > > unnatural - the code that handles a stop_due_to_enospace can work to fix > > the error and issue a cont, disregarding any other stops in progress > > (due to a user pressing the stop button, or migration, or cpu hotplug, > > or whatever). For the human monitor, it's not so intuitive, but the > > situation is so rare we can just rely on the user to issue cont again. > > Making this kind of user-visible change would be a bad idea. The current situation is a bad idea. Consider a user-initiated or qemu-initiated stop; the user starts to deal with it, types 'cont', and as the Enter key is being depressed another qemu-initiated stop comes along. The 'cont' restarts the guest even though the second event was not dealt with. > We are talking about multiple stop states here, but only a single > function (vm_stop) to enter them - maybe that's not optimal. But the > point is that we were missing one stop-to-stop transition. And that > needs to be fixed, either inside vm_stop or when it is called. Those stops are orthogonal. There is no relationship between a migration stop, a user stop, an ENOSPC stop, a hotplug stop, and a debugger stop. There is no reason to start inventing stop-to-stop transitions between them. A 'cont' intended for one should not undo another. There are two ways to do this, one is to store a set of stop reasons and let both 'stop' and 'cont' specify the reason, the other, which is simpler but less safe, is to use a reference counting approach. > > If you want to lock the VM into paused state, add a new symmetric API > that does precisely this. That API would send the VM into RSTATE_LOCKED > if it is not yet stopped on lock or is still locked on resume. That > would avoid redefining stop states that have no use for lock-counting > semantics. > Which stop states would these be? When would you want one cont to undo two stops?
On 2011-10-05 18:37, Avi Kivity wrote: > On 10/05/2011 06:31 PM, Jan Kiszka wrote: >> >> >> > >> > vm_start() should be symmetric with vm_stop(). That is, if a piece of >> > code wants to execute with vcpus stopped, it should just run inside a >> > stop/start pair. >> > >> > The only confusion can come from the user, if he sees multiple stop >> > events and expects that just one cont will continue the vm. For the >> > machine monitor, we should just document that the you have to issue >> one >> > cont for every stop event you see (plus any stops you issue). It's >> not >> > unnatural - the code that handles a stop_due_to_enospace can work >> to fix >> > the error and issue a cont, disregarding any other stops in progress >> > (due to a user pressing the stop button, or migration, or cpu hotplug, >> > or whatever). For the human monitor, it's not so intuitive, but the >> > situation is so rare we can just rely on the user to issue cont again. >> >> Making this kind of user-visible change would be a bad idea. > > The current situation is a bad idea. > > Consider a user-initiated or qemu-initiated stop; the user starts to > deal with it, types 'cont', and as the Enter key is being depressed > another qemu-initiated stop comes along. The 'cont' restarts the guest > even though the second event was not dealt with. You always have this kind of problems when you attach two keyboards to the same console. A counting stop/cont will just create different effects of the same problem but not solve it. > >> We are talking about multiple stop states here, but only a single >> function (vm_stop) to enter them - maybe that's not optimal. But the >> point is that we were missing one stop-to-stop transition. And that >> needs to be fixed, either inside vm_stop or when it is called. > > Those stops are orthogonal. There is no relationship between a > migration stop, a user stop, an ENOSPC stop, a hotplug stop, and a > debugger stop. There is no reason to start inventing stop-to-stop > transitions between them. A 'cont' intended for one should not undo > another. No, they aren't. They are all different states, and we need to model transitions between them. If there are redundant states, lets collapse them. > > There are two ways to do this, one is to store a set of stop reasons and > let both 'stop' and 'cont' specify the reason, the other, which is > simpler but less safe, is to use a reference counting approach. > >> >> If you want to lock the VM into paused state, add a new symmetric API >> that does precisely this. That API would send the VM into RSTATE_LOCKED >> if it is not yet stopped on lock or is still locked on resume. That >> would avoid redefining stop states that have no use for lock-counting >> semantics. >> > > Which stop states would these be? When would you want one cont to undo > two stops? No, cont would remain cont: the resume-after-stop command. Locking would never be user-exposed. It would be a QEMU-internal thing, used when there must be no resume for a certain finite while (e.g. during migration or savevm). I think one problem with the current state machine is that it does not differentiate between these two types of "stop". Jan
On Wed, 05 Oct 2011 18:37:51 +0200 Avi Kivity <avi@redhat.com> wrote: > On 10/05/2011 06:31 PM, Jan Kiszka wrote: > > >> > > > > > > vm_start() should be symmetric with vm_stop(). That is, if a piece of > > > code wants to execute with vcpus stopped, it should just run inside a > > > stop/start pair. > > > > > > The only confusion can come from the user, if he sees multiple stop > > > events and expects that just one cont will continue the vm. For the > > > machine monitor, we should just document that the you have to issue one > > > cont for every stop event you see (plus any stops you issue). It's not > > > unnatural - the code that handles a stop_due_to_enospace can work to fix > > > the error and issue a cont, disregarding any other stops in progress > > > (due to a user pressing the stop button, or migration, or cpu hotplug, > > > or whatever). For the human monitor, it's not so intuitive, but the > > > situation is so rare we can just rely on the user to issue cont again. > > > > Making this kind of user-visible change would be a bad idea. > > The current situation is a bad idea. Let's take the migration use-case as an example (ie. the user stops the VM before performing the migration). Today, if migration fails, migrate_fd_put_ready() will call vm_start() which will put the VM to run again. But if we implement the ref count idea, then vm_start() will just "unlock" migrate_fd_put_ready()'s own call to vm_stop(), that's, the user stop will remain and the user is required to do a 'cont'. I'd probably agree that that's the ideal semantics, but chances are we're going to break qmp clients here.
On 10/05/2011 06:49 PM, Jan Kiszka wrote: > On 2011-10-05 18:37, Avi Kivity wrote: > > On 10/05/2011 06:31 PM, Jan Kiszka wrote: > >> >> > >> > > >> > vm_start() should be symmetric with vm_stop(). That is, if a piece of > >> > code wants to execute with vcpus stopped, it should just run inside a > >> > stop/start pair. > >> > > >> > The only confusion can come from the user, if he sees multiple stop > >> > events and expects that just one cont will continue the vm. For the > >> > machine monitor, we should just document that the you have to issue > >> one > >> > cont for every stop event you see (plus any stops you issue). It's > >> not > >> > unnatural - the code that handles a stop_due_to_enospace can work > >> to fix > >> > the error and issue a cont, disregarding any other stops in progress > >> > (due to a user pressing the stop button, or migration, or cpu hotplug, > >> > or whatever). For the human monitor, it's not so intuitive, but the > >> > situation is so rare we can just rely on the user to issue cont again. > >> > >> Making this kind of user-visible change would be a bad idea. > > > > The current situation is a bad idea. > > > > Consider a user-initiated or qemu-initiated stop; the user starts to > > deal with it, types 'cont', and as the Enter key is being depressed > > another qemu-initiated stop comes along. The 'cont' restarts the guest > > even though the second event was not dealt with. > > You always have this kind of problems when you attach two keyboards to > the same console. A counting stop/cont will just create different > effects of the same problem but not solve it. Let's examine a concrete example: a user is debugging a guest, which stops at a breakpoint. Meanwhile a live migration is going on, involving internal stops. When the guest does manage to run for a bit, it runs out of disk space, generating a stop, which the management agent resolves by allocating more space and issuing a cont. With a counting cont, no matter in what order these events happen, things work out fine. How do they work out with your proposal? > > > >> We are talking about multiple stop states here, but only a single > >> function (vm_stop) to enter them - maybe that's not optimal. But the > >> point is that we were missing one stop-to-stop transition. And that > >> needs to be fixed, either inside vm_stop or when it is called. > > > > Those stops are orthogonal. There is no relationship between a > > migration stop, a user stop, an ENOSPC stop, a hotplug stop, and a > > debugger stop. There is no reason to start inventing stop-to-stop > > transitions between them. A 'cont' intended for one should not undo > > another. > > No, they aren't. They are all different states, and we need to model > transitions between them. If there are redundant states, lets collapse them. What's the state caused by 'stopped-due-to-migration-phase-3' and 'stopped-due-to-breakpoint'? 'stopped-due-to-migration-phase-3-and-breakpoint'? > >> > > > > Which stop states would these be? When would you want one cont to undo > > two stops? > > No, cont would remain cont: the resume-after-stop command. > > Locking would never be user-exposed. It would be a QEMU-internal thing, > used when there must be no resume for a certain finite while (e.g. > during migration or savevm). I think one problem with the current state > machine is that it does not differentiate between these two types of "stop". > My second proposal (a set of stop reasons) differentiates between all kinds of cont. Maybe we should extend the monitor command to cont [reason].
On 10/05/2011 07:02 PM, Luiz Capitulino wrote: > On Wed, 05 Oct 2011 18:37:51 +0200 > Avi Kivity<avi@redhat.com> wrote: > > > On 10/05/2011 06:31 PM, Jan Kiszka wrote: > > > >> > > > > > > > > vm_start() should be symmetric with vm_stop(). That is, if a piece of > > > > code wants to execute with vcpus stopped, it should just run inside a > > > > stop/start pair. > > > > > > > > The only confusion can come from the user, if he sees multiple stop > > > > events and expects that just one cont will continue the vm. For the > > > > machine monitor, we should just document that the you have to issue one > > > > cont for every stop event you see (plus any stops you issue). It's not > > > > unnatural - the code that handles a stop_due_to_enospace can work to fix > > > > the error and issue a cont, disregarding any other stops in progress > > > > (due to a user pressing the stop button, or migration, or cpu hotplug, > > > > or whatever). For the human monitor, it's not so intuitive, but the > > > > situation is so rare we can just rely on the user to issue cont again. > > > > > > Making this kind of user-visible change would be a bad idea. > > > > The current situation is a bad idea. > > Let's take the migration use-case as an example (ie. the user stops the VM > before performing the migration). Today, if migration fails, > migrate_fd_put_ready() will call vm_start() which will put the VM to > run again. > > But if we implement the ref count idea, then vm_start() will just "unlock" > migrate_fd_put_ready()'s own call to vm_stop(), that's, the user stop will > remain and the user is required to do a 'cont'. > > I'd probably agree that that's the ideal semantics, but chances are we're > going to break qmp clients here. There are two questions here. Is this autostart desirable? (IMO no, but haven't given it much thought). If yes, we should provide it somehow. If not, we should default to providing it, but switch to non-autostart if a newer client indicates it understands the new semantics.
On Wed, 05 Oct 2011 19:23:21 +0200 Avi Kivity <avi@redhat.com> wrote: > On 10/05/2011 07:02 PM, Luiz Capitulino wrote: > > On Wed, 05 Oct 2011 18:37:51 +0200 > > Avi Kivity<avi@redhat.com> wrote: > > > > > On 10/05/2011 06:31 PM, Jan Kiszka wrote: > > > > >> > > > > > > > > > > vm_start() should be symmetric with vm_stop(). That is, if a piece of > > > > > code wants to execute with vcpus stopped, it should just run inside a > > > > > stop/start pair. > > > > > > > > > > The only confusion can come from the user, if he sees multiple stop > > > > > events and expects that just one cont will continue the vm. For the > > > > > machine monitor, we should just document that the you have to issue one > > > > > cont for every stop event you see (plus any stops you issue). It's not > > > > > unnatural - the code that handles a stop_due_to_enospace can work to fix > > > > > the error and issue a cont, disregarding any other stops in progress > > > > > (due to a user pressing the stop button, or migration, or cpu hotplug, > > > > > or whatever). For the human monitor, it's not so intuitive, but the > > > > > situation is so rare we can just rely on the user to issue cont again. > > > > > > > > Making this kind of user-visible change would be a bad idea. > > > > > > The current situation is a bad idea. > > > > Let's take the migration use-case as an example (ie. the user stops the VM > > before performing the migration). Today, if migration fails, > > migrate_fd_put_ready() will call vm_start() which will put the VM to > > run again. > > > > But if we implement the ref count idea, then vm_start() will just "unlock" > > migrate_fd_put_ready()'s own call to vm_stop(), that's, the user stop will > > remain and the user is required to do a 'cont'. > > > > I'd probably agree that that's the ideal semantics, but chances are we're > > going to break qmp clients here. > > There are two questions here. Is this autostart desirable? (IMO no, > but haven't given it much thought). It should be configurable at migration time I think. > If yes, we should provide it > somehow. If not, we should default to providing it, but switch to > non-autostart if a newer client indicates it understands the new semantics. That's only one example. You mention another one above: if qemu stops itself twice, will the user be required to run 'cont' twice? I'm not exactly against the semantics you're proposing, but they don't seem to fit today's qemu.
On 10/05/2011 07:39 PM, Luiz Capitulino wrote: > On Wed, 05 Oct 2011 19:23:21 +0200 > Avi Kivity<avi@redhat.com> wrote: > > > On 10/05/2011 07:02 PM, Luiz Capitulino wrote: > > > On Wed, 05 Oct 2011 18:37:51 +0200 > > > Avi Kivity<avi@redhat.com> wrote: > > > > > > > On 10/05/2011 06:31 PM, Jan Kiszka wrote: > > > > > >> > > > > > > > > > > > > vm_start() should be symmetric with vm_stop(). That is, if a piece of > > > > > > code wants to execute with vcpus stopped, it should just run inside a > > > > > > stop/start pair. > > > > > > > > > > > > The only confusion can come from the user, if he sees multiple stop > > > > > > events and expects that just one cont will continue the vm. For the > > > > > > machine monitor, we should just document that the you have to issue one > > > > > > cont for every stop event you see (plus any stops you issue). It's not > > > > > > unnatural - the code that handles a stop_due_to_enospace can work to fix > > > > > > the error and issue a cont, disregarding any other stops in progress > > > > > > (due to a user pressing the stop button, or migration, or cpu hotplug, > > > > > > or whatever). For the human monitor, it's not so intuitive, but the > > > > > > situation is so rare we can just rely on the user to issue cont again. > > > > > > > > > > Making this kind of user-visible change would be a bad idea. > > > > > > > > The current situation is a bad idea. > > > > > > Let's take the migration use-case as an example (ie. the user stops the VM > > > before performing the migration). Today, if migration fails, > > > migrate_fd_put_ready() will call vm_start() which will put the VM to > > > run again. > > > > > > But if we implement the ref count idea, then vm_start() will just "unlock" > > > migrate_fd_put_ready()'s own call to vm_stop(), that's, the user stop will > > > remain and the user is required to do a 'cont'. > > > > > > I'd probably agree that that's the ideal semantics, but chances are we're > > > going to break qmp clients here. > > > > There are two questions here. Is this autostart desirable? (IMO no, > > but haven't given it much thought). > > It should be configurable at migration time I think. Why? autostart can easily be emulated by non-autostart, but not vice versa. > > If yes, we should provide it > > somehow. If not, we should default to providing it, but switch to > > non-autostart if a newer client indicates it understands the new semantics. > > That's only one example. You mention another one above: if qemu stops > itself twice, will the user be required to run 'cont' twice? Yes. That's how the user tells qemu it saw the two events and handled both of them. Otherwise there is a race condition and an event goes unhandled (or rather, it is handled while the guest is running instead of stopped). > I'm not exactly against the semantics you're proposing, but they don't > seem to fit today's qemu. Today's qemu is broken here.
On 2011-10-05 19:12, Avi Kivity wrote: > On 10/05/2011 06:49 PM, Jan Kiszka wrote: >> On 2011-10-05 18:37, Avi Kivity wrote: >> > On 10/05/2011 06:31 PM, Jan Kiszka wrote: >> >> >> >> >> > >> >> > vm_start() should be symmetric with vm_stop(). That is, if a >> piece of >> >> > code wants to execute with vcpus stopped, it should just run >> inside a >> >> > stop/start pair. >> >> > >> >> > The only confusion can come from the user, if he sees multiple >> stop >> >> > events and expects that just one cont will continue the vm. >> For the >> >> > machine monitor, we should just document that the you have to >> issue >> >> one >> >> > cont for every stop event you see (plus any stops you issue). >> It's >> >> not >> >> > unnatural - the code that handles a stop_due_to_enospace can work >> >> to fix >> >> > the error and issue a cont, disregarding any other stops in >> progress >> >> > (due to a user pressing the stop button, or migration, or cpu >> hotplug, >> >> > or whatever). For the human monitor, it's not so intuitive, >> but the >> >> > situation is so rare we can just rely on the user to issue >> cont again. >> >> >> >> Making this kind of user-visible change would be a bad idea. >> > >> > The current situation is a bad idea. >> > >> > Consider a user-initiated or qemu-initiated stop; the user starts to >> > deal with it, types 'cont', and as the Enter key is being depressed >> > another qemu-initiated stop comes along. The 'cont' restarts the >> guest >> > even though the second event was not dealt with. >> >> You always have this kind of problems when you attach two keyboards to >> the same console. A counting stop/cont will just create different >> effects of the same problem but not solve it. > > Let's examine a concrete example: a user is debugging a guest, which > stops at a breakpoint. Meanwhile a live migration is going on, > involving internal stops. When the guest does manage to run for a bit, > it runs out of disk space, generating a stop, which the management agent > resolves by allocating more space and issuing a cont. > > With a counting cont, no matter in what order these events happen, > things work out fine. How do they work out with your proposal? We can enforce stop for temporal reasons (migration/savevm), something that overrules user/management initiated stops. BTW, does stop due to migration actually have a window where it accepts other commands? I thought that phase is synchronous. Then we would just have to implement proper state saving/restoring. Anyway, there is no point in lock counting for stop reasons that require external synchronization anyway. gdb vs. management stack vs. human monitor - nothing is solved by counting the stops, they all can step on each other's shoes. Even worse, exposing a counting stop via the user interface requires additional interfaces to recover lost or forgotten locks. We've discussed this in the past IIRC. Jan
On Wed, 05 Oct 2011 20:02:04 +0200 Avi Kivity <avi@redhat.com> wrote: > On 10/05/2011 07:39 PM, Luiz Capitulino wrote: > > On Wed, 05 Oct 2011 19:23:21 +0200 > > Avi Kivity<avi@redhat.com> wrote: > > > > > On 10/05/2011 07:02 PM, Luiz Capitulino wrote: > > > > On Wed, 05 Oct 2011 18:37:51 +0200 > > > > Avi Kivity<avi@redhat.com> wrote: > > > > > > > > > On 10/05/2011 06:31 PM, Jan Kiszka wrote: > > > > > > >> > > > > > > > > > > > > > > vm_start() should be symmetric with vm_stop(). That is, if a piece of > > > > > > > code wants to execute with vcpus stopped, it should just run inside a > > > > > > > stop/start pair. > > > > > > > > > > > > > > The only confusion can come from the user, if he sees multiple stop > > > > > > > events and expects that just one cont will continue the vm. For the > > > > > > > machine monitor, we should just document that the you have to issue one > > > > > > > cont for every stop event you see (plus any stops you issue). It's not > > > > > > > unnatural - the code that handles a stop_due_to_enospace can work to fix > > > > > > > the error and issue a cont, disregarding any other stops in progress > > > > > > > (due to a user pressing the stop button, or migration, or cpu hotplug, > > > > > > > or whatever). For the human monitor, it's not so intuitive, but the > > > > > > > situation is so rare we can just rely on the user to issue cont again. > > > > > > > > > > > > Making this kind of user-visible change would be a bad idea. > > > > > > > > > > The current situation is a bad idea. > > > > > > > > Let's take the migration use-case as an example (ie. the user stops the VM > > > > before performing the migration). Today, if migration fails, > > > > migrate_fd_put_ready() will call vm_start() which will put the VM to > > > > run again. > > > > > > > > But if we implement the ref count idea, then vm_start() will just "unlock" > > > > migrate_fd_put_ready()'s own call to vm_stop(), that's, the user stop will > > > > remain and the user is required to do a 'cont'. > > > > > > > > I'd probably agree that that's the ideal semantics, but chances are we're > > > > going to break qmp clients here. > > > > > > There are two questions here. Is this autostart desirable? (IMO no, > > > but haven't given it much thought). > > > > It should be configurable at migration time I think. > > Why? autostart can easily be emulated by non-autostart, but not vice versa. What I meant is to give the user the choice to use. > > > If yes, we should provide it > > > somehow. If not, we should default to providing it, but switch to > > > non-autostart if a newer client indicates it understands the new semantics. > > > > That's only one example. You mention another one above: if qemu stops > > itself twice, will the user be required to run 'cont' twice? > > Yes. That's how the user tells qemu it saw the two events and handled > both of them. Otherwise there is a race condition and an event goes > unhandled (or rather, it is handled while the guest is running instead > of stopped). Breaks compatibility. > > I'm not exactly against the semantics you're proposing, but they don't > > seem to fit today's qemu. > > Today's qemu is broken here. For me it's broken because it will abort() if you migrate a paused vm, for you it seems to be broken at the semantic level. We can fix the semantics without breaking compatibility.
On Wed, 5 Oct 2011 15:49:42 -0300 Luiz Capitulino <lcapitulino@redhat.com> wrote: > On Wed, 05 Oct 2011 20:02:04 +0200 > Avi Kivity <avi@redhat.com> wrote: > > > On 10/05/2011 07:39 PM, Luiz Capitulino wrote: > > > On Wed, 05 Oct 2011 19:23:21 +0200 > > > Avi Kivity<avi@redhat.com> wrote: > > > > > > > On 10/05/2011 07:02 PM, Luiz Capitulino wrote: > > > > > On Wed, 05 Oct 2011 18:37:51 +0200 > > > > > Avi Kivity<avi@redhat.com> wrote: > > > > > > > > > > > On 10/05/2011 06:31 PM, Jan Kiszka wrote: > > > > > > > >> > > > > > > > > > > > > > > > > vm_start() should be symmetric with vm_stop(). That is, if a piece of > > > > > > > > code wants to execute with vcpus stopped, it should just run inside a > > > > > > > > stop/start pair. > > > > > > > > > > > > > > > > The only confusion can come from the user, if he sees multiple stop > > > > > > > > events and expects that just one cont will continue the vm. For the > > > > > > > > machine monitor, we should just document that the you have to issue one > > > > > > > > cont for every stop event you see (plus any stops you issue). It's not > > > > > > > > unnatural - the code that handles a stop_due_to_enospace can work to fix > > > > > > > > the error and issue a cont, disregarding any other stops in progress > > > > > > > > (due to a user pressing the stop button, or migration, or cpu hotplug, > > > > > > > > or whatever). For the human monitor, it's not so intuitive, but the > > > > > > > > situation is so rare we can just rely on the user to issue cont again. > > > > > > > > > > > > > > Making this kind of user-visible change would be a bad idea. > > > > > > > > > > > > The current situation is a bad idea. > > > > > > > > > > Let's take the migration use-case as an example (ie. the user stops the VM > > > > > before performing the migration). Today, if migration fails, > > > > > migrate_fd_put_ready() will call vm_start() which will put the VM to > > > > > run again. > > > > > > > > > > But if we implement the ref count idea, then vm_start() will just "unlock" > > > > > migrate_fd_put_ready()'s own call to vm_stop(), that's, the user stop will > > > > > remain and the user is required to do a 'cont'. > > > > > > > > > > I'd probably agree that that's the ideal semantics, but chances are we're > > > > > going to break qmp clients here. > > > > > > > > There are two questions here. Is this autostart desirable? (IMO no, > > > > but haven't given it much thought). > > > > > > It should be configurable at migration time I think. > > > > Why? autostart can easily be emulated by non-autostart, but not vice versa. > > What I meant is to give the user the choice to use. > > > > > If yes, we should provide it > > > > somehow. If not, we should default to providing it, but switch to > > > > non-autostart if a newer client indicates it understands the new semantics. > > > > > > That's only one example. You mention another one above: if qemu stops > > > itself twice, will the user be required to run 'cont' twice? > > > > Yes. That's how the user tells qemu it saw the two events and handled > > both of them. Otherwise there is a race condition and an event goes > > unhandled (or rather, it is handled while the guest is running instead > > of stopped). > > Breaks compatibility. > > > > I'm not exactly against the semantics you're proposing, but they don't > > > seem to fit today's qemu. > > > > Today's qemu is broken here. > > For me it's broken because it will abort() if you migrate a paused vm, for > you it seems to be broken at the semantic level. > > We can fix the semantics without breaking compatibility. s/We can/ We can't
On 10/05/2011 08:50 PM, Luiz Capitulino wrote: >>>> > > > I'm not exactly against the semantics you're proposing, but they don't >>>> > > > seem to fit today's qemu. >>> > > >>> > > Today's qemu is broken here. >> > >> > For me it's broken because it will abort() if you migrate a paused vm, for >> > you it seems to be broken at the semantic level. >> > >> > We can fix the semantics without breaking compatibility. > > s/We can/ We can't I think we should divide stop causes into three groups: 1) those that are undone by QEMU itself: RSTATE_DEBUG RSTATE_SAVEVM RSTATE_PRE_MIGRATE RSTATE_RESTORE For these a lock/release scheme is definitely better. The VM should not start until none of these conditions is in effect, even after a "cont" command. 2) those that are undone by management: RSTATE_IO_ERROR For this we can add a new "retry" monitor command that guarantees no races if the user issues a "stop" or "cont" command while management is processing it. Effectively, it is also a lock/release scheme but controlled by management. 3) those that are undone by "cont": RSTATE_PRE_LAUNCH RSTATE_PAUSED RSTATE_WATCHDOG RSTATE_POST_MIGRATE RSTATE_PANICKED It put here the three runstates where the VM should really not be restarted at all. We can then add a new "start" command that only flips these five to RSTATE_RUNNING. So the runstate is composed of six elements: five lock/unlock states (of which only one can be unlocked by the user), and one running/paused state (composed of five pause reasons + "none"). That is, the runstate is a tuple like [debug, savevm, pre_migrate, restore, io_error, pause_reason] and for the VM to run it must look like [false, false, false, false, false, none]. The four monitor commands would be: 1) "stop": if runstate[pause_reason] == none then runstate[pause_reason] = paused 2) "retry": runstate[io_error] = false 3) "start": runstate[pause_reason] = none There could also be a differentiation between "start" and "start -f", where "-f" would be needed to get out of RSTATE_POST_MIGRATE, RSTATE_PANICKED and probably RSTATE_WATCHDOG too. 4) "cont": backwards compatibility provided by "retry"+"start -f". How does this look? Paolo
On 10/05/2011 08:02 PM, Jan Kiszka wrote: > > > > Let's examine a concrete example: a user is debugging a guest, which > > stops at a breakpoint. Meanwhile a live migration is going on, > > involving internal stops. When the guest does manage to run for a bit, > > it runs out of disk space, generating a stop, which the management agent > > resolves by allocating more space and issuing a cont. > > > > With a counting cont, no matter in what order these events happen, > > things work out fine. How do they work out with your proposal? > > We can enforce stop for temporal reasons (migration/savevm), something > that overrules user/management initiated stops. Migration resume shouldn't overrule user stop. It's really simple. If any agent wants the system stopped, it's stopped. Only when no one wants it stopped, it may run. > > BTW, does stop due to migration actually have a window where it accepts > other commands? I thought that phase is synchronous. Then we would just > have to implement proper state saving/restoring. Save: ++stop_count, restore: --stop_count. > > Anyway, there is no point in lock counting for stop reasons that require > external synchronization anyway. gdb vs. management stack vs. human > monitor - nothing is solved by counting the stops, they all can step on > each other's shoes. Please elaborate. > Even worse, exposing a counting stop via the user > interface requires additional interfaces to recover lost or forgotten > locks. We've discussed this in the past IIRC. > Agree with that. So there's the second proposal: vm_stop(unsigned reason) { if (!stop_state) { do_vm_stop(); } stop_state |= 1 << reason; } vm_resume(unsigned reason) { stop_state &= ~(1 << reason); if (!stop_state) { do_vm_resume(); } } so now each agent is separated from the other.
On 2011-10-06 16:27, Avi Kivity wrote: > On 10/05/2011 08:02 PM, Jan Kiszka wrote: >> > >> > Let's examine a concrete example: a user is debugging a guest, which >> > stops at a breakpoint. Meanwhile a live migration is going on, >> > involving internal stops. When the guest does manage to run for a >> bit, >> > it runs out of disk space, generating a stop, which the management >> agent >> > resolves by allocating more space and issuing a cont. >> > >> > With a counting cont, no matter in what order these events happen, >> > things work out fine. How do they work out with your proposal? >> >> We can enforce stop for temporal reasons (migration/savevm), something >> that overrules user/management initiated stops. > > Migration resume shouldn't overrule user stop. That's not what I had in mind. Migration stop could overrule user resume. But that discussion is moot as there is no time span where this could happen. Migration just needs to re-enter the original state on error, savevm/loadvm restore what it found on entry. All this is atomic /wrt other agents. > > It's really simple. If any agent wants the system stopped, it's > stopped. Only when no one wants it stopped, it may run. > >> >> BTW, does stop due to migration actually have a window where it accepts >> other commands? I thought that phase is synchronous. Then we would just >> have to implement proper state saving/restoring. > > Save: ++stop_count, restore: --stop_count. > >> >> Anyway, there is no point in lock counting for stop reasons that require >> external synchronization anyway. gdb vs. management stack vs. human >> monitor - nothing is solved by counting the stops, they all can step on >> each other's shoes. > > Please elaborate. Every agent can issue every monitor command. If you have a gdb session running, you don't want the management stack to migrate your VM away or mess with it otherwise. If you try to migrate a machine, you don't want any other agent change its configuration beforehand, adding a device that is not present on the target, etc. > >> Even worse, exposing a counting stop via the user >> interface requires additional interfaces to recover lost or forgotten >> locks. We've discussed this in the past IIRC. >> > > Agree with that. So there's the second proposal: > > vm_stop(unsigned reason) > { > if (!stop_state) { > do_vm_stop(); > } > stop_state |= 1 << reason; > } > > vm_resume(unsigned reason) > { > stop_state &= ~(1 << reason); > if (!stop_state) { > do_vm_resume(); > } > } > > so now each agent is separated from the other. > Stop reasons are orthogonal to agents. BTW, the above model would still require extending the user interface to report pending stop reasons and allow specifying resume reasons. Jan
On Tue, 4 Oct 2011 14:04:45 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > Trying to migrate a paused machine fails. The reason is that > the RSTATE_PRE_MIGRATE is reached with vm_stop, and this > transition is eaten when the vm is already paused. This patch > fixes the problem by always going through runstate_set and > always notifying the new state. > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> As the discussion on this subject is taking too long, I'll apply this one. The only problem is that it doesn't fully fix the problem: we need a transition from paused to finish-migrate (yes, I've clarified the name in another series). Could you fix that and resend please? > --- > cpus.c | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/cpus.c b/cpus.c > index 8978779..eab8ff6 100644 > --- a/cpus.c > +++ b/cpus.c > @@ -128,6 +128,8 @@ static void do_vm_stop(RunState state) > qemu_aio_flush(); > bdrv_flush_all(); > monitor_protocol_event(QEVENT_STOP, NULL); > + } else { > + runstate_set(state); > } > } >
diff --git a/cpus.c b/cpus.c index 8978779..eab8ff6 100644 --- a/cpus.c +++ b/cpus.c @@ -128,6 +128,8 @@ static void do_vm_stop(RunState state) qemu_aio_flush(); bdrv_flush_all(); monitor_protocol_event(QEVENT_STOP, NULL); + } else { + runstate_set(state); } }
Trying to migrate a paused machine fails. The reason is that the RSTATE_PRE_MIGRATE is reached with vm_stop, and this transition is eaten when the vm is already paused. This patch fixes the problem by always going through runstate_set and always notifying the new state. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- cpus.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-)