Message ID | c704dd9f86e827f6e87c7dbfda96d309c0dd43bd.1279881908.git.amit.shah@redhat.com |
---|---|
State | New |
On Fri, 23 Jul 2010 16:15:15 +0530
Amit Shah <amit.shah@redhat.com> wrote:

> When a 'cont' is issued on a VM that's just waiting for an incoming
> migration, the VM reboots and boots into the guest, possibly corrupting
> its storage since it could be shared with another VM running elsewhere.
>
> Ensure that a VM started with '-incoming' is only run when an incoming
> migration successfully completes.
>
> Reported-by: Laine Stump <laine@redhat.com>
> Signed-off-by: Amit Shah <amit.shah@redhat.com>
> ---
>  migration.c |    3 +++
>  monitor.c   |    4 ++++
>  sysemu.h    |    1 +
>  vl.c        |    3 +++
>  4 files changed, 11 insertions(+), 0 deletions(-)
>
> diff --git a/migration.c b/migration.c
> index 650eb78..84d7e4a 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -67,6 +67,9 @@ void process_incoming_migration(QEMUFile *f)
>      qemu_announce_self();
>      DPRINTF("successfully loaded vm state\n");
>
> +    incoming_expected = false;
> +    incoming_done = true;
> +
>      if (autostart)
>          vm_start();
>  }
> diff --git a/monitor.c b/monitor.c
> index 45fd482..d12a7b5 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -1056,6 +1056,10 @@ static int do_cont(Monitor *mon, const QDict *qdict, QObject **ret_data)
>  {
>      struct bdrv_iterate_context context = { mon, 0 };
>
> +    if (incoming_expected && !incoming_done) {
> +        autostart = 1;

Why do we need to set autostart? We should just fail if we're unable to run.

> +        return 1; /* Waiting for incoming migration */

You should return -1 and use qerror_report(), so that we have a meaningful
error in the user Monitor and QMP (otherwise we'll get UndefinedError).

And incoming_done is not needed.

> +    }
>      bdrv_iterate(encrypted_bdrv_it, &context);
>      /* only resume the vm if all keys are set and valid */
>      if (!context.err) {
> diff --git a/sysemu.h b/sysemu.h
> index 9c988bb..f9c1962 100644
> --- a/sysemu.h
> +++ b/sysemu.h
> @@ -99,6 +99,7 @@ typedef enum DisplayType
>  } DisplayType;
>
>  extern int autostart;
> +extern int incoming_expected, incoming_done;
>  extern int bios_size;
>
>  typedef enum {
> diff --git a/vl.c b/vl.c
> index ba6ee11..b69c11b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -182,6 +182,8 @@ int nb_nics;
>  NICInfo nd_table[MAX_NICS];
>  int vm_running;
>  int autostart;
> +int incoming_expected; /* Started with -incoming and waiting for incoming */
> +int incoming_done; /* Incoming migration successful */
>  static int rtc_utc = 1;
>  static int rtc_date_offset = -1; /* -1 means no change */
>  QEMUClock *rtc_clock;
> @@ -2557,6 +2559,7 @@ int main(int argc, char **argv, char **envp)
>                  break;
>              case QEMU_OPTION_incoming:
>                  incoming = optarg;
> +                incoming_expected = true;
>                  break;
>              case QEMU_OPTION_nodefaults:
>                  default_serial = 0;
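For illustration only, the alternative suggested in the review above (fail the command with an error instead of deferring the start) could be modelled roughly as follows. This is a standalone sketch, not QEMU code: `report_error()` is a hypothetical stand-in for `qerror_report()`, `do_cont_sketch()` and `incoming_migration_done()` are made-up names, and the globals only mirror the names used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for QEMU state; only the names mirror the patch. */
static bool incoming_expected;  /* started with -incoming, migration not done yet */
static bool vm_running;
static const char *last_error;  /* what the monitor would show to the user */

/* Hypothetical helper standing in for qerror_report(). */
static void report_error(const char *msg)
{
    last_error = msg;
    fprintf(stderr, "error: %s\n", msg);
}

/* Sketch of 'cont': refuse to run and say why, with no deferred start. */
static int do_cont_sketch(void)
{
    if (incoming_expected) {
        report_error("an incoming migration is expected first");
        return -1;
    }
    last_error = NULL;
    vm_running = true;
    return 0;
}

/* Sketch of the point where the incoming migration has been loaded. */
static void incoming_migration_done(void)
{
    /* A guest still waiting for its state must not be running already. */
    assert(!vm_running);
    incoming_expected = false;
}
```

With this shape a single flag is enough, which matches the review remark that incoming_done is not needed: 'cont' either fails with a reported error or starts the guest, and nothing happens later as a side effect.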
On (Fri) Jul 23 2010 [15:08:18], Luiz Capitulino wrote:
> > diff --git a/monitor.c b/monitor.c
> > index 45fd482..d12a7b5 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -1056,6 +1056,10 @@ static int do_cont(Monitor *mon, const QDict *qdict, QObject **ret_data)
> >  {
> >      struct bdrv_iterate_context context = { mon, 0 };
> >
> > +    if (incoming_expected && !incoming_done) {
> > +        autostart = 1;
>
> Why do we need to set autostart? We should just fail if we're unable to run.
>
> > +        return 1; /* Waiting for incoming migration */
>
> You should return -1 and use qerror_report(), so that we have a meaningful
> error in the user Monitor and QMP (otherwise we'll get UndefinedError).

That would mean old/existing libvirt will be confused on why guests
wouldn't start even though it issued cont.

If it's not a problem for the libvirt folks, I can do that.

> And incoming_done is not needed.

Yes, not in this version.

		Amit
On Sat, 24 Jul 2010 13:01:24 +0530
Amit Shah <amit.shah@redhat.com> wrote:

> On (Fri) Jul 23 2010 [15:08:18], Luiz Capitulino wrote:
> > > diff --git a/monitor.c b/monitor.c
> > > index 45fd482..d12a7b5 100644
> > > --- a/monitor.c
> > > +++ b/monitor.c
> > > @@ -1056,6 +1056,10 @@ static int do_cont(Monitor *mon, const QDict *qdict, QObject **ret_data)
> > >  {
> > >      struct bdrv_iterate_context context = { mon, 0 };
> > >
> > > +    if (incoming_expected && !incoming_done) {
> > > +        autostart = 1;
> >
> > Why do we need to set autostart? We should just fail if we're unable to run.
> >
> > > +        return 1; /* Waiting for incoming migration */
> >
> > You should return -1 and use qerror_report(), so that we have a meaningful
> > error in the user Monitor and QMP (otherwise we'll get UndefinedError).
>
> That would mean old/existing libvirt will be confused on why guests
> wouldn't start even though it issued cont.

Yes, although delaying to start could cause a problem too and this is
also introducing a new error in QMP already.

I really would like to avoid adding weird semantics, especially in QMP where
cont will return an error but will put the VM to run later. We could fix this
there only, but then it will get complex w/o reason.

We should fix it properly right now, IMO.

> If it's not a problem for the libvirt folks, I can do that.

Laine, could you please check that?

> > And incoming_done is not needed.
>
> Yes, not in this version.
>
> 		Amit
>
On 07/26/2010 10:23 AM, Luiz Capitulino wrote:
> On Sat, 24 Jul 2010 13:01:24 +0530
> Amit Shah <amit.shah@redhat.com> wrote:
>
>> On (Fri) Jul 23 2010 [15:08:18], Luiz Capitulino wrote:
>>>> diff --git a/monitor.c b/monitor.c
>>>> index 45fd482..d12a7b5 100644
>>>> --- a/monitor.c
>>>> +++ b/monitor.c
>>>> @@ -1056,6 +1056,10 @@ static int do_cont(Monitor *mon, const QDict *qdict, QObject **ret_data)
>>>>  {
>>>>      struct bdrv_iterate_context context = { mon, 0 };
>>>>
>>>> +    if (incoming_expected && !incoming_done) {
>>>> +        autostart = 1;
>>> Why do we need to set autostart? We should just fail if we're unable to run.
>>>
>>>> +        return 1; /* Waiting for incoming migration */
>>> You should return -1 and use qerror_report(), so that we have a meaningful
>>> error in the user Monitor and QMP (otherwise we'll get UndefinedError).
>> That would mean old/existing libvirt will be confused on why guests
>> wouldn't start even though it issued cont.
> Yes, although delaying to start could cause a problem too and this is
> also introducing a new error in QMP already.
>
> I really would like to avoid adding weird semantics, especially in QMP where
> cont will return an error but will put the VM to run later. We could fix this
> there only, but then it will get complex w/o reason.
>
> We should fix it properly right now, IMO.
>
>> If it's not a problem for the libvirt folks, I can do that.
> Laine, could you please check that?

That should really be answered by someone who better understands the
implications (I'm a newcomer to that part of the code). Dan Berrange
or Chris Lalancette maybe?

(I am setting up to test the current version of the patch on the
system where I can reproduce the problem. Haven't flipped the switch
yet, though.)
Laine Stump <laine@redhat.com> wrote:
> On 07/26/2010 10:23 AM, Luiz Capitulino wrote:
>> On Sat, 24 Jul 2010 13:01:24 +0530
>> Amit Shah <amit.shah@redhat.com> wrote:
>>
>>> On (Fri) Jul 23 2010 [15:08:18], Luiz Capitulino wrote:
>>>>> diff --git a/monitor.c b/monitor.c
>>>>> index 45fd482..d12a7b5 100644
>>>>> --- a/monitor.c
>>>>> +++ b/monitor.c
>>>>> @@ -1056,6 +1056,10 @@ static int do_cont(Monitor *mon, const QDict *qdict, QObject **ret_data)
>>>>>  {
>>>>>      struct bdrv_iterate_context context = { mon, 0 };
>>>>>
>>>>> +    if (incoming_expected && !incoming_done) {
>>>>> +        autostart = 1;
>>>> Why do we need to set autostart? We should just fail if we're unable to run.
>>>>
>>>>> +        return 1; /* Waiting for incoming migration */
>>>> You should return -1 and use qerror_report(), so that we have a meaningful
>>>> error in the user Monitor and QMP (otherwise we'll get UndefinedError).
>>> That would mean old/existing libvirt will be confused on why guests
>>> wouldn't start even though it issued cont.
>> Yes, although delaying to start could cause a problem too and this is
>> also introducing a new error in QMP already.
>>
>> I really would like to avoid adding weird semantics, especially in QMP where
>> cont will return an error but will put the VM to run later. We could fix this
>> there only, but then it will get complex w/o reason.
>>
>> We should fix it properly right now, IMO.
>>
>>> If it's not a problem for the libvirt folks, I can do that.
>> Laine, could you please check that?
>
> That should really be answered by someone who better understands the
> implications (I'm a newcomer to that part of the code). Dan Berrange
> or Chris Lalancette maybe?
>
> (I am setting up to test the current version of the patch on the
> system where I can reproduce the problem. Haven't flipped the switch
> yet, though.)

Just to be sure, what do you want "cont" to return if we are in the
middle of a migration?

This patch just sets autostart=1 (i.e. user wants to start guest as soon
as possible), is that ok, or is a warning/error better?

Later, Juan.
On Mon, Jul 26, 2010 at 09:49:12PM +0200, Juan Quintela wrote:
> Laine Stump <laine@redhat.com> wrote:
> > On 07/26/2010 10:23 AM, Luiz Capitulino wrote:
> >> On Sat, 24 Jul 2010 13:01:24 +0530
> >> Amit Shah <amit.shah@redhat.com> wrote:
> >>
> >>> On (Fri) Jul 23 2010 [15:08:18], Luiz Capitulino wrote:
> >>>>> diff --git a/monitor.c b/monitor.c
> >>>>> index 45fd482..d12a7b5 100644
> >>>>> --- a/monitor.c
> >>>>> +++ b/monitor.c
> >>>>> @@ -1056,6 +1056,10 @@ static int do_cont(Monitor *mon, const QDict *qdict, QObject **ret_data)
> >>>>>  {
> >>>>>      struct bdrv_iterate_context context = { mon, 0 };
> >>>>>
> >>>>> +    if (incoming_expected && !incoming_done) {
> >>>>> +        autostart = 1;
> >>>> Why do we need to set autostart? We should just fail if we're unable to run.
> >>>>
> >>>>> +        return 1; /* Waiting for incoming migration */
> >>>> You should return -1 and use qerror_report(), so that we have a meaningful
> >>>> error in the user Monitor and QMP (otherwise we'll get UndefinedError).
> >>> That would mean old/existing libvirt will be confused on why guests
> >>> wouldn't start even though it issued cont.
> >> Yes, although delaying to start could cause a problem too and this is
> >> also introducing a new error in QMP already.
> >>
> >> I really would like to avoid adding weird semantics, especially in QMP where
> >> cont will return an error but will put the VM to run later. We could fix this
> >> there only, but then it will get complex w/o reason.
> >>
> >> We should fix it properly right now, IMO.
> >>
> >>> If it's not a problem for the libvirt folks, I can do that.
> >> Laine, could you please check that?
> >
> > That should really be answered by someone who better understands the
> > implications (I'm a newcomer to that part of the code). Dan Berrange
> > or Chris Lalancette maybe?
> >
> > (I am setting up to test the current version of the patch on the
> > system where I can reproduce the problem. Haven't flipped the switch
> > yet, though.)
>
> Just to be sure, what do you want "cont" to return if we are in the
> middle of a migration?

NB, this is only making a difference if 'cont' is run between the time
QEMU starts with -incoming and the time migration starts. Once migration
starts, the entire QEMU monitor is blocked until it completes, so you'll
never see 'cont' in the middle of migration. libvirt currently relies on
that behaviour but as per previous discussions on the subject, we'd
prefer to have some kind of async notification whether via async events,
or via an async QMP command.

> This patch just sets autostart=1 (i.e. user wants to start guest as soon
> as possible), is that ok, or is a warning/error better?

I agree with Luiz that having such 'magical' behaviour for commands is
not very desirable. If 'cont' isn't valid based on the current QEMU
execution state, then it should return an error.

The downside is that while migration will still complete successfully,
the current libvirt will unfortunately report an error in this scenario.
If there is an explicit QMP error code associated with this condition
though, we can catch the error and handle it appropriately in future.

Daniel
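As a rough illustration of the last point above (a dedicated error code that a management layer can recognise and handle), a client-side check might look like the following sketch. It is not libvirt or QEMU code: the error class string "MigrationExpected" is an assumed name chosen for this example, and `classify_cont_reply()` and `cont_action_name()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What a management client might decide after sending 'cont'. */
enum cont_action {
    CONT_OK,                     /* guest is running */
    CONT_RETRY_AFTER_MIGRATION,  /* known condition: wait, then retry */
    CONT_FAIL                    /* any other error: report it */
};

/*
 * error_class is NULL when the command succeeded. The class name
 * "MigrationExpected" is an assumption made for this sketch.
 */
static enum cont_action classify_cont_reply(const char *error_class)
{
    if (error_class == NULL) {
        return CONT_OK;
    }
    if (strcmp(error_class, "MigrationExpected") == 0) {
        return CONT_RETRY_AFTER_MIGRATION;
    }
    return CONT_FAIL;
}

/* Human-readable name, e.g. for a log message. */
static const char *cont_action_name(enum cont_action action)
{
    switch (action) {
    case CONT_OK:
        return "running";
    case CONT_RETRY_AFTER_MIGRATION:
        return "retry after migration";
    case CONT_FAIL:
        return "failed";
    }
    assert(!"unknown cont_action");
    return "";
}
```

A client written this way would treat the "migration expected" case separately from a generic failure, while a client that does not know the class would still see an ordinary error, which is the downside mentioned above.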
diff --git a/migration.c b/migration.c
index 650eb78..84d7e4a 100644
--- a/migration.c
+++ b/migration.c
@@ -67,6 +67,9 @@ void process_incoming_migration(QEMUFile *f)
     qemu_announce_self();
     DPRINTF("successfully loaded vm state\n");
 
+    incoming_expected = false;
+    incoming_done = true;
+
     if (autostart)
         vm_start();
 }
diff --git a/monitor.c b/monitor.c
index 45fd482..d12a7b5 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1056,6 +1056,10 @@ static int do_cont(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
     struct bdrv_iterate_context context = { mon, 0 };
 
+    if (incoming_expected && !incoming_done) {
+        autostart = 1;
+        return 1; /* Waiting for incoming migration */
+    }
     bdrv_iterate(encrypted_bdrv_it, &context);
     /* only resume the vm if all keys are set and valid */
     if (!context.err) {
diff --git a/sysemu.h b/sysemu.h
index 9c988bb..f9c1962 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -99,6 +99,7 @@ typedef enum DisplayType
 } DisplayType;
 
 extern int autostart;
+extern int incoming_expected, incoming_done;
 extern int bios_size;
 
 typedef enum {
diff --git a/vl.c b/vl.c
index ba6ee11..b69c11b 100644
--- a/vl.c
+++ b/vl.c
@@ -182,6 +182,8 @@ int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int vm_running;
 int autostart;
+int incoming_expected; /* Started with -incoming and waiting for incoming */
+int incoming_done; /* Incoming migration successful */
 static int rtc_utc = 1;
 static int rtc_date_offset = -1; /* -1 means no change */
 QEMUClock *rtc_clock;
@@ -2557,6 +2559,7 @@ int main(int argc, char **argv, char **envp)
                 break;
             case QEMU_OPTION_incoming:
                 incoming = optarg;
+                incoming_expected = true;
                 break;
             case QEMU_OPTION_nodefaults:
                 default_serial = 0;
When a 'cont' is issued on a VM that's just waiting for an incoming
migration, the VM reboots and boots into the guest, possibly corrupting
its storage since it could be shared with another VM running elsewhere.

Ensure that a VM started with '-incoming' is only run when an incoming
migration successfully completes.

Reported-by: Laine Stump <laine@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 migration.c |    3 +++
 monitor.c   |    4 ++++
 sysemu.h    |    1 +
 vl.c        |    3 +++
 4 files changed, 11 insertions(+), 0 deletions(-)
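To make the behaviour of this version of the patch easier to follow, here is a small standalone model of the three flags it touches. It is only a sketch under simplifying assumptions: `struct vm_model` and the `model_*` functions are hypothetical names, `running` stands in for vm_start(), and the model begins with autostart cleared, as if the guest had been started paused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the globals the patch uses. */
struct vm_model {
    bool incoming_expected;  /* set when -incoming is given */
    bool incoming_done;      /* set when migration state is loaded */
    bool autostart;          /* start the guest once migration completes */
    bool running;            /* stand-in for vm_start() having run */
};

/* Models the QEMU_OPTION_incoming case in main(). */
static void model_start_incoming(struct vm_model *vm)
{
    assert(vm != NULL);
    vm->incoming_expected = true;
}

/* Models do_cont() as posted: defer the start instead of running. */
static int model_cont(struct vm_model *vm)
{
    assert(vm != NULL);
    if (vm->incoming_expected && !vm->incoming_done) {
        vm->autostart = true;
        return 1; /* waiting for incoming migration */
    }
    vm->running = true;
    return 0;
}

/* Models the tail of process_incoming_migration(). */
static void model_migration_complete(struct vm_model *vm)
{
    assert(vm != NULL);
    vm->incoming_expected = false;
    vm->incoming_done = true;
    if (vm->autostart) {
        vm->running = true;
    }
}
```

In this model a 'cont' issued before the migration returns non-zero, leaves the guest stopped, and causes it to start later when the migration completes. That deferred start is the behaviour questioned in the review.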