Patchwork [7/8] pseries: savevm support for PAPR virtual SCSI

Submitter David Gibson
Date May 3, 2013, 1:38 a.m.
Message ID <1367545092-19980-8-git-send-email-david@gibson.dropbear.id.au>
Permalink /patch/241132/
State New

Comments

David Gibson - May 3, 2013, 1:38 a.m.
This patch adds the necessary support for saving the state of the PAPR VIO
virtual SCSI device.  This turns out to be trivial, because the generic
SCSI code already quiesces the attached virtual SCSI bus.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/scsi/spapr_vscsi.c |   28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
Paolo Bonzini - May 6, 2013, 7:37 a.m.
On 03/05/2013 03:38, David Gibson wrote:
> This patch adds the necessary support for saving the state of the PAPR VIO
> virtual SCSI device.  This turns out to be trivial, because the generiC
> SCSI code already quiesces the attached virtual SCSI bus.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/scsi/spapr_vscsi.c |   28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
> index 3d322d5..f416871 100644
> --- a/hw/scsi/spapr_vscsi.c
> +++ b/hw/scsi/spapr_vscsi.c
> @@ -954,6 +954,33 @@ static Property spapr_vscsi_properties[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> +static void spapr_vscsi_pre_save(void *opaque)
> +{
> +    VSCSIState *s = opaque;
> +    int i;
> +
> +    /* Can't save active requests, apparently the general SCSI code
> +     * quiesces the queue for us on vmsave */
> +    for (i = 0; i < VSCSI_REQ_LIMIT; i++) {
> +        assert(!s->reqs[i].active);
> +    }
> +}

This is only true when the rerror and werror options have the values
"ignore" or "report".  See virtio-scsi for an example of how to save the
requests using the save_request and load_request callbacks in SCSIBusInfo.

Paolo

> +static const VMStateDescription vmstate_spapr_vscsi = {
> +    .name = "spapr_vscsi",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .pre_save = spapr_vscsi_pre_save,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_SPAPR_VIO(vdev, VSCSIState),
> +        /* VSCSI state */
> +        /* ???? */
> +
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -968,6 +995,7 @@ static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
>      k->signal_mask = 0x00000001;
>      dc->props = spapr_vscsi_properties;
>      k->rtce_window_size = 0x10000000;
> +    dc->vmsd = &vmstate_spapr_vscsi;
>  }
>  
>  static const TypeInfo spapr_vscsi_info = {
>
David Gibson - May 7, 2013, 3:07 a.m.
On Mon, May 06, 2013 at 09:37:11AM +0200, Paolo Bonzini wrote:
> Il 03/05/2013 03:38, David Gibson ha scritto:
> > This patch adds the necessary support for saving the state of the PAPR VIO
> > virtual SCSI device.  This turns out to be trivial, because the generiC
> > SCSI code already quiesces the attached virtual SCSI bus.
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/scsi/spapr_vscsi.c |   28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> > 
> > diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
> > index 3d322d5..f416871 100644
> > --- a/hw/scsi/spapr_vscsi.c
> > +++ b/hw/scsi/spapr_vscsi.c
> > @@ -954,6 +954,33 @@ static Property spapr_vscsi_properties[] = {
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > +static void spapr_vscsi_pre_save(void *opaque)
> > +{
> > +    VSCSIState *s = opaque;
> > +    int i;
> > +
> > +    /* Can't save active requests, apparently the general SCSI code
> > +     * quiesces the queue for us on vmsave */
> > +    for (i = 0; i < VSCSI_REQ_LIMIT; i++) {
> > +        assert(!s->reqs[i].active);
> > +    }
> > +}
> 
> This is only true when the rerror and werror options have the values
> "ignore" or "report".  See virtio-scsi for an example of how to save the
> requests using the save_request and load_request callbacks in
> SCSIBusInfo.

Ah, bother.  Unfortunately the save request is quite a lot more
complicated for vscsi, since we have a lot more private data, and I'm
not sure which bits can be reconstructed from other information.  I'll
see what I can come up with.

What guarantees _does_ the scsi layer give about the lifecycle state
of the requests when we savevm?
Alexey Kardashevskiy - May 27, 2013, 6:48 a.m.
On 05/06/2013 05:37 PM, Paolo Bonzini wrote:
> Il 03/05/2013 03:38, David Gibson ha scritto:
>> This patch adds the necessary support for saving the state of the PAPR VIO
>> virtual SCSI device.  This turns out to be trivial, because the generiC
>> SCSI code already quiesces the attached virtual SCSI bus.
>>
>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> ---
>>  hw/scsi/spapr_vscsi.c |   28 ++++++++++++++++++++++++++++
>>  1 file changed, 28 insertions(+)
>>
>> diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
>> index 3d322d5..f416871 100644
>> --- a/hw/scsi/spapr_vscsi.c
>> +++ b/hw/scsi/spapr_vscsi.c
>> @@ -954,6 +954,33 @@ static Property spapr_vscsi_properties[] = {
>>      DEFINE_PROP_END_OF_LIST(),
>>  };
>>  
>> +static void spapr_vscsi_pre_save(void *opaque)
>> +{
>> +    VSCSIState *s = opaque;
>> +    int i;
>> +
>> +    /* Can't save active requests, apparently the general SCSI code
>> +     * quiesces the queue for us on vmsave */
>> +    for (i = 0; i < VSCSI_REQ_LIMIT; i++) {
>> +        assert(!s->reqs[i].active);
>> +    }
>> +}
> 
> This is only true when the rerror and werror options have the values
> "ignore" or "report".  See virtio-scsi for an example of how to save the
> requests using the save_request and load_request callbacks in SCSIBusInfo.


Sigh.
How do you test that requests are saved/restored correctly? What does
happen to requests which were already sent to the real hardware (real block
device, etc) but have not completed at the moment of the end of migration?


> Paolo
> 
>> +static const VMStateDescription vmstate_spapr_vscsi = {
>> +    .name = "spapr_vscsi",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .pre_save = spapr_vscsi_pre_save,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_SPAPR_VIO(vdev, VSCSIState),
>> +        /* VSCSI state */
>> +        /* ???? */
>> +
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>>  static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
>>  {
>>      DeviceClass *dc = DEVICE_CLASS(klass);
>> @@ -968,6 +995,7 @@ static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
>>      k->signal_mask = 0x00000001;
>>      dc->props = spapr_vscsi_properties;
>>      k->rtce_window_size = 0x10000000;
>> +    dc->vmsd = &vmstate_spapr_vscsi;
>>  }
>>  
>>  static const TypeInfo spapr_vscsi_info = {
>>
>
Paolo Bonzini - May 27, 2013, 7:03 a.m.
On 27/05/2013 08:48, Alexey Kardashevskiy wrote:
>> > 
>> > This is only true when the rerror and werror options have the values
>> > "ignore" or "report".  See virtio-scsi for an example of how to save the
>> > requests using the save_request and load_request callbacks in SCSIBusInfo.
> 
> Sigh.

?

> How do you test that requests are saved/restored correctly? What does
> happen to requests which were already sent to the real hardware (real block
> device, etc) but have not completed at the moment of the end of migration?

They aren't saved, there is a bdrv_drain_all() in the migration code.

This is only used for rerror=stop or werror=stop.  To test it you can
use blkdebug (also a bit underdocumented) or hack block/raw-posix.c with
code that makes it fail the 100th write or something like that.  Start
the VM and migrate it while paused to a QEMU that doesn't have the hack.

Paolo
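For the record, blkdebug is driven by a small rules file and is layered via the blkdebug: protocol prefix on the drive's file= option (not via format=). A sketch of the error-injection setup along the lines Paolo describes, with hypothetical file names (errno 28 is ENOSPC):

```shell
# Write a blkdebug rules file that fails every write with ENOSPC.
cat > inject.cfg <<'EOF'
[inject-error]
event = "write_aio"
errno = "28"
EOF

# Point the drive at the image through the blkdebug protocol driver;
# werror=stop pauses the VM on the injected error, leaving the failed
# request in the queue for migration.
qemu-system-ppc64 ... \
  -drive "file=blkdebug:inject.cfg:virtimg/fc18guest,if=none,id=dddrive0,werror=stop,rerror=stop"
```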
Alexey Kardashevskiy - May 31, 2013, 5:58 a.m.
On 05/27/2013 05:03 PM, Paolo Bonzini wrote:
> Il 27/05/2013 08:48, Alexey Kardashevskiy ha scritto:
>>>>
>>>> This is only true when the rerror and werror options have the values
>>>> "ignore" or "report".  See virtio-scsi for an example of how to save the
>>>> requests using the save_request and load_request callbacks in SCSIBusInfo.
>>
>> Sigh.
> 
> ?

I thought the series was ready to go, but I was wrong. Furthermore, when I
got to the point where I could actually test the save/restore for vscsi_req,
migration was totally broken on PPC and it took some time to fix it :-/


>> How do you test that requests are saved/restored correctly? What does
>> happen to requests which were already sent to the real hardware (real block
>> device, etc) but have not completed at the moment of the end of migration?
> 
> They aren't saved, there is a bdrv_drain_all() in the migration code.
> 
> This is only used for rerror=stop or werror=stop.  To test it you can
> use blkdebug (also a bit underdocumented) or hack block/raw-posix.c with
> code that makes it fail the 100th write or something like that.  Start
> the VM and migrate it while paused to a QEMU that doesn't have the hack.

I run QEMU as (this is the destination, the source just does not have
-incoming):
./qemu-system-ppc64 \
 -L "qemu-ppc64-bios/" \
 -device "spapr-vscsi,id=ibmvscsi0" \
 -drive "file=virtimg/fc18guest,if=none,id=dddrive0,readonly=off,format=blkdebug,media=disk,werror=stop,rerror=stop" \
 -device "scsi-disk,id=scsidisk0,bus=ibmvscsi0.0,channel=0,scsi-id=0,lun=0,drive=dddrive0,removable=off" \
 -incoming "tcp:localhost:4000" \
 -m "1024" \
 -machine "pseries" \
 -nographic \
 -vga "none" \
 -enable-kvm

Am I using werror/rerror correctly?

I did not really understand how to use blkdebug or what else to hack in
raw-posix, but the point is I cannot get QEMU into a state with at least one
vscsi_req.active==1; they are always inactive no matter what I do - I run
10 instances of "dd if=/dev/sda of=/dev/null bs=4K" (on an 8GB image with
FC18) and increase migration speed to 500MB/s, no effect.

How do you trigger the situation when there are inactive requests which
have to be migrated?


And another question (sorry I am not very familiar with terminology but
cc:Ben is :) ) - what happens with indirect requests if migration happened
in the middle of handling such a request? virtio-scsi does not seem to
handle this situation anyhow, it just reconstructs the whole request and
that's it.
Paolo Bonzini - May 31, 2013, 8:18 a.m.
On 31/05/2013 07:58, Alexey Kardashevskiy wrote:
> On 05/27/2013 05:03 PM, Paolo Bonzini wrote:
>> Il 27/05/2013 08:48, Alexey Kardashevskiy ha scritto:
>>>>>
>>>>> This is only true when the rerror and werror options have the values
>>>>> "ignore" or "report".  See virtio-scsi for an example of how to save the
>>>>> requests using the save_request and load_request callbacks in SCSIBusInfo.
>>>
>>> Sigh.
>>
>> ?
> 
> I thought the series is ready to go but I was wrong. Furthermore when I got
> to the point where I could actually test the save/restore for vscsi_req,
> migration was totally broken on PPC and it took some time to fix it :-/

It is ready.  I was just pointing out that it's not _production_ ready.

(Sorry, I'm unusually terse these days).

> I run QEMU as (this is the destination, the source just does not have
> -incoming):
> ./qemu-system-ppc64 \
>  -L "qemu-ppc64-bios/" \
>  -device "spapr-vscsi,id=ibmvscsi0" \
>  -drive
> "file=virtimg/fc18guest,if=none,id=dddrive0,readonly=off,format=blkdebug,media=disk,werror=stop,rerror=stop"
> \
>  -device
> "scsi-disk,id=scsidisk0,bus=ibmvscsi0.0,channel=0,scsi-id=0,lun=0,drive=dddrive0,removable=off"
> \
>  -incoming "tcp:localhost:4000" \
>  -m "1024" \
>  -machine "pseries" \
>  -nographic \
>  -vga "none" \
>  -enable-kvm
> 
> Am I using werror/rerror correctly?

Yes.

> I did not really understand how to use blkdebug or what else to hack in
> raw-posix but the point is I cannot get QEMU into a state with at least one
> vcsci_req.active==1, they are always inactive no matter what I do - I run
> 10 instances of "dd if=/def/sda of=/dev/null bs=4K" (on 8GB image with
> FC18) and increase migration speed to 500MB/s, no effect.

No, that doesn't help.

> How do you trigger the situation when there are inactive requests which
> have to be migrated?

You need to trigger an error.  For example, you could use a sparse image
on an almost-full partition and let "dd" fill your disk.  Then migrate
to another instance of QEMU on the same machine; the destination machine
should succeed in migrating but fail to start the machine.  When you free
space on that partition and "cont" on the destination, it should resume.

> And another question (sorry I am not very familiar with terminology but
> cc:Ben is :) ) - what happens with indirect requests if migration happened
> in the middle of handling such a request? virtio-scsi does not seem to
> handle this situation anyhow, it just reconstructs the whole request and
> that's it.

What are indirect requests?

Paolo
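One way to set up the scenario above: a sparse raw image has a large apparent size but allocates host blocks only on write, so placing it on a small host partition lets guest "dd" hit ENOSPC on the host before the guest thinks its disk is full. A minimal sketch (file name hypothetical, sizes are examples):

```shell
# Create a sparse 8G image: apparent size is 8G, but almost no host
# blocks are allocated until the guest actually writes to them.
truncate -s 8G sparse.img

apparent=$(stat -c %s sparse.img)                 # size the guest sees
allocated=$(( $(stat -c %b sparse.img) * 512 ))   # bytes actually on disk

echo "apparent=$apparent allocated=$allocated"
```

Put such an image on a nearly-full host partition, boot with werror=stop,rerror=stop, and let the guest write until the host runs out of space; the VM then pauses with the failing request still queued, which is exactly the state to migrate.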
Benjamin Herrenschmidt - May 31, 2013, 10:07 a.m.
On Fri, 2013-05-31 at 15:58 +1000, Alexey Kardashevskiy wrote:
> 
> And another question (sorry I am not very familiar with terminology but
> cc:Ben is :) ) - what happens with indirect requests if migration happened
> in the middle of handling such a request? virtio-scsi does not seem to
> handle this situation anyhow, it just reconstructs the whole request and
> that's it.

So Paolo, the crux of the question here is really whether we have any
guarantee about the state of the request when this happens (by this I
mean a save happening with requests still "in flight") ?

IE. Can the request be at any stage of processing, with the data
transfer phase half way through, or do we somehow know for sure
that the request will *not* have started transferring any data ?

This is key, because in the latter case, all we really need to do is
save the request itself, and re-parse it on restore as if it were
really new (at least from a DMA descriptor perspective).

However, if the data transfer is already half way through, we need to
somehow save the state of the data transfer machinery, ie. the position
of the "cursor" that follows the guest-provided DMA descriptor list,
etc... (which isn't *that* trivial since we have a concept of indirect
descriptors and we use pointers to follow them, so we'd probably have
to re-walk the whole user descriptors list until we reach the same position).

Cheers,
Ben.
Alexey Kardashevskiy - May 31, 2013, 10:12 a.m.
On 05/31/2013 06:18 PM, Paolo Bonzini wrote:
> Il 31/05/2013 07:58, Alexey Kardashevskiy ha scritto:
>> On 05/27/2013 05:03 PM, Paolo Bonzini wrote:
>>> Il 27/05/2013 08:48, Alexey Kardashevskiy ha scritto:
>>>>>>
>>>>>> This is only true when the rerror and werror options have the values
>>>>>> "ignore" or "report".  See virtio-scsi for an example of how to save the
>>>>>> requests using the save_request and load_request callbacks in SCSIBusInfo.
>>>>
>>>> Sigh.
>>>
>>> ?
>>
>> I thought the series is ready to go but I was wrong. Furthermore when I got
>> to the point where I could actually test the save/restore for vscsi_req,
>> migration was totally broken on PPC and it took some time to fix it :-/
> 
> It is ready.  I was just pointing out that it's not _production_ ready.

What is the difference then? :)


> (Sorry, I'm unusually terse these days).
> 
>> I run QEMU as (this is the destination, the source just does not have
>> -incoming):
>> ./qemu-system-ppc64 \
>>  -L "qemu-ppc64-bios/" \
>>  -device "spapr-vscsi,id=ibmvscsi0" \
>>  -drive
>> "file=virtimg/fc18guest,if=none,id=dddrive0,readonly=off,format=blkdebug,media=disk,werror=stop,rerror=stop"
>> \
>>  -device
>> "scsi-disk,id=scsidisk0,bus=ibmvscsi0.0,channel=0,scsi-id=0,lun=0,drive=dddrive0,removable=off"
>> \
>>  -incoming "tcp:localhost:4000" \
>>  -m "1024" \
>>  -machine "pseries" \
>>  -nographic \
>>  -vga "none" \
>>  -enable-kvm
>>
>> Am I using werror/rerror correctly?
> 
> Yes.
> 
>> I did not really understand how to use blkdebug or what else to hack in
>> raw-posix but the point is I cannot get QEMU into a state with at least one
>> vcsci_req.active==1, they are always inactive no matter what I do - I run
>> 10 instances of "dd if=/def/sda of=/dev/null bs=4K" (on 8GB image with
>> FC18) and increase migration speed to 500MB/s, no effect.
> 
> No, that doesn't help.
> 
>> How do you trigger the situation when there are inactive requests which
>> have to be migrated?
> 
> You need to trigger an error.  For example, you could use a sparse image
> on an almost-full partition and let "dd" fill your disk.  Then migrate
> to another instance of QEMU on the same machine, the destination machine
> should succeed migration but fail starting the machine.

Why would it fail? I run "dd", it fills the disk and stops. Then I migrate.
How can I get a pending SCSI request in the queue? Sorry, I am definitely
missing something.


> When free space
> on that partition, and "cont" on the destination, it should resume.
> 
>> And another question (sorry I am not very familiar with terminology but
>> cc:Ben is :) ) - what happens with indirect requests if migration happened
>> in the middle of handling such a request? virtio-scsi does not seem to
>> handle this situation anyhow, it just reconstructs the whole request and
>> that's it.
> 
> What are indirect requests?

I'll leave it to Ben to avoid confusing :)
Alexey Kardashevskiy - May 31, 2013, 10:25 a.m.
On 05/31/2013 08:07 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2013-05-31 at 15:58 +1000, Alexey Kardashevskiy wrote:
>>
>> And another question (sorry I am not very familiar with terminology but
>> cc:Ben is :) ) - what happens with indirect requests if migration happened
>> in the middle of handling such a request? virtio-scsi does not seem to
>> handle this situation anyhow, it just reconstructs the whole request and
>> that's it.
> 
> So Paolo, the crux of the question here is really whether we have any
> guarantee about the state of the request when this happens (by this I
> mean a save happening with requests still "in flight") ?
> 
> IE. Can the request can be at any stage of processing, with the data
> transfer phase being half way through, or do we somewhat know for sure
> that the request will *not* have started transferring any data ?
> 
> This is key, because in the latter case, all we really need to do is
> save the request itself, and re-parse it on restore as if it was
> new really (at least from a DMA descriptor perspective).
> 
> However, if the data transfer is already half way through, we need to
> somewhat save the state of the data transfer machinery, ie. the position
> of the "cursor" that follows the guest-provided DMA descriptor list,
> etc... (which isn't *that* trivial since we have a concept of indirect
> descriptors and we use pointers to follow them, so we'd probably have
> to re-walk the whole user descriptors list until we reach the same position).


Isn't it the same QEMU thread that handles both hcalls and QEMU console
commands, so that migration cannot interrupt the parsing/handling of a
vscsi_req?
Paolo Bonzini - May 31, 2013, 10:26 a.m.
On 31/05/2013 12:12, Alexey Kardashevskiy wrote:
> On 05/31/2013 06:18 PM, Paolo Bonzini wrote:
>> Il 31/05/2013 07:58, Alexey Kardashevskiy ha scritto:
>>> On 05/27/2013 05:03 PM, Paolo Bonzini wrote:
>>>> Il 27/05/2013 08:48, Alexey Kardashevskiy ha scritto:
>>>>>>>
>>>>>>> This is only true when the rerror and werror options have the values
>>>>>>> "ignore" or "report".  See virtio-scsi for an example of how to save the
>>>>>>> requests using the save_request and load_request callbacks in SCSIBusInfo.
>>>>>
>>>>> Sigh.
>>>>
>>>> ?
>>>
>>> I thought the series is ready to go but I was wrong. Furthermore when I got
>>> to the point where I could actually test the save/restore for vscsi_req,
>>> migration was totally broken on PPC and it took some time to fix it :-/
>>
>> It is ready.  I was just pointing out that it's not _production_ ready.
> 
> What is the difference then? :)

It is mergeable, but it needs further work and you should be aware of that.

>>> How do you trigger the situation when there are inactive requests which
>>> have to be migrated?
>>
>> You need to trigger an error.  For example, you could use a sparse image
>> on an almost-full partition and let "dd" fill your disk.  Then migrate
>> to another instance of QEMU on the same machine, the destination machine
>> should succeed migration but fail starting the machine.
> 
> Why would it fail? I run "dd", it fills the disk and stops.

You have to make it fill the _host_ disk before it fills the guest disk.
That's why I mentioned a sparse image.

Then the machine pauses with the failing request in its queue.  When you
migrate, the request is migrated as well.

Paolo
Alexey Kardashevskiy - May 31, 2013, 10:33 a.m.
On 05/31/2013 08:26 PM, Paolo Bonzini wrote:
> Il 31/05/2013 12:12, Alexey Kardashevskiy ha scritto:
>> On 05/31/2013 06:18 PM, Paolo Bonzini wrote:
>>> Il 31/05/2013 07:58, Alexey Kardashevskiy ha scritto:
>>>> On 05/27/2013 05:03 PM, Paolo Bonzini wrote:
>>>>> Il 27/05/2013 08:48, Alexey Kardashevskiy ha scritto:
>>>>>>>>
>>>>>>>> This is only true when the rerror and werror options have the values
>>>>>>>> "ignore" or "report".  See virtio-scsi for an example of how to save the
>>>>>>>> requests using the save_request and load_request callbacks in SCSIBusInfo.
>>>>>>
>>>>>> Sigh.
>>>>>
>>>>> ?
>>>>
>>>> I thought the series is ready to go but I was wrong. Furthermore when I got
>>>> to the point where I could actually test the save/restore for vscsi_req,
>>>> migration was totally broken on PPC and it took some time to fix it :-/
>>>
>>> It is ready.  I was just pointing out that it's not _production_ ready.
>>
>> What is the difference then? :)
> 
> It is mergeable, but it needs further work and you should be aware of that.
> 
>>>> How do you trigger the situation when there are inactive requests which
>>>> have to be migrated?
>>>
>>> You need to trigger an error.  For example, you could use a sparse image
>>> on an almost-full partition and let "dd" fill your disk.  Then migrate
>>> to another instance of QEMU on the same machine, the destination machine
>>> should succeed migration but fail starting the machine.
>>
>> Why would it fail? I run "dd", it fills the disk and stops.
> 
> You have to make it fill the _host_ disk before it fills the guest disk.
>  That's why I mentioned a sparse image.
> 
> Then the machine pauses with the failing request in its queue.


Does the machine pause automatically in such a case? I did not know that;
now it makes sense. Thanks.


> When you migrate, the request is migrated as well.
Paolo Bonzini - May 31, 2013, 10:34 a.m.
On 31/05/2013 12:33, Alexey Kardashevskiy wrote:
>>>>> How do you trigger the situation when there are inactive requests which
>>>>> have to be migrated?
>>>>
>>>> You need to trigger an error.  For example, you could use a sparse image
>>>> on an almost-full partition and let "dd" fill your disk.  Then migrate
>>>> to another instance of QEMU on the same machine, the destination machine
>>>> should succeed migration but fail starting the machine.
>>>
>>> Why would it fail? I run "dd", it fills the disk and stops.
>>
>> You have to make it fill the _host_ disk before it fills the guest disk.
>>  That's why I mentioned a sparse image.
>>
>> Then the machine pauses with the failing request in its queue.
> 
> Does the machine pause automatically in such case? Did not know that, now
> it makes sense. Thanks.

That's the point of rerror=stop/werror=stop. :)

Paolo
Paolo Bonzini - May 31, 2013, 10:41 a.m.
On 31/05/2013 12:25, Alexey Kardashevskiy wrote:
> On 05/31/2013 08:07 PM, Benjamin Herrenschmidt wrote:
>> On Fri, 2013-05-31 at 15:58 +1000, Alexey Kardashevskiy wrote:
>>>
>>> And another question (sorry I am not very familiar with terminology but
>>> cc:Ben is :) ) - what happens with indirect requests if migration happened
>>> in the middle of handling such a request? virtio-scsi does not seem to
>>> handle this situation anyhow, it just reconstructs the whole request and
>>> that's it.
>>
>> So Paolo, the crux of the question here is really whether we have any
>> guarantee about the state of the request when this happens (by this I
>> mean a save happening with requests still "in flight") ?
>>
>> IE. Can the request can be at any stage of processing, with the data
>> transfer phase being half way through, or do we somewhat know for sure
>> that the request will *not* have started transferring any data ?
>>
>> This is key, because in the latter case, all we really need to do is
>> save the request itself, and re-parse it on restore as if it was
>> new really (at least from a DMA descriptor perspective).
>>
>> However, if the data transfer is already half way through, we need to
>> somewhat save the state of the data transfer machinery, ie. the position
>> of the "cursor" that follows the guest-provided DMA descriptor list,
>> etc... (which isn't *that* trivial since we have a concept of indirect
>> descriptors and we use pointers to follow them, so we'd probably have
>> to re-walk the whole user descriptors list until we reach the same position).

It may be halfway through, but it is always restarted on the destination.

virtio-scsi parses the whole descriptor chain upfront and sends the
guest addresses in the migration stream.

> Is not it the same QEMU thread which handles hcalls and QEMU console
> commands so the migration cannot stop parsing/handling a vscsi_req?

The VM is paused and I/O is flushed at the point when the reqs are sent.
That's why you couldn't get a pending request.  Only failed requests
remain in the queue.

Paolo
Benjamin Herrenschmidt - June 1, 2013, 12:01 a.m.
On Fri, 2013-05-31 at 12:41 +0200, Paolo Bonzini wrote:

> It may be halfway through, but it is always restarted on the destination.

"Restarted" as in the whole transfer (if any) is restarted, right? So we
can essentially consider it as a new request for which we just did
scsi_req_enqueue()?

IE. We don't do direct DMA to guest pages just yet (we still do copies)
so basically our process is:

 1- Obtain request from guest
 2- Queue it (scsi_req_enqueue)
 3- No transfer -> go away (completion is called)
 4- Pre-process user descriptors (check desc type direct vs indirect,
    position our "cursor" walking them etc....)
 5- scsi_req_continue()
    .../... loop of callbacks & transfer

Now from what you say, I assume that regardless of the point where
the request was, when we "resume" it will always be at step 4 ?

IE. I can just pre-process the descriptors again ? (I actually need
to transfer them again from the guest since I suspect I clobber them
at the very least due to byteswap) and call scsi_req_continue() and
assume the transfer (if any) started from the beginning ?

> virtio-scsi parses the whole descriptor chain upfront and sends the
> guest addresses in the migration stream.
> 
> > Is not it the same QEMU thread which handles hcalls and QEMU console
> > commands so the migration cannot stop parsing/handling a vscsi_req?
> 
> The VM is paused and I/O is flushed at the point when the reqs are sent.
>  That's why you couldn't get a pending request.  Only failed requests
> remain in queue.

Ben.
Alexey Kardashevskiy - June 3, 2013, 5:46 a.m.
On 05/31/2013 08:41 PM, Paolo Bonzini wrote:
> Il 31/05/2013 12:25, Alexey Kardashevskiy ha scritto:
>> On 05/31/2013 08:07 PM, Benjamin Herrenschmidt wrote:
>>> On Fri, 2013-05-31 at 15:58 +1000, Alexey Kardashevskiy wrote:
>>>>
>>>> And another question (sorry I am not very familiar with terminology but
>>>> cc:Ben is :) ) - what happens with indirect requests if migration happened
>>>> in the middle of handling such a request? virtio-scsi does not seem to
>>>> handle this situation anyhow, it just reconstructs the whole request and
>>>> that's it.
>>>
>>> So Paolo, the crux of the question here is really whether we have any
>>> guarantee about the state of the request when this happens (by this I
>>> mean a save happening with requests still "in flight") ?
>>>
>>> IE. Can the request can be at any stage of processing, with the data
>>> transfer phase being half way through, or do we somewhat know for sure
>>> that the request will *not* have started transferring any data ?
>>>
>>> This is key, because in the latter case, all we really need to do is
>>> save the request itself, and re-parse it on restore as if it was
>>> new really (at least from a DMA descriptor perspective).
>>>
>>> However, if the data transfer is already half way through, we need to
>>> somewhat save the state of the data transfer machinery, ie. the position
>>> of the "cursor" that follows the guest-provided DMA descriptor list,
>>> etc... (which isn't *that* trivial since we have a concept of indirect
>>> descriptors and we use pointers to follow them, so we'd probably have
>>> to re-walk the whole user descriptors list until we reach the same position).
> 
> It may be halfway through, but it is always restarted on the destination.
> 
> virtio-scsi parses the whole descriptor chain upfront and sends the
> guest addresses in the migration stream.
> 
>> Is not it the same QEMU thread which handles hcalls and QEMU console
>> commands so the migration cannot stop parsing/handling a vscsi_req?
> 
> The VM is paused and I/O is flushed at the point when the reqs are sent.
>  That's why you couldn't get a pending request.  Only failed requests
> remain in queue.


Ok. I implemented {save|load}_request for IBMVSCSI and started testing. The
destination system is very unstable: sometimes it faults in _raw_spin_lock,
or it looks okay but any attempt to read the filesystem leads to 100% CPU
load in the QEMU process and no response from the guest.

I tried virtio-scsi as well (as it was referred to as a good example); it
fails in exactly the same way. So I started wondering - when did you try it
last time? :)

My test is:
1. create qcow2 image 8GB, put it to 2GB USB disk.
2. put 1.8GB "dummy" image onto the same USB disk.
3. run qemu with qcow2 image.
4. do "mkfs.ext4 /dev/sda" in the guest. It creates a 300MB file when there
is enough space.
5. wait till the source qemu gets stopped due to io error (info status
confirms this).
6. migrate.
7. remove "dummy".
8. "c"ontinue in the destination guest.

Is it good/bad/ugly? What do I miss? Thanks!
Paolo Bonzini - June 3, 2013, 6:21 a.m.
On 01/06/2013 02:01, Benjamin Herrenschmidt wrote:
> On Fri, 2013-05-31 at 12:41 +0200, Paolo Bonzini wrote:
> 
>> It may be halfway through, but it is always restarted on the destination.
> 
> "restarted" as in the whole transfer is restarted if any right ? So we
> can essentially consider as a new request for which we just did
> scsi_req_enqueue() ?
> 
> IE. We don't do direct DMA to guest pages just yet (we still do copies)
> so basically our process is:
> 
>  1- Obtain request from guest
>  2- Queue it (scsi_req_enqueue)
>  3- No transfer -> go away (completion is called)
>  4- Pre-process user descriptors (check desc type direct vs indirect,
>     position our "cursor" walking them etc....)
>  5- scsi_req_continue()
>     .../... loop of callbacks & transfer
> 
> Now from what you say, I assume that regardless of the point where
> the request was, when we "resume" it will always be at step 4 ?
> 
> IE. I can just pre-process the descriptors again ? (I actually need
> to transfer them again from the guest since I suspect I clobber them
> at the very least due to byteswap) and call scsi_req_continue() and
> assume the transfer (if any) started from the beginning ?

Yes.  Unless the spec somehow lets the guest figure out the point at
which the whole chain has been pre-processed, and lets the guest modify
the chain at this point.  But if that's not the case, you can do that.
Memory has already been loaded when load_request runs.

Paolo
Paolo Bonzini - June 3, 2013, 6:23 a.m.
On 03/06/2013 07:46, Alexey Kardashevskiy wrote:
> Ok. I implemented {save|load}_request for IBMVSCSI and started testing - the
> destination system behaves very unstably: sometimes there is a fault in
> _raw_spin_lock, or it looks okay but any attempt to read the filesystem
> leads to 100% cpu load in the qemu process and no response from the guest.
> 
> I tried virtio-scsi as well (as it was referred to as a good example), and it
> fails in exactly the same way. So I started wondering - when did you try it
> last time? :)

Perhaps a year ago.  Gerd must have tested usb-storage later than that
though.

> My test is:
> 1. create a qcow2 image of 8GB and put it on a 2GB USB disk.
> 2. put a 1.8GB "dummy" image onto the same USB disk.
> 3. run qemu with the qcow2 image.
> 4. do "mkfs.ext4 /dev/sda" in the guest. It creates a 300MB file when there
> is enough space.
> 5. wait till the source qemu gets stopped due to an I/O error ("info status"
> confirms this).
> 6. migrate.
> 7. remove the "dummy" image.
> 8. "c"ontinue in the destination guest.

Sounds good.  We really need testcases for this.  I'll take a look when
I come back from vacation.

Paolo
Benjamin Herrenschmidt - June 3, 2013, 8:07 a.m.
On Mon, 2013-06-03 at 15:46 +1000, Alexey Kardashevskiy wrote:
> Ok. I implemented {save|load}_request for IBMVSCSI and started testing - the
> destination system behaves very unstably: sometimes there is a fault in
> _raw_spin_lock, or it looks okay but any attempt to read the filesystem
> leads to 100% cpu load in the qemu process and no response from the guest.
> 
> I tried virtio-scsi as well (as it was referred to as a good example), and it
> fails in exactly the same way. So I started wondering - when did you try it
> last time? :)

Did you try virtio-blk or even a ramdisk? I.e., make sure the problem
isn't some kind of generic migration issue unrelated to storage?

Cheers,
Ben.
Alexey Kardashevskiy - June 3, 2013, 9:37 a.m.
On 06/03/2013 06:07 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2013-06-03 at 15:46 +1000, Alexey Kardashevskiy wrote:
>> Ok. I implemented {save|load}_request for IBMVSCSI and started testing - the
>> destination system behaves very unstably: sometimes there is a fault in
>> _raw_spin_lock, or it looks okay but any attempt to read the filesystem
>> leads to 100% cpu load in the qemu process and no response from the guest.
>>
>> I tried virtio-scsi as well (as it was referred to as a good example), and it
>> fails in exactly the same way. So I started wondering - when did you try it
>> last time? :)
> 
> Did you try virtio-blk or even a ramdisk? I.e., make sure the problem
> isn't some kind of generic migration issue unrelated to storage?


False alarm. During multiple switches between different git branches, I
lost my own patch which disables "bulk" migration (which we want to
revert anyway; we are just waiting for the author to do that himself) :)

At least my test does not fail any more. Sorry for the confusion.
Paolo Bonzini - June 3, 2013, 9:41 a.m.
On 03/06/2013 11:37, Alexey Kardashevskiy wrote:
> On 06/03/2013 06:07 PM, Benjamin Herrenschmidt wrote:
>> On Mon, 2013-06-03 at 15:46 +1000, Alexey Kardashevskiy wrote:
>>> Ok. I implemented {save|load}_request for IBMVSCSI and started testing - the
>>> destination system behaves very unstably: sometimes there is a fault in
>>> _raw_spin_lock, or it looks okay but any attempt to read the filesystem
>>> leads to 100% cpu load in the qemu process and no response from the guest.
>>>
>>> I tried virtio-scsi as well (as it was referred to as a good example), and it
>>> fails in exactly the same way. So I started wondering - when did you try it
>>> last time? :)
>>
>> Did you try virtio-blk or even a ramdisk? I.e., make sure the problem
>> isn't some kind of generic migration issue unrelated to storage?
> 
> 
> False alarm. During multiple switches between different git branches, I
> lost my own patch which disables "bulk" migration (which we want to
> revert anyway; we are just waiting for the author to do that himself) :)
> 
> At least my test does not fail any more. Sorry for the confusion.

Good, I was surprised.  That area doesn't see much testing, but it
hasn't seen much change either.

Paolo

Patch

diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
index 3d322d5..f416871 100644
--- a/hw/scsi/spapr_vscsi.c
+++ b/hw/scsi/spapr_vscsi.c
@@ -954,6 +954,33 @@  static Property spapr_vscsi_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static void spapr_vscsi_pre_save(void *opaque)
+{
+    VSCSIState *s = opaque;
+    int i;
+
+    /* Can't save active requests, apparently the general SCSI code
+     * quiesces the queue for us on vmsave */
+    for (i = 0; i < VSCSI_REQ_LIMIT; i++) {
+        assert(!s->reqs[i].active);
+    }
+}
+
+static const VMStateDescription vmstate_spapr_vscsi = {
+    .name = "spapr_vscsi",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .pre_save = spapr_vscsi_pre_save,
+    .fields      = (VMStateField []) {
+        VMSTATE_SPAPR_VIO(vdev, VSCSIState),
+        /* VSCSI state */
+        /* ???? */
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -968,6 +995,7 @@  static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
     k->signal_mask = 0x00000001;
     dc->props = spapr_vscsi_properties;
     k->rtce_window_size = 0x10000000;
+    dc->vmsd = &vmstate_spapr_vscsi;
 }
 
 static const TypeInfo spapr_vscsi_info = {