Message ID: 1367545092-19980-8-git-send-email-david@gibson.dropbear.id.au
State: New
On 03/05/2013 03:38, David Gibson wrote:
> This patch adds the necessary support for saving the state of the PAPR VIO
> virtual SCSI device. This turns out to be trivial, because the generic
> SCSI code already quiesces the attached virtual SCSI bus.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/scsi/spapr_vscsi.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
> index 3d322d5..f416871 100644
> --- a/hw/scsi/spapr_vscsi.c
> +++ b/hw/scsi/spapr_vscsi.c
> @@ -954,6 +954,33 @@ static Property spapr_vscsi_properties[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>
> +static void spapr_vscsi_pre_save(void *opaque)
> +{
> +    VSCSIState *s = opaque;
> +    int i;
> +
> +    /* Can't save active requests, apparently the general SCSI code
> +     * quiesces the queue for us on vmsave */
> +    for (i = 0; i < VSCSI_REQ_LIMIT; i++) {
> +        assert(!s->reqs[i].active);
> +    }
> +}

This is only true when the rerror and werror options have the values
"ignore" or "report". See virtio-scsi for an example of how to save the
requests using the save_request and load_request callbacks in
SCSIBusInfo.

Paolo

> +static const VMStateDescription vmstate_spapr_vscsi = {
> +    .name = "spapr_vscsi",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .pre_save = spapr_vscsi_pre_save,
> +    .fields = (VMStateField []) {
> +        VMSTATE_SPAPR_VIO(vdev, VSCSIState),
> +        /* VSCSI state */
> +        /* ???? */
> +
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -968,6 +995,7 @@ static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
>      k->signal_mask = 0x00000001;
>      dc->props = spapr_vscsi_properties;
>      k->rtce_window_size = 0x10000000;
> +    dc->vmsd = &vmstate_spapr_vscsi;
>  }
>
>  static const TypeInfo spapr_vscsi_info = {
On Mon, May 06, 2013 at 09:37:11AM +0200, Paolo Bonzini wrote:
> On 03/05/2013 03:38, David Gibson wrote:
> > This patch adds the necessary support for saving the state of the PAPR VIO
> > virtual SCSI device. This turns out to be trivial, because the generic
> > SCSI code already quiesces the attached virtual SCSI bus.
> > [...]
> >
> > +static void spapr_vscsi_pre_save(void *opaque)
> > +{
> > +    VSCSIState *s = opaque;
> > +    int i;
> > +
> > +    /* Can't save active requests, apparently the general SCSI code
> > +     * quiesces the queue for us on vmsave */
> > +    for (i = 0; i < VSCSI_REQ_LIMIT; i++) {
> > +        assert(!s->reqs[i].active);
> > +    }
> > +}
>
> This is only true when the rerror and werror options have the values
> "ignore" or "report". See virtio-scsi for an example of how to save the
> requests using the save_request and load_request callbacks in
> SCSIBusInfo.

Ah, bother. Unfortunately the save request is quite a lot more
complicated for vscsi, since we have a lot more private data, and I'm
not sure which bits can be reconstructed from other information. I'll
see what I can come up with.

What guarantees _does_ the scsi layer give about the lifecycle state of
the requests when we savevm?
On 05/06/2013 05:37 PM, Paolo Bonzini wrote:
> On 03/05/2013 03:38, David Gibson wrote:
>> This patch adds the necessary support for saving the state of the PAPR VIO
>> virtual SCSI device. This turns out to be trivial, because the generic
>> SCSI code already quiesces the attached virtual SCSI bus.
>> [...]
>
> This is only true when the rerror and werror options have the values
> "ignore" or "report". See virtio-scsi for an example of how to save the
> requests using the save_request and load_request callbacks in SCSIBusInfo.

Sigh.

How do you test that requests are saved/restored correctly? What happens
to requests which were already sent to the real hardware (real block
device, etc) but have not completed at the moment of the end of
migration?
On 27/05/2013 08:48, Alexey Kardashevskiy wrote:
>> This is only true when the rerror and werror options have the values
>> "ignore" or "report". See virtio-scsi for an example of how to save the
>> requests using the save_request and load_request callbacks in SCSIBusInfo.
>
> Sigh.

?

> How do you test that requests are saved/restored correctly? What happens
> to requests which were already sent to the real hardware (real block
> device, etc) but have not completed at the moment of the end of migration?

They aren't saved, there is a bdrv_drain_all() in the migration code.

This is only used for rerror=stop or werror=stop. To test it you can
use blkdebug (also a bit underdocumented) or hack block/raw-posix.c with
code that makes it fail the 100th write or something like that. Start
the VM and migrate it while paused to a QEMU that doesn't have the hack.

Paolo
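[Editor's note: Paolo's blkdebug suggestion can be sketched roughly as below. This is a minimal sketch, not a command from the thread: the rules-file name and image path are made up, and it assumes blkdebug's documented `[inject-error]` syntax with the `write_aio` event and errno 28 (ENOSPC).]

```shell
# Hypothetical blkdebug rules file: inject ENOSPC on one write, so that
# with werror=stop the failing request stays queued in the device and
# must be migrated.
cat > blkdebug.conf <<'EOF'
[inject-error]
event = "write_aio"
errno = "28"
once = "on"
EOF

# The drive would then be opened through the blkdebug protocol, e.g.:
echo '-drive file=blkdebug:blkdebug.conf:guest.img,if=none,id=dddrive0,werror=stop,rerror=stop'
```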
On 05/27/2013 05:03 PM, Paolo Bonzini wrote:
> On 27/05/2013 08:48, Alexey Kardashevskiy wrote:
>> Sigh.
>
> ?

I thought the series is ready to go but I was wrong. Furthermore, when I
got to the point where I could actually test the save/restore for
vscsi_req, migration was totally broken on PPC and it took some time to
fix it :-/

> They aren't saved, there is a bdrv_drain_all() in the migration code.
>
> This is only used for rerror=stop or werror=stop. To test it you can
> use blkdebug (also a bit underdocumented) or hack block/raw-posix.c with
> code that makes it fail the 100th write or something like that. Start
> the VM and migrate it while paused to a QEMU that doesn't have the hack.

I run QEMU as (this is the destination, the source just does not have
-incoming):

./qemu-system-ppc64 \
 -L "qemu-ppc64-bios/" \
 -device "spapr-vscsi,id=ibmvscsi0" \
 -drive "file=virtimg/fc18guest,if=none,id=dddrive0,readonly=off,format=blkdebug,media=disk,werror=stop,rerror=stop" \
 -device "scsi-disk,id=scsidisk0,bus=ibmvscsi0.0,channel=0,scsi-id=0,lun=0,drive=dddrive0,removable=off" \
 -incoming "tcp:localhost:4000" \
 -m "1024" \
 -machine "pseries" \
 -nographic \
 -vga "none" \
 -enable-kvm

Am I using werror/rerror correctly?

I did not really understand how to use blkdebug or what else to hack in
raw-posix, but the point is I cannot get QEMU into a state with at least
one vscsi_req.active==1; they are always inactive no matter what I do. I
run 10 instances of "dd if=/dev/sda of=/dev/null bs=4K" (on an 8GB image
with FC18) and increase the migration speed to 500MB/s, with no effect.

How do you trigger the situation when there are inactive requests which
have to be migrated?

And another question (sorry I am not very familiar with terminology but
cc:Ben is :) ) - what happens with indirect requests if migration
happened in the middle of handling such a request? virtio-scsi does not
seem to handle this situation anyhow, it just reconstructs the whole
request and that's it.
On 31/05/2013 07:58, Alexey Kardashevskiy wrote:
> I thought the series is ready to go but I was wrong. Furthermore, when I
> got to the point where I could actually test the save/restore for
> vscsi_req, migration was totally broken on PPC and it took some time to
> fix it :-/

It is ready. I was just pointing out that it's not _production_ ready.
(Sorry, I'm unusually terse these days).

> Am I using werror/rerror correctly?

Yes.

> I did not really understand how to use blkdebug or what else to hack in
> raw-posix, but the point is I cannot get QEMU into a state with at least
> one vscsi_req.active==1; they are always inactive no matter what I do. I
> run 10 instances of "dd if=/dev/sda of=/dev/null bs=4K" (on an 8GB image
> with FC18) and increase the migration speed to 500MB/s, with no effect.

No, that doesn't help.

> How do you trigger the situation when there are inactive requests which
> have to be migrated?

You need to trigger an error. For example, you could use a sparse image
on an almost-full partition and let "dd" fill your disk. Then migrate to
another instance of QEMU on the same machine; the destination machine
should succeed migration but fail starting the machine. When you free
space on that partition and "cont" on the destination, it should resume.

> And another question (sorry I am not very familiar with terminology but
> cc:Ben is :) ) - what happens with indirect requests if migration
> happened in the middle of handling such a request? virtio-scsi does not
> seem to handle this situation anyhow, it just reconstructs the whole
> request and that's it.

What are indirect requests?

Paolo
On Fri, 2013-05-31 at 15:58 +1000, Alexey Kardashevskiy wrote:
> And another question (sorry I am not very familiar with terminology but
> cc:Ben is :) ) - what happens with indirect requests if migration
> happened in the middle of handling such a request? virtio-scsi does not
> seem to handle this situation anyhow, it just reconstructs the whole
> request and that's it.

So Paolo, the crux of the question here is really whether we have any
guarantee about the state of the request when this happens (by this I
mean a save happening with requests still "in flight")?

IE. Can the request be at any stage of processing, with the data
transfer phase being half way through, or do we somewhat know for sure
that the request will *not* have started transferring any data?

This is key, because in the latter case, all we really need to do is
save the request itself, and re-parse it on restore as if it was new
really (at least from a DMA descriptor perspective).

However, if the data transfer is already half way through, we need to
somewhat save the state of the data transfer machinery, ie. the position
of the "cursor" that follows the guest-provided DMA descriptor list,
etc. (which isn't *that* trivial since we have a concept of indirect
descriptors and we use pointers to follow them, so we'd probably have to
re-walk the whole user descriptor list until we reach the same
position).

Cheers,
Ben.
On 05/31/2013 06:18 PM, Paolo Bonzini wrote:
> On 31/05/2013 07:58, Alexey Kardashevskiy wrote:
>> I thought the series is ready to go but I was wrong. [...]
>
> It is ready. I was just pointing out that it's not _production_ ready.

What is the difference then? :)

> (Sorry, I'm unusually terse these days).
>
>> How do you trigger the situation when there are inactive requests which
>> have to be migrated?
>
> You need to trigger an error. For example, you could use a sparse image
> on an almost-full partition and let "dd" fill your disk. Then migrate to
> another instance of QEMU on the same machine; the destination machine
> should succeed migration but fail starting the machine.

Why would it fail? I run "dd", it fills the disk and stops. Then I
migrate. How can I get a pending SCSI request in the queue? Sorry, I am
definitely missing something.

> When you free space on that partition and "cont" on the destination, it
> should resume.
>
>> And another question [...]
>
> What are indirect requests?

I'll leave it to Ben to avoid confusion :)
On 05/31/2013 08:07 PM, Benjamin Herrenschmidt wrote:
> So Paolo, the crux of the question here is really whether we have any
> guarantee about the state of the request when this happens (by this I
> mean a save happening with requests still "in flight")?
>
> IE. Can the request be at any stage of processing, with the data
> transfer phase being half way through, or do we somewhat know for sure
> that the request will *not* have started transferring any data?
>
> [...]
>
> However, if the data transfer is already half way through, we need to
> somewhat save the state of the data transfer machinery, ie. the position
> of the "cursor" that follows the guest-provided DMA descriptor list,
> etc.

Is it not the same QEMU thread which handles hcalls and QEMU console
commands, so the migration cannot stop parsing/handling a vscsi_req?
On 31/05/2013 12:12, Alexey Kardashevskiy wrote:
>> It is ready. I was just pointing out that it's not _production_ ready.
>
> What is the difference then? :)

It is mergeable, but it needs further work and you should be aware of
that.

>> You need to trigger an error. For example, you could use a sparse image
>> on an almost-full partition and let "dd" fill your disk. [...]
>
> Why would it fail? I run "dd", it fills the disk and stops.

You have to make it fill the _host_ disk before it fills the guest disk.
That's why I mentioned a sparse image.

Then the machine pauses with the failing request in its queue. When you
migrate, the request is migrated as well.

Paolo
On 05/31/2013 08:26 PM, Paolo Bonzini wrote:
> You have to make it fill the _host_ disk before it fills the guest disk.
> That's why I mentioned a sparse image.
>
> Then the machine pauses with the failing request in its queue.

Does the machine pause automatically in such a case? Did not know that,
now it makes sense. Thanks.

> When you migrate, the request is migrated as well.
On 31/05/2013 12:33, Alexey Kardashevskiy wrote:
>> Then the machine pauses with the failing request in its queue.
>
> Does the machine pause automatically in such a case? Did not know that,
> now it makes sense. Thanks.

That's the point of rerror=stop/werror=stop. :)

Paolo
On 31/05/2013 12:25, Alexey Kardashevskiy wrote:
> On 05/31/2013 08:07 PM, Benjamin Herrenschmidt wrote:
>> So Paolo, the crux of the question here is really whether we have any
>> guarantee about the state of the request when this happens (by this I
>> mean a save happening with requests still "in flight")?
>>
>> IE. Can the request be at any stage of processing, with the data
>> transfer phase being half way through, or do we somewhat know for sure
>> that the request will *not* have started transferring any data?
>>
>> [...]

It may be halfway through, but it is always restarted on the
destination.

virtio-scsi parses the whole descriptor chain upfront and sends the
guest addresses in the migration stream.

> Is it not the same QEMU thread which handles hcalls and QEMU console
> commands, so the migration cannot stop parsing/handling a vscsi_req?

The VM is paused and I/O is flushed at the point when the reqs are sent.
That's why you couldn't get a pending request. Only failed requests
remain in the queue.

Paolo
On Fri, 2013-05-31 at 12:41 +0200, Paolo Bonzini wrote:
> It may be halfway through, but it is always restarted on the
> destination.

"Restarted" as in the whole transfer is restarted if any, right? So we
can essentially consider it as a new request for which we just did
scsi_req_enqueue()?

IE. We don't do direct DMA to guest pages just yet (we still do copies),
so basically our process is:

 1. Obtain request from guest
 2. Queue it (scsi_req_enqueue)
 3. No transfer -> go away (completion is called)
 4. Pre-process user descriptors (check desc type direct vs indirect,
    position our "cursor" walking them, etc.)
 5. scsi_req_continue()
    .../... loop of callbacks & transfer

Now from what you say, I assume that regardless of the point where the
request was, when we "resume" it will always be at step 4?

IE. I can just pre-process the descriptors again? (I actually need to
transfer them again from the guest since I suspect I clobber them at the
very least due to byteswap) and call scsi_req_continue() and assume the
transfer (if any) started from the beginning?

> virtio-scsi parses the whole descriptor chain upfront and sends the
> guest addresses in the migration stream.
>
> The VM is paused and I/O is flushed at the point when the reqs are sent.
> That's why you couldn't get a pending request. Only failed requests
> remain in the queue.

Ben.
On 05/31/2013 08:41 PM, Paolo Bonzini wrote:
> It may be halfway through, but it is always restarted on the
> destination.
>
> virtio-scsi parses the whole descriptor chain upfront and sends the
> guest addresses in the migration stream.
>
> The VM is paused and I/O is flushed at the point when the reqs are sent.
> That's why you couldn't get a pending request. Only failed requests
> remain in the queue.

Ok. I implemented {save|load}_request for IBMVSCSI and started testing;
the destination system behaves very unstably: sometimes it is a fault in
_raw_spin_lock, or it looks okay but any attempt to read the filesystem
leads to 100% cpu load in the qemu process and no response from the
guest.

I tried virtio-scsi as well (as it was referred to as a good example);
it fails in exactly the same way. So I started wondering - when did you
try it last time? :)

My test is:
1. Create an 8GB qcow2 image, put it on a 2GB USB disk.
2. Put a 1.8GB "dummy" image onto the same USB disk.
3. Run qemu with the qcow2 image.
4. Do "mkfs.ext4 /dev/sda" in the guest. It creates a 300MB file when
   there is enough space.
5. Wait till the source qemu gets stopped due to an I/O error ("info
   status" confirms this).
6. Migrate.
7. Remove "dummy".
8. "c"ontinue in the destination guest.

Is it good/bad/ugly? What do I miss? Thanks!
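[Editor's note: the host-side mechanism behind steps 1-2 of this recipe is that a sparse image with a large apparent size allocates no host blocks until written, so guest writes hit ENOSPC on the host long before the guest's own disk looks full. A minimal sketch, assuming a GNU coreutils host; the file name is hypothetical:]

```shell
# Create an image with an 8G apparent size; no host blocks are
# allocated yet, so it fits on a 2GB disk until the guest writes to it.
truncate -s 8G sparse.img
stat -c 'apparent=%s bytes' sparse.img     # apparent size the guest sees
stat -c 'allocated=%b blocks' sparse.img   # host blocks actually in use
```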
On 01/06/2013 02:01, Benjamin Herrenschmidt wrote:
> "Restarted" as in the whole transfer is restarted if any, right? So we
> can essentially consider it as a new request for which we just did
> scsi_req_enqueue()?
>
> [...]
>
> Now from what you say, I assume that regardless of the point where the
> request was, when we "resume" it will always be at step 4?
>
> IE. I can just pre-process the descriptors again? (I actually need to
> transfer them again from the guest since I suspect I clobber them at the
> very least due to byteswap) and call scsi_req_continue() and assume the
> transfer (if any) started from the beginning?

Yes. Unless the spec somehow lets the guest figure out the point at
which the whole chain has been pre-processed, and lets the guest modify
the chain at this point. But if that's not the case, you can do that.
Memory has already been loaded when load_request runs.

Paolo
On 03/06/2013 07:46, Alexey Kardashevskiy wrote:
> Ok. I implemented {save|load}_request for IBMVSCSI and started testing;
> the destination system behaves very unstably: sometimes it is a fault in
> _raw_spin_lock, or it looks okay but any attempt to read the filesystem
> leads to 100% cpu load in the qemu process and no response from the
> guest.
>
> I tried virtio-scsi as well (as it was referred to as a good example);
> it fails in exactly the same way. So I started wondering - when did you
> try it last time? :)

Perhaps a year ago. Gerd must have tested usb-storage later than that,
though.

> My test is: [...]

Sounds good. We really need testcases for this. I'll take a look when I
come back from vacation.

Paolo
On Mon, 2013-06-03 at 15:46 +1000, Alexey Kardashevskiy wrote:
> Ok. I implemented {save|load}_request for IBMVSCSI and started testing;
> the destination system behaves very unstably: sometimes it is a fault in
> _raw_spin_lock, or it looks okay but any attempt to read the filesystem
> leads to 100% cpu load in the qemu process and no response from the
> guest.
>
> I tried virtio-scsi as well (as it was referred to as a good example);
> it fails in exactly the same way. So I started wondering - when did you
> try it last time? :)

Did you try virtio-blk or even a ramdisk? IE, make sure the problem
isn't some kind of generic migration issue unrelated to storage?

Cheers,
Ben.
On 06/03/2013 06:07 PM, Benjamin Herrenschmidt wrote:
> Did you try virtio-blk or even a ramdisk? IE, make sure the problem
> isn't some kind of generic migration issue unrelated to storage?

False alarm. During multiple switches between different git branches, I
just lost my own patch which disables "bulk" migration (which we want to
revert anyway, just waiting for the author to do that himself) :)

At least my test does not fail any more. Sorry for the confusion.
On 03/06/2013 11:37, Alexey Kardashevskiy wrote:
> False alarm. During multiple switches between different git branches, I
> just lost my own patch which disables "bulk" migration (which we want to
> revert anyway, just waiting for the author to do that himself) :)
>
> At least my test does not fail any more. Sorry for the confusion.

Good, I was surprised. That area doesn't see much testing, but it hasn't
seen much change either.

Paolo
diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
index 3d322d5..f416871 100644
--- a/hw/scsi/spapr_vscsi.c
+++ b/hw/scsi/spapr_vscsi.c
@@ -954,6 +954,33 @@ static Property spapr_vscsi_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static void spapr_vscsi_pre_save(void *opaque)
+{
+    VSCSIState *s = opaque;
+    int i;
+
+    /* Can't save active requests, apparently the general SCSI code
+     * quiesces the queue for us on vmsave */
+    for (i = 0; i < VSCSI_REQ_LIMIT; i++) {
+        assert(!s->reqs[i].active);
+    }
+}
+
+static const VMStateDescription vmstate_spapr_vscsi = {
+    .name = "spapr_vscsi",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .pre_save = spapr_vscsi_pre_save,
+    .fields = (VMStateField []) {
+        VMSTATE_SPAPR_VIO(vdev, VSCSIState),
+        /* VSCSI state */
+        /* ???? */
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -968,6 +995,7 @@ static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
     k->signal_mask = 0x00000001;
     dc->props = spapr_vscsi_properties;
     k->rtce_window_size = 0x10000000;
+    dc->vmsd = &vmstate_spapr_vscsi;
 }
 
 static const TypeInfo spapr_vscsi_info = {
This patch adds the necessary support for saving the state of the PAPR VIO
virtual SCSI device. This turns out to be trivial, because the generic
SCSI code already quiesces the attached virtual SCSI bus.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/scsi/spapr_vscsi.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)