diff mbox

[v2,2/5] Fix migration in case of scsi-generic

Message ID 1431107242-31947-3-git-send-email-dimara@arrikto.com
State New
Headers show

Commit Message

Dimitris Aragiorgis May 8, 2015, 5:47 p.m. UTC
During migration, QEMU uses fsync()/fdatasync() on the open file
descriptor for read-write block devices to flush data just before
stopping the VM.

However, fsync() on a scsi-generic device returns -EINVAL which
causes the migration to fail. This patch skips flushing data in case
of an SG device, since submitting SCSI commands directly via an SG
character device (e.g. /dev/sg0) bypasses the page cache completely,
anyway.

Remove the bdrv_is_sg() test from iscsi_co_flush() since this is
now redundant (we flush the underlying protocol at the end
of bdrv_co_flush() which, with this patch, we never reach).

Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>
---
 block/io.c    |    2 +-
 block/iscsi.c |    4 ----
 2 files changed, 1 insertion(+), 5 deletions(-)

Comments

Kevin Wolf May 11, 2015, 10:12 a.m. UTC | #1
Am 08.05.2015 um 19:47 hat Dimitris Aragiorgis geschrieben:
> During migration, QEMU uses fsync()/fdatasync() on the open file
> descriptor for read-write block devices to flush data just before
> stopping the VM.
> 
> However, fsync() on a scsi-generic device returns -EINVAL which
> causes the migration to fail. This patch skips flushing data in case
> of an SG device, since submitting SCSI commands directly via an SG
> character device (e.g. /dev/sg0) bypasses the page cache completely,
> anyway.

fsync() doesn't only flush the kernel page cache, but also the disk
cache. The full justification includes at least that the scsi-generic
device never sends flushes, and for migration it's assumed that the same
SCSI device is used by the destination host (rather than another way to
access the same backing storage). If this were not enough, we would have
to send a SCSI flush command.

On another note, I expect we still neglect to check bs_is_sg() in too
many places. Any monitor command that does normal I/O on such a BDS
should actually fail.

> Remove the bdrv_is_sg() test from iscsi_co_flush() since this is
> now redundant (we flush the underlying protocol at the end
> of bdrv_co_flush() which, with this patch, we never reach).
> 
> Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>
> ---
>  block/io.c    |    2 +-
>  block/iscsi.c |    4 ----
>  2 files changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 1ce62c4..d248a4d 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2231,7 +2231,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>  {
>      int ret;
>  
> -    if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
> +    if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) || bdrv_is_sg(bs)) {

This line exceeds 80 characters.

Kevin
Dimitris Aragiorgis May 14, 2015, 11:02 a.m. UTC | #2
Hello Kevin,

* Kevin Wolf <kwolf@redhat.com> [2015-05-11 12:12:23 +0200]:

> Am 08.05.2015 um 19:47 hat Dimitris Aragiorgis geschrieben:
> > During migration, QEMU uses fsync()/fdatasync() on the open file
> > descriptor for read-write block devices to flush data just before
> > stopping the VM.
> > 
> > However, fsync() on a scsi-generic device returns -EINVAL which
> > causes the migration to fail. This patch skips flushing data in case
> > of an SG device, since submitting SCSI commands directly via an SG
> > character device (e.g. /dev/sg0) bypasses the page cache completely,
> > anyway.
> 
> fsync() doesn't only flush the kernel page cache, but also the disk
> cache. The full justification includes at least that the scsi-generic
> device never sends flushes, and for migration it's assumed that the same
> SCSI device is used by the destination host (rather than another way to
> access the same backing storage). If this were not enough, we would have
> to send a SCSI flush command.
> 

If I understand correctly you mean that QEMU will not issue any
SCSI SYNCHRONIZE CACHE (10) command and thus the disk cache does not get
flushed during migration. This requires, as you say, that the same SCSI
device is used by the destination host.

Note taken. I have added your comment in the commit message of the
corresponding patch in v3 just sent to the list. Thanks for clarifying
this.

Also note that QEMU seems to flush the disk cache explicitly in case of
iSCSI, using iscsi_synchronizecache10_task() from libiscsi inside
iscsi_co_flush().

Perhaps we can also extend this approach to scsi-generic e.g. by calling
sg_ll_sync_cache_10() from libsgutils2. This is a distinct, orthogonal
patch that could be added in the future.

> On another note, I expect we still neglect to check bs_is_sg() in too
> many places. Any monitor command that does normal I/O on such a BDS
> should actually fail.
> 
> > Remove the bdrv_is_sg() test from iscsi_co_flush() since this is
> > now redundant (we flush the underlying protocol at the end
> > of bdrv_co_flush() which, with this patch, we never reach).
> > 
> > Signed-off-by: Dimitris Aragiorgis <dimara@arrikto.com>
> > ---
> >  block/io.c    |    2 +-
> >  block/iscsi.c |    4 ----
> >  2 files changed, 1 insertion(+), 5 deletions(-)
> > 
> > diff --git a/block/io.c b/block/io.c
> > index 1ce62c4..d248a4d 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -2231,7 +2231,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
> >  {
> >      int ret;
> >  
> > -    if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
> > +    if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) || bdrv_is_sg(bs)) {
> 
> This line exceeds 80 characters.
> 
> Kevin
Paolo Bonzini May 14, 2015, 1:21 p.m. UTC | #3
On 14/05/2015 13:02, Dimitris Aragiorgis wrote:
> Also note that QEMU seems to flush the disk cache explicitly in
> case of iSCSI, using iscsi_synchronizecache10_task() from libiscsi
> inside iscsi_co_flush().

Right, QEMU does that if type is DISK only.

> Perhaps we can also extend this approach to scsi-generic e.g. by
> calling sg_ll_sync_cache_10() from libsgutils2. This is a distinct,
> orthogonal patch that could be added in the future.

Yes.  But do not use libsgutils2, just do it manually in QEMU.  See
for example get_stream_blocksize in hw/scsi/scsi-generic.c.

Paolo
diff mbox

Patch

diff --git a/block/io.c b/block/io.c
index 1ce62c4..d248a4d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2231,7 +2231,7 @@  int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 {
     int ret;
 
-    if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
+    if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) || bdrv_is_sg(bs)) {
         return 0;
     }
 
diff --git a/block/iscsi.c b/block/iscsi.c
index 502a81f..965978b 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -627,10 +627,6 @@  static int coroutine_fn iscsi_co_flush(BlockDriverState *bs)
     IscsiLun *iscsilun = bs->opaque;
     struct IscsiTask iTask;
 
-    if (bdrv_is_sg(bs)) {
-        return 0;
-    }
-
     if (!iscsilun->force_next_flush) {
         return 0;
     }