
[2/2] virtio-blk: do not relay a previous driver's WCE configuration to the current

Message ID 1379609334-20811-3-git-send-email-pbonzini@redhat.com
State New

Commit Message

Paolo Bonzini Sept. 19, 2013, 4:48 p.m. UTC
The following sequence happens:
- the SeaBIOS virtio-blk driver does not support the WCE feature, which
causes QEMU to disable writeback caching

- the Linux virtio-blk driver resets the device, finds WCE is available
but writeback caching is disabled; tells block layer to not send cache
flush commands

- the Linux virtio-blk driver sets the DRIVER_OK bit, which causes
writeback caching to be re-enabled, but the Linux virtio-blk driver does
not know of this side effect and cache flushes remain disabled

The bug is at the third step.  If the guest does know about CONFIG_WCE,
QEMU should ignore the WCE feature's state.  The guest will control the
cache mode solely using configuration space.  This change makes Linux
do flushes correctly, but Linux will keep SeaBIOS's writethrough mode.

Hence, whenever the guest is reset, the cache mode of the disk should
be reset to whatever was specified in the "-drive" option.  With this
change, the Linux virtio-blk driver finds that writeback caching is
enabled, and tells the block layer to send cache flush commands
appropriately.
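The sequence above, and the effect of each half of the patch, can be checked with a small C model. This is a sketch under simplifying assumptions, not QEMU code: `struct model_blk`, `model_reset`, `model_set_status` and `model_boot` are hypothetical names, `write_cache` stands in for `bdrv_enable_write_cache()`, and the Linux driver is assumed to decide about flushes by reading the current cache mode from config space, as described in the second step. The two flags switch the `virtio_blk_set_status` and `virtio_blk_reset` changes on or off independently. Only the feature bit numbers come from the virtio-blk specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers from the virtio-blk specification. */
#define VIRTIO_BLK_F_WCE        9
#define VIRTIO_BLK_F_CONFIG_WCE 11

/* Hypothetical model of the device state; not QEMU's VirtIOBlock. */
struct model_blk {
    bool original_wce;   /* cache mode given with -drive */
    bool write_cache;    /* current mode, like bdrv_enable_write_cache() */
    bool fix_set_status; /* apply the virtio_blk_set_status change */
    bool fix_reset;      /* apply the virtio_blk_reset change */
};

/* Device reset; with the fix, go back to the -drive cache mode. */
static void model_reset(struct model_blk *s)
{
    if (s->fix_reset) {
        s->write_cache = s->original_wce;
    }
}

/* DRIVER_OK: derive the cache mode from the negotiated WCE bit, unless
 * the fix is on and the guest negotiated CONFIG_WCE. */
static void model_set_status(struct model_blk *s, uint32_t guest_features)
{
    if (s->fix_set_status &&
        (guest_features & (1u << VIRTIO_BLK_F_CONFIG_WCE))) {
        return; /* the guest controls the mode through config space */
    }
    s->write_cache = !!(guest_features & (1u << VIRTIO_BLK_F_WCE));
}

struct boot_result {
    bool write_cache;   /* final host cache mode */
    bool linux_flushes; /* does Linux send cache flush commands? */
};

/* SeaBIOS, then Linux, on a disk started with cache=writeback. */
static struct boot_result model_boot(bool fix_set_status, bool fix_reset)
{
    struct model_blk s = { true, true, fix_set_status, fix_reset };
    struct boot_result r;

    /* SeaBIOS negotiates neither WCE nor CONFIG_WCE. */
    model_reset(&s);
    model_set_status(&s, 0);

    /* Linux resets the device and reads the cache mode from config
     * space before setting DRIVER_OK with WCE and CONFIG_WCE. */
    model_reset(&s);
    r.linux_flushes = s.write_cache;
    model_set_status(&s, (1u << VIRTIO_BLK_F_WCE) |
                         (1u << VIRTIO_BLK_F_CONFIG_WCE));
    r.write_cache = s.write_cache;
    return r;
}
```

In this model, `model_boot(false, false)` ends with the write cache enabled but flushes disabled (the bug); `model_boot(true, false)` keeps cache mode and flushes consistent but leaves the disk in SeaBIOS's writethrough mode; `model_boot(true, true)` runs Linux in writeback mode with flushes enabled.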

Reported-by: Rusty Russell <rusty@au1.ibm.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/block/virtio-blk.c          | 8 ++++++--
 include/hw/virtio/virtio-blk.h | 1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

Comments

Eric Blake Sept. 19, 2013, 5:42 p.m. UTC | #1
On 09/19/2013 10:48 AM, Paolo Bonzini wrote:
> The following sequence happens:

> Hence, whenever the guest is reset, the cache mode of the disk should
> be reset to whatever was specified in the "-drive" option.  With this
> change, the Linux virtio-blk driver finds that writeback caching is
> enabled, and tells the block layer to send cache flush commands
> appropriately.
> 
> Reported-by: Rusty Russell <rusty@au1.ibm.com>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/block/virtio-blk.c          | 8 ++++++--
>  include/hw/virtio/virtio-blk.h | 1 +
>  2 files changed, 7 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

>  
>      features = vdev->guest_features;
> -    bdrv_set_enable_write_cache(s->bs, !!(features & (1 << VIRTIO_BLK_F_WCE)));
> +    if (!(features & (1 << VIRTIO_BLK_F_CONFIG_WCE))) {
> +        bdrv_set_enable_write_cache(s->bs,
> +                                    !!(features & (1 << VIRTIO_BLK_F_WCE)));
> +    }
>  }
>  
>  static void virtio_blk_save(QEMUFile *f, void *opaque)
> @@ -674,6 +678,7 @@ static int virtio_blk_device_init(VirtIODevice *vdev)
>      }
>  
>      blkconf_serial(&blk->conf, &blk->serial);
> +    s->original_wce = bdrv_enable_write_cache(blk->conf.bs);

At first, I was worried that this does 'bool = int', and wondered whether
that was correct in all cases.  But looking further, bdrv_enable_write_cache
merely returns bs->enable_write_cache (also an int), and all assignments
to bs->enable_write_cache are careful to assign only 0 or 1.  A followup
patch that changes the types to bool might be in order, but it doesn't
invalidate this patch.

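The 'bool = int' question can be illustrated in isolation. The sketch below assumes `bool` is C99's `_Bool` from `<stdbool.h>`; the function names are hypothetical. It shows that converting any non-zero `int` to `bool` yields 1, and that the `!!` idiom used in the patch turns a raw feature-bit test (512 for bit 9) into 0 or 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_F_WCE 9

/* Raw bit test: yields 0 or 512 (1 << 9), not 0 or 1. */
static int wce_bit_raw(uint32_t features)
{
    return features & (1u << VIRTIO_BLK_F_WCE);
}

/* Same test normalized with !!, as in the patch: yields 0 or 1. */
static int wce_bit_normalized(uint32_t features)
{
    return !!(features & (1u << VIRTIO_BLK_F_WCE));
}

/* C99 conversion to bool maps every non-zero value to true (1). */
static bool int_to_bool(int v)
{
    bool b = v;
    return b;
}
```

Because conversion to `bool` already normalizes the value, the 0-or-1 discipline matters mainly where the value stays an `int` and is compared against 1 or `true`.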
Kevin Wolf Sept. 20, 2013, 9:54 a.m. UTC | #2
Am 19.09.2013 um 18:48 hat Paolo Bonzini geschrieben:
> The following sequence happens:
> - the SeaBIOS virtio-blk driver does not support the WCE feature, which
> causes QEMU to disable writeback caching
> 
> - the Linux virtio-blk driver resets the device, finds WCE is available
> but writeback caching is disabled; tells block layer to not send cache
> flush commands
> 
> - the Linux virtio-blk driver sets the DRIVER_OK bit, which causes
> writeback caching to be re-enabled, but the Linux virtio-blk driver does
> not know of this side effect and cache flushes remain disabled
> 
> The bug is at the third step.  If the guest does know about CONFIG_WCE,
> QEMU should ignore the WCE feature's state.  The guest will control the
> cache mode solely using configuration space.  This change makes Linux
> do flushes correctly, but Linux will keep SeaBIOS's writethrough mode.

This sounds fishy. The solution happens to make recent Linux kernels do
the right thing, but wouldn't drivers that don't know CONFIG_WCE still
fall into the same trap? I guess making a host feature flag dynamic was
a bad idea to start with.

Perhaps we should restrict the magic to disabling WCE in case the guest
doesn't have VIRTIO_BLK_F_WCE, but never allow it to enable WCE once
we've already advertised that the host doesn't have WCE.

Kevin
Paolo Bonzini Sept. 20, 2013, 10:28 a.m. UTC | #3
Il 20/09/2013 11:54, Kevin Wolf ha scritto:
> Am 19.09.2013 um 18:48 hat Paolo Bonzini geschrieben:
>> The following sequence happens:
>> - the SeaBIOS virtio-blk driver does not support the WCE feature, which
>> causes QEMU to disable writeback caching
>>
>> - the Linux virtio-blk driver resets the device, finds WCE is available
>> but writeback caching is disabled; tells block layer to not send cache
>> flush commands
>>
>> - the Linux virtio-blk driver sets the DRIVER_OK bit, which causes
>> writeback caching to be re-enabled, but the Linux virtio-blk driver does
>> not know of this side effect and cache flushes remain disabled
>>
>> The bug is at the third step.  If the guest does know about CONFIG_WCE,
>> QEMU should ignore the WCE feature's state.  The guest will control the
>> cache mode solely using configuration space.  This change makes Linux
>> do flushes correctly, but Linux will keep SeaBIOS's writethrough mode.
> 
> This sounds fishy. The solution happens to make recent Linux kernels do
> the right thing, but wouldn't drivers that don't know CONFIG_WCE still
> fall into the same trap?

No, drivers that don't know CONFIG_WCE will do the following:

1) -drive cache=writethrough case, WCE supported

    When the driver resets the device, QEMU disables the write cache
    (virtio_blk_reset).  Thus VIRTIO_BLK_F_WCE is not advertised.
    The Linux virtio-blk driver tells the block layer to not send
    cache flush commands, which is correct because flushes are
    unnecessary in writethrough mode.  VIRTIO_BLK_F_WCE is obviously
    not negotiated, and virtio_blk_set_status keeps the disk in
    writethrough mode.

2) -drive cache=writeback case, WCE supported

    When the driver resets the device, QEMU enables the write cache
    (virtio_blk_reset).  Thus VIRTIO_BLK_F_WCE is advertised by the
    device and negotiated by the driver.  The Linux virtio-blk driver
    recognizes that VIRTIO_BLK_F_WCE is negotiated and tells the block
    layer to send cache flush commands.  virtio_blk_set_status keeps
    the disk in writeback mode.

3) -drive cache=writethrough case, WCE not supported

    When the driver resets the device, QEMU disables the write cache
    (virtio_blk_reset).  Thus VIRTIO_BLK_F_WCE is not advertised.
    The virtio-blk driver doesn't do anything.
    virtio_blk_set_status keeps the disk in writethrough mode.

4) -drive cache=writeback case, WCE not supported

    When the driver resets the device, QEMU enables the write cache
    (virtio_blk_reset).  Thus VIRTIO_BLK_F_WCE is advertised by the
    device, but not negotiated by the driver.
    The virtio-blk driver doesn't do anything.
    virtio_blk_set_status places the disk in writethrough mode.
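The four cases above can be checked mechanically against the patched reset and set_status logic. This is a hypothetical model, not QEMU code (`model_old_driver` and the other names are illustrative). It assumes, as the reply states, that VIRTIO_BLK_F_WCE is advertised only when `-drive` selects writeback, models a driver that never negotiates CONFIG_WCE, and starts each boot from the opposite cache mode to mimic state left behind by a previous driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_F_WCE        9
#define VIRTIO_BLK_F_CONFIG_WCE 11

/* Hypothetical model of the patched device; not QEMU's VirtIOBlock. */
struct model_blk {
    bool original_wce; /* true for cache=writeback, false for writethrough */
    bool write_cache;  /* current cache mode */
};

/* Assumption from the thread: WCE is advertised only in writeback mode. */
static uint32_t model_host_features(const struct model_blk *s)
{
    uint32_t f = 1u << VIRTIO_BLK_F_CONFIG_WCE;
    if (s->original_wce) {
        f |= 1u << VIRTIO_BLK_F_WCE;
    }
    return f;
}

/* Patched reset: restore the -drive cache mode. */
static void model_reset(struct model_blk *s)
{
    s->write_cache = s->original_wce;
}

/* Patched set_status: without CONFIG_WCE, the WCE bit decides. */
static void model_set_status(struct model_blk *s, uint32_t guest_features)
{
    if (!(guest_features & (1u << VIRTIO_BLK_F_CONFIG_WCE))) {
        s->write_cache = !!(guest_features & (1u << VIRTIO_BLK_F_WCE));
    }
}

/* One boot of a driver that does not know CONFIG_WCE.  Returns the final
 * cache mode; *flushes reports whether the driver sends cache flushes. */
static bool model_old_driver(bool writeback, bool driver_knows_wce,
                             bool *flushes)
{
    /* Start from the opposite mode, as if a previous driver changed it. */
    struct model_blk s = { writeback, !writeback };
    uint32_t driver_features =
        driver_knows_wce ? (1u << VIRTIO_BLK_F_WCE) : 0u;
    uint32_t negotiated;

    model_reset(&s);
    negotiated = model_host_features(&s) & driver_features;
    *flushes = (negotiated & (1u << VIRTIO_BLK_F_WCE)) != 0;
    model_set_status(&s, negotiated);
    return s.write_cache;
}
```

In all four cases the final cache mode matches the driver's flush decision, so in this model a driver that does not know CONFIG_WCE never ends up with the write cache enabled and flushes disabled.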

> I guess making a host feature flag dynamic was
> a bad idea to start with.

I disagree; it is very useful.  The bug was unfortunate indeed, and
probably happened due to testing the two patches (CONFIG_WCE and
no-WCE-implies-writethrough) independently rather than together.

> Perhaps we should restrict the magic to disabling WCE in case the guest
> doesn't have VIRTIO_BLK_F_WCE, but never allow it to enable WCE once
> we've already advertised that the host doesn't have WCE.

That's already what happens, because (thanks to the new
"bdrv_set_enable_write_cache(s->bs, s->original_wce);" at reset time)
VIRTIO_BLK_F_WCE is never exposed in writethrough mode.

Paolo

Patch

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e2f55cc..6ed9666 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -460,9 +460,9 @@  static void virtio_blk_dma_restart_cb(void *opaque, int running,
 
 static void virtio_blk_reset(VirtIODevice *vdev)
 {
-#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
     VirtIOBlock *s = VIRTIO_BLK(vdev);
 
+#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
     if (s->dataplane) {
         virtio_blk_data_plane_stop(s->dataplane);
     }
@@ -473,6 +473,7 @@  static void virtio_blk_reset(VirtIODevice *vdev)
      * are per-device request lists.
      */
     bdrv_drain_all();
+    bdrv_set_enable_write_cache(s->bs, s->original_wce);
 }
 
 /* coalesce internal state, copy to pci i/o region 0
@@ -564,7 +565,10 @@  static void virtio_blk_set_status(VirtIODevice *vdev, uint8_t status)
     }
 
     features = vdev->guest_features;
-    bdrv_set_enable_write_cache(s->bs, !!(features & (1 << VIRTIO_BLK_F_WCE)));
+    if (!(features & (1 << VIRTIO_BLK_F_CONFIG_WCE))) {
+        bdrv_set_enable_write_cache(s->bs,
+                                    !!(features & (1 << VIRTIO_BLK_F_WCE)));
+    }
 }
 
 static void virtio_blk_save(QEMUFile *f, void *opaque)
@@ -674,6 +678,7 @@  static int virtio_blk_device_init(VirtIODevice *vdev)
     }
 
     blkconf_serial(&blk->conf, &blk->serial);
+    s->original_wce = bdrv_enable_write_cache(blk->conf.bs);
     if (blkconf_geometry(&blk->conf, NULL, 65535, 255, 255) < 0) {
         return -1;
     }
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index b87cf49..41885da 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -123,6 +123,7 @@  typedef struct VirtIOBlock {
     BlockConf *conf;
     VirtIOBlkConf blk;
     unsigned short sector_mask;
+    bool original_wce;
     VMChangeStateEntry *change;
 #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
     Notifier migration_state_notifier;