diff mbox

[v2] xen_disk: support "direct-io-safe" backend option

Message ID alpine.DEB.2.02.1306271911210.4782@kaball.uk.xensource.com
State New
Headers show

Commit Message

Stefano Stabellini June 27, 2013, 6:16 p.m. UTC
Support backend option "direct-io-safe".  This is documented as
follows in the Xen backend specification:

 * direct-io-safe
 *      Values:         0/1 (boolean)
 *      Default Value:  0
 *
 *      The underlying storage is not affected by the direct IO memory
 *      lifetime bug.  See:
 *        http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
 *
 *      Therefore this option gives the backend permission to use
 *      O_DIRECT, notwithstanding that bug.
 *
 *      That is, if this option is enabled, use of O_DIRECT is safe,
 *      in circumstances where we would normally have avoided it as a
 *      workaround for that bug.  This option is not relevant for all
 *      backends, and even not necessarily supported for those for
 *      which it is relevant.  A backend which knows that it is not
 *      affected by the bug can ignore this option.
 *
 *      This option doesn't require a backend to use O_DIRECT, so it
 *      should not be used to try to control the caching behaviour.

Also, BDRV_O_NATIVE_AIO is ignored if BDRV_O_NOCACHE, so clarify the
default flags passed to the qemu block layer.

The original proposal for a "cache" backend option has been dropped
because it was believed too wide, especially considering that at the
moment the backend doesn't have a way to tell the toolstack that it is
capable of supporting it.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>

Comments

Paolo Bonzini June 28, 2013, 7:56 a.m. UTC | #1
Il 27/06/2013 20:16, Stefano Stabellini ha scritto:
> Support backend option "direct-io-safe".  This is documented as
> follows in the Xen backend specification:
> 
>  * direct-io-safe
>  *      Values:         0/1 (boolean)
>  *      Default Value:  0
>  *
>  *      The underlying storage is not affected by the direct IO memory
>  *      lifetime bug.  See:
>  *        http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
>  *
>  *      Therefore this option gives the backend permission to use
>  *      O_DIRECT, notwithstanding that bug.
>  *
>  *      That is, if this option is enabled, use of O_DIRECT is safe,
>  *      in circumstances where we would normally have avoided it as a
>  *      workaround for that bug.  This option is not relevant for all
>  *      backends, and even not necessarily supported for those for
>  *      which it is relevant.  A backend which knows that it is not
>  *      affected by the bug can ignore this option.
>  *
>  *      This option doesn't require a backend to use O_DIRECT, so it
>  *      should not be used to try to control the caching behaviour.
> 
> Also, BDRV_O_NATIVE_AIO is ignored if BDRV_O_NOCACHE, so clarify the
> default flags passed to the qemu block layer.
> 
> The original proposal for a "cache" backend option has been dropped
> because it was believed too wide, especially considering that at the
> moment the backend doesn't have a way to tell the toolstack that it is
> capable of supporting it.

Given how rusty my xenstore-fu is, I'll ask the obvious: the frontend
cannot write to it, can it?

Paolo
Alex Bligh June 28, 2013, 8:48 a.m. UTC | #2
Stefano,

--On 27 June 2013 19:16:30 +0100 Stefano Stabellini 
<stefano.stabellini@eu.citrix.com> wrote:

>  *      Therefore this option gives the backend permission to use
>  *      O_DIRECT, notwithstanding that bug.

Looks useful. Are you planning to do this for both emulated and pv
disks?
Ian Jackson June 28, 2013, 10:44 a.m. UTC | #3
Alex Bligh writes ("Re: [Qemu-devel] [PATCH v2] xen_disk: support "direct-io-safe" backend	option"):
> Stefano,
> --On 27 June 2013 19:16:30 +0100 Stefano Stabellini 
> <stefano.stabellini@eu.citrix.com> wrote:
> 
> >  *      Therefore this option gives the backend permission to use
> >  *      O_DIRECT, notwithstanding that bug.
> 
> Looks useful. Are you planning to do this for both emulated and pv
> disks?

Emulated disks don't have the same problem because they don't try to
use O_DIRECT on pages shared with the guest via the Xen grant table
mechanism.

Ian.
Ian Jackson June 28, 2013, 10:54 a.m. UTC | #4
Paolo Bonzini writes ("Re: [PATCH v2] xen_disk: support "direct-io-safe" backend option"):
> Il 27/06/2013 20:16, Stefano Stabellini ha scritto:
> > The original proposal for a "cache" backend option has been dropped
> > because it was believed too wide, especially considering that at the
> > moment the backend doesn't have a way to tell the toolstack that it is
> > capable of supporting it.
> 
> Given how rusty my xenstore-fu is, I'll ask the obvious: the frontend
> cannot write to it, can it?

No, the backend directory is writeable only by the toolstack and
driver domains.

Ian.
Stefano Stabellini June 28, 2013, 10:56 a.m. UTC | #5
On Fri, 28 Jun 2013, Alex Bligh wrote:
> Stefano,
> 
> --On 27 June 2013 19:16:30 +0100 Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> 
> >  *      Therefore this option gives the backend permission to use
> >  *      O_DIRECT, notwithstanding that bug.
> 
> Looks useful. Are you planning to do this for both emulated and pv
> disks?

This is PV only, at least for the moment: emulated disks always use
writeback caching.
From the performance point of view, making this change for IDE disks is
not very important (because IDE is slow anyway).
Stefano Stabellini June 28, 2013, 10:57 a.m. UTC | #6
On Fri, 28 Jun 2013, Paolo Bonzini wrote:
> Il 27/06/2013 20:16, Stefano Stabellini ha scritto:
> > Support backend option "direct-io-safe".  This is documented as
> > follows in the Xen backend specification:
> > 
> >  * direct-io-safe
> >  *      Values:         0/1 (boolean)
> >  *      Default Value:  0
> >  *
> >  *      The underlying storage is not affected by the direct IO memory
> >  *      lifetime bug.  See:
> >  *        http://lists.xen.org/archives/html/xen-devel/2012-12/msg01154.html
> >  *
> >  *      Therefore this option gives the backend permission to use
> >  *      O_DIRECT, notwithstanding that bug.
> >  *
> >  *      That is, if this option is enabled, use of O_DIRECT is safe,
> >  *      in circumstances where we would normally have avoided it as a
> >  *      workaround for that bug.  This option is not relevant for all
> >  *      backends, and even not necessarily supported for those for
> >  *      which it is relevant.  A backend which knows that it is not
> >  *      affected by the bug can ignore this option.
> >  *
> >  *      This option doesn't require a backend to use O_DIRECT, so it
> >  *      should not be used to try to control the caching behaviour.
> > 
> > Also, BDRV_O_NATIVE_AIO is ignored if BDRV_O_NOCACHE, so clarify the
> > default flags passed to the qemu block layer.
> > 
> > The original proposal for a "cache" backend option has been dropped
> > because it was believed too wide, especially considering that at the
> > moment the backend doesn't have a way to tell the toolstack that it is
> > capable of supporting it.
> 
> Given how rusty my xenstore-fu is, I'll ask the obvious: the frontend
> cannot write to it, can it?

Nope, this option lives under the backend path, that is read-only for
the frontend
Alex Bligh June 28, 2013, 4:17 p.m. UTC | #7
--On 28 June 2013 11:44:41 +0100 Ian Jackson <Ian.Jackson@eu.citrix.com> 
wrote:

>> Looks useful. Are you planning to do this for both emulated and pv
>> disks?
>
> Emulated disks don't have the same problem because they don't try to
> use O_DIRECT on pages shared with the guest via the Xen grant table
> mechanism.

I should have been more specific. The original thread maintained
emulated disks always had O_DIRECT turned off, despite the fact
the rationale for using O_DIRECT for PV disks was that not using
O_DIRECT in some circumstances might be unsafe, because it was
the only way to get any decent performance out of them. I think
we ran the 'no O_DIRECT might be unsafe' argument to ground, but
if the rationale for Stefano's patch is not just speed but additional
safety (for instance against the host dying and losing the page
cache for file systems that have barriers switched off), then
there is an argument to use it for emulated disks too.

But as Stefano says:

--On 28 June 2013 11:56:29 +0100 Stefano Stabellini 
<stefano.stabellini@eu.citrix.com> wrote:

> This is PV only, at least for the moment: emulated disks always use
> writeback caching.
> From the performance point of view, making this change for IDE disks is
> not very important (because IDE is slow anyway).

... perhaps 'who cares'.
Paolo Bonzini June 28, 2013, 4:26 p.m. UTC | #8
Il 28/06/2013 18:17, Alex Bligh ha scritto:
> 
> But as Stefano says:
> 
> --On 28 June 2013 11:56:29 +0100 Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> 
>> This is PV only, at least for the moment: emulated disks always use
>> writeback caching.
>> From the performance point of view, making this change for IDE disks is
>> not very important (because IDE is slow anyway).
> 
> ... perhaps 'who cares'.

There are plenty of emulated devices that are not that slow, though
perhaps Xen management does not support them: AHCI, MegaSAS, VMware
pvscsi, and of course virtio-{blk,scsi}...

Paolo
diff mbox

Patch

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 247f32f..727f433 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -93,6 +93,7 @@  struct XenBlkDev {
     char                *type;
     char                *dev;
     char                *devtype;
+    bool                directiosafe;
     const char          *fileproto;
     const char          *filename;
     int                 ring_ref;
@@ -701,6 +702,7 @@  static int blk_init(struct XenDevice *xendev)
 {
     struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
     int info = 0;
+    char *directiosafe = NULL;
 
     /* read xenstore entries */
     if (blkdev->params == NULL) {
@@ -733,6 +735,8 @@  static int blk_init(struct XenDevice *xendev)
     if (blkdev->devtype == NULL) {
         blkdev->devtype = xenstore_read_be_str(&blkdev->xendev, "device-type");
     }
+    directiosafe = xenstore_read_be_str(&blkdev->xendev, "direct-io-safe");
+    blkdev->directiosafe = (directiosafe && atoi(directiosafe));
 
     /* do we have all we need? */
     if (blkdev->params == NULL ||
@@ -760,6 +764,8 @@  static int blk_init(struct XenDevice *xendev)
     xenstore_write_be_int(&blkdev->xendev, "feature-flush-cache", 1);
     xenstore_write_be_int(&blkdev->xendev, "feature-persistent", 1);
     xenstore_write_be_int(&blkdev->xendev, "info", info);
+
+    g_free(directiosafe);
     return 0;
 
 out_error:
@@ -773,6 +779,8 @@  out_error:
     blkdev->dev = NULL;
     g_free(blkdev->devtype);
     blkdev->devtype = NULL;
+    g_free(directiosafe);
+    blkdev->directiosafe = false;
     return -1;
 }
 
@@ -783,7 +791,11 @@  static int blk_connect(struct XenDevice *xendev)
     bool readonly = true;
 
     /* read-only ? */
-    qflags = BDRV_O_CACHE_WB | BDRV_O_NATIVE_AIO;
+    if (blkdev->directiosafe) {
+        qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
+    } else {
+        qflags = BDRV_O_CACHE_WB;
+    }
     if (strcmp(blkdev->mode, "w") == 0) {
         qflags |= BDRV_O_RDWR;
         readonly = false;