Patchwork [RFC,1/3] xen_disk: handle disk files on ramfs/tmpfs

Submitter Roger Pau Monne
Date Dec. 31, 2012, 12:16 p.m.
Message ID <1356956174-23548-2-git-send-email-roger.pau@citrix.com>
Permalink /patch/208847/
State New

Comments

Roger Pau Monne - Dec. 31, 2012, 12:16 p.m.
Files that reside on ramfs or tmpfs cannot be opened with O_DIRECT.
If the first call to bdrv_open fails with errno = EINVAL, try a second
call without BDRV_O_NOCACHE.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: xen-devel@lists.xen.org
Cc: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Cc: Anthony PERARD <anthony.perard@citrix.com>
---
 hw/xen_disk.c |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)
Konrad Rzeszutek Wilk - Jan. 3, 2013, 2:21 p.m.
On Mon, Dec 31, 2012 at 01:16:12PM +0100, Roger Pau Monne wrote:
> Files that reside on ramfs or tmpfs cannot be opened with O_DIRECT,

That is not entirely true. There are patches floating around (on LKML)
that make tmpfs/ramfs able to do this.

> if first call to bdrv_open fails with errno = EINVAL, try a second
> call without BDRV_O_NOCACHE.
Ian Campbell - Jan. 3, 2013, 2:28 p.m.
On Mon, 2012-12-31 at 12:16 +0000, Roger Pau Monne wrote:
> Files that reside on ramfs or tmpfs cannot be opened with O_DIRECT,
> if first call to bdrv_open fails with errno = EINVAL, try a second
> call without BDRV_O_NOCACHE.

Doesn't that risk spuriously turning off NOCACHE on other sorts of
devices as well, which (potentially) opens up a data loss issue?

Stefano Stabellini - Jan. 4, 2013, 2:54 p.m.
On Thu, 3 Jan 2013, Ian Campbell wrote:
> On Mon, 2012-12-31 at 12:16 +0000, Roger Pau Monne wrote:
> > Files that reside on ramfs or tmpfs cannot be opened with O_DIRECT,
> > if first call to bdrv_open fails with errno = EINVAL, try a second
> > call without BDRV_O_NOCACHE.
> 
> Doesn't that risk spuriously turning off NOCACHE on other sorts of
> devices as well, which (potentially) opens up a data loss issue?

I agree, we shouldn't make this kind of critical configuration change
behind the user's back.

I would rather let the user set the cache attributes. QEMU already has a
command line option for it, but we can't use it directly because
xen_disk gets its configuration solely from xenstore at the moment.

I guess we could add a key=value pair, cache=foobar, to the xl disk
configuration spec, which gets translated somehow to a key in xenstore.
xen_disk would read the key and set qflags accordingly.
We could use the same cache parameters supported by QEMU; see
bdrv_parse_cache_flags.
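
As a rough illustration of the cache= idea, the sketch below maps a cache
mode string to a set of open flags. The mode names are the ones QEMU's
cache= drive option accepts. The XD_O_* constants and xd_parse_cache_mode
are hypothetical stand-ins invented for this example, and the mapping is
an assumption modelled on bdrv_parse_cache_flags, which remains the
authoritative source.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for QEMU's BDRV_O_* flags; values are illustrative. */
enum {
    XD_O_NOCACHE  = 1 << 0,  /* bypass the host page cache (O_DIRECT) */
    XD_O_CACHE_WB = 1 << 1,  /* write-back caching */
    XD_O_NO_FLUSH = 1 << 2   /* ignore flush requests */
};

/*
 * Translate a cache mode string (for example, one read from a xenstore
 * key) into flags. Returns 0 on success and -1 if the mode is unknown.
 * On failure *flags is left untouched, so the caller keeps its default.
 */
static int xd_parse_cache_mode(const char *mode, int *flags)
{
    int f;

    assert(mode != NULL && flags != NULL);

    if (!strcmp(mode, "none") || !strcmp(mode, "off")) {
        f = XD_O_NOCACHE | XD_O_CACHE_WB;
    } else if (!strcmp(mode, "directsync")) {
        f = XD_O_NOCACHE;
    } else if (!strcmp(mode, "writeback")) {
        f = XD_O_CACHE_WB;
    } else if (!strcmp(mode, "unsafe")) {
        f = XD_O_CACHE_WB | XD_O_NO_FLUSH;
    } else if (!strcmp(mode, "writethrough")) {
        f = 0;
    } else {
        return -1;  /* unknown mode: reject rather than guess */
    }

    *flags = f;
    return 0;
}
```

Rejecting unknown strings, instead of falling back to some default,
keeps the choice of cache behaviour with the user.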

As an alternative, we could reuse the already defined "access" key, like
this:

access=rw|nocache

or

access=rw|unsafe
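
To make the access= alternative concrete, here is a minimal sketch of
how such a value could be split on '|' into an access mode plus optional
modifiers. The struct, the function names and the exact grammar (a
leading "rw" or "ro" followed by "nocache" and/or "unsafe") are
assumptions for illustration only; nothing in xl or xen_disk defines
them.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical result of parsing an extended "access" value. */
struct xd_access {
    bool writable;  /* "rw" rather than "ro" */
    bool nocache;   /* "|nocache" modifier present */
    bool unsafe;    /* "|unsafe" modifier present */
};

/* True if the len bytes at p spell exactly the given word. */
static bool xd_tok_eq(const char *p, size_t len, const char *word)
{
    return strlen(word) == len && strncmp(p, word, len) == 0;
}

/*
 * Parse values such as "rw", "ro", "rw|nocache" or "rw|unsafe".
 * Returns 0 on success, -1 on any unknown or empty token.
 * *out is only written on success.
 */
static int xd_parse_access(const char *value, struct xd_access *out)
{
    struct xd_access acc = { false, false, false };
    const char *p = value;
    bool first = true;

    assert(value != NULL && out != NULL);

    for (;;) {
        size_t len = strcspn(p, "|");  /* length of the current token */

        if (first) {
            /* The first token must be the access mode itself. */
            if (xd_tok_eq(p, len, "rw")) {
                acc.writable = true;
            } else if (!xd_tok_eq(p, len, "ro")) {
                return -1;
            }
            first = false;
        } else if (xd_tok_eq(p, len, "nocache")) {
            acc.nocache = true;
        } else if (xd_tok_eq(p, len, "unsafe")) {
            acc.unsafe = true;
        } else {
            return -1;
        }

        if (p[len] == '\0') {
            break;      /* consumed the last token */
        }
        p += len + 1;   /* skip past the '|' separator */
    }

    *out = acc;
    return 0;
}
```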


Roger Pau Monne - Jan. 4, 2013, 3:05 p.m.
On 04/01/13 15:54, Stefano Stabellini wrote:
> On Thu, 3 Jan 2013, Ian Campbell wrote:
>> On Mon, 2012-12-31 at 12:16 +0000, Roger Pau Monne wrote:
>>> Files that reside on ramfs or tmpfs cannot be opened with O_DIRECT,
>>> if first call to bdrv_open fails with errno = EINVAL, try a second
>>> call without BDRV_O_NOCACHE.
>>
>> Doesn't that risk spuriously turning off NOCACHE on other sorts of
>> devices as well, which (potentially) opens up a data loss issue?
> 
> I agree, we shouldn't make this kind of critical configuration change
> behind the user's back.
> 
> I would rather let the user set the cache attributes. QEMU already has a
> command line option for it, but we can't use it directly because
> xen_disk gets its configuration solely from xenstore at the moment.
> 
> I guess we could add a key=value pair, cache=foobar, to the xl disk
> configuration spec, which gets translated somehow to a key in xenstore.
> xen_disk would read the key and set qflags accordingly.
> We could use the same cache parameters supported by QEMU; see
> bdrv_parse_cache_flags.
> 
> As an alternative, we could reuse the already defined "access" key, like
> this:
> 
> access=rw|nocache
> 
> or
> 
> access=rw|unsafe

I needed this patch to be able to perform the benchmarks for the
persistent grants implementation, but I realize this is not the best way
to solve this problem.

It might be worth thinking of a good way to pass more information to the
qdisk backend (not only whether O_DIRECT should be used or not), so that
in the future we can take advantage of all the possible file backends
that QEMU supports, like GlusterFS or SheepDog.
Stefano Stabellini - Jan. 4, 2013, 3:30 p.m.
On Fri, 4 Jan 2013, Roger Pau Monne wrote:
> On 04/01/13 15:54, Stefano Stabellini wrote:
> > On Thu, 3 Jan 2013, Ian Campbell wrote:
> >> On Mon, 2012-12-31 at 12:16 +0000, Roger Pau Monne wrote:
> >>> Files that reside on ramfs or tmpfs cannot be opened with O_DIRECT,
> >>> if first call to bdrv_open fails with errno = EINVAL, try a second
> >>> call without BDRV_O_NOCACHE.
> >>
> >> Doesn't that risk spuriously turning off NOCACHE on other sorts of
> >> devices as well, which (potentially) opens up a data loss issue?
> > 
> > I agree, we shouldn't make this kind of critical configuration change
> > behind the user's back.
> > 
> > I would rather let the user set the cache attributes. QEMU already has a
> > command line option for it, but we can't use it directly because
> > xen_disk gets its configuration solely from xenstore at the moment.
> > 
> > I guess we could add a key=value pair, cache=foobar, to the xl disk
> > configuration spec, which gets translated somehow to a key in xenstore.
> > xen_disk would read the key and set qflags accordingly.
> > We could use the same cache parameters supported by QEMU; see
> > bdrv_parse_cache_flags.
> > 
> > As an alternative, we could reuse the already defined "access" key, like
> > this:
> > 
> > access=rw|nocache
> > 
> > or
> > 
> > access=rw|unsafe
> 
> I needed this patch to be able to perform the benchmarks for the
> persistent grants implementation, but I realize this is not the best way
> to solve this problem.
> 
> It might be worth thinking of a good way to pass more information to the
> qdisk backend (not only whether O_DIRECT should be used or not), so that
> in the future we can take advantage of all the possible file backends
> that QEMU supports, like GlusterFS or SheepDog.

Yes, you are right.
However, in the Xen world QEMU is never invoked directly, only via
libxl, so it is natural that whatever we want to expose to the user has
to go through libxl. I suppose that GlusterFS and SheepDog are no
exception: each would be just another key=value pair or just another
value in the xl disk config line.
At this point it doesn't matter that much how we pass these parameters
from libxl to QEMU: at the moment everything is done via xenstore, so we
might as well keep doing it that way.

Patch

diff --git a/hw/xen_disk.c b/hw/xen_disk.c
index e6bb2f2..a159ee5 100644
--- a/hw/xen_disk.c
+++ b/hw/xen_disk.c
@@ -562,7 +562,7 @@  static void blk_alloc(struct XenDevice *xendev)
 static int blk_init(struct XenDevice *xendev)
 {
     struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, xendev);
-    int index, qflags, info = 0;
+    int index, qflags, info = 0, rc;
 
     /* read xenstore entries */
     if (blkdev->params == NULL) {
@@ -625,8 +625,18 @@  static int blk_init(struct XenDevice *xendev)
         xen_be_printf(&blkdev->xendev, 2, "create new bdrv (xenbus setup)\n");
         blkdev->bs = bdrv_new(blkdev->dev);
         if (blkdev->bs) {
-            if (bdrv_open(blkdev->bs, blkdev->filename, qflags,
-                        bdrv_find_whitelisted_format(blkdev->fileproto)) != 0) {
+            rc = bdrv_open(blkdev->bs, blkdev->filename, qflags,
+                        bdrv_find_whitelisted_format(blkdev->fileproto));
+            if (rc != 0 && errno == EINVAL) {
+                /* Files on ramfs or tmpfs cannot be opened with O_DIRECT,
+                 * remove the BDRV_O_NOCACHE flag, and try to open
+                 * the file again.
+                 */
+                qflags &= ~BDRV_O_NOCACHE;
+                rc = bdrv_open(blkdev->bs, blkdev->filename, qflags,
+                        bdrv_find_whitelisted_format(blkdev->fileproto));
+            }
+            if (rc != 0) {
                 bdrv_delete(blkdev->bs);
                 blkdev->bs = NULL;
             }