diff mbox

[v2] Add support for fd: protocol

Message ID 4DF762A4.2040503@us.ibm.com
State New
Headers show

Commit Message

Corey Bryant June 14, 2011, 1:31 p.m. UTC
sVirt provides SELinux MAC isolation for Qemu guest processes and their
corresponding resources (image files). sVirt provides this support
by labeling guests and resources with security labels that are stored
in file system extended attributes. Some file systems, such as NFS, do
not support the extended attribute security namespace, which is needed
for image file isolation when using the sVirt SELinux security driver
in libvirt.

The proposed solution entails a combination of Qemu, libvirt, and
SELinux patches that work together to isolate multiple guests' images
when they're stored in the same NFS mount. This results in an
environment where sVirt isolation and NFS image file isolation can both
be provided.

This patch contains the Qemu code to support this solution. I would
like to solicit input from the libvirt community prior to starting
the libvirt patch.

Currently, Qemu opens an image file in addition to performing the
necessary read and write operations. The proposed solution will move
the open out of Qemu and into libvirt. Once libvirt opens an image
file for the guest, it will pass the file descriptor to Qemu via a
new fd: protocol.

If the image file resides in an NFS mount, the following SELinux policy
changes will provide image isolation:

  - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
    allow Qemu (svirt_t) to only have SELinux read and write
    permissions on nfs_t files

  - Qemu (svirt_t) also gets SELinux use permissions on libvirt
    (virtd_t) file descriptors

Following is a sample invocation of Qemu using the fd: protocol on
the command line:

    qemu -drive file=fd:4,format=qcow2

The fd: protocol is also supported by the drive_add monitor command.
This requires that the specified file descriptor is passed to the
monitor alongside a prior getfd monitor command.

There are some additional features provided by certain image types
where Qemu reopens the image file. All of these scenarios will be
unsupported for the fd: protocol, at least for this patch:

  - The -snapshot command line option
  - The savevm monitor command
  - The snapshot_blkdev monitor command
  - Starting Qemu with a backing file

The thought is that this support can be added in the future, but is
not required for the initial fd: support.

This patch was tested with the following formats: raw, cow, qcow,
qcow2, qed, and vmdk, using the fd: protocol from the command line
and the monitor. Tests were also run to verify existing file name
support and qemu-img were not regressed. Non-valid file descriptors,
fd: without format, snapshot and backing files were also tested.

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
---
 block.c           |   16 ++++++++++
 block.h           |    1 +
 block/cow.c       |    5 +++
 block/qcow.c      |    5 +++
 block/qcow2.c     |    5 +++
 block/qed.c       |    4 ++
 block/raw-posix.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++------
 block/vmdk.c      |    5 +++
 block_int.h       |    1 +
 blockdev.c        |   10 ++++++
 monitor.c         |    5 +++
 monitor.h         |    1 +
 qemu-options.hx   |    8 +++--
 qemu-tool.c       |    5 +++
 14 files changed, 140 insertions(+), 12 deletions(-)

Comments

Eric Blake June 14, 2011, 4:13 p.m. UTC | #1
On 06/14/2011 07:31 AM, Corey Bryant wrote:
> This patch contains the Qemu code to support this solution. I would
> like to solicit input from the libvirt community prior to starting
> the libvirt patch.
> 
> Currently, Qemu opens an image file in addition to performing the
> necessary read and write operations. The proposed solution will move
> the open out of Qemu and into libvirt. Once libvirt opens an image
> file for the guest, it will pass the file descriptor to Qemu via a
> new fd: protocol.

Certainly sounds reasonable.  And libvirt already has code in place to
use fd: passing for migration reasons (including the use of the getfd
monitor command).

> There are some additional features provided by certain image types
> where Qemu reopens the image file. All of these scenarios will be
> unsupported for the fd: protocol, at least for this patch:
> 
>   - The -snapshot command line option

As it is, libvirt is not currently exposing the -snapshot option, so no
loss in functionality there (although it might be nice to someday add that).

>   - The savevm monitor command

As it is, the savevm monitor command is limited - it is an
all-or-nothing command that requires that all associated disk images be
qcow2 format, where qemu does the snapshot.  Right now, there is work
under way to add new monitor commands to facilitate live snapshots,
where libvirt is instead the entity doing the snapshot and where libvirt
can use qemu-img for qcow2, as well as use lvm and btrfs features for
instant snapshots of other file types, as well as supporting (slower)
copies as snapshots.  So from that aspect, libvirt will probably want to
use the newer monitor commands for live snapshot purposes anyways.

>   - The snapshot_blkdev monitor command

Libvirt isn't currently using this feature.

>   - Starting Qemu with a backing file

What do you mean by this?  Taking a guess:

In the case of a qcow2 image with a backing file, does that mean that
both the qcow2 image and it's backing file can both be passed to qemu
via fd: notations?  That is, if the -drive file=fd:4,format=qcow2 option
is passed, and fd4 is a qcow2 image that also has a backing file, it
seems like libvirt should also be able to pass that backing file via
another fd, so that qemu doesn't have to open() the backing file
directly.  So we would need something like -drive
file=fd:4,format=qcow2,backing=fd:5

and since backing files can be nested, we'd need some way of specifying
more than one level of backing file.  Libvirt already knows how to walk
a chain of backing images in qcow2 files (it has to, in order to set
sVirt SELinux permissions on all of those files so that qemu can open()
each file), so it wouldn't be much harder for libvirt to instead do the
open() and pass each fd.
Corey Bryant June 14, 2011, 7:55 p.m. UTC | #2
On 06/14/2011 12:13 PM, Eric Blake wrote:
> On 06/14/2011 07:31 AM, Corey Bryant wrote:
>> >     - Starting Qemu with a backing file
> What do you mean by this?  Taking a guess:
>
> In the case of a qcow2 image with a backing file, does that mean that
> both the qcow2 image and it's backing file can both be passed to qemu
> via fd: notations?  That is, if the -drive file=fd:4,format=qcow2 option
> is passed, and fd4 is a qcow2 image that also has a backing file, it
> seems like libvirt should also be able to pass that backing file via
> another fd, so that qemu doesn't have to open() the backing file
> directly.  So we would need something like -drive
> file=fd:4,format=qcow2,backing=fd:5
>
> and since backing files can be nested, we'd need some way of specifying
> more than one level of backing file.  Libvirt already knows how to walk
> a chain of backing images in qcow2 files (it has to, in order to set
> sVirt SELinux permissions on all of those files so that qemu can open()
> each file), so it wouldn't be much harder for libvirt to instead do the
> open() and pass each fd.
>
> -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization
> library http://libvirt.org

Right.  So what I was talking about here (and poorly stated) is that 
starting Qemu with a copy-on-write image file causes a reopen of the 
backing file. In this case you would only be passing the fd of the 
copy-on-write file to Qemu. I've fenced that off as unsupported for now.

I like your approach for passing multiple fds. Do you think backing file 
support is needed immediately with this patch?
Eric Blake June 14, 2011, 8:18 p.m. UTC | #3
On 06/14/2011 01:55 PM, Corey Bryant wrote:
>> So we would need something like -drive
>> file=fd:4,format=qcow2,backing=fd:5
>>
>> and since backing files can be nested, we'd need some way of specifying
>> more than one level of backing file.  Libvirt already knows how to walk
>> a chain of backing images in qcow2 files (it has to, in order to set
>> sVirt SELinux permissions on all of those files so that qemu can open()
>> each file), so it wouldn't be much harder for libvirt to instead do the
>> open() and pass each fd.
>>
> Right.  So what I was talking about here (and poorly stated) is that
> starting Qemu with a copy-on-write image file causes a reopen of the
> backing file. In this case you would only be passing the fd of the
> copy-on-write file to Qemu. I've fenced that off as unsupported for now.
> 
> I like your approach for passing multiple fds. Do you think backing file
> support is needed immediately with this patch?

Incremental support is fine by me, but we'll need it sooner or later.
And for the nested backing file case, does:

-drive file=fd:4,format=qcow2,backing=fd:5,backing=fd:6

work, or would we need:

-drive file=fd:4,format=qcow2,backing1=fd:5,backing2=fd:6

That is, can you have more than one backing=, or do you need distinct
backingN=, and if you have to support multiple N, we must make sure the
solution allows arbitrarily deep nesting (or at least as deep as any
current limits qemu places on qcow2 backing file nesting).
Corey Bryant June 14, 2011, 9:39 p.m. UTC | #4
On 06/14/2011 04:18 PM, Eric Blake wrote:
> On 06/14/2011 01:55 PM, Corey Bryant wrote:
>>> >>  So we would need something like -drive
>>> >>  file=fd:4,format=qcow2,backing=fd:5
>>> >>
>>> >>  and since backing files can be nested, we'd need some way of specifying
>>> >>  more than one level of backing file.  Libvirt already knows how to walk
>>> >>  a chain of backing images in qcow2 files (it has to, in order to set
>>> >>  sVirt SELinux permissions on all of those files so that qemu can open()
>>> >>  each file), so it wouldn't be much harder for libvirt to instead do the
>>> >>  open() and pass each fd.
>>> >>
>> >  Right.  So what I was talking about here (and poorly stated) is that
>> >  starting Qemu with a copy-on-write image file causes a reopen of the
>> >  backing file. In this case you would only be passing the fd of the
>> >  copy-on-write file to Qemu. I've fenced that off as unsupported for now.
>> >
>> >  I like your approach for passing multiple fds. Do you think backing file
>> >  support is needed immediately with this patch?
> Incremental support is fine by me, but we'll need it sooner or later.
> And for the nested backing file case, does:
>
> -drive file=fd:4,format=qcow2,backing=fd:5,backing=fd:6
>
> work, or would we need:
>
> -drive file=fd:4,format=qcow2,backing1=fd:5,backing2=fd:6
>
> That is, can you have more than one backing=, or do you need distinct
> backingN=, and if you have to support multiple N, we must make sure the
> solution allows arbitrarily deep nesting (or at least as deep as any
> current limits qemu places on qcow2 backing file nesting).

Qemu currently opens the specified copy-on-write file, finds a backing 
file name in the opened file's header, opens that backing file, and repeats.

So it seems that either a backingN= would be needed or a way to map a 
file descriptor to the file name in the header.

I think the latter is along the lines of what was discussed here with 
-blockdev:
http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02417.html

Repeating the syntax specified in that thread, you'd have:

-blockdev id=foo-base,path=fd:4,format=raw
-blockdev id=foo,path=fd:3,format=qcow2,backing_file=foo-base

where id is the file name.  This chains the backing files on the command 
line.

Corey
Eric Blake June 14, 2011, 9:52 p.m. UTC | #5
On 06/14/2011 03:39 PM, Corey Bryant wrote:
>> Incremental support is fine by me, but we'll need it sooner or later.
> 
> Qemu currently opens the specified copy-on-write file, finds a backing
> file name in the opened file's header, opens that backing file, and
> repeats.
> 
> So it seems that either a backingN= would be needed or a way to map a
> file descriptor to the file name in the header.
> 
> I think the latter is along the lines of what was discussed here with
> -blockdev:
> http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg02417.html
> 
> Repeating the syntax specified in that thread, you'd have:
> 
> -blockdev id=foo-base,path=fd:4,format=raw
> -blockdev id=foo,path=fd:3,format=qcow2,backing_file=foo-base
> 
> where id is the file name.  This chains the backing files on the command
> line.

I like it.  It scales as deep as we want by adding as many additional
-blockdev entries prior to the -drive that uses the nested -blockdevs.
Blue Swirl June 15, 2011, 7:12 p.m. UTC | #6
On Tue, Jun 14, 2011 at 4:31 PM, Corey Bryant <bryntcor@us.ibm.com> wrote:
> sVirt provides SELinux MAC isolation for Qemu guest processes and their
> corresponding resources (image files). sVirt provides this support
> by labeling guests and resources with security labels that are stored
> in file system extended attributes. Some file systems, such as NFS, do
> not support the extended attribute security namespace, which is needed
> for image file isolation when using the sVirt SELinux security driver
> in libvirt.
>
> The proposed solution entails a combination of Qemu, libvirt, and
> SELinux patches that work together to isolate multiple guests' images
> when they're stored in the same NFS mount. This results in an
> environment where sVirt isolation and NFS image file isolation can both
> be provided.
>
> This patch contains the Qemu code to support this solution. I would
> like to solicit input from the libvirt community prior to starting
> the libvirt patch.
>
> Currently, Qemu opens an image file in addition to performing the
> necessary read and write operations. The proposed solution will move
> the open out of Qemu and into libvirt. Once libvirt opens an image
> file for the guest, it will pass the file descriptor to Qemu via a
> new fd: protocol.
>
> If the image file resides in an NFS mount, the following SELinux policy
> changes will provide image isolation:
>
>  - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>    allow Qemu (svirt_t) to only have SELinux read and write
>    permissions on nfs_t files
>
>  - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>    (virtd_t) file descriptors
>
> Following is a sample invocation of Qemu using the fd: protocol on
> the command line:
>
>    qemu -drive file=fd:4,format=qcow2
>
> The fd: protocol is also supported by the drive_add monitor command.
> This requires that the specified file descriptor is passed to the
> monitor alongside a prior getfd monitor command.
>
> There are some additional features provided by certain image types
> where Qemu reopens the image file. All of these scenarios will be
> unsupported for the fd: protocol, at least for this patch:
>
>  - The -snapshot command line option
>  - The savevm monitor command
>  - The snapshot_blkdev monitor command
>  - Starting Qemu with a backing file

There's also native CDROM device. Did you consider adding an explicit
reopen method to block layer?

> The thought is that this support can be added in the future, but is
> not required for the initial fd: support.
>
> This patch was tested with the following formats: raw, cow, qcow,
> qcow2, qed, and vmdk, using the fd: protocol from the command line
> and the monitor. Tests were also run to verify existing file name
> support and qemu-img were not regressed. Non-valid file descriptors,
> fd: without format, snapshot and backing files were also tested.
>
> Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
> ---
>  block.c           |   16 ++++++++++
>  block.h           |    1 +
>  block/cow.c       |    5 +++
>  block/qcow.c      |    5 +++
>  block/qcow2.c     |    5 +++
>  block/qed.c       |    4 ++
>  block/raw-posix.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++------
>  block/vmdk.c      |    5 +++
>  block_int.h       |    1 +
>  blockdev.c        |   10 ++++++
>  monitor.c         |    5 +++
>  monitor.h         |    1 +
>  qemu-options.hx   |    8 +++--
>  qemu-tool.c       |    5 +++
>  14 files changed, 140 insertions(+), 12 deletions(-)
>
> diff --git a/block.c b/block.c
> index 24a25d5..500db84 100644
> --- a/block.c
> +++ b/block.c
> @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>         char tmp_filename[PATH_MAX];
>         char backing_filename[PATH_MAX];
>
> +        if (bdrv_is_fd_protocol(bs)) {
> +            return -ENOTSUP;
> +        }
> +
>         /* if snapshot, we create a temporary backing file and open it
>            instead of opening 'filename' directly */
>
> @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>
>     /* Find the right image format driver */
>     if (!drv) {
> +        /* format must be specified for fd: protocol */
> +        if (bdrv_is_fd_protocol(bs)) {
> +            return -ENOTSUP;
> +        }
>         ret = find_image_format(filename, &drv);
>     }
>
> @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs)
>     return bs->enable_write_cache;
>  }
>
> +int bdrv_is_fd_protocol(BlockDriverState *bs)
> +{
> +    return bs->fd_protocol;
> +}
> +
>  /* XXX: no longer used */
>  void bdrv_set_change_cb(BlockDriverState *bs,
>                         void (*change_cb)(void *opaque, int reason),
> @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>     BlockDriver *drv = bs->drv;
>     if (!drv)
>         return -ENOMEDIUM;
> +    if (bdrv_is_fd_protocol(bs)) {
> +        return -ENOTSUP;
> +    }
>     if (drv->bdrv_snapshot_create)
>         return drv->bdrv_snapshot_create(bs, sn_info);
>     if (bs->file)
> diff --git a/block.h b/block.h
> index da7d39c..dd46d52 100644
> --- a/block.h
> +++ b/block.h
> @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs);
>  int bdrv_is_read_only(BlockDriverState *bs);
>  int bdrv_is_sg(BlockDriverState *bs);
>  int bdrv_enable_write_cache(BlockDriverState *bs);
> +int bdrv_is_fd_protocol(BlockDriverState *bs);
>  int bdrv_is_inserted(BlockDriverState *bs);
>  int bdrv_media_changed(BlockDriverState *bs);
>  int bdrv_is_locked(BlockDriverState *bs);
> diff --git a/block/cow.c b/block/cow.c
> index 4cf543c..e17f8e7 100644
> --- a/block/cow.c
> +++ b/block/cow.c
> @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
>     pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>             cow_header.backing_file);
>
> +    if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +        /* backing file currently not supported by fd: protocol */
> +        goto fail;
> +    }
> +
>     bitmap_size = ((bs->total_sectors + 7) >> 3) + sizeof(cow_header);
>     s->cow_sectors_offset = (bitmap_size + 511) & ~511;
>     return 0;
> diff --git a/block/qcow.c b/block/qcow.c
> index a26c886..1e46bdb 100644
> --- a/block/qcow.c
> +++ b/block/qcow.c
> @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int flags)
>         if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len)
>             goto fail;
>         bs->backing_file[len] = '\0';
> +
> +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +            /* backing file currently not supported by fd: protocol */
> +            goto fail;
> +        }
>     }
>     return 0;
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 8451ded..8b9c160 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>             goto fail;
>         }
>         bs->backing_file[len] = '\0';
> +
> +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +            ret = -ENOTSUP;
> +            goto fail;
> +        }
>     }
>     if (qcow2_read_snapshots(bs) < 0) {
>         ret = -EINVAL;
> diff --git a/block/qed.c b/block/qed.c
> index 3970379..5028897 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>             return ret;
>         }
>
> +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +            return -ENOTSUP;
> +        }
> +
>         if (s->header.features & QED_F_BACKING_FORMAT_NO_PROBE) {
>             pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
>         }
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 4cd7d7a..c72de3d 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "module.h"
>  #include "block/raw-posix-aio.h"
> +#include "monitor.h"
>
>  #ifdef CONFIG_COCOA
>  #include <paths.h>
> @@ -183,7 +184,8 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>                            int bdrv_flags, int open_flags)
>  {
>     BDRVRawState *s = bs->opaque;
> -    int fd, ret;
> +    int fd = -1;
> +    int ret;
>
>     ret = raw_normalize_devicepath(&filename);
>     if (ret != 0) {
> @@ -205,15 +207,17 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>     if (!(bdrv_flags & BDRV_O_CACHE_WB))
>         s->open_flags |= O_DSYNC;
>
> -    s->fd = -1;
> -    fd = qemu_open(filename, s->open_flags, 0644);
> -    if (fd < 0) {
> -        ret = -errno;
> -        if (ret == -EROFS)
> -            ret = -EACCES;
> -        return ret;
> +    if (s->fd == -1) {
> +        fd = qemu_open(filename, s->open_flags, 0644);
> +        if (fd < 0) {
> +            ret = -errno;
> +            if (ret == -EROFS) {
> +                ret = -EACCES;
> +            }
> +            return ret;
> +        }
> +        s->fd = fd;
>     }
> -    s->fd = fd;
>     s->aligned_buf = NULL;
>
>     if ((bdrv_flags & BDRV_O_NOCACHE)) {
> @@ -270,6 +274,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
>  {
>     BDRVRawState *s = bs->opaque;
>
> +    s->fd = -1;
>     s->type = FTYPE_FILE;
>     return raw_open_common(bs, filename, flags, 0);
>  }
> @@ -890,6 +895,60 @@ static BlockDriver bdrv_file = {
>     .create_options = raw_create_options,
>  };
>
> +static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags)
> +{
> +    BDRVRawState *s = bs->opaque;
> +    const char *fd_str;
> +    int fd;
> +
> +    /* extract the file descriptor - fail if it's not fd: */
> +    if (!strstart(filename, "fd:", &fd_str)) {
> +        return -EINVAL;
> +    }
> +
> +    if (!qemu_isdigit(fd_str[0])) {
> +        /* get fd from monitor */
> +        fd = qemu_get_fd(fd_str);
> +        if (fd == -1) {
> +            return -EBADF;
> +        }
> +    } else {
> +        char *endptr = NULL;
> +
> +        fd = strtol(fd_str, &endptr, 10);
> +        if (*endptr || (fd == 0 && fd_str == endptr)) {
> +            return -EBADF;
> +        }
> +    }
> +
> +    s->fd = fd;
> +    s->type = FTYPE_FILE;
> +
> +    return raw_open_common(bs, filename, flags, 0);
> +}
> +
> +static BlockDriver bdrv_file_fd = {
> +    .format_name = "file",
> +    .protocol_name = "fd",
> +    .instance_size = sizeof(BDRVRawState),
> +    .bdrv_probe = NULL, /* no probe for protocols */
> +    .bdrv_file_open = raw_open_fd,
> +    .bdrv_read = raw_read,
> +    .bdrv_write = raw_write,
> +    .bdrv_close = raw_close,
> +    .bdrv_flush = raw_flush,
> +    .bdrv_discard = raw_discard,
> +
> +    .bdrv_aio_readv = raw_aio_readv,
> +    .bdrv_aio_writev = raw_aio_writev,
> +    .bdrv_aio_flush = raw_aio_flush,
> +
> +    .bdrv_truncate = raw_truncate,
> +    .bdrv_getlength = raw_getlength,
> +
> +    .create_options = raw_create_options,
> +};
> +
>  /***********************************************/
>  /* host device */
>
> @@ -998,6 +1057,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
>     }
>  #endif
>
> +    s->fd = -1;
>     s->type = FTYPE_FILE;
>  #if defined(__linux__)
>     {
> @@ -1168,6 +1228,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
>     BDRVRawState *s = bs->opaque;
>     int ret;
>
> +    s->fd = -1;
>     s->type = FTYPE_FD;
>
>     /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
> @@ -1280,6 +1341,7 @@ static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
>  {
>     BDRVRawState *s = bs->opaque;
>
> +    s->fd = -1;
>     s->type = FTYPE_CD;
>
>     /* open will not fail even if no CD is inserted, so add O_NONBLOCK */
> @@ -1503,6 +1565,7 @@ static void bdrv_file_init(void)
>      * Register all the drivers.  Note that order is important, the driver
>      * registered last will get probed first.
>      */
> +    bdrv_register(&bdrv_file_fd);
>     bdrv_register(&bdrv_file);
>     bdrv_register(&bdrv_host_device);
>  #ifdef __linux__
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 922b23d..2ea808e 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs)
>             return -1;
>
>         pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
> +
> +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +            /* backing file currently not supported by fd: protocol */
> +            return -1;
> +        }
>     }
>
>     return 0;
> diff --git a/block_int.h b/block_int.h
> index fa91337..a305ee2 100644
> --- a/block_int.h
> +++ b/block_int.h
> @@ -152,6 +152,7 @@ struct BlockDriverState {
>     int encrypted; /* if true, the media is encrypted */
>     int valid_key; /* if true, a valid encryption key has been set */
>     int sg;        /* if true, the device is a /dev/sg* */
> +    int fd_protocol; /* if true, the fd: protocol was specified */

bool?

>     /* event callback when inserting/removing */
>     void (*change_cb)(void *opaque, int reason);
>     void *change_opaque;
> diff --git a/blockdev.c b/blockdev.c
> index e81e0ab..a536c20 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -546,6 +546,10 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>
>     bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>
> +    if (strncmp(file, "fd:", 3) == 0) {
> +        dinfo->bdrv->fd_protocol = 1;
> +    }
> +
>     ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
>     if (ret < 0) {
>         error_report("could not open disk image %s: %s",
> @@ -606,6 +610,12 @@ int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data)
>         goto out;
>     }
>
> +    if (bdrv_is_fd_protocol(bs)) {
> +        qerror_report(QERR_UNSUPPORTED);
> +        ret = -1;
> +        goto out;
> +    }
> +
>     pstrcpy(old_filename, sizeof(old_filename), bs->filename);
>
>     old_drv = bs->drv;
> diff --git a/monitor.c b/monitor.c
> index 59a3e76..ea60be2 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2832,6 +2832,11 @@ int monitor_get_fd(Monitor *mon, const char *fdname)
>     return -1;
>  }
>
> +int qemu_get_fd(const char *fdname)
> +{
> +    return cur_mon ? monitor_get_fd(cur_mon, fdname) : -1;
> +}
> +
>  static const mon_cmd_t mon_cmds[] = {
>  #include "hmp-commands.h"
>     { NULL, NULL, },
> diff --git a/monitor.h b/monitor.h
> index 4f2d328..de5b987 100644
> --- a/monitor.h
> +++ b/monitor.h
> @@ -51,6 +51,7 @@ int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
>                                 void *opaque);
>
>  int monitor_get_fd(Monitor *mon, const char *fdname);
> +int qemu_get_fd(const char *fdname);
>
>  void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
>     GCC_FMT_ATTR(2, 0);
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 1d5ad8b..f9b66f4 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -116,7 +116,7 @@ using @file{/dev/cdrom} as filename (@pxref{host_drives}).
>  ETEXI
>
>  DEF("drive", HAS_ARG, QEMU_OPTION_drive,
> -    "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
> +    "-drive [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>     "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
>     "       [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
>     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
> @@ -129,10 +129,12 @@ STEXI
>  Define a new drive. Valid options are:
>
>  @table @option
> -@item file=@var{file}
> +@item file=[fd:]@var{file}
>  This option defines which disk image (@pxref{disk_images}) to use with
>  this drive. If the filename contains comma, you must double it
> -(for instance, "file=my,,file" to use file "my,file").
> +(for instance, "file=my,,file" to use file "my,file"). @option{fd:}@var{file}
> +specifies the file descriptor of an already open disk image.
> +@option{format=}@var{format} is required by @option{fd:}@var{file}.
>  @item if=@var{interface}
>  This option defines on which type on interface the drive is connected.
>  Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
> diff --git a/qemu-tool.c b/qemu-tool.c
> index 41e5c41..8fe6b8c 100644
> --- a/qemu-tool.c
> +++ b/qemu-tool.c
> @@ -96,3 +96,8 @@ int64_t qemu_get_clock_ns(QEMUClock *clock)
>  {
>     return 0;
>  }
> +
> +int qemu_get_fd(const char *fdname)
> +{
> +    return -1;
> +}
> --
> 1.7.1
>
>
>
Corey Bryant June 16, 2011, 2:48 p.m. UTC | #7
On 06/15/2011 03:12 PM, Blue Swirl wrote:
> On Tue, Jun 14, 2011 at 4:31 PM, Corey Bryant<bryntcor@us.ibm.com>  wrote:
>> >  sVirt provides SELinux MAC isolation for Qemu guest processes and their
>> >  corresponding resources (image files). sVirt provides this support
>> >  by labeling guests and resources with security labels that are stored
>> >  in file system extended attributes. Some file systems, such as NFS, do
>> >  not support the extended attribute security namespace, which is needed
>> >  for image file isolation when using the sVirt SELinux security driver
>> >  in libvirt.
>> >
>> >  The proposed solution entails a combination of Qemu, libvirt, and
>> >  SELinux patches that work together to isolate multiple guests' images
>> >  when they're stored in the same NFS mount. This results in an
>> >  environment where sVirt isolation and NFS image file isolation can both
>> >  be provided.
>> >
>> >  This patch contains the Qemu code to support this solution. I would
>> >  like to solicit input from the libvirt community prior to starting
>> >  the libvirt patch.
>> >
>> >  Currently, Qemu opens an image file in addition to performing the
>> >  necessary read and write operations. The proposed solution will move
>> >  the open out of Qemu and into libvirt. Once libvirt opens an image
>> >  file for the guest, it will pass the file descriptor to Qemu via a
>> >  new fd: protocol.
>> >
>> >  If the image file resides in an NFS mount, the following SELinux policy
>> >  changes will provide image isolation:
>> >
>> >    - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>> >      allow Qemu (svirt_t) to only have SELinux read and write
>> >      permissions on nfs_t files
>> >
>> >    - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>> >      (virtd_t) file descriptors
>> >
>> >  Following is a sample invocation of Qemu using the fd: protocol on
>> >  the command line:
>> >
>> >      qemu -drive file=fd:4,format=qcow2
>> >
>> >  The fd: protocol is also supported by the drive_add monitor command.
>> >  This requires that the specified file descriptor is passed to the
>> >  monitor alongside a prior getfd monitor command.
>> >
>> >  There are some additional features provided by certain image types
>> >  where Qemu reopens the image file. All of these scenarios will be
>> >  unsupported for the fd: protocol, at least for this patch:
>> >
>> >    - The -snapshot command line option
>> >    - The savevm monitor command
>> >    - The snapshot_blkdev monitor command
>> >    - Starting Qemu with a backing file
> There's also native CDROM device. Did you consider adding an explicit
> reopen method to block layer?
Thanks. Yes it looks like I overlooked CDROM reopens.

I'm not sure that I'm clear on the purpose of the reopen function. 
Would the goal be to funnel all block layer reopens through a single 
function, enabling potential future support where a privileged layer of 
Qemu, or libvirt, performs the open?


>
>> >  The thought is that this support can be added in the future, but is
>> >  not required for the initial fd: support.
>> >
>> >  This patch was tested with the following formats: raw, cow, qcow,
>> >  qcow2, qed, and vmdk, using the fd: protocol from the command line
>> >  and the monitor. Tests were also run to verify existing file name
>> >  support and qemu-img were not regressed. Non-valid file descriptors,
>> >  fd: without format, snapshot and backing files were also tested.
>> >
>> >  Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
>> >  ---
>> >    block.c           |   16 ++++++++++
>> >    block.h           |    1 +
>> >    block/cow.c       |    5 +++
>> >    block/qcow.c      |    5 +++
>> >    block/qcow2.c     |    5 +++
>> >    block/qed.c       |    4 ++
>> >    block/raw-posix.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++------
>> >    block/vmdk.c      |    5 +++
>> >    block_int.h       |    1 +
>> >    blockdev.c        |   10 ++++++
>> >    monitor.c         |    5 +++
>> >    monitor.h         |    1 +
>> >    qemu-options.hx   |    8 +++--
>> >    qemu-tool.c       |    5 +++
>> >    14 files changed, 140 insertions(+), 12 deletions(-)
>> >
>> >  diff --git a/block.c b/block.c
>> >  index 24a25d5..500db84 100644
>> >  --- a/block.c
>> >  +++ b/block.c
>> >  @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>> >           char tmp_filename[PATH_MAX];
>> >           char backing_filename[PATH_MAX];
>> >
>> >  +        if (bdrv_is_fd_protocol(bs)) {
>> >  +            return -ENOTSUP;
>> >  +        }
>> >  +
>> >           /* if snapshot, we create a temporary backing file and open it
>> >              instead of opening 'filename' directly */
>> >
>> >  @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>> >
>> >       /* Find the right image format driver */
>> >       if (!drv) {
>> >  +        /* format must be specified for fd: protocol */
>> >  +        if (bdrv_is_fd_protocol(bs)) {
>> >  +            return -ENOTSUP;
>> >  +        }
>> >           ret = find_image_format(filename,&drv);
>> >       }
>> >
>> >  @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs)
>> >       return bs->enable_write_cache;
>> >    }
>> >
>> >  +int bdrv_is_fd_protocol(BlockDriverState *bs)
>> >  +{
>> >  +    return bs->fd_protocol;
>> >  +}
>> >  +
>> >    /* XXX: no longer used */
>> >    void bdrv_set_change_cb(BlockDriverState *bs,
>> >                           void (*change_cb)(void *opaque, int reason),
>> >  @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>> >       BlockDriver *drv = bs->drv;
>> >       if (!drv)
>> >           return -ENOMEDIUM;
>> >  +    if (bdrv_is_fd_protocol(bs)) {
>> >  +        return -ENOTSUP;
>> >  +    }
>> >       if (drv->bdrv_snapshot_create)
>> >           return drv->bdrv_snapshot_create(bs, sn_info);
>> >       if (bs->file)
>> >  diff --git a/block.h b/block.h
>> >  index da7d39c..dd46d52 100644
>> >  --- a/block.h
>> >  +++ b/block.h
>> >  @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs);
>> >    int bdrv_is_read_only(BlockDriverState *bs);
>> >    int bdrv_is_sg(BlockDriverState *bs);
>> >    int bdrv_enable_write_cache(BlockDriverState *bs);
>> >  +int bdrv_is_fd_protocol(BlockDriverState *bs);
>> >    int bdrv_is_inserted(BlockDriverState *bs);
>> >    int bdrv_media_changed(BlockDriverState *bs);
>> >    int bdrv_is_locked(BlockDriverState *bs);
>> >  diff --git a/block/cow.c b/block/cow.c
>> >  index 4cf543c..e17f8e7 100644
>> >  --- a/block/cow.c
>> >  +++ b/block/cow.c
>> >  @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
>> >       pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>> >               cow_header.backing_file);
>> >
>> >  +    if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>> >  +        /* backing file currently not supported by fd: protocol */
>> >  +        goto fail;
>> >  +    }
>> >  +
>> >       bitmap_size = ((bs->total_sectors + 7)>>  3) + sizeof(cow_header);
>> >       s->cow_sectors_offset = (bitmap_size + 511)&  ~511;
>> >       return 0;
>> >  diff --git a/block/qcow.c b/block/qcow.c
>> >  index a26c886..1e46bdb 100644
>> >  --- a/block/qcow.c
>> >  +++ b/block/qcow.c
>> >  @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int flags)
>> >           if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len)
>> >               goto fail;
>> >           bs->backing_file[len] = '\0';
>> >  +
>> >  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>> >  +            /* backing file currently not supported by fd: protocol */
>> >  +            goto fail;
>> >  +        }
>> >       }
>> >       return 0;
>> >
>> >  diff --git a/block/qcow2.c b/block/qcow2.c
>> >  index 8451ded..8b9c160 100644
>> >  --- a/block/qcow2.c
>> >  +++ b/block/qcow2.c
>> >  @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>> >               goto fail;
>> >           }
>> >           bs->backing_file[len] = '\0';
>> >  +
>> >  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>> >  +            ret = -ENOTSUP;
>> >  +            goto fail;
>> >  +        }
>> >       }
>> >       if (qcow2_read_snapshots(bs)<  0) {
>> >           ret = -EINVAL;
>> >  diff --git a/block/qed.c b/block/qed.c
>> >  index 3970379..5028897 100644
>> >  --- a/block/qed.c
>> >  +++ b/block/qed.c
>> >  @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>> >               return ret;
>> >           }
>> >
>> >  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>> >  +            return -ENOTSUP;
>> >  +        }
>> >  +
>> >           if (s->header.features&  QED_F_BACKING_FORMAT_NO_PROBE) {
>> >               pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
>> >           }
>> >  diff --git a/block/raw-posix.c b/block/raw-posix.c
>> >  index 4cd7d7a..c72de3d 100644
>> >  --- a/block/raw-posix.c
>> >  +++ b/block/raw-posix.c
>> >  @@ -28,6 +28,7 @@
>> >    #include "block_int.h"
>> >    #include "module.h"
>> >    #include "block/raw-posix-aio.h"
>> >  +#include "monitor.h"
>> >
>> >    #ifdef CONFIG_COCOA
>> >    #include<paths.h>
>> >  @@ -183,7 +184,8 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>> >                              int bdrv_flags, int open_flags)
>> >    {
>> >       BDRVRawState *s = bs->opaque;
>> >  -    int fd, ret;
>> >  +    int fd = -1;
>> >  +    int ret;
>> >
>> >       ret = raw_normalize_devicepath(&filename);
>> >       if (ret != 0) {
>> >  @@ -205,15 +207,17 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>> >       if (!(bdrv_flags&  BDRV_O_CACHE_WB))
>> >           s->open_flags |= O_DSYNC;
>> >
>> >  -    s->fd = -1;
>> >  -    fd = qemu_open(filename, s->open_flags, 0644);
>> >  -    if (fd<  0) {
>> >  -        ret = -errno;
>> >  -        if (ret == -EROFS)
>> >  -            ret = -EACCES;
>> >  -        return ret;
>> >  +    if (s->fd == -1) {
>> >  +        fd = qemu_open(filename, s->open_flags, 0644);
>> >  +        if (fd<  0) {
>> >  +            ret = -errno;
>> >  +            if (ret == -EROFS) {
>> >  +                ret = -EACCES;
>> >  +            }
>> >  +            return ret;
>> >  +        }
>> >  +        s->fd = fd;
>> >       }
>> >  -    s->fd = fd;
>> >       s->aligned_buf = NULL;
>> >
>> >       if ((bdrv_flags&  BDRV_O_NOCACHE)) {
>> >  @@ -270,6 +274,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
>> >    {
>> >       BDRVRawState *s = bs->opaque;
>> >
>> >  +    s->fd = -1;
>> >       s->type = FTYPE_FILE;
>> >       return raw_open_common(bs, filename, flags, 0);
>> >    }
>> >  @@ -890,6 +895,60 @@ static BlockDriver bdrv_file = {
>> >       .create_options = raw_create_options,
>> >    };
>> >
>> >  +static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags)
>> >  +{
>> >  +    BDRVRawState *s = bs->opaque;
>> >  +    const char *fd_str;
>> >  +    int fd;
>> >  +
>> >  +    /* extract the file descriptor - fail if it's not fd: */
>> >  +    if (!strstart(filename, "fd:",&fd_str)) {
>> >  +        return -EINVAL;
>> >  +    }
>> >  +
>> >  +    if (!qemu_isdigit(fd_str[0])) {
>> >  +        /* get fd from monitor */
>> >  +        fd = qemu_get_fd(fd_str);
>> >  +        if (fd == -1) {
>> >  +            return -EBADF;
>> >  +        }
>> >  +    } else {
>> >  +        char *endptr = NULL;
>> >  +
>> >  +        fd = strtol(fd_str,&endptr, 10);
>> >  +        if (*endptr || (fd == 0&&  fd_str == endptr)) {
>> >  +            return -EBADF;
>> >  +        }
>> >  +    }
>> >  +
>> >  +    s->fd = fd;
>> >  +    s->type = FTYPE_FILE;
>> >  +
>> >  +    return raw_open_common(bs, filename, flags, 0);
>> >  +}
>> >  +
>> >  +static BlockDriver bdrv_file_fd = {
>> >  +    .format_name = "file",
>> >  +    .protocol_name = "fd",
>> >  +    .instance_size = sizeof(BDRVRawState),
>> >  +    .bdrv_probe = NULL, /* no probe for protocols */
>> >  +    .bdrv_file_open = raw_open_fd,
>> >  +    .bdrv_read = raw_read,
>> >  +    .bdrv_write = raw_write,
>> >  +    .bdrv_close = raw_close,
>> >  +    .bdrv_flush = raw_flush,
>> >  +    .bdrv_discard = raw_discard,
>> >  +
>> >  +    .bdrv_aio_readv = raw_aio_readv,
>> >  +    .bdrv_aio_writev = raw_aio_writev,
>> >  +    .bdrv_aio_flush = raw_aio_flush,
>> >  +
>> >  +    .bdrv_truncate = raw_truncate,
>> >  +    .bdrv_getlength = raw_getlength,
>> >  +
>> >  +    .create_options = raw_create_options,
>> >  +};
>> >  +
>> >    /***********************************************/
>> >    /* host device */
>> >
>> >  @@ -998,6 +1057,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
>> >       }
>> >    #endif
>> >
>> >  +    s->fd = -1;
>> >       s->type = FTYPE_FILE;
>> >    #if defined(__linux__)
>> >       {
>> >  @@ -1168,6 +1228,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
>> >       BDRVRawState *s = bs->opaque;
>> >       int ret;
>> >
>> >  +    s->fd = -1;
>> >       s->type = FTYPE_FD;
>> >
>> >       /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
>> >  @@ -1280,6 +1341,7 @@ static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
>> >    {
>> >       BDRVRawState *s = bs->opaque;
>> >
>> >  +    s->fd = -1;
>> >       s->type = FTYPE_CD;
>> >
>> >       /* open will not fail even if no CD is inserted, so add O_NONBLOCK */
>> >  @@ -1503,6 +1565,7 @@ static void bdrv_file_init(void)
>> >        * Register all the drivers.  Note that order is important, the driver
>> >        * registered last will get probed first.
>> >        */
>> >  +    bdrv_register(&bdrv_file_fd);
>> >       bdrv_register(&bdrv_file);
>> >       bdrv_register(&bdrv_host_device);
>> >    #ifdef __linux__
>> >  diff --git a/block/vmdk.c b/block/vmdk.c
>> >  index 922b23d..2ea808e 100644
>> >  --- a/block/vmdk.c
>> >  +++ b/block/vmdk.c
>> >  @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs)
>> >               return -1;
>> >
>> >           pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
>> >  +
>> >  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>> >  +            /* backing file currently not supported by fd: protocol */
>> >  +            return -1;
>> >  +        }
>> >       }
>> >
>> >       return 0;
>> >  diff --git a/block_int.h b/block_int.h
>> >  index fa91337..a305ee2 100644
>> >  --- a/block_int.h
>> >  +++ b/block_int.h
>> >  @@ -152,6 +152,7 @@ struct BlockDriverState {
>> >       int encrypted; /* if true, the media is encrypted */
>> >       int valid_key; /* if true, a valid encryption key has been set */
>> >       int sg;        /* if true, the device is a /dev/sg* */
>> >  +    int fd_protocol; /* if true, the fd: protocol was specified */
> bool?
>
I was following suit here, but I agree that bool would be better.  Or 
better, these could all be reduced to bit flags.  My thought is that 
I'll stick with following suit here though.


>> >       /* event callback when inserting/removing */
>> >       void (*change_cb)(void *opaque, int reason);
>> >       void *change_opaque;
>> >  diff --git a/blockdev.c b/blockdev.c
>> >  index e81e0ab..a536c20 100644
>> >  --- a/blockdev.c
>> >  +++ b/blockdev.c
>> >  @@ -546,6 +546,10 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>> >
>> >       bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>> >
>> >  +    if (strncmp(file, "fd:", 3) == 0) {
>> >  +        dinfo->bdrv->fd_protocol = 1;
>> >  +    }
>> >  +
>> >       ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
>> >       if (ret<  0) {
>> >           error_report("could not open disk image %s: %s",
>> >  @@ -606,6 +610,12 @@ int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data)
>> >           goto out;
>> >       }
>> >
>> >  +    if (bdrv_is_fd_protocol(bs)) {
>> >  +        qerror_report(QERR_UNSUPPORTED);
>> >  +        ret = -1;
>> >  +        goto out;
>> >  +    }
>> >  +
>> >       pstrcpy(old_filename, sizeof(old_filename), bs->filename);
>> >
>> >       old_drv = bs->drv;
>> >  diff --git a/monitor.c b/monitor.c
>> >  index 59a3e76..ea60be2 100644
>> >  --- a/monitor.c
>> >  +++ b/monitor.c
>> >  @@ -2832,6 +2832,11 @@ int monitor_get_fd(Monitor *mon, const char *fdname)
>> >       return -1;
>> >    }
>> >
>> >  +int qemu_get_fd(const char *fdname)
>> >  +{
>> >  +    return cur_mon ? monitor_get_fd(cur_mon, fdname) : -1;
>> >  +}
>> >  +
>> >    static const mon_cmd_t mon_cmds[] = {
>> >    #include "hmp-commands.h"
>> >       { NULL, NULL, },
>> >  diff --git a/monitor.h b/monitor.h
>> >  index 4f2d328..de5b987 100644
>> >  --- a/monitor.h
>> >  +++ b/monitor.h
>> >  @@ -51,6 +51,7 @@ int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
>> >                                   void *opaque);
>> >
>> >    int monitor_get_fd(Monitor *mon, const char *fdname);
>> >  +int qemu_get_fd(const char *fdname);
>> >
>> >    void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
>> >       GCC_FMT_ATTR(2, 0);
>> >  diff --git a/qemu-options.hx b/qemu-options.hx
>> >  index 1d5ad8b..f9b66f4 100644
>> >  --- a/qemu-options.hx
>> >  +++ b/qemu-options.hx
>> >  @@ -116,7 +116,7 @@ using @file{/dev/cdrom} as filename (@pxref{host_drives}).
>> >    ETEXI
>> >
>> >    DEF("drive", HAS_ARG, QEMU_OPTION_drive,
>> >  -    "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>> >  +    "-drive [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>> >       "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
>> >       "       [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
>> >       "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
>> >  @@ -129,10 +129,12 @@ STEXI
>> >    Define a new drive. Valid options are:
>> >
>> >    @table @option
>> >  -@item file=@var{file}
>> >  +@item file=[fd:]@var{file}
>> >    This option defines which disk image (@pxref{disk_images}) to use with
>> >    this drive. If the filename contains comma, you must double it
>> >  -(for instance, "file=my,,file" to use file "my,file").
>> >  +(for instance, "file=my,,file" to use file "my,file"). @option{fd:}@var{file}
>> >  +specifies the file descriptor of an already open disk image.
>> >  +@option{format=}@var{format} is required by @option{fd:}@var{file}.
>> >    @item if=@var{interface}
>> >    This option defines on which type on interface the drive is connected.
>> >    Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
>> >  diff --git a/qemu-tool.c b/qemu-tool.c
>> >  index 41e5c41..8fe6b8c 100644
>> >  --- a/qemu-tool.c
>> >  +++ b/qemu-tool.c
>> >  @@ -96,3 +96,8 @@ int64_t qemu_get_clock_ns(QEMUClock *clock)
>> >    {
>> >       return 0;
>> >    }
>> >  +
>> >  +int qemu_get_fd(const char *fdname)
>> >  +{
>> >  +    return -1;
>> >  +}
>> >  --
>> >  1.7.1
>> >
>> >
>> >


Regards,
Corey Bryant
Blue Swirl June 18, 2011, 8:50 p.m. UTC | #8
On Thu, Jun 16, 2011 at 5:48 PM, Corey Bryant <bryntcor@us.ibm.com> wrote:
>
>
> On 06/15/2011 03:12 PM, Blue Swirl wrote:
>>
>> On Tue, Jun 14, 2011 at 4:31 PM, Corey Bryant<bryntcor@us.ibm.com>  wrote:
>>>
>>> >  sVirt provides SELinux MAC isolation for Qemu guest processes and
>>> > their
>>> >  corresponding resources (image files). sVirt provides this support
>>> >  by labeling guests and resources with security labels that are stored
>>> >  in file system extended attributes. Some file systems, such as NFS, do
>>> >  not support the extended attribute security namespace, which is needed
>>> >  for image file isolation when using the sVirt SELinux security driver
>>> >  in libvirt.
>>> >
>>> >  The proposed solution entails a combination of Qemu, libvirt, and
>>> >  SELinux patches that work together to isolate multiple guests' images
>>> >  when they're stored in the same NFS mount. This results in an
>>> >  environment where sVirt isolation and NFS image file isolation can
>>> > both
>>> >  be provided.
>>> >
>>> >  This patch contains the Qemu code to support this solution. I would
>>> >  like to solicit input from the libvirt community prior to starting
>>> >  the libvirt patch.
>>> >
>>> >  Currently, Qemu opens an image file in addition to performing the
>>> >  necessary read and write operations. The proposed solution will move
>>> >  the open out of Qemu and into libvirt. Once libvirt opens an image
>>> >  file for the guest, it will pass the file descriptor to Qemu via a
>>> >  new fd: protocol.
>>> >
>>> >  If the image file resides in an NFS mount, the following SELinux
>>> > policy
>>> >  changes will provide image isolation:
>>> >
>>> >    - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>>> >      allow Qemu (svirt_t) to only have SELinux read and write
>>> >      permissions on nfs_t files
>>> >
>>> >    - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>>> >      (virtd_t) file descriptors
>>> >
>>> >  Following is a sample invocation of Qemu using the fd: protocol on
>>> >  the command line:
>>> >
>>> >      qemu -drive file=fd:4,format=qcow2
>>> >
>>> >  The fd: protocol is also supported by the drive_add monitor command.
>>> >  This requires that the specified file descriptor is passed to the
>>> >  monitor alongside a prior getfd monitor command.
>>> >
>>> >  There are some additional features provided by certain image types
>>> >  where Qemu reopens the image file. All of these scenarios will be
>>> >  unsupported for the fd: protocol, at least for this patch:
>>> >
>>> >    - The -snapshot command line option
>>> >    - The savevm monitor command
>>> >    - The snapshot_blkdev monitor command
>>> >    - Starting Qemu with a backing file
>>
>> There's also native CDROM device. Did you consider adding an explicit
>> reopen method to block layer?
>
> Thanks. Yes it looks like I overlooked CDROM reopens.
>
> I'm not sure that I'm clear on the purpose of the reopen function. Would the
> goal be to funnel all block layer reopens through a single function,
> enabling potential future support where a privileged layer of Qemu, or
> libvirt, performs the open?

Eventually yes, but I think it would help also now by moving the
checks to a single place. It's a bit orthogonal to this patch though.

>>> >  The thought is that this support can be added in the future, but is
>>> >  not required for the initial fd: support.
>>> >
>>> >  This patch was tested with the following formats: raw, cow, qcow,
>>> >  qcow2, qed, and vmdk, using the fd: protocol from the command line
>>> >  and the monitor. Tests were also run to verify existing file name
>>> >  support and qemu-img were not regressed. Non-valid file descriptors,
>>> >  fd: without format, snapshot and backing files were also tested.
>>> >
>>> >  Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
>>> >  ---
>>> >    block.c           |   16 ++++++++++
>>> >    block.h           |    1 +
>>> >    block/cow.c       |    5 +++
>>> >    block/qcow.c      |    5 +++
>>> >    block/qcow2.c     |    5 +++
>>> >    block/qed.c       |    4 ++
>>> >    block/raw-posix.c |   81
>>> > +++++++++++++++++++++++++++++++++++++++++++++++------
>>> >    block/vmdk.c      |    5 +++
>>> >    block_int.h       |    1 +
>>> >    blockdev.c        |   10 ++++++
>>> >    monitor.c         |    5 +++
>>> >    monitor.h         |    1 +
>>> >    qemu-options.hx   |    8 +++--
>>> >    qemu-tool.c       |    5 +++
>>> >    14 files changed, 140 insertions(+), 12 deletions(-)
>>> >
>>> >  diff --git a/block.c b/block.c
>>> >  index 24a25d5..500db84 100644
>>> >  --- a/block.c
>>> >  +++ b/block.c
>>> >  @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char
>>> > *filename, int flags,
>>> >           char tmp_filename[PATH_MAX];
>>> >           char backing_filename[PATH_MAX];
>>> >
>>> >  +        if (bdrv_is_fd_protocol(bs)) {
>>> >  +            return -ENOTSUP;
>>> >  +        }
>>> >  +
>>> >           /* if snapshot, we create a temporary backing file and open
>>> > it
>>> >              instead of opening 'filename' directly */
>>> >
>>> >  @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char
>>> > *filename, int flags,
>>> >
>>> >       /* Find the right image format driver */
>>> >       if (!drv) {
>>> >  +        /* format must be specified for fd: protocol */
>>> >  +        if (bdrv_is_fd_protocol(bs)) {
>>> >  +            return -ENOTSUP;
>>> >  +        }
>>> >           ret = find_image_format(filename,&drv);
>>> >       }
>>> >
>>> >  @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState
>>> > *bs)
>>> >       return bs->enable_write_cache;
>>> >    }
>>> >
>>> >  +int bdrv_is_fd_protocol(BlockDriverState *bs)
>>> >  +{
>>> >  +    return bs->fd_protocol;
>>> >  +}
>>> >  +
>>> >    /* XXX: no longer used */
>>> >    void bdrv_set_change_cb(BlockDriverState *bs,
>>> >                           void (*change_cb)(void *opaque, int reason),
>>> >  @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>>> >       BlockDriver *drv = bs->drv;
>>> >       if (!drv)
>>> >           return -ENOMEDIUM;
>>> >  +    if (bdrv_is_fd_protocol(bs)) {
>>> >  +        return -ENOTSUP;
>>> >  +    }
>>> >       if (drv->bdrv_snapshot_create)
>>> >           return drv->bdrv_snapshot_create(bs, sn_info);
>>> >       if (bs->file)
>>> >  diff --git a/block.h b/block.h
>>> >  index da7d39c..dd46d52 100644
>>> >  --- a/block.h
>>> >  +++ b/block.h
>>> >  @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs);
>>> >    int bdrv_is_read_only(BlockDriverState *bs);
>>> >    int bdrv_is_sg(BlockDriverState *bs);
>>> >    int bdrv_enable_write_cache(BlockDriverState *bs);
>>> >  +int bdrv_is_fd_protocol(BlockDriverState *bs);
>>> >    int bdrv_is_inserted(BlockDriverState *bs);
>>> >    int bdrv_media_changed(BlockDriverState *bs);
>>> >    int bdrv_is_locked(BlockDriverState *bs);
>>> >  diff --git a/block/cow.c b/block/cow.c
>>> >  index 4cf543c..e17f8e7 100644
>>> >  --- a/block/cow.c
>>> >  +++ b/block/cow.c
>>> >  @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int
>>> > flags)
>>> >       pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>>> >               cow_header.backing_file);
>>> >
>>> >  +    if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>> >  +        /* backing file currently not supported by fd: protocol */
>>> >  +        goto fail;
>>> >  +    }
>>> >  +
>>> >       bitmap_size = ((bs->total_sectors + 7)>>  3) +
>>> > sizeof(cow_header);
>>> >       s->cow_sectors_offset = (bitmap_size + 511)&  ~511;
>>> >       return 0;
>>> >  diff --git a/block/qcow.c b/block/qcow.c
>>> >  index a26c886..1e46bdb 100644
>>> >  --- a/block/qcow.c
>>> >  +++ b/block/qcow.c
>>> >  @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int
>>> > flags)
>>> >           if (bdrv_pread(bs->file, header.backing_file_offset,
>>> > bs->backing_file, len) != len)
>>> >               goto fail;
>>> >           bs->backing_file[len] = '\0';
>>> >  +
>>> >  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>> >  +            /* backing file currently not supported by fd: protocol
>>> > */
>>> >  +            goto fail;
>>> >  +        }
>>> >       }
>>> >       return 0;
>>> >
>>> >  diff --git a/block/qcow2.c b/block/qcow2.c
>>> >  index 8451ded..8b9c160 100644
>>> >  --- a/block/qcow2.c
>>> >  +++ b/block/qcow2.c
>>> >  @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int
>>> > flags)
>>> >               goto fail;
>>> >           }
>>> >           bs->backing_file[len] = '\0';
>>> >  +
>>> >  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>> >  +            ret = -ENOTSUP;
>>> >  +            goto fail;
>>> >  +        }
>>> >       }
>>> >       if (qcow2_read_snapshots(bs)<  0) {
>>> >           ret = -EINVAL;
>>> >  diff --git a/block/qed.c b/block/qed.c
>>> >  index 3970379..5028897 100644
>>> >  --- a/block/qed.c
>>> >  +++ b/block/qed.c
>>> >  @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs,
>>> > int flags)
>>> >               return ret;
>>> >           }
>>> >
>>> >  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>> >  +            return -ENOTSUP;
>>> >  +        }
>>> >  +
>>> >           if (s->header.features&  QED_F_BACKING_FORMAT_NO_PROBE) {
>>> >               pstrcpy(bs->backing_format, sizeof(bs->backing_format),
>>> > "raw");
>>> >           }
>>> >  diff --git a/block/raw-posix.c b/block/raw-posix.c
>>> >  index 4cd7d7a..c72de3d 100644
>>> >  --- a/block/raw-posix.c
>>> >  +++ b/block/raw-posix.c
>>> >  @@ -28,6 +28,7 @@
>>> >    #include "block_int.h"
>>> >    #include "module.h"
>>> >    #include "block/raw-posix-aio.h"
>>> >  +#include "monitor.h"
>>> >
>>> >    #ifdef CONFIG_COCOA
>>> >    #include<paths.h>
>>> >  @@ -183,7 +184,8 @@ static int raw_open_common(BlockDriverState *bs,
>>> > const char *filename,
>>> >                              int bdrv_flags, int open_flags)
>>> >    {
>>> >       BDRVRawState *s = bs->opaque;
>>> >  -    int fd, ret;
>>> >  +    int fd = -1;
>>> >  +    int ret;
>>> >
>>> >       ret = raw_normalize_devicepath(&filename);
>>> >       if (ret != 0) {
>>> >  @@ -205,15 +207,17 @@ static int raw_open_common(BlockDriverState *bs,
>>> > const char *filename,
>>> >       if (!(bdrv_flags&  BDRV_O_CACHE_WB))
>>> >           s->open_flags |= O_DSYNC;
>>> >
>>> >  -    s->fd = -1;
>>> >  -    fd = qemu_open(filename, s->open_flags, 0644);
>>> >  -    if (fd<  0) {
>>> >  -        ret = -errno;
>>> >  -        if (ret == -EROFS)
>>> >  -            ret = -EACCES;
>>> >  -        return ret;
>>> >  +    if (s->fd == -1) {
>>> >  +        fd = qemu_open(filename, s->open_flags, 0644);
>>> >  +        if (fd<  0) {
>>> >  +            ret = -errno;
>>> >  +            if (ret == -EROFS) {
>>> >  +                ret = -EACCES;
>>> >  +            }
>>> >  +            return ret;
>>> >  +        }
>>> >  +        s->fd = fd;
>>> >       }
>>> >  -    s->fd = fd;
>>> >       s->aligned_buf = NULL;
>>> >
>>> >       if ((bdrv_flags&  BDRV_O_NOCACHE)) {
>>> >  @@ -270,6 +274,7 @@ static int raw_open(BlockDriverState *bs, const
>>> > char *filename, int flags)
>>> >    {
>>> >       BDRVRawState *s = bs->opaque;
>>> >
>>> >  +    s->fd = -1;
>>> >       s->type = FTYPE_FILE;
>>> >       return raw_open_common(bs, filename, flags, 0);
>>> >    }
>>> >  @@ -890,6 +895,60 @@ static BlockDriver bdrv_file = {
>>> >       .create_options = raw_create_options,
>>> >    };
>>> >
>>> >  +static int raw_open_fd(BlockDriverState *bs, const char *filename,
>>> > int flags)
>>> >  +{
>>> >  +    BDRVRawState *s = bs->opaque;
>>> >  +    const char *fd_str;
>>> >  +    int fd;
>>> >  +
>>> >  +    /* extract the file descriptor - fail if it's not fd: */
>>> >  +    if (!strstart(filename, "fd:",&fd_str)) {
>>> >  +        return -EINVAL;
>>> >  +    }
>>> >  +
>>> >  +    if (!qemu_isdigit(fd_str[0])) {
>>> >  +        /* get fd from monitor */
>>> >  +        fd = qemu_get_fd(fd_str);
>>> >  +        if (fd == -1) {
>>> >  +            return -EBADF;
>>> >  +        }
>>> >  +    } else {
>>> >  +        char *endptr = NULL;
>>> >  +
>>> >  +        fd = strtol(fd_str,&endptr, 10);
>>> >  +        if (*endptr || (fd == 0&&  fd_str == endptr)) {
>>> >  +            return -EBADF;
>>> >  +        }
>>> >  +    }
>>> >  +
>>> >  +    s->fd = fd;
>>> >  +    s->type = FTYPE_FILE;
>>> >  +
>>> >  +    return raw_open_common(bs, filename, flags, 0);
>>> >  +}
>>> >  +
>>> >  +static BlockDriver bdrv_file_fd = {
>>> >  +    .format_name = "file",
>>> >  +    .protocol_name = "fd",
>>> >  +    .instance_size = sizeof(BDRVRawState),
>>> >  +    .bdrv_probe = NULL, /* no probe for protocols */
>>> >  +    .bdrv_file_open = raw_open_fd,
>>> >  +    .bdrv_read = raw_read,
>>> >  +    .bdrv_write = raw_write,
>>> >  +    .bdrv_close = raw_close,
>>> >  +    .bdrv_flush = raw_flush,
>>> >  +    .bdrv_discard = raw_discard,
>>> >  +
>>> >  +    .bdrv_aio_readv = raw_aio_readv,
>>> >  +    .bdrv_aio_writev = raw_aio_writev,
>>> >  +    .bdrv_aio_flush = raw_aio_flush,
>>> >  +
>>> >  +    .bdrv_truncate = raw_truncate,
>>> >  +    .bdrv_getlength = raw_getlength,
>>> >  +
>>> >  +    .create_options = raw_create_options,
>>> >  +};
>>> >  +
>>> >    /***********************************************/
>>> >    /* host device */
>>> >
>>> >  @@ -998,6 +1057,7 @@ static int hdev_open(BlockDriverState *bs, const
>>> > char *filename, int flags)
>>> >       }
>>> >    #endif
>>> >
>>> >  +    s->fd = -1;
>>> >       s->type = FTYPE_FILE;
>>> >    #if defined(__linux__)
>>> >       {
>>> >  @@ -1168,6 +1228,7 @@ static int floppy_open(BlockDriverState *bs,
>>> > const char *filename, int flags)
>>> >       BDRVRawState *s = bs->opaque;
>>> >       int ret;
>>> >
>>> >  +    s->fd = -1;
>>> >       s->type = FTYPE_FD;
>>> >
>>> >       /* open will not fail even if no floppy is inserted, so add
>>> > O_NONBLOCK */
>>> >  @@ -1280,6 +1341,7 @@ static int cdrom_open(BlockDriverState *bs,
>>> > const char *filename, int flags)
>>> >    {
>>> >       BDRVRawState *s = bs->opaque;
>>> >
>>> >  +    s->fd = -1;
>>> >       s->type = FTYPE_CD;
>>> >
>>> >       /* open will not fail even if no CD is inserted, so add
>>> > O_NONBLOCK */
>>> >  @@ -1503,6 +1565,7 @@ static void bdrv_file_init(void)
>>> >        * Register all the drivers.  Note that order is important, the
>>> > driver
>>> >        * registered last will get probed first.
>>> >        */
>>> >  +    bdrv_register(&bdrv_file_fd);
>>> >       bdrv_register(&bdrv_file);
>>> >       bdrv_register(&bdrv_host_device);
>>> >    #ifdef __linux__
>>> >  diff --git a/block/vmdk.c b/block/vmdk.c
>>> >  index 922b23d..2ea808e 100644
>>> >  --- a/block/vmdk.c
>>> >  +++ b/block/vmdk.c
>>> >  @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs)
>>> >               return -1;
>>> >
>>> >           pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
>>> >  +
>>> >  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>> >  +            /* backing file currently not supported by fd: protocol
>>> > */
>>> >  +            return -1;
>>> >  +        }
>>> >       }
>>> >
>>> >       return 0;
>>> >  diff --git a/block_int.h b/block_int.h
>>> >  index fa91337..a305ee2 100644
>>> >  --- a/block_int.h
>>> >  +++ b/block_int.h
>>> >  @@ -152,6 +152,7 @@ struct BlockDriverState {
>>> >       int encrypted; /* if true, the media is encrypted */
>>> >       int valid_key; /* if true, a valid encryption key has been set */
>>> >       int sg;        /* if true, the device is a /dev/sg* */
>>> >  +    int fd_protocol; /* if true, the fd: protocol was specified */
>>
>> bool?
>>
> I was following suit here, but I agree that bool would be better.  Or
> better, these could all be reduced to bit flags.  My thought is that I'll
> stick with following suit here though.

OK, better convert all at once with a separate patch.

>>> >       /* event callback when inserting/removing */
>>> >       void (*change_cb)(void *opaque, int reason);
>>> >       void *change_opaque;
>>> >  diff --git a/blockdev.c b/blockdev.c
>>> >  index e81e0ab..a536c20 100644
>>> >  --- a/blockdev.c
>>> >  +++ b/blockdev.c
>>> >  @@ -546,6 +546,10 @@ DriveInfo *drive_init(QemuOpts *opts, int
>>> > default_to_scsi)
>>> >
>>> >       bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>>> >
>>> >  +    if (strncmp(file, "fd:", 3) == 0) {
>>> >  +        dinfo->bdrv->fd_protocol = 1;
>>> >  +    }
>>> >  +
>>> >       ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
>>> >       if (ret<  0) {
>>> >           error_report("could not open disk image %s: %s",
>>> >  @@ -606,6 +610,12 @@ int do_snapshot_blkdev(Monitor *mon, const QDict
>>> > *qdict, QObject **ret_data)
>>> >           goto out;
>>> >       }
>>> >
>>> >  +    if (bdrv_is_fd_protocol(bs)) {
>>> >  +        qerror_report(QERR_UNSUPPORTED);
>>> >  +        ret = -1;
>>> >  +        goto out;
>>> >  +    }
>>> >  +
>>> >       pstrcpy(old_filename, sizeof(old_filename), bs->filename);
>>> >
>>> >       old_drv = bs->drv;
>>> >  diff --git a/monitor.c b/monitor.c
>>> >  index 59a3e76..ea60be2 100644
>>> >  --- a/monitor.c
>>> >  +++ b/monitor.c
>>> >  @@ -2832,6 +2832,11 @@ int monitor_get_fd(Monitor *mon, const char
>>> > *fdname)
>>> >       return -1;
>>> >    }
>>> >
>>> >  +int qemu_get_fd(const char *fdname)
>>> >  +{
>>> >  +    return cur_mon ? monitor_get_fd(cur_mon, fdname) : -1;
>>> >  +}
>>> >  +
>>> >    static const mon_cmd_t mon_cmds[] = {
>>> >    #include "hmp-commands.h"
>>> >       { NULL, NULL, },
>>> >  diff --git a/monitor.h b/monitor.h
>>> >  index 4f2d328..de5b987 100644
>>> >  --- a/monitor.h
>>> >  +++ b/monitor.h
>>> >  @@ -51,6 +51,7 @@ int monitor_read_bdrv_key_start(Monitor *mon,
>>> > BlockDriverState *bs,
>>> >                                   void *opaque);
>>> >
>>> >    int monitor_get_fd(Monitor *mon, const char *fdname);
>>> >  +int qemu_get_fd(const char *fdname);
>>> >
>>> >    void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
>>> >       GCC_FMT_ATTR(2, 0);
>>> >  diff --git a/qemu-options.hx b/qemu-options.hx
>>> >  index 1d5ad8b..f9b66f4 100644
>>> >  --- a/qemu-options.hx
>>> >  +++ b/qemu-options.hx
>>> >  @@ -116,7 +116,7 @@ using @file{/dev/cdrom} as filename
>>> > (@pxref{host_drives}).
>>> >    ETEXI
>>> >
>>> >    DEF("drive", HAS_ARG, QEMU_OPTION_drive,
>>> >  -    "-drive
>>> > [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>>> >  +    "-drive
>>> > [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>>> >       "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
>>> >       "       [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
>>> >       "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
>>> >  @@ -129,10 +129,12 @@ STEXI
>>> >    Define a new drive. Valid options are:
>>> >
>>> >    @table @option
>>> >  -@item file=@var{file}
>>> >  +@item file=[fd:]@var{file}
>>> >    This option defines which disk image (@pxref{disk_images}) to use
>>> > with
>>> >    this drive. If the filename contains comma, you must double it
>>> >  -(for instance, "file=my,,file" to use file "my,file").
>>> >  +(for instance, "file=my,,file" to use file "my,file").
>>> > @option{fd:}@var{file}
>>> >  +specifies the file descriptor of an already open disk image.
>>> >  +@option{format=}@var{format} is required by @option{fd:}@var{file}.
>>> >    @item if=@var{interface}
>>> >    This option defines on which type on interface the drive is
>>> > connected.
>>> >    Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
>>> >  diff --git a/qemu-tool.c b/qemu-tool.c
>>> >  index 41e5c41..8fe6b8c 100644
>>> >  --- a/qemu-tool.c
>>> >  +++ b/qemu-tool.c
>>> >  @@ -96,3 +96,8 @@ int64_t qemu_get_clock_ns(QEMUClock *clock)
>>> >    {
>>> >       return 0;
>>> >    }
>>> >  +
>>> >  +int qemu_get_fd(const char *fdname)
>>> >  +{
>>> >  +    return -1;
>>> >  +}
>>> >  --
>>> >  1.7.1
>>> >
>>> >
>>> >
>
>
> Regards,
> Corey Bryant
>
Corey Bryant June 20, 2011, 1:08 p.m. UTC | #9
On 06/18/2011 04:50 PM, Blue Swirl wrote:
> On Thu, Jun 16, 2011 at 5:48 PM, Corey Bryant<bryntcor@us.ibm.com>  wrote:
>>
>>
>> On 06/15/2011 03:12 PM, Blue Swirl wrote:
>>>
>>> On Tue, Jun 14, 2011 at 4:31 PM, Corey Bryant<bryntcor@us.ibm.com>    wrote:
>>>>
>>>>>   sVirt provides SELinux MAC isolation for Qemu guest processes and
>>>>> their
>>>>>   corresponding resources (image files). sVirt provides this support
>>>>>   by labeling guests and resources with security labels that are stored
>>>>>   in file system extended attributes. Some file systems, such as NFS, do
>>>>>   not support the extended attribute security namespace, which is needed
>>>>>   for image file isolation when using the sVirt SELinux security driver
>>>>>   in libvirt.
>>>>>
>>>>>   The proposed solution entails a combination of Qemu, libvirt, and
>>>>>   SELinux patches that work together to isolate multiple guests' images
>>>>>   when they're stored in the same NFS mount. This results in an
>>>>>   environment where sVirt isolation and NFS image file isolation can
>>>>> both
>>>>>   be provided.
>>>>>
>>>>>   This patch contains the Qemu code to support this solution. I would
>>>>>   like to solicit input from the libvirt community prior to starting
>>>>>   the libvirt patch.
>>>>>
>>>>>   Currently, Qemu opens an image file in addition to performing the
>>>>>   necessary read and write operations. The proposed solution will move
>>>>>   the open out of Qemu and into libvirt. Once libvirt opens an image
>>>>>   file for the guest, it will pass the file descriptor to Qemu via a
>>>>>   new fd: protocol.
>>>>>
>>>>>   If the image file resides in an NFS mount, the following SELinux
>>>>> policy
>>>>>   changes will provide image isolation:
>>>>>
>>>>>     - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>>>>>       allow Qemu (svirt_t) to only have SELinux read and write
>>>>>       permissions on nfs_t files
>>>>>
>>>>>     - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>>>>>       (virtd_t) file descriptors
>>>>>
>>>>>   Following is a sample invocation of Qemu using the fd: protocol on
>>>>>   the command line:
>>>>>
>>>>>       qemu -drive file=fd:4,format=qcow2
>>>>>
>>>>>   The fd: protocol is also supported by the drive_add monitor command.
>>>>>   This requires that the specified file descriptor is passed to the
>>>>>   monitor alongside a prior getfd monitor command.
>>>>>
>>>>>   There are some additional features provided by certain image types
>>>>>   where Qemu reopens the image file. All of these scenarios will be
>>>>>   unsupported for the fd: protocol, at least for this patch:
>>>>>
>>>>>     - The -snapshot command line option
>>>>>     - The savevm monitor command
>>>>>     - The snapshot_blkdev monitor command
>>>>>     - Starting Qemu with a backing file
>>>
>>> There's also native CDROM device. Did you consider adding an explicit
>>> reopen method to block layer?
>>
>> Thanks. Yes it looks like I overlooked CDROM reopens.
>>
>> I'm not sure that I'm clear on the purpose of the reopen function. Would the
>> goal be to funnel all block layer reopens through a single function,
>> enabling potential future support where a privileged layer of Qemu, or
>> libvirt, performs the open?
>
> Eventually yes, but I think it would help also now by moving the
> checks to a single place. It's a bit orthogonal to this patch though.
>

This would definitely simplify things, especially when reopen support is 
added.  I'm going to defer this until then.

>>>>>   The thought is that this support can be added in the future, but is
>>>>>   not required for the initial fd: support.
>>>>>
>>>>>   This patch was tested with the following formats: raw, cow, qcow,
>>>>>   qcow2, qed, and vmdk, using the fd: protocol from the command line
>>>>>   and the monitor. Tests were also run to verify existing file name
>>>>>   support and qemu-img were not regressed. Non-valid file descriptors,
>>>>>   fd: without format, snapshot and backing files were also tested.
>>>>>
>>>>>   Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
>>>>>   ---
>>>>>     block.c           |   16 ++++++++++
>>>>>     block.h           |    1 +
>>>>>     block/cow.c       |    5 +++
>>>>>     block/qcow.c      |    5 +++
>>>>>     block/qcow2.c     |    5 +++
>>>>>     block/qed.c       |    4 ++
>>>>>     block/raw-posix.c |   81
>>>>> +++++++++++++++++++++++++++++++++++++++++++++++------
>>>>>     block/vmdk.c      |    5 +++
>>>>>     block_int.h       |    1 +
>>>>>     blockdev.c        |   10 ++++++
>>>>>     monitor.c         |    5 +++
>>>>>     monitor.h         |    1 +
>>>>>     qemu-options.hx   |    8 +++--
>>>>>     qemu-tool.c       |    5 +++
>>>>>     14 files changed, 140 insertions(+), 12 deletions(-)
>>>>>
>>>>>   diff --git a/block.c b/block.c
>>>>>   index 24a25d5..500db84 100644
>>>>>   --- a/block.c
>>>>>   +++ b/block.c
>>>>>   @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char
>>>>> *filename, int flags,
>>>>>            char tmp_filename[PATH_MAX];
>>>>>            char backing_filename[PATH_MAX];
>>>>>
>>>>>   +        if (bdrv_is_fd_protocol(bs)) {
>>>>>   +            return -ENOTSUP;
>>>>>   +        }
>>>>>   +
>>>>>            /* if snapshot, we create a temporary backing file and open
>>>>> it
>>>>>               instead of opening 'filename' directly */
>>>>>
>>>>>   @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char
>>>>> *filename, int flags,
>>>>>
>>>>>        /* Find the right image format driver */
>>>>>        if (!drv) {
>>>>>   +        /* format must be specified for fd: protocol */
>>>>>   +        if (bdrv_is_fd_protocol(bs)) {
>>>>>   +            return -ENOTSUP;
>>>>>   +        }
>>>>>            ret = find_image_format(filename,&drv);
>>>>>        }
>>>>>
>>>>>   @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState
>>>>> *bs)
>>>>>        return bs->enable_write_cache;
>>>>>     }
>>>>>
>>>>>   +int bdrv_is_fd_protocol(BlockDriverState *bs)
>>>>>   +{
>>>>>   +    return bs->fd_protocol;
>>>>>   +}
>>>>>   +
>>>>>     /* XXX: no longer used */
>>>>>     void bdrv_set_change_cb(BlockDriverState *bs,
>>>>>                            void (*change_cb)(void *opaque, int reason),
>>>>>   @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>>>>>        BlockDriver *drv = bs->drv;
>>>>>        if (!drv)
>>>>>            return -ENOMEDIUM;
>>>>>   +    if (bdrv_is_fd_protocol(bs)) {
>>>>>   +        return -ENOTSUP;
>>>>>   +    }
>>>>>        if (drv->bdrv_snapshot_create)
>>>>>            return drv->bdrv_snapshot_create(bs, sn_info);
>>>>>        if (bs->file)
>>>>>   diff --git a/block.h b/block.h
>>>>>   index da7d39c..dd46d52 100644
>>>>>   --- a/block.h
>>>>>   +++ b/block.h
>>>>>   @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs);
>>>>>     int bdrv_is_read_only(BlockDriverState *bs);
>>>>>     int bdrv_is_sg(BlockDriverState *bs);
>>>>>     int bdrv_enable_write_cache(BlockDriverState *bs);
>>>>>   +int bdrv_is_fd_protocol(BlockDriverState *bs);
>>>>>     int bdrv_is_inserted(BlockDriverState *bs);
>>>>>     int bdrv_media_changed(BlockDriverState *bs);
>>>>>     int bdrv_is_locked(BlockDriverState *bs);
>>>>>   diff --git a/block/cow.c b/block/cow.c
>>>>>   index 4cf543c..e17f8e7 100644
>>>>>   --- a/block/cow.c
>>>>>   +++ b/block/cow.c
>>>>>   @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int
>>>>> flags)
>>>>>        pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>>>>>                cow_header.backing_file);
>>>>>
>>>>>   +    if (bs->backing_file[0] != '\0'&&    bdrv_is_fd_protocol(bs)) {
>>>>>   +        /* backing file currently not supported by fd: protocol */
>>>>>   +        goto fail;
>>>>>   +    }
>>>>>   +
>>>>>        bitmap_size = ((bs->total_sectors + 7)>>    3) +
>>>>> sizeof(cow_header);
>>>>>        s->cow_sectors_offset = (bitmap_size + 511)&    ~511;
>>>>>        return 0;
>>>>>   diff --git a/block/qcow.c b/block/qcow.c
>>>>>   index a26c886..1e46bdb 100644
>>>>>   --- a/block/qcow.c
>>>>>   +++ b/block/qcow.c
>>>>>   @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int
>>>>> flags)
>>>>>            if (bdrv_pread(bs->file, header.backing_file_offset,
>>>>> bs->backing_file, len) != len)
>>>>>                goto fail;
>>>>>            bs->backing_file[len] = '\0';
>>>>>   +
>>>>>   +        if (bs->backing_file[0] != '\0'&&    bdrv_is_fd_protocol(bs)) {
>>>>>   +            /* backing file currently not supported by fd: protocol
>>>>> */
>>>>>   +            goto fail;
>>>>>   +        }
>>>>>        }
>>>>>        return 0;
>>>>>
>>>>>   diff --git a/block/qcow2.c b/block/qcow2.c
>>>>>   index 8451ded..8b9c160 100644
>>>>>   --- a/block/qcow2.c
>>>>>   +++ b/block/qcow2.c
>>>>>   @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int
>>>>> flags)
>>>>>                goto fail;
>>>>>            }
>>>>>            bs->backing_file[len] = '\0';
>>>>>   +
>>>>>   +        if (bs->backing_file[0] != '\0'&&    bdrv_is_fd_protocol(bs)) {
>>>>>   +            ret = -ENOTSUP;
>>>>>   +            goto fail;
>>>>>   +        }
>>>>>        }
>>>>>        if (qcow2_read_snapshots(bs)<    0) {
>>>>>            ret = -EINVAL;
>>>>>   diff --git a/block/qed.c b/block/qed.c
>>>>>   index 3970379..5028897 100644
>>>>>   --- a/block/qed.c
>>>>>   +++ b/block/qed.c
>>>>>   @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs,
>>>>> int flags)
>>>>>                return ret;
>>>>>            }
>>>>>
>>>>>   +        if (bs->backing_file[0] != '\0'&&    bdrv_is_fd_protocol(bs)) {
>>>>>   +            return -ENOTSUP;
>>>>>   +        }
>>>>>   +
>>>>>            if (s->header.features&    QED_F_BACKING_FORMAT_NO_PROBE) {
>>>>>                pstrcpy(bs->backing_format, sizeof(bs->backing_format),
>>>>> "raw");
>>>>>            }
>>>>>   diff --git a/block/raw-posix.c b/block/raw-posix.c
>>>>>   index 4cd7d7a..c72de3d 100644
>>>>>   --- a/block/raw-posix.c
>>>>>   +++ b/block/raw-posix.c
>>>>>   @@ -28,6 +28,7 @@
>>>>>     #include "block_int.h"
>>>>>     #include "module.h"
>>>>>     #include "block/raw-posix-aio.h"
>>>>>   +#include "monitor.h"
>>>>>
>>>>>     #ifdef CONFIG_COCOA
>>>>>     #include<paths.h>
>>>>>   @@ -183,7 +184,8 @@ static int raw_open_common(BlockDriverState *bs,
>>>>> const char *filename,
>>>>>                               int bdrv_flags, int open_flags)
>>>>>     {
>>>>>        BDRVRawState *s = bs->opaque;
>>>>>   -    int fd, ret;
>>>>>   +    int fd = -1;
>>>>>   +    int ret;
>>>>>
>>>>>        ret = raw_normalize_devicepath(&filename);
>>>>>        if (ret != 0) {
>>>>>   @@ -205,15 +207,17 @@ static int raw_open_common(BlockDriverState *bs,
>>>>> const char *filename,
>>>>>        if (!(bdrv_flags&    BDRV_O_CACHE_WB))
>>>>>            s->open_flags |= O_DSYNC;
>>>>>
>>>>>   -    s->fd = -1;
>>>>>   -    fd = qemu_open(filename, s->open_flags, 0644);
>>>>>   -    if (fd<    0) {
>>>>>   -        ret = -errno;
>>>>>   -        if (ret == -EROFS)
>>>>>   -            ret = -EACCES;
>>>>>   -        return ret;
>>>>>   +    if (s->fd == -1) {
>>>>>   +        fd = qemu_open(filename, s->open_flags, 0644);
>>>>>   +        if (fd<    0) {
>>>>>   +            ret = -errno;
>>>>>   +            if (ret == -EROFS) {
>>>>>   +                ret = -EACCES;
>>>>>   +            }
>>>>>   +            return ret;
>>>>>   +        }
>>>>>   +        s->fd = fd;
>>>>>        }
>>>>>   -    s->fd = fd;
>>>>>        s->aligned_buf = NULL;
>>>>>
>>>>>        if ((bdrv_flags&    BDRV_O_NOCACHE)) {
>>>>>   @@ -270,6 +274,7 @@ static int raw_open(BlockDriverState *bs, const
>>>>> char *filename, int flags)
>>>>>     {
>>>>>        BDRVRawState *s = bs->opaque;
>>>>>
>>>>>   +    s->fd = -1;
>>>>>        s->type = FTYPE_FILE;
>>>>>        return raw_open_common(bs, filename, flags, 0);
>>>>>     }
>>>>>   @@ -890,6 +895,60 @@ static BlockDriver bdrv_file = {
>>>>>        .create_options = raw_create_options,
>>>>>     };
>>>>>
>>>>>   +static int raw_open_fd(BlockDriverState *bs, const char *filename,
>>>>> int flags)
>>>>>   +{
>>>>>   +    BDRVRawState *s = bs->opaque;
>>>>>   +    const char *fd_str;
>>>>>   +    int fd;
>>>>>   +
>>>>>   +    /* extract the file descriptor - fail if it's not fd: */
>>>>>   +    if (!strstart(filename, "fd:",&fd_str)) {
>>>>>   +        return -EINVAL;
>>>>>   +    }
>>>>>   +
>>>>>   +    if (!qemu_isdigit(fd_str[0])) {
>>>>>   +        /* get fd from monitor */
>>>>>   +        fd = qemu_get_fd(fd_str);
>>>>>   +        if (fd == -1) {
>>>>>   +            return -EBADF;
>>>>>   +        }
>>>>>   +    } else {
>>>>>   +        char *endptr = NULL;
>>>>>   +
>>>>>   +        fd = strtol(fd_str,&endptr, 10);
>>>>>   +        if (*endptr || (fd == 0&&    fd_str == endptr)) {
>>>>>   +            return -EBADF;
>>>>>   +        }
>>>>>   +    }
>>>>>   +
>>>>>   +    s->fd = fd;
>>>>>   +    s->type = FTYPE_FILE;
>>>>>   +
>>>>>   +    return raw_open_common(bs, filename, flags, 0);
>>>>>   +}
>>>>>   +
>>>>>   +static BlockDriver bdrv_file_fd = {
>>>>>   +    .format_name = "file",
>>>>>   +    .protocol_name = "fd",
>>>>>   +    .instance_size = sizeof(BDRVRawState),
>>>>>   +    .bdrv_probe = NULL, /* no probe for protocols */
>>>>>   +    .bdrv_file_open = raw_open_fd,
>>>>>   +    .bdrv_read = raw_read,
>>>>>   +    .bdrv_write = raw_write,
>>>>>   +    .bdrv_close = raw_close,
>>>>>   +    .bdrv_flush = raw_flush,
>>>>>   +    .bdrv_discard = raw_discard,
>>>>>   +
>>>>>   +    .bdrv_aio_readv = raw_aio_readv,
>>>>>   +    .bdrv_aio_writev = raw_aio_writev,
>>>>>   +    .bdrv_aio_flush = raw_aio_flush,
>>>>>   +
>>>>>   +    .bdrv_truncate = raw_truncate,
>>>>>   +    .bdrv_getlength = raw_getlength,
>>>>>   +
>>>>>   +    .create_options = raw_create_options,
>>>>>   +};
>>>>>   +
>>>>>     /***********************************************/
>>>>>     /* host device */
>>>>>
>>>>>   @@ -998,6 +1057,7 @@ static int hdev_open(BlockDriverState *bs, const
>>>>> char *filename, int flags)
>>>>>        }
>>>>>     #endif
>>>>>
>>>>>   +    s->fd = -1;
>>>>>        s->type = FTYPE_FILE;
>>>>>     #if defined(__linux__)
>>>>>        {
>>>>>   @@ -1168,6 +1228,7 @@ static int floppy_open(BlockDriverState *bs,
>>>>> const char *filename, int flags)
>>>>>        BDRVRawState *s = bs->opaque;
>>>>>        int ret;
>>>>>
>>>>>   +    s->fd = -1;
>>>>>        s->type = FTYPE_FD;
>>>>>
>>>>>        /* open will not fail even if no floppy is inserted, so add
>>>>> O_NONBLOCK */
>>>>>   @@ -1280,6 +1341,7 @@ static int cdrom_open(BlockDriverState *bs,
>>>>> const char *filename, int flags)
>>>>>     {
>>>>>        BDRVRawState *s = bs->opaque;
>>>>>
>>>>>   +    s->fd = -1;
>>>>>        s->type = FTYPE_CD;
>>>>>
>>>>>        /* open will not fail even if no CD is inserted, so add
>>>>> O_NONBLOCK */
>>>>>   @@ -1503,6 +1565,7 @@ static void bdrv_file_init(void)
>>>>>         * Register all the drivers.  Note that order is important, the
>>>>> driver
>>>>>         * registered last will get probed first.
>>>>>         */
>>>>>   +    bdrv_register(&bdrv_file_fd);
>>>>>        bdrv_register(&bdrv_file);
>>>>>        bdrv_register(&bdrv_host_device);
>>>>>     #ifdef __linux__
>>>>>   diff --git a/block/vmdk.c b/block/vmdk.c
>>>>>   index 922b23d..2ea808e 100644
>>>>>   --- a/block/vmdk.c
>>>>>   +++ b/block/vmdk.c
>>>>>   @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs)
>>>>>                return -1;
>>>>>
>>>>>            pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
>>>>>   +
>>>>>   +        if (bs->backing_file[0] != '\0'&&    bdrv_is_fd_protocol(bs)) {
>>>>>   +            /* backing file currently not supported by fd: protocol
>>>>> */
>>>>>   +            return -1;
>>>>>   +        }
>>>>>        }
>>>>>
>>>>>        return 0;
>>>>>   diff --git a/block_int.h b/block_int.h
>>>>>   index fa91337..a305ee2 100644
>>>>>   --- a/block_int.h
>>>>>   +++ b/block_int.h
>>>>>   @@ -152,6 +152,7 @@ struct BlockDriverState {
>>>>>        int encrypted; /* if true, the media is encrypted */
>>>>>        int valid_key; /* if true, a valid encryption key has been set */
>>>>>        int sg;        /* if true, the device is a /dev/sg* */
>>>>>   +    int fd_protocol; /* if true, the fd: protocol was specified */
>>>
>>> bool?
>>>
>> I was following suit here, but I agree that bool would be better.  Or
>> better, these could all be reduced to bit flags.  My thought is that I'll
>> stick with following suit here though.
>
> OK, better convert all at once with a separate patch.
>

Ok

>>>>>        /* event callback when inserting/removing */
>>>>>        void (*change_cb)(void *opaque, int reason);
>>>>>        void *change_opaque;
>>>>>   diff --git a/blockdev.c b/blockdev.c
>>>>>   index e81e0ab..a536c20 100644
>>>>>   --- a/blockdev.c
>>>>>   +++ b/blockdev.c
>>>>>   @@ -546,6 +546,10 @@ DriveInfo *drive_init(QemuOpts *opts, int
>>>>> default_to_scsi)
>>>>>
>>>>>        bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>>>>>
>>>>>   +    if (strncmp(file, "fd:", 3) == 0) {
>>>>>   +        dinfo->bdrv->fd_protocol = 1;
>>>>>   +    }
>>>>>   +
>>>>>        ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
>>>>>        if (ret<    0) {
>>>>>            error_report("could not open disk image %s: %s",
>>>>>   @@ -606,6 +610,12 @@ int do_snapshot_blkdev(Monitor *mon, const QDict
>>>>> *qdict, QObject **ret_data)
>>>>>            goto out;
>>>>>        }
>>>>>
>>>>>   +    if (bdrv_is_fd_protocol(bs)) {
>>>>>   +        qerror_report(QERR_UNSUPPORTED);
>>>>>   +        ret = -1;
>>>>>   +        goto out;
>>>>>   +    }
>>>>>   +
>>>>>        pstrcpy(old_filename, sizeof(old_filename), bs->filename);
>>>>>
>>>>>        old_drv = bs->drv;
>>>>>   diff --git a/monitor.c b/monitor.c
>>>>>   index 59a3e76..ea60be2 100644
>>>>>   --- a/monitor.c
>>>>>   +++ b/monitor.c
>>>>>   @@ -2832,6 +2832,11 @@ int monitor_get_fd(Monitor *mon, const char
>>>>> *fdname)
>>>>>        return -1;
>>>>>     }
>>>>>
>>>>>   +int qemu_get_fd(const char *fdname)
>>>>>   +{
>>>>>   +    return cur_mon ? monitor_get_fd(cur_mon, fdname) : -1;
>>>>>   +}
>>>>>   +
>>>>>     static const mon_cmd_t mon_cmds[] = {
>>>>>     #include "hmp-commands.h"
>>>>>        { NULL, NULL, },
>>>>>   diff --git a/monitor.h b/monitor.h
>>>>>   index 4f2d328..de5b987 100644
>>>>>   --- a/monitor.h
>>>>>   +++ b/monitor.h
>>>>>   @@ -51,6 +51,7 @@ int monitor_read_bdrv_key_start(Monitor *mon,
>>>>> BlockDriverState *bs,
>>>>>                                    void *opaque);
>>>>>
>>>>>     int monitor_get_fd(Monitor *mon, const char *fdname);
>>>>>   +int qemu_get_fd(const char *fdname);
>>>>>
>>>>>     void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
>>>>>        GCC_FMT_ATTR(2, 0);
>>>>>   diff --git a/qemu-options.hx b/qemu-options.hx
>>>>>   index 1d5ad8b..f9b66f4 100644
>>>>>   --- a/qemu-options.hx
>>>>>   +++ b/qemu-options.hx
>>>>>   @@ -116,7 +116,7 @@ using @file{/dev/cdrom} as filename
>>>>> (@pxref{host_drives}).
>>>>>     ETEXI
>>>>>
>>>>>     DEF("drive", HAS_ARG, QEMU_OPTION_drive,
>>>>>   -    "-drive
>>>>> [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>>>>>   +    "-drive
>>>>> [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>>>>>        "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
>>>>>        "       [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
>>>>>        "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
>>>>>   @@ -129,10 +129,12 @@ STEXI
>>>>>     Define a new drive. Valid options are:
>>>>>
>>>>>     @table @option
>>>>>   -@item file=@var{file}
>>>>>   +@item file=[fd:]@var{file}
>>>>>     This option defines which disk image (@pxref{disk_images}) to use
>>>>> with
>>>>>     this drive. If the filename contains comma, you must double it
>>>>>   -(for instance, "file=my,,file" to use file "my,file").
>>>>>   +(for instance, "file=my,,file" to use file "my,file").
>>>>> @option{fd:}@var{file}
>>>>>   +specifies the file descriptor of an already open disk image.
>>>>>   +@option{format=}@var{format} is required by @option{fd:}@var{file}.
>>>>>     @item if=@var{interface}
>>>>>     This option defines on which type on interface the drive is
>>>>> connected.
>>>>>     Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
>>>>>   diff --git a/qemu-tool.c b/qemu-tool.c
>>>>>   index 41e5c41..8fe6b8c 100644
>>>>>   --- a/qemu-tool.c
>>>>>   +++ b/qemu-tool.c
>>>>>   @@ -96,3 +96,8 @@ int64_t qemu_get_clock_ns(QEMUClock *clock)
>>>>>     {
>>>>>        return 0;
>>>>>     }
>>>>>   +
>>>>>   +int qemu_get_fd(const char *fdname)
>>>>>   +{
>>>>>   +    return -1;
>>>>>   +}
>>>>>   --
>>>>>   1.7.1
>>>>>
>>>>>
>>>>>
>>
>>
>> Regards,
>> Corey Bryant
>>
Avi Kivity June 20, 2011, 1:40 p.m. UTC | #10
On 06/14/2011 04:31 PM, Corey Bryant wrote:
>    - Starting Qemu with a backing file
>

For this we could tell qemu that a file named "xyz" is available via fd 
n, via an extension of the getfd command.

For example

   (qemu) getfd path="/images/my-image.img"
   (qemu) getfd path="/images/template.img"
   (qemu) drive-add path="/images/my-image.img"

The open() for my-image.img first looks up the name in the getfd 
database, and finds it, so it returns the fd from there instead of 
opening.  It then opens the backing file ("template.img") and looks it 
up again, and finds the second fd from the session.

The result is that open()s are satisfied from the monitor, instead of 
the host kernel, but without reversing the request/reply nature of the 
monitor protocol.

A similar extension could be added to the command line:

   qemu -drive file=fd:4,cache=none -path-alias 
name=/images/template.img,path=fd:5

Here the main image is opened via a fd 4; if it needs template.img, it 
gets shunted to fd 5.
Anthony Liguori June 20, 2011, 1:50 p.m. UTC | #11
On 06/20/2011 08:40 AM, Avi Kivity wrote:
> On 06/14/2011 04:31 PM, Corey Bryant wrote:
>> - Starting Qemu with a backing file
>>
>
> For this we could tell qemu that a file named "xyz" is available via fd
> n, via an extension of the getfd command.
>
> For example
>
> (qemu) getfd path="/images/my-image.img"
> (qemu) getfd path="/images/template.img"
> (qemu) drive-add path="/images/my-image.img"
>
> The open() for my-image.img first looks up the name in the getfd
> database, and finds it, so it returns the fd from there instead of
> opening. It then opens the backing file ("template.img") and looks it up
> again, and finds the second fd from the session.

The way I've been thinking about this is:

  -blockdev id=hd0-back,file=fd:4,format=raw \
  -blockdev file=fd:3,format=qcow2,backing=hd0-back

While your proposal is clever, it makes me a little nervous about subtle 
security ramifications.

Regards,

Anthony Liguori

>
> The result is that open()s are satisfied from the monitor, instead of
> the host kernel, but without reversing the request/reply nature of the
> monitor protocol.
>
> A similar extension could be added to the command line:
>
> qemu -drive file=fd:4,cache=none -path-alias
> name=/images/template.img,path=fd:5
>
> Here the main image is opened via a fd 4; if it needs template.img, it
> gets shunted to fd 5.
>
Avi Kivity June 20, 2011, 5:35 p.m. UTC | #12
On 06/20/2011 04:50 PM, Anthony Liguori wrote:
> On 06/20/2011 08:40 AM, Avi Kivity wrote:
>> On 06/14/2011 04:31 PM, Corey Bryant wrote:
>>> - Starting Qemu with a backing file
>>>
>>
>> For this we could tell qemu that a file named "xyz" is available via fd
>> n, via an extension of the getfd command.
>>
>> For example
>>
>> (qemu) getfd path="/images/my-image.img"
>> (qemu) getfd path="/images/template.img"
>> (qemu) drive-add path="/images/my-image.img"
>>
>> The open() for my-image.img first looks up the name in the getfd
>> database, and finds it, so it returns the fd from there instead of
>> opening. It then opens the backing file ("template.img") and looks it up
>> again, and finds the second fd from the session.
>
> The way I've been thinking about this is:
>
>  -blockdev id=hd0-back,file=fd:4,format=raw \
>  -blockdev file=fd:3,format=qcow2,backing=hd0-back
>
> While your proposal is clever, it makes me a little nervous about 
> subtle security ramifications.

It would need careful explanation in the management tool author's guide, 
yes.

The main advantage is generality.  It doesn't assume that a file format 
has just one backing file, and doesn't require new syntax wherever a 
file is referred to indirectly.
Anthony Liguori June 20, 2011, 7:11 p.m. UTC | #13
On 06/20/2011 12:35 PM, Avi Kivity wrote:
> On 06/20/2011 04:50 PM, Anthony Liguori wrote:
>> On 06/20/2011 08:40 AM, Avi Kivity wrote:
>>> On 06/14/2011 04:31 PM, Corey Bryant wrote:
>>>> - Starting Qemu with a backing file
>>>>
>>>
>>> For this we could tell qemu that a file named "xyz" is available via fd
>>> n, via an extension of the getfd command.
>>>
>>> For example
>>>
>>> (qemu) getfd path="/images/my-image.img"
>>> (qemu) getfd path="/images/template.img"
>>> (qemu) drive-add path="/images/my-image.img"
>>>
>>> The open() for my-image.img first looks up the name in the getfd
>>> database, and finds it, so it returns the fd from there instead of
>>> opening. It then opens the backing file ("template.img") and looks it up
>>> again, and finds the second fd from the session.
>>
>> The way I've been thinking about this is:
>>
>> -blockdev id=hd0-back,file=fd:4,format=raw \
>> -blockdev file=fd:3,format=qcow2,backing=hd0-back
>>
>> While your proposal is clever, it makes me a little nervous about
>> subtle security ramifications.
>
> It would need careful explanation in the management tool author's guide,
> yes.
>
> The main advantage is generality. It doesn't assume that a file format
> has just one backing file, and doesn't require new syntax wherever a
> file is referred to indirectly.

FWIW, with blockdev, we need options to control this all anyway.  If you 
go back to my QCFG proposal, the parameters would actually be format 
specific, so if we had:

-block 
file=fd:4,format=fancypantsformat,part0=hd0-back.part1,part1=hd0-back.part2...

Regards,

Anthony Liguori

>
Avi Kivity June 21, 2011, 8:21 a.m. UTC | #14
On 06/20/2011 10:11 PM, Anthony Liguori wrote:
>> It would need careful explanation in the management tool author's guide,
>> yes.
>>
>> The main advantage is generality. It doesn't assume that a file format
>> has just one backing file, and doesn't require new syntax wherever a
>> file is referred to indirectly.
>
>
> FWIW, with blockdev, we need options to control this all anyway.  If 
> you go back to my QCFG proposal, the parameters would actually be 
> format specific, so if we had:
>
> -block 
> file=fd:4,format=fancypantsformat,part0=hd0-back.part1,part1=hd0-back.part2...

Yeah.  We either name the formal argument (your proposal) or the actual 
argument (mine).
diff mbox

Patch

diff --git a/block.c b/block.c
index 24a25d5..500db84 100644
--- a/block.c
+++ b/block.c
@@ -536,6 +536,10 @@  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
         char tmp_filename[PATH_MAX];
         char backing_filename[PATH_MAX];
 
+        if (bdrv_is_fd_protocol(bs)) {
+            return -ENOTSUP;
+        }
+
         /* if snapshot, we create a temporary backing file and open it
            instead of opening 'filename' directly */
 
@@ -585,6 +589,10 @@  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
 
     /* Find the right image format driver */
     if (!drv) {
+        /* format must be specified for fd: protocol */
+        if (bdrv_is_fd_protocol(bs)) {
+            return -ENOTSUP;
+        }
         ret = find_image_format(filename, &drv);
     }
 
@@ -1460,6 +1468,11 @@  int bdrv_enable_write_cache(BlockDriverState *bs)
     return bs->enable_write_cache;
 }
 
+int bdrv_is_fd_protocol(BlockDriverState *bs)
+{
+    return bs->fd_protocol;
+}
+
 /* XXX: no longer used */
 void bdrv_set_change_cb(BlockDriverState *bs,
                         void (*change_cb)(void *opaque, int reason),
@@ -1964,6 +1977,9 @@  int bdrv_snapshot_create(BlockDriverState *bs,
     BlockDriver *drv = bs->drv;
     if (!drv)
         return -ENOMEDIUM;
+    if (bdrv_is_fd_protocol(bs)) {
+        return -ENOTSUP;
+    }
     if (drv->bdrv_snapshot_create)
         return drv->bdrv_snapshot_create(bs, sn_info);
     if (bs->file)
diff --git a/block.h b/block.h
index da7d39c..dd46d52 100644
--- a/block.h
+++ b/block.h
@@ -182,6 +182,7 @@  int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
 int bdrv_enable_write_cache(BlockDriverState *bs);
+int bdrv_is_fd_protocol(BlockDriverState *bs);
 int bdrv_is_inserted(BlockDriverState *bs);
 int bdrv_media_changed(BlockDriverState *bs);
 int bdrv_is_locked(BlockDriverState *bs);
diff --git a/block/cow.c b/block/cow.c
index 4cf543c..e17f8e7 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -82,6 +82,11 @@  static int cow_open(BlockDriverState *bs, int flags)
     pstrcpy(bs->backing_file, sizeof(bs->backing_file),
             cow_header.backing_file);
 
+    if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
+        /* backing file currently not supported by fd: protocol */
+        goto fail;
+    }
+
     bitmap_size = ((bs->total_sectors + 7) >> 3) + sizeof(cow_header);
     s->cow_sectors_offset = (bitmap_size + 511) & ~511;
     return 0;
diff --git a/block/qcow.c b/block/qcow.c
index a26c886..1e46bdb 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -157,6 +157,11 @@  static int qcow_open(BlockDriverState *bs, int flags)
         if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len)
             goto fail;
         bs->backing_file[len] = '\0';
+
+        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
+            /* backing file currently not supported by fd: protocol */
+            goto fail;
+        }
     }
     return 0;
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 8451ded..8b9c160 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -270,6 +270,11 @@  static int qcow2_open(BlockDriverState *bs, int flags)
             goto fail;
         }
         bs->backing_file[len] = '\0';
+
+        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
+            ret = -ENOTSUP;
+            goto fail;
+        }
     }
     if (qcow2_read_snapshots(bs) < 0) {
         ret = -EINVAL;
diff --git a/block/qed.c b/block/qed.c
index 3970379..5028897 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -446,6 +446,10 @@  static int bdrv_qed_open(BlockDriverState *bs, int flags)
             return ret;
         }
 
+        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
+            return -ENOTSUP;
+        }
+
         if (s->header.features & QED_F_BACKING_FORMAT_NO_PROBE) {
             pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
         }
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 4cd7d7a..c72de3d 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -28,6 +28,7 @@ 
 #include "block_int.h"
 #include "module.h"
 #include "block/raw-posix-aio.h"
+#include "monitor.h"
 
 #ifdef CONFIG_COCOA
 #include <paths.h>
@@ -183,7 +184,8 @@  static int raw_open_common(BlockDriverState *bs, const char *filename,
                            int bdrv_flags, int open_flags)
 {
     BDRVRawState *s = bs->opaque;
-    int fd, ret;
+    int fd = -1;
+    int ret;
 
     ret = raw_normalize_devicepath(&filename);
     if (ret != 0) {
@@ -205,15 +207,17 @@  static int raw_open_common(BlockDriverState *bs, const char *filename,
     if (!(bdrv_flags & BDRV_O_CACHE_WB))
         s->open_flags |= O_DSYNC;
 
-    s->fd = -1;
-    fd = qemu_open(filename, s->open_flags, 0644);
-    if (fd < 0) {
-        ret = -errno;
-        if (ret == -EROFS)
-            ret = -EACCES;
-        return ret;
+    if (s->fd == -1) {
+        fd = qemu_open(filename, s->open_flags, 0644);
+        if (fd < 0) {
+            ret = -errno;
+            if (ret == -EROFS) {
+                ret = -EACCES;
+            }
+            return ret;
+        }
+        s->fd = fd;
     }
-    s->fd = fd;
     s->aligned_buf = NULL;
 
     if ((bdrv_flags & BDRV_O_NOCACHE)) {
@@ -270,6 +274,7 @@  static int raw_open(BlockDriverState *bs, const char *filename, int flags)
 {
     BDRVRawState *s = bs->opaque;
 
+    s->fd = -1;
     s->type = FTYPE_FILE;
     return raw_open_common(bs, filename, flags, 0);
 }
@@ -890,6 +895,60 @@  static BlockDriver bdrv_file = {
     .create_options = raw_create_options,
 };
 
+static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags)
+{
+    BDRVRawState *s = bs->opaque;
+    const char *fd_str;
+    int fd;
+
+    /* extract the file descriptor - fail if it's not fd: */
+    if (!strstart(filename, "fd:", &fd_str)) {
+        return -EINVAL;
+    }
+
+    if (!qemu_isdigit(fd_str[0])) {
+        /* get fd from monitor */
+        fd = qemu_get_fd(fd_str);
+        if (fd == -1) {
+            return -EBADF;
+        }
+    } else {
+        char *endptr = NULL;
+
+        fd = strtol(fd_str, &endptr, 10);
+        if (*endptr || (fd == 0 && fd_str == endptr)) {
+            return -EBADF;
+        }
+    }
+
+    s->fd = fd;
+    s->type = FTYPE_FILE;
+
+    return raw_open_common(bs, filename, flags, 0);
+}
+
+static BlockDriver bdrv_file_fd = {
+    .format_name = "file",
+    .protocol_name = "fd",
+    .instance_size = sizeof(BDRVRawState),
+    .bdrv_probe = NULL, /* no probe for protocols */
+    .bdrv_file_open = raw_open_fd,
+    .bdrv_read = raw_read,
+    .bdrv_write = raw_write,
+    .bdrv_close = raw_close,
+    .bdrv_flush = raw_flush,
+    .bdrv_discard = raw_discard,
+
+    .bdrv_aio_readv = raw_aio_readv,
+    .bdrv_aio_writev = raw_aio_writev,
+    .bdrv_aio_flush = raw_aio_flush,
+
+    .bdrv_truncate = raw_truncate,
+    .bdrv_getlength = raw_getlength,
+
+    .create_options = raw_create_options,
+};
+
 /***********************************************/
 /* host device */
 
@@ -998,6 +1057,7 @@  static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
     }
 #endif
 
+    s->fd = -1;
     s->type = FTYPE_FILE;
 #if defined(__linux__)
     {
@@ -1168,6 +1228,7 @@  static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
     BDRVRawState *s = bs->opaque;
     int ret;
 
+    s->fd = -1;
     s->type = FTYPE_FD;
 
     /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
@@ -1280,6 +1341,7 @@  static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
 {
     BDRVRawState *s = bs->opaque;
 
+    s->fd = -1;
     s->type = FTYPE_CD;
 
     /* open will not fail even if no CD is inserted, so add O_NONBLOCK */
@@ -1503,6 +1565,7 @@  static void bdrv_file_init(void)
      * Register all the drivers.  Note that order is important, the driver
      * registered last will get probed first.
      */
+    bdrv_register(&bdrv_file_fd);
     bdrv_register(&bdrv_file);
     bdrv_register(&bdrv_host_device);
 #ifdef __linux__
diff --git a/block/vmdk.c b/block/vmdk.c
index 922b23d..2ea808e 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -353,6 +353,11 @@  static int vmdk_parent_open(BlockDriverState *bs)
             return -1;
 
         pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
+
+        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
+            /* backing file currently not supported by fd: protocol */
+            return -1;
+        }
     }
 
     return 0;
diff --git a/block_int.h b/block_int.h
index fa91337..a305ee2 100644
--- a/block_int.h
+++ b/block_int.h
@@ -152,6 +152,7 @@  struct BlockDriverState {
     int encrypted; /* if true, the media is encrypted */
     int valid_key; /* if true, a valid encryption key has been set */
     int sg;        /* if true, the device is a /dev/sg* */
+    int fd_protocol; /* if true, the fd: protocol was specified */
     /* event callback when inserting/removing */
     void (*change_cb)(void *opaque, int reason);
     void *change_opaque;
diff --git a/blockdev.c b/blockdev.c
index e81e0ab..a536c20 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -546,6 +546,10 @@  DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 
     bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
 
+    if (strncmp(file, "fd:", 3) == 0) {
+        dinfo->bdrv->fd_protocol = 1;
+    }
+
     ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
     if (ret < 0) {
         error_report("could not open disk image %s: %s",
@@ -606,6 +610,12 @@  int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data)
         goto out;
     }
 
+    if (bdrv_is_fd_protocol(bs)) {
+        qerror_report(QERR_UNSUPPORTED);
+        ret = -1;
+        goto out;
+    }
+
     pstrcpy(old_filename, sizeof(old_filename), bs->filename);
 
     old_drv = bs->drv;
diff --git a/monitor.c b/monitor.c
index 59a3e76..ea60be2 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2832,6 +2832,11 @@  int monitor_get_fd(Monitor *mon, const char *fdname)
     return -1;
 }
 
+int qemu_get_fd(const char *fdname)
+{
+    return cur_mon ? monitor_get_fd(cur_mon, fdname) : -1;
+}
+
 static const mon_cmd_t mon_cmds[] = {
 #include "hmp-commands.h"
     { NULL, NULL, },
diff --git a/monitor.h b/monitor.h
index 4f2d328..de5b987 100644
--- a/monitor.h
+++ b/monitor.h
@@ -51,6 +51,7 @@  int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
                                 void *opaque);
 
 int monitor_get_fd(Monitor *mon, const char *fdname);
+int qemu_get_fd(const char *fdname);
 
 void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
     GCC_FMT_ATTR(2, 0);
diff --git a/qemu-options.hx b/qemu-options.hx
index 1d5ad8b..f9b66f4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -116,7 +116,7 @@  using @file{/dev/cdrom} as filename (@pxref{host_drives}).
 ETEXI
 
 DEF("drive", HAS_ARG, QEMU_OPTION_drive,
-    "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
+    "-drive [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
     "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
     "       [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
@@ -129,10 +129,12 @@  STEXI
 Define a new drive. Valid options are:
 
 @table @option
-@item file=@var{file}
+@item file=[fd:]@var{file}
 This option defines which disk image (@pxref{disk_images}) to use with
 this drive. If the filename contains comma, you must double it
-(for instance, "file=my,,file" to use file "my,file").
+(for instance, "file=my,,file" to use file "my,file"). @option{fd:}@var{file}
+specifies the file descriptor of an already open disk image.
+@option{format=}@var{format} is required by @option{fd:}@var{file}.
 @item if=@var{interface}
 This option defines on which type on interface the drive is connected.
 Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
diff --git a/qemu-tool.c b/qemu-tool.c
index 41e5c41..8fe6b8c 100644
--- a/qemu-tool.c
+++ b/qemu-tool.c
@@ -96,3 +96,8 @@  int64_t qemu_get_clock_ns(QEMUClock *clock)
 {
     return 0;
 }
+
+int qemu_get_fd(const char *fdname)
+{
+    return -1;
+}