Patchwork [v3] Add support for fd: protocol

login
register
mail settings
Submitter Corey Bryant
Date July 26, 2011, 12:51 p.m.
Message ID <1311684710-27074-1-git-send-email-coreyb@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/106864/
State New
Headers show

Comments

Corey Bryant - July 26, 2011, 12:51 p.m.
sVirt provides SELinux MAC isolation for Qemu guest processes and their
corresponding resources (image files). sVirt provides this support
by labeling guests and resources with security labels that are stored
in file system extended attributes. Some file systems, such as NFS, do
not support the extended attribute security namespace, which is needed
for image file isolation when using the sVirt SELinux security driver
in libvirt.

The proposed solution entails a combination of Qemu, libvirt, and
SELinux patches that work together to isolate multiple guests' images
when they're stored in the same NFS mount. This results in an
environment where sVirt isolation and NFS image file isolation can both
be provided.

This patch contains the Qemu code to support this solution. I would
like to solicit input from the libvirt community prior to starting
the libvirt patch.

Currently, Qemu opens an image file in addition to performing the
necessary read and write operations. The proposed solution will move
the open out of Qemu and into libvirt. Once libvirt opens an image
file for the guest, it will pass the file descriptor to Qemu via a
new fd: protocol.

If the image file resides in an NFS mount, the following SELinux policy
changes will provide image isolation:

  - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
    allow Qemu (svirt_t) to only have SELinux read and write
    permissions on nfs_t files

  - Qemu (svirt_t) also gets SELinux use permissions on libvirt
    (virtd_t) file descriptors

Following is a sample invocation of Qemu using the fd: protocol on
the command line:

    qemu -drive file=fd:4,format=qcow2

The fd: protocol is also supported by the drive_add monitor command.
This requires that the specified file descriptor is passed to the
monitor alongside a prior getfd monitor command.

There are some additional features provided by certain image types
where Qemu reopens the image file. All of these scenarios will be
unsupported for the fd: protocol, at least for this patch:

  - The -snapshot command line option
  - The savevm monitor command
  - The snapshot_blkdev monitor command
  - Use of copy-on-write image files
  - The -cdrom command line option
  - The -drive command line option with media=cdrom
  - The change monitor command

The thought is that this support can be added in the future, but is
not required for the initial fd: support.

This patch was tested with the following formats: raw, cow, qcow,
qcow2, qed, and vmdk, using the fd: protocol from the command line
and the monitor. Tests were also run to verify existing file name
support and qemu-img were not regressed. Non-valid file descriptors,
fd: without format, snapshot and backing files, and cdrom were also
tested.

v2:
  - Add drive_add monitor command support
  - Fence off unsupported features that re-open image file

v3:
  - Fence off cdrom and change monitor command support

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
---
 block.c           |   16 ++++++++++
 block.h           |    1 +
 block/cow.c       |    5 +++
 block/qcow.c      |    5 +++
 block/qcow2.c     |    5 +++
 block/qed.c       |    4 ++
 block/raw-posix.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++------
 block/vmdk.c      |    5 +++
 block_int.h       |    1 +
 blockdev.c        |   19 ++++++++++++
 monitor.c         |    5 +++
 monitor.h         |    1 +
 qemu-options.hx   |    8 +++--
 qemu-tool.c       |    5 +++
 14 files changed, 149 insertions(+), 12 deletions(-)
Christoph Hellwig - July 26, 2011, 1:02 p.m.
I have to say I really hate it.  We've been working hard on getting rid
of special cases in the qemu block layer, and this sprinkles them all
over.  I'd recommend to fix your security model instead.
Corey Bryant - July 26, 2011, 1:15 p.m.
On 07/26/2011 09:02 AM, Christoph Hellwig wrote:
> I have to say I really hate it.  We've been working hard on getting rid
> of special cases in the qemu block layer, and this sprinkles them all
> over.  I'd recommend to fix your security model instead.

I understand your point on special casing.  The special cases are 
temporarily fencing off re-opening of image files.  In fact, the entire 
patch should be temporary, fixing Qemu's security model until a time 
when NFS can provide support that enables sVirt isolation.

Regards,
Corey
Eric Blake - July 26, 2011, 2 p.m.
On 07/26/2011 06:51 AM, Corey Bryant wrote:
> There are some additional features provided by certain image types
> where Qemu reopens the image file. All of these scenarios will be
> unsupported for the fd: protocol, at least for this patch:
>
>    - The -snapshot command line option
>    - The savevm monitor command
>    - The snapshot_blkdev monitor command
>    - Use of copy-on-write image files
>    - The -cdrom command line option
>    - The -drive command line option with media=cdrom
>    - The change monitor command
>
> The thought is that this support can be added in the future, but is
> not required for the initial fd: support.

Libvirt will eventually need support for fd passing on savevm, 
snapshot_blkdev, and change monitor commands, as well as for -cdrom, 
before this feature can be used to provide the desired security 
enhancements.  I agree that for an incremental patch, you don't have to 
solve all points at once, but until all places have been modified to 
support fd usage, you aren't gaining any security, except for severely 
constrained guests.

Furthermore, how do you plan to map fd: to filename?  There's already 
been big threads on why snapshot_blkdev needs both the new fd: and the 
name of the old backing file at the same time, so that qemu can write 
the correct headers into new qcow2 files.  But your proposal precludes 
that, since "qemu -drive file=fd:4,format=qcow2" is not letting qemu 
know the file name of fd:4 that would later have to be written into a 
qcow2 header.  I'm afraid that we need a better solution that gets both 
fd and filename mapped together, before this stands a chance of being 
useful.  That said, I'm strongly in favor of getting the open() burden 
moved out of qemu into libvirt, because of the potential it has for 
increased security.
Eric Blake - July 26, 2011, 2:02 p.m.
On 07/26/2011 07:02 AM, Christoph Hellwig wrote:
> I have to say I really hate it.  We've been working hard on getting rid
> of special cases in the qemu block layer, and this sprinkles them all
> over.  I'd recommend to fix your security model instead.

What is it that you hate - the particular qemu implementation expressed 
by this patch (in which case, we're all ears for a better 
implementation), or the need for fd passing (which libvirt is strongly 
in favor of, and which there really is no other way to fix the security 
model to work well with file systems like NFS that lack per-file 
labels)?  In my mind, increased security trumps poor aesthetics - if the 
patch really does make it possible to enhance security, then the special 
casing to support the patch is worth it.
Kevin Wolf - July 26, 2011, 2:05 p.m.
Am 26.07.2011 15:02, schrieb Christoph Hellwig:
> I have to say I really hate it.  We've been working hard on getting rid
> of special cases in the qemu block layer, and this sprinkles them all
> over.  I'd recommend to fix your security model instead.

I think the problem here is more with the implementation that with the
intention.

I agree that you just can't do this. A patch adding support for a fd:
protocol should touch block/fd.c and nothing else. You can add some
supporting patches that extend the generic block layer to support e.g.
formats that can't reopen. However, if you touch the code of other block
drivers, you're doing it wrong.

Kevin
Kevin Wolf - July 26, 2011, 2:19 p.m.
Am 26.07.2011 16:00, schrieb Eric Blake:
> On 07/26/2011 06:51 AM, Corey Bryant wrote:
>> There are some additional features provided by certain image types
>> where Qemu reopens the image file. All of these scenarios will be
>> unsupported for the fd: protocol, at least for this patch:
>>
>>    - The -snapshot command line option
>>    - The savevm monitor command
>>    - The snapshot_blkdev monitor command
>>    - Use of copy-on-write image files
>>    - The -cdrom command line option
>>    - The -drive command line option with media=cdrom
>>    - The change monitor command
>>
>> The thought is that this support can be added in the future, but is
>> not required for the initial fd: support.
> 
> Libvirt will eventually need support for fd passing on savevm, 
> snapshot_blkdev, and change monitor commands, as well as for -cdrom, 
> before this feature can be used to provide the desired security 
> enhancements.  I agree that for an incremental patch, you don't have to 
> solve all points at once, but until all places have been modified to 
> support fd usage, you aren't gaining any security, except for severely 
> constrained guests.
> 
> Furthermore, how do you plan to map fd: to filename?  There's already 
> been big threads on why snapshot_blkdev needs both the new fd: and the 
> name of the old backing file at the same time, so that qemu can write 
> the correct headers into new qcow2 files. 

That's a problem to solve in snapshot_blkdev, not in -drive. In general
qemu doesn't need and shouldn't know the file name if it's meant to use
an fd.

Kevin
Corey Bryant - July 26, 2011, 2:46 p.m.
On 07/26/2011 10:05 AM, Kevin Wolf wrote:
> Am 26.07.2011 15:02, schrieb Christoph Hellwig:
>> >  I have to say I really hate it.  We've been working hard on getting rid
>> >  of special cases in the qemu block layer, and this sprinkles them all
>> >  over.  I'd recommend to fix your security model instead.
> I think the problem here is more with the implementation that with the
> intention.
>
> I agree that you just can't do this. A patch adding support for a fd:
> protocol should touch block/fd.c and nothing else. You can add some
> supporting patches that extend the generic block layer to support e.g.
> formats that can't reopen. However, if you touch the code of other block
> drivers, you're doing it wrong.
>
> Kevin
>
>

I'll look into this approach, but on the surface it seems like this 
could prevent a lot of code reuse in block/raw-posix.c.

Regards,
Corey
Kevin Wolf - July 26, 2011, 2:54 p.m.
Am 26.07.2011 16:46, schrieb Corey Bryant:
> 
> On 07/26/2011 10:05 AM, Kevin Wolf wrote:
>> Am 26.07.2011 15:02, schrieb Christoph Hellwig:
>>>>  I have to say I really hate it.  We've been working hard on getting rid
>>>>  of special cases in the qemu block layer, and this sprinkles them all
>>>>  over.  I'd recommend to fix your security model instead.
>> I think the problem here is more with the implementation that with the
>> intention.
>>
>> I agree that you just can't do this. A patch adding support for a fd:
>> protocol should touch block/fd.c and nothing else. You can add some
>> supporting patches that extend the generic block layer to support e.g.
>> formats that can't reopen. However, if you touch the code of other block
>> drivers, you're doing it wrong.
> 
> I'll look into this approach, but on the surface it seems like this 
> could prevent a lot of code reuse in block/raw-posix.c.

What I meant here is more about modifying qcow2, VMDK etc. I still need
to look into the details of what you share with raw-posix.c.

Kevin
Kevin Wolf - July 26, 2011, 3:18 p.m.
Am 26.07.2011 14:51, schrieb Corey Bryant:
> sVirt provides SELinux MAC isolation for Qemu guest processes and their
> corresponding resources (image files). sVirt provides this support
> by labeling guests and resources with security labels that are stored
> in file system extended attributes. Some file systems, such as NFS, do
> not support the extended attribute security namespace, which is needed
> for image file isolation when using the sVirt SELinux security driver
> in libvirt.
> 
> The proposed solution entails a combination of Qemu, libvirt, and
> SELinux patches that work together to isolate multiple guests' images
> when they're stored in the same NFS mount. This results in an
> environment where sVirt isolation and NFS image file isolation can both
> be provided.
> 
> This patch contains the Qemu code to support this solution. I would
> like to solicit input from the libvirt community prior to starting
> the libvirt patch.
> 
> Currently, Qemu opens an image file in addition to performing the
> necessary read and write operations. The proposed solution will move
> the open out of Qemu and into libvirt. Once libvirt opens an image
> file for the guest, it will pass the file descriptor to Qemu via a
> new fd: protocol.
> 
> If the image file resides in an NFS mount, the following SELinux policy
> changes will provide image isolation:
> 
>   - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>     allow Qemu (svirt_t) to only have SELinux read and write
>     permissions on nfs_t files
> 
>   - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>     (virtd_t) file descriptors
> 
> Following is a sample invocation of Qemu using the fd: protocol on
> the command line:
> 
>     qemu -drive file=fd:4,format=qcow2
> 
> The fd: protocol is also supported by the drive_add monitor command.
> This requires that the specified file descriptor is passed to the
> monitor alongside a prior getfd monitor command.
> 
> There are some additional features provided by certain image types
> where Qemu reopens the image file. All of these scenarios will be
> unsupported for the fd: protocol, at least for this patch:
> 
>   - The -snapshot command line option
>   - The savevm monitor command
>   - The snapshot_blkdev monitor command
>   - Use of copy-on-write image files
>   - The -cdrom command line option
>   - The -drive command line option with media=cdrom
>   - The change monitor command
> 
> The thought is that this support can be added in the future, but is
> not required for the initial fd: support.
> 
> This patch was tested with the following formats: raw, cow, qcow,
> qcow2, qed, and vmdk, using the fd: protocol from the command line
> and the monitor. Tests were also run to verify existing file name
> support and qemu-img were not regressed. Non-valid file descriptors,
> fd: without format, snapshot and backing files, and cdrom were also
> tested.
> 
> v2:
>   - Add drive_add monitor command support
>   - Fence off unsupported features that re-open image file
> 
> v3:
>   - Fence off cdrom and change monitor command support
> 
> Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
> ---
>  block.c           |   16 ++++++++++
>  block.h           |    1 +
>  block/cow.c       |    5 +++
>  block/qcow.c      |    5 +++
>  block/qcow2.c     |    5 +++
>  block/qed.c       |    4 ++
>  block/raw-posix.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++------
>  block/vmdk.c      |    5 +++
>  block_int.h       |    1 +
>  blockdev.c        |   19 ++++++++++++
>  monitor.c         |    5 +++
>  monitor.h         |    1 +
>  qemu-options.hx   |    8 +++--
>  qemu-tool.c       |    5 +++
>  14 files changed, 149 insertions(+), 12 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 24a25d5..500db84 100644
> --- a/block.c
> +++ b/block.c
> @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>          char tmp_filename[PATH_MAX];
>          char backing_filename[PATH_MAX];
>  
> +        if (bdrv_is_fd_protocol(bs)) {
> +            return -ENOTSUP;
> +        }

Hm, shouldn't that just work even with fd?

> +
>          /* if snapshot, we create a temporary backing file and open it
>             instead of opening 'filename' directly */
>  
> @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>  
>      /* Find the right image format driver */
>      if (!drv) {
> +        /* format must be specified for fd: protocol */
> +        if (bdrv_is_fd_protocol(bs)) {
> +            return -ENOTSUP;
> +        }

This isn't really required to make fd work.

>          ret = find_image_format(filename, &drv);
>      }
>  
> @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs)
>      return bs->enable_write_cache;
>  }
>  
> +int bdrv_is_fd_protocol(BlockDriverState *bs)
> +{
> +    return bs->fd_protocol;
> +}

The generic block layer shouldn't have any special cases based on the
format driver. If you need to do something different for fd, think about
what property of fd it is that requires the special case (like can't
reopen).

> +
>  /* XXX: no longer used */
>  void bdrv_set_change_cb(BlockDriverState *bs,
>                          void (*change_cb)(void *opaque, int reason),
> @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>      BlockDriver *drv = bs->drv;
>      if (!drv)
>          return -ENOMEDIUM;
> +    if (bdrv_is_fd_protocol(bs)) {
> +        return -ENOTSUP;
> +    }

What's the problem with internal snapshots in images accesses over an fd?

>      if (drv->bdrv_snapshot_create)
>          return drv->bdrv_snapshot_create(bs, sn_info);
>      if (bs->file)
> diff --git a/block.h b/block.h
> index 859d1d9..0417b69 100644
> --- a/block.h
> +++ b/block.h
> @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs);
>  int bdrv_is_read_only(BlockDriverState *bs);
>  int bdrv_is_sg(BlockDriverState *bs);
>  int bdrv_enable_write_cache(BlockDriverState *bs);
> +int bdrv_is_fd_protocol(BlockDriverState *bs);
>  int bdrv_is_inserted(BlockDriverState *bs);
>  int bdrv_media_changed(BlockDriverState *bs);
>  int bdrv_is_locked(BlockDriverState *bs);
> diff --git a/block/cow.c b/block/cow.c
> index 4cf543c..e17f8e7 100644
> --- a/block/cow.c
> +++ b/block/cow.c
> @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
>      pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>              cow_header.backing_file);
>  
> +    if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +        /* backing file currently not supported by fd: protocol */
> +        goto fail;
> +    }

I don't think these checks are strictly needed. Without the check you
can open the image itself using an fd, and the backing file using good
old raw-posix. We shouldn't decide for our users that this isn't useful.

In any case, it would have to move into block.c, you can't modify
independent drivers for this.

> +
>      bitmap_size = ((bs->total_sectors + 7) >> 3) + sizeof(cow_header);
>      s->cow_sectors_offset = (bitmap_size + 511) & ~511;
>      return 0;
> diff --git a/block/qcow.c b/block/qcow.c
> index 227b104..964d411 100644
> --- a/block/qcow.c
> +++ b/block/qcow.c
> @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int flags)
>          if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len)
>              goto fail;
>          bs->backing_file[len] = '\0';
> +
> +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +            /* backing file currently not supported by fd: protocol */
> +            goto fail;
> +        }
>      }
>      return 0;
>  
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 48e1b95..7f6a4fa 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>              goto fail;
>          }
>          bs->backing_file[len] = '\0';
> +
> +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +            ret = -ENOTSUP;
> +            goto fail;
> +        }
>      }
>      if (qcow2_read_snapshots(bs) < 0) {
>          ret = -EINVAL;
> diff --git a/block/qed.c b/block/qed.c
> index 3970379..5028897 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>              return ret;
>          }
>  
> +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +            return -ENOTSUP;
> +        }
> +
>          if (s->header.features & QED_F_BACKING_FORMAT_NO_PROBE) {
>              pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
>          }
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 34b64aa..cec4d36 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "module.h"
>  #include "block/raw-posix-aio.h"
> +#include "monitor.h"
>  
>  #ifdef CONFIG_COCOA
>  #include <paths.h>
> @@ -185,7 +186,8 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>                             int bdrv_flags, int open_flags)
>  {
>      BDRVRawState *s = bs->opaque;
> -    int fd, ret;
> +    int fd = -1;
> +    int ret;
>  
>      ret = raw_normalize_devicepath(&filename);
>      if (ret != 0) {
> @@ -207,15 +209,17 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>      if (!(bdrv_flags & BDRV_O_CACHE_WB))
>          s->open_flags |= O_DSYNC;
>  
> -    s->fd = -1;
> -    fd = qemu_open(filename, s->open_flags, 0644);
> -    if (fd < 0) {
> -        ret = -errno;
> -        if (ret == -EROFS)
> -            ret = -EACCES;
> -        return ret;
> +    if (s->fd == -1) {
> +        fd = qemu_open(filename, s->open_flags, 0644);
> +        if (fd < 0) {
> +            ret = -errno;
> +            if (ret == -EROFS) {
> +                ret = -EACCES;
> +            }
> +            return ret;
> +        }
> +        s->fd = fd;
>      }
> -    s->fd = fd;
>      s->aligned_buf = NULL;
>  
>      if ((bdrv_flags & BDRV_O_NOCACHE)) {
> @@ -272,6 +276,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
>  {
>      BDRVRawState *s = bs->opaque;
>  
> +    s->fd = -1;
>      s->type = FTYPE_FILE;
>      return raw_open_common(bs, filename, flags, 0);
>  }
> @@ -892,6 +897,60 @@ static BlockDriver bdrv_file = {
>      .create_options = raw_create_options,
>  };
>  
> +static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags)
> +{
> +    BDRVRawState *s = bs->opaque;
> +    const char *fd_str;
> +    int fd;
> +
> +    /* extract the file descriptor - fail if it's not fd: */
> +    if (!strstart(filename, "fd:", &fd_str)) {
> +        return -EINVAL;
> +    }
> +
> +    if (!qemu_isdigit(fd_str[0])) {
> +        /* get fd from monitor */
> +        fd = qemu_get_fd(fd_str);
> +        if (fd == -1) {
> +            return -EBADF;
> +        }
> +    } else {
> +        char *endptr = NULL;
> +
> +        fd = strtol(fd_str, &endptr, 10);
> +        if (*endptr || (fd == 0 && fd_str == endptr)) {
> +            return -EBADF;
> +        }
> +    }
> +
> +    s->fd = fd;
> +    s->type = FTYPE_FILE;
> +
> +    return raw_open_common(bs, filename, flags, 0);
> +}
> +
> +static BlockDriver bdrv_file_fd = {
> +    .format_name = "file",
> +    .protocol_name = "fd",
> +    .instance_size = sizeof(BDRVRawState),
> +    .bdrv_probe = NULL, /* no probe for protocols */
> +    .bdrv_file_open = raw_open_fd,
> +    .bdrv_read = raw_read,
> +    .bdrv_write = raw_write,
> +    .bdrv_close = raw_close,
> +    .bdrv_flush = raw_flush,
> +    .bdrv_discard = raw_discard,
> +
> +    .bdrv_aio_readv = raw_aio_readv,
> +    .bdrv_aio_writev = raw_aio_writev,
> +    .bdrv_aio_flush = raw_aio_flush,
> +
> +    .bdrv_truncate = raw_truncate,
> +    .bdrv_getlength = raw_getlength,
> +
> +    .create_options = raw_create_options,
> +};
> +
>  /***********************************************/
>  /* host device */
>  
> @@ -1000,6 +1059,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
>      }
>  #endif
>  
> +    s->fd = -1;
>      s->type = FTYPE_FILE;
>  #if defined(__linux__)
>      {
> @@ -1170,6 +1230,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
>      BDRVRawState *s = bs->opaque;
>      int ret;
>  
> +    s->fd = -1;
>      s->type = FTYPE_FD;
>  
>      /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
> @@ -1288,6 +1349,7 @@ static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
>  {
>      BDRVRawState *s = bs->opaque;
>  
> +    s->fd = -1;
>      s->type = FTYPE_CD;
>  
>      /* open will not fail even if no CD is inserted, so add O_NONBLOCK */
> @@ -1517,6 +1579,7 @@ static void bdrv_file_init(void)
>       * Register all the drivers.  Note that order is important, the driver
>       * registered last will get probed first.
>       */
> +    bdrv_register(&bdrv_file_fd);
>      bdrv_register(&bdrv_file);
>      bdrv_register(&bdrv_host_device);

The part modifying raw looks good at the first sight.

>  #ifdef __linux__
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 922b23d..2ea808e 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs)
>              return -1;
>  
>          pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
> +
> +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +            /* backing file currently not supported by fd: protocol */
> +            return -1;
> +        }
>      }
>  
>      return 0;
> diff --git a/block_int.h b/block_int.h
> index 1e265d2..441049c 100644
> --- a/block_int.h
> +++ b/block_int.h
> @@ -152,6 +152,7 @@ struct BlockDriverState {
>      int encrypted; /* if true, the media is encrypted */
>      int valid_key; /* if true, a valid encryption key has been set */
>      int sg;        /* if true, the device is a /dev/sg* */
> +    int fd_protocol; /* if true, the fd: protocol was specified */

You don't need this any more when you remove bdrv_is_fd_protocol()

>      /* event callback when inserting/removing */
>      void (*change_cb)(void *opaque, int reason);
>      void *change_opaque;
> diff --git a/blockdev.c b/blockdev.c
> index c263663..5cb7b56 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -542,6 +542,14 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>  
>      bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>  
> +    if (strncmp(file, "fd:", 3) == 0) {
> +        if (media == MEDIA_CDROM) {
> +            error_report("CD-ROM not supported by fd: protocol");
> +            goto err;
> +        }
> +        dinfo->bdrv->fd_protocol = 1;
> +    }

Why?

> +
>      ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
>      if (ret < 0) {
>          error_report("could not open disk image %s: %s",
> @@ -602,6 +610,12 @@ int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data)
>          goto out;
>      }
>  
> +    if (bdrv_is_fd_protocol(bs)) {
> +        qerror_report(QERR_UNSUPPORTED);
> +        ret = -1;
> +        goto out;
> +    }
> +
>      pstrcpy(old_filename, sizeof(old_filename), bs->filename);
>  
>      old_drv = bs->drv;
> @@ -719,6 +733,11 @@ int do_change_block(Monitor *mon, const char *device,
>      BlockDriver *drv = NULL;
>      int bdrv_flags;
>  
> +    if (strncmp(filename, "fd:", 3) == 0) {
> +        qerror_report(QERR_UNSUPPORTED);
> +        return -1;
> +    }

What's the problem here?

Kevin
Corey Bryant - July 26, 2011, 4:57 p.m.
Kevin, thanks for the input.

On 07/26/2011 11:18 AM, Kevin Wolf wrote:
> Am 26.07.2011 14:51, schrieb Corey Bryant:
>> >  sVirt provides SELinux MAC isolation for Qemu guest processes and their
>> >  corresponding resources (image files). sVirt provides this support
>> >  by labeling guests and resources with security labels that are stored
>> >  in file system extended attributes. Some file systems, such as NFS, do
>> >  not support the extended attribute security namespace, which is needed
>> >  for image file isolation when using the sVirt SELinux security driver
>> >  in libvirt.
>> >
>> >  The proposed solution entails a combination of Qemu, libvirt, and
>> >  SELinux patches that work together to isolate multiple guests' images
>> >  when they're stored in the same NFS mount. This results in an
>> >  environment where sVirt isolation and NFS image file isolation can both
>> >  be provided.
>> >
>> >  This patch contains the Qemu code to support this solution. I would
>> >  like to solicit input from the libvirt community prior to starting
>> >  the libvirt patch.
>> >
>> >  Currently, Qemu opens an image file in addition to performing the
>> >  necessary read and write operations. The proposed solution will move
>> >  the open out of Qemu and into libvirt. Once libvirt opens an image
>> >  file for the guest, it will pass the file descriptor to Qemu via a
>> >  new fd: protocol.
>> >
>> >  If the image file resides in an NFS mount, the following SELinux policy
>> >  changes will provide image isolation:
>> >
>> >     - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>> >       allow Qemu (svirt_t) to only have SELinux read and write
>> >       permissions on nfs_t files
>> >
>> >     - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>> >       (virtd_t) file descriptors
>> >
>> >  Following is a sample invocation of Qemu using the fd: protocol on
>> >  the command line:
>> >
>> >       qemu -drive file=fd:4,format=qcow2
>> >
>> >  The fd: protocol is also supported by the drive_add monitor command.
>> >  This requires that the specified file descriptor is passed to the
>> >  monitor alongside a prior getfd monitor command.
>> >
>> >  There are some additional features provided by certain image types
>> >  where Qemu reopens the image file. All of these scenarios will be
>> >  unsupported for the fd: protocol, at least for this patch:
>> >
>> >     - The -snapshot command line option
>> >     - The savevm monitor command
>> >     - The snapshot_blkdev monitor command
>> >     - Use of copy-on-write image files
>> >     - The -cdrom command line option
>> >     - The -drive command line option with media=cdrom
>> >     - The change monitor command
>> >
>> >  The thought is that this support can be added in the future, but is
>> >  not required for the initial fd: support.
>> >
>> >  This patch was tested with the following formats: raw, cow, qcow,
>> >  qcow2, qed, and vmdk, using the fd: protocol from the command line
>> >  and the monitor. Tests were also run to verify existing file name
>> >  support and qemu-img were not regressed. Non-valid file descriptors,
>> >  fd: without format, snapshot and backing files, and cdrom were also
>> >  tested.
>> >
>> >  v2:
>> >     - Add drive_add monitor command support
>> >     - Fence off unsupported features that re-open image file
>> >
>> >  v3:
>> >     - Fence off cdrom and change monitor command support
>> >
>> >  Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
>> >  ---
>> >    block.c           |   16 ++++++++++
>> >    block.h           |    1 +
>> >    block/cow.c       |    5 +++
>> >    block/qcow.c      |    5 +++
>> >    block/qcow2.c     |    5 +++
>> >    block/qed.c       |    4 ++
>> >    block/raw-posix.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++------
>> >    block/vmdk.c      |    5 +++
>> >    block_int.h       |    1 +
>> >    blockdev.c        |   19 ++++++++++++
>> >    monitor.c         |    5 +++
>> >    monitor.h         |    1 +
>> >    qemu-options.hx   |    8 +++--
>> >    qemu-tool.c       |    5 +++
>> >    14 files changed, 149 insertions(+), 12 deletions(-)
>> >
>> >  diff --git a/block.c b/block.c
>> >  index 24a25d5..500db84 100644
>> >  --- a/block.c
>> >  +++ b/block.c
>> >  @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>> >            char tmp_filename[PATH_MAX];
>> >            char backing_filename[PATH_MAX];
>> >
>> >  +        if (bdrv_is_fd_protocol(bs)) {
>> >  +            return -ENOTSUP;
>> >  +        }
> Hm, shouldn't that just work even with fd?
>

My thought was that the proposed SELinux changes would prevent the open 
of the temporary backing file if the file corresponding to fd resides on 
NFS.  But perhaps the backing file is not opened on NFS?

>> >  +
>> >            /* if snapshot, we create a temporary backing file and open it
>> >               instead of opening 'filename' directly */
>> >
>> >  @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>> >
>> >        /* Find the right image format driver */
>> >        if (!drv) {
>> >  +        /* format must be specified for fd: protocol */
>> >  +        if (bdrv_is_fd_protocol(bs)) {
>> >  +            return -ENOTSUP;
>> >  +        }
> This isn't really required to make fd work.
>

If format is not specified, the file needs to be opened and probed to 
determine the format.  The proposed SELinux changes should prevent the 
open if it resides on NFS.

>> >            ret = find_image_format(filename,&drv);
>> >        }
>> >
>> >  @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs)
>> >        return bs->enable_write_cache;
>> >    }
>> >
>> >  +int bdrv_is_fd_protocol(BlockDriverState *bs)
>> >  +{
>> >  +    return bs->fd_protocol;
>> >  +}
> The generic block layer shouldn't have any special cases based on the
> format driver. If you need to do something different for fd, think about
> what property of fd it is that requires the special case (like can't
> reopen).
>

I'm not sure I completely understand what you're saying here.  I 
understand the desire to not have special cases based on the fd 
protocol.  I have a number special case checks that prevent re-opens, 
which presumably would not be allowed by the proposed SELinux changes. 
Should there be no special cases at all (e.g. should I provide solutions 
to all the re-opens now) or would you rather special cases be provided 
at a different layer?

>> >  +
>> >    /* XXX: no longer used */
>> >    void bdrv_set_change_cb(BlockDriverState *bs,
>> >                            void (*change_cb)(void *opaque, int reason),
>> >  @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>> >        BlockDriver *drv = bs->drv;
>> >        if (!drv)
>> >            return -ENOMEDIUM;
>> >  +    if (bdrv_is_fd_protocol(bs)) {
>> >  +        return -ENOTSUP;
>> >  +    }
> What's the problem with internal snapshots in images accesses over an fd?
>

I was thinking the proposed SELinux changes would prevent the open of 
the snapshot file if the file corresponding to fd resides on NFS.  But 
perhaps it is not opened on NFS?

>> >        if (drv->bdrv_snapshot_create)
>> >            return drv->bdrv_snapshot_create(bs, sn_info);
>> >        if (bs->file)
>> >  diff --git a/block.h b/block.h
>> >  index 859d1d9..0417b69 100644
>> >  --- a/block.h
>> >  +++ b/block.h
>> >  @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs);
>> >    int bdrv_is_read_only(BlockDriverState *bs);
>> >    int bdrv_is_sg(BlockDriverState *bs);
>> >    int bdrv_enable_write_cache(BlockDriverState *bs);
>> >  +int bdrv_is_fd_protocol(BlockDriverState *bs);
>> >    int bdrv_is_inserted(BlockDriverState *bs);
>> >    int bdrv_media_changed(BlockDriverState *bs);
>> >    int bdrv_is_locked(BlockDriverState *bs);
>> >  diff --git a/block/cow.c b/block/cow.c
>> >  index 4cf543c..e17f8e7 100644
>> >  --- a/block/cow.c
>> >  +++ b/block/cow.c
>> >  @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
>> >        pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>> >                cow_header.backing_file);
>> >
>> >  +    if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>> >  +        /* backing file currently not supported by fd: protocol */
>> >  +        goto fail;
>> >  +    }
> I don't think these checks are strictly needed. Without the check you
> can open the image itself using an fd, and the backing file using good
> old raw-posix. We shouldn't decide for our users that this isn't useful.
>
> In any case, it would have to move into block.c, you can't modify
> independent drivers for this.
>

I understand the point on not modifying independent drivers.

But if the backing file resides on NFS, wouldn't the the proposed 
SELinux changes prevent the open?

Or are you talking about support where libvirt opens the backing file 
and passes the fd to Qemu?  If so there was some discussion about future 
support for this here: 
http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg01496.html


>> >  +
>> >        bitmap_size = ((bs->total_sectors + 7)>>  3) + sizeof(cow_header);
>> >        s->cow_sectors_offset = (bitmap_size + 511)&  ~511;
>> >        return 0;
>> >  diff --git a/block/qcow.c b/block/qcow.c
>> >  index 227b104..964d411 100644
>> >  --- a/block/qcow.c
>> >  +++ b/block/qcow.c
>> >  @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int flags)
>> >            if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len)
>> >                goto fail;
>> >            bs->backing_file[len] = '\0';
>> >  +
>> >  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>> >  +            /* backing file currently not supported by fd: protocol */
>> >  +            goto fail;
>> >  +        }
>> >        }
>> >        return 0;
>> >
>> >  diff --git a/block/qcow2.c b/block/qcow2.c
>> >  index 48e1b95..7f6a4fa 100644
>> >  --- a/block/qcow2.c
>> >  +++ b/block/qcow2.c
>> >  @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>> >                goto fail;
>> >            }
>> >            bs->backing_file[len] = '\0';
>> >  +
>> >  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>> >  +            ret = -ENOTSUP;
>> >  +            goto fail;
>> >  +        }
>> >        }
>> >        if (qcow2_read_snapshots(bs)<  0) {
>> >            ret = -EINVAL;
>> >  diff --git a/block/qed.c b/block/qed.c
>> >  index 3970379..5028897 100644
>> >  --- a/block/qed.c
>> >  +++ b/block/qed.c
>> >  @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>> >                return ret;
>> >            }
>> >
>> >  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>> >  +            return -ENOTSUP;
>> >  +        }
>> >  +
>> >            if (s->header.features&  QED_F_BACKING_FORMAT_NO_PROBE) {
>> >                pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
>> >            }
>> >  diff --git a/block/raw-posix.c b/block/raw-posix.c
>> >  index 34b64aa..cec4d36 100644
>> >  --- a/block/raw-posix.c
>> >  +++ b/block/raw-posix.c
>> >  @@ -28,6 +28,7 @@
>> >    #include "block_int.h"
>> >    #include "module.h"
>> >    #include "block/raw-posix-aio.h"
>> >  +#include "monitor.h"
>> >
>> >    #ifdef CONFIG_COCOA
>> >    #include<paths.h>
>> >  @@ -185,7 +186,8 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>> >                               int bdrv_flags, int open_flags)
>> >    {
>> >        BDRVRawState *s = bs->opaque;
>> >  -    int fd, ret;
>> >  +    int fd = -1;
>> >  +    int ret;
>> >
>> >        ret = raw_normalize_devicepath(&filename);
>> >        if (ret != 0) {
>> >  @@ -207,15 +209,17 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>> >        if (!(bdrv_flags&  BDRV_O_CACHE_WB))
>> >            s->open_flags |= O_DSYNC;
>> >
>> >  -    s->fd = -1;
>> >  -    fd = qemu_open(filename, s->open_flags, 0644);
>> >  -    if (fd<  0) {
>> >  -        ret = -errno;
>> >  -        if (ret == -EROFS)
>> >  -            ret = -EACCES;
>> >  -        return ret;
>> >  +    if (s->fd == -1) {
>> >  +        fd = qemu_open(filename, s->open_flags, 0644);
>> >  +        if (fd<  0) {
>> >  +            ret = -errno;
>> >  +            if (ret == -EROFS) {
>> >  +                ret = -EACCES;
>> >  +            }
>> >  +            return ret;
>> >  +        }
>> >  +        s->fd = fd;
>> >        }
>> >  -    s->fd = fd;
>> >        s->aligned_buf = NULL;
>> >
>> >        if ((bdrv_flags&  BDRV_O_NOCACHE)) {
>> >  @@ -272,6 +276,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
>> >    {
>> >        BDRVRawState *s = bs->opaque;
>> >
>> >  +    s->fd = -1;
>> >        s->type = FTYPE_FILE;
>> >        return raw_open_common(bs, filename, flags, 0);
>> >    }
>> >  @@ -892,6 +897,60 @@ static BlockDriver bdrv_file = {
>> >        .create_options = raw_create_options,
>> >    };
>> >
>> >  +static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags)
>> >  +{
>> >  +    BDRVRawState *s = bs->opaque;
>> >  +    const char *fd_str;
>> >  +    int fd;
>> >  +
>> >  +    /* extract the file descriptor - fail if it's not fd: */
>> >  +    if (!strstart(filename, "fd:",&fd_str)) {
>> >  +        return -EINVAL;
>> >  +    }
>> >  +
>> >  +    if (!qemu_isdigit(fd_str[0])) {
>> >  +        /* get fd from monitor */
>> >  +        fd = qemu_get_fd(fd_str);
>> >  +        if (fd == -1) {
>> >  +            return -EBADF;
>> >  +        }
>> >  +    } else {
>> >  +        char *endptr = NULL;
>> >  +
>> >  +        fd = strtol(fd_str,&endptr, 10);
>> >  +        if (*endptr || (fd == 0&&  fd_str == endptr)) {
>> >  +            return -EBADF;
>> >  +        }
>> >  +    }
>> >  +
>> >  +    s->fd = fd;
>> >  +    s->type = FTYPE_FILE;
>> >  +
>> >  +    return raw_open_common(bs, filename, flags, 0);
>> >  +}
>> >  +
>> >  +static BlockDriver bdrv_file_fd = {
>> >  +    .format_name = "file",
>> >  +    .protocol_name = "fd",
>> >  +    .instance_size = sizeof(BDRVRawState),
>> >  +    .bdrv_probe = NULL, /* no probe for protocols */
>> >  +    .bdrv_file_open = raw_open_fd,
>> >  +    .bdrv_read = raw_read,
>> >  +    .bdrv_write = raw_write,
>> >  +    .bdrv_close = raw_close,
>> >  +    .bdrv_flush = raw_flush,
>> >  +    .bdrv_discard = raw_discard,
>> >  +
>> >  +    .bdrv_aio_readv = raw_aio_readv,
>> >  +    .bdrv_aio_writev = raw_aio_writev,
>> >  +    .bdrv_aio_flush = raw_aio_flush,
>> >  +
>> >  +    .bdrv_truncate = raw_truncate,
>> >  +    .bdrv_getlength = raw_getlength,
>> >  +
>> >  +    .create_options = raw_create_options,
>> >  +};
>> >  +
>> >    /***********************************************/
>> >    /* host device */
>> >
>> >  @@ -1000,6 +1059,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
>> >        }
>> >    #endif
>> >
>> >  +    s->fd = -1;
>> >        s->type = FTYPE_FILE;
>> >    #if defined(__linux__)
>> >        {
>> >  @@ -1170,6 +1230,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
>> >        BDRVRawState *s = bs->opaque;
>> >        int ret;
>> >
>> >  +    s->fd = -1;
>> >        s->type = FTYPE_FD;
>> >
>> >        /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
>> >  @@ -1288,6 +1349,7 @@ static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
>> >    {
>> >        BDRVRawState *s = bs->opaque;
>> >
>> >  +    s->fd = -1;
>> >        s->type = FTYPE_CD;
>> >
>> >        /* open will not fail even if no CD is inserted, so add O_NONBLOCK */
>> >  @@ -1517,6 +1579,7 @@ static void bdrv_file_init(void)
>> >         * Register all the drivers.  Note that order is important, the driver
>> >         * registered last will get probed first.
>> >         */
>> >  +    bdrv_register(&bdrv_file_fd);
>> >        bdrv_register(&bdrv_file);
>> >        bdrv_register(&bdrv_host_device);
> The part modifying raw looks good at the first sight.
>

Ok. This leads me to think that a block/fd.c file isn't necessary then.

>> >    #ifdef __linux__
>> >  diff --git a/block/vmdk.c b/block/vmdk.c
>> >  index 922b23d..2ea808e 100644
>> >  --- a/block/vmdk.c
>> >  +++ b/block/vmdk.c
>> >  @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs)
>> >                return -1;
>> >
>> >            pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
>> >  +
>> >  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>> >  +            /* backing file currently not supported by fd: protocol */
>> >  +            return -1;
>> >  +        }
>> >        }
>> >
>> >        return 0;
>> >  diff --git a/block_int.h b/block_int.h
>> >  index 1e265d2..441049c 100644
>> >  --- a/block_int.h
>> >  +++ b/block_int.h
>> >  @@ -152,6 +152,7 @@ struct BlockDriverState {
>> >        int encrypted; /* if true, the media is encrypted */
>> >        int valid_key; /* if true, a valid encryption key has been set */
>> >        int sg;        /* if true, the device is a /dev/sg* */
>> >  +    int fd_protocol; /* if true, the fd: protocol was specified */
> You don't need this any more when you remove bdrv_is_fd_protocol()
>
>> >        /* event callback when inserting/removing */
>> >        void (*change_cb)(void *opaque, int reason);
>> >        void *change_opaque;
>> >  diff --git a/blockdev.c b/blockdev.c
>> >  index c263663..5cb7b56 100644
>> >  --- a/blockdev.c
>> >  +++ b/blockdev.c
>> >  @@ -542,6 +542,14 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>> >
>> >        bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>> >
>> >  +    if (strncmp(file, "fd:", 3) == 0) {
>> >  +        if (media == MEDIA_CDROM) {
>> >  +            error_report("CD-ROM not supported by fd: protocol");
>> >  +            goto err;
>> >  +        }
>> >  +        dinfo->bdrv->fd_protocol = 1;
>> >  +    }
> Why?
>

The thought was that CD-ROMs can be re-opened but perhaps that is only 
applicable to native CD-ROM, which may rule out an iso stored on NFS. 
Is that what you're thinking?

>> >  +
>> >        ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
>> >        if (ret<  0) {
>> >            error_report("could not open disk image %s: %s",
>> >  @@ -602,6 +610,12 @@ int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data)
>> >            goto out;
>> >        }
>> >
>> >  +    if (bdrv_is_fd_protocol(bs)) {
>> >  +        qerror_report(QERR_UNSUPPORTED);
>> >  +        ret = -1;
>> >  +        goto out;
>> >  +    }
>> >  +
>> >        pstrcpy(old_filename, sizeof(old_filename), bs->filename);
>> >
>> >        old_drv = bs->drv;
>> >  @@ -719,6 +733,11 @@ int do_change_block(Monitor *mon, const char *device,
>> >        BlockDriver *drv = NULL;
>> >        int bdrv_flags;
>> >
>> >  +    if (strncmp(filename, "fd:", 3) == 0) {
>> >  +        qerror_report(QERR_UNSUPPORTED);
>> >  +        return -1;
>> >  +    }
> What's the problem here?
>

I don't think this code is necessary.

> Kevin
>

Regards,
Corey
Alexander Graf - July 26, 2011, 5:10 p.m.
On 26.07.2011, at 18:57, Corey Bryant wrote:

> 
> Kevin, thanks for the input.
> 
> On 07/26/2011 11:18 AM, Kevin Wolf wrote:
>> Am 26.07.2011 14:51, schrieb Corey Bryant:
>>> >  sVirt provides SELinux MAC isolation for Qemu guest processes and their
>>> >  corresponding resources (image files). sVirt provides this support
>>> >  by labeling guests and resources with security labels that are stored
>>> >  in file system extended attributes. Some file systems, such as NFS, do
>>> >  not support the extended attribute security namespace, which is needed
>>> >  for image file isolation when using the sVirt SELinux security driver
>>> >  in libvirt.
>>> >
>>> >  The proposed solution entails a combination of Qemu, libvirt, and
>>> >  SELinux patches that work together to isolate multiple guests' images
>>> >  when they're stored in the same NFS mount. This results in an
>>> >  environment where sVirt isolation and NFS image file isolation can both
>>> >  be provided.
>>> >
>>> >  This patch contains the Qemu code to support this solution. I would
>>> >  like to solicit input from the libvirt community prior to starting
>>> >  the libvirt patch.
>>> >
>>> >  Currently, Qemu opens an image file in addition to performing the
>>> >  necessary read and write operations. The proposed solution will move
>>> >  the open out of Qemu and into libvirt. Once libvirt opens an image
>>> >  file for the guest, it will pass the file descriptor to Qemu via a
>>> >  new fd: protocol.
>>> >
>>> >  If the image file resides in an NFS mount, the following SELinux policy
>>> >  changes will provide image isolation:
>>> >
>>> >     - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>>> >       allow Qemu (svirt_t) to only have SELinux read and write
>>> >       permissions on nfs_t files
>>> >
>>> >     - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>>> >       (virtd_t) file descriptors
>>> >
>>> >  Following is a sample invocation of Qemu using the fd: protocol on
>>> >  the command line:
>>> >
>>> >       qemu -drive file=fd:4,format=qcow2
>>> >
>>> >  The fd: protocol is also supported by the drive_add monitor command.
>>> >  This requires that the specified file descriptor is passed to the
>>> >  monitor alongside a prior getfd monitor command.
>>> >
>>> >  There are some additional features provided by certain image types
>>> >  where Qemu reopens the image file. All of these scenarios will be
>>> >  unsupported for the fd: protocol, at least for this patch:
>>> >
>>> >     - The -snapshot command line option
>>> >     - The savevm monitor command
>>> >     - The snapshot_blkdev monitor command
>>> >     - Use of copy-on-write image files
>>> >     - The -cdrom command line option
>>> >     - The -drive command line option with media=cdrom
>>> >     - The change monitor command
>>> >
>>> >  The thought is that this support can be added in the future, but is
>>> >  not required for the initial fd: support.
>>> >
>>> >  This patch was tested with the following formats: raw, cow, qcow,
>>> >  qcow2, qed, and vmdk, using the fd: protocol from the command line
>>> >  and the monitor. Tests were also run to verify existing file name
>>> >  support and qemu-img were not regressed. Non-valid file descriptors,
>>> >  fd: without format, snapshot and backing files, and cdrom were also
>>> >  tested.
>>> >
>>> >  v2:
>>> >     - Add drive_add monitor command support
>>> >     - Fence off unsupported features that re-open image file
>>> >
>>> >  v3:
>>> >     - Fence off cdrom and change monitor command support
>>> >
>>> >  Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
>>> >  ---
>>> >    block.c           |   16 ++++++++++
>>> >    block.h           |    1 +
>>> >    block/cow.c       |    5 +++
>>> >    block/qcow.c      |    5 +++
>>> >    block/qcow2.c     |    5 +++
>>> >    block/qed.c       |    4 ++
>>> >    block/raw-posix.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++------
>>> >    block/vmdk.c      |    5 +++
>>> >    block_int.h       |    1 +
>>> >    blockdev.c        |   19 ++++++++++++
>>> >    monitor.c         |    5 +++
>>> >    monitor.h         |    1 +
>>> >    qemu-options.hx   |    8 +++--
>>> >    qemu-tool.c       |    5 +++
>>> >    14 files changed, 149 insertions(+), 12 deletions(-)
>>> >
>>> >  diff --git a/block.c b/block.c
>>> >  index 24a25d5..500db84 100644
>>> >  --- a/block.c
>>> >  +++ b/block.c
>>> >  @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>>> >            char tmp_filename[PATH_MAX];
>>> >            char backing_filename[PATH_MAX];
>>> >
>>> >  +        if (bdrv_is_fd_protocol(bs)) {
>>> >  +            return -ENOTSUP;
>>> >  +        }
>> Hm, shouldn't that just work even with fd?
>> 
> 
> My thought was that the proposed SELinux changes would prevent the open of the temporary backing file if the file corresponding to fd resides on NFS.  But perhaps the backing file is not opened on NFS?

Then make a new flag that allows you to prohibit backing files.

> 
>>> >  +
>>> >            /* if snapshot, we create a temporary backing file and open it
>>> >               instead of opening 'filename' directly */
>>> >
>>> >  @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>>> >
>>> >        /* Find the right image format driver */
>>> >        if (!drv) {
>>> >  +        /* format must be specified for fd: protocol */
>>> >  +        if (bdrv_is_fd_protocol(bs)) {
>>> >  +            return -ENOTSUP;
>>> >  +        }
>> This isn't really required to make fd work.
>> 
> 
> If format is not specified, the file needs to be opened and probed to determine the format.  The proposed SELinux changes should prevent the open if it resides on NFS.

You're mixing together completely different semantics. This is not the "libvirt super-uber-great-special block driver", it's an fd driver. Everything SELinux should be separate. So again, add an option somewhere that allows you to generically disable probing of drivers. Or if all you care about is libvirt, simply always pass a format= option.

> 
>>> >            ret = find_image_format(filename,&drv);
>>> >        }
>>> >
>>> >  @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs)
>>> >        return bs->enable_write_cache;
>>> >    }
>>> >
>>> >  +int bdrv_is_fd_protocol(BlockDriverState *bs)
>>> >  +{
>>> >  +    return bs->fd_protocol;
>>> >  +}
>> The generic block layer shouldn't have any special cases based on the
>> format driver. If you need to do something different for fd, think about
>> what property of fd it is that requires the special case (like can't
>> reopen).
>> 
> 
> I'm not sure I completely understand what you're saying here.  I understand the desire to not have special cases based on the fd protocol.  I have a number special case checks that prevent re-opens, which presumably would not be allowed by the proposed SELinux changes. Should there be no special cases at all (e.g. should I provide solutions to all the re-opens now) or would you rather special cases be provided at a different layer?

Whatever you do, don't ever try to make anything but your fd driver aware of fd drivers. a NO_REOPEN flag would be one way to achieve what you're trying to do.

> 
>>> >  +
>>> >    /* XXX: no longer used */
>>> >    void bdrv_set_change_cb(BlockDriverState *bs,
>>> >                            void (*change_cb)(void *opaque, int reason),
>>> >  @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>>> >        BlockDriver *drv = bs->drv;
>>> >        if (!drv)
>>> >            return -ENOMEDIUM;
>>> >  +    if (bdrv_is_fd_protocol(bs)) {
>>> >  +        return -ENOTSUP;
>>> >  +    }
>> What's the problem with internal snapshots in images accesses over an fd?
>> 
> 
> I was thinking the proposed SELinux changes would prevent the open of the snapshot file if the file corresponding to fd resides on NFS.  But perhaps it is not opened on NFS?

Same as above - this has nothing to do with the fd driver.


Alex
Corey Bryant - July 26, 2011, 7 p.m.
On 07/26/2011 01:10 PM, Alexander Graf wrote:
> On 26.07.2011, at 18:57, Corey Bryant wrote:
>
>> >
>> >  Kevin, thanks for the input.
>> >
>> >  On 07/26/2011 11:18 AM, Kevin Wolf wrote:
>>> >>  Am 26.07.2011 14:51, schrieb Corey Bryant:
>>>>> >>>  >    sVirt provides SELinux MAC isolation for Qemu guest processes and their
>>>>> >>>  >    corresponding resources (image files). sVirt provides this support
>>>>> >>>  >    by labeling guests and resources with security labels that are stored
>>>>> >>>  >    in file system extended attributes. Some file systems, such as NFS, do
>>>>> >>>  >    not support the extended attribute security namespace, which is needed
>>>>> >>>  >    for image file isolation when using the sVirt SELinux security driver
>>>>> >>>  >    in libvirt.
>>>>> >>>  >
>>>>> >>>  >    The proposed solution entails a combination of Qemu, libvirt, and
>>>>> >>>  >    SELinux patches that work together to isolate multiple guests' images
>>>>> >>>  >    when they're stored in the same NFS mount. This results in an
>>>>> >>>  >    environment where sVirt isolation and NFS image file isolation can both
>>>>> >>>  >    be provided.
>>>>> >>>  >
>>>>> >>>  >    This patch contains the Qemu code to support this solution. I would
>>>>> >>>  >    like to solicit input from the libvirt community prior to starting
>>>>> >>>  >    the libvirt patch.
>>>>> >>>  >
>>>>> >>>  >    Currently, Qemu opens an image file in addition to performing the
>>>>> >>>  >    necessary read and write operations. The proposed solution will move
>>>>> >>>  >    the open out of Qemu and into libvirt. Once libvirt opens an image
>>>>> >>>  >    file for the guest, it will pass the file descriptor to Qemu via a
>>>>> >>>  >    new fd: protocol.
>>>>> >>>  >
>>>>> >>>  >    If the image file resides in an NFS mount, the following SELinux policy
>>>>> >>>  >    changes will provide image isolation:
>>>>> >>>  >
>>>>> >>>  >       - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>>>>> >>>  >         allow Qemu (svirt_t) to only have SELinux read and write
>>>>> >>>  >         permissions on nfs_t files
>>>>> >>>  >
>>>>> >>>  >       - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>>>>> >>>  >         (virtd_t) file descriptors
>>>>> >>>  >
>>>>> >>>  >    Following is a sample invocation of Qemu using the fd: protocol on
>>>>> >>>  >    the command line:
>>>>> >>>  >
>>>>> >>>  >         qemu -drive file=fd:4,format=qcow2
>>>>> >>>  >
>>>>> >>>  >    The fd: protocol is also supported by the drive_add monitor command.
>>>>> >>>  >    This requires that the specified file descriptor is passed to the
>>>>> >>>  >    monitor alongside a prior getfd monitor command.
>>>>> >>>  >
>>>>> >>>  >    There are some additional features provided by certain image types
>>>>> >>>  >    where Qemu reopens the image file. All of these scenarios will be
>>>>> >>>  >    unsupported for the fd: protocol, at least for this patch:
>>>>> >>>  >
>>>>> >>>  >       - The -snapshot command line option
>>>>> >>>  >       - The savevm monitor command
>>>>> >>>  >       - The snapshot_blkdev monitor command
>>>>> >>>  >       - Use of copy-on-write image files
>>>>> >>>  >       - The -cdrom command line option
>>>>> >>>  >       - The -drive command line option with media=cdrom
>>>>> >>>  >       - The change monitor command
>>>>> >>>  >
>>>>> >>>  >    The thought is that this support can be added in the future, but is
>>>>> >>>  >    not required for the initial fd: support.
>>>>> >>>  >
>>>>> >>>  >    This patch was tested with the following formats: raw, cow, qcow,
>>>>> >>>  >    qcow2, qed, and vmdk, using the fd: protocol from the command line
>>>>> >>>  >    and the monitor. Tests were also run to verify existing file name
>>>>> >>>  >    support and qemu-img were not regressed. Non-valid file descriptors,
>>>>> >>>  >    fd: without format, snapshot and backing files, and cdrom were also
>>>>> >>>  >    tested.
>>>>> >>>  >
>>>>> >>>  >    v2:
>>>>> >>>  >       - Add drive_add monitor command support
>>>>> >>>  >       - Fence off unsupported features that re-open image file
>>>>> >>>  >
>>>>> >>>  >    v3:
>>>>> >>>  >       - Fence off cdrom and change monitor command support
>>>>> >>>  >
>>>>> >>>  >    Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
>>>>> >>>  >    ---
>>>>> >>>  >      block.c           |   16 ++++++++++
>>>>> >>>  >      block.h           |    1 +
>>>>> >>>  >      block/cow.c       |    5 +++
>>>>> >>>  >      block/qcow.c      |    5 +++
>>>>> >>>  >      block/qcow2.c     |    5 +++
>>>>> >>>  >      block/qed.c       |    4 ++
>>>>> >>>  >      block/raw-posix.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++------
>>>>> >>>  >      block/vmdk.c      |    5 +++
>>>>> >>>  >      block_int.h       |    1 +
>>>>> >>>  >      blockdev.c        |   19 ++++++++++++
>>>>> >>>  >      monitor.c         |    5 +++
>>>>> >>>  >      monitor.h         |    1 +
>>>>> >>>  >      qemu-options.hx   |    8 +++--
>>>>> >>>  >      qemu-tool.c       |    5 +++
>>>>> >>>  >      14 files changed, 149 insertions(+), 12 deletions(-)
>>>>> >>>  >
>>>>> >>>  >    diff --git a/block.c b/block.c
>>>>> >>>  >    index 24a25d5..500db84 100644
>>>>> >>>  >    --- a/block.c
>>>>> >>>  >    +++ b/block.c
>>>>> >>>  >    @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>>>>> >>>  >              char tmp_filename[PATH_MAX];
>>>>> >>>  >              char backing_filename[PATH_MAX];
>>>>> >>>  >
>>>>> >>>  >    +        if (bdrv_is_fd_protocol(bs)) {
>>>>> >>>  >    +            return -ENOTSUP;
>>>>> >>>  >    +        }
>>> >>  Hm, shouldn't that just work even with fd?
>>> >>
>> >
>> >  My thought was that the proposed SELinux changes would prevent the open of the temporary backing file if the file corresponding to fd resides on NFS.  But perhaps the backing file is not opened on NFS?
> Then make a new flag that allows you to prohibit backing files.
>
>> >
>>>>> >>>  >    +
>>>>> >>>  >              /* if snapshot, we create a temporary backing file and open it
>>>>> >>>  >                 instead of opening 'filename' directly */
>>>>> >>>  >
>>>>> >>>  >    @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>>>>> >>>  >
>>>>> >>>  >          /* Find the right image format driver */
>>>>> >>>  >          if (!drv) {
>>>>> >>>  >    +        /* format must be specified for fd: protocol */
>>>>> >>>  >    +        if (bdrv_is_fd_protocol(bs)) {
>>>>> >>>  >    +            return -ENOTSUP;
>>>>> >>>  >    +        }
>>> >>  This isn't really required to make fd work.
>>> >>
>> >
>> >  If format is not specified, the file needs to be opened and probed to determine the format.  The proposed SELinux changes should prevent the open if it resides on NFS.
> You're mixing together completely different semantics. This is not the "libvirt super-uber-great-special block driver", it's an fd driver. Everything SELinux should be separate. So again, add an option somewhere that allows you to generically disable probing of drivers. Or if all you care about is libvirt, simply always pass a format= option.
>
>> >
>>>>> >>>  >              ret = find_image_format(filename,&drv);
>>>>> >>>  >          }
>>>>> >>>  >
>>>>> >>>  >    @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs)
>>>>> >>>  >          return bs->enable_write_cache;
>>>>> >>>  >      }
>>>>> >>>  >
>>>>> >>>  >    +int bdrv_is_fd_protocol(BlockDriverState *bs)
>>>>> >>>  >    +{
>>>>> >>>  >    +    return bs->fd_protocol;
>>>>> >>>  >    +}
>>> >>  The generic block layer shouldn't have any special cases based on the
>>> >>  format driver. If you need to do something different for fd, think about
>>> >>  what property of fd it is that requires the special case (like can't
>>> >>  reopen).
>>> >>
>> >
>> >  I'm not sure I completely understand what you're saying here.  I understand the desire to not have special cases based on the fd protocol.  I have a number special case checks that prevent re-opens, which presumably would not be allowed by the proposed SELinux changes. Should there be no special cases at all (e.g. should I provide solutions to all the re-opens now) or would you rather special cases be provided at a different layer?
> Whatever you do, don't ever try to make anything but your fd driver aware of fd drivers. a NO_REOPEN flag would be one way to achieve what you're trying to do.
>
>> >
>>>>> >>>  >    +
>>>>> >>>  >      /* XXX: no longer used */
>>>>> >>>  >      void bdrv_set_change_cb(BlockDriverState *bs,
>>>>> >>>  >                              void (*change_cb)(void *opaque, int reason),
>>>>> >>>  >    @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>>>>> >>>  >          BlockDriver *drv = bs->drv;
>>>>> >>>  >          if (!drv)
>>>>> >>>  >              return -ENOMEDIUM;
>>>>> >>>  >    +    if (bdrv_is_fd_protocol(bs)) {
>>>>> >>>  >    +        return -ENOTSUP;
>>>>> >>>  >    +    }
>>> >>  What's the problem with internal snapshots in images accesses over an fd?
>>> >>
>> >
>> >  I was thinking the proposed SELinux changes would prevent the open of the snapshot file if the file corresponding to fd resides on NFS.  But perhaps it is not opened on NFS?
> Same as above - this has nothing to do with the fd driver.
>
>
> Alex
>


Alex, I understand what you and Kevin are saying now.  Thanks for 
clearing this up for me.

Regards,
Corey
Kevin Wolf - July 27, 2011, 8:11 a.m.
Am 26.07.2011 18:57, schrieb Corey Bryant:
> 
> Kevin, thanks for the input.
> 
> On 07/26/2011 11:18 AM, Kevin Wolf wrote:
>> Am 26.07.2011 14:51, schrieb Corey Bryant:
>>>>  sVirt provides SELinux MAC isolation for Qemu guest processes and their
>>>>  corresponding resources (image files). sVirt provides this support
>>>>  by labeling guests and resources with security labels that are stored
>>>>  in file system extended attributes. Some file systems, such as NFS, do
>>>>  not support the extended attribute security namespace, which is needed
>>>>  for image file isolation when using the sVirt SELinux security driver
>>>>  in libvirt.
>>>>
>>>>  The proposed solution entails a combination of Qemu, libvirt, and
>>>>  SELinux patches that work together to isolate multiple guests' images
>>>>  when they're stored in the same NFS mount. This results in an
>>>>  environment where sVirt isolation and NFS image file isolation can both
>>>>  be provided.
>>>>
>>>>  This patch contains the Qemu code to support this solution. I would
>>>>  like to solicit input from the libvirt community prior to starting
>>>>  the libvirt patch.
>>>>
>>>>  Currently, Qemu opens an image file in addition to performing the
>>>>  necessary read and write operations. The proposed solution will move
>>>>  the open out of Qemu and into libvirt. Once libvirt opens an image
>>>>  file for the guest, it will pass the file descriptor to Qemu via a
>>>>  new fd: protocol.
>>>>
>>>>  If the image file resides in an NFS mount, the following SELinux policy
>>>>  changes will provide image isolation:
>>>>
>>>>     - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>>>>       allow Qemu (svirt_t) to only have SELinux read and write
>>>>       permissions on nfs_t files
>>>>
>>>>     - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>>>>       (virtd_t) file descriptors
>>>>
>>>>  Following is a sample invocation of Qemu using the fd: protocol on
>>>>  the command line:
>>>>
>>>>       qemu -drive file=fd:4,format=qcow2
>>>>
>>>>  The fd: protocol is also supported by the drive_add monitor command.
>>>>  This requires that the specified file descriptor is passed to the
>>>>  monitor alongside a prior getfd monitor command.
>>>>
>>>>  There are some additional features provided by certain image types
>>>>  where Qemu reopens the image file. All of these scenarios will be
>>>>  unsupported for the fd: protocol, at least for this patch:
>>>>
>>>>     - The -snapshot command line option
>>>>     - The savevm monitor command
>>>>     - The snapshot_blkdev monitor command
>>>>     - Use of copy-on-write image files
>>>>     - The -cdrom command line option
>>>>     - The -drive command line option with media=cdrom
>>>>     - The change monitor command
>>>>
>>>>  The thought is that this support can be added in the future, but is
>>>>  not required for the initial fd: support.
>>>>
>>>>  This patch was tested with the following formats: raw, cow, qcow,
>>>>  qcow2, qed, and vmdk, using the fd: protocol from the command line
>>>>  and the monitor. Tests were also run to verify existing file name
>>>>  support and qemu-img were not regressed. Non-valid file descriptors,
>>>>  fd: without format, snapshot and backing files, and cdrom were also
>>>>  tested.
>>>>
>>>>  v2:
>>>>     - Add drive_add monitor command support
>>>>     - Fence off unsupported features that re-open image file
>>>>
>>>>  v3:
>>>>     - Fence off cdrom and change monitor command support
>>>>
>>>>  Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
>>>>  ---
>>>>    block.c           |   16 ++++++++++
>>>>    block.h           |    1 +
>>>>    block/cow.c       |    5 +++
>>>>    block/qcow.c      |    5 +++
>>>>    block/qcow2.c     |    5 +++
>>>>    block/qed.c       |    4 ++
>>>>    block/raw-posix.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++------
>>>>    block/vmdk.c      |    5 +++
>>>>    block_int.h       |    1 +
>>>>    blockdev.c        |   19 ++++++++++++
>>>>    monitor.c         |    5 +++
>>>>    monitor.h         |    1 +
>>>>    qemu-options.hx   |    8 +++--
>>>>    qemu-tool.c       |    5 +++
>>>>    14 files changed, 149 insertions(+), 12 deletions(-)
>>>>
>>>>  diff --git a/block.c b/block.c
>>>>  index 24a25d5..500db84 100644
>>>>  --- a/block.c
>>>>  +++ b/block.c
>>>>  @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>>>>            char tmp_filename[PATH_MAX];
>>>>            char backing_filename[PATH_MAX];
>>>>
>>>>  +        if (bdrv_is_fd_protocol(bs)) {
>>>>  +            return -ENOTSUP;
>>>>  +        }
>> Hm, shouldn't that just work even with fd?
>>
> 
> My thought was that the proposed SELinux changes would prevent the open 
> of the temporary backing file if the file corresponding to fd resides on 
> NFS.  But perhaps the backing file is not opened on NFS?

Depends on how broken your SELinux restrictions are. ;-)

I would argue that allowing qemu to create temporary files is a
reasonable thing to do. Maybe libvirt comes to a different conclusion,
but that doesn't mean that every other management tool comes to the same.

Also, as Alex already said, you shouldn't think of your use case as the
only valid use case. What a fd driver implementation is about is just
working with an fd for images. If libvirts puts more restrictions on it,
that's fine, but don't force other users to follow your model.

>>>>  +
>>>>            /* if snapshot, we create a temporary backing file and open it
>>>>               instead of opening 'filename' directly */
>>>>
>>>>  @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>>>>
>>>>        /* Find the right image format driver */
>>>>        if (!drv) {
>>>>  +        /* format must be specified for fd: protocol */
>>>>  +        if (bdrv_is_fd_protocol(bs)) {
>>>>  +            return -ENOTSUP;
>>>>  +        }
>> This isn't really required to make fd work.
>>
> 
> If format is not specified, the file needs to be opened and probed to 
> determine the format.  The proposed SELinux changes should prevent the 
> open if it resides on NFS.

SELinux shouldn't see an open() if you bdrv_open() an image with the fd
protocol. I think the only thing to which you need to pay attention is
that you can open it multiple times, i.e. don't close the fd on
bdrv_close (or maybe just dup() it in bdrv_open?)

>>>>            ret = find_image_format(filename,&drv);
>>>>        }
>>>>
>>>>  @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs)
>>>>        return bs->enable_write_cache;
>>>>    }
>>>>
>>>>  +int bdrv_is_fd_protocol(BlockDriverState *bs)
>>>>  +{
>>>>  +    return bs->fd_protocol;
>>>>  +}
>> The generic block layer shouldn't have any special cases based on the
>> format driver. If you need to do something different for fd, think about
>> what property of fd it is that requires the special case (like can't
>> reopen).
>>
> 
> I'm not sure I completely understand what you're saying here.  I 
> understand the desire to not have special cases based on the fd 
> protocol.  I have a number special case checks that prevent re-opens, 
> which presumably would not be allowed by the proposed SELinux changes. 
> Should there be no special cases at all (e.g. should I provide solutions 
> to all the re-opens now) or would you rather special cases be provided 
> at a different layer?

bdrv_is_fd_protocol() names a specific driver instead of the important
property of that driver. Something like bdrv_can_reopen() would be
generic and could be used by other drivers having the same property.

Something like this should be a flag of the BlockDriver and not of the
BlockDriverState.

>>>>  +
>>>>    /* XXX: no longer used */
>>>>    void bdrv_set_change_cb(BlockDriverState *bs,
>>>>                            void (*change_cb)(void *opaque, int reason),
>>>>  @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>>>>        BlockDriver *drv = bs->drv;
>>>>        if (!drv)
>>>>            return -ENOMEDIUM;
>>>>  +    if (bdrv_is_fd_protocol(bs)) {
>>>>  +        return -ENOTSUP;
>>>>  +    }
>> What's the problem with internal snapshots in images accesses over an fd?
>>
> 
> I was thinking the proposed SELinux changes would prevent the open of 
> the snapshot file if the file corresponding to fd resides on NFS.  But 
> perhaps it is not opened on NFS?

This is about internal snapshots, it doesn't open anything.

>>>>        if (drv->bdrv_snapshot_create)
>>>>            return drv->bdrv_snapshot_create(bs, sn_info);
>>>>        if (bs->file)
>>>>  diff --git a/block.h b/block.h
>>>>  index 859d1d9..0417b69 100644
>>>>  --- a/block.h
>>>>  +++ b/block.h
>>>>  @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs);
>>>>    int bdrv_is_read_only(BlockDriverState *bs);
>>>>    int bdrv_is_sg(BlockDriverState *bs);
>>>>    int bdrv_enable_write_cache(BlockDriverState *bs);
>>>>  +int bdrv_is_fd_protocol(BlockDriverState *bs);
>>>>    int bdrv_is_inserted(BlockDriverState *bs);
>>>>    int bdrv_media_changed(BlockDriverState *bs);
>>>>    int bdrv_is_locked(BlockDriverState *bs);
>>>>  diff --git a/block/cow.c b/block/cow.c
>>>>  index 4cf543c..e17f8e7 100644
>>>>  --- a/block/cow.c
>>>>  +++ b/block/cow.c
>>>>  @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
>>>>        pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>>>>                cow_header.backing_file);
>>>>
>>>>  +    if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>>>  +        /* backing file currently not supported by fd: protocol */
>>>>  +        goto fail;
>>>>  +    }
>> I don't think these checks are strictly needed. Without the check you
>> can open the image itself using an fd, and the backing file using good
>> old raw-posix. We shouldn't decide for our users that this isn't useful.
>>
>> In any case, it would have to move into block.c, you can't modify
>> independent drivers for this.
>>
> 
> I understand the point on not modifying independent drivers.
> 
> But if the backing file resides on NFS, wouldn't the the proposed 
> SELinux changes prevent the open?

Probably. But what about cases where the backing file is local? Or even
a non-libvirt, non-SELinux use case?

> Or are you talking about support where libvirt opens the backing file 
> and passes the fd to Qemu?  If so there was some discussion about future 
> support for this here: 
> http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg01496.html

Not really, but implementing this will be a bit easier if you don't
forbid using backing files with fd.


>>>>  +
>>>>        bitmap_size = ((bs->total_sectors + 7)>>  3) + sizeof(cow_header);
>>>>        s->cow_sectors_offset = (bitmap_size + 511)&  ~511;
>>>>        return 0;
>>>>  diff --git a/block/qcow.c b/block/qcow.c
>>>>  index 227b104..964d411 100644
>>>>  --- a/block/qcow.c
>>>>  +++ b/block/qcow.c
>>>>  @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int flags)
>>>>            if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len)
>>>>                goto fail;
>>>>            bs->backing_file[len] = '\0';
>>>>  +
>>>>  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>>>  +            /* backing file currently not supported by fd: protocol */
>>>>  +            goto fail;
>>>>  +        }
>>>>        }
>>>>        return 0;
>>>>
>>>>  diff --git a/block/qcow2.c b/block/qcow2.c
>>>>  index 48e1b95..7f6a4fa 100644
>>>>  --- a/block/qcow2.c
>>>>  +++ b/block/qcow2.c
>>>>  @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>>>>                goto fail;
>>>>            }
>>>>            bs->backing_file[len] = '\0';
>>>>  +
>>>>  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>>>  +            ret = -ENOTSUP;
>>>>  +            goto fail;
>>>>  +        }
>>>>        }
>>>>        if (qcow2_read_snapshots(bs)<  0) {
>>>>            ret = -EINVAL;
>>>>  diff --git a/block/qed.c b/block/qed.c
>>>>  index 3970379..5028897 100644
>>>>  --- a/block/qed.c
>>>>  +++ b/block/qed.c
>>>>  @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>>>>                return ret;
>>>>            }
>>>>
>>>>  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>>>  +            return -ENOTSUP;
>>>>  +        }
>>>>  +
>>>>            if (s->header.features&  QED_F_BACKING_FORMAT_NO_PROBE) {
>>>>                pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
>>>>            }
>>>>  diff --git a/block/raw-posix.c b/block/raw-posix.c
>>>>  index 34b64aa..cec4d36 100644
>>>>  --- a/block/raw-posix.c
>>>>  +++ b/block/raw-posix.c
>>>>  @@ -28,6 +28,7 @@
>>>>    #include "block_int.h"
>>>>    #include "module.h"
>>>>    #include "block/raw-posix-aio.h"
>>>>  +#include "monitor.h"
>>>>
>>>>    #ifdef CONFIG_COCOA
>>>>    #include<paths.h>
>>>>  @@ -185,7 +186,8 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>>>>                               int bdrv_flags, int open_flags)
>>>>    {
>>>>        BDRVRawState *s = bs->opaque;
>>>>  -    int fd, ret;
>>>>  +    int fd = -1;
>>>>  +    int ret;
>>>>
>>>>        ret = raw_normalize_devicepath(&filename);
>>>>        if (ret != 0) {
>>>>  @@ -207,15 +209,17 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>>>>        if (!(bdrv_flags&  BDRV_O_CACHE_WB))
>>>>            s->open_flags |= O_DSYNC;
>>>>
>>>>  -    s->fd = -1;
>>>>  -    fd = qemu_open(filename, s->open_flags, 0644);
>>>>  -    if (fd<  0) {
>>>>  -        ret = -errno;
>>>>  -        if (ret == -EROFS)
>>>>  -            ret = -EACCES;
>>>>  -        return ret;
>>>>  +    if (s->fd == -1) {
>>>>  +        fd = qemu_open(filename, s->open_flags, 0644);
>>>>  +        if (fd<  0) {
>>>>  +            ret = -errno;
>>>>  +            if (ret == -EROFS) {
>>>>  +                ret = -EACCES;
>>>>  +            }
>>>>  +            return ret;
>>>>  +        }
>>>>  +        s->fd = fd;
>>>>        }
>>>>  -    s->fd = fd;
>>>>        s->aligned_buf = NULL;
>>>>
>>>>        if ((bdrv_flags&  BDRV_O_NOCACHE)) {
>>>>  @@ -272,6 +276,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
>>>>    {
>>>>        BDRVRawState *s = bs->opaque;
>>>>
>>>>  +    s->fd = -1;
>>>>        s->type = FTYPE_FILE;
>>>>        return raw_open_common(bs, filename, flags, 0);
>>>>    }
>>>>  @@ -892,6 +897,60 @@ static BlockDriver bdrv_file = {
>>>>        .create_options = raw_create_options,
>>>>    };
>>>>
>>>>  +static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags)
>>>>  +{
>>>>  +    BDRVRawState *s = bs->opaque;
>>>>  +    const char *fd_str;
>>>>  +    int fd;
>>>>  +
>>>>  +    /* extract the file descriptor - fail if it's not fd: */
>>>>  +    if (!strstart(filename, "fd:",&fd_str)) {
>>>>  +        return -EINVAL;
>>>>  +    }
>>>>  +
>>>>  +    if (!qemu_isdigit(fd_str[0])) {
>>>>  +        /* get fd from monitor */
>>>>  +        fd = qemu_get_fd(fd_str);
>>>>  +        if (fd == -1) {
>>>>  +            return -EBADF;
>>>>  +        }
>>>>  +    } else {
>>>>  +        char *endptr = NULL;
>>>>  +
>>>>  +        fd = strtol(fd_str,&endptr, 10);
>>>>  +        if (*endptr || (fd == 0&&  fd_str == endptr)) {
>>>>  +            return -EBADF;
>>>>  +        }
>>>>  +    }
>>>>  +
>>>>  +    s->fd = fd;
>>>>  +    s->type = FTYPE_FILE;
>>>>  +
>>>>  +    return raw_open_common(bs, filename, flags, 0);
>>>>  +}
>>>>  +
>>>>  +static BlockDriver bdrv_file_fd = {
>>>>  +    .format_name = "file",
>>>>  +    .protocol_name = "fd",
>>>>  +    .instance_size = sizeof(BDRVRawState),
>>>>  +    .bdrv_probe = NULL, /* no probe for protocols */
>>>>  +    .bdrv_file_open = raw_open_fd,
>>>>  +    .bdrv_read = raw_read,
>>>>  +    .bdrv_write = raw_write,
>>>>  +    .bdrv_close = raw_close,
>>>>  +    .bdrv_flush = raw_flush,
>>>>  +    .bdrv_discard = raw_discard,
>>>>  +
>>>>  +    .bdrv_aio_readv = raw_aio_readv,
>>>>  +    .bdrv_aio_writev = raw_aio_writev,
>>>>  +    .bdrv_aio_flush = raw_aio_flush,
>>>>  +
>>>>  +    .bdrv_truncate = raw_truncate,
>>>>  +    .bdrv_getlength = raw_getlength,
>>>>  +
>>>>  +    .create_options = raw_create_options,
>>>>  +};
>>>>  +
>>>>    /***********************************************/
>>>>    /* host device */
>>>>
>>>>  @@ -1000,6 +1059,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
>>>>        }
>>>>    #endif
>>>>
>>>>  +    s->fd = -1;
>>>>        s->type = FTYPE_FILE;
>>>>    #if defined(__linux__)
>>>>        {
>>>>  @@ -1170,6 +1230,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
>>>>        BDRVRawState *s = bs->opaque;
>>>>        int ret;
>>>>
>>>>  +    s->fd = -1;
>>>>        s->type = FTYPE_FD;
>>>>
>>>>        /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
>>>>  @@ -1288,6 +1349,7 @@ static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
>>>>    {
>>>>        BDRVRawState *s = bs->opaque;
>>>>
>>>>  +    s->fd = -1;
>>>>        s->type = FTYPE_CD;
>>>>
>>>>        /* open will not fail even if no CD is inserted, so add O_NONBLOCK */
>>>>  @@ -1517,6 +1579,7 @@ static void bdrv_file_init(void)
>>>>         * Register all the drivers.  Note that order is important, the driver
>>>>         * registered last will get probed first.
>>>>         */
>>>>  +    bdrv_register(&bdrv_file_fd);
>>>>        bdrv_register(&bdrv_file);
>>>>        bdrv_register(&bdrv_host_device);
>> The part modifying raw looks good at the first sight.
>>
> 
> Ok. This leads me to think that a block/fd.c file isn't necessary then.

Yes, having it in raw-posix.c is fine.

>>>>    #ifdef __linux__
>>>>  diff --git a/block/vmdk.c b/block/vmdk.c
>>>>  index 922b23d..2ea808e 100644
>>>>  --- a/block/vmdk.c
>>>>  +++ b/block/vmdk.c
>>>>  @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs)
>>>>                return -1;
>>>>
>>>>            pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
>>>>  +
>>>>  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>>>  +            /* backing file currently not supported by fd: protocol */
>>>>  +            return -1;
>>>>  +        }
>>>>        }
>>>>
>>>>        return 0;
>>>>  diff --git a/block_int.h b/block_int.h
>>>>  index 1e265d2..441049c 100644
>>>>  --- a/block_int.h
>>>>  +++ b/block_int.h
>>>>  @@ -152,6 +152,7 @@ struct BlockDriverState {
>>>>        int encrypted; /* if true, the media is encrypted */
>>>>        int valid_key; /* if true, a valid encryption key has been set */
>>>>        int sg;        /* if true, the device is a /dev/sg* */
>>>>  +    int fd_protocol; /* if true, the fd: protocol was specified */
>> You don't need this any more when you remove bdrv_is_fd_protocol()
>>
>>>>        /* event callback when inserting/removing */
>>>>        void (*change_cb)(void *opaque, int reason);
>>>>        void *change_opaque;
>>>>  diff --git a/blockdev.c b/blockdev.c
>>>>  index c263663..5cb7b56 100644
>>>>  --- a/blockdev.c
>>>>  +++ b/blockdev.c
>>>>  @@ -542,6 +542,14 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>>>>
>>>>        bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>>>>
>>>>  +    if (strncmp(file, "fd:", 3) == 0) {
>>>>  +        if (media == MEDIA_CDROM) {
>>>>  +            error_report("CD-ROM not supported by fd: protocol");
>>>>  +            goto err;
>>>>  +        }
>>>>  +        dinfo->bdrv->fd_protocol = 1;
>>>>  +    }
>> Why?
>>
> 
> The thought was that CD-ROMs can be re-opened but perhaps that is only 
> applicable to native CD-ROM, which may rule out an iso stored on NFS. 
> Is that what you're thinking?

Actually, using host devices over the fd protocol is an interesting
point. I think in this implementation fd implicitly means file.

I don't think that with CD-ROM passthrough a reopen happens in normal
use cases. It does happen with the 'change' monitor command, but
changing to a different image (maybe another fd) shouldn't be a problem.

Kevin
Daniel P. Berrange - July 27, 2011, 8:22 a.m.
On Wed, Jul 27, 2011 at 10:11:06AM +0200, Kevin Wolf wrote:
> Am 26.07.2011 18:57, schrieb Corey Bryant:
> >>>>  diff --git a/block/cow.c b/block/cow.c
> >>>>  index 4cf543c..e17f8e7 100644
> >>>>  --- a/block/cow.c
> >>>>  +++ b/block/cow.c
> >>>>  @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
> >>>>        pstrcpy(bs->backing_file, sizeof(bs->backing_file),
> >>>>                cow_header.backing_file);
> >>>>
> >>>>  +    if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
> >>>>  +        /* backing file currently not supported by fd: protocol */
> >>>>  +        goto fail;
> >>>>  +    }
> >> I don't think these checks are strictly needed. Without the check you
> >> can open the image itself using an fd, and the backing file using good
> >> old raw-posix. We shouldn't decide for our users that this isn't useful.
> >>
> >> In any case, it would have to move into block.c, you can't modify
> >> independent drivers for this.
> >>
> > 
> > I understand the point on not modifying independent drivers.
> > 
> > But if the backing file resides on NFS, wouldn't the the proposed 
> > SELinux changes prevent the open?
> 
> Probably. But what about cases where the backing file is local? Or even
> a non-libvirt, non-SELinux use case?
> 
> > Or are you talking about support where libvirt opens the backing file 
> > and passes the fd to Qemu?  If so there was some discussion about future 
> > support for this here: 
> > http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg01496.html
> 
> Not really, but implementing this will be a bit easier if you don't
> forbid using backing files with fd.

In any case, for 'fd:' to be actually used by libvirt, we need to have
backing files supported. There are major users of libvirt who rely on
NFS and also use backing files, so an 'fd:' impl which can't deal with
the backing file problem isn't much use to libvirt.

Regards,
Daniel
Kevin Wolf - July 27, 2011, 8:36 a.m.
Am 27.07.2011 10:22, schrieb Daniel P. Berrange:
> On Wed, Jul 27, 2011 at 10:11:06AM +0200, Kevin Wolf wrote:
>> Am 26.07.2011 18:57, schrieb Corey Bryant:
>>>>>>  diff --git a/block/cow.c b/block/cow.c
>>>>>>  index 4cf543c..e17f8e7 100644
>>>>>>  --- a/block/cow.c
>>>>>>  +++ b/block/cow.c
>>>>>>  @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
>>>>>>        pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>>>>>>                cow_header.backing_file);
>>>>>>
>>>>>>  +    if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>>>>>  +        /* backing file currently not supported by fd: protocol */
>>>>>>  +        goto fail;
>>>>>>  +    }
>>>> I don't think these checks are strictly needed. Without the check you
>>>> can open the image itself using an fd, and the backing file using good
>>>> old raw-posix. We shouldn't decide for our users that this isn't useful.
>>>>
>>>> In any case, it would have to move into block.c, you can't modify
>>>> independent drivers for this.
>>>>
>>>
>>> I understand the point on not modifying independent drivers.
>>>
>>> But if the backing file resides on NFS, wouldn't the the proposed 
>>> SELinux changes prevent the open?
>>
>> Probably. But what about cases where the backing file is local? Or even
>> a non-libvirt, non-SELinux use case?
>>
>>> Or are you talking about support where libvirt opens the backing file 
>>> and passes the fd to Qemu?  If so there was some discussion about future 
>>> support for this here: 
>>> http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg01496.html
>>
>> Not really, but implementing this will be a bit easier if you don't
>> forbid using backing files with fd.
> 
> In any case, for 'fd:' to be actually used by libvirt, we need to have
> backing files supported. There are major users of libvirt who rely on
> NFS and also use backing files, so an 'fd:' impl which can't deal with
> the backing file problem isn't much use to libvirt.

I'm perfectly aware of that. But seriously, repeating it over and over
again isn't going to make it happen any sooner. It requires -blockdev
which we may or may not have by 1.0.

Kevin
Daniel P. Berrange - July 27, 2011, 8:43 a.m.
On Wed, Jul 27, 2011 at 10:36:25AM +0200, Kevin Wolf wrote:
> Am 27.07.2011 10:22, schrieb Daniel P. Berrange:
> > On Wed, Jul 27, 2011 at 10:11:06AM +0200, Kevin Wolf wrote:
> >> Am 26.07.2011 18:57, schrieb Corey Bryant:
> >>>>>>  diff --git a/block/cow.c b/block/cow.c
> >>>>>>  index 4cf543c..e17f8e7 100644
> >>>>>>  --- a/block/cow.c
> >>>>>>  +++ b/block/cow.c
> >>>>>>  @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
> >>>>>>        pstrcpy(bs->backing_file, sizeof(bs->backing_file),
> >>>>>>                cow_header.backing_file);
> >>>>>>
> >>>>>>  +    if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
> >>>>>>  +        /* backing file currently not supported by fd: protocol */
> >>>>>>  +        goto fail;
> >>>>>>  +    }
> >>>> I don't think these checks are strictly needed. Without the check you
> >>>> can open the image itself using an fd, and the backing file using good
> >>>> old raw-posix. We shouldn't decide for our users that this isn't useful.
> >>>>
> >>>> In any case, it would have to move into block.c, you can't modify
> >>>> independent drivers for this.
> >>>>
> >>>
> >>> I understand the point on not modifying independent drivers.
> >>>
> >>> But if the backing file resides on NFS, wouldn't the the proposed 
> >>> SELinux changes prevent the open?
> >>
> >> Probably. But what about cases where the backing file is local? Or even
> >> a non-libvirt, non-SELinux use case?
> >>
> >>> Or are you talking about support where libvirt opens the backing file 
> >>> and passes the fd to Qemu?  If so there was some discussion about future 
> >>> support for this here: 
> >>> http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg01496.html
> >>
> >> Not really, but implementing this will be a bit easier if you don't
> >> forbid using backing files with fd.
> > 
> > In any case, for 'fd:' to be actually used by libvirt, we need to have
> > backing files supported. There are major users of libvirt who rely on
> > NFS and also use backing files, so an 'fd:' impl which can't deal with
> > the backing file problem isn't much use to libvirt.
> 
> I'm perfectly aware of that. But seriously, repeating it over and over
> again isn't going to make it happen any sooner. It requires -blockdev
> which we may or may not have by 1.0.

Yes, I understand we need also -blockdev for this. Other messages in this
mail thread have impied that this proposed patch on its own is useful for
libvirt's requirements, so I just wanted to remind people in general that
we can't use this patch on its own until we have something like -blockdev.

Regards,
Daniel
Corey Bryant - July 27, 2011, 1:09 p.m.
On 07/27/2011 04:43 AM, Daniel P. Berrange wrote:
> On Wed, Jul 27, 2011 at 10:36:25AM +0200, Kevin Wolf wrote:
>> Am 27.07.2011 10:22, schrieb Daniel P. Berrange:
>>> On Wed, Jul 27, 2011 at 10:11:06AM +0200, Kevin Wolf wrote:
>>>> Am 26.07.2011 18:57, schrieb Corey Bryant:
>>>>>>>>   diff --git a/block/cow.c b/block/cow.c
>>>>>>>>   index 4cf543c..e17f8e7 100644
>>>>>>>>   --- a/block/cow.c
>>>>>>>>   +++ b/block/cow.c
>>>>>>>>   @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
>>>>>>>>         pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>>>>>>>>                 cow_header.backing_file);
>>>>>>>>
>>>>>>>>   +    if (bs->backing_file[0] != '\0'&&   bdrv_is_fd_protocol(bs)) {
>>>>>>>>   +        /* backing file currently not supported by fd: protocol */
>>>>>>>>   +        goto fail;
>>>>>>>>   +    }
>>>>>> I don't think these checks are strictly needed. Without the check you
>>>>>> can open the image itself using an fd, and the backing file using good
>>>>>> old raw-posix. We shouldn't decide for our users that this isn't useful.
>>>>>>
>>>>>> In any case, it would have to move into block.c, you can't modify
>>>>>> independent drivers for this.
>>>>>>
>>>>>
>>>>> I understand the point on not modifying independent drivers.
>>>>>
>>>>> But if the backing file resides on NFS, wouldn't the the proposed
>>>>> SELinux changes prevent the open?
>>>>
>>>> Probably. But what about cases where the backing file is local? Or even
>>>> a non-libvirt, non-SELinux use case?
>>>>
>>>>> Or are you talking about support where libvirt opens the backing file
>>>>> and passes the fd to Qemu?  If so there was some discussion about future
>>>>> support for this here:
>>>>> http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg01496.html
>>>>
>>>> Not really, but implementing this will be a bit easier if you don't
>>>> forbid using backing files with fd.
>>>
>>> In any case, for 'fd:' to be actually used by libvirt, we need to have
>>> backing files supported. There are major users of libvirt who rely on
>>> NFS and also use backing files, so an 'fd:' impl which can't deal with
>>> the backing file problem isn't much use to libvirt.
>>
>> I'm perfectly aware of that. But seriously, repeating it over and over
>> again isn't going to make it happen any sooner. It requires -blockdev
>> which we may or may not have by 1.0.
>
> Yes, I understand we need also -blockdev for this. Other messages in this
> mail thread have impied that this proposed patch on its own is useful for
> libvirt's requirements, so I just wanted to remind people in general that
> we can't use this patch on its own until we have something like -blockdev.
>
> Regards,
> Daniel

Kevin/Daniel, thanks a lot for your input.

In terms of the support that libvirt requires, I just want to make sure 
all bases are covered.

In order for this support to be useful to libvirt, the following are 
required (sorry if this is repetitive):

1) -blockdev (backing file support)
2) savevm (snapshot support)
3) snapshot_blkdev (snapshot support)
4) 'change' monitor command
5) -cdrom

and as far as I know, the status of the above is:

1) Someone is slated to work on this (not me)
2) I need to figure out how to "re-open" file without an open (me)
3) This will be covered by live snapshots feature (not me)
4) Should just work as is (me)
5) May also just work as is (me)

In other words, 1 and 3 will not be implemented by me (except perhaps 
some re-usable infrastructure).  Could you confirm my understanding?

Regards,
Corey
Kevin Wolf - July 27, 2011, 2:57 p.m.
Am 27.07.2011 15:09, schrieb Corey Bryant:
> Kevin/Daniel, thanks a lot for your input.
> 
> In terms of the support that libvirt requires, I just want to make sure 
> all bases are covered.
> 
> In order for this support to be useful to libvirt, the following are 
> required (sorry if this is repetitive):
> 
> 1) -blockdev (backing file support)
> 2) savevm (snapshot support)

savevm is harmless and doesn't need any special treatment.

> 3) snapshot_blkdev (snapshot support)
> 4) 'change' monitor command
> 5) -cdrom
> 
> and as far as I know, the status of the above is:
> 
> 1) Someone is slated to work on this (not me)
> 2) I need to figure out how to "re-open" file without an open (me)
> 3) This will be covered by live snapshots feature (not me)
> 4) Should just work as is (me)
> 5) May also just work as is (me)
> 
> In other words, 1 and 3 will not be implemented by me (except perhaps 
> some re-usable infrastructure).  Could you confirm my understanding?

Yes, at least not at this point.

Kevin
Blue Swirl - July 27, 2011, 9:36 p.m.
On Tue, Jul 26, 2011 at 3:51 PM, Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
> sVirt provides SELinux MAC isolation for Qemu guest processes and their
> corresponding resources (image files). sVirt provides this support
> by labeling guests and resources with security labels that are stored
> in file system extended attributes. Some file systems, such as NFS, do
> not support the extended attribute security namespace, which is needed
> for image file isolation when using the sVirt SELinux security driver
> in libvirt.
>
> The proposed solution entails a combination of Qemu, libvirt, and
> SELinux patches that work together to isolate multiple guests' images
> when they're stored in the same NFS mount. This results in an
> environment where sVirt isolation and NFS image file isolation can both
> be provided.
>
> This patch contains the Qemu code to support this solution. I would
> like to solicit input from the libvirt community prior to starting
> the libvirt patch.
>
> Currently, Qemu opens an image file in addition to performing the
> necessary read and write operations. The proposed solution will move
> the open out of Qemu and into libvirt. Once libvirt opens an image
> file for the guest, it will pass the file descriptor to Qemu via a
> new fd: protocol.
>
> If the image file resides in an NFS mount, the following SELinux policy
> changes will provide image isolation:
>
>  - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>    allow Qemu (svirt_t) to only have SELinux read and write
>    permissions on nfs_t files
>
>  - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>    (virtd_t) file descriptors
>
> Following is a sample invocation of Qemu using the fd: protocol on
> the command line:
>
>    qemu -drive file=fd:4,format=qcow2
>
> The fd: protocol is also supported by the drive_add monitor command.
> This requires that the specified file descriptor is passed to the
> monitor alongside a prior getfd monitor command.
>
> There are some additional features provided by certain image types
> where Qemu reopens the image file. All of these scenarios will be
> unsupported for the fd: protocol, at least for this patch:
>
>  - The -snapshot command line option
>  - The savevm monitor command
>  - The snapshot_blkdev monitor command
>  - Use of copy-on-write image files
>  - The -cdrom command line option
>  - The -drive command line option with media=cdrom
>  - The change monitor command

In the earlier discussion, virtio-blk and iSCSI were identified as
interesting protocols to be used with file descriptors in the future.
This patch would bind fd: protocol only to raw file interface to be
used by qcow2 etc. higher levels. I guess with small changes the
protocol could be made selectable. Maybe it's just changing command
line to format=virtio-blk or format=iscsi for those uses, though I
fear there may be a layering violation.

Considering the privilege separation side, this patch seems to be
somewhat orthogonal to that (I don't expect any command line passing
and parsing to happen between QEMU processes) but the low level fd
handling seems useful modulo the comments made by others.

>
> The thought is that this support can be added in the future, but is
> not required for the initial fd: support.
>
> This patch was tested with the following formats: raw, cow, qcow,
> qcow2, qed, and vmdk, using the fd: protocol from the command line
> and the monitor. Tests were also run to verify existing file name
> support and qemu-img were not regressed. Non-valid file descriptors,
> fd: without format, snapshot and backing files, and cdrom were also
> tested.
>
> v2:
>  - Add drive_add monitor command support
>  - Fence off unsupported features that re-open image file
>
> v3:
>  - Fence off cdrom and change monitor command support
>
> Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
> ---
>  block.c           |   16 ++++++++++
>  block.h           |    1 +
>  block/cow.c       |    5 +++
>  block/qcow.c      |    5 +++
>  block/qcow2.c     |    5 +++
>  block/qed.c       |    4 ++
>  block/raw-posix.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++------
>  block/vmdk.c      |    5 +++
>  block_int.h       |    1 +
>  blockdev.c        |   19 ++++++++++++
>  monitor.c         |    5 +++
>  monitor.h         |    1 +
>  qemu-options.hx   |    8 +++--
>  qemu-tool.c       |    5 +++
>  14 files changed, 149 insertions(+), 12 deletions(-)
>
> diff --git a/block.c b/block.c
> index 24a25d5..500db84 100644
> --- a/block.c
> +++ b/block.c
> @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>         char tmp_filename[PATH_MAX];
>         char backing_filename[PATH_MAX];
>
> +        if (bdrv_is_fd_protocol(bs)) {
> +            return -ENOTSUP;
> +        }
> +
>         /* if snapshot, we create a temporary backing file and open it
>            instead of opening 'filename' directly */
>
> @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>
>     /* Find the right image format driver */
>     if (!drv) {
> +        /* format must be specified for fd: protocol */
> +        if (bdrv_is_fd_protocol(bs)) {
> +            return -ENOTSUP;
> +        }
>         ret = find_image_format(filename, &drv);
>     }
>
> @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs)
>     return bs->enable_write_cache;
>  }
>
> +int bdrv_is_fd_protocol(BlockDriverState *bs)
> +{
> +    return bs->fd_protocol;
> +}
> +
>  /* XXX: no longer used */
>  void bdrv_set_change_cb(BlockDriverState *bs,
>                         void (*change_cb)(void *opaque, int reason),
> @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>     BlockDriver *drv = bs->drv;
>     if (!drv)
>         return -ENOMEDIUM;
> +    if (bdrv_is_fd_protocol(bs)) {
> +        return -ENOTSUP;
> +    }
>     if (drv->bdrv_snapshot_create)
>         return drv->bdrv_snapshot_create(bs, sn_info);
>     if (bs->file)
> diff --git a/block.h b/block.h
> index 859d1d9..0417b69 100644
> --- a/block.h
> +++ b/block.h
> @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs);
>  int bdrv_is_read_only(BlockDriverState *bs);
>  int bdrv_is_sg(BlockDriverState *bs);
>  int bdrv_enable_write_cache(BlockDriverState *bs);
> +int bdrv_is_fd_protocol(BlockDriverState *bs);
>  int bdrv_is_inserted(BlockDriverState *bs);
>  int bdrv_media_changed(BlockDriverState *bs);
>  int bdrv_is_locked(BlockDriverState *bs);
> diff --git a/block/cow.c b/block/cow.c
> index 4cf543c..e17f8e7 100644
> --- a/block/cow.c
> +++ b/block/cow.c
> @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
>     pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>             cow_header.backing_file);
>
> +    if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +        /* backing file currently not supported by fd: protocol */
> +        goto fail;
> +    }
> +
>     bitmap_size = ((bs->total_sectors + 7) >> 3) + sizeof(cow_header);
>     s->cow_sectors_offset = (bitmap_size + 511) & ~511;
>     return 0;
> diff --git a/block/qcow.c b/block/qcow.c
> index 227b104..964d411 100644
> --- a/block/qcow.c
> +++ b/block/qcow.c
> @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int flags)
>         if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len)
>             goto fail;
>         bs->backing_file[len] = '\0';
> +
> +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +            /* backing file currently not supported by fd: protocol */
> +            goto fail;
> +        }
>     }
>     return 0;
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 48e1b95..7f6a4fa 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>             goto fail;
>         }
>         bs->backing_file[len] = '\0';
> +
> +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +            ret = -ENOTSUP;
> +            goto fail;
> +        }
>     }
>     if (qcow2_read_snapshots(bs) < 0) {
>         ret = -EINVAL;
> diff --git a/block/qed.c b/block/qed.c
> index 3970379..5028897 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>             return ret;
>         }
>
> +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +            return -ENOTSUP;
> +        }
> +
>         if (s->header.features & QED_F_BACKING_FORMAT_NO_PROBE) {
>             pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
>         }
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 34b64aa..cec4d36 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "module.h"
>  #include "block/raw-posix-aio.h"
> +#include "monitor.h"
>
>  #ifdef CONFIG_COCOA
>  #include <paths.h>
> @@ -185,7 +186,8 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>                            int bdrv_flags, int open_flags)
>  {
>     BDRVRawState *s = bs->opaque;
> -    int fd, ret;
> +    int fd = -1;
> +    int ret;
>
>     ret = raw_normalize_devicepath(&filename);
>     if (ret != 0) {
> @@ -207,15 +209,17 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>     if (!(bdrv_flags & BDRV_O_CACHE_WB))
>         s->open_flags |= O_DSYNC;
>
> -    s->fd = -1;
> -    fd = qemu_open(filename, s->open_flags, 0644);
> -    if (fd < 0) {
> -        ret = -errno;
> -        if (ret == -EROFS)
> -            ret = -EACCES;
> -        return ret;
> +    if (s->fd == -1) {
> +        fd = qemu_open(filename, s->open_flags, 0644);
> +        if (fd < 0) {
> +            ret = -errno;
> +            if (ret == -EROFS) {
> +                ret = -EACCES;
> +            }
> +            return ret;
> +        }
> +        s->fd = fd;
>     }
> -    s->fd = fd;
>     s->aligned_buf = NULL;
>
>     if ((bdrv_flags & BDRV_O_NOCACHE)) {
> @@ -272,6 +276,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
>  {
>     BDRVRawState *s = bs->opaque;
>
> +    s->fd = -1;
>     s->type = FTYPE_FILE;
>     return raw_open_common(bs, filename, flags, 0);
>  }
> @@ -892,6 +897,60 @@ static BlockDriver bdrv_file = {
>     .create_options = raw_create_options,
>  };
>
> +static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags)
> +{
> +    BDRVRawState *s = bs->opaque;
> +    const char *fd_str;
> +    int fd;
> +
> +    /* extract the file descriptor - fail if it's not fd: */
> +    if (!strstart(filename, "fd:", &fd_str)) {
> +        return -EINVAL;
> +    }
> +
> +    if (!qemu_isdigit(fd_str[0])) {
> +        /* get fd from monitor */
> +        fd = qemu_get_fd(fd_str);
> +        if (fd == -1) {
> +            return -EBADF;
> +        }
> +    } else {
> +        char *endptr = NULL;
> +
> +        fd = strtol(fd_str, &endptr, 10);
> +        if (*endptr || (fd == 0 && fd_str == endptr)) {
> +            return -EBADF;
> +        }
> +    }
> +
> +    s->fd = fd;
> +    s->type = FTYPE_FILE;
> +
> +    return raw_open_common(bs, filename, flags, 0);
> +}
> +
> +static BlockDriver bdrv_file_fd = {
> +    .format_name = "file",
> +    .protocol_name = "fd",
> +    .instance_size = sizeof(BDRVRawState),
> +    .bdrv_probe = NULL, /* no probe for protocols */
> +    .bdrv_file_open = raw_open_fd,
> +    .bdrv_read = raw_read,
> +    .bdrv_write = raw_write,
> +    .bdrv_close = raw_close,
> +    .bdrv_flush = raw_flush,
> +    .bdrv_discard = raw_discard,
> +
> +    .bdrv_aio_readv = raw_aio_readv,
> +    .bdrv_aio_writev = raw_aio_writev,
> +    .bdrv_aio_flush = raw_aio_flush,
> +
> +    .bdrv_truncate = raw_truncate,
> +    .bdrv_getlength = raw_getlength,
> +
> +    .create_options = raw_create_options,
> +};
> +
>  /***********************************************/
>  /* host device */
>
> @@ -1000,6 +1059,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
>     }
>  #endif
>
> +    s->fd = -1;
>     s->type = FTYPE_FILE;
>  #if defined(__linux__)
>     {
> @@ -1170,6 +1230,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
>     BDRVRawState *s = bs->opaque;
>     int ret;
>
> +    s->fd = -1;
>     s->type = FTYPE_FD;
>
>     /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
> @@ -1288,6 +1349,7 @@ static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
>  {
>     BDRVRawState *s = bs->opaque;
>
> +    s->fd = -1;
>     s->type = FTYPE_CD;
>
>     /* open will not fail even if no CD is inserted, so add O_NONBLOCK */
> @@ -1517,6 +1579,7 @@ static void bdrv_file_init(void)
>      * Register all the drivers.  Note that order is important, the driver
>      * registered last will get probed first.
>      */
> +    bdrv_register(&bdrv_file_fd);
>     bdrv_register(&bdrv_file);
>     bdrv_register(&bdrv_host_device);
>  #ifdef __linux__
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 922b23d..2ea808e 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs)
>             return -1;
>
>         pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
> +
> +        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
> +            /* backing file currently not supported by fd: protocol */
> +            return -1;
> +        }
>     }
>
>     return 0;
> diff --git a/block_int.h b/block_int.h
> index 1e265d2..441049c 100644
> --- a/block_int.h
> +++ b/block_int.h
> @@ -152,6 +152,7 @@ struct BlockDriverState {
>     int encrypted; /* if true, the media is encrypted */
>     int valid_key; /* if true, a valid encryption key has been set */
>     int sg;        /* if true, the device is a /dev/sg* */
> +    int fd_protocol; /* if true, the fd: protocol was specified */
>     /* event callback when inserting/removing */
>     void (*change_cb)(void *opaque, int reason);
>     void *change_opaque;
> diff --git a/blockdev.c b/blockdev.c
> index c263663..5cb7b56 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -542,6 +542,14 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>
>     bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>
> +    if (strncmp(file, "fd:", 3) == 0) {
> +        if (media == MEDIA_CDROM) {
> +            error_report("CD-ROM not supported by fd: protocol");
> +            goto err;
> +        }
> +        dinfo->bdrv->fd_protocol = 1;
> +    }
> +
>     ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
>     if (ret < 0) {
>         error_report("could not open disk image %s: %s",
> @@ -602,6 +610,12 @@ int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data)
>         goto out;
>     }
>
> +    if (bdrv_is_fd_protocol(bs)) {
> +        qerror_report(QERR_UNSUPPORTED);
> +        ret = -1;
> +        goto out;
> +    }
> +
>     pstrcpy(old_filename, sizeof(old_filename), bs->filename);
>
>     old_drv = bs->drv;
> @@ -719,6 +733,11 @@ int do_change_block(Monitor *mon, const char *device,
>     BlockDriver *drv = NULL;
>     int bdrv_flags;
>
> +    if (strncmp(filename, "fd:", 3) == 0) {
> +        qerror_report(QERR_UNSUPPORTED);
> +        return -1;
> +    }
> +
>     bs = bdrv_find(device);
>     if (!bs) {
>         qerror_report(QERR_DEVICE_NOT_FOUND, device);
> diff --git a/monitor.c b/monitor.c
> index a6388a9..e521b60 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -2832,6 +2832,11 @@ int monitor_get_fd(Monitor *mon, const char *fdname)
>     return -1;
>  }
>
> +int qemu_get_fd(const char *fdname)
> +{
> +    return cur_mon ? monitor_get_fd(cur_mon, fdname) : -1;
> +}
> +
>  static const mon_cmd_t mon_cmds[] = {
>  #include "hmp-commands.h"
>     { NULL, NULL, },
> diff --git a/monitor.h b/monitor.h
> index 4f2d328..de5b987 100644
> --- a/monitor.h
> +++ b/monitor.h
> @@ -51,6 +51,7 @@ int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
>                                 void *opaque);
>
>  int monitor_get_fd(Monitor *mon, const char *fdname);
> +int qemu_get_fd(const char *fdname);
>
>  void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
>     GCC_FMT_ATTR(2, 0);
> diff --git a/qemu-options.hx b/qemu-options.hx
> index cb3347e..ab28541 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -116,7 +116,7 @@ using @file{/dev/cdrom} as filename (@pxref{host_drives}).
>  ETEXI
>
>  DEF("drive", HAS_ARG, QEMU_OPTION_drive,
> -    "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
> +    "-drive [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>     "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
>     "       [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
>     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
> @@ -129,10 +129,12 @@ STEXI
>  Define a new drive. Valid options are:
>
>  @table @option
> -@item file=@var{file}
> +@item file=[fd:]@var{file}
>  This option defines which disk image (@pxref{disk_images}) to use with
>  this drive. If the filename contains comma, you must double it
> -(for instance, "file=my,,file" to use file "my,file").
> +(for instance, "file=my,,file" to use file "my,file"). @option{fd:}@var{file}
> +specifies the file descriptor of an already open disk image.
> +@option{format=}@var{format} is required by @option{fd:}@var{file}.
>  @item if=@var{interface}
>  This option defines on which type on interface the drive is connected.
>  Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
> diff --git a/qemu-tool.c b/qemu-tool.c
> index 41e5c41..8fe6b8c 100644
> --- a/qemu-tool.c
> +++ b/qemu-tool.c
> @@ -96,3 +96,8 @@ int64_t qemu_get_clock_ns(QEMUClock *clock)
>  {
>     return 0;
>  }
> +
> +int qemu_get_fd(const char *fdname)
> +{
> +    return -1;
> +}
> --
> 1.7.3.4
>
>
>
Corey Bryant - Aug. 11, 2011, 4:28 p.m.
On 07/26/2011 08:51 AM, Corey Bryant wrote:

 > +static int raw_open_fd(BlockDriverState *bs, const char *filename, 
int flags)
 > +{
 > +    BDRVRawState *s = bs->opaque;
 > +    const char *fd_str;
 > +    int fd;
 > +
 > +    /* extract the file descriptor - fail if it's not fd: */
 > +    if (!strstart(filename, "fd:",&fd_str)) {
 > +        return -EINVAL;
 > +    }
 > +
 > +    if (!qemu_isdigit(fd_str[0])) {
 > +        /* get fd from monitor */
 > +        fd = qemu_get_fd(fd_str);
 > +        if (fd == -1) {
 > +            return -EBADF;
 > +        }
 > +    } else {
 > +        char *endptr = NULL;
 > +
 > +        fd = strtol(fd_str,&endptr, 10);
 > +        if (*endptr || (fd == 0&&  fd_str == endptr)) {
 > +            return -EBADF;
 > +        }
 > +    }
 > +
 > +    s->fd = fd;
 > +    s->type = FTYPE_FILE;
 > +
 > +    return raw_open_common(bs, filename, flags, 0);
 > +}
 > +
 > +static BlockDriver bdrv_file_fd = {
 > +    .format_name = "file",
 > +    .protocol_name = "fd",
 > +    .instance_size = sizeof(BDRVRawState),
 > +    .bdrv_probe = NULL, /* no probe for protocols */
 > +    .bdrv_file_open = raw_open_fd,
 > +    .bdrv_read = raw_read,
 > +    .bdrv_write = raw_write,
 > +    .bdrv_close = raw_close,
 > +    .bdrv_flush = raw_flush,
 > +    .bdrv_discard = raw_discard,
 > +
 > +    .bdrv_aio_readv = raw_aio_readv,
 > +    .bdrv_aio_writev = raw_aio_writev,
 > +    .bdrv_aio_flush = raw_aio_flush,
 > +
 > +    .bdrv_truncate = raw_truncate,
 > +    .bdrv_getlength = raw_getlength,
 > +
 > +    .create_options = raw_create_options,
 > +};
 > +

I'm a bit unsure of how to support CD-ROM with the fd: protocol.

I don't think use of bdrv_file_fd is an option.  For example, while 
raw_open_fd may work, there is no eject support.

Another approach is to have 2 BlockDriver structs that support the fd 
protocol.  For example, creating a new BlockDriver struct that mirrors 
bdrv_host_cdrom.  So you would have bdrv_host_cdrom_fd with a 
corresponding cdrom_open_fd function.  But that doesn't appear possible 
as it appears that the protocol must be unique among all BlockDriver 
structs.

Do we need to introduce a similar protocol (maybe "cdfd") to support 
passing of CD-ROM file descriptors?
Kevin Wolf - Aug. 12, 2011, 10:07 a.m.
Am 11.08.2011 18:28, schrieb Corey Bryant:
> 
> 
> On 07/26/2011 08:51 AM, Corey Bryant wrote:
> 
>  > +static int raw_open_fd(BlockDriverState *bs, const char *filename, 
> int flags)
>  > +{
>  > +    BDRVRawState *s = bs->opaque;
>  > +    const char *fd_str;
>  > +    int fd;
>  > +
>  > +    /* extract the file descriptor - fail if it's not fd: */
>  > +    if (!strstart(filename, "fd:",&fd_str)) {
>  > +        return -EINVAL;
>  > +    }
>  > +
>  > +    if (!qemu_isdigit(fd_str[0])) {
>  > +        /* get fd from monitor */
>  > +        fd = qemu_get_fd(fd_str);
>  > +        if (fd == -1) {
>  > +            return -EBADF;
>  > +        }
>  > +    } else {
>  > +        char *endptr = NULL;
>  > +
>  > +        fd = strtol(fd_str,&endptr, 10);
>  > +        if (*endptr || (fd == 0&&  fd_str == endptr)) {
>  > +            return -EBADF;
>  > +        }
>  > +    }
>  > +
>  > +    s->fd = fd;
>  > +    s->type = FTYPE_FILE;
>  > +
>  > +    return raw_open_common(bs, filename, flags, 0);
>  > +}
>  > +
>  > +static BlockDriver bdrv_file_fd = {
>  > +    .format_name = "file",
>  > +    .protocol_name = "fd",
>  > +    .instance_size = sizeof(BDRVRawState),
>  > +    .bdrv_probe = NULL, /* no probe for protocols */
>  > +    .bdrv_file_open = raw_open_fd,
>  > +    .bdrv_read = raw_read,
>  > +    .bdrv_write = raw_write,
>  > +    .bdrv_close = raw_close,
>  > +    .bdrv_flush = raw_flush,
>  > +    .bdrv_discard = raw_discard,
>  > +
>  > +    .bdrv_aio_readv = raw_aio_readv,
>  > +    .bdrv_aio_writev = raw_aio_writev,
>  > +    .bdrv_aio_flush = raw_aio_flush,
>  > +
>  > +    .bdrv_truncate = raw_truncate,
>  > +    .bdrv_getlength = raw_getlength,
>  > +
>  > +    .create_options = raw_create_options,
>  > +};
>  > +
> 
> I'm a bit unsure of how to support CD-ROM with the fd: protocol.
> 
> I don't think use of bdrv_file_fd is an option.  For example, while 
> raw_open_fd may work, there is no eject support.
> 
> Another approach is to have 2 BlockDriver structs that support the fd 
> protocol.  For example, creating a new BlockDriver struct that mirrors 
> bdrv_host_cdrom.  So you would have bdrv_host_cdrom_fd with a 
> corresponding cdrom_open_fd function.  But that doesn't appear possible 
> as it appears that the protocol must be unique among all BlockDriver 
> structs.
> 
> Do we need to introduce a similar protocol (maybe "cdfd") to support 
> passing of CD-ROM file descriptors?

I would try to avoid that. I think you can determine the kind of device
during fd_open and then implement an fd_eject etc. that checks this type
and calls the appropriate ioctl. Maybe the same should be done for
file/host_device/host_cdrom instead of having separate protocols.

Anyway, I don't think this is important to get a first version merged.
It will still take a while anyway until we have full support that
libvirt would want to use.

Kevin

Patch

diff --git a/block.c b/block.c
index 24a25d5..500db84 100644
--- a/block.c
+++ b/block.c
@@ -536,6 +536,10 @@  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
         char tmp_filename[PATH_MAX];
         char backing_filename[PATH_MAX];
 
+        if (bdrv_is_fd_protocol(bs)) {
+            return -ENOTSUP;
+        }
+
         /* if snapshot, we create a temporary backing file and open it
            instead of opening 'filename' directly */
 
@@ -585,6 +589,10 @@  int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
 
     /* Find the right image format driver */
     if (!drv) {
+        /* format must be specified for fd: protocol */
+        if (bdrv_is_fd_protocol(bs)) {
+            return -ENOTSUP;
+        }
         ret = find_image_format(filename, &drv);
     }
 
@@ -1460,6 +1468,11 @@  int bdrv_enable_write_cache(BlockDriverState *bs)
     return bs->enable_write_cache;
 }
 
+int bdrv_is_fd_protocol(BlockDriverState *bs)
+{
+    return bs->fd_protocol;
+}
+
 /* XXX: no longer used */
 void bdrv_set_change_cb(BlockDriverState *bs,
                         void (*change_cb)(void *opaque, int reason),
@@ -1964,6 +1977,9 @@  int bdrv_snapshot_create(BlockDriverState *bs,
     BlockDriver *drv = bs->drv;
     if (!drv)
         return -ENOMEDIUM;
+    if (bdrv_is_fd_protocol(bs)) {
+        return -ENOTSUP;
+    }
     if (drv->bdrv_snapshot_create)
         return drv->bdrv_snapshot_create(bs, sn_info);
     if (bs->file)
diff --git a/block.h b/block.h
index 859d1d9..0417b69 100644
--- a/block.h
+++ b/block.h
@@ -182,6 +182,7 @@  int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
 int bdrv_enable_write_cache(BlockDriverState *bs);
+int bdrv_is_fd_protocol(BlockDriverState *bs);
 int bdrv_is_inserted(BlockDriverState *bs);
 int bdrv_media_changed(BlockDriverState *bs);
 int bdrv_is_locked(BlockDriverState *bs);
diff --git a/block/cow.c b/block/cow.c
index 4cf543c..e17f8e7 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -82,6 +82,11 @@  static int cow_open(BlockDriverState *bs, int flags)
     pstrcpy(bs->backing_file, sizeof(bs->backing_file),
             cow_header.backing_file);
 
+    if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
+        /* backing file currently not supported by fd: protocol */
+        goto fail;
+    }
+
     bitmap_size = ((bs->total_sectors + 7) >> 3) + sizeof(cow_header);
     s->cow_sectors_offset = (bitmap_size + 511) & ~511;
     return 0;
diff --git a/block/qcow.c b/block/qcow.c
index 227b104..964d411 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -157,6 +157,11 @@  static int qcow_open(BlockDriverState *bs, int flags)
         if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len)
             goto fail;
         bs->backing_file[len] = '\0';
+
+        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
+            /* backing file currently not supported by fd: protocol */
+            goto fail;
+        }
     }
     return 0;
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 48e1b95..7f6a4fa 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -270,6 +270,11 @@  static int qcow2_open(BlockDriverState *bs, int flags)
             goto fail;
         }
         bs->backing_file[len] = '\0';
+
+        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
+            ret = -ENOTSUP;
+            goto fail;
+        }
     }
     if (qcow2_read_snapshots(bs) < 0) {
         ret = -EINVAL;
diff --git a/block/qed.c b/block/qed.c
index 3970379..5028897 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -446,6 +446,10 @@  static int bdrv_qed_open(BlockDriverState *bs, int flags)
             return ret;
         }
 
+        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
+            return -ENOTSUP;
+        }
+
         if (s->header.features & QED_F_BACKING_FORMAT_NO_PROBE) {
             pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
         }
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 34b64aa..cec4d36 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -28,6 +28,7 @@ 
 #include "block_int.h"
 #include "module.h"
 #include "block/raw-posix-aio.h"
+#include "monitor.h"
 
 #ifdef CONFIG_COCOA
 #include <paths.h>
@@ -185,7 +186,8 @@  static int raw_open_common(BlockDriverState *bs, const char *filename,
                            int bdrv_flags, int open_flags)
 {
     BDRVRawState *s = bs->opaque;
-    int fd, ret;
+    int fd = -1;
+    int ret;
 
     ret = raw_normalize_devicepath(&filename);
     if (ret != 0) {
@@ -207,15 +209,17 @@  static int raw_open_common(BlockDriverState *bs, const char *filename,
     if (!(bdrv_flags & BDRV_O_CACHE_WB))
         s->open_flags |= O_DSYNC;
 
-    s->fd = -1;
-    fd = qemu_open(filename, s->open_flags, 0644);
-    if (fd < 0) {
-        ret = -errno;
-        if (ret == -EROFS)
-            ret = -EACCES;
-        return ret;
+    if (s->fd == -1) {
+        fd = qemu_open(filename, s->open_flags, 0644);
+        if (fd < 0) {
+            ret = -errno;
+            if (ret == -EROFS) {
+                ret = -EACCES;
+            }
+            return ret;
+        }
+        s->fd = fd;
     }
-    s->fd = fd;
     s->aligned_buf = NULL;
 
     if ((bdrv_flags & BDRV_O_NOCACHE)) {
@@ -272,6 +276,7 @@  static int raw_open(BlockDriverState *bs, const char *filename, int flags)
 {
     BDRVRawState *s = bs->opaque;
 
+    s->fd = -1;
     s->type = FTYPE_FILE;
     return raw_open_common(bs, filename, flags, 0);
 }
@@ -892,6 +897,60 @@  static BlockDriver bdrv_file = {
     .create_options = raw_create_options,
 };
 
+static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags)
+{
+    BDRVRawState *s = bs->opaque;
+    const char *fd_str;
+    int fd;
+
+    /* extract the file descriptor - fail if it's not fd: */
+    if (!strstart(filename, "fd:", &fd_str)) {
+        return -EINVAL;
+    }
+
+    if (!qemu_isdigit(fd_str[0])) {
+        /* get fd from monitor */
+        fd = qemu_get_fd(fd_str);
+        if (fd == -1) {
+            return -EBADF;
+        }
+    } else {
+        char *endptr = NULL;
+
+        fd = strtol(fd_str, &endptr, 10);
+        if (*endptr || (fd == 0 && fd_str == endptr)) {
+            return -EBADF;
+        }
+    }
+
+    s->fd = fd;
+    s->type = FTYPE_FILE;
+
+    return raw_open_common(bs, filename, flags, 0);
+}
+
+static BlockDriver bdrv_file_fd = {
+    .format_name = "file",
+    .protocol_name = "fd",
+    .instance_size = sizeof(BDRVRawState),
+    .bdrv_probe = NULL, /* no probe for protocols */
+    .bdrv_file_open = raw_open_fd,
+    .bdrv_read = raw_read,
+    .bdrv_write = raw_write,
+    .bdrv_close = raw_close,
+    .bdrv_flush = raw_flush,
+    .bdrv_discard = raw_discard,
+
+    .bdrv_aio_readv = raw_aio_readv,
+    .bdrv_aio_writev = raw_aio_writev,
+    .bdrv_aio_flush = raw_aio_flush,
+
+    .bdrv_truncate = raw_truncate,
+    .bdrv_getlength = raw_getlength,
+
+    .create_options = raw_create_options,
+};
+
 /***********************************************/
 /* host device */
 
@@ -1000,6 +1059,7 @@  static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
     }
 #endif
 
+    s->fd = -1;
     s->type = FTYPE_FILE;
 #if defined(__linux__)
     {
@@ -1170,6 +1230,7 @@  static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
     BDRVRawState *s = bs->opaque;
     int ret;
 
+    s->fd = -1;
     s->type = FTYPE_FD;
 
     /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
@@ -1288,6 +1349,7 @@  static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
 {
     BDRVRawState *s = bs->opaque;
 
+    s->fd = -1;
     s->type = FTYPE_CD;
 
     /* open will not fail even if no CD is inserted, so add O_NONBLOCK */
@@ -1517,6 +1579,7 @@  static void bdrv_file_init(void)
      * Register all the drivers.  Note that order is important, the driver
      * registered last will get probed first.
      */
+    bdrv_register(&bdrv_file_fd);
     bdrv_register(&bdrv_file);
     bdrv_register(&bdrv_host_device);
 #ifdef __linux__
diff --git a/block/vmdk.c b/block/vmdk.c
index 922b23d..2ea808e 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -353,6 +353,11 @@  static int vmdk_parent_open(BlockDriverState *bs)
             return -1;
 
         pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
+
+        if (bs->backing_file[0] != '\0' && bdrv_is_fd_protocol(bs)) {
+            /* backing file currently not supported by fd: protocol */
+            return -1;
+        }
     }
 
     return 0;
diff --git a/block_int.h b/block_int.h
index 1e265d2..441049c 100644
--- a/block_int.h
+++ b/block_int.h
@@ -152,6 +152,7 @@  struct BlockDriverState {
     int encrypted; /* if true, the media is encrypted */
     int valid_key; /* if true, a valid encryption key has been set */
     int sg;        /* if true, the device is a /dev/sg* */
+    int fd_protocol; /* if true, the fd: protocol was specified */
     /* event callback when inserting/removing */
     void (*change_cb)(void *opaque, int reason);
     void *change_opaque;
diff --git a/blockdev.c b/blockdev.c
index c263663..5cb7b56 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -542,6 +542,14 @@  DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 
     bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
 
+    if (strncmp(file, "fd:", 3) == 0) {
+        if (media == MEDIA_CDROM) {
+            error_report("CD-ROM not supported by fd: protocol");
+            goto err;
+        }
+        dinfo->bdrv->fd_protocol = 1;
+    }
+
     ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
     if (ret < 0) {
         error_report("could not open disk image %s: %s",
@@ -602,6 +610,12 @@  int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data)
         goto out;
     }
 
+    if (bdrv_is_fd_protocol(bs)) {
+        qerror_report(QERR_UNSUPPORTED);
+        ret = -1;
+        goto out;
+    }
+
     pstrcpy(old_filename, sizeof(old_filename), bs->filename);
 
     old_drv = bs->drv;
@@ -719,6 +733,11 @@  int do_change_block(Monitor *mon, const char *device,
     BlockDriver *drv = NULL;
     int bdrv_flags;
 
+    if (strncmp(filename, "fd:", 3) == 0) {
+        qerror_report(QERR_UNSUPPORTED);
+        return -1;
+    }
+
     bs = bdrv_find(device);
     if (!bs) {
         qerror_report(QERR_DEVICE_NOT_FOUND, device);
diff --git a/monitor.c b/monitor.c
index a6388a9..e521b60 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2832,6 +2832,11 @@  int monitor_get_fd(Monitor *mon, const char *fdname)
     return -1;
 }
 
+int qemu_get_fd(const char *fdname)
+{
+    return cur_mon ? monitor_get_fd(cur_mon, fdname) : -1;
+}
+
 static const mon_cmd_t mon_cmds[] = {
 #include "hmp-commands.h"
     { NULL, NULL, },
diff --git a/monitor.h b/monitor.h
index 4f2d328..de5b987 100644
--- a/monitor.h
+++ b/monitor.h
@@ -51,6 +51,7 @@  int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
                                 void *opaque);
 
 int monitor_get_fd(Monitor *mon, const char *fdname);
+int qemu_get_fd(const char *fdname);
 
 void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
     GCC_FMT_ATTR(2, 0);
diff --git a/qemu-options.hx b/qemu-options.hx
index cb3347e..ab28541 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -116,7 +116,7 @@  using @file{/dev/cdrom} as filename (@pxref{host_drives}).
 ETEXI
 
 DEF("drive", HAS_ARG, QEMU_OPTION_drive,
-    "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
+    "-drive [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
     "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
     "       [,cache=writethrough|writeback|none|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
@@ -129,10 +129,12 @@  STEXI
 Define a new drive. Valid options are:
 
 @table @option
-@item file=@var{file}
+@item file=[fd:]@var{file}
 This option defines which disk image (@pxref{disk_images}) to use with
 this drive. If the filename contains comma, you must double it
-(for instance, "file=my,,file" to use file "my,file").
+(for instance, "file=my,,file" to use file "my,file"). @option{fd:}@var{file}
+specifies the file descriptor of an already open disk image.
+@option{format=}@var{format} is required by @option{fd:}@var{file}.
 @item if=@var{interface}
 This option defines on which type on interface the drive is connected.
 Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
diff --git a/qemu-tool.c b/qemu-tool.c
index 41e5c41..8fe6b8c 100644
--- a/qemu-tool.c
+++ b/qemu-tool.c
@@ -96,3 +96,8 @@  int64_t qemu_get_clock_ns(QEMUClock *clock)
 {
     return 0;
 }
+
+int qemu_get_fd(const char *fdname)
+{
+    return -1;
+}