Patchwork [v2,3/3] raw-posix: Re-open host CD-ROM after media change

login
register
mail settings
Submitter Stefan Hajnoczi
Date March 29, 2011, 7:04 p.m.
Message ID <1301425482-8722-4-git-send-email-stefanha@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/88835/
State New
Headers show

Comments

Stefan Hajnoczi - March 29, 2011, 7:04 p.m.
Piggy-back on the guest CD-ROM polling to poll on the host.  Open and
close the host CD-ROM file descriptor to ensure we read the new size and
not a stale size.

Two things are going on here:

1. If hald/udisks is not already polling CD-ROMs on the host then
   re-opening the CD-ROM causes the host to read the new medium's size.

2. There is a bug in Linux which means the CD-ROM file descriptor must
   be re-opened in order for lseek(2) to see the new size.  The
   inode size gets out of sync with the underlying device (which you can
   confirm by checking that /sys/block/sr0/size and lseek(2) do not
   match after media change).  I have raised this with the
   maintainers but we need a workaround for the foreseeable future.

Note that these changes are all in a #ifdef __linux__ section.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block/raw-posix.c |   26 ++++++++++++++++++++++----
 1 files changed, 22 insertions(+), 4 deletions(-)
Kevin Wolf - March 31, 2011, 10:05 a.m.
Am 29.03.2011 21:04, schrieb Stefan Hajnoczi:
> Piggy-back on the guest CD-ROM polling to poll on the host.  Open and
> close the host CD-ROM file descriptor to ensure we read the new size and
> not a stale size.
> 
> Two things are going on here:
> 
> 1. If hald/udisks is not already polling CD-ROMs on the host then
>    re-opening the CD-ROM causes the host to read the new medium's size.
> 
> 2. There is a bug in Linux which means the CD-ROM file descriptor must
>    be re-opened in order for lseek(2) to see the new size.  The
>    inode size gets out of sync with the underlying device (which you can
>    confirm by checking that /sys/block/sr0/size and lseek(2) do not
>    match after media change).  I have raised this with the
>    maintainers but we need a workaround for the foreseeable future.
> 
> Note that these changes are all in a #ifdef __linux__ section.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

Applied the first two patches of the series. I have some comments on
this one.

> ---
>  block/raw-posix.c |   26 ++++++++++++++++++++++----
>  1 files changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 6b72470..8b5205c 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs)
>      BDRVRawState *s = bs->opaque;
>      int ret;
>  
> -    ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
> -    if (ret == CDS_DISC_OK)
> -        return 1;
> -    return 0;
> +    /*
> +     * Close the file descriptor if no medium is present and open it to poll
> +     * again.  This ensures the medium size is refreshed.  If the file
> +     * descriptor is kept open the size can become stale.  This is essentially
> +     * replicating CD-ROM polling but is driven by the guest.  As the guest
> +     * polls, we poll the host.
> +     */
> +
> +    if (s->fd == -1) {
> +        s->fd = qemu_open(bs->filename, s->open_flags, 0644);
> +        if (s->fd < 0) {
> +            return 0;
> +        }
> +    }
> +
> +    ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK);
> +
> +    if (!ret) {
> +        close(s->fd);
> +        s->fd = -1;
> +    }
> +    return ret;
>  }

We have this code in raw_close:

    if (s->fd >= 0) {
        close(s->fd);
        s->fd = -1;
        if (s->aligned_buf != NULL)
            qemu_vfree(s->aligned_buf);
    }

So now that we set s->fd = -1, this part won't be run and we leak
s->aligned_buf. This problem exists in other places in raw-posix, too.

The other thing is that I'm not sure if everything in raw-posix is
prepared to deal with a -1 fd. At the very least, I think we'll get
-EBADF errors instead of the expected -ENOMEDIUM.

The general approach looks good to me.

Kevin
Stefan Hajnoczi - April 1, 2011, 2:09 p.m.
On Thu, Mar 31, 2011 at 11:05 AM, Kevin Wolf <kwolf@redhat.com> wrote:
> The other thing is that I'm not sure if everything in raw-posix is
> prepared to deal with a -1 fd. At the very least, I think we'll get
> -EBADF errors instead of the expected -ENOMEDIUM.

Not all of block.c checks for !bdrv_is_inserted() so you are right, I
missed cases where we should return -ENOMEDIUM.

We have a similar scenario with floppy disks where fd_open() can
return leaving s->fd == -1.  I would like to unify these floppy and
CD-ROM open cases.

Stefan
Stefan Hajnoczi - April 3, 2011, 11:57 a.m.
On Tue, Mar 29, 2011 at 8:04 PM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> Piggy-back on the guest CD-ROM polling to poll on the host.  Open and
> close the host CD-ROM file descriptor to ensure we read the new size and
> not a stale size.
>
> Two things are going on here:
>
> 1. If hald/udisks is not already polling CD-ROMs on the host then
>   re-opening the CD-ROM causes the host to read the new medium's size.
>
> 2. There is a bug in Linux which means the CD-ROM file descriptor must
>   be re-opened in order for lseek(2) to see the new size.  The
>   inode size gets out of sync with the underlying device (which you can
>   confirm by checking that /sys/block/sr0/size and lseek(2) do not
>   match after media change).  I have raised this with the
>   maintainers but we need a workaround for the foreseeable future.
>
> Note that these changes are all in a #ifdef __linux__ section.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  block/raw-posix.c |   26 ++++++++++++++++++++++----
>  1 files changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 6b72470..8b5205c 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs)
>     BDRVRawState *s = bs->opaque;
>     int ret;
>
> -    ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
> -    if (ret == CDS_DISC_OK)
> -        return 1;
> -    return 0;
> +    /*
> +     * Close the file descriptor if no medium is present and open it to poll
> +     * again.  This ensures the medium size is refreshed.  If the file
> +     * descriptor is kept open the size can become stale.  This is essentially
> +     * replicating CD-ROM polling but is driven by the guest.  As the guest
> +     * polls, we poll the host.
> +     */
> +
> +    if (s->fd == -1) {
> +        s->fd = qemu_open(bs->filename, s->open_flags, 0644);
> +        if (s->fd < 0) {
> +            return 0;
> +        }
> +    }
> +
> +    ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK);
> +
> +    if (!ret) {
> +        close(s->fd);
> +        s->fd = -1;
> +    }
> +    return ret;
>  }
>
>  static int cdrom_eject(BlockDriverState *bs, int eject_flag)
> --
> 1.7.4.1
>
>
>

There is an issue with reopening host devices in QEMU when running
under libvirt.  It appears that libvirt chowns image files (including
device nodes) so that the launched QEMU process can access them.

Unfortunately after media change on host devices udev will reset the
ownership of the device node.  This causes open(2) to fail with EACCES
since the QEMU process does not have the right uid/gid/groups and
libvirt is unaware that the file's ownership has changed.

In order for media change to work with Linux host CD-ROM it is
necessary to reopen the file (otherwise the inode size will not
refresh, this is an issue with existing kernels).

How can libvirt's security model be made to support this case?  In
theory udev could be temporarily configured with libvirt permissions
for the CD-ROM device while passed through to the guest, but is that
feasible?

Stefan
Blue Swirl - April 3, 2011, 1:12 p.m.
On Sun, Apr 3, 2011 at 2:57 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Tue, Mar 29, 2011 at 8:04 PM, Stefan Hajnoczi
> <stefanha@linux.vnet.ibm.com> wrote:
>> Piggy-back on the guest CD-ROM polling to poll on the host.  Open and
>> close the host CD-ROM file descriptor to ensure we read the new size and
>> not a stale size.
>>
>> Two things are going on here:
>>
>> 1. If hald/udisks is not already polling CD-ROMs on the host then
>>   re-opening the CD-ROM causes the host to read the new medium's size.
>>
>> 2. There is a bug in Linux which means the CD-ROM file descriptor must
>>   be re-opened in order for lseek(2) to see the new size.  The
>>   inode size gets out of sync with the underlying device (which you can
>>   confirm by checking that /sys/block/sr0/size and lseek(2) do not
>>   match after media change).  I have raised this with the
>>   maintainers but we need a workaround for the foreseeable future.
>>
>> Note that these changes are all in a #ifdef __linux__ section.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> ---
>>  block/raw-posix.c |   26 ++++++++++++++++++++++----
>>  1 files changed, 22 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 6b72470..8b5205c 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs)
>>     BDRVRawState *s = bs->opaque;
>>     int ret;
>>
>> -    ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
>> -    if (ret == CDS_DISC_OK)
>> -        return 1;
>> -    return 0;
>> +    /*
>> +     * Close the file descriptor if no medium is present and open it to poll
>> +     * again.  This ensures the medium size is refreshed.  If the file
>> +     * descriptor is kept open the size can become stale.  This is essentially
>> +     * replicating CD-ROM polling but is driven by the guest.  As the guest
>> +     * polls, we poll the host.
>> +     */
>> +
>> +    if (s->fd == -1) {
>> +        s->fd = qemu_open(bs->filename, s->open_flags, 0644);
>> +        if (s->fd < 0) {
>> +            return 0;
>> +        }
>> +    }
>> +
>> +    ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK);
>> +
>> +    if (!ret) {
>> +        close(s->fd);
>> +        s->fd = -1;
>> +    }
>> +    return ret;
>>  }
>>
>>  static int cdrom_eject(BlockDriverState *bs, int eject_flag)
>> --
>> 1.7.4.1
>>
>>
>>
>
> There is an issue with reopening host devices in QEMU when running
> under libvirt.  It appears that libvirt chowns image files (including
> device nodes) so that the launched QEMU process can access them.
>
> Unfortunately after media change on host devices udev will reset the
> ownership of the device node.  This causes open(2) to fail with EACCES
> since the QEMU process does not have the right uid/gid/groups and
> libvirt is unaware that the file's ownership has changed.
>
> In order for media change to work with Linux host CD-ROM it is
> necessary to reopen the file (otherwise the inode size will not
> refresh, this is an issue with existing kernels).
>
> How can libvirt's security model be made to support this case?  In
> theory udev could be temporarily configured with libvirt permissions
> for the CD-ROM device while passed through to the guest, but is that
> feasible?

How about something like this: Add an explicit reopen method to
BlockDriver. Make a special block device for passed file descriptors.
Pass descriptors in libvirt for CD-ROMs instead of the device paths.
The reopen method for file descriptors should notify libvirt about
need to pass a reopened descriptor and then block all accesses until a
new descriptor is available. This should also solve your earlier
problem.
Stefan Hajnoczi - April 3, 2011, 6:06 p.m.
On Sun, Apr 3, 2011 at 2:12 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> On Sun, Apr 3, 2011 at 2:57 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> On Tue, Mar 29, 2011 at 8:04 PM, Stefan Hajnoczi
>> <stefanha@linux.vnet.ibm.com> wrote:
>>> Piggy-back on the guest CD-ROM polling to poll on the host.  Open and
>>> close the host CD-ROM file descriptor to ensure we read the new size and
>>> not a stale size.
>>>
>>> Two things are going on here:
>>>
>>> 1. If hald/udisks is not already polling CD-ROMs on the host then
>>>   re-opening the CD-ROM causes the host to read the new medium's size.
>>>
>>> 2. There is a bug in Linux which means the CD-ROM file descriptor must
>>>   be re-opened in order for lseek(2) to see the new size.  The
>>>   inode size gets out of sync with the underlying device (which you can
>>>   confirm by checking that /sys/block/sr0/size and lseek(2) do not
>>>   match after media change).  I have raised this with the
>>>   maintainers but we need a workaround for the foreseeable future.
>>>
>>> Note that these changes are all in a #ifdef __linux__ section.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>> ---
>>>  block/raw-posix.c |   26 ++++++++++++++++++++++----
>>>  1 files changed, 22 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>>> index 6b72470..8b5205c 100644
>>> --- a/block/raw-posix.c
>>> +++ b/block/raw-posix.c
>>> @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs)
>>>     BDRVRawState *s = bs->opaque;
>>>     int ret;
>>>
>>> -    ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
>>> -    if (ret == CDS_DISC_OK)
>>> -        return 1;
>>> -    return 0;
>>> +    /*
>>> +     * Close the file descriptor if no medium is present and open it to poll
>>> +     * again.  This ensures the medium size is refreshed.  If the file
>>> +     * descriptor is kept open the size can become stale.  This is essentially
>>> +     * replicating CD-ROM polling but is driven by the guest.  As the guest
>>> +     * polls, we poll the host.
>>> +     */
>>> +
>>> +    if (s->fd == -1) {
>>> +        s->fd = qemu_open(bs->filename, s->open_flags, 0644);
>>> +        if (s->fd < 0) {
>>> +            return 0;
>>> +        }
>>> +    }
>>> +
>>> +    ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK);
>>> +
>>> +    if (!ret) {
>>> +        close(s->fd);
>>> +        s->fd = -1;
>>> +    }
>>> +    return ret;
>>>  }
>>>
>>>  static int cdrom_eject(BlockDriverState *bs, int eject_flag)
>>> --
>>> 1.7.4.1
>>>
>>>
>>>
>>
>> There is an issue with reopening host devices in QEMU when running
>> under libvirt.  It appears that libvirt chowns image files (including
>> device nodes) so that the launched QEMU process can access them.
>>
>> Unfortunately after media change on host devices udev will reset the
>> ownership of the device node.  This causes open(2) to fail with EACCES
>> since the QEMU process does not have the right uid/gid/groups and
>> libvirt is unaware that the file's ownership has changed.
>>
>> In order for media change to work with Linux host CD-ROM it is
>> necessary to reopen the file (otherwise the inode size will not
>> refresh, this is an issue with existing kernels).
>>
>> How can libvirt's security model be made to support this case?  In
>> theory udev could be temporarily configured with libvirt permissions
>> for the CD-ROM device while passed through to the guest, but is that
>> feasible?
>
> How about something like this: Add an explicit reopen method to
> BlockDriver. Make a special block device for passed file descriptors.
> Pass descriptors in libvirt for CD-ROMs instead of the device paths.
> The reopen method for file descriptors should notify libvirt about
> need to pass a reopened descriptor and then block all accesses until a
> new descriptor is available. This should also solve your earlier
> problem.

I'm hoping libvirt's behavior can be made to just work rather than
adding new features to QEMU.  But perhaps passing file descriptors is
useful for more than just reopening host devices.  This would
basically be a privilege separation model where the QEMU process isn't
able to open files itself but can request libvirt to open them on its
behalf.

Stefan
Daniel P. Berrange - April 4, 2011, 10:47 a.m.
On Sun, Apr 03, 2011 at 07:06:17PM +0100, Stefan Hajnoczi wrote:
> On Sun, Apr 3, 2011 at 2:12 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> > On Sun, Apr 3, 2011 at 2:57 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >> On Tue, Mar 29, 2011 at 8:04 PM, Stefan Hajnoczi
> >> <stefanha@linux.vnet.ibm.com> wrote:
> >>> Piggy-back on the guest CD-ROM polling to poll on the host.  Open and
> >>> close the host CD-ROM file descriptor to ensure we read the new size and
> >>> not a stale size.
> >>>
> >>> Two things are going on here:
> >>>
> >>> 1. If hald/udisks is not already polling CD-ROMs on the host then
> >>>   re-opening the CD-ROM causes the host to read the new medium's size.
> >>>
> >>> 2. There is a bug in Linux which means the CD-ROM file descriptor must
> >>>   be re-opened in order for lseek(2) to see the new size.  The
> >>>   inode size gets out of sync with the underlying device (which you can
> >>>   confirm by checking that /sys/block/sr0/size and lseek(2) do not
> >>>   match after media change).  I have raised this with the
> >>>   maintainers but we need a workaround for the foreseeable future.
> >>>
> >>> Note that these changes are all in a #ifdef __linux__ section.
> >>>
> >>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> >>> ---
> >>>  block/raw-posix.c |   26 ++++++++++++++++++++++----
> >>>  1 files changed, 22 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/block/raw-posix.c b/block/raw-posix.c
> >>> index 6b72470..8b5205c 100644
> >>> --- a/block/raw-posix.c
> >>> +++ b/block/raw-posix.c
> >>> @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs)
> >>>     BDRVRawState *s = bs->opaque;
> >>>     int ret;
> >>>
> >>> -    ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
> >>> -    if (ret == CDS_DISC_OK)
> >>> -        return 1;
> >>> -    return 0;
> >>> +    /*
> >>> +     * Close the file descriptor if no medium is present and open it to poll
> >>> +     * again.  This ensures the medium size is refreshed.  If the file
> >>> +     * descriptor is kept open the size can become stale.  This is essentially
> >>> +     * replicating CD-ROM polling but is driven by the guest.  As the guest
> >>> +     * polls, we poll the host.
> >>> +     */
> >>> +
> >>> +    if (s->fd == -1) {
> >>> +        s->fd = qemu_open(bs->filename, s->open_flags, 0644);
> >>> +        if (s->fd < 0) {
> >>> +            return 0;
> >>> +        }
> >>> +    }
> >>> +
> >>> +    ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK);
> >>> +
> >>> +    if (!ret) {
> >>> +        close(s->fd);
> >>> +        s->fd = -1;
> >>> +    }
> >>> +    return ret;
> >>>  }
> >>>
> >>>  static int cdrom_eject(BlockDriverState *bs, int eject_flag)
> >>> --
> >>> 1.7.4.1
> >>>
> >>>
> >>>
> >>
> >> There is an issue with reopening host devices in QEMU when running
> >> under libvirt.  It appears that libvirt chowns image files (including
> >> device nodes) so that the launched QEMU process can access them.
> >>
> >> Unfortunately after media change on host devices udev will reset the
> >> ownership of the device node.  This causes open(2) to fail with EACCES
> >> since the QEMU process does not have the right uid/gid/groups and
> >> libvirt is unaware that the file's ownership has changed.
> >>
> >> In order for media change to work with Linux host CD-ROM it is
> >> necessary to reopen the file (otherwise the inode size will not
> >> refresh, this is an issue with existing kernels).
> >>
> >> How can libvirt's security model be made to support this case?  In
> >> theory udev could be temporarily configured with libvirt permissions
> >> for the CD-ROM device while passed through to the guest, but is that
> >> feasible?
> >
> > How about something like this: Add an explicit reopen method to
> > BlockDriver. Make a special block device for passed file descriptors.
> > Pass descriptors in libvirt for CD-ROMs instead of the device paths.
> > The reopen method for file descriptors should notify libvirt about
> > need to pass a reopened descriptor and then block all accesses until a
> > new descriptor is available. This should also solve your earlier
> > problem.
> 
> I'm hoping libvirt's behavior can be made to just work rather than
> adding new features to QEMU.  But perhaps passing file descriptors is
> useful for more than just reopening host devices.  This would
> basically be a privilege separation model where the QEMU process isn't
> able to open files itself but can request libvirt to open them on its
> behalf.

It is rather frickin' annoying the way udev resets the ownership
when the media merely changes. If it isn't possible to stop udev
doing this, then i think the only practical thing is to use ACLs
instead of user/group ownership. We wanted to switch to ACLs in
libvirt for other reasons already, but it isn't quite as simple
as it sounds[1] so we've not done it just yet.

Daniel

[1] Mostly due to handling upgrades from existing libvirtd while
    VMs are running, and coping with filesystems which don't
    support ACLs (or have them turned of by mount options)
Stefan Hajnoczi - April 4, 2011, 12:58 p.m.
On Mon, Apr 4, 2011 at 11:47 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
> On Sun, Apr 03, 2011 at 07:06:17PM +0100, Stefan Hajnoczi wrote:
>> On Sun, Apr 3, 2011 at 2:12 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
>> > On Sun, Apr 3, 2011 at 2:57 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> >> On Tue, Mar 29, 2011 at 8:04 PM, Stefan Hajnoczi
>> >> <stefanha@linux.vnet.ibm.com> wrote:
>> >>> Piggy-back on the guest CD-ROM polling to poll on the host.  Open and
>> >>> close the host CD-ROM file descriptor to ensure we read the new size and
>> >>> not a stale size.
>> >>>
>> >>> Two things are going on here:
>> >>>
>> >>> 1. If hald/udisks is not already polling CD-ROMs on the host then
>> >>>   re-opening the CD-ROM causes the host to read the new medium's size.
>> >>>
>> >>> 2. There is a bug in Linux which means the CD-ROM file descriptor must
>> >>>   be re-opened in order for lseek(2) to see the new size.  The
>> >>>   inode size gets out of sync with the underlying device (which you can
>> >>>   confirm by checking that /sys/block/sr0/size and lseek(2) do not
>> >>>   match after media change).  I have raised this with the
>> >>>   maintainers but we need a workaround for the foreseeable future.
>> >>>
>> >>> Note that these changes are all in a #ifdef __linux__ section.
>> >>>
>> >>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> >>> ---
>> >>>  block/raw-posix.c |   26 ++++++++++++++++++++++----
>> >>>  1 files changed, 22 insertions(+), 4 deletions(-)
>> >>>
>> >>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> >>> index 6b72470..8b5205c 100644
>> >>> --- a/block/raw-posix.c
>> >>> +++ b/block/raw-posix.c
>> >>> @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs)
>> >>>     BDRVRawState *s = bs->opaque;
>> >>>     int ret;
>> >>>
>> >>> -    ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
>> >>> -    if (ret == CDS_DISC_OK)
>> >>> -        return 1;
>> >>> -    return 0;
>> >>> +    /*
>> >>> +     * Close the file descriptor if no medium is present and open it to poll
>> >>> +     * again.  This ensures the medium size is refreshed.  If the file
>> >>> +     * descriptor is kept open the size can become stale.  This is essentially
>> >>> +     * replicating CD-ROM polling but is driven by the guest.  As the guest
>> >>> +     * polls, we poll the host.
>> >>> +     */
>> >>> +
>> >>> +    if (s->fd == -1) {
>> >>> +        s->fd = qemu_open(bs->filename, s->open_flags, 0644);
>> >>> +        if (s->fd < 0) {
>> >>> +            return 0;
>> >>> +        }
>> >>> +    }
>> >>> +
>> >>> +    ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK);
>> >>> +
>> >>> +    if (!ret) {
>> >>> +        close(s->fd);
>> >>> +        s->fd = -1;
>> >>> +    }
>> >>> +    return ret;
>> >>>  }
>> >>>
>> >>>  static int cdrom_eject(BlockDriverState *bs, int eject_flag)
>> >>> --
>> >>> 1.7.4.1
>> >>>
>> >>>
>> >>>
>> >>
>> >> There is an issue with reopening host devices in QEMU when running
>> >> under libvirt.  It appears that libvirt chowns image files (including
>> >> device nodes) so that the launched QEMU process can access them.
>> >>
>> >> Unfortunately after media change on host devices udev will reset the
>> >> ownership of the device node.  This causes open(2) to fail with EACCES
>> >> since the QEMU process does not have the right uid/gid/groups and
>> >> libvirt is unaware that the file's ownership has changed.
>> >>
>> >> In order for media change to work with Linux host CD-ROM it is
>> >> necessary to reopen the file (otherwise the inode size will not
>> >> refresh, this is an issue with existing kernels).
>> >>
>> >> How can libvirt's security model be made to support this case?  In
>> >> theory udev could be temporarily configured with libvirt permissions
>> >> for the CD-ROM device while passed through to the guest, but is that
>> >> feasible?
>> >
>> > How about something like this: Add an explicit reopen method to
>> > BlockDriver. Make a special block device for passed file descriptors.
>> > Pass descriptors in libvirt for CD-ROMs instead of the device paths.
>> > The reopen method for file descriptors should notify libvirt about
>> > need to pass a reopened descriptor and then block all accesses until a
>> > new descriptor is available. This should also solve your earlier
>> > problem.
>>
>> I'm hoping libvirt's behavior can be made to just work rather than
>> adding new features to QEMU.  But perhaps passing file descriptors is
>> useful for more than just reopening host devices.  This would
>> basically be a privilege separation model where the QEMU process isn't
>> able to open files itself but can request libvirt to open them on its
>> behalf.
>
> It is rather frickin' annoying the way udev resets the ownership
> when the media merely changes. If it isn't possible to stop udev
> doing this, then i think the only practical thing is to use ACLs
> instead of user/group ownership. We wanted to switch to ACLs in
> libvirt for other reasons already, but it isn't quite as simple
> as it sounds[1] so we've not done it just yet.
>
> Daniel
>
> [1] Mostly due to handling upgrades from existing libvirtd while
>    VMs are running, and coping with filesystems which don't
>    support ACLs (or have them turned of by mount options)

I haven't peeked at how udev does it but perhaps the ACLs will be lost
or reset across media change too?

Daniel, do you know someone from the udev side who we should include
in this discussion?

Stefan
Anthony Liguori - April 4, 2011, 1:02 p.m.
On 04/04/2011 05:47 AM, Daniel P. Berrange wrote:
>> I'm hoping libvirt's behavior can be made to just work rather than
>> adding new features to QEMU.  But perhaps passing file descriptors is
>> useful for more than just reopening host devices.  This would
>> basically be a privilege separation model where the QEMU process isn't
>> able to open files itself but can request libvirt to open them on its
>> behalf.
> It is rather frickin' annoying the way udev resets the ownership
> when the media merely changes. If it isn't possible to stop udev
> doing this, then i think the only practical thing is to use ACLs
> instead of user/group ownership. We wanted to switch to ACLs in
> libvirt for other reasons already, but it isn't quite as simple
> as it sounds[1] so we've not done it just yet.

Isn't the root of the problem that you're not running a guest in the 
expected security context?

How much of a leap would it be to spawn a guest with the credentials of 
the user that created/defined it?  Or better yet, to let the user be 
specified in the XML.

Regards,

Anthony Liguori

> Daniel
>
> [1] Mostly due to handling upgrades from existing libvirtd while
>      VMs are running, and coping with filesystems which don't
>      support ACLs (or have them turned of by mount options)
Daniel P. Berrange - April 4, 2011, 1:16 p.m.
On Mon, Apr 04, 2011 at 08:02:26AM -0500, Anthony Liguori wrote:
> On 04/04/2011 05:47 AM, Daniel P. Berrange wrote:
> >>I'm hoping libvirt's behavior can be made to just work rather than
> >>adding new features to QEMU.  But perhaps passing file descriptors is
> >>useful for more than just reopening host devices.  This would
> >>basically be a privilege separation model where the QEMU process isn't
> >>able to open files itself but can request libvirt to open them on its
> >>behalf.
> >It is rather frickin' annoying the way udev resets the ownership
> >when the media merely changes. If it isn't possible to stop udev
> >doing this, then i think the only practical thing is to use ACLs
> >instead of user/group ownership. We wanted to switch to ACLs in
> >libvirt for other reasons already, but it isn't quite as simple
> >as it sounds[1] so we've not done it just yet.
> 
> Isn't the root of the problem that you're not running a guest in the
> expected security context?

That doesn't really have any impact. If a desktop user is logged
in, udev may change the ownership to match that user, but if they
aren't, then udev may reset it to root:disk. Either way, QEMU
may loose permissions to the disk.

> How much of a leap would it be to spawn a guest with the credentials
> of the user that created/defined it?  Or better yet, to let the user
> be specified in the XML.

That's a completely independent RFE which won't fix this issue in
the general case.

Regards,
Daniel
Avi Kivity - April 4, 2011, 1:22 p.m.
On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:
> In order for media change to work with Linux host CD-ROM it is
> necessary to reopen the file (otherwise the inode size will not
> refresh, this is an issue with existing kernels).
>

Maybe we should fix the bug in Linux (and backport as necessary)?

I think cd-rom assignment is sufficiently obscure that we can require a 
fixed kernel instead of providing a workaround.
Anthony Liguori - April 4, 2011, 1:38 p.m.
On 04/04/2011 08:22 AM, Avi Kivity wrote:
> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:
>> In order for media change to work with Linux host CD-ROM it is
>> necessary to reopen the file (otherwise the inode size will not
>> refresh, this is an issue with existing kernels).
>>
>
> Maybe we should fix the bug in Linux (and backport as necessary)?
>
> I think cd-rom assignment is sufficiently obscure that we can require 
> a fixed kernel instead of providing a workaround.

Do reads fail after CD change?  Or do they succeed and the size is just 
reported incorrectly?

If it's the later, I'd agree that it needs fixing in the kernel.  If 
it's the former, I'd say it's clearly a feature.

Regards,

Anthony Liguori
Avi Kivity - April 4, 2011, 1:49 p.m.
On 04/04/2011 04:38 PM, Anthony Liguori wrote:
> On 04/04/2011 08:22 AM, Avi Kivity wrote:
>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:
>>> In order for media change to work with Linux host CD-ROM it is
>>> necessary to reopen the file (otherwise the inode size will not
>>> refresh, this is an issue with existing kernels).
>>>
>>
>> Maybe we should fix the bug in Linux (and backport as necessary)?
>>
>> I think cd-rom assignment is sufficiently obscure that we can require 
>> a fixed kernel instead of providing a workaround.
>
> Do reads fail after CD change?  Or do they succeed and the size is 
> just reported incorrectly?
>
> If it's the later, I'd agree that it needs fixing in the kernel.  If 
> it's the former, I'd say it's clearly a feature.
>

Even if it's a documented or intentional feature, we can add an ioctl to 
"refresh" the device with up-to-date data.
Anthony Liguori - April 4, 2011, 2:19 p.m.
On 04/04/2011 08:16 AM, Daniel P. Berrange wrote:
> That doesn't really have any impact. If a desktop user is logged
> in, udev may change the ownership to match that user, but if they
> aren't, then udev may reset it to root:disk. Either way, QEMU
> may loose permissions to the disk.

Then if you create a guest without being in the 'disk' group, it'll 
fail.  That's pretty expected AFAICT.

But with libvirt today, when you launch a guest, your security context 
doesn't matter and there's no way you can control what context the guest 
gets.  libvirt is essentially creating it's own authorization 
mechanism.  Supporting ACLs goes much further down that path.

>> How much of a leap would it be to spawn a guest with the credentials
>> of the user that created/defined it?  Or better yet, to let the user
>> be specified in the XML.
> That's a completely independent RFE which won't fix this issue in
> the general case.

I think it really does.

Regards,

Anthony Liguori

> Regards,
> Daniel
Daniel P. Berrange - April 4, 2011, 2:26 p.m.
On Mon, Apr 04, 2011 at 09:19:36AM -0500, Anthony Liguori wrote:
> On 04/04/2011 08:16 AM, Daniel P. Berrange wrote:
> >That doesn't really have any impact. If a desktop user is logged
> >in, udev may change the ownership to match that user, but if they
> >aren't, then udev may reset it to root:disk. Either way, QEMU
> >may loose permissions to the disk.
> 
> Then if you create a guest without being in the 'disk' group, it'll
> fail.  That's pretty expected AFAICT.

We don't *ever* want to put QEMU in the 'disk' group because
that gives it access to any disk on the system in general.

> But with libvirt today, when you launch a guest, your security
> context doesn't matter and there's no way you can control what
> context the guest gets.  libvirt is essentially creating it's own
> authorization mechanism.  Supporting ACLs goes much further down
> that path.
> 
> >>How much of a leap would it be to spawn a guest with the credentials
> >>of the user that created/defined it?  Or better yet, to let the user
> >>be specified in the XML.
> >That's a completely independent RFE which won't fix this issue in
> >the general case.
> 
> I think it really does.

Nope it doesn't.

Regards,
Daniel
Anthony Liguori - April 4, 2011, 2:43 p.m.
On 04/04/2011 09:26 AM, Daniel P. Berrange wrote:
> On Mon, Apr 04, 2011 at 09:19:36AM -0500, Anthony Liguori wrote:
>> On 04/04/2011 08:16 AM, Daniel P. Berrange wrote:
>>> That doesn't really have any impact. If a desktop user is logged
>>> in, udev may change the ownership to match that user, but if they
>>> aren't, then udev may reset it to root:disk. Either way, QEMU
>>> may loose permissions to the disk.
>> Then if you create a guest without being in the 'disk' group, it'll
>> fail.  That's pretty expected AFAICT.
> We don't *ever* want to put QEMU in the 'disk' group because
> that gives it access to any disk on the system in general.

If that's what the user wants to do, what's the problem with doing it?

Setting the global user/group is not enough because just because you 
have one VM that you want in disk doesn't mean you want all of them in disk.

And to be clear, if you could do this today, there wouldn't be a problem 
here in terms of QEMU just reopening the file.

Regards,

Anthony Liguori

Regards,

Anthony Liguori
Stefan Hajnoczi - April 4, 2011, 3:09 p.m.
On Mon, Apr 4, 2011 at 2:49 PM, Avi Kivity <avi@redhat.com> wrote:
> On 04/04/2011 04:38 PM, Anthony Liguori wrote:
>>
>> On 04/04/2011 08:22 AM, Avi Kivity wrote:
>>>
>>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:
>>>>
>>>> In order for media change to work with Linux host CD-ROM it is
>>>> necessary to reopen the file (otherwise the inode size will not
>>>> refresh, this is an issue with existing kernels).
>>>>
>>>
>>> Maybe we should fix the bug in Linux (and backport as necessary)?
>>>
>>> I think cd-rom assignment is sufficiently obscure that we can require a
>>> fixed kernel instead of providing a workaround.
>>
>> Do reads fail after CD change?  Or do they succeed and the size is just
>> reported incorrectly?
>>
>> If it's the later, I'd agree that it needs fixing in the kernel.  If it's
>> the former, I'd say it's clearly a feature.
>>
>
> Even if it's a documented or intentional feature, we can add an ioctl to
> "refresh" the device with up-to-date data.

It's possible to fix this in the kernel.  I just haven't written the
patch yet.  The inode size needs to be updated when the new medium is
detected.

I haven't tested but I suspect reads within the size of the previous
medium will succeed.  But if the new medium is larger then reads
beyond the old medium size will fail.

The size reported by lseek(fd, 0, SEEK_END) is outdated.

Stefan
Avi Kivity - April 4, 2011, 3:11 p.m.
On 04/04/2011 06:09 PM, Stefan Hajnoczi wrote:
> On Mon, Apr 4, 2011 at 2:49 PM, Avi Kivity<avi@redhat.com>  wrote:
> >  On 04/04/2011 04:38 PM, Anthony Liguori wrote:
> >>
> >>  On 04/04/2011 08:22 AM, Avi Kivity wrote:
> >>>
> >>>  On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:
> >>>>
> >>>>  In order for media change to work with Linux host CD-ROM it is
> >>>>  necessary to reopen the file (otherwise the inode size will not
> >>>>  refresh, this is an issue with existing kernels).
> >>>>
> >>>
> >>>  Maybe we should fix the bug in Linux (and backport as necessary)?
> >>>
> >>>  I think cd-rom assignment is sufficiently obscure that we can require a
> >>>  fixed kernel instead of providing a workaround.
> >>
> >>  Do reads fail after CD change?  Or do they succeed and the size is just
> >>  reported incorrectly?
> >>
> >>  If it's the later, I'd agree that it needs fixing in the kernel.  If it's
> >>  the former, I'd say it's clearly a feature.
> >>
> >
> >  Even if it's a documented or intentional feature, we can add an ioctl to
> >  "refresh" the device with up-to-date data.
>
> It's possible to fix this in the kernel.  I just haven't written the
> patch yet.  The inode size needs to be updated when the new medium is
> detected.
>
> I haven't tested but I suspect reads within the size of the previous
> medium will succeed.  But if the new medium is larger then reads
> beyond the old medium size will fail.
>
> The size reported by lseek(fd, 0, SEEK_END) is outdated.

I believe a kernel fix is best in that case, leaving qemu alone.
Blue Swirl - April 4, 2011, 4:38 p.m.
On Mon, Apr 4, 2011 at 5:43 PM, Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 04/04/2011 09:26 AM, Daniel P. Berrange wrote:
>>
>> On Mon, Apr 04, 2011 at 09:19:36AM -0500, Anthony Liguori wrote:
>>>
>>> On 04/04/2011 08:16 AM, Daniel P. Berrange wrote:
>>>>
>>>> That doesn't really have any impact. If a desktop user is logged
>>>> in, udev may change the ownership to match that user, but if they
>>>> aren't, then udev may reset it to root:disk. Either way, QEMU
>>>> may loose permissions to the disk.
>>>
>>> Then if you create a guest without being in the 'disk' group, it'll
>>> fail.  That's pretty expected AFAICT.
>>
>> We don't *ever* want to put QEMU in the 'disk' group because
>> that gives it access to any disk on the system in general.
>
> If that's what the user wants to do, what's the problem with doing it?
>
> Setting the global user/group is not enough because just because you have
> one VM that you want in disk doesn't mean you want all of them in disk.

Privilege separated QEMU sounds so interesting that I'd go for that
direction. There could be helper processes which retain privileges and
communicate with the main unprivileged QEMU with only file
descriptors. The helpers could even execute setgid disk group
re-opener for the CD-ROM case, or ask libvirt to do the reopen. For
unprivileged QEMU part it wouldn't matter, all it sees are the
descriptors.
David Ahern - April 4, 2011, 5:54 p.m.
On 04/04/11 07:38, Anthony Liguori wrote:
> On 04/04/2011 08:22 AM, Avi Kivity wrote:
>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:
>>> In order for media change to work with Linux host CD-ROM it is
>>> necessary to reopen the file (otherwise the inode size will not
>>> refresh, this is an issue with existing kernels).
>>>
>>
>> Maybe we should fix the bug in Linux (and backport as necessary)?
>>
>> I think cd-rom assignment is sufficiently obscure that we can require
>> a fixed kernel instead of providing a workaround.
> 
> Do reads fail after CD change?  Or do they succeed and the size is just
> reported incorrectly?
> 
> If it's the later, I'd agree that it needs fixing in the kernel.  If
> it's the former, I'd say it's clearly a feature.

In January 2010 I was seeing old data -- data from the prior CD -- in
the guest after the media change.

David
Stefan Hajnoczi - April 5, 2011, 5:33 a.m.
On Mon, Apr 4, 2011 at 6:54 PM, David Ahern <dsahern@gmail.com> wrote:
>
>
> On 04/04/11 07:38, Anthony Liguori wrote:
>> On 04/04/2011 08:22 AM, Avi Kivity wrote:
>>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:
>>>> In order for media change to work with Linux host CD-ROM it is
>>>> necessary to reopen the file (otherwise the inode size will not
>>>> refresh, this is an issue with existing kernels).
>>>>
>>>
>>> Maybe we should fix the bug in Linux (and backport as necessary)?
>>>
>>> I think cd-rom assignment is sufficiently obscure that we can require
>>> a fixed kernel instead of providing a workaround.
>>
>> Do reads fail after CD change?  Or do they succeed and the size is just
>> reported incorrectly?
>>
>> If it's the later, I'd agree that it needs fixing in the kernel.  If
>> it's the former, I'd say it's clearly a feature.
>
> In January 2010 I was seeing old data -- data from the prior CD -- in
> the guest after the media change.

Yikes.  Is there a bug report for this?  What are the steps to reproduce it?

Stefan
David Ahern - April 5, 2011, 5:42 a.m.
On 04/04/11 23:33, Stefan Hajnoczi wrote:
> On Mon, Apr 4, 2011 at 6:54 PM, David Ahern <dsahern@gmail.com> wrote:
>> On 04/04/11 07:38, Anthony Liguori wrote:
>>> On 04/04/2011 08:22 AM, Avi Kivity wrote:
>>>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:
>>>>> In order for media change to work with Linux host CD-ROM it is
>>>>> necessary to reopen the file (otherwise the inode size will not
>>>>> refresh, this is an issue with existing kernels).
>>>>>
>>>>
>>>> Maybe we should fix the bug in Linux (and backport as necessary)?
>>>>
>>>> I think cd-rom assignment is sufficiently obscure that we can require
>>>> a fixed kernel instead of providing a workaround.
>>>
>>> Do reads fail after CD change?  Or do they succeed and the size is just
>>> reported incorrectly?
>>>
>>> If it's the later, I'd agree that it needs fixing in the kernel.  If
>>> it's the former, I'd say it's clearly a feature.
>>
>> In January 2010 I was seeing old data -- data from the prior CD -- in
>> the guest after the media change.
> 
> Yikes.  Is there a bug report for this?  What are the steps to reproduce it?

Not that I know of. It is reported by someone else last year as well:
http://www.mail-archive.com/kvm@vger.kernel.org/msg32999.html

> 
> Stefan
Amit Shah - April 5, 2011, 6:41 a.m.
On (Mon) 04 Apr 2011 [16:09:05], Stefan Hajnoczi wrote:
> On Mon, Apr 4, 2011 at 2:49 PM, Avi Kivity <avi@redhat.com> wrote:
> > On 04/04/2011 04:38 PM, Anthony Liguori wrote:
> >>
> >> On 04/04/2011 08:22 AM, Avi Kivity wrote:
> >>>
> >>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:
> >>>>
> >>>> In order for media change to work with Linux host CD-ROM it is
> >>>> necessary to reopen the file (otherwise the inode size will not
> >>>> refresh, this is an issue with existing kernels).
> >>>>
> >>>
> >>> Maybe we should fix the bug in Linux (and backport as necessary)?
> >>>
> >>> I think cd-rom assignment is sufficiently obscure that we can require a
> >>> fixed kernel instead of providing a workaround.
> >>
> >> Do reads fail after CD change?  Or do they succeed and the size is just
> >> reported incorrectly?
> >>
> >> If it's the later, I'd agree that it needs fixing in the kernel.  If it's
> >> the former, I'd say it's clearly a feature.
> >>
> >
> > Even if it's a documented or intentional feature, we can add an ioctl to
> > "refresh" the device with up-to-date data.
> 
> It's possible to fix this in the kernel.  I just haven't written the
> patch yet.  The inode size needs to be updated when the new medium is
> detected.
> 
> I haven't tested but I suspect reads within the size of the previous
> medium will succeed.  But if the new medium is larger then reads
> beyond the old medium size will fail.

See http://www.spinics.net/lists/linux-scsi/msg51504.html

		Amit
Avi Kivity - April 5, 2011, 7:48 a.m.
On 04/05/2011 09:41 AM, Amit Shah wrote:
> See http://www.spinics.net/lists/linux-scsi/msg51504.html


I see this is quite fresh.  What are the plans here?
Amit Shah - April 5, 2011, 8:09 a.m.
On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote:
> On 04/05/2011 09:41 AM, Amit Shah wrote:
> >See http://www.spinics.net/lists/linux-scsi/msg51504.html
> 
> I see this is quite fresh.  What are the plans here?

We're still discussing where the fix should be, but it certainly is a
kernel bug and should be fixed there, and then applied to stable.

However, there are other bugs in qemu which will prevent the right
size changes to be visible in the guest (the RFC series I sent out
earlier in this thread need to be applied to QEMU at the least, the
series has grown in my development tree since the time I sent that one
out).  So essentially we need to update both, the hypervisor and the
guest to get proper CDROM media change support.

It also looks like we can't have a workaround in QEMU to get older
guests to work.

However, a hack in the kernel can be used without any QEMU changes
(revalidate disk on each sr_open() call, irrespective of detecting any
media change).  I'm against doing that for upstream, but downstreams
could do that for new guest - old hypervisor compat.

		Amit
Stefan Hajnoczi - April 5, 2011, 8:40 a.m.
On Tue, Apr 5, 2011 at 7:41 AM, Amit Shah <amit.shah@redhat.com> wrote:
> On (Mon) 04 Apr 2011 [16:09:05], Stefan Hajnoczi wrote:
>> On Mon, Apr 4, 2011 at 2:49 PM, Avi Kivity <avi@redhat.com> wrote:
>> > On 04/04/2011 04:38 PM, Anthony Liguori wrote:
>> >>
>> >> On 04/04/2011 08:22 AM, Avi Kivity wrote:
>> >>>
>> >>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:
>> >>>>
>> >>>> In order for media change to work with Linux host CD-ROM it is
>> >>>> necessary to reopen the file (otherwise the inode size will not
>> >>>> refresh, this is an issue with existing kernels).
>> >>>>
>> >>>
>> >>> Maybe we should fix the bug in Linux (and backport as necessary)?
>> >>>
>> >>> I think cd-rom assignment is sufficiently obscure that we can require a
>> >>> fixed kernel instead of providing a workaround.
>> >>
>> >> Do reads fail after CD change?  Or do they succeed and the size is just
>> >> reported incorrectly?
>> >>
>> >> If it's the later, I'd agree that it needs fixing in the kernel.  If it's
>> >> the former, I'd say it's clearly a feature.
>> >>
>> >
>> > Even if it's a documented or intentional feature, we can add an ioctl to
>> > "refresh" the device with up-to-date data.
>>
>> It's possible to fix this in the kernel.  I just haven't written the
>> patch yet.  The inode size needs to be updated when the new medium is
>> detected.
>>
>> I haven't tested but I suspect reads within the size of the previous
>> medium will succeed.  But if the new medium is larger then reads
>> beyond the old medium size will fail.
>
> See http://www.spinics.net/lists/linux-scsi/msg51504.html

I don't think that patch updates the block inode size.  We'd need to
call fs/block_dev.c:revalidate_disk() instead of directly calling
cdi->disk->fops->revalidate_disk(cdi->disk).
fs/block_dev.c:revalidate_disk() calls check_disk_size_change(), which
will update the inode size.

Here are the steps to reproduce the issue: https://lkml.org/lkml/2011/3/23/156

Stefan
Amit Shah - April 5, 2011, 8:58 a.m.
On (Tue) 05 Apr 2011 [09:40:05], Stefan Hajnoczi wrote:
> > See http://www.spinics.net/lists/linux-scsi/msg51504.html
> 
> I don't think that patch updates the block inode size.  We'd need to
> call fs/block_dev.c:revalidate_disk() instead of directly calling
> cdi->disk->fops->revalidate_disk(cdi->disk).
> fs/block_dev.c:revalidate_disk() calls check_disk_size_change(), which
> will update the inode size.

Then the patch is buggy :-)  As Tejun also says in the thread, the
patch should be in the block layer, not sr.c.

(btw that patch does update /sys/block/sr0/size, so that part of
revalidation is done.)

		Amit
Avi Kivity - April 5, 2011, 9 a.m.
On 04/05/2011 11:09 AM, Amit Shah wrote:
> On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote:
> >  On 04/05/2011 09:41 AM, Amit Shah wrote:
> >  >See http://www.spinics.net/lists/linux-scsi/msg51504.html
> >
> >  I see this is quite fresh.  What are the plans here?
>
> We're still discussing where the fix should be, but it certainly is a
> kernel bug and should be fixed there, and then applied to stable.
>
> However, there are other bugs in qemu which will prevent the right
> size changes to be visible in the guest (the RFC series I sent out
> earlier in this thread need to be applied to QEMU at the least, the
> series has grown in my development tree since the time I sent that one
> out).  So essentially we need to update both, the hypervisor and the
> guest to get proper CDROM media change support.

Why do we need to update the guest for a qemu bug?  What is the qemu bug?

> It also looks like we can't have a workaround in QEMU to get older
> guests to work.

Older guests?  or older hosts?

> However, a hack in the kernel can be used without any QEMU changes
> (revalidate disk on each sr_open() call, irrespective of detecting any
> media change).  I'm against doing that for upstream, but downstreams
> could do that for new guest - old hypervisor compat.

Seriously confused.  Please use the kernels "host kernel" and "qemu" 
instead of "hypervisor" which is ambiguous.
Amit Shah - April 5, 2011, 9:12 a.m.
On (Tue) 05 Apr 2011 [12:00:38], Avi Kivity wrote:
> On 04/05/2011 11:09 AM, Amit Shah wrote:
> >On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote:
> >>  On 04/05/2011 09:41 AM, Amit Shah wrote:
> >>  >See http://www.spinics.net/lists/linux-scsi/msg51504.html
> >>
> >>  I see this is quite fresh.  What are the plans here?
> >
> >We're still discussing where the fix should be, but it certainly is a
> >kernel bug and should be fixed there, and then applied to stable.
> >
> >However, there are other bugs in qemu which will prevent the right
> >size changes to be visible in the guest (the RFC series I sent out
> >earlier in this thread need to be applied to QEMU at the least, the
> >series has grown in my development tree since the time I sent that one
> >out).  So essentially we need to update both, the hypervisor and the
> >guest to get proper CDROM media change support.
> 
> Why do we need to update the guest for a qemu bug?  What is the qemu bug?

Guest kernel bug: CDROM change event missed, so the the revalidate
call isn't made, which causes stale data (like disc size) to be used
on newer media.

qemu bug: We don't handle the GET_EVENT_STATUS_NOTIFICATION command
from guests (which is a mandatory command acc. to scsi spec) which the
guest uses to detect CDROM changes.  Once this command is implemented,
QEMU sends the required info the guest needs to detect CDROM changes.
I have this implemented locally (also sent as RFC PATCH 2/3 in the
'cdrom bug roundup' thread.

So: even if qemu is updated to handle this command, the guest won't
work correctly since it misses the event.

> >It also looks like we can't have a workaround in QEMU to get older
> >guests to work.
> 
> Older guests?  or older hosts?

Older guests (not patched with fix for the bug described above).

Since the guest kernel completely misses the disc change event in the
path that does the revalidation, there's nothing qemu can do that will
make such older guests notice disc change.

Also: if only the guest kernel is updated by qemu is not, things still
won't work since qemu will never send valid information for the
GET_EVENT_STATUS_NOTIFICATION command.

> >However, a hack in the kernel can be used without any QEMU changes
> >(revalidate disk on each sr_open() call, irrespective of detecting any
> >media change).  I'm against doing that for upstream, but downstreams
> >could do that for new guest - old hypervisor compat.
> 
> Seriously confused.  Please use the kernels "host kernel" and "qemu"
> instead of "hypervisor" which is ambiguous.

OK: this last bit says that forcefully revalidating discs in the guest
kernel when a guest userspace opens the disc will ensure size changes
are reflected properly for guest userspace.  So in this case, even if
we're using an older qemu which doesn't implement
GET_EVENT_STATUS_NOTIFICATION, guest userspace apps will work fine.

This is obviously a hack.

		Amit
Avi Kivity - April 5, 2011, 9:17 a.m.
On 04/05/2011 12:12 PM, Amit Shah wrote:
> On (Tue) 05 Apr 2011 [12:00:38], Avi Kivity wrote:
> >  On 04/05/2011 11:09 AM, Amit Shah wrote:
> >  >On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote:
> >  >>   On 04/05/2011 09:41 AM, Amit Shah wrote:
> >  >>   >See http://www.spinics.net/lists/linux-scsi/msg51504.html
> >  >>
> >  >>   I see this is quite fresh.  What are the plans here?
> >  >
> >  >We're still discussing where the fix should be, but it certainly is a
> >  >kernel bug and should be fixed there, and then applied to stable.
> >  >
> >  >However, there are other bugs in qemu which will prevent the right
> >  >size changes to be visible in the guest (the RFC series I sent out
> >  >earlier in this thread need to be applied to QEMU at the least, the
> >  >series has grown in my development tree since the time I sent that one
> >  >out).  So essentially we need to update both, the hypervisor and the
> >  >guest to get proper CDROM media change support.
> >
> >  Why do we need to update the guest for a qemu bug?  What is the qemu bug?
>
> Guest kernel bug: CDROM change event missed, so the the revalidate
> call isn't made, which causes stale data (like disc size) to be used
> on newer media.
>
> qemu bug: We don't handle the GET_EVENT_STATUS_NOTIFICATION command
> from guests (which is a mandatory command acc. to scsi spec) which the
> guest uses to detect CDROM changes.  Once this command is implemented,
> QEMU sends the required info the guest needs to detect CDROM changes.
> I have this implemented locally (also sent as RFC PATCH 2/3 in the
> 'cdrom bug roundup' thread.
>
> So: even if qemu is updated to handle this command, the guest won't
> work correctly since it misses the event.

Okay.  We aren't responsible for guest kernel bugs, especially those 
which apply to real hardware (we should make more effort for virtio 
bugs).  It's enough that we fix qemu here.

> >  >It also looks like we can't have a workaround in QEMU to get older
> >  >guests to work.
> >
> >  Older guests?  or older hosts?
>
> Older guests (not patched with fix for the bug described above).
>
> Since the guest kernel completely misses the disc change event in the
> path that does the revalidation, there's nothing qemu can do that will
> make such older guests notice disc change.
>
> Also: if only the guest kernel is updated by qemu is not, things still
> won't work since qemu will never send valid information for the
> GET_EVENT_STATUS_NOTIFICATION command.
>
> >  >However, a hack in the kernel can be used without any QEMU changes
> >  >(revalidate disk on each sr_open() call, irrespective of detecting any
> >  >media change).  I'm against doing that for upstream, but downstreams
> >  >could do that for new guest - old hypervisor compat.
> >
> >  Seriously confused.  Please use the kernels "host kernel" and "qemu"
> >  instead of "hypervisor" which is ambiguous.
>
> OK: this last bit says that forcefully revalidating discs in the guest
> kernel when a guest userspace opens the disc will ensure size changes
> are reflected properly for guest userspace.  So in this case, even if
> we're using an older qemu which doesn't implement
> GET_EVENT_STATUS_NOTIFICATION, guest userspace apps will work fine.
>
> This is obviously a hack.

Yes.  Thanks for the clarification.

(let's see if I really got it - we have a kernel bug that hit both the 
guest and the host, plus a qemu bug?)
Amit Shah - April 5, 2011, 9:26 a.m.
On (Tue) 05 Apr 2011 [12:17:30], Avi Kivity wrote:
> On 04/05/2011 12:12 PM, Amit Shah wrote:
> >On (Tue) 05 Apr 2011 [12:00:38], Avi Kivity wrote:
> >>  On 04/05/2011 11:09 AM, Amit Shah wrote:
> >>  >On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote:
> >>  >>   On 04/05/2011 09:41 AM, Amit Shah wrote:
> >>  >>   >See http://www.spinics.net/lists/linux-scsi/msg51504.html
> >>  >>
> >>  >>   I see this is quite fresh.  What are the plans here?
> >>  >
> >>  >We're still discussing where the fix should be, but it certainly is a
> >>  >kernel bug and should be fixed there, and then applied to stable.
> >>  >
> >>  >However, there are other bugs in qemu which will prevent the right
> >>  >size changes to be visible in the guest (the RFC series I sent out
> >>  >earlier in this thread need to be applied to QEMU at the least, the
> >>  >series has grown in my development tree since the time I sent that one
> >>  >out).  So essentially we need to update both, the hypervisor and the
> >>  >guest to get proper CDROM media change support.
> >>
> >>  Why do we need to update the guest for a qemu bug?  What is the qemu bug?
> >
> >Guest kernel bug: CDROM change event missed, so the the revalidate
> >call isn't made, which causes stale data (like disc size) to be used
> >on newer media.
> >
> >qemu bug: We don't handle the GET_EVENT_STATUS_NOTIFICATION command
> >from guests (which is a mandatory command acc. to scsi spec) which the
> >guest uses to detect CDROM changes.  Once this command is implemented,
> >QEMU sends the required info the guest needs to detect CDROM changes.
> >I have this implemented locally (also sent as RFC PATCH 2/3 in the
> >'cdrom bug roundup' thread.
> >
> >So: even if qemu is updated to handle this command, the guest won't
> >work correctly since it misses the event.
> 
> Okay.  We aren't responsible for guest kernel bugs, especially those
> which apply to real hardware (we should make more effort for virtio
> bugs).  It's enough that we fix qemu here.
> 
> >>  >It also looks like we can't have a workaround in QEMU to get older
> >>  >guests to work.
> >>
> >>  Older guests?  or older hosts?
> >
> >Older guests (not patched with fix for the bug described above).
> >
> >Since the guest kernel completely misses the disc change event in the
> >path that does the revalidation, there's nothing qemu can do that will
> >make such older guests notice disc change.
> >
> >Also: if only the guest kernel is updated by qemu is not, things still
> >won't work since qemu will never send valid information for the
> >GET_EVENT_STATUS_NOTIFICATION command.
> >
> >>  >However, a hack in the kernel can be used without any QEMU changes
> >>  >(revalidate disk on each sr_open() call, irrespective of detecting any
> >>  >media change).  I'm against doing that for upstream, but downstreams
> >>  >could do that for new guest - old hypervisor compat.
> >>
> >>  Seriously confused.  Please use the kernels "host kernel" and "qemu"
> >>  instead of "hypervisor" which is ambiguous.
> >
> >OK: this last bit says that forcefully revalidating discs in the guest
> >kernel when a guest userspace opens the disc will ensure size changes
> >are reflected properly for guest userspace.  So in this case, even if
> >we're using an older qemu which doesn't implement
> >GET_EVENT_STATUS_NOTIFICATION, guest userspace apps will work fine.
> >
> >This is obviously a hack.
> 
> Yes.  Thanks for the clarification.
> 
> (let's see if I really got it - we have a kernel bug that hit both
> the guest and the host, plus a qemu bug?)

Yes -- but just that we have many more qemu bugs.

Our cdrom emulation has a lot of holes when it comes to being
spec-compliant.  I have a few fixes, Markus is working on some as well.

		Amit
Stefan Hajnoczi - April 5, 2011, 12:41 p.m.
On Tue, Apr 5, 2011 at 6:42 AM, David Ahern <dsahern@gmail.com> wrote:
> On 04/04/11 23:33, Stefan Hajnoczi wrote:
>> On Mon, Apr 4, 2011 at 6:54 PM, David Ahern <dsahern@gmail.com> wrote:
>>> On 04/04/11 07:38, Anthony Liguori wrote:
>>>> On 04/04/2011 08:22 AM, Avi Kivity wrote:
>>>>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote:
>>>>>> In order for media change to work with Linux host CD-ROM it is
>>>>>> necessary to reopen the file (otherwise the inode size will not
>>>>>> refresh, this is an issue with existing kernels).
>>>>>>
>>>>>
>>>>> Maybe we should fix the bug in Linux (and backport as necessary)?
>>>>>
>>>>> I think cd-rom assignment is sufficiently obscure that we can require
>>>>> a fixed kernel instead of providing a workaround.
>>>>
>>>> Do reads fail after CD change?  Or do they succeed and the size is just
>>>> reported incorrectly?
>>>>
>>>> If it's the later, I'd agree that it needs fixing in the kernel.  If
>>>> it's the former, I'd say it's clearly a feature.
>>>
>>> In January 2010 I was seeing old data -- data from the prior CD -- in
>>> the guest after the media change.
>>
>> Yikes.  Is there a bug report for this?  What are the steps to reproduce it?
>
> Not that I know of. It is reported by someone else last year as well:
> http://www.mail-archive.com/kvm@vger.kernel.org/msg32999.html

Thanks for the link.  That looks like typical symptoms of missing
media change (which can also legitimately happen if the guest does not
poll).  This should be fixed once the CD-ROM fixes are merged into
QEMU and Linux.

Stefan
Amit Shah - April 6, 2011, 8:07 a.m.
On (Tue) 05 Apr 2011 [12:17:30], Avi Kivity wrote:
> >Guest kernel bug: CDROM change event missed, so the the revalidate
> >call isn't made, which causes stale data (like disc size) to be used
> >on newer media.
> >
> >qemu bug: We don't handle the GET_EVENT_STATUS_NOTIFICATION command
> >from guests (which is a mandatory command acc. to scsi spec) which the
> >guest uses to detect CDROM changes.  Once this command is implemented,
> >QEMU sends the required info the guest needs to detect CDROM changes.
> >I have this implemented locally (also sent as RFC PATCH 2/3 in the
> >'cdrom bug roundup' thread.
> >
> >So: even if qemu is updated to handle this command, the guest won't
> >work correctly since it misses the event.
> 
> Okay.  We aren't responsible for guest kernel bugs, especially those
> which apply to real hardware (we should make more effort for virtio
> bugs).  It's enough that we fix qemu here.

Actually I forgot about this already: it is possible for a workaround
in qemu: inject the event twice instead of once.  ie, if the spec says
'SENSE_UNIT_ATTENTION' needs to be reset if some cmd is successful,
don't reset it the first time.  That's how I confirmed the kernel bug
last week.  It is such a horrible hack that I didn't want to keep it
around at all :-)

		Amit

Patch

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 6b72470..8b5205c 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1238,10 +1238,28 @@  static int cdrom_is_inserted(BlockDriverState *bs)
     BDRVRawState *s = bs->opaque;
     int ret;
 
-    ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT);
-    if (ret == CDS_DISC_OK)
-        return 1;
-    return 0;
+    /*
+     * Close the file descriptor if no medium is present and open it to poll
+     * again.  This ensures the medium size is refreshed.  If the file
+     * descriptor is kept open the size can become stale.  This is essentially
+     * replicating CD-ROM polling but is driven by the guest.  As the guest
+     * polls, we poll the host.
+     */
+
+    if (s->fd == -1) {
+        s->fd = qemu_open(bs->filename, s->open_flags, 0644);
+        if (s->fd < 0) {
+            return 0;
+        }
+    }
+
+    ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK);
+
+    if (!ret) {
+        close(s->fd);
+        s->fd = -1;
+    }
+    return ret;
 }
 
 static int cdrom_eject(BlockDriverState *bs, int eject_flag)