Message ID | 1301425482-8722-4-git-send-email-stefanha@linux.vnet.ibm.com |
---|---|
State | New |
Headers | show |
Am 29.03.2011 21:04, schrieb Stefan Hajnoczi: > Piggy-back on the guest CD-ROM polling to poll on the host. Open and > close the host CD-ROM file descriptor to ensure we read the new size and > not a stale size. > > Two things are going on here: > > 1. If hald/udisks is not already polling CD-ROMs on the host then > re-opening the CD-ROM causes the host to read the new medium's size. > > 2. There is a bug in Linux which means the CD-ROM file descriptor must > be re-opened in order for lseek(2) to see the new size. The > inode size gets out of sync with the underlying device (which you can > confirm by checking that /sys/block/sr0/size and lseek(2) do not > match after media change). I have raised this with the > maintainers but we need a workaround for the foreseeable future. > > Note that these changes are all in a #ifdef __linux__ section. > > Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Applied the first two patches of the series. I have some comments on this one. > --- > block/raw-posix.c | 26 ++++++++++++++++++++++---- > 1 files changed, 22 insertions(+), 4 deletions(-) > > diff --git a/block/raw-posix.c b/block/raw-posix.c > index 6b72470..8b5205c 100644 > --- a/block/raw-posix.c > +++ b/block/raw-posix.c > @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs) > BDRVRawState *s = bs->opaque; > int ret; > > - ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT); > - if (ret == CDS_DISC_OK) > - return 1; > - return 0; > + /* > + * Close the file descriptor if no medium is present and open it to poll > + * again. This ensures the medium size is refreshed. If the file > + * descriptor is kept open the size can become stale. This is essentially > + * replicating CD-ROM polling but is driven by the guest. As the guest > + * polls, we poll the host. > + */ > + > + if (s->fd == -1) { > + s->fd = qemu_open(bs->filename, s->open_flags, 0644); > + if (s->fd < 0) { > + return 0; > + } > + } > + > + ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK); > + > + if (!ret) { > + close(s->fd); > + s->fd = -1; > + } > + return ret; > } We have this code in raw_close: if (s->fd >= 0) { close(s->fd); s->fd = -1; if (s->aligned_buf != NULL) qemu_vfree(s->aligned_buf); } So now that we set s->fd = -1, this part won't be run and we leak s->aligned_buf. This problem exists in other places in raw-posix, too. The other thing is that I'm not sure if everything in raw-posix is prepared to deal with a -1 fd. At the very least, I think we'll get -EBADF errors instead of the expected -ENOMEDIUM. The general approach looks good to me. Kevin
On Thu, Mar 31, 2011 at 11:05 AM, Kevin Wolf <kwolf@redhat.com> wrote: > The other thing is that I'm not sure if everything in raw-posix is > prepared to deal with a -1 fd. At the very least, I think we'll get > -EBADF errors instead of the expected -ENOMEDIUM. Not all of block.c checks for !bdrv_is_inserted() so you are right, I missed cases where we should return -ENOMEDIUM. We have a similar scenario with floppy disks where fd_open() can return leaving s->fd == -1. I would like to unify these floppy and CD-ROM open cases. Stefan
On Tue, Mar 29, 2011 at 8:04 PM, Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> wrote: > Piggy-back on the guest CD-ROM polling to poll on the host. Open and > close the host CD-ROM file descriptor to ensure we read the new size and > not a stale size. > > Two things are going on here: > > 1. If hald/udisks is not already polling CD-ROMs on the host then > re-opening the CD-ROM causes the host to read the new medium's size. > > 2. There is a bug in Linux which means the CD-ROM file descriptor must > be re-opened in order for lseek(2) to see the new size. The > inode size gets out of sync with the underlying device (which you can > confirm by checking that /sys/block/sr0/size and lseek(2) do not > match after media change). I have raised this with the > maintainers but we need a workaround for the foreseeable future. > > Note that these changes are all in a #ifdef __linux__ section. > > Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> > --- > block/raw-posix.c | 26 ++++++++++++++++++++++---- > 1 files changed, 22 insertions(+), 4 deletions(-) > > diff --git a/block/raw-posix.c b/block/raw-posix.c > index 6b72470..8b5205c 100644 > --- a/block/raw-posix.c > +++ b/block/raw-posix.c > @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs) > BDRVRawState *s = bs->opaque; > int ret; > > - ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT); > - if (ret == CDS_DISC_OK) > - return 1; > - return 0; > + /* > + * Close the file descriptor if no medium is present and open it to poll > + * again. This ensures the medium size is refreshed. If the file > + * descriptor is kept open the size can become stale. This is essentially > + * replicating CD-ROM polling but is driven by the guest. As the guest > + * polls, we poll the host. > + */ > + > + if (s->fd == -1) { > + s->fd = qemu_open(bs->filename, s->open_flags, 0644); > + if (s->fd < 0) { > + return 0; > + } > + } > + > + ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK); > + > + if (!ret) { > + close(s->fd); > + s->fd = -1; > + } > + return ret; > } > > static int cdrom_eject(BlockDriverState *bs, int eject_flag) > -- > 1.7.4.1 > > > There is an issue with reopening host devices in QEMU when running under libvirt. It appears that libvirt chowns image files (including device nodes) so that the launched QEMU process can access them. Unfortunately after media change on host devices udev will reset the ownership of the device node. This causes open(2) to fail with EACCES since the QEMU process does not have the right uid/gid/groups and libvirt is unaware that the file's ownership has changed. In order for media change to work with Linux host CD-ROM it is necessary to reopen the file (otherwise the inode size will not refresh, this is an issue with existing kernels). How can libvirt's security model be made to support this case? In theory udev could be temporarily configured with libvirt permissions for the CD-ROM device while passed through to the guest, but is that feasible? Stefan
On Sun, Apr 3, 2011 at 2:57 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: > On Tue, Mar 29, 2011 at 8:04 PM, Stefan Hajnoczi > <stefanha@linux.vnet.ibm.com> wrote: >> Piggy-back on the guest CD-ROM polling to poll on the host. Open and >> close the host CD-ROM file descriptor to ensure we read the new size and >> not a stale size. >> >> Two things are going on here: >> >> 1. If hald/udisks is not already polling CD-ROMs on the host then >> re-opening the CD-ROM causes the host to read the new medium's size. >> >> 2. There is a bug in Linux which means the CD-ROM file descriptor must >> be re-opened in order for lseek(2) to see the new size. The >> inode size gets out of sync with the underlying device (which you can >> confirm by checking that /sys/block/sr0/size and lseek(2) do not >> match after media change). I have raised this with the >> maintainers but we need a workaround for the foreseeable future. >> >> Note that these changes are all in a #ifdef __linux__ section. >> >> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> >> --- >> block/raw-posix.c | 26 ++++++++++++++++++++++---- >> 1 files changed, 22 insertions(+), 4 deletions(-) >> >> diff --git a/block/raw-posix.c b/block/raw-posix.c >> index 6b72470..8b5205c 100644 >> --- a/block/raw-posix.c >> +++ b/block/raw-posix.c >> @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs) >> BDRVRawState *s = bs->opaque; >> int ret; >> >> - ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT); >> - if (ret == CDS_DISC_OK) >> - return 1; >> - return 0; >> + /* >> + * Close the file descriptor if no medium is present and open it to poll >> + * again. This ensures the medium size is refreshed. If the file >> + * descriptor is kept open the size can become stale. This is essentially >> + * replicating CD-ROM polling but is driven by the guest. As the guest >> + * polls, we poll the host. >> + */ >> + >> + if (s->fd == -1) { >> + s->fd = qemu_open(bs->filename, s->open_flags, 0644); >> + if (s->fd < 0) { >> + return 0; >> + } >> + } >> + >> + ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK); >> + >> + if (!ret) { >> + close(s->fd); >> + s->fd = -1; >> + } >> + return ret; >> } >> >> static int cdrom_eject(BlockDriverState *bs, int eject_flag) >> -- >> 1.7.4.1 >> >> >> > > There is an issue with reopening host devices in QEMU when running > under libvirt. It appears that libvirt chowns image files (including > device nodes) so that the launched QEMU process can access them. > > Unfortunately after media change on host devices udev will reset the > ownership of the device node. This causes open(2) to fail with EACCES > since the QEMU process does not have the right uid/gid/groups and > libvirt is unaware that the file's ownership has changed. > > In order for media change to work with Linux host CD-ROM it is > necessary to reopen the file (otherwise the inode size will not > refresh, this is an issue with existing kernels). > > How can libvirt's security model be made to support this case? In > theory udev could be temporarily configured with libvirt permissions > for the CD-ROM device while passed through to the guest, but is that > feasible? How about something like this: Add an explicit reopen method to BlockDriver. Make a special block device for passed file descriptors. Pass descriptors in libvirt for CD-ROMs instead of the device paths. The reopen method for file descriptors should notify libvirt about need to pass a reopened descriptor and then block all accesses until a new descriptor is available. This should also solve your earlier problem.
On Sun, Apr 3, 2011 at 2:12 PM, Blue Swirl <blauwirbel@gmail.com> wrote: > On Sun, Apr 3, 2011 at 2:57 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: >> On Tue, Mar 29, 2011 at 8:04 PM, Stefan Hajnoczi >> <stefanha@linux.vnet.ibm.com> wrote: >>> Piggy-back on the guest CD-ROM polling to poll on the host. Open and >>> close the host CD-ROM file descriptor to ensure we read the new size and >>> not a stale size. >>> >>> Two things are going on here: >>> >>> 1. If hald/udisks is not already polling CD-ROMs on the host then >>> re-opening the CD-ROM causes the host to read the new medium's size. >>> >>> 2. There is a bug in Linux which means the CD-ROM file descriptor must >>> be re-opened in order for lseek(2) to see the new size. The >>> inode size gets out of sync with the underlying device (which you can >>> confirm by checking that /sys/block/sr0/size and lseek(2) do not >>> match after media change). I have raised this with the >>> maintainers but we need a workaround for the foreseeable future. >>> >>> Note that these changes are all in a #ifdef __linux__ section. >>> >>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> >>> --- >>> block/raw-posix.c | 26 ++++++++++++++++++++++---- >>> 1 files changed, 22 insertions(+), 4 deletions(-) >>> >>> diff --git a/block/raw-posix.c b/block/raw-posix.c >>> index 6b72470..8b5205c 100644 >>> --- a/block/raw-posix.c >>> +++ b/block/raw-posix.c >>> @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs) >>> BDRVRawState *s = bs->opaque; >>> int ret; >>> >>> - ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT); >>> - if (ret == CDS_DISC_OK) >>> - return 1; >>> - return 0; >>> + /* >>> + * Close the file descriptor if no medium is present and open it to poll >>> + * again. This ensures the medium size is refreshed. If the file >>> + * descriptor is kept open the size can become stale. This is essentially >>> + * replicating CD-ROM polling but is driven by the guest. As the guest >>> + * polls, we poll the host. >>> + */ >>> + >>> + if (s->fd == -1) { >>> + s->fd = qemu_open(bs->filename, s->open_flags, 0644); >>> + if (s->fd < 0) { >>> + return 0; >>> + } >>> + } >>> + >>> + ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK); >>> + >>> + if (!ret) { >>> + close(s->fd); >>> + s->fd = -1; >>> + } >>> + return ret; >>> } >>> >>> static int cdrom_eject(BlockDriverState *bs, int eject_flag) >>> -- >>> 1.7.4.1 >>> >>> >>> >> >> There is an issue with reopening host devices in QEMU when running >> under libvirt. It appears that libvirt chowns image files (including >> device nodes) so that the launched QEMU process can access them. >> >> Unfortunately after media change on host devices udev will reset the >> ownership of the device node. This causes open(2) to fail with EACCES >> since the QEMU process does not have the right uid/gid/groups and >> libvirt is unaware that the file's ownership has changed. >> >> In order for media change to work with Linux host CD-ROM it is >> necessary to reopen the file (otherwise the inode size will not >> refresh, this is an issue with existing kernels). >> >> How can libvirt's security model be made to support this case? In >> theory udev could be temporarily configured with libvirt permissions >> for the CD-ROM device while passed through to the guest, but is that >> feasible? > > How about something like this: Add an explicit reopen method to > BlockDriver. Make a special block device for passed file descriptors. > Pass descriptors in libvirt for CD-ROMs instead of the device paths. > The reopen method for file descriptors should notify libvirt about > need to pass a reopened descriptor and then block all accesses until a > new descriptor is available. This should also solve your earlier > problem. I'm hoping libvirt's behavior can be made to just work rather than adding new features to QEMU. But perhaps passing file descriptors is useful for more than just reopening host devices. This would basically be a privilege separation model where the QEMU process isn't able to open files itself but can request libvirt to open them on its behalf. Stefan
On Sun, Apr 03, 2011 at 07:06:17PM +0100, Stefan Hajnoczi wrote: > On Sun, Apr 3, 2011 at 2:12 PM, Blue Swirl <blauwirbel@gmail.com> wrote: > > On Sun, Apr 3, 2011 at 2:57 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: > >> On Tue, Mar 29, 2011 at 8:04 PM, Stefan Hajnoczi > >> <stefanha@linux.vnet.ibm.com> wrote: > >>> Piggy-back on the guest CD-ROM polling to poll on the host. Open and > >>> close the host CD-ROM file descriptor to ensure we read the new size and > >>> not a stale size. > >>> > >>> Two things are going on here: > >>> > >>> 1. If hald/udisks is not already polling CD-ROMs on the host then > >>> re-opening the CD-ROM causes the host to read the new medium's size. > >>> > >>> 2. There is a bug in Linux which means the CD-ROM file descriptor must > >>> be re-opened in order for lseek(2) to see the new size. The > >>> inode size gets out of sync with the underlying device (which you can > >>> confirm by checking that /sys/block/sr0/size and lseek(2) do not > >>> match after media change). I have raised this with the > >>> maintainers but we need a workaround for the foreseeable future. > >>> > >>> Note that these changes are all in a #ifdef __linux__ section. > >>> > >>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> > >>> --- > >>> block/raw-posix.c | 26 ++++++++++++++++++++++---- > >>> 1 files changed, 22 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/block/raw-posix.c b/block/raw-posix.c > >>> index 6b72470..8b5205c 100644 > >>> --- a/block/raw-posix.c > >>> +++ b/block/raw-posix.c > >>> @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs) > >>> BDRVRawState *s = bs->opaque; > >>> int ret; > >>> > >>> - ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT); > >>> - if (ret == CDS_DISC_OK) > >>> - return 1; > >>> - return 0; > >>> + /* > >>> + * Close the file descriptor if no medium is present and open it to poll > >>> + * again. This ensures the medium size is refreshed. If the file > >>> + * descriptor is kept open the size can become stale. This is essentially > >>> + * replicating CD-ROM polling but is driven by the guest. As the guest > >>> + * polls, we poll the host. > >>> + */ > >>> + > >>> + if (s->fd == -1) { > >>> + s->fd = qemu_open(bs->filename, s->open_flags, 0644); > >>> + if (s->fd < 0) { > >>> + return 0; > >>> + } > >>> + } > >>> + > >>> + ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK); > >>> + > >>> + if (!ret) { > >>> + close(s->fd); > >>> + s->fd = -1; > >>> + } > >>> + return ret; > >>> } > >>> > >>> static int cdrom_eject(BlockDriverState *bs, int eject_flag) > >>> -- > >>> 1.7.4.1 > >>> > >>> > >>> > >> > >> There is an issue with reopening host devices in QEMU when running > >> under libvirt. It appears that libvirt chowns image files (including > >> device nodes) so that the launched QEMU process can access them. > >> > >> Unfortunately after media change on host devices udev will reset the > >> ownership of the device node. This causes open(2) to fail with EACCES > >> since the QEMU process does not have the right uid/gid/groups and > >> libvirt is unaware that the file's ownership has changed. > >> > >> In order for media change to work with Linux host CD-ROM it is > >> necessary to reopen the file (otherwise the inode size will not > >> refresh, this is an issue with existing kernels). > >> > >> How can libvirt's security model be made to support this case? In > >> theory udev could be temporarily configured with libvirt permissions > >> for the CD-ROM device while passed through to the guest, but is that > >> feasible? > > > > How about something like this: Add an explicit reopen method to > > BlockDriver. Make a special block device for passed file descriptors. > > Pass descriptors in libvirt for CD-ROMs instead of the device paths. > > The reopen method for file descriptors should notify libvirt about > > need to pass a reopened descriptor and then block all accesses until a > > new descriptor is available. This should also solve your earlier > > problem. > > I'm hoping libvirt's behavior can be made to just work rather than > adding new features to QEMU. But perhaps passing file descriptors is > useful for more than just reopening host devices. This would > basically be a privilege separation model where the QEMU process isn't > able to open files itself but can request libvirt to open them on its > behalf. It is rather frickin' annoying the way udev resets the ownership when the media merely changes. If it isn't possible to stop udev doing this, then i think the only practical thing is to use ACLs instead of user/group ownership. We wanted to switch to ACLs in libvirt for other reasons already, but it isn't quite as simple as it sounds[1] so we've not done it just yet. Daniel [1] Mostly due to handling upgrades from existing libvirtd while VMs are running, and coping with filesystems which don't support ACLs (or have them turned of by mount options)
On Mon, Apr 4, 2011 at 11:47 AM, Daniel P. Berrange <berrange@redhat.com> wrote: > On Sun, Apr 03, 2011 at 07:06:17PM +0100, Stefan Hajnoczi wrote: >> On Sun, Apr 3, 2011 at 2:12 PM, Blue Swirl <blauwirbel@gmail.com> wrote: >> > On Sun, Apr 3, 2011 at 2:57 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: >> >> On Tue, Mar 29, 2011 at 8:04 PM, Stefan Hajnoczi >> >> <stefanha@linux.vnet.ibm.com> wrote: >> >>> Piggy-back on the guest CD-ROM polling to poll on the host. Open and >> >>> close the host CD-ROM file descriptor to ensure we read the new size and >> >>> not a stale size. >> >>> >> >>> Two things are going on here: >> >>> >> >>> 1. If hald/udisks is not already polling CD-ROMs on the host then >> >>> re-opening the CD-ROM causes the host to read the new medium's size. >> >>> >> >>> 2. There is a bug in Linux which means the CD-ROM file descriptor must >> >>> be re-opened in order for lseek(2) to see the new size. The >> >>> inode size gets out of sync with the underlying device (which you can >> >>> confirm by checking that /sys/block/sr0/size and lseek(2) do not >> >>> match after media change). I have raised this with the >> >>> maintainers but we need a workaround for the foreseeable future. >> >>> >> >>> Note that these changes are all in a #ifdef __linux__ section. >> >>> >> >>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> >> >>> --- >> >>> block/raw-posix.c | 26 ++++++++++++++++++++++---- >> >>> 1 files changed, 22 insertions(+), 4 deletions(-) >> >>> >> >>> diff --git a/block/raw-posix.c b/block/raw-posix.c >> >>> index 6b72470..8b5205c 100644 >> >>> --- a/block/raw-posix.c >> >>> +++ b/block/raw-posix.c >> >>> @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs) >> >>> BDRVRawState *s = bs->opaque; >> >>> int ret; >> >>> >> >>> - ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT); >> >>> - if (ret == CDS_DISC_OK) >> >>> - return 1; >> >>> - return 0; >> >>> + /* >> >>> + * Close the file descriptor if no medium is present and open it to poll >> >>> + * again. This ensures the medium size is refreshed. If the file >> >>> + * descriptor is kept open the size can become stale. This is essentially >> >>> + * replicating CD-ROM polling but is driven by the guest. As the guest >> >>> + * polls, we poll the host. >> >>> + */ >> >>> + >> >>> + if (s->fd == -1) { >> >>> + s->fd = qemu_open(bs->filename, s->open_flags, 0644); >> >>> + if (s->fd < 0) { >> >>> + return 0; >> >>> + } >> >>> + } >> >>> + >> >>> + ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK); >> >>> + >> >>> + if (!ret) { >> >>> + close(s->fd); >> >>> + s->fd = -1; >> >>> + } >> >>> + return ret; >> >>> } >> >>> >> >>> static int cdrom_eject(BlockDriverState *bs, int eject_flag) >> >>> -- >> >>> 1.7.4.1 >> >>> >> >>> >> >>> >> >> >> >> There is an issue with reopening host devices in QEMU when running >> >> under libvirt. It appears that libvirt chowns image files (including >> >> device nodes) so that the launched QEMU process can access them. >> >> >> >> Unfortunately after media change on host devices udev will reset the >> >> ownership of the device node. This causes open(2) to fail with EACCES >> >> since the QEMU process does not have the right uid/gid/groups and >> >> libvirt is unaware that the file's ownership has changed. >> >> >> >> In order for media change to work with Linux host CD-ROM it is >> >> necessary to reopen the file (otherwise the inode size will not >> >> refresh, this is an issue with existing kernels). >> >> >> >> How can libvirt's security model be made to support this case? In >> >> theory udev could be temporarily configured with libvirt permissions >> >> for the CD-ROM device while passed through to the guest, but is that >> >> feasible? >> > >> > How about something like this: Add an explicit reopen method to >> > BlockDriver. Make a special block device for passed file descriptors. >> > Pass descriptors in libvirt for CD-ROMs instead of the device paths. >> > The reopen method for file descriptors should notify libvirt about >> > need to pass a reopened descriptor and then block all accesses until a >> > new descriptor is available. This should also solve your earlier >> > problem. >> >> I'm hoping libvirt's behavior can be made to just work rather than >> adding new features to QEMU. But perhaps passing file descriptors is >> useful for more than just reopening host devices. This would >> basically be a privilege separation model where the QEMU process isn't >> able to open files itself but can request libvirt to open them on its >> behalf. > > It is rather frickin' annoying the way udev resets the ownership > when the media merely changes. If it isn't possible to stop udev > doing this, then i think the only practical thing is to use ACLs > instead of user/group ownership. We wanted to switch to ACLs in > libvirt for other reasons already, but it isn't quite as simple > as it sounds[1] so we've not done it just yet. > > Daniel > > [1] Mostly due to handling upgrades from existing libvirtd while > VMs are running, and coping with filesystems which don't > support ACLs (or have them turned of by mount options) I haven't peeked at how udev does it but perhaps the ACLs will be lost or reset across media change too? Daniel, do you know someone from the udev side who we should include in this discussion? Stefan
On 04/04/2011 05:47 AM, Daniel P. Berrange wrote: >> I'm hoping libvirt's behavior can be made to just work rather than >> adding new features to QEMU. But perhaps passing file descriptors is >> useful for more than just reopening host devices. This would >> basically be a privilege separation model where the QEMU process isn't >> able to open files itself but can request libvirt to open them on its >> behalf. > It is rather frickin' annoying the way udev resets the ownership > when the media merely changes. If it isn't possible to stop udev > doing this, then i think the only practical thing is to use ACLs > instead of user/group ownership. We wanted to switch to ACLs in > libvirt for other reasons already, but it isn't quite as simple > as it sounds[1] so we've not done it just yet. Isn't the root of the problem that you're not running a guest in the expected security context? How much of a leap would it be to spawn a guest with the credentials of the user that created/defined it? Or better yet, to let the user be specified in the XML. Regards, Anthony Liguori > Daniel > > [1] Mostly due to handling upgrades from existing libvirtd while > VMs are running, and coping with filesystems which don't > support ACLs (or have them turned of by mount options)
On Mon, Apr 04, 2011 at 08:02:26AM -0500, Anthony Liguori wrote: > On 04/04/2011 05:47 AM, Daniel P. Berrange wrote: > >>I'm hoping libvirt's behavior can be made to just work rather than > >>adding new features to QEMU. But perhaps passing file descriptors is > >>useful for more than just reopening host devices. This would > >>basically be a privilege separation model where the QEMU process isn't > >>able to open files itself but can request libvirt to open them on its > >>behalf. > >It is rather frickin' annoying the way udev resets the ownership > >when the media merely changes. If it isn't possible to stop udev > >doing this, then i think the only practical thing is to use ACLs > >instead of user/group ownership. We wanted to switch to ACLs in > >libvirt for other reasons already, but it isn't quite as simple > >as it sounds[1] so we've not done it just yet. > > Isn't the root of the problem that you're not running a guest in the > expected security context? That doesn't really have any impact. If a desktop user is logged in, udev may change the ownership to match that user, but if they aren't, then udev may reset it to root:disk. Either way, QEMU may loose permissions to the disk. > How much of a leap would it be to spawn a guest with the credentials > of the user that created/defined it? Or better yet, to let the user > be specified in the XML. That's a completely independent RFE which won't fix this issue in the general case. Regards, Daniel
On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: > In order for media change to work with Linux host CD-ROM it is > necessary to reopen the file (otherwise the inode size will not > refresh, this is an issue with existing kernels). > Maybe we should fix the bug in Linux (and backport as necessary)? I think cd-rom assignment is sufficiently obscure that we can require a fixed kernel instead of providing a workaround.
On 04/04/2011 08:22 AM, Avi Kivity wrote: > On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: >> In order for media change to work with Linux host CD-ROM it is >> necessary to reopen the file (otherwise the inode size will not >> refresh, this is an issue with existing kernels). >> > > Maybe we should fix the bug in Linux (and backport as necessary)? > > I think cd-rom assignment is sufficiently obscure that we can require > a fixed kernel instead of providing a workaround. Do reads fail after CD change? Or do they succeed and the size is just reported incorrectly? If it's the later, I'd agree that it needs fixing in the kernel. If it's the former, I'd say it's clearly a feature. Regards, Anthony Liguori
On 04/04/2011 04:38 PM, Anthony Liguori wrote: > On 04/04/2011 08:22 AM, Avi Kivity wrote: >> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: >>> In order for media change to work with Linux host CD-ROM it is >>> necessary to reopen the file (otherwise the inode size will not >>> refresh, this is an issue with existing kernels). >>> >> >> Maybe we should fix the bug in Linux (and backport as necessary)? >> >> I think cd-rom assignment is sufficiently obscure that we can require >> a fixed kernel instead of providing a workaround. > > Do reads fail after CD change? Or do they succeed and the size is > just reported incorrectly? > > If it's the later, I'd agree that it needs fixing in the kernel. If > it's the former, I'd say it's clearly a feature. > Even if it's a documented or intentional feature, we can add an ioctl to "refresh" the device with up-to-date data.
On 04/04/2011 08:16 AM, Daniel P. Berrange wrote: > That doesn't really have any impact. If a desktop user is logged > in, udev may change the ownership to match that user, but if they > aren't, then udev may reset it to root:disk. Either way, QEMU > may loose permissions to the disk. Then if you create a guest without being in the 'disk' group, it'll fail. That's pretty expected AFAICT. But with libvirt today, when you launch a guest, your security context doesn't matter and there's no way you can control what context the guest gets. libvirt is essentially creating it's own authorization mechanism. Supporting ACLs goes much further down that path. >> How much of a leap would it be to spawn a guest with the credentials >> of the user that created/defined it? Or better yet, to let the user >> be specified in the XML. > That's a completely independent RFE which won't fix this issue in > the general case. I think it really does. Regards, Anthony Liguori > Regards, > Daniel
On Mon, Apr 04, 2011 at 09:19:36AM -0500, Anthony Liguori wrote: > On 04/04/2011 08:16 AM, Daniel P. Berrange wrote: > >That doesn't really have any impact. If a desktop user is logged > >in, udev may change the ownership to match that user, but if they > >aren't, then udev may reset it to root:disk. Either way, QEMU > >may loose permissions to the disk. > > Then if you create a guest without being in the 'disk' group, it'll > fail. That's pretty expected AFAICT. We don't *ever* want to put QEMU in the 'disk' group because that gives it access to any disk on the system in general. > But with libvirt today, when you launch a guest, your security > context doesn't matter and there's no way you can control what > context the guest gets. libvirt is essentially creating it's own > authorization mechanism. Supporting ACLs goes much further down > that path. > > >>How much of a leap would it be to spawn a guest with the credentials > >>of the user that created/defined it? Or better yet, to let the user > >>be specified in the XML. > >That's a completely independent RFE which won't fix this issue in > >the general case. > > I think it really does. Nope it doesn't. Regards, Daniel
On 04/04/2011 09:26 AM, Daniel P. Berrange wrote: > On Mon, Apr 04, 2011 at 09:19:36AM -0500, Anthony Liguori wrote: >> On 04/04/2011 08:16 AM, Daniel P. Berrange wrote: >>> That doesn't really have any impact. If a desktop user is logged >>> in, udev may change the ownership to match that user, but if they >>> aren't, then udev may reset it to root:disk. Either way, QEMU >>> may loose permissions to the disk. >> Then if you create a guest without being in the 'disk' group, it'll >> fail. That's pretty expected AFAICT. > We don't *ever* want to put QEMU in the 'disk' group because > that gives it access to any disk on the system in general. If that's what the user wants to do, what's the problem with doing it? Setting the global user/group is not enough because just because you have one VM that you want in disk doesn't mean you want all of them in disk. And to be clear, if you could do this today, there wouldn't be a problem here in terms of QEMU just reopening the file. Regards, Anthony Liguori Regards, Anthony Liguori
On Mon, Apr 4, 2011 at 2:49 PM, Avi Kivity <avi@redhat.com> wrote: > On 04/04/2011 04:38 PM, Anthony Liguori wrote: >> >> On 04/04/2011 08:22 AM, Avi Kivity wrote: >>> >>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: >>>> >>>> In order for media change to work with Linux host CD-ROM it is >>>> necessary to reopen the file (otherwise the inode size will not >>>> refresh, this is an issue with existing kernels). >>>> >>> >>> Maybe we should fix the bug in Linux (and backport as necessary)? >>> >>> I think cd-rom assignment is sufficiently obscure that we can require a >>> fixed kernel instead of providing a workaround. >> >> Do reads fail after CD change? Or do they succeed and the size is just >> reported incorrectly? >> >> If it's the later, I'd agree that it needs fixing in the kernel. If it's >> the former, I'd say it's clearly a feature. >> > > Even if it's a documented or intentional feature, we can add an ioctl to > "refresh" the device with up-to-date data. It's possible to fix this in the kernel. I just haven't written the patch yet. The inode size needs to be updated when the new medium is detected. I haven't tested but I suspect reads within the size of the previous medium will succeed. But if the new medium is larger then reads beyond the old medium size will fail. The size reported by lseek(fd, 0, SEEK_END) is outdated. Stefan
On 04/04/2011 06:09 PM, Stefan Hajnoczi wrote: > On Mon, Apr 4, 2011 at 2:49 PM, Avi Kivity<avi@redhat.com> wrote: > > On 04/04/2011 04:38 PM, Anthony Liguori wrote: > >> > >> On 04/04/2011 08:22 AM, Avi Kivity wrote: > >>> > >>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: > >>>> > >>>> In order for media change to work with Linux host CD-ROM it is > >>>> necessary to reopen the file (otherwise the inode size will not > >>>> refresh, this is an issue with existing kernels). > >>>> > >>> > >>> Maybe we should fix the bug in Linux (and backport as necessary)? > >>> > >>> I think cd-rom assignment is sufficiently obscure that we can require a > >>> fixed kernel instead of providing a workaround. > >> > >> Do reads fail after CD change? Or do they succeed and the size is just > >> reported incorrectly? > >> > >> If it's the later, I'd agree that it needs fixing in the kernel. If it's > >> the former, I'd say it's clearly a feature. > >> > > > > Even if it's a documented or intentional feature, we can add an ioctl to > > "refresh" the device with up-to-date data. > > It's possible to fix this in the kernel. I just haven't written the > patch yet. The inode size needs to be updated when the new medium is > detected. > > I haven't tested but I suspect reads within the size of the previous > medium will succeed. But if the new medium is larger then reads > beyond the old medium size will fail. > > The size reported by lseek(fd, 0, SEEK_END) is outdated. I believe a kernel fix is best in that case, leaving qemu alone.
On Mon, Apr 4, 2011 at 5:43 PM, Anthony Liguori <anthony@codemonkey.ws> wrote: > On 04/04/2011 09:26 AM, Daniel P. Berrange wrote: >> >> On Mon, Apr 04, 2011 at 09:19:36AM -0500, Anthony Liguori wrote: >>> >>> On 04/04/2011 08:16 AM, Daniel P. Berrange wrote: >>>> >>>> That doesn't really have any impact. If a desktop user is logged >>>> in, udev may change the ownership to match that user, but if they >>>> aren't, then udev may reset it to root:disk. Either way, QEMU >>>> may loose permissions to the disk. >>> >>> Then if you create a guest without being in the 'disk' group, it'll >>> fail. That's pretty expected AFAICT. >> >> We don't *ever* want to put QEMU in the 'disk' group because >> that gives it access to any disk on the system in general. > > If that's what the user wants to do, what's the problem with doing it? > > Setting the global user/group is not enough because just because you have > one VM that you want in disk doesn't mean you want all of them in disk. Privilege separated QEMU sounds so interesting that I'd go for that direction. There could be helper processes which retain privileges and communicate with the main unprivileged QEMU with only file descriptors. The helpers could even execute setgid disk group re-opener for the CD-ROM case, or ask libvirt to do the reopen. For unprivileged QEMU part it wouldn't matter, all it sees are the descriptors.
On 04/04/11 07:38, Anthony Liguori wrote: > On 04/04/2011 08:22 AM, Avi Kivity wrote: >> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: >>> In order for media change to work with Linux host CD-ROM it is >>> necessary to reopen the file (otherwise the inode size will not >>> refresh, this is an issue with existing kernels). >>> >> >> Maybe we should fix the bug in Linux (and backport as necessary)? >> >> I think cd-rom assignment is sufficiently obscure that we can require >> a fixed kernel instead of providing a workaround. > > Do reads fail after CD change? Or do they succeed and the size is just > reported incorrectly? > > If it's the later, I'd agree that it needs fixing in the kernel. If > it's the former, I'd say it's clearly a feature. In January 2010 I was seeing old data -- data from the prior CD -- in the guest after the media change. David
On Mon, Apr 4, 2011 at 6:54 PM, David Ahern <dsahern@gmail.com> wrote: > > > On 04/04/11 07:38, Anthony Liguori wrote: >> On 04/04/2011 08:22 AM, Avi Kivity wrote: >>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: >>>> In order for media change to work with Linux host CD-ROM it is >>>> necessary to reopen the file (otherwise the inode size will not >>>> refresh, this is an issue with existing kernels). >>>> >>> >>> Maybe we should fix the bug in Linux (and backport as necessary)? >>> >>> I think cd-rom assignment is sufficiently obscure that we can require >>> a fixed kernel instead of providing a workaround. >> >> Do reads fail after CD change? Or do they succeed and the size is just >> reported incorrectly? >> >> If it's the later, I'd agree that it needs fixing in the kernel. If >> it's the former, I'd say it's clearly a feature. > > In January 2010 I was seeing old data -- data from the prior CD -- in > the guest after the media change. Yikes. Is there a bug report for this? What are the steps to reproduce it? Stefan
On 04/04/11 23:33, Stefan Hajnoczi wrote: > On Mon, Apr 4, 2011 at 6:54 PM, David Ahern <dsahern@gmail.com> wrote: >> On 04/04/11 07:38, Anthony Liguori wrote: >>> On 04/04/2011 08:22 AM, Avi Kivity wrote: >>>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: >>>>> In order for media change to work with Linux host CD-ROM it is >>>>> necessary to reopen the file (otherwise the inode size will not >>>>> refresh, this is an issue with existing kernels). >>>>> >>>> >>>> Maybe we should fix the bug in Linux (and backport as necessary)? >>>> >>>> I think cd-rom assignment is sufficiently obscure that we can require >>>> a fixed kernel instead of providing a workaround. >>> >>> Do reads fail after CD change? Or do they succeed and the size is just >>> reported incorrectly? >>> >>> If it's the later, I'd agree that it needs fixing in the kernel. If >>> it's the former, I'd say it's clearly a feature. >> >> In January 2010 I was seeing old data -- data from the prior CD -- in >> the guest after the media change. > > Yikes. Is there a bug report for this? What are the steps to reproduce it? Not that I know of. It is reported by someone else last year as well: http://www.mail-archive.com/kvm@vger.kernel.org/msg32999.html > > Stefan
On (Mon) 04 Apr 2011 [16:09:05], Stefan Hajnoczi wrote: > On Mon, Apr 4, 2011 at 2:49 PM, Avi Kivity <avi@redhat.com> wrote: > > On 04/04/2011 04:38 PM, Anthony Liguori wrote: > >> > >> On 04/04/2011 08:22 AM, Avi Kivity wrote: > >>> > >>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: > >>>> > >>>> In order for media change to work with Linux host CD-ROM it is > >>>> necessary to reopen the file (otherwise the inode size will not > >>>> refresh, this is an issue with existing kernels). > >>>> > >>> > >>> Maybe we should fix the bug in Linux (and backport as necessary)? > >>> > >>> I think cd-rom assignment is sufficiently obscure that we can require a > >>> fixed kernel instead of providing a workaround. > >> > >> Do reads fail after CD change? Or do they succeed and the size is just > >> reported incorrectly? > >> > >> If it's the later, I'd agree that it needs fixing in the kernel. If it's > >> the former, I'd say it's clearly a feature. > >> > > > > Even if it's a documented or intentional feature, we can add an ioctl to > > "refresh" the device with up-to-date data. > > It's possible to fix this in the kernel. I just haven't written the > patch yet. The inode size needs to be updated when the new medium is > detected. > > I haven't tested but I suspect reads within the size of the previous > medium will succeed. But if the new medium is larger then reads > beyond the old medium size will fail. See http://www.spinics.net/lists/linux-scsi/msg51504.html Amit
On 04/05/2011 09:41 AM, Amit Shah wrote:
> See http://www.spinics.net/lists/linux-scsi/msg51504.html
I see this is quite fresh. What are the plans here?
On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote: > On 04/05/2011 09:41 AM, Amit Shah wrote: > >See http://www.spinics.net/lists/linux-scsi/msg51504.html > > I see this is quite fresh. What are the plans here? We're still discussing where the fix should be, but it certainly is a kernel bug and should be fixed there, and then applied to stable. However, there are other bugs in qemu which will prevent the right size changes to be visible in the guest (the RFC series I sent out earlier in this thread need to be applied to QEMU at the least, the series has grown in my development tree since the time I sent that one out). So essentially we need to update both, the hypervisor and the guest to get proper CDROM media change support. It also looks like we can't have a workaround in QEMU to get older guests to work. However, a hack in the kernel can be used without any QEMU changes (revalidate disk on each sr_open() call, irrespective of detecting any media change). I'm against doing that for upstream, but downstreams could do that for new guest - old hypervisor compat. Amit
On Tue, Apr 5, 2011 at 7:41 AM, Amit Shah <amit.shah@redhat.com> wrote: > On (Mon) 04 Apr 2011 [16:09:05], Stefan Hajnoczi wrote: >> On Mon, Apr 4, 2011 at 2:49 PM, Avi Kivity <avi@redhat.com> wrote: >> > On 04/04/2011 04:38 PM, Anthony Liguori wrote: >> >> >> >> On 04/04/2011 08:22 AM, Avi Kivity wrote: >> >>> >> >>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: >> >>>> >> >>>> In order for media change to work with Linux host CD-ROM it is >> >>>> necessary to reopen the file (otherwise the inode size will not >> >>>> refresh, this is an issue with existing kernels). >> >>>> >> >>> >> >>> Maybe we should fix the bug in Linux (and backport as necessary)? >> >>> >> >>> I think cd-rom assignment is sufficiently obscure that we can require a >> >>> fixed kernel instead of providing a workaround. >> >> >> >> Do reads fail after CD change? Or do they succeed and the size is just >> >> reported incorrectly? >> >> >> >> If it's the later, I'd agree that it needs fixing in the kernel. If it's >> >> the former, I'd say it's clearly a feature. >> >> >> > >> > Even if it's a documented or intentional feature, we can add an ioctl to >> > "refresh" the device with up-to-date data. >> >> It's possible to fix this in the kernel. I just haven't written the >> patch yet. The inode size needs to be updated when the new medium is >> detected. >> >> I haven't tested but I suspect reads within the size of the previous >> medium will succeed. But if the new medium is larger then reads >> beyond the old medium size will fail. > > See http://www.spinics.net/lists/linux-scsi/msg51504.html I don't think that patch updates the block inode size. We'd need to call fs/block_dev.c:revalidate_disk() instead of directly calling cdi->disk->fops->revalidate_disk(cdi->disk). fs/block_dev.c:revalidate_disk() calls check_disk_size_change(), which will update the inode size. Here are the steps to reproduce the issue: https://lkml.org/lkml/2011/3/23/156 Stefan
On (Tue) 05 Apr 2011 [09:40:05], Stefan Hajnoczi wrote: > > See http://www.spinics.net/lists/linux-scsi/msg51504.html > > I don't think that patch updates the block inode size. We'd need to > call fs/block_dev.c:revalidate_disk() instead of directly calling > cdi->disk->fops->revalidate_disk(cdi->disk). > fs/block_dev.c:revalidate_disk() calls check_disk_size_change(), which > will update the inode size. Then the patch is buggy :-) As Tejun also says in the thread, the patch should be in the block layer, not sr.c. (btw that patch does update /sys/block/sr0/size, so that part of revalidation is done.) Amit
On 04/05/2011 11:09 AM, Amit Shah wrote: > On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote: > > On 04/05/2011 09:41 AM, Amit Shah wrote: > > >See http://www.spinics.net/lists/linux-scsi/msg51504.html > > > > I see this is quite fresh. What are the plans here? > > We're still discussing where the fix should be, but it certainly is a > kernel bug and should be fixed there, and then applied to stable. > > However, there are other bugs in qemu which will prevent the right > size changes to be visible in the guest (the RFC series I sent out > earlier in this thread need to be applied to QEMU at the least, the > series has grown in my development tree since the time I sent that one > out). So essentially we need to update both, the hypervisor and the > guest to get proper CDROM media change support. Why do we need to update the guest for a qemu bug? What is the qemu bug? > It also looks like we can't have a workaround in QEMU to get older > guests to work. Older guests? or older hosts? > However, a hack in the kernel can be used without any QEMU changes > (revalidate disk on each sr_open() call, irrespective of detecting any > media change). I'm against doing that for upstream, but downstreams > could do that for new guest - old hypervisor compat. Seriously confused. Please use the kernels "host kernel" and "qemu" instead of "hypervisor" which is ambiguous.
On (Tue) 05 Apr 2011 [12:00:38], Avi Kivity wrote: > On 04/05/2011 11:09 AM, Amit Shah wrote: > >On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote: > >> On 04/05/2011 09:41 AM, Amit Shah wrote: > >> >See http://www.spinics.net/lists/linux-scsi/msg51504.html > >> > >> I see this is quite fresh. What are the plans here? > > > >We're still discussing where the fix should be, but it certainly is a > >kernel bug and should be fixed there, and then applied to stable. > > > >However, there are other bugs in qemu which will prevent the right > >size changes to be visible in the guest (the RFC series I sent out > >earlier in this thread need to be applied to QEMU at the least, the > >series has grown in my development tree since the time I sent that one > >out). So essentially we need to update both, the hypervisor and the > >guest to get proper CDROM media change support. > > Why do we need to update the guest for a qemu bug? What is the qemu bug? Guest kernel bug: CDROM change event missed, so the the revalidate call isn't made, which causes stale data (like disc size) to be used on newer media. qemu bug: We don't handle the GET_EVENT_STATUS_NOTIFICATION command from guests (which is a mandatory command acc. to scsi spec) which the guest uses to detect CDROM changes. Once this command is implemented, QEMU sends the required info the guest needs to detect CDROM changes. I have this implemented locally (also sent as RFC PATCH 2/3 in the 'cdrom bug roundup' thread. So: even if qemu is updated to handle this command, the guest won't work correctly since it misses the event. > >It also looks like we can't have a workaround in QEMU to get older > >guests to work. > > Older guests? or older hosts? Older guests (not patched with fix for the bug described above). Since the guest kernel completely misses the disc change event in the path that does the revalidation, there's nothing qemu can do that will make such older guests notice disc change. Also: if only the guest kernel is updated by qemu is not, things still won't work since qemu will never send valid information for the GET_EVENT_STATUS_NOTIFICATION command. > >However, a hack in the kernel can be used without any QEMU changes > >(revalidate disk on each sr_open() call, irrespective of detecting any > >media change). I'm against doing that for upstream, but downstreams > >could do that for new guest - old hypervisor compat. > > Seriously confused. Please use the kernels "host kernel" and "qemu" > instead of "hypervisor" which is ambiguous. OK: this last bit says that forcefully revalidating discs in the guest kernel when a guest userspace opens the disc will ensure size changes are reflected properly for guest userspace. So in this case, even if we're using an older qemu which doesn't implement GET_EVENT_STATUS_NOTIFICATION, guest userspace apps will work fine. This is obviously a hack. Amit
On 04/05/2011 12:12 PM, Amit Shah wrote: > On (Tue) 05 Apr 2011 [12:00:38], Avi Kivity wrote: > > On 04/05/2011 11:09 AM, Amit Shah wrote: > > >On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote: > > >> On 04/05/2011 09:41 AM, Amit Shah wrote: > > >> >See http://www.spinics.net/lists/linux-scsi/msg51504.html > > >> > > >> I see this is quite fresh. What are the plans here? > > > > > >We're still discussing where the fix should be, but it certainly is a > > >kernel bug and should be fixed there, and then applied to stable. > > > > > >However, there are other bugs in qemu which will prevent the right > > >size changes to be visible in the guest (the RFC series I sent out > > >earlier in this thread need to be applied to QEMU at the least, the > > >series has grown in my development tree since the time I sent that one > > >out). So essentially we need to update both, the hypervisor and the > > >guest to get proper CDROM media change support. > > > > Why do we need to update the guest for a qemu bug? What is the qemu bug? > > Guest kernel bug: CDROM change event missed, so the the revalidate > call isn't made, which causes stale data (like disc size) to be used > on newer media. > > qemu bug: We don't handle the GET_EVENT_STATUS_NOTIFICATION command > from guests (which is a mandatory command acc. to scsi spec) which the > guest uses to detect CDROM changes. Once this command is implemented, > QEMU sends the required info the guest needs to detect CDROM changes. > I have this implemented locally (also sent as RFC PATCH 2/3 in the > 'cdrom bug roundup' thread. > > So: even if qemu is updated to handle this command, the guest won't > work correctly since it misses the event. Okay. We aren't responsible for guest kernel bugs, especially those which apply to real hardware (we should make more effort for virtio bugs). It's enough that we fix qemu here. > > >It also looks like we can't have a workaround in QEMU to get older > > >guests to work. > > > > Older guests? or older hosts? > > Older guests (not patched with fix for the bug described above). > > Since the guest kernel completely misses the disc change event in the > path that does the revalidation, there's nothing qemu can do that will > make such older guests notice disc change. > > Also: if only the guest kernel is updated by qemu is not, things still > won't work since qemu will never send valid information for the > GET_EVENT_STATUS_NOTIFICATION command. > > > >However, a hack in the kernel can be used without any QEMU changes > > >(revalidate disk on each sr_open() call, irrespective of detecting any > > >media change). I'm against doing that for upstream, but downstreams > > >could do that for new guest - old hypervisor compat. > > > > Seriously confused. Please use the kernels "host kernel" and "qemu" > > instead of "hypervisor" which is ambiguous. > > OK: this last bit says that forcefully revalidating discs in the guest > kernel when a guest userspace opens the disc will ensure size changes > are reflected properly for guest userspace. So in this case, even if > we're using an older qemu which doesn't implement > GET_EVENT_STATUS_NOTIFICATION, guest userspace apps will work fine. > > This is obviously a hack. Yes. Thanks for the clarification. (let's see if I really got it - we have a kernel bug that hit both the guest and the host, plus a qemu bug?)
On (Tue) 05 Apr 2011 [12:17:30], Avi Kivity wrote: > On 04/05/2011 12:12 PM, Amit Shah wrote: > >On (Tue) 05 Apr 2011 [12:00:38], Avi Kivity wrote: > >> On 04/05/2011 11:09 AM, Amit Shah wrote: > >> >On (Tue) 05 Apr 2011 [10:48:16], Avi Kivity wrote: > >> >> On 04/05/2011 09:41 AM, Amit Shah wrote: > >> >> >See http://www.spinics.net/lists/linux-scsi/msg51504.html > >> >> > >> >> I see this is quite fresh. What are the plans here? > >> > > >> >We're still discussing where the fix should be, but it certainly is a > >> >kernel bug and should be fixed there, and then applied to stable. > >> > > >> >However, there are other bugs in qemu which will prevent the right > >> >size changes to be visible in the guest (the RFC series I sent out > >> >earlier in this thread need to be applied to QEMU at the least, the > >> >series has grown in my development tree since the time I sent that one > >> >out). So essentially we need to update both, the hypervisor and the > >> >guest to get proper CDROM media change support. > >> > >> Why do we need to update the guest for a qemu bug? What is the qemu bug? > > > >Guest kernel bug: CDROM change event missed, so the the revalidate > >call isn't made, which causes stale data (like disc size) to be used > >on newer media. > > > >qemu bug: We don't handle the GET_EVENT_STATUS_NOTIFICATION command > >from guests (which is a mandatory command acc. to scsi spec) which the > >guest uses to detect CDROM changes. Once this command is implemented, > >QEMU sends the required info the guest needs to detect CDROM changes. > >I have this implemented locally (also sent as RFC PATCH 2/3 in the > >'cdrom bug roundup' thread. > > > >So: even if qemu is updated to handle this command, the guest won't > >work correctly since it misses the event. > > Okay. We aren't responsible for guest kernel bugs, especially those > which apply to real hardware (we should make more effort for virtio > bugs). It's enough that we fix qemu here. > > >> >It also looks like we can't have a workaround in QEMU to get older > >> >guests to work. > >> > >> Older guests? or older hosts? > > > >Older guests (not patched with fix for the bug described above). > > > >Since the guest kernel completely misses the disc change event in the > >path that does the revalidation, there's nothing qemu can do that will > >make such older guests notice disc change. > > > >Also: if only the guest kernel is updated by qemu is not, things still > >won't work since qemu will never send valid information for the > >GET_EVENT_STATUS_NOTIFICATION command. > > > >> >However, a hack in the kernel can be used without any QEMU changes > >> >(revalidate disk on each sr_open() call, irrespective of detecting any > >> >media change). I'm against doing that for upstream, but downstreams > >> >could do that for new guest - old hypervisor compat. > >> > >> Seriously confused. Please use the kernels "host kernel" and "qemu" > >> instead of "hypervisor" which is ambiguous. > > > >OK: this last bit says that forcefully revalidating discs in the guest > >kernel when a guest userspace opens the disc will ensure size changes > >are reflected properly for guest userspace. So in this case, even if > >we're using an older qemu which doesn't implement > >GET_EVENT_STATUS_NOTIFICATION, guest userspace apps will work fine. > > > >This is obviously a hack. > > Yes. Thanks for the clarification. > > (let's see if I really got it - we have a kernel bug that hit both > the guest and the host, plus a qemu bug?) Yes -- but just that we have many more qemu bugs. Our cdrom emulation has a lot of holes when it comes to being spec-compliant. I have a few fixes, Markus is working on some as well. Amit
On Tue, Apr 5, 2011 at 6:42 AM, David Ahern <dsahern@gmail.com> wrote: > On 04/04/11 23:33, Stefan Hajnoczi wrote: >> On Mon, Apr 4, 2011 at 6:54 PM, David Ahern <dsahern@gmail.com> wrote: >>> On 04/04/11 07:38, Anthony Liguori wrote: >>>> On 04/04/2011 08:22 AM, Avi Kivity wrote: >>>>> On 04/03/2011 02:57 PM, Stefan Hajnoczi wrote: >>>>>> In order for media change to work with Linux host CD-ROM it is >>>>>> necessary to reopen the file (otherwise the inode size will not >>>>>> refresh, this is an issue with existing kernels). >>>>>> >>>>> >>>>> Maybe we should fix the bug in Linux (and backport as necessary)? >>>>> >>>>> I think cd-rom assignment is sufficiently obscure that we can require >>>>> a fixed kernel instead of providing a workaround. >>>> >>>> Do reads fail after CD change? Or do they succeed and the size is just >>>> reported incorrectly? >>>> >>>> If it's the later, I'd agree that it needs fixing in the kernel. If >>>> it's the former, I'd say it's clearly a feature. >>> >>> In January 2010 I was seeing old data -- data from the prior CD -- in >>> the guest after the media change. >> >> Yikes. Is there a bug report for this? What are the steps to reproduce it? > > Not that I know of. It is reported by someone else last year as well: > http://www.mail-archive.com/kvm@vger.kernel.org/msg32999.html Thanks for the link. That looks like typical symptoms of missing media change (which can also legitimately happen if the guest does not poll). This should be fixed once the CD-ROM fixes are merged into QEMU and Linux. Stefan
On (Tue) 05 Apr 2011 [12:17:30], Avi Kivity wrote: > >Guest kernel bug: CDROM change event missed, so the the revalidate > >call isn't made, which causes stale data (like disc size) to be used > >on newer media. > > > >qemu bug: We don't handle the GET_EVENT_STATUS_NOTIFICATION command > >from guests (which is a mandatory command acc. to scsi spec) which the > >guest uses to detect CDROM changes. Once this command is implemented, > >QEMU sends the required info the guest needs to detect CDROM changes. > >I have this implemented locally (also sent as RFC PATCH 2/3 in the > >'cdrom bug roundup' thread. > > > >So: even if qemu is updated to handle this command, the guest won't > >work correctly since it misses the event. > > Okay. We aren't responsible for guest kernel bugs, especially those > which apply to real hardware (we should make more effort for virtio > bugs). It's enough that we fix qemu here. Actually I forgot about this already: it is possible for a workaround in qemu: inject the event twice instead of once. ie, if the spec says 'SENSE_UNIT_ATTENTION' needs to be reset if some cmd is successful, don't reset it the first time. That's how I confirmed the kernel bug last week. It is such a horrible hack that I didn't want to keep it around at all :-) Amit
diff --git a/block/raw-posix.c b/block/raw-posix.c index 6b72470..8b5205c 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -1238,10 +1238,28 @@ static int cdrom_is_inserted(BlockDriverState *bs) BDRVRawState *s = bs->opaque; int ret; - ret = ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT); - if (ret == CDS_DISC_OK) - return 1; - return 0; + /* + * Close the file descriptor if no medium is present and open it to poll + * again. This ensures the medium size is refreshed. If the file + * descriptor is kept open the size can become stale. This is essentially + * replicating CD-ROM polling but is driven by the guest. As the guest + * polls, we poll the host. + */ + + if (s->fd == -1) { + s->fd = qemu_open(bs->filename, s->open_flags, 0644); + if (s->fd < 0) { + return 0; + } + } + + ret = (ioctl(s->fd, CDROM_DRIVE_STATUS, CDSL_CURRENT) == CDS_DISC_OK); + + if (!ret) { + close(s->fd); + s->fd = -1; + } + return ret; } static int cdrom_eject(BlockDriverState *bs, int eject_flag)
Piggy-back on the guest CD-ROM polling to poll on the host. Open and close the host CD-ROM file descriptor to ensure we read the new size and not a stale size. Two things are going on here: 1. If hald/udisks is not already polling CD-ROMs on the host then re-opening the CD-ROM causes the host to read the new medium's size. 2. There is a bug in Linux which means the CD-ROM file descriptor must be re-opened in order for lseek(2) to see the new size. The inode size gets out of sync with the underlying device (which you can confirm by checking that /sys/block/sr0/size and lseek(2) do not match after media change). I have raised this with the maintainers but we need a workaround for the foreseeable future. Note that these changes are all in a #ifdef __linux__ section. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> --- block/raw-posix.c | 26 ++++++++++++++++++++++---- 1 files changed, 22 insertions(+), 4 deletions(-)