Message ID | 20090907181436.GA8538@redhat.com |
---|---|
State | Superseded |
Headers | show |
Michael S. Tsirkin wrote: > commit bf011293faaa7f87e4de83185931e7411b794128 made virtio-blk-pci not > PCI-compliant, since it makes region 0 (which is an i/o region) > size > 256, and, since PCI 2.1, i/o regions are limited to 256 bytes size. > > When the ATA serial number feature is off, which is the default, > make the device spec compliant again, by making region 0 smaller. I'd hazard this is the cause of the breakage others encountered -- even when the driver was initialized but unused. For some odd reason I hadn't seen nor been able to reproduce the failure. The mock-up of an entire ATA IDENTIFY page is really overkill for what we're trying to accomplish here, namely passing a 20 byte S/N from qemu to the guest. However emulating and passing an IDENTIFY page allows guest apps to interpret the information via an existing interface, with the guest driver doing nothing more than transferring the data as opaque. During review, other defined fields of the IDENTIFY page were speculated to be potentially useful thus the entire 512 byte page was passed wholesale. But it is clearly more trouble than benefit at this point. I'll rework the patch or use an alternate mechanism. -john
On Tue, Sep 08, 2009 at 03:40:08AM -0400, john cooper wrote: > Michael S. Tsirkin wrote: > > commit bf011293faaa7f87e4de83185931e7411b794128 made virtio-blk-pci not > > PCI-compliant, since it makes region 0 (which is an i/o region) > > size > 256, and, since PCI 2.1, i/o regions are limited to 256 bytes size. > > > > When the ATA serial number feature is off, which is the default, > > make the device spec compliant again, by making region 0 smaller. > > I'd hazard this is the cause of the breakage others > encountered -- even when the driver was initialized > but unused. For some odd reason I hadn't seen nor > been able to reproduce the failure. > > The mock-up of an entire ATA IDENTIFY page is really > overkill for what we're trying to accomplish here, > namely passing a 20 byte S/N from qemu to the guest. > However emulating and passing an IDENTIFY page allows > guest apps to interpret the information via an > existing interface, with the guest driver doing nothing > more than transferring the data as opaque. During > review, other defined fields of the IDENTIFY page were > speculated to be potentially useful thus the entire > 512 byte page was passed wholesale. But it is clearly > more trouble than benefit at this point. I'll rework > the patch or use an alternate mechanism. BTW, will the change you have in mind affect the guest driver as well? If yes, maybe revert 1d589bb16b825b3a7b4edd34d997f1f1f953033d in linux before 2.6.31 is out? Otherwise we'll be stuck supporting the bad interface from upstream 2.6.31 in upstream qemu ... > -john > > -- > john.cooper@redhat.com
On Tue, Sep 08, 2009 at 03:40:08AM -0400, john cooper wrote: > Michael S. Tsirkin wrote: > > commit bf011293faaa7f87e4de83185931e7411b794128 made virtio-blk-pci not > > PCI-compliant, since it makes region 0 (which is an i/o region) > > size > 256, and, since PCI 2.1, i/o regions are limited to 256 bytes size. > > > > When the ATA serial number feature is off, which is the default, > > make the device spec compliant again, by making region 0 smaller. > > I'd hazard this is the cause of the breakage others > encountered -- even when the driver was initialized > but unused. For some odd reason I hadn't seen nor > been able to reproduce the failure. > > The mock-up of an entire ATA IDENTIFY page is really > overkill for what we're trying to accomplish here, > namely passing a 20 byte S/N from qemu to the guest. > However emulating and passing an IDENTIFY page allows > guest apps to interpret the information via an > existing interface, with the guest driver doing nothing > more than transferring the data as opaque. During > review, other defined fields of the IDENTIFY page were > speculated to be potentially useful thus the entire > 512 byte page was passed wholesale. But it is clearly > more trouble than benefit at this point. I'll rework > the patch or use an alternate mechanism. > > -john For now, what my patch does is fix the PCI compliance issue for everyone who uses the default setup (without the serial #). With serial disabled, we should not reserve any space in region 0 (not even 20 bytes). So I propose we apply this patch for now, and then your patch only has to handle the case of serial number enabled. Makes sense? If yes pls ack. > -- > john.cooper@redhat.com
Michael S. Tsirkin wrote: > For now, what my patch does is fix the PCI compliance issue for everyone > who uses the default setup (without the serial #). With serial disabled, > we should not reserve any space in region 0 (not even 20 bytes). > > So I propose we apply this patch for now, and then your patch only has > to handle the case of serial number enabled. Makes sense? If yes pls > ack. Sorry, I intended to do so in my previous mail. Looks fine, ACK. -john
On Tue, 8 Sep 2009 05:28:31 pm Michael S. Tsirkin wrote: > On Tue, Sep 08, 2009 at 03:40:08AM -0400, john cooper wrote: > > Michael S. Tsirkin wrote: > > > commit bf011293faaa7f87e4de83185931e7411b794128 made virtio-blk-pci not > > > PCI-compliant, since it makes region 0 (which is an i/o region) > > > size > 256, and, since PCI 2.1, i/o regions are limited to 256 bytes size. > > > > > > When the ATA serial number feature is off, which is the default, > > > make the device spec compliant again, by making region 0 smaller. > > > > I'd hazard this is the cause of the breakage others > > encountered -- even when the driver was initialized > > but unused. For some odd reason I hadn't seen nor > > been able to reproduce the failure. > > > > The mock-up of an entire ATA IDENTIFY page is really > > overkill for what we're trying to accomplish here, > > namely passing a 20 byte S/N from qemu to the guest. > > However emulating and passing an IDENTIFY page allows > > guest apps to interpret the information via an > > existing interface, with the guest driver doing nothing > > more than transferring the data as opaque. During > > review, other defined fields of the IDENTIFY page were > > speculated to be potentially useful thus the entire > > 512 byte page was passed wholesale. But it is clearly > > more trouble than benefit at this point. I'll rework > > the patch or use an alternate mechanism. > > BTW, will the change you have in mind affect the guest driver as well? > If yes, maybe revert 1d589bb16b825b3a7b4edd34d997f1f1f953033d in linux > before 2.6.31 is out? Otherwise we'll be stuck supporting the bad > interface from upstream 2.6.31 in upstream qemu ... Too late, but no big deal. We just add another feature bit for the new method. We can have a command virtqueue for queries such as this. (Interestingly, this same issue broke lguest, with the 8-bit limit on configuration space sizes). Thanks, Rusty.
Rusty Russell wrote: > Too late, but no big deal. We just add another feature bit for the > new method. We can have a command virtqueue for queries such as this. > > (Interestingly, this same issue broke lguest, with the 8-bit limit on > configuration space sizes). I believe the existing guest visible interface is preservable as-is without introducing any ATA specific knowledge into the driver. A trivial alternate R/O mapping for the page into the guest is all that is required. I ran out of time last week to tie this off but hopefully will do so shortly. Thanks, -john
On Mon, 14 Sep 2009 09:09:59 pm Michael S. Tsirkin wrote: > On Tue, Sep 08, 2009 at 03:40:08AM -0400, john cooper wrote: > > Michael S. Tsirkin wrote: > > > commit bf011293faaa7f87e4de83185931e7411b794128 made virtio-blk-pci not > > > PCI-compliant, since it makes region 0 (which is an i/o region) > > > size > 256, and, since PCI 2.1, i/o regions are limited to 256 bytes size. > > > > > > When the ATA serial number feature is off, which is the default, > > > make the device spec compliant again, by making region 0 smaller. > > > > I'd hazard this is the cause of the breakage others > > encountered -- even when the driver was initialized > > but unused. For some odd reason I hadn't seen nor > > been able to reproduce the failure. > > > > The mock-up of an entire ATA IDENTIFY page is really > > overkill for what we're trying to accomplish here, > > namely passing a 20 byte S/N from qemu to the guest. > > However emulating and passing an IDENTIFY page allows > > guest apps to interpret the information via an > > existing interface, with the guest driver doing nothing > > more than transferring the data as opaque. During > > review, other defined fields of the IDENTIFY page were > > speculated to be potentially useful thus the entire > > 512 byte page was passed wholesale. But it is clearly > > more trouble than benefit at this point. I'll rework > > the patch or use an alternate mechanism. > > > > -john > > For now, what my patch does is fix the PCI compliance issue for everyone > who uses the default setup (without the serial #). With serial disabled, > we should not reserve any space in region 0 (not even 20 bytes). > > So I propose we apply this patch for now, and then your patch only has > to handle the case of serial number enabled. Makes sense? If yes pls > ack. lguest uses the same workaround (since we don't support that feature anyway at the moment). Thanks! Rusty.
On 09/21/2009 06:47 PM, john cooper wrote: > Rusty Russell wrote: > > >> Too late, but no big deal. We just add another feature bit for the >> new method. We can have a command virtqueue for queries such as this. >> >> (Interestingly, this same issue broke lguest, with the 8-bit limit on >> configuration space sizes). >> > I believe the existing guest visible interface > is preservable as-is without introducing any > ATA specific knowledge into the driver. A > trivial alternate R/O mapping for the page > into the guest is all that is required. > > I ran out of time last week to tie this off > but hopefully will do so shortly. > > Can we just read this page as a virtqueue command instead of having it mapped permanently?
Avi Kivity wrote: > On 09/21/2009 06:47 PM, john cooper wrote: >> Rusty Russell wrote: >> >> >>> Too late, but no big deal. We just add another feature bit for the >>> new method. We can have a command virtqueue for queries such as this. >>> >>> (Interestingly, this same issue broke lguest, with the 8-bit limit on >>> configuration space sizes). >>> >> I believe the existing guest visible interface >> is preservable as-is without introducing any >> ATA specific knowledge into the driver. A >> trivial alternate R/O mapping for the page >> into the guest is all that is required. >> >> I ran out of time last week to tie this off >> but hopefully will do so shortly. >> >> > > Can we just read this page as a virtqueue command instead of having it > mapped permanently? Probably although I hadn't looked specifically at doing so. Mapping the data via an unused pci bar is pretty trivial and seemed minimally intrusive to the existing driver. -john
On 09/22/2009 05:21 PM, john cooper wrote: >> Can we just read this page as a virtqueue command instead of having it >> mapped permanently? >> > Probably although I hadn't looked specifically > at doing so. Mapping the data via an unused > pci bar is pretty trivial and seemed minimally > intrusive to the existing driver. > We'll run out of bars if we expend them like that.
On Tue, Sep 22, 2009 at 05:27:22PM +0300, Avi Kivity wrote: > On 09/22/2009 05:21 PM, john cooper wrote: >>> Can we just read this page as a virtqueue command instead of having it >>> mapped permanently? >>> >> Probably although I hadn't looked specifically >> at doing so. Mapping the data via an unused >> pci bar is pretty trivial and seemed minimally >> intrusive to the existing driver. >> > > We'll run out of bars if we expend them like that. BAR1 is used for MSI-X already (if MSI-X is enabled), you can just add your data there. I would report the offset within the BAR somewhere in io space or in configuration space, so that we can relocate tables later if we like without changing guests. > -- > error compiling committee.c: too many arguments to function
On 09/22/2009 05:41 PM, Michael S. Tsirkin wrote: >>> >>> Probably although I hadn't looked specifically >>> at doing so. Mapping the data via an unused >>> pci bar is pretty trivial and seemed minimally >>> intrusive to the existing driver. >>> >>> >> We'll run out of bars if we expend them like that. >> > BAR1 is used for MSI-X already (if MSI-X is enabled), you can just add > your data there. I would report the offset within the BAR somewhere > in io space or in configuration space, so that we can relocate tables > later if we like without changing guests. > I'd really like to avoid such entanglements.
Avi Kivity wrote: > On 09/22/2009 05:21 PM, john cooper wrote: >>> Can we just read this page as a virtqueue command instead of having it >>> mapped permanently? >>> >> Probably although I hadn't looked specifically >> at doing so. Mapping the data via an unused >> pci bar is pretty trivial and seemed minimally >> intrusive to the existing driver. >> > > We'll run out of bars if we expend them like that. Agreed. However my motivation here was to use a single additional bar specifically to compensate for the PCI spec imposed 256 byte size limitation of the config space mapping. As we're defining the content/size of this area, future use to accommodate additional data should be unrestricted. If using an additional bar is still a concern I can certainly have a look at accomplishing the same via virtqueue scaffolding. -john
john cooper wrote: > Avi Kivity wrote: > >> On 09/22/2009 05:21 PM, john cooper wrote: >> >>>> Can we just read this page as a virtqueue command instead of having it >>>> mapped permanently? >>>> >>>> >>> Probably although I hadn't looked specifically >>> at doing so. Mapping the data via an unused >>> pci bar is pretty trivial and seemed minimally >>> intrusive to the existing driver. >>> >>> >> We'll run out of bars if we expend them like that. >> > > Agreed. However my motivation here was to use a > single additional bar specifically to compensate > for the PCI spec imposed 256 byte size limitation > of the config space mapping. As we're defining the > content/size of this area, future use to accommodate > additional data should be unrestricted. > Why expose the whole ATAPI page instead of just the serial number? I think the proper solution is to move the config to a separate bar that's MMIO instead of PIO. config access is never performance sensitive and an MMIO bar has less restrictions on size. Regards, Anthony Liguori
Anthony Liguori wrote: > john cooper wrote: >> Avi Kivity wrote: >> >>> On 09/22/2009 05:21 PM, john cooper wrote: >>> >>>>> Can we just read this page as a virtqueue command instead of having it >>>>> mapped permanently? >>>>> >>>> Probably although I hadn't looked specifically >>>> at doing so. Mapping the data via an unused >>>> pci bar is pretty trivial and seemed minimally >>>> intrusive to the existing driver. >>>> >>> We'll run out of bars if we expend them like that. >>> >> >> Agreed. However my motivation here was to use a >> single additional bar specifically to compensate >> for the PCI spec imposed 256 byte size limitation >> of the config space mapping. As we're defining the >> content/size of this area, future use to accommodate >> additional data should be unrestricted. >> > > Why expose the whole ATAPI page instead of just the serial number? Exposing the s/n alone was what I'd done originally which in retrospect was well contained within the pci config area -- even if squandering the somewhat limited space in that area. However exposing it as an identify page allows use of the existing HDIO_GET_IDENTITY ioctl (and existing use of that interface, eg: "hdparm -i") as a means to extract the s/n along with potentially useful related data in the page. Creation of the identify structure is within qemu as that is where the information exists. And this allows the virtio_blk driver to treat the data as opaque, simply exporting it uninterpreted via a copyout. So the only real issue remaining is how to best saw a hole between qemu and the guest virtio_blk driver to transfer the identity data. > I think the proper solution is to move the config to a separate bar > that's MMIO instead of PIO. config access is never performance > sensitive and an MMIO bar has less restrictions on size. That's exactly what I've done. I'll clean up the patch and post it here so we can have a more concrete discussion. -john
This is a correction to the previous version which violated the PCI config space 256 byte limitation. The crux of the feature is simply passing a qemu received virtio serial number to/through the guest driver and making it available to guest userspace via a preexisting interface. To accomplish this, ATA IDENTIFY data as implemented by the HDIO_GET_IDENTITY ioctl is returned to the caller for the virtio_blk drive of interest. Content of this identify structure is created by qemu and passed wholesale to guest userspace without interpretation by the guest driver. The change this patch implements is passing of the identify data through a mapping established by PCI BAR #5 rather than the PCI config area, the latter of which resulted in the above breakage. virtio_blk with this patch now pulls the information from the mapped bar area and includes minimal scaffolding to help map bars external to virtio_pci. -john
On Tue, Sep 29, 2009 at 02:09:30AM -0400, john cooper wrote: > This is a correction to the previous version > which violated the PCI config space 256 byte > limitation. > > The crux of the feature is simply passing a qemu > received virtio serial number to/through the guest > driver and making it available to guest userspace > via a preexisting interface. > > To accomplish this, ATA IDENTIFY data as implemented > by the HDIO_GET_IDENTITY ioctl is returned to the > caller for the virtio_blk drive of interest. Content > of this identify structure is created by qemu and > passed wholesale to guest userspace without > interpretation by the guest driver. > > The change this patch implements is passing of the > identify data through a mapping established by > PCI BAR #5 rather than the PCI config area, the > latter of which resulted in the above breakage. Using bar per feature we'll quickly run out of BARs. We already use a BAR for MSI-X - let's add ATA identity there? > virtio_blk with this patch now pulls the information > from the mapped bar area and includes minimal > scaffolding to help map bars external to virtio_pci. > > -john > > > -- > john.cooper@redhat.com
On 09/29/2009 08:58 AM, Michael S. Tsirkin wrote: > Using bar per feature we'll quickly run out of BARs. > We already use a BAR for MSI-X - let's add > ATA identity there? > Mixing unrelated features will quickly cause confusion.
On Tue, Sep 29, 2009 at 09:22:53AM +0200, Avi Kivity wrote: > On 09/29/2009 08:58 AM, Michael S. Tsirkin wrote: >> Using bar per feature we'll quickly run out of BARs. >> We already use a BAR for MSI-X - let's add >> ATA identity there? >> > Mixing unrelated features will quickly cause confusion. Note if we ever switch to 64 bit BARs, we'll only have 3 of these available, so the ID would be using the last free one. We can just as well plan ahead for when we'll add another feature? I agree fixed offsets are messy though. I previously proposed storing the offset to the identity data in an i/o register. This will let each implementation lay out this data in an optimal manner, without confusion. > -- > Do not meddle in the internals of kernels, for they are subtle and quick to panic.
On 09/29/2009 10:54 AM, Michael S. Tsirkin wrote: > On Tue, Sep 29, 2009 at 09:22:53AM +0200, Avi Kivity wrote: > >> On 09/29/2009 08:58 AM, Michael S. Tsirkin wrote: >> >>> Using bar per feature we'll quickly run out of BARs. >>> We already use a BAR for MSI-X - let's add >>> ATA identity there? >>> >>> >> Mixing unrelated features will quickly cause confusion. >> > Note if we ever switch to 64 bit BARs, we'll only have 3 of these > available, so the ID would be using the last free one. > We can just as well plan ahead for when we'll add another feature? > > I agree fixed offsets are messy though. I previously proposed storing > the offset to the identity data in an i/o register. This will let each > implementation lay out this data in an optimal manner, without > confusion. > Yes, that works much better.
john cooper wrote: > This is a correction to the previous version > which violated the PCI config space 256 byte > limitation. > > The crux of the feature is simply passing a qemu > received virtio serial number to/through the guest > driver and making it available to guest userspace > via a preexisting interface. > > To accomplish this, ATA IDENTIFY data as implemented > by the HDIO_GET_IDENTITY ioctl is returned to the > caller for the virtio_blk drive of interest. Content > of this identify structure is created by qemu and > passed wholesale to guest userspace without > interpretation by the guest driver. > > The change this patch implements is passing of the > identify data through a mapping established by > PCI BAR #5 rather than the PCI config area, the > latter of which resulted in the above breakage. > This is a massive layering violation. The virtio-blk ABI cannot make demands of the transport. The better solution would be to move the entire virtio-pci config space to a separate BAR that's an MMIO region. Then there is no practical limit on the size of the config area. Regards, Anthony Liguori
Michael S. Tsirkin wrote: > On Tue, Sep 29, 2009 at 09:22:53AM +0200, Avi Kivity wrote: > >> On 09/29/2009 08:58 AM, Michael S. Tsirkin wrote: >> >>> Using bar per feature we'll quickly run out of BARs. >>> We already use a BAR for MSI-X - let's add >>> ATA identity there? >>> >>> >> Mixing unrelated features will quickly cause confusion. >> > > Note if we ever switch to 64 bit BARs, we'll only have 3 of these > available, so the ID would be using the last free one. > We can just as well plan ahead for when we'll add another feature? > > I agree fixed offsets are messy though. I previously proposed storing > the offset to the identity data in an i/o register. This will let each > implementation lay out this data in an optimal manner, without > confusion. > 1) There's no need to make the identity page separate from the config space. The real problem is the config space is too small. We can fix that without making any changes to virtio-blk by putting the config space in a separate BAR. 2) Passing an ATA identity page is goofy. We should just pass the serial number and let Linux generate the identity page. Just because Linux requires this as it's user space interface, that doesn't mean that other guests will (like Windows). Instead of exposing an opaque blob, we should expose the information we need in a structured way. Regards, Anthony Liguori
On Tue, Sep 29, 2009 at 08:55:20AM -0500, Anthony Liguori wrote: > Michael S. Tsirkin wrote: >> On Tue, Sep 29, 2009 at 09:22:53AM +0200, Avi Kivity wrote: >> >>> On 09/29/2009 08:58 AM, Michael S. Tsirkin wrote: >>> >>>> Using bar per feature we'll quickly run out of BARs. >>>> We already use a BAR for MSI-X - let's add >>>> ATA identity there? >>>> >>> Mixing unrelated features will quickly cause confusion. >>> >> >> Note if we ever switch to 64 bit BARs, we'll only have 3 of these >> available, so the ID would be using the last free one. >> We can just as well plan ahead for when we'll add another feature? >> >> I agree fixed offsets are messy though. I previously proposed storing >> the offset to the identity data in an i/o register. This will let each >> implementation lay out this data in an optimal manner, without >> confusion. >> > > 1) There's no need to make the identity page separate from the config > space. The real problem is the config space is too small. We can fix > that without making any changes to virtio-blk by putting the config > space in a separate BAR. I don't think it's such a good idea. We want to keep exposing most of config space in i/o bar because of backwards compatibility. And we want to keep datapath operations such as vq kicks, in i/o space. > 2) Passing an ATA identity page is goofy. We should just pass the > serial number and let Linux generate the identity page. Just because > Linux requires this as it's user space interface, that doesn't mean that > other guests will (like Windows). Instead of exposing an opaque blob, > we should expose the information we need in a structured way. > > Regards, > > Anthony Liguori
Michael S. Tsirkin wrote: > I don't think it's such a good idea. We want to keep exposing most of > config space in i/o bar because of backwards compatibility. And we want > to keep datapath operations such as vq kicks, in i/o space. > We can do feature negotiation in virtio-pci. That lets us continue exposing the first 246 bytes of the config space in PIO and guests can negotiate a second bar for access to larger config spaces. Regards, Anthony Liguori
On 09/29/2009 03:51 PM, Anthony Liguori wrote: >> The change this patch implements is passing of the >> identify data through a mapping established by >> PCI BAR #5 rather than the PCI config area, the >> latter of which resulted in the above breakage. > > > This is a massive layering violation. The virtio-blk ABI cannot make > demands of the transport. True. > > The better solution would be to move the entire virtio-pci config > space to a separate BAR that's an MMIO region. Then there is no > practical limit on the size of the config area. Don't some fast-paths accesses go through the config space? Using mmio will slow them down.
On 09/29/2009 04:14 PM, Anthony Liguori wrote: > Michael S. Tsirkin wrote: >> I don't think it's such a good idea. We want to keep exposing most of >> config space in i/o bar because of backwards compatibility. And we want >> to keep datapath operations such as vq kicks, in i/o space. > > We can do feature negotiation in virtio-pci. That lets us continue > exposing the first 246 bytes of the config space in PIO and guests can > negotiate a second bar for access to larger config spaces. Shouldn't the guest's pci layer automatically adapt to what the card throws at it? Of course we need to keep migration compatibility so we need to keep pio.
On Tue, Sep 29, 2009 at 09:14:56AM -0500, Anthony Liguori wrote: > Michael S. Tsirkin wrote: >> I don't think it's such a good idea. We want to keep exposing most of >> config space in i/o bar because of backwards compatibility. And we want >> to keep datapath operations such as vq kicks, in i/o space. >> > > We can do feature negotiation in virtio-pci. That lets us continue > exposing the first 246 bytes of the config space in PIO and guests can > negotiate a second bar for access to larger config spaces. > > Regards, > > Anthony Liguori guests can't negotiate a bar. What I think we want to do - Use i/o for small fields (2-4 bytes) as we did previously we won't run out anytime soon - For large fields, put them in memory BAR1 and report the offset in i/o space - For huge fields, provide a mailbox in i/o space where guest writes an offset and reads out a value
Avi Kivity wrote: > On 09/29/2009 03:51 PM, Anthony Liguori wrote: >>> The change this patch implements is passing of the >>> identify data through a mapping established by >>> PCI BAR #5 rather than the PCI config area, the >>> latter of which resulted in the above breakage. >> >> >> This is a massive layering violation. The virtio-blk ABI cannot make >> demands of the transport. > > True. > >> >> The better solution would be to move the entire virtio-pci config >> space to a separate BAR that's an MMIO region. Then there is no >> practical limit on the size of the config area. > > Don't some fast-paths accesses go through the config space? Using > mmio will slow them down. Not that I know of. Regards, Anthony Liguori
Michael S. Tsirkin wrote: > On Tue, Sep 29, 2009 at 09:14:56AM -0500, Anthony Liguori wrote: > >> Michael S. Tsirkin wrote: >> >>> I don't think it's such a good idea. We want to keep exposing most of >>> config space in i/o bar because of backwards compatibility. And we want >>> to keep datapath operations such as vq kicks, in i/o space. >>> >>> >> We can do feature negotiation in virtio-pci. That lets us continue >> exposing the first 246 bytes of the config space in PIO and guests can >> negotiate a second bar for access to larger config spaces. >> >> Regards, >> >> Anthony Liguori >> > > guests can't negotiate a bar. > The host always exposes the config in two places. config[0..235] in bytes 20-255 in BAR0 and the full config space in BAR2 (or BAR1 + offset if desired). Old drivers will just ignore the new config location. > What I think we want to do > - Use i/o for small fields (2-4 bytes) as we did previously > we won't run out anytime soon > - For large fields, put them in memory BAR1 and report the > offset in i/o space > This is at best a band aid as a large number of small fields can result in the same problem. > - For huge fields, provide a mailbox in i/o space where guest > writes an offset and reads out a value > Regards, Anthony Liguori
On Tue, 29 Sep 2009 11:25:20 pm Anthony Liguori wrote: > 2) Passing an ATA identity page is goofy. We should just pass the > serial number and let Linux generate the identity page. Just because > Linux requires this as it's user space interface, that doesn't mean that > other guests will (like Windows). Instead of exposing an opaque blob, > we should expose the information we need in a structured way. I think John did this on my prompting, because it already existed as a defined ABI. If someone is going to make stuff up, isn't it better that kvm does it (some of those fields might have useful values?) than linux? (I don't care: my patches stick with the current scheme, but changing is fairly easy, but needs to be decided before commit). Thanks, Rusty,
Rusty Russell wrote: > On Tue, 29 Sep 2009 11:25:20 pm Anthony Liguori wrote: > >> 2) Passing an ATA identity page is goofy. We should just pass the >> serial number and let Linux generate the identity page. Just because >> Linux requires this as it's user space interface, that doesn't mean that >> other guests will (like Windows). Instead of exposing an opaque blob, >> we should expose the information we need in a structured way. >> > > I think John did this on my prompting, because it already existed as a > defined ABI. If someone is going to make stuff up, isn't it better that > kvm does it (some of those fields might have useful values?) than linux? > We should expose Linux ABI quirks in the virtio ABI. Down the road, someone may decide that there's a better way to expose things like S/N down to userspace and then in our drivers we would have to parse the ATA identity page in order to figure out the S/N to expose that properly through the new interface. Moreover, there's probably an OS out there that we already have to do that for. That's a pretty ugly thing to have to do. > (I don't care: my patches stick with the current scheme, but changing is > fairly easy, but needs to be decided before commit). > Everything is tremendously simplified if we just expose the S/N in the ABI since we can avoid solving the config size limitation. I'm a big fan of deferring difficult problems when there's an immediate easy solution :-) > Thanks, > Rusty, >
On Tue, Sep 29, 2009 at 12:26:13PM -0500, Anthony Liguori wrote: > Michael S. Tsirkin wrote: >> On Tue, Sep 29, 2009 at 09:14:56AM -0500, Anthony Liguori wrote: >> >>> Michael S. Tsirkin wrote: >>> >>>> I don't think it's such a good idea. We want to keep exposing most of >>>> config space in i/o bar because of backwards compatibility. And we want >>>> to keep datapath operations such as vq kicks, in i/o space. >>>> >>> We can do feature negotiation in virtio-pci. That lets us continue >>> exposing the first 246 bytes of the config space in PIO and guests >>> can negotiate a second bar for access to larger config spaces. >>> >>> Regards, >>> >>> Anthony Liguori >>> >> >> guests can't negotiate a bar. >> > > The host always exposes the config in two places. config[0..235] in > bytes 20-255 in BAR0 and the full config space in BAR2 (or BAR1 + offset > if desired). Old drivers will just ignore the new config location. > >> What I think we want to do >> - Use i/o for small fields (2-4 bytes) as we did previously >> we won't run out anytime soon >> - For large fields, put them in memory BAR1 and report the >> offset in i/o space >> > > This is at best a band aid as a large number of small fields can result > in the same problem. It would have to be very large :) >> - For huge fields, provide a mailbox in i/o space where guest >> writes an offset and reads out a value >> > > Regards, > > Anthony Liguori
Anthony Liguori wrote: > 2) Passing an ATA identity page is goofy. We should just pass the > serial number and let Linux generate the identity page. I have a hard time disagreeing with that comment. But the rationale that got us here was from the user's perspective this information could be trivially exposed via a preexisting interface vs. creating a new interface as I'd done initially. It was also pointed out other defined fields in the identify page exist which were potentially useful. So pressing a HDIO_GET_IDENTITY ioctl into service here didn't seem unreasonable. In terms of getting that bag of bits from qemu to the guest userspace, having qemu cobble together the page is arguably appropriate as that it where the source of the information resides. The alternative of pushing knowledge of this structure into the guest driver seems unnatural. Moreover we would still somehow need a means to get the data from qemu to the driver even if the driver was only packaging it up for export as an identify page. > Just because > Linux requires this as it's user space interface, that doesn't mean that > other guests will (like Windows). Instead of exposing an opaque blob, > we should expose the information we need in a structured way. The identify page is defined as the return of an "identify drive" command from an ATA drive/controller. I'd hazard it is a familiar structure in a windows driver/userland context as well. -john
john cooper wrote: > Anthony Liguori wrote: > >> 2) Passing an ATA identity page is goofy. We should just pass the >> serial number and let Linux generate the identity page. > > I have a hard time disagreeing with that comment. > > But the rationale that got us here was from the > user's perspective this information could be > trivially exposed via a preexisting interface vs. > creating a new interface as I'd done initially. > It was also pointed out other defined fields in > the identify page exist which were potentially > useful. So pressing a HDIO_GET_IDENTITY ioctl > into service here didn't seem unreasonable. virtio-blk isn't an ATA device so why pretend like it is? We can present the information we think is useful to the guest in a concise format and then if the guest wants to pretend it's an ATA device, it can create the ATA identify page on the fly. >> Just because Linux requires this as it's user space interface, that >> doesn't mean that other guests will (like Windows). Instead of >> exposing an opaque blob, we should expose the information we need in >> a structured way. > > The identify page is defined as the return of an > "identify drive" command from an ATA drive/controller. > I'd hazard it is a familiar structure in a windows > driver/userland context as well. Regards, Anthony Liguori
On Wed, 30 Sep 2009 03:01:12 am Anthony Liguori wrote: > Rusty Russell wrote: > > On Tue, 29 Sep 2009 11:25:20 pm Anthony Liguori wrote: > > > >> 2) Passing an ATA identity page is goofy. We should just pass the > >> serial number and let Linux generate the identity page. Just because > >> Linux requires this as it's user space interface, that doesn't mean that > >> other guests will (like Windows). Instead of exposing an opaque blob, > >> we should expose the information we need in a structured way. > >> > > > > I think John did this on my prompting, because it already existed as a > > defined ABI. If someone is going to make stuff up, isn't it better that > > kvm does it (some of those fields might have useful values?) than linux? > > We should expose Linux ABI quirks in the virtio ABI. Down the road, > someone may decide that there's a better way to expose things like S/N > down to userspace and then in our drivers we would have to parse the ATA > identity page in order to figure out the S/N to expose that properly > through the new interface. Moreover, there's probably an OS out there > that we already have to do that for. > > That's a pretty ugly thing to have to do. Calling the ATA IDENTIFY command a Linux ABI quirk is disingenuous. Not sure what the SCSI equiv is to get a serial number (VPD?) But if everyone is happy with 20 bytes, and John wants to do SN that way, I'll take the patch. Still, I got to learn fun stuff about the block layer! Rusty.
On Wed, 30 Sep 2009 06:25:39 am Anthony Liguori wrote:
> virtio-blk isn't an ATA device so why pretend like it is?
Do you want to maintain a "vdparm" to match hdparm and sdparm?
Confused,
Rusty.
Rusty Russell wrote: > Calling the ATA IDENTIFY command a Linux ABI quirk is disingenuous. > > Not sure what the SCSI equiv is to get a serial number (VPD?) But if > everyone is happy with 20 bytes, and John wants to do SN that way, > I'll take the patch. One thing I'd hope to see at some point is a "read-only" flag, set for read-only images. I don't know if ATA IDENTIFY has that yet, but I'm sure the SCSI equivalent does. -- Jamie
Rusty Russell wrote: > On Wed, 30 Sep 2009 06:25:39 am Anthony Liguori wrote: > >> virtio-blk isn't an ATA device so why pretend like it is? >> > > Do you want to maintain a "vdparm" to match hdparm and sdparm? > That's an argument to implement the HDIO_GET_IDENTITY ioctl in the Linux virtio-blk driver. That's a totally reasonable thing to do. However, baking something into the virtio-blk ABI because Linux requires an hdparm, sdparm, and potentially a vdparm doesn't seem like a good idea to me. We should expose the information that makes sense for a virtio block device and if a guest wants to create an ATA identify response out of that data, more power to it. Regards, Anthony Liguori > Confused, > Rusty. >
> > But the rationale that got us here was from the > > user's perspective this information could be > > trivially exposed via a preexisting interface vs. > > creating a new interface as I'd done initially. > > It was also pointed out other defined fields in > > the identify page exist which were potentially > > useful. So pressing a HDIO_GET_IDENTITY ioctl > > into service here didn't seem unreasonable. > > virtio-blk isn't an ATA device so why pretend like it is? Agreed. If we're going to copy anything than my vote would be for SCSI. We already have partial support for feeding SCSI commands to a virtio device. TBH I wouldn't have bothered with virtio-blk at all, and gone straight for virtio-scsi. I guess it's a bit late for that though. Paul
On Wed, 30 Sep 2009 11:47:00 am Anthony Liguori wrote: > Rusty Russell wrote: > > On Wed, 30 Sep 2009 06:25:39 am Anthony Liguori wrote: > > > >> virtio-blk isn't an ATA device so why pretend like it is? ... > That's an argument to implement the HDIO_GET_IDENTITY ioctl in the Linux > virtio-blk driver. That's a totally reasonable thing to do. Your second sentence contradicts your first here. If it's reasonable to imitate ATA in Linux, it's reasonable to imitate ATA in virtio. Using an existing standard should always be the default; one should need a compelling reason *not* to do so. In practice, (1) that code already exists in kvm, and (2) it makes the Linux implementation trivial. So, what are the real arguments? (1) Some fields are redundant (eg. geometry) with existing fields already. (2) Future fields will have to decide whether to use ATA IDENTIFY or normal features, (3) We don't know enough about ATA IDENTIFY to do a decent job of filling it in. I can buy (1), but I still have to deal with (2) and (3) if they're inside the virtio_blk driver anyway. Anything else? Rusty.
Rusty Russell wrote: > On Wed, 30 Sep 2009 11:47:00 am Anthony Liguori wrote: > > Rusty Russell wrote: > > > On Wed, 30 Sep 2009 06:25:39 am Anthony Liguori wrote: > > > > > >> virtio-blk isn't an ATA device so why pretend like it is? > ... > > That's an argument to implement the HDIO_GET_IDENTITY ioctl in the Linux > > virtio-blk driver. That's a totally reasonable thing to do. > > Your second sentence contradicts your first here. If it's reasonable to > imitate ATA in Linux, it's reasonable to imitate ATA in virtio. > > Using an existing standard should always be the default; one should need a > compelling reason *not* to do so. In practice, (1) that code already exists > in kvm, and (2) it makes the Linux implementation trivial. > > So, what are the real arguments? > (1) Some fields are redundant (eg. geometry) with existing fields already. > (2) Future fields will have to decide whether to use ATA IDENTIFY or normal > features, > (3) We don't know enough about ATA IDENTIFY to do a decent job of filling > it in. > > I can buy (1), but I still have to deal with (2) and (3) if they're inside > the virtio_blk driver anyway. > > Anything else? (4) Putting it into the virtio_blk driver means another duplicate implementation, and it might have to be duplicated another time in the Windows virtio_blk driver too, if there's a Windows generic equivalent to HDIO_GET_IDENTITY (I don't know if there is though). (5) When the host's backing driver is itself a disk implementing the ATA command, would you ever want the guest to see all details of the identity page as close as possible to the host device sometimes? I.e. is virtio_blk a way to accelerate block I/O to a host device, or a way to abstract it to the bare minimum? (6) Why don't we use the SCSI identity page instead? Presumably that's already in qemu/kvm too, for SCSI and USB disks :-) -- Jamie
Paul Brook wrote: > Agreed. If we're going to copy anything than my vote would be for SCSI. We > already have partial support for feeding SCSI commands to a virtio device. > TBH I wouldn't have bothered with virtio-blk at all, and gone straight for > virtio-scsi. I guess it's a bit late for that though. Initially the patch introduced a dedicated ioctl in the guest driver to retrieve this data. Review comments from that version suggested using an existing interface which to me seemed reasonable. I had looked first at implementing the above via a scsi INQUIRY cmd. But at least at that time this option would have been more invasive to the existing code vs. just doing the equivalent via an ATA HDIO_GET_IDENTITY. -john
Jamie Lokier wrote: > (5) When the host's backing driver is itself a disk implementing the > ATA command, would you ever want the guest to see all details of the > identity page as close as possible to the host device sometimes? > I.e. is virtio_blk a way to accelerate block I/O to a host device, or > a way to abstract it to the bare minimum? Admittedly most of the data in the identify page as seen by the host is fodder for the sake of existing userspace utilities. So I don't really see any case where pass-through of host->guest data is meaningful or useful. The patch just follows suit here with a similar mockup of this data for the guest userland. -john
Rusty Russell wrote: > Calling the ATA IDENTIFY command a Linux ABI quirk is disingenuous. I think ATA has earned that moniker all by itself. > Not sure what the SCSI equiv is to get a serial number (VPD?) But if everyone > is happy with 20 bytes, and John wants to do SN that way, I'll take the patch. I've been swayed from one side to the other of this argument. Still I'd begrudgingly give my vote to the ATA option only because it reuses an existing wart vs. introducing a new one. Updated qemu patch follows leveraging your virtqueue REQ_TYPE_SPECIAL scaffolding. At this point only the issue of this guest visible interface should remain. -john
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c index a33eafb..3652c0a 100644 --- a/hw/virtio-blk.c +++ b/hw/virtio-blk.c @@ -27,6 +27,7 @@ typedef struct VirtIOBlock void *rq; char serial_str[BLOCK_SERIAL_STRLEN + 1]; QEMUBH *bh; + size_t config_size; } VirtIOBlock; static VirtIOBlock *to_virtio_blk(VirtIODevice *vdev) @@ -362,7 +363,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config) virtio_identify_template(&blkcfg); memcpy(&blkcfg.identify[VIRTIO_BLK_ID_SN], s->serial_str, VIRTIO_BLK_ID_SN_BYTES); - memcpy(config, &blkcfg, sizeof(blkcfg)); + memcpy(config, &blkcfg, s->config_size); } static uint32_t virtio_blk_get_features(VirtIODevice *vdev) @@ -419,18 +420,21 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, DriveInfo *dinfo) VirtIOBlock *s; int cylinders, heads, secs; static int virtio_blk_id; - char *ps; + char *ps = (char *)drive_get_serial(dinfo->bdrv); + size_t size = strlen(ps) ? sizeof(struct virtio_blk_config) : + offsetof(struct virtio_blk_config, _blk_size); s = (VirtIOBlock *)virtio_common_init("virtio-blk", VIRTIO_ID_BLOCK, - sizeof(struct virtio_blk_config), + size, sizeof(VirtIOBlock)); + s->config_size = size; s->vdev.get_config = virtio_blk_update_config; s->vdev.get_features = virtio_blk_get_features; s->vdev.reset = virtio_blk_reset; s->bs = dinfo->bdrv; s->rq = NULL; - if (strlen(ps = (char *)drive_get_serial(s->bs))) + if (strlen(ps)) strncpy(s->serial_str, ps, sizeof(s->serial_str)); else snprintf(s->serial_str, sizeof(s->serial_str), "0");