Patchwork [RFC] virtio_blk: add cache control support

login
register
mail settings
Submitter Christoph Hellwig
Date March 15, 2011, 2:16 p.m.
Message ID <20110315141644.GA30803@lst.de>
Download mbox | patch
Permalink /patch/86990/
State New
Headers show

Comments

Christoph Hellwig - March 15, 2011, 2:16 p.m.
Add support for the new dynamic features config space field to allow
en/disabling the write cache at runtime.  The userspace interface is
a SCSI-compatible sysfs attribute.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Rusty Russell - March 16, 2011, 4:09 a.m.
On Tue, 15 Mar 2011 15:16:44 +0100, Christoph Hellwig <hch@lst.de> wrote:
> Add support for the new dynamic features config space field to allow
> en/disabling the write cache at runtime.  The userspace interface is
> a SCSI-compatible sysfs attribute.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Hi Christoph,

   Neat work.  Minor comments:

> +	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
> +		;
> +	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {

   Is there a reason we're not letting gcc and/or strcmp do the
optimization work here?

> +	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
> +	                  &features, sizeof(features));
> +
> +	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
> +			  &features2, sizeof(features2));
> +
> +	if ((features & VIRTIO_BLK_RT_WCE) !=
> +	    (features2 & VIRTIO_BLK_RT_WCE))
> +		return -EIO;

   This seems like a debugging check you left in.  Or do you suspect
some issues?

Thanks,
Rusty.
Christoph Hellwig - March 16, 2011, 2:09 p.m.
On Wed, Mar 16, 2011 at 02:39:39PM +1030, Rusty Russell wrote:
> > +	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
> > +		;
> > +	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {
> 
>    Is there a reason we're not letting gcc and/or strcmp do the
> optimization work here?

I'm happ to switch strcmp.

> > +	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
> > +	                  &features, sizeof(features));
> > +
> > +	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
> > +			  &features2, sizeof(features2));
> > +
> > +	if ((features & VIRTIO_BLK_RT_WCE) !=
> > +	    (features2 & VIRTIO_BLK_RT_WCE))
> > +		return -EIO;
> 
>    This seems like a debugging check you left in.  Or do you suspect
> some issues?

No, it's intentional.  config space writes can't return errors, so we need
to check that the value has really changed.  I'll add a comment explaining it.
Rusty Russell - March 17, 2011, 5:06 a.m.
On Wed, 16 Mar 2011 15:09:58 +0100, Christoph Hellwig <hch@lst.de> wrote:
> On Wed, Mar 16, 2011 at 02:39:39PM +1030, Rusty Russell wrote:
> > > +	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
> > > +		;
> > > +	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {
> > 
> >    Is there a reason we're not letting gcc and/or strcmp do the
> > optimization work here?
> 
> I'm happ to switch strcmp.

Of course, that's assuming buf is nul terminated.

> > > +	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
> > > +	                  &features, sizeof(features));
> > > +
> > > +	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
> > > +			  &features2, sizeof(features2));
> > > +
> > > +	if ((features & VIRTIO_BLK_RT_WCE) !=
> > > +	    (features2 & VIRTIO_BLK_RT_WCE))
> > > +		return -EIO;
> > 
> >    This seems like a debugging check you left in.  Or do you suspect
> > some issues?
> 
> No, it's intentional.  config space writes can't return errors, so we need
> to check that the value has really changed.  I'll add a comment explaining it.

OK, under what circumstances could it fail?

If you're using this mechanism to indicate that the host doesn't support
the feature, that's making an assumption about the nature of config
space writes which isn't true for non-PCI virtio.

ie. lguest and S/390 don't trap writes to config space.

Or perhaps they should?  But we should be explicit about needing it...

Thanks,
Rusty.
Christoph Hellwig - March 17, 2011, 2:21 p.m.
On Thu, Mar 17, 2011 at 03:36:08PM +1030, Rusty Russell wrote:
> > I'm happ to switch strcmp.
> 
> Of course, that's assuming buf is nul terminated.

It's the string the user writes into it, which normally should be
nul-terminated.

> > No, it's intentional.  config space writes can't return errors, so we need
> > to check that the value has really changed.  I'll add a comment explaining it.
> 
> OK, under what circumstances could it fail?
> 
> If you're using this mechanism to indicate that the host doesn't support
> the feature, that's making an assumption about the nature of config
> space writes which isn't true for non-PCI virtio.
> 
> ie. lguest and S/390 don't trap writes to config space.
> 
> Or perhaps they should?  But we should be explicit about needing it...

We have the features flag to indicate if updating the caching mode is
supported, but we we could still fail it for other reasons - e.g. we could fail
to reopen the file with/without O_SYNC.  But if lguest or S/390 don't support
trapping config space write this feature won't work for them at all.  As do
other features that make use of config space write, e.g. updating the MAC
address for virtio-net.
Rusty Russell - March 24, 2011, 12:11 a.m.
On Thu, 17 Mar 2011 15:21:22 +0100, Christoph Hellwig <hch@lst.de> wrote:
> On Thu, Mar 17, 2011 at 03:36:08PM +1030, Rusty Russell wrote:
> > OK, under what circumstances could it fail?
> > 
> > If you're using this mechanism to indicate that the host doesn't support
> > the feature, that's making an assumption about the nature of config
> > space writes which isn't true for non-PCI virtio.
> > 
> > ie. lguest and S/390 don't trap writes to config space.
> > 
> > Or perhaps they should?  But we should be explicit about needing it...
> 
> We have the features flag to indicate if updating the caching mode is
> supported, but we we could still fail it for other reasons - e.g. we could fail
> to reopen the file with/without O_SYNC.

OK, then I think you need to make it a real command and feed it into the
request queue.

The theory behind config space is that it's for advertising, not for
interaction.  And it's not ever very good at that...

>  But if lguest or S/390 don't support
> trapping config space write this feature won't work for them at all.  As do
> other features that make use of config space write, e.g. updating the MAC
> address for virtio-net.

Yes, and that was a mistake.  What does qemu do with a partially-written
MAC address?  Lguest ignores it, what does S/390 do?

Alex?

Cheers,
Rusty.
Anthony Liguori - March 24, 2011, 3:05 a.m.
On 03/17/2011 12:06 AM, Rusty Russell wrote:
> On Wed, 16 Mar 2011 15:09:58 +0100, Christoph Hellwig<hch@lst.de>  wrote:
>> On Wed, Mar 16, 2011 at 02:39:39PM +1030, Rusty Russell wrote:
>>>> +	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
>>>> +		;
>>>> +	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {
>>>     Is there a reason we're not letting gcc and/or strcmp do the
>>> optimization work here?
>> I'm happ to switch strcmp.
> Of course, that's assuming buf is nul terminated.
>
>>>> +	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
>>>> +	&features, sizeof(features));
>>>> +
>>>> +	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
>>>> +			&features2, sizeof(features2));
>>>> +
>>>> +	if ((features&  VIRTIO_BLK_RT_WCE) !=
>>>> +	    (features2&  VIRTIO_BLK_RT_WCE))
>>>> +		return -EIO;
>>>     This seems like a debugging check you left in.  Or do you suspect
>>> some issues?
>> No, it's intentional.  config space writes can't return errors, so we need
>> to check that the value has really changed.  I'll add a comment explaining it.
> OK, under what circumstances could it fail?
>
> If you're using this mechanism to indicate that the host doesn't support
> the feature, that's making an assumption about the nature of config
> space writes which isn't true for non-PCI virtio.
>
> ie. lguest and S/390 don't trap writes to config space.
>
> Or perhaps they should?  But we should be explicit about needing it...

I don't think we ever operated on the assumption that config space 
writes would trap.

I don't think adding it is the right thing either because you can do 
byte access to the config space which makes atomicity difficult.

Any reason not to use a control queue to negotiate dynamic features?  
The authorative source of what the currently enabled features are can 
still be config space but the guest's enabling or disabling of a feature 
ought to be a control queue message.

Regards,

Anthony Liguori

> Thanks,
> Rusty.
>
>
>
Anthony Liguori - March 24, 2011, 3:11 a.m.
On 03/17/2011 09:21 AM, Christoph Hellwig wrote:
> On Thu, Mar 17, 2011 at 03:36:08PM +1030, Rusty Russell wrote:
>>> I'm happ to switch strcmp.
>> Of course, that's assuming buf is nul terminated.
> It's the string the user writes into it, which normally should be
> nul-terminated.
>
>>> No, it's intentional.  config space writes can't return errors, so we need
>>> to check that the value has really changed.  I'll add a comment explaining it.
>> OK, under what circumstances could it fail?
>>
>> If you're using this mechanism to indicate that the host doesn't support
>> the feature, that's making an assumption about the nature of config
>> space writes which isn't true for non-PCI virtio.
>>
>> ie. lguest and S/390 don't trap writes to config space.
>>
>> Or perhaps they should?  But we should be explicit about needing it...
> We have the features flag to indicate if updating the caching mode is
> supported, but we we could still fail it for other reasons - e.g. we could fail
> to reopen the file with/without O_SYNC.  But if lguest or S/390 don't support
> trapping config space write this feature won't work for them at all.  As do
> other features that make use of config space write, e.g. updating the MAC
> address for virtio-net.

QEMU does not rely on config space writes for supporting mac address 
updates.

Whenever the config space is updated, we update the mac address in the 
virtio-net structure but since PCI only supports 4 byte accesses, it 
will only be partially updated at certain points in time.

This is okay though because as long as a guest updates it before we send 
a network packet out, when we refer to the mac address, it will be 
correct.  We could just as well have the config space be in shared 
memory and only refer to the mac address in that shared memory area when 
transmitting a packet.

So the fact that we respond to config space writes doesn't mean that 
writes have a side effect other than updating the config space, which is 
really what I think the important point is here.

Regards,

Anthony Liguori
Christian Borntraeger - March 24, 2011, 9:54 a.m.
Am 24.03.2011 04:05, schrieb Anthony Liguori:
>> ie. lguest and S/390 don't trap writes to config space.
>>
>> Or perhaps they should?  But we should be explicit about needing it...
> I don't think we ever operated on the assumption that config space writes would trap.
> 
> I don't think adding it is the right thing either because you can do byte access to the config space which makes atomicity difficult.

There is the additional problem, that s390 has no MMIO and,therefore,
there is no real HW support for trapping writes to an area. You can
use page faults, or read-only faults on newer systems, but this is 
expensive. In addition, page faults only deliver the page frame, but
not the offset within a page.
> 
> Any reason not to use a control queue to negotiate dynamic features? 

Sounds reasonable.
Rusty Russell - March 25, 2011, 5:08 a.m.
On Thu, 24 Mar 2011 10:54:05 +0100, Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> Am 24.03.2011 04:05, schrieb Anthony Liguori:
> >> ie. lguest and S/390 don't trap writes to config space.
> >>
> >> Or perhaps they should?  But we should be explicit about needing it...
> > I don't think we ever operated on the assumption that config space writes would trap.
> > 
> > I don't think adding it is the right thing either because you can do byte access to the config space which makes atomicity difficult.
> 
> There is the additional problem, that s390 has no MMIO and,therefore,
> there is no real HW support for trapping writes to an area. You can
> use page faults, or read-only faults on newer systems, but this is 
> expensive. In addition, page faults only deliver the page frame, but
> not the offset within a page.

That's not *really* a problem, since you have control over the
config_set operation and could do whatever you wanted.

But I wanted to make sure we're all on the same page: you *can't* rely
on the host knowing immediately what you write to the config space.  If
you want that, an actual queued request is necessary...

Thanks,
Rusty.

Patch

Index: linux-2.6/drivers/block/virtio_blk.c
===================================================================
--- linux-2.6.orig/drivers/block/virtio_blk.c	2011-03-15 12:16:29.156133695 +0100
+++ linux-2.6/drivers/block/virtio_blk.c	2011-03-15 13:17:30.160634723 +0100
@@ -291,6 +291,73 @@  static ssize_t virtblk_serial_show(struc
 }
 DEVICE_ATTR(serial, S_IRUGO, virtblk_serial_show, NULL);
 
+static bool virtblk_has_wb_cache(struct virtio_blk *vblk)
+{
+	struct virtio_device *vdev = vblk->vdev;
+	u32 features;
+
+	if (!virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
+		return false;
+	if (!virtio_has_feature(vdev, VIRTIO_BLK_F_DYNAMIC))
+		return true;
+
+	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
+			  &features, sizeof(features));
+
+	if (features & VIRTIO_BLK_RT_WCE)
+		return true;
+	return false;
+}
+
+static ssize_t virtblk_cache_type_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+
+	if (virtblk_has_wb_cache(disk->private_data))
+		return sprintf(buf, "write back\n");
+	else
+		return sprintf(buf, "write through\n");
+}
+
+static ssize_t virtblk_cache_type_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+	struct virtio_blk *vblk = disk->private_data;
+	struct virtio_device *vdev = vblk->vdev;
+	u32 features, features2 = 0;
+
+	if (!virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH) ||
+	    !virtio_has_feature(vdev, VIRTIO_BLK_F_DYNAMIC))
+		return -ENXIO;
+
+	if (strncmp(buf, "write through", sizeof("write through") - 1) == 0) {
+		;
+	} else if (strncmp(buf, "write back", sizeof("write back") - 1) == 0) {
+		blk_queue_flush(disk->queue, REQ_FLUSH);
+		features |= VIRTIO_BLK_RT_WCE;
+	} else {
+		return -EINVAL;
+	}
+
+	vdev->config->set(vdev, offsetof(struct virtio_blk_config, features),
+	                  &features, sizeof(features));
+
+	vdev->config->get(vdev, offsetof(struct virtio_blk_config, features),
+			  &features2, sizeof(features2));
+
+	if ((features & VIRTIO_BLK_RT_WCE) !=
+	    (features2 & VIRTIO_BLK_RT_WCE))
+		return -EIO;
+
+	if (!(features2 & VIRTIO_BLK_RT_WCE))
+		blk_queue_flush(disk->queue, 0);
+	return count;
+}
+static DEVICE_ATTR(cache_type, S_IRUGO|S_IWUSR,
+	    virtblk_cache_type_show, virtblk_cache_type_store);
+
 static int __devinit virtblk_probe(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk;
@@ -377,7 +444,7 @@  static int __devinit virtblk_probe(struc
 	index++;
 
 	/* configure queue flush support */
-	if (virtio_has_feature(vdev, VIRTIO_BLK_F_FLUSH))
+	if (virtblk_has_wb_cache(vblk))
 		blk_queue_flush(q, REQ_FLUSH);
 
 	/* If disk is read-only in the host, the guest should obey */
@@ -456,6 +523,10 @@  static int __devinit virtblk_probe(struc
 	if (err)
 		goto out_del_disk;
 
+	err = device_create_file(disk_to_dev(vblk->disk), &dev_attr_cache_type);
+	if (err)
+		goto out_del_disk;
+
 	return 0;
 
 out_del_disk:
@@ -499,7 +570,7 @@  static const struct virtio_device_id id_
 static unsigned int features[] = {
 	VIRTIO_BLK_F_SEG_MAX, VIRTIO_BLK_F_SIZE_MAX, VIRTIO_BLK_F_GEOMETRY,
 	VIRTIO_BLK_F_RO, VIRTIO_BLK_F_BLK_SIZE, VIRTIO_BLK_F_SCSI,
-	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY
+	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_DYNAMIC
 };
 
 /*
Index: linux-2.6/include/linux/virtio_blk.h
===================================================================
--- linux-2.6.orig/include/linux/virtio_blk.h	2011-03-15 12:14:28.261632780 +0100
+++ linux-2.6/include/linux/virtio_blk.h	2011-03-15 12:25:56.308634399 +0100
@@ -16,6 +16,7 @@ 
 #define VIRTIO_BLK_F_SCSI	7	/* Supports scsi command passthru */
 #define VIRTIO_BLK_F_FLUSH	9	/* Cache flush command support */
 #define VIRTIO_BLK_F_TOPOLOGY	10	/* Topology information is available */
+#define VIRTIO_BLK_F_DYNAMIC	11	/* Dynamic features field */
 
 #define VIRTIO_BLK_ID_BYTES	20	/* ID string length */
 
@@ -45,6 +46,9 @@  struct virtio_blk_config {
 	__u16 min_io_size;
 	/* optimal sustained I/O size in logical blocks. */
 	__u32 opt_io_size;
+	/* runtime controllable features */
+	__u32 features;
+#define VIRTIO_BLK_RT_WCE		(1 << 0)
 
 } __attribute__((packed));