Patchwork Re: [PATCHv3 1/2] virtio: support layout with avail ring before idx

Submitter Rusty Russell
Date June 4, 2010, 2:34 a.m.
Message ID <201006041204.57973.rusty@rustcorp.com.au>
Permalink /patch/54533/
State New

Comments

Rusty Russell - June 4, 2010, 2:34 a.m.
On Wed, 2 Jun 2010 12:17:12 am Michael S. Tsirkin wrote:
> This adds an (unused) option to put available ring before control (avail
> index, flags), and adds padding between index and flags. This avoids
> cache line sharing between control and ring, and also makes it possible
> to extend avail control without incurring extra cache misses.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

No no no no.  254?  You're trying to Morton me![1]

How's this (untested):


Cheers,
Rusty.
[1] Andrew Morton has this technique where he posts a solution so ugly it
    forces others to fix it properly.  Ego-roping, basically.
Michael S. Tsirkin - June 4, 2010, 10:35 a.m.
On Fri, Jun 04, 2010 at 12:04:57PM +0930, Rusty Russell wrote:
> On Wed, 2 Jun 2010 12:17:12 am Michael S. Tsirkin wrote:
> > This adds an (unused) option to put available ring before control (avail
> > index, flags), and adds padding between index and flags. This avoids
> > cache line sharing between control and ring, and also makes it possible
> > to extend avail control without incurring extra cache misses.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> No no no no.  254?  You're trying to Morton me![1]

Hmm, I wonder what we will do if we want a 3rd field on
a separate cacheline. But ok.

> How's this (untested):

I think we want to put flags there as well:
they are used on the interrupt path, together with the last used index.

> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -74,8 +74,8 @@ struct vring {
>  /* The standard layout for the ring is a continuous chunk of memory which looks
>   * like this.  We assume num is a power of 2.
>   *
> - * struct vring
> - * {
> + * struct vring {
> + *	*** The driver writes to this part.
>   *	// The actual descriptors (16 bytes each)
>   *	struct vring_desc desc[num];
>   *
> @@ -84,9 +84,11 @@ struct vring {
>   *	__u16 avail_idx;
>   *	__u16 available[num];
>   *
> - *	// Padding to the next align boundary.
> + *	// Padding so used_flags is on the next align boundary.
>   *	char pad[];
> + *	__u16 last_used; // On a cacheline of its own.
>   *
> + *	*** The device writes to this part.
>   *	// A ring of used descriptor heads with free-running index.
>   *	__u16 used_flags;
>   *	__u16 used_idx;
> @@ -110,6 +112,12 @@ static inline unsigned vring_size(unsign
>  		+ sizeof(__u16) * 2 + sizeof(struct vring_used_elem) * num;
>  }
>  
> +/* Last used index sits at the very end of the driver part of the struct */
> +static inline __u16 *vring_last_used_idx(const struct vring *vr)
> +{
> +	return (__u16 *)vr->used - 1;
> +}
> +
>  #ifdef __KERNEL__
>  #include <linux/irqreturn.h>
>  struct virtio_device;
> 
> Cheers,
> Rusty.
> [1] Andrew Morton has this technique where he posts a solution so ugly it
>     forces others to fix it properly.  Ego-roping, basically.
Rusty Russell - June 4, 2010, 11:16 a.m.
On Fri, 4 Jun 2010 08:05:43 pm Michael S. Tsirkin wrote:
> On Fri, Jun 04, 2010 at 12:04:57PM +0930, Rusty Russell wrote:
> > On Wed, 2 Jun 2010 12:17:12 am Michael S. Tsirkin wrote:
> > > This adds an (unused) option to put available ring before control (avail
> > > index, flags), and adds padding between index and flags. This avoids
> > > cache line sharing between control and ring, and also makes it possible
> > > to extend avail control without incurring extra cache misses.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > No no no no.  254?  You're trying to Morton me![1]
> 
> Hmm, I wonder what we will do if we want a 3rd field on
> a separate cacheline. But ok.
> 
> > How's this (untested):
> 
> I think we want to put flags there as well:
> they are used on the interrupt path, together with the last used index.

I'm uncomfortable with moving a field.

We haven't done that before and I wonder what will break with old code.

Should we instead just abandon the flags field and use last_used only?
Or, more radically, put flags == last_used when the feature is on?

Thoughts?
Rusty.
Michael S. Tsirkin - June 4, 2010, 11:42 a.m.
On Fri, Jun 04, 2010 at 08:46:49PM +0930, Rusty Russell wrote:
> On Fri, 4 Jun 2010 08:05:43 pm Michael S. Tsirkin wrote:
> > On Fri, Jun 04, 2010 at 12:04:57PM +0930, Rusty Russell wrote:
> > > On Wed, 2 Jun 2010 12:17:12 am Michael S. Tsirkin wrote:
> > > > This adds an (unused) option to put available ring before control (avail
> > > > index, flags), and adds padding between index and flags. This avoids
> > > > cache line sharing between control and ring, and also makes it possible
> > > > to extend avail control without incurring extra cache misses.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > 
> > > No no no no.  254?  You're trying to Morton me![1]
> > 
> > Hmm, I wonder what we will do if we want a 3rd field on
> > a separate cacheline. But ok.
> > 
> > > How's this (untested):
> > 
> > I think we want to put flags there as well:
> > they are used on the interrupt path, together with the last used index.
> 
> I'm uncomfortable with moving a field.
> 
> We haven't done that before and I wonder what will break with old code.

With e.g. my patch, we only do this conditionally, when the feature bit is negotiated.

> Should we instead just abandon the flags field and use last_used only?
> Or, more radically, put flags == last_used when the feature is on?
> 
> Thoughts?
> Rusty.

Hmm, e.g. with TX and virtio net, we almost never want interrupts,
whatever the index value.
Rusty Russell - June 5, 2010, 4:10 a.m.
On Fri, 4 Jun 2010 09:12:05 pm Michael S. Tsirkin wrote:
> On Fri, Jun 04, 2010 at 08:46:49PM +0930, Rusty Russell wrote:
> > I'm uncomfortable with moving a field.
> > 
> > We haven't done that before and I wonder what will break with old code.
> 
> With e.g. my patch, we only do this conditionally, when the feature bit is negotiated.

Of course, but see this change:

commit ef688e151c00e5d529703be9a04fd506df8bc54e
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Fri Jun 12 22:16:35 2009 -0600

    virtio: meet virtio spec by finalizing features before using device
    
    Virtio devices are supposed to negotiate features before they start using
    the device, but the current code doesn't do this.  This is because the
    driver's probe() function invariably has to add buffers to a virtqueue,
    or probe the disk (virtio_blk).
    
    This currently doesn't matter since no existing backend is strict about
    the feature negotiation.  But it's possible to imagine a future feature
    which completely changes how a device operates: in this case, we'd need
    to acknowledge it before using the device.
    
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Now, this isn't impossible to overcome: we know that if they use the ring
before completing feature negotiation then they don't understand the new
format.

But we have to be aware of that on the qemu side.  Are we?

> > Should we instead just abandon the flags field and use last_used only?
> > Or, more radically, put flags == last_used when the feature is on?
> > 
> > Thoughts?
> > Rusty.
> 
> Hmm, e.g. with TX and virtio net, we almost never want interrupts,
> whatever the index value.

Good point.  OK, I give in, I'll take your patch which moves the fields
to the end.  Is that your preference?

Please be careful with the qemu side though...

It's not inconceivable that I'll write that virtio cacheline simulator this
(coming) week, too...

Thanks.
Rusty.
Michael S. Tsirkin - June 6, 2010, 9:11 a.m.
On Sat, Jun 05, 2010 at 01:40:26PM +0930, Rusty Russell wrote:
> On Fri, 4 Jun 2010 09:12:05 pm Michael S. Tsirkin wrote:
> > On Fri, Jun 04, 2010 at 08:46:49PM +0930, Rusty Russell wrote:
> > > I'm uncomfortable with moving a field.
> > > 
> > > We haven't done that before and I wonder what will break with old code.
> > 
> > With e.g. my patch, we only do this conditionally, when the feature bit is negotiated.
> 
> Of course, but see this change:
> 
> commit ef688e151c00e5d529703be9a04fd506df8bc54e
> Author: Rusty Russell <rusty@rustcorp.com.au>
> Date:   Fri Jun 12 22:16:35 2009 -0600
> 
>     virtio: meet virtio spec by finalizing features before using device
>     
>     Virtio devices are supposed to negotiate features before they start using
>     the device, but the current code doesn't do this.  This is because the
>     driver's probe() function invariably has to add buffers to a virtqueue,
>     or probe the disk (virtio_blk).
>     
>     This currently doesn't matter since no existing backend is strict about
>     the feature negotiation.  But it's possible to imagine a future feature
>     which completely changes how a device operates: in this case, we'd need
>     to acknowledge it before using the device.
>     
>     Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> Now, this isn't impossible to overcome: we know that if they use the ring
> before completing feature negotiation then they don't understand the new
> format.
> 
> But we have to be aware of that on the qemu side.  Are we?

I think we are ok. virtqueue_init, which sets the avail/used pointers, is
called when we write the base address.  So we only need to be careful
not to change this feature bit after creating the rings.
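
The idea of fixing the layout at the moment the base address is written can be
sketched roughly as below. This is not qemu code: every name here (hyp_device,
hyp_queue, HYP_F_NEW_LAYOUT, hyp_write_base, hyp_set_features) is hypothetical,
and refusing a late change of the bit is only one possible way of "being
careful".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bit standing in for the new ring layout. */
#define HYP_F_NEW_LAYOUT (1u << 0)

struct hyp_queue {
	bool initialized;	/* base address has been written */
	bool new_layout;	/* layout chosen when the ring was created */
	uint64_t base;
};

struct hyp_device {
	uint32_t guest_features;	/* features acknowledged so far */
	struct hyp_queue vq;
};

/* Guest writes the ring base address: the layout is decided here, once. */
static void hyp_write_base(struct hyp_device *d, uint64_t base)
{
	assert(d);
	d->vq.base = base;
	/* A driver that sets up the ring before acking the bit gets the old layout. */
	d->vq.new_layout = (d->guest_features & HYP_F_NEW_LAYOUT) != 0;
	d->vq.initialized = true;
}

/* Guest acks features: refuse to flip the layout bit once the ring exists. */
static bool hyp_set_features(struct hyp_device *d, uint32_t features)
{
	assert(d);
	if (d->vq.initialized &&
	    ((features ^ d->guest_features) & HYP_F_NEW_LAYOUT))
		return false;
	d->guest_features = features;
	return true;
}
```

With this shape, an old driver that adds buffers before finishing negotiation
is simply treated as not understanding the new format.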


> > > Should we instead just abandon the flags field and use last_used only?
> > > Or, more radically, put flags == last_used when the feature is on?
> > > 
> > > Thoughts?
> > > Rusty.
> > 
> > Hmm, e.g. with TX and virtio net, we almost never want interrupts,
> > whatever the index value.
> 
> Good point.  OK, I give in, I'll take your patch which moves the fields
> to the end.  Is that your preference?

Yes, I think so.
You mean PATCHv3 unchanged with 254 byte padding?

> Please be careful with the qemu side though...
> 
> It's not inconceivable that I'll write that virtio cacheline simulator this
> (coming) week, too...
> 
> Thanks.
> Rusty.

Patch

diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -74,8 +74,8 @@  struct vring {
 /* The standard layout for the ring is a continuous chunk of memory which looks
  * like this.  We assume num is a power of 2.
  *
- * struct vring
- * {
+ * struct vring {
+ *	*** The driver writes to this part.
  *	// The actual descriptors (16 bytes each)
  *	struct vring_desc desc[num];
  *
@@ -84,9 +84,11 @@  struct vring {
  *	__u16 avail_idx;
  *	__u16 available[num];
  *
- *	// Padding to the next align boundary.
+ *	// Padding so used_flags is on the next align boundary.
  *	char pad[];
+ *	__u16 last_used; // On a cacheline of its own.
  *
+ *	*** The device writes to this part.
  *	// A ring of used descriptor heads with free-running index.
  *	__u16 used_flags;
  *	__u16 used_idx;
@@ -110,6 +112,12 @@  static inline unsigned vring_size(unsign
 		+ sizeof(__u16) * 2 + sizeof(struct vring_used_elem) * num;
 }
 
+/* Last used index sits at the very end of the driver part of the struct */
+static inline __u16 *vring_last_used_idx(const struct vring *vr)
+{
+	return (__u16 *)vr->used - 1;
+}
+
 #ifdef __KERNEL__
 #include <linux/irqreturn.h>
 struct virtio_device;
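
To make the layout in the comment concrete, here is a minimal user-space sketch
of where last_used would land. It assumes the used part starts at the first
align boundary after available[num], as in the comment above, and that the
padding is at least two bytes so last_used does not overlap available[]; the
(untested) patch does not check that. The sketch_* names are stand-ins, not the
kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct vring_desc: 16 bytes each, per the layout comment. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Byte offset of used_flags (start of the device-written part). */
static size_t sketch_used_offset(unsigned int num, size_t align)
{
	size_t driver_part;

	assert(align && (align & (align - 1)) == 0);	/* power of 2 */
	driver_part = sizeof(struct sketch_desc) * num	/* desc[num] */
		    + sizeof(uint16_t) * (2 + num);	/* avail_flags, avail_idx, available[num] */
	/* Round up to the next align boundary. */
	return (driver_part + align - 1) & ~(align - 1);
}

/* last_used is the __u16 just before the used part, mirroring
 * "(__u16 *)vr->used - 1" in vring_last_used_idx(). */
static size_t sketch_last_used_offset(unsigned int num, size_t align)
{
	return sketch_used_offset(num, align) - sizeof(uint16_t);
}
```

For num = 256 and align = 4096 the driver part is 4096 + 516 = 4612 bytes, so
the used part starts at offset 8192 and last_used sits at offset 8190.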