diff mbox

[14/18] virtio: add api for delayed callbacks

Message ID 64d47c628b3fdc0ac156aed4be182933d8bcc0db.1304541919.git.mst@redhat.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Michael S. Tsirkin May 4, 2011, 8:52 p.m. UTC
Add an API that tells the other side that callbacks
should be delayed until a lot of work has been done.
Implement using the new used_event feature.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/virtio/virtio_ring.c |   27 +++++++++++++++++++++++++++
 include/linux/virtio.h       |    9 +++++++++
 2 files changed, 36 insertions(+), 0 deletions(-)

Comments

Rusty Russell May 9, 2011, 5:57 a.m. UTC | #1
On Wed, 4 May 2011 23:52:33 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> Add an API that tells the other side that callbacks
> should be delayed until a lot of work has been done.
> Implement using the new used_event feature.

Since you're going to add a capacity query anyway, why not add the
threshold argument here?

Then the caller can choose how much space it needs.  Maybe net and block
will want different things...

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin May 15, 2011, 12:48 p.m. UTC | #2
On Mon, May 09, 2011 at 03:27:33PM +0930, Rusty Russell wrote:
> On Wed, 4 May 2011 23:52:33 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > Add an API that tells the other side that callbacks
> > should be delayed until a lot of work has been done.
> > Implement using the new used_event feature.
> 
> Since you're going to add a capacity query anyway, why not add the
> threshold argument here?

I thought that if we keep the API kind of generic
there might be more of a chance that future transports
will be able to implement it. For example, with an
old host we can't commit to a specific index.

> 
> Then the caller can choose how much space it needs.  Maybe net and block
> will want different things...
> 
> Cheers,
> Rusty.

Hmm, maybe. OK.
Rusty Russell May 16, 2011, 7:13 a.m. UTC | #3
On Sun, 15 May 2011 15:48:18 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, May 09, 2011 at 03:27:33PM +0930, Rusty Russell wrote:
> > On Wed, 4 May 2011 23:52:33 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > Add an API that tells the other side that callbacks
> > > should be delayed until a lot of work has been done.
> > > Implement using the new used_event feature.
> > 
> > Since you're going to add a capacity query anyway, why not add the
> > threshold argument here?
> 
> I thought that if we keep the API kind of generic
> there might be more of a chance that future transports
> will be able to implement it. For example, with an
> old host we can't commit to a specific index.

No, it's always a hint anyway: you can be notified before the threshold
is reached.  But best make it explicit I think.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin May 19, 2011, 7:24 a.m. UTC | #4
On Mon, May 16, 2011 at 04:43:21PM +0930, Rusty Russell wrote:
> On Sun, 15 May 2011 15:48:18 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, May 09, 2011 at 03:27:33PM +0930, Rusty Russell wrote:
> > > On Wed, 4 May 2011 23:52:33 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > Add an API that tells the other side that callbacks
> > > > should be delayed until a lot of work has been done.
> > > > Implement using the new used_event feature.
> > > 
> > > Since you're going to add a capacity query anyway, why not add the
> > > threshold argument here?
> > 
> > I thought that if we keep the API kind of generic
> > there might be more of a chance that future transports
> > will be able to implement it. For example, with an
> > old host we can't commit to a specific index.
> 
> No, it's always a hint anyway: you can be notified before the threshold
> is reached.  But best make it explicit I think.
> 
> Cheers,
> Rusty.


I tried doing that and remembered the real reason I went for this API:

capacity is limited by descriptor table space, not
used ring space: each entry in the used ring frees up multiple entries
in the descriptor ring. Thus the ring can't provide
callback after capacity is N: capacity is only available
after we get bufs.

We could try and make the API pass in the number of freed bufs, however:
- this is not really what virtio-net cares about (it cares about
  capacity)
- if the driver passes a number > number of outstanding bufs, it will
  never get a callback. So to stay correct the driver will need to
  track number of outstanding requests. The simpler API avoids that. 


APIs are easy to change so I'm guessing it's not a major blocker:
we can change later when e.g. block tries to
pass in some kind of extra hint: we'll be smarter
about how this API can change then.

Right?
Rusty Russell May 20, 2011, 7:43 a.m. UTC | #5
On Thu, 19 May 2011 10:24:12 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, May 16, 2011 at 04:43:21PM +0930, Rusty Russell wrote:
> > On Sun, 15 May 2011 15:48:18 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Mon, May 09, 2011 at 03:27:33PM +0930, Rusty Russell wrote:
> > > > On Wed, 4 May 2011 23:52:33 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > Add an API that tells the other side that callbacks
> > > > > should be delayed until a lot of work has been done.
> > > > > Implement using the new used_event feature.
> > > > 
> > > > Since you're going to add a capacity query anyway, why not add the
> > > > threshold argument here?
> > > 
> > > I thought that if we keep the API kind of generic
> > > there might be more of a chance that future transports
> > > will be able to implement it. For example, with an
> > > old host we can't commit to a specific index.
> > 
> > No, it's always a hint anyway: you can be notified before the threshold
> > is reached.  But best make it explicit I think.
> > 
> > Cheers,
> > Rusty.
> 
> 
> I tried doing that and remembered the real reason I went for this API:
> 
> capacity is limited by descriptor table space, not
> used ring space: each entry in the used ring frees up multiple entries
> in the descriptor ring. Thus the ring can't provide
> callback after capacity is N: capacity is only available
> after we get bufs.

I think this indicates a problem, but I haven't reviewed your patches
except very cursorily.  I am slack...

> We could try and make the API pass in the number of freed bufs, however:
> - this is not really what virtio-net cares about (it cares about
>   capacity)

Yes, max sg elements and max requests are separate, though in the
current virtio implementation remaining sg <= remaining request slots.

That's why we currently feed back remaining descriptors to the driver,
not the number of request slots.

This implies that the thresholds should refer to descriptor numbers
(ie. wake me when there are this many descriptors freed), not the used
ring at all.  Which means we're barking up the wrong tree...

I think this needs more thought.

> - if the driver passes a number > number of outstanding bufs, it will
>   never get a callback. So to stay correct the driver will need to
>   track number of outstanding requests. The simpler API avoids that. 

I think the driver should simply say "wake me when you have this many
descriptors free".  And set it during initialization, rather than every
time.  The virtio_ring code should handle it from there.

Perhaps that can be done with the current technique, where the
virtio_ring makes an educated guess on when sufficient capacity will be
available...

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 262dfe6..3a70d70 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -397,6 +397,33 @@  bool virtqueue_enable_cb(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_enable_cb);
 
+bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	int bufs;
+
+	START_USE(vq);
+
+	/* We optimistically turn back on interrupts, then check if there was
+	 * more to do. */
+	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
+	 * either clear the flags bit or point the event index at the next
+	 * entry. Always do both to keep code simple. */
+	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+	/* TODO: tune this threshold */
+	bufs = (vq->vring.avail->idx - vq->last_used_idx) * 3 / 4;
+	vring_used_event(&vq->vring) = vq->last_used_idx + bufs;
+	virtio_mb();
+	if (unlikely(vq->vring.used->idx - vq->last_used_idx > bufs)) {
+		END_USE(vq);
+		return false;
+	}
+
+	END_USE(vq);
+	return true;
+}
+EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
+
 void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 718336b..5151fd1 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -51,6 +51,13 @@  struct virtqueue {
  *	This re-enables callbacks; it returns "false" if there are pending
  *	buffers in the queue, to detect a possible race between the driver
  *	checking for more work, and enabling callbacks.
+ * virtqueue_enable_cb_delayed: restart callbacks after disable_cb.
+ *	vq: the struct virtqueue we're talking about.
+ *	This re-enables callbacks but hints to the other side to delay
+ *	interrupts until most of the available buffers have been processed;
+ *	it returns "false" if there are many pending buffers in the queue,
+ *	to detect a possible race between the driver checking for more work,
+ *	and enabling callbacks.
  * virtqueue_detach_unused_buf: detach first unused buffer
  * 	vq: the struct virtqueue we're talking about.
  * 	Returns NULL or the "data" token handed to add_buf
@@ -86,6 +93,8 @@  void virtqueue_disable_cb(struct virtqueue *vq);
 
 bool virtqueue_enable_cb(struct virtqueue *vq);
 
+bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
+
 void *virtqueue_detach_unused_buf(struct virtqueue *vq);
 
 /**