diff mbox

[2/3] virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)

Message ID 200905171134.31285.rusty@rustcorp.com.au
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Rusty Russell May 17, 2009, 2:04 a.m. UTC
On Tue, 12 May 2009 02:40:38 am Mark McLoughlin wrote:
> Still have one FIXME in the patch worth looking at - at what point
> should we use an indirect entry rather than consuming N entries?

Is this overkill?

Rusty.

virtio: use indirect buffers based on demand (heuristic)

virtio_ring uses a ring buffer of descriptors: indirect support allows
a single descriptor to refer to a table of descriptors.  This saves
space in the ring, but requires a kmalloc/kfree.

Rather than try to figure out what the right threshold at which to use
indirect buffers, we drop the threshold dynamically when the ring is
under stress.

Note: to stress this, I reduced the ring size to 32 in lguest, and a
1G send reduced the threshold to 9.

Note2: I moved the BUG_ON()s above the indirect test, where they belong
(indirect falls thru on OOM, so the constraints still apply).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Avi Kivity May 17, 2009, 6:27 a.m. UTC | #1
Rusty Russell wrote:
>  
> +static void adjust_threshold(struct vring_virtqueue *vq,
> +			     unsigned int out, unsigned int in)
> +{
> +	/* There are really two species of virtqueue, and it matters here.
> +	 * If there are no output parts, it's a "normally full" receive queue,
> +	 * otherwise it's a "normally empty" send queue. */
>   

This comment is true for networking, but not for block. ++overkill with 
a ->adjust_threshold op.
Rusty Russell May 17, 2009, 2:16 p.m. UTC | #2
On Sun, 17 May 2009 03:57:01 pm Avi Kivity wrote:
> Rusty Russell wrote:
> > +static void adjust_threshold(struct vring_virtqueue *vq,
> > +			     unsigned int out, unsigned int in)
> > +{
> > +	/* There are really two species of virtqueue, and it matters here.
> > +	 * If there are no output parts, it's a "normally full" receive queue,
> > +	 * otherwise it's a "normally empty" send queue. */
>
> This comment is true for networking, but not for block. ++overkill with
> a ->adjust_threshold op.

No, it's true for block.  It has output parts, so we should reduce threshold 
when it's full.  Network recvq is an example which should reduce threshold 
when it's empty.

->adjust_threshold is better as an arg to vring_new_virtqueue, but it's still 
not clear what the answer would be.

Rusty.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity May 17, 2009, 3:05 p.m. UTC | #3
Rusty Russell wrote:
> On Sun, 17 May 2009 03:57:01 pm Avi Kivity wrote:
>   
>> Rusty Russell wrote:
>>     
>>> +static void adjust_threshold(struct vring_virtqueue *vq,
>>> +			     unsigned int out, unsigned int in)
>>> +{
>>> +	/* There are really two species of virtqueue, and it matters here.
>>> +	 * If there are no output parts, it's a "normally full" receive queue,
>>> +	 * otherwise it's a "normally empty" send queue. */
>>>       
>> This comment is true for networking, but not for block. ++overkill with
>> a ->adjust_threshold op.
>>     
>
> No, it's true for block.  It has output parts, so we should reduce threshold 
> when it's full.  Network recvq is an example which should reduce threshold 
> when it's empty.
>   

You mean the header that contains the sector number?  It's a little 
incidental, but I guess it works.
Rusty Russell May 19, 2009, 8:15 a.m. UTC | #4
On Mon, 18 May 2009 12:35:39 am Avi Kivity wrote:
> Rusty Russell wrote:
> > No, it's true for block.  It has output parts, so we should reduce
> > threshold when it's full.  Network recvq is an example which should
> > reduce threshold when it's empty.
>
> You mean the header that contains the sector number?  It's a little
> incidental, but I guess it works.

Yes, the one which breaks is randomness.  You really do just throw buffers with 
no metadata in them and the request is implied. If/when we care, we can add a 
hint flag.

Cheers,
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -63,6 +63,8 @@  struct vring_virtqueue
 
 	/* Host supports indirect buffers */
 	bool indirect;
+	/* Threshold before we go indirect. */
+	unsigned int indirect_threshold;
 
 	/* Number of free buffers */
 	unsigned int num_free;
@@ -137,6 +141,32 @@  static int vring_add_indirect(struct vri
 	return head;
 }
 
+static void adjust_threshold(struct vring_virtqueue *vq,
+			     unsigned int out, unsigned int in)
+{
+	/* There are really two species of virtqueue, and it matters here.
+	 * If there are no output parts, it's a "normally full" receive queue,
+	 * otherwise it's a "normally empty" send queue. */
+	if (out) {
+		/* Leave threshold unless we're full. */
+		if (out + in < vq->num_free)
+			return;
+	} else {
+		/* Leave threshold unless we're empty. */
+		if (vq->num_free != vq->vring.num)
+			return;
+	}
+
+	/* Never drop threshold below 1 */
+	vq->indirect_threshold /= 2;
+	vq->indirect_threshold |= 1;
+
+	printk("%s %s: indirect threshold %u (%u+%u vs %u)\n", 
+	       dev_name(&vq->vq.vdev->dev),
+	       vq->vq.name, vq->indirect_threshold,
+	       out, in, vq->num_free);
+}
+
 static int vring_add_buf(struct virtqueue *_vq,
 			 struct scatterlist sg[],
 			 unsigned int out,
@@ -149,18 +179,31 @@  static int vring_add_buf(struct virtqueu
 	START_USE(vq);
 
 	BUG_ON(data == NULL);
-
-	/* If the host supports indirect descriptor tables, and we have multiple
-	 * buffers, then go indirect. FIXME: tune this threshold */
-	if (vq->indirect && (out + in) > 1 && vq->num_free) {
-		head = vring_add_indirect(vq, sg, out, in);
-		if (head != vq->vring.num)
-			goto add_head;
-	}
-
 	BUG_ON(out + in > vq->vring.num);
 	BUG_ON(out + in == 0);
 
+	/* If the host supports indirect descriptor tables, consider it. */
+	if (vq->indirect) {
+		bool try_indirect;
+
+		/* We tweak the threshold automatically. */
+		adjust_threshold(vq, out, in);
+
+		/* If we can't fit any at all, fall through. */
+		if (vq->num_free == 0)
+			try_indirect = false;
+		else if (out + in > vq->num_free)
+			try_indirect = true;
+		else
+			try_indirect = (out + in > vq->indirect_threshold);
+
+		if (try_indirect) {
+			head = vring_add_indirect(vq, sg, out, in);
+			if (head != vq->vring.num)
+				goto add_head;
+		}
+	}
+
 	if (vq->num_free < out + in) {
 		pr_debug("Can't add buf len %i - avail = %i\n",
 			 out + in, vq->num_free);
@@ -391,6 +434,7 @@  struct virtqueue *vring_new_virtqueue(un
 #endif
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
+	vq->indirect_threshold = num;
 
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback)