diff mbox

[RFC] virtio: add virtqueue_fill_partial

Message ID 1430144304-13514-1-git-send-email-mst@redhat.com
State New
Headers show

Commit Message

Michael S. Tsirkin April 27, 2015, 2:18 p.m. UTC
On error, virtio blk dirties guest memory but doesn't want to tell guest
about it. Add virtqueue_fill_partial to cover this use case.  This gets
two parameters: host_len is >= the amount of guest memory actually
written, guest_len is how much we guarantee to guest.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio.h |  3 +++
 hw/virtio/virtio.c         | 25 ++++++++++++++++++++-----
 2 files changed, 23 insertions(+), 5 deletions(-)

Comments

Paolo Bonzini April 27, 2015, 2:24 p.m. UTC | #1
On 27/04/2015 16:18, Michael S. Tsirkin wrote:
> +/*
> + * Some devices dirty guest memory but don't want to tell guest about it. In
> + * that case, use virtqueue_fill_partial: host_len is >= the amount of guest
> + * memory actually written, guest_len is how much we guarantee to guest.
> + * If you know exactly how much was written, use virtqueue_fill instead.

"If you never leave holes unwritten in the iov, use virtqueue_fill
instead.  If the guest is not relying on iov boundaries, it should never
be necessary to use this function."

The API looks okay though.

Paolo

> + */
> +void virtqueue_fill_partial(VirtQueue *vq, const VirtQueueElement *elem,
> +                            unsigned int host_len, unsigned int guest_len,
> +                            unsigned int idx)
>  {
>      unsigned int offset;
>      int i;
>  
> -    trace_virtqueue_fill(vq, elem, len, idx);
> +    assert(host_len >= guest_len);
> +
> +    trace_virtqueue_fill(vq, elem, guest_len, idx);
>  
>      offset = 0;
>      for (i = 0; i < elem->in_num; i++) {
> -        size_t size = MIN(len - offset, elem->in_sg[i].iov_len);
> +        size_t size = MIN(host_len - offset, elem->in_sg[i].iov_len);
>  
>          cpu_physical_memory_unmap(elem->in_sg[i].iov_base,
>                                    elem->in_sg[i].iov_len,
> @@ -269,7 +278,13 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
>  
>      /* Get a pointer to the next entry in the used ring. */
>      vring_used_ring_id(vq, idx, elem->index);
> -    vring_used_ring_len(vq, idx, len);
> +    vring_used_ring_len(vq, idx, guest_len);
> +}
> +
Michael S. Tsirkin April 27, 2015, 2:38 p.m. UTC | #2
On Mon, Apr 27, 2015 at 04:24:31PM +0200, Paolo Bonzini wrote:
> 
> 
> On 27/04/2015 16:18, Michael S. Tsirkin wrote:
> > +/*
> > + * Some devices dirty guest memory but don't want to tell guest about it. In
> > + * that case, use virtqueue_fill_partial: host_len is >= the amount of guest
> > + * memory actually written, guest_len is how much we guarantee to guest.
> > + * If you know exactly how much was written, use virtqueue_fill instead.
> 
> "If you never leave holes unwritten in the iov, use virtqueue_fill
> instead.

Is it just holes? Never data you are unsure about?

>  If the guest is not relying on iov boundaries, it should never
> be necessary to use this function."

You never know what guest does though, do you?

> The API looks okay though.
> 
> Paolo
> 
> > + */
> > +void virtqueue_fill_partial(VirtQueue *vq, const VirtQueueElement *elem,
> > +                            unsigned int host_len, unsigned int guest_len,
> > +                            unsigned int idx)
> >  {
> >      unsigned int offset;
> >      int i;
> >  
> > -    trace_virtqueue_fill(vq, elem, len, idx);
> > +    assert(host_len >= guest_len);
> > +
> > +    trace_virtqueue_fill(vq, elem, guest_len, idx);
> >  
> >      offset = 0;
> >      for (i = 0; i < elem->in_num; i++) {
> > -        size_t size = MIN(len - offset, elem->in_sg[i].iov_len);
> > +        size_t size = MIN(host_len - offset, elem->in_sg[i].iov_len);
> >  
> >          cpu_physical_memory_unmap(elem->in_sg[i].iov_base,
> >                                    elem->in_sg[i].iov_len,
> > @@ -269,7 +278,13 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
> >  
> >      /* Get a pointer to the next entry in the used ring. */
> >      vring_used_ring_id(vq, idx, elem->index);
> > -    vring_used_ring_len(vq, idx, len);
> > +    vring_used_ring_len(vq, idx, guest_len);
> > +}
> > +
Paolo Bonzini April 27, 2015, 2:58 p.m. UTC | #3
On 27/04/2015 16:38, Michael S. Tsirkin wrote:
> > "If you never leave holes unwritten in the iov, use virtqueue_fill
> > instead.
> 
> Is it just holes? Never data you are unsure about?

Then "unwritten (or possibly unwritten)"?

If the data you are unsure is at the end, it's okay.  This is what
virtio-scsi does when it has a failure (the "!= VIRTIO_SCSI_S_OK" case).

> >  If the guest is not relying on iov boundaries, it should never
> > be necessary to use this function."
> 
> You never know what guest does though, do you?

I think that this sentence is wrong, but only because you cannot know
the format of _all_ request headers/footers and thus it's wrong to make
such a sweeping generalization.

Anyhow, I'll describe what I had in mind.  Consider the case I gave
before, where you have a 1 sector (512-bytes) read.  Consider now two
guests, both of them passing 2049 bytes for the device-writable area.

Guest 1 doesn't have anylayout; it placed 2048 bytes in iovec 0 and 1
byte in iovec 1.  The device thus has to write bytes 0-511 and byte
2048.  In this case host_len = 2049, guest_len = 512.

Guest 2 has anylayout (or uses virtio v1).  It placed 2048 bytes in
iovec 0 and 1 byte in iovec 1.  The device however "knows" that the read
is just 512 bytes in length; it writes bytes 0-512 (the last byte is the
status) and ignores bytes 513-2048.  In this case host_len = guest_len =
513.

However, there could be a legitimate reason for a device to leave some
bytes uninitialized in the middle, possibly for optimization reasons.
In this case, you might have to use virtqueue_fill_partial.

...

Oh wait, that's exactly what happens with virtio-scsi!

sense_len in the response header is usually zero, and in that case we
don't write the sense[] array.  So, I guess virtio-scsi in the common
case should return just the offset up to sense_len?

Paolo
Michael S. Tsirkin April 27, 2015, 4:13 p.m. UTC | #4
On Mon, Apr 27, 2015 at 04:58:37PM +0200, Paolo Bonzini wrote:
> On 27/04/2015 16:38, Michael S. Tsirkin wrote:
> > > "If you never leave holes unwritten in the iov, use virtqueue_fill
> > > instead.
> > 
> > Is it just holes? Never data you are unsure about?
> 
> Then "unwritten (or possibly unwritten)"?
> 
> If the data you are unsure is at the end, it's okay.  This is what
> virtio-scsi does when it has a failure (the "!= VIRTIO_SCSI_S_OK" case).
> 
> > >  If the guest is not relying on iov boundaries, it should never
> > > be necessary to use this function."
> > 
> > You never know what guest does though, do you?
> 
> I think that this sentence is wrong, but only because you cannot know
> the format of _all_ request headers/footers and thus it's wrong to make
> such a sweeping generalization.
> 
> Anyhow, I'll describe what I had in mind.  Consider the case I gave
> before, where you have a 1 sector (512-bytes) read.  Consider now two
> guests, both of them passing 2049 bytes for the device-writable area.
> 
> Guest 1 doesn't have anylayout; it placed 2048 bytes in iovec 0 and 1
> byte in iovec 1.  The device thus has to write bytes 0-511 and byte
> 2048.  In this case host_len = 2049, guest_len = 512.
> 
> Guest 2 has anylayout (or uses virtio v1).  It placed 2048 bytes in
> iovec 0 and 1 byte in iovec 1.  The device however "knows" that the read
> is just 512 bytes in length; it writes bytes 0-512 (the last byte is the
> status) and ignores bytes 513-2048.  In this case host_len = guest_len =
> 513.
> 
> However, there could be a legitimate reason for a device to leave some
> bytes uninitialized in the middle, possibly for optimization reasons.
> In this case, you might have to use virtqueue_fill_partial.
> 
> ...
> 
> Oh wait, that's exactly what happens with virtio-scsi!
> 
> sense_len in the response header is usually zero, and in that case we
> don't write the sense[] array.  So, I guess virtio-scsi in the common
> case should return just the offset up to sense_len?
> 
> Paolo

So it appears.
diff mbox

Patch

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index e3adb1d..9957aae 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -135,6 +135,9 @@  void virtio_del_queue(VirtIODevice *vdev, int n);
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len);
 void virtqueue_flush(VirtQueue *vq, unsigned int count);
+void virtqueue_fill_partial(VirtQueue *vq, const VirtQueueElement *elem,
+                            unsigned int host_len, unsigned int guest_len,
+                            unsigned int idx);
 void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
                     unsigned int len, unsigned int idx);
 
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 159e5c6..111b0db 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -241,17 +241,26 @@  int virtio_queue_empty(VirtQueue *vq)
     return vring_avail_idx(vq) == vq->last_avail_idx;
 }
 
-void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
-                    unsigned int len, unsigned int idx)
+/*
+ * Some devices dirty guest memory but don't want to tell guest about it. In
+ * that case, use virtqueue_fill_partial: host_len is >= the amount of guest
+ * memory actually written, guest_len is how much we guarantee to guest.
+ * If you know exactly how much was written, use virtqueue_fill instead.
+ */
+void virtqueue_fill_partial(VirtQueue *vq, const VirtQueueElement *elem,
+                            unsigned int host_len, unsigned int guest_len,
+                            unsigned int idx)
 {
     unsigned int offset;
     int i;
 
-    trace_virtqueue_fill(vq, elem, len, idx);
+    assert(host_len >= guest_len);
+
+    trace_virtqueue_fill(vq, elem, guest_len, idx);
 
     offset = 0;
     for (i = 0; i < elem->in_num; i++) {
-        size_t size = MIN(len - offset, elem->in_sg[i].iov_len);
+        size_t size = MIN(host_len - offset, elem->in_sg[i].iov_len);
 
         cpu_physical_memory_unmap(elem->in_sg[i].iov_base,
                                   elem->in_sg[i].iov_len,
@@ -269,7 +278,13 @@  void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 
     /* Get a pointer to the next entry in the used ring. */
     vring_used_ring_id(vq, idx, elem->index);
-    vring_used_ring_len(vq, idx, len);
+    vring_used_ring_len(vq, idx, guest_len);
+}
+
+void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
+                    unsigned int len, unsigned int idx)
+{
+    virtqueue_fill_partial(vq, elem, len, len, idx);
 }
 
 void virtqueue_flush(VirtQueue *vq, unsigned int count)