
[1/6] virtio: introduce virtio_map

Message ID 1445935663-31971-2-git-send-email-mst@redhat.com
State New

Commit Message

Michael S. Tsirkin Oct. 27, 2015, 8:47 a.m. UTC
virtio_map_sg currently fails if one of the entries it's mapping is
contiguous in GPA but not HVA address space.  Introduce virtio_map which
handles this by splitting sg entries.

This new API generally turns out to be a good idea since it's harder to
misuse: at least in one case the existing one was used incorrectly.

This will still fail if there's no space left in the sg, but luckily max
queue size in use is currently 256, while max sg size is 1024, so we
should be OK even if all entries happen to cross a single DIMM boundary.

Won't work well with very small DIMM sizes, unfortunately:
e.g. this will fail with 4K DIMMs where a single
request might span a large number of DIMMs.

Let's hope these are uncommon - at least we are not breaking things.

Note: virtio-scsi calls virtio_map_sg on data loaded from network, and
validates input, asserting on failure.  Copy the validating code here -
it will be dropped from virtio-scsi in a follow-up patch.

Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 include/hw/virtio/virtio.h |  1 +
 hw/virtio/virtio.c         | 56 ++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 48 insertions(+), 9 deletions(-)

Comments

Stefan Hajnoczi Oct. 27, 2015, 4:13 p.m. UTC | #1
On Tue, Oct 27, 2015 at 10:47:56AM +0200, Michael S. Tsirkin wrote:
> +    for (i = 0; i < *num_sg; i++) {
>          len = sg[i].iov_len;
>          sg[i].iov_base = cpu_physical_memory_map(addr[i], &len, is_write);
> -        if (sg[i].iov_base == NULL || len != sg[i].iov_len) {
> +        if (!sg[i].iov_base) {
>              error_report("virtio: error trying to map MMIO memory");
>              exit(1);
>          }
> +        if (len == sg[i].iov_len) {
> +            continue;
> +        }
> +        if (*num_sg >= max_size) {
> +            error_report("virtio: memory split makes iovec too large");
> +            exit(1);
> +        }
> +        memcpy(sg + i + 1, sg + i, sizeof(*sg) * (*num_sg - i));
> +        memcpy(addr + i + 1, addr + i, sizeof(*addr) * (*num_sg - i));

These should be memmove() since memcpy() arguments are not allowed to overlap.
Stefan Hajnoczi Oct. 27, 2015, 4:19 p.m. UTC | #2
On Tue, Oct 27, 2015 at 10:47:56AM +0200, Michael S. Tsirkin wrote:
> This will still fail if there's no space left in the sg, but luckily max
> queue size in use is currently 256, while max sg size is 1024, so we
> should be OK even if all entries happen to cross a single DIMM boundary.

Don't forget about indirect descriptors.  They can use all 1024 iovecs,
regardless of virtqueue size, so virtqueue size of 256 isn't the true
maximum.

I'm worried that we could now see failures due to non-contiguous HVAs.
Michael S. Tsirkin Oct. 27, 2015, 6:34 p.m. UTC | #3
On Tue, Oct 27, 2015 at 04:19:54PM +0000, Stefan Hajnoczi wrote:
> On Tue, Oct 27, 2015 at 10:47:56AM +0200, Michael S. Tsirkin wrote:
> > This will still fail if there's no space left in the sg, but luckily max
> > queue size in use is currently 256, while max sg size is 1024, so we
> > should be OK even if all entries happen to cross a single DIMM boundary.
> 
> Don't forget about indirect descriptors.  They can use all 1024 iovecs,
> regardless of virtqueue size, so virtqueue size of 256 isn't the true
> maximum.

Not according to the spec - virtio spec says vq size is the maximum size
of a chain.

> I'm worried that we could now see failures due to non-contiguous HVAs.

Does linux guest create chains > vq size then? Does it actually
have 1024 hardcoded somewhere?
Michael S. Tsirkin Oct. 27, 2015, 6:38 p.m. UTC | #4
On Tue, Oct 27, 2015 at 04:13:12PM +0000, Stefan Hajnoczi wrote:
> On Tue, Oct 27, 2015 at 10:47:56AM +0200, Michael S. Tsirkin wrote:
> > +    for (i = 0; i < *num_sg; i++) {
> >          len = sg[i].iov_len;
> >          sg[i].iov_base = cpu_physical_memory_map(addr[i], &len, is_write);
> > -        if (sg[i].iov_base == NULL || len != sg[i].iov_len) {
> > +        if (!sg[i].iov_base) {
> >              error_report("virtio: error trying to map MMIO memory");
> >              exit(1);
> >          }
> > +        if (len == sg[i].iov_len) {
> > +            continue;
> > +        }
> > +        if (*num_sg >= max_size) {
> > +            error_report("virtio: memory split makes iovec too large");
> > +            exit(1);
> > +        }
> > +        memcpy(sg + i + 1, sg + i, sizeof(*sg) * (*num_sg - i));
> > +        memcpy(addr + i + 1, addr + i, sizeof(*addr) * (*num_sg - i));
> 
> These should be memmove() since memcpy() arguments are not allowed to overlap.

Oh right. Will fix, thanks!
Stefan Hajnoczi Oct. 28, 2015, 11:37 a.m. UTC | #5
On Tue, Oct 27, 2015 at 08:34:32PM +0200, Michael S. Tsirkin wrote:
> On Tue, Oct 27, 2015 at 04:19:54PM +0000, Stefan Hajnoczi wrote:
> > On Tue, Oct 27, 2015 at 10:47:56AM +0200, Michael S. Tsirkin wrote:
> > > This will still fail if there's no space left in the sg, but luckily max
> > > queue size in use is currently 256, while max sg size is 1024, so we
> > > should be OK even if all entries happen to cross a single DIMM boundary.
> > 
> > Don't forget about indirect descriptors.  They can use all 1024 iovecs,
> > regardless of virtqueue size, so virtqueue size of 256 isn't the true
> > maximum.
> 
> Not according to the spec - virtio spec says vq size is the maximum size
> of a chain.
> 
> > I'm worried that we could now see failures due to non-contiguous HVAs.
> 
> Does linux guest create chains > vq size then? Does it actually
> have 1024 hardcoded somewhere?

You are correct, drivers/virtio/virtio_ring.c:virtqueue_add() says:

  BUG_ON(total_sg > vq->vring.num);

This also makes sense since it means there is a well-known maximum size
for indirect descriptor tables.

So this fix should work fine with indirect descriptors.

Stefan
Igor Mammedov Oct. 28, 2015, 1:11 p.m. UTC | #6
On Tue, 27 Oct 2015 10:47:56 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> virtio_map_sg currently fails if one of the entries it's mapping is
> contiguous in GPA but not HVA address space.  Introduce virtio_map which
> handles this by splitting sg entries.
> 
> This new API generally turns out to be a good idea since it's harder to
> misuse: at least in one case the existing one was used incorrectly.
> 
> This will still fail if there's no space left in the sg, but luckily max
> queue size in use is currently 256, while max sg size is 1024, so we
> should be OK even if all entries happen to cross a single DIMM boundary.
> 
> Won't work well with very small DIMM sizes, unfortunately:
> e.g. this will fail with 4K DIMMs where a single
> request might span a large number of DIMMs.
> 
> Let's hope these are uncommon - at least we are not breaking things.
> 
> Note: virtio-scsi calls virtio_map_sg on data loaded from network, and
> validates input, asserting on failure.  Copy the validating code here -
> it will be dropped from virtio-scsi in a follow-up patch.
> 
> Reported-by: Igor Mammedov <imammedo@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

With fixup you've posted and build fix in this thread:
Reviewed-by: Igor Mammedov <imammedo@redhat.com>

> ---
>  include/hw/virtio/virtio.h |  1 +
>  hw/virtio/virtio.c         | 56 ++++++++++++++++++++++++++++++++++++++--------
>  2 files changed, 48 insertions(+), 9 deletions(-)
> 
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 9d09115..9d9abb4 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -153,6 +153,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
>  
>  void virtqueue_map_sg(struct iovec *sg, hwaddr *addr,
>      size_t num_sg, int is_write);
> +void virtqueue_map(VirtQueueElement *elem);
>  int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem);
>  int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
>                            unsigned int out_bytes);
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index d0bc72e..a6878c0 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -448,28 +448,66 @@ int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes,
>      return in_bytes <= in_total && out_bytes <= out_total;
>  }
>  
> -void virtqueue_map_sg(struct iovec *sg, hwaddr *addr,
> -    size_t num_sg, int is_write)
> +static void virtqueue_map_iovec(struct iovec *sg, hwaddr *addr,
> +                                size_t *num_sg, size_t max_size,
> +                                int is_write)
>  {
>      unsigned int i;
>      hwaddr len;
>  
> -    if (num_sg > VIRTQUEUE_MAX_SIZE) {
> -        error_report("virtio: map attempt out of bounds: %zd > %d",
> -                     num_sg, VIRTQUEUE_MAX_SIZE);
> -        exit(1);
> -    }
> +    /* Note: this function MUST validate input, some callers
> +     * are passing in num_sg values received over the network.
> +     */
> +    /* TODO: teach all callers that this can fail, and return failure instead
> +     * of asserting here.
> +     * When we do, we might be able to re-enable NDEBUG below.
> +     */
> +#ifdef NDEBUG
> +#error building with NDEBUG is not supported
> +#endif
> +    assert(*num_sg <= max_size);
>  
> -    for (i = 0; i < num_sg; i++) {
> +    for (i = 0; i < *num_sg; i++) {
>          len = sg[i].iov_len;
>          sg[i].iov_base = cpu_physical_memory_map(addr[i], &len, is_write);
> -        if (sg[i].iov_base == NULL || len != sg[i].iov_len) {
> +        if (!sg[i].iov_base) {
>              error_report("virtio: error trying to map MMIO memory");
>              exit(1);
>          }
> +        if (len == sg[i].iov_len) {
> +            continue;
> +        }
> +        if (*num_sg >= max_size) {
> +            error_report("virtio: memory split makes iovec too large");
> +            exit(1);
> +        }
> +        memcpy(sg + i + 1, sg + i, sizeof(*sg) * (*num_sg - i));
> +        memcpy(addr + i + 1, addr + i, sizeof(*addr) * (*num_sg - i));
> +        assert(len < sg[i + 1].iov_len);
> +        sg[i].iov_len = len;
> +        addr[i + 1] += len;
> +        sg[i + 1].iov_len -= len;
> +        ++*num_sg;
>      }
>  }
>  
> +/* Deprecated: don't use in new code */
> +void virtqueue_map_sg(struct iovec *sg, hwaddr *addr,
> +                      size_t num_sg, int is_write)
> +{
> +    virtqueue_map_iovec(sg, addr, &num_sg, num_sg, is_write);
> +}
> +
> +void virtqueue_map(VirtQueueElement *elem)
> +{
> +    virtqueue_map_iovec(elem->in_sg, elem->in_addr, &elem->in_num,
> +                        MIN(ARRAY_SIZE(elem->in_sg), ARRAY_SIZE(elem->in_addr)),
> +                        1);
> +    virtqueue_map_iovec(elem->out_sg, elem->out_addr, &elem->out_num,
> +                        MIN(ARRAY_SIZE(elem->out_sg), ARRAY_SIZE(elem->out_addr)),
> +                        0);
> +}
> +
>  int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem)
>  {
>      unsigned int i, head, max;