Patchwork [v3,09/12] iov: add iov_get_ptr() to reference vector data

login
register
mail settings
Submitter Stefan Hajnoczi
Date Nov. 21, 2012, 6:32 p.m.
Message ID <1353522781-12721-10-git-send-email-stefanha@redhat.com>
Download mbox | patch
Permalink /patch/200830/
State New
Headers show

Comments

Stefan Hajnoczi - Nov. 21, 2012, 6:32 p.m.
The iov_get_ptr() data returns a pointer to contiguous data within a
vector.  This allows the caller to manipulate data inside the vector
without copying in/out using iov_from_buf()/iov_to_buf() when we know
that data is contiguous within an iovec element.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 iov.c | 25 +++++++++++++++++++++++++
 iov.h | 11 +++++++++++
 2 files changed, 36 insertions(+)
Paolo Bonzini - Nov. 22, 2012, 9:34 a.m.
Il 21/11/2012 19:32, Stefan Hajnoczi ha scritto:
> The iov_get_ptr() data returns a pointer to contiguous data within a
> vector.  This allows the caller to manipulate data inside the vector
> without copying in/out using iov_from_buf()/iov_to_buf() when we know
> that data is contiguous within an iovec element.

This works for you because you have a single byte to write.  It would
not work for the SG_IO inhdr, which would need iov_to_buf().

What about the following alternative API:

void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
                  ssize_t offset, size_t *bytes);

which would place the number of valid bytes (i.e. the length of the
remainder of the iovec entry) in *bytes?

Also, I think that offset == iov_size(iov, iov_cnt) should be
acceptable, and it would be the only case in which *bytes == 0.

Otherwise looks good.

Paolo

> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  iov.c | 25 +++++++++++++++++++++++++
>  iov.h | 11 +++++++++++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/iov.c b/iov.c
> index 6eed089..bc78c34 100644
> --- a/iov.c
> +++ b/iov.c
> @@ -395,3 +395,28 @@ size_t iov_discard(struct iovec **iov, unsigned int *iov_cnt, ssize_t bytes)
>      }
>      return total;
>  }
> +
> +void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
> +                  ssize_t offset, size_t bytes)
> +{
> +    if (offset < 0) {
> +        offset += iov_size(iov, iov_cnt);
> +        if (offset < 0) {
> +            return NULL; /* offset before beginning of vector */
> +        }
> +    }
> +
> +    while (iov_cnt > 0) {
> +        if (offset < iov->iov_len) {
> +            if (bytes > iov->iov_len - offset) {
> +                return NULL; /* would span iovec elements */
> +            }
> +            return iov->iov_base + offset;
> +        }
> +
> +        offset -= iov->iov_len;
> +        iov_cnt--;
> +        iov++;
> +    }
> +    return NULL; /* offset beyond end of vector */
> +}
> diff --git a/iov.h b/iov.h
> index d6d1fa6..674dd51 100644
> --- a/iov.h
> +++ b/iov.h
> @@ -108,3 +108,14 @@ unsigned iov_copy(struct iovec *dst_iov, unsigned int dst_iov_cnt,
>   * smaller than requested if the vector is too small.
>   */
>  size_t iov_discard(struct iovec **iov, unsigned int *iov_cnt, ssize_t bytes);
> +
> +/*
> + * Get a pointer into a vector at offset if the given number of bytes is
> + * contiguous and not split across iovec elements.  NULL is returned if
> + * memory would span iovec elements or exceed the length of the vector.
> + *
> + * The offset is ssize_t so that an offset from the end of the vector can
> + * be specified with a negative number.
> + */
> +void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt, ssize_t offset,
> +                  size_t bytes);
>
Michael S. Tsirkin - Nov. 22, 2012, 9:45 a.m.
On Thu, Nov 22, 2012 at 10:34:13AM +0100, Paolo Bonzini wrote:
> Il 21/11/2012 19:32, Stefan Hajnoczi ha scritto:
> > The iov_get_ptr() data returns a pointer to contiguous data within a
> > vector.  This allows the caller to manipulate data inside the vector
> > without copying in/out using iov_from_buf()/iov_to_buf() when we know
> > that data is contiguous within an iovec element.
> 
> This works for you because you have a single byte to write.  It would
> not work for the SG_IO inhdr, which would need iov_to_buf().
> 
> What about the following alternative API:
> 
> void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
>                   ssize_t offset, size_t *bytes);
> 
> which would place the number of valid bytes (i.e. the length of the
> remainder of the iovec entry) in *bytes?
> 
> Also, I think that offset == iov_size(iov, iov_cnt) should be
> acceptable, and it would be the only case in which *bytes == 0.
> 
> Otherwise looks good.
> 
> Paolo

All this looks suspiciously like premature optimization to me.
Do we have data to show avoiding header copy is a win?

The caller would have to handle the case where the header is not
contigious, which seems just too complex - in practice the code
seems to simply fail in this case.

Why not just copy the header using iov_from_buf()/iov_to_buf()
and be done with it?


> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  iov.c | 25 +++++++++++++++++++++++++
> >  iov.h | 11 +++++++++++
> >  2 files changed, 36 insertions(+)
> > 
> > diff --git a/iov.c b/iov.c
> > index 6eed089..bc78c34 100644
> > --- a/iov.c
> > +++ b/iov.c
> > @@ -395,3 +395,28 @@ size_t iov_discard(struct iovec **iov, unsigned int *iov_cnt, ssize_t bytes)
> >      }
> >      return total;
> >  }
> > +
> > +void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
> > +                  ssize_t offset, size_t bytes)
> > +{
> > +    if (offset < 0) {
> > +        offset += iov_size(iov, iov_cnt);
> > +        if (offset < 0) {
> > +            return NULL; /* offset before beginning of vector */
> > +        }
> > +    }
> > +
> > +    while (iov_cnt > 0) {
> > +        if (offset < iov->iov_len) {
> > +            if (bytes > iov->iov_len - offset) {
> > +                return NULL; /* would span iovec elements */
> > +            }
> > +            return iov->iov_base + offset;
> > +        }
> > +
> > +        offset -= iov->iov_len;
> > +        iov_cnt--;
> > +        iov++;
> > +    }
> > +    return NULL; /* offset beyond end of vector */
> > +}
> > diff --git a/iov.h b/iov.h
> > index d6d1fa6..674dd51 100644
> > --- a/iov.h
> > +++ b/iov.h
> > @@ -108,3 +108,14 @@ unsigned iov_copy(struct iovec *dst_iov, unsigned int dst_iov_cnt,
> >   * smaller than requested if the vector is too small.
> >   */
> >  size_t iov_discard(struct iovec **iov, unsigned int *iov_cnt, ssize_t bytes);
> > +
> > +/*
> > + * Get a pointer into a vector at offset if the given number of bytes is
> > + * contiguous and not split across iovec elements.  NULL is returned if
> > + * memory would span iovec elements or exceed the length of the vector.
> > + *
> > + * The offset is ssize_t so that an offset from the end of the vector can
> > + * be specified with a negative number.
> > + */
> > +void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt, ssize_t offset,
> > +                  size_t bytes);
> >
Paolo Bonzini - Nov. 22, 2012, 9:52 a.m.
Il 22/11/2012 10:45, Michael S. Tsirkin ha scritto:
> All this looks suspiciously like premature optimization to me.
> Do we have data to show avoiding header copy is a win?

The code is a little simpler, because we know the footer is 1 byte only.

Paolo

> The caller would have to handle the case where the header is not
> contigious, which seems just too complex - in practice the code
> seems to simply fail in this case.
> 
> Why not just copy the header using iov_from_buf()/iov_to_buf()
> and be done with it?
Michael S. Tsirkin - Nov. 22, 2012, 10:38 a.m.
On Thu, Nov 22, 2012 at 10:52:27AM +0100, Paolo Bonzini wrote:
> Il 22/11/2012 10:45, Michael S. Tsirkin ha scritto:
> > All this looks suspiciously like premature optimization to me.
> > Do we have data to show avoiding header copy is a win?
> 
> The code is a little simpler, because we know the footer is 1 byte only.
> 
> Paolo

Yes but the APIs don't make sense in the generic case
of >1 byte: users will have to code up two paths for when
the buffer they want to access gets scattered across.
So this looks like a future source of errors:
it's better to avoid poking at guest memory directly.

If the point is to avoid scanning iov vector when data is towards the
end of the iov, then this does sound reasonable.  In that case IMHO we
should just have accessors that work back from end of the iov. E.g.

size_t iov_from_buf_end(const struct iovec *iov, unsigned int iov_cnt,
			size_t offset, const void *buf, size_t bytes)
Paolo Bonzini - Nov. 22, 2012, 10:54 a.m.
Il 22/11/2012 11:38, Michael S. Tsirkin ha scritto:
>> > The code is a little simpler, because we know the footer is 1 byte only.
>
> Yes but the APIs don't make sense in the generic case
> of >1 byte: users will have to code up two paths for when
> the buffer they want to access gets scattered across.

That would be premature optimization; with >1 byte you just use
iov_from/to_buf.

BTW, something like this function is also useful for the broken SCSI
outhdr ("the CDB starts after the common outhdr and is in a single
iovec").  But the API must be changed slightly, as in my answer to Stefan.

> If the point is to avoid scanning iov vector when data is towards the
> end of the iov, then this does sound reasonable.  In that case IMHO we
> should just have accessors that work back from end of the iov. E.g.
> 
> size_t iov_from_buf_end(const struct iovec *iov, unsigned int iov_cnt,
> 			size_t offset, const void *buf, size_t bytes)

That's also a possibility.

Paolo
Michael S. Tsirkin - Nov. 22, 2012, 11:14 a.m.
On Thu, Nov 22, 2012 at 11:54:49AM +0100, Paolo Bonzini wrote:
> Il 22/11/2012 11:38, Michael S. Tsirkin ha scritto:
> >> > The code is a little simpler, because we know the footer is 1 byte only.
> >
> > Yes but the APIs don't make sense in the generic case
> > of >1 byte: users will have to code up two paths for when
> > the buffer they want to access gets scattered across.
> 
> That would be premature optimization; with >1 byte you just use
> iov_from/to_buf.

If the API makes it very clear that it only works for 1 byte
I would have no objection.

> BTW, something like this function is also useful for the broken SCSI
> outhdr ("the CDB starts after the common outhdr and is in a single
> iovec").  But the API must be changed slightly, as in my answer to Stefan.

The scsi command is special in that the length is
iov_length. It's all legacy anyway so I think we can
just assume it's the second s/g entry:
iov[1].iov_length.
We should probably have a wrapper that gets it
after checking iov_cnt > 1.

	size_t iov_get_seg_length(...)
	{
		assert(iov_cnt <= idx);
		return iov[idx].iov_length;
	}

Rusty says he'll put the length of the command
in the buffer in the future, so we will have
if (new_cmd_feature)
		iov_to_buf(.... &len)
else
		len = iov_get_seg_length



> > If the point is to avoid scanning iov vector when data is towards the
> > end of the iov, then this does sound reasonable.  In that case IMHO we
> > should just have accessors that work back from end of the iov. E.g.
> > 
> > size_t iov_from_buf_end(const struct iovec *iov, unsigned int iov_cnt,
> > 			size_t offset, const void *buf, size_t bytes)
> 
> That's also a possibility.
> 
> Paolo
Stefan Hajnoczi - Nov. 22, 2012, 11:58 a.m.
On Thu, Nov 22, 2012 at 10:34:13AM +0100, Paolo Bonzini wrote:
> Il 21/11/2012 19:32, Stefan Hajnoczi ha scritto:
> > The iov_get_ptr() data returns a pointer to contiguous data within a
> > vector.  This allows the caller to manipulate data inside the vector
> > without copying in/out using iov_from_buf()/iov_to_buf() when we know
> > that data is contiguous within an iovec element.
> 
> This works for you because you have a single byte to write.  It would
> not work for the SG_IO inhdr, which would need iov_to_buf().

Guilty as charged, your honor. :)

Let me give a few more details about the motivation for this function:

In virtio-blk-data-plane we have an iovec[] array.  In the read/write
code path we discard the inhdr/outhdr so just the data buffers are left
in the iovec[] array.  Then we can pass the iovec[] array straight to
the Linux AIO functions.

Because we're using the iovec[] array for data buffers and we're not
allowed to make assumptions about iovec layout, we cannot use
iov_to_buf()/iov_from_buf() at the end to fill in the status field - the
inhdr has already been discarded from the iovec[] array.

Since I knew the inhdr is only 1 byte I decided against doing something
like dynamically allocating/freeing a QEMUIOVector which could handle
spanning iovecs.

That said, I think this function is okay as-is because it works fine for
non-virtio cases where the caller *knows* the iovec[] layout.  As a
utility function it stands on its own.

> What about the following alternative API:
> 
> void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
>                   ssize_t offset, size_t *bytes);
> 
> which would place the number of valid bytes (i.e. the length of the
> remainder of the iovec entry) in *bytes?
> 
> Also, I think that offset == iov_size(iov, iov_cnt) should be
> acceptable, and it would be the only case in which *bytes == 0.

Hmm...this may be more useful than the version I proposed since the
caller can also use it to find out how many bytes are contiguous.

Michael: Any concerns if I update the code to reflect Paolo's
suggestion?

Stefan
Paolo Bonzini - Nov. 22, 2012, 12:15 p.m.
Il 22/11/2012 12:58, Stefan Hajnoczi ha scritto:
> Then we can pass the iovec[] array straight to
> the Linux AIO functions.
> 
> Since I knew the inhdr is only 1 byte I decided against doing something
> like dynamically allocating/freeing a QEMUIOVector which could handle
> spanning iovecs.

Long-term we want anyway a QEMUIOVector, to pass it to block functions.
 Can you introduce qemu_iovec_concat_iov that takes an iov/niov pair
instead of the source QEMUIOVector?  That should do quite nicely.

Paolo
Michael S. Tsirkin - Nov. 22, 2012, 12:35 p.m.
On Thu, Nov 22, 2012 at 12:58:23PM +0100, Stefan Hajnoczi wrote:
> On Thu, Nov 22, 2012 at 10:34:13AM +0100, Paolo Bonzini wrote:
> > Il 21/11/2012 19:32, Stefan Hajnoczi ha scritto:
> > > The iov_get_ptr() data returns a pointer to contiguous data within a
> > > vector.  This allows the caller to manipulate data inside the vector
> > > without copying in/out using iov_from_buf()/iov_to_buf() when we know
> > > that data is contiguous within an iovec element.
> > 
> > This works for you because you have a single byte to write.  It would
> > not work for the SG_IO inhdr, which would need iov_to_buf().
> 
> Guilty as charged, your honor. :)
> 
> Let me give a few more details about the motivation for this function:
> 
> In virtio-blk-data-plane we have an iovec[] array.  In the read/write
> code path we discard the inhdr/outhdr so just the data buffers are left
> in the iovec[] array.  Then we can pass the iovec[] array straight to
> the Linux AIO functions.
> 
> Because we're using the iovec[] array for data buffers and we're not
> allowed to make assumptions about iovec layout, we cannot use
> iov_to_buf()/iov_from_buf() at the end to fill in the status field - the
> inhdr has already been discarded from the iovec[] array.

How about using iov_copy?

We have exactly this problem in virtio net if we run
on host that does not support mergeable buffer header,
and we solve it by copying out the iovec.

> Since I knew the inhdr is only 1 byte I decided against doing something
> like dynamically allocating/freeing a QEMUIOVector which could handle
> spanning iovecs.
> 
> That said, I think this function is okay as-is because it works fine for
> non-virtio cases where the caller *knows* the iovec[] layout.  As a
> utility function it stands on its own.
> 

My concern is these APIs are unsafe to use: you get back a pointer and
you must verify length is not too big before access.  Since the iov can
be manipulated by guest this looks like a good place to put extra
safeguards.

> > What about the following alternative API:
> > 
> > void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
> >                   ssize_t offset, size_t *bytes);
> > 
> > which would place the number of valid bytes (i.e. the length of the
> > remainder of the iovec entry) in *bytes?
> > 
> > Also, I think that offset == iov_size(iov, iov_cnt) should be
> > acceptable, and it would be the only case in which *bytes == 0.
> 
> Hmm...this may be more useful than the version I proposed since the
> caller can also use it to find out how many bytes are contiguous.
> 
> Michael: Any concerns if I update the code to reflect Paolo's
> suggestion?
> 
> Stefan

I'd prefer something that actually works for all cases
rather than making callers check and handle failure,
or reason why it can't fail.
Stefan Hajnoczi - Nov. 22, 2012, 3:18 p.m.
On Thu, Nov 22, 2012 at 1:35 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Thu, Nov 22, 2012 at 12:58:23PM +0100, Stefan Hajnoczi wrote:
>> On Thu, Nov 22, 2012 at 10:34:13AM +0100, Paolo Bonzini wrote:
>> > Il 21/11/2012 19:32, Stefan Hajnoczi ha scritto:
>> > > The iov_get_ptr() data returns a pointer to contiguous data within a
>> > > vector.  This allows the caller to manipulate data inside the vector
>> > > without copying in/out using iov_from_buf()/iov_to_buf() when we know
>> > > that data is contiguous within an iovec element.
>> >
>> > This works for you because you have a single byte to write.  It would
>> > not work for the SG_IO inhdr, which would need iov_to_buf().
>>
>> Guilty as charged, your honor. :)
>>
>> Let me give a few more details about the motivation for this function:
>>
>> In virtio-blk-data-plane we have an iovec[] array.  In the read/write
>> code path we discard the inhdr/outhdr so just the data buffers are left
>> in the iovec[] array.  Then we can pass the iovec[] array straight to
>> the Linux AIO functions.
>>
>> Because we're using the iovec[] array for data buffers and we're not
>> allowed to make assumptions about iovec layout, we cannot use
>> iov_to_buf()/iov_from_buf() at the end to fill in the status field - the
>> inhdr has already been discarded from the iovec[] array.
>
> How about using iov_copy?
>
> We have exactly this problem in virtio net if we run
> on host that does not support mergeable buffer header,
> and we solve it by copying out the iovec.
>
>> Since I knew the inhdr is only 1 byte I decided against doing something
>> like dynamically allocating/freeing a QEMUIOVector which could handle
>> spanning iovecs.
>>
>> That said, I think this function is okay as-is because it works fine for
>> non-virtio cases where the caller *knows* the iovec[] layout.  As a
>> utility function it stands on its own.
>>
>
> My concern is these APIs are unsafe to use: you get back a pointer and
> you must verify length is not too big before access.  Since the iov can
> be manipulated by guest this looks like a good place to put extra
> safeguards.
>
>> > What about the following alternative API:
>> >
>> > void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
>> >                   ssize_t offset, size_t *bytes);
>> >
>> > which would place the number of valid bytes (i.e. the length of the
>> > remainder of the iovec entry) in *bytes?
>> >
>> > Also, I think that offset == iov_size(iov, iov_cnt) should be
>> > acceptable, and it would be the only case in which *bytes == 0.
>>
>> Hmm...this may be more useful than the version I proposed since the
>> caller can also use it to find out how many bytes are contiguous.
>>
>> Michael: Any concerns if I update the code to reflect Paolo's
>> suggestion?
>>
>> Stefan
>
> I'd prefer something that actually works for all cases
> rather than making callers check and handle failure,
> or reason why it can't fail.

I just sent out a new version of the patch which goes whole hog and
uses a QEMUIOVector to safely access virtio_blk_inhdr regardless of
its size or iovec spanning.

Stefan

Patch

diff --git a/iov.c b/iov.c
index 6eed089..bc78c34 100644
--- a/iov.c
+++ b/iov.c
@@ -395,3 +395,28 @@  size_t iov_discard(struct iovec **iov, unsigned int *iov_cnt, ssize_t bytes)
     }
     return total;
 }
+
+void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt,
+                  ssize_t offset, size_t bytes)
+{
+    if (offset < 0) {
+        offset += iov_size(iov, iov_cnt);
+        if (offset < 0) {
+            return NULL; /* offset before beginning of vector */
+        }
+    }
+
+    while (iov_cnt > 0) {
+        if (offset < iov->iov_len) {
+            if (bytes > iov->iov_len - offset) {
+                return NULL; /* would span iovec elements */
+            }
+            return iov->iov_base + offset;
+        }
+
+        offset -= iov->iov_len;
+        iov_cnt--;
+        iov++;
+    }
+    return NULL; /* offset beyond end of vector */
+}
diff --git a/iov.h b/iov.h
index d6d1fa6..674dd51 100644
--- a/iov.h
+++ b/iov.h
@@ -108,3 +108,14 @@  unsigned iov_copy(struct iovec *dst_iov, unsigned int dst_iov_cnt,
  * smaller than requested if the vector is too small.
  */
 size_t iov_discard(struct iovec **iov, unsigned int *iov_cnt, ssize_t bytes);
+
+/*
+ * Get a pointer into a vector at offset if the given number of bytes is
+ * contiguous and not split across iovec elements.  NULL is returned if
+ * memory would span iovec elements or exceed the length of the vector.
+ *
+ * The offset is ssize_t so that an offset from the end of the vector can
+ * be specified with a negative number.
+ */
+void *iov_get_ptr(struct iovec *iov, unsigned int iov_cnt, ssize_t offset,
+                  size_t bytes);