Patchwork For AIO return -ENOSPC on short write

login
register
mail settings
Submitter Jes Sorensen
Date Feb. 22, 2011, 10:18 a.m.
Message ID <1298369880-23859-2-git-send-email-Jes.Sorensen@redhat.com>
Download mbox | patch
Permalink /patch/83948/
State New
Headers show

Comments

Jes Sorensen - Feb. 22, 2011, 10:18 a.m.
From: Jes Sorensen <Jes.Sorensen@redhat.com>

Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
---
 linux-aio.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)
Kevin Wolf - Feb. 22, 2011, 11:44 a.m.
Am 22.02.2011 11:18, schrieb Jes.Sorensen@redhat.com:
> From: Jes Sorensen <Jes.Sorensen@redhat.com>
> 
> Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
> ---
>  linux-aio.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/linux-aio.c b/linux-aio.c
> index 68f4b3d..d9c0225 100644
> --- a/linux-aio.c
> +++ b/linux-aio.c
> @@ -32,6 +32,7 @@ struct qemu_laiocb {
>      ssize_t ret;
>      size_t nbytes;
>      int async_context_id;
> +    int type;
>      QLIST_ENTRY(qemu_laiocb) node;
>  };
>  
> @@ -62,6 +63,9 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
>      if (ret != -ECANCELED) {
>          if (ret == laiocb->nbytes)
>              ret = 0;
> +        else if ((laiocb->type == QEMU_AIO_WRITE) && (ret >= 0) &&
> +                 (ret < laiocb->nbytes))
> +            ret = -ENOSPC;
>          else if (ret >= 0)
>              ret = -EINVAL;

Isn't there a way to get the real error code instead of just making it
up more cleverly? Like retrying for the rest of the request?

Kevin
Jes Sorensen - Feb. 22, 2011, 11:45 a.m.
On 02/22/11 12:44, Kevin Wolf wrote:
>> @@ -62,6 +63,9 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
>>      if (ret != -ECANCELED) {
>>          if (ret == laiocb->nbytes)
>>              ret = 0;
>> +        else if ((laiocb->type == QEMU_AIO_WRITE) && (ret >= 0) &&
>> +                 (ret < laiocb->nbytes))
>> +            ret = -ENOSPC;
>>          else if (ret >= 0)
>>              ret = -EINVAL;
> 
> Isn't there a way to get the real error code instead of just making it
> up more cleverly? Like retrying for the rest of the request?
> 
> Kevin

I guess we could retry the last part of the request, but if we already
have an error, it seems silly to try to rewrite the same stuff again
just to obtain the error code.

I looked through the aio calls and I didn't find any obvious way to
retrieve the error code, but maybe I missed something?

Cheers,
Jes
Avi Kivity - Feb. 22, 2011, 1:56 p.m.
On 02/22/2011 01:45 PM, Jes Sorensen wrote:
> On 02/22/11 12:44, Kevin Wolf wrote:
> >>  @@ -62,6 +63,9 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
> >>       if (ret != -ECANCELED) {
> >>           if (ret == laiocb->nbytes)
> >>               ret = 0;
> >>  +        else if ((laiocb->type == QEMU_AIO_WRITE)&&  (ret>= 0)&&
> >>  +                 (ret<  laiocb->nbytes))
> >>  +            ret = -ENOSPC;
> >>           else if (ret>= 0)
> >>               ret = -EINVAL;
> >
> >  Isn't there a way to get the real error code instead of just making it
> >  up more cleverly? Like retrying for the rest of the request?
> >
> >  Kevin
>
> I guess we could retry the last part of the request, but if we already
> have an error, it seems silly to try to rewrite the same stuff again
> just to obtain the error code.

Why?  It's the standard Unix idiom.  Keep writing until you either 
complete your request or get an error.  We don't do this here, and 
instead invent an error.

Admittedly it's harder to do.

> I looked through the aio calls and I didn't find any obvious way to
> retrieve the error code, but maybe I missed something?

The existing code already has it: if ret is negative, that's what we return.

What you have to do on a short read or write is to schedule a new 
request that starts from the point where this completion ends, and let 
the completion of the new request return the error (or perhaps succeed).
Stefan Hajnoczi - Feb. 22, 2011, 3:02 p.m.
On Tue, Feb 22, 2011 at 10:18 AM,  <Jes.Sorensen@redhat.com> wrote:
> +        else if ((laiocb->type == QEMU_AIO_WRITE) && (ret >= 0) &&
> +                 (ret < laiocb->nbytes))
> +            ret = -ENOSPC;

Why is write special?

Why are we even allowing requests that extend beyond the end of the
device?  Is the LVM volume marked growable in the QEMU block layer?

Stefan
Kevin Wolf - Feb. 22, 2011, 3:11 p.m.
Am 22.02.2011 16:02, schrieb Stefan Hajnoczi:
> On Tue, Feb 22, 2011 at 10:18 AM,  <Jes.Sorensen@redhat.com> wrote:
>> +        else if ((laiocb->type == QEMU_AIO_WRITE) && (ret >= 0) &&
>> +                 (ret < laiocb->nbytes))
>> +            ret = -ENOSPC;
> 
> Why is write special?

I think we need the change reads, too. However not to return ENOSPC, but
to return zeros instead (this is what the synchronous raw_read does, and
pwrite relies on it - once we make pwrite async, we'll need this).

> Why are we even allowing requests that extend beyond the end of the
> device?  Is the LVM volume marked growable in the QEMU block layer?

Might well be a qcow2 on LVM case that Jes was debugging.

Kevin
Stefan Hajnoczi - Feb. 22, 2011, 3:16 p.m.
On Tue, Feb 22, 2011 at 3:11 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 22.02.2011 16:02, schrieb Stefan Hajnoczi:
>> Why are we even allowing requests that extend beyond the end of the
>> device?  Is the LVM volume marked growable in the QEMU block layer?
>
> Might well be a qcow2 on LVM case that Jes was debugging.

Yes it is.  It doesn't explain it though.  The code involved here is
linux-aio.c and will be qcow2's bs->file.  That ought to be a
host_device and AFAIK that is not growable.  So I wanted to figure out
why we're even getting this far.  I expected the request to get
rejected in block.c when checking the range against the host_device.

Stefan
Paolo Bonzini - Feb. 22, 2011, 4:59 p.m.
On 02/22/2011 04:16 PM, Stefan Hajnoczi wrote:
> Yes it is.  It doesn't explain it though.  The code involved here is
> linux-aio.c and will be qcow2's bs->file.  That ought to be a
> host_device and AFAIK that is not growable.  So I wanted to figure out
> why we're even getting this far.  I expected the request to get
> rejected in block.c when checking the range against the host_device.

Possibly a COW logical volume can give short writes on disk full?

Paolo
Christoph Hellwig - March 1, 2011, 8:19 p.m.
On Tue, Feb 22, 2011 at 05:59:01PM +0100, Paolo Bonzini wrote:
> On 02/22/2011 04:16 PM, Stefan Hajnoczi wrote:
> >Yes it is.  It doesn't explain it though.  The code involved here is
> >linux-aio.c and will be qcow2's bs->file.  That ought to be a
> >host_device and AFAIK that is not growable.  So I wanted to figure out
> >why we're even getting this far.  I expected the request to get
> >rejected in block.c when checking the range against the host_device.
> 
> Possibly a COW logical volume can give short writes on disk full?

It shouldn't.  If it does that needs fixing.

Patch

diff --git a/linux-aio.c b/linux-aio.c
index 68f4b3d..d9c0225 100644
--- a/linux-aio.c
+++ b/linux-aio.c
@@ -32,6 +32,7 @@  struct qemu_laiocb {
     ssize_t ret;
     size_t nbytes;
     int async_context_id;
+    int type;
     QLIST_ENTRY(qemu_laiocb) node;
 };
 
@@ -62,6 +63,9 @@  static void qemu_laio_process_completion(struct qemu_laio_state *s,
     if (ret != -ECANCELED) {
         if (ret == laiocb->nbytes)
             ret = 0;
+        else if ((laiocb->type == QEMU_AIO_WRITE) && (ret >= 0) &&
+                 (ret < laiocb->nbytes))
+            ret = -ENOSPC;
         else if (ret >= 0)
             ret = -EINVAL;
 
@@ -204,6 +208,7 @@  BlockDriverAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
     laiocb->nbytes = nb_sectors * 512;
     laiocb->ctx = s;
     laiocb->ret = -EINPROGRESS;
+    laiocb->type = type;
     laiocb->async_context_id = get_async_context_id();
 
     iocbs = &laiocb->iocb;