Patchwork [1/2] ide: convert ide_sector_read() to asynchronous I/O

login
register
mail settings
Submitter Stefan Hajnoczi
Date March 28, 2012, 3:43 p.m.
Message ID <1332949439-6781-2-git-send-email-stefanha@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/149278/
State New
Headers show

Comments

Stefan Hajnoczi - March 28, 2012, 3:43 p.m.
The IDE PIO interface currently uses bdrv_read() to perform reads
synchronously.  Synchronous I/O in the vcpu thread is bad because it
prevents the guest from executing code - it makes the guest
unresponsive.

This patch converts IDE PIO to use bdrv_aio_readv().  We simply need to
use the BUSY_STAT status so the guest knows to wait while we are busy.

The only external user of ide_sector_read() is restart behavior on I/O
errors and it is not affected by this change.  We still need to restart
I/O in the same way.

Migration is also unaffected if I understand the code correctly.  We
continue to use the same transfer function and the BUSY_STAT status
should never be migrated since we flush I/O before migrating device
state.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 hw/ide/core.c     |   69 ++++++++++++++++++++++++++++++++++++----------------
 hw/ide/internal.h |    2 +
 2 files changed, 50 insertions(+), 21 deletions(-)
Michael Tokarev - March 29, 2012, 6:42 a.m.
On 28.03.2012 19:43, Stefan Hajnoczi wrote:
>  void ide_sector_read(IDEState *s)
>  {
[]
> +    s->iov.iov_base = s->io_buffer;
> +    s->iov.iov_len  = n * BDRV_SECTOR_SIZE;
> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +
> +    bdrv_acct_start(s->bs, &s->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ);
> +    bdrv_aio_readv(s->bs, sector_num, &s->qiov, n,
> +                   ide_sector_read_cb, s);
>  }

Shouldn't this function be returning something and
check the return value of bdrv_aio_readv() ?

I'm not sure if bdrv_aio_readv() is _supposed_ to never
return any errors.  In practice it is definitely a bit
more complex.  If bdrv_aio_readv() didn't do anything
useful (ie, didn't queue the callback), we'll be busy
forever.

Again, I've no idea if it supposed to never return error.
A block device without proper "media" present may - at
least in theory - do so.  If it must not, I think this
should be documented somewhere that bdrv_aio_readv()
does never return error, by design.

That was one of the reasons I digged into the block
layer in the first place - in order to understand,
simplify and _document_ what each interface does.

Thanks,

/mjt
Michael Tokarev - March 29, 2012, 6:46 a.m.
On 28.03.2012 19:43, Stefan Hajnoczi wrote:
...

> void ide_sector_read(IDEState *s)
> {
...
> +    s->iov.iov_base = s->io_buffer;
> +    s->iov.iov_len  = n * BDRV_SECTOR_SIZE;
> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
> +    bdrv_aio_readv(s->bs, sector_num, &s->qiov, n,
> +                   ide_sector_read_cb, s);
>  }
>  
> @@ -383,6 +383,8 @@ struct IDEState {
>      int cd_sector_size;
>      int atapi_dma; /* true if dma is requested for the packet cmd */
>      BlockAcctCookie acct;
> +    struct iovec iov;
> +    QEMUIOVector qiov;
>      /* ATA DMA state */
>      int io_buffer_size;
>      QEMUSGList sg;

You don't actually need iov here, it can be a local variable
just as well.  The same applies to ide_sector_write() in the
next patch.  The state structure usually holds stuff which
is actually needed to be keept across several calls, iov is
not one of them.

Thanks,

/mjt
Michael Tokarev - March 29, 2012, 7 a.m.
On 29.03.2012 10:46, Michael Tokarev wrote:
> On 28.03.2012 19:43, Stefan Hajnoczi wrote:
> ...
> 
>> void ide_sector_read(IDEState *s)
>> {
> ...
>> +    s->iov.iov_base = s->io_buffer;
>> +    s->iov.iov_len  = n * BDRV_SECTOR_SIZE;
>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>> +    bdrv_aio_readv(s->bs, sector_num, &s->qiov, n,
>> +                   ide_sector_read_cb, s);
>>  }
>>  
>> @@ -383,6 +383,8 @@ struct IDEState {
>>      int cd_sector_size;
>>      int atapi_dma; /* true if dma is requested for the packet cmd */
>>      BlockAcctCookie acct;
>> +    struct iovec iov;
>> +    QEMUIOVector qiov;
>>      /* ATA DMA state */
>>      int io_buffer_size;
>>      QEMUSGList sg;
> 
> You don't actually need iov here, it can be a local variable
> just as well.  The same applies to ide_sector_write() in the
> next patch.  The state structure usually holds stuff which
> is actually needed to be keept across several calls, iov is
> not one of them.

Please ignore this one ;)  For qemu_iovec_init_external(), the
real iov is ofcourse needed.

Thanks,

/mjt
Stefan Hajnoczi - March 29, 2012, 8:29 a.m.
On Thu, Mar 29, 2012 at 7:42 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
> On 28.03.2012 19:43, Stefan Hajnoczi wrote:
>>  void ide_sector_read(IDEState *s)
>>  {
> []
>> +    s->iov.iov_base = s->io_buffer;
>> +    s->iov.iov_len  = n * BDRV_SECTOR_SIZE;
>> +    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
>> +
>> +    bdrv_acct_start(s->bs, &s->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ);
>> +    bdrv_aio_readv(s->bs, sector_num, &s->qiov, n,
>> +                   ide_sector_read_cb, s);
>>  }
>
> Shouldn't this function be returning something and
> check the return value of bdrv_aio_readv() ?

I think you are right that something is missing here.  If the IDE
controller is reset then we need to cancel the pending aio request.
Otherwise we could end up with a dangling request across reset that
overwrites io_buffer.

About checking the return value: it used to be the case that the
return value needed to be checked.

This was kind of a wart in the interface because it created 2 error
handling paths: one immediate return and one callback with error.  I
think Paolo and Kevin were the ones to address this.  You can check
out this patch for starters but there are a few more related ones:

ad54ae80c7 ("block: bdrv_aio_* do not return NULL")

Today the return value does not need to be checked for NULL.

Stefan
Paolo Bonzini - March 29, 2012, 9:10 a.m.
Il 29/03/2012 10:29, Stefan Hajnoczi ha scritto:
> This was kind of a wart in the interface because it created 2 error
> handling paths: one immediate return and one callback with error.  I
> think Paolo and Kevin were the ones to address this.

I think you did that when adding the coroutines.  I just cleaned up the
result.

Paolo

Patch

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 4d568ac..e188157 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -457,40 +457,67 @@  static void ide_rw_error(IDEState *s) {
     ide_set_irq(s->bus);
 }
 
+static void ide_sector_read_cb(void *opaque, int ret)
+{
+    IDEState *s = opaque;
+    int n;
+
+    s->status &= ~BUSY_STAT;
+
+    bdrv_acct_done(s->bs, &s->acct);
+    if (ret != 0) {
+        if (ide_handle_rw_error(s, -ret, BM_STATUS_PIO_RETRY |
+                                BM_STATUS_RETRY_READ)) {
+            return;
+        }
+    }
+
+    n = s->nsector;
+    if (n > s->req_nb_sectors) {
+        n = s->req_nb_sectors;
+    }
+
+    /* Allow the guest to read the io_buffer */
+    ide_transfer_start(s, s->io_buffer, n * BDRV_SECTOR_SIZE, ide_sector_read);
+
+    ide_set_irq(s->bus);
+
+    ide_set_sector(s, ide_get_sector(s) + n);
+    s->nsector -= n;
+}
+
 void ide_sector_read(IDEState *s)
 {
     int64_t sector_num;
-    int ret, n;
+    int n;
 
     s->status = READY_STAT | SEEK_STAT;
     s->error = 0; /* not needed by IDE spec, but needed by Windows */
     sector_num = ide_get_sector(s);
     n = s->nsector;
+
     if (n == 0) {
-        /* no more sector to read from disk */
         ide_transfer_stop(s);
-    } else {
+        return;
+    }
+
+    s->status |= BUSY_STAT;
+
+    if (n > s->req_nb_sectors) {
+        n = s->req_nb_sectors;
+    }
+
 #if defined(DEBUG_IDE)
-        printf("read sector=%" PRId64 "\n", sector_num);
+    printf("sector=%" PRId64 "\n", sector_num);
 #endif
-        if (n > s->req_nb_sectors)
-            n = s->req_nb_sectors;
 
-        bdrv_acct_start(s->bs, &s->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ);
-        ret = bdrv_read(s->bs, sector_num, s->io_buffer, n);
-        bdrv_acct_done(s->bs, &s->acct);
-        if (ret != 0) {
-            if (ide_handle_rw_error(s, -ret,
-                BM_STATUS_PIO_RETRY | BM_STATUS_RETRY_READ))
-            {
-                return;
-            }
-        }
-        ide_transfer_start(s, s->io_buffer, 512 * n, ide_sector_read);
-        ide_set_irq(s->bus);
-        ide_set_sector(s, sector_num + n);
-        s->nsector -= n;
-    }
+    s->iov.iov_base = s->io_buffer;
+    s->iov.iov_len  = n * BDRV_SECTOR_SIZE;
+    qemu_iovec_init_external(&s->qiov, &s->iov, 1);
+
+    bdrv_acct_start(s->bs, &s->acct, n * BDRV_SECTOR_SIZE, BDRV_ACCT_READ);
+    bdrv_aio_readv(s->bs, sector_num, &s->qiov, n,
+                   ide_sector_read_cb, s);
 }
 
 static void dma_buf_commit(IDEState *s)
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index c808a0d..37d7307 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -383,6 +383,8 @@  struct IDEState {
     int cd_sector_size;
     int atapi_dma; /* true if dma is requested for the packet cmd */
     BlockAcctCookie acct;
+    struct iovec iov;
+    QEMUIOVector qiov;
     /* ATA DMA state */
     int io_buffer_size;
     QEMUSGList sg;