
linux-aio: keep processing events if MAX_EVENTS reached

Message ID 1467045468-25709-1-git-send-email-stefanha@redhat.com
State New

Commit Message

Stefan Hajnoczi June 27, 2016, 4:37 p.m. UTC
Commit ccb9dc10129954d0bcd7814298ed445e684d5a2a ("linux-aio: Cancel BH
if not needed") exposed a side-effect of scheduling the BH for nested
event loops.  When io_getevents(2) fetches MAX_EVENTS events then it's
necessary to call it again.  Failure to do so can lead to hung I/O
because the eventfd has already been cleared and the completion BH will
not run again.

This patch fixes the hang by calling io_getevents(2) again if the events
array was totally full.

Reported-by: Roman Penyaev <roman.penyaev@profitbricks.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/linux-aio.c | 5 +++++
 1 file changed, 5 insertions(+)
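
For readers who want the failure mode in isolation, here is a minimal, self-contained sketch (illustrative only: drain_completions() and the local eventfd/io-context plumbing are made up, though MAX_EVENTS mirrors the 128-entry limit in block/linux-aio.c). It shows why a single io_getevents(2) pass capped at MAX_EVENTS can strand completions once the eventfd has been cleared, and why re-reading whenever a full batch comes back avoids the hang:

/* Illustrative sketch only, not the QEMU code; MAX_EVENTS mirrors the
 * 128-entry limit in block/linux-aio.c, everything else is made up. */
#include <libaio.h>
#include <unistd.h>
#include <stdint.h>
#include <time.h>

#define MAX_EVENTS 128

static void drain_completions(io_context_t ctx, int efd)
{
    struct io_event events[MAX_EVENTS];
    struct timespec ts = { 0 };
    uint64_t counter;
    int nr;

    /* Clearing the eventfd first means the event loop will not wake us
     * again for completions that have already been signalled. */
    if (read(efd, &counter, sizeof(counter)) < 0) {
        return; /* nothing signalled */
    }

    do {
        /* Each pass fetches at most MAX_EVENTS completions.  If more than
         * that have already completed, a single pass strands the rest with
         * no further eventfd wakeup, so keep going while the batch is full;
         * this is what the patch's goto achieves. */
        nr = io_getevents(ctx, 0, MAX_EVENTS, events, &ts);
        for (int i = 0; i < nr; i++) {
            /* complete events[i] here */
        }
    } while (nr == MAX_EVENTS);
}

The patched qemu_laio_completion_bh() achieves the same effect with a goto back to the fetch step, as the diff below shows.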

Comments

Paolo Bonzini June 27, 2016, 4:42 p.m. UTC | #1
On 27/06/2016 18:37, Stefan Hajnoczi wrote:
> Commit ccb9dc10129954d0bcd7814298ed445e684d5a2a ("linux-aio: Cancel BH
> if not needed") exposed a side-effect of scheduling the BH for nested
> event loops.  When io_getevents(2) fetches MAX_EVENTS events then it's
> necessary to call it again.  Failure to do so can lead to hung I/O
> because the eventfd has already been cleared and the completion BH will
> not run again.
> 
> This patch fixes the hang by calling io_getevents(2) again if the events
> array was totally full.
> 
> Reported-by: Roman Penyaev <roman.penyaev@profitbricks.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/linux-aio.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index e468960..af15f85 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -117,6 +117,7 @@ static void qemu_laio_completion_bh(void *opaque)
>      LinuxAioState *s = opaque;
>  
>      /* Fetch more completion events when empty */
> +more_events:
>      if (s->event_idx == s->event_max) {
>          do {
>              struct timespec ts = { 0 };
> @@ -146,6 +147,10 @@ static void qemu_laio_completion_bh(void *opaque)
>          qemu_laio_process_completion(laiocb);
>      }
>  
> +    if (s->event_idx == MAX_EVENTS) {
> +        goto more_events; /* there might still be events waiting for us */
> +    }
> +
>      if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
>          ioq_submit(s);
>      }
> 

Do we want to do it after submitting, in order to reduce latency and
perhaps even get some results immediately?

Paolo
Roman Pen June 27, 2016, 6:36 p.m. UTC | #2
On Jun 27, 2016 6:37 PM, "Stefan Hajnoczi" <stefanha@redhat.com> wrote:
>
> Commit ccb9dc10129954d0bcd7814298ed445e684d5a2a ("linux-aio: Cancel BH
> if not needed") exposed a side-effect of scheduling the BH for nested
> event loops.  When io_getevents(2) fetches MAX_EVENTS events then it's
> necessary to call it again.  Failure to do so can lead to hung I/O
> because the eventfd has already been cleared and the completion BH will
> not run again.
>
> This patch fixes the hang by calling io_getevents(2) again if the events
> array was totally full.
>
> Reported-by: Roman Penyaev <roman.penyaev@profitbricks.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/linux-aio.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index e468960..af15f85 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -117,6 +117,7 @@ static void qemu_laio_completion_bh(void *opaque)
>      LinuxAioState *s = opaque;
>
>      /* Fetch more completion events when empty */
> +more_events:
>      if (s->event_idx == s->event_max) {
>          do {
>              struct timespec ts = { 0 };
> @@ -146,6 +147,10 @@ static void qemu_laio_completion_bh(void *opaque)
>          qemu_laio_process_completion(laiocb);
>      }
>
> +    if (s->event_idx == MAX_EVENTS) {
> +        goto more_events; /* there might still be events waiting for us */
> +    }
> +

I am on vacation and can't check anything, but it seems you could
also reproduce the issue and it seems you are right: what I've
also noticed is that I/O hangs only if the total number of requests
in the guest is > 128 (the limit defined in linux-aio.c).

But I do not understand why you have to repeat io_getevents when
the return value == MAX_EVENTS. According to my understanding you
can't submit more than 128, so the first io_getevents call returns
exactly what you've submitted.

--
Roman

>      if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
>          ioq_submit(s);
>      }
> --
> 2.7.4
>
Stefan Hajnoczi June 28, 2016, 9:25 a.m. UTC | #3
On Mon, Jun 27, 2016 at 06:42:33PM +0200, Paolo Bonzini wrote:
> 
> 
> On 27/06/2016 18:37, Stefan Hajnoczi wrote:
> > Commit ccb9dc10129954d0bcd7814298ed445e684d5a2a ("linux-aio: Cancel BH
> > if not needed") exposed a side-effect of scheduling the BH for nested
> > event loops.  When io_getevents(2) fetches MAX_EVENTS events then it's
> > necessary to call it again.  Failure to do so can lead to hung I/O
> > because the eventfd has already been cleared and the completion BH will
> > not run again.
> > 
> > This patch fixes the hang by calling io_getevents(2) again if the events
> > array was totally full.
> > 
> > Reported-by: Roman Penyaev <roman.penyaev@profitbricks.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  block/linux-aio.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/block/linux-aio.c b/block/linux-aio.c
> > index e468960..af15f85 100644
> > --- a/block/linux-aio.c
> > +++ b/block/linux-aio.c
> > @@ -117,6 +117,7 @@ static void qemu_laio_completion_bh(void *opaque)
> >      LinuxAioState *s = opaque;
> >  
> >      /* Fetch more completion events when empty */
> > +more_events:
> >      if (s->event_idx == s->event_max) {
> >          do {
> >              struct timespec ts = { 0 };
> > @@ -146,6 +147,10 @@ static void qemu_laio_completion_bh(void *opaque)
> >          qemu_laio_process_completion(laiocb);
> >      }
> >  
> > +    if (s->event_idx == MAX_EVENTS) {
> > +        goto more_events; /* there might still be events waiting for us */
> > +    }
> > +
> >      if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
> >          ioq_submit(s);
> >      }
> > 
> 
> Do we want to do it after submitting, in order to reduce latency and
> perhaps even get some results immediately?

No, if more I/O requests are submitted, let the event loop spin;
otherwise a high-IOPS device could starve everything else running
in the event loop.

Stefan
Stefan Hajnoczi June 28, 2016, 9:42 a.m. UTC | #4
On Mon, Jun 27, 2016 at 08:36:19PM +0200, Roman Penyaev wrote:
> On Jun 27, 2016 6:37 PM, "Stefan Hajnoczi" <stefanha@redhat.com> wrote:
> >
> > Commit ccb9dc10129954d0bcd7814298ed445e684d5a2a ("linux-aio: Cancel BH
> > if not needed") exposed a side-effect of scheduling the BH for nested
> > event loops.  When io_getevents(2) fetches MAX_EVENTS events then it's
> > necessary to call it again.  Failure to do so can lead to hung I/O
> > because the eventfd has already been cleared and the completion BH will
> > not run again.
> >
> > This patch fixes the hang by calling io_getevents(2) again if the events
> > array was totally full.
> >
> > Reported-by: Roman Penyaev <roman.penyaev@profitbricks.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  block/linux-aio.c | 5 +++++
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/block/linux-aio.c b/block/linux-aio.c
> > index e468960..af15f85 100644
> > --- a/block/linux-aio.c
> > +++ b/block/linux-aio.c
> > @@ -117,6 +117,7 @@ static void qemu_laio_completion_bh(void *opaque)
> >      LinuxAioState *s = opaque;
> >
> >      /* Fetch more completion events when empty */
> > +more_events:
> >      if (s->event_idx == s->event_max) {
> >          do {
> >              struct timespec ts = { 0 };
> > @@ -146,6 +147,10 @@ static void qemu_laio_completion_bh(void *opaque)
> >          qemu_laio_process_completion(laiocb);
> >      }
> >
> > +    if (s->event_idx == MAX_EVENTS) {
> > +        goto more_events; /* there might still be events waiting for us */
> > +    }
> > +
> 
> I am on vacation and can't check anything, but it seems you could
> also reproduce the issue and it seems you are right: what I've
> also noticed is that I/O hangs only if the total number of requests
> in the guest is > 128 (the limit defined in linux-aio.c).
> 
> But I do not understand why you have to repeat io_getevents when
> the return value == MAX_EVENTS. According to my understanding you
> can't submit more than 128, so the first io_getevents call returns
> exactly what you've submitted.

Hi Roman,
There is nothing like discussions on qemu-devel from the beach.  True
vacation!

The issue is:

1. s->events[] is only MAX_EVENTS large so each io_getevents() call can
   only fetch that maximum number of events.

2. qemu_laio_completion_cb() clears the eventfd so the event loop will
   not call us again (unless additional I/O requests complete).

Therefore, returning from qemu_laio_completion_bh() without having
processed all events causes a hang.

Stefan
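
To make point 2 concrete, here is a small standalone program (illustrative only, not QEMU code) showing the eventfd semantics involved: any number of completion notifications collapse into a single counter, and one read() clears it, so no further wakeup arrives for completions that are still waiting to be fetched:

/* Standalone demonstration of the eventfd behaviour behind point 2. */
#include <sys/eventfd.h>
#include <unistd.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int efd = eventfd(0, EFD_NONBLOCK);
    uint64_t val;

    if (efd < 0) {
        perror("eventfd");
        return 1;
    }

    /* Pretend 200 I/O completions each signalled the eventfd. */
    for (int i = 0; i < 200; i++) {
        val = 1;
        (void)write(efd, &val, sizeof(val));
    }

    /* A single read() returns the accumulated count and resets it to zero,
     * so even if we only have room to fetch 128 completions per
     * io_getevents(2) call, no second wakeup will tell us about the rest. */
    (void)read(efd, &val, sizeof(val));
    printf("first read:  %llu notifications collapsed into one wakeup\n",
           (unsigned long long)val);

    if (read(efd, &val, sizeof(val)) < 0) {
        printf("second read: nothing pending, no further wakeups\n");
    }

    close(efd);
    return 0;
}

Hence the loop in the fix: each BH invocation must keep fetching until io_getevents(2) returns a short batch.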

Patch

diff --git a/block/linux-aio.c b/block/linux-aio.c
index e468960..af15f85 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -117,6 +117,7 @@  static void qemu_laio_completion_bh(void *opaque)
     LinuxAioState *s = opaque;
 
     /* Fetch more completion events when empty */
+more_events:
     if (s->event_idx == s->event_max) {
         do {
             struct timespec ts = { 0 };
@@ -146,6 +147,10 @@  static void qemu_laio_completion_bh(void *opaque)
         qemu_laio_process_completion(laiocb);
     }
 
+    if (s->event_idx == MAX_EVENTS) {
+        goto more_events; /* there might still be events waiting for us */
+    }
+
     if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }