diff mbox series

xen-hvm: Avoid livelock while handling buffered ioreqs

Message ID 20240404140833.1557953-1-ross.lagerwall@citrix.com
State New
Headers show
Series xen-hvm: Avoid livelock while handling buffered ioreqs | expand

Commit Message

Ross Lagerwall April 4, 2024, 2:08 p.m. UTC
A malicious or buggy guest may generated buffered ioreqs faster than
QEMU can process them in handle_buffered_iopage(). The result is a
livelock - QEMU continuously processes ioreqs on the main thread without
iterating through the main loop which prevents handling other events,
processing timers, etc. Without QEMU handling other events, it often
results in the guest becoming unsable and makes it difficult to stop the
source of buffered ioreqs.

To avoid this, if we process a full page of buffered ioreqs, stop and
reschedule an immediate timer to continue processing them. This lets
QEMU go back to the main loop and catch up.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 hw/xen/xen-hvm-common.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

Comments

Durrant, Paul April 6, 2024, 10:58 a.m. UTC | #1
On 04/04/2024 15:08, Ross Lagerwall wrote:
> A malicious or buggy guest may generated buffered ioreqs faster than
> QEMU can process them in handle_buffered_iopage(). The result is a
> livelock - QEMU continuously processes ioreqs on the main thread without
> iterating through the main loop which prevents handling other events,
> processing timers, etc. Without QEMU handling other events, it often
> results in the guest becoming unsable and makes it difficult to stop the
> source of buffered ioreqs.
> 
> To avoid this, if we process a full page of buffered ioreqs, stop and
> reschedule an immediate timer to continue processing them. This lets
> QEMU go back to the main loop and catch up.
> 

Do PV backends potentially cause the same scheduling issue (if not using 
io threads)?

> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>   hw/xen/xen-hvm-common.c | 26 +++++++++++++++++---------
>   1 file changed, 17 insertions(+), 9 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>
Ross Lagerwall April 8, 2024, 1 p.m. UTC | #2
On Sat, Apr 6, 2024 at 11:58 AM Durrant, Paul <xadimgnik@gmail.com> wrote:
>
> On 04/04/2024 15:08, Ross Lagerwall wrote:
> > A malicious or buggy guest may generated buffered ioreqs faster than
> > QEMU can process them in handle_buffered_iopage(). The result is a
> > livelock - QEMU continuously processes ioreqs on the main thread without
> > iterating through the main loop which prevents handling other events,
> > processing timers, etc. Without QEMU handling other events, it often
> > results in the guest becoming unsable and makes it difficult to stop the
> > source of buffered ioreqs.
> >
> > To avoid this, if we process a full page of buffered ioreqs, stop and
> > reschedule an immediate timer to continue processing them. This lets
> > QEMU go back to the main loop and catch up.
> >
>
> Do PV backends potentially cause the same scheduling issue (if not using
> io threads)?
>

From what I can tell:

xen-block: It reads req_prod / req_cons once before entering the loop
so it should be fine, I think.

xen_console: Same as xen-block

xen_nic: It reads req_prod / req_cons once before entering the loop.
However, once the loop ends it checks for more requests and if there
are more requests it restarts from the beginning. It seems like this
could be susceptible to the same issue.

(These PV backends generally aren't used by XenServer's system QEMU
so I didn't spend too much time looking into it.)

Thanks,
Ross
Durrant, Paul April 8, 2024, 1:10 p.m. UTC | #3
On 08/04/2024 14:00, Ross Lagerwall wrote:
> On Sat, Apr 6, 2024 at 11:58 AM Durrant, Paul <xadimgnik@gmail.com> wrote:
>>
>> On 04/04/2024 15:08, Ross Lagerwall wrote:
>>> A malicious or buggy guest may generated buffered ioreqs faster than
>>> QEMU can process them in handle_buffered_iopage(). The result is a
>>> livelock - QEMU continuously processes ioreqs on the main thread without
>>> iterating through the main loop which prevents handling other events,
>>> processing timers, etc. Without QEMU handling other events, it often
>>> results in the guest becoming unsable and makes it difficult to stop the
>>> source of buffered ioreqs.
>>>
>>> To avoid this, if we process a full page of buffered ioreqs, stop and
>>> reschedule an immediate timer to continue processing them. This lets
>>> QEMU go back to the main loop and catch up.
>>>
>>
>> Do PV backends potentially cause the same scheduling issue (if not using
>> io threads)?
>>
> 
>  From what I can tell:
> 
> xen-block: It reads req_prod / req_cons once before entering the loop
> so it should be fine, I think.
> 
> xen_console: Same as xen-block
> 
> xen_nic: It reads req_prod / req_cons once before entering the loop.
> However, once the loop ends it checks for more requests and if there
> are more requests it restarts from the beginning. It seems like this
> could be susceptible to the same issue.
> 
> (These PV backends generally aren't used by XenServer's system QEMU
> so I didn't spend too much time looking into it.)
> 
> Thanks,

Ok. Thanks for checking.

   Paul
Durrant, Paul April 8, 2024, 1:12 p.m. UTC | #4
On 04/04/2024 15:08, Ross Lagerwall wrote:
> A malicious or buggy guest may generated buffered ioreqs faster than
> QEMU can process them in handle_buffered_iopage(). The result is a
> livelock - QEMU continuously processes ioreqs on the main thread without
> iterating through the main loop which prevents handling other events,
> processing timers, etc. Without QEMU handling other events, it often
> results in the guest becoming unsable and makes it difficult to stop the
> source of buffered ioreqs.
> 
> To avoid this, if we process a full page of buffered ioreqs, stop and
> reschedule an immediate timer to continue processing them. This lets
> QEMU go back to the main loop and catch up.
> 
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>   hw/xen/xen-hvm-common.c | 26 +++++++++++++++++---------
>   1 file changed, 17 insertions(+), 9 deletions(-)
> 

Reviewed-by: Paul Durrant <paul@xen.org>
Anthony PERARD April 9, 2024, 10:20 a.m. UTC | #5
On Thu, Apr 04, 2024 at 03:08:33PM +0100, Ross Lagerwall wrote:
> diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
> index 1627da739822..1116b3978938 100644
> --- a/hw/xen/xen-hvm-common.c
> +++ b/hw/xen/xen-hvm-common.c
> @@ -521,22 +521,30 @@ static bool handle_buffered_iopage(XenIOState *state)
[...]
>  
>  static void handle_buffered_io(void *opaque)
>  {
> +    unsigned int handled;
>      XenIOState *state = opaque;
>  
> -    if (handle_buffered_iopage(state)) {
> +    handled = handle_buffered_iopage(state);
> +    if (handled >= IOREQ_BUFFER_SLOT_NUM) {
> +        /* We handled a full page of ioreqs. Schedule a timer to continue
> +         * processing while giving other stuff a chance to run.
> +         */

./scripts/checkpatch.pl report a style issue here:
    WARNING: Block comments use a leading /* on a separate line

I can try to remember to fix that on commit.

>          timer_mod(state->buffered_io_timer,
> -                BUFFER_IO_MAX_DELAY + qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
> -    } else {
> +                qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
> +    } else if (handled == 0) {

Just curious, why did you check for `handled == 0` here instead of
`handled != 0`? That would have avoided to invert the last 2 cases, and
the patch would just have introduce a new case without changing the
order of the existing ones. But not that important I guess.

>          timer_del(state->buffered_io_timer);
>          qemu_xen_evtchn_unmask(state->xce_handle, state->bufioreq_local_port);
> +    } else {
> +        timer_mod(state->buffered_io_timer,
> +                BUFFER_IO_MAX_DELAY + qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
>      }
>  }

Cheers,
Ross Lagerwall April 9, 2024, 2:19 p.m. UTC | #6
On Tue, Apr 9, 2024 at 11:20 AM Anthony PERARD <anthony.perard@cloud.com> wrote:
>
> On Thu, Apr 04, 2024 at 03:08:33PM +0100, Ross Lagerwall wrote:
> > diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
> > index 1627da739822..1116b3978938 100644
> > --- a/hw/xen/xen-hvm-common.c
> > +++ b/hw/xen/xen-hvm-common.c
> > @@ -521,22 +521,30 @@ static bool handle_buffered_iopage(XenIOState *state)
> [...]
> >
> >  static void handle_buffered_io(void *opaque)
> >  {
> > +    unsigned int handled;
> >      XenIOState *state = opaque;
> >
> > -    if (handle_buffered_iopage(state)) {
> > +    handled = handle_buffered_iopage(state);
> > +    if (handled >= IOREQ_BUFFER_SLOT_NUM) {
> > +        /* We handled a full page of ioreqs. Schedule a timer to continue
> > +         * processing while giving other stuff a chance to run.
> > +         */
>
> ./scripts/checkpatch.pl report a style issue here:
>     WARNING: Block comments use a leading /* on a separate line
>
> I can try to remember to fix that on commit.

I copied the comment style from a few lines above but I guess it was
wrong.

>
> >          timer_mod(state->buffered_io_timer,
> > -                BUFFER_IO_MAX_DELAY + qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
> > -    } else {
> > +                qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
> > +    } else if (handled == 0) {
>
> Just curious, why did you check for `handled == 0` here instead of
> `handled != 0`? That would have avoided to invert the last 2 cases, and
> the patch would just have introduce a new case without changing the
> order of the existing ones. But not that important I guess.
>

In general I try to use conditionals with the least amount of negation
since I think it is easier to read. I can change it if you would prefer?

Ross
Peter Maydell April 9, 2024, 2:35 p.m. UTC | #7
On Tue, 9 Apr 2024 at 15:20, Ross Lagerwall <ross.lagerwall@citrix.com> wrote:
>
> On Tue, Apr 9, 2024 at 11:20 AM Anthony PERARD <anthony.perard@cloud.com> wrote:
> >
> > On Thu, Apr 04, 2024 at 03:08:33PM +0100, Ross Lagerwall wrote:
> > > diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
> > > index 1627da739822..1116b3978938 100644
> > > --- a/hw/xen/xen-hvm-common.c
> > > +++ b/hw/xen/xen-hvm-common.c
> > > @@ -521,22 +521,30 @@ static bool handle_buffered_iopage(XenIOState *state)
> > [...]
> > >
> > >  static void handle_buffered_io(void *opaque)
> > >  {
> > > +    unsigned int handled;
> > >      XenIOState *state = opaque;
> > >
> > > -    if (handle_buffered_iopage(state)) {
> > > +    handled = handle_buffered_iopage(state);
> > > +    if (handled >= IOREQ_BUFFER_SLOT_NUM) {
> > > +        /* We handled a full page of ioreqs. Schedule a timer to continue
> > > +         * processing while giving other stuff a chance to run.
> > > +         */
> >
> > ./scripts/checkpatch.pl report a style issue here:
> >     WARNING: Block comments use a leading /* on a separate line
> >
> > I can try to remember to fix that on commit.
>
> I copied the comment style from a few lines above but I guess it was
> wrong.

Yes, this is one of those cases where we've settled on a
style choice but there's still quite a lot of older code
in the codebase that doesn't adhere to it. Checkpatch
usually will catch this kind of nit for you.

thanks
-- PMM
diff mbox series

Patch

diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 1627da739822..1116b3978938 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -463,11 +463,11 @@  static void handle_ioreq(XenIOState *state, ioreq_t *req)
     }
 }
 
-static bool handle_buffered_iopage(XenIOState *state)
+static unsigned int handle_buffered_iopage(XenIOState *state)
 {
     buffered_iopage_t *buf_page = state->buffered_io_page;
     buf_ioreq_t *buf_req = NULL;
-    bool handled_ioreq = false;
+    unsigned int handled = 0;
     ioreq_t req;
     int qw;
 
@@ -480,7 +480,7 @@  static bool handle_buffered_iopage(XenIOState *state)
     req.count = 1;
     req.dir = IOREQ_WRITE;
 
-    for (;;) {
+    do {
         uint32_t rdptr = buf_page->read_pointer, wrptr;
 
         xen_rmb();
@@ -521,22 +521,30 @@  static bool handle_buffered_iopage(XenIOState *state)
         assert(!req.data_is_ptr);
 
         qatomic_add(&buf_page->read_pointer, qw + 1);
-        handled_ioreq = true;
-    }
+        handled += qw + 1;
+    } while (handled < IOREQ_BUFFER_SLOT_NUM);
 
-    return handled_ioreq;
+    return handled;
 }
 
 static void handle_buffered_io(void *opaque)
 {
+    unsigned int handled;
     XenIOState *state = opaque;
 
-    if (handle_buffered_iopage(state)) {
+    handled = handle_buffered_iopage(state);
+    if (handled >= IOREQ_BUFFER_SLOT_NUM) {
+        /* We handled a full page of ioreqs. Schedule a timer to continue
+         * processing while giving other stuff a chance to run.
+         */
         timer_mod(state->buffered_io_timer,
-                BUFFER_IO_MAX_DELAY + qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
-    } else {
+                qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
+    } else if (handled == 0) {
         timer_del(state->buffered_io_timer);
         qemu_xen_evtchn_unmask(state->xce_handle, state->bufioreq_local_port);
+    } else {
+        timer_mod(state->buffered_io_timer,
+                BUFFER_IO_MAX_DELAY + qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
     }
 }