
blkdebug: ignore invalid rules in non-coroutine context

Message ID 20221013093523.586361-1-pbonzini@redhat.com
State New
Series blkdebug: ignore invalid rules in non-coroutine context

Commit Message

Paolo Bonzini Oct. 13, 2022, 9:35 a.m. UTC
blkdebug events can be called from either non-coroutine or coroutine
contexts.  However, some actions (specifically suspend actions and
errors reported with immediately=off) only make sense from within
a coroutine.

Currently, using those actions would lead to an abort() in
qemu_coroutine_yield() ("Co-routine is yielding to no one").
Catch them and print an error instead.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/blkdebug.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

Comments

Markus Armbruster Oct. 13, 2022, 10:56 a.m. UTC | #1
Paolo Bonzini <pbonzini@redhat.com> writes:

> blkdebug events can be called from either non-coroutine or coroutine
> contexts.  However, some actions (specifically suspend actions and
> errors reported with immediately=off) only make sense from within
> a coroutine.
>
> Currently, using those actions would lead to an abort() in
> qemu_coroutine_yield() ("Co-routine is yielding to no one").
> Catch them and print an error instead.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/blkdebug.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/block/blkdebug.c b/block/blkdebug.c
> index bbf2948703..bf0aedb17d 100644
> --- a/block/blkdebug.c
> +++ b/block/blkdebug.c
> @@ -31,6 +31,7 @@
>  #include "block/qdict.h"
>  #include "qemu/module.h"
>  #include "qemu/option.h"
> +#include "qemu/error-report.h"
>  #include "qapi/qapi-visit-block-core.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qlist.h"
> @@ -623,8 +624,13 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
>  
>      qemu_mutex_unlock(&s->lock);
>      if (!immediately) {
> -        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
> -        qemu_coroutine_yield();
> +        if (qemu_in_coroutine()) {
> +            aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
> +            qemu_coroutine_yield();
> +        } else {
> +            error_report("Non-coroutine event %s needs immediately = off\n",
> +                         BlkdebugEvent_lookup.array[rule->event]);

rule_check() is called from blkdebug_co_preadv(), blkdebug_co_pwritev(),
blkdebug_co_pwrite_zeroes(), blkdebug_co_pdiscard(),
blkdebug_co_block_status() (all marked coroutine_fn), and
blkdebug_co_flush() (which looks like it should be marked coroutine_fn).

Ignorant question: how could it be called outside coroutine context?

Also, code smell: reporting an error without taking an error path.  But
let's worry about that only after I understand the problem you're trying
to fix.

> +        }
>      }
>  
>      return -error;
> @@ -858,7 +864,12 @@ static void blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event)
>      }
>  
>      while (actions_count[ACTION_SUSPEND] > 0) {
> -        qemu_coroutine_yield();
> +        if (qemu_in_coroutine()) {
> +            qemu_coroutine_yield();
> +        } else {
> +            error_report("Non-coroutine event %s cannot suspend\n",
> +                         BlkdebugEvent_lookup.array[event]);
> +        }
>          actions_count[ACTION_SUSPEND]--;
>      }
>  }
Paolo Bonzini Oct. 13, 2022, 1:06 p.m. UTC | #2
On 10/13/22 12:56, Markus Armbruster wrote:
> rule_check() is called from blkdebug_co_preadv(), blkdebug_co_pwritev(),
> blkdebug_co_pwrite_zeroes(), blkdebug_co_pdiscard(),
> blkdebug_co_block_status() (all marked coroutine_fn), and
> blkdebug_co_flush() (which looks like it should be marked coroutine_fn).

Yes (separate patch sent, 
https://lore.kernel.org/qemu-devel/20221013123711.620631-11-pbonzini@redhat.com/T/#u).

> Ignorant question: how could it be called outside coroutine context?

You're right, only blkdebug_debug_event() can be called outside 
coroutine context.  I confused process_rule() (called by 
blkdebug_debug_event(), both inside and outside coroutine context) with 
rule_check() (called in coroutine context).

> Also, code smell: reporting an error without taking an error path.  But
> let's worry about that only after I understand the problem you're trying
> to fix.

Unfortunately there's no way to know in advance whether an event will be
called inside or outside a coroutine.  I can keep the abort() if you
think it's preferable, so what you get is still a crash but with a
nicer error message.  Since this is debugging code, either solution has
pros and cons.

Paolo
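The alternative Paolo offers, keeping the crash but with a clearer message, could be sketched as follows. This is a hypothetical illustration, not code from the thread: `fake_in_coroutine_flag` and `fake_coroutine_yield()` are made-up stand-ins for `qemu_in_coroutine()` and `qemu_coroutine_yield()` so that the control flow compiles outside QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-ins for qemu_in_coroutine() and
 * qemu_coroutine_yield(); they model control flow only and are
 * not the real QEMU API.
 */
static bool fake_in_coroutine_flag;
static int fake_yield_count;

static void fake_coroutine_yield(void)
{
    /* The real yield aborts outside a coroutine with
     * "Co-routine is yielding to no one"; model that here. */
    assert(fake_in_coroutine_flag);
    fake_yield_count++;
}

/* Still crashes outside a coroutine, but names the event first. */
static void suspend_or_abort(int suspend_count, const char *event_name)
{
    while (suspend_count > 0) {
        if (!fake_in_coroutine_flag) {
            fprintf(stderr, "Non-coroutine event %s cannot suspend\n",
                    event_name);
            abort();
        }
        fake_coroutine_yield();
        suspend_count--;
    }
}
```

Inside a coroutine the loop behaves as before; outside one, the user sees which event was misconfigured instead of the generic yield failure.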
Markus Armbruster Oct. 13, 2022, 1:28 p.m. UTC | #3
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 10/13/22 12:56, Markus Armbruster wrote:
>> rule_check() is called from blkdebug_co_preadv(), blkdebug_co_pwritev(),
>> blkdebug_co_pwrite_zeroes(), blkdebug_co_pdiscard(),
>> blkdebug_co_block_status() (all marked coroutine_fn), and
>> blkdebug_co_flush() (which looks like it should be marked coroutine_fn).
>
> Yes (separate patch sent, https://lore.kernel.org/qemu-devel/20221013123711.620631-11-pbonzini@redhat.com/T/#u).
>
>> Ignorant question: how could it be called outside coroutine context?
>
> You're right, only blkdebug_debug_event() can be called outside
> coroutine context.  I confused process_rule() (called by
> blkdebug_debug_event(), both inside and outside coroutine context) with
> rule_check() (called in coroutine context).

Let's drop the rule_check() hunk then.

>> Also, code smell: reporting an error without taking an error path.  But
>> let's worry about that only after I understand the problem you're trying
>> to fix.
>
> Unfortunately there's no way to know in advance whether an event will be
> called inside or outside a coroutine.  I can keep the abort() if you
> think it's preferable, so what you get is still a crash but with a
> nicer error message.  Since this is debugging code, either solution has
> pros and cons.

Let's have another look at the remaining patch hunk:

    @@ -858,7 +864,12 @@ static void blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event)
         }

         while (actions_count[ACTION_SUSPEND] > 0) {
    -        qemu_coroutine_yield();
    +        if (qemu_in_coroutine()) {
    +            qemu_coroutine_yield();
    +        } else {
    +            error_report("Non-coroutine event %s cannot suspend\n",
    +                         BlkdebugEvent_lookup.array[event]);
    +        }
             actions_count[ACTION_SUSPEND]--;
         }
     }

If I understand this correctly, the user asked us to suspend, but it now
turns out suspend doesn't make sense, so we ignore the request.
Correct?

warn_report()?  info_report()?
Paolo Bonzini Oct. 13, 2022, 3:41 p.m. UTC | #4
On 10/13/22 15:28, Markus Armbruster wrote:
> Let's have another look at the remaining patch hunk:
> 
>      @@ -858,7 +864,12 @@ static void blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event)
>           }
> 
>           while (actions_count[ACTION_SUSPEND] > 0) {
>      -        qemu_coroutine_yield();
>      +        if (qemu_in_coroutine()) {
>      +            qemu_coroutine_yield();
>      +        } else {
>      +            error_report("Non-coroutine event %s cannot suspend\n",
>      +                         BlkdebugEvent_lookup.array[event]);
>      +        }
>               actions_count[ACTION_SUSPEND]--;
>           }
>       }
> 
> If I understand this correctly, the user asked us to suspend, but it now
> turns out suspend doesn't make sense, so we ignore the request.
> Correct?

Yes.

> warn_report()?  info_report()?

Sure, warn_report() can work too.

Paolo
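A minimal sketch of where the review converges: the rule_check() hunk is dropped, and the suspend loop in blkdebug_debug_event() warns instead of reporting an error. This is not code from the thread; the `fake_*` helpers are hypothetical stand-ins for `qemu_in_coroutine()`, `qemu_coroutine_yield()` and `warn_report()` so the logic compiles outside QEMU. Note that QEMU's `error_report()` and `warn_report()` terminate the message themselves, so the format string passed to them should not end in `"\n"` as it does in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-ins for qemu_in_coroutine(),
 * qemu_coroutine_yield() and warn_report(); not the real QEMU API.
 */
static bool fake_in_coroutine_flag;
static int fake_yield_count;
static int fake_warn_count;

static void fake_coroutine_yield(void)
{
    /* The real yield aborts when called outside a coroutine. */
    assert(fake_in_coroutine_flag);
    fake_yield_count++;
}

/* Stand-in for warn_report(): callers pass no trailing newline,
 * and the stub adds one when printing. */
static void fake_warn_report(const char *event_name)
{
    fprintf(stderr, "warning: Non-coroutine event %s cannot suspend\n",
            event_name);
    fake_warn_count++;
}

/* Models the suspend loop: yield inside a coroutine, otherwise
 * warn and skip the suspend request. */
static void process_suspend_actions(int suspend_count,
                                    const char *event_name)
{
    while (suspend_count > 0) {
        if (fake_in_coroutine_flag) {
            fake_coroutine_yield();
        } else {
            fake_warn_report(event_name);
        }
        suspend_count--;
    }
}
```

Like the patch, this warns once per pending suspend action; warning once and leaving the loop would be an equally reasonable choice.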

Patch

diff --git a/block/blkdebug.c b/block/blkdebug.c
index bbf2948703..bf0aedb17d 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -31,6 +31,7 @@
 #include "block/qdict.h"
 #include "qemu/module.h"
 #include "qemu/option.h"
+#include "qemu/error-report.h"
 #include "qapi/qapi-visit-block-core.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qlist.h"
@@ -623,8 +624,13 @@ static int rule_check(BlockDriverState *bs, uint64_t offset, uint64_t bytes,
 
     qemu_mutex_unlock(&s->lock);
     if (!immediately) {
-        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
-        qemu_coroutine_yield();
+        if (qemu_in_coroutine()) {
+            aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
+            qemu_coroutine_yield();
+        } else {
+            error_report("Non-coroutine event %s needs immediately = off\n",
+                         BlkdebugEvent_lookup.array[rule->event]);
+        }
     }
 
     return -error;
@@ -858,7 +864,12 @@ static void blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event)
     }
 
     while (actions_count[ACTION_SUSPEND] > 0) {
-        qemu_coroutine_yield();
+        if (qemu_in_coroutine()) {
+            qemu_coroutine_yield();
+        } else {
+            error_report("Non-coroutine event %s cannot suspend\n",
+                         BlkdebugEvent_lookup.array[event]);
+        }
         actions_count[ACTION_SUSPEND]--;
     }
 }