diff mbox series

[v6,01/33] main-loop.h: introduce qemu_in_main_thread()

Message ID 20220121170544.2049944-2-eesposit@redhat.com
State New
Series block layer: split block APIs in global state and I/O

Commit Message

Emanuele Giuseppe Esposito Jan. 21, 2022, 5:05 p.m. UTC
When invoked from the main loop, this function is the same
as qemu_mutex_iothread_locked and returns true if the BQL is held.
When invoked from iothreads or tests, it returns true only
if the current AioContext is the main loop's.

This essentially just extends qemu_mutex_iothread_locked to also
work in unit tests and other users, such as the storage daemon,
that run in the main loop but end up using the implementation in
stubs/iothread-lock.c.

In unit tests, qemu_mutex_iothread_locked always returns false
because they use the implementation in stubs/iothread-lock.c,
making all assertions added in the next patches fail even though
the AioContext is still the main loop.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/qemu/main-loop.h | 13 +++++++++++++
 softmmu/cpus.c           |  5 +++++
 stubs/iothread-lock.c    |  5 +++++
 3 files changed, 23 insertions(+)
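
To make the intended use concrete, here is a minimal sketch of the kind of
assertion the later patches in this series add to block-layer global-state
functions. The function name below is invented purely for illustration;
only qemu_in_main_thread() comes from this patch.

    #include "qemu/osdep.h"
    #include "qemu/main-loop.h"

    /*
     * Illustrative only: a hypothetical block-layer function that touches
     * global state and therefore must run in the main loop (or, in
     * emulators, under the BQL).
     */
    static void bdrv_example_global_state_op(void)
    {
        /* True in emulators when the BQL is held; true in tools and unit
         * tests when the current AioContext is the main loop's. */
        assert(qemu_in_main_thread());

        /* ... safely manipulate block-layer global state here ... */
    }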

Comments

Kevin Wolf Jan. 27, 2022, 10:56 a.m. UTC | #1
Am 21.01.2022 um 18:05 hat Emanuele Giuseppe Esposito geschrieben:
> When invoked from the main loop, this function is the same
> as qemu_mutex_iothread_locked, and returns true if the BQL is held.

So its name is misleading because it doesn't answer the question whether
we're in the main thread, but whether we're holding the BQL (which is
mostly equivalent to holding an AioContext lock, not being in the home
thread of that AioContext).

> When invoked from iothreads or tests, it returns true only
> if the current AioContext is the Main Loop.
> 
> This essentially just extends qemu_mutex_iothread_locked to work
> also in unit tests or other users like storage-daemon, that run
> in the Main Loop but end up using the implementation in
> stubs/iothread-lock.c.
> 
> Using qemu_mutex_iothread_locked in unit tests defaults to false
> because they use the implementation in stubs/iothread-lock,
> making all assertions added in next patches fail despite the
> AioContext is still the main loop.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

This adds a new function that is almost the same as an existing
function, but the documentation is unclear when I should use one or the
other.

What are the use cases for the existing qemu_mutex_iothread_locked()
stub where we rely on false being returned?

If there aren't any, then maybe we should change the stub for that one
instead of adding a new function that behaves the same in the system
emulator and different only when it's stubbed out?

Kevin
Emanuele Giuseppe Esposito Jan. 31, 2022, 11:25 a.m. UTC | #2
On 27/01/2022 11:56, Kevin Wolf wrote:
> Am 21.01.2022 um 18:05 hat Emanuele Giuseppe Esposito geschrieben:
>> When invoked from the main loop, this function is the same
>> as qemu_mutex_iothread_locked, and returns true if the BQL is held.
> 
> So its name is misleading because it doesn't answer the question whether
> we're in the main thread, but whether we're holding the BQL (which is
> mostly equivalent to holding an AioContext lock, not being in the home
> thread of that AioContext).
> 
>> When invoked from iothreads or tests, it returns true only
>> if the current AioContext is the Main Loop.
>>
>> This essentially just extends qemu_mutex_iothread_locked to work
>> also in unit tests or other users like storage-daemon, that run
>> in the Main Loop but end up using the implementation in
>> stubs/iothread-lock.c.
>>
>> Using qemu_mutex_iothread_locked in unit tests defaults to false
>> because they use the implementation in stubs/iothread-lock,
>> making all assertions added in next patches fail despite the
>> AioContext is still the main loop.
>>
>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> This adds a new function that is almost the same as an existing
> function, but the documentation is unclear when I should use one or the
> other.

Why do you think it is unclear? As explained in the commit message and
documentation, unit tests for example use stubs/iothread-lock.c, which
always returns false. So we can (and should!) use this function in APIs
invoked by unit tests, because qemu_mutex_iothread_locked would just
return false, making all tests crash. On the other hand, I am pretty
sure it won't cause any failure when running QEMU or qemu-iotests,
because there most of the APIs use the cpus.c implementation.

> 
> What are the use cases for the existing qemu_mutex_iothread_locked()
> stub where we rely on false being returned?

I don't think we ever rely on the stub being false. There are only two
places where I see !qemu_mutex_iothread_locked(), and they are in
helper_regs.c and cpus.c.

So, being CPU-related functions, they use the cpus.c implementation.

However, commit 5f50be9b5810293141bb53cfd0cb46e765367d56 changed the
stub to return false for a specific reason.

> 
> If there aren't any, then maybe we should change the stub for that one
> instead of adding a new function that behaves the same in the system
> emulator and different only when it's stubbed out?

I don't think we can replace it, because the stub qemu_in_main_thread()
calls qemu_get_current_aio_context(), which in turn calls
qemu_mutex_iothread_locked(). So we implicitly rely on that "false".

Thank you,
Emanuele
Kevin Wolf Jan. 31, 2022, 2:33 p.m. UTC | #3
Am 31.01.2022 um 12:25 hat Emanuele Giuseppe Esposito geschrieben:
> 
> 
> On 27/01/2022 11:56, Kevin Wolf wrote:
> > Am 21.01.2022 um 18:05 hat Emanuele Giuseppe Esposito geschrieben:
> >> When invoked from the main loop, this function is the same
> >> as qemu_mutex_iothread_locked, and returns true if the BQL is held.
> > 
> > So its name is misleading because it doesn't answer the question whether
> > we're in the main thread, but whether we're holding the BQL (which is
> > mostly equivalent to holding an AioContext lock, not being in the home
> > thread of that AioContext).
> > 
> >> When invoked from iothreads or tests, it returns true only
> >> if the current AioContext is the Main Loop.
> >>
> >> This essentially just extends qemu_mutex_iothread_locked to work
> >> also in unit tests or other users like storage-daemon, that run
> >> in the Main Loop but end up using the implementation in
> >> stubs/iothread-lock.c.
> >>
> >> Using qemu_mutex_iothread_locked in unit tests defaults to false
> >> because they use the implementation in stubs/iothread-lock,
> >> making all assertions added in next patches fail despite the
> >> AioContext is still the main loop.
> >>
> >> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> >> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > 
> > This adds a new function that is almost the same as an existing
> > function, but the documentation is unclear when I should use one or the
> > other.
> 
> Why do you think it is unclear? as explained in commit and
> documentation, unit tests for example use stubs/iothread-lock, which
> returns always false. So we can (and should!) use this function in
> APIs invoked by unit tests, because qemu_mutex_iothread_locked will
> just return false, making all tests crash. On the other side, I am
> pretty sure it won't cause any failure when running QEMU or
> qemu-iotests, because there most of the API use the cpu
> implementation.

I feel "use this function if your code is going to be used by unit
tests, but use qemu_mutex_iothread_locked() otherwise" is a very strange
interface. I'm relatively sure that qemu_mutex_iothread_locked() isn't
primarily used to make unit tests crash.

Documentation and the definition of the interface of a function
shouldn't be about the caller, but about the semantics of the function
itself. So, which question does qemu_mutex_iothread_locked() answer, and
which question does qemu_in_main_thread() answer?

The meaning of their result must be different, otherwise there wouldn't
be two different functions that must be used in different contexts.

> > What are the use cases for the existing qemu_mutex_iothread_locked()
> > stub where we rely on false being returned?
> 
> I don't think we ever rely on stub being false. There are only 2
> places where I see !qemu_mutex_iothread_locked(), and are in
> helper_regs.c and cpus.c
> 
> So being cpu related functions they use the cpu implementation.
> 
> However, commit 5f50be9b5810293141bb53cfd0cb46e765367d56 changed the
> stub to return false for a specific reason.

I must admit I don't really understand the reasoning there:

    With this change, the stub qemu_mutex_iothread_locked() must be changed
    from true to false.  The previous value of true was needed because the
    main thread did not have an AioContext in the thread-local variable,
    but now it does have one.

This explains why it _can_ be changed to false for this caller, but not
why it _must_ be changed.

> > If there aren't any, then maybe we should change the stub for that one
> > instead of adding a new function that behaves the same in the system
> > emulator and different only when it's stubbed out?
> 
> I don't think we can replace it, because stubs/qemu_in_main_thread()
> calls qemu_get_current_aio_context, that in turn calls
> qemu_mutex_iothread_locked. So we implicitly rely on that "false".

qemu_get_current_aio_context() will just return my_aiocontext, which is
qemu_aio_context because all tools call qemu_init_main_loop(). I don't
think it will ever call qemu_mutex_iothread_locked() in tools, except
for worker threads.

Ooh, and that's why we return false. Because the only kind of non-main
threads accessing global state in tools are worker threads which never
hold the BQL in the system emulator.

So is the problem with the unit tests really just that they never call
qemu_init_main_loop() and therefore never set the AioContext for the
main thread?

Kevin
Paolo Bonzini Jan. 31, 2022, 3:49 p.m. UTC | #4
On 1/31/22 15:33, Kevin Wolf wrote:
> I feel "use this function if your code is going to be used by unit
> tests, but use qemu_mutex_iothread_locked() otherwise" is a very strange
> interface. I'm relatively sure that qemu_mutex_iothread_locked() isn't
> primarily used to make unit tests crash.

qemu_mutex_iothread_locked() should never be used in the block layer,
because programs that use the block layer might not take an iothread
lock, and indeed only the emulators have one.

Making it always true would have been okay if those programs were
single-threaded, but really they all had worker threads, so it was
changed to false by commit 5f50be9b58 ("async: the main AioContext is
only "current" if under the BQL", 2021-06-18).

> Documentation and the definition of the interface of a function
> shouldn't be about the caller, but about the semantics of the function
> itself. So, which question does qemu_mutex_iothread_locked() answer, and
> which question does qemu_in_main_thread() answer?

qemu_mutex_iothread_locked() -> Does the program have exclusive access
to the emulator's globals?

qemu_in_main_thread() -> Does the program have access to the block
layer's globals?  In emulators this is governed by the iothread lock;
elsewhere they are accessible only from the home thread of the initial
AioContext.

So, in emulators it is the same question, but in the block layer one of 
them is actually meaningless.

Really, the "bug" is that qemu_mutex_iothread_locked() should not be
used outside emulators.  There are just two uses, but it's hard to
remove them.

So why are two functions needed?  Two reasons:

- stubs/iothread-lock.c cannot define qemu_mutex_iothread_locked() as 
"return qemu_get_current_aio_context() == qemu_get_aio_context();" 
because it would cause infinite recursion with 
qemu_get_current_aio_context()

- even if it could, frankly qemu_mutex_iothread_locked() is a horrible 
name, and in the context of the block layer it conflicts especially 
badly with the "iothread" concept.

Maybe we should embrace the BQL name and rename the functions that refer 
to the "iothread lock".  But then I don't want to have code in the block 
layer that refers to a "big lock".
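
To make that distinction concrete, here is a small illustrative sketch;
both example functions below are invented names, not existing QEMU code:

    /* Emulator-only code (softmmu/, hw/, ...): asks about the BQL. */
    static void emulator_only_example(void)
    {
        /* Do we have exclusive access to the emulator's globals? */
        assert(qemu_mutex_iothread_locked());
    }

    /* Block-layer code (block/, blockdev.c, ...): asks about access to
     * block-layer global state, and also works in tools and unit tests,
     * which have no BQL at all. */
    static void block_layer_example(void)
    {
        /* May we touch the block layer's globals? */
        assert(qemu_in_main_thread());
    }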

>>> What are the use cases for the existing qemu_mutex_iothread_locked()
>>> stub where we rely on false being returned?
>>
>> I don't think we ever rely on stub being false. There are only 2
>> places where I see !qemu_mutex_iothread_locked(), and are in
>> helper_regs.c and cpus.c

We rely on it indirectly, here:

     AioContext *qemu_get_current_aio_context(void)
     {
         if (my_aiocontext) {
             return my_aiocontext;
         }
         if (qemu_mutex_iothread_locked()) {
             /* Possibly in a vCPU thread.  */
             return qemu_get_aio_context();
         }
         return NULL;
     }

which affects the behavior of aio_co_enter.  Before that patch, worker 
threads computed qemu_get_aio_context(), while afterwards they compute NULL.

This is not just for unit tests but also for qemu-storage-daemon and 
other block layer programs.
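
For reference, this is roughly how aio_co_enter() consumes that value
(paraphrased and simplified, omitting the in-coroutine case; util/async.c
is the authoritative source). With the old always-true stub, a worker
thread in a tool would have seen qemu_get_aio_context() as its "current"
context and entered main-context coroutines directly instead of
scheduling them.

    void aio_co_enter(AioContext *ctx, Coroutine *co)
    {
        if (ctx != qemu_get_current_aio_context()) {
            /* Not our context: hand the coroutine over to ctx's thread.
             * Worker threads in tools only take this branch if
             * qemu_get_current_aio_context() returns NULL (or their own
             * AioContext) rather than qemu_get_aio_context(). */
            aio_co_schedule(ctx, co);
            return;
        }

        /* Our context: run the coroutine here. */
        aio_context_acquire(ctx);
        qemu_aio_coroutine_enter(ctx, co);
        aio_context_release(ctx);
    }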

>> However, commit 5f50be9b5810293141bb53cfd0cb46e765367d56 changed the
>> stub to return false for a specific reason.
> 
> I must admit I don't really understand the reasoning there:
> 
>      With this change, the stub qemu_mutex_iothread_locked() must be changed
>      from true to false.  The previous value of true was needed because the
>      main thread did not have an AioContext in the thread-local variable,
>      but now it does have one.
> 
> This explains why it _can_ be changed to false for this caller, but not
> why it _must_ be changed.

See above: because it returns the wrong value for all threads except 
one, and there are better ways to do a meaningful check in that one 
thread: using qemu_get_current_aio_context(), which is what aio_co_enter 
did at the time and qemu_in_main_thread() does with Emanuele's change.

> So is the problem with the unit tests really just that they never call
> qemu_init_main_loop() and therefore never set the AioContext for the
> main thread?

No, most of them do (and if some are missing it we can add it).

Paolo
Emanuele Giuseppe Esposito Feb. 1, 2022, 9:08 a.m. UTC | #5
On 31/01/2022 16:49, Paolo Bonzini wrote:
> On 1/31/22 15:33, Kevin Wolf wrote:
>> I feel "use this function if your code is going to be used by unit
>> tests, but use qemu_mutex_iothread_locked() otherwise" is a very strange
>> interface. I'm relatively sure that qemu_mutex_iothread_locked() isn't
>> primarily used to make unit tests crash.
> 
> qemu_mutex_iothread_locked() should never be used in the block layer,
> because programs that use the block layer might not call an iothread
> lock, and indeed only the emulator have an iothread lock.
> 
> Making it always true would be okay when those programs were
> single-threaded, but really they all had worker threads so it was
> changed to false by commit 5f50be9b58 ("async: the main AioContext is
> only "current" if under the BQL", 2021-06-18).
> 
>> Documentation and the definition of the interface of a function
>> shouldn't be about the caller, but about the semantics of the function
>> itself. So, which question does qemu_mutex_iothread_locked() answer, and
>> which question does qemu_in_main_thread() answer?
> 
> qemu_mutex_iothread_locked() -> Does the program have exclusive access
> to the emulator's globals.
> 
> qemu_in_main_thread() -> Does the program have access to the block
> layer's globals.  In emulators this is governed by the iothread lock,
> elsewhere they are accessible only from the home thread of the initial
> AioContext.
> 
> So, in emulators it is the same question, but in the block layer one of
> them is actually meaningless.
> 
> Really the "bug" is that qemu_mutex_iothread_locked() should really not
> be used outside emulators.  There are just two uses, but it's hard to
> remove them.
> 
> So why are two functions needed?  Two reasons:
> 
> - stubs/iothread-lock.c cannot define qemu_mutex_iothread_locked() as
> "return qemu_get_current_aio_context() == qemu_get_aio_context();"
> because it would cause infinite recursion with
> qemu_get_current_aio_context()
> 
> - even if it could, frankly qemu_mutex_iothread_locked() is a horrible
> name, and in the context of the block layer it conflicts especially
> badly with the "iothread" concept.
> 
> Maybe we should embrace the BQL name and rename the functions that refer
> to the "iothread lock".  But then I don't want to have code in the block
> layer that refers to a "big lock".

So based on Paolo's reply, I would modify the function comment in this way:

@@ -242,6 +242,9 @@ AioContext *iohandler_get_aio_context(void);
  * must always be taken outside other locks.  This function helps
  * functions take different paths depending on whether the current
  * thread is running within the main loop mutex.
+ *
+ * This function should never be used in the block layer, please
+ * instead refer to qemu_in_main_thread().
  */
 bool qemu_mutex_iothread_locked(void);

+
+/**
+ * qemu_in_main_thread: same as qemu_mutex_iothread_locked when
+ * softmmu/cpus.c implementation is linked. Otherwise this function
+ * checks that the current AioContext is the global AioContext
+ * (main loop).
+ *
+ * This is useful when checking that the BQL is held as a
+ * replacement of qemu_mutex_iothread_locked() usage in the
+ * block layer, since the former returns false when invoked by
+ * unit tests or other users like qemu-storage-daemon that end
+ * up using the stubs/iothread-lock.c implementation.
+ *
+ * This function should only be used in the block layer.
+ * Use this function to determine whether it is possible to safely
+ * access the block layer's globals.
+ */
+bool qemu_in_main_thread(void);

What do you think?

Emanuele
Paolo Bonzini Feb. 1, 2022, 10:41 a.m. UTC | #6
On 2/1/22 10:08, Emanuele Giuseppe Esposito wrote:
> 
> + *
> + * This function should never be used in the block layer, please
> + * instead refer to qemu_in_main_thread().


This function should never be used in the block layer, because unit 
tests, block layer tools and qemu-storage-daemon do not have a BQL.
Please instead refer to qemu_in_main_thread().

> +
> +/**
> + * qemu_in_main_thread: same as qemu_mutex_iothread_locked when
> + * softmmu/cpus.c implementation is linked. Otherwise this function
> + * checks that the current AioContext is the global AioContext
> + * (main loop).
> + *
> + * This is useful when checking that the BQL is held as a
> + * replacement of qemu_mutex_iothread_locked() usage in the
> + * block layer, since the former returns false when invoked by
> + * unit tests or other users like qemu-storage-daemon that end
> + * up using the stubs/iothread-lock.c implementation.
> + *
> + * This function should only be used in the block layer.
> + * Use this function to determine whether it is possible to safely
> + * access the block layer's globals.
> + */
> +bool qemu_in_main_thread(void);

I think the reference to qemu_mutex_iothread_locked() complicates 
things.  It's enough to explain the different policies in my opinion:

/**
  * qemu_in_main_thread: return whether it's possible to safely access
  * the global state of the block layer.
  *
  * Global state of the block layer is not accessible from I/O threads
  * or worker threads; only from threads that "own" the default
  * AioContext that qemu_get_aio_context() returns.  For tests, block
  * layer tools and qemu-storage-daemon there is a designated thread that
  * runs the event loop for qemu_get_aio_context(), and that is the
  * main thread.
  *
  * For emulators, however, any thread that holds the BQL can act
  * as the block layer main thread; this will be any of the actual
  * main thread, the vCPU threads or the RCU thread.
  *
  * For clarity, do not use this function outside the block layer.
  */

Paolo
Kevin Wolf Feb. 1, 2022, 11:55 a.m. UTC | #7
Am 31.01.2022 um 16:49 hat Paolo Bonzini geschrieben:
> > > However, commit 5f50be9b5810293141bb53cfd0cb46e765367d56 changed the
> > > stub to return false for a specific reason.
> > 
> > I must admit I don't really understand the reasoning there:
> > 
> >      With this change, the stub qemu_mutex_iothread_locked() must be changed
> >      from true to false.  The previous value of true was needed because the
> >      main thread did not have an AioContext in the thread-local variable,
> >      but now it does have one.
> > 
> > This explains why it _can_ be changed to false for this caller, but not
> > why it _must_ be changed.
> 
> See above: because it returns the wrong value for all threads except one,
> and there are better ways to do a meaningful check in that one thread: using
> qemu_get_current_aio_context(), which is what aio_co_enter did at the time
> and qemu_in_main_thread() does with Emanuele's change.
> 
> > So is the problem with the unit tests really just that they never call
> > qemu_init_main_loop() and therefore never set the AioContext for the
> > main thread?
> 
> No, most of them do (and if some are missing it we can add it).

But if they do, why doesn't qemu_get_current_aio_context() already
return the right result? In this case, my_aiocontext is set and it
should never even call qemu_mutex_iothread_locked() - at least not in
any case where qemu_in_main_thread() would return true.

So provided that qemu_init_main_loop() is called, both functions should
be equivalent and we wouldn't need a second one.

Kevin
Kevin Wolf Feb. 1, 2022, 12:10 p.m. UTC | #8
Am 01.02.2022 um 12:55 hat Kevin Wolf geschrieben:
> Am 31.01.2022 um 16:49 hat Paolo Bonzini geschrieben:
> > > > However, commit 5f50be9b5810293141bb53cfd0cb46e765367d56 changed the
> > > > stub to return false for a specific reason.
> > > 
> > > I must admit I don't really understand the reasoning there:
> > > 
> > >      With this change, the stub qemu_mutex_iothread_locked() must be changed
> > >      from true to false.  The previous value of true was needed because the
> > >      main thread did not have an AioContext in the thread-local variable,
> > >      but now it does have one.
> > > 
> > > This explains why it _can_ be changed to false for this caller, but not
> > > why it _must_ be changed.
> > 
> > See above: because it returns the wrong value for all threads except one,
> > and there are better ways to do a meaningful check in that one thread: using
> > qemu_get_current_aio_context(), which is what aio_co_enter did at the time
> > and qemu_in_main_thread() does with Emanuele's change.
> > 
> > > So is the problem with the unit tests really just that they never call
> > > qemu_init_main_loop() and therefore never set the AioContext for the
> > > main thread?
> > 
> > No, most of them do (and if some are missing it we can add it).
> 
> But if they do, why doesn't qemu_get_current_aio_context() already
> return the right result? In this case, my_aiocontext is set and it
> should never even call qemu_mutex_iothread_locked() - at least not in
> any case where qemu_in_main_thread() would return true.
> 
> So provided that qemu_init_main_loop() is called, both functions should
> be equivalent and we wouldn't need a second one.

Sorry, I was confused and comparing the wrong two functions.
qemu_get_current_aio_context() does return the right result and it's
exactly what qemu_in_main_thread() uses. So yes, it's right and it's
different from qemu_mutex_iothread_locked().

It would be less confusing if qemu_get_current_aio_context() used
qemu_in_main_thread() (with two different implementations of
qemu_in_main_thread() for the system emulator and tools) instead of the
other way around, but I guess that's harder to implement because we
would need a different way to figure out whether we're in the main
thread, at least as long as my_aiocontext is static and can't be
accessed in stubs. We could probably make it public, though.

Kevin
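
A purely hypothetical sketch of the inversion Kevin describes; nothing in
this series implements it. It assumes the thread-local my_aiocontext were
exposed through a new accessor, called qemu_get_thread_aio_context() here
only for illustration:

    /* Hypothetical tools/stub variant of qemu_in_main_thread(): the main
     * thread is the one whose thread-local AioContext is the global one.
     * The emulator variant in softmmu/cpus.c would keep checking the BQL. */
    bool qemu_in_main_thread(void)
    {
        return qemu_get_thread_aio_context() == qemu_get_aio_context();
    }

    /* qemu_get_current_aio_context() could then build on it instead of
     * on qemu_mutex_iothread_locked(), avoiding the recursion problem
     * mentioned earlier because the accessor is just a thread-local read. */
    AioContext *qemu_get_current_aio_context(void)
    {
        AioContext *ctx = qemu_get_thread_aio_context();

        if (ctx) {
            return ctx;
        }
        if (qemu_in_main_thread()) {
            /* In emulators: possibly a vCPU thread holding the BQL. */
            return qemu_get_aio_context();
        }
        return NULL;
    }
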
diff mbox series

Patch

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 8dbc6fcb89..6b8fa57c5d 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -245,6 +245,19 @@  AioContext *iohandler_get_aio_context(void);
  */
 bool qemu_mutex_iothread_locked(void);
 
+/**
+ * qemu_in_main_thread: same as qemu_mutex_iothread_locked when
+ * softmmu/cpus.c implementation is linked. Otherwise this function
+ * checks that the current AioContext is the global AioContext
+ * (main loop).
+ *
+ * This is useful when checking that the BQL is held, to avoid
+ * returning false when invoked by unit tests or other users like
+ * storage-daemon that end up using stubs/iothread-lock.c
+ * implementation.
+ */
+bool qemu_in_main_thread(void);
+
 /**
  * qemu_mutex_lock_iothread: Lock the main loop mutex.
  *
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 23bca46b07..fd4e139289 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -485,6 +485,11 @@  bool qemu_mutex_iothread_locked(void)
     return iothread_locked;
 }
 
+bool qemu_in_main_thread(void)
+{
+    return qemu_mutex_iothread_locked();
+}
+
 /*
  * The BQL is taken from so many places that it is worth profiling the
  * callers directly, instead of funneling them all through a single function.
diff --git a/stubs/iothread-lock.c b/stubs/iothread-lock.c
index 5b45b7fc8b..ff7386e42c 100644
--- a/stubs/iothread-lock.c
+++ b/stubs/iothread-lock.c
@@ -6,6 +6,11 @@  bool qemu_mutex_iothread_locked(void)
     return false;
 }
 
+bool qemu_in_main_thread(void)
+{
+    return qemu_get_current_aio_context() == qemu_get_aio_context();
+}
+
 void qemu_mutex_lock_iothread_impl(const char *file, int line)
 {
 }