
[v1] migration/postcopy: ensure preempt channel is ready before loading states

Message ID 20240404100550.17777-1-wei.w.wang@intel.com
State New
Series [v1] migration/postcopy: ensure preempt channel is ready before loading states

Commit Message

Wang, Wei W April 4, 2024, 10:05 a.m. UTC
Before loading the guest states, ensure that the preempt channel has been
ready to use, as some of the states (e.g. via virtio_load) might trigger
page faults that will be handled through the preempt channel. So yield to
the main thread in the case that the channel create event has been
dispatched.

Originally-by: Lei Wang <lei4.wang@intel.com>
Link: https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@intel.com/T/
Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
 migration/savevm.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Peter Xu April 4, 2024, 2:11 p.m. UTC | #1
On Thu, Apr 04, 2024 at 06:05:50PM +0800, Wei Wang wrote:
> Before loading the guest states, ensure that the preempt channel has been
> ready to use, as some of the states (e.g. via virtio_load) might trigger
> page faults that will be handled through the preempt channel. So yield to
> the main thread in the case that the channel create event has been
> dispatched.
> 
> Originally-by: Lei Wang <lei4.wang@intel.com>
> Link: https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@intel.com/T/
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Lei Wang <lei4.wang@intel.com>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> ---
>  migration/savevm.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 388d7af7cd..fbc9f2bdd4 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2342,6 +2342,23 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
>  
>      QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
>  
> +    /*
> +     * Before loading the guest states, ensure that the preempt channel has
> +     * been ready to use, as some of the states (e.g. via virtio_load) might
> +     * trigger page faults that will be handled through the preempt channel.
> +     * So yield to the main thread in the case that the channel create event
> +     * has been dispatched.
> +     */
> +    do {
> +        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
> +            mis->postcopy_qemufile_dst) {
> +            break;
> +        }
> +
> +        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
> +        qemu_coroutine_yield();
> +    } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));

I think we need s/!// here, so the same mistake I made?  I think we need to
rework the retval of qemu_sem_timedwait() at some point later..
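
That is, something along the lines of the below (untested sketch, assuming
qemu_sem_timedwait() returns 0 only once the sem was actually posted and
nonzero on timeout):

    do {
        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
            mis->postcopy_qemufile_dst) {
            break;
        }

        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
        qemu_coroutine_yield();
    } while (qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));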

Besides, this patch kept the sem_wait() in postcopy_preempt_thread() so it
will wait() on this sem again.  If this qemu_sem_timedwait() accidentally
consumed the sem count then I think the other thread can hang forever?

That's why I put the wait before creation of the preempt thread (in
postcopy_ram_incoming_setup()), as we can only consume the sem once, so we
must prepare the qemufile when the thread is created.

Thanks,

> +
>      ret = qemu_loadvm_state_main(packf, mis);
>      trace_loadvm_handle_cmd_packaged_main(ret);
>      qemu_fclose(packf);
> -- 
> 2.27.0
>
Wang, Wei W April 4, 2024, 4:25 p.m. UTC | #2
On Thursday, April 4, 2024 10:12 PM, Peter Xu wrote:
> On Thu, Apr 04, 2024 at 06:05:50PM +0800, Wei Wang wrote:
> > Before loading the guest states, ensure that the preempt channel has
> > been ready to use, as some of the states (e.g. via virtio_load) might
> > trigger page faults that will be handled through the preempt channel.
> > So yield to the main thread in the case that the channel create event
> > has been dispatched.
> >
> > Originally-by: Lei Wang <lei4.wang@intel.com>
> > Link:
> > https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@intel
> > .com/T/
> > Suggested-by: Peter Xu <peterx@redhat.com>
> > Signed-off-by: Lei Wang <lei4.wang@intel.com>
> > Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > ---
> >  migration/savevm.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/migration/savevm.c b/migration/savevm.c index
> > 388d7af7cd..fbc9f2bdd4 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -2342,6 +2342,23 @@ static int
> > loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
> >
> >      QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
> >
> > +    /*
> > +     * Before loading the guest states, ensure that the preempt channel has
> > +     * been ready to use, as some of the states (e.g. via virtio_load) might
> > +     * trigger page faults that will be handled through the preempt channel.
> > +     * So yield to the main thread in the case that the channel create event
> > +     * has been dispatched.
> > +     */
> > +    do {
> > +        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
> > +            mis->postcopy_qemufile_dst) {
> > +            break;
> > +        }
> > +
> > +        aio_co_schedule(qemu_get_current_aio_context(),
> qemu_coroutine_self());
> > +        qemu_coroutine_yield();
> > +    } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done,
> > + 1));
> 
> I think we need s/!// here, so the same mistake I made?  I think we need to
> rework the retval of qemu_sem_timedwait() at some point later..

No. qemu_sem_timedwait returns false when timeout, which means sem isn’t posted yet.
So it needs to go back to the loop. (the patch was tested)

> 
> Besides, this patch kept the sem_wait() in postcopy_preempt_thread() so it
> will wait() on this sem again.  If this qemu_sem_timedwait() accidentally
> consumed the sem count then I think the other thread can hang forever?

I get the issue you mentioned, and it seems better to place this before the
creation of the preempt thread. Then we probably don't need the sem_wait in
the preempt thread, as the channel is guaranteed to be ready when it runs?

Update will be:

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index eccff499cb..5a70ce4f23 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1254,6 +1254,15 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
     }

     if (migrate_postcopy_preempt()) {
+        do {
+            if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
+                mis->postcopy_qemufile_dst) {
+                break;
+            }
+            aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
+            qemu_coroutine_yield();
+        } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));
+
         /*
          * This thread needs to be created after the temp pages because
          * it'll fetch RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
@@ -1743,12 +1752,6 @@ void *postcopy_preempt_thread(void *opaque)

     qemu_sem_post(&mis->thread_sync_sem);

-    /*
-     * The preempt channel is established in asynchronous way.  Wait
-     * for its completion.
-     */
-    qemu_sem_wait(&mis->postcopy_qemufile_dst_done);
Lei Wang April 4, 2024, 4:48 p.m. UTC | #3
On 4/5/2024 0:25, Wang, Wei W wrote:> On Thursday, April 4, 2024 10:12 PM, Peter
Xu wrote:
>> On Thu, Apr 04, 2024 at 06:05:50PM +0800, Wei Wang wrote:
>>> Before loading the guest states, ensure that the preempt channel has
>>> been ready to use, as some of the states (e.g. via virtio_load) might
>>> trigger page faults that will be handled through the preempt channel.
>>> So yield to the main thread in the case that the channel create event
>>> has been dispatched.
>>>
>>> Originally-by: Lei Wang <lei4.wang@intel.com>
>>> Link:
>>> https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@intel
>>> .com/T/
>>> Suggested-by: Peter Xu <peterx@redhat.com>
>>> Signed-off-by: Lei Wang <lei4.wang@intel.com>
>>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>>> ---
>>>  migration/savevm.c | 17 +++++++++++++++++
>>>  1 file changed, 17 insertions(+)
>>>
>>> diff --git a/migration/savevm.c b/migration/savevm.c index
>>> 388d7af7cd..fbc9f2bdd4 100644
>>> --- a/migration/savevm.c
>>> +++ b/migration/savevm.c
>>> @@ -2342,6 +2342,23 @@ static int
>>> loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
>>>
>>>      QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
>>>
>>> +    /*
>>> +     * Before loading the guest states, ensure that the preempt channel has
>>> +     * been ready to use, as some of the states (e.g. via virtio_load) might
>>> +     * trigger page faults that will be handled through the preempt channel.
>>> +     * So yield to the main thread in the case that the channel create event
>>> +     * has been dispatched.
>>> +     */
>>> +    do {
>>> +        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
>>> +            mis->postcopy_qemufile_dst) {
>>> +            break;
>>> +        }
>>> +
>>> +        aio_co_schedule(qemu_get_current_aio_context(),
>> qemu_coroutine_self());
>>> +        qemu_coroutine_yield();
>>> +    } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done,
>>> + 1));
>>
>> I think we need s/!// here, so the same mistake I made?  I think we need to
>> rework the retval of qemu_sem_timedwait() at some point later..
> 
> No. qemu_sem_timedwait returns false when timeout, which means sem isn’t posted yet.
> So it needs to go back to the loop. (the patch was tested)

On timeout, qemu_sem_timedwait() will return -1. I think the patch test passed
maybe because you will always have at least one yield (the first yield in the
do ... while ...) in loadvm_handle_cmd_packaged()?
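
I.e. (sketch only of the assumed semantics, not the actual patch):

    /* 0: the sem was posted, so the preempt channel is ready */
    if (qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1) == 0) {
        /* proceed with loading the states */
    } else {
        /* -1: timed out, the channel is still not ready, yield and retry */
    }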

> 
>>
>> Besides, this patch kept the sem_wait() in postcopy_preempt_thread() so it
>> will wait() on this sem again.  If this qemu_sem_timedwait() accidentally
>> consumed the sem count then I think the other thread can hang forever?
> 
> I can get the issue you mentioned, and seems better to be placed before the creation of
> the preempt thread. Then we probably don’t need to wait_sem in the preempt thread, as the
> channel is guaranteed to be ready when it runs?
> 
> Update will be:
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index eccff499cb..5a70ce4f23 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -1254,6 +1254,15 @@ int postcopy_ram_incoming_setup(MigrationIncomingState *mis)
>      }
> 
>      if (migrate_postcopy_preempt()) {
> +        do {
> +            if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
> +                mis->postcopy_qemufile_dst) {
> +                break;
> +            }
> +            aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
> +            qemu_coroutine_yield();
> +        } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));
> +
>          /*
>           * This thread needs to be created after the temp pages because
>           * it'll fetch RAM_CHANNEL_POSTCOPY PostcopyTmpPage immediately.
> @@ -1743,12 +1752,6 @@ void *postcopy_preempt_thread(void *opaque)
> 
>      qemu_sem_post(&mis->thread_sync_sem);
> 
> -    /*
> -     * The preempt channel is established in asynchronous way.  Wait
> -     * for its completion.
> -     */
> -    qemu_sem_wait(&mis->postcopy_qemufile_dst_done);
Peter Xu April 4, 2024, 8:56 p.m. UTC | #4
On Fri, Apr 05, 2024 at 12:48:15AM +0800, Wang, Lei wrote:
> On 4/5/2024 0:25, Wang, Wei W wrote:> On Thursday, April 4, 2024 10:12 PM, Peter
> Xu wrote:
> >> On Thu, Apr 04, 2024 at 06:05:50PM +0800, Wei Wang wrote:
> >>> Before loading the guest states, ensure that the preempt channel has
> >>> been ready to use, as some of the states (e.g. via virtio_load) might
> >>> trigger page faults that will be handled through the preempt channel.
> >>> So yield to the main thread in the case that the channel create event
> >>> has been dispatched.
> >>>
> >>> Originally-by: Lei Wang <lei4.wang@intel.com>
> >>> Link:
> >>> https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@intel
> >>> .com/T/
> >>> Suggested-by: Peter Xu <peterx@redhat.com>
> >>> Signed-off-by: Lei Wang <lei4.wang@intel.com>
> >>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> >>> ---
> >>>  migration/savevm.c | 17 +++++++++++++++++
> >>>  1 file changed, 17 insertions(+)
> >>>
> >>> diff --git a/migration/savevm.c b/migration/savevm.c index
> >>> 388d7af7cd..fbc9f2bdd4 100644
> >>> --- a/migration/savevm.c
> >>> +++ b/migration/savevm.c
> >>> @@ -2342,6 +2342,23 @@ static int
> >>> loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
> >>>
> >>>      QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
> >>>
> >>> +    /*
> >>> +     * Before loading the guest states, ensure that the preempt channel has
> >>> +     * been ready to use, as some of the states (e.g. via virtio_load) might
> >>> +     * trigger page faults that will be handled through the preempt channel.
> >>> +     * So yield to the main thread in the case that the channel create event
> >>> +     * has been dispatched.
> >>> +     */
> >>> +    do {
> >>> +        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
> >>> +            mis->postcopy_qemufile_dst) {
> >>> +            break;
> >>> +        }
> >>> +
> >>> +        aio_co_schedule(qemu_get_current_aio_context(),
> >> qemu_coroutine_self());
> >>> +        qemu_coroutine_yield();
> >>> +    } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done,
> >>> + 1));
> >>
> >> I think we need s/!// here, so the same mistake I made?  I think we need to
> >> rework the retval of qemu_sem_timedwait() at some point later..
> > 
> > No. qemu_sem_timedwait returns false when timeout, which means sem isn’t posted yet.
> > So it needs to go back to the loop. (the patch was tested)
> 
> When timeout, qemu_sem_timedwait() will return -1. I think the patch test passed
> may because you will always have at least one yield (the first yield in the do
> ...while ...) when loadvm_handle_cmd_packaged()?

My guess is that here the kick will work and qemu_sem_timedwait() later
will hit ETIMEDOUT -> qemu_sem_timedwait() returns -1, then the loop just
breaks.  That aio schedule should make sure anyway that the file is ready;
the preempt thread must run before this so that thread won't hang.

I think it rather justifies that the retval needs to be properly
defined. :( The confusion is on top of the fact that libpthread returns
positive error codes.
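
For context, the underlying primitives use different conventions (illustrative
only, not QEMU code; the helper names are made up):

    #include <errno.h>
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdbool.h>
    #include <time.h>

    /* POSIX semaphores: -1 on failure, the reason is reported via errno. */
    static bool sem_wait_timed_out(sem_t *sem, const struct timespec *ts)
    {
        return sem_timedwait(sem, ts) == -1 && errno == ETIMEDOUT;
    }

    /* pthread calls: a positive error number is returned directly. */
    static bool mutex_timed_out(pthread_mutex_t *lock, const struct timespec *ts)
    {
        return pthread_mutex_timedlock(lock, ts) == ETIMEDOUT;
    }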

Thanks,
Wang, Wei W April 5, 2024, 1:38 a.m. UTC | #5
On Friday, April 5, 2024 4:57 AM, Peter Xu wrote:
> On Fri, Apr 05, 2024 at 12:48:15AM +0800, Wang, Lei wrote:
> > On 4/5/2024 0:25, Wang, Wei W wrote:> On Thursday, April 4, 2024 10:12
> > PM, Peter Xu wrote:
> > >> On Thu, Apr 04, 2024 at 06:05:50PM +0800, Wei Wang wrote:
> > >>> Before loading the guest states, ensure that the preempt channel
> > >>> has been ready to use, as some of the states (e.g. via
> > >>> virtio_load) might trigger page faults that will be handled through the
> preempt channel.
> > >>> So yield to the main thread in the case that the channel create
> > >>> event has been dispatched.
> > >>>
> > >>> Originally-by: Lei Wang <lei4.wang@intel.com>
> > >>> Link:
> > >>> https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@i
> > >>> ntel
> > >>> .com/T/
> > >>> Suggested-by: Peter Xu <peterx@redhat.com>
> > >>> Signed-off-by: Lei Wang <lei4.wang@intel.com>
> > >>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > >>> ---
> > >>>  migration/savevm.c | 17 +++++++++++++++++
> > >>>  1 file changed, 17 insertions(+)
> > >>>
> > >>> diff --git a/migration/savevm.c b/migration/savevm.c index
> > >>> 388d7af7cd..fbc9f2bdd4 100644
> > >>> --- a/migration/savevm.c
> > >>> +++ b/migration/savevm.c
> > >>> @@ -2342,6 +2342,23 @@ static int
> > >>> loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
> > >>>
> > >>>      QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
> > >>>
> > >>> +    /*
> > >>> +     * Before loading the guest states, ensure that the preempt channel
> has
> > >>> +     * been ready to use, as some of the states (e.g. via virtio_load) might
> > >>> +     * trigger page faults that will be handled through the preempt
> channel.
> > >>> +     * So yield to the main thread in the case that the channel create
> event
> > >>> +     * has been dispatched.
> > >>> +     */
> > >>> +    do {
> > >>> +        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
> > >>> +            mis->postcopy_qemufile_dst) {
> > >>> +            break;
> > >>> +        }
> > >>> +
> > >>> +        aio_co_schedule(qemu_get_current_aio_context(),
> > >> qemu_coroutine_self());
> > >>> +        qemu_coroutine_yield();
> > >>> +    } while
> > >>> + (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done,
> > >>> + 1));
> > >>
> > >> I think we need s/!// here, so the same mistake I made?  I think we
> > >> need to rework the retval of qemu_sem_timedwait() at some point later..
> > >
> > > No. qemu_sem_timedwait returns false when timeout, which means sem
> isn’t posted yet.
> > > So it needs to go back to the loop. (the patch was tested)
> >
> > When timeout, qemu_sem_timedwait() will return -1. I think the patch
> > test passed may because you will always have at least one yield (the
> > first yield in the do ...while ...) when loadvm_handle_cmd_packaged()?
> 
> My guess is that here the kick will work and qemu_sem_timedwait() later will
> ETIMEOUT -> qemu_sem_timedwait() returns -1, then the loop just broke.
> That aio schedule should make sure anyway that the file is ready; the preempt
> thread must run before this to not hang that thread.

Yes, I misread the return value. It still worked because the loop broke at
the "if (mis->postcopy_qemufile_dst)" check.

Even below will work:
do {
    if (mis->postcopy_qemufile_dst) {
        break;
     }
...
} while (1);

I still don't see the value of using the postcopy_qemufile_dst_done sem in the
code though. It simply blocks the main thread from creating the preempt channel
for 1ms, regardless of whether the sem has been posted or not (we add it for the
case where it is not posted and we need to go back to the loop).
Peter Xu April 5, 2024, 2:32 a.m. UTC | #6
On Fri, Apr 05, 2024 at 01:38:31AM +0000, Wang, Wei W wrote:
> On Friday, April 5, 2024 4:57 AM, Peter Xu wrote:
> > On Fri, Apr 05, 2024 at 12:48:15AM +0800, Wang, Lei wrote:
> > > On 4/5/2024 0:25, Wang, Wei W wrote:> On Thursday, April 4, 2024 10:12
> > > PM, Peter Xu wrote:
> > > >> On Thu, Apr 04, 2024 at 06:05:50PM +0800, Wei Wang wrote:
> > > >>> Before loading the guest states, ensure that the preempt channel
> > > >>> has been ready to use, as some of the states (e.g. via
> > > >>> virtio_load) might trigger page faults that will be handled through the
> > preempt channel.
> > > >>> So yield to the main thread in the case that the channel create
> > > >>> event has been dispatched.
> > > >>>
> > > >>> Originally-by: Lei Wang <lei4.wang@intel.com>
> > > >>> Link:
> > > >>> https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@i
> > > >>> ntel
> > > >>> .com/T/
> > > >>> Suggested-by: Peter Xu <peterx@redhat.com>
> > > >>> Signed-off-by: Lei Wang <lei4.wang@intel.com>
> > > >>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > >>> ---
> > > >>>  migration/savevm.c | 17 +++++++++++++++++
> > > >>>  1 file changed, 17 insertions(+)
> > > >>>
> > > >>> diff --git a/migration/savevm.c b/migration/savevm.c index
> > > >>> 388d7af7cd..fbc9f2bdd4 100644
> > > >>> --- a/migration/savevm.c
> > > >>> +++ b/migration/savevm.c
> > > >>> @@ -2342,6 +2342,23 @@ static int
> > > >>> loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
> > > >>>
> > > >>>      QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
> > > >>>
> > > >>> +    /*
> > > >>> +     * Before loading the guest states, ensure that the preempt channel
> > has
> > > >>> +     * been ready to use, as some of the states (e.g. via virtio_load) might
> > > >>> +     * trigger page faults that will be handled through the preempt
> > channel.
> > > >>> +     * So yield to the main thread in the case that the channel create
> > event
> > > >>> +     * has been dispatched.
> > > >>> +     */
> > > >>> +    do {
> > > >>> +        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
> > > >>> +            mis->postcopy_qemufile_dst) {
> > > >>> +            break;
> > > >>> +        }
> > > >>> +
> > > >>> +        aio_co_schedule(qemu_get_current_aio_context(),
> > > >> qemu_coroutine_self());
> > > >>> +        qemu_coroutine_yield();
> > > >>> +    } while
> > > >>> + (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done,
> > > >>> + 1));
> > > >>
> > > >> I think we need s/!// here, so the same mistake I made?  I think we
> > > >> need to rework the retval of qemu_sem_timedwait() at some point later..
> > > >
> > > > No. qemu_sem_timedwait returns false when timeout, which means sem
> > isn’t posted yet.
> > > > So it needs to go back to the loop. (the patch was tested)
> > >
> > > When timeout, qemu_sem_timedwait() will return -1. I think the patch
> > > test passed may because you will always have at least one yield (the
> > > first yield in the do ...while ...) when loadvm_handle_cmd_packaged()?
> > 
> > My guess is that here the kick will work and qemu_sem_timedwait() later will
> > ETIMEOUT -> qemu_sem_timedwait() returns -1, then the loop just broke.
> > That aio schedule should make sure anyway that the file is ready; the preempt
> > thread must run before this to not hang that thread.
> 
> Yes, misread of the return value. It still worked because the loop broke at
> the "if (mis->postcopy_qemufile_dst)" check.
> 
> Even below will work:
> do {
>     if (mis->postcopy_qemufile_dst) {
>         break;
>      }
> ...
> } while (1);
> 
> I still don’t see the value of using postcopy_qemufile_dst_done sem in the code though
> It simplify blocks the main thread from creating the preempt channel for 1ms (regardless
> of the possibility about whether the sem has been be posted or not. We add it for the case
> it is not posted and need to go back to the loop).

I think it used to only wait() in the preempt thread, so that is needed.
It's also needed when postcopy is interrupted and needs a recovery, see
loadvm_postcopy_handle_resume(); in that case it's the postcopy ram load
thread that waits for it rather than the main thread or preempt thread.

Indeed if we move channel creation out of the preempt thread then it seems
we don't need the sem in this path.  However the other path will still need
it, and then when the new channel is created (postcopy_preempt_new_channel)
we'll need to identify a "switch to postcopy" case versus a "postcopy
recovery" case, and only post the sem for the former.  I think it might
complicate the code; I'll think again tomorrow after a sleep so my brain
works better, but I doubt this is what we want to do at rc3.

If you feel comfortable, please feel free to send a version that you think
is the most correct so far (if you prefer no timedwait that's fine), and
make sure the test works well on your side.  Then I'll smoke it a bit over
the weekend.  Please keep in mind that if this is for rc3 it should be a
non-intrusive change; otherwise we keep it slow for 9.1 and there is no need
to rush.

Thanks,
Wang, Wei W April 5, 2024, 3:06 a.m. UTC | #7
On Friday, April 5, 2024 10:33 AM, Peter Xu wrote:
> On Fri, Apr 05, 2024 at 01:38:31AM +0000, Wang, Wei W wrote:
> > On Friday, April 5, 2024 4:57 AM, Peter Xu wrote:
> > > On Fri, Apr 05, 2024 at 12:48:15AM +0800, Wang, Lei wrote:
> > > > On 4/5/2024 0:25, Wang, Wei W wrote:> On Thursday, April 4, 2024
> > > > 10:12 PM, Peter Xu wrote:
> > > > >> On Thu, Apr 04, 2024 at 06:05:50PM +0800, Wei Wang wrote:
> > > > >>> Before loading the guest states, ensure that the preempt
> > > > >>> channel has been ready to use, as some of the states (e.g. via
> > > > >>> virtio_load) might trigger page faults that will be handled
> > > > >>> through the
> > > preempt channel.
> > > > >>> So yield to the main thread in the case that the channel
> > > > >>> create event has been dispatched.
> > > > >>>
> > > > >>> Originally-by: Lei Wang <lei4.wang@intel.com>
> > > > >>> Link:
> > > > >>> https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced2
> > > > >>> 49@i
> > > > >>> ntel
> > > > >>> .com/T/
> > > > >>> Suggested-by: Peter Xu <peterx@redhat.com>
> > > > >>> Signed-off-by: Lei Wang <lei4.wang@intel.com>
> > > > >>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > > >>> ---
> > > > >>>  migration/savevm.c | 17 +++++++++++++++++
> > > > >>>  1 file changed, 17 insertions(+)
> > > > >>>
> > > > >>> diff --git a/migration/savevm.c b/migration/savevm.c index
> > > > >>> 388d7af7cd..fbc9f2bdd4 100644
> > > > >>> --- a/migration/savevm.c
> > > > >>> +++ b/migration/savevm.c
> > > > >>> @@ -2342,6 +2342,23 @@ static int
> > > > >>> loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
> > > > >>>
> > > > >>>      QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
> > > > >>>
> > > > >>> +    /*
> > > > >>> +     * Before loading the guest states, ensure that the
> > > > >>> + preempt channel
> > > has
> > > > >>> +     * been ready to use, as some of the states (e.g. via virtio_load)
> might
> > > > >>> +     * trigger page faults that will be handled through the
> > > > >>> + preempt
> > > channel.
> > > > >>> +     * So yield to the main thread in the case that the
> > > > >>> + channel create
> > > event
> > > > >>> +     * has been dispatched.
> > > > >>> +     */
> > > > >>> +    do {
> > > > >>> +        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
> > > > >>> +            mis->postcopy_qemufile_dst) {
> > > > >>> +            break;
> > > > >>> +        }
> > > > >>> +
> > > > >>> +        aio_co_schedule(qemu_get_current_aio_context(),
> > > > >> qemu_coroutine_self());
> > > > >>> +        qemu_coroutine_yield();
> > > > >>> +    } while
> > > > >>> + (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done,
> > > > >>> + 1));
> > > > >>
> > > > >> I think we need s/!// here, so the same mistake I made?  I
> > > > >> think we need to rework the retval of qemu_sem_timedwait() at some
> point later..
> > > > >
> > > > > No. qemu_sem_timedwait returns false when timeout, which means
> > > > > sem
> > > isn’t posted yet.
> > > > > So it needs to go back to the loop. (the patch was tested)
> > > >
> > > > When timeout, qemu_sem_timedwait() will return -1. I think the
> > > > patch test passed may because you will always have at least one
> > > > yield (the first yield in the do ...while ...) when
> loadvm_handle_cmd_packaged()?
> > >
> > > My guess is that here the kick will work and qemu_sem_timedwait()
> > > later will ETIMEOUT -> qemu_sem_timedwait() returns -1, then the loop
> just broke.
> > > That aio schedule should make sure anyway that the file is ready;
> > > the preempt thread must run before this to not hang that thread.
> >
> > Yes, misread of the return value. It still worked because the loop
> > broke at the "if (mis->postcopy_qemufile_dst)" check.
> >
> > Even below will work:
> > do {
> >     if (mis->postcopy_qemufile_dst) {
> >         break;
> >      }
> > ...
> > } while (1);
> >
> > I still don’t see the value of using postcopy_qemufile_dst_done sem in
> > the code though It simplify blocks the main thread from creating the
> > preempt channel for 1ms (regardless of the possibility about whether
> > the sem has been be posted or not. We add it for the case it is not posted
> and need to go back to the loop).
> 
> I think it used to only wait() in the preempt thread, so that is needed.
> It's also needed when postcopy is interrupted and need a recover, see
> loadvm_postcopy_handle_resume(), in that case it's the postcopy ram load
> thread that waits for it rather than the main thread or preempt thread.
> 
> Indeed if we move channel creation out of the preempt thread then it seems
> we don't need the sem in this path.  However the other path will still need it,
> then when the new channel created (postcopy_preempt_new_channel) we'll
> need to identify a "switch to postcopy" case or "postcopy recovery" case, only
> post the sem when the former.  I think it might complicate the code, I'll think
> again tomorrow after a sleep so my brain will work better, but I doubt this is
> what we want to do at rc3.

Yes, it's a bit rushed (no need to remove it completely at rc3).

> 
> If you feel comfortable, please feel free to send a version that you think is the
> most correct so far (if you prefer no timedwait it's fine), and make sure the test
> works the best on your side.  Then I'll smoke it a bit during weekends. Please
> always keep that in mind if that will be for rc3 it should be non-intrusive
> change, or we keep it slow for 9.1 and we don't need to rush.
> 
Sounds good.

Patch

diff --git a/migration/savevm.c b/migration/savevm.c
index 388d7af7cd..fbc9f2bdd4 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2342,6 +2342,23 @@  static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
 
     QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
 
+    /*
+     * Before loading the guest states, ensure that the preempt channel has
+     * been ready to use, as some of the states (e.g. via virtio_load) might
+     * trigger page faults that will be handled through the preempt channel.
+     * So yield to the main thread in the case that the channel create event
+     * has been dispatched.
+     */
+    do {
+        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
+            mis->postcopy_qemufile_dst) {
+            break;
+        }
+
+        aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
+        qemu_coroutine_yield();
+    } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));
+
     ret = qemu_loadvm_state_main(packf, mis);
     trace_loadvm_handle_cmd_packaged_main(ret);
     qemu_fclose(packf);