diff mbox series

[RFC,v4,21/27] qmp: let migrate-incoming allow out-of-band

Message ID 20171116130610.23582-22-peterx@redhat.com
State New
Headers show
Series QMP: out-of-band (OOB) execution support | expand

Commit Message

Peter Xu Nov. 16, 2017, 1:06 p.m. UTC
So it can get rid of being run on main thread.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 qapi/migration.json | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Dr. David Alan Gilbert Nov. 24, 2017, 1:14 p.m. UTC | #1
* Peter Xu (peterx@redhat.com) wrote:
> So it can get rid of being run on main thread.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

Last time I asked if you were sure that we didn't do locking,
and you explained that we end up just setting up a callback
that happens in the mainloop, and this shouldn't take a lock.
I confirmed this by:

  running with -incoming defer
  breakpointing in hmp_migrate_incoming
  doing   migrate_incoming tcp:0:4444
  once I hit that breakpointing adding two more breakpoints:
     a) qemu_mutex_lock_iothread
     b) the end of hmp's handle_hmp_command

  and indeed it hit the end of handle_hmp_command without
having hit the qemu_mutex_lock_iothread; so initially that
looks ok.

But then I added a break on pthread_mutex_lock, and I've got
this set caused by qemu_start_incoming_migration sending a
MIGRATION_STATUS_SETUP event:

#1  0x0000555555ba6eba in qemu_mutex_lock (mutex=mutex@entry=0x5555563f9ba0 <monitor_lock>)
    at /home/dgilbert/git/qemu/util/qemu-thread-posix.c:65
#2  0x00005555557f01c1 in monitor_qapi_event_queue (event=QAPI_EVENT_MIGRATION, qdict=0x555557d93c00, errp=<optimized out>) at /home/dgilbert/git/qemu/monitor.c:442

440	    trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
441	
442	    qemu_mutex_lock(&monitor_lock);
443	

#3  0x0000555555b92722 in qapi_event_send_migration (status=status@entry=MIGRATION_STATUS_SETUP, errp=0x555556859320 <error_abort>) at qapi-event.c:661
#4  0x0000555555a6159f in qemu_start_incoming_migration (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc700)
    at /home/dgilbert/git/qemu/migration/migration.c:253
#5  0x0000555555a641c5 in qmp_migrate_incoming (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc730)
    at /home/dgilbert/git/qemu/migration/migration.c:1321

is there anything which protects us there?

Dave
    
> ---
>  qapi/migration.json | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index bbc4671ded..95098072dd 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1063,7 +1063,8 @@
>  # <- { "return": {} }
>  #
>  ##
> -{ 'command': 'migrate-incoming', 'data': {'uri': 'str' } }
> +{ 'command': 'migrate-incoming', 'data': {'uri': 'str' },
> +  'allow-oob': true }
>  
>  ##
>  # @xen-save-devices-state:
> -- 
> 2.13.6
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Xu Nov. 27, 2017, 5:20 a.m. UTC | #2
On Fri, Nov 24, 2017 at 01:14:53PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > So it can get rid of being run on main thread.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> Last time I asked if you were sure that we didn't do locking,
> and you explained that we end up just setting up a callback
> that happens in the mainloop, and this shouldn't take a lock.
> I confirmed this by:
> 
>   running with -incoming defer
>   breakpointing in hmp_migrate_incoming
>   doing   migrate_incoming tcp:0:4444
>   once I hit that breakpointing adding two more breakpoints:
>      a) qemu_mutex_lock_iothread
>      b) the end of hmp's handle_hmp_command
> 
>   and indeed it hit the end of handle_hmp_command without
> having hit the qemu_mutex_lock_iothread; so initially that
> looks ok.

I am not sure I fully understand the test above - I think it was
trying to verify the whole OOB thing running without BQL?  If so,
there are possibly two things missing:

Firstly, qemu_mutex_lock_iothread() is actually called before we call
hmp_migrate_incoming(). To verify it is simple: just break at entry of
hmp_migrate_incoming() and do "p iothread_locked", it'll be true (I
would always prefer printing that global variable to know whether
current thread is in a BQL section).

Secondly, HMP will still always take the BQL; this patch only enables
OOB for QMP command "migrate-incoming" rather than the HMP command. So
IMHO what we need to test is QMP command, rather than this HMP one.

To test that QMP command, it's still not that easy (actually awkward).
We need to first enable "OOB" when doing handshake for QMP:

  {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }

Then, we need to send the command with proper "id"/"control" field:

  { "execute": "migrate-incoming",
    "arguments": { "uri": "tcp:localhost:1234" },
    "control": { "run-oob": true },
    "id": "test-command" }

Note that here "id" field will be a must now since OOB requires that,
meanwhile the "control" field is a must too to make sure that is run
in OOB format (otherwise this command will still take BQL and run as
usual).  So if you see we do have a lot of protection to make sure we
only run OOB only if we really wanted to... otherwise we'll always run
in compatible and old way.

This time if we break at qmp_migrate_incoming() and then do "p
iothread_locked", we should see a false here.

> 
> But then I added a break on pthread_mutex_lock, and I've got
> this set caused by qemu_start_incoming_migration sending a
> MIGRATION_STATUS_SETUP event:
> 
> #1  0x0000555555ba6eba in qemu_mutex_lock (mutex=mutex@entry=0x5555563f9ba0 <monitor_lock>)
>     at /home/dgilbert/git/qemu/util/qemu-thread-posix.c:65
> #2  0x00005555557f01c1 in monitor_qapi_event_queue (event=QAPI_EVENT_MIGRATION, qdict=0x555557d93c00, errp=<optimized out>) at /home/dgilbert/git/qemu/monitor.c:442
> 
> 440	    trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
> 441	
> 442	    qemu_mutex_lock(&monitor_lock);
> 443	
> 
> #3  0x0000555555b92722 in qapi_event_send_migration (status=status@entry=MIGRATION_STATUS_SETUP, errp=0x555556859320 <error_abort>) at qapi-event.c:661
> #4  0x0000555555a6159f in qemu_start_incoming_migration (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc700)
>     at /home/dgilbert/git/qemu/migration/migration.c:253
> #5  0x0000555555a641c5 in qmp_migrate_incoming (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc730)
>     at /home/dgilbert/git/qemu/migration/migration.c:1321
> 
> is there anything which protects us there?

IIUC you mean what if we e.g. page fault during taking the
monitor_lock?  IMHO it just can't happen - monitor_lock is really used
in limited places and during those critical sections there is no guest
memory access at all (which only protects the monitor logic itself
AFAICT).

Thanks,

> 
> Dave
>     
> > ---
> >  qapi/migration.json | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index bbc4671ded..95098072dd 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1063,7 +1063,8 @@
> >  # <- { "return": {} }
> >  #
> >  ##
> > -{ 'command': 'migrate-incoming', 'data': {'uri': 'str' } }
> > +{ 'command': 'migrate-incoming', 'data': {'uri': 'str' },
> > +  'allow-oob': true }
> >  
> >  ##
> >  # @xen-save-devices-state:
> > -- 
> > 2.13.6
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Dr. David Alan Gilbert Nov. 27, 2017, 10:44 a.m. UTC | #3
* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Nov 24, 2017 at 01:14:53PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > So it can get rid of being run on main thread.
> > > 
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > 
> > Last time I asked if you were sure that we didn't do locking,
> > and you explained that we end up just setting up a callback
> > that happens in the mainloop, and this shouldn't take a lock.
> > I confirmed this by:
> > 
> >   running with -incoming defer
> >   breakpointing in hmp_migrate_incoming
> >   doing   migrate_incoming tcp:0:4444
> >   once I hit that breakpointing adding two more breakpoints:
> >      a) qemu_mutex_lock_iothread
> >      b) the end of hmp's handle_hmp_command
> > 
> >   and indeed it hit the end of handle_hmp_command without
> > having hit the qemu_mutex_lock_iothread; so initially that
> > looks ok.
> 
> I am not sure I fully understand the test above - I think it was
> trying to verify the whole OOB thing running without BQL?  If so,
> there are possibly two things missing:
> 
> Firstly, qemu_mutex_lock_iothread() is actually called before we call
> hmp_migrate_incoming(). To verify it is simple: just break at entry of
> hmp_migrate_incoming() and do "p iothread_locked", it'll be true (I
> would always prefer printing that global variable to know whether
> current thread is in a BQL section).
> 

Yes, that's a good point - I was really trying to just follow through
qmp_migrate_incoming a bit, but you've got a point that it was really
the wrong way to look at it.

> OOB for QMP command "migrate-incoming" rather than the HMP command. So
> IMHO what we need to test is QMP command, rather than this HMP one.
> 
> To test that QMP command, it's still not that easy (actually awkward).
> We need to first enable "OOB" when doing handshake for QMP:
> 
>   {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } }
> 
> Then, we need to send the command with proper "id"/"control" field:
> 
>   { "execute": "migrate-incoming",
>     "arguments": { "uri": "tcp:localhost:1234" },
>     "control": { "run-oob": true },
>     "id": "test-command" }
> 
> Note that here "id" field will be a must now since OOB requires that,
> meanwhile the "control" field is a must too to make sure that is run
> in OOB format (otherwise this command will still take BQL and run as
> usual).  So if you see we do have a lot of protection to make sure we
> only run OOB only if we really wanted to... otherwise we'll always run
> in compatible and old way.
> 
> This time if we break at qmp_migrate_incoming() and then do "p
> iothread_locked", we should see a false here.
> 
> > 
> > But then I added a break on pthread_mutex_lock, and I've got
> > this set caused by qemu_start_incoming_migration sending a
> > MIGRATION_STATUS_SETUP event:
> > 
> > #1  0x0000555555ba6eba in qemu_mutex_lock (mutex=mutex@entry=0x5555563f9ba0 <monitor_lock>)
> >     at /home/dgilbert/git/qemu/util/qemu-thread-posix.c:65
> > #2  0x00005555557f01c1 in monitor_qapi_event_queue (event=QAPI_EVENT_MIGRATION, qdict=0x555557d93c00, errp=<optimized out>) at /home/dgilbert/git/qemu/monitor.c:442
> > 
> > 440	    trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
> > 441	
> > 442	    qemu_mutex_lock(&monitor_lock);
> > 443	
> > 
> > #3  0x0000555555b92722 in qapi_event_send_migration (status=status@entry=MIGRATION_STATUS_SETUP, errp=0x555556859320 <error_abort>) at qapi-event.c:661
> > #4  0x0000555555a6159f in qemu_start_incoming_migration (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc700)
> >     at /home/dgilbert/git/qemu/migration/migration.c:253
> > #5  0x0000555555a641c5 in qmp_migrate_incoming (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc730)
> >     at /home/dgilbert/git/qemu/migration/migration.c:1321
> > 
> > is there anything which protects us there?
> 
> IIUC you mean what if we e.g. page fault during taking the
> monitor_lock?  IMHO it just can't happen - monitor_lock is really used
> in limited places and during those critical sections there is no guest
> memory access at all (which only protects the monitor logic itself
> AFAICT).

OK, so we should document somewhere which locks it's OK to take in an
OOB command, and then make sure that for each of those locks we add
a note saying that anyone taking them must be careful.
We should also add a note above teh qmp_migrate_incoming that it's an
OOB command and to take care.

However, since you've convinced me it's OK:


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> Thanks,
> 
> > 
> > Dave
> >     
> > > ---
> > >  qapi/migration.json | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index bbc4671ded..95098072dd 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -1063,7 +1063,8 @@
> > >  # <- { "return": {} }
> > >  #
> > >  ##
> > > -{ 'command': 'migrate-incoming', 'data': {'uri': 'str' } }
> > > +{ 'command': 'migrate-incoming', 'data': {'uri': 'str' },
> > > +  'allow-oob': true }
> > >  
> > >  ##
> > >  # @xen-save-devices-state:
> > > -- 
> > > 2.13.6
> > > 
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Xu Nov. 27, 2017, 10:54 a.m. UTC | #4
On Mon, Nov 27, 2017 at 10:44:26AM +0000, Dr. David Alan Gilbert wrote:

[...]

> > > 
> > > But then I added a break on pthread_mutex_lock, and I've got
> > > this set caused by qemu_start_incoming_migration sending a
> > > MIGRATION_STATUS_SETUP event:
> > > 
> > > #1  0x0000555555ba6eba in qemu_mutex_lock (mutex=mutex@entry=0x5555563f9ba0 <monitor_lock>)
> > >     at /home/dgilbert/git/qemu/util/qemu-thread-posix.c:65
> > > #2  0x00005555557f01c1 in monitor_qapi_event_queue (event=QAPI_EVENT_MIGRATION, qdict=0x555557d93c00, errp=<optimized out>) at /home/dgilbert/git/qemu/monitor.c:442
> > > 
> > > 440	    trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
> > > 441	
> > > 442	    qemu_mutex_lock(&monitor_lock);
> > > 443	
> > > 
> > > #3  0x0000555555b92722 in qapi_event_send_migration (status=status@entry=MIGRATION_STATUS_SETUP, errp=0x555556859320 <error_abort>) at qapi-event.c:661
> > > #4  0x0000555555a6159f in qemu_start_incoming_migration (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc700)
> > >     at /home/dgilbert/git/qemu/migration/migration.c:253
> > > #5  0x0000555555a641c5 in qmp_migrate_incoming (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc730)
> > >     at /home/dgilbert/git/qemu/migration/migration.c:1321
> > > 
> > > is there anything which protects us there?
> > 
> > IIUC you mean what if we e.g. page fault during taking the
> > monitor_lock?  IMHO it just can't happen - monitor_lock is really used
> > in limited places and during those critical sections there is no guest
> > memory access at all (which only protects the monitor logic itself
> > AFAICT).
> 
> OK, so we should document somewhere which locks it's OK to take in an
> OOB command, and then make sure that for each of those locks we add
> a note saying that anyone taking them must be careful.
> We should also add a note above teh qmp_migrate_incoming that it's an
> OOB command and to take care.

Makes sense.  I think it will be hard to document what locks can be
taken during OOB command handing since there can be too many locks
there...  But at least I can add a comment to qmp_migrate_incoming()
along with current patch to note that this function should be OOB
friendly from now on.

> 
> However, since you've convinced me it's OK:
> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Thanks!
Dr. David Alan Gilbert Nov. 27, 2017, 11:04 a.m. UTC | #5
* Peter Xu (peterx@redhat.com) wrote:
> On Mon, Nov 27, 2017 at 10:44:26AM +0000, Dr. David Alan Gilbert wrote:
> 
> [...]
> 
> > > > 
> > > > But then I added a break on pthread_mutex_lock, and I've got
> > > > this set caused by qemu_start_incoming_migration sending a
> > > > MIGRATION_STATUS_SETUP event:
> > > > 
> > > > #1  0x0000555555ba6eba in qemu_mutex_lock (mutex=mutex@entry=0x5555563f9ba0 <monitor_lock>)
> > > >     at /home/dgilbert/git/qemu/util/qemu-thread-posix.c:65
> > > > #2  0x00005555557f01c1 in monitor_qapi_event_queue (event=QAPI_EVENT_MIGRATION, qdict=0x555557d93c00, errp=<optimized out>) at /home/dgilbert/git/qemu/monitor.c:442
> > > > 
> > > > 440	    trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
> > > > 441	
> > > > 442	    qemu_mutex_lock(&monitor_lock);
> > > > 443	
> > > > 
> > > > #3  0x0000555555b92722 in qapi_event_send_migration (status=status@entry=MIGRATION_STATUS_SETUP, errp=0x555556859320 <error_abort>) at qapi-event.c:661
> > > > #4  0x0000555555a6159f in qemu_start_incoming_migration (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc700)
> > > >     at /home/dgilbert/git/qemu/migration/migration.c:253
> > > > #5  0x0000555555a641c5 in qmp_migrate_incoming (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc730)
> > > >     at /home/dgilbert/git/qemu/migration/migration.c:1321
> > > > 
> > > > is there anything which protects us there?
> > > 
> > > IIUC you mean what if we e.g. page fault during taking the
> > > monitor_lock?  IMHO it just can't happen - monitor_lock is really used
> > > in limited places and during those critical sections there is no guest
> > > memory access at all (which only protects the monitor logic itself
> > > AFAICT).
> > 
> > OK, so we should document somewhere which locks it's OK to take in an
> > OOB command, and then make sure that for each of those locks we add
> > a note saying that anyone taking them must be careful.
> > We should also add a note above teh qmp_migrate_incoming that it's an
> > OOB command and to take care.
> 
> Makes sense.  I think it will be hard to document what locks can be
> taken during OOB command handing since there can be too many locks
> there...

Yes, but we need people who are marking commands as OOB to really think
about it, otherwise they'll use a lock that in some weird corner case
can also be taken by another command that can block.

> But at least I can add a comment to qmp_migrate_incoming()
> along with current patch to note that this function should be OOB
> friendly from now on.

Thanks.

Dave
 
> > 
> > However, since you've convinced me it's OK:
> > 
> > 
> > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> Thanks!
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Peter Xu Nov. 27, 2017, 11:26 a.m. UTC | #6
On Mon, Nov 27, 2017 at 11:04:00AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Mon, Nov 27, 2017 at 10:44:26AM +0000, Dr. David Alan Gilbert wrote:
> > 
> > [...]
> > 
> > > > > 
> > > > > But then I added a break on pthread_mutex_lock, and I've got
> > > > > this set caused by qemu_start_incoming_migration sending a
> > > > > MIGRATION_STATUS_SETUP event:
> > > > > 
> > > > > #1  0x0000555555ba6eba in qemu_mutex_lock (mutex=mutex@entry=0x5555563f9ba0 <monitor_lock>)
> > > > >     at /home/dgilbert/git/qemu/util/qemu-thread-posix.c:65
> > > > > #2  0x00005555557f01c1 in monitor_qapi_event_queue (event=QAPI_EVENT_MIGRATION, qdict=0x555557d93c00, errp=<optimized out>) at /home/dgilbert/git/qemu/monitor.c:442
> > > > > 
> > > > > 440	    trace_monitor_protocol_event_queue(event, qdict, evconf->rate);
> > > > > 441	
> > > > > 442	    qemu_mutex_lock(&monitor_lock);
> > > > > 443	
> > > > > 
> > > > > #3  0x0000555555b92722 in qapi_event_send_migration (status=status@entry=MIGRATION_STATUS_SETUP, errp=0x555556859320 <error_abort>) at qapi-event.c:661
> > > > > #4  0x0000555555a6159f in qemu_start_incoming_migration (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc700)
> > > > >     at /home/dgilbert/git/qemu/migration/migration.c:253
> > > > > #5  0x0000555555a641c5 in qmp_migrate_incoming (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc730)
> > > > >     at /home/dgilbert/git/qemu/migration/migration.c:1321
> > > > > 
> > > > > is there anything which protects us there?
> > > > 
> > > > IIUC you mean what if we e.g. page fault during taking the
> > > > monitor_lock?  IMHO it just can't happen - monitor_lock is really used
> > > > in limited places and during those critical sections there is no guest
> > > > memory access at all (which only protects the monitor logic itself
> > > > AFAICT).
> > > 
> > > OK, so we should document somewhere which locks it's OK to take in an
> > > OOB command, and then make sure that for each of those locks we add
> > > a note saying that anyone taking them must be careful.
> > > We should also add a note above teh qmp_migrate_incoming that it's an
> > > OOB command and to take care.
> > 
> > Makes sense.  I think it will be hard to document what locks can be
> > taken during OOB command handing since there can be too many locks
> > there...
> 
> Yes, but we need people who are marking commands as OOB to really think
> about it, otherwise they'll use a lock that in some weird corner case
> can also be taken by another command that can block.

Right.  That sounds like a HOWTO to migrate existing commands to
support OOB.  Hmm, could there be any/much?  I'm curious.

Anyways... how about I add this paragraph to doc patch?

  Converting an existing QMP command to be OOB-capable can be either
  very easy or extremely hard.  The most important thing is that, the
  command should never be blocked unexpectedly in any form.  For
  example, it's still legal to take a mutex in an OOB-capable command
  handler (but never the BQL!), but we need to make sure that all the
  other code paths that are protected by the same lock won't be
  blocked too.  Some candidates that can easily block: (1) doing IOs
  (especially when the backend can be an NFS storage), or (2)
  accessing guest memories (which can be simply blocked when it
  triggers a page fault, and which cannot be handled immediately).

Thanks,

-- Peter Xu
diff mbox series

Patch

diff --git a/qapi/migration.json b/qapi/migration.json
index bbc4671ded..95098072dd 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1063,7 +1063,8 @@ 
 # <- { "return": {} }
 #
 ##
-{ 'command': 'migrate-incoming', 'data': {'uri': 'str' } }
+{ 'command': 'migrate-incoming', 'data': {'uri': 'str' },
+  'allow-oob': true }
 
 ##
 # @xen-save-devices-state: