Message ID | 20171116130610.23582-22-peterx@redhat.com |
---|---|
State | New |
Headers | show |
Series | QMP: out-of-band (OOB) execution support | expand |
* Peter Xu (peterx@redhat.com) wrote: > So it can get rid of being run on main thread. > > Signed-off-by: Peter Xu <peterx@redhat.com> Last time I asked if you were sure that we didn't do locking, and you explained that we end up just setting up a callback that happens in the mainloop, and this shouldn't take a lock. I confirmed this by: running with -incoming defer breakpointing in hmp_migrate_incoming doing migrate_incoming tcp:0:4444 once I hit that breakpointing adding two more breakpoints: a) qemu_mutex_lock_iothread b) the end of hmp's handle_hmp_command and indeed it hit the end of handle_hmp_command without having hit the qemu_mutex_lock_iothread; so initially that looks ok. But then I added a break on pthread_mutex_lock, and I've got this set caused by qemu_start_incoming_migration sending a MIGRATION_STATUS_SETUP event: #1 0x0000555555ba6eba in qemu_mutex_lock (mutex=mutex@entry=0x5555563f9ba0 <monitor_lock>) at /home/dgilbert/git/qemu/util/qemu-thread-posix.c:65 #2 0x00005555557f01c1 in monitor_qapi_event_queue (event=QAPI_EVENT_MIGRATION, qdict=0x555557d93c00, errp=<optimized out>) at /home/dgilbert/git/qemu/monitor.c:442 440 trace_monitor_protocol_event_queue(event, qdict, evconf->rate); 441 442 qemu_mutex_lock(&monitor_lock); 443 #3 0x0000555555b92722 in qapi_event_send_migration (status=status@entry=MIGRATION_STATUS_SETUP, errp=0x555556859320 <error_abort>) at qapi-event.c:661 #4 0x0000555555a6159f in qemu_start_incoming_migration (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc700) at /home/dgilbert/git/qemu/migration/migration.c:253 #5 0x0000555555a641c5 in qmp_migrate_incoming (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc730) at /home/dgilbert/git/qemu/migration/migration.c:1321 is there anything which protects us there? Dave > --- > qapi/migration.json | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/qapi/migration.json b/qapi/migration.json > index bbc4671ded..95098072dd 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -1063,7 +1063,8 @@ > # <- { "return": {} } > # > ## > -{ 'command': 'migrate-incoming', 'data': {'uri': 'str' } } > +{ 'command': 'migrate-incoming', 'data': {'uri': 'str' }, > + 'allow-oob': true } > > ## > # @xen-save-devices-state: > -- > 2.13.6 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, Nov 24, 2017 at 01:14:53PM +0000, Dr. David Alan Gilbert wrote: > * Peter Xu (peterx@redhat.com) wrote: > > So it can get rid of being run on main thread. > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > Last time I asked if you were sure that we didn't do locking, > and you explained that we end up just setting up a callback > that happens in the mainloop, and this shouldn't take a lock. > I confirmed this by: > > running with -incoming defer > breakpointing in hmp_migrate_incoming > doing migrate_incoming tcp:0:4444 > once I hit that breakpointing adding two more breakpoints: > a) qemu_mutex_lock_iothread > b) the end of hmp's handle_hmp_command > > and indeed it hit the end of handle_hmp_command without > having hit the qemu_mutex_lock_iothread; so initially that > looks ok. I am not sure I fully understand the test above - I think it was trying to verify the whole OOB thing running without BQL? If so, there are possibly two things missing: Firstly, qemu_mutex_lock_iothread() is actually called before we call hmp_migrate_incoming(). To verify it is simple: just break at entry of hmp_migrate_incoming() and do "p iothread_locked", it'll be true (I would always prefer printing that global variable to know whether current thread is in a BQL section). Secondly, HMP will still always take the BQL; this patch only enables OOB for QMP command "migrate-incoming" rather than the HMP command. So IMHO what we need to test is QMP command, rather than this HMP one. To test that QMP command, it's still not that easy (actually awkward). We need to first enable "OOB" when doing handshake for QMP: {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } Then, we need to send the command with proper "id"/"control" field: { "execute": "migrate-incoming", "arguments": { "uri": "tcp:localhost:1234" }, "control": { "run-oob": true }, "id": "test-command" } Note that here "id" field will be a must now since OOB requires that, meanwhile the "control" field is a must too to make sure that is run in OOB format (otherwise this command will still take BQL and run as usual). So if you see we do have a lot of protection to make sure we only run OOB only if we really wanted to... otherwise we'll always run in compatible and old way. This time if we break at qmp_migrate_incoming() and then do "p iothread_locked", we should see a false here. > > But then I added a break on pthread_mutex_lock, and I've got > this set caused by qemu_start_incoming_migration sending a > MIGRATION_STATUS_SETUP event: > > #1 0x0000555555ba6eba in qemu_mutex_lock (mutex=mutex@entry=0x5555563f9ba0 <monitor_lock>) > at /home/dgilbert/git/qemu/util/qemu-thread-posix.c:65 > #2 0x00005555557f01c1 in monitor_qapi_event_queue (event=QAPI_EVENT_MIGRATION, qdict=0x555557d93c00, errp=<optimized out>) at /home/dgilbert/git/qemu/monitor.c:442 > > 440 trace_monitor_protocol_event_queue(event, qdict, evconf->rate); > 441 > 442 qemu_mutex_lock(&monitor_lock); > 443 > > #3 0x0000555555b92722 in qapi_event_send_migration (status=status@entry=MIGRATION_STATUS_SETUP, errp=0x555556859320 <error_abort>) at qapi-event.c:661 > #4 0x0000555555a6159f in qemu_start_incoming_migration (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc700) > at /home/dgilbert/git/qemu/migration/migration.c:253 > #5 0x0000555555a641c5 in qmp_migrate_incoming (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc730) > at /home/dgilbert/git/qemu/migration/migration.c:1321 > > is there anything which protects us there? IIUC you mean what if we e.g. page fault during taking the monitor_lock? IMHO it just can't happen - monitor_lock is really used in limited places and during those critical sections there is no guest memory access at all (which only protects the monitor logic itself AFAICT). Thanks, > > Dave > > > --- > > qapi/migration.json | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/qapi/migration.json b/qapi/migration.json > > index bbc4671ded..95098072dd 100644 > > --- a/qapi/migration.json > > +++ b/qapi/migration.json > > @@ -1063,7 +1063,8 @@ > > # <- { "return": {} } > > # > > ## > > -{ 'command': 'migrate-incoming', 'data': {'uri': 'str' } } > > +{ 'command': 'migrate-incoming', 'data': {'uri': 'str' }, > > + 'allow-oob': true } > > > > ## > > # @xen-save-devices-state: > > -- > > 2.13.6 > > > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Peter Xu (peterx@redhat.com) wrote: > On Fri, Nov 24, 2017 at 01:14:53PM +0000, Dr. David Alan Gilbert wrote: > > * Peter Xu (peterx@redhat.com) wrote: > > > So it can get rid of being run on main thread. > > > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > > Last time I asked if you were sure that we didn't do locking, > > and you explained that we end up just setting up a callback > > that happens in the mainloop, and this shouldn't take a lock. > > I confirmed this by: > > > > running with -incoming defer > > breakpointing in hmp_migrate_incoming > > doing migrate_incoming tcp:0:4444 > > once I hit that breakpointing adding two more breakpoints: > > a) qemu_mutex_lock_iothread > > b) the end of hmp's handle_hmp_command > > > > and indeed it hit the end of handle_hmp_command without > > having hit the qemu_mutex_lock_iothread; so initially that > > looks ok. > > I am not sure I fully understand the test above - I think it was > trying to verify the whole OOB thing running without BQL? If so, > there are possibly two things missing: > > Firstly, qemu_mutex_lock_iothread() is actually called before we call > hmp_migrate_incoming(). To verify it is simple: just break at entry of > hmp_migrate_incoming() and do "p iothread_locked", it'll be true (I > would always prefer printing that global variable to know whether > current thread is in a BQL section). > Yes, that's a good point - I was really trying to just follow through qmp_migrate_incoming a bit, but you've got a point that it was really the wrong way to look at it. > OOB for QMP command "migrate-incoming" rather than the HMP command. So > IMHO what we need to test is QMP command, rather than this HMP one. > > To test that QMP command, it's still not that easy (actually awkward). > We need to first enable "OOB" when doing handshake for QMP: > > {"execute": "qmp_capabilities", "arguments": { "enable": [ "oob" ] } } > > Then, we need to send the command with proper "id"/"control" field: > > { "execute": "migrate-incoming", > "arguments": { "uri": "tcp:localhost:1234" }, > "control": { "run-oob": true }, > "id": "test-command" } > > Note that here "id" field will be a must now since OOB requires that, > meanwhile the "control" field is a must too to make sure that is run > in OOB format (otherwise this command will still take BQL and run as > usual). So if you see we do have a lot of protection to make sure we > only run OOB only if we really wanted to... otherwise we'll always run > in compatible and old way. > > This time if we break at qmp_migrate_incoming() and then do "p > iothread_locked", we should see a false here. > > > > > But then I added a break on pthread_mutex_lock, and I've got > > this set caused by qemu_start_incoming_migration sending a > > MIGRATION_STATUS_SETUP event: > > > > #1 0x0000555555ba6eba in qemu_mutex_lock (mutex=mutex@entry=0x5555563f9ba0 <monitor_lock>) > > at /home/dgilbert/git/qemu/util/qemu-thread-posix.c:65 > > #2 0x00005555557f01c1 in monitor_qapi_event_queue (event=QAPI_EVENT_MIGRATION, qdict=0x555557d93c00, errp=<optimized out>) at /home/dgilbert/git/qemu/monitor.c:442 > > > > 440 trace_monitor_protocol_event_queue(event, qdict, evconf->rate); > > 441 > > 442 qemu_mutex_lock(&monitor_lock); > > 443 > > > > #3 0x0000555555b92722 in qapi_event_send_migration (status=status@entry=MIGRATION_STATUS_SETUP, errp=0x555556859320 <error_abort>) at qapi-event.c:661 > > #4 0x0000555555a6159f in qemu_start_incoming_migration (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc700) > > at /home/dgilbert/git/qemu/migration/migration.c:253 > > #5 0x0000555555a641c5 in qmp_migrate_incoming (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc730) > > at /home/dgilbert/git/qemu/migration/migration.c:1321 > > > > is there anything which protects us there? > > IIUC you mean what if we e.g. page fault during taking the > monitor_lock? IMHO it just can't happen - monitor_lock is really used > in limited places and during those critical sections there is no guest > memory access at all (which only protects the monitor logic itself > AFAICT). OK, so we should document somewhere which locks it's OK to take in an OOB command, and then make sure that for each of those locks we add a note saying that anyone taking them must be careful. We should also add a note above teh qmp_migrate_incoming that it's an OOB command and to take care. However, since you've convinced me it's OK: Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > Thanks, > > > > > Dave > > > > > --- > > > qapi/migration.json | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/qapi/migration.json b/qapi/migration.json > > > index bbc4671ded..95098072dd 100644 > > > --- a/qapi/migration.json > > > +++ b/qapi/migration.json > > > @@ -1063,7 +1063,8 @@ > > > # <- { "return": {} } > > > # > > > ## > > > -{ 'command': 'migrate-incoming', 'data': {'uri': 'str' } } > > > +{ 'command': 'migrate-incoming', 'data': {'uri': 'str' }, > > > + 'allow-oob': true } > > > > > > ## > > > # @xen-save-devices-state: > > > -- > > > 2.13.6 > > > > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK > > -- > Peter Xu -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Mon, Nov 27, 2017 at 10:44:26AM +0000, Dr. David Alan Gilbert wrote: [...] > > > > > > But then I added a break on pthread_mutex_lock, and I've got > > > this set caused by qemu_start_incoming_migration sending a > > > MIGRATION_STATUS_SETUP event: > > > > > > #1 0x0000555555ba6eba in qemu_mutex_lock (mutex=mutex@entry=0x5555563f9ba0 <monitor_lock>) > > > at /home/dgilbert/git/qemu/util/qemu-thread-posix.c:65 > > > #2 0x00005555557f01c1 in monitor_qapi_event_queue (event=QAPI_EVENT_MIGRATION, qdict=0x555557d93c00, errp=<optimized out>) at /home/dgilbert/git/qemu/monitor.c:442 > > > > > > 440 trace_monitor_protocol_event_queue(event, qdict, evconf->rate); > > > 441 > > > 442 qemu_mutex_lock(&monitor_lock); > > > 443 > > > > > > #3 0x0000555555b92722 in qapi_event_send_migration (status=status@entry=MIGRATION_STATUS_SETUP, errp=0x555556859320 <error_abort>) at qapi-event.c:661 > > > #4 0x0000555555a6159f in qemu_start_incoming_migration (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc700) > > > at /home/dgilbert/git/qemu/migration/migration.c:253 > > > #5 0x0000555555a641c5 in qmp_migrate_incoming (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc730) > > > at /home/dgilbert/git/qemu/migration/migration.c:1321 > > > > > > is there anything which protects us there? > > > > IIUC you mean what if we e.g. page fault during taking the > > monitor_lock? IMHO it just can't happen - monitor_lock is really used > > in limited places and during those critical sections there is no guest > > memory access at all (which only protects the monitor logic itself > > AFAICT). > > OK, so we should document somewhere which locks it's OK to take in an > OOB command, and then make sure that for each of those locks we add > a note saying that anyone taking them must be careful. > We should also add a note above teh qmp_migrate_incoming that it's an > OOB command and to take care. Makes sense. I think it will be hard to document what locks can be taken during OOB command handing since there can be too many locks there... But at least I can add a comment to qmp_migrate_incoming() along with current patch to note that this function should be OOB friendly from now on. > > However, since you've convinced me it's OK: > > > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Thanks!
* Peter Xu (peterx@redhat.com) wrote: > On Mon, Nov 27, 2017 at 10:44:26AM +0000, Dr. David Alan Gilbert wrote: > > [...] > > > > > > > > > But then I added a break on pthread_mutex_lock, and I've got > > > > this set caused by qemu_start_incoming_migration sending a > > > > MIGRATION_STATUS_SETUP event: > > > > > > > > #1 0x0000555555ba6eba in qemu_mutex_lock (mutex=mutex@entry=0x5555563f9ba0 <monitor_lock>) > > > > at /home/dgilbert/git/qemu/util/qemu-thread-posix.c:65 > > > > #2 0x00005555557f01c1 in monitor_qapi_event_queue (event=QAPI_EVENT_MIGRATION, qdict=0x555557d93c00, errp=<optimized out>) at /home/dgilbert/git/qemu/monitor.c:442 > > > > > > > > 440 trace_monitor_protocol_event_queue(event, qdict, evconf->rate); > > > > 441 > > > > 442 qemu_mutex_lock(&monitor_lock); > > > > 443 > > > > > > > > #3 0x0000555555b92722 in qapi_event_send_migration (status=status@entry=MIGRATION_STATUS_SETUP, errp=0x555556859320 <error_abort>) at qapi-event.c:661 > > > > #4 0x0000555555a6159f in qemu_start_incoming_migration (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc700) > > > > at /home/dgilbert/git/qemu/migration/migration.c:253 > > > > #5 0x0000555555a641c5 in qmp_migrate_incoming (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc730) > > > > at /home/dgilbert/git/qemu/migration/migration.c:1321 > > > > > > > > is there anything which protects us there? > > > > > > IIUC you mean what if we e.g. page fault during taking the > > > monitor_lock? IMHO it just can't happen - monitor_lock is really used > > > in limited places and during those critical sections there is no guest > > > memory access at all (which only protects the monitor logic itself > > > AFAICT). > > > > OK, so we should document somewhere which locks it's OK to take in an > > OOB command, and then make sure that for each of those locks we add > > a note saying that anyone taking them must be careful. > > We should also add a note above teh qmp_migrate_incoming that it's an > > OOB command and to take care. > > Makes sense. I think it will be hard to document what locks can be > taken during OOB command handing since there can be too many locks > there... Yes, but we need people who are marking commands as OOB to really think about it, otherwise they'll use a lock that in some weird corner case can also be taken by another command that can block. > But at least I can add a comment to qmp_migrate_incoming() > along with current patch to note that this function should be OOB > friendly from now on. Thanks. Dave > > > > However, since you've convinced me it's OK: > > > > > > Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > > Thanks! > > -- > Peter Xu -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Mon, Nov 27, 2017 at 11:04:00AM +0000, Dr. David Alan Gilbert wrote: > * Peter Xu (peterx@redhat.com) wrote: > > On Mon, Nov 27, 2017 at 10:44:26AM +0000, Dr. David Alan Gilbert wrote: > > > > [...] > > > > > > > > > > > > But then I added a break on pthread_mutex_lock, and I've got > > > > > this set caused by qemu_start_incoming_migration sending a > > > > > MIGRATION_STATUS_SETUP event: > > > > > > > > > > #1 0x0000555555ba6eba in qemu_mutex_lock (mutex=mutex@entry=0x5555563f9ba0 <monitor_lock>) > > > > > at /home/dgilbert/git/qemu/util/qemu-thread-posix.c:65 > > > > > #2 0x00005555557f01c1 in monitor_qapi_event_queue (event=QAPI_EVENT_MIGRATION, qdict=0x555557d93c00, errp=<optimized out>) at /home/dgilbert/git/qemu/monitor.c:442 > > > > > > > > > > 440 trace_monitor_protocol_event_queue(event, qdict, evconf->rate); > > > > > 441 > > > > > 442 qemu_mutex_lock(&monitor_lock); > > > > > 443 > > > > > > > > > > #3 0x0000555555b92722 in qapi_event_send_migration (status=status@entry=MIGRATION_STATUS_SETUP, errp=0x555556859320 <error_abort>) at qapi-event.c:661 > > > > > #4 0x0000555555a6159f in qemu_start_incoming_migration (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc700) > > > > > at /home/dgilbert/git/qemu/migration/migration.c:253 > > > > > #5 0x0000555555a641c5 in qmp_migrate_incoming (uri=0x555557693f30 "tcp:0:4444", errp=0x7fffffffc730) > > > > > at /home/dgilbert/git/qemu/migration/migration.c:1321 > > > > > > > > > > is there anything which protects us there? > > > > > > > > IIUC you mean what if we e.g. page fault during taking the > > > > monitor_lock? IMHO it just can't happen - monitor_lock is really used > > > > in limited places and during those critical sections there is no guest > > > > memory access at all (which only protects the monitor logic itself > > > > AFAICT). > > > > > > OK, so we should document somewhere which locks it's OK to take in an > > > OOB command, and then make sure that for each of those locks we add > > > a note saying that anyone taking them must be careful. > > > We should also add a note above teh qmp_migrate_incoming that it's an > > > OOB command and to take care. > > > > Makes sense. I think it will be hard to document what locks can be > > taken during OOB command handing since there can be too many locks > > there... > > Yes, but we need people who are marking commands as OOB to really think > about it, otherwise they'll use a lock that in some weird corner case > can also be taken by another command that can block. Right. That sounds like a HOWTO to migrate existing commands to support OOB. Hmm, could there be any/much? I'm curious. Anyways... how about I add this paragraph to doc patch? Converting an existing QMP command to be OOB-capable can be either very easy or extremely hard. The most important thing is that, the command should never be blocked unexpectedly in any form. For example, it's still legal to take a mutex in an OOB-capable command handler (but never the BQL!), but we need to make sure that all the other code paths that are protected by the same lock won't be blocked too. Some candidates that can easily block: (1) doing IOs (especially when the backend can be an NFS storage), or (2) accessing guest memories (which can be simply blocked when it triggers a page fault, and which cannot be handled immediately). Thanks, -- Peter Xu
diff --git a/qapi/migration.json b/qapi/migration.json index bbc4671ded..95098072dd 100644 --- a/qapi/migration.json +++ b/qapi/migration.json @@ -1063,7 +1063,8 @@ # <- { "return": {} } # ## -{ 'command': 'migrate-incoming', 'data': {'uri': 'str' } } +{ 'command': 'migrate-incoming', 'data': {'uri': 'str' }, + 'allow-oob': true } ## # @xen-save-devices-state:
So it can get rid of being run on main thread. Signed-off-by: Peter Xu <peterx@redhat.com> --- qapi/migration.json | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)