diff mbox

[1/5] monitor: screen_dump async

Message ID 1319457739-14562-2-git-send-email-alevy@redhat.com
State New
Headers show

Commit Message

Alon Levy Oct. 24, 2011, 12:02 p.m. UTC
Make screen_dump monitor command an async command to allow next for qxl
to implement it as a initiating call to red_worker and completion on
callback, to fix a deadlock when issueing a screendump command via
libvirt while connected with a libvirt controlled spice-gtk client.

This patch introduces no functional changes.
---
 console.c       |    5 +++--
 console.h       |    7 +++++--
 hmp-commands.hx |    3 ++-
 hw/g364fb.c     |   12 ++++++++----
 hw/qxl.c        |    6 ++++--
 hw/vga.c        |    7 +++++--
 hw/vmware_vga.c |    5 +++--
 monitor.c       |    5 +++--
 qmp-commands.hx |    3 ++-
 9 files changed, 35 insertions(+), 18 deletions(-)

Comments

Gerd Hoffmann Oct. 24, 2011, 3:13 p.m. UTC | #1
On 10/24/11 14:02, Alon Levy wrote:
> Make screen_dump monitor command an async command to allow next for qxl
> to implement it as a initiating call to red_worker and completion on
> callback, to fix a deadlock when issueing a screendump command via
> libvirt while connected with a libvirt controlled spice-gtk client.

Approach looks reasonable to me.  Patch breaks the build though, you've
missed a bunch of screen_dump functions in non-x86 targets.

cheers,
  Gerd
Luiz Capitulino Oct. 24, 2011, 3:45 p.m. UTC | #2
On Mon, 24 Oct 2011 17:13:14 +0200
Gerd Hoffmann <kraxel@redhat.com> wrote:

> On 10/24/11 14:02, Alon Levy wrote:
> > Make screen_dump monitor command an async command to allow next for qxl
> > to implement it as a initiating call to red_worker and completion on
> > callback, to fix a deadlock when issueing a screendump command via
> > libvirt while connected with a libvirt controlled spice-gtk client.
> 
> Approach looks reasonable to me.  Patch breaks the build though, you've
> missed a bunch of screen_dump functions in non-x86 targets.

There are two problems actually.

The first one is that changing an existing command from synchronous
to asynchronous is an incompatible change because asynchronous commands
semantics is different. For an example of possible problems please
check: https://bugzilla.redhat.com/show_bug.cgi?id=623903.

The second problem is that the existing asynchronous interface in the
monitor is incomplete and has never been used for real. Our plan is to
use QAPI's async support, but that has not landed in master yet and iirc
there wasn't consensus about it. I also think it's a bit late for its
inclusion in 1.0 (and certainly not a candidate for stable).

If all you need here is to delay sending the response, then maybe the
current interface could work (although I honestly don't trust it and
regret not having dropped it). Otherwise our only choice would be to
work on getting the QAPI async support merged.
Alon Levy Oct. 24, 2011, 5:29 p.m. UTC | #3
On Mon, Oct 24, 2011 at 01:45:16PM -0200, Luiz Capitulino wrote:
> On Mon, 24 Oct 2011 17:13:14 +0200
> Gerd Hoffmann <kraxel@redhat.com> wrote:
> 
> > On 10/24/11 14:02, Alon Levy wrote:
> > > Make screen_dump monitor command an async command to allow next for qxl
> > > to implement it as a initiating call to red_worker and completion on
> > > callback, to fix a deadlock when issueing a screendump command via
> > > libvirt while connected with a libvirt controlled spice-gtk client.
> > 
> > Approach looks reasonable to me.  Patch breaks the build though, you've
> > missed a bunch of screen_dump functions in non-x86 targets.
> 
> There are two problems actually.
> 
> The first one is that changing an existing command from synchronous
> to asynchronous is an incompatible change because asynchronous commands
> semantics is different. For an example of possible problems please
> check: https://bugzilla.redhat.com/show_bug.cgi?id=623903.
> 
> The second problem is that the existing asynchronous interface in the
> monitor is incomplete and has never been used for real. Our plan is to
> use QAPI's async support, but that has not landed in master yet and iirc
> there wasn't consensus about it. I also think it's a bit late for its
> inclusion in 1.0 (and certainly not a candidate for stable).
> 
> If all you need here is to delay sending the response, then maybe the
> current interface could work (although I honestly don't trust it and
> regret not having dropped it). Otherwise our only choice would be to
> work on getting the QAPI async support merged.

My problem is that the io thread keeps the global mutex during the wait,
that's why the async monitor is perfect for what I want - this is
exactly what it does. I haven't looked at QAPI async support, but I
understand it's a bit in the future.
Luiz Capitulino Oct. 25, 2011, 12:31 a.m. UTC | #4
On Mon, 24 Oct 2011 19:29:37 +0200
Alon Levy <alevy@redhat.com> wrote:

> On Mon, Oct 24, 2011 at 01:45:16PM -0200, Luiz Capitulino wrote:
> > On Mon, 24 Oct 2011 17:13:14 +0200
> > Gerd Hoffmann <kraxel@redhat.com> wrote:
> > 
> > > On 10/24/11 14:02, Alon Levy wrote:
> > > > Make screen_dump monitor command an async command to allow next for qxl
> > > > to implement it as a initiating call to red_worker and completion on
> > > > callback, to fix a deadlock when issueing a screendump command via
> > > > libvirt while connected with a libvirt controlled spice-gtk client.
> > > 
> > > Approach looks reasonable to me.  Patch breaks the build though, you've
> > > missed a bunch of screen_dump functions in non-x86 targets.
> > 
> > There are two problems actually.
> > 
> > The first one is that changing an existing command from synchronous
> > to asynchronous is an incompatible change because asynchronous commands
> > semantics is different. For an example of possible problems please
> > check: https://bugzilla.redhat.com/show_bug.cgi?id=623903.
> > 
> > The second problem is that the existing asynchronous interface in the
> > monitor is incomplete and has never been used for real. Our plan is to
> > use QAPI's async support, but that has not landed in master yet and iirc
> > there wasn't consensus about it. I also think it's a bit late for its
> > inclusion in 1.0 (and certainly not a candidate for stable).
> > 
> > If all you need here is to delay sending the response, then maybe the
> > current interface could work (although I honestly don't trust it and
> > regret not having dropped it). Otherwise our only choice would be to
> > work on getting the QAPI async support merged.
> 
> My problem is that the io thread keeps the global mutex during the wait,
> that's why the async monitor is perfect for what I want - this is
> exactly what it does.

Let's not mix internal implementation details with what we want as
an external interface.

Can't you just make a vga_hw_screen_dump() specific callback?

> I haven't looked at QAPI async support, but I
> understand it's a bit in the future.

Yes, it's not for the immediate term.
Gerd Hoffmann Oct. 25, 2011, 7:16 a.m. UTC | #5
Hi,

> If all you need here is to delay sending the response, then maybe the
> current interface could work (although I honestly don't trust it and
> regret not having dropped it). Otherwise our only choice would be to
> work on getting the QAPI async support merged.

A delayed monitor response is all we need.

The command may take a bit to finish, where "a bit" is in the order of a
few seconds max, usually less than a second.  Blocking the monitor for
that amount of time is fine, but blocking the io thread introduces
insane long I/O latencies for the guest (and device emulation).

cheers,
  Gerd
Alon Levy Oct. 25, 2011, 10:13 a.m. UTC | #6
On Mon, Oct 24, 2011 at 10:31:48PM -0200, Luiz Capitulino wrote:
> On Mon, 24 Oct 2011 19:29:37 +0200
> Alon Levy <alevy@redhat.com> wrote:
> 
> > On Mon, Oct 24, 2011 at 01:45:16PM -0200, Luiz Capitulino wrote:
> > > On Mon, 24 Oct 2011 17:13:14 +0200
> > > Gerd Hoffmann <kraxel@redhat.com> wrote:
> > > 
> > > > On 10/24/11 14:02, Alon Levy wrote:
> > > > > Make screen_dump monitor command an async command to allow next for qxl
> > > > > to implement it as a initiating call to red_worker and completion on
> > > > > callback, to fix a deadlock when issueing a screendump command via
> > > > > libvirt while connected with a libvirt controlled spice-gtk client.
> > > > 
> > > > Approach looks reasonable to me.  Patch breaks the build though, you've
> > > > missed a bunch of screen_dump functions in non-x86 targets.
> > > 
> > > There are two problems actually.
> > > 
> > > The first one is that changing an existing command from synchronous
> > > to asynchronous is an incompatible change because asynchronous commands
> > > semantics is different. For an example of possible problems please
> > > check: https://bugzilla.redhat.com/show_bug.cgi?id=623903.
> > > 
> > > The second problem is that the existing asynchronous interface in the
> > > monitor is incomplete and has never been used for real. Our plan is to
> > > use QAPI's async support, but that has not landed in master yet and iirc
> > > there wasn't consensus about it. I also think it's a bit late for its
> > > inclusion in 1.0 (and certainly not a candidate for stable).
> > > 
> > > If all you need here is to delay sending the response, then maybe the
> > > current interface could work (although I honestly don't trust it and
> > > regret not having dropped it). Otherwise our only choice would be to
> > > work on getting the QAPI async support merged.
> > 
> > My problem is that the io thread keeps the global mutex during the wait,
> > that's why the async monitor is perfect for what I want - this is
> > exactly what it does.
> 
> Let's not mix internal implementation details with what we want as
> an external interface.
> 
> Can't you just make a vga_hw_screen_dump() specific callback?
> 

I don't understand how that would help - if the monitor command doesn't
return (normal sync operation) then the mutex is never dropped, and any
callback won't change that.

On the other hand, thinking a bit about the reference to 623903 baloon
bug, I don't see a problem: the change doesn't affect the semantics for
any other device except qxl, which I've tested. For any other device,
the only difference is that instead of:

do_screen_dump call
 device specific implementation
 return

It becomes

do_screen_dump call
 device specific implementation (not qxl)
  callback called (always - not conditional, no one stores it except
   qxl)
 return

> > I haven't looked at QAPI async support, but I
> > understand it's a bit in the future.
> 
> Yes, it's not for the immediate term.
Luiz Capitulino Oct. 25, 2011, 12:51 p.m. UTC | #7
On Tue, 25 Oct 2011 12:13:09 +0200
Alon Levy <alevy@redhat.com> wrote:

> On Mon, Oct 24, 2011 at 10:31:48PM -0200, Luiz Capitulino wrote:
> > On Mon, 24 Oct 2011 19:29:37 +0200
> > Alon Levy <alevy@redhat.com> wrote:
> > 
> > > On Mon, Oct 24, 2011 at 01:45:16PM -0200, Luiz Capitulino wrote:
> > > > On Mon, 24 Oct 2011 17:13:14 +0200
> > > > Gerd Hoffmann <kraxel@redhat.com> wrote:
> > > > 
> > > > > On 10/24/11 14:02, Alon Levy wrote:
> > > > > > Make screen_dump monitor command an async command to allow next for qxl
> > > > > > to implement it as a initiating call to red_worker and completion on
> > > > > > callback, to fix a deadlock when issueing a screendump command via
> > > > > > libvirt while connected with a libvirt controlled spice-gtk client.
> > > > > 
> > > > > Approach looks reasonable to me.  Patch breaks the build though, you've
> > > > > missed a bunch of screen_dump functions in non-x86 targets.
> > > > 
> > > > There are two problems actually.
> > > > 
> > > > The first one is that changing an existing command from synchronous
> > > > to asynchronous is an incompatible change because asynchronous commands
> > > > semantics is different. For an example of possible problems please
> > > > check: https://bugzilla.redhat.com/show_bug.cgi?id=623903.
> > > > 
> > > > The second problem is that the existing asynchronous interface in the
> > > > monitor is incomplete and has never been used for real. Our plan is to
> > > > use QAPI's async support, but that has not landed in master yet and iirc
> > > > there wasn't consensus about it. I also think it's a bit late for its
> > > > inclusion in 1.0 (and certainly not a candidate for stable).
> > > > 
> > > > If all you need here is to delay sending the response, then maybe the
> > > > current interface could work (although I honestly don't trust it and
> > > > regret not having dropped it). Otherwise our only choice would be to
> > > > work on getting the QAPI async support merged.
> > > 
> > > My problem is that the io thread keeps the global mutex during the wait,
> > > that's why the async monitor is perfect for what I want - this is
> > > exactly what it does.
> > 
> > Let's not mix internal implementation details with what we want as
> > an external interface.
> > 
> > Can't you just make a vga_hw_screen_dump() specific callback?
> > 
> 
> I don't understand how that would help - if the monitor command doesn't
> return (normal sync operation) then the mutex is never dropped, and any
> callback won't change that.

I'm trying to figure out a different solution.

Our primary motivation for making a QMP command asynchronous must be to
give clients the ability to keep issuing commands while "slow" commands
are running. If that's not what you want nor your primary motivation,
then you're probably taking the wrong approach.

If that is what you want, then you'll have to add a new command, because
changing from asynchronous to synchronous is an incompatible change _and_
you shouldn't use the current interface, because it's botched (actually,
I think I'm going to drop it right now as my last series fixed its only user).

Using a botched interface that doesn't do what's supposed to do but happens to
solve a bug as a side effect will very likely end in tears at some point in
the future.

Now, I did some research and found this description of the problem:

"""
In testing my patches for 'add_client' support with SPICE, I noticed
deadlock in the QEMU/SPICE code. It only happens if I close a SPICE
client and then immediately reconnect within about 1 second. If I
wait a couple of seconds before reconnecting the SPICE client I don't
see the deadlock.
"""
(http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg02599.html)

Is this accurate? Why does it _work_ after 1 second?

> On the other hand, thinking a bit about the reference to 623903 baloon
> bug, I don't see a problem: the change doesn't affect the semantics for
> any other device except qxl, which I've tested. For any other device,
> the only difference is that instead of:
> 
> do_screen_dump call
>  device specific implementation
>  return
> 
> It becomes
> 
> do_screen_dump call
>  device specific implementation (not qxl)
>   callback called (always - not conditional, no one stores it except
>    qxl)
>  return


> 
> > > I haven't looked at QAPI async support, but I
> > > understand it's a bit in the future.
> > 
> > Yes, it's not for the immediate term.
>
Alon Levy Oct. 25, 2011, 1:21 p.m. UTC | #8
On Tue, Oct 25, 2011 at 10:51:30AM -0200, Luiz Capitulino wrote:
> On Tue, 25 Oct 2011 12:13:09 +0200
> Alon Levy <alevy@redhat.com> wrote:
> 
> > On Mon, Oct 24, 2011 at 10:31:48PM -0200, Luiz Capitulino wrote:
> > > On Mon, 24 Oct 2011 19:29:37 +0200
> > > Alon Levy <alevy@redhat.com> wrote:
> > > 
> > > > On Mon, Oct 24, 2011 at 01:45:16PM -0200, Luiz Capitulino wrote:
> > > > > On Mon, 24 Oct 2011 17:13:14 +0200
> > > > > Gerd Hoffmann <kraxel@redhat.com> wrote:
> > > > > 
> > > > > > On 10/24/11 14:02, Alon Levy wrote:
> > > > > > > Make screen_dump monitor command an async command to allow next for qxl
> > > > > > > to implement it as a initiating call to red_worker and completion on
> > > > > > > callback, to fix a deadlock when issueing a screendump command via
> > > > > > > libvirt while connected with a libvirt controlled spice-gtk client.
> > > > > > 
> > > > > > Approach looks reasonable to me.  Patch breaks the build though, you've
> > > > > > missed a bunch of screen_dump functions in non-x86 targets.
> > > > > 
> > > > > There are two problems actually.
> > > > > 
> > > > > The first one is that changing an existing command from synchronous
> > > > > to asynchronous is an incompatible change because asynchronous commands
> > > > > semantics is different. For an example of possible problems please
> > > > > check: https://bugzilla.redhat.com/show_bug.cgi?id=623903.
> > > > > 
> > > > > The second problem is that the existing asynchronous interface in the
> > > > > monitor is incomplete and has never been used for real. Our plan is to
> > > > > use QAPI's async support, but that has not landed in master yet and iirc
> > > > > there wasn't consensus about it. I also think it's a bit late for its
> > > > > inclusion in 1.0 (and certainly not a candidate for stable).
> > > > > 
> > > > > If all you need here is to delay sending the response, then maybe the
> > > > > current interface could work (although I honestly don't trust it and
> > > > > regret not having dropped it). Otherwise our only choice would be to
> > > > > work on getting the QAPI async support merged.
> > > > 
> > > > My problem is that the io thread keeps the global mutex during the wait,
> > > > that's why the async monitor is perfect for what I want - this is
> > > > exactly what it does.
> > > 
> > > Let's not mix internal implementation details with what we want as
> > > an external interface.
> > > 
> > > Can't you just make a vga_hw_screen_dump() specific callback?
> > > 
> > 
> > I don't understand how that would help - if the monitor command doesn't
> > return (normal sync operation) then the mutex is never dropped, and any
> > callback won't change that.
> 
> I'm trying to figure out a different solution.
> 
> Our primary motivation for making a QMP command asynchronous must be to
> give clients the ability to keep issuing commands while "slow" commands
> are running. If that's not what you want nor your primary motivation,
> then you're probably taking the wrong approach.

That sounds right, and it was what I assumed the async monitor
implementation would do, boy was I surprised to discover it doesn't do
any such thing, but what it does do is return early, allow *other* io
related events (select returns) to be handled, and keeps the serialized
only-one-command-ongoing monitor usage.

> 
> If that is what you want, then you'll have to add a new command, because
> changing from asynchronous to synchronous is an incompatible change _and_
> you shouldn't use the current interface, because it's botched (actually,
> I think I'm going to drop it right now as my last series fixed its only user).
> 

This is not what I want. I understand of course that it is what one
would design an async monitor / api to allow (it was after all what I
thought async monitor support meant).

> Using a botched interface that doesn't do what's supposed to do but happens to
> solve a bug as a side effect will very likely end in tears at some point in
> the future.
> 

Right, but it's the opposite of the current case.

> Now, I did some research and found this description of the problem:
> 
> """
> In testing my patches for 'add_client' support with SPICE, I noticed
> deadlock in the QEMU/SPICE code. It only happens if I close a SPICE
> client and then immediately reconnect within about 1 second. If I
> wait a couple of seconds before reconnecting the SPICE client I don't
> see the deadlock.
> """
> (http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg02599.html)
> 
> Is this accurate? Why does it _work_ after 1 second?
>

This is an unrelated bug, I know I said different on the thread but now
rereading the callstack it is the channel_event locking workaround - and
it is fixed by a spice-server only patch (which stops calling
channel_event from the worker thread).

> > On the other hand, thinking a bit about the reference to 623903 baloon
> > bug, I don't see a problem: the change doesn't affect the semantics for
> > any other device except qxl, which I've tested. For any other device,
> > the only difference is that instead of:
> > 
> > do_screen_dump call
> >  device specific implementation
> >  return
> > 
> > It becomes
> > 
> > do_screen_dump call
> >  device specific implementation (not qxl)
> >   callback called (always - not conditional, no one stores it except
> >    qxl)
> >  return
> 
> 
> > 
> > > > I haven't looked at QAPI async support, but I
> > > > understand it's a bit in the future.
> > > 
> > > Yes, it's not for the immediate term.
> > 
>
Luiz Capitulino Oct. 25, 2011, 6:41 p.m. UTC | #9
On Tue, 25 Oct 2011 15:21:11 +0200
Alon Levy <alevy@redhat.com> wrote:

> On Tue, Oct 25, 2011 at 10:51:30AM -0200, Luiz Capitulino wrote:
> > On Tue, 25 Oct 2011 12:13:09 +0200
> > Alon Levy <alevy@redhat.com> wrote:
> > 
> > > On Mon, Oct 24, 2011 at 10:31:48PM -0200, Luiz Capitulino wrote:
> > > > On Mon, 24 Oct 2011 19:29:37 +0200
> > > > Alon Levy <alevy@redhat.com> wrote:
> > > > 
> > > > > On Mon, Oct 24, 2011 at 01:45:16PM -0200, Luiz Capitulino wrote:
> > > > > > On Mon, 24 Oct 2011 17:13:14 +0200
> > > > > > Gerd Hoffmann <kraxel@redhat.com> wrote:
> > > > > > 
> > > > > > > On 10/24/11 14:02, Alon Levy wrote:
> > > > > > > > Make screen_dump monitor command an async command to allow next for qxl
> > > > > > > > to implement it as a initiating call to red_worker and completion on
> > > > > > > > callback, to fix a deadlock when issueing a screendump command via
> > > > > > > > libvirt while connected with a libvirt controlled spice-gtk client.
> > > > > > > 
> > > > > > > Approach looks reasonable to me.  Patch breaks the build though, you've
> > > > > > > missed a bunch of screen_dump functions in non-x86 targets.
> > > > > > 
> > > > > > There are two problems actually.
> > > > > > 
> > > > > > The first one is that changing an existing command from synchronous
> > > > > > to asynchronous is an incompatible change because asynchronous commands
> > > > > > semantics is different. For an example of possible problems please
> > > > > > check: https://bugzilla.redhat.com/show_bug.cgi?id=623903.
> > > > > > 
> > > > > > The second problem is that the existing asynchronous interface in the
> > > > > > monitor is incomplete and has never been used for real. Our plan is to
> > > > > > use QAPI's async support, but that has not landed in master yet and iirc
> > > > > > there wasn't consensus about it. I also think it's a bit late for its
> > > > > > inclusion in 1.0 (and certainly not a candidate for stable).
> > > > > > 
> > > > > > If all you need here is to delay sending the response, then maybe the
> > > > > > current interface could work (although I honestly don't trust it and
> > > > > > regret not having dropped it). Otherwise our only choice would be to
> > > > > > work on getting the QAPI async support merged.
> > > > > 
> > > > > My problem is that the io thread keeps the global mutex during the wait,
> > > > > that's why the async monitor is perfect for what I want - this is
> > > > > exactly what it does.
> > > > 
> > > > Let's not mix internal implementation details with what we want as
> > > > an external interface.
> > > > 
> > > > Can't you just make a vga_hw_screen_dump() specific callback?
> > > > 
> > > 
> > > I don't understand how that would help - if the monitor command doesn't
> > > return (normal sync operation) then the mutex is never dropped, and any
> > > callback won't change that.
> > 
> > I'm trying to figure out a different solution.
> > 
> > Our primary motivation for making a QMP command asynchronous must be to
> > give clients the ability to keep issuing commands while "slow" commands
> > are running. If that's not what you want nor your primary motivation,
> > then you're probably taking the wrong approach.
> 
> That sounds right, and it was what I assumed the async monitor
> implementation would do, boy was I surprised to discover it doesn't do
> any such thing, but what it does do is return early, allow *other* io
> related events (select returns) to be handled, and keeps the serialized
> only-one-command-ongoing monitor usage.

Yes, if that turns out to be needed internally then we'll have to add
such functionality (but probably not as part of QMP's async support iiuc).

> > If that is what you want, then you'll have to add a new command, because
> > changing from asynchronous to synchronous is an incompatible change _and_
> > you shouldn't use the current interface, because it's botched (actually,
> > I think I'm going to drop it right now as my last series fixed its only user).
> > 
> 
> This is not what I want. I understand of course that it is what one
> would design an async monitor / api to allow (it was after all what I
> thought async monitor support meant).
> 
> > Using a botched interface that doesn't do what's supposed to do but happens to
> > solve a bug as a side effect will very likely end in tears at some point in
> > the future.
> > 
> 
> Right, but it's the opposite of the current case.
> 
> > Now, I did some research and found this description of the problem:
> > 
> > """
> > In testing my patches for 'add_client' support with SPICE, I noticed
> > deadlock in the QEMU/SPICE code. It only happens if I close a SPICE
> > client and then immediately reconnect within about 1 second. If I
> > wait a couple of seconds before reconnecting the SPICE client I don't
> > see the deadlock.
> > """
> > (http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg02599.html)
> > 
> > Is this accurate? Why does it _work_ after 1 second?
> >
> 
> This is an unrelated bug, I know I said different on the thread but now
> rereading the callstack it is the channel_event locking workaround - and
> it is fixed by a spice-server only patch (which stops calling
> channel_event from the worker thread).

So, can you try to describe the problem you're having to someone who is
not familiar with qlx and its locking?

> 
> > > On the other hand, thinking a bit about the reference to 623903 baloon
> > > bug, I don't see a problem: the change doesn't affect the semantics for
> > > any other device except qxl, which I've tested. For any other device,
> > > the only difference is that instead of:
> > > 
> > > do_screen_dump call
> > >  device specific implementation
> > >  return
> > > 
> > > It becomes
> > > 
> > > do_screen_dump call
> > >  device specific implementation (not qxl)
> > >   callback called (always - not conditional, no one stores it except
> > >    qxl)
> > >  return
> > 
> > 
> > > 
> > > > > I haven't looked at QAPI async support, but I
> > > > > understand it's a bit in the future.
> > > > 
> > > > Yes, it's not for the immediate term.
> > > 
> > 
>
Alon Levy Oct. 26, 2011, 11:48 a.m. UTC | #10
On Tue, Oct 25, 2011 at 04:41:18PM -0200, Luiz Capitulino wrote:
> On Tue, 25 Oct 2011 15:21:11 +0200
> Alon Levy <alevy@redhat.com> wrote:
> 
> > On Tue, Oct 25, 2011 at 10:51:30AM -0200, Luiz Capitulino wrote:
> > > On Tue, 25 Oct 2011 12:13:09 +0200
> > > Alon Levy <alevy@redhat.com> wrote:
> > > 
> > > > On Mon, Oct 24, 2011 at 10:31:48PM -0200, Luiz Capitulino wrote:
> > > > > On Mon, 24 Oct 2011 19:29:37 +0200
> > > > > Alon Levy <alevy@redhat.com> wrote:
> > > > > 
> > > > > > On Mon, Oct 24, 2011 at 01:45:16PM -0200, Luiz Capitulino wrote:
> > > > > > > On Mon, 24 Oct 2011 17:13:14 +0200
> > > > > > > Gerd Hoffmann <kraxel@redhat.com> wrote:
> > > > > > > 
> > > > > > > > On 10/24/11 14:02, Alon Levy wrote:
> > > > > > > > > Make screen_dump monitor command an async command to allow next for qxl
> > > > > > > > > to implement it as a initiating call to red_worker and completion on
> > > > > > > > > callback, to fix a deadlock when issueing a screendump command via
> > > > > > > > > libvirt while connected with a libvirt controlled spice-gtk client.
> > > > > > > > 
> > > > > > > > Approach looks reasonable to me.  Patch breaks the build though, you've
> > > > > > > > missed a bunch of screen_dump functions in non-x86 targets.
> > > > > > > 
> > > > > > > There are two problems actually.
> > > > > > > 
> > > > > > > The first one is that changing an existing command from synchronous
> > > > > > > to asynchronous is an incompatible change because asynchronous commands
> > > > > > > semantics is different. For an example of possible problems please
> > > > > > > check: https://bugzilla.redhat.com/show_bug.cgi?id=623903.
> > > > > > > 
> > > > > > > The second problem is that the existing asynchronous interface in the
> > > > > > > monitor is incomplete and has never been used for real. Our plan is to
> > > > > > > use QAPI's async support, but that has not landed in master yet and iirc
> > > > > > > there wasn't consensus about it. I also think it's a bit late for its
> > > > > > > inclusion in 1.0 (and certainly not a candidate for stable).
> > > > > > > 
> > > > > > > If all you need here is to delay sending the response, then maybe the
> > > > > > > current interface could work (although I honestly don't trust it and
> > > > > > > regret not having dropped it). Otherwise our only choice would be to
> > > > > > > work on getting the QAPI async support merged.
> > > > > > 
> > > > > > My problem is that the io thread keeps the global mutex during the wait,
> > > > > > that's why the async monitor is perfect for what I want - this is
> > > > > > exactly what it does.
> > > > > 
> > > > > Let's not mix internal implementation details with what we want as
> > > > > an external interface.
> > > > > 
> > > > > Can't you just make a vga_hw_screen_dump() specific callback?
> > > > > 
> > > > 
> > > > I don't understand how that would help - if the monitor command doesn't
> > > > return (normal sync operation) then the mutex is never dropped, and any
> > > > callback won't change that.
> > > 
> > > I'm trying to figure out a different solution.
> > > 
> > > Our primary motivation for making a QMP command asynchronous must be to
> > > give clients the ability to keep issuing commands while "slow" commands
> > > are running. If that's not what you want nor your primary motivation,
> > > then you're probably taking the wrong approach.
> > 
> > That sounds right, and it was what I assumed the async monitor
> > implementation would do, boy was I surprised to discover it doesn't do
> > any such thing, but what it does do is return early, allow *other* io
> > related events (select returns) to be handled, and keeps the serialized
> > only-one-command-ongoing monitor usage.
> 
> Yes, if that turns out to be needed internally then we'll have to add
> such functionality (but probably not as part of QMP's async support iiuc).
> 
> > > If that is what you want, then you'll have to add a new command, because
> > > changing from asynchronous to synchronous is an incompatible change _and_
> > > you shouldn't use the current interface, because it's botched (actually,
> > > I think I'm going to drop it right now as my last series fixed its only user).
> > > 
> > 
> > This is not what I want. I understand of course that it is what one
> > would design an async monitor / api to allow (it was after all what I
> > thought async monitor support meant).
> > 
> > > Using a botched interface that doesn't do what's supposed to do but happens to
> > > solve a bug as a side effect will very likely end in tears at some point in
> > > the future.
> > > 
> > 
> > Right, but it's the opposite of the current case.
> > 
> > > Now, I did some research and found this description of the problem:
> > > 
> > > """
> > > In testing my patches for 'add_client' support with SPICE, I noticed
> > > deadlock in the QEMU/SPICE code. It only happens if I close a SPICE
> > > client and then immediately reconnect within about 1 second. If I
> > > wait a couple of seconds before reconnecting the SPICE client I don't
> > > see the deadlock.
> > > """
> > > (http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg02599.html)
> > > 
> > > Is this accurate? Why does it _work_ after 1 second?
> > >
> > 
> > This is an unrelated bug, I know I said different on the thread but now
> > rereading the callstack it is the channel_event locking workaround - and
> > it is fixed by a spice-server only patch (which stops calling
> > channel_event from the worker thread).
> 
> So, can you try to describe the problem you're having to someone who is
> not familiar with qlx and its locking?
> 

I'm suddenly not sure this patchset even fixes the problem that I said
it does, so I'll update once I made sure what it actually fixes.

> > 
> > > > On the other hand, thinking a bit about the reference to 623903 baloon
> > > > bug, I don't see a problem: the change doesn't affect the semantics for
> > > > any other device except qxl, which I've tested. For any other device,
> > > > the only difference is that instead of:
> > > > 
> > > > do_screen_dump call
> > > >  device specific implementation
> > > >  return
> > > > 
> > > > It becomes
> > > > 
> > > > do_screen_dump call
> > > >  device specific implementation (not qxl)
> > > >   callback called (always - not conditional, no one stores it except
> > > >    qxl)
> > > >  return
> > > 
> > > 
> > > > 
> > > > > > I haven't looked at QAPI async support, but I
> > > > > > understand it's a bit in the future.
> > > > > 
> > > > > Yes, it's not for the immediate term.
> > > > 
> > > 
> > 
>
diff mbox

Patch

diff --git a/console.c b/console.c
index e43de92..30c667a 100644
--- a/console.c
+++ b/console.c
@@ -173,7 +173,8 @@  void vga_hw_invalidate(void)
         active_console->hw_invalidate(active_console->hw);
 }
 
-void vga_hw_screen_dump(const char *filename)
+void vga_hw_screen_dump(const char *filename, MonitorCompletion cb,
+                        void *opaque)
 {
     TextConsole *previous_active_console;
 
@@ -183,7 +184,7 @@  void vga_hw_screen_dump(const char *filename)
        so always dump the first one.  */
     console_select(0);
     if (consoles[0] && consoles[0]->hw_screen_dump) {
-        consoles[0]->hw_screen_dump(consoles[0]->hw, filename);
+        consoles[0]->hw_screen_dump(consoles[0]->hw, filename, cb, opaque);
     }
 
     console_select(previous_active_console->index);
diff --git a/console.h b/console.h
index 9c1487e..dde4088 100644
--- a/console.h
+++ b/console.h
@@ -343,7 +343,9 @@  static inline void console_write_ch(console_ch_t *dest, uint32_t ch)
 
 typedef void (*vga_hw_update_ptr)(void *);
 typedef void (*vga_hw_invalidate_ptr)(void *);
-typedef void (*vga_hw_screen_dump_ptr)(void *, const char *);
+typedef void (*vga_hw_screen_dump_ptr)(void *, const char *,
+                                       MonitorCompletion *,
+                                       void *);
 typedef void (*vga_hw_text_update_ptr)(void *, console_ch_t *);
 
 DisplayState *graphic_console_init(vga_hw_update_ptr update,
@@ -354,7 +356,8 @@  DisplayState *graphic_console_init(vga_hw_update_ptr update,
 
 void vga_hw_update(void);
 void vga_hw_invalidate(void);
-void vga_hw_screen_dump(const char *filename);
+void vga_hw_screen_dump(const char *filename, MonitorCompletion cb,
+                        void *opaque);
 void vga_hw_text_update(console_ch_t *chardata);
 
 int is_graphic_console(void);
diff --git a/hmp-commands.hx b/hmp-commands.hx
index ab08d58..a7b3494 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -157,7 +157,8 @@  ETEXI
         .params     = "filename",
         .help       = "save screen into PPM image 'filename'",
         .user_print = monitor_user_noop,
-        .mhandler.cmd_new = do_screen_dump,
+        .mhandler.cmd_async = do_screen_dump,
+        .flags      = MONITOR_CMD_ASYNC,
     },
 
 STEXI
diff --git a/hw/g364fb.c b/hw/g364fb.c
index f00ee27..5884e20 100644
--- a/hw/g364fb.c
+++ b/hw/g364fb.c
@@ -291,7 +291,8 @@  static void g364fb_reset(G364State *s)
     g364fb_invalidate_display(s);
 }
 
-static void g364fb_screen_dump(void *opaque, const char *filename)
+static void g364fb_screen_dump(void *opaque, const char *filename,
+                               MonitorCompletion *cb, void *cb_opaque)
 {
     G364State *s = opaque;
     int y, x;
@@ -303,12 +304,13 @@  static void g364fb_screen_dump(void *opaque, const char *filename)
 
     if (s->depth != 8) {
         error_report("g364: unknown guest depth %d", s->depth);
-        return;
+        goto end;
     }
 
     f = fopen(filename, "wb");
-    if (!f)
-        return;
+    if (!f) {
+        goto end;
+    }
 
     if (s->ctla & CTLA_FORCE_BLANK) {
         /* blank screen */
@@ -331,6 +333,8 @@  static void g364fb_screen_dump(void *opaque, const char *filename)
     }
 
     fclose(f);
+end:
+    cb(cb_opaque, NULL);
 }
 
 /* called for accesses to io ports */
diff --git a/hw/qxl.c b/hw/qxl.c
index 03848ed..4003e53 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -1423,7 +1423,8 @@  static void qxl_hw_invalidate(void *opaque)
     vga->invalidate(vga);
 }
 
-static void qxl_hw_screen_dump(void *opaque, const char *filename)
+static void qxl_hw_screen_dump(void *opaque, const char *filename,
+                               MonitorCompletion *cb, void *cb_opaque)
 {
     PCIQXLDevice *qxl = opaque;
     VGACommonState *vga = &qxl->vga;
@@ -1433,9 +1434,10 @@  static void qxl_hw_screen_dump(void *opaque, const char *filename)
     case QXL_MODE_NATIVE:
         qxl_render_update(qxl);
         ppm_save(filename, qxl->ssd.ds->surface);
+        cb(cb_opaque, NULL);
         break;
     case QXL_MODE_VGA:
-        vga->screen_dump(vga, filename);
+        vga->screen_dump(vga, filename, cb, cb_opaque);
         break;
     default:
         break;
diff --git a/hw/vga.c b/hw/vga.c
index ca79aa1..b257f74 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -148,7 +148,8 @@  static uint32_t expand4[256];
 static uint16_t expand2[256];
 static uint8_t expand4to8[16];
 
-static void vga_screen_dump(void *opaque, const char *filename);
+static void vga_screen_dump(void *opaque, const char *filename,
+                            MonitorCompletion *cb, void *cb_opaque);
 static const char *screen_dump_filename;
 static DisplayChangeListener *screen_dump_dcl;
 
@@ -2404,7 +2405,8 @@  static DisplayChangeListener* vga_screen_dump_init(DisplayState *ds)
 
 /* save the vga display in a PPM image even if no display is
    available */
-static void vga_screen_dump(void *opaque, const char *filename)
+static void vga_screen_dump(void *opaque, const char *filename,
+                            MonitorCompletion *cb, void *cb_opaque)
 {
     VGACommonState *s = opaque;
 
@@ -2415,4 +2417,5 @@  static void vga_screen_dump(void *opaque, const char *filename)
     vga_invalidate_display(s);
     vga_hw_update();
     screen_dump_filename = NULL;
+    cb(cb_opaque, NULL);
 }
diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index af70bde..a238fef 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -1003,11 +1003,12 @@  static void vmsvga_invalidate_display(void *opaque)
 
 /* save the vga display in a PPM image even if no display is
    available */
-static void vmsvga_screen_dump(void *opaque, const char *filename)
+static void vmsvga_screen_dump(void *opaque, const char *filename,
+                               MonitorCompletion *cb, void *cb_opaque)
 {
     struct vmsvga_state_s *s = opaque;
     if (!s->enable) {
-        s->vga.screen_dump(&s->vga, filename);
+        s->vga.screen_dump(&s->vga, filename, cb, cb_opaque);
         return;
     }
 
diff --git a/monitor.c b/monitor.c
index ffda0fe..daf5e2f 100644
--- a/monitor.c
+++ b/monitor.c
@@ -1180,9 +1180,10 @@  static int client_migrate_info(Monitor *mon, const QDict *qdict, QObject **ret_d
     return -1;
 }
 
-static int do_screen_dump(Monitor *mon, const QDict *qdict, QObject **ret_data)
+static int do_screen_dump(Monitor *mon, const QDict *params,
+                          MonitorCompletion cb, void *opaque)
 {
-    vga_hw_screen_dump(qdict_get_str(qdict, "filename"));
+    vga_hw_screen_dump(qdict_get_str(params, "filename"), cb, opaque);
     return 0;
 }
 
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 9c11e87..15b04d6 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -155,7 +155,8 @@  EQMP
         .params     = "filename",
         .help       = "save screen into PPM image 'filename'",
         .user_print = monitor_user_noop,
-        .mhandler.cmd_new = do_screen_dump,
+        .mhandler.cmd_async = do_screen_dump,
+        .flags      = MONITOR_CMD_ASYNC,
     },
 
 SQMP