Message ID | CAEgOgz5r5FFr6Wxu08hK+mKr3-ZohEAWSau=u8cPt=W9iZFFRw@mail.gmail.com |
---|---|
State | New |
Headers | show |
> --- > Is it expected that this non-blocking condition implies lockup of the > iothread? No. The idea was to make the loop cheaper when you had a qemu_notify_event() or bottom half, basically something that causes main_loop_wait() to wake up immediately multiple times. When that happens, it is cheaper to avoid releasing and taking the mutex. Can you check why main_loop_wait() is returning a non-zero value without making any progress? Paolo > Diving deeper again, I notice that this non-blocking feature isn't > even enabled at all for KVM. Which would probably mean that this bug > is not replicable by anyone testing with KVM. We could just make all > the CPU backends consistent with KVM by removing the nonblocking > altogether. Any comments from TCG people :) ? > > --- a/vl.c > +++ b/vl.c > @@ -2030,17 +2030,15 @@ static bool main_loop_should_exit(void) > > static void main_loop(void) > { > - bool nonblocking; > int last_io = 0; > #ifdef CONFIG_PROFILER > int64_t ti; > #endif > do { > - nonblocking = !kvm_enabled() && last_io > 0; > #ifdef CONFIG_PROFILER > ti = profile_getclock(); > #endif > - last_io = main_loop_wait(nonblocking); > + last_io = main_loop_wait(0); > #ifdef CONFIG_PROFILER > dev_time += profile_getclock() - ti; > #endif > > Not the mux's: if mux_chr_can_read() > > returns zero, the prepare function returns FALSE without touching the > > timeout at all... > > > > static gboolean io_watch_poll_prepare(GSource *source, gint *timeout_) > > { > > IOWatchPoll *iwp = io_watch_poll_from_source(source); > > > > iwp->max_size = iwp->fd_can_read(iwp->opaque); > > if (iwp->max_size == 0) { > > return FALSE; > > } > > > > return g_io_watch_funcs.prepare(source, timeout_); > > } > > > >> - Timeout means no unlock of IOthread. Device land never sees any more > >> cycles so the serial port never progresses - no flushing of > >> buffer > > > > Still, this is plausible, so the patch looks correct. > > > > Ok, > > Ill give it some more time and fix commit message. if we don't figure > out a better patch. > > Regards, > Peter > > > Paolo > > > >> - Deadlock > >> > >> Tested on petalogix_ml605 microblazeel machine model, which was faulty > >> due to 1154328. > >> > >> Fix by removing the conditions on unlocking the iothread. Don't know > >> what else this will break but the timeout is certainly the wrong > >> condition for the unlock. Probably the real solution is to have a more > >> selective unlock policy. > >> > >> I'm happy for someone to take this patch off my hands, or educate me on > >> the correct implementation. For the peeps doing automated testing on > >> nographic platforms this will get your build working again. > >> > >> Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com> > >> --- > >> main-loop.c | 8 ++------ > >> 1 files changed, 2 insertions(+), 6 deletions(-) > >> > >> diff --git a/main-loop.c b/main-loop.c > >> index eb80ff3..a376898 100644 > >> --- a/main-loop.c > >> +++ b/main-loop.c > >> @@ -194,15 +194,11 @@ static int os_host_main_loop_wait(uint32_t timeout) > >> > >> glib_pollfds_fill(&timeout); > >> > >> - if (timeout > 0) { > >> - qemu_mutex_unlock_iothread(); > >> - } > >> + qemu_mutex_unlock_iothread(); > >> > >> ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout); > >> > >> - if (timeout > 0) { > >> - qemu_mutex_lock_iothread(); > >> - } > >> + qemu_mutex_lock_iothread(); > >> > >> glib_pollfds_poll(); > >> return ret; > >> > > > > >
Hi Paolo, On Wed, Apr 3, 2013 at 4:35 PM, Paolo Bonzini <pbonzini@redhat.com> wrote: > >> --- >> Is it expected that this non-blocking condition implies lockup of the >> iothread? > > No. The idea was to make the loop cheaper when you had a qemu_notify_event() > or bottom half, basically something that causes main_loop_wait() to wake up > immediately multiple times. When that happens, it is cheaper to avoid > releasing and taking the mutex. > > Can you check why main_loop_wait() is returning a non-zero value without > making any progress? > > Paolo > My best theory is that stdin thinks it is progressing when it is not. Heres some gdb to back it up. I hung the system with the 40+paste and dumped the state of the polled fds (from os_main_loop_wait()) at that time. 4th array element (stdin) has a revent which makes sense, as stdin is buffering my extra chars so a poll is going to return with that descriptor as readable. g_poll returns 1 accordingly and this is returned, propagating back up to vl.c as a non-zero return code giving the illusion of progress. I think there may be a flaw in that "any of the descriptors being pollable" is not a good definition of progress. stdin is blocked by the fact that the device and mux cannot accept their data anymore so even though its readable, no meaningful read will happen. That leaves us with having to devise more elaborate code to define progress, or we simplify by just removing this nonblocking optimisation altogether (original patch). Regards, Peter Program received signal SIGTRAP, Trace/breakpoint trap. 0x0000000000721353 in poll () (gdb) jhfdg;fdhg;dfgkdfg;dfjhg;dsfjhgds;fjgsd;fjhdjhdfg^C(gdb) (gdb) bt #0 0x0000000000721353 in poll () #1 0x0000000000492ec3 in os_host_main_loop_wait (timeout=0) at /home/peterc/Petalogix/Internal/plgx_src/qemu-arm-microblazeel/main-loop.c:201 #2 0x0000000000492f43 in main_loop_wait (nonblocking=1) at /home/peterc/Petalogix/Internal/plgx_src/qemu-arm-microblazeel/main-loop.c:443 #3 0x00000000005032c5 in main_loop () at /home/peterc/Petalogix/Internal/plgx_src/qemu-arm-microblazeel/vl.c:2043 #4 0x0000000000509783 in main (argc=20, argv=0x7fffffffcd68, envp=0x7fffffffce10) at /home/peterc/Petalogix/Internal/plgx_src/qemu-arm-microblazeel/vl.c:4429 (gdb) up 1 #1 0x0000000000492ec3 in os_host_main_loop_wait (timeout=0) at /home/peterc/Petalogix/Internal/plgx_src/qemu-arm-microblazeel/main-loop.c:201 201 ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout); (gdb) step Single stepping until exit from function poll, which has no line number information. os_host_main_loop_wait (timeout=0) at /home/peterc/Petalogix/Internal/plgx_src/qemu-arm-microblazeel/main-loop.c:203 203 if (timeout > 0) { (gdb) p/x gpollfds->len $1 = 0x6 (gdb) p/x ((GPollFD *)gpollfds->data)[0] $2 = {fd = 0x9, events = 0x19, revents = 0x0} (gdb) p/x ((GPollFD *)gpollfds->data)[1] $3 = {fd = 0x5, events = 0x19, revents = 0x0} (gdb) p/x ((GPollFD *)gpollfds->data)[2] $4 = {fd = 0x7, events = 0x1, revents = 0x0} (gdb) p/x ((GPollFD *)gpollfds->data)[3] $5 = {fd = 0xa, events = 0x1, revents = 0x0} (gdb) p/x ((GPollFD *)gpollfds->data)[4] $6 = {fd = 0x0, events = 0x1, revents = 0x1} (gdb) p/x ((GPollFD *)gpollfds->data)[5] $7 = {fd = 0x6, events = 0x1, revents = 0x0}
Il 04/04/2013 01:58, Peter Crosthwaite ha scritto: > > I think there may be a flaw in that "any of the descriptors being > pollable" is not a good definition of progress. stdin is blocked by > the fact that the device and mux cannot accept their data anymore so > even though its readable, no meaningful read will happen. That leaves > us with having to devise more elaborate code to define progress, or we > simplify by just removing this nonblocking optimisation altogether > (original patch). If stdin is blocked, it shouldn't be polled at all. That is the purpose of the can_read callback. Unfortunately, return FALSE from the prepare callback still leaves the poll handler. So your original patch fixes the symptom, but leaves the busy waiting unfixed. The right thing to use would be g_source_add_child_source() and g_source_remove_child_source(), but that is only present since glib 2.28 and we currently require 2.12 (2.20 on Windows). Anthony, Amit, can you look at it? Paolo
Paolo Bonzini <pbonzini@redhat.com> writes: > Il 04/04/2013 01:58, Peter Crosthwaite ha scritto: >> >> I think there may be a flaw in that "any of the descriptors being >> pollable" is not a good definition of progress. stdin is blocked by >> the fact that the device and mux cannot accept their data anymore so >> even though its readable, no meaningful read will happen. That leaves >> us with having to devise more elaborate code to define progress, or we >> simplify by just removing this nonblocking optimisation altogether >> (original patch). > > If stdin is blocked, it shouldn't be polled at all. That is the purpose > of the can_read callback. Unfortunately, return FALSE from the prepare > callback still leaves the poll handler. > > So your original patch fixes the symptom, but leaves the busy waiting > unfixed. > > The right thing to use would be g_source_add_child_source() and > g_source_remove_child_source(), but that is only present since glib 2.28 > and we currently require 2.12 (2.20 on Windows). > > Anthony, Amit, can you look at it? Ack. May not be until tomorrow but I'll try to dig in here. Regards, Anthony Liguori > > Paolo
Paolo Bonzini <pbonzini@redhat.com> writes: > Il 04/04/2013 01:58, Peter Crosthwaite ha scritto: >> >> I think there may be a flaw in that "any of the descriptors being >> pollable" is not a good definition of progress. stdin is blocked by >> the fact that the device and mux cannot accept their data anymore so >> even though its readable, no meaningful read will happen. That leaves >> us with having to devise more elaborate code to define progress, or we >> simplify by just removing this nonblocking optimisation altogether >> (original patch). > > If stdin is blocked, it shouldn't be polled at all. That is the purpose > of the can_read callback. Unfortunately, return FALSE from the prepare > callback still leaves the poll handler. > > So your original patch fixes the symptom, but leaves the busy waiting > unfixed. > > The right thing to use would be g_source_add_child_source() and > g_source_remove_child_source(), but that is only present since glib 2.28 > and we currently require 2.12 (2.20 on Windows). I don't think a child source fixes the problem. The backend definitely has work to do. What we don't know is whether the front end is capable of processing the work. The problem here is that we use polling on the front-end. IOW: 1) Char backend has data and is ready to write. 2) Asks front end if it can write 3) Front end says no 4) Goto (1) It does this without dropping the BQL which means that TCG never gets to run. Ideally, we would do: 1) Char backend has data and is ready to write. 2) Asks front end if it can write 3) Backend stops asking front end until front end says it can receive data again 4) Drop BQL because we have no more work to do. But this requires eliminating all users of can_read() which isn't going to happen any time soon. The solution therefore needs to be: 1) Char backend has data and is ready to write. 2) Asks front end if it can write 3) Front end says no 3.5) Drop the BQL and give the VCPU a chance to run 4) Goto (1) Which is essentially what Peter's patch does. In fact, you can imagine other scenarios where a very busy VNC client essentially blocks the BQL. We need to regularly give up the BQL in order to prevent starvation regardless of the polling in the char layer. So I think this is a long way of saying: Reviewed-by: Anthony Liguori <aliguori@us.ibm.com> Regards, Anthony Liguori > > Anthony, Amit, can you look at it? > > Paolo
On 4 April 2013 17:59, Anthony Liguori <aliguori@us.ibm.com> wrote: > So I think this is a long way of saying: > > Reviewed-by: Anthony Liguori <aliguori@us.ibm.com> Any chance we could update the commit message to include this more authoritative analysis? thanks -- PMM
Peter Maydell <peter.maydell@linaro.org> writes: > On 4 April 2013 17:59, Anthony Liguori <aliguori@us.ibm.com> wrote: >> So I think this is a long way of saying: >> >> Reviewed-by: Anthony Liguori <aliguori@us.ibm.com> > > Any chance we could update the commit message to include > this more authoritative analysis? Yes, please do. I'm also not sure that just always dropping the lock is the best strategy either. I think a simple counter with a nice comment that explains why we need to periodically drop the lock is a better solution as it gives a way to experiment with the right value to ensure fairness between the threads without having excessing lock acquisition/release. Regards, Anthony Liguori > > thanks > -- PMM
Il 04/04/2013 18:59, Anthony Liguori ha scritto: >> > >> > The right thing to use would be g_source_add_child_source() and >> > g_source_remove_child_source(), but that is only present since glib 2.28 >> > and we currently require 2.12 (2.20 on Windows). > I don't think a child source fixes the problem. The backend definitely > has work to do. What we don't know is whether the front end is capable > of processing the work. > > The problem here is that we use polling on the front-end. IOW: > > 1) Char backend has data and is ready to write. > 2) Asks front end if it can write > 3) Front end says no > 4) Goto (1) What we used to do is this, however: 1) Char backend asks front end if it can write 2) Front end says no 3) poll() goes on without char backend's descriptor 4) Goto (1), then: 5) Front end is now available, calls qemu_notify_event() 6) Char backend asks front end if it can write 7) Front end says yes No busy waiting here. Though we don't really do (5) everywhere, because for example this patch slipped through the cracks: http://lists.gnu.org/archive/html/qemu-devel/2012-03/msg03208.html (but for the monitor and every consumer running in iothread context it's ok). You could achieve the same thing by making the io watch a child source of the QEMU wrapper. All the wrapper does is add/remove the io watch source from its children, depending on can_read. Perhaps it's possible without child sources though, by setting the callbacks on the glib source directly. Paolo
Paolo Bonzini <pbonzini@redhat.com> writes: > Il 04/04/2013 18:59, Anthony Liguori ha scritto: >>> > >>> > The right thing to use would be g_source_add_child_source() and >>> > g_source_remove_child_source(), but that is only present since glib 2.28 >>> > and we currently require 2.12 (2.20 on Windows). >> I don't think a child source fixes the problem. The backend definitely >> has work to do. What we don't know is whether the front end is capable >> of processing the work. >> >> The problem here is that we use polling on the front-end. IOW: >> >> 1) Char backend has data and is ready to write. >> 2) Asks front end if it can write >> 3) Front end says no >> 4) Goto (1) > > What we used to do is this, however: > > 1) Char backend asks front end if it can write > 2) Front end says no You're missing a step here. Previously we would drop the data silently. > 3) poll() goes on without char backend's descriptor Which is what enabled this. > 4) Goto (1), then: > 5) Front end is now available, calls qemu_notify_event() This still exists FWIW. > 6) Char backend asks front end if it can write > 7) Front end says yes > > No busy waiting here. Though we don't really do (5) everywhere, because > for example this patch slipped through the cracks: Ack. > http://lists.gnu.org/archive/html/qemu-devel/2012-03/msg03208.html (but > for the monitor and every consumer running in iothread context it's ok). > > You could achieve the same thing by making the io watch a child source > of the QEMU wrapper. All the wrapper does is add/remove the io watch > source from its children, depending on can_read. I don't understand how this solves the problem. There aren't two sources. There is only a single source (the GIOChannel source). We override the function pointers to basically do monkey patching of the GSource. If I understand your suggestion, it's to add the source as a child of itself. > > Perhaps it's possible without child sources though, by setting the > callbacks on the glib source directly. That's what we're doing... src = g_io_create_watch(channel, G_IO_IN | G_IO_ERR | G_IO_HUP); g_source_set_funcs(src, &io_watch_poll_funcs); g_source_set_callback(src, (GSourceFunc)fd_read, user_data, NULL); There is only one GSource active here. Regards, Anthony Liguori > Paolo
--- a/main-loop.c +++ b/main-loop.c @@ -428,10 +428,6 @@ int main_loop_wait(int nonblocking) int ret; uint32_t timeout = UINT32_MAX; - if (nonblocking) { - timeout = 0; - } - /* poll any events */ g_array_set_size(gpollfds, 0); /* reset for new iteration */ /* XXX: separate device handlers from system ones */ --- Is it expected that this non-blocking condition implies lockup of the iothread? Diving deeper again, I notice that this non-blocking feature isn't even enabled at all for KVM. Which would probably mean that this bug is not replicable by anyone testing with KVM. We could just make all the CPU backends consistent with KVM by removing the nonblocking altogether. Any comments from TCG people :) ? --- a/vl.c +++ b/vl.c @@ -2030,17 +2030,15 @@ static bool main_loop_should_exit(void) static void main_loop(void) { - bool nonblocking; int last_io = 0; #ifdef CONFIG_PROFILER int64_t ti; #endif do { - nonblocking = !kvm_enabled() && last_io > 0; #ifdef CONFIG_PROFILER ti = profile_getclock(); #endif - last_io = main_loop_wait(nonblocking); + last_io = main_loop_wait(0); #ifdef CONFIG_PROFILER dev_time += profile_getclock() - ti; #endif