Message ID: 4BE01120.30608@redhat.com
State: New
hi kevin,

you did it *g*

looks promising. applied this patch and was not able to reproduce yet :-)

a reliable way to reproduce was to shut down all multipath paths, then initiate i/o in the vm (e.g. start an application). of course, everything hangs at this point.

after reenabling one path, the vm used to crash. now it seems to behave correctly and just reports a DMA timeout and continues normally afterwards.

can you imagine any way of preventing the vm from consuming 100% cpu in that waiting state? my current approach is to run all vms with nice 1, which helped to keep the machine responsive if all vms (in my test case 64 on a box) have hanging i/o at the same time.

br,
peter

Kevin Wolf wrote:
> Am 04.05.2010 13:38, schrieb Peter Lieven:
>
>> hi kevin,
>>
>> i set a breakpoint at bmdma_active_if. the first 2 breaks were hit
>> when the last path in the multipath failed, but the assertion did not
>> fire. when i kicked one path back in, the breakpoint was reached
>> again, this time leading to an assert.
>> the stacktrace is from the point shortly before.
>>
>> hope this helps.
>
> Hm, looks like there's something wrong with cancelling requests -
> bdrv_aio_cancel might decide that it completes a request (and
> consequently calls the callback for it) whereas the IDE emulation
> decides that it's done with the request before calling bdrv_aio_cancel.
>
> I haven't looked in much detail what this could break, but does
> something like this help?
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 0757528..3cd55e3 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -2838,10 +2838,6 @@ static void ide_dma_restart(IDEState *s, int is_read)
>  void ide_dma_cancel(BMDMAState *bm)
>  {
>      if (bm->status & BM_STATUS_DMAING) {
> -        bm->status &= ~BM_STATUS_DMAING;
> -        /* cancel DMA request */
> -        bm->unit = -1;
> -        bm->dma_cb = NULL;
>          if (bm->aiocb) {
>  #ifdef DEBUG_AIO
>              printf("aio_cancel\n");
> @@ -2849,6 +2845,10 @@ void ide_dma_cancel(BMDMAState *bm)
>              bdrv_aio_cancel(bm->aiocb);
>              bm->aiocb = NULL;
>          }
> +        bm->status &= ~BM_STATUS_DMAING;
> +        /* cancel DMA request */
> +        bm->unit = -1;
> +        bm->dma_cb = NULL;
>      }
>  }
>
> Kevin
Am 04.05.2010 15:42, schrieb Peter Lieven:
> hi kevin,
>
> you did it *g*
>
> looks promising. applied this patch and was not able to reproduce yet :-)
>
> a reliable way to reproduce was to shut down all multipath paths, then
> initiate i/o in the vm (e.g. start an application). of course,
> everything hangs at this point.
>
> after reenabling one path, the vm used to crash. now it seems to behave
> correctly and just reports a DMA timeout and continues normally
> afterwards.

Great, I'm going to submit it as a proper patch then.

Christoph, by now I'm pretty sure it's right, but can you have another look at whether this is correct, anyway?

> can you imagine any way of preventing the vm from consuming 100% cpu in
> that waiting state?
> my current approach is to run all vms with nice 1, which helped to keep
> the machine responsive if all vms (in my test case 64 on a box) have
> hanging i/o at the same time.

I don't have anything particular in mind, but you could just attach gdb and get another backtrace while it consumes 100% CPU (you'll need to use "thread apply all bt" to catch everything). Then we should see where it's hanging.

Kevin
On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote:
> Great, I'm going to submit it as a proper patch then.
>
> Christoph, by now I'm pretty sure it's right, but can you have another
> look if this is correct, anyway?

It looks correct to me - we really shouldn't update the fields until bdrv_aio_cancel has returned. In fact, more often than not we cannot cancel a request, so there's a fairly high chance it will complete.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Hi Kevin,

On 04.05.2010 14:20, Kevin Wolf wrote:
> Am 04.05.2010 13:38, schrieb Peter Lieven:
>> hi kevin,
>>
>> i set a breakpoint at bmdma_active_if. the first 2 breaks were hit
>> when the last path in the multipath failed, but the assertion did not
>> fire. when i kicked one path back in, the breakpoint was reached
>> again, this time leading to an assert.
>> the stacktrace is from the point shortly before.
>>
>> hope this helps.
>
> Hm, looks like there's something wrong with cancelling requests -
> bdrv_aio_cancel might decide that it completes a request (and
> consequently calls the callback for it) whereas the IDE emulation
> decides that it's done with the request before calling bdrv_aio_cancel.
>
> I haven't looked in much detail what this could break, but does
> something like this help?

Your attached patch fixes the problem I had as well. I ran 3 consecutive tests tonight, all of which finished without crashing the VM.

I reported my "assertion failed" error on March 14th while doing disk performance tests using iozone in an Ubuntu 9.10 VM with qemu-kvm 0.12.3.

Thank you very much.

André
Hi Kevin,

here we go. I created a blocking multipath device (interrupted all paths). qemu-kvm hangs with 100% cpu. also the monitor is not responding.

If I restore at least one path, the vm is continuing.

BR,
Peter

^C
Program received signal SIGINT, Interrupt.
0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
#6  0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
#8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
(gdb) bt full
#0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
No symbol table info available.
#1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
No symbol table info available.
#3  0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
No locals.
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
No locals.
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
        ioh = (IOHandlerRecord *) 0x0
        rfds = {fds_bits = {1048576, 0 <repeats 15 times>}}
        wfds = {fds_bits = {0 <repeats 16 times>}}
        xfds = {fds_bits = {0 <repeats 16 times>}}
        ret = 1
        nfds = 21
        tv = {tv_sec = 0, tv_usec = 999761}
#6  0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
        fds = {18, 19}
        mask = {__val = {268443712, 0 <repeats 15 times>}}
        sigfd = 20
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
        r = 0
#8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
        gdbstub_dev = 0x0
        boot_devices_bitmap = 12
        i = 0
        snapshot = 0
        linux_boot = 0
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = 0x588fac ""
        boot_devices = "dc", '\0' <repeats 30 times>
        ds = (DisplayState *) 0x198bf00
        dcl = (DisplayChangeListener *) 0x0
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        hda_opts = (QemuOpts *) 0x0
        opts = (QemuOpts *) 0x1957390
        optind = 30
        r = 0x7fff266a8a23 "-usbdevice"
        optarg = 0x7fff266a8a2e "tablet"
        loadvm = 0x0
        machine = (QEMUMachine *) 0x861720
        cpu_model = 0x7fff266a8917 "qemu64,model_id=Intel(R) Xeon(R) CPU", ' ' <repeats 11 times>, "E5520 @ 2.27GHz"
        fds = {644511720, 32767}
        tb_size = 0
        pid_file = 0x7fff266a89bb "/var/run/qemu/vm-150.pid"
        incoming = 0x0
        fd = 0
        pwd = (struct passwd *) 0x0
        chroot_dir = 0x0
        run_as = 0x0
        env = (struct CPUX86State *) 0x0
        show_vnc_port = 0
        params = {0x58cc76 "order", 0x58cc7c "once", 0x58cc81 "menu", 0x0}

Kevin Wolf wrote:
> Am 04.05.2010 15:42, schrieb Peter Lieven:
> [...]
>
> I don't have anything particular in mind, but you could just attach gdb
> and get another backtrace while it consumes 100% CPU (you'll need to use
> "thread apply all bt" to catch everything). Then we should see where
> it's hanging.
>
> Kevin
Hi Peter,

Am 12.05.2010 16:01, schrieb Peter Lieven:
> Hi Kevin,
>
> here we go. I created a blocking multipath device (interrupted all
> paths). qemu-kvm hangs with 100% cpu.
> also the monitor is not responding.
>
> If I restore at least one path, the vm is continuing.
>
> BR,
> Peter

This seems to be the backtrace of only one thread, and likely not the interesting one. Can you please use "thread apply all bt" to get the backtrace of all threads?

Kevin

> [...]
hi kevin,

here is the backtrace of (hopefully) all threads:

^C
Program received signal SIGINT, Interrupt.
[Switching to Thread 0x7f39b72656f0 (LWP 10695)]
0x00007f39b6c3ea94 in __lll_lock_wait () from /lib/libpthread.so.0

(gdb) thread apply all bt

Thread 2 (Thread 0x7f39b57b8950 (LWP 10698)):
#0  0x00007f39b6c3eedb in read () from /lib/libpthread.so.0
#1  0x000000000049e723 in qemu_laio_completion_cb (opaque=0x22b4010) at linux-aio.c:125
#2  0x000000000049e8ad in laio_cancel (blockacb=0x22ba310) at linux-aio.c:184
#3  0x000000000049a309 in bdrv_aio_cancel (acb=0x22ba310) at block.c:1800
#4  0x0000000000587a52 in dma_aio_cancel (acb=0x22ba170) at /usr/src/qemu-kvm-0.12.4/dma-helpers.c:138
#5  0x000000000049a309 in bdrv_aio_cancel (acb=0x22ba170) at block.c:1800
#6  0x0000000000444aac in ide_dma_cancel (bm=0x2800fd8) at /usr/src/qemu-kvm-0.12.4/hw/ide/core.c:2834
#7  0x0000000000445001 in bmdma_cmd_writeb (opaque=0x2800fd8, addr=49152, val=8) at /usr/src/qemu-kvm-0.12.4/hw/ide/pci.c:44
#8  0x00000000004c85f0 in ioport_write (index=0, address=49152, data=8) at ioport.c:80
#9  0x00000000004c8977 in cpu_outb (addr=49152, val=8 '\b') at ioport.c:198
#10 0x0000000000429731 in kvm_handle_io (port=49152, data=0x7f39b7263000, direction=1, size=1, count=1) at /usr/src/qemu-kvm-0.12.4/kvm-all.c:535
#11 0x000000000042bb8b in kvm_run (env=0x22ba5d0) at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:968
#12 0x000000000042cea2 in kvm_cpu_exec (env=0x22ba5d0) at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:1651
#13 0x000000000042d62c in kvm_main_loop_cpu (env=0x22ba5d0) at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:1893
#14 0x000000000042d76d in ap_main_loop (_env=0x22ba5d0) at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:1943
#15 0x00007f39b6c383ba in start_thread () from /lib/libpthread.so.0
#16 0x00007f39b5cbafcd in clone () from /lib/libc.so.6
#17 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f39b72656f0 (LWP 10695)):
#0  0x00007f39b6c3ea94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f39b6c3a190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f39b6c39a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
#6  0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
#8  0x000000000041054b in main (argc=30, argv=0x7fff019f1ca8, envp=0x7fff019f1da0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
hi,

will this patch make it into 0.12.4.1 ?

br,
peter

Christoph Hellwig wrote:
> On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote:
>> Great, I'm going to submit it as a proper patch then.
>>
>> Christoph, by now I'm pretty sure it's right, but can you have another
>> look if this is correct, anyway?
>
> It looks correct to me - we really shouldn't update the fields
> until bdrv_aio_cancel has returned. In fact, more often than not we
> cannot cancel a request, so there's a fairly high chance it will
> complete.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
Am 18.05.2010 13:13, schrieb Peter Lieven:
> hi,
>
> will this patch make it into 0.12.4.1 ?
>
> br,
> peter

Anthony, can you please cherry-pick commit 38d8dfa1 into stable-0.12?

Kevin

> Christoph Hellwig wrote:
> [...]
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
Am 18.05.2010 13:10, schrieb Peter Lieven:
> hi kevin,
>
> here is the backtrace of (hopefully) all threads:
>
> ^C
> Program received signal SIGINT, Interrupt.
> [Switching to Thread 0x7f39b72656f0 (LWP 10695)]
> 0x00007f39b6c3ea94 in __lll_lock_wait () from /lib/libpthread.so.0
>
> (gdb) thread apply all bt
>
> Thread 2 (Thread 0x7f39b57b8950 (LWP 10698)):
> #0  0x00007f39b6c3eedb in read () from /lib/libpthread.so.0
> #1  0x000000000049e723 in qemu_laio_completion_cb (opaque=0x22b4010) at
> linux-aio.c:125
> #2  0x000000000049e8ad in laio_cancel (blockacb=0x22ba310) at
> linux-aio.c:184

I think it's stuck here in an endless loop:

    while (laiocb->ret == -EINPROGRESS)
        qemu_laio_completion_cb(laiocb->ctx);

Can you verify this by single-stepping one or two loop iterations? ret and errno after the read call could be interesting, too.

We'll be stuck in an endless loop if the request doesn't complete, which might well happen in your scenario. Not sure what the right thing to do is. We probably need to fail the bdrv_aio_cancel to avoid blocking the whole program, but I have no idea what device emulations should do on that condition.

As long as we can't handle that condition correctly, leaving the hang in place is probably the best option. Maybe add some sleep to avoid 100% CPU consumption.

Kevin
On Tue, May 18, 2010 at 03:22:36PM +0200, Kevin Wolf wrote:
> I think it's stuck here in an endless loop:
>
>     while (laiocb->ret == -EINPROGRESS)
>         qemu_laio_completion_cb(laiocb->ctx);
>
> Can you verify this by single-stepping one or two loop iterations? ret
> and errno after the read call could be interesting, too.

Maybe the compiler is just too smart. Without some form of barrier it could just optimize the loop away, as laiocb->ret couldn't change in a normal single-threaded environment.
Am 19.05.2010 09:29, schrieb Christoph Hellwig:
> On Tue, May 18, 2010 at 03:22:36PM +0200, Kevin Wolf wrote:
>> I think it's stuck here in an endless loop:
>>
>>     while (laiocb->ret == -EINPROGRESS)
>>         qemu_laio_completion_cb(laiocb->ctx);
>>
>> Can you verify this by single-stepping one or two loop iterations? ret
>> and errno after the read call could be interesting, too.
>
> Maybe the compiler is just too smart. Without some form of barrier
> it could just optimize the loop away, as laiocb->ret couldn't change
> in a normal single-threaded environment.

It probably could in theory, but in practice we're in a read() inside qemu_laio_completion, so it didn't do it here.

Kevin
Kevin Wolf wrote:
> Am 19.05.2010 09:29, schrieb Christoph Hellwig:
> [...]
>
> It probably could in theory, but in practice we're in a read() inside
> qemu_laio_completion, so it didn't do it here.

if you supply a patch that adds some usleeps at the point in question, i'm willing to test whether it solves the 100% cpu problem.
Am 19.05.2010 um 10:18 schrieb Peter Lieven:
> Kevin Wolf wrote:
> [...]
>
> if you supply a patch that adds some usleeps at the point in question,
> i'm willing to test whether it solves the 100% cpu problem.

can someone help here? what would be the best option to add some usleeps?
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0757528..3cd55e3 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2838,10 +2838,6 @@ static void ide_dma_restart(IDEState *s, int is_read)
 void ide_dma_cancel(BMDMAState *bm)
 {
     if (bm->status & BM_STATUS_DMAING) {
-        bm->status &= ~BM_STATUS_DMAING;
-        /* cancel DMA request */
-        bm->unit = -1;
-        bm->dma_cb = NULL;
         if (bm->aiocb) {
 #ifdef DEBUG_AIO
             printf("aio_cancel\n");
@@ -2849,6 +2845,10 @@ void ide_dma_cancel(BMDMAState *bm)
             bdrv_aio_cancel(bm->aiocb);
             bm->aiocb = NULL;
         }
+        bm->status &= ~BM_STATUS_DMAING;
+        /* cancel DMA request */
+        bm->unit = -1;
+        bm->dma_cb = NULL;
     }
 }