Message ID | 507BD3D1.3010804@redhat.com |
---|---|
State | New |
Headers | show |
Am 15.10.2012 11:13, schrieb Kevin Wolf: > Am 12.10.2012 17:52, schrieb Andreas Färber: >> Am 12.06.2012 15:44, schrieb Kevin Wolf: >>> Am 12.06.2012 15:33, schrieb Andreas Färber: >>>> Am 14.05.2012 14:20, schrieb Kevin Wolf: >>>>> Am 13.05.2012 10:03, schrieb Zhouyi Zhou: >>>>>> sometimes, qemu/kvm-0.1x will hang in endless loop in qcow2_alloc_cluster_offset. >>>>> >>>>> The patch looks reasonable to me. Note however that while it fixes the >>>>> hang, it still causes cluster leaks. I'm not sure if someone is >>>>> interested in picking these up for old stable releases. Andreas, I think >>>>> you were going to take 0.15? The first version that doesn't have the >>>>> problem is 1.0. >>>> >>> It's "fixed" as a side effect of the block layer conversion to >>> coroutines. Not exactly the kind of patches you'd want to cherry-pick >>> for stable-0.15. >>> >>> The better fix for 0.15 could be to backport the new behaviour of >>> coroutine based requests with bdrv_aio_cancel: >>> >>> static void bdrv_aio_co_cancel_em(BlockDriverAIOCB *blockacb) >>> { >>> qemu_aio_flush(); >>> } >>> >>> Using that as the implementation for qcow2_aio_cancel should be safe and >>> fix this problem. >> >> [...] stable-0.15 does not have coroutines, so I don't >> understand what exactly you're suggesting as alternative here: Backport >> the whole coroutine feature including coroutine function above? Or just >> call qemu_aio_flush() in place of what? This is old qcow2_aio_cancel(): > > No, that was qcow2_aio_flush. ;-) Ugh, what a copy-and-paste error... ;-) > What I'm suggesting (not even compile tested!) is: > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > > diff --git a/block/qcow2.c b/block/qcow2.c > index 48e1b95..d665675 100644 > --- a/block/qcow2.c > +++ b/block/qcow2.c > @@ -388,10 +388,7 @@ typedef struct QCowAIOCB { > > static void qcow2_aio_cancel(BlockDriverAIOCB *blockacb) > { > - QCowAIOCB *acb = container_of(blockacb, QCowAIOCB, common); > - if (acb->hd_aiocb) > - bdrv_aio_cancel(acb->hd_aiocb); > - qemu_aio_release(acb); > + qemu_aio_flush(); > } > > static AIOPool qcow2_aio_pool = { Compiles fine. Is there a particular test case to invoke this code path? Does this attempt to fix the cluster leaks you mentioned as well, or just the cluster allocation endless loop? Andreas
Am 15.10.2012 16:28, schrieb Andreas Färber: >> What I'm suggesting (not even compile tested!) is: >> >> Signed-off-by: Kevin Wolf <kwolf@redhat.com> >> >> diff --git a/block/qcow2.c b/block/qcow2.c >> index 48e1b95..d665675 100644 >> --- a/block/qcow2.c >> +++ b/block/qcow2.c >> @@ -388,10 +388,7 @@ typedef struct QCowAIOCB { >> >> static void qcow2_aio_cancel(BlockDriverAIOCB *blockacb) >> { >> - QCowAIOCB *acb = container_of(blockacb, QCowAIOCB, common); >> - if (acb->hd_aiocb) >> - bdrv_aio_cancel(acb->hd_aiocb); >> - qemu_aio_release(acb); >> + qemu_aio_flush(); >> } >> >> static AIOPool qcow2_aio_pool = { > > Compiles fine. Is there a particular test case to invoke this code path? As far as I know, the common way to get this were IDE resets. However, I'm not sure what you need to do with a Linux guest if you want it to reset the IDE controller... > Does this attempt to fix the cluster leaks you mentioned as well, or > just the cluster allocation endless loop? Instead of cancelling the in-flight requests it waits for their completion, so in the end we should be in an consistent state without cluster leaks. Kevin
Hi Kevin > Am 15.10.2012 16:28, schrieb Andreas Färber: > >> What I'm suggesting (not even compile tested!) is: > >> > >> Signed-off-by: Kevin Wolf <kwolf@redhat.com> > >> > >> diff --git a/block/qcow2.c b/block/qcow2.c > >> index 48e1b95..d665675 100644 > >> --- a/block/qcow2.c > >> +++ b/block/qcow2.c > >> @@ -388,10 +388,7 @@ typedef struct QCowAIOCB { > >> > >> static void qcow2_aio_cancel(BlockDriverAIOCB *blockacb) > >> { > >> - QCowAIOCB *acb = container_of(blockacb, QCowAIOCB, common); > >> - if (acb->hd_aiocb) > >> - bdrv_aio_cancel(acb->hd_aiocb); > >> - qemu_aio_release(acb); > >> + qemu_aio_flush(); > >> } > >> > >> static AIOPool qcow2_aio_pool = { > > > > Compiles fine. Is there a particular test case to invoke this code path? > > As far as I know, the common way to get this were IDE resets. However, > I'm not sure what you need to do with a Linux guest if you want it to > reset the IDE controller... I first notice this bug during the stress test of FreeBSD9.0 guests. FreeBSD will issue a ata reset servicing function ata_timeout. A way to speed up the frequency of invoking ata_timeout is to modify the function ata_begin_transaction and let the timeout shorter then do a disk intensive stress test. under FreeBSD, invoke atacontro reinit ata0 may do the reset, Under Linux, invoke hdparm -w will also do the work? (dangerous, a safe way maybe modify Linux kernel and let the ata request timeout shorter?) > > Does this attempt to fix the cluster leaks you mentioned as well, or > > just the cluster allocation endless loop? > > Instead of cancelling the in-flight requests it waits for their > completion, so in the end we should be in an consistent state without > cluster leaks. > Zhouyi
Am 15.10.2012 11:13, schrieb Kevin Wolf: > What I'm suggesting (not even compile tested!) is: > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> > > diff --git a/block/qcow2.c b/block/qcow2.c > index 48e1b95..d665675 100644 > --- a/block/qcow2.c > +++ b/block/qcow2.c > @@ -388,10 +388,7 @@ typedef struct QCowAIOCB { > > static void qcow2_aio_cancel(BlockDriverAIOCB *blockacb) > { > - QCowAIOCB *acb = container_of(blockacb, QCowAIOCB, common); > - if (acb->hd_aiocb) > - bdrv_aio_cancel(acb->hd_aiocb); > - qemu_aio_release(acb); > + qemu_aio_flush(); > } > > static AIOPool qcow2_aio_pool = { Thanks, we've applied this to stable-0.15. Andreas
diff --git a/block/qcow2.c b/block/qcow2.c index 48e1b95..d665675 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -388,10 +388,7 @@ typedef struct QCowAIOCB { static void qcow2_aio_cancel(BlockDriverAIOCB *blockacb) { - QCowAIOCB *acb = container_of(blockacb, QCowAIOCB, common); - if (acb->hd_aiocb) - bdrv_aio_cancel(acb->hd_aiocb); - qemu_aio_release(acb); + qemu_aio_flush(); } static AIOPool qcow2_aio_pool = {