Message ID | 1431619400-640-1-git-send-email-yarygin@linux.vnet.ibm.com |
---|---|
State | New |
On Thu, 05/14 19:03, Alexander Yarygin wrote:
> After the commit 9b536adc ("block: acquire AioContext in
> bdrv_drain_all()") the aio_poll() function got called for every
> BlockDriverState, on the assumption that every device may have its own
> AioContext. The bdrv_drain_all() function is called in each
> virtio_reset() call, which in turn is called for every virtio-blk
> device on initialization, so we got aio_poll() called
> 'length(device_list)^2' times.
>
> If we have thousands of disks attached, there are a lot of
> BlockDriverStates but only a few AioContexts, leading to tons of
> unnecessary aio_poll() calls. For example, startup with 1000 disks
> takes over 13 minutes.
>
> This patch changes the bdrv_drain_all() function, allowing it to find
> shared AioContexts and to call aio_poll() only for unique ones. This
> results in much better startup times, e.g. 1000 disks come up within
> 5 seconds.
>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
> ---
>  block.c | 40 +++++++++++++++++++++++++---------------

This doesn't apply to current master, the function has already changed and
is now in block/io.c, could you rebase it?

>  1 file changed, 25 insertions(+), 15 deletions(-)
>
> diff --git a/block.c b/block.c
> index f2f8ae7..bdfb1ce 100644
> --- a/block.c
> +++ b/block.c
> @@ -1987,17 +1987,6 @@ static bool bdrv_requests_pending(BlockDriverState *bs)
>      return false;
>  }
>
> -static bool bdrv_drain_one(BlockDriverState *bs)
> -{
> -    bool bs_busy;
> -
> -    bdrv_flush_io_queue(bs);
> -    bdrv_start_throttled_reqs(bs);
> -    bs_busy = bdrv_requests_pending(bs);
> -    bs_busy |= aio_poll(bdrv_get_aio_context(bs), bs_busy);
> -    return bs_busy;
> -}
> -
>  /*
>   * Wait for pending requests to complete on a single BlockDriverState subtree
>   *
> @@ -2010,8 +1999,13 @@ static bool bdrv_drain_one(BlockDriverState *bs)
>   */
>  void bdrv_drain(BlockDriverState *bs)
>  {
> -    while (bdrv_drain_one(bs)) {
> +    bool busy = true;
> +
> +    while (busy) {
>          /* Keep iterating */
> +        bdrv_flush_io_queue(bs);
> +        busy = bdrv_requests_pending(bs);
> +        busy |= aio_poll(bdrv_get_aio_context(bs), busy);
>      }
>  }
>
> @@ -2030,20 +2024,35 @@ void bdrv_drain(BlockDriverState *bs)
>  void bdrv_drain_all(void)
>  {
>      /* Always run first iteration so any pending completion BHs run */
> -    bool busy = true;
> +    bool busy = true, pending = false;
>      BlockDriverState *bs;
> +    GSList *aio_ctxs = NULL, *ctx;
> +    AioContext *aio_context;
>
>      while (busy) {
>          busy = false;
>
>          QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
> -            AioContext *aio_context = bdrv_get_aio_context(bs);
> +            aio_context = bdrv_get_aio_context(bs);
> +
> +            aio_context_acquire(aio_context);
> +            bdrv_flush_io_queue(bs);
> +            busy |= bdrv_requests_pending(bs);
> +            aio_context_release(aio_context);
> +            if (!aio_ctxs || !g_slist_find(aio_ctxs, aio_context)) {
> +                aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
> +            }
> +        }
> +        pending = busy;
>
> +        for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
> +            aio_context = ctx->data;
>              aio_context_acquire(aio_context);
> -            busy |= bdrv_drain_one(bs);
> +            busy |= aio_poll(aio_context, pending);
>              aio_context_release(aio_context);
>          }
>      }
> +    g_slist_free(aio_ctxs);
>  }

How do you make sure that the second loop doesn't queue any request onto
bs->throttled_reqs?

Fam
Am 14.05.2015 um 18:03 schrieb Alexander Yarygin:
> After the commit 9b536adc ("block: acquire AioContext in
> bdrv_drain_all()") the aio_poll() function got called for every
> BlockDriverState, on the assumption that every device may have its own
> AioContext. The bdrv_drain_all() function is called in each
> virtio_reset() call, which in turn is called for every virtio-blk
> device on initialization, so we got aio_poll() called
> 'length(device_list)^2' times.
>
> If we have thousands of disks attached, there are a lot of
> BlockDriverStates but only a few AioContexts, leading to tons of
> unnecessary aio_poll() calls. For example, startup with 1000 disks
> takes over 13 minutes.
>
> This patch changes the bdrv_drain_all() function, allowing it to find
> shared AioContexts and to call aio_poll() only for unique ones. This
> results in much better startup times, e.g. 1000 disks come up within
> 5 seconds.
>
> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>

Applying on top of 2.3 I can verify the speedup.
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

PS: There is another independent issue now in the kernel when exiting QEMU
caused by Linux kernel commit 6098b45b32e6baeacc04790773ced9340601d511

Author: Gu Zheng <guz.fnst@cn.fujitsu.com>
AuthorDate: Wed Sep 3 17:45:44 2014 +0800
Commit: Benjamin LaHaise <bcrl@kvack.org>
CommitDate: Thu Sep 4 16:54:47 2014 -0400

    aio: block exit_aio() until all context requests are completed

A QEMU with 1000 devices will sleep a long time in exit_aio with no obvious
sign of activity as a zombie process. I will take care of that....

Christian

> ---
>  block.c | 40 +++++++++++++++++++++++++---------------
>  1 file changed, 25 insertions(+), 15 deletions(-)
>
> diff --git a/block.c b/block.c
> index f2f8ae7..bdfb1ce 100644
> --- a/block.c
> +++ b/block.c
> @@ -1987,17 +1987,6 @@ static bool bdrv_requests_pending(BlockDriverState *bs)
>      return false;
>  }
>
> -static bool bdrv_drain_one(BlockDriverState *bs)
> -{
> -    bool bs_busy;
> -
> -    bdrv_flush_io_queue(bs);
> -    bdrv_start_throttled_reqs(bs);
> -    bs_busy = bdrv_requests_pending(bs);
> -    bs_busy |= aio_poll(bdrv_get_aio_context(bs), bs_busy);
> -    return bs_busy;
> -}
> -
>  /*
>   * Wait for pending requests to complete on a single BlockDriverState subtree
>   *
> @@ -2010,8 +1999,13 @@ static bool bdrv_drain_one(BlockDriverState *bs)
>   */
>  void bdrv_drain(BlockDriverState *bs)
>  {
> -    while (bdrv_drain_one(bs)) {
> +    bool busy = true;
> +
> +    while (busy) {
>          /* Keep iterating */
> +        bdrv_flush_io_queue(bs);
> +        busy = bdrv_requests_pending(bs);
> +        busy |= aio_poll(bdrv_get_aio_context(bs), busy);
>      }
>  }
>
> @@ -2030,20 +2024,35 @@ void bdrv_drain(BlockDriverState *bs)
>  void bdrv_drain_all(void)
>  {
>      /* Always run first iteration so any pending completion BHs run */
> -    bool busy = true;
> +    bool busy = true, pending = false;
>      BlockDriverState *bs;
> +    GSList *aio_ctxs = NULL, *ctx;
> +    AioContext *aio_context;
>
>      while (busy) {
>          busy = false;
>
>          QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
> -            AioContext *aio_context = bdrv_get_aio_context(bs);
> +            aio_context = bdrv_get_aio_context(bs);
> +
> +            aio_context_acquire(aio_context);
> +            bdrv_flush_io_queue(bs);
> +            busy |= bdrv_requests_pending(bs);
> +            aio_context_release(aio_context);
> +            if (!aio_ctxs || !g_slist_find(aio_ctxs, aio_context)) {
> +                aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
> +            }
> +        }
> +        pending = busy;
>
> +        for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
> +            aio_context = ctx->data;
>              aio_context_acquire(aio_context);
> -            busy |= bdrv_drain_one(bs);
> +            busy |= aio_poll(aio_context, pending);
>              aio_context_release(aio_context);
>          }
>      }
> +    g_slist_free(aio_ctxs);
>  }
>
>  /* make a BlockDriverState anonymous by removing from bdrv_state and
> @@ -6087,6 +6096,7 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
>      } else if (bs->file) {
>          bdrv_flush_io_queue(bs->file);
>      }
> +    bdrv_start_throttled_reqs(bs);
>  }
>
>  static bool append_open_options(QDict *d, BlockDriverState *bs)
>
Am 15.05.2015 um 08:59 schrieb Christian Borntraeger:
> PS: There is another independent issue now in the kernel when exiting QEMU
> caused by Linux kernel commit 6098b45b32e6baeacc04790773ced9340601d511
>
> Author: Gu Zheng <guz.fnst@cn.fujitsu.com>
> AuthorDate: Wed Sep 3 17:45:44 2014 +0800
> Commit: Benjamin LaHaise <bcrl@kvack.org>
> CommitDate: Thu Sep 4 16:54:47 2014 -0400
>
>     aio: block exit_aio() until all context requests are completed
>
> A QEMU with 1000 devices will sleep a long time in exit_aio with no obvious
> sign of activity as a zombie process. I will take care of that....
>
> Christian

To make it clear: this kernel wait time happens with and without Alexander's patch.
Am 15.05.2015 um 08:59 schrieb Christian Borntraeger:
> Am 14.05.2015 um 18:03 schrieb Alexander Yarygin:
>> After the commit 9b536adc ("block: acquire AioContext in
>> bdrv_drain_all()") the aio_poll() function got called for every
>> BlockDriverState, on the assumption that every device may have its own
>> AioContext. The bdrv_drain_all() function is called in each
>> virtio_reset() call, which in turn is called for every virtio-blk
>> device on initialization, so we got aio_poll() called
>> 'length(device_list)^2' times.
>>
>> If we have thousands of disks attached, there are a lot of
>> BlockDriverStates but only a few AioContexts, leading to tons of
>> unnecessary aio_poll() calls. For example, startup with 1000 disks
>> takes over 13 minutes.
>>
>> This patch changes the bdrv_drain_all() function, allowing it to find
>> shared AioContexts and to call aio_poll() only for unique ones. This
>> results in much better startup times, e.g. 1000 disks come up within
>> 5 seconds.
>>
>> Cc: Christian Borntraeger <borntraeger@de.ibm.com>
>> Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
>> Cc: Kevin Wolf <kwolf@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
>
> Applying on top of 2.3 I can verify the speedup.
> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>

Hmmm. When I enable iothreads for all of these devices I get hangs.
So let's defer my Tested-by until I understand that :-(
```diff
diff --git a/block.c b/block.c
index f2f8ae7..bdfb1ce 100644
--- a/block.c
+++ b/block.c
@@ -1987,17 +1987,6 @@ static bool bdrv_requests_pending(BlockDriverState *bs)
     return false;
 }
 
-static bool bdrv_drain_one(BlockDriverState *bs)
-{
-    bool bs_busy;
-
-    bdrv_flush_io_queue(bs);
-    bdrv_start_throttled_reqs(bs);
-    bs_busy = bdrv_requests_pending(bs);
-    bs_busy |= aio_poll(bdrv_get_aio_context(bs), bs_busy);
-    return bs_busy;
-}
-
 /*
  * Wait for pending requests to complete on a single BlockDriverState subtree
  *
@@ -2010,8 +1999,13 @@ static bool bdrv_drain_one(BlockDriverState *bs)
  */
 void bdrv_drain(BlockDriverState *bs)
 {
-    while (bdrv_drain_one(bs)) {
+    bool busy = true;
+
+    while (busy) {
         /* Keep iterating */
+        bdrv_flush_io_queue(bs);
+        busy = bdrv_requests_pending(bs);
+        busy |= aio_poll(bdrv_get_aio_context(bs), busy);
     }
 }
 
@@ -2030,20 +2024,35 @@ void bdrv_drain(BlockDriverState *bs)
 void bdrv_drain_all(void)
 {
     /* Always run first iteration so any pending completion BHs run */
-    bool busy = true;
+    bool busy = true, pending = false;
     BlockDriverState *bs;
+    GSList *aio_ctxs = NULL, *ctx;
+    AioContext *aio_context;
 
     while (busy) {
         busy = false;
 
         QTAILQ_FOREACH(bs, &bdrv_states, device_list) {
-            AioContext *aio_context = bdrv_get_aio_context(bs);
+            aio_context = bdrv_get_aio_context(bs);
+
+            aio_context_acquire(aio_context);
+            bdrv_flush_io_queue(bs);
+            busy |= bdrv_requests_pending(bs);
+            aio_context_release(aio_context);
+            if (!aio_ctxs || !g_slist_find(aio_ctxs, aio_context)) {
+                aio_ctxs = g_slist_prepend(aio_ctxs, aio_context);
+            }
+        }
+        pending = busy;
 
+        for (ctx = aio_ctxs; ctx != NULL; ctx = ctx->next) {
+            aio_context = ctx->data;
             aio_context_acquire(aio_context);
-            busy |= bdrv_drain_one(bs);
+            busy |= aio_poll(aio_context, pending);
             aio_context_release(aio_context);
         }
     }
+    g_slist_free(aio_ctxs);
 }
 
 /* make a BlockDriverState anonymous by removing from bdrv_state and
@@ -6087,6 +6096,7 @@ void bdrv_flush_io_queue(BlockDriverState *bs)
     } else if (bs->file) {
         bdrv_flush_io_queue(bs->file);
     }
+    bdrv_start_throttled_reqs(bs);
 }
 
 static bool append_open_options(QDict *d, BlockDriverState *bs)
```
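The key structural change in bdrv_drain_all() above is the split into two passes: the first loop flushes every BlockDriverState and records each distinct AioContext in a GSList exactly once, and the second loop then calls aio_poll() only for those recorded contexts. A rough standalone sketch of the same deduplication pattern is shown below, using plain GLib; the Device and Context types and the 1000-device setup are invented for illustration and are not QEMU code.

```c
/*
 * Sketch of the "visit every device, poll each distinct context once"
 * pattern used by the patched bdrv_drain_all().  Device/Context are
 * stand-ins for BlockDriverState/AioContext.
 */
#include <glib.h>
#include <stdio.h>

typedef struct { int id; } Context;
typedef struct { Context *ctx; } Device;

int main(void)
{
    Context ctx_a = { .id = 0 }, ctx_b = { .id = 1 };
    Device devices[1000];
    GSList *uniq = NULL, *it;
    int i;

    /* Many devices, but only two distinct contexts between them. */
    for (i = 0; i < 1000; i++) {
        devices[i].ctx = (i % 2) ? &ctx_a : &ctx_b;
    }

    /* First pass: per-device work, plus deduplication of contexts
     * (mirrors the g_slist_find()/g_slist_prepend() check in the patch). */
    for (i = 0; i < 1000; i++) {
        if (!g_slist_find(uniq, devices[i].ctx)) {
            uniq = g_slist_prepend(uniq, devices[i].ctx);
        }
    }

    /* Second pass: the expensive call (aio_poll() in QEMU) now runs once
     * per context instead of once per device. */
    for (it = uniq; it != NULL; it = it->next) {
        Context *c = it->data;
        printf("polling context %d\n", c->id);
    }

    g_slist_free(uniq);
    return 0;
}
```

Built against GLib (for example with `gcc sketch.c $(pkg-config --cflags --libs glib-2.0)`), the second loop runs twice rather than a thousand times, which is exactly the effect the patch has on aio_poll().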
After the commit 9b536adc ("block: acquire AioContext in
bdrv_drain_all()") the aio_poll() function got called for every
BlockDriverState, on the assumption that every device may have its own
AioContext. The bdrv_drain_all() function is called in each
virtio_reset() call, which in turn is called for every virtio-blk
device on initialization, so we got aio_poll() called
'length(device_list)^2' times.

If we have thousands of disks attached, there are a lot of
BlockDriverStates but only a few AioContexts, leading to tons of
unnecessary aio_poll() calls. For example, startup with 1000 disks
takes over 13 minutes.

This patch changes the bdrv_drain_all() function, allowing it to find
shared AioContexts and to call aio_poll() only for unique ones. This
results in much better startup times, e.g. 1000 disks come up within
5 seconds.

Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
---
 block.c | 40 +++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)
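For a rough sense of scale, assuming (for illustration only) that all disks share a single AioContext: with 1000 virtio-blk devices, each virtio_reset() triggers a bdrv_drain_all(), and each of those previously issued one aio_poll() per BlockDriverState, i.e. on the order of 1000 × 1000 = 1,000,000 aio_poll() calls during startup. With this change, each bdrv_drain_all() pass polls each distinct AioContext only once, so the same startup needs on the order of 1000 polls, which is consistent with the reported drop from over 13 minutes to about 5 seconds.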