Message ID | 20170406142527.25835-5-famz@redhat.com |
---|---|
State | New |
Headers | show |
On 04/06/2017 09:25 AM, Fam Zheng wrote: > During block job completion, nothing is preventing > block_job_defer_to_main_loop_bh from being called in a nested > aio_poll(), which is a trouble, such as in this code path: > > qmp_block_commit > commit_active_start > bdrv_reopen > bdrv_reopen_multiple > bdrv_reopen_prepare > bdrv_flush > aio_poll > aio_bh_poll > aio_bh_call > block_job_defer_to_main_loop_bh > stream_complete > bdrv_reopen > > block_job_defer_to_main_loop_bh is the last step of the stream job, > which should have been "paused" by the bdrv_drained_begin/end in > bdrv_reopen_multiple, but it is not done because it's in the form of a > main loop BH. > > Similar to why block jobs should be paused between drained_begin and > drained_end, BHs they schedule must be excluded as well. To achieve > this, this patch forces draining the BH before leaving bdrv_drained_begin(). > > Signed-off-by: Fam Zheng <famz@redhat.com> > --- > block/io.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > Nice writeup. Reviewed-by: Eric Blake <eblake@redhat.com>
Am 06.04.2017 um 16:25 hat Fam Zheng geschrieben: > During block job completion, nothing is preventing > block_job_defer_to_main_loop_bh from being called in a nested > aio_poll(), which is a trouble, such as in this code path: > > qmp_block_commit > commit_active_start > bdrv_reopen > bdrv_reopen_multiple > bdrv_reopen_prepare > bdrv_flush > aio_poll > aio_bh_poll > aio_bh_call > block_job_defer_to_main_loop_bh > stream_complete > bdrv_reopen > > block_job_defer_to_main_loop_bh is the last step of the stream job, > which should have been "paused" by the bdrv_drained_begin/end in > bdrv_reopen_multiple, but it is not done because it's in the form of a > main loop BH. > > Similar to why block jobs should be paused between drained_begin and > drained_end, BHs they schedule must be excluded as well. To achieve > this, this patch forces draining the BH before leaving bdrv_drained_begin(). > > Signed-off-by: Fam Zheng <famz@redhat.com> We used to poll BHs in earlier times. Commit 99723548 ('block: add BDS field to count in-flight requests') changed this, without an explanation in the commit message. Paolo, is this simply a bug in that commit, or do you rely on it somewhere? I remember that you wanted to get rid of some aio_poll() calls a while ago. > block/io.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/block/io.c b/block/io.c > index 2709a70..b9cfd18 100644 > --- a/block/io.c > +++ b/block/io.c > @@ -228,7 +228,12 @@ void bdrv_drained_begin(BlockDriverState *bs) > bdrv_parent_drained_begin(bs); > } > > - bdrv_drain_recurse(bs); > + while (true) { > + if (!bdrv_drain_recurse(bs) && > + !aio_poll(bdrv_get_aio_context(bs), false)) { > + break; > + } The indentation is off here. > + } > } The old code had this in what is now the BDRV_POLL_WHILE() call in bdrv_drain_recurse(). I think it makes more sense there, saves you a loop here and fixes bdrv_drain_all_begin() at the same time. Kevin
On 06/04/2017 23:37, Kevin Wolf wrote: > We used to poll BHs in earlier times. Commit 99723548 ('block: add BDS > field to count in-flight requests') changed this, without an explanation > in the commit message. > > Paolo, is this simply a bug in that commit, or do you rely on it > somewhere? I remember that you wanted to get rid of some aio_poll() > calls a while ago. It was replaced by the bdrv_inc_in_flight/bdrv_dec_in_flight calls in bdrv_aio_prw and bdrv_co_complete. These let bdrv_drain complete the invocation of the callbacks. But block_job_defer_to_main_loop is different because it schedules the bottom half in another QEMU AioContext (the main thread's). Since he's here with me, I'll ask Fam exactly what's happening. Stefan also found something not too convincing in his review. Paolo >> block/io.c | 7 ++++++- >> 1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/block/io.c b/block/io.c >> index 2709a70..b9cfd18 100644 >> --- a/block/io.c >> +++ b/block/io.c >> @@ -228,7 +228,12 @@ void bdrv_drained_begin(BlockDriverState *bs) >> bdrv_parent_drained_begin(bs); >> } >> >> - bdrv_drain_recurse(bs); >> + while (true) { >> + if (!bdrv_drain_recurse(bs) && >> + !aio_poll(bdrv_get_aio_context(bs), false)) { >> + break; >> + } > > The indentation is off here. > >> + } >> } > > The old code had this in what is now the BDRV_POLL_WHILE() call in > bdrv_drain_recurse(). I think it makes more sense there, saves you a > loop here and fixes bdrv_drain_all_begin() at the same time. > > Kevin > >
diff --git a/block/io.c b/block/io.c index 2709a70..b9cfd18 100644 --- a/block/io.c +++ b/block/io.c @@ -228,7 +228,12 @@ void bdrv_drained_begin(BlockDriverState *bs) bdrv_parent_drained_begin(bs); } - bdrv_drain_recurse(bs); + while (true) { + if (!bdrv_drain_recurse(bs) && + !aio_poll(bdrv_get_aio_context(bs), false)) { + break; + } + } } void bdrv_drained_end(BlockDriverState *bs)
During block job completion, nothing is preventing block_job_defer_to_main_loop_bh from being called in a nested aio_poll(), which is a trouble, such as in this code path: qmp_block_commit commit_active_start bdrv_reopen bdrv_reopen_multiple bdrv_reopen_prepare bdrv_flush aio_poll aio_bh_poll aio_bh_call block_job_defer_to_main_loop_bh stream_complete bdrv_reopen block_job_defer_to_main_loop_bh is the last step of the stream job, which should have been "paused" by the bdrv_drained_begin/end in bdrv_reopen_multiple, but it is not done because it's in the form of a main loop BH. Similar to why block jobs should be paused between drained_begin and drained_end, BHs they schedule must be excluded as well. To achieve this, this patch forces draining the BH before leaving bdrv_drained_begin(). Signed-off-by: Fam Zheng <famz@redhat.com> --- block/io.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)