Message ID | 1404303528-7115-4-git-send-email-ming.lei@canonical.com |
---|---|
State | New |
Headers | show |
Am 02.07.2014 um 14:18 hat Ming Lei geschrieben: > Before commit 580b6b2aa2(dataplane: use the QEMU block > layer for I/O), dataplane for virtio-blk submits block > I/O as a batch. > > This commit 580b6b2aa2 replaces the custom linux AIO > implementation(including submit I/O as a batch) with QEMU > block layer, but this commit causes ~40% throughput regression > on virtio-blk performance, and removing submitting I/O > as a batch is one of the causes. > > This patch applies the newly introduced bdrv_io_plug() and > bdrv_io_unplug() interfaces to support submitting I/O > at batch for Qemu block layer, and in my test, the change > can improve throughput by ~30% with 'aio=native'. > > Following my fio test script: > > [global] > direct=1 > size=4G > bsrange=4k-4k > timeout=40 > numjobs=4 > ioengine=libaio > iodepth=64 > filename=/dev/vdc > group_reporting=1 > > [f] > rw=randread > > Result on one of my small machine(host: x86_64, 2cores, 4thread, guest: 4cores): > - qemu master: 59K IOPS > - qemu master with these patches: 81K IOPS > - 2.0.0 release(dataplane using custom linux aio): 104K IOPS > > Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> > Signed-off-by: Ming Lei <ming.lei@canonical.com> Reviewing this one doesn't make sense because commit b002254d ('virtio-blk: Unify {non-,}dataplane's request handlings') removes the patched code. You need to patch common virtio-blk code now (which should improve non-dataplane virtio-blk as well). Kevin
Il 03/07/2014 11:44, Kevin Wolf ha scritto: > Am 02.07.2014 um 14:18 hat Ming Lei geschrieben: >> Before commit 580b6b2aa2(dataplane: use the QEMU block >> layer for I/O), dataplane for virtio-blk submits block >> I/O as a batch. >> >> This commit 580b6b2aa2 replaces the custom linux AIO >> implementation(including submit I/O as a batch) with QEMU >> block layer, but this commit causes ~40% throughput regression >> on virtio-blk performance, and removing submitting I/O >> as a batch is one of the causes. >> >> This patch applies the newly introduced bdrv_io_plug() and >> bdrv_io_unplug() interfaces to support submitting I/O >> at batch for Qemu block layer, and in my test, the change >> can improve throughput by ~30% with 'aio=native'. >> >> Following my fio test script: >> >> [global] >> direct=1 >> size=4G >> bsrange=4k-4k >> timeout=40 >> numjobs=4 >> ioengine=libaio >> iodepth=64 >> filename=/dev/vdc >> group_reporting=1 >> >> [f] >> rw=randread >> >> Result on one of my small machine(host: x86_64, 2cores, 4thread, guest: 4cores): >> - qemu master: 59K IOPS >> - qemu master with these patches: 81K IOPS >> - 2.0.0 release(dataplane using custom linux aio): 104K IOPS >> >> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> >> Signed-off-by: Ming Lei <ming.lei@canonical.com> > > Reviewing this one doesn't make sense because commit b002254d > ('virtio-blk: Unify {non-,}dataplane's request handlings') removes the > patched code. You need to patch common virtio-blk code now (which should > improve non-dataplane virtio-blk as well). Actually no, the second and third hunk still apply. The patch you mentioned unifies request processing, but the loops that fetches requests from the virtqueue are still separate. This is because virtio-blk-dataplane still uses the vring API instead of the generic virtio API. address_space_map/unmap is not thread-safe yet, vring avoids it. Regarding the first hunk, it is not needed at all. Paolo
Am 03.07.2014 um 11:49 hat Paolo Bonzini geschrieben: > Il 03/07/2014 11:44, Kevin Wolf ha scritto: > >Am 02.07.2014 um 14:18 hat Ming Lei geschrieben: > >>Before commit 580b6b2aa2(dataplane: use the QEMU block > >>layer for I/O), dataplane for virtio-blk submits block > >>I/O as a batch. > >> > >>This commit 580b6b2aa2 replaces the custom linux AIO > >>implementation(including submit I/O as a batch) with QEMU > >>block layer, but this commit causes ~40% throughput regression > >>on virtio-blk performance, and removing submitting I/O > >>as a batch is one of the causes. > >> > >>This patch applies the newly introduced bdrv_io_plug() and > >>bdrv_io_unplug() interfaces to support submitting I/O > >>at batch for Qemu block layer, and in my test, the change > >>can improve throughput by ~30% with 'aio=native'. > >> > >>Following my fio test script: > >> > >> [global] > >> direct=1 > >> size=4G > >> bsrange=4k-4k > >> timeout=40 > >> numjobs=4 > >> ioengine=libaio > >> iodepth=64 > >> filename=/dev/vdc > >> group_reporting=1 > >> > >> [f] > >> rw=randread > >> > >>Result on one of my small machine(host: x86_64, 2cores, 4thread, guest: 4cores): > >> - qemu master: 59K IOPS > >> - qemu master with these patches: 81K IOPS > >> - 2.0.0 release(dataplane using custom linux aio): 104K IOPS > >> > >>Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> > >>Signed-off-by: Ming Lei <ming.lei@canonical.com> > > > >Reviewing this one doesn't make sense because commit b002254d > >('virtio-blk: Unify {non-,}dataplane's request handlings') removes the > >patched code. You need to patch common virtio-blk code now (which should > >improve non-dataplane virtio-blk as well). > > Actually no, the second and third hunk still apply. The patch you > mentioned unifies request processing, but the loops that fetches > requests from the virtqueue are still separate. This is because > virtio-blk-dataplane still uses the vring API instead of the generic > virtio API. address_space_map/unmap is not thread-safe yet, vring > avoids it. > > Regarding the first hunk, it is not needed at all. Right, I only saw the merge conflict on the first hunk and gave up. I didn't even look at the unapplied patch to see that it's wrong and should be dropped indeed. Kevin
On Wed, Jul 02, 2014 at 08:18:48PM +0800, Ming Lei wrote: > diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c > index c10b7b7..82bb276 100644 > --- a/hw/block/dataplane/virtio-blk.c > +++ b/hw/block/dataplane/virtio-blk.c > @@ -201,6 +201,9 @@ static void do_flush_cmd(VirtIOBlockDataPlane *s, VirtQueueElement *elem, > req->elem = elem; > req->inhdr = inhdr; > > + /* flush IOs queued first */ > + bdrv_flush_io_queue(s->blk->conf.bs); > + > bdrv_aio_flush(s->blk->conf.bs, complete_flush, req); > } > I looked back at previous email threads but I don't understand why this is necessary. bdrv_aio_flush() commits the disk write cache, that means _already completed_ writes will be on stable storage. However, it does not make any guarantees about in-flight writes. So this seems like a pointless call that can be dropped safely. Stefan
On Wed, Jul 02, 2014 at 08:18:48PM +0800, Ming Lei wrote: > diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c > index c10b7b7..82bb276 100644 > --- a/hw/block/dataplane/virtio-blk.c > +++ b/hw/block/dataplane/virtio-blk.c > @@ -201,6 +201,9 @@ static void do_flush_cmd(VirtIOBlockDataPlane *s, VirtQueueElement *elem, > req->elem = elem; > req->inhdr = inhdr; > > + /* flush IOs queued first */ > + bdrv_flush_io_queue(s->blk->conf.bs); > + > bdrv_aio_flush(s->blk->conf.bs, complete_flush, req); > } > > @@ -289,6 +292,7 @@ static void handle_notify(EventNotifier *e) > int ret; > > event_notifier_test_and_clear(&s->host_notifier); > + bdrv_io_plug(s->blk->conf.bs); > for (;;) { > /* Disable guest->host notifies to avoid unnecessary vmexits */ > vring_disable_notification(s->vdev, &s->vring); > @@ -322,6 +326,7 @@ static void handle_notify(EventNotifier *e) > break; > } > } > + bdrv_io_unplug(s->blk->conf.bs); > } Might as well do the same for non-dataplane in hw/block/virtio-blk.c? Stefan
On Thu, Jul 3, 2014 at 8:37 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote: > On Wed, Jul 02, 2014 at 08:18:48PM +0800, Ming Lei wrote: >> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c >> index c10b7b7..82bb276 100644 >> --- a/hw/block/dataplane/virtio-blk.c >> +++ b/hw/block/dataplane/virtio-blk.c >> @@ -201,6 +201,9 @@ static void do_flush_cmd(VirtIOBlockDataPlane *s, VirtQueueElement *elem, >> req->elem = elem; >> req->inhdr = inhdr; >> >> + /* flush IOs queued first */ >> + bdrv_flush_io_queue(s->blk->conf.bs); >> + >> bdrv_aio_flush(s->blk->conf.bs, complete_flush, req); >> } >> >> @@ -289,6 +292,7 @@ static void handle_notify(EventNotifier *e) >> int ret; >> >> event_notifier_test_and_clear(&s->host_notifier); >> + bdrv_io_plug(s->blk->conf.bs); >> for (;;) { >> /* Disable guest->host notifies to avoid unnecessary vmexits */ >> vring_disable_notification(s->vdev, &s->vring); >> @@ -322,6 +326,7 @@ static void handle_notify(EventNotifier *e) >> break; >> } >> } >> + bdrv_io_unplug(s->blk->conf.bs); >> } > > Might as well do the same for non-dataplane in hw/block/virtio-blk.c? It may need further optimization for non-dataplane, and I saw the message of 'main-loop: WARNING: I/O thread spun for 1000 iterations' when I enabled for non-dataplane. The queue depth should be decreased for non-dataplane. Thanks,
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index c10b7b7..82bb276 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -201,6 +201,9 @@ static void do_flush_cmd(VirtIOBlockDataPlane *s, VirtQueueElement *elem, req->elem = elem; req->inhdr = inhdr; + /* flush IOs queued first */ + bdrv_flush_io_queue(s->blk->conf.bs); + bdrv_aio_flush(s->blk->conf.bs, complete_flush, req); } @@ -289,6 +292,7 @@ static void handle_notify(EventNotifier *e) int ret; event_notifier_test_and_clear(&s->host_notifier); + bdrv_io_plug(s->blk->conf.bs); for (;;) { /* Disable guest->host notifies to avoid unnecessary vmexits */ vring_disable_notification(s->vdev, &s->vring); @@ -322,6 +326,7 @@ static void handle_notify(EventNotifier *e) break; } } + bdrv_io_unplug(s->blk->conf.bs); } /* Context: QEMU global mutex held */