[5/9] mirror: improve performance of mirroring of empty disk

Message ID 1465917916-22348-6-git-send-email-den@openvz.org
State New

Commit Message

Denis V. Lunev June 14, 2016, 3:25 p.m. UTC
Zero blocks should not be counted in delay calculations: they are
not read, so I/O throttling is not required for them. Otherwise,
migration of a VM with a 16 TB QCOW2 disk holding only 4 GB of data
takes days.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
---
 block/mirror.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Eric Blake June 15, 2016, 3:20 a.m. UTC | #1
On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> Zero blocks should not be counted in delay calculations: they are
> not read, so I/O throttling is not required for them. Otherwise,
> migration of a VM with a 16 TB QCOW2 disk holding only 4 GB of data
> takes days.
> 
> [...]

Seems reasonable, but I'll let others more familiar with throttling give
the final say.
Stefan Hajnoczi June 15, 2016, 9:19 a.m. UTC | #2
On Tue, Jun 14, 2016 at 09:20:47PM -0600, Eric Blake wrote:
> On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> > [...]
> 
> Seems reasonable, but I'll let others more familiar with throttling give
> the final say.

There is a bounce buffer fallback when !drv->bdrv_co_pwrite_zeroes.  In
that case we need to account for the bytes transferred.  I don't see
where the patch takes this into account.
Denis V. Lunev June 15, 2016, 10:37 a.m. UTC | #3
On 06/15/2016 12:19 PM, Stefan Hajnoczi wrote:
> On Tue, Jun 14, 2016 at 09:20:47PM -0600, Eric Blake wrote:
>> On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
>>> [...]
>> Seems reasonable, but I'll let others more familiar with throttling give
>> the final say.
> There is a bounce buffer fallback when !drv->bdrv_co_pwrite_zeroes.  In
> that case we need to account for the bytes transferred.  I don't see
> where the patch takes this into account.
Interesting point.

I think we charge for the I/O actually performed. So in the
absence of the callback we should charge io_sectors/2: the
write is still full scale, but there is no read.

Correct?
Stefan Hajnoczi June 16, 2016, 10:10 a.m. UTC | #4
On Wed, Jun 15, 2016 at 01:37:18PM +0300, Denis V. Lunev wrote:
> On 06/15/2016 12:19 PM, Stefan Hajnoczi wrote:
> > On Tue, Jun 14, 2016 at 09:20:47PM -0600, Eric Blake wrote:
> > > On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
> > > > [...]
> > > Seems reasonable, but I'll let others more familiar with throttling give
> > > the final say.
> > There is a bounce buffer fallback when !drv->bdrv_co_pwrite_zeroes.  In
> > that case we need to account for the bytes transferred.  I don't see
> > where the patch takes this into account.
> Interesting point.
> 
> I think we are charging for IO performed. Thus with the
> absence of the callback we should charge for io_sectors/2
> The write will be full scale, there is no reading.
> 
> Correct?

io_sectors currently only accounts for bytes written, not bytes read.

Therefore, I think we need:

/* Don't charge for efficient zero writes */
if (drv->bdrv_co_pwrite_zeroes) {
    io_sectors = 0;
}

Stefan
Eric Blake June 17, 2016, 2:53 a.m. UTC | #5
On 06/16/2016 04:10 AM, Stefan Hajnoczi wrote:

> 
> io_sectors currently only accounts for bytes written, not bytes read.
> 
> Therefore, I think we need:
> 
> /* Don't charge for efficient zero writes */
> if (drv->bdrv_co_pwrite_zeroes) {
>     io_sectors = 0;
> }

That's not sufficient.  NBD will have conditional support for write
zeroes, depending on whether the server supports it (that is, once my
patches for NBD_CMD_WRITE_ZEROES get out of a holding pattern on all the
other patches in the queue being flushed...).  So NBD will have the
bdrv_co_pwrite_zeroes callback, but that doesn't mean it will always
work - if the server doesn't support it, calling bdrv_co_pwrite_zeroes
will fail with -ENOTSUP and fall back to less-efficient writes that need
to be accounted for.
Stefan Hajnoczi June 17, 2016, 1:56 p.m. UTC | #6
On Thu, Jun 16, 2016 at 08:53:12PM -0600, Eric Blake wrote:
> On 06/16/2016 04:10 AM, Stefan Hajnoczi wrote:
> 
> > 
> > [...]
> 
> That's not sufficient.  NBD will have conditional support for write
> zeroes, depending on whether the server supports it (that is, once my
> patches for NBD_CMD_WRITE_ZEROES get out of a holding pattern on all the
> other patches in the queue being flushed...).  So NBD will have the
> bdrv_co_pwrite_zeroes callback, but that doesn't mean it will always
> work - if the server doesn't support it, calling bdrv_co_pwrite_zeroes
> will fail with -ENOTSUP and fall back to less-efficient writes that need
> to be accounted for.

Good point!  Optimizations are tricky :-).

Stefan

Patch

diff --git a/block/mirror.c b/block/mirror.c
index c2f8773..d8be80a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -363,7 +363,7 @@  static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
     bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
     while (nb_chunks > 0 && sector_num < end) {
         int ret;
-        int io_sectors;
+        int io_sectors, io_sectors_acct;
         BlockDriverState *file;
         enum MirrorMethod {
             MIRROR_METHOD_COPY,
@@ -399,12 +399,15 @@  static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         switch (mirror_method) {
         case MIRROR_METHOD_COPY:
             io_sectors = mirror_do_read(s, sector_num, io_sectors);
+            io_sectors_acct = io_sectors;
             break;
         case MIRROR_METHOD_ZERO:
             mirror_do_zero_or_discard(s, sector_num, io_sectors, false);
+            io_sectors_acct = 0;
             break;
         case MIRROR_METHOD_DISCARD:
             mirror_do_zero_or_discard(s, sector_num, io_sectors, true);
+            io_sectors_acct = 0;
             break;
         default:
             abort();
@@ -412,7 +415,7 @@  static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
         assert(io_sectors);
         sector_num += io_sectors;
         nb_chunks -= DIV_ROUND_UP(io_sectors, sectors_per_chunk);
-        delay_ns += ratelimit_calculate_delay(&s->limit, io_sectors);
+        delay_ns += ratelimit_calculate_delay(&s->limit, io_sectors_acct);
     }
     return delay_ns;
 }