diff mbox

[1/3] block: ignore flush requests when storage is clean

Message ID 1466780802-30424-2-git-send-email-den@openvz.org
State New
Headers show

Commit Message

Denis V. Lunev June 24, 2016, 3:06 p.m. UTC
From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>

Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.

This change introduces a dirty flag in BlockDriverState which is set
in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
avoid unnesessary flushing when storage is clean.

The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
---
 block.c                   |  1 +
 block/dirty-bitmap.c      |  3 +++
 block/io.c                | 19 +++++++++++++++++++
 include/block/block_int.h |  2 ++
 4 files changed, 25 insertions(+)

Comments

Eric Blake June 24, 2016, 3:31 p.m. UTC | #1
On 06/24/2016 09:06 AM, Denis V. Lunev wrote:
> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> 
> Some guests (win2008 server for example) do a lot of unnecessary
> flushing when underlying media has not changed. This adds additional
> overhead on host when calling fsync/fdatasync.
> 
> This change introduces a dirty flag in BlockDriverState which is set
> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
> avoid unnesessary flushing when storage is clean.

s/unnesessary/unnecessary/ (I pointed this out against v2
https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg05817.html,
which makes me wonder if anything else was missed)

> 
> The problem with excessive flushing was found by a performance test
> which does parallel directory tree creation (from 2 processes).
> Results improved from 0.424 loops/sec to 0.432 loops/sec.
> Each loop creates 10^3 directories with 10 files in each.
> 
> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> ---

> +++ b/include/block/block_int.h
> @@ -418,6 +418,8 @@ struct BlockDriverState {
>      int sg;        /* if true, the device is a /dev/sg* */
>      int copy_on_read; /* if true, copy read backing sectors into image
>                           note this is a reference count */
> +
> +    bool dirty;
>      bool probed;
>  

Conflicts with the current state of Kevin's block branch (due to my
reordering and conversion of bool parameters); so you'll want to rebase.
Evgeny Yakovlev June 24, 2016, 3:54 p.m. UTC | #2
On 24.06.2016 18:31, Eric Blake wrote:
> On 06/24/2016 09:06 AM, Denis V. Lunev wrote:
>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>
>> Some guests (win2008 server for example) do a lot of unnecessary
>> flushing when underlying media has not changed. This adds additional
>> overhead on host when calling fsync/fdatasync.
>>
>> This change introduces a dirty flag in BlockDriverState which is set
>> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
>> avoid unnesessary flushing when storage is clean.
> s/unnesessary/unnecessary/ (I pointed this out against v2
> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg05817.html,
> which makes me wonder if anything else was missed)

Yeah, i fixed that but messed up committing a change in commit message. 
Will be fixed in rebased version.

>> The problem with excessive flushing was found by a performance test
>> which does parallel directory tree creation (from 2 processes).
>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>> Each loop creates 10^3 directories with 10 files in each.
>>
>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <famz@redhat.com>
>> CC: John Snow <jsnow@redhat.com>
>> ---
>> +++ b/include/block/block_int.h
>> @@ -418,6 +418,8 @@ struct BlockDriverState {
>>       int sg;        /* if true, the device is a /dev/sg* */
>>       int copy_on_read; /* if true, copy read backing sectors into image
>>                            note this is a reference count */
>> +
>> +    bool dirty;
>>       bool probed;
>>   
> Conflicts with the current state of Kevin's block branch (due to my
> reordering and conversion of bool parameters); so you'll want to rebase.
>

Ok
Paolo Bonzini June 28, 2016, 9:01 p.m. UTC | #3
On 24/06/2016 17:06, Denis V. Lunev wrote:
> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> 
> Some guests (win2008 server for example) do a lot of unnecessary
> flushing when underlying media has not changed. This adds additional
> overhead on host when calling fsync/fdatasync.
> 
> This change introduces a dirty flag in BlockDriverState which is set
> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
> avoid unnesessary flushing when storage is clean.
> 
> The problem with excessive flushing was found by a performance test
> which does parallel directory tree creation (from 2 processes).
> Results improved from 0.424 loops/sec to 0.432 loops/sec.
> Each loop creates 10^3 directories with 10 files in each.
> 
> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> ---
>  block.c                   |  1 +
>  block/dirty-bitmap.c      |  3 +++
>  block/io.c                | 19 +++++++++++++++++++
>  include/block/block_int.h |  2 ++
>  4 files changed, 25 insertions(+)
> 
> diff --git a/block.c b/block.c
> index f4648e9..e36f148 100644
> --- a/block.c
> +++ b/block.c
> @@ -2582,6 +2582,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>          ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>          bdrv_dirty_bitmap_truncate(bs);
>          bdrv_parent_cb_resize(bs);
> +        bs->dirty = true; /* file node sync is needed after truncate */
>      }
>      return ret;
>  }
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 4902ca5..54e0413 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>          }
>          hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>      }
> +
> +    /* Set global block driver dirty flag even if bitmap is disabled */
> +    bs->dirty = true;
>  }
>  
>  /**
> diff --git a/block/io.c b/block/io.c
> index 7cf3645..8078af2 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2239,6 +2239,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>          goto flush_parent;
>      }
>  
> +    /* Check if storage is actually dirty before flushing to disk */
> +    if (!bs->dirty) {
> +        /* Flush requests are appended to tracked request list in order so that
> +         * most recent request is at the head of the list. Following code uses
> +         * this ordering to wait for the most recent flush request to complete
> +         * to ensure that requests return in order */
> +        BdrvTrackedRequest *prev_req;
> +        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
> +            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
> +                continue;
> +            }
> +
> +            qemu_co_queue_wait(&prev_req->wait_queue);
> +            break;
> +        }
> +        goto flush_parent;

Can you just have a CoQueue specific to flushes, where a completing
flush does a restart_all on the CoQueue?

Flushes are never serialising, so there's no reason for them to be in
tracked_requests (I posted patches a while ago that instead use a simple
atomic counter, but they will only be in 2.8).

Paolo

> +    }
> +    bs->dirty = false;
> +
>      BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
>      if (bs->drv->bdrv_co_flush_to_disk) {
>          ret = bs->drv->bdrv_co_flush_to_disk(bs);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 2057156..616058b 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -418,6 +418,8 @@ struct BlockDriverState {
>      int sg;        /* if true, the device is a /dev/sg* */
>      int copy_on_read; /* if true, copy read backing sectors into image
>                           note this is a reference count */
> +
> +    bool dirty;
>      bool probed;
>  
>      BlockDriver *drv; /* NULL means no media */
>
Paolo Bonzini June 29, 2016, 7:36 a.m. UTC | #4
On 28/06/2016 23:01, Paolo Bonzini wrote:
> 
> 
> On 24/06/2016 17:06, Denis V. Lunev wrote:
>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>
>> Some guests (win2008 server for example) do a lot of unnecessary
>> flushing when underlying media has not changed. This adds additional
>> overhead on host when calling fsync/fdatasync.
>>
>> This change introduces a dirty flag in BlockDriverState which is set
>> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
>> avoid unnesessary flushing when storage is clean.
>>
>> The problem with excessive flushing was found by a performance test
>> which does parallel directory tree creation (from 2 processes).
>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>> Each loop creates 10^3 directories with 10 files in each.
>>
>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <famz@redhat.com>
>> CC: John Snow <jsnow@redhat.com>
>> ---
>>  block.c                   |  1 +
>>  block/dirty-bitmap.c      |  3 +++
>>  block/io.c                | 19 +++++++++++++++++++
>>  include/block/block_int.h |  2 ++
>>  4 files changed, 25 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index f4648e9..e36f148 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2582,6 +2582,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>>          ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>>          bdrv_dirty_bitmap_truncate(bs);
>>          bdrv_parent_cb_resize(bs);
>> +        bs->dirty = true; /* file node sync is needed after truncate */
>>      }
>>      return ret;
>>  }
>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
>> index 4902ca5..54e0413 100644
>> --- a/block/dirty-bitmap.c
>> +++ b/block/dirty-bitmap.c
>> @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>>          }
>>          hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>>      }
>> +
>> +    /* Set global block driver dirty flag even if bitmap is disabled */
>> +    bs->dirty = true;
>>  }
>>  
>>  /**
>> diff --git a/block/io.c b/block/io.c
>> index 7cf3645..8078af2 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -2239,6 +2239,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>          goto flush_parent;
>>      }
>>  
>> +    /* Check if storage is actually dirty before flushing to disk */
>> +    if (!bs->dirty) {
>> +        /* Flush requests are appended to tracked request list in order so that
>> +         * most recent request is at the head of the list. Following code uses
>> +         * this ordering to wait for the most recent flush request to complete
>> +         * to ensure that requests return in order */
>> +        BdrvTrackedRequest *prev_req;
>> +        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
>> +            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
>> +                continue;
>> +            }
>> +
>> +            qemu_co_queue_wait(&prev_req->wait_queue);
>> +            break;
>> +        }
>> +        goto flush_parent;
> 
> Can you just have a CoQueue specific to flushes, where a completing
> flush does a restart_all on the CoQueue?

Something like this:

    current_gen = bs->write_gen;
    if (bs->flush_started_gen >= current_gen) {
        while (bs->flushed_gen < current_gen) {
            qemu_co_queue_wait(&bs->flush_queue);
        }
        return;
    }

    bs->flush_started_gen = current_gen;
    ...
    if (current_gen < bs->flushed_gen) {
        bs->flushed_gen = current_gen;
        qemu_co_queue_restart_all(&bs->flush_queue);
    }

Paolo

> Flushes are never serialising, so there's no reason for them to be in
> tracked_requests (I posted patches a while ago that instead use a simple
> atomic counter, but they will only be in 2.8).
Denis V. Lunev June 29, 2016, 8:32 a.m. UTC | #5
On 06/29/2016 10:36 AM, Paolo Bonzini wrote:
>
> On 28/06/2016 23:01, Paolo Bonzini wrote:
>>
>> On 24/06/2016 17:06, Denis V. Lunev wrote:
>>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>>
>>> Some guests (win2008 server for example) do a lot of unnecessary
>>> flushing when underlying media has not changed. This adds additional
>>> overhead on host when calling fsync/fdatasync.
>>>
>>> This change introduces a dirty flag in BlockDriverState which is set
>>> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
>>> avoid unnesessary flushing when storage is clean.
>>>
>>> The problem with excessive flushing was found by a performance test
>>> which does parallel directory tree creation (from 2 processes).
>>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>>> Each loop creates 10^3 directories with 10 files in each.
>>>
>>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> CC: Kevin Wolf <kwolf@redhat.com>
>>> CC: Max Reitz <mreitz@redhat.com>
>>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>>> CC: Fam Zheng <famz@redhat.com>
>>> CC: John Snow <jsnow@redhat.com>
>>> ---
>>>   block.c                   |  1 +
>>>   block/dirty-bitmap.c      |  3 +++
>>>   block/io.c                | 19 +++++++++++++++++++
>>>   include/block/block_int.h |  2 ++
>>>   4 files changed, 25 insertions(+)
>>>
>>> diff --git a/block.c b/block.c
>>> index f4648e9..e36f148 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -2582,6 +2582,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>>>           ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>>>           bdrv_dirty_bitmap_truncate(bs);
>>>           bdrv_parent_cb_resize(bs);
>>> +        bs->dirty = true; /* file node sync is needed after truncate */
>>>       }
>>>       return ret;
>>>   }
>>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
>>> index 4902ca5..54e0413 100644
>>> --- a/block/dirty-bitmap.c
>>> +++ b/block/dirty-bitmap.c
>>> @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>>>           }
>>>           hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>>>       }
>>> +
>>> +    /* Set global block driver dirty flag even if bitmap is disabled */
>>> +    bs->dirty = true;
>>>   }
>>>   
>>>   /**
>>> diff --git a/block/io.c b/block/io.c
>>> index 7cf3645..8078af2 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -2239,6 +2239,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>>           goto flush_parent;
>>>       }
>>>   
>>> +    /* Check if storage is actually dirty before flushing to disk */
>>> +    if (!bs->dirty) {
>>> +        /* Flush requests are appended to tracked request list in order so that
>>> +         * most recent request is at the head of the list. Following code uses
>>> +         * this ordering to wait for the most recent flush request to complete
>>> +         * to ensure that requests return in order */
>>> +        BdrvTrackedRequest *prev_req;
>>> +        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
>>> +            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
>>> +                continue;
>>> +            }
>>> +
>>> +            qemu_co_queue_wait(&prev_req->wait_queue);
>>> +            break;
>>> +        }
>>> +        goto flush_parent;
>> Can you just have a CoQueue specific to flushes, where a completing
>> flush does a restart_all on the CoQueue?
> Something like this:
>
>      current_gen = bs->write_gen;
>      if (bs->flush_started_gen >= current_gen) {
>          while (bs->flushed_gen < current_gen) {
>              qemu_co_queue_wait(&bs->flush_queue);
>          }
>          return;
>      }
>
>      bs->flush_started_gen = current_gen;
>      ...
>      if (current_gen < bs->flushed_gen) {
>          bs->flushed_gen = current_gen;
>          qemu_co_queue_restart_all(&bs->flush_queue);
>      }
>
> Paolo
>
I have had exactly this inn mind originally but current queue
with tracked requests is also useful. If it is going to be removed
in 2.8, generation approach would also work.

Thank you,
     Den
diff mbox

Patch

diff --git a/block.c b/block.c
index f4648e9..e36f148 100644
--- a/block.c
+++ b/block.c
@@ -2582,6 +2582,7 @@  int bdrv_truncate(BlockDriverState *bs, int64_t offset)
         ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
         bdrv_dirty_bitmap_truncate(bs);
         bdrv_parent_cb_resize(bs);
+        bs->dirty = true; /* file node sync is needed after truncate */
     }
     return ret;
 }
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 4902ca5..54e0413 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -370,6 +370,9 @@  void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
         }
         hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
     }
+
+    /* Set global block driver dirty flag even if bitmap is disabled */
+    bs->dirty = true;
 }
 
 /**
diff --git a/block/io.c b/block/io.c
index 7cf3645..8078af2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2239,6 +2239,25 @@  int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
         goto flush_parent;
     }
 
+    /* Check if storage is actually dirty before flushing to disk */
+    if (!bs->dirty) {
+        /* Flush requests are appended to tracked request list in order so that
+         * most recent request is at the head of the list. Following code uses
+         * this ordering to wait for the most recent flush request to complete
+         * to ensure that requests return in order */
+        BdrvTrackedRequest *prev_req;
+        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
+            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
+                continue;
+            }
+
+            qemu_co_queue_wait(&prev_req->wait_queue);
+            break;
+        }
+        goto flush_parent;
+    }
+    bs->dirty = false;
+
     BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
     if (bs->drv->bdrv_co_flush_to_disk) {
         ret = bs->drv->bdrv_co_flush_to_disk(bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 2057156..616058b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -418,6 +418,8 @@  struct BlockDriverState {
     int sg;        /* if true, the device is a /dev/sg* */
     int copy_on_read; /* if true, copy read backing sectors into image
                          note this is a reference count */
+
+    bool dirty;
     bool probed;
 
     BlockDriver *drv; /* NULL means no media */