
[v2,09/11] block/backup: teach TOP to never copy unallocated regions

Message ID 20190716000117.25219-10-jsnow@redhat.com
State: New
Series: bitmaps: allow bitmaps to be used with full and top

Commit Message

John Snow July 16, 2019, 12:01 a.m. UTC
Presently, if sync=TOP is selected, we mark the entire bitmap as dirty.
In the write notifier handler, we dutifully copy out such regions.

Fix this in three parts:

1. Mark the bitmap as being initialized before the first yield.
2. After the first yield but before the backup loop, interrogate the
allocation status asynchronously and initialize the bitmap.
3. Teach the write notifier to interrogate allocation status if it is
invoked during bitmap initialization.

As an effect of this patch, the job progress for TOP backups
now behaves like this:

- total progress starts at bdrv_length.
- As allocation status is interrogated, total progress decreases.
- As blocks are copied, current progress increases.

Taken together, the floor and ceiling move to meet each other.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/backup.c     | 78 ++++++++++++++++++++++++++++++++++++++++------
 block/trace-events |  1 +
 2 files changed, 70 insertions(+), 9 deletions(-)
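The floor-and-ceiling progress model described in the commit message can be condensed into a standalone sketch. This is illustrative code, not QEMU's: all names (`Progress`, `scan_allocation`, `copy_allocated`) are made up for the example; only the accounting behavior mirrors the patch (the ceiling drops as unallocated clusters are discovered, the floor rises as clusters are copied).

```c
#include <stdint.h>

/* Standalone model of the TOP-mode progress accounting (names are
 * illustrative, not QEMU's). total is the ceiling, current the floor. */
typedef struct Progress {
    int64_t current;  /* bytes copied so far (floor) */
    int64_t total;    /* estimated bytes left to copy (ceiling) */
} Progress;

/* Scan phase: every unallocated cluster lowers the ceiling, mirroring
 * the job_progress_set_remaining() call made after the bitmap reset. */
static void scan_allocation(Progress *p, const int *alloc_map,
                            int nb_clusters, int64_t cluster_size)
{
    int64_t remaining = (int64_t)nb_clusters * cluster_size;

    p->current = 0;
    p->total = remaining;               /* starts at bdrv_length */
    for (int i = 0; i < nb_clusters; i++) {
        if (!alloc_map[i]) {            /* 0 = unallocated: skip it */
            remaining -= cluster_size;
            p->total = p->current + remaining;
        }
    }
}

/* Copy phase: every copied cluster raises the floor. */
static void copy_allocated(Progress *p, const int *alloc_map,
                           int nb_clusters, int64_t cluster_size)
{
    for (int i = 0; i < nb_clusters; i++) {
        if (alloc_map[i]) {             /* 1 = allocated: copy it */
            p->current += cluster_size;
        }
    }
}
```

With a map of five clusters of which two are unallocated, the ceiling settles at three clusters' worth of bytes during the scan, and the floor meets it after the copy loop.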

Comments

Max Reitz July 16, 2019, 11:43 a.m. UTC | #1
On 16.07.19 02:01, John Snow wrote:
> Presently, if sync=TOP is selected, we mark the entire bitmap as dirty.
> In the write notifier handler, we dutifully copy out such regions.
> 
> Fix this in three parts:
> 
> 1. Mark the bitmap as being initialized before the first yield.
> 2. After the first yield but before the backup loop, interrogate the
> allocation status asynchronously and initialize the bitmap.
> 3. Teach the write notifier to interrogate allocation status if it is
> invoked during bitmap initialization.
> 
> As an effect of this patch, the job progress for TOP backups
> now behaves like this:
> 
> - total progress starts at bdrv_length.
> - As allocation status is interrogated, total progress decreases.
> - As blocks are copied, current progress increases.
> 
> Taken together, the floor and ceiling move to meet each other.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/backup.c     | 78 ++++++++++++++++++++++++++++++++++++++++------
>  block/trace-events |  1 +
>  2 files changed, 70 insertions(+), 9 deletions(-)

Looks good to me but for a seemingly unrelated change:

> diff --git a/block/backup.c b/block/backup.c
> index b407d57954..e28fd23f6a 100644
> --- a/block/backup.c
> +++ b/block/backup.c

[...]

> @@ -507,10 +565,12 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>               * notify callback service CoW requests. */
>              job_yield(job);
>          }
> +        ret = -ECANCELED;

This one.  This doesn’t look like it belongs in this patch, and I’m not
even sure it’s correct.  Being cancelled is the normal state for
sync=none, so I suppose it is correct to just return 0 then.

Max

>      } else {
>          ret = backup_loop(s);
>      }
>  
> + out:
>      notifier_with_return_remove(&s->before_write);
>  
>      /* wait until pending backup_do_cow() calls have completed */
John Snow July 16, 2019, 4:02 p.m. UTC | #2
On 7/16/19 7:43 AM, Max Reitz wrote:
> On 16.07.19 02:01, John Snow wrote:
>> Presently, if sync=TOP is selected, we mark the entire bitmap as dirty.
>> In the write notifier handler, we dutifully copy out such regions.
>>
>> Fix this in three parts:
>>
>> 1. Mark the bitmap as being initialized before the first yield.
>> 2. After the first yield but before the backup loop, interrogate the
>> allocation status asynchronously and initialize the bitmap.
>> 3. Teach the write notifier to interrogate allocation status if it is
>> invoked during bitmap initialization.
>>
>> As an effect of this patch, the job progress for TOP backups
>> now behaves like this:
>>
>> - total progress starts at bdrv_length.
>> - As allocation status is interrogated, total progress decreases.
>> - As blocks are copied, current progress increases.
>>
>> Taken together, the floor and ceiling move to meet each other.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  block/backup.c     | 78 ++++++++++++++++++++++++++++++++++++++++------
>>  block/trace-events |  1 +
>>  2 files changed, 70 insertions(+), 9 deletions(-)
> 
> Looks good to me but for a seemingly unrelated change:
> 
>> diff --git a/block/backup.c b/block/backup.c
>> index b407d57954..e28fd23f6a 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
> 
> [...]
> 
>> @@ -507,10 +565,12 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>               * notify callback service CoW requests. */
>>              job_yield(job);
>>          }
>> +        ret = -ECANCELED;
> 
> This one.  This doesn’t look like it belongs in this patch, and I’m not
> even sure it’s correct.  Being cancelled is the normal state for
> sync=none, so I suppose it is correct to just return 0 then.
> 
> Max
> 
Yeah, this is wiggly, so... yes, we can return 0 here. The job
infrastructure machinery is going to change it to an ECANCELED for us
anyway:

job_completed
  job_update_rc
    if (!job->ret && job_is_cancelled(job)) {
        job->ret = -ECANCELED;
    }

So in this case I just figured that I might as well make it explicit;
this is an error exit.

(I guess just leaving it at 0 means "whatever the job machinery thinks"
too, which is probably also fine. The job machinery does not distinguish
between "canceled and 0" and "canceled and < 0".)
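The normalization John quotes from the job machinery can be modeled in a few lines. This is a sketch of the quoted `job_update_rc` logic, not the actual job.c code; the `Job` struct here is a stand-in for illustration only.

```c
#include <errno.h>
#include <stdbool.h>

/* Sketch of the return-code normalization quoted above: a job that was
 * cancelled but returned 0 gets its return code rewritten to -ECANCELED,
 * which is why the machinery cannot tell "canceled and 0" apart from
 * "canceled and < 0". */
typedef struct Job {
    int ret;         /* job's own return code */
    bool cancelled;  /* stand-in for job_is_cancelled() */
} Job;

static void job_update_rc(Job *job)
{
    if (!job->ret && job->cancelled) {
        job->ret = -ECANCELED;
    }
}
```

A pre-existing error code survives untouched; only a "successful" cancellation is rewritten, so setting `ret = -ECANCELED` explicitly in backup_run() is redundant but harmless.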


Since we're here, though... I was wondering if it shouldn't be the case
that "canceling" a sync=none job should actually result in success,
unless you force-cancel. OR, allow sync=none jobs to receive "COMPLETE"
verbs to finish successfully, or "CANCEL" verbs to terminate with error.

(I don't like what mirror does and don't wish to mimic it. I continue to
dislike the idea that canceling a ready mirror job allows it to complete
with a successful error code.)

>>      } else {
>>          ret = backup_loop(s);
>>      }
>>  
>> + out:
>>      notifier_with_return_remove(&s->before_write);
>>  
>>      /* wait until pending backup_do_cow() calls have completed */
>
Max Reitz July 16, 2019, 4:11 p.m. UTC | #3
On 16.07.19 18:02, John Snow wrote:
> 
> 
> On 7/16/19 7:43 AM, Max Reitz wrote:
>> On 16.07.19 02:01, John Snow wrote:
>>> Presently, if sync=TOP is selected, we mark the entire bitmap as dirty.
>>> In the write notifier handler, we dutifully copy out such regions.
>>>
>>> Fix this in three parts:
>>>
>>> 1. Mark the bitmap as being initialized before the first yield.
>>> 2. After the first yield but before the backup loop, interrogate the
>>> allocation status asynchronously and initialize the bitmap.
>>> 3. Teach the write notifier to interrogate allocation status if it is
>>> invoked during bitmap initialization.
>>>
>>> As an effect of this patch, the job progress for TOP backups
>>> now behaves like this:
>>>
>>> - total progress starts at bdrv_length.
>>> - As allocation status is interrogated, total progress decreases.
>>> - As blocks are copied, current progress increases.
>>>
>>> Taken together, the floor and ceiling move to meet each other.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>  block/backup.c     | 78 ++++++++++++++++++++++++++++++++++++++++------
>>>  block/trace-events |  1 +
>>>  2 files changed, 70 insertions(+), 9 deletions(-)
>>
>> Looks good to me but for a seemingly unrelated change:
>>
>>> diff --git a/block/backup.c b/block/backup.c
>>> index b407d57954..e28fd23f6a 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>
>> [...]
>>
>>> @@ -507,10 +565,12 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>               * notify callback service CoW requests. */
>>>              job_yield(job);
>>>          }
>>> +        ret = -ECANCELED;
>>
>> This one.  This doesn’t look like it belongs in this patch, and I’m not
>> even sure it’s correct.  Being cancelled is the normal state for
>> sync=none, so I suppose it is correct to just return 0 then.
>>
>> Max
>>
> Yeah, this is wiggly, so... yes, we can return 0 here. The job
> infrastructure machinery is going to change it to an ECANCELED for us
> anyway:
> 
> job_completed
>   job_update_rc
>     if (!job->ret && job_is_cancelled(job)) {
>         job->ret = -ECANCELED;
>     }
> 
> So in this case I just figured that I might as well make it explicit;
> this is an error exit.
> 
> (I guess just leaving it at 0 means "whatever the job machinery thinks"
> too, which is probably also fine. The job machinery does not distinguish
> between "canceled and 0" and "canceled and < 0".)

Hm, OK.  I think it should be an own patch, though.

> Since we're here, though... I was wondering if it shouldn't be the case
> that "canceling" a sync=none job should actually result in success,
> unless you force-cancel. OR, allow sync=none jobs to receive "COMPLETE"
> verbs to finish successfully, or "CANCEL" verbs to terminate with error.

That’s what I had thought.  (That canceling would be a success.)

As for COMPLETE, you want it to emit a READY event right when it’s
started? :-)

> (I don't like what mirror does and don't wish to mimic it. I continue to
> dislike the idea that canceling a ready mirror job allows it to complete
> with a successful error code.)

Hm.  Actually, I don’t even care that much.  If the user canceled it,
they probably won’t really look at the return code anyway...

Max

>>>      } else {
>>>          ret = backup_loop(s);
>>>      }
>>>  
>>> + out:
>>>      notifier_with_return_remove(&s->before_write);
>>>  
>>>      /* wait until pending backup_do_cow() calls have completed */
>>
John Snow July 17, 2019, 6:10 p.m. UTC | #4
On 7/16/19 12:11 PM, Max Reitz wrote:
> On 16.07.19 18:02, John Snow wrote:
>>
>>
>> On 7/16/19 7:43 AM, Max Reitz wrote:
>>> On 16.07.19 02:01, John Snow wrote:
>>>> Presently, if sync=TOP is selected, we mark the entire bitmap as dirty.
>>>> In the write notifier handler, we dutifully copy out such regions.
>>>>
>>>> Fix this in three parts:
>>>>
>>>> 1. Mark the bitmap as being initialized before the first yield.
>>>> 2. After the first yield but before the backup loop, interrogate the
>>>> allocation status asynchronously and initialize the bitmap.
>>>> 3. Teach the write notifier to interrogate allocation status if it is
>>>> invoked during bitmap initialization.
>>>>
>>>> As an effect of this patch, the job progress for TOP backups
>>>> now behaves like this:
>>>>
>>>> - total progress starts at bdrv_length.
>>>> - As allocation status is interrogated, total progress decreases.
>>>> - As blocks are copied, current progress increases.
>>>>
>>>> Taken together, the floor and ceiling move to meet each other.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>> ---
>>>>  block/backup.c     | 78 ++++++++++++++++++++++++++++++++++++++++------
>>>>  block/trace-events |  1 +
>>>>  2 files changed, 70 insertions(+), 9 deletions(-)
>>>
>>> Looks good to me but for a seemingly unrelated change:
>>>
>>>> diff --git a/block/backup.c b/block/backup.c
>>>> index b407d57954..e28fd23f6a 100644
>>>> --- a/block/backup.c
>>>> +++ b/block/backup.c
>>>
>>> [...]
>>>
>>>> @@ -507,10 +565,12 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
>>>>               * notify callback service CoW requests. */
>>>>              job_yield(job);
>>>>          }
>>>> +        ret = -ECANCELED;
>>>
>>> This one.  This doesn’t look like it belongs in this patch, and I’m not
>>> even sure it’s correct.  Being cancelled is the normal state for
>>> sync=none, so I suppose it is correct to just return 0 then.
>>>
>>> Max
>>>
>> Yeah, this is wiggly, so... yes, we can return 0 here. The job
>> infrastructure machinery is going to change it to an ECANCELED for us
>> anyway:
>>
>> job_completed
>>   job_update_rc
>>     if (!job->ret && job_is_cancelled(job)) {
>>         job->ret = -ECANCELED;
>>     }
>>
>> So in this case I just figured that I might as well make it explicit;
>> this is an error exit.
>>
>> (I guess just leaving it at 0 means "whatever the job machinery thinks"
>> too, which is probably also fine. The job machinery does not distinguish
>> between "canceled and 0" and "canceled and < 0".)
> 
> Hm, OK.  I think it should be an own patch, though.
> 

Path of least edits.

I'm removing this one change (since it doesn't change anything anyway),
making the error message capitalization fix in the reference output, and
staging the series.

--js

Patch

diff --git a/block/backup.c b/block/backup.c
index b407d57954..e28fd23f6a 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -58,6 +58,7 @@  typedef struct BackupBlockJob {
     int64_t copy_range_size;
 
     bool serialize_target_writes;
+    bool initializing_bitmap;
 } BackupBlockJob;
 
 static const BlockJobDriver backup_job_driver;
@@ -227,6 +228,35 @@  static int backup_is_cluster_allocated(BackupBlockJob *s, int64_t offset,
     }
 }
 
+/**
+ * Reset bits in copy_bitmap starting at offset if they represent unallocated
+ * data in the image. May reset subsequent contiguous bits.
+ * @return 0 when the cluster at @offset was unallocated,
+ *         1 otherwise, and a negative errno on error.
+ */
+static int64_t backup_bitmap_reset_unallocated(BackupBlockJob *s,
+                                               int64_t offset, int64_t *count)
+{
+    int ret;
+    int64_t clusters, bytes, estimate;
+
+    ret = backup_is_cluster_allocated(s, offset, &clusters);
+    if (ret < 0) {
+        return ret;
+    }
+
+    bytes = clusters * s->cluster_size;
+
+    if (!ret) {
+        bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
+        estimate = bdrv_get_dirty_count(s->copy_bitmap);
+        job_progress_set_remaining(&s->common.job, estimate);
+    }
+
+    *count = bytes;
+    return ret;
+}
+
 static int coroutine_fn backup_do_cow(BackupBlockJob *job,
                                       int64_t offset, uint64_t bytes,
                                       bool *error_is_read,
@@ -236,6 +266,7 @@  static int coroutine_fn backup_do_cow(BackupBlockJob *job,
     int ret = 0;
     int64_t start, end; /* bytes */
     void *bounce_buffer = NULL;
+    int64_t skip_bytes;
 
     qemu_co_rwlock_rdlock(&job->flush_rwlock);
 
@@ -254,6 +285,15 @@  static int coroutine_fn backup_do_cow(BackupBlockJob *job,
             continue; /* already copied */
         }
 
+        if (job->initializing_bitmap) {
+            ret = backup_bitmap_reset_unallocated(job, start, &skip_bytes);
+            if (ret == 0) {
+                trace_backup_do_cow_skip_range(job, start, skip_bytes);
+                start += skip_bytes;
+                continue;
+            }
+        }
+
         trace_backup_do_cow_process(job, start);
 
         if (job->use_copy_range) {
@@ -436,18 +476,9 @@  static int coroutine_fn backup_loop(BackupBlockJob *job)
     int64_t offset;
     BdrvDirtyBitmapIter *bdbi;
     int ret = 0;
-    int64_t dummy;
 
     bdbi = bdrv_dirty_iter_new(job->copy_bitmap);
     while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
-        if (job->sync_mode == MIRROR_SYNC_MODE_TOP &&
-            !backup_is_cluster_allocated(job, offset, &dummy))
-        {
-            bdrv_reset_dirty_bitmap(job->copy_bitmap, offset,
-                                    job->cluster_size);
-            continue;
-        }
-
         do {
             if (yield_and_check(job)) {
                 goto out;
@@ -478,6 +509,13 @@  static void backup_init_copy_bitmap(BackupBlockJob *job)
                                                NULL, true);
         assert(ret);
     } else {
+        if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
+            /*
+             * We can't hog the coroutine to initialize this thoroughly.
+             * Set a flag and resume work when we are able to yield safely.
+             */
+            job->initializing_bitmap = true;
+        }
         bdrv_set_dirty_bitmap(job->copy_bitmap, 0, job->len);
     }
 
@@ -499,6 +537,26 @@  static int coroutine_fn backup_run(Job *job, Error **errp)
     s->before_write.notify = backup_before_write_notify;
     bdrv_add_before_write_notifier(bs, &s->before_write);
 
+    if (s->sync_mode == MIRROR_SYNC_MODE_TOP) {
+        int64_t offset = 0;
+        int64_t count;
+
+        for (offset = 0; offset < s->len; ) {
+            if (yield_and_check(s)) {
+                ret = -ECANCELED;
+                goto out;
+            }
+
+            ret = backup_bitmap_reset_unallocated(s, offset, &count);
+            if (ret < 0) {
+                goto out;
+            }
+
+            offset += count;
+        }
+        s->initializing_bitmap = false;
+    }
+
     if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
         /* All bits are set in copy_bitmap to allow any cluster to be copied.
          * This does not actually require them to be copied. */
@@ -507,10 +565,12 @@  static int coroutine_fn backup_run(Job *job, Error **errp)
              * notify callback service CoW requests. */
             job_yield(job);
         }
+        ret = -ECANCELED;
     } else {
         ret = backup_loop(s);
     }
 
+ out:
     notifier_with_return_remove(&s->before_write);
 
     /* wait until pending backup_do_cow() calls have completed */
diff --git a/block/trace-events b/block/trace-events
index d724df0117..04209f058d 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -41,6 +41,7 @@  mirror_yield_in_flight(void *s, int64_t offset, int in_flight) "s %p offset %" P
 backup_do_cow_enter(void *job, int64_t start, int64_t offset, uint64_t bytes) "job %p start %" PRId64 " offset %" PRId64 " bytes %" PRIu64
 backup_do_cow_return(void *job, int64_t offset, uint64_t bytes, int ret) "job %p offset %" PRId64 " bytes %" PRIu64 " ret %d"
 backup_do_cow_skip(void *job, int64_t start) "job %p start %"PRId64
+backup_do_cow_skip_range(void *job, int64_t start, uint64_t bytes) "job %p start %"PRId64" bytes %"PRIu64
 backup_do_cow_process(void *job, int64_t start) "job %p start %"PRId64
 backup_do_cow_read_fail(void *job, int64_t start, int ret) "job %p start %"PRId64" ret %d"
 backup_do_cow_write_fail(void *job, int64_t start, int ret) "job %p start %"PRId64" ret %d"