diff mbox

[v10,02/10] Backup: clear all bitmap when doing block checkpoint

Message ID 1443161858-20533-3-git-send-email-wency@cn.fujitsu.com
State New
Headers show

Commit Message

Wen Congyang Sept. 25, 2015, 6:17 a.m. UTC
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
---
 block/backup.c           | 14 ++++++++++++++
 blockjob.c               | 11 +++++++++++
 include/block/blockjob.h | 12 ++++++++++++
 3 files changed, 37 insertions(+)

Comments

Eric Blake Oct. 9, 2015, 7:09 p.m. UTC | #1
On 09/25/2015 12:17 AM, Wen Congyang wrote:

s/bitmap/bitmaps/ in the subject line.

The subject line says "what", but you are missing a commit body that
says "why".

> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>
> ---
Stefan Hajnoczi Oct. 12, 2015, 1:45 p.m. UTC | #2
On Fri, Sep 25, 2015 at 02:17:30PM +0800, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> Reviewed-by: Jeff Cody <jcody@redhat.com>
> ---
>  block/backup.c           | 14 ++++++++++++++
>  blockjob.c               | 11 +++++++++++
>  include/block/blockjob.h | 12 ++++++++++++
>  3 files changed, 37 insertions(+)
> 
> diff --git a/block/backup.c b/block/backup.c
> index c61e4c3..5e5995e 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -214,11 +214,25 @@ static void backup_iostatus_reset(BlockJob *job)
>      }
>  }
>  
> +static void backup_do_checkpoint(BlockJob *job, Error **errp)
> +{
> +    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
> +
> +    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
> +        error_setg(errp, "The backup job only supports block checkpoint in"
> +                   " sync=none mode");
> +        return;
> +    }
> +
> +    hbitmap_reset_all(backup_job->bitmap);
> +}

Is this a fast way to stop and then start a new backup blockjob without
emitting block job lifecycle events?

Not sure the blockjob_do_checkpoint() interface is appropriate.  Is
there any other block job type that will implement .do_checkpoint()?

COLO block replication could call a public backup_do_checkpoint()
function.  That way the direct coupling between COLO and the backup
block job is obvious.  I'm not convinced a generic interface like
blockjob_do_checkpoint() makes sense since it's really not a generic
operation that makes sense for other block job types.

void backup_do_checkpoint(BlockJob *job, Error **errp)
{
    BackupBlockJob *s;

    if (job->driver != backup_job_driver) {
        error_setg(errp, "expected backup block job type for "
	           "checkpoint, got %d", job->driver->job_type);
        return;
    }

    s = container_of(job, BackupBlockJob, common);
    ...
}

Please also make the function name and documentation more specific.
Instead of "do" maybe this should be "pre" or "post" to indicate whether
this happens before or after the checkpoint commit.  What happens if
this function returns an error?
Wen Congyang Oct. 13, 2015, 9:13 a.m. UTC | #3
On 10/12/2015 09:45 PM, Stefan Hajnoczi wrote:
> On Fri, Sep 25, 2015 at 02:17:30PM +0800, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
>> Reviewed-by: Jeff Cody <jcody@redhat.com>
>> ---
>>  block/backup.c           | 14 ++++++++++++++
>>  blockjob.c               | 11 +++++++++++
>>  include/block/blockjob.h | 12 ++++++++++++
>>  3 files changed, 37 insertions(+)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index c61e4c3..5e5995e 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -214,11 +214,25 @@ static void backup_iostatus_reset(BlockJob *job)
>>      }
>>  }
>>  
>> +static void backup_do_checkpoint(BlockJob *job, Error **errp)
>> +{
>> +    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
>> +
>> +    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
>> +        error_setg(errp, "The backup job only supports block checkpoint in"
>> +                   " sync=none mode");
>> +        return;
>> +    }
>> +
>> +    hbitmap_reset_all(backup_job->bitmap);
>> +}
> 
> Is this a fast way to stop and then start a new backup blockjob without
> emitting block job lifecycle events?
> 
> Not sure the blockjob_do_checkpoint() interface is appropriate.  Is
> there any other block job type that will implement .do_checkpoint()?

Currently, the answer is no.

> 
> COLO block replication could call a public backup_do_checkpoint()
> function.  That way the direct coupling between COLO and the backup
> block job is obvious.  I'm not convinced a generic interface like
> blockjob_do_checkpoint() makes sense since it's really not a generic
> operation that makes sense for other block job types.
> 
> void backup_do_checkpoint(BlockJob *job, Error **errp)
> {
>     BackupBlockJob *s;
> 
>     if (job->driver != backup_job_driver) {
>         error_setg(errp, "expected backup block job type for "
> 	           "checkpoint, got %d", job->driver->job_type);
>         return;
>     }
> 
>     s = container_of(job, BackupBlockJob, common);
>     ...
> }

In a older version, I implement it like this, but Paolo didn't like it.

> 
> Please also make the function name and documentation more specific.
> Instead of "do" maybe this should be "pre" or "post" to indicate whether
> this happens before or after the checkpoint commit.  What happens if

OK

> this function returns an error?

We just return this error to COLO, and COLO will do failover.

Thanks
Wen Congyang

> .
>
Stefan Hajnoczi Oct. 14, 2015, 2:12 p.m. UTC | #4
On Tue, Oct 13, 2015 at 05:13:14PM +0800, Wen Congyang wrote:
> On 10/12/2015 09:45 PM, Stefan Hajnoczi wrote:
> > On Fri, Sep 25, 2015 at 02:17:30PM +0800, Wen Congyang wrote:
> >> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> >> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> >> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> >> Reviewed-by: Jeff Cody <jcody@redhat.com>
> >> ---
> >>  block/backup.c           | 14 ++++++++++++++
> >>  blockjob.c               | 11 +++++++++++
> >>  include/block/blockjob.h | 12 ++++++++++++
> >>  3 files changed, 37 insertions(+)
> >>
> >> diff --git a/block/backup.c b/block/backup.c
> >> index c61e4c3..5e5995e 100644
> >> --- a/block/backup.c
> >> +++ b/block/backup.c
> >> @@ -214,11 +214,25 @@ static void backup_iostatus_reset(BlockJob *job)
> >>      }
> >>  }
> >>  
> >> +static void backup_do_checkpoint(BlockJob *job, Error **errp)
> >> +{
> >> +    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
> >> +
> >> +    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
> >> +        error_setg(errp, "The backup job only supports block checkpoint in"
> >> +                   " sync=none mode");
> >> +        return;
> >> +    }
> >> +
> >> +    hbitmap_reset_all(backup_job->bitmap);
> >> +}
> > 
> > Is this a fast way to stop and then start a new backup blockjob without
> > emitting block job lifecycle events?
> > 
> > Not sure the blockjob_do_checkpoint() interface is appropriate.  Is
> > there any other block job type that will implement .do_checkpoint()?
> 
> Currently, the answer is no.
> 
> > 
> > COLO block replication could call a public backup_do_checkpoint()
> > function.  That way the direct coupling between COLO and the backup
> > block job is obvious.  I'm not convinced a generic interface like
> > blockjob_do_checkpoint() makes sense since it's really not a generic
> > operation that makes sense for other block job types.
> > 
> > void backup_do_checkpoint(BlockJob *job, Error **errp)
> > {
> >     BackupBlockJob *s;
> > 
> >     if (job->driver != backup_job_driver) {
> >         error_setg(errp, "expected backup block job type for "
> > 	           "checkpoint, got %d", job->driver->job_type);
> >         return;
> >     }
> > 
> >     s = container_of(job, BackupBlockJob, common);
> >     ...
> > }
> 
> In a older version, I implement it like this, but Paolo didn't like it.

It's a question of taste.  In this case it seems to me that there is
really a direct coupling between COLO and the backup block job.  This
isn't really a generic interface that makes sense in other scenarios.
That's why I advocate for direct coupling instead of pretending this is
a generic interface.

I wish COLO could just stop the existing block job and start a new one
for each checkpoint.  In reality we probably don't want QMP events and
the full block job lifecycle for each checkpoint...  But anyway, I like
this approach because it does not require a new interface at all.

> > Please also make the function name and documentation more specific.
> > Instead of "do" maybe this should be "pre" or "post" to indicate whether
> > this happens before or after the checkpoint commit.  What happens if
> 
> OK
> 
> > this function returns an error?
> 
> We just return this error to COLO, and COLO will do failover.

Okay, I ask these questions because the information should be part of
the doc comment for this new interface.

Stefan
diff mbox

Patch

diff --git a/block/backup.c b/block/backup.c
index c61e4c3..5e5995e 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -214,11 +214,25 @@  static void backup_iostatus_reset(BlockJob *job)
     }
 }
 
+static void backup_do_checkpoint(BlockJob *job, Error **errp)
+{
+    BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
+
+    if (backup_job->sync_mode != MIRROR_SYNC_MODE_NONE) {
+        error_setg(errp, "The backup job only supports block checkpoint in"
+                   " sync=none mode");
+        return;
+    }
+
+    hbitmap_reset_all(backup_job->bitmap);
+}
+
 static const BlockJobDriver backup_job_driver = {
     .instance_size  = sizeof(BackupBlockJob),
     .job_type       = BLOCK_JOB_TYPE_BACKUP,
     .set_speed      = backup_set_speed,
     .iostatus_reset = backup_iostatus_reset,
+    .do_checkpoint  = backup_do_checkpoint,
 };
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
diff --git a/blockjob.c b/blockjob.c
index ca4be94..ea4c44a 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -405,3 +405,14 @@  void block_job_defer_to_main_loop(BlockJob *job,
 
     qemu_bh_schedule(data->bh);
 }
+
+void block_job_do_checkpoint(BlockJob *job, Error **errp)
+{
+    if (!job->driver->do_checkpoint) {
+        error_setg(errp, "The job %s doesn't support block checkpoint",
+                   BlockJobType_lookup[job->driver->job_type]);
+        return;
+    }
+
+    job->driver->do_checkpoint(job, errp);
+}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index dd9d5e6..0b4f386 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -50,6 +50,9 @@  typedef struct BlockJobDriver {
      * manually.
      */
     void (*complete)(BlockJob *job, Error **errp);
+
+    /** Optional callback for job types that support checkpoint. */
+    void (*do_checkpoint)(BlockJob *job, Error **errp);
 } BlockJobDriver;
 
 /**
@@ -356,4 +359,13 @@  void block_job_defer_to_main_loop(BlockJob *job,
                                   BlockJobDeferToMainLoopFn *fn,
                                   void *opaque);
 
+/**
+ * block_job_do_checkpoint:
+ * @job: The job.
+ * @errp: Error object.
+ *
+ * Do block checkpoint on the specified job.
+ */
+void block_job_do_checkpoint(BlockJob *job, Error **errp);
+
 #endif