Patchwork [1/3,v4] Expose a mechanism to trace block writes

Submitter lirans@il.ibm.com
Date Oct. 12, 2009, 4:23 p.m.
Message ID <12553646173152-git-send-email-lirans@il.ibm.com>
Download mbox | patch
Permalink /patch/35774/
State New
Headers show

Comments

lirans@il.ibm.com - Oct. 12, 2009, 4:23 p.m.
To support live migration without shared storage, we need to be able to trace
writes to disk while migrating. This patch exposes handler registration so
that higher-level components can be notified about block writes.
Anthony Liguori - Oct. 21, 2009, 6:21 p.m.
Hi Liran,

lirans@il.ibm.com wrote:
> To support live migration without shared storage, we need to be able to
> trace writes to disk while migrating. This patch exposes handler
> registration so that higher-level components can be notified about block
> writes.
>
> diff --git a/block.c b/block.c
> index 33f3d65..bf5f7a6 100644
> --- a/block.c
> +++ b/block.c
> @@ -61,6 +61,8 @@ BlockDriverState *bdrv_first;
>  
>  static BlockDriver *first_drv;
>  
> +static BlockDriverDirtyHandler *bdrv_dirty_handler = NULL;
> +
>   

Should be a property of a BlockDriverState.  IOW, we should register a 
dirty callback for each block device we're interested in.
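
A minimal sketch of what per-device registration could look like. `TestBDS`, `bds_set_dirty_cb`, and `bds_note_write` are illustrative stand-ins, not the real QEMU structures; the point is only that the callback becomes a field of the device state rather than a single global:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for BlockDriverState: the dirty callback
 * lives inside each device's state, not in a global variable. */
typedef void DirtyCB(void *bs, int64_t sector_num, int nb_sectors);

typedef struct TestBDS {
    DirtyCB *dirty_cb;  /* NULL means no tracking on this device */
} TestBDS;

static void bds_set_dirty_cb(TestBDS *bs, DirtyCB *cb)
{
    bs->dirty_cb = cb;
}

/* Called on each guest-visible write, for this device only. */
static void bds_note_write(TestBDS *bs, int64_t sector_num, int nb_sectors)
{
    if (bs->dirty_cb) {
        bs->dirty_cb(bs, sector_num, nb_sectors);
    }
}

/* Demo handler: records the last dirtied sector. */
static int64_t g_last_sector = -1;
static void demo_handler(void *bs, int64_t sector_num, int nb_sectors)
{
    (void)bs;
    (void)nb_sectors;
    g_last_sector = sector_num;
}
```

With this shape, a device with no registered callback pays only a NULL check, and formats like qcow2 can be left unregistered while their underlying file is tracked (or vice versa).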

>  int path_is_absolute(const char *path)
>  {
>      const char *p;
> @@ -626,6 +628,10 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
>      if (bdrv_check_request(bs, sector_num, nb_sectors))
>          return -EIO;
>  
> +    if(bdrv_dirty_handler != NULL) {
> +      bdrv_dirty_handler(bs, sector_num, nb_sectors);
> +    }
> +    
>      return drv->bdrv_write(bs, sector_num, buf, nb_sectors);
>  }
>   

CodingStyle seems off.

We have to be careful in these cases to check for whether we're dealing 
with BDRV_FILE.  In the case of something like qcow2, you would get two 
dirty callbacks as this code stands.  The first would be what the guest 
actually writes and then the second (and potentially third) would be 
qcow2 metadata updates along with writing the actual data to disk.

In terms of an interface, I think it would be better to register a 
bitmap and to poll the block driver periodically to see which bits have 
changed.  This is how ram dirty tracking works and I think keeping these 
interfaces consistent is a good thing.

I'd suggest tracking dirtiness in relatively large chunks (at least 2MB).
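
A rough sketch of that bitmap-and-poll interface, assuming 512-byte sectors and 2 MB chunks; the fixed-size array and all names here are illustrative only, not the actual RAM dirty-tracking code:

```c
#include <stdint.h>

#define SECTOR_SIZE   512
#define CHUNK_SECTORS (2 * 1024 * 1024 / SECTOR_SIZE)  /* 4096 sectors */
#define MAX_CHUNKS    1024

static uint8_t dirty_bitmap[MAX_CHUNKS / 8];

/* Mark every 2 MB chunk touched by a write as dirty. */
static void set_dirty(int64_t sector_num, int nb_sectors)
{
    int64_t start = sector_num / CHUNK_SECTORS;
    int64_t end = (sector_num + nb_sectors - 1) / CHUNK_SECTORS;
    for (int64_t c = start; c <= end; c++) {
        dirty_bitmap[c / 8] |= 1u << (c % 8);
    }
}

/* Poll-and-clear, mirroring how ram dirty tracking is consumed:
 * the migration loop asks which chunks changed since the last pass. */
static int get_and_clear_dirty(int64_t chunk)
{
    int dirty = (dirty_bitmap[chunk / 8] >> (chunk % 8)) & 1;
    dirty_bitmap[chunk / 8] &= ~(1u << (chunk % 8));
    return dirty;
}
```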

>  
> @@ -1359,6 +1370,10 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
>      if (bdrv_check_request(bs, sector_num, nb_sectors))
>          return NULL;
>  
> +    if(bdrv_dirty_handler != NULL) {
> +      bdrv_dirty_handler(bs, sector_num, nb_sectors);
> +    }
> +    
>      ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
>                                 cb, opaque);
>   

The check should be in the completion callback as AIO requests can be 
canceled.  You're potentially giving a false positive.
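
One way to move the check into completion, sketched with hypothetical names (`WrapState` and `dirty_completion` are not QEMU APIs): the write path allocates a small wrapper, substitutes `dirty_completion` as the AIO callback, and dirtiness is recorded only when `ret` indicates success:

```c
#include <stdint.h>
#include <stdlib.h>

typedef void CompletionFunc(void *opaque, int ret);

typedef struct WrapState {
    CompletionFunc *cb;   /* the caller's original callback */
    void *opaque;
    int64_t sector_num;
    int nb_sectors;
    int *dirty_count;     /* stand-in for the real bitmap update */
} WrapState;

static void dirty_completion(void *opaque, int ret)
{
    WrapState *ws = opaque;
    if (ret == 0) {
        /* only a successfully completed write dirties the sectors */
        *ws->dirty_count += ws->nb_sectors;
    }
    ws->cb(ws->opaque, ret);  /* chain to the original callback */
    free(ws);
}

/* Demo caller callback used below: counts completions. */
static int g_user_cb_calls;
static void user_cb(void *opaque, int ret)
{
    (void)opaque;
    (void)ret;
    g_user_cb_calls++;
}
```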

Regards,

Anthony Liguori
lirans@il.ibm.com - Oct. 22, 2009, 11:01 a.m.
Anthony Liguori <anthony@codemonkey.ws> wrote on 21/10/2009 20:21:
>
> Hi Liran,
>
> lirans@il.ibm.com wrote:
> > To support live migration without shared storage, we need to be able to
> > trace writes to disk while migrating. This patch exposes handler
> > registration so that higher-level components can be notified about block
> > writes.
> >
> > diff --git a/block.c b/block.c
> > index 33f3d65..bf5f7a6 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -61,6 +61,8 @@ BlockDriverState *bdrv_first;
> >
> >  static BlockDriver *first_drv;
> >
> > +static BlockDriverDirtyHandler *bdrv_dirty_handler = NULL;
> > +
> >
>
> Should be a property of a BlockDriverState.  IOW, we should register a
> dirty callback for each block device we're interested in.

Agree, I will fix that.

> >  int path_is_absolute(const char *path)
> >  {
> >      const char *p;
> > @@ -626,6 +628,10 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
> >      if (bdrv_check_request(bs, sector_num, nb_sectors))
> >          return -EIO;
> >
> > +    if(bdrv_dirty_handler != NULL) {
> > +      bdrv_dirty_handler(bs, sector_num, nb_sectors);
> > +    }
> > +
> >      return drv->bdrv_write(bs, sector_num, buf, nb_sectors);
> >  }
> >
>
> CodingStyle seems off.
>
> We have to be careful in these cases to check for whether we're dealing
> with BDRV_FILE.  In the case of something like qcow2, you would get two
> dirty callbacks as this code stands.  The first would be what the guest
> actually writes and then the second (and potentially third) would be
> qcow2 metadata updates along with writing the actual data to disk.

It seems to me that registering a callback for each device we are interested
in will solve the problem. (In the current code the bug is avoided by a check
I do in block-migration.c on the device type and name, but I agree that as it
stands it is buggy.)
I will register each device separately to solve the problem.

> In terms of an interface, I think it would be better to register a
> bitmap and to poll the block driver periodically to see which bits have
> changed.  This is how ram dirty tracking works and I think keeping these
> interfaces consistent is a good thing.

There are advantages to letting the higher-level component manage its own
dirty tracking mechanism: that way more than one component can register
itself. I intend to change the registration mechanism to allow more than one
handler to be registered, the same way register_savevm_live works.
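
That multiple-handler registration could look roughly like the following sketch, loosely modeled on how register_savevm_live keeps a list of entries; all names here are hypothetical:

```c
#include <stdint.h>

#define MAX_HANDLERS 4

typedef void DirtyHandler(int64_t sector_num, int nb_sectors);

static DirtyHandler *handlers[MAX_HANDLERS];
static int nb_handlers;

/* Add a handler to the list; returns -1 when the list is full. */
static int register_dirty_handler(DirtyHandler *h)
{
    if (nb_handlers == MAX_HANDLERS) {
        return -1;
    }
    handlers[nb_handlers++] = h;
    return 0;
}

/* Fan a write notification out to every registered handler. */
static void notify_dirty(int64_t sector_num, int nb_sectors)
{
    for (int i = 0; i < nb_handlers; i++) {
        handlers[i](sector_num, nb_sectors);
    }
}

/* Demo handlers used below: count their invocations. */
static int g_calls_a, g_calls_b;
static void handler_a(int64_t s, int n) { (void)s; (void)n; g_calls_a++; }
static void handler_b(int64_t s, int n) { (void)s; (void)n; g_calls_b++; }
```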

> I'd suggest tracking dirtiness in relatively large chunks (at least 2MB).

There are disadvantages to tracking dirtiness in such large chunks: a single
4KB write to disk would result in 2MB of migrated data.
For now, I see a way to improve things by allocating the bitmap only when
migration starts rather than from the beginning. I will fix that.

> >
> > @@ -1359,6 +1370,10 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
> >      if (bdrv_check_request(bs, sector_num, nb_sectors))
> >          return NULL;
> >
> > +    if(bdrv_dirty_handler != NULL) {
> > +      bdrv_dirty_handler(bs, sector_num, nb_sectors);
> > +    }
> > +
> >      ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
> >                                 cb, opaque);
> >
>
> The check should be in the completion callback as AIO requests can be
> canceled.  You're potentially giving a false positive.

The problem is that the completion callback is private to the caller. The
only way would be to intercept the code that calls the cb, but that code is
format specific, which means touching many places in the source. For our
purposes I think we can live with treating a canceled write as a dirty
block. What do you think?

I will fix all of the above and will resend the patch.

Thanks for the review.
- Liran

Patch

diff --git a/block.c b/block.c
index 33f3d65..bf5f7a6 100644
--- a/block.c
+++ b/block.c
@@ -61,6 +61,8 @@  BlockDriverState *bdrv_first;
 
 static BlockDriver *first_drv;
 
+static BlockDriverDirtyHandler *bdrv_dirty_handler = NULL;
+
 int path_is_absolute(const char *path)
 {
     const char *p;
@@ -626,6 +628,10 @@  int bdrv_write(BlockDriverState *bs, int64_t sector_num,
     if (bdrv_check_request(bs, sector_num, nb_sectors))
         return -EIO;
 
+    if(bdrv_dirty_handler != NULL) {
+      bdrv_dirty_handler(bs, sector_num, nb_sectors);
+    }
+    
     return drv->bdrv_write(bs, sector_num, buf, nb_sectors);
 }
 
@@ -1162,6 +1168,11 @@  int bdrv_write_compressed(BlockDriverState *bs, int64_t sector_num,
         return -ENOTSUP;
     if (bdrv_check_request(bs, sector_num, nb_sectors))
         return -EIO;
+    
+    if(bdrv_dirty_handler != NULL) {
+      bdrv_dirty_handler(bs, sector_num, nb_sectors);
+    }
+    
     return drv->bdrv_write_compressed(bs, sector_num, buf, nb_sectors);
 }
 
@@ -1359,6 +1370,10 @@  BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
     if (bdrv_check_request(bs, sector_num, nb_sectors))
         return NULL;
 
+    if(bdrv_dirty_handler != NULL) {
+      bdrv_dirty_handler(bs, sector_num, nb_sectors);
+    }
+    
     ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,
                                cb, opaque);
 
@@ -1869,7 +1884,19 @@  BlockDriverAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
     return NULL;
 }
 
+
+
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
 {
     return qemu_memalign((bs && bs->buffer_alignment) ? bs->buffer_alignment : 512, size);
 }
+
+void register_bdrv_dirty_tracking(BlockDriverDirtyHandler *dirty_handler)
+{
+  bdrv_dirty_handler = dirty_handler;
+}
+
+void unregister_bdrv_dirty_tracking(void)
+{
+  bdrv_dirty_handler = NULL;
+}
diff --git a/block.h b/block.h
index a966afb..9757a42 100644
--- a/block.h
+++ b/block.h
@@ -78,7 +78,8 @@  void bdrv_register(BlockDriver *bdrv);
 /* async block I/O */
 typedef struct BlockDriverAIOCB BlockDriverAIOCB;
 typedef void BlockDriverCompletionFunc(void *opaque, int ret);
-
+typedef void BlockDriverDirtyHandler(BlockDriverState *bs, int64_t sector,
+				     int sector_num);
 BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
                                  QEMUIOVector *iov, int nb_sectors,
                                  BlockDriverCompletionFunc *cb, void *opaque);
@@ -184,4 +185,7 @@  int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
 int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
                       int64_t pos, int size);
 
+void register_bdrv_dirty_tracking(BlockDriverDirtyHandler *dirty_handler);
+void unregister_bdrv_dirty_tracking(void);
+
 #endif
diff --git a/block_int.h b/block_int.h
index 8e72abe..c524860 100644
--- a/block_int.h
+++ b/block_int.h
@@ -168,6 +168,7 @@  struct BlockDriverState {
     int cyls, heads, secs, translation;
     int type;
     char device_name[32];
+    void *dirty_control;
     BlockDriverState *next;
     void *private;
 };