Patchwork [v3,2/6] add basic backup support to block driver

Submitter Dietmar Maurer
Date Feb. 19, 2013, 11:31 a.m.
Message ID <1361273503-974882-2-git-send-email-dietmar@proxmox.com>
State New

Comments

Dietmar Maurer - Feb. 19, 2013, 11:31 a.m.
Function backup_job_create() creates a block job to back up a block device.
The coroutine is started with backup_job_start().

We call backup_do_cow() for each write during backup. That function
reads the original data and passes it to backup_dump_cb().

The tracked_request infrastructure is used to serialize access.

Currently the backup cluster size is hardcoded to 65536 bytes.

Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
---
 Makefile.objs            |    1 +
 backup.c                 |  338 ++++++++++++++++++++++++++++++++++++++++++++++
 backup.h                 |   32 +++++
 block.c                  |   71 +++++++++-
 include/block/block.h    |    2 +
 include/block/blockjob.h |   10 ++
 6 files changed, 448 insertions(+), 6 deletions(-)
 create mode 100644 backup.c
 create mode 100644 backup.h
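
The copy-before-write flow described in the commit message can be sketched in isolation as follows. This is a minimal toy, not the patch's code: ToyBackup, copy_before_write() and guest_write() are hypothetical names, the "disk" is an in-memory array, and one byte per cluster stands in for the patch's bitmap. The 512-byte sector size is an assumption that matches QEMU's BDRV_SECTOR_SIZE; the 65536-byte cluster size is the hardcoded value mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512                     /* assumed, as BDRV_SECTOR_SIZE */
#define CLUSTER_SIZE 65536                  /* hardcoded backup cluster size */
#define SECTORS_PER_CLUSTER (CLUSTER_SIZE / SECTOR_SIZE)   /* 128 */
#define NUM_CLUSTERS 4                      /* toy device: 4 clusters */

/* Hypothetical in-memory stand-in for a device that is being backed up. */
typedef struct {
    uint8_t disk[NUM_CLUSTERS * CLUSTER_SIZE];    /* live guest data */
    uint8_t backup[NUM_CLUSTERS * CLUSTER_SIZE];  /* point-in-time copy */
    uint8_t copied[NUM_CLUSTERS];                 /* 1 = cluster already saved */
} ToyBackup;

/* Save every not-yet-copied cluster touched by the sector range. */
static void copy_before_write(ToyBackup *b, int64_t sector_num, int nb_sectors)
{
    int64_t start = sector_num / SECTORS_PER_CLUSTER;
    int64_t end = (sector_num + nb_sectors + SECTORS_PER_CLUSTER - 1) /
        SECTORS_PER_CLUSTER;

    assert(sector_num >= 0 && nb_sectors > 0 && end <= NUM_CLUSTERS);

    for (; start < end; start++) {
        if (b->copied[start]) {
            continue; /* already copied */
        }
        /* mark first, then copy the old contents of the whole cluster */
        b->copied[start] = 1;
        memcpy(b->backup + start * CLUSTER_SIZE,
               b->disk + start * CLUSTER_SIZE, CLUSTER_SIZE);
    }
}

/* A guest write: preserve the old data first, then overwrite it. */
static void guest_write(ToyBackup *b, int64_t sector_num, int nb_sectors,
                        uint8_t value)
{
    copy_before_write(b, sector_num, nb_sectors);
    memset(b->disk + sector_num * SECTOR_SIZE, value,
           (size_t)nb_sectors * SECTOR_SIZE);
}
```

A 16-sector write starting at sector 120 straddles the boundary at sector 128, so it forces clusters 0 and 1 to be saved before any guest data changes. Serialization against concurrent requests, which the patch delegates to the tracked_request infrastructure, is left out of this sketch.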
Eric Blake - Feb. 19, 2013, 8:24 p.m.
On 02/19/2013 04:31 AM, Dietmar Maurer wrote:
> Function backup_job_create() creates a block job to back up a block device.
> The coroutine is started with backup_job_start().
> 
> We call backup_do_cow() for each write during backup. That function
> reads the original data and passes it to backup_dump_cb().
> 
> The tracked_request infrastructure is used to serialize access.
> 
> Currently the backup cluster size is hardcoded to 65536 bytes.
> 
> Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
> ---

I'm only doing a casual review of low-hanging fruit; I'm not familiar
enough with coroutines or COW usage in qemu to do an in-depth review of
the actual algorithm.

> +++ b/backup.c
> @@ -0,0 +1,338 @@
> +/*
> + * QEMU backup
> + *
> + * Copyright (C) 2012 Proxmox Server Solutions

It's 2013 now.

> +        /* we need to yield so that qemu_aio_flush() returns.
> +         * (without, VM does not reboot)
> +        * Note: use 1000 instead of 0 (0 priorize this task too much)

s/priorize/prioritize/

> +++ b/backup.h
> @@ -0,0 +1,32 @@
> +/*
> + * QEMU backup related definitions
> + *
> + * Copyright (C) Proxmox Server Solutions

Year(s)?
Paolo Bonzini - Feb. 20, 2013, 2:38 p.m.
On 19/02/2013 12:31, Dietmar Maurer wrote:
> +    start = 0;
> +    end = (bs->total_sectors + BACKUP_BLOCKS_PER_CLUSTER - 1) /
> +        BACKUP_BLOCKS_PER_CLUSTER;
> +
> +    DPRINTF("backup_run start %s %zd %zd\n", bdrv_get_device_name(bs),
> +            start, end);
> +
> +    int ret = 0;
> +
> +    for (; start < end; start++) {
> +        if (block_job_is_cancelled(&job->common)) {
> +            ret = -1;
> +            break;
> +        }
> +

This should call bdrv_is_allocated_above like the other block jobs do.
It would be needed later anyway to backup only the topmost image.

Paolo
Dietmar Maurer - Feb. 21, 2013, 6:33 a.m.
> This should call bdrv_is_allocated_above like the other block jobs do.
> It would be needed later anyway to backup only the topmost image.

I do not need that information now, so why do you want that I add dead code?
Paolo Bonzini - Feb. 21, 2013, 7:39 a.m.
On 21/02/2013 07:33, Dietmar Maurer wrote:
>> This should call bdrv_is_allocated_above like the other block jobs do.
>> It would be needed later anyway to backup only the topmost image.
> 
> I do not need that information now, so why do you want that I add dead code?

I think you do.  You're wasting time reading unallocated clusters and
checking that they are zero.  bdrv_is_allocated_above gives you the same
information much more efficiently.  Do VMA files have to store all the
blocks of the source file?

Paolo
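
The optimization suggested here can be illustrated with a small standalone sketch. It deliberately does not call bdrv_is_allocated_above(): ToyImage, toy_is_allocated(), toy_read_cluster() and backup_all() are hypothetical stand-ins, and the sketch assumes an image without a backing file, so that an unallocated cluster reads back as zeroes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CLUSTERS 8

/* Hypothetical image model: an allocation map plus a read counter. */
typedef struct {
    bool allocated[NUM_CLUSTERS];
    int reads_issued;
} ToyImage;

/* Stand-in for a cheap metadata-only allocation query. */
static bool toy_is_allocated(const ToyImage *img, int64_t cluster)
{
    assert(cluster >= 0 && cluster < NUM_CLUSTERS);
    return img->allocated[cluster];
}

/* Stand-in for an expensive data read of one cluster. */
static void toy_read_cluster(ToyImage *img, int64_t cluster)
{
    assert(cluster >= 0 && cluster < NUM_CLUSTERS);
    img->reads_issued++;
}

/*
 * Walk all clusters; read only the allocated ones.
 * Returns the number of clusters skipped without touching their data
 * (the backup would record those as zero clusters).
 */
static int backup_all(ToyImage *img)
{
    int skipped = 0;

    for (int64_t c = 0; c < NUM_CLUSTERS; c++) {
        if (!toy_is_allocated(img, c)) {
            skipped++;
            continue;
        }
        toy_read_cluster(img, c);
    }
    return skipped;
}
```

With only two of eight clusters allocated, the loop issues two reads instead of eight. Whether the metadata query is actually cheaper than the read depends on the image format and its state.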
Dietmar Maurer - Feb. 21, 2013, 8:42 a.m.
> >> This should call bdrv_is_allocated_above like the other block jobs do.
> >> It would be needed later anyway to backup only the topmost image.
> >
> > I do not need that information now, so why do you want that I add dead code?
> 
> I think you do.  You're wasting time reading unallocated clusters and checking
> that they are zero.  bdrv_is_allocated_above gives you the same information
> much more efficiently.  

I thought that just returns information if the data is allocated, or if data
is on backing file?

Or is data guaranteed to be zero if bdrv_is_allocated_above() return 0?

> Do VMA files have to store all the blocks of the source file?

I only use it for full backups currently.

If you want incremental backups, you need to store information about the base image,
and this is not in the scope of this patch. IMHO incremental backups which reference other
images are a mess. They are difficult to generate and difficult to maintain.

I would prefer to use some kind of Content Addressable Storage, using hashes.
That way you have the advantages of both full backups and incremental backups (deduplication),
with the additional advantage that we can easily rsync that data to another site.
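
The content-addressable idea mentioned above could look roughly like the following toy. Nothing here comes from the patch or from VMA: ToyStore, store_put() and the 4096-byte chunk size are hypothetical, and 64-bit FNV-1a is used only as a simple stand-in for the cryptographic hash a real store would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 4096   /* small chunk size, for illustration only */
#define MAX_CHUNKS 16

/* Hypothetical store: each distinct chunk is kept once, keyed by its hash. */
typedef struct {
    uint64_t keys[MAX_CHUNKS];
    uint8_t data[MAX_CHUNKS][CHUNK_SIZE];
    int count;
} ToyStore;

/* 64-bit FNV-1a; a stand-in for a real cryptographic hash. */
static uint64_t fnv1a64(const uint8_t *buf, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;

    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/*
 * Store one chunk and return its key. A chunk whose content is already
 * present is not stored again; this is where deduplication happens.
 */
static uint64_t store_put(ToyStore *s, const uint8_t *chunk)
{
    uint64_t key = fnv1a64(chunk, CHUNK_SIZE);

    for (int i = 0; i < s->count; i++) {
        if (s->keys[i] == key &&
            memcmp(s->data[i], chunk, CHUNK_SIZE) == 0) {
            return key; /* already stored */
        }
    }

    assert(s->count < MAX_CHUNKS);
    s->keys[s->count] = key;
    memcpy(s->data[s->count], chunk, CHUNK_SIZE);
    s->count++;
    return key;
}
```

Under this scheme a backup is just the ordered list of keys for its chunks. Two backups that share most of their data share most of their stored chunks, yet each key list is self-contained in the way a full backup is.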
Dietmar Maurer - Feb. 21, 2013, 8:55 a.m.
> > >> This should call bdrv_is_allocated_above like the other block jobs do.
> > >> It would be needed later anyway to backup only the topmost image.
> > >
> > > I do not need that information now, so why do you want that I add dead
> > > code?
> >
> > I think you do.  You're wasting time reading unallocated clusters and
> > checking that they are zero.  bdrv_is_allocated_above gives you the
> > same information much more efficiently.
> 
> I thought that just returns information if the data is allocated, or if data is on
> backing file?
> 
> Or is data guaranteed to be zero if bdrv_is_allocated_above() return 0?

Oh, I need to pass NULL for base to get that information?
Dietmar Maurer - Feb. 21, 2013, 11:40 a.m.
> > > I think you do.  You're wasting time reading unallocated clusters
> > > and checking that they are zero.  bdrv_is_allocated_above gives you
> > > the same information much more efficiently.
> >
> > I thought that just returns information if the data is allocated, or
> > if data is on backing file?
> >
> > Or is data guaranteed to be zero if bdrv_is_allocated_above() return 0?
> 
> Oh, I need to pass NULL for base to get that information?

I just posted v5 of the patch. But I get a slow down of 15% if I use bdrv_is_allocated_above.
(tested with empty qcow2 files.)

Please can you take a look at the code - maybe I am doing something wrong?
Paolo Bonzini - Feb. 21, 2013, 12:51 p.m.
----- Original Message -----
> From: "Dietmar Maurer" <dietmar@proxmox.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: qemu-devel@nongnu.org
> Sent: Thursday, 21 February 2013 12:40:52
> Subject: RE: [PATCH v3 2/6] add basic backup support to block driver
> 
> > > > I think you do.  You're wasting time reading unallocated clusters
> > > > and checking that they are zero.  bdrv_is_allocated_above gives you
> > > > the same information much more efficiently.
> > >
> > > I thought that just returns information if the data is allocated, or
> > > if data is on backing file?
> > >
> > > Or is data guaranteed to be zero if bdrv_is_allocated_above() return 0?
> > 
> > Oh, I need to pass NULL for base to get that information?
> 
> I just posted v5 of the patch. But I get a slow down of 15% if I use
> bdrv_is_allocated_above. (tested with empty qcow2 files.)

Strange, for an unallocated area bdrv_is_allocated_above and bdrv_read
really do the same thing apart from writing the zeroes to the buffer.
The code looks okay, does a profile say where the time is being spent?
Or does the is_allocated_above ever trigger (are you using metadata
preallocation)?

BTW, any reason why BACKUP_BLOCKS_PER_CLUSTER is hardcoded and you're
not using the cluster size from bdrv_get_info?

Paolo
Dietmar Maurer - Feb. 21, 2013, 3:25 p.m.
> > I just posted v5 of the patch. But I get a slow down of 15% if I use
> > bdrv_is_allocated_above. (tested with empty qcow2 files.)
>
> Strange, for an unallocated area bdrv_is_allocated_above and bdrv_read really
> do the same thing apart from writing the zeroes to the buffer.
> The code looks okay, does a profile say where the time is being spent?
> Or does the is_allocated_above ever trigger (are you using metadata
> preallocation)?

Yes, I use metadata preallocation. But that does not explain why it gets slower?

> BTW, any reason why BACKUP_BLOCKS_PER_CLUSTER is hardcoded and you're
> not using the cluster size from bdrv_get_info?

A backup includes several block devices, and I want to have the same cluster size
for all. It makes the code easier.
Paolo Bonzini - Feb. 21, 2013, 3:28 p.m.
On 21/02/2013 16:25, Dietmar Maurer wrote:
>>> I just posted v5 of the patch. But I get a slow down of 15% if I use
>>> bdrv_is_allocated_above. (tested with empty qcow2 files.)
>>
>> Strange, for an unallocated area bdrv_is_allocated_above and bdrv_read really
>> do the same thing apart from writing the zeroes to the buffer.
>> The code looks okay, does a profile say where the time is being spent?
>> Or does the is_allocated_above ever trigger (are you using metadata
>> preallocation)?
> 
> Yes, I use metadata preallocation. But that does not explain why it gets slower?

Because then this is not an empty qcow2 file, it is a full qcow2 file
where reads are particularly cheap because they always hit holes.  It is
the worst case for this optimization, I think 15% is not wonderful but
not particularly bad.

>> BTW, any reason why BACKUP_BLOCKS_PER_CLUSTER is hardcoded and you're
>> not using the cluster size from bdrv_get_info?
> 
> A backup includes several block devices, and I want to have the same cluster size
> for all. It makes the code easier.

Paolo
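
Since the thread settles on one fixed job cluster size for all devices, it may help to see how a request gets widened to that granularity before overlap checks. The sketch below mirrors the arithmetic of the patch's round_to_job_clusters() helper but is standalone: round_to_clusters() is a hypothetical name, plain integer arithmetic replaces the QEMU alignment macros, and the 512-byte sector size is an assumption matching BDRV_SECTOR_SIZE.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512   /* assumed, as BDRV_SECTOR_SIZE */

/*
 * Widen the sector range [sector_num, sector_num + nb_sectors) so that it
 * starts and ends on cluster boundaries. cluster_size is in bytes.
 */
static void round_to_clusters(int64_t sector_num, int nb_sectors,
                              int cluster_size,
                              int64_t *cluster_sector_num,
                              int *cluster_nb_sectors)
{
    assert(sector_num >= 0 && nb_sectors > 0);
    assert(cluster_size > 0 && cluster_size % SECTOR_SIZE == 0);

    int64_t c = cluster_size / SECTOR_SIZE;  /* sectors per cluster */

    /* align the start down to a cluster boundary */
    *cluster_sector_num = sector_num - sector_num % c;

    /* align the length, measured from the new start, up to whole clusters */
    int64_t span = sector_num - *cluster_sector_num + nb_sectors;
    *cluster_nb_sectors = (int)((span + c - 1) / c * c);
}
```

With 65536-byte clusters (128 sectors), a request for sectors 120..135 is widened to sectors 0..255, so any other request touching cluster 0 or cluster 1 counts as overlapping it.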

Patch

diff --git a/Makefile.objs b/Makefile.objs
index a68cdac..df64f70 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -13,6 +13,7 @@  block-obj-$(CONFIG_POSIX) += aio-posix.o
 block-obj-$(CONFIG_WIN32) += aio-win32.o
 block-obj-y += block/
 block-obj-y += qapi-types.o qapi-visit.o
+block-obj-y += backup.o
 
 block-obj-y += qemu-coroutine.o qemu-coroutine-lock.o qemu-coroutine-io.o
 block-obj-y += qemu-coroutine-sleep.o
diff --git a/backup.c b/backup.c
new file mode 100644
index 0000000..ed6851a
--- /dev/null
+++ b/backup.c
@@ -0,0 +1,338 @@ 
+/*
+ * QEMU backup
+ *
+ * Copyright (C) 2012 Proxmox Server Solutions
+ *
+ * Authors:
+ *  Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include <stdio.h>
+#include <errno.h>
+#include <unistd.h>
+
+#include "block/block.h"
+#include "block/block_int.h"
+#include "block/blockjob.h"
+#include "qemu/ratelimit.h"
+#include "backup.h"
+
+#define DEBUG_BACKUP 0
+
+#define DPRINTF(fmt, ...) \
+    do { if (DEBUG_BACKUP) { printf("backup: " fmt, ## __VA_ARGS__); } } \
+    while (0)
+
+
+#define SLICE_TIME 100000000ULL /* ns */
+
+typedef struct BackupBlockJob {
+    BlockJob common;
+    RateLimit limit;
+    uint64_t sectors_read;
+    unsigned long *bitmap;
+    int bitmap_size;
+    BackupDumpFunc *backup_dump_cb;
+    BlockDriverCompletionFunc *backup_complete_cb;
+    void *opaque;
+} BackupBlockJob;
+
+static int backup_get_bitmap(BackupBlockJob *job, int64_t cluster_num)
+{
+    assert(job);
+    assert(job->bitmap);
+
+    unsigned long val, idx, bit;
+
+    idx = cluster_num / BITS_PER_LONG;
+
+    assert(job->bitmap_size > idx);
+
+    bit = cluster_num % BITS_PER_LONG;
+    val = job->bitmap[idx];
+
+    return !!(val & (1UL << bit));
+}
+
+static void backup_set_bitmap(BackupBlockJob *job, int64_t cluster_num,
+                              int dirty)
+{
+    assert(job);
+    assert(job->bitmap);
+
+    unsigned long val, idx, bit;
+
+    idx = cluster_num / BITS_PER_LONG;
+
+    assert(job->bitmap_size > idx);
+
+    bit = cluster_num % BITS_PER_LONG;
+    val = job->bitmap[idx];
+    if (dirty) {
+        if (!(val & (1UL << bit))) {
+            val |= 1UL << bit;
+        }
+    } else {
+        if (val & (1UL << bit)) {
+            val &= ~(1UL << bit);
+        }
+    }
+    job->bitmap[idx] = val;
+}
+
+static int backup_in_progress_count;
+
+static int coroutine_fn backup_do_cow(BlockDriverState *bs,
+                                      int64_t sector_num, int nb_sectors)
+{
+    assert(bs);
+    BackupBlockJob *job = (BackupBlockJob *)bs->job;
+    assert(job);
+
+    BlockDriver *drv = bs->drv;
+    struct iovec iov;
+    QEMUIOVector bounce_qiov;
+    void *bounce_buffer = NULL;
+    int ret = 0;
+
+    backup_in_progress_count++;
+
+    int64_t start, end;
+
+    start = sector_num / BACKUP_BLOCKS_PER_CLUSTER;
+    end = (sector_num + nb_sectors + BACKUP_BLOCKS_PER_CLUSTER - 1) /
+        BACKUP_BLOCKS_PER_CLUSTER;
+
+    DPRINTF("brdv_co_backup_cow enter %s C%zd %zd %d\n",
+            bdrv_get_device_name(bs), start, sector_num, nb_sectors);
+
+    for (; start < end; start++) {
+        if (backup_get_bitmap(job, start)) {
+            DPRINTF("brdv_co_backup_cow skip C%zd\n", start);
+            continue; /* already copied */
+        }
+
+        /* immediately set bitmap (avoid coroutine race) */
+        backup_set_bitmap(job, start, 1);
+
+        DPRINTF("brdv_co_backup_cow C%zd\n", start);
+
+        if (!bounce_buffer) {
+            iov.iov_len = BACKUP_CLUSTER_SIZE;
+            iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
+            qemu_iovec_init_external(&bounce_qiov, &iov, 1);
+        }
+
+        ret = drv->bdrv_co_readv(bs, start * BACKUP_BLOCKS_PER_CLUSTER,
+                                 BACKUP_BLOCKS_PER_CLUSTER,
+                                 &bounce_qiov);
+
+        job->sectors_read += BACKUP_BLOCKS_PER_CLUSTER;
+
+        if (ret < 0) {
+            DPRINTF("brdv_co_backup_cow bdrv_read C%zd failed\n", start);
+            goto out;
+        }
+
+        ret = job->backup_dump_cb(job->opaque, bs, start, bounce_buffer);
+        if (ret < 0) {
+            DPRINTF("brdv_co_backup_cow dump_cluster_cb C%zd failed\n", start);
+            goto out;
+        }
+
+        DPRINTF("brdv_co_backup_cow done C%zd\n", start);
+    }
+
+out:
+    if (bounce_buffer) {
+        qemu_vfree(bounce_buffer);
+    }
+
+    backup_in_progress_count--;
+
+    return ret;
+}
+
+static int coroutine_fn backup_before_read(BlockDriverState *bs,
+                                           int64_t sector_num,
+                                           int nb_sectors, QEMUIOVector *qiov)
+{
+    return backup_do_cow(bs, sector_num, nb_sectors);
+}
+
+static int coroutine_fn backup_before_write(BlockDriverState *bs,
+                                            int64_t sector_num,
+                                            int nb_sectors, QEMUIOVector *qiov)
+{
+    return backup_do_cow(bs, sector_num, nb_sectors);
+}
+
+static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
+{
+    BackupBlockJob *s = container_of(job, BackupBlockJob, common);
+
+    if (speed < 0) {
+        error_set(errp, QERR_INVALID_PARAMETER, "speed");
+        return;
+    }
+    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+}
+
+static BlockJobType backup_job_type = {
+    .instance_size = sizeof(BackupBlockJob),
+    .before_read = backup_before_read,
+    .before_write = backup_before_write,
+    .job_type = "backup",
+    .set_speed = backup_set_speed,
+};
+
+static void coroutine_fn backup_run(void *opaque)
+{
+    BackupBlockJob *job = opaque;
+    BlockDriverState *bs = job->common.bs;
+    assert(bs);
+
+    int64_t start, end;
+
+    start = 0;
+    end = (bs->total_sectors + BACKUP_BLOCKS_PER_CLUSTER - 1) /
+        BACKUP_BLOCKS_PER_CLUSTER;
+
+    DPRINTF("backup_run start %s %zd %zd\n", bdrv_get_device_name(bs),
+            start, end);
+
+    int ret = 0;
+
+    for (; start < end; start++) {
+        if (block_job_is_cancelled(&job->common)) {
+            ret = -1;
+            break;
+        }
+
+        /* we need to yield so that qemu_aio_flush() returns.
+         * (without, VM does not reboot)
+        * Note: use 1000 instead of 0 (0 priorize this task too much)
+         */
+        if (job->common.speed) {
+            uint64_t delay_ns = ratelimit_calculate_delay(
+                &job->limit, job->sectors_read);
+            job->sectors_read = 0;
+            block_job_sleep_ns(&job->common, rt_clock, delay_ns);
+        } else {
+            block_job_sleep_ns(&job->common, rt_clock, 1000);
+        }
+
+        if (block_job_is_cancelled(&job->common)) {
+            ret = -1;
+            break;
+        }
+
+        if (backup_get_bitmap(job, start)) {
+            continue; /* already copied */
+        }
+
+        DPRINTF("backup_run loop C%zd\n", start);
+
+        /**
+         * This triggers a cluster copy
+         * Note: avoid direct call to brdv_co_backup_cow, because
+         * this does not call tracked_request_begin()
+         */
+        ret = bdrv_co_backup(bs, start*BACKUP_BLOCKS_PER_CLUSTER, 1);
+        if (ret < 0) {
+            break;
+        }
+        /* Publish progress */
+        job->common.offset += BACKUP_CLUSTER_SIZE;
+    }
+
+    while (backup_in_progress_count > 0) {
+        DPRINTF("backup_run backup_in_progress_count != 0 (%d)",
+                backup_in_progress_count);
+        block_job_sleep_ns(&job->common, rt_clock, 10000);
+
+    }
+
+    DPRINTF("backup_run complete %d\n", ret);
+    block_job_completed(&job->common, ret);
+}
+
+static void backup_job_cleanup_cb(void *opaque, int ret)
+{
+    BlockDriverState *bs = opaque;
+    assert(bs);
+    BackupBlockJob *job = (BackupBlockJob *)bs->job;
+    assert(job);
+
+    DPRINTF("backup_job_cleanup_cb start %d\n", ret);
+
+    job->backup_complete_cb(job->opaque, ret);
+
+    DPRINTF("backup_job_cleanup_cb end\n");
+
+    g_free(job->bitmap);
+}
+
+void
+backup_job_start(BlockDriverState *bs, bool cancel)
+{
+    assert(bs);
+    assert(bs->job);
+    assert(bs->job->co == NULL);
+
+    if (cancel) {
+        block_job_cancel(bs->job); /* set cancel flag */
+    }
+
+    bs->job->co = qemu_coroutine_create(backup_run);
+    qemu_coroutine_enter(bs->job->co, bs->job);
+}
+
+int
+backup_job_create(BlockDriverState *bs, BackupDumpFunc *backup_dump_cb,
+                  BlockDriverCompletionFunc *backup_complete_cb,
+                  void *opaque, int64_t speed)
+{
+    assert(bs);
+    assert(backup_dump_cb);
+    assert(backup_complete_cb);
+
+    if (bs->job) {
+        DPRINTF("bdrv_backup_init failed - running job on %s\n",
+                bdrv_get_device_name(bs));
+        return -1;
+    }
+
+    int64_t bitmap_size;
+    const char *devname = bdrv_get_device_name(bs);
+
+    if (!devname || !devname[0]) {
+        return -1;
+    }
+
+    DPRINTF("bdrv_backup_init %s\n", bdrv_get_device_name(bs));
+
+    Error *errp = NULL;
+    BackupBlockJob *job = block_job_create(&backup_job_type, bs, speed,
+                                           backup_job_cleanup_cb, bs, &errp);
+
+    job->common.cluster_size = BACKUP_CLUSTER_SIZE;
+
+    bitmap_size = bs->total_sectors +
+        BACKUP_BLOCKS_PER_CLUSTER * BITS_PER_LONG - 1;
+    bitmap_size /= BACKUP_BLOCKS_PER_CLUSTER * BITS_PER_LONG;
+
+    job->backup_dump_cb = backup_dump_cb;
+    job->backup_complete_cb = backup_complete_cb;
+    job->opaque = opaque;
+    job->bitmap_size = bitmap_size;
+    job->bitmap = g_new0(unsigned long, bitmap_size);
+
+    job->common.len = bs->total_sectors*BDRV_SECTOR_SIZE;
+
+    return 0;
+}
diff --git a/backup.h b/backup.h
new file mode 100644
index 0000000..74f04f4
--- /dev/null
+++ b/backup.h
@@ -0,0 +1,32 @@ 
+/*
+ * QEMU backup related definitions
+ *
+ * Copyright (C) Proxmox Server Solutions
+ *
+ * Authors:
+ *  Dietmar Maurer (dietmar@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_BACKUP_H
+#define QEMU_BACKUP_H
+
+#include <uuid/uuid.h>
+
+#define BACKUP_CLUSTER_BITS 16
+#define BACKUP_CLUSTER_SIZE (1<<BACKUP_CLUSTER_BITS)
+#define BACKUP_BLOCKS_PER_CLUSTER (BACKUP_CLUSTER_SIZE/BDRV_SECTOR_SIZE)
+
+typedef int BackupDumpFunc(void *opaque, BlockDriverState *bs,
+                           int64_t cluster_num, unsigned char *buf);
+
+void backup_job_start(BlockDriverState *bs, bool cancel);
+
+int backup_job_create(BlockDriverState *bs, BackupDumpFunc *backup_dump_cb,
+                      BlockDriverCompletionFunc *backup_complete_cb,
+                      void *opaque, int64_t speed);
+
+#endif /* QEMU_BACKUP_H */
diff --git a/block.c b/block.c
index 50dab8e..6e6d08f 100644
--- a/block.c
+++ b/block.c
@@ -54,6 +54,7 @@ 
 typedef enum {
     BDRV_REQ_COPY_ON_READ = 0x1,
     BDRV_REQ_ZERO_WRITE   = 0x2,
+    BDRV_REQ_BACKUP_ONLY  = 0x4,
 } BdrvRequestFlags;
 
 static void bdrv_dev_change_media_cb(BlockDriverState *bs, bool load);
@@ -1554,7 +1555,7 @@  int bdrv_commit(BlockDriverState *bs)
 
     if (!drv)
         return -ENOMEDIUM;
-    
+
     if (!bs->backing_hd) {
         return -ENOTSUP;
     }
@@ -1691,6 +1692,22 @@  void bdrv_round_to_clusters(BlockDriverState *bs,
     }
 }
 
+/**
+ * Round a region to job cluster boundaries
+ */
+static void round_to_job_clusters(BlockDriverState *bs,
+                                  int64_t sector_num, int nb_sectors,
+                                  int job_cluster_size,
+                                  int64_t *cluster_sector_num,
+                                  int *cluster_nb_sectors)
+{
+    int64_t c = job_cluster_size/BDRV_SECTOR_SIZE;
+
+    *cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
+    *cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
+                                        nb_sectors, c);
+}
+
 static bool tracked_request_overlaps(BdrvTrackedRequest *req,
                                      int64_t sector_num, int nb_sectors) {
     /*        aaaa   bbbb */
@@ -1705,7 +1722,9 @@  static bool tracked_request_overlaps(BdrvTrackedRequest *req,
 }
 
 static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
-        int64_t sector_num, int nb_sectors)
+                                                       int64_t sector_num,
+                                                       int nb_sectors,
+                                                       int job_cluster_size)
 {
     BdrvTrackedRequest *req;
     int64_t cluster_sector_num;
@@ -1721,6 +1740,11 @@  static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
     bdrv_round_to_clusters(bs, sector_num, nb_sectors,
                            &cluster_sector_num, &cluster_nb_sectors);
 
+    if (job_cluster_size) {
+        round_to_job_clusters(bs, sector_num, nb_sectors, job_cluster_size,
+                              &cluster_sector_num, &cluster_nb_sectors);
+    }
+
     do {
         retry = false;
         QLIST_FOREACH(req, &bs->tracked_requests, list) {
@@ -2260,12 +2284,24 @@  static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         bs->copy_on_read_in_flight++;
     }
 
-    if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+    int job_cluster_size = bs->job && bs->job->cluster_size ?
+        bs->job->cluster_size : 0;
+
+    if (bs->copy_on_read_in_flight || job_cluster_size) {
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
+                                      job_cluster_size);
     }
 
     tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
 
+    if (bs->job && bs->job->job_type->before_read) {
+        ret = bs->job->job_type->before_read(bs, sector_num, nb_sectors, qiov);
+        if ((ret < 0) || (flags & BDRV_REQ_BACKUP_ONLY)) {
+            /* Note: We do not return any data to the caller */
+            goto out;
+        }
+    }
+
     if (flags & BDRV_REQ_COPY_ON_READ) {
         int pnum;
 
@@ -2309,6 +2345,17 @@  int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
                             BDRV_REQ_COPY_ON_READ);
 }
 
+int coroutine_fn bdrv_co_backup(BlockDriverState *bs,
+    int64_t sector_num, int nb_sectors)
+{
+    if (!bs->job) {
+        return -ENOTSUP;
+    }
+
+    return bdrv_co_do_readv(bs, sector_num, nb_sectors, NULL,
+                            BDRV_REQ_BACKUP_ONLY);
+}
+
 static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors)
 {
@@ -2366,12 +2413,23 @@  static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         bdrv_io_limits_intercept(bs, true, nb_sectors);
     }
 
-    if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+    int job_cluster_size = bs->job && bs->job->cluster_size ?
+        bs->job->cluster_size : 0;
+
+    if (bs->copy_on_read_in_flight || job_cluster_size) {
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
+                                      job_cluster_size);
     }
 
     tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
 
+    if (bs->job && bs->job->job_type->before_write) {
+        ret = bs->job->job_type->before_write(bs, sector_num, nb_sectors, qiov);
+        if (ret < 0) {
+            goto out;
+        }
+    }
+
     if (flags & BDRV_REQ_ZERO_WRITE) {
         ret = bdrv_co_do_write_zeroes(bs, sector_num, nb_sectors);
     } else {
@@ -2390,6 +2448,7 @@  static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         bs->wr_highest_sector = sector_num + nb_sectors - 1;
     }
 
+out:
     tracked_request_end(&req);
 
     return ret;
diff --git a/include/block/block.h b/include/block/block.h
index 5c3b911..b6144be 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -172,6 +172,8 @@  int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+int coroutine_fn bdrv_co_backup(BlockDriverState *bs,
+    int64_t sector_num, int nb_sectors);
 int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, QEMUIOVector *qiov);
 /*
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index c290d07..6f42495 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -50,6 +50,13 @@  typedef struct BlockJobType {
      * manually.
      */
     void (*complete)(BlockJob *job, Error **errp);
+
+    /** tracked requests */
+    int coroutine_fn (*before_read)(BlockDriverState *bs, int64_t sector_num,
+                                    int nb_sectors, QEMUIOVector *qiov);
+    int coroutine_fn (*before_write)(BlockDriverState *bs, int64_t sector_num,
+                                     int nb_sectors, QEMUIOVector *qiov);
+
 } BlockJobType;
 
 /**
@@ -103,6 +110,9 @@  struct BlockJob {
     /** Speed that was set with @block_job_set_speed.  */
     int64_t speed;
 
+    /** tracked requests */
+    int cluster_size;
+
     /** The completion function that will be called when the job completes.  */
     BlockDriverCompletionFunc *cb;