
[v2,3/4] qapi: blockdev-backup: add discard-source parameter

Message ID 20240117160737.1057513-4-vsementsov@yandex-team.ru
State New
Series backup: discard-source parameter

Commit Message

Vladimir Sementsov-Ogievskiy Jan. 17, 2024, 4:07 p.m. UTC
Add a parameter that enables discard-after-copy. This is mostly useful
in the "push backup with fleecing" scheme, where the source is a
snapshot-access format driver node based on the copy-before-write
filter's snapshot-access API:

[guest]      [snapshot-access] ~~ blockdev-backup ~~> [backup target]
   |            |
   | root       | file
   v            v
[copy-before-write]
   |             |
   | file        | target
   v             v
[active disk]   [temp.img]

In this case discard-after-copy does two things:

 - discard data in temp.img to save disk space
 - avoid further copy-before-write operation in discarded area

Note that we have to declare the WRITE permission on the source in the
copy-before-write filter for discard to work.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 block/backup.c                         |  5 +++--
 block/block-copy.c                     | 12 ++++++++++--
 block/copy-before-write.c              |  2 +-
 block/replication.c                    |  4 ++--
 blockdev.c                             |  2 +-
 include/block/block-copy.h             |  2 +-
 include/block/block_int-global-state.h |  2 +-
 qapi/block-core.json                   |  4 ++++
 8 files changed, 23 insertions(+), 10 deletions(-)

Comments

Fiona Ebner Jan. 19, 2024, 2:46 p.m. UTC | #1
Am 17.01.24 um 17:07 schrieb Vladimir Sementsov-Ogievskiy:
> Add a parameter that enables discard-after-copy. That is mostly useful
> in "push backup with fleecing" scheme, when source is snapshot-access
> format driver node, based on copy-before-write filter snapshot-access
> API:
> 
> [guest]      [snapshot-access] ~~ blockdev-backup ~~> [backup target]
>    |            |
>    | root       | file
>    v            v
> [copy-before-write]
>    |             |
>    | file        | target
>    v             v
> [active disk]   [temp.img]
> 
> In this case discard-after-copy does two things:
> 
>  - discard data in temp.img to save disk space
>  - avoid further copy-before-write operation in discarded area
> 
> Note that we have to declare WRITE permission on source in
> copy-before-write filter, for discard to work.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Unfortunately, setting BLK_PERM_WRITE unconditionally breaks
blockdev-backup for a read-only node (even when not using discard-source):

> #!/bin/bash
> ./qemu-img create /tmp/disk.raw -f raw 1G
> ./qemu-img create /tmp/backup.raw -f raw 1G
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw,read-only=true \
> --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/backup.raw \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "blockdev-backup", "arguments": { "device": "node0", "target": "node1", "sync": "full", "job-id": "backup0" } }
> EOF

The above works before applying this patch, but leads to an error
afterwards:

> {"error": {"class": "GenericError", "desc": "Block node is read-only"}}

Best Regards,
Fiona
Fiona Ebner Jan. 24, 2024, 3:03 p.m. UTC | #2
Am 17.01.24 um 17:07 schrieb Vladimir Sementsov-Ogievskiy:
> Add a parameter that enables discard-after-copy. That is mostly useful
> in "push backup with fleecing" scheme, when source is snapshot-access
> format driver node, based on copy-before-write filter snapshot-access
> API:
> 
> [guest]      [snapshot-access] ~~ blockdev-backup ~~> [backup target]
>    |            |
>    | root       | file
>    v            v
> [copy-before-write]
>    |             |
>    | file        | target
>    v             v
> [active disk]   [temp.img]
> 
> In this case discard-after-copy does two things:
> 
>  - discard data in temp.img to save disk space
>  - avoid further copy-before-write operation in discarded area
> 
> Note that we have to declare WRITE permission on source in
> copy-before-write filter, for discard to work.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Ran into another issue when the cluster_size of the fleecing image is
larger than for the backup target, e.g.

> #!/bin/bash
> rm /tmp/fleecing.qcow2
> ./qemu-img create /tmp/disk.qcow2 -f qcow2 1G
> ./qemu-img create /tmp/fleecing.qcow2 -o cluster_size=2M -f qcow2 1G
> ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
> --blockdev qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2,discard=unmap \
> --blockdev qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "discard": "unmap", "node-name": "snap0" } }
> {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": "node2", "sync": "full", "job-id": "backup0", "discard-source": true } }
> EOF

will fail with

> qemu-system-x86_64: ../util/hbitmap.c:570: hbitmap_reset: Assertion `QEMU_IS_ALIGNED(count, gran) || (start + count == hb->orig_size)' failed.

The backtrace shows the assert happens while discarding, when resetting
the BDRVCopyBeforeWriteState access_bitmap:
> #6  0x0000555556142a2a in hbitmap_reset (hb=0x555557e01b80, start=0, count=1048576) at ../util/hbitmap.c:570
> #7  0x0000555555f80764 in bdrv_reset_dirty_bitmap_locked (bitmap=0x55555850a660, offset=0, bytes=1048576) at ../block/dirty-bitmap.c:563
> #8  0x0000555555f807ab in bdrv_reset_dirty_bitmap (bitmap=0x55555850a660, offset=0, bytes=1048576) at ../block/dirty-bitmap.c:570
> #9  0x0000555555f7bb16 in cbw_co_pdiscard_snapshot (bs=0x5555581a7f60, offset=0, bytes=1048576) at ../block/copy-before-write.c:330
> #10 0x0000555555f8d00a in bdrv_co_pdiscard_snapshot (bs=0x5555581a7f60, offset=0, bytes=1048576) at ../block/io.c:3734
> #11 0x0000555555fd2380 in snapshot_access_co_pdiscard (bs=0x5555582b4f60, offset=0, bytes=1048576) at ../block/snapshot-access.c:55
> #12 0x0000555555f8b65d in bdrv_co_pdiscard (child=0x5555584fe790, offset=0, bytes=1048576) at ../block/io.c:3144
> #13 0x0000555555f78650 in block_copy_task_entry (task=0x555557f588f0) at ../block/block-copy.c:597

My guess for the cause is that block_copy_calculate_cluster_size() only
looks at the target. But now that we need to discard the source, we'll
also need to consider that for the calculation?
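For reference, the failing QEMU_IS_ALIGNED() check is plain modular arithmetic: count=1048576 (1 MiB) from the backtrace is not a multiple of the bitmap granularity, which presumably follows the fleecing image's 2 MiB cluster_size. A self-contained re-implementation of the check (the real macro lives in QEMU's osdep.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024 * 1024ULL)

/* Same test as QEMU's QEMU_IS_ALIGNED(): true iff x is a multiple of
 * gran, where gran is a power of two. */
static bool is_aligned(uint64_t x, uint64_t gran)
{
    return (x & (gran - 1)) == 0;
}
```

With these numbers, `is_aligned(1 * MiB, 2 * MiB)` is false, which is exactly the condition hbitmap_reset() asserts on (unless the range runs to the end of the bitmap).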

Best Regards,
Fiona
Fiona Ebner Jan. 25, 2024, 12:47 p.m. UTC | #3
Am 24.01.24 um 16:03 schrieb Fiona Ebner:
> Am 17.01.24 um 17:07 schrieb Vladimir Sementsov-Ogievskiy:
>> Add a parameter that enables discard-after-copy. That is mostly useful
>> in "push backup with fleecing" scheme, when source is snapshot-access
>> format driver node, based on copy-before-write filter snapshot-access
>> API:
>>
>> [guest]      [snapshot-access] ~~ blockdev-backup ~~> [backup target]
>>    |            |
>>    | root       | file
>>    v            v
>> [copy-before-write]
>>    |             |
>>    | file        | target
>>    v             v
>> [active disk]   [temp.img]
>>
>> In this case discard-after-copy does two things:
>>
>>  - discard data in temp.img to save disk space
>>  - avoid further copy-before-write operation in discarded area
>>
>> Note that we have to declare WRITE permission on source in
>> copy-before-write filter, for discard to work.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> 
> Ran into another issue when the cluster_size of the fleecing image is
> larger than for the backup target, e.g.
> 
>> #!/bin/bash
>> rm /tmp/fleecing.qcow2
>> ./qemu-img create /tmp/disk.qcow2 -f qcow2 1G
>> ./qemu-img create /tmp/fleecing.qcow2 -o cluster_size=2M -f qcow2 1G
>> ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
>> ./qemu-system-x86_64 --qmp stdio \
>> --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
>> --blockdev qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2,discard=unmap \
>> --blockdev qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
>> <<EOF
>> {"execute": "qmp_capabilities"}
>> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
>> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "discard": "unmap", "node-name": "snap0" } }
>> {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": "node2", "sync": "full", "job-id": "backup0", "discard-source": true } }
>> EOF
> 
> will fail with
> 
>> qemu-system-x86_64: ../util/hbitmap.c:570: hbitmap_reset: Assertion `QEMU_IS_ALIGNED(count, gran) || (start + count == hb->orig_size)' failed.
> 
> Backtrace shows the assert happens while discarding, when resetting the
> BDRVCopyBeforeWriteState access_bitmap
> > #6  0x0000555556142a2a in hbitmap_reset (hb=0x555557e01b80, start=0, count=1048576) at ../util/hbitmap.c:570
>> #7  0x0000555555f80764 in bdrv_reset_dirty_bitmap_locked (bitmap=0x55555850a660, offset=0, bytes=1048576) at ../block/dirty-bitmap.c:563
>> #8  0x0000555555f807ab in bdrv_reset_dirty_bitmap (bitmap=0x55555850a660, offset=0, bytes=1048576) at ../block/dirty-bitmap.c:570
>> #9  0x0000555555f7bb16 in cbw_co_pdiscard_snapshot (bs=0x5555581a7f60, offset=0, bytes=1048576) at ../block/copy-before-write.c:330
>> #10 0x0000555555f8d00a in bdrv_co_pdiscard_snapshot (bs=0x5555581a7f60, offset=0, bytes=1048576) at ../block/io.c:3734
>> #11 0x0000555555fd2380 in snapshot_access_co_pdiscard (bs=0x5555582b4f60, offset=0, bytes=1048576) at ../block/snapshot-access.c:55
>> #12 0x0000555555f8b65d in bdrv_co_pdiscard (child=0x5555584fe790, offset=0, bytes=1048576) at ../block/io.c:3144
>> #13 0x0000555555f78650 in block_copy_task_entry (task=0x555557f588f0) at ../block/block-copy.c:597
> 
> My guess for the cause is that in block_copy_calculate_cluster_size() we
> only look at the target. But now that we need to discard the source,
> we'll also need to consider that for the calculation?
> 

Just querying the source and picking the maximum won't work either,
because snapshot-access does not currently implement .bdrv_co_get_info,
and because copy-before-write (which doesn't implement
.bdrv_co_get_info and is a filter) will just return the info of its
file child, while the discard will go to the target child.

If I do

1. .bdrv_co_get_info in snapshot-access: return info from file child
2. .bdrv_co_get_info in copy-before-write: return maximum cluster_size
from file child and target child
3. block_copy_calculate_cluster_size: return maximum from source and target

then the issue does go away, but I don't know whether that violates any
assumptions, and there's probably a better way to avoid the issue?
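The negotiation in step 3 boils down to picking the larger of the two cluster sizes, so that every copied (and later discarded) chunk is aligned for both nodes. A minimal sketch of that step (the name mirrors block_copy_calculate_cluster_size(), but the signature here is simplified and hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplification of step 3: take the maximum of the
 * source and target cluster sizes so that a chunk that was copied to
 * the target can also be discarded on the source without splitting
 * clusters on either side. */
static int64_t calculate_cluster_size(int64_t source_cluster,
                                      int64_t target_cluster)
{
    return source_cluster > target_cluster ? source_cluster
                                           : target_cluster;
}
```

For the reproducer above this would yield 2 MiB (the fleecing image's cluster_size) instead of the 1 MiB derived from the target alone.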

Best Regards,
Fiona
Vladimir Sementsov-Ogievskiy Jan. 25, 2024, 5:22 p.m. UTC | #4
On 25.01.24 15:47, Fiona Ebner wrote:
> Am 24.01.24 um 16:03 schrieb Fiona Ebner:
>> Am 17.01.24 um 17:07 schrieb Vladimir Sementsov-Ogievskiy:
>>> Add a parameter that enables discard-after-copy. That is mostly useful
>>> in "push backup with fleecing" scheme, when source is snapshot-access
>>> format driver node, based on copy-before-write filter snapshot-access
>>> API:
>>>
>>> [guest]      [snapshot-access] ~~ blockdev-backup ~~> [backup target]
>>>     |            |
>>>     | root       | file
>>>     v            v
>>> [copy-before-write]
>>>     |             |
>>>     | file        | target
>>>     v             v
>>> [active disk]   [temp.img]
>>>
>>> In this case discard-after-copy does two things:
>>>
>>>   - discard data in temp.img to save disk space
>>>   - avoid further copy-before-write operation in discarded area
>>>
>>> Note that we have to declare WRITE permission on source in
>>> copy-before-write filter, for discard to work.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>
>> Ran into another issue when the cluster_size of the fleecing image is
>> larger than for the backup target, e.g.
>>
>>> #!/bin/bash
>>> rm /tmp/fleecing.qcow2
>>> ./qemu-img create /tmp/disk.qcow2 -f qcow2 1G
>>> ./qemu-img create /tmp/fleecing.qcow2 -o cluster_size=2M -f qcow2 1G
>>> ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
>>> ./qemu-system-x86_64 --qmp stdio \
>>> --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
>>> --blockdev qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2,discard=unmap \
>>> --blockdev qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
>>> <<EOF
>>> {"execute": "qmp_capabilities"}
>>> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
>>> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "discard": "unmap", "node-name": "snap0" } }
>>> {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": "node2", "sync": "full", "job-id": "backup0", "discard-source": true } }
>>> EOF
>>
>> will fail with
>>
>>> qemu-system-x86_64: ../util/hbitmap.c:570: hbitmap_reset: Assertion `QEMU_IS_ALIGNED(count, gran) || (start + count == hb->orig_size)' failed.
>>
>> Backtrace shows the assert happens while discarding, when resetting the
>> BDRVCopyBeforeWriteState access_bitmap
>> > #6  0x0000555556142a2a in hbitmap_reset (hb=0x555557e01b80, start=0, count=1048576) at ../util/hbitmap.c:570
>>> #7  0x0000555555f80764 in bdrv_reset_dirty_bitmap_locked (bitmap=0x55555850a660, offset=0, bytes=1048576) at ../block/dirty-bitmap.c:563
>>> #8  0x0000555555f807ab in bdrv_reset_dirty_bitmap (bitmap=0x55555850a660, offset=0, bytes=1048576) at ../block/dirty-bitmap.c:570
>>> #9  0x0000555555f7bb16 in cbw_co_pdiscard_snapshot (bs=0x5555581a7f60, offset=0, bytes=1048576) at ../block/copy-before-write.c:330
>>> #10 0x0000555555f8d00a in bdrv_co_pdiscard_snapshot (bs=0x5555581a7f60, offset=0, bytes=1048576) at ../block/io.c:3734
>>> #11 0x0000555555fd2380 in snapshot_access_co_pdiscard (bs=0x5555582b4f60, offset=0, bytes=1048576) at ../block/snapshot-access.c:55
>>> #12 0x0000555555f8b65d in bdrv_co_pdiscard (child=0x5555584fe790, offset=0, bytes=1048576) at ../block/io.c:3144
>>> #13 0x0000555555f78650 in block_copy_task_entry (task=0x555557f588f0) at ../block/block-copy.c:597
>>
>> My guess for the cause is that in block_copy_calculate_cluster_size() we
>> only look at the target. But now that we need to discard the source,
>> we'll also need to consider that for the calculation?
>>
> 
> Just querying the source and picking the maximum won't work either,
> because snapshot-access does not currently implement .bdrv_co_get_info
> and because copy-before-write (doesn't implement .bdrv_co_get_info and
> is a filter) will just return the info of its file child. But the
> discard will go to the target child.
> 
> If I do
> 
> 1. .bdrv_co_get_info in snapshot-access: return info from file child
> 2. .bdrv_co_get_info in copy-before-write: return maximum cluster_size
> from file child and target child
> 3. block_copy_calculate_cluster_size: return maximum from source and target
> 
> then the issue does go away, but I don't know if that's not violating
> any assumptions and probably there's a better way to avoid the issue?
> 


Hmm. Taking the maximum is not optimal for the usual case without discard-source: the user may want to work at a smaller granularity than the source, to save disk space.

In the case with discarding we have two possibilities:

- either take a larger granularity for the whole process like you propose (but this will need an option for CBW?)
- or, fix the bitmap discard in CBW to work like a normal discard: it should be aligned down. This will actually lead to the discard-source option doing nothing..

==
But why do you want a fleecing image with larger granularity? Is that a real case or just experimenting? Still, we should fix the assertion anyway.

I think:

1. fix the bitmap discard to align down (will do that for v3)

2. if we need another logic for block_copy_calculate_cluster_size(), it should be an option. Maybe an explicit "copy-cluster-size" or "granularity" option for the CBW driver and for the backup job. And we'll just check that the given cluster-size is a power of two >= the target cluster size.
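The align-down behavior from point 1 can be sketched as follows; this is a guess at the v3 fix, not the actual patch. QEMU has QEMU_ALIGN_UP/QEMU_ALIGN_DOWN in osdep.h, but they are redefined here so the sketch is self-contained:

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define ALIGN_UP(n, m)   ALIGN_DOWN((n) + (m) - 1, (m))

/* First granularity-aligned offset at or after the requested start. */
static int64_t discard_start_aligned(int64_t offset, int64_t gran)
{
    return ALIGN_UP(offset, gran);
}

/* Shrink [offset, offset + bytes) to the largest sub-range whose start
 * and end are both multiples of gran; 0 means no whole cluster is
 * covered and nothing gets discarded. */
static int64_t discard_bytes_aligned(int64_t offset, int64_t bytes,
                                     int64_t gran)
{
    int64_t start = ALIGN_UP(offset, gran);
    int64_t end = ALIGN_DOWN(offset + bytes, gran);

    return end > start ? end - start : 0;
}
```

Note the trade-off Vladimir describes: with a 2 MiB granularity, a 1 MiB discard request shrinks to zero bytes, so discard-source effectively does nothing whenever the bitmap granularity exceeds the copy cluster size.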
Vladimir Sementsov-Ogievskiy Jan. 25, 2024, 5:28 p.m. UTC | #5
On 19.01.24 17:46, Fiona Ebner wrote:
> Am 17.01.24 um 17:07 schrieb Vladimir Sementsov-Ogievskiy:
>> Add a parameter that enables discard-after-copy. That is mostly useful
>> in "push backup with fleecing" scheme, when source is snapshot-access
>> format driver node, based on copy-before-write filter snapshot-access
>> API:
>>
>> [guest]      [snapshot-access] ~~ blockdev-backup ~~> [backup target]
>>     |            |
>>     | root       | file
>>     v            v
>> [copy-before-write]
>>     |             |
>>     | file        | target
>>     v             v
>> [active disk]   [temp.img]
>>
>> In this case discard-after-copy does two things:
>>
>>   - discard data in temp.img to save disk space
>>   - avoid further copy-before-write operation in discarded area
>>
>> Note that we have to declare WRITE permission on source in
>> copy-before-write filter, for discard to work.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> 
> Unfortunately, setting BLK_PERM_WRITE unconditionally breaks
> blockdev-backup for a read-only node (even when not using discard-source):

Ohh, right.

So that's where we have to somehow pass the discard-source option through to the CBW filter, so that it knows that the WRITE permission is needed. Will try in v3. Thanks again for testing!
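A rough sketch of that direction (hypothetical, ahead of v3): make the WRITE permission on the source child conditional on a discard-source flag in the CBW state, instead of requesting it unconditionally as the v2 hunk in cbw_child_perm() does. The permission bit values below are illustrative stand-ins, not QEMU's real BLK_PERM_* definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for QEMU's BLK_PERM_* bits. */
enum {
    PERM_CONSISTENT_READ = 1 << 0,
    PERM_WRITE           = 1 << 1,
};

/* Permissions CBW needs on its source (file) child: always consistent
 * reads; WRITE only when the backup job will discard the source. */
static uint64_t cbw_source_perm(bool discard_source)
{
    uint64_t perm = PERM_CONSISTENT_READ;

    if (discard_source) {
        perm |= PERM_WRITE;
    }
    return perm;
}
```

This way a read-only source keeps working for plain blockdev-backup, since WRITE is only requested when discard-source is actually enabled.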

> 
>> #!/bin/bash
>> ./qemu-img create /tmp/disk.raw -f raw 1G
>> ./qemu-img create /tmp/backup.raw -f raw 1G
>> ./qemu-system-x86_64 --qmp stdio \
>> --blockdev raw,node-name=node0,file.driver=file,file.filename=/tmp/disk.raw,read-only=true \
>> --blockdev raw,node-name=node1,file.driver=file,file.filename=/tmp/backup.raw \
>> <<EOF
>> {"execute": "qmp_capabilities"}
>> {"execute": "blockdev-backup", "arguments": { "device": "node0", "target": "node1", "sync": "full", "job-id": "backup0" } }
>> EOF
> 
> The above works before applying this patch, but leads to an error
> afterwards:
> 
>> {"error": {"class": "GenericError", "desc": "Block node is read-only"}}
> 
> Best Regards,
> Fiona
>
Fiona Ebner Jan. 26, 2024, 9:21 a.m. UTC | #6
Am 25.01.24 um 18:22 schrieb Vladimir Sementsov-Ogievskiy:
> 
> Hmm. Taking maximum is not optimal for usual case without
> discard-source: user may want to work in smaller granularity than
> source, to save disk space.
> 
> In case with discarding we have two possibilities:
> 
> - either take larger granularity for the whole process like you propose
> (but this will need an option for CBW?)
> - or, fix discarding bitmap in CBW to work like normal discard: it
> should be aligned down. This will lead actually to discard-source option
> doing nothing..
> 
> ==
> But why do you want fleecing image with larger granularity? Is that a
> real case or just experimenting? Still we should fix assertion anyway.
> 

Yes, it's a real use case. We support different storage types and want
to allow users to place the fleecing image on different storage than
the original image, for flexibility.

I ran into the issue when backing up to a target with 1 MiB cluster_size
while using a fleecing image on RBD (which has 4 MiB cluster_size by
default).

In theory, I guess I could look into querying the cluster_size of the
backup target and trying to allocate the fleecing image with a small
enough cluster_size. But I'm not sure that would work for all storage
combinations, and it would require teaching our storage plugin API
(which also supports third-party plugins) to perform allocation with a
specific cluster size. So it's not an ideal solution for us.

> I think:
> 
> 1. fix discarding bitmap to make aligning-down (will do that for v3)
> 

Thanks!

> 2. if we need another logic for block_copy_calculate_cluster_size() it
> should be an option. May be explicit "copy-cluster-size" or
> "granularity" option for CBW driver and for backup job. And we'll just
> check that given cluster-size is power of two >= target_size.
> 

I'll try to implement point 2. That should resolve the issue for our use
case.

Best Regards,
Fiona

Patch

diff --git a/block/backup.c b/block/backup.c
index ec29d6b810..0c86b5be55 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -356,7 +356,7 @@  BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                   BlockDriverState *target, int64_t speed,
                   MirrorSyncMode sync_mode, BdrvDirtyBitmap *sync_bitmap,
                   BitmapSyncMode bitmap_mode,
-                  bool compress,
+                  bool compress, bool discard_source,
                   const char *filter_node_name,
                   BackupPerf *perf,
                   BlockdevOnError on_source_error,
@@ -491,7 +491,8 @@  BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     job->len = len;
     job->perf = *perf;
 
-    block_copy_set_copy_opts(bcs, perf->use_copy_range, compress);
+    block_copy_set_copy_opts(bcs, perf->use_copy_range, compress,
+                             discard_source);
     block_copy_set_progress_meter(bcs, &job->common.job.progress);
     block_copy_set_speed(bcs, speed);
 
diff --git a/block/block-copy.c b/block/block-copy.c
index 8fca2c3698..eebf643e86 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -137,6 +137,7 @@  typedef struct BlockCopyState {
     CoMutex lock;
     int64_t in_flight_bytes;
     BlockCopyMethod method;
+    bool discard_source;
     BlockReqList reqs;
     QLIST_HEAD(, BlockCopyCallState) calls;
     /*
@@ -282,11 +283,12 @@  static uint32_t block_copy_max_transfer(BdrvChild *source, BdrvChild *target)
 }
 
 void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
-                              bool compress)
+                              bool compress, bool discard_source)
 {
     /* Keep BDRV_REQ_SERIALISING set (or not set) in block_copy_state_new() */
     s->write_flags = (s->write_flags & BDRV_REQ_SERIALISING) |
         (compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+    s->discard_source = discard_source;
 
     if (s->max_transfer < s->cluster_size) {
         /*
@@ -418,7 +420,7 @@  BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
                                     cluster_size),
     };
 
-    block_copy_set_copy_opts(s, false, false);
+    block_copy_set_copy_opts(s, false, false, false);
 
     ratelimit_init(&s->rate_limit);
     qemu_co_mutex_init(&s->lock);
@@ -589,6 +591,12 @@  static coroutine_fn int block_copy_task_entry(AioTask *task)
     co_put_to_shres(s->mem, t->req.bytes);
     block_copy_task_end(t, ret);
 
+    if (s->discard_source && ret == 0) {
+        int64_t nbytes =
+            MIN(t->req.offset + t->req.bytes, s->len) - t->req.offset;
+        bdrv_co_pdiscard(s->source, t->req.offset, nbytes);
+    }
+
     return ret;
 }
 
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 4cd9b7d91e..b208a80ade 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -370,7 +370,7 @@  cbw_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
              * works even if source node does not have any parents before backup
              * start
              */
-            *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
+            *nperm = *nperm | BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE;
             *nshared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
         }
     }
diff --git a/block/replication.c b/block/replication.c
index ca6bd0a720..0415a5e8b7 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -582,8 +582,8 @@  static void replication_start(ReplicationState *rs, ReplicationMode mode,
 
         s->backup_job = backup_job_create(
                                 NULL, s->secondary_disk->bs, s->hidden_disk->bs,
-                                0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, NULL,
-                                &perf,
+                                0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, false,
+                                NULL, &perf,
                                 BLOCKDEV_ON_ERROR_REPORT,
                                 BLOCKDEV_ON_ERROR_REPORT, JOB_INTERNAL,
                                 backup_job_completed, bs, NULL, &local_err);
diff --git a/blockdev.c b/blockdev.c
index 3a5e7222ec..aa64382f82 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2713,7 +2713,7 @@  static BlockJob *do_backup_common(BackupCommon *backup,
 
     job = backup_job_create(backup->job_id, bs, target_bs, backup->speed,
                             backup->sync, bmap, backup->bitmap_mode,
-                            backup->compress,
+                            backup->compress, backup->discard_source,
                             backup->filter_node_name,
                             &perf,
                             backup->on_source_error,
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 8b41643bfa..9555de2562 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -31,7 +31,7 @@  BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
 
 /* Function should be called prior any actual copy request */
 void block_copy_set_copy_opts(BlockCopyState *s, bool use_copy_range,
-                              bool compress);
+                              bool compress, bool discard_source);
 void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
 
 void block_copy_state_free(BlockCopyState *s);
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index ef31c58bb3..563aabf012 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -187,7 +187,7 @@  BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                             MirrorSyncMode sync_mode,
                             BdrvDirtyBitmap *sync_bitmap,
                             BitmapSyncMode bitmap_mode,
-                            bool compress,
+                            bool compress, bool discard_source,
                             const char *filter_node_name,
                             BackupPerf *perf,
                             BlockdevOnError on_source_error,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ca390c5700..1735769f9f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1599,6 +1599,9 @@ 
 #     node specified by @drive.  If this option is not given, a node
 #     name is autogenerated.  (Since: 4.2)
 #
+# @discard-source: Discard blocks on source which are already copied
+#     to the target.  (Since 9.0)
+#
 # @x-perf: Performance options.  (Since 6.0)
 #
 # Features:
@@ -1620,6 +1623,7 @@ 
             '*on-target-error': 'BlockdevOnError',
             '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
             '*filter-node-name': 'str',
+            '*discard-source': 'bool',
             '*x-perf': { 'type': 'BackupPerf',
                          'features': [ 'unstable' ] } } }