[v5,11/16] block: introduce snapshot-access block driver

Message ID 20220228113927.1852146-12-vsementsov@virtuozzo.com
State New
Series Make image fleecing more usable

Commit Message

Vladimir Sementsov-Ogievskiy Feb. 28, 2022, 11:39 a.m. UTC
The new block driver simply uses the snapshot-access API of the
underlying block node.

In further patches we want to use it like this:

[guest]                   [NBD export]
   |                            |
   | root                       | root
   v                 file       v
[copy-before-write]<------[snapshot-access]
   |           |
   | file      | target
   v           v
[active-disk] [temp.img]

This way, the NBD client can read the snapshotted state of the active
disk while the guest continues to write to it. This is known as
"fleecing". It currently uses another scheme, based on a temporary
qcow2 image whose backing file is the active disk. The new scheme comes
with benefits - see the next commit.
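
To make the data flow in the graph above concrete, here is a minimal
toy model in plain C. It is only a sketch and not QEMU code: ToyCbw,
toy_guest_write() and toy_snapshot_read() are hypothetical names, and
the fixed cluster geometry is an assumption made for brevity. The
guest write path saves the old cluster to the target before
overwriting the active disk, and the snapshot read path prefers the
saved copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NB_CLUSTERS   4
#define CLUSTER_SIZE  4

/* Toy stand-in for the copy-before-write filter and its two children. */
typedef struct ToyCbw {
    uint8_t active[NB_CLUSTERS][CLUSTER_SIZE]; /* "file" child: active disk */
    uint8_t temp[NB_CLUSTERS][CLUSTER_SIZE];   /* "target" child: temp.img  */
    bool    copied[NB_CLUSTERS];               /* old data already saved?   */
} ToyCbw;

/* Guest write path: save the old cluster to the target, then overwrite. */
static void toy_guest_write(ToyCbw *s, unsigned cluster, const uint8_t *buf)
{
    assert(cluster < NB_CLUSTERS);
    if (!s->copied[cluster]) {
        memcpy(s->temp[cluster], s->active[cluster], CLUSTER_SIZE);
        s->copied[cluster] = true;
    }
    memcpy(s->active[cluster], buf, CLUSTER_SIZE);
}

/* Snapshot read path: roughly what a snapshot-access node would ask for. */
static void toy_snapshot_read(const ToyCbw *s, unsigned cluster, uint8_t *buf)
{
    assert(cluster < NB_CLUSTERS);
    /* Clusters not yet overwritten are still intact on the active disk. */
    const uint8_t *src = s->copied[cluster] ? s->temp[cluster]
                                            : s->active[cluster];
    memcpy(buf, src, CLUSTER_SIZE);
}
```

In this model the snapshot reader keeps seeing the data from the moment
the filter was inserted, no matter how often the guest rewrites a
cluster afterwards.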

The other possible application is exporting internal snapshots of
qcow2, like this:

[guest]          [NBD export]
   |                  |
   | root             | root
   v       file       v
[qcow2]<---------[snapshot-access]

For this, we'll need to implement the snapshot-access API handlers in
the qcow2 driver, and improve the snapshot-access block driver (and
API) to make it possible to select a snapshot by name. Another thing to
improve is the size of the snapshot. For now, for simplicity, we just
use the size of bs->file, which is OK for backup, but for exporting
qcow2 snapshots we'll need to improve the snapshot-access API so that
it can report the size of the snapshot.
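
The size problem can be illustrated with a small standalone sketch.
None of this is existing QEMU API: ToySnapshot, ToyImage and
toy_snapshot_size() are hypothetical names, and the layout is an
assumption chosen only to show that a snapshot selected by name may
have a length different from the node's current length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of one internal snapshot (not a QEMU structure). */
typedef struct ToySnapshot {
    const char *name;
    int64_t     size;   /* virtual disk size when the snapshot was taken */
} ToySnapshot;

/* Hypothetical image: a current size plus a list of named snapshots. */
typedef struct ToyImage {
    int64_t            current_size;
    const ToySnapshot *snapshots;
    size_t             nb_snapshots;
} ToyImage;

/*
 * Hypothetical helper: return the size of the named snapshot,
 * or -1 if no snapshot has that name.
 */
static int64_t toy_snapshot_size(const ToyImage *img, const char *name)
{
    assert(img && name);
    for (size_t i = 0; i < img->nb_snapshots; i++) {
        if (strcmp(img->snapshots[i].name, name) == 0) {
            return img->snapshots[i].size;
        }
    }
    return -1;
}
```

If the image was resized after the snapshot was taken, exporting
current_size instead of the value returned here would expose the wrong
length to the NBD client.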

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 qapi/block-core.json    |   4 +-
 block/snapshot-access.c | 132 ++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS             |   1 +
 block/meson.build       |   1 +
 4 files changed, 137 insertions(+), 1 deletion(-)
 create mode 100644 block/snapshot-access.c

Comments

Hanna Czenczek March 3, 2022, 11:05 a.m. UTC | #1
On 28.02.22 12:39, Vladimir Sementsov-Ogievskiy wrote:

[...]

> diff --git a/block/snapshot-access.c b/block/snapshot-access.c
> new file mode 100644
> index 0000000000..77b87c1946
> --- /dev/null
> +++ b/block/snapshot-access.c

[...]

> +static int snapshot_access_open(BlockDriverState *bs, QDict *options, int flags,
> +                                Error **errp)
> +{
> +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
> +                               BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
> +                               false, errp);
> +    if (!bs->file) {
> +        return -EINVAL;
> +    }
> +
> +    bs->total_sectors = bs->file->bs->total_sectors;

(If I hadn’t commented on patch 16, I wouldn’t’ve here, but now I might 
as well...)

Instead of just a comment in the commit message (which no one will really
read later on), I prefer a TODO or FIXME comment directly here in the 
code, or even better in the API added in the previous patch (i.e. as 
part of the comment in the BlockDriver struct), that this will not work 
for qcow2, i.e. that we will need to inquire the snapshot size from the 
snapshot-providing node.

It’s OK not to implement that now, but I don’t think having a note just 
in the commit message will help us remember.

> +
> +    return 0;
> +}
Hanna Czenczek March 3, 2022, 11:11 a.m. UTC | #2
On 03.03.22 12:05, Hanna Reitz wrote:
> On 28.02.22 12:39, Vladimir Sementsov-Ogievskiy wrote:
>
> [...]
>
>> diff --git a/block/snapshot-access.c b/block/snapshot-access.c
>> new file mode 100644
>> index 0000000000..77b87c1946
>> --- /dev/null
>> +++ b/block/snapshot-access.c
>
> [...]
>
>> +static int snapshot_access_open(BlockDriverState *bs, QDict *options, int flags,
>> +                                Error **errp)
>> +{
>> +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
>> +                               BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
>> +                               false, errp);
>> +    if (!bs->file) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    bs->total_sectors = bs->file->bs->total_sectors;
>
> (If I hadn’t commented on patch 16, I wouldn’t’ve here, but now I 
> might as well...)
>
> Instead of just a comment in the commit message (which no one will
> really read later on), I prefer a TODO or FIXME comment directly here 
> in the code, or even better in the API added in the previous patch 
> (i.e. as part of the comment in the BlockDriver struct), that this 
> will not work for qcow2, i.e. that we will need to inquire the 
> snapshot size from the snapshot-providing node.
>
> It’s OK not to implement that now, but I don’t think having a note 
> just in the commit message will help us remember.

Considering soft freeze is next week, I’d propose I just add the
following to patch 10; would that be OK for you?

(In case it is, I’ll hold off on applying patch 16 for now; it’s a test, 
so we can easily add it during freeze)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index c43315ae6e..5c8ad9ed78 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -385,6 +385,12 @@ struct BlockDriver {
       * in generic block-layer: no serializing, no alignment, no tracked
       * requests. So, block-driver that realizes these APIs is fully responsible
       * for synchronization between snapshot-access API and normal IO requests.
+     *
+     * TODO: To be able to support qcow2's internal snapshots, this API will
+     * need to be extended to:
+     * - be able to select a specific snapshot
+     * - receive the snapshot's actual length (which may differ from bs's
+     *   length)
       */
      int coroutine_fn (*bdrv_co_preadv_snapshot)(BlockDriverState *bs,
          int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset);
Vladimir Sementsov-Ogievskiy March 3, 2022, 5:26 p.m. UTC | #3
03.03.2022 14:11, Hanna Reitz wrote:
> On 03.03.22 12:05, Hanna Reitz wrote:
>> On 28.02.22 12:39, Vladimir Sementsov-Ogievskiy wrote:
>>
>> [...]
>>
>>> diff --git a/block/snapshot-access.c b/block/snapshot-access.c
>>> new file mode 100644
>>> index 0000000000..77b87c1946
>>> --- /dev/null
>>> +++ b/block/snapshot-access.c
>>
>> [...]
>>
>>> +static int snapshot_access_open(BlockDriverState *bs, QDict *options, int flags,
>>> +                                Error **errp)
>>> +{
>>> +    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
>>> +                               BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
>>> +                               false, errp);
>>> +    if (!bs->file) {
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    bs->total_sectors = bs->file->bs->total_sectors;
>>
>> (If I hadn’t commented on patch 16, I wouldn’t’ve here, but now I might as well...)
>>
>> Instead of just a comment in the commit message (which no one will really read later on), I prefer a TODO or FIXME comment directly here in the code, or even better in the API added in the previous patch (i.e. as part of the comment in the BlockDriver struct), that this will not work for qcow2, i.e. that we will need to inquire the snapshot size from the snapshot-providing node.
>>
>> It’s OK not to implement that now, but I don’t think having a note just in the commit message will help us remember.
> 
> Considering soft freeze is next week, I’d propose I just add the following to patch 10; would that be OK for you?
> 
> (In case it is, I’ll hold off on applying patch 16 for now; it’s a test, so we can easily add it during freeze)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index c43315ae6e..5c8ad9ed78 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -385,6 +385,12 @@ struct BlockDriver {
>        * in generic block-layer: no serializing, no alignment, no tracked
>        * requests. So, block-driver that realizes these APIs is fully responsible
>        * for synchronization between snapshot-access API and normal IO requests.
> +     *
> +     * TODO: To be able to support qcow2's internal snapshots, this API will
> +     * need to be extended to:
> +     * - be able to select a specific snapshot
> +     * - receive the snapshot's actual length (which may differ from bs's
> +     *   length)

Yes, that sounds good

>        */
>       int coroutine_fn (*bdrv_co_preadv_snapshot)(BlockDriverState *bs,
>           int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset);
>

Patch

diff --git a/qapi/block-core.json b/qapi/block-core.json
index ffb7aea2a5..f13b5ff942 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2914,13 +2914,14 @@ 
 # @blkreplay: Since 4.2
 # @compress: Since 5.0
 # @copy-before-write: Since 6.2
+# @snapshot-access: Since 7.0
 #
 # Since: 2.9
 ##
 { 'enum': 'BlockdevDriver',
   'data': [ 'blkdebug', 'blklogwrites', 'blkreplay', 'blkverify', 'bochs',
             'cloop', 'compress', 'copy-before-write', 'copy-on-read', 'dmg',
-            'file', 'ftp', 'ftps', 'gluster',
+            'file', 'snapshot-access', 'ftp', 'ftps', 'gluster',
             {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
             {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' },
             'http', 'https', 'iscsi',
@@ -4267,6 +4268,7 @@ 
       'rbd':        'BlockdevOptionsRbd',
       'replication': { 'type': 'BlockdevOptionsReplication',
                        'if': 'CONFIG_REPLICATION' },
+      'snapshot-access': 'BlockdevOptionsGenericFormat',
       'ssh':        'BlockdevOptionsSsh',
       'throttle':   'BlockdevOptionsThrottle',
       'vdi':        'BlockdevOptionsGenericFormat',
diff --git a/block/snapshot-access.c b/block/snapshot-access.c
new file mode 100644
index 0000000000..77b87c1946
--- /dev/null
+++ b/block/snapshot-access.c
@@ -0,0 +1,132 @@ 
+/*
+ * snapshot_access block driver
+ *
+ * Copyright (c) 2022 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "sysemu/block-backend.h"
+#include "qemu/cutils.h"
+#include "block/block_int.h"
+
+static coroutine_fn int
+snapshot_access_co_preadv_part(BlockDriverState *bs,
+                               int64_t offset, int64_t bytes,
+                               QEMUIOVector *qiov, size_t qiov_offset,
+                               BdrvRequestFlags flags)
+{
+    if (flags) {
+        return -ENOTSUP;
+    }
+
+    return bdrv_co_preadv_snapshot(bs->file, offset, bytes, qiov, qiov_offset);
+}
+
+static int coroutine_fn
+snapshot_access_co_block_status(BlockDriverState *bs,
+                                bool want_zero, int64_t offset,
+                                int64_t bytes, int64_t *pnum,
+                                int64_t *map, BlockDriverState **file)
+{
+    return bdrv_co_snapshot_block_status(bs->file->bs, want_zero, offset,
+                                         bytes, pnum, map, file);
+}
+
+static int coroutine_fn snapshot_access_co_pdiscard(BlockDriverState *bs,
+                                             int64_t offset, int64_t bytes)
+{
+    return bdrv_co_pdiscard_snapshot(bs->file->bs, offset, bytes);
+}
+
+static int coroutine_fn
+snapshot_access_co_pwrite_zeroes(BlockDriverState *bs,
+                                 int64_t offset, int64_t bytes,
+                                 BdrvRequestFlags flags)
+{
+    return -ENOTSUP;
+}
+
+static coroutine_fn int
+snapshot_access_co_pwritev_part(BlockDriverState *bs,
+                                int64_t offset, int64_t bytes,
+                                QEMUIOVector *qiov, size_t qiov_offset,
+                                BdrvRequestFlags flags)
+{
+    return -ENOTSUP;
+}
+
+
+static void snapshot_access_refresh_filename(BlockDriverState *bs)
+{
+    pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
+            bs->file->bs->filename);
+}
+
+static int snapshot_access_open(BlockDriverState *bs, QDict *options, int flags,
+                                Error **errp)
+{
+    bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+                               BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
+                               false, errp);
+    if (!bs->file) {
+        return -EINVAL;
+    }
+
+    bs->total_sectors = bs->file->bs->total_sectors;
+
+    return 0;
+}
+
+static void snapshot_access_child_perm(BlockDriverState *bs, BdrvChild *c,
+                                BdrvChildRole role,
+                                BlockReopenQueue *reopen_queue,
+                                uint64_t perm, uint64_t shared,
+                                uint64_t *nperm, uint64_t *nshared)
+{
+    /*
+     * Currently, we don't need any permissions. If bs->file provides
+     * snapshot-access API, we can use it.
+     */
+    *nperm = 0;
+    *nshared = BLK_PERM_ALL;
+}
+
+BlockDriver bdrv_snapshot_access_drv = {
+    .format_name = "snapshot-access",
+
+    .bdrv_open                  = snapshot_access_open,
+
+    .bdrv_co_preadv_part        = snapshot_access_co_preadv_part,
+    .bdrv_co_pwritev_part       = snapshot_access_co_pwritev_part,
+    .bdrv_co_pwrite_zeroes      = snapshot_access_co_pwrite_zeroes,
+    .bdrv_co_pdiscard           = snapshot_access_co_pdiscard,
+    .bdrv_co_block_status       = snapshot_access_co_block_status,
+
+    .bdrv_refresh_filename      = snapshot_access_refresh_filename,
+
+    .bdrv_child_perm            = snapshot_access_child_perm,
+};
+
+static void snapshot_access_init(void)
+{
+    bdrv_register(&bdrv_snapshot_access_drv);
+}
+
+block_init(snapshot_access_init);
diff --git a/MAINTAINERS b/MAINTAINERS
index 34a36affff..1ccc546cc6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2520,6 +2520,7 @@  F: block/reqlist.c
 F: include/block/reqlist.h
 F: block/copy-before-write.h
 F: block/copy-before-write.c
+F: block/snapshot-access.c
 F: include/block/aio_task.h
 F: block/aio_task.c
 F: util/qemu-co-shared-resource.c
diff --git a/block/meson.build b/block/meson.build
index 41e9cc5dc3..038a95689b 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -34,6 +34,7 @@  block_ss.add(files(
   'raw-format.c',
   'reqlist.c',
   'snapshot.c',
+  'snapshot-access.c',
   'throttle-groups.c',
   'throttle.c',
   'vhdx-endian.c',
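
The forwarding pattern used by the new driver can be summarized with a
standalone toy model. This is a sketch under stated assumptions, not
the QEMU implementation: ToyNode, ToyDriver and every function below
are hypothetical, the coroutine and permission machinery is left out,
and discard and block-status forwarding are omitted. It shows only the
core idea: reads on the access node go to the child's snapshot-read
callback, and writes are rejected with -ENOTSUP.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_SNAPSHOT_LEN 16

typedef struct ToyNode ToyNode;

/* Toy callback table; only loosely modelled on a block driver struct. */
typedef struct ToyDriver {
    int (*preadv)(ToyNode *n, int64_t off, int64_t bytes, uint8_t *buf);
    int (*pwritev)(ToyNode *n, int64_t off, int64_t bytes,
                   const uint8_t *buf);
    /* Optional: only snapshot-providing nodes implement this. */
    int (*preadv_snapshot)(ToyNode *n, int64_t off, int64_t bytes,
                           uint8_t *buf);
} ToyDriver;

struct ToyNode {
    const ToyDriver *drv;
    ToyNode *file;                           /* child node, may be NULL   */
    uint8_t snapshot_data[TOY_SNAPSHOT_LEN]; /* used by the provider only */
};

/* Provider side: serve reads from the frozen snapshot contents. */
static int provider_preadv_snapshot(ToyNode *n, int64_t off, int64_t bytes,
                                    uint8_t *buf)
{
    if (off < 0 || bytes < 0 || off > TOY_SNAPSHOT_LEN - bytes) {
        return -EINVAL;
    }
    memcpy(buf, n->snapshot_data + off, (size_t)bytes);
    return 0;
}

/* Access side: reads use the child's snapshot callback, not its normal read. */
static int access_preadv(ToyNode *n, int64_t off, int64_t bytes, uint8_t *buf)
{
    assert(n->file && n->file->drv);
    if (!n->file->drv->preadv_snapshot) {
        return -ENOTSUP;   /* child does not provide a snapshot */
    }
    return n->file->drv->preadv_snapshot(n->file, off, bytes, buf);
}

/* Access side: the snapshot is read-only, so every write is rejected. */
static int access_pwritev(ToyNode *n, int64_t off, int64_t bytes,
                          const uint8_t *buf)
{
    (void)n; (void)off; (void)bytes; (void)buf;
    return -ENOTSUP;
}

static const ToyDriver toy_provider_drv = {
    .preadv_snapshot = provider_preadv_snapshot,
};

static const ToyDriver toy_access_drv = {
    .preadv  = access_preadv,
    .pwritev = access_pwritev,
};
```

Keeping the snapshot read as a separate, optional callback means a
node's normal read path is unaffected, and stacking the access node on
a child without that callback fails cleanly instead of returning live
data.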