diff mbox

[v2,10/11] block: add option 'backing' to -drive options

Message ID 1374054136-28741-11-git-send-email-famz@redhat.com
State New
Headers show

Commit Message

Fam Zheng July 17, 2013, 9:42 a.m. UTC
This option allows overriding backing hd of drive. If the target drive
exists, it's referenced as the backing file and refcount incremented.

Example:
    qemu-system-x86_64 -drive \
        file.filename=foo.qcow2,if=none,id=foo \
        -drive file=bar.qcow2,backing=foo

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

Comments

Paolo Bonzini July 17, 2013, 12:36 p.m. UTC | #1
Il 17/07/2013 11:42, Fam Zheng ha scritto:
> This option allows overriding backing hd of drive. If the target drive
> exists, it's referenced as the backing file and refcount incremented.
> 
> Example:
>     qemu-system-x86_64 -drive \
>         file.filename=foo.qcow2,if=none,id=foo \
>         -drive file=bar.qcow2,backing=foo

I guess this is where we need the soft reference.

This has a _lot_ of potential for misuse, I think Kevin bashed me and
Federico very heavily when we tried to do something similar.

block/backup.c is the right place where we can override the backing hd
of the drive.  Perhaps we can add a way to open a file with
BDRV_O_NO_BACKING from the command line.

Paolo

> Signed-off-by: Fam Zheng <famz@redhat.com>
> ---
>  block.c | 30 ++++++++++++++++++++++++------
>  1 file changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 147a448..31fab07 100644
> --- a/block.c
> +++ b/block.c
> @@ -1083,12 +1083,30 @@ int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
>  
>      /* If there is a backing file, use it */
>      if ((flags & BDRV_O_NO_BACKING) == 0) {
> -        QDict *backing_options;
> -
> -        extract_subqdict(options, &backing_options, "backing.");
> -        ret = bdrv_open_backing_file(bs, backing_options);
> -        if (ret < 0) {
> -            goto close_and_fail;
> +        const char *backing_drive;
> +        backing_drive = qdict_get_try_str(options, "backing");
> +        if (backing_drive) {
> +            bs->backing_hd = bdrv_find(backing_drive);
> +            if (bs->backing_hd) {
> +                bdrv_ref(bs->backing_hd, false);
> +                pstrcpy(bs->backing_file, sizeof(bs->backing_file),
> +                        bs->backing_hd->filename);
> +                pstrcpy(bs->backing_format, sizeof(bs->backing_format),
> +                        bs->backing_hd->drv->format_name);
> +            } else {
> +                qerror_report(ERROR_CLASS_DEVICE_NOT_FOUND,
> +                        "Can't find drive with name %s\n", backing_drive);
> +                ret = -EINVAL;
> +                goto close_and_fail;
> +            }
> +            qdict_del(options, "backing");
> +        } else {
> +            QDict *backing_options;
> +            extract_subqdict(options, &backing_options, "backing.");
> +            ret = bdrv_open_backing_file(bs, backing_options);
> +            if (ret < 0) {
> +                goto close_and_fail;
> +            }
>          }
>      }
>  
>
Kevin Wolf July 17, 2013, 12:58 p.m. UTC | #2
Am 17.07.2013 um 14:36 hat Paolo Bonzini geschrieben:
> Il 17/07/2013 11:42, Fam Zheng ha scritto:
> > This option allows overriding backing hd of drive. If the target drive
> > exists, it's referenced as the backing file and refcount incremented.
> > 
> > Example:
> >     qemu-system-x86_64 -drive \
> >         file.filename=foo.qcow2,if=none,id=foo \
> >         -drive file=bar.qcow2,backing=foo
> 
> I guess this is where we need the soft reference.
> 
> This has a _lot_ of potential for misuse, I think Kevin bashed me and
> Federico very heavily when we tried to do something similar.

Not sure what exactly I "bashed" you for, but I think this is really the
right thing to do, and it's the general direction we're headed for with
blockdev-add. The user manages his BlockDriverStates and connects them
as he sees fit. The defaults of qemu are only used when the user doesn't
override them (and libvirt is going to override most to use fd passing).

My expectation is, admittedly, that it's hard to get right from where we
stand today, because of the coupling of BlockDriverStates with guest
devices, but with the refcounting patches (which I haven't reviewed
yet), maybe one of the biggest concerns is addressed.

This is basically restarting the discussion where I suggested to give
the targets of a block job names so that they can be reused. It's about
the same kind of misuse that becomes possible and that we need to
protect against.

Kevin
Paolo Bonzini July 17, 2013, 1:13 p.m. UTC | #3
Il 17/07/2013 14:58, Kevin Wolf ha scritto:
> Am 17.07.2013 um 14:36 hat Paolo Bonzini geschrieben:
>> Il 17/07/2013 11:42, Fam Zheng ha scritto:
>>> This option allows overriding backing hd of drive. If the target drive
>>> exists, it's referenced as the backing file and refcount incremented.
>>>
>>> Example:
>>>     qemu-system-x86_64 -drive \
>>>         file.filename=foo.qcow2,if=none,id=foo \
>>>         -drive file=bar.qcow2,backing=foo
>>
>> I guess this is where we need the soft reference.
>>
>> This has a _lot_ of potential for misuse, I think Kevin bashed me and
>> Federico very heavily when we tried to do something similar.
> 
> Not sure what exactly I "bashed" you for

Doing strange things with bs->backing_hd (blkmirror comes to mind).

> This is basically restarting the discussion where I suggested to give
> the targets of a block job names so that they can be reused. It's about
> the same kind of misuse that becomes possible and that we need to
> protect against.

Yes.  But then I'm not sure why we need to rush in blockdev-backup now.
 Instead we can simply make drive-backup optionally give a name to the
target.

I understand this is the right thing to do long term, but pre-opening of
the target is not really needed for fleecing.

Paolo
Kevin Wolf July 17, 2013, 1:48 p.m. UTC | #4
Am 17.07.2013 um 15:13 hat Paolo Bonzini geschrieben:
> Il 17/07/2013 14:58, Kevin Wolf ha scritto:
> > Am 17.07.2013 um 14:36 hat Paolo Bonzini geschrieben:
> >> Il 17/07/2013 11:42, Fam Zheng ha scritto:
> >>> This option allows overriding backing hd of drive. If the target drive
> >>> exists, it's referenced as the backing file and refcount incremented.
> >>>
> >>> Example:
> >>>     qemu-system-x86_64 -drive \
> >>>         file.filename=foo.qcow2,if=none,id=foo \
> >>>         -drive file=bar.qcow2,backing=foo
> >>
> >> I guess this is where we need the soft reference.
> >>
> >> This has a _lot_ of potential for misuse, I think Kevin bashed me and
> >> Federico very heavily when we tried to do something similar.
> > 
> > Not sure what exactly I "bashed" you for
> 
> Doing strange things with bs->backing_hd (blkmirror comes to mind).
> 
> > This is basically restarting the discussion where I suggested to give
> > the targets of a block job names so that they can be reused. It's about
> > the same kind of misuse that becomes possible and that we need to
> > protect against.
> 
> Yes.  But then I'm not sure why we need to rush in blockdev-backup now.
>  Instead we can simply make drive-backup optionally give a name to the
> target.
> 
> I understand this is the right thing to do long term, but pre-opening of
> the target is not really needed for fleecing.

So for how much longer should we plan to procrastinate? (I know, not an
entirely fair question, but we have to make the step at some point)

I guess we can give a name to the target, and we can make drive-backup
automatically connect the target with the original as its backing file
(still needs the refcounting, by the way). But is giving a name to the
target not enough to allow "interesting" things to be done? I don't
remember the details from the mirroring discussion, but it seems it were
enough that you didn't want to do it.

And we'll want to reference existing BDSes as backing/protocol files in
blockdev-add soon anyway, so if we decide against it here, it's just
moving from Fam's to-do list to mine...

So no, I'm not totally comfortable with allowing it, but not allowing it
isn't really an option either.

Kevin
Paolo Bonzini July 17, 2013, 2:16 p.m. UTC | #5
Il 17/07/2013 15:48, Kevin Wolf ha scritto:
>> I understand this is the right thing to do long term, but pre-opening of
>> the target is not really needed for fleecing.
> 
> So for how much longer should we plan to procrastinate? (I know, not an
> entirely fair question, but we have to make the step at some point)

If we bring it up during soft freeze, we will procrastinate it a lot. :)
 If we bring it up at the beginning of a release cycle, the wait could
be as short as 1 month...

> I guess we can give a name to the target, and we can make drive-backup
> automatically connect the target with the original as its backing file
> (still needs the refcounting, by the way).

No, it doesn't need the refcounting (see my reply to the cover letter).
 In his next submission of drive-backup sync modes, Ian is already going
to handle the automatic connection of the target with the original.

> But is giving a name to the
> target not enough to allow "interesting" things to be done? I don't
> remember the details from the mirroring discussion, but it seems it were
> enough that you didn't want to do it.

Yes, but this time we have to bite the bullet on that one at least,
because we have no other choice (we want to do at least one
"interesting" thing, namely connect to it with the NBD server).

> And we'll want to reference existing BDSes as backing/protocol files in
> blockdev-add soon anyway, so if we decide against it here, it's just
> moving from Fam's to-do list to mine...

We can reconsider these very patches in 1 month.  It's just the timing,
combined with the fact that this is not necessary for fleecing, that I'm
uncomfortable with.

Paolo

> So no, I'm not totally comfortable with allowing it, but not allowing it
> isn't really an option either.
> 
> Kevin
>
Kevin Wolf July 17, 2013, 3:09 p.m. UTC | #6
Am 17.07.2013 um 16:16 hat Paolo Bonzini geschrieben:
> Il 17/07/2013 15:48, Kevin Wolf ha scritto:
> >> I understand this is the right thing to do long term, but pre-opening of
> >> the target is not really needed for fleecing.
> > 
> > So for how much longer should we plan to procrastinate? (I know, not an
> > entirely fair question, but we have to make the step at some point)
> 
> If we bring it up during soft freeze, we will procrastinate it a lot. :)
>  If we bring it up at the beginning of a release cycle, the wait could
> be as short as 1 month...
> 
> > And we'll want to reference existing BDSes as backing/protocol files in
> > blockdev-add soon anyway, so if we decide against it here, it's just
> > moving from Fam's to-do list to mine...
> 
> We can reconsider these very patches in 1 month.  It's just the timing,
> combined with the fact that this is not necessary for fleecing, that I'm
> uncomfortable with.

Okay, I see where this is going. Let me reinforce one fundamental policy
that you may not like. I've rejected patches based on it before and I'll
likely have to do it again in the future. This is it:

    I'm not going to compromise on upstream APIs to accommodate
    downstream schedules.

If doing things properly means moving fleecing to 1.7, so be it. There's
no pressure to have it in upstream 1.6. It might mean downstream patches
for some distro, but that's not a valid point in upstream design
discussions.

Now is the right time to discuss design changes for 1.7, so that as soon
as block-next starts to come to life (i.e. beginning of August), we can
start with implementing this at the earliest possible point in the 1.7
release cycle.

> > I guess we can give a name to the target, and we can make drive-backup
> > automatically connect the target with the original as its backing file
> > (still needs the refcounting, by the way).
> 
> No, it doesn't need the refcounting (see my reply to the cover letter).
>  In his next submission of drive-backup sync modes, Ian is already going
> to handle the automatic connection of the target with the original.

Okay, I'll have a look, but I can't imagine how it work without
refcounting. As soon as it has a name, you can attach the target to a
guest, nbd server, start block jobs and do all kinds of fun with it so
that taking it away when the source goes away becomes problematic. But
I'll have a look at the patches and hope that the commit messages
explain the details.

> > But is giving a name to the
> > target not enough to allow "interesting" things to be done? I don't
> > remember the details from the mirroring discussion, but it seems it were
> > enough that you didn't want to do it.
> 
> Yes, but this time we have to bite the bullet on that one at least,
> because we have no other choice (we want to do at least one
> "interesting" thing, namely connect to it with the NBD server).

Yes, like I said, we might not feel comfortable with enabling these
cases, but not enabling them isn't an option either. So now is the time
to do the real thing.

Kevin
Paolo Bonzini July 17, 2013, 3:23 p.m. UTC | #7
Il 17/07/2013 17:09, Kevin Wolf ha scritto:
> Am 17.07.2013 um 16:16 hat Paolo Bonzini geschrieben:
>> Il 17/07/2013 15:48, Kevin Wolf ha scritto:
>>>> I understand this is the right thing to do long term, but pre-opening of
>>>> the target is not really needed for fleecing.
>>>
>>> So for how much longer should we plan to procrastinate? (I know, not an
>>> entirely fair question, but we have to make the step at some point)
>>
>> If we bring it up during soft freeze, we will procrastinate it a lot. :)
>>  If we bring it up at the beginning of a release cycle, the wait could
>> be as short as 1 month...
>>
>>> And we'll want to reference existing BDSes as backing/protocol files in
>>> blockdev-add soon anyway, so if we decide against it here, it's just
>>> moving from Fam's to-do list to mine...
>>
>> We can reconsider these very patches in 1 month.  It's just the timing,
>> combined with the fact that this is not necessary for fleecing, that I'm
>> uncomfortable with.
> 
> Okay, I see where this is going. Let me reinforce one fundamental policy
> that you may not like.

No, I like it actually.  In no way I want to force fleecing in 1.6.  And
this is by no means about downstream schedules, I don't care about them
(not even sure which downstream you're referring to :)).

However, this API seems (a) overengineered for the purposes of fleecing;
(b) ignoring what Ian is doing to support sync modes in block/backup.c.

>>> I guess we can give a name to the target, and we can make drive-backup
>>> automatically connect the target with the original as its backing file
>>> (still needs the refcounting, by the way).
>>
>> No, it doesn't need the refcounting (see my reply to the cover letter).
>>  In his next submission of drive-backup sync modes, Ian is already going
>> to handle the automatic connection of the target with the original.
> 
> Okay, I'll have a look, but I can't imagine how it work without
> refcounting.  As soon as it has a name, you can attach the target to a
> guest, nbd server, start block jobs and do all kinds of fun with it so
> that taking it away when the source goes away becomes problematic.

- Attach the target to a guest: yes, though I wouldn't do it because the
backup job can bdrv_close it under the guest's feet.  Deletion would be
protected by DriveInfo's refcount.

- NBD server: handles closing just fine

- start block jobs: job will be cancelled as soon as the backup job
bdrv_close's the target

- all kinds of fun with it so that taking it away when the source goes
away becomes problematic: taking away the source takes away the target
too, so it is in the same ballpark as attaching the target to a guest
(i.e. be prepared to have I/O that starts to fail, and remember to use
drive_get/put_ref).

>> Yes, but this time we have to bite the bullet on that one at least,
>> because we have no other choice (we want to do at least one
>> "interesting" thing, namely connect to it with the NBD server).
> 
> Yes, like I said, we might not feel comfortable with enabling these
> cases, but not enabling them isn't an option either. So now is the time
> to do the real thing.

I agree and understand.  At the same time, we already have an (ugly)
mechanism for "soft" reference counts, namely DriveInfo.  So I don't
care about the order between "move refcnt from DriveInfo to BDS" and
"support fleecing with the backup job", but they should be two
completely separate series.  Also, the right API shouldn't be influenced
by which part goes in first.

Paolo
Fam Zheng July 22, 2013, 6:07 a.m. UTC | #8
On Wed, 07/17 14:36, Paolo Bonzini wrote:
> Il 17/07/2013 11:42, Fam Zheng ha scritto:
> > This option allows overriding backing hd of drive. If the target drive
> > exists, it's referenced as the backing file and refcount incremented.
> > 
> > Example:
> >     qemu-system-x86_64 -drive \
> >         file.filename=foo.qcow2,if=none,id=foo \
> >         -drive file=bar.qcow2,backing=foo
> 
> I guess this is where we need the soft reference.
> 
> This has a _lot_ of potential for misuse, I think Kevin bashed me and
> Federico very heavily when we tried to do something similar.
> 
> block/backup.c is the right place where we can override the backing hd
> of the drive.  Perhaps we can add a way to open a file with
> BDRV_O_NO_BACKING from the command line.

OK. If we get the override in block/backup.c, the only thing we need is
naming the target so we can add it to nbd server. If refcounting,
overriding backing with option and blockdev-backup are still good to
have, I can split them and targeting to future releases.
Ian Main July 23, 2013, 7:57 p.m. UTC | #9
On Mon, Jul 22, 2013 at 02:07:15PM +0800, Fam Zheng wrote:
> On Wed, 07/17 14:36, Paolo Bonzini wrote:
> > Il 17/07/2013 11:42, Fam Zheng ha scritto:
> > > This option allows overriding backing hd of drive. If the target drive
> > > exists, it's referenced as the backing file and refcount incremented.
> > > 
> > > Example:
> > >     qemu-system-x86_64 -drive \
> > >         file.filename=foo.qcow2,if=none,id=foo \
> > >         -drive file=bar.qcow2,backing=foo
> > 
> > I guess this is where we need the soft reference.
> > 
> > This has a _lot_ of potential for misuse, I think Kevin bashed me and
> > Federico very heavily when we tried to do something similar.
> > 
> > block/backup.c is the right place where we can override the backing hd
> > of the drive.  Perhaps we can add a way to open a file with
> > BDRV_O_NO_BACKING from the command line.
> 
> OK. If we get the override in block/backup.c, the only thing we need is
> naming the target so we can add it to nbd server. If refcounting,
> overriding backing with option and blockdev-backup are still good to
> have, I can split them and targeting to future releases.

In my latest patchset I have attempted to override the backing hd of the
target.  It is a little over my head however so any input there would be
appreciated.  It seems to me like applying that patch would get us a
working baseline and then you can add the more fancy stuff on top of
that?  That sound reasonable?

	Ian
Ian Main July 23, 2013, 8:07 p.m. UTC | #10
On Wed, Jul 17, 2013 at 05:09:05PM +0200, Kevin Wolf wrote:
> Am 17.07.2013 um 16:16 hat Paolo Bonzini geschrieben:
> > Il 17/07/2013 15:48, Kevin Wolf ha scritto:
> > >> I understand this is the right thing to do long term, but pre-opening of
> > >> the target is not really needed for fleecing.
> > > 
> > > So for how much longer should we plan to procrastinate? (I know, not an
> > > entirely fair question, but we have to make the step at some point)
> > 
> > If we bring it up during soft freeze, we will procrastinate it a lot. :)
> >  If we bring it up at the beginning of a release cycle, the wait could
> > be as short as 1 month...
> > 
> > > And we'll want to reference existing BDSes as backing/protocol files in
> > > blockdev-add soon anyway, so if we decide against it here, it's just
> > > moving from Fam's to-do list to mine...
> > 
> > We can reconsider these very patches in 1 month.  It's just the timing,
> > combined with the fact that this is not necessary for fleecing, that I'm
> > uncomfortable with.
> 
> Okay, I see where this is going. Let me reinforce one fundamental policy
> that you may not like. I've rejected patches based on it before and I'll
> likely have to do it again in the future. This is it:
> 
>     I'm not going to compromise on upstream APIs to accommodate
>     downstream schedules.
> 
> If doing things properly means moving fleecing to 1.7, so be it. There's
> no pressure to have it in upstream 1.6. It might mean downstream patches
> for some distro, but that's not a valid point in upstream design
> discussions.

I'll just throw out there that *I'm* feeling pressure for 1.6, but at
the same time I understand your concerns.  Some feel that independent of
distro concerns this is a feature gap for KVM/qemu.

It seems to me a practical approach of taking what is needed to get this
working while also putting out patches for doing it right would work out
well in the end.

Anyway, just my 2c.  I am just helping out and I know you guys have to
deal with the long term effects so I respect your decisions. 

	Ian

> Now is the right time to discuss design changes for 1.7, so that as soon
> as block-next starts to come to life (i.e. beginning of August), we can
> start with implementing this at the earliest possible point in the 1.7
> release cycle.
> 
> > > I guess we can give a name to the target, and we can make drive-backup
> > > automatically connect the target with the original as its backing file
> > > (still needs the refcounting, by the way).
> > 
> > No, it doesn't need the refcounting (see my reply to the cover letter).
> >  In his next submission of drive-backup sync modes, Ian is already going
> > to handle the automatic connection of the target with the original.
> 
> Okay, I'll have a look, but I can't imagine how it work without
> refcounting. As soon as it has a name, you can attach the target to a
> guest, nbd server, start block jobs and do all kinds of fun with it so
> that taking it away when the source goes away becomes problematic. But
> I'll have a look at the patches and hope that the commit messages
> explain the details.
> 
> > > But is giving a name to the
> > > target not enough to allow "interesting" things to be done? I don't
> > > remember the details from the mirroring discussion, but it seems it were
> > > enough that you didn't want to do it.
> > 
> > Yes, but this time we have to bite the bullet on that one at least,
> > because we have no other choice (we want to do at least one
> > "interesting" thing, namely connect to it with the NBD server).
> 
> Yes, like I said, we might not feel comfortable with enabling these
> cases, but not enabling them isn't an option either. So now is the time
> to do the real thing.
> 
> Kevin
diff mbox

Patch

diff --git a/block.c b/block.c
index 147a448..31fab07 100644
--- a/block.c
+++ b/block.c
@@ -1083,12 +1083,30 @@  int bdrv_open(BlockDriverState *bs, const char *filename, QDict *options,
 
     /* If there is a backing file, use it */
     if ((flags & BDRV_O_NO_BACKING) == 0) {
-        QDict *backing_options;
-
-        extract_subqdict(options, &backing_options, "backing.");
-        ret = bdrv_open_backing_file(bs, backing_options);
-        if (ret < 0) {
-            goto close_and_fail;
+        const char *backing_drive;
+        backing_drive = qdict_get_try_str(options, "backing");
+        if (backing_drive) {
+            bs->backing_hd = bdrv_find(backing_drive);
+            if (bs->backing_hd) {
+                bdrv_ref(bs->backing_hd, false);
+                pstrcpy(bs->backing_file, sizeof(bs->backing_file),
+                        bs->backing_hd->filename);
+                pstrcpy(bs->backing_format, sizeof(bs->backing_format),
+                        bs->backing_hd->drv->format_name);
+            } else {
+                qerror_report(ERROR_CLASS_DEVICE_NOT_FOUND,
+                        "Can't find drive with name %s\n", backing_drive);
+                ret = -EINVAL;
+                goto close_and_fail;
+            }
+            qdict_del(options, "backing");
+        } else {
+            QDict *backing_options;
+            extract_subqdict(options, &backing_options, "backing.");
+            ret = bdrv_open_backing_file(bs, backing_options);
+            if (ret < 0) {
+                goto close_and_fail;
+            }
         }
     }